About
I am a machine learning researcher and engineer working on artificial intelligence and systems architecture. I design neural network models for language, vision, math, and physics, with a particular emphasis on long-range temporal reasoning, multimodal sequence modeling, and energy-based transformer (EBT) architectures. I deploy and evaluate models on distributed home clusters and edge accelerators, including TPUs such as the Google Coral paired with Raspberry Pi-class devices.

My present work turns on three working theses:

1) The quadratic complexity of attention is essentially a solved problem, with state space models representing the definitive path forward.
2) Model merging serves as a powerful, orthogonal alternative to full-scale pretraining.
3) Enabling performant inference on consumer hardware is a paramount objective for democratization.
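To make the model-merging thesis concrete, here is a toy sketch of the simplest variant, linear weight interpolation. This is an illustrative example, not my actual pipeline: real merges (e.g. SLERP, TIES, task arithmetic) operate on full checkpoints, while here each "model" is just a dictionary of NumPy arrays, and the function name is hypothetical.

```python
# Toy sketch of model merging via linear weight interpolation.
# Each "model" is a state dict mapping parameter names to arrays.
import numpy as np

def merge_models(state_a, state_b, alpha=0.5):
    """Return a new state dict: alpha * A + (1 - alpha) * B."""
    assert state_a.keys() == state_b.keys(), "architectures must match"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k]
            for k in state_a}

# Two tiny "checkpoints" with matching shapes.
a = {"w": np.ones((2, 2)), "b": np.zeros(2)}
b = {"w": np.zeros((2, 2)), "b": np.ones(2)}

merged = merge_models(a, b, alpha=0.5)
print(merged["w"], merged["b"])  # every entry is the elementwise midpoint
```

The appeal is that the merge costs a single pass over the weights rather than any gradient steps, which is what makes it orthogonal to pretraining.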
Beyond my individual projects, I am an active contributor to the open-source ecosystem and believe in accelerating progress through collaborative research.
“Most of human knowledge is actually not language, so those systems [LLMs] can never reach human-level intelligence — unless you change the architecture.” - Yann LeCun
“Large language models can talk endlessly because they are trained on huge bodies of knowledge; but genuine intelligence is ‘the ability to create knowledge – spot a problem, invent a solution, test it, and improve it as humans do’.” - David Deutsch