Hello World
Introducing my research blog.
Hello world.
I study the mathematical foundations of machine learning - how models learn, why they generalize, and what makes that learning efficient. My work sits at the intersection of three subfields that I believe are more deeply connected than they are usually treated as being: mechanistic interpretability, learning theory, and machine learning systems.
The questions I find most compelling are ones that reach across these research areas. What internal computations does a model develop over the course of training, and why those rather than others? Which properties of data, architecture, and optimization determine what is learned, how efficiently, and how reliably it transfers? How do the representations that interpretability research uncovers relate to the generalization behavior that learning theory tries to bound? And how do these abstractions hold up when confronted with the constraints of real systems - memory, throughput, latency, and scale? My intuition is that progress on any one of these questions is bottlenecked by the others and that the most informative answers come from working on them simultaneously.
I am fond of both theory and experimentation. Too often, theoretical ideas yield no concrete practical applications, and experiments yield results that no one can explain. To frame it as a machine learning problem: theory without applications reflects poor taste in choosing research directions, while empirical results without explanations amount to an unrepeatable process.
This blog is meant to be a working notebook. Some posts will be sketches of arguments I am still trying to formalize. Others will be experimental notes - plots, ablations, anomalies - that I have not yet been able to fully explain. Occasionally there will be more developed pieces that bring theory and experiment together on a single question. None of these are intended as substitutes for full papers. The goal is to make my entire research process visible.
I expect some of my ideas to be wrong. Early feedback, disagreement, and refinement are how informal claims become reliable ones, and I would rather surface ideas while they can still be cheaply revised. Even when they are short, posts will aim for technical soundness and clarity, and for honesty about what is established versus what is still conjectural.
These notes are written for other researchers, though I hope they remain accessible to anyone curious about why machine learning systems behave the way they do. If something here is useful, wrong, or incomplete, I would be glad to hear about it. Progress on these questions is not an individual undertaking; the field moves forward when its working drafts are shared, not only its finished ones.
Thanks for reading.