Posts
Mamba No. 5 (A Little Bit Of...)
In this post, I attempt to provide a walkthrough of the essence of the Mamba state space model architecture, occasionally sacrificing some rigor for intuition and overall pedagogical friendliness.
I don’t assume readers have any familiarity with state space models, but I do assume some familiarity with machine learning and mathematical notation.
If at any point you spot any errors, typos, or confusing wording, please...
Nano Perceiver
Tl;dr
The Perceiver family of models from DeepMind decouples context length from memory and compute requirements. Perceiver AR extends this with support for autoregressive generation. It also has a refreshingly simple implementation, since at its core it is just a small variation on an otherwise standard decoder-only transformer.
I’ve provided a lightweight implementation here and give additional context in this post.
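To make the "small variation" concrete, here is a minimal sketch of the core idea. This is not the linked implementation: it assumes PyTorch, and the class name, hyperparameters, and shapes are illustrative. Queries are drawn only from the most recent positions, while keys and values span the entire context, so the expensive dimension of attention no longer grows with context length.

```python
import torch
import torch.nn as nn


class PerceiverARCrossAttention(nn.Module):
    """Illustrative sketch of the Perceiver AR core idea: queries come only
    from the last `num_latents` positions, while keys/values span the full
    context, so per-layer cost grows with num_latents rather than with the
    full sequence length. Names and hyperparameters are assumptions."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, num_latents: int = 128):
        super().__init__()
        self.num_latents = num_latents
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) embeddings of the full context
        _, seq_len, _ = x.shape
        assert seq_len >= self.num_latents
        latents = x[:, -self.num_latents:, :]  # queries: only the most recent positions
        # Causal mask: latent i sits at absolute position seq_len - num_latents + i
        # and may only attend to keys at positions <= that. True = blocked.
        q_pos = torch.arange(seq_len - self.num_latents, seq_len).unsqueeze(1)
        k_pos = torch.arange(seq_len).unsqueeze(0)
        causal_mask = k_pos > q_pos
        out, _ = self.attn(latents, x, x, attn_mask=causal_mask)
        return out  # (batch, num_latents, d_model)


# Usage: a 4096-token context is reduced to 128 latent positions in one layer.
layer = PerceiverARCrossAttention()
y = layer(torch.randn(2, 4096, 512))
print(y.shape)  # torch.Size([2, 128, 512])
```

Roughly speaking, the remaining layers are then ordinary self-attention over the short latent sequence, which is why the change relative to a standard decoder-only transformer is so small.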
Filling in the middle for great good
Tl;dr
My favorite research findings are ones that make me go “Why didn’t I think of that?” due to a paradoxical combination of simplicity and cleverness. The OpenAI paper “Efficient Training of Language Models to Fill in the Middle” (FIM) is the most recent thing I’ve read that makes me feel this way.
Language modeling 101
For the uninitiated, modern language model...
Training a chatbot to talk like me
There has been much well-deserved attention paid to the latest advances in machine learning these days. I feel like I see a new paper or model every week that promises the Earth, moon, and stars.
Perhaps it’s a new approach that will finally™ solve the problem of quadratic scaling of transformers w.r.t. context length, be it via clever tweaks inspired by convolutions, literally...