Posts
Mamba No. 5 (A Little Bit Of...)
In this post, I attempt to provide a walkthrough of the essence of the Mamba state space model architecture, occasionally sacrificing some rigor for intuition and overall pedagogical friendliness.
I don’t assume readers have any familiarity with state space models, but I do assume some familiarity with machine learning and mathematical notation.
If at any point you spot any errors, typos, or confusing wording, please...
Nano Perceiver
Tl;dr
The Perceiver family of models from DeepMind decouples context length from memory and compute requirements. Perceiver AR extends this with support for autoregressive generation. It also has a refreshingly simple implementation, since at its core it is just a small variation on an otherwise standard decoder-only transformer.
I’ve provided a lightweight implementation here and give additional context in this post.
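To make the "small variation" concrete, here is a minimal sketch of the core idea. This is not the linked implementation: it assumes PyTorch, and the class name, hyperparameters, and shapes are illustrative. Queries are drawn only from the most recent positions, while keys and values span the entire context, so the expensive dimension of attention no longer grows with context length.

```python
import torch
import torch.nn as nn


class PerceiverARCrossAttention(nn.Module):
    """Illustrative sketch of the Perceiver AR core idea: queries come only
    from the last `num_latents` positions, while keys/values span the full
    context, so per-layer cost grows with num_latents rather than with the
    full sequence length. Names and hyperparameters are assumptions."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, num_latents: int = 128):
        super().__init__()
        self.num_latents = num_latents
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) embeddings of the full context
        _, seq_len, _ = x.shape
        assert seq_len >= self.num_latents
        latents = x[:, -self.num_latents:, :]  # queries: only the most recent positions
        # Causal mask: latent i sits at absolute position seq_len - num_latents + i
        # and may only attend to keys at positions <= that. True = blocked.
        q_pos = torch.arange(seq_len - self.num_latents, seq_len).unsqueeze(1)
        k_pos = torch.arange(seq_len).unsqueeze(0)
        causal_mask = k_pos > q_pos
        out, _ = self.attn(latents, x, x, attn_mask=causal_mask)
        return out  # (batch, num_latents, d_model)


# Usage: a 4096-token context is reduced to 128 latent positions in one layer.
layer = PerceiverARCrossAttention()
y = layer(torch.randn(2, 4096, 512))
print(y.shape)  # torch.Size([2, 128, 512])
```

Roughly speaking, the remaining layers are then ordinary self-attention over the short latent sequence, which is why the change relative to a standard decoder-only transformer is so small.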
Filling in the middle for great good
Tl;dr
My favorite research findings are ones that make me go “Why didn’t I think of that?” due to a paradoxical combination of simplicity and cleverness. The OpenAI paper “Efficient Training of Language Models to Fill in the Middle” (FIM) is the most recent thing I’ve read that makes me feel this way.
Language modeling 101
For the uninitiated, modern language model...
Training a chatbot to talk like me
There has been much well-deserved attention paid to the latest advances in machine learning these days. I feel like I see a new paper or model every week that promises the Earth, moon, and stars.
Perhaps it’s a new approach that will finally™ solve the problem of quadratic scaling of transformers w.r.t. context length, be it via clever tweaks inspired by convolutions, literally...