Posts

  • Mamba No. 5 (A Little Bit Of...)

    In this post, I attempt to walk through the essence of the Mamba state space model architecture, occasionally sacrificing some rigor for intuition and pedagogical friendliness.

    I don’t assume readers have any familiarity with state space models, but I do assume some familiarity with machine learning and mathematical notation.

    If at any point you spot any errors, typos, or confusing wording, please...

    See more
  • Nano Perceiver

    Tl;dr

    The Perceiver family of models from DeepMind decouples context length from memory and compute requirements. Perceiver AR extends this with support for autoregressive generation. It also has a refreshingly simple implementation, since at its core it is just a small variation on top of an otherwise standard decoder-only transformer.

    I’ve provided a lightweight implementation here and give additional context in this post; a rough sketch of the core idea also appears after the post list below.

    ...

    See more
  • Filling in the middle for great good

    Tl;dr

    My favorite research findings are ones that make me go “Why didn’t I think of that?” due to a paradoxical combination of simplicity and cleverness. The OpenAI paper “Efficient Training of Language Models to Fill in the Middle” (FIM) is the most recent thing I’ve read that makes me feel this way.

    Language modeling 101

    For the uninitiated, modern language model...

    See more
  • Training a chatbot to talk like me

    Much well-deserved attention has been paid to the latest advances in machine learning lately. I feel like I see a new paper or model every week that promises the Earth, moon, and stars.

    Perhaps it’s a new approach that will finally™ solve the problem of quadratic scaling of transformers w.r.t. context length, be it via clever tweaks inspired by convolutions, literally...

    See more
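
Since the Nano Perceiver excerpt above only gestures at the architecture, here is a rough sketch of the core Perceiver AR idea as I understand it, not the post's actual implementation: the class name, hyperparameters, and the use of torch.nn.MultiheadAttention and nn.TransformerEncoderLayer are all illustrative choices of mine. A long token sequence is embedded, only the last num_latents positions act as queries in a single causal cross-attention over the full input, and every layer after that is ordinary causal self-attention over those latents, which is why memory and compute are decoupled from context length.

```python
# Illustrative sketch of the Perceiver AR idea; all names and hyperparameters
# here are assumptions for demonstration, not taken from the Nano Perceiver repo.
import torch
import torch.nn as nn


class PerceiverARSketch(nn.Module):
    """One causal cross-attention from the last `num_latents` positions over the
    full input, followed by standard causal self-attention over those latents."""

    def __init__(self, vocab_size=256, d_model=128, num_latents=64,
                 n_heads=4, n_self_layers=2):
        super().__init__()
        self.num_latents = num_latents
        self.embed = nn.Embedding(vocab_size, d_model)
        # Cross-attention: a short query sequence attends over the long input.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Ordinary self-attention layers; their cost depends on num_latents,
        # not on the full context length.
        self.self_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(n_self_layers)
        ])
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        x = self.embed(tokens)                      # (batch, seq_len, d_model)
        latents = x[:, -self.num_latents:]          # queries: last num_latents positions
        seq_len, n_lat = x.size(1), latents.size(1)
        # Boolean mask (True = blocked) so latent i only attends to input
        # positions up to and including its own position in the sequence.
        q_pos = torch.arange(n_lat) + (seq_len - n_lat)
        cross_mask = (torch.arange(seq_len)[None, :] > q_pos[:, None]).to(tokens.device)
        h, _ = self.cross_attn(latents, x, x, attn_mask=cross_mask)
        causal = nn.Transformer.generate_square_subsequent_mask(n_lat).to(tokens.device)
        for layer in self.self_layers:
            h = layer(h, src_mask=causal)
        return self.lm_head(h)                      # logits for the last num_latents positions


# Usage: a 512-token context produces predictions only for the final 64 positions,
# so the self-attention stack never sees more than 64 tokens at a time.
model = PerceiverARSketch()
tokens = torch.randint(0, 256, (2, 512))
logits = model(tokens)                              # shape (2, 64, 256)
```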

subscribe via RSS