Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu and Tri Dao
Machine Learning Department, Carnegie Mellon University
Department of Computer Science, Princeton University

Presenter: Hao-Ting Li (李皓庭)

Table of Contents

  • Introduction
  • State Space Models
  • Selective State Space Models
  • Empirical Evaluation
  • Conclusion & Summary

Introduction

Goal of this paper: a new backbone architecture for foundation models (FMs)

Modern FMs are predominantly based on a single type of sequence model:

  • the Transformer
  • the attention layer

Self-attention

The efficacy of self-attention is attributed to its ability to route information densely within a context window, allowing it to model complex data.

Drawbacks:

  • an inability to model anything outside of a finite window
  • quadratic scaling with respect to the window length

An enormous body of research has appeared on more efficient variants of attention to overcome these drawbacks, but often at the expense of the very properties that make it effective.

  • As of yet, none of these variants have been shown to be empirically effective at scale across domains.

Structured State Space Sequence Models (SSMs)

Recently, structured state space sequence models (SSMs) have emerged as a promising class of architectures for sequence modeling.

  • CNNs + RNNs
  • inspiration from classical state space models (Kalman 1960)
  • computed efficiently as either a recurrence or a convolution, with linear or near-linear scaling in sequence length
  • modeling long-range dependencies
  • successful in domains involving continuous signal data such as audio and vision

However, they have been less effective at modeling discrete and information-dense data such as text.

Improvements

We propose a new class of selective state space models that improve on prior work along several axes to achieve the modeling power of Transformers while scaling linearly in sequence length.

  • Selection Mechanism (+SSMs)
  • Hardware-aware algorithm
  • Architecture

Contributions

Selective SSMs, and by extension the Mamba architecture, are fully recurrent models with key properties that make them suitable as the backbone of general foundation models operating on sequences.

  • High quality
  • Fast training and inference
  • Long context

Table of Contents

  • Introduction
  • State Space Models
  • Selective State Space Models
  • Empirical Evaluation
  • Conclusion & Summary

State Space Models

Structured state space sequence models (S4) are a recent class of sequence models for deep learning that are broadly related to RNNs, CNNs, and classical state space models.

State Space Models

S4 models are inspired by a particular continuous system (1) that maps a 1-dimensional function or sequence $x(t) \in \mathbb{R} \mapsto y(t) \in \mathbb{R}$ through an implicit latent state $h(t) \in \mathbb{R}^{N}$:

$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) \tag{1}$$

Concretely, S4 models are defined with four parameters $(\Delta, A, B, C)$, which define a sequence-to-sequence transformation in two stages:

  1. Discretization
  2. Computation

Discretization

The first stage transforms the continuous parameters $(\Delta, A, B)$ to discrete parameters $(\bar{A}, \bar{B})$ through fixed formulas $\bar{A} = f_A(\Delta, A)$ and $\bar{B} = f_B(\Delta, A, B)$, where the pair $(f_A, f_B)$ is called a discretization rule.

Various rules can be used, such as the zero-order hold (ZOH) defined in equation (4):

$$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\cdot \Delta B \tag{4}$$
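As a concrete illustration of the ZOH formula above, here is a minimal NumPy sketch assuming a diagonal $A$ stored as a vector (so the matrix exponential reduces to an elementwise exponential); the function and variable names are ours, not the paper's reference code.

```python
import numpy as np

def discretize_zoh(delta, A, B):
    """Zero-order hold: A_bar = exp(delta*A),
    B_bar = (delta*A)^{-1} (exp(delta*A) - I) * delta*B.
    Assumes a diagonal A, stored as a vector of shape (N,)."""
    dA = delta * A                             # (N,)
    A_bar = np.exp(dA)                         # exp of a diagonal matrix = elementwise exp
    B_bar = (A_bar - 1.0) / dA * (delta * B)   # diagonal case of the ZOH formula
    return A_bar, B_bar

# Example: N = 4 hidden states, step size delta = 0.1
N = 4
A = -np.arange(1, N + 1, dtype=float)   # stable (negative) diagonal entries
B = np.ones(N)
A_bar, B_bar = discretize_zoh(0.1, A, B)
print(A_bar, B_bar)
```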

Properties of Discretization

  • Discretization has deep connections to continuous-time systems which can endow them with additional properties such as resolution invariance and automatically ensuring that the model is properly normalized.
  • It also has connections to gating mechanisms of RNNs which we will revisit in Section 3.5.
  • However, from a mechanical point of view discretization can simply be viewed as the first step of the computation graph in the forward pass of an SSM.
  • Alternate flavors of SSMs can bypass the discretization step and parameterize $(\bar{A}, \bar{B})$ directly instead, which may be easier to reason about.

Computation

After the parameters have been transformed from $(\Delta, A, B, C) \mapsto (\bar{A}, \bar{B}, C)$, the model can be computed in two ways:

Linear recurrence (2):

$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t$$

Global convolution (3):

$$\bar{K} = (C\bar{B},\; C\bar{A}\bar{B},\; \dots,\; C\bar{A}^{k}\bar{B},\; \dots), \qquad y = x * \bar{K}$$

  • Hint: unrolling the recurrence with $h_{-1} = 0$ gives $y_t = \sum_{k=0}^{t} C\bar{A}^{k}\bar{B}\,x_{t-k}$, i.e. a convolution of $x$ with the kernel $\bar{K}$.

Computation

Linear recurrence (2): $h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \quad y_t = C\,h_t$

Global convolution (3): $\bar{K} = (C\bar{B},\, C\bar{A}\bar{B},\, \dots), \quad y = x * \bar{K}$

Both modes compute the same sequence-to-sequence map (checked numerically in the sketch below).

  • Convolutional mode: for efficient parallelizable training
    • where the whole input sequence is seen ahead of time
  • Recurrent mode: for efficient autoregressive inference
    • where the inputs are seen one timestep at a time
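A minimal NumPy sketch (ours, not the paper's code) verifying that the recurrent and convolutional modes compute the same output for a single channel with $N$ hidden states:

```python
import numpy as np

# Check that the recurrent and convolutional modes of a discrete SSM
# compute the same sequence-to-sequence map (scalar channel, N states).
rng = np.random.default_rng(0)
N, L = 4, 16
A_bar = np.diag(rng.uniform(0.1, 0.9, N))   # discrete, stable transition
B_bar = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
x = rng.standard_normal(L)

# Recurrent mode: h_t = A_bar h_{t-1} + B_bar x_t,  y_t = C h_t
h = np.zeros((N, 1))
y_rec = []
for t in range(L):
    h = A_bar @ h + B_bar * x[t]
    y_rec.append((C @ h).item())

# Convolutional mode: y = x * K_bar with K_bar_k = C A_bar^k B_bar
K = np.array([(C @ np.linalg.matrix_power(A_bar, k) @ B_bar).item() for k in range(L)])
y_conv = [np.dot(K[:t + 1][::-1], x[:t + 1]) for t in range(L)]

assert np.allclose(y_rec, y_conv)
```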

Linear Time Invariance

LTI (linear time-invariance):

  • Linear: the state update and output are linear functions of $h_{t-1}$ and $x_t$.
  • Time-invariant: the same parameters are used at every time step.

An important property of equations (1) to (3) is that the model's dynamics are constant through time.

  • $(\Delta, A, B, C)$, and consequently $(\bar{A}, \bar{B})$, are fixed for all time steps.

Limitations of Linear Time Invariance

All structured SSMs have been LTI (e.g. computed as convolutions) because of fundamental efficiency constraints.

  • Time-varying SSMs cannot use convolutions.

However, a core insight of this work is that LTI models have fundamental limitations in modeling certain types of data, and our technical contributions involve removing the LTI constraint while overcoming the efficiency bottlenecks.

Structure and Dimensions

  • Input: $x$ of shape $(B, L, D)$
    • batch size $B$
    • length $L$
    • channels $D$
  • Hidden state: $N$ states per channel, i.e. shape $(D, N)$ per input
  • Computation complexity: $O(BLDN)$ time and memory (see the shape sketch below)
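A shape-only sketch of the $(B, L, D, N)$ bookkeeping when an SSM is applied independently to each channel; the names and toy sizes are illustrative, not the paper's code.

```python
import numpy as np

# Shape bookkeeping for an SSM layer applied independently to each channel.
batch, length, channels, state = 2, 8, 3, 4     # (B, L, D, N)
x = np.zeros((batch, length, channels))          # input  (B, L, D)
h = np.zeros((batch, channels, state))           # hidden (B, D, N): N states per channel
A_bar = np.zeros((channels, state))              # diagonal transition per channel
B_bar = np.zeros((channels, state))
C = np.zeros((channels, state))
y = np.zeros_like(x)                             # output (B, L, D)

for t in range(length):                          # O(B L D N) work overall
    h = A_bar * h + B_bar * x[:, t, :, None]     # broadcast over (B, D, N)
    y[:, t] = (C * h).sum(-1)                    # contract the state dimension

print("output shape:", y.shape)                  # (B, L, D)
```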

Table of Contents

  • Introduction
  • State Space Models
  • Selective State Space Models
  • Empirical Evaluation
  • Conclusion & Summary

Motivation

Argument: a fundamental problem of sequence modeling is compressing context into a smaller state.

Compressing Context into a Smaller State

  • Transformers
    • Attention is both effective and inefficient because it explicitly does not compress context at all.
    • This can be seen from the fact that autoregressive inference requires explicitly storing the entire context (i.e. the KV cache), which directly causes the slow linear-time inference and quadratic-time training of Transformers.
  • RNNs
    • Recurrent models are efficient because they have a finite state, implying constant-time inference and linear-time training.
    • However, their effectiveness is limited by how well this state has compressed the context.

Copying, Selective Copying, and Induction Heads Tasks

(Figure: the Copying, Selective Copying, and Induction Heads tasks.)

  • The Selective Copying task: content-aware reasoning (a toy example is sketched below)
  • The Induction Heads task: context-aware reasoning
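An illustrative, made-up data format for the Selective Copying task, just to make the content-aware requirement concrete; the exact token conventions in the paper may differ.

```python
import random

# Toy Selective Copying data: the model must output the content tokens in order,
# ignoring randomly placed noise tokens -- solving it requires content-aware selection.
vocab = ["A", "B", "C", "D"]
noise, L, n_targets = ".", 16, 4

def make_example(rng=random):
    targets = [rng.choice(vocab) for _ in range(n_targets)]
    seq = [noise] * L
    for tok, pos in zip(targets, sorted(rng.sample(range(L), n_targets))):
        seq[pos] = tok
    return seq, targets

seq, targets = make_example()
print("input :", " ".join(seq))
print("output:", " ".join(targets))
```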

The Failure Mode of LTI Models

  • From the recurrent view, their constant dynamics (e.g. the $(\bar{A}, \bar{B})$ transitions in (2)) cannot let them select the correct information from their context, or affect the hidden state passed along the sequence in an input-dependent way.
  • From the convolutional view, it is known that global convolutions can solve the vanilla Copying task because it only requires time-awareness, but that they have difficulty with the Selective Copying task because of lack of content-awareness. More concretely, the spacing between inputs-to-outputs is varying and cannot be modeled by static convolution kernels.

The Failure Mode of LTI Models

Summary: The efficiency vs. effectiveness tradeoff of sequence models is characterized by how well they compress their state.

  • Efficient models must have a small state.
  • Effective models must have a state that contains all necessary information from the context.

Improving SSMs with Selection

One method of incorporating a selection mechanism into models is by letting their parameters that affect interactions along the sequence be input-dependent.

  • This loses the equivalence to convolutions (3) with implications for its efficiency.

Algorithm: SSM vs. SSM + Selection

(Figure: Algorithms 1 (SSM, S4) and 2 (SSM + Selection, S6) side by side. The selective variant makes $B$, $C$, and $\Delta$ functions of the input $x$, so these parameters gain a length dimension $L$.)
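A rough sketch of the difference between the two algorithms, as we understand them: in the selective variant the parameters $B$, $C$, $\Delta$ are computed from the input and therefore vary per time step (which also breaks the convolutional mode). The projection weights below are illustrative stand-ins, not the released Mamba code.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, N = 8, 3, 4                      # sequence length, channels, state size
x = rng.standard_normal((L, D))        # one input sequence (batch dim omitted)

def lti_params():
    """Algorithm 1 style (S4): B, C, Delta are learned constants,
    shared across all time steps (time-invariant)."""
    B = rng.standard_normal(N)                     # shape (N,)
    C = rng.standard_normal(N)                     # shape (N,)
    delta = np.full(D, 0.1)                        # shape (D,)
    return B, C, delta

def selective_params(x):
    """Algorithm 2 style (S6): B, C, Delta are (linear) functions of the
    input, so they carry a length dimension and vary per time step."""
    W_B = rng.standard_normal((D, N))
    W_C = rng.standard_normal((D, N))
    w_d = rng.standard_normal(D)
    B = x @ W_B                                    # shape (L, N)
    C = x @ W_C                                    # shape (L, N)
    delta = np.log1p(np.exp(x * w_d))              # softplus, shape (L, D)
    return B, C, delta

print([p.shape for p in lti_params()])             # [(4,), (4,), (3,)]
print([p.shape for p in selective_params(x)])      # [(8, 4), (8, 4), (8, 3)]
```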

Efficient Implementation of Selective SSMs

Earlier works attempted to incorporate special cases of selection.

  • e.g. letting $\Delta$ vary over time in recurrent SSMs

However, as previously mentioned a core limitation in the usage of SSMs is their computational efficiency, which was why S4 and all derivatives used LTI (non-selective) models, most commonly in the form of global convolutions.

The selection mechanism is designed to overcome the limitations of LTI models.

Efficient Implementation: Observations

  • Computation complexity
    • recurrent mode: $O(BLDN)$ FLOPs (with a lower constant factor)
    • convolutional mode: $O(BLD\log L)$ FLOPs
    • Thus for long sequences and a not-too-large state dimension $N$, the recurrent mode can actually use fewer FLOPs.
  • The two challenges are the sequential nature of recurrence, and the large memory usage.
    • To address the latter, just like the convolutional mode, we can attempt to not actually materialize the full state $h$.

Efficient Implementation: Main Idea

The main idea is to leverage properties of modern accelerators (GPUs) to materialize the state only in more efficient levels of the memory hierarchy.

  • Most operations (except matrix multiplication) are bounded by memory bandwidth.
  • This includes our scan operation, and we use kernel fusion to reduce the amount of memory IOs, leading to a significant speedup compared to a standard implementation.

Efficient Implementation: Parallel Scan

Concretely, instead of preparing the scan input $(\bar{A}, \bar{B})$ of size $(B, L, D, N)$ in GPU HBM (high-bandwidth memory),

  1. we load the SSM parameters $(\Delta, A, B, C)$ directly from slow HBM to fast SRAM,
  2. perform the discretization and recurrence in SRAM,
  3. and then write the final outputs of size $(B, L, D)$ back to HBM.

To avoid the sequential recurrence, we observe that despite not being linear it can still be parallelized with a work-efficient parallel scan algorithm.
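The reason a parallel scan applies is that composing the per-step affine updates $h \mapsto \bar{A}_t h + \bar{B}_t x_t$ is associative. Below is a minimal sketch of the combine operator (ours, with a sequential prefix computation shown for clarity; a real implementation applies it in a tree, and the fused kernel does this in SRAM).

```python
import numpy as np

def combine(left, right):
    """Associative operator for the linear recurrence h = a*h_prev + b.
    Composing (a1, b1) then (a2, b2) gives h = a2*(a1*h + b1) + b2."""
    a1, b1 = left
    a2, b2 = right
    return a1 * a2, a2 * b1 + b2

rng = np.random.default_rng(0)
L = 16
A_bar = rng.uniform(0.1, 0.9, L)       # per-step transitions (input-dependent in S6)
Bx = rng.standard_normal(L)            # per-step inputs B_bar_t * x_t

# Sequential reference.
h, h_seq = 0.0, []
for t in range(L):
    h = A_bar[t] * h + Bx[t]
    h_seq.append(h)

# Same states via prefix "sums" under the associative combine operator.
elems = list(zip(A_bar, Bx))
prefix, h_scan = elems[0], [elems[0][1]]
for t in range(1, L):
    prefix = combine(prefix, elems[t])
    h_scan.append(prefix[1])

assert np.allclose(h_seq, h_scan)
```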

Efficient Implementation: Recomputation

Finally, we must also avoid saving the intermediate states, which are necessary for backpropagation.

  • We carefully apply the classic technique of recomputation to reduce the memory requirements: the intermediate states are not stored but recomputed in the backward pass when the inputs are loaded from HBM to SRAM.
  • As a result, the fused selective scan layer has the same memory requirements as an optimized transformer implementation with FlashAttention. (Appendix D)
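As a generic illustration of the recomputation idea (not the fused selective-scan kernel), PyTorch's gradient-checkpointing utility trades memory for recompute in the same way; the toy block below is illustrative.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Recomputation (gradient checkpointing): the block's intermediate activations
# are not stored during the forward pass; they are recomputed during the
# backward pass. The fused selective-scan kernel applies the same idea inside
# a custom kernel rather than via this PyTorch utility.
block = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 64)
)
x = torch.randn(8, 64, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)  # forward without saving intermediates
y.sum().backward()                             # intermediates recomputed here
```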

Mamba: A Simplified SSM Architecture

(Figure: the Mamba block, which simplifies the H3 block and combines it with the standard gated MLP block into a single, homogeneously stacked block.)

Mamba: Number of Parameters

  • This architecture involves expanding the model dimension $D$ by a controllable expansion factor $E$ ($E = 2$ in the experiments).
  • Most of the parameters ($3ED^2$) are in the linear projections:
    • $2ED^2$ for the input projections
    • $ED^2$ for the output projection
  • The inner SSM contributes fewer parameters (a quick arithmetic check of the projection count follows below).
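A quick arithmetic check of the projection parameter count; the $3ED^2$ breakdown is from the paper, while the concrete sizes below are just example values.

```python
# Parameter count of the Mamba block's linear projections (ignoring biases
# and the comparatively small inner-SSM parameters).
D = 1024          # model dimension (illustrative value)
E = 2             # expansion factor used in the paper's experiments

input_proj = 2 * E * D * D     # two projections D -> E*D (main branch + gate branch)
output_proj = E * D * D        # one projection E*D -> D
total = input_proj + output_proj
assert total == 3 * E * D * D  # the 3ED^2 count quoted above
print(f"{total / 1e6:.1f}M parameters in projections")
```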

Properties of Selection Mechanisms

Properties

  • Connection to Gating Mechanisms
  • Interpretation of Selection Mechanisms
    • Variable spacing
    • Filtering context

Theorem 1: Connection to Gating Mechanisms

When $N = 1$, $A = -1$, $B = 1$, $s_\Delta(x) = \mathrm{Linear}(x)$, and $\tau_\Delta = \mathrm{softplus}$, then the selective SSM recurrence (Algorithm 2) takes the form

$$g_t = \sigma(\mathrm{Linear}(x_t)), \qquad h_t = (1 - g_t)\,h_{t-1} + g_t\,x_t.$$

  • $g_t$: the gate
  • The correspondence uses the identity $1 - \exp(-\mathrm{softplus}(z)) = \sigma(z)$, where $\mathrm{softplus}(z) = \log(1 + e^z)$ (a numerical check is sketched below).
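A small numerical sanity check of this correspondence, a sketch under the stated special case (not the paper's proof); `z` stands in for $\mathrm{Linear}(x_t)$.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
L = 32
z = rng.standard_normal(L)             # stands in for Linear(x_t)
x = rng.standard_normal(L)

# Selective SSM special case: N = 1, A = -1, B = 1, Delta = softplus(z), ZOH.
h_ssm, hs_ssm = 0.0, []
for t in range(L):
    delta = softplus(z[t])
    A_bar = np.exp(-delta)             # exp(Delta * A) with A = -1
    B_bar = 1.0 - np.exp(-delta)       # ZOH formula with A = -1, B = 1
    h_ssm = A_bar * h_ssm + B_bar * x[t]
    hs_ssm.append(h_ssm)

# Gated-RNN form from Theorem 1: g_t = sigmoid(z_t), h_t = (1-g_t) h_{t-1} + g_t x_t.
h_gru, hs_gru = 0.0, []
for t in range(L):
    g = sigmoid(z[t])
    h_gru = (1.0 - g) * h_gru + g * x[t]
    hs_gru.append(h_gru)

assert np.allclose(hs_ssm, hs_gru)     # relies on 1 - exp(-softplus(z)) == sigmoid(z)
```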

Variable Spacing

  • Selectivity allows filtering out irrelevant noise tokens that may occur between inputs of interest.
  • This is exemplified by the Selective Copying task, but occurs ubiquitously in common data modalities, particularly for discrete data — for example the presence of language fillers such as “um”.
  • This property arises because the model can mechanistically filter out any particular input $x_t$, for example in the gated RNN case (Theorem 1) when $g_t \to 0$.

Filtering Context

  • It has been empirically observed that many sequence models do not improve with longer context, despite the principle that more context should lead to strictly better performance.
  • An explanation is that many sequence models cannot effectively ignore irrelevant context when necessary; an intuitive example are global convolutions (and general LTI models).
  • On the other hand, selective models can simply reset their state at any time to remove extraneous history, and thus their performance in principle improves monotonically with context length.

Table of Contents

  • Introduction
  • State Space Models
  • Selective State Space Models
  • Empirical Evaluation
  • Conclusion & Summary

Empirical Evaluation

  • Synthetic tasks
  • Language Modeling
  • DNA Modeling
  • Audio

Synthetic Tasks

Language Modeling: Scaling Laws

Language Modeling: Downstream Evaluations

DNA Modeling

DNA Classification

Audio Modeling and Generation

For the audio waveform modality, we compare primarily to the SaShiMi architecture and training protocols (Goel et al., 2022).

The architecture is a UNet with alternating S4 and MLP blocks, which we consider replacing with Mamba.

Long-Context Autoregressive Pretraining

Autoregressive Speech Generation

Speed and Memory Benchmark

  • Scan: 20-40× faster than a standard PyTorch implementation
  • Mamba: 4-5× higher inference throughput than a Transformer of similar size

Table of Contents

  • Introduction
  • State Space Models
  • Selective State Space Models
  • Empirical Evaluation
  • Conclusion & Summary

Conclusion

  • We introduce a selection mechanism to structured state space models, allowing them to perform context-dependent reasoning while scaling linearly in sequence length.
  • When incorporated into a simple attention-free architecture, Mamba achieves state-of-the-art results on a diverse set of domains: most notably language, where it matches or exceeds the performance of strong Transformer models.
  • We are excited about the broad applications of selective state space models to build foundation models for different domains, especially in emerging modalities requiring long context such as genomics, audio, and video.
  • Our results suggest that Mamba is a strong candidate to be a general sequence model backbone.

Summary

  • An RNN-like foundation model (based on SSM)
  • A new architecture (Mamba block)
  • Selection mechanism for SSMs
    • analogous to the gating mechanism of RNNs (e.g. LSTM)
  • Hardware-aware algorithm
    • most operations are memory-IO-bound; kernel fusion, parallel scan, and recomputation reduce memory IO
  • Experimental results
    • scaling laws (performance vs. # parameters); linear scaling of computation in sequence length

efficacy: /ˈefəkəsi/

context window: the maximum sequence length that a Transformer can process at a time

Complexity (per generation step): Transformer = linear time, RNN = constant time. This presumably refers to the cost of computing the next step: an RNN computes only from its current state, so each step takes constant time.

(Note: is the motivation explained sufficiently?)

Hyena /haɪˈiːnə/