WaveTransformer
O(n) Attention
via Wave Propagation
Replace quadratic self-attention with wave propagation on a learned lattice. Linear scaling in sequence length. No KV cache needed. 5.8x better perplexity at matched compute.
How It Compares
Standard attention scales quadratically. WaveTransformer replaces the attention matrix with wave propagation, achieving linear scaling.
| Property | Standard Transformer | WaveTransformer |
|---|---|---|
| Time complexity | O(n²) | O(n) |
| Memory (inference) | O(n²) KV cache | O(n) lattice state |
| Long sequences | Quadratic wall | Linear scaling |
| Parameter efficiency | Baseline | 4x fewer at iso-quality |
| Perplexity (iso-compute) | Baseline | 5.8x better |
Target Applications
Long-Document Processing
Process 100K+ token documents without quadratic memory blowup. Legal, medical, scientific texts.
Edge / Mobile Inference
No KV cache means dramatically lower memory footprint. Run on devices that can't hold standard transformer state.
Real-Time Streaming
O(n) per-token cost enables true streaming without growing memory. Chat, transcription, live analysis.
Efficient Pre-Training
4x fewer parameters at matched quality. Train larger effective models on the same hardware budget.
Retrieval-Augmented Gen
Process retrieved passages linearly. No quadratic penalty for adding more context documents.
Scientific Simulation
Sequence-to-sequence modeling of physical systems with inherently long temporal correlations.
Current Status
WaveTransformer has been validated on small-scale language modeling tasks (E15a/b experiments). Large-scale validation and public release forthcoming.
Want early access?
We're offering early benchmark access and consulting for organizations that want to evaluate WaveTransformer on their workloads.