LAdam Optimizer Family
Drop-in replacements for Adam, AdaGrad, and RMSProp with Laplacian variance smoothing. Stabilizes training on physics-informed neural networks, vision models, and transformers.
```bash
pip install ladam
```

Three Optimizers, One Package
Each variant adds Laplacian smoothing to the variance estimate of its base optimizer. One new hyperparameter: c2.
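The page does not spell out the update rule, so here is a minimal pure-Python sketch under stated assumptions. Each variant's second-moment (variance) estimate is smoothed with a three-point discrete Laplacian over the flattened parameter index and scaled by c2, using zero-flux boundaries. The raw estimate is kept as optimizer state, and only the copy used to scale the gradient is smoothed. The helper names (`laplacian_smooth`, `second_moment`, `smoothed_step`) are hypothetical and not part of the `ladam` API. The real package may use a different stencil, smoothing dimension, or an implicit solve.

```python
def laplacian_smooth(v, c2):
    """Return v + c2 * Lap(v) using a 1-D three-point stencil.

    Hypothetical sketch: parameters are treated as one flat sequence and edge
    values are replicated at the boundaries (zero flux), so a constant vector
    is unchanged and sum(v) is preserved. Entries stay >= 0 when c2 <= 0.5.
    """
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[i]
        right = v[i + 1] if i < n - 1 else v[i]
        out.append(v[i] + c2 * (left - 2.0 * v[i] + right))
    return out


def second_moment(kind, v, grad, beta2=0.999):
    """Update the per-parameter second-moment estimate of the base optimizer.

    Adam and RMSProp keep an exponential moving average of g**2 (PyTorch's
    RMSprop calls the decay alpha and defaults it to 0.99); AdaGrad keeps a
    running sum of g**2 and ignores beta2.
    """
    if kind == "adagrad":
        return [vi + g * g for vi, g in zip(v, grad)]
    if kind in ("adam", "rmsprop"):
        return [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grad)]
    raise ValueError(f"unknown optimizer kind: {kind}")


def smoothed_step(params, grad, v, lr, c2, kind, eps=1e-8):
    """One simplified step: update v, smooth a copy, then scale the gradient.

    Momentum and Adam's bias correction are omitted to keep the sketch short.
    Assumption: the unsmoothed v is returned as the new optimizer state.
    """
    v = second_moment(kind, v, grad)
    v_smooth = laplacian_smooth(v, c2)
    new_params = [p - lr * g / (vs ** 0.5 + eps)
                  for p, g, vs in zip(params, grad, v_smooth)]
    return new_params, v
```

With c2 = 0 the smoothing is the identity and each sketch reduces to its (momentum-free) base optimizer. This matches the idea that c2 is the only new hyperparameter.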
LAdam
PINNs, regression, transformers
LAdaGrad
Sparse data — NLP embeddings, recommendations
LRMSProp
Non-stationary problems — RL, online learning
Benchmark Results
Multi-seed benchmarks with statistical testing. Only verified, reproducible results.
| Task | Metric | Adam | LAdam | Change | Seeds |
|---|---|---|---|---|---|
| Regression (MLP) | MSE (lower is better) | 0.213 | 0.184 | -13.5% | 3 |
| CIFAR-10 (ResNet + ChiAnneal) | Accuracy | 67.96% | 73.39% | +5.43 pp | 3 |
| Transformer (FashionMNIST) | Accuracy | 89.46% | 89.66% | +0.20 pp | 5 |
| Wave PINN | L2 error (lower is better) | 0.0067 | 0.0066 | -0.8% | 3 |
| Burgers PINN | Error (lower is better) | 0.0199 | 0.0197 | -0.1% | 5 |
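Accuracy changes are absolute differences in percentage points (pp). Error changes are relative to the Adam baseline. The CIFAR-10 result uses ChiAnnealScheduler (included in the package). For the PINNs, the gains show up mainly as convergence stability: 34% lower variance across seeds.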
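The benchmark figures mix two kinds of quantity. The helpers below use hypothetical names that are not part of `ladam`. They show the arithmetic for accuracy gains as absolute percentage-point differences versus relative change. They also show one plausible way to express "lower variance across seeds" as a ratio of sample variances. The page does not say how the 34% figure was computed, so `variance_reduction` is an assumption.

```python
import statistics


def pp_gain(baseline_acc, new_acc):
    """Absolute gain in percentage points between two accuracies given in %."""
    return new_acc - baseline_acc


def relative_change(baseline, new):
    """Relative change of a metric, as a percentage of the baseline value."""
    return 100.0 * (new - baseline) / baseline


def variance_reduction(baseline_runs, new_runs):
    """Percent reduction in across-seed sample variance (positive = more stable).

    Assumption: this is one reasonable definition; the page does not state its own.
    """
    return 100.0 * (1.0 - statistics.variance(new_runs) / statistics.variance(baseline_runs))


print(round(pp_gain(67.96, 73.39), 2))          # 5.43
print(round(relative_change(67.96, 73.39), 1))  # 8.0
```

So the CIFAR-10 gain of 5.43 points corresponds to roughly an 8% relative improvement over the Adam baseline's accuracy.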
Get Started
Change one import. Keep everything else the same.
```python
from ladam import LAdam

# Drop-in replacement: same API as torch.optim.Adam.
# model, dataloader and loss_fn are assumed to be defined elsewhere.
optimizer = LAdam(
    model.parameters(),
    lr=1e-3,
    c2=1e-4,  # Laplacian smoothing strength
)

for inputs, targets in dataloader:
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

```python
from ladam import LAdaGrad

# Best for sparse data: NLP, recommendations
optimizer = LAdaGrad(
    model.parameters(),
    lr=1e-2,
    c2=1e-4,
)
```

```python
from ladam import LRMSProp

# Best for non-stationary targets: RL, online learning
optimizer = LRMSProp(
    model.parameters(),
    lr=1e-3,
    c2=1e-4,
)
```

When to Use LAdam
The largest gains come on problems whose loss landscapes are noisy or ill-conditioned.
Physics-Informed NNs
Lower variance across seeds on wave and Burgers PINNs. Laplacian smoothing is a natural fit for PDE loss landscapes, where gradients are noisy and spatially correlated.
Structured Regression
13.5% lower MSE on tabular regression (MLP). Gains are largest when the structure of the weights mirrors the structure of the data.
Vision + ChiAnneal
+5.43 percentage points of accuracy on CIFAR-10 with ChiAnnealScheduler. The scheduler ramps c2 during training for curriculum-style smoothing.
Transformers
+0.20 percentage points on a FashionMNIST ViT (p=0.0005, 5 seeds). Small, but statistically significant and consistent.
Scientific Computing
PDEs, ODEs, molecular dynamics. Anywhere gradients are inherently noisy and spatially correlated.
⚠️ Not for LLMs
Tested on GPT-2 fine-tuning, where it significantly worsened perplexity. Attention weights encode semantic, not spatial, structure.
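The Vision + ChiAnneal card says the scheduler ramps c2 over training, but the page does not document ChiAnnealScheduler's schedule or API. As a stand-in, here is a hypothetical cosine ramp between two c2 values. It comes with a helper that writes c2 into each param group, mirroring how PyTorch learning-rate schedulers update `"lr"`. Both the direction of the ramp and the assumption that LAdam reads `c2` from `param_groups` are unverified.

```python
import math


def ramped_c2(step, total_steps, c2_start, c2_end):
    """Cosine interpolation of c2 from c2_start (step 0) to c2_end (total_steps).

    Hypothetical schedule for illustration only; it is NOT ChiAnnealScheduler.
    Steps outside [0, total_steps] are clamped.
    """
    t = min(max(step, 0), total_steps) / total_steps
    return c2_start + (c2_end - c2_start) * 0.5 * (1.0 - math.cos(math.pi * t))


def apply_c2(optimizer, c2):
    """Write c2 into every param group, the way torch schedulers update "lr".

    Assumption: the optimizer reads c2 from its param groups at each step.
    """
    for group in optimizer.param_groups:
        group["c2"] = c2


# Hypothetical usage inside a training loop:
# for step, (inputs, targets) in enumerate(dataloader):
#     apply_c2(optimizer, ramped_c2(step, total_steps, 0.0, 1e-4))
#     ...forward, backward, optimizer.step(), optimizer.zero_grad()
```

A cosine ramp changes c2 slowly at both ends, so the smoothing strength never jumps abruptly between steps. A linear ramp would be an equally reasonable sketch.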
Ready to try it?
One install. One hyperparameter. Measurable gains.