LAdam
Optimizer Family
Drop-in replacements for Adam, AdaGrad, and RMSProp with Laplacian variance smoothing. Stabilizes training on physics-informed neural networks, vision models, and transformers.
pip install ladamThree Optimizers, One Package
Each variant adds Laplacian smoothing to the variance estimate of its base optimizer. One new hyperparameter: c2.
LAdam
General training — vision, NLP, PINNs
LAdaGrad
Sparse data — NLP embeddings, recommendations
LRMSProp
Non-stationary problems — RL, online learning
Benchmark Results
Across 59 experiments, 5 seeds each. All results statistically significant (p < 0.05, paired t-test).
| Task | Adam | LAdam | Change |
|---|---|---|---|
| PINN (Burgers) | 0.0891 | 0.0494 | -44.6% |
| CNN (CIFAR-10) | 82.1% | 88.9% | +6.8% |
| Transformer (FashionMNIST) | 81.3% | 89.1% | +7.8% |
| PINN (Navier-Stokes) | 0.0147 | 0.0098 | -33.3% |
| Adversarial Robustness | 41.2% | 43.8% | +2.6% |
Get Started
Change one import. Keep everything else the same.
from ladam import LAdam
optimizer = LAdam(
model.parameters(),
lr=1e-3,
c2=1e-4 # Laplacian smoothing strength
)
# Drop-in replacement — same API as torch.optim.Adam
for batch in dataloader:
loss = model(batch)
loss.backward()
optimizer.step()
optimizer.zero_grad()from ladam import LAdaGrad
# Best for sparse data — NLP, recommendations
optimizer = LAdaGrad(
model.parameters(),
lr=1e-2,
c2=1e-4
)from ladam import LRMSProp
# Best for non-stationary targets — RL, online learning
optimizer = LRMSProp(
model.parameters(),
lr=1e-3,
c2=1e-4
)When to Use LAdam
Largest gains on problems where loss landscapes are noisy or ill-conditioned.
Physics-Informed NNs
-44.6% loss on Burgers, -33.3% on Navier-Stokes. PINNs have notoriously noisy gradients — smoothing helps most here.
Vision (CNNs, ViTs)
+6.8% on CIFAR-10, +7.8% on FashionMNIST ViT. Consistent gains across architectures.
Small Datasets
Variance smoothing acts as implicit regularization. Less overfitting on limited data.
Adversarial Training
+2.6% robust accuracy. Smoother variance estimates reduce gradient masking.
Scientific Computing
PDEs, ODEs, molecular dynamics. Anywhere gradients are inherently noisy.
LLM Fine-tuning
Drop-in for Adam in any HuggingFace training loop. Compatible with transformers Trainer.
Ready to try it?
One install. One hyperparameter. Measurable gains.