Waves All the Way Down
How a transformer learns to compute modular addition, and what it reveals about generalization
How a transformer learns to compute modular addition, and what it reveals about generalization
Learning a Bayesian data generation model for a noisy sprial
Learning the heat equation from sparse data