Simons Collaboration on the Physics of Learning and Neural Computation
Harnessing the fundamental sciences to break AI out of its black box

Our mission
Recent advances in artificial intelligence, including deep learning, large language models, and generative AI, stand poised to transform our economy, our society, and the very nature of scientific research itself. Alarmingly, however, this rapid engineering progress far outstrips the rate at which we can scientifically understand it.
Our collaboration thus seeks to elucidate the fundamental scientific principles underlying AI. To do so, we employ and develop powerful tools for the analysis of complex systems, drawn from physics, mathematics, computer science, neuroscience, and statistics, to understand how large neural networks learn, compute, scale, reason, and imagine. By studying AI as a complex physical system, we aim to break AI out of its black box.
Indeed, ideas from the physics of complex systems have long played a profound role in the development and analysis of machine learning and neural computation: from the Hopfield model and the Boltzmann machine (recognized by the 2024 Nobel Prize in Physics) and the analysis of optimization dynamics and geometry in high-dimensional disordered systems (the 2021 Nobel Prize in Physics), to more recent advances such as the discovery and analysis of neural scaling laws, and the inspiration that nonequilibrium statistical mechanics provided for diffusion models in generative AI.
However, the highly performant AI systems of today open up entirely new opportunities for the concerted interaction of theory and experiment, both to advance the science of AI and to improve AI in a principled manner. In particular, we seek to understand how the structure of data, the architecture of neural networks, and the dynamics of learning all interact to give rise to the striking scaling properties and emergent capabilities of modern AI, as well as its mysterious failures. We work across multiple domains, spanning visual perception, language understanding, reasoning, and creativity.
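The scaling properties mentioned above are often summarized as empirical scaling laws, in which test loss falls roughly as a power law in model size. The following minimal sketch (with hypothetical numbers, not results from the collaboration) shows the standard trick for extracting the scaling exponent: a power law becomes a straight line in log-log coordinates, so a linear fit recovers it.

```python
import numpy as np

# Hypothetical scaling-law data: test loss L(N) ~ c * N^(-alpha),
# where N is model size. The numbers below are illustrative only.
alpha_true, c_true = 0.5, 10.0
N = np.array([1e6, 1e7, 1e8, 1e9])      # model sizes (parameters)
L = c_true * N ** (-alpha_true)          # idealized noiseless losses

# A power law is linear in log-log coordinates:
#   log L = log c - alpha * log N,
# so a degree-1 polynomial fit recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha_hat, c_hat = -slope, np.exp(intercept)

assert np.isclose(alpha_hat, alpha_true)
assert np.isclose(c_hat, c_true)
```

With real (noisy) measurements the fit would be approximate, and more refined scaling-law forms add an irreducible-loss offset, but the log-log regression above is the basic workhorse.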
Lenka Zdeborová
Attention-based models and how to solve them using tools from quadratic networks and matrix denoising
Nov 11, 2025
This talk presents recent progress on analytically solvable models that bridge neural network theory, matrix denoising, and attention mechanisms. We begin with two-layer networks of extensive width and quadratic activations, where Bayes-optimal and empirical-risk estimators can be analyzed in closed form using tools from high-dimensional inference—Gaussian universality, matrix sensing, and rotationally invariant denoising. These results yield sharp asymptotics for test errors, interpolation thresholds, and weight spectra, echoing empirical scaling laws in modern deep learning. The same framework extends to sequence and attention models, mapping bilinear and attention-indexed architectures to generalized matrix recovery problems and clarifying their inductive biases within a unified statistical-physics perspective.
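The link between quadratic networks and matrix problems can be seen in a few lines. The sketch below (an illustrative identity, not the speaker's exact setup) shows that a two-layer network with quadratic activations, f(x) = Σ_k a_k (w_k·x)², equals the bilinear form xᵀSx with S = W diag(a) Wᵀ, so learning the network amounts to recovering the matrix S from scalar measurements — i.e., a matrix sensing problem.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 5                       # input dimension, hidden width
W = rng.standard_normal((d, k))    # first-layer weights (columns w_k)
a = rng.standard_normal(k)         # second-layer weights

def quadratic_net(x, W, a):
    """Two-layer net with quadratic activation: sum_k a_k * (w_k . x)^2."""
    return float(a @ (W.T @ x) ** 2)

# Equivalent matrix-sensing view: f(x) = x^T S x with S = W diag(a) W^T.
# S has rank at most k, so wide-but-finite hidden layers correspond to
# recovering a low-rank matrix from quadratic measurements.
S = W @ np.diag(a) @ W.T

x = rng.standard_normal(d)
assert np.isclose(quadratic_net(x, W, a), x @ S @ x)
```

This reduction is what lets tools from high-dimensional inference on matrices (denoising, matrix sensing) be applied to the network's learning problem; the abstract's extension to attention follows the same spirit, since attention scores are themselves bilinear forms in the inputs.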


