Survey lectures
In this series of webinars, the PIs of the collaboration give survey lectures about their respective subfields and their own research.
Florent Krzakala
How Do Neural Networks Learn Simple Functions with Gradient Descent?
Feb 13, 2025
In this talk, I will review the mechanisms by which two-layer neural networks can learn simple high-dimensional functions from data over time. We will focus on the intricate interplay between algorithms, number of iterations, and the complexity of the task at hand, and on how gradient descent and stochastic gradient descent can learn features of the target function and improve generalization over random initialization and kernel methods. I will also illustrate how ideas and methods at the intersection of high-dimensional probability and statistical physics provide fresh perspectives on these questions.
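As a toy illustration of the kind of setup described above (not the speaker's actual models or experiments; the architecture sizes, target function, and step sizes are assumptions), the sketch below trains a two-layer ReLU network with online SGD on a single-index target and compares it with the same architecture whose first layer stays frozen at random initialization, i.e. a random-features / kernel-like baseline.

```python
# Minimal sketch: feature learning vs. random features on a single-index target.
import numpy as np

rng = np.random.default_rng(0)
d, m, lr, steps = 50, 200, 0.05, 20000            # illustrative sizes (assumptions)
w_star = rng.normal(size=d) / np.sqrt(d)          # hidden direction of the target

def target(X):
    return np.maximum(X @ w_star, 0.0)            # y = relu(w* . x)

def run(train_first_layer):
    W = rng.normal(size=(m, d)) / np.sqrt(d)      # first-layer weights
    a = rng.normal(size=m) / np.sqrt(m)           # second-layer weights
    for _ in range(steps):
        x = rng.normal(size=d)                    # one fresh sample per step (online SGD)
        h = np.maximum(W @ x, 0.0)                # hidden ReLU activations
        err = a @ h - max(x @ w_star, 0.0)        # prediction error on this sample
        a -= lr * err * h / m                     # SGD step on the readout
        if train_first_layer:                     # feature learning: also move the first layer
            W -= lr * err * np.outer(a * (W @ x > 0), x)
    X_test = rng.normal(size=(2000, d))
    preds = np.maximum(X_test @ W.T, 0.0) @ a
    return np.mean((preds - target(X_test)) ** 2)

print("test MSE with feature learning:", run(True))
print("test MSE with random features :", run(False))
```

When both layers are trained, the first-layer weights can align with the hidden direction w*, which is the kind of feature learning the abstract contrasts with fixed-kernel behaviour.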
Michael Douglas
Mathematics, Economics and AI
Jan 31, 2025
An overview of my recent and current research:
(1) ML and verified numerics for geometric PDE.
(2) Economics: LLM simulations, AI scaling laws.
(3) Models of structure in data and in search.
(4) Autonomous mathematical discovery and interestingness.
Yuhai Tu
Towards a Physics-based Theoretical Foundation for Deep Learning: Stochastic Learning Dynamics and Generalization
Dec 19, 2024
In this talk, we will describe our recent work on developing a theoretical foundation, based on a statistical-physics approach, for the feedforward deep neural networks underlying the current AI revolution. In particular, we will discuss our recent work on the learning dynamics driven by stochastic gradient descent (SGD) and on the key determinants of generalization, based on an exact duality relation we discovered between neuron activities and network weights. Time permitting, we will discuss a few future directions that are worth pursuing using a physics-based approach.
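The duality relation itself is beyond a short note, but as a generic illustration of the stochastic element in SGD learning dynamics (an assumption for illustration, not the talk's framework), the sketch below estimates the minibatch-gradient noise on a linear-regression loss and checks that its magnitude scales roughly as 1/batch size.

```python
# Toy illustration: covariance of minibatch gradients around the full-batch gradient.
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)   # noisy linear targets
w = np.zeros(d)                                          # evaluate the noise at w = 0

def minibatch_grad(batch):
    Xb, yb = X[batch], y[batch]
    return Xb.T @ (Xb @ w - yb) / len(batch)             # gradient of 0.5 * batch MSE

full_grad = minibatch_grad(np.arange(n))
for B in (10, 100, 1000):
    grads = np.array([minibatch_grad(rng.choice(n, B, replace=False))
                      for _ in range(500)])
    noise = grads - full_grad
    print(f"batch {B:4d}: trace of gradient-noise covariance ~ {np.trace(np.cov(noise.T)):.3f}")
```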
Surya Ganguli
An analytic theory of creativity for convolutional diffusion models
Dec 5, 2024
We obtain the first analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-based diffusion models can generate highly creative images that lie far from their training data. But optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in a fully analytic, completely mechanistically interpretable, equivariant local score (ELS) machine that (3), without any training, can quantitatively predict the outputs of trained convolution-only diffusion models (such as ResNets and UNets) with high accuracy. Our ELS machine reveals a locally consistent patch mosaic model of creativity, in which diffusion models create exponentially many novel images by mixing and matching different local training-set patches in different image locations. Our theory also partially predicts the outputs of pre-trained self-attention-enabled UNets, revealing an intriguing role for attention in carving out semantic coherence from local patch mosaics.
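For background on the theory-experiment gap mentioned above (a standard score-matching fact, not a result of this work): on a finite training set {x_n}, the score that exactly minimizes the score-matching objective is the gradient of the log of the Gaussian-smoothed empirical distribution,

```latex
p_t(x) = \frac{1}{N}\sum_{n=1}^{N} \mathcal{N}\!\left(x;\ \alpha_t x_n,\ \sigma_t^2 I\right),
\qquad
\nabla_x \log p_t(x) = \sum_{n=1}^{N} w_n(x)\,\frac{\alpha_t x_n - x}{\sigma_t^2},
\qquad
w_n(x) = \frac{\mathcal{N}(x;\ \alpha_t x_n,\ \sigma_t^2 I)}{\sum_{m} \mathcal{N}(x;\ \alpha_t x_m,\ \sigma_t^2 I)} .
```

The reverse process driven by this ideal score collapses onto (rescaled) training points as the noise level goes to zero, i.e. pure memorization; this is consistent with the abstract's point that locality and equivariance prevent a convolutional network from realizing optimal score matching.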
Eva Silverstein
Hamiltonian dynamics for stabilizing neural simulation-based inference
Nov 21, 2024
After reviewing and updating the theory of energy-conserving Hamiltonian dynamics for optimization and sampling, I'll explain a new application to precision scientific data analysis in which NN initialization variance has been a bottleneck. Specifically, we choose a Hamiltonian whose measure on phase space concentrates the results in a controlled way, and describe a simple prescription for hyperparameter defaults. In a set of experiments on likelihood ratio estimation, using small simulated and real (Aleph) particle physics data sets, we find this reduces the error as predicted, performing better than Adam in this regard (both for defaults and with small hyperparameter scans). Time permitting, I'll discuss potential applications to broader research on the physics of learning, and a few unrelated PoL ideas.
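As a minimal sketch of the underlying ingredient, energy-conserving Hamiltonian dynamics on a loss surface (this is generic leapfrog integration on a toy quadratic loss, not the specific Hamiltonian, phase-space measure, or hyperparameter prescription from the talk):

```python
# Leapfrog integration of H(theta, pi) = V(theta) + |pi|^2 / 2 on a toy loss.
import numpy as np

scales = np.array([1.0, 25.0])         # toy anisotropic quadratic loss (an assumption)

def V(theta):
    return 0.5 * theta @ (scales * theta)

def grad_V(theta):
    return scales * theta

theta = np.array([2.0, 1.0])           # initial parameters
pi = np.zeros_like(theta)              # initial momentum
dt = 0.02                              # leapfrog step size

for step in range(2001):
    pi -= 0.5 * dt * grad_V(theta)     # half momentum kick
    theta += dt * pi                   # full position drift
    pi -= 0.5 * dt * grad_V(theta)     # half momentum kick
    if step % 500 == 0:
        H = V(theta) + 0.5 * pi @ pi   # total energy, conserved up to O(dt^2)
        print(f"step {step:4d}   V = {V(theta):8.4f}   H = {H:.6f}")
```

Because H is conserved up to integration error, the trajectory keeps exploring level sets of the loss rather than converging like gradient descent; the talk's construction instead chooses a Hamiltonian whose phase-space measure concentrates the results in a controlled way.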
Miranda Cheng
GUD: Generation with Unified Diffusion
Oct 24, 2024
We present the GUD model, a unified diffusion model that interpolates between standard diffusion and autoregressive models. Inspired by concepts from the renormalization group in physics, which analyzes systems across different scales, we revisit diffusion models by exploring three key design aspects: 1) the choice of representation in which the diffusion process operates, 2) the prior distribution that data is transformed into during diffusion, and 3) the scheduling of noise levels applied separately to different parts of the data, captured by a component-wise noise schedule. Incorporating the flexibility in these choices, we develop a unified framework for diffusion generative models with greatly enhanced design freedom.
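A minimal sketch of the third design aspect, a component-wise noise schedule (the window parameterization and variance-preserving choice here are assumptions for illustration, not the GUD implementation):

```python
# Forward diffusion where each data component i is noised on its own window [s_i, e_i].
import numpy as np

def component_schedule(t, starts, ends):
    """Per-component signal level alpha_i(t), decreasing from 1 to 0 on [s_i, e_i]."""
    u = np.clip((t - starts) / (ends - starts), 0.0, 1.0)
    return 1.0 - u

def forward_diffuse(x0, t, starts, ends, rng):
    alpha = component_schedule(t, starts, ends)
    sigma = np.sqrt(1.0 - alpha**2)                  # variance-preserving choice
    return alpha * x0 + sigma * rng.normal(size=x0.shape)

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)
joint = (np.zeros(4), np.ones(4))                    # all components share one window
staggered = (np.array([0.0, 0.25, 0.5, 0.75]),       # one component noised at a time
             np.array([0.25, 0.5, 0.75, 1.0]))
for t in (0.2, 0.6, 1.0):
    print(f"t={t:.1f}  joint    ", np.round(forward_diffuse(x0, t, *joint, rng), 2))
    print(f"t={t:.1f}  staggered", np.round(forward_diffuse(x0, t, *staggered, rng), 2))
```

Fully overlapping windows reduce to a standard joint diffusion, while fully staggered windows noise one component at a time, illustrating how such schedules interpolate toward autoregressive-style generation.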
Bernd Rosenow
Random matrix analysis of neural networks: distinguishing noise from learned information
Sep 26, 2024
We analyze the weight matrices of deep neural networks using Random Matrix Theory (RMT) to distinguish between randomness and learned information. Our findings show that most singular values and eigenvectors conform to universal RMT predictions, indicating randomness, while the largest singular values deviate, signifying the encoding of learned features. Based on this insight, we propose a noise filtering algorithm that removes small singular values and adjusts large ones to improve generalization performance, especially in networks with label noise. We extend our analysis to Transformer-based language models, identifying regions associated with feature learning versus lazy learning, and highlight the importance of small singular values in fine-tuning.
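An illustrative sketch of the RMT comparison (a synthetic low-rank-plus-noise matrix and a crude bulk-edge filter, not the paper's algorithm or thresholds):

```python
# Compare singular values of "signal + noise" against the Marchenko-Pastur bulk edge.
import numpy as np

rng = np.random.default_rng(0)
n, m, rank = 500, 300, 3
noise = rng.normal(size=(n, m)) / np.sqrt(n)                   # entries of variance 1/n
signal = sum(2.0 * np.outer(rng.normal(size=n) / np.sqrt(n),
                            rng.normal(size=m) / np.sqrt(m)) for _ in range(rank))
W = signal + noise                                             # stand-in for a weight matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)
mp_edge = 1.0 + np.sqrt(m / n)                                 # largest singular value of pure noise
print("singular values above the MP bulk edge:", int(np.sum(s > mp_edge)))

s_filtered = np.where(s > mp_edge, s, 0.0)                     # keep only the outliers
W_filtered = (U * s_filtered) @ Vt
print("relative error of filtered matrix vs. pure signal:",
      np.linalg.norm(W_filtered - signal) / np.linalg.norm(signal))
```

For a trained network one would replace the synthetic matrix by an actual layer's weights; the abstract's point is that most of the spectrum sits inside the random bulk while a few large outliers carry the learned features.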
Jim Halverson
Neural networks and conformal field theory
Sep 20, 2024
We present a new construction of conformal fields based on neural network theory and Dirac's embedding formalism, which relates the Lorentz group in higher dimensions to the global conformal group. Key results include a free CFT limit via the neural network-Gaussian process correspondence, the extension to deep networks with recursive conformal fields, and a realization of the free boson.
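For background on the correspondence invoked above (the standard statement, not the construction in the talk): a randomly initialized width-N network converges, as N grows, to a Gaussian process, whose correlation functions factorize by Wick's theorem; this is the sense in which the infinite-width limit behaves like a free (generalized free) field theory.

```latex
f(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} a_i\, \phi(w_i \cdot x),
\qquad a_i \sim \mathcal{N}(0, \sigma_a^2),\quad w_i \sim \mathcal{N}(0, \sigma_w^2 I)\ \ \text{i.i.d.},
```
```latex
f \xrightarrow{\ N \to \infty\ } \mathcal{GP}(0, K),
\qquad
K(x, x') = \langle f(x)\, f(x') \rangle
         = \sigma_a^2\, \mathbb{E}_{w}\!\left[\phi(w \cdot x)\, \phi(w \cdot x')\right].
```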
Julia Kempe
Synthetic data - friend or foe in the age of scaling?
Sep 6, 2024
As AI model size grows, neural **scaling laws** have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data.
In this talk, we ask: **How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus?** Will future models still improve, or are they doomed to degenerate, up to total **(model) collapse**? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with the number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and with text generation using the LLM Llama2. We also propose solutions that circumvent degradation in learning by pruning the generated data.
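As a toy caricature of the iterative-retraining setting (an assumption, far simpler than the framework in the talk), each "generation" below fits a one-dimensional Gaussian to samples drawn from the previous generation's fit:

```python
# Toy model-collapse loop: generation g is trained only on data sampled from generation g-1.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20                                           # small samples make the effect visible
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)    # original "human" data ~ N(0, 1)
mu, sigma = data.mean(), data.std()

for g in range(1, 31):
    data = rng.normal(mu, sigma, size=n_samples)         # purely synthetic training data
    mu, sigma = data.mean(), data.std()                  # refit the "model"
    if g % 5 == 0:
        print(f"generation {g:2d}: mean = {mu:+.3f}, std = {sigma:.3f}")
```

On average the fitted spread shrinks generation after generation while the mean drifts, a minimal analogue of the degradation whose scaling-law consequences the talk analyzes, along with remedies such as pruning the generated data or mixing in human data.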
Matthieu Wyart
Learning hierarchical representations with deep architectures
Aug 23, 2024
Learning generic tasks in high dimensions is impossible. Yet, deep networks classify images, large models learn the structure of language and produce meaningful text, and diffusion-based models generate new images of high quality. In all these cases, building a hierarchical representation of the data is believed to be key to success. How is it achieved? How much data is needed for that, and how does this depend on the data structure? Once such a representation is obtained, how can it be used to compose new data from known low-level features? I will introduce generative models of hierarchical data for which an understanding of these questions is emerging. I will discuss recent results on supervised learning and on score-based generative models. In the latter case, our framework makes novel predictions that we test on image data sets.
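A minimal sketch of hierarchically structured data in this spirit (the grammar sizes and depth are assumptions, not a specific model from the talk): each class label is expanded through several levels of randomly drawn production rules until a string of observable tokens is produced, so class information is encoded only through compositions of low-level features.

```python
# Hierarchical (random-grammar) data generator: label -> ... -> string of tokens.
import numpy as np

rng = np.random.default_rng(0)
n_classes, vocab, n_rules, depth = 2, 8, 2, 3        # illustrative sizes

# For every (level, symbol) pair draw n_rules productions: symbol -> (left, right).
rules = {(lvl, s): [tuple(int(v) for v in rng.integers(vocab, size=2))
                    for _ in range(n_rules)]
         for lvl in range(depth) for s in range(vocab)}

def generate(symbol, level=0):
    """Recursively expand a symbol down to the leaf level, returning observable tokens."""
    if level == depth:
        return [symbol]
    left, right = rules[(level, symbol)][rng.integers(n_rules)]
    return generate(left, level + 1) + generate(right, level + 1)

for label in range(n_classes):
    print(f"class {label}:", [generate(label) for _ in range(2)])
```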
Cengiz Pehlevan
Mean-field theory of deep network learning dynamics and applications to neural scaling laws
Aug 23, 2024
I will review recent developments in obtaining a mean-field description of the high-dimensional learning dynamics of deep neural networks. These mean-field theories result from various infinite limits, including infinite width, depth, and number of attention heads. I will present applications of these ideas to neural scaling laws in the lazy and feature-learning regimes.
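For context on the two regimes mentioned at the end (standard parameterizations, not results from the talk), already for a two-layer network the width scaling of the readout decides between them:

```latex
f_{\mathrm{lazy}}(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} a_i\, \phi(w_i \cdot x)
\qquad \text{vs.} \qquad
f_{\mathrm{MF}}(x) = \frac{1}{N} \sum_{i=1}^{N} a_i\, \phi(w_i \cdot x).
```

With the 1/√N scaling the infinite-width training dynamics stay close to the network's linearization (kernel/lazy regime), whereas with the 1/N mean-field scaling the distribution of hidden units evolves and features are learned; the talk concerns mean-field descriptions of such limits, including depth and attention heads, and their application to neural scaling laws.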