Survey lectures
In this series of webinars, the PIs of the collaboration give survey lectures about their respective subfields and their own research.
Florent Krzakala
How Do Neural Networks Learn Simple Functions with Gradient Descent?
Feb 13, 2025
In this talk, I will review the mechanisms by which two-layer neural networks can learn simple high-dimensional functions from data over time. We will focus on the intricate interplay between algorithms, number of iterations, and the complexity of the task at hand, and on how gradient descent and stochastic gradient descent can learn features of the target function and improve generalization over random initialization and kernel methods. I will also illustrate how ideas and methods at the intersection of high-dimensional probability and statistical physics provide fresh perspectives on these questions.
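As a toy illustration of the kind of setup described above (not the speaker's actual models or experiments; the architecture sizes, target function, and step sizes are assumptions), the sketch below trains a two-layer ReLU network with online SGD on a single-index target and compares it with the same architecture whose first layer stays frozen at random initialization, i.e. a random-features / kernel-like baseline.

```python
# Minimal sketch: feature learning vs. random features on a single-index target.
import numpy as np

rng = np.random.default_rng(0)
d, m, lr, steps = 50, 200, 0.05, 20000            # illustrative sizes (assumptions)
w_star = rng.normal(size=d) / np.sqrt(d)          # hidden direction of the target

def target(X):
    return np.maximum(X @ w_star, 0.0)            # y = relu(w* . x)

def run(train_first_layer):
    W = rng.normal(size=(m, d)) / np.sqrt(d)      # first-layer weights
    a = rng.normal(size=m) / np.sqrt(m)           # second-layer weights
    for _ in range(steps):
        x = rng.normal(size=d)                    # one fresh sample per step (online SGD)
        h = np.maximum(W @ x, 0.0)                # hidden ReLU activations
        err = a @ h - max(x @ w_star, 0.0)        # prediction error on this sample
        a -= lr * err * h / m                     # SGD step on the readout
        if train_first_layer:                     # feature learning: also move the first layer
            W -= lr * err * np.outer(a * (W @ x > 0), x)
    X_test = rng.normal(size=(2000, d))
    preds = np.maximum(X_test @ W.T, 0.0) @ a
    return np.mean((preds - target(X_test)) ** 2)

print("test MSE with feature learning:", run(True))
print("test MSE with random features :", run(False))
```

When both layers are trained, the first-layer weights can align with the hidden direction w*, which is the kind of feature learning the abstract contrasts with fixed-kernel behaviour.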
Michael Douglas
Mathematics, Economics and AI
Jan 31, 2025
An overview of my recent and current research:
(1) ML and verified numerics for geometric PDE.
(2) Economics: LLM simulations, AI scaling laws.
(3) Models of structure in data and in search.
(4) Autonomous mathematical discovery and interestingness.
Yuhai Tu
Towards a Physics-based Theoretical Foundation for Deep Learning: Stochastic Learning Dynamics and Generalization
Dec 19, 2024
In this talk, we will describe our recent work on developing a theoretical foundation, based on a statistical-physics approach, for the feedforward deep neural networks underlying the current AI revolution. In particular, we will discuss our recent work on the learning dynamics driven by stochastic gradient descent (SGD) and on the key determinants of generalization, based on an exact duality relation we discovered between neuron activities and network weights. Time permitting, we will discuss a few future directions that are worth pursuing using a physics-based approach.
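The duality relation itself is beyond a short note, but as a generic illustration of the stochastic element in SGD learning dynamics (an assumption for illustration, not the talk's framework), the sketch below estimates the minibatch-gradient noise on a linear-regression loss and checks that its magnitude scales roughly as 1/batch size.

```python
# Toy illustration: covariance of minibatch gradients around the full-batch gradient.
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)   # noisy linear targets
w = np.zeros(d)                                          # evaluate the noise at w = 0

def minibatch_grad(batch):
    Xb, yb = X[batch], y[batch]
    return Xb.T @ (Xb @ w - yb) / len(batch)             # gradient of 0.5 * batch MSE

full_grad = minibatch_grad(np.arange(n))
for B in (10, 100, 1000):
    grads = np.array([minibatch_grad(rng.choice(n, B, replace=False))
                      for _ in range(500)])
    noise = grads - full_grad
    print(f"batch {B:4d}: trace of gradient-noise covariance ~ {np.trace(np.cov(noise.T)):.3f}")
```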
Surya Ganguli
An analytic theory of creativity for convolutional diffusion models
Dec 5, 2024
We obtain the first analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-based diffusion models can generate highly creative images that lie far from their training data. But optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in a fully analytic, completely mechanistically interpretable, equivariant local score (ELS) machine that (3), without any training, can quantitatively predict the outputs of trained convolution-only diffusion models (such as ResNets and UNets) with high accuracy. Our ELS machine reveals a locally consistent patch mosaic model of creativity, in which diffusion models create exponentially many novel images by mixing and matching different local training-set patches in different image locations. Our theory also partially predicts the outputs of pre-trained self-attention-enabled UNets, revealing an intriguing role for attention in carving out semantic coherence from local patch mosaics.
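For background on the theory-experiment gap mentioned above (a standard score-matching fact, not a result of this work): on a finite training set {x_n}, the score that exactly minimizes the score-matching objective is the gradient of the log of the Gaussian-smoothed empirical distribution,

```latex
p_t(x) = \frac{1}{N}\sum_{n=1}^{N} \mathcal{N}\!\left(x;\ \alpha_t x_n,\ \sigma_t^2 I\right),
\qquad
\nabla_x \log p_t(x) = \sum_{n=1}^{N} w_n(x)\,\frac{\alpha_t x_n - x}{\sigma_t^2},
\qquad
w_n(x) = \frac{\mathcal{N}(x;\ \alpha_t x_n,\ \sigma_t^2 I)}{\sum_{m} \mathcal{N}(x;\ \alpha_t x_m,\ \sigma_t^2 I)} .
```

The reverse process driven by this ideal score collapses onto (rescaled) training points as the noise level goes to zero, i.e. pure memorization; this is consistent with the abstract's point that locality and equivariance prevent a convolutional network from realizing optimal score matching.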
Eva Silverstein
Hamiltonian dynamics for stabilizing neural simulation-based inference
Nov 21, 2024
After reviewing and updating the theory of energy-conserving Hamiltonian dynamics for optimization and sampling, I'll explain a new application to precision scientific data analysis in which NN initialization variance has been a bottleneck. Specifically, we choose a Hamiltonian whose measure on phase space concentrates the results in a controlled way, and describe a simple prescription for hyperparameter defaults. In a set of experiments on likelihood ratio estimation, using small simulated and real (Aleph) particle physics data sets, we find this reduces the error as predicted, performing better than Adam in this regard (both for defaults and with small hyperparameter scans). Time permitting, I'll discuss potential applications to broader research on the physics of learning, and a few unrelated PoL ideas.
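As a minimal sketch of the underlying ingredient, energy-conserving Hamiltonian dynamics on a loss surface (this is generic leapfrog integration on a toy quadratic loss, not the specific Hamiltonian, phase-space measure, or hyperparameter prescription from the talk):

```python
# Leapfrog integration of H(theta, pi) = V(theta) + |pi|^2 / 2 on a toy loss.
import numpy as np

scales = np.array([1.0, 25.0])         # toy anisotropic quadratic loss (an assumption)

def V(theta):
    return 0.5 * theta @ (scales * theta)

def grad_V(theta):
    return scales * theta

theta = np.array([2.0, 1.0])           # initial parameters
pi = np.zeros_like(theta)              # initial momentum
dt = 0.02                              # leapfrog step size

for step in range(2001):
    pi -= 0.5 * dt * grad_V(theta)     # half momentum kick
    theta += dt * pi                   # full position drift
    pi -= 0.5 * dt * grad_V(theta)     # half momentum kick
    if step % 500 == 0:
        H = V(theta) + 0.5 * pi @ pi   # total energy, conserved up to O(dt^2)
        print(f"step {step:4d}   V = {V(theta):8.4f}   H = {H:.6f}")
```

Because H is conserved up to integration error, the trajectory keeps exploring level sets of the loss rather than converging like gradient descent; the talk's construction instead chooses a Hamiltonian whose phase-space measure concentrates the results in a controlled way.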
Miranda Cheng
GUD: Generation with Unified Diffusion
Oct 24, 2024
We present the GUD model, a unified diffusion model that interpolates between standard diffusion and autoregressive models. Inspired by concepts from the renormalization group in physics, which analyzes systems across different scales, we revisit diffusion models by exploring three key design aspects: 1) the choice of representation in which the diffusion process operates, 2) the prior distribution that data is transformed into during diffusion, and 3) the scheduling of noise levels applied separately to different parts of the data, captured by a component-wise noise schedule. Incorporating the flexibility in these choices, we develop a unified framework for diffusion generative models with greatly enhanced design freedom.
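A minimal sketch of the third design aspect, a component-wise noise schedule (the window parameterization and variance-preserving choice here are assumptions for illustration, not the GUD implementation):

```python
# Forward diffusion where each data component i is noised on its own window [s_i, e_i].
import numpy as np

def component_schedule(t, starts, ends):
    """Per-component signal level alpha_i(t), decreasing from 1 to 0 on [s_i, e_i]."""
    u = np.clip((t - starts) / (ends - starts), 0.0, 1.0)
    return 1.0 - u

def forward_diffuse(x0, t, starts, ends, rng):
    alpha = component_schedule(t, starts, ends)
    sigma = np.sqrt(1.0 - alpha**2)                  # variance-preserving choice
    return alpha * x0 + sigma * rng.normal(size=x0.shape)

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)
joint = (np.zeros(4), np.ones(4))                    # all components share one window
staggered = (np.array([0.0, 0.25, 0.5, 0.75]),       # one component noised at a time
             np.array([0.25, 0.5, 0.75, 1.0]))
for t in (0.2, 0.6, 1.0):
    print(f"t={t:.1f}  joint    ", np.round(forward_diffuse(x0, t, *joint, rng), 2))
    print(f"t={t:.1f}  staggered", np.round(forward_diffuse(x0, t, *staggered, rng), 2))
```

Fully overlapping windows reduce to a standard joint diffusion, while fully staggered windows noise one component at a time, illustrating how such schedules interpolate toward autoregressive-style generation.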
Bernd Rosenow
Random matrix analysis of neural networks: distinguishing noise from learned information
Sep 26, 2024
We analyze the weight matrices of deep neural networks using Random Matrix Theory (RMT) to distinguish between randomness and learned information. Our findings show that most singular values and eigenvectors conform to universal RMT predictions, indicating randomness, while the largest singular values deviate, signifying the encoding of learned features. Based on this insight, we propose a noise filtering algorithm that removes small singular values and adjusts large ones to improve generalization performance, especially in networks with label noise. We extend our analysis to Transformer-based language models, identifying regions associated with feature learning versus lazy learning, and highlight the importance of small singular values in fine-tuning.
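An illustrative sketch of the RMT comparison (a synthetic low-rank-plus-noise matrix and a crude bulk-edge filter, not the paper's algorithm or thresholds):

```python
# Compare singular values of "signal + noise" against the Marchenko-Pastur bulk edge.
import numpy as np

rng = np.random.default_rng(0)
n, m, rank = 500, 300, 3
noise = rng.normal(size=(n, m)) / np.sqrt(n)                   # entries of variance 1/n
signal = sum(2.0 * np.outer(rng.normal(size=n) / np.sqrt(n),
                            rng.normal(size=m) / np.sqrt(m)) for _ in range(rank))
W = signal + noise                                             # stand-in for a weight matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)
mp_edge = 1.0 + np.sqrt(m / n)                                 # largest singular value of pure noise
print("singular values above the MP bulk edge:", int(np.sum(s > mp_edge)))

s_filtered = np.where(s > mp_edge, s, 0.0)                     # keep only the outliers
W_filtered = (U * s_filtered) @ Vt
print("relative error of filtered matrix vs. pure signal:",
      np.linalg.norm(W_filtered - signal) / np.linalg.norm(signal))
```

For a trained network one would replace the synthetic matrix by an actual layer's weights; the abstract's point is that most of the spectrum sits inside the random bulk while a few large outliers carry the learned features.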
Jim Halverson
Neural networks and conformal field theory
Sep 20, 2024
We present a new construction of conformal fields based on neural network theory and Dirac's embedding formalism, which relates the Lorentz group in higher dimensions to the global conformal group. Key results include a free CFT limit via the neural network-Gaussian process correspondence, the extension to deep networks with recursive conformal fields, and a realization of the free boson.
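For background on the correspondence invoked above (the standard statement, not the construction in the talk): a randomly initialized width-N network converges, as N grows, to a Gaussian process, whose correlation functions factorize by Wick's theorem; this is the sense in which the infinite-width limit behaves like a free (generalized free) field theory.

```latex
f(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} a_i\, \phi(w_i \cdot x),
\qquad a_i \sim \mathcal{N}(0, \sigma_a^2),\quad w_i \sim \mathcal{N}(0, \sigma_w^2 I)\ \ \text{i.i.d.},
```
```latex
f \xrightarrow{\ N \to \infty\ } \mathcal{GP}(0, K),
\qquad
K(x, x') = \langle f(x)\, f(x') \rangle
         = \sigma_a^2\, \mathbb{E}_{w}\!\left[\phi(w \cdot x)\, \phi(w \cdot x')\right].
```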
Julia Kempe
Synthetic data - friend or foe in the age of scaling?
Sep 6, 2024
As AI model size grows, neural **scaling laws** have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data.
In this talk, we ask: **How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus?** Will future models still improve, or are they doomed to degenerate, up to total **(model) collapse**? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with the number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and with text generation using the LLM Llama2. We also propose solutions that circumvent degradation in learning by pruning the generated data.
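As a toy caricature of the iterative-retraining setting (an assumption, far simpler than the framework in the talk), each "generation" below fits a one-dimensional Gaussian to samples drawn from the previous generation's fit:

```python
# Toy model-collapse loop: generation g is trained only on data sampled from generation g-1.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20                                           # small samples make the effect visible
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)    # original "human" data ~ N(0, 1)
mu, sigma = data.mean(), data.std()

for g in range(1, 31):
    data = rng.normal(mu, sigma, size=n_samples)         # purely synthetic training data
    mu, sigma = data.mean(), data.std()                  # refit the "model"
    if g % 5 == 0:
        print(f"generation {g:2d}: mean = {mu:+.3f}, std = {sigma:.3f}")
```

On average the fitted spread shrinks generation after generation while the mean drifts, a minimal analogue of the degradation whose scaling-law consequences the talk analyzes, along with remedies such as pruning the generated data or mixing in human data.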
Matthieu Wyart
Learning hierarchical representations with deep architectures
Aug 23, 2024
Learning generic tasks in high dimensions is impossible. Yet, deep networks classify images, large models learn the structure of language and produce meaningful text, and diffusion-based models generate new images of high quality. In all these cases, building a hierarchical representation of the data is believed to be key to success. How is it achieved? How much data is needed for that, and how does this depend on the data structure? Once such a representation is obtained, how can it be used to compose new data from known low-level features? I will introduce generative models of hierarchical data for which an understanding of these questions is emerging. I will discuss recent results on supervised learning and on score-based generative models. In the latter case, our framework makes novel predictions that we test on image data sets.
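A minimal sketch of hierarchically structured data in this spirit (the grammar sizes and depth are assumptions, not a specific model from the talk): each class label is expanded through several levels of randomly drawn production rules until a string of observable tokens is produced, so class information is encoded only through compositions of low-level features.

```python
# Hierarchical (random-grammar) data generator: label -> ... -> string of tokens.
import numpy as np

rng = np.random.default_rng(0)
n_classes, vocab, n_rules, depth = 2, 8, 2, 3        # illustrative sizes

# For every (level, symbol) pair draw n_rules productions: symbol -> (left, right).
rules = {(lvl, s): [tuple(int(v) for v in rng.integers(vocab, size=2))
                    for _ in range(n_rules)]
         for lvl in range(depth) for s in range(vocab)}

def generate(symbol, level=0):
    """Recursively expand a symbol down to the leaf level, returning observable tokens."""
    if level == depth:
        return [symbol]
    left, right = rules[(level, symbol)][rng.integers(n_rules)]
    return generate(left, level + 1) + generate(right, level + 1)

for label in range(n_classes):
    print(f"class {label}:", [generate(label) for _ in range(2)])
```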
Cengiz Pehlevan
Mean-field theory of deep network learning dynamics and applications to neural scaling laws
Aug 23, 2024
I will review recent developments in obtaining a mean-field description of the high-dimensional learning dynamics of deep neural networks. These mean-field theories result from various infinite limits, including infinite width, depth, and number of attention heads. I will present applications of these ideas to neural scaling laws in the lazy and feature-learning regimes.
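For context on the two regimes mentioned at the end (standard parameterizations, not results from the talk), already for a two-layer network the width scaling of the readout decides between them:

```latex
f_{\mathrm{lazy}}(x) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} a_i\, \phi(w_i \cdot x)
\qquad \text{vs.} \qquad
f_{\mathrm{MF}}(x) = \frac{1}{N} \sum_{i=1}^{N} a_i\, \phi(w_i \cdot x).
```

With the 1/√N scaling the infinite-width training dynamics stay close to the network's linearization (kernel/lazy regime), whereas with the 1/N mean-field scaling the distribution of hidden units evolves and features are learned; the talk concerns mean-field descriptions of such limits, including depth and attention heads, and their application to neural scaling laws.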