Ongoing:
Title: TBD
Abstract: TBD
Paper Link: TBD
Previous:
Epoch-wise Double Descent:
A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent, where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent, in which the test error undergoes two non-monotonic transitions, or descents, as the training time increases. By leveraging tools from statistical physics, we study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions for the evolution of generalization error over training. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical experiments where our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.
Paper Link: ICML 2022
Blogpost: 🌐
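To make the linear teacher-student setting above more concrete, here is a minimal sketch of gradient descent on a noisy linear regression task whose input features live at two scales, so that some directions are learned quickly and others slowly. It is not the paper's analysis or code: the function name, dimensions, scales, noise level, and step-size rule are all illustrative assumptions, and whether the recorded test curve shows a second descent depends on those choices.

```python
import numpy as np


def simulate_linear_teacher_student(n_train=50, d_fast=25, d_slow=25,
                                    slow_scale=0.1, noise_std=0.5,
                                    steps=2000, seed=0):
    """Gradient descent on a linear student; all defaults are assumptions."""
    rng = np.random.default_rng(seed)
    d = d_fast + d_slow

    # Per-feature input scales: large-scale features are fit quickly,
    # small-scale features are fit slowly under gradient descent.
    scales = np.concatenate([np.ones(d_fast), slow_scale * np.ones(d_slow)])

    # Linear teacher with Gaussian weights and noisy training labels.
    w_teacher = rng.normal(size=d)
    X = rng.normal(size=(n_train, d)) * scales
    y = X @ w_teacher + noise_std * rng.normal(size=n_train)

    # Step size 1 / (largest Hessian eigenvalue) keeps the quadratic
    # training loss from increasing between steps.
    hessian = X.T @ X / n_train
    lr = 1.0 / np.linalg.eigvalsh(hessian)[-1]

    w = np.zeros(d)
    train_losses, test_errors = [], []
    for _ in range(steps):
        residual = X @ w - y
        train_losses.append(0.5 * np.mean(residual ** 2))
        # Exact excess population error for inputs with covariance
        # diag(scales**2), excluding the irreducible label noise.
        test_errors.append(float(np.sum(scales ** 2 * (w - w_teacher) ** 2)))
        w = w - lr * (X.T @ residual) / n_train
    return train_losses, test_errors


if __name__ == "__main__":
    train, test = simulate_linear_teacher_student()
    # Print a coarse view of the test-error trajectory over training.
    for step in (0, 10, 100, 500, 1999):
        print(step, round(test[step], 4))
```

Plotting `test_errors` against the step index on a logarithmic axis, and varying `slow_scale` and `noise_std`, is one way to explore how the two feature scales shape the curve.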
Zero-sum game optimization:
Adversarial formulations such as generative adversarial networks (GANs) have rekindled interest in two-player min-max games. A central obstacle in the optimization of such games is the “rotational dynamics” that hinder their training. Taking inspiration from physics, in this work, we propose a novel second-order optimizer (LEAD) to mitigate this issue. In contrast to existing literature, which tackles the problematic rotations by introducing carefully hand-designed mechanisms during training, our work adopts a more systematic approach. Through the use of Lyapunov stability theory and spectral analysis, we demonstrate that LEAD exhibits linear convergence in the case of the bilinear min-max game for both continuous and discrete-time settings. Furthermore, through empirical evaluation of our method on synthetic setups such as 8-Gaussian generation tasks and CIFAR-10 image generation, we demonstrate marked improvements over baseline methods.
Paper Link: arXiv:2010.13846
Blogpost: 🌐
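The rotational dynamics mentioned in the abstract above can be seen in the simplest bilinear game, min over x and max over y of f(x, y) = x·y. The sketch below runs plain simultaneous gradient descent-ascent, which is the baseline behavior and not the LEAD optimizer; the function name, step size, and starting point are illustrative assumptions. Each step rotates the iterate around the equilibrium at the origin and multiplies its distance from the origin by sqrt(1 + lr²), so the iterates spiral outward instead of converging.

```python
import math


def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y.

    Returns the distance from the equilibrium (0, 0) after each step.
    This is the plain baseline, not the LEAD optimizer.
    """
    radii = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        # x descends on f while y ascends on f, both using the old iterate.
        x, y = x - lr * grad_x, y + lr * grad_y
        radii.append(math.hypot(x, y))
    return radii


if __name__ == "__main__":
    radii = simultaneous_gda(1.0, 1.0, lr=0.1, steps=100)
    print("initial radius:", radii[0])
    print("final radius:", radii[-1])
```

The growth factor follows from (x − lr·y)² + (y + lr·x)² = (1 + lr²)(x² + y²), which is why shrinking the step size slows the divergence but does not remove it.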
Particle-hole symmetry and composite Fermi liquids:
We study theoretically the magnetoresistance oscillations near a half-filled lowest Landau level (ν=1/2) that result from the presence of a periodic one-dimensional electrostatic potential. We use the Dirac composite fermion theory of Son [Phys. Rev. X 5, 031027 (2015)], where the ν=1/2 state is described by a (2+1)-dimensional theory of quantum electrodynamics. We extend previous work that studied these oscillations in the mean-field limit by considering the effects of gauge field fluctuations within a large flavor approximation. A self-consistent analysis of the resulting Schwinger-Dyson equations suggests that fluctuations dynamically generate a Chern-Simons term for the gauge field and a magnetic field-dependent mass for the Dirac composite fermions away from ν=1/2. We show how this mass results in a shift of the locations of the oscillation minima that improves the comparison with experiment [Kamburov et al., Phys. Rev. Lett. 113, 196801 (2014)]. The temperature-dependent amplitude of these oscillations may enable an alternative way to measure this mass. This amplitude may also help distinguish the Dirac and Halperin, Lee, and Read composite fermion theories of the half-filled Landau level.
Paper Link: arXiv:1901.08070