A common view of deep learning is that deep networks provide a hierarchical means of processing input data, where early layers extract “simple” features, later ones “complex” features, and so on. In this talk, I will argue that there is another way of viewing depth in (practical, large-scale) networks: as a method for computing equilibrium points of non-linear dynamical systems. To start, I’ll introduce the notion of implicit layers, layers that are not specified by an explicit computation graph, but by the solution of some (nonlinear) set of equations. This concept traces back to some of the original work in recurrent architectures, but has become increasingly popular in many modern approaches to deep learning. Using these implicit layers, I’ll describe the Deep Equilibrium Model (DEQ) architecture, an approach based upon finding the equilibrium point of a “single” nonlinear function. I’ll show that 1) DEQ models are representationally able to capture any function expressible by a standard deep network, and 2) in practice, DEQ models perform as well as state-of-the-art approaches for large-scale sequential modeling problems. I’ll further show that “stacking” DEQ models provides no increase in expressivity, hence the claim that “one layer is all you need”.
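The idea of a layer defined by an equilibrium condition, rather than an explicit computation graph, can be sketched in a few lines. The following is a minimal illustration, not the talk's actual model: the function `f` (a tanh cell with hypothetical weight matrices `W` and `U`) and the naive fixed-point iteration are assumptions chosen for simplicity; real DEQ implementations use faster root-finding methods such as Broyden's method, and differentiate through the equilibrium via the implicit function theorem.

```python
import numpy as np

# Sketch of a deep equilibrium (DEQ) layer: the output z* is defined
# implicitly as a fixed point z* = f(z*, x) of a single nonlinear
# function f, rather than by an explicit stack of layers.
rng = np.random.default_rng(0)
d = 8
W = rng.normal(scale=0.1, size=(d, d))  # small weights keep f a contraction
U = rng.normal(size=(d, d))             # illustrative input projection

def f(z, x):
    return np.tanh(W @ z + U @ x)

def deq_layer(x, tol=1e-8, max_iter=500):
    """Solve z = f(z, x) by naive fixed-point iteration."""
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

x = rng.normal(size=d)
z_star = deq_layer(x)
# At convergence, z* satisfies the equilibrium equation f(z*, x) = z*.
print(np.allclose(z_star, f(z_star, x), atol=1e-6))
```

Note that composing two such layers adds nothing new in principle: the equilibrium of the composed system is itself the fixed point of a single (larger) nonlinear function, which is the intuition behind the “one layer is all you need” claim.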
Bio: Zico Kolter is an Associate Professor in the Computer Science Department at Carnegie Mellon University, and also serves as chief scientist of AI research for the Bosch Center for Artificial Intelligence. His work spans the intersection of machine learning and optimization, with a large focus on developing more robust and rigorous methods in deep learning. In addition, he has worked in a number of application areas, highlighted by work on sustainability and smart energy systems. He is a recipient of the DARPA Young Faculty Award, a Sloan Fellowship, and best paper awards at NeurIPS, ICML (honorable mention), IJCAI, KDD, and PESGM.