Neural networks have been shown to significantly outperform kernel methods (including neural tangent kernels) in problems such as image classification. Most theoretical explanations of this performance gap focus on learning a complex hypothesis class. In this talk, I will present a related but simple hypothesis class that explains this performance gap, based on the problem of finding a sparse signal in the presence of noise. Specifically, we show that, for a simple data distribution with a sparse signal amidst high-variance noise, a simple convolutional neural network trained with stochastic gradient descent learns to threshold out the noise and find the signal. The corresponding neural tangent kernel, by contrast, works with a fixed set of predetermined features and is unable to adapt to the signal in this manner. This is joint work with Stefani Karp, Ezra Winston, and Yuanzhi Li.
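To make the setup concrete, below is a minimal, purely illustrative sketch (not the construction or parameters from the paper): each example consists of several patches, one of which carries a sparse signal (a single coordinate of magnitude mu times the label) while the rest are dense high-variance noise. A tiny two-filter ReLU "CNN" with a shared filter and learnable bias is trained by SGD on the logistic loss; the learnable bias is what allows it to threshold out noise patches. All sizes, magnitudes, and learning rates are assumptions chosen for the demo, and the fixed-feature (NTK) side of the comparison is not reproduced here.

```python
# Illustrative sketch only: sparse signal hidden among high-variance noise
# patches, learned by a tiny two-filter ReLU "CNN" trained with SGD.
# All distribution parameters, sizes, and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

P, d = 16, 50         # patches per example, dimension of each patch
mu, sigma = 3.0, 1.0  # signal magnitude vs. per-coordinate noise std

def sample(n):
    """x: (n, P, d); one random patch carries the sparse signal y*mu*e1,
    all other patches are dense N(0, sigma^2) noise."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = sigma * rng.standard_normal((n, P, d))
    loc = rng.integers(P, size=n)
    x[np.arange(n), loc, :] = 0.0
    x[np.arange(n), loc, 0] = y * mu          # signal is sparse: coord 0 only
    return x, y

# Two-filter CNN with weights shared across patches and learnable biases:
#   f(x) = sum_p relu(w1 . x_p + b1) - sum_p relu(w2 . x_p + b2)
w1 = 0.01 * rng.standard_normal(d); b1 = 0.0
w2 = 0.01 * rng.standard_normal(d); b2 = 0.0

def forward(x):
    a1 = x @ w1 + b1                          # (n, P) pre-activations
    a2 = x @ w2 + b2
    return np.maximum(a1, 0).sum(1) - np.maximum(a2, 0).sum(1), a1, a2

lr, steps, batch = 0.05, 2000, 64
for t in range(steps):
    x, y = sample(batch)
    f, a1, a2 = forward(x)
    # Gradient of the logistic loss log(1 + exp(-y f)) w.r.t. f, per example.
    g = -y / (1.0 + np.exp(np.clip(y * f, -30.0, 30.0)))
    m1 = (a1 > 0).astype(float)               # ReLU gates
    m2 = (a2 > 0).astype(float)
    # Chain rule through the sum over patches; average over the batch.
    gw1 = np.einsum('n,np,npd->d', g, m1, x) / batch
    gb1 = (g[:, None] * m1).sum() / batch
    gw2 = np.einsum('n,np,npd->d', -g, m2, x) / batch
    gb2 = (-g[:, None] * m2).sum() / batch
    w1 -= lr * gw1; b1 -= lr * gb1
    w2 -= lr * gw2; b2 -= lr * gb2

xt, yt = sample(5000)
ft, _, _ = forward(xt)
print("CNN test accuracy:", (np.sign(ft) == yt).mean())
# A negative bias acts as a threshold that suppresses noise-patch activations.
print("learned biases:", b1, b2)
```

The design choice worth noting is the learnable bias inside the ReLU: a fixed-feature model has no analogous mechanism for suppressing the noise patches, which is the informal intuition behind the gap described in the abstract.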