In the context of deep learning, predictive models serve multiple purposes. One use is to drive representation learning, as the features required to support prediction are often useful for other tasks. Another is to act as a mental simulator that can be used for planning and counterfactual reasoning. In the first part of my talk, I will describe recent work in which we study the representational effects of predictive learning in a deep RL system, and how these compare to representational changes observed across multiple brain regions during learning. We draw a connection between the auxiliary predictive model of the RL system and the hippocampus, hypothesizing that an additional role of predictive learning in the hippocampus is to drive feature learning in upstream areas. In the second part of the talk, I will describe work using Graph Neural Networks to learn differentiable simulators of complex, large-scale dynamical systems. I will describe how these models can be made to support accurate, generalizable prediction, efficient gradient-based optimization, and realistic simulations with hundreds of objects. I will also cover similarities and differences between these predictive modeling approaches, and how they relate to popular deep learning techniques such as transformers.