Quantifying uncertainty in deep learning is a challenging and still-unsolved problem. Predictive uncertainty estimates are important for knowing when to trust a model’s predictions, especially in real-world applications where the training and test distributions can differ substantially. The first part of the talk focuses on detecting out-of-distribution (OOD) inputs. Deep generative models are often believed to be robust to OOD inputs, but I’ll present counterexamples in which generative models assign higher likelihood to OOD inputs than to the training data. Specifically, we found that deep generative models trained on one dataset (e.g., CIFAR-10) can assign higher likelihood to inputs from another dataset (e.g., SVHN) than to their own training data. I’ll discuss recent follow-up work in which we investigate these failure modes in detail and present solutions. The second part of the talk focuses on predictive uncertainty estimation for discriminative models. I’ll discuss results from our large-scale benchmark study of calibration under dataset shift and present some of our work on advancing the state of the art for calibration under shift.
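A toy sketch of why high likelihood need not mean "in-distribution" (this is an illustrative analogy of my own, not the talk's actual models): for a standard Gaussian in high dimensions, samples concentrate on a thin shell of radius roughly sqrt(d), yet the density is maximized at the mean, so an atypical point such as the origin receives far higher likelihood than any typical sample. The same tension underlies a CIFAR-10-trained model assigning higher likelihood to SVHN.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100  # dimension

def log_density(x):
    # Log density of a standard normal N(0, I) in d dimensions.
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(x**2, axis=-1))

samples = rng.standard_normal((1000, d))    # "in-distribution" draws
typical_logp = log_density(samples).mean()  # average log-likelihood of typical points
origin_logp = log_density(np.zeros(d))      # the mode: an atypical, never-sampled point

# The mode's likelihood exceeds the typical sample's by about d/2 nats,
# so thresholding on density alone cannot flag atypical (OOD-like) inputs.
print(origin_logp > typical_logp)  # → True
```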
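For the second part, calibration is typically measured with expected calibration error (ECE): predictions are binned by confidence, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by bin size. A minimal sketch (my own simplified implementation, not the benchmark's code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin by confidence, then average |accuracy - confidence| weighted by bin mass.

    confidences: predicted probability of the chosen class, in (0, 1].
    correct: 1 where the prediction was right, 0 otherwise.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of points in the bin
    return ece

# A well-calibrated model: 90% confidence, 90% accuracy -> ECE = 0.
print(expected_calibration_error(np.full(10, 0.9), np.array([1] * 9 + [0])))  # → 0.0
```

Under dataset shift, accuracy tends to drop faster than confidence, which shows up directly as a growing ECE.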