Quantifying behavior is crucial for many applications across the life sciences and engineering. Videography provides an easy means of observing and recording animal behavior in diverse settings, yet extracting particular aspects of a behavior for further analysis can be highly time-consuming and computationally challenging. I will present DeepLabCut, an efficient method for markerless pose estimation based on transfer learning with deep neural networks that achieves excellent results with minimal training data. I will show that, for both pretrained networks and networks trained from random initializations, architectures that perform better on ImageNet also perform better for pose estimation, with a substantial improvement on out-of-domain data when pretrained on ImageNet. Subsequently, I will present methods that enable robust zero-shot performance. Overall, I will illustrate the versatility of this framework by tracking various body parts in multiple species across a broad collection of behaviors, from egg-laying in flies to 3D pose estimation in cheetahs.
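To make the transfer-learning workflow concrete, the sketch below walks through a typical DeepLabCut project using the toolbox's publicly documented Python API (create a project, label a few frames, fine-tune, analyze); the project name, body parts, and video paths are placeholders, not examples from the talk.

```python
# A minimal sketch of the standard DeepLabCut workflow, assuming the
# publicly documented Python API; names and paths are placeholders.
import deeplabcut

# 1. Create a project around one or more behavior videos
#    (returns the path to the project's config.yaml).
config_path = deeplabcut.create_new_project(
    "fly-egg-laying",             # hypothetical project name
    "experimenter",               # hypothetical experimenter name
    ["videos/fly_trial01.mp4"],   # hypothetical video path
    copy_videos=True,
)

# 2. Extract a small, diverse set of frames to label (minimal training data).
deeplabcut.extract_frames(config_path, mode="automatic", algo="kmeans")

# 3. Label body parts by hand in the GUI (the only manual step).
deeplabcut.label_frames(config_path)

# 4. Build the training set and fine-tune an ImageNet-pretrained backbone.
deeplabcut.create_training_dataset(config_path)
deeplabcut.train_network(config_path)

# 5. Evaluate on held-out frames, then run inference on new videos.
deeplabcut.evaluate_network(config_path)
deeplabcut.analyze_videos(config_path, ["videos/fly_trial02.mp4"])
deeplabcut.create_labeled_video(config_path, ["videos/fly_trial02.mp4"])
```

Because the backbone is pretrained on ImageNet, step 4 typically converges with only a few hundred labeled frames, which is the "minimal training data" property the abstract refers to.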