Machine learning (ML) models struggle to generalize when the number of training instances per class is imbalanced. Data augmentation (DA) is a leading approach to improving generalization for under-represented classes. Despite its widespread use, the mechanisms by which DA works are not clearly understood.

In this dissertation, we take a step toward understanding how DA works with imbalanced data. We begin by developing three novel algorithms that incorporate data augmentation to improve generalization for under-represented classes. Based on insights gleaned from this process, we focus on the latent features learned by ML models as potential culprits behind poor generalization. We design a suite of latent-feature-based tools that can be used to understand data complexity and class overlap.

We also find that certain DA methods and parametric ML classifiers (CNNs, logistic regression, SVMs) incorporate hidden linearity, both at the front end of training and during inference, which may affect generalization when learning with imbalanced data. Further, we demonstrate that parametric ML models rely heavily on the magnitudes of a limited number of latent features: during inference, they predict classes based on a combination of latent feature magnitudes that reaches a requisite threshold.
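As a minimal sketch of this last claim (the notation and threshold form here are illustrative assumptions, not the dissertation's exact formulation): for a linear classification head operating on latent features $z_1, \dots, z_d$ with learned weights $w_j$ and bias $b$, the predicted label is determined by whether the weighted sum of latent feature values reaches a decision threshold $\tau$:

\[
\hat{y} =
\begin{cases}
1, & \text{if } \sum_{j=1}^{d} w_j z_j + b \ge \tau,\\
0, & \text{otherwise.}
\end{cases}
\]

Under this form, a few latent features with large magnitudes $|z_j|$ can dominate the sum and hence the prediction, consistent with the reliance on a limited number of high-magnitude latent features described above.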