Engagement is complex and multifaceted, yet crucial to learning. One context in which engagement frequently plays an important role is computerized learning environments. These environments can provide a superior learning experience by automatically detecting student engagement (and thus disengagement), adapting to it, and providing better feedback and evaluations to teachers and students. This dissertation provides an overview of several studies that used facial features to automatically detect student engagement, addressing its affective, cognitive, and behavioral components.

Studies in laboratory environments demonstrated the efficacy of several types of extracted facial features for engagement detection. Engagement was also detected for the first time in several learning domains, including essay writing, computer programming, and illustrated textbook reading. Each domain has its own challenges; for example, textbook reading is non-interactive and unlikely to trigger discriminative facial expressions at predictable times. Methods were therefore tailored to address each domain's unique challenges. Engagement detection was also researched in a classroom environment, where noise introduced by talking, gesturing, and distractions such as cell phones and other students poses additional challenges. Engagement detection methods were effective in many instances despite these distractions.

This dissertation also explores face-based detection of mind wandering (MW), a type of cognitive disengagement in which students' thoughts drift away from the learning task toward internal thoughts. A dataset of videos in which students reported MW (disengagement) was developed and can be used to answer key questions about engagement and MW detection. Individual models for various types of facial features were developed to examine how MW manifests in students' expressions.
Human observers were also asked to judge video clips as MW or non-MW to determine how well automated methods compare to human annotations. Automated detectors were then improved by incorporating observers' MW annotations, using techniques such as engineering features specifically to capture facial expressions noted by observers and, during automatic detector training, weighting training instances that observers classified exceptionally well or poorly. Finally, implications of the results are discussed.