The problem of finding structure in big data sets is becoming increasingly relevant to psychologists as it becomes easier and cheaper to collect data on human behavior. This dissertation focuses on identifying important structural features such as main effects, nonlinear effects, and interactions in big data sets when the number of predictors is large. In general, this goal can be referred to as exploratory regression analysis. Exploratory regression analysis is beneficial because its results suggest testable hypotheses, limit the number of plausible models, and help avoid errors in model specification. It is usually carried out using basic data visualization techniques, simple statistical models, or by fitting a number of parametric models and selecting the best among them. However, these procedures can require strong assumptions and may not be feasible when the number of predictors is large.

Gradient tree boosting (Friedman, 2001) is a promising alternative for exploratory regression analysis because it builds an interpretable model that approximates nonlinear effects and interactions among predictors without a priori specification. However, it is not clear how to build and interpret gradient tree boosting models in the context of the multivariate, longitudinal, and hierarchically clustered data commonly found in psychological research.

This dissertation develops two procedures for estimating gradient tree boosting models for multivariate, longitudinal, and hierarchically clustered data. Multivariate tree boosting selects predictors that explain covariance in multiple outcomes. Mixed effects tree boosting accounts for hierarchically clustered data by treating a grouping variable as random. Longitudinal data can be modeled in boosted decision trees by including time as a candidate splitting variable in mixed effects tree boosting. These procedures are illustrated by application to real data.
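As a minimal sketch of the exploratory use of gradient tree boosting described above, the following example fits a boosted tree ensemble to simulated data containing a nonlinear effect and an interaction, then inspects relative variable influence. It uses scikit-learn's `GradientBoostingRegressor`; the simulated data, variable names, and tuning values are illustrative assumptions, not part of the dissertation's methods.

```python
# Illustrative sketch (assumed setup): gradient tree boosting recovers
# nonlinear effects and interactions without a priori specification.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))  # five candidate predictors, x0..x4
# True model: a nonlinear effect of x0 plus an x1*x2 interaction;
# x3 and x4 are pure noise.
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=n)

model = GradientBoostingRegressor(
    n_estimators=300, max_depth=3, learning_rate=0.05, random_state=0
)
model.fit(X, y)

# Relative influence of each predictor; the noise variables x3 and x4
# should receive near-zero importance.
for j, imp in enumerate(model.feature_importances_):
    print(f"x{j}: {imp:.3f}")
```

Ranking predictors by relative influence is one way such a model suggests testable hypotheses: predictors with near-zero influence can often be dropped from subsequent parametric models.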
Simulations demonstrate that the methods balance true and false positive rates when selecting variables, and achieve low prediction error at sample and effect sizes commonly observed in psychology.