This thesis analyzes the effect of training on the mutual information between the input and output of a system with unknown parameters. The work consists of two parts: a theory for computing mutual information with training, and its application to problems in wireless communications, signal processing, and machine learning that can be modeled as quantized large-scale systems and employ training as part of their operation.

In the first part, we develop a theory for computing the mutual information between the input and output conditioned on training in large-scale systems whose models are not necessarily Gaussian or linear, without assuming any particular parameter estimate, and without resorting to linearization or worst-case noise analysis. This mutual information can be computed as the difference between two derivatives of a single function.

In the second part, we show that a quantized large-scale system with unknown parameters and training signals can be analyzed through an equivalent system with known parameters in which the signal power and noise variance are modified in a prescribed manner. We then present applications to training in wireless communications, signal processing, and machine learning. In wireless communications, we show that the number of training signals can be significantly smaller than the number of transmitting elements. A similar conclusion holds for the symbol error rate in signal processing applications, provided the number of receiving elements is large enough. In machine learning, for a linear classifier, we show that the misclassification rate is insensitive to the number of classes when the number of training examples is large, and is approximately inversely proportional to the size of the training set. Finally, we show that a linear analysis of this nonlinear training problem can be accurate when the additive thermal noise power is high.