Many communication systems have to cope with channels with unknown and time-varying state, including fading, inter-symbol interference, and more general finite-state Markov channels. This work proposes a novel successive decoding paradigm, or more specifically, a time-multiplexed transmission of multiple codes with multi-staged reception, to address the two fundamental aspects of these channels: the estimation of achievable information rate and the design of a practical coding system. The first part of this work shows that successive decoding under a deep rectangular interleaver essentially decomposes the original channel into a set of asymptotic memoryless subchannels with semi-infinite preceding training symbols. The achievable information rate of the original channel under a given input distribution can be efficiently computed from this set. Furthermore, the conventional coding system that separates estimation from decoding can operate on these memoryless channels without loss of mutual information. These results allow us to characterize accurately the binary-input capacity of correlated fading channels and to operate within 1.1 dB to the binary-input capacity using AWGN channel optimized LDPC codes. The second part of this work deals with the design of more practical successive decoding with a small number of levels. The main idea is to incorporate an irregular interleaving pattern and when necessary iterative estimation decoding within each level. These techniques provide a more flexible configuration of successive decoding for balancing the performance and the delay. The analysis of the achievable rate and the extrinsic information transfer (EXIT) charts are proposed for code rate allocation and for the efficient optimization of interleavers. Both the optimal random interleaver and a good construction of the deterministic interleaver are numerically shown to have performance very close to its binary-input capacity with a small number of levels.