Emerging 3D die-stacked DRAMs are a promising solution to the ever-increasing demands that computer systems place on memory throughput, power efficiency, capacity, and cost. This dissertation seeks to model and design high-performance, power-efficient 3D DRAM memory systems, from the micro-architecture level to the system level.

This dissertation introduces CACTI-3DD, the first architecture-level integrated power, area, and timing modeling framework for 3D die-stacked DRAM main memory. CACTI-3DD incorporates TSV models, improves on the 2D off-chip DRAM main memory models in current versions of CACTI, and includes 3D integration models that enable the analysis of a full spectrum of 3D DRAM designs at various stacking granularities. CACTI-3DD enables an in-depth study of architecture-level power, area, and timing tradeoffs for 3D die-stacked DRAM designs.

By extending CACTI-3DD, the dissertation builds a comprehensive modeling framework that accurately models the cost, bandwidth, and energy efficiency of 3D-stacked DRAM. With this framework, an extensive design space exploration of 3D-stacked DRAMs is performed across a wide range of stack sizes, partition granularities, bank counts, and IO widths. The results identify the 3D DRAM stack designs with the highest bandwidth and energy efficiency and those with the lowest cost, targeting the high-performance and low-cost memory markets, respectively.

In addition, this dissertation introduces the Hybrid Memory Cube Cache (HMC$), a DRAM design that employs an adaptive caching scheme and an intelligent predictor to support full associativity at various granularities and a large number of banks in 3D memory systems. By increasing the row buffer cache hit rate and improving data caching efficiency, HMC$ dramatically reduces memory access latency and dynamic power.
Furthermore, in a heterogeneous 3D memory system with non-uniform memory access latencies, HMC$ effectively hides the long latencies of far memories and maintains high performance at low cost. On memory-intensive workloads, HMC$ outperforms the state-of-the-art row buffer cache for 3D DRAM by 33.5% in a system with a single memory stack, and improves the energy-delay product (EDP) by 2.4x in a system with both on-socket and off-socket stacked memories.