Chip multiprocessors are one of several emerging architectures that address the growing processor-memory performance gap. At the same time, advances in chip manufacturing enable the integration of processing logic and dense DRAM on the same die. This thesis analyzes the use of such merged DRAM as a shared cache for a large-scale chip multiprocessor. Simulation results reveal that maximizing concurrency in the cache is of paramount importance, greater even than cache hit rate. Concurrency is achieved through the additional ports provided by a multi-banked cache and through multiple paths to main memory. Results demonstrate that maximizing the number of cache banks is the most important design goal, followed by providing adequate associativity to minimize miss rates. Furthermore, the optimal cache block size is heavily dependent on the workload. The off-chip memory organization affects performance to a lesser degree; providing multiple paths to memory outperforms a wider memory bus.