With the expansion from multicore to "manycore" designs, with hundreds of cores per chip, many HPC systems are beginning to include additional layers of memory between main memory and the top of the cache hierarchy. These hardware changes force us to reconsider how to design multithreaded codes so that they gain the most benefit from a complex memory system. This thesis investigates several algorithmic strategies and applies them to applications such as sort, matrix multiply, and the Fast Fourier Transform (FFT). The results indicate that the placement of data onto physical memory channels can have a significant impact on performance. The key strategy developed in this work uses a library that lets the user explicitly manage the relationship between memory channels, directories, and cores. On an A64FX node, this library improves the performance of memory-sensitive codes by up to 1.1x.