Modern computers rely heavily on the von Neumann architecture, in which data must be moved from memory to the processing units for computation; this data movement is commonly referred to as the von Neumann bottleneck. Given the vast amounts of data gathered on the web and the need to process it, the von Neumann bottleneck can significantly limit the speed and efficiency of computation. At the same time, advances in artificial intelligence and machine learning are driving an ever-increasing demand for computation. Processing-in-memory (PIM) is a computing paradigm that offers an attractive solution to the von Neumann bottleneck. With PIM, computation happens within the memory itself, avoiding the expensive data movement associated with the von Neumann architecture. Furthermore, emerging device technologies are among the top contenders for realizing PIM designs because many of them offer non-volatility, high on/off ratios, and compatibility with CMOS fabrication processes. Although PIM designs based on emerging technologies can offer significant improvements over traditional computing fabrics, they often suffer from non-idealities that degrade various figures of merit. These non-idealities may originate at any of the traditional design layers (device, circuit, architecture, algorithm, and application) and affect the behavior of the designed system at other layers. It is therefore important to consider both top-down and bottom-up perspectives, not only to minimize the adverse effects of non-idealities but also to maximize the gains from hardware-software co-design. As such, in this dissertation, I adopt a cross-layer design perspective to propose and evaluate PIM solutions for machine learning applications. Specifically, I propose PIM solutions for neural network training and inference, nearest neighbor classification, few-shot learning, and hyperdimensional computing. I ensure that my designs not only improve hardware metrics such as energy and latency but also achieve software-equivalent accuracy.