Supercomputer performance has grown enormously over the last 30 years. The Cray-1 achieved 133 MegaFLOPS in 1976, and the Earth Simulator achieved 35.860 TeraFLOPS in 2002. That is an increase of 269,624 times in 26 years. Sustaining this growth will require new advances in semiconductor technology, processor architecture, operating systems, and programming models. Processing-In-Memory (PIM) is one approach toward that goal. PIM supports fine-grained multithreading and message passing, and it takes advantage of computation at the memory sense amplifiers. Together, these features lay a framework for future research into a scalable supercomputer with a peak performance of several PetaFLOPS. This thesis presents one possible implementation of a PIM device. It includes a history of dataflow and multithreaded machines and how that work applies to the PIMLite processor, a description of the PIMLite Instruction Set Architecture (ISA), and state diagrams and VHDL source code.
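The Cray-1 and Earth Simulator figures imply a compound growth rate. The following worked arithmetic shows the speedup and the annual factor. It assumes, as a simplification, that growth was steady over the 26 years. In reality, supercomputer performance rose in discrete jumps as new machines appeared, so r and the doubling time are illustrative averages only.

```latex
\begin{aligned}
S &= \frac{35.860 \times 10^{12}\ \text{FLOPS}}{133 \times 10^{6}\ \text{FLOPS}} \approx 2.696 \times 10^{5} \\
r &= S^{1/26} = e^{(\ln S)/26} \approx e^{0.481} \approx 1.62 \quad \text{(average growth factor per year, assuming steady growth)} \\
t_{2\times} &= \frac{\ln 2}{\ln r} \approx \frac{0.693}{0.481} \approx 1.44\ \text{years} \quad \text{(implied doubling time)}
\end{aligned}
```

Under this assumption, performance grew by roughly 62% per year, doubling about every 17 months.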