We describe the VLSI implementation of PIM Lite, a prototype of the first multithreaded processing-in-memory (PIM) chip. We give details of the RTL VHDL model, the floorplan of the chip, the design of its clock and power distribution networks, and the deep-submicron design rule checks (DRC) we employed, such as antenna rule checking and minimum density rule checking. We also describe the techniques we used to verify both the RTL VHDL model and the chip layout. We present the area, timing and power results obtained from the chip layout, analyze them, and discuss what they imply. Finally, we present lessons about VLSI design in the deep-submicron era that we learned from designing this chip in a deep-submicron process.