Lightweight Processing (LWP) addresses the memory wall problem by tolerating memory latency: it provides fast access to multiple lightweight threads of execution. In addition, it associates a wide range of extended memory states (EMS) with each memory word for fast producer-consumer synchronization. In this work, we develop a cycle-accurate version of an existing behavioral simulator and model novel aspects of the LWP architecture. Using this simulation tool, we present a bottom-up approach for evaluating the architecture on a set of micro-benchmarks. In particular, we measure the impact of memory latency, degree of banking, outstanding memory references per thread, and network-on-chip (NoC) topology on the execution time of the micro-benchmarks.