To support large shared memory applications, mNUMA (migrational NonUniform Memory Access) architectures combine scalable system design with hardware supported multithreading and thread migration, which replace or augment traditional coherence schemes while providing fine grained resource sharing to tolerate long memory access latencies. This research provides architectural improvements to memory mechanics used by existing lightweight multithreaded architecture designs. This work also presents the Mobile Subjective (MoS) programming model, which was designed alongside mNUMA and provides a novel low-level abstraction for partitioned global address space (PGAS) multithreaded applications. Using the MoS programming model, this research provides a unique approach to parallel graph algorithms specifically tailored to the mNUMA execution model. Active edge graph communication primitives provide building blocks to compose graph algorithms that avoid data hotspots found in other theoretical graph algorithm designs. In combination, these program and system design techniques are used to show efficient performance scaling of concurrent graph algorithms to over 64000 processing cores running over 1 million threads in simulation.