This paper presents the architecture and implementation of a low-latency, high-throughput message-passing tool, the NYNET Communication System (NCS), which can support a variety of HPDC applications with different Quality of Service (QoS) requirements. NYNET is an ATM wide-area network testbed in New York State. NCS uses multithreading to overlap computation with communication. It uses read/write trap routines to bypass traditional operating system calls, which reduces latency and avoids inefficient communication protocols. By separating the data and control paths, NCS eliminates unnecessary control transfers, which optimizes the data path and improves performance. Benchmarking results show that NCS performs at least a factor of two better than the corresponding p4 and PVM primitives.
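
The idea of overlapping computation with communication through multithreading can be illustrated with a minimal sketch. It is not the NCS implementation: it uses standard Python threads and a queue, and the names `run_overlapped`, `compute`, and `send` are hypothetical, chosen only for illustration. The trap routines and the ATM transport are not modeled. A dedicated sender thread drains an outbox while the calling thread keeps computing, so the caller never blocks on a send.

```python
import queue
import threading

# Unique marker telling the sender thread that no more messages will arrive.
_DONE = object()


def run_overlapped(chunks, compute, send):
    """Compute each chunk in the calling thread while a helper thread sends.

    `compute` and `send` are caller-supplied callables (hypothetical names);
    `send` stands in for whatever transmits a message over the network.
    """
    outbox = queue.Queue()

    def sender():
        # Communication thread: forward results in the order they were queued.
        while True:
            item = outbox.get()
            if item is _DONE:
                break
            send(item)

    worker = threading.Thread(target=sender)
    worker.start()

    # Computation thread (the caller): hand each result to the sender and
    # move on to the next chunk without waiting for the send to finish.
    for chunk in chunks:
        outbox.put(compute(chunk))

    outbox.put(_DONE)
    worker.join()  # Wait until every queued message has been sent.
```

For example, calling `run_overlapped([1, 2, 3], lambda x: x * x, sent.append)` with an empty list `sent` leaves the squared values in `sent` in their original order, because a single sender thread consumes a FIFO queue.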