Tri was a PhD candidate at Princeton University, where he was advised by Professor David Wentzlaff in Computer Engineering as part of the Princeton Parallel Computing research group. He is now a postdoc in the Lee Lab at Harvard Medical School, where he works on understanding brain circuit motifs.

Besides studying architecture, Tri dedicates whatever is left of his time to guitar, video/board gaming, running, and, last but not least, learning cool stuff on YouTube that he wishes he had studied instead.

Resume (01/2019)


September 2018: Joined the Lee Lab at HMS as a postdoctoral fellow. Hopefully my architectural simulation skills will somehow be of use in the workflow of automated reconstruction of neurons :)

September 2018: I defended my thesis!!!


July 2018: Two papers--CABLE and PiCL--got accepted to MICRO'18! Japan, I'm coming!!!

April 2018: I have accepted a postdoc appointment at Harvard Medical School to work in the Lee Lab. I'm excited to learn how neurons work and to help create a revolution in neural network machine learning!

Research at a Glance

For his PhD thesis, Tri's primary interest is in understanding the bandwidth wall: its causes, manifestations, and solutions. At the risk of oversimplification, the bandwidth wall refers to the widening gap between computation performance (core count or FLOP/s) and memory performance (number of memory channels, DRAM latency). Limited bandwidth is already a problem for today's throughput computing systems, and for future computing systems such as data centers and supercomputers, memory will truly become a first-class optimization problem. In his research, Tri takes the viewpoint of systems 10+ years in the future, where the throughput of commercial manycore servers is more valuable than the single-threaded performance of today's consumer desktops.
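To make the gap concrete, here is a minimal back-of-the-envelope sketch. The chip generations and the per-channel bandwidth are invented for illustration (they are not measurements of any real system), and the helper names are hypothetical. It only shows that when core count grows faster than channel count, the bandwidth share of each core shrinks.

```python
def bandwidth_per_core(cores: int, channels: int, channel_gbs: float) -> float:
    """Off-chip bandwidth (GB/s) each core gets if all channels are shared evenly."""
    if cores <= 0 or channels <= 0:
        raise ValueError("cores and channels must be positive")
    return channels * channel_gbs / cores


# Hypothetical chip generations as (core count, memory channels).
# Cores grow 4x per step while channels only grow 2x.
GENERATIONS = ((8, 2), (32, 4), (128, 8))
CHANNEL_GBS = 20.0  # assumed peak bandwidth of a single memory channel


def per_core_trend(generations=GENERATIONS, channel_gbs=CHANNEL_GBS):
    """Per-core bandwidth share for each hypothetical generation."""
    return [bandwidth_per_core(c, ch, channel_gbs) for c, ch in generations]


if __name__ == "__main__":
    for (cores, channels), share in zip(GENERATIONS, per_core_trend()):
        print(f"{cores:4d} cores, {channels} channels: {share:.2f} GB/s per core")
```

Under these assumed numbers, each step halves the bandwidth available per core even though total bandwidth doubles.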

Selected Publications

MORC: Manycore Cache Compression (MICRO'15) pdf, slides

One approach to overcoming limited off-chip bandwidth is to keep data movement on-chip as much as possible. Cache compression is a promising technique to increase effective cache capacity, improve cache hit rate, and decrease off-chip accesses. Much like file compression for email, cache compression compacts the data residing in caches in order to store more cache lines. Unfortunately, cache compression is notoriously hard to implement and is plagued by internal fragmentation, external fragmentation, data store expansion, and, last but not least, low-performance compression algorithms.

To solve these challenges in one fell swoop, MORC utilizes a novel log-based cache organization that compresses a log composed of multiple cache lines together, gzip-style. This approach trades off a slight increase in access latency for vastly improved compression ratios, higher throughput, and lower energy consumption for future manycore architectures. MORC was published in MICRO'15 in Waikiki.
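The intuition behind compressing several cache lines together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zlib stands in for the compressor, the 64-byte line size and the synthetic data (a shared 56-byte pattern plus a per-line counter) are made up, and none of this reflects MORC's actual hardware algorithm or cache organization.

```python
import zlib

LINE_SIZE = 64  # assumed cache line size in bytes


def make_lines(count: int) -> list:
    """Build synthetic cache lines that share most of their content.

    Each line is a fixed 56-byte pattern with no internal repetition,
    followed by an 8-byte little-endian counter that differs per line.
    """
    shared = bytes((37 * k + 11) % 256 for k in range(LINE_SIZE - 8))
    return [shared + i.to_bytes(8, "little") for i in range(count)]


def per_line_compressed_size(lines) -> int:
    """Total size when every line is compressed on its own."""
    return sum(len(zlib.compress(line)) for line in lines)


def log_compressed_size(lines) -> int:
    """Size when the lines are appended into one log and compressed together."""
    return len(zlib.compress(b"".join(lines)))


if __name__ == "__main__":
    lines = make_lines(16)
    raw = LINE_SIZE * len(lines)
    print("raw bytes:          ", raw)
    print("compressed per line:", per_line_compressed_size(lines))
    print("compressed as a log:", log_compressed_size(lines))
```

Compressed one at a time, each line has nothing to refer back to, so the redundancy across lines is wasted. Compressed as one log, later lines can be encoded as references to earlier ones. The flip side, as noted above, is access latency: reading one line from a log means decompressing the data ahead of it.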

MORC Figure

Piton (website)

Piton (not pronounced `python`, contrary to popular belief) is a manycore prototype designed in-house at Princeton and taped out at the IBM fab (now GlobalFoundries) at 32nm. The computational core is based on the OpenSPARC T1, and it has all the traditional features you have ever wanted in a manycore prototype: tile-based, distributed shared caches, directory-based shared memory, 3 NoCs, seamless multi-chip...

Piton chip


Harvard Medical School, Boston, MA: Postdoc Fellow in Neurobiology
HHMI Janelia Research Campus, Ashburn, VA: Visiting Researcher
Princeton University, Princeton, NJ: Research Assistant
NVIDIA Research, Redmond, WA: Research Intern
AMD Research, Boxborough, MA: Research Intern
Giheung, South Korea


Email: ${firstname}_${lastname} at hms dot harvard dot edu