Unstructured Data Processor (Recoding Engine) and 10x10 (Systematic Heterogeneity)

Computing the Representation to Optimize Data Movement and Storage

It is widely recognized that data movement (from memory, from SSD, within a parallel machine, even in the wires across a chip) is the critical cost and performance limiter in computer systems. We are building architectures that enable efficient, rapid transformation of information encodings to reduce data size and computation cost. This line of work includes the Unified Automata Processor (UAP), the Unstructured Data Processor (UDP), and now the Recoding Engine. Initial designs and studies show that benefits of 4x to 1000x can be achieved in specific cases. Critical challenges include how to expose these new ideas to software (e.g., through transformer libraries, or by viewing C++ arrays as abstract data types backed by a different concrete type implementation), as well as a variety of functional and implementation architecture issues.

These efforts came out of the 10x10 project, which pursued a principled, systematic approach to heterogeneity in computer architecture. A 10x10 architecture uses deep workload analysis to drive co-design of a federated heterogeneous architecture: it exploits customization for energy efficiency, but federates a set of customized engines to achieve general-purpose coverage. The 10x10 project built 7 accelerators and federated them in a study that assessed the overall benefit. The most interesting accelerators were all data-oriented. The three data-oriented accelerators (generalized pattern matching, small sort, and gather-scatter) were merged into a new architecture called the Unstructured Data Processor (UDP), sometimes also called the Unified Automata Processor (UAP).
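The software-exposure challenge mentioned above, viewing a C++ array as an abstract data type whose concrete representation differs from a plain array, can be illustrated with a small sketch. It assumes a hypothetical interface (`UIntArray`) with two implementations: a conventional one that stores a full `uint32_t` per element, and a recoded one that bit-packs each element into a fixed number of bits. The class names, the bit-packing encoding, and the fixed bit width are illustrative assumptions, not the Recoding Engine's actual interface or encodings; the sketch only shows the software-visible abstraction, while the architectures described here target making such encoding transformations fast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical abstract array type: callers see an array of unsigned
// 32-bit integers, independent of how the elements are actually stored.
class UIntArray {
public:
    virtual ~UIntArray() = default;
    virtual std::size_t size() const = 0;
    virtual std::uint32_t get(std::size_t i) const = 0;
    virtual void set(std::size_t i, std::uint32_t v) = 0;
    virtual std::size_t storage_bytes() const = 0;
};

// Conventional concrete type: one full uint32_t per element.
class PlainArray : public UIntArray {
public:
    explicit PlainArray(std::size_t n) : data_(n, 0) {}
    std::size_t size() const override { return data_.size(); }
    std::uint32_t get(std::size_t i) const override { return data_.at(i); }
    void set(std::size_t i, std::uint32_t v) override { data_.at(i) = v; }
    std::size_t storage_bytes() const override {
        return data_.size() * sizeof(std::uint32_t);
    }
private:
    std::vector<std::uint32_t> data_;
};

// Recoded concrete type (illustrative encoding, an assumption): each element
// is packed into `bits` bits (1..32) inside a vector of 64-bit words.
// Values wider than `bits` are truncated, so the caller must pick a wide
// enough width for its data.
class BitPackedArray : public UIntArray {
public:
    BitPackedArray(std::size_t n, unsigned bits)
        : n_(n), bits_(bits), words_((n * bits + 63) / 64, 0) {
        assert(bits >= 1 && bits <= 32);
    }
    std::size_t size() const override { return n_; }
    std::uint32_t get(std::size_t i) const override {
        assert(i < n_);
        const std::size_t bit = i * bits_;
        const std::size_t w = bit / 64;
        const unsigned off = static_cast<unsigned>(bit % 64);
        std::uint64_t v = words_[w] >> off;
        if (off + bits_ > 64) v |= words_[w + 1] << (64 - off);  // element straddles two words
        return static_cast<std::uint32_t>(v & mask());
    }
    void set(std::size_t i, std::uint32_t v) override {
        assert(i < n_);
        const std::uint64_t x = v & mask();
        const std::size_t bit = i * bits_;
        const std::size_t w = bit / 64;
        const unsigned off = static_cast<unsigned>(bit % 64);
        // Clear the element's low-word bits, then write the new value there.
        words_[w] = (words_[w] & ~(mask() << off)) | (x << off);
        if (off + bits_ > 64) {  // write the remaining high bits into the next word
            const unsigned spill = off + bits_ - 64;
            const std::uint64_t hi = (std::uint64_t{1} << spill) - 1;
            words_[w + 1] = (words_[w + 1] & ~hi) | (x >> (64 - off));
        }
    }
    std::size_t storage_bytes() const override {
        return words_.size() * sizeof(std::uint64_t);
    }
private:
    std::uint64_t mask() const { return (std::uint64_t{1} << bits_) - 1; }
    std::size_t n_;
    unsigned bits_;
    std::vector<std::uint64_t> words_;
};

// Generic code written once against the abstract type; it works unchanged
// with either concrete representation.
std::uint64_t sum_all(const UIntArray& a) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a.get(i);
    return s;
}
```

For example, filling both a 1000-element `PlainArray` and a 1000-element `BitPackedArray` with 4 bits per element with the values `i % 16` gives the same `sum_all` result (7468), while the packed form occupies 504 bytes instead of 4000, roughly an 8x reduction. Code written against `UIntArray` does not change when the concrete representation does; the price in this software-only sketch is extra shift-and-mask work on every access, which is the kind of encoding-transformation work the architectures above are meant to make cheap.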

9/20/2018 News: We are part of the new IRIS-HEP NSF grant, and we will explore acceleration of data science and high-energy physics with UDP/Recode ideas.

  1. Yuanwei Fang, Chen Zou, Aaron Elmore, and Andrew A. Chien. UDP: A Programmable Accelerator for Extract-Transform-Load Workloads and More, in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50), October 2017, Boston, Massachusetts.
  2. Yuanwei Fang, Andrew A. Chien, Andrew Lehane, and Lee Barford. Performance of Parallel Prefix Circuit Transition Localization of Pulsed Waveforms, IEEE International Instrumentation and Measurement Technology Conference, May 23-26, 2016, Taipei, Taiwan.
  3. Tung Hoang, Amirali Shambayati, and Andrew A. Chien, A Data Layout Transformation (DLT) Accelerator: Architectural Support for Data Movement Optimization in Accelerated Systems, Design, Automation and Test in Europe (DATE), 14-18 March 2016, Dresden, Germany, also Department of Computer Science Technical Report, University of Chicago, March 2015.
  4. Yuanwei Fang, Tung Hoang, Michela Becchi, and Andrew A. Chien. Fast Support for Unstructured Data Processing: The Unified Automata Processor, in Proceedings of IEEE Conference on Micro-architecture (MICRO-48), December 2015, Honolulu, Hawaii.
  5. Yuanwei Fang, EFFCLIP + UAP: Unified, Efficient Representation and Architecture for Automata Processing, Masters Thesis, November 2015.
  6. Tung Thanh Hoang, Amirali Shambayati, Henry Hoffmann, and Andrew A. Chien, Does arithmetic logic dominate data movement? A systematic comparison of energy-efficiency for FFT accelerators, in Proceedings of the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2015), Pages 66-67, Toronto, Ontario, August 2015. Also Department of Computer Science Technical Report, University of Chicago, January 2015.
  7. A. Chien, D. Vasudevan, T. Hoang, Y. Fang, and A. Shambayati, 10x10: A Case Study for Federated Heterogeneous Computing, Computer Architecture News, Volume 43 Issue 3, May 2015, Pages 2-9. Also available as UChicago Computer Science Technical Report 2015-08.
  8. Yuanwei Fang, Andrew Lehane, and Andrew A. Chien. EffCLiP: Efficient Coupled-Linear Packing, Dept of Computer Science Technical Report 2015-5, January 2015.
  9. Dilip Vasudevan and Andrew A. Chien, BNB: Bit-Nibble-Byte Microengine For Accelerating Low-Level Bit Operations, in Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI), Pittsburgh, PA, May 2015.
  10. Yuanwei Fang, Tung Hoang, Michela Becchi, and Andrew A. Chien. The Unified Automata Processor, November 2014.
  11. Tung Hoang, Calvin Deutschbein, Hank Hoffmann, and Andrew A. Chien. Performance and Energy Limits of a Processor Integrated FFT Accelerator, in High-performance Extreme Computing (HPEC-2014), September 2014, Waltham, Massachusetts.
  12. Yuanwei Fang, Raihan Rasool, Dilip Vasudevan, and Andrew A. Chien, Generalized Pattern Matching Micro-engine, in 4th Workshop on Architectures and Systems for Big Data (ASBD) held with the International Symposium on Computer Architecture (ISCA), June 2014, Minneapolis, Minnesota.
  13. Amirali Shambayati, Data Layout Transformation Micro-engine: A Specialized Architecture to Manage Data Movements for Performance and Energy Efficiency, Masters Thesis, March 2014.
  14. Andrew A. Chien and Vijay Karamcheti, Moore’s Law: The First Ending and A New Beginning, IEEE Computer Magazine, December 2013.
  15. P. Cicotti, L. Carrington, and Andrew A. Chien. Towards Application-specific Memory Reconfiguration for Energy Efficiency, in Proceedings of the First Workshop on Energy Efficient Supercomputing, November 2013, at the ACM/IEEE Conference on Supercomputing.
  16. Apala Guha, Yao Zhang, Raihan ur Rasool, and Andrew A. Chien. Calibrating the Relationship between Hardware Customization and Energy Efficiency. University of Chicago, Department of Computer Science Technical Report 2013-04, July 2013.
  17. Cicotti, Carrington, and Chien, Customizing Caches for Energy Efficiency: A Workload Driven Approach, University of Chicago CS-TR-2013-06, available from https://www.cs.uchicago.edu/research/publications/techreports/TR-2013-06.
  18. Apala Guha, Yao Zhang, Raihan ur Rasool, and Andrew A. Chien. 2013. Systematic evaluation of workload clustering for extremely energy-efficient architectures. SIGARCH Comput. Archit. News 41, 2 (May 2013), 22-29.
  19. Yao Zhang, Mark Sinclair II, and Andrew A. Chien, Improving Performance Portability in OpenCL Programs, in the IEEE International Supercomputing Conference (ISC), June 16-20, 2013, Leipzig, Germany.
  20. Prasanna Balaprakash, Darius Buntinas, Anthony Chan, Apala Guha, Rinku Gupta, Sri Hari Krishna Narayanan, Andrew Chien, Paul Hovland, Boyana Norris, Exascale Workload Characterization and Architecture Implications, 21st High Performance Computing Symposium, at 2013 SCS Spring Simulation Multi-conference (Springsim '13), April 7-10, 2013, San Diego, CA. (Best Paper Award Winner!)
  21. Andrew A. Chien and Vijay Karamcheti, Moore’s Law: The First Ending and A New Beginning, IEEE Computer Magazine, 2013. Also available as UChicago CS TR 2012-06.
  22. Rinku Gupta, Prasanna Balaprakash, Darius Buntinas, Anthony Chan, Apala Guha, Sri Hari Krishna Narayanan, Andrew Chien, Paul Hovland, Boyana Norris, Exascale Workload Characterization and Architecture Implications, 2013 IEEE International Symposium on Performance Analysis of Systems and Software, April 2013, Poster.
  23. Rinku Gupta, Prasanna Balaprakash, Darius Buntinas, Anthony Chan, Apala Guha, Sri Hari Krishna Narayanan, Andrew Chien, Paul Hovland, Boyana Norris, An Exascale Workload Study, ACM/IEEE Conference on Supercomputing, November 2012, Poster.
  24. Apala Guha and Andrew A. Chien, Systematic Evaluation of Workload Clustering for Designing Heterogeneous, General-purpose Architectures, June 2012, available as UChicago CS TR 2012-05.
  25. Apala Guha and Andrew A. Chien, The 10x10 Foundation for Heterogeneity, January 2012, available as UChicago CS TR 2012-01.
  26. Shekhar Borkar and Andrew A. Chien, The Future of Microprocessors, Communications of the Association for Computing Machinery (CACM), May 2011.
  27. Mark Gahagan, Allan Snavely, and Andrew A. Chien, 10x10: A General-purpose Architectural Approach to Heterogeneity and Energy-efficiency, International Conference on Computational Science (ICCS 2011), Singapore, June 2011.
  28. Andrew A. Chien, 10x10 must replace 90/10: the Future of Computer Architecture, Salishan Conference on High Performance Computing, May 2010.

People: Yuanwei (Kevin) Fang, Lang Yu, Chen Zou, Dilip Vasudevan, Amirali Shambayati, Tung Hoang, Soyoung Eom, Willem Longendyke, Calvin Deutschbein, Hank Hoffmann, Andrew A. Chien (UChicago), Andrew Lehane (Keysight), Lee Barford (Keysight)

Previous Members: Pietro Cicotti, Laura Carrington (UCSD/SDSC), Wen-mei Hwu (Illinois), Thomas Jablin, Heeseok Kim, Izzat El Hajj, Raihan ur Rasool, Lei Zhang, Tong Hu

Collaborators: Apala Guha (IIIT-Delhi), Vivek De, Ram Krishnamurthy (Intel)

We gratefully acknowledge support for the 10x10 project from the National Science Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA), and Keysight Corporation (formerly Agilent).