G. Călin Caşcaval

Office address: Google LLC
10 Madden Street, CBD, Auckland, New Zealand, 1010
Work email: cascaval@google.com
Personal email: cascaval@acm.org
LinkedIn: https://www.linkedin.com/in/cascaval

Dr. Călin Caşcaval is Director of Engineering at Google Research, leading research in scalable distributed systems.
He is an experienced technical executive with strategic vision and a proven record of team building, research, and product delivery in computer systems. He has identified industry trends and defined, built, and delivered first-of-a-kind prototypes and products, including: the first P4 production compiler and networking stack (at Barefoot Networks); the first mobile heterogeneous computing runtime and parallel browser (at Qualcomm Research); and system software for the Blue Gene family of supercomputers and the first UPC compiler to scale to hundreds of thousands of processors (at IBM Research). Călin has over 50 peer-reviewed publications and 38 issued patents. He is an IEEE Fellow and an ACM Senior Member.

Professional Experience

Director of Engineering, Google Research February 2020 - Present
Google LLC, Auckland, New Zealand
Leading research in scalable distributed systems.

Sr. Director, Compilers and Tools September 2016 - February 2020
Barefoot Networks, an Intel Company, Palo Alto, CA.
Led the development and productization of languages, compilers, and tools for programmable network devices.
  • Led the development and productization of the first domain-specific networking stack using the P4 language on the Tofino family of programmable packet processing ASICs. Managed the compilers and tools teams.
  • Led the development of the P4 language: worked with the p4.org community to evolve the P4 language to support customer-specific requirements; acted as co-chair of the P4 Architecture Working Group, responsible for defining the Portable Switch Architecture.
  • Defined the strategy around P4 tools and designed, developed and productized the P4 Insight visualization tool.
  • Worked with internal Barefoot Networks teams to define the architecture of several generations of Tofino.

Sr. Director of Engineering (2013-2016), Director (2009-2013) October 2009 - September 2016
Qualcomm Research Silicon Valley, Santa Clara, CA.
Led Qualcomm's Power Aware Computing strategy.

Manager, Programming Models and Tools for Scalable Systems Group September 2004 - October 2009
IBM T.J. Watson Research Center, Yorktown Heights, NY.
Led research projects with globally distributed teams. Developed and contributed to the IBM Research strategy in compilers and systems software.
Led the PERCS (DARPA HPCS) Compilers team. As part of this effort we explored and developed a number of technologies to improve programmer productivity:
  • Continuous Program Optimization (CPO): combines static and dynamic compilation and allows statically compiled languages to be executed using a managed runtime, in order to continuously monitor and adapt to changes in the execution environment. We developed the CPO vision and implemented several optimization prototypes that take system-wide monitoring data as input and optimize an application across runs. Papers published in PACT 2005, IBM System Journal, and PAC2 2004.
  • The xlUPC compiler: led the design and development of a UPC compiler for IBM platforms, including IBM pSeries and Blue Gene. xlUPC is the first compiler and runtime system to scale PGAS programs to hundreds of thousands of threads. The compiler was also used in the HPC Challenge Productivity Competition and won the award every year we participated (2005, 2006, 2008, 2009).
  • Initiated and supervised the development of math libraries for high performance linear algebra.
  • Drove the design and development of tracing libraries for monitoring the full execution stack. Contributed to the design of the performance counters in the IBM pSeries processors, the OS tracing infrastructure, and application monitoring.

Led exploratory projects in parallel programming models and parallel languages to improve programmer productivity:
  • Asynchronous execution to improve parallel execution: load balancing through work stealing, performance portability, acceleration support
  • Transactional Memory and Thread Level Speculation: using speculative execution to improve programmability and efficiency

Mentoring IBM employees, PhD students and student interns.
Extensive collaborations with academia.

Research Staff Member July 2000 - September 2004
IBM T.J. Watson Research Center, Yorktown Heights, NY.
Participated in the design and development of system software and performance analysis tools for massively parallel systems: Blue Gene and PERCS.
  • PERCS (DARPA HPCS): Participated in the initial phases of the project, first as a member of the compiler team and, starting in 2004, as its lead. Established the research agenda and demonstrated the initial feasibility of Continuous Program Optimization, the UPC compiler, the Linear Algebra Compiler, and other technologies.
  • Blue Gene. Designed and developed a highly multithreaded simulator for the Blue Gene/C chip. Participated in the design of, and developed an initial prototype for, the job launching system on Blue Gene/L. Designed and developed an initial prototype of the xlUPC compiler for the Blue Gene/L architecture. Designed and evaluated Thread Level Speculation support for Blue Gene/Q. Evaluated applications across multiple Blue Gene families.
  • Performance monitoring tools. Designed and developed internal tools for performance analysis using the hardware performance monitoring counters and provided feedback to the Power architecture performance team. Developed a methodology for performance monitoring and adaptation based on histories (see PACT 2003 paper).

Graduate Research Assistant August 1996 - June 2000
Computer Science Department, Univ. of Illinois at Urbana-Champaign, Urbana, IL.
Conducted research in the Polaris and Delphi projects, working on compile-time performance prediction, data locality, and parallel programming models.
  • Extended the Polaris parallelizing compiler with code generation passes for TreadMarks and Fast Messages. Maintained the Polaris development environment.
  • Developed MATmarks, an environment for parallel programming with MATLAB based on TreadMarks. Obtained linear speedups for several applications on a network of workstations.
  • Characterized data locality and false sharing in applications running on multiprocessors. Metrics derived from this study are intended to drive both compiler optimizations for locality and program performance prediction for new architectures.
  • Developed a compile-time model for cache memory hierarchies that is able to predict program memory behavior within 5% of the actual cache behavior.

Research Associate May 1995 - July 1996
CyberMarche, Inc., Morgantown WV
Responsible for the design, implementation, and testing of systems targeted towards knowledge gathering and sharing for project management.
  • Enterprise Engineering Knowledge Base (EEKB), a project targeted towards knowledge gathering and sharing in engineering environments.
  • Project Assessment and Coordination for Teams (PACT), a Project Management System (PMS) to be used by Hitachi, Ltd.
Additional responsibilities included system administration for a computer network of Sun and IBM PC computers.

Graduate Research Assistant August 1993 - May 1995
Concurrent Engineering Research Center-WVU, Morgantown, WV
Conducted research in communication scheduling algorithms and software verification.
  • Designed and implemented a Software Project Model that provides a customizable view of a project for each team involved in software development and verification. The work was part of the Independent Verification and Validation of Software (IV&V) Project, a NASA-sponsored effort to develop a collaborative environment for teams of engineers involved in software production.
  • Developed an object-oriented gateway to access Oracle services from CORBA-compliant clients as part of the Information Sharing System (ISS) Project, a CORBA-based system for transparent access to information stored in heterogeneous repositories. Also as part of this project, implemented a Scheme interface to Orbix's Dynamic Invocation Interface. The work involved programming in C++, Scheme, Oracle, Orbix, and Web technologies on Unix platforms.
  • Master's thesis research involved the design and implementation of several communication scheduling algorithms.

Research Associate June 1991 - August 1993
IPA (Institute for Design in Automation), Cluj-Napoca, Romania
Designed and developed production software.
  • Designed and developed an object-oriented Digital Components Simulator for use in a digital board testing system sold in France.
  • Devised and implemented a Cartographic System for Survey Measurements.

Education

PhD in Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, June 2000
Thesis Title: Compile-time Performance Prediction of Scientific Programs. Available as UIUC Technical Report UIUCDCS-R-2000-2167.
MS in Computer Science, West Virginia University, Morgantown, WV, May 1995
Thesis Title: Optimizing Communication in Parallel Compilers
MS in Computer Engineering, Technical University Cluj-Napoca, Romania, June 1991
Thesis Title: Sistem pentru Analiza si Recunoasterea Semnalului Vocal (Speech Analysis and Recognition System)

Awards

Keynote Presentations

  1. Qualcomm Symphony: Orchestrating Heterogeneity for Power Aware Computing, Workshop on Architectures and Systems for Real-time Mobile Vision Applications (ASR-MOV) - In conjunction with CGO 2016
  2. Are scripting languages ready for mobile computing?, The 2014 International Symposium on Code Generation and Optimization, CGO 2014
  3. Parallel Programming for Mobile Computing, The 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013
  4. Programming for Mobile Gadgets, The First International Workshop on Parallelism in Mobile Platforms, PRISM-1 - In conjunction with HPCA 2013
  5. Power Programming, Programming Models for Emerging Architectures (PMEA) - In conjunction with PACT 2010

Publications

Google Scholar citations: 4750 (as of July 20, 2021).

Five most relevant recent publications:
  1. Deoptimization for dynamic language JITs on typed, stack-based virtual machines
    Madhukar N. Kedlaya, Behnam Robatmili, Calin Cascaval, Ben Hardekopf
    Virtual Execution Environments (VEE 2014), Mar 2014. Best Paper Award.

  2. Zoomm: A Parallel Web Browser Engine for Multicore Mobile Devices
    Calin Cascaval, Seth Fowler, Pablo Montesinos, Wayne Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar
    Proceedings of The 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2013), Shenzhen, China, Feb 2013.

  3. How Much Parallelism is There in Irregular Applications?
    Milind Kulkarni, Martin Burtscher, R. Inkulu, Keshav Pingali, Calin Cascaval
    Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'09, Raleigh, NC, February 2009

  4. Software transactional memory: why is it only a research toy?
    Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, Sid Chatterjee
    Communications of the ACM, Nov 2008

  5. Bulk Disambiguation of Speculative Threads in Multiprocessors
    Luis Ceze, James Tuck, Calin Cascaval, and Josep Torrellas
    Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA 2006, Boston, MA, June 2006

All publications:

  1. p4v: practical verification for programmable data planes
    Jed Liu, William Hallahan, Cole Schlesinger, Milad Sharif, Jeongkeun Lee, Robert Soule, Han Wang, Calin Cascaval, Nick McKeown, Nate Foster
    SIGCOMM 2018, Budapest, Hungary, Aug 2018

  2. Concurrency in Mobile Browser Engines
    Calin Cascaval, Pablo Montesinos-Ortego, Behnam Robatmili, Dario Suarez-Gracia
    IEEE Pervasive Computing, Vol. 14, Issue 3, July-Sept, 2015

  3. MuscalietJS: rethinking layered dynamic web runtimes
    Behnam Robatmili, Calin Cascaval, Mehrdad Reshadi, Madhukar N. Kedlaya, Seth Fowler, Vrajesh Bhavsar, Michael Weber, Ben Hardekopf
    Virtual Execution Environments (VEE 2014), Mar 2014

  4. Deoptimization for dynamic language JITs on typed, stack-based virtual machines
    Madhukar N. Kedlaya, Behnam Robatmili, Calin Cascaval, Ben Hardekopf
    Virtual Execution Environments (VEE 2014), Mar 2014. Best Paper Award.

  5. Zoomm: A Parallel Web Browser Engine for Multicore Mobile Devices
    Calin Cascaval, Seth Fowler, Pablo Montesinos, Wayne Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar
    Proceedings of The 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2013), Shenzhen, China, Feb 2013.

  6. Automatic Discovery of Performance and Energy Pitfalls in HTML and CSS
    Adrian Sampson, Calin Cascaval, Luis Ceze, Pablo Montesinos, and Dario Suarez-Gracia
    2012 IEEE International Symposium on Workload Characterization (IISWC), Nov 2012

  7. Multidimensional dynamic behavior in mobile computing
    Mehrdad Reshadi, Calin Cascaval
    2012 IEEE International Symposium on Workload Characterization, Nov 2012

  8. A Case for Parallelizing Web Pages
    Haohui Mai, Shuo Tang, Samuel T. King, Calin Cascaval, Pablo Montesinos
    4th USENIX Workshop on Hot Topics in Parallelism (HotPar'12), Jun 2012

  9. Heterogeneous Systems Programming
    Calin Cascaval, Pablo Montesinos
    2nd Workshop on SoC Architecture, Accelerators and Workloads (SAW-2), Feb 2011

  10. A Taxonomy of Accelerator Architectures and their Programming Models
    Calin Cascaval, Siddhartha Chatterjee, Hubertus Franke, Kevin Gildea, and Pratap Pattnaik
    IBM Journal of Research and Development, vol 54, issue 5, Sept/Oct 2010

  11. The Bulk Multicore Architecture for Improved Programmability
    Josep Torrellas, Luis Ceze, James Tuck, Calin Cascaval, Pablo Montesinos, Wonsun Ahn, Milos Prvulovic
    Communications of the ACM, Dec 2009

  12. Analytical Modeling of Pipeline Parallelism
    Angeles Navarro, Rafael Asenjo, Siham Tabik and Calin Cascaval
    Proceedings of the IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2009), Raleigh, NC, Sept 2009

  13. Load balancing using work-stealing for pipeline parallelism in emerging applications
    Angeles Navarro, Rafael Asenjo, Siham Tabik, Calin Cascaval
    Proceedings of the 23rd International Conference on Supercomputing, ICS'09, 2009

  14. Porting k-means clustering to accelerators with the APGAS runtime
    David Cunningham, Sayantan Sur, George Almasi, Vijay Saraswat, and Calin Cascaval
    Proceedings of the First Workshop on Asynchrony in the Partitioned Global Address Space Languages, Yorktown Heights, NY, June 2009 (with ICS'09)

  15. Parallelization Spectroscopy: Analysis of Thread-level Parallelism in HPC Programs
    Arun Kejariwal and Calin Cascaval
    Proceedings of the 2nd Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures (PESPMA 2009), Austin, TX, June 2009

  16. Scalable RDMA performance in PGAS languages
    Montse Farreras, George Almasi, Calin Cascaval, and Toni Cortes
    Proceedings of the IEEE International Parallel & Distributed Processing Symposium, Rome, Italy, May 2009

  17. Lonestar: A Suite of Parallel Irregular Programs
    Milind Kulkarni, Martin Burtscher, Calin Cascaval, and Keshav Pingali
    Proceedings of the 2009 IEEE International Symposium on Performance Analysis of Systems and Software, Boston, MA, April 2009

  18. How much parallelism is there in irregular applications?
    Milind Kulkarni, Martin Burtscher, R. Inkulu, Keshav Pingali, Calin Cascaval
    Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'09, Raleigh, NC, February 2009

  19. Parallelization Spectroscopy: Analysis of thread-level parallelism in HPC programs
    Arun Kejariwal, Calin Cascaval
    Poster presentation at the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'09, Raleigh, NC, February 2009

  20. Software transactional memory: why is it only a research toy?
    Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, Sid Chatterjee
    Communications of the ACM, Nov 2008

  21. Compiler and Runtime Techniques for Software Transactional Memory Optimization
    Peng Wu, Maged Michael, Christoph von Praun, Takuya Nakaike, Rajesh Bordawekar, Harold W. Cain, Calin Cascaval, Siddhartha Chatterjee, Stefanie Chiras, Rui Hou, Mark Mergen, Xiaowei Shen, Michael F. Spear, Hua Yong Wang, Kun Wang
    In the Journal of Concurrency and Computation: Practice and Experience

  22. Compiler-driven Program Dependence Profiling to Guide Program Parallelization
    Peng Wu, Arun Kejariwal, and Calin Cascaval
    Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing, LCPC'08, Edmonton, AB, 2008

  23. Modeling optimistic concurrency using quantitative dependence analysis
    Christoph von Praun, Rajesh Bordawekar, and Calin Cascaval
    Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'08, Salt Lake City, UT, February 2008

  24. Concurrency Control with Data Coloring
    Luis Ceze, Christoph von Praun, Calin Cascaval, Pablo Montesinos, and Josep Torrellas
    Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, Seattle, WA, March 2008

  25. Multidimensional blocking in UPC
    Christopher Barton, Calin Cascaval, George Almasi, Rahul Garg, Jose Nelson Amaral, and Montse Farreras
    Proceedings of the 20th International Workshop on Languages and Compilers for Parallel Computing, LCPC'07, Urbana, IL, 2007

  26. Implicit parallelism with ordered transactions
    Christoph von Praun, Luis Ceze, and Calin Cascaval
    Proceedings of the 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'07, San Jose, CA, March 2007

  27. A Characterization of Shared Data Access Patterns in UPC Programs
    Christopher Barton, Calin Cascaval, and Jose Nelson Amaral
    Proceedings of the 19th International Workshop on Languages and Compilers for Parallel Computing, LCPC'06, New Orleans, LA, 2006

  28. Bulk Disambiguation of Speculative Threads in Multiprocessors
    Luis Ceze, James Tuck, Calin Cascaval, and Josep Torrellas
    Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA 2006, Boston, MA, June 2006

  29. Shared Memory Programming for Large Scale Machines
    Christopher Barton, Calin Cascaval, George Almasi, Yili Zheng, Montse Farreras, Siddhartha Chatterjee, Jose Nelson Amaral
    Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, PLDI 2006, Ottawa, Canada, June 2006

  30. Performance and environment monitoring for Continuous Program Optimization
    Calin Cascaval, Evelyn Duesterwald, Peter F. Sweeney, Robert W. Wisniewski
    IBM Journal of Research and Development, Volume 50, Number 2/3, March 2006

  31. Multiple Page Size Modeling and Optimization
    Calin Cascaval, Evelyn Duesterwald, Peter F. Sweeney, and Robert W. Wisniewski
    Proceedings of the Fourteenth International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), Sept. 2005, St. Louis, MO

  32. Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture
    David Rodenas, Xavier Martorell, Eduard Ayguade, Jesus Labarta, George Almasi, Calin Cascaval, Jose Castanos, Jose E. Moreira
    Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), April 2005, Denver, CO.

  33. Performance and Environment Monitoring for Whole-System Characterization and Optimization
    Robert W. Wisniewski, Peter F. Sweeney, Kartik Sudeep, Matthias Hauswirth, Evelyn Duesterwald, Calin Cascaval, and Reza Azimi
    The first Watson Conference on Interaction between Architecture, Circuits, and Compilers, Oct. 2004, Yorktown Heights, NY.

  34. Characterizing and Predicting Program Behavior and its Variability
    Evelyn Duesterwald, Calin Cascaval, and Sandhya Dwarkadas
    Proceedings of the Twelfth International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), Sept. 2003, New Orleans, LA.

  35. An Overview of the Blue Gene/L System Software Organization
    George Almasi, Ralph Bellofatto, Calin Cascaval, Jose G. Castanos, Luis Ceze, Paul Crumley, C. Christopher Erway, Joseph Gagliano, Derek Lieber, Jose E. Moreira, Alda Sanomiya, and Karin Strauss
    Proceedings of the International Conference on Parallel and Distributed Computing (Euro-Par 2003) Aug. 2003, Klagenfurt, Austria.

  36. Estimating Cache Misses and Locality Using Stack Distances
    Calin Cascaval and David A. Padua
    Proceedings of the International Conference on Supercomputing (ICS 2003), June 2003, San Francisco, California, USA.

  37. Evaluation of OpenMP for the Cyclops Multithreaded Architecture
    George Almasi, Eduard Ayguade, Calin Cascaval, Jose Castanos, Jesus Labarta, Francisco Martinez, Xavier Martorell, Jose Moreira
    Proceedings of the Workshop on OpenMP Applications and Tools (WOMPAT 2003), Jun. 2003, Toronto, Canada.

  38. Full Circle: Simulating Linux Clusters on Linux Clusters
    Luis Ceze, Karin Strauss, George Almasi, Patrick Bohrer, Jose R. Brunheroto, Calin Cascaval, Jose G. Castanos, Derek Lieber, Jose E. Moreira, Alda Sanomiya, and Eugen Schenfeld
    Proceedings of the Fourth LCI International Conference on Linux Clusters, Jun. 2003, San Jose, CA.

  39. System Management in the BlueGene/L Supercomputer
    G. Almasi, L. Bachega, R. Bellofatto, J. Brunheroto, C. Cascaval, J. Castanos, P. Crumley, C. Erway, J. Gagliano, D. Lieber, P. Mindlin, J.E. Moreira, R.K. Sahoo, A. Sanomiya, E. Schenfeld, R. Swetz, M. Bae, G. Laib, K. Ranganathan, Y. Aridor, T. Domany, Y. Gal, O. Goldshmidt, E. Shmueli
    Proceedings of the Third Workshop on Massively Parallel Processing (WMPP 2003), Apr. 2003, Nice, France.

  40. An Overview of the BlueGene/L Supercomputer
    N. Adiga et al.
    Proceedings of Supercomputing (SC 2002), Nov. 2002

  41. Dissecting Cyclops: A Detailed Analysis of a Multithreaded Architecture
    George Almasi, Calin Cascaval, Jose G. Castanos, Monty Denneau, Derek Lieber, Jose E. Moreira, and Henry S. Warren, Jr.
    Proceedings of the Workshop on Chip MultiProcessor: Processor Architecture and Memory Hierarchy Related Issues (MEDEA 2002), Sept. 2002, Charlottesville, VA

  42. Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer
    George S. Almasi, Calin Cascaval, Jose G. Castanos, Monty Denneau, Wilm Donath, Maria Eleftheriou, Mark Giampapa, Howard Ho, Derek Lieber, Jose E. Moreira, Dennis Newns, Marc Snir, Henry S. Warren, Jr.
    International Journal of Parallel Programming (IJPP), 30(4), Aug. 2002.

  43. Calculating Stack Distances Efficiently
    George Almasi, Calin Cascaval, and David A. Padua
    Proceedings of the Workshop on Memory System Performance (MSP 2002), June 2002, Berlin, Germany

  44. A survey of compiler techniques for energy efficient computing
    Calin Cascaval, Jose G. Castanos, Derek Lieber, and Jose E. Moreira
    Austin Conference on Energy-Efficient Design, Feb. 2002, Austin, TX.

  45. Evaluation of a Multithreaded Architecture for Cellular Computing
    Calin Cascaval, Jose G. Castanos, Luis Ceze, Monty Denneau, Manish Gupta, Derek Lieber, Jose E. Moreira, Karin Strauss, Henry S. Warren, Jr.
    Proceedings of the 8th International Symposium on High-Performance Computer Architecture, Feb 2002, Cambridge, MA, USA

  46. Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflop Computer
    George S. Almasi, Calin Cascaval, Jose G. Castanos, Monty Denneau, Wilm Donath, Maria Eleftheriou, Mark Giampapa, Howard Ho, Derek Lieber, Jose E. Moreira, Dennis Newns, Marc Snir, Henry S. Warren, Jr.
    Proceedings of the International Conference on Supercomputing (ICS 2001), June 2001, Sorrento, Italy

  47. Blue Gene: A vision for protein science using a petaflop supercomputer
    IBM Blue Gene team
    IBM Systems Journal, Volume 40, Number 2, 2001

  48. Compile-time Based Performance Prediction
    Calin Cascaval, Luiz DeRose, David Padua, Daniel Reed
    Proceedings of the Twelfth International Workshop on Languages and Compilers for Parallel Computing (LCPC99).

  49. MATmarks: A Shared Memory Environment for MATLAB Programming
    George Almasi, Calin Cascaval, and David A. Padua
    Poster presentation at HPDC'99.

  50. PACT - A Software Package to Manage Projects and Coordinate People
    K.J. Cleetus, Calin Cascaval, and K. Matsuzaki
    Proceedings of the fifth WETICE, Stanford, CA, June 1996, published by the IEEE Computer Society Press.

  51. Web* - A Technology to Make Information Available on the Web
    George Almasi, Anca Suvaiala, Ion Muslea, Calin Cascaval, Tad Davis, V. "Juggy" Jagannathan
    Proceedings of the Fourth Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Berkeley Springs, West Virginia, April 1995, published by the IEEE Computer Society Press.

  52. TclDii: A TCL Interface to the Orbix(TM) Dynamic Invocation Interface
    George Almasi, Anca Suvaiala, Cristian Goina, Calin Cascaval, V. "Juggy" Jagannathan
    OOPSLA 1995

  53. A Collaborative Environment for Independent Verification and Validation of Software
    Raghu Karinthi, Kankanahalli Srinivas, Sumitra Reddy, Ramana Reddy, Calin Cascaval, Walter Jackson, Srinivasan Venkatraman, Honglan Zheng
    Proceedings of the third workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Morgantown, West Virginia, April 1994, published by the IEEE Computer Society Press.

Books Co-Edited

Issued Patents

The most up-to-date list of patents is available from the US Patent Office (38 as of July 20, 2021).
  1. US 10,592,292: Method and apparatus for optimized execution using resource utilization maps, Reshadi, Salamat, Cascaval, Fowler, Ermolinskiy, Rychlik.
  2. US 10,169,105: Method for simplified task-based runtime for efficient parallel computing, Zhao, Montesinos Ortego, Raman, Robatmili, Cascaval.
  3. US 10,114,681: Identifying enhanced synchronization operation outcomes to improve runtime operations, Suarez Gracia, Cascaval, Zhao, Kumar, Natarajan, Raman.
  4. US 10,067,865: System and method for allocating memory to dissimilar memory devices using quality of service, De, Stewart, Cascaval, Chun.
  5. US 10,063,585: Methods and systems for automated anonymous crowdsourcing of characterized device behaviors, Salajegheh, Mahmoudi, Sridhara, Christodorescu, Cascaval.
  6. US 10,013,554: Time varying address space layout randomization, Gathala, Cascaval, Gupta.
  7. US 9,819,687: Reducing web browsing overheads with external code certification, Ceze, Cascaval, Reshadi.
  8. US 9,804,893: Method and apparatus for optimized execution using resource utilization maps, Reshadi, Salamat, Cascaval, Fowler, Ermolinskiy, Rychlik.
  9. US 9,798,528: Software solution for cooperative memory-side and processor-side data prefetching, Gao, Cascaval, Kielstra, Tremaine, Wazlowski, Zhang.
  10. US 9,740,504: Hardware acceleration for inline caches in dynamic languages, Robatmili, Cascaval, Kedlaya, Suarez Gracia.
  11. US 9,733,978: Data management for multiple processing units using data transfer costs, Suarez Gracia, Kumar, Natarajan, Hastantram, Cascaval, Zhao.
  12. US 9,710,388: Hardware acceleration for inline caches in dynamic languages, Robatmili, Cascaval, Kedlaya, Suarez Gracia.
  13. US 9,632,569: Directed event signaling for multiprocessor systems, Suarez Gracia, Zhao, Montesinos Ortego, Cascaval, Xenidis.
  14. US 9,501,328: Method for exploiting parallelism in task-based systems using an iteration space splitter, Robatmili, Aga, Suarez Gracia, Raman, Natarajan, Cascaval, Montesinos Ortego, Zhao.
  15. US 9,372,836: HTML5 I-frame extension, Reshadi, Cascaval.
  16. US 9,171,097: Memoizing web-browsing computation with DOM-based isomorphism, Ceze, Cascaval, Wang, Mahan, Dhillon, Ruotsi, Mandyam.
  17. US 9,092,327: System and method for allocating memory to dissimilar memory devices using quality of service, De, Stewart, Cascaval, Chun.
  18. US 9,003,380: Execution of dynamic languages via metadata extraction, Cascaval, Reshadi.
  19. US 8,886,887: Uniform external and internal interfaces for delinquent memory operations to facilitate cache optimization, Cascaval, Gao, Martin, Mendell.
  20. US 8,595,443: Varying a data prefetch size based upon data usage, Arimilli, Cascaval, Sinharoy, Speight, Zhang.
  21. US 8,572,341: Overflow handling of speculative store buffers, Blundell, Cain, Cascaval, Michael.
  22. US 8,539,486: Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode, Cain, Cascaval, Michael.
  23. US 8,510,237: Machine learning method to identify independent tasks for parallel layout in web browsers, Cascaval, Sampson, Wang.
  24. US 8,392,694: System and method for software initiated checkpoint, Blundell, Cain, Cascaval, Michael.
  25. US 8,266,381: Varying an amount of data retrieved from memory based upon an instruction hint, Arimilli, Cascaval, Sinharoy, Speight, Zhang.
  26. US 8,255,913: Notification to task of completion of GSM operations by initiator node, Arimilli, Blackmore, Cascaval, Rajamony.
  27. US 8,255,626: Atomic commit predicated on consistency of watches, Blundell, Cain, Cascaval, Michael.
  28. US 8,250,307: Sourcing differing amounts of prefetch data in response to data prefetch requests, Arimilli, Cascaval, Sinharoy, Speight, Zhang.
  29. US 8,239,879: Notification to task of completion of GSM operations at target node, Arimilli, Blackmore, Cascaval, Rajamony.
  30. US 8,136,103: Combining static and dynamic compilation to remove delinquent loads, Cascaval, Gao, Kielstra, Stoodley.
  31. US 8,122,439: Method and computer program product for dynamically and precisely discovering delinquent memory operations, Cascaval, Gao, Yotov.
  32. US 7,954,094: Method for improving performance of executable code, Cascaval, Chatterjee, Duesterwald, Kielstra, Stoodley.
  33. US 7,610,266: Method for vertical integrated performance and environment monitoring, Cascaval, Duesterwald, Sweeney, Wisniewski.
  34. US 7,596,680: System and method for encoding and decoding architecture registers, Cascaval, Chatterjee.
  35. US 7,380,086: Scalable runtime system for global address space languages on shared and distributed memory machines, Archambault, Bolmarcich, Cascaval, Chatterjee, Eleftheriou, Mak.
  36. US 7,376,808: Method and system for predicting the performance benefits of mapping subsets of application data to multiple page sizes, Cascaval, Duesterwald, Sweeney, Wisniewski.
  37. US 7,289,939: Mechanism for on-line prediction of future performance measurements in a computer system, Cascaval, Duesterwald, Dwarkadas.
  38. US 7,072,805: Mechanism for on-line prediction of future performance measurements in a computer system, Cascaval, Duesterwald, Dwarkadas.

Service

Steering Committee
Principles and Practice of Parallel Programming (PPoPP), Chair (2012-2018), Member (2011-)
International Conference on Computing Frontiers (2011-2014)
Editorial
IEEE Micro Special Issue on Mobile Systems, Feb 2015
PhD Committee Member
Jiho Choi, PhD, UIUC, 2018
Daniel Ahn, PhD, UIUC, 2012
Luis Ceze, PhD, UIUC, 2007
PhD Student Mentor
Luis Ceze, UIUC
Christopher (Kit) Barton, Univ. of Alberta
Karin Strauss, UIUC
Technical Program Committee
ASPLOS: Architectural Support for Programming Languages and Operating Systems (2013-2016, 2025)
CGO: Code Generation and Optimization (2004)
CPC: Compilers for Parallel Computing (2009)
HotPar: USENIX Workshop on Hot Topics in Parallelism (2012)
ICPP: International Conference on Parallel Processing (2008)
ICS: International Conference on Supercomputing (2012)
IEEE Micro Top Picks (2006, 2007)
IPDPS: International Parallel & Distributed Processing Symposium (2011)
ISCA: International Symposium on Computer Architecture (2016)
LCPC: Languages and Compilers for Parallel Computing (2005-2009, 2012-2015, 2022)
MICRO: International Symposium on Microarchitecture (2009)
PACT: International Conference on Parallel Architectures and Compilation Techniques (2010, 2015, 2016, 2021, 2022)
PGAS: Partitioned Global Address Space Programming Models (2009, 2010)
PLDI: Programming Language Design and Implementation (2014, 2023)
PPoPP: Principles and Practice of Parallel Programming (2009, 2013)
SC: Supercomputing (2007)
Other workshops
Organizing Committee
PACT 2023 (Program Chair)
LCPC 2006, 2013 (General Chair)
SBAC-PAD 2012 (Program Vice Chair)
Computing Frontiers 2011 (General Chair)
PPoPP 2011 (General Chair)
ICPP 2011 (Program Vice Chair)
CPC 2009 (General Co-Chair)
PPoPP 2008 (Publications Chair)
PACT 2007 (Finance Chair)
PPoPP 2006 (Local Arrangements Chair)
External Review Committee
ASPLOS (2010, 2012, 2017, 2019, 2021)
ISCA (2010, 2017, 2024)
PLDI (2015)
PPoPP (2010, 2019)
National Science Foundation (NSF) panelist
Guest Editor for IEEE Micro Special Issue on Mobile Systems.
Mentored 3 PhD students and several colleagues at IBM.
Program Chair, PACT 2023
Steering Committee chair, PPoPP (2012-2018)
General Chair PPoPP 2011, Computing Frontiers 2011, LCPC 2013, CPC 2009.
Program Committee Vice-Chair ICPP 2011.
Program Committee Member for ASPLOS, CGO, ICPP, ICS, IPDPS, LCPC, MICRO, PACT, PGAS, PLDI, PPoPP, SBAC-PAD, SC, and many workshops.
Organizing Committee Member for PPoPP, PACT, LCPC, and SBAC-PAD.
Panelist for NSF.
Reviewer for numerous journals and technical conferences.