G. Călin Caşcaval

Office address: Google LLC

10 Madden Street, CBD, Auckland, New Zealand, 1010
Work email: cascaval@google.com
Personal email: cascaval@acm.org
LinkedIn: https://www.linkedin.com/in/cascaval
ORCID: https://orcid.org/0000-0002-2780-6763

Dr. Călin Caşcaval is Director of Engineering at Google DeepMind, leading research in scalable distributed systems.
He is an experienced technical executive with strategic vision and a proven record in team building, research, and product delivery of computer systems. He has identified industry trends and defined, built, and delivered first-of-a-kind prototypes and products, including: the first P4 production compiler and networking stack (at Barefoot Networks), the first mobile heterogeneous computing runtime and parallel browser (at Qualcomm Research), system software for the Blue Gene family of supercomputers, and the first UPC compiler to scale to hundreds of thousands of processors (at IBM Research). Călin has over 50 peer-reviewed publications and 38 awarded patents. He is an IEEE Fellow and an ACM Senior Member.

Professional Experience

Director of Engineering, Google DeepMind	May 2024 - Present
Director of Engineering, Google Research	February 2020 - May 2024
Google LLC, Auckland, New Zealand
Leading research in scalable distributed systems, ML compilers, and ML systems for improved efficiency.

Sr. Director, Compilers and Tools	September 2016 - February 2020
Barefoot Networks, an Intel Company, Palo Alto, CA.
Led the development and productization of language, compilers and tools for programmable network devices.
Led the development and productization of the first domain-specific networking stack using the P4 language on the Tofino family of programmable packet processing ASICs. Managed the compilers and tools teams. Led the development of the P4 language: worked with the p4.org community to evolve the P4 language to support customer-specific requirements; acted as co-chair of the P4 Architecture Working Group, responsible for defining the Portable Switch Architecture. Defined the strategy around P4 tools and designed, developed and productized the P4 Insight visualization tool. Worked with internal Barefoot Networks teams to define the architecture of several generations of Tofino.

Sr. Director of Engineering (2013-2016), Director (2009-2013)	October 2009 - September 2016
Qualcomm Research Silicon Valley, Santa Clara, CA.
Led Qualcomm's Power Aware Computing strategy.
Led projects on heterogeneous mobile computing, including the Snapdragon Heterogeneous Compute SDK, and parallel libraries, e.g., best-in-class ARM math libraries (Snapdragon Math Libraries). Led the development of the first end-to-end parallel browser and parallel JavaScript engine for mobile devices. Responsible for management, mentoring, and hiring. Led collaborations with academia (UC Berkeley, UIUC, UT Austin, University of Washington, Georgia Tech).

Manager, Programming Models and Tools for Scalable Systems Group	September 2004 - October 2009
IBM T.J. Watson Research Center, Yorktown Heights, NY.
Job responsibilities included leading research projects with globally distributed teams. Developed and contributed to the IBM Research strategy in compilers and systems software.
Led the PERCS (DARPA HPCS) Compilers team. As part of this effort we explored and developed a number of technologies to improve programmer productivity: Continuous Program Optimization (CPO), the xlUPC compiler, math libraries for HPC, performance counter design in the IBM pSeries processors, OS tracing infrastructure, and application monitoring. Continuous Program Optimization (CPO): combine static and dynamic compilation and allow statically compiled languages to be executed using a managed runtime, in order to continuously monitor and adapt to changes in the execution environment. We developed the CPO vision and implemented several optimization prototypes that take system-wide monitoring as input and optimize an application across runs. Papers published in PACT 2005, IBM System Journal, and PAC2 2004. The xlUPC compiler: led the design and development of a UPC compiler for IBM platforms, including IBM pSeries and Blue Gene. The xlUPC compiler is the first compiler and runtime system that scales PGAS programs to hundreds of thousands of threads. The compiler was also used in the HPC Challenge Productivity Competition, and won the award every year we participated (2005, 2006, 2008, 2009). Initiated and supervised the development of math libraries for high performance linear algebra. Drove the design and development of tracing libraries for the monitoring of the full execution stack.
Contributed to the design of the performance counters in the IBM pSeries processors, OS tracing infrastructure, and application monitoring. Led exploratory projects in parallel programming models and parallel languages to improve programmer productivity: Asynchronous execution, Transactional Memory, and Thread Level Speculation. Asynchronous execution to improve parallel execution: load balancing through work stealing, performance portability, and acceleration support. Transactional Memory and Thread Level Speculation: using speculative execution to improve programmability and efficiency. Mentored IBM employees, PhD students, and student interns. Extensive collaborations with academia.

Research Staff Member	July 2000 - September 2004
IBM T.J. Watson Research Center, Yorktown Heights, NY.
Participated in the design and development of system software and performance analysis tools for massively parallel systems: Blue Gene and PERCS.
PERCS (DARPA HPCS): Participated in the initial phases of the project, first as a member of the compiler team and, starting in 2004, as its lead. Established the research agenda and demonstrated the initial feasibility of Continuous Program Optimization, the UPC compiler, the Linear Algebra Compiler, etc. Blue Gene: Designed and developed a highly multithreaded simulator for the Blue Gene/C chip. Participated in the design and developed an initial prototype for the job launching system on Blue Gene/L. Designed and developed an initial prototype of the xlUPC compiler for the Blue Gene/L architecture. Designed and evaluated Thread Level Speculation support for Blue Gene/Q. Application evaluation across multiple Blue Gene families. Performance monitoring tools: Designed and developed internal tools for performance analysis using the hardware performance monitoring counters and provided feedback to the Power architecture performance team. Developed a methodology for performance monitoring and adaptation based on histories (see PACT 2003 paper).

Graduate Research Assistant	August 1996 - June 2000
Computer Science Department, Univ. of Illinois at Urbana-Champaign, Urbana, IL.
Conducted research in the Polaris and Delphi projects, working on compile-time performance prediction, data locality, and parallel programming models.
Extended the Polaris parallelizing compiler with code generation passes for TreadMarks and Fast Messages. Maintained the Polaris development environment. Developed MATmarks, an environment for parallel programming with MATLAB based on TreadMarks. Obtained linear speedups for several applications on a network of workstations. Worked on data locality and false sharing characterization of applications running on multiprocessor programs. Metrics derived from this study are to be used for both driving compiler optimizations for locality and program performance prediction for new architectures. Developed a compile-time model for cache memory hierarchies that is able to predict program memory behavior within 5% of the actual cache behavior.

Research Associate	May 1995 - July 1996
CyberMarche, Inc., Morgantown, WV
Responsible for the design, implementation, and testing of systems targeted towards knowledge gathering and sharing for project management.
Enterprise Engineering Knowledge Base (EEKB), a project targeted towards knowledge gathering and sharing in engineering environments. Project Assessment and Coordination for Teams (PACT), a Project Management System (PMS) to be used by Hitachi, Ltd. Additional responsibilities included system administrator for a computer network consisting of Sun and IBM-PC computers.

Graduate Research Assistant	August 1993 - May 1995
Concurrent Engineering Research Center-WVU, Morgantown, WV
Conducted research in communication scheduling algorithms and software verification.
Designed and implemented a Software Project Model which provides a customizable view of a project for each team involved in software development and verification. The work was part of the Independent Verification and Validation of Software (IV&V) Project, a NASA-sponsored effort to develop a collaborative environment for teams of engineers involved in software production. Developed an object-oriented gateway to access Oracle services from CORBA-compliant clients as part of the Information Sharing System (ISS) Project, a CORBA-based system for transparent access to information stored in heterogeneous repositories. Also as part of this project, implemented a Scheme interface to Orbix's Dynamic Invocation Interface. The work involved programming in C++, Scheme, Oracle, Orbix, and Web technologies on Unix platforms. The research for the Master's Thesis involved design and implementation of several communication scheduling algorithms.

Research Associate	June 1991 - August 1993
IPA (Institute for Design in Automation), Cluj-Napoca, Romania
Designed and developed production software for testing equipment.
Designed and developed an object-oriented Digital Components Simulator used in a Digital Boards Testing System sold in France. Devised and implemented a Cartographic System for Survey Measurements.

Education

PhD in Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, June 2000
Thesis Title: Compile-time Performance Prediction of Scientific Programs. Available as UIUC Technical Report UIUCDCS-R-2000-2167.

MS in Computer Science, West Virginia University, Morgantown, WV, May 1995
Thesis Title: Optimizing Communication in Parallel Compilers

MS in Computer Engineering, Technical University Cluj-Napoca, Romania, June 1991
Thesis Title: Sistem pentru Analiza si Recunoasterea Semnalului Vocal (Speech Analysis and Recognition System)

Awards

2025 SIGCOMM Networking Systems Award for the Barefoot Tofino Programmable Switch Chips and the P4 Programming Language
IEEE Fellow, 2021 (for contributions to programming models for parallel machines and heterogeneous mobile devices).
ACM Senior Member, 2009
The HPC Challenge Productivity Award, 2005, 2006, 2008, 2009
IBM Outstanding Technical Achievement Award, 2008
IBM Invention Achievement Award, 2005, 2006, 2007, 2008, 2009
IBM Research Division Award, 2003, 2006

Aug 2025 - SIGCOMM Networking Systems Award for the Barefoot Tofino Programmable Switch Chips and the P4 Programming Language
Dec 2020 - IEEE Fellow for contributions to programming models for parallel machines and heterogeneous mobile devices.
Apr 2009 - ACM Senior Member
Nov 2008 - The 2008 HPC Challenge Class 2 Award
Mar 2008 - IBM Outstanding Technical Achievement Award
Mar 2008 - IBM Invention Achievement Award, Fifth plateau
Apr 2007 - IBM Invention Achievement Award, Third plateau
Nov 2006 - The 2006 HPC Challenge Class 2 Award - Best Performance and Productivity
Jul 2006 - IBM Invention Achievement Award, Second plateau
Jul 2006 - IBM Research Division Technical Group Award
Feb 2006 - IBM Research Division Award
Nov 2005 - The 2005 HPC Challenge Class 2 Award - Best Productivity
Apr 2005 - IBM Invention Achievement Award, First plateau
Oct 2003 - IBM Research Division Technical Group Award
June 1996 - Software Developer Excellence Award, CyberMarche Inc.
May 1991 - First Prize (with team) at the National Student Contest, Software Section, Timisoara, Romania.
May 1990 - First Prize (with team) at the National Student Contest, Hardware Section, Bucharest, Romania.

Keynote Presentations

Qualcomm Symphony: Orchestrating Heterogeneity for Power Aware Computing
Workshop on Architectures and Systems for Real-time Mobile Vision Applications (ASR-MOV) - In conjunction with CGO 2016
Are scripting languages ready for mobile computing?
The 2014 International Symposium on Code Generation and Optimization, CGO 2014
Parallel Programming for Mobile Computing
The 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013
Programming for Mobile Gadgets
The First International Workshop on Parallelism in Mobile Platforms, PRISM-1 - In conjunction with HPCA 2013
Power Programming
Programming Models for Emerging Architectures (PMEA) - In conjunction with PACT 2010

Publications

Google Scholar citations: 5049 (as of June 19, 2024).

Most relevant publications:

Logical Synchrony and the bittide Mechanism
Sanjay Lall, Calin Cascaval, Martin Izzard, Tammo Spalink
IEEE Transactions on Parallel and Distributed Systems, vol 35, Issue 11, Nov 2024
p4v: practical verification for programmable data planes
Jed Liu, William Hallahan, Cole Schlesinger, Milad Sharif, Jeongkeun Lee, Robert Soule, Han Wang, Calin Cascaval, Nick McKeown, Nate Foster
SIGCOMM 2018, Budapest, Hungary, Aug 2018.
Deoptimization for dynamic language JITs on typed, stack-based virtual machines
Madhukar N. Kedlaya, Behnam Robatmili, Calin Cascaval, Ben Hardekopf
Virtual Execution Environments (VEE 2014), Mar 2014. Best Paper Award.
Zoomm: A Parallel Web Browser Engine for Multicore Mobile Devices
Calin Cascaval, Seth Fowler, Pablo Montesinos, Wayne Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar
Proceedings of The 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2013), Shenzhen, China, Feb 2013.
How Much Parallelism is There in Irregular Applications?
Milind Kulkarni, Martin Burtscher, R. Inkulu, Keshav Pingali, Calin Cascaval
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'09, Raleigh, NC, February 2009.
Software transactional memory: why is it only a research toy?
Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, Sid Chatterjee
Communications of the ACM, Nov 2008.
Bulk Disambiguation of Speculative Threads in Multiprocessors
Luis Ceze, James Tuck, Calin Cascaval, and Josep Torrellas
Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA 2006, Boston, MA, June 2006.

Timetide: A programming model for logically synchronous distributed systems
Logan Kenwright, Partha Roop, Nathan Allen, Calin Cascaval, and Avinash Malik
CASES 2025, ACM Transactions on Embedded Systems, Oct, 2025
Logical Synchrony and the bittide Mechanism
Sanjay Lall, Calin Cascaval, Martin Izzard, and Tammo Spalink
IEEE Transactions on Parallel and Distributed Systems, vol 35, Issue 11, Nov 2024
Logical Synchrony Networks: A Formal Model for Deterministic Distribution
Logan Kenwright, Partha Roop, Nathan Allen, Sanjay Lall, Calin Cascaval, Tammo Spalink, and Martin Izzard
IEEE Access, vol 12, Jun 7, 2024
On Buffer Centering for bittide Synchronization
Sanjay Lall, Calin Cascaval, Martin Izzard, and Tammo Spalink
International Conference on Control, Decision, and Information Technologies, 2023.
Modeling and Control of bittide Synchronization
Sanjay Lall, Calin Cascaval, Martin Izzard, and Tammo Spalink
American Control Conference, ACC’2022. Atlanta, GA, USA, 2022
Resistance Distance and Control Performance for bittide Synchronization
Sanjay Lall, Calin Cascaval, Martin Izzard, and Tammo Spalink
European Control Conference, ECC’2022. London, England, UK, 2022
p4v: practical verification for programmable data planes
Jed Liu, William Hallahan, Cole Schlesinger, Milad Sharif, Jeongkeun Lee, Robert Soule, Han Wang, Calin Cascaval, Nick McKeown, Nate Foster
SIGCOMM 2018, Budapest, Hungary, Aug 2018
Concurrency in Mobile Browser Engines
Calin Cascaval, Pablo Montesinos-Ortego, Behnam Robatmili, Dario Suarez-Gracia
IEEE Pervasive Computing, Vol. 14, Issue 3, July-Sept, 2015.
MuscalietJS: rethinking layered dynamic web runtimes
Behnam Robatmili, Calin Cascaval, Mehrdad Reshadi, Madhukar N. Kedlaya, Seth Fowler, Vrajesh Bhavsar, Michael Weber, Ben Hardekopf
Virtual Execution Environments (VEE 2014), Mar 2014
Deoptimization for dynamic language JITs on typed, stack-based virtual machines
Madhukar N. Kedlaya, Behnam Robatmili, Calin Cascaval, Ben Hardekopf
Virtual Execution Environments (VEE 2014), Mar 2014. Best Paper Award.
Zoomm: A Parallel Web Browser Engine for Multicore Mobile Devices
Calin Cascaval, Seth Fowler, Pablo Montesinos, Wayne Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar
Proceedings of The 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2013), Shenzhen, China, Feb 2013.
Automatic Discovery of Performance and Energy Pitfalls in HTML and CSS
Adrian Sampson, Calin Cascaval, Luis Ceze, Pablo Montesinos, and Dario Suarez-Gracia
2012 IEEE International Symposium on Workload Characterization (IISWC), Nov 2012
Multidimensional dynamic behavior in mobile computing
Mehrdad Reshadi, Calin Cascaval
2012 IEEE International Symposium on Workload Characterization, Nov 2012
A Case for Parallelizing Web Pages
Haohui Mai, Shuo Tang, Samuel T. King, Calin Cascaval, Pablo Montesinos
4th USENIX Workshop on Hot Topics in Parallelism (HotPar'12), Jun 2012
Heterogenous Systems Programming
Calin Cascaval, Pablo Montesinos
2nd Workshop on SoC Architecture, Accelerators and Workloads (SAW-2), Feb, 2011
A Taxonomy of Accelerator Architectures and their Programming Models
Calin Cascaval, Siddhartha Chatterjee, Hubertus Franke, Kevin Gildea, and Pratap Pattnaik
IBM Journal of Research and Development, vol 54, issue 5, Sept/Oct 2010
The Bulk Multicore Architecture for Improved Programmability
Josep Torrellas, Luis Ceze, James Tuck, Calin Cascaval, Pablo Montesinos, Wonsun Ahn, Milos Prvulovic
Communications of the ACM, Dec 2009
Analytical Modeling of Pipeline Parallelism
Angeles Navarro, Rafael Asenjo, Siham Tabik and Calin Cascaval
Proceedings of the IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2009), Raleigh, NC, Sept 2009
Load balancing using work-stealing for pipeline parallelism in emerging applications
Angeles Navarro, Rafael Asenjo, Siham Tabik, Calin Cascaval
ICS '09 Proceedings of the 23rd International conference on Supercomputing, 2009
Porting k-means clustering to accelerators with the APGAS runtime
David Cunningham, Sayantan Sur, George Almasi, Vijay Saraswat, and Calin Cascaval
Proceedings of the first Workshop on Asynchrony in the Partition Global Address Space Languages, Yorktown Heights, NY, June 2009 (with ICS09)
Parallelization Spectroscopy: Analysis of Thread-level Parallelism in HPC Programs
Arun Kejariwal and Calin Cascaval
Proceedings of the 2nd Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures (PESPMA 2009), Austin, TX, June 2009
Scalable RDMA performance in PGAS languages
Montse Farreras, George Almasi, Calin Cascaval, and Toni Cortes
Proceedings of the IEEE International Parallel & Distributed Processing Symposium, Rome, Italy, May 2009
Lonestar: A Suite of Parallel Irregular Programs
Milind Kulkarni, Martin Burtscher, Calin Cascaval, and Keshav Pingali
Proceedings of the 2009 IEEE International Symposium on Performance Analysis of Systems and Software, Boston, MA, April 2009
How much parallelism is there in irregular applications?
Milind Kulkarni, Martin Burtscher, R. Inkulu, Keshav Pingali, Calin Cascaval
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'09, Raleigh, NC, February 2009
Parallelization Spectroscopy: Analysis of thread level parallelism in HPC programs
Arun Kejariwal, Calin Cascaval
Poster presentation at the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'09, Raleigh, NC, February 2009
Software transactional memory: why is it only a research toy?
Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, Sid Chatterjee
Communications of the ACM, Nov 2008
Compiler and Runtime Techniques for Software Transactional Memory Optimization
Peng Wu, Maged Michael, Christoph von Praun, Takuya Nakaike, Rajesh Bordawekar, Harold W. Cain, Calin Cascaval, Siddhartha Chatterjee, Stefanie Chiras, Rui Hou, Mark Mergen, Xiaowei Shen, Michael F. Spear, Hua Yong Wang, Kun Wang
In the Journal of Concurrency and Computation: Practice and Experience
Compiler-driven Program Dependence Profiling to Guide Program Parallelization
Peng Wu, Arun Kejariwal, and Calin Cascaval
Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing, LCPC'08, Edmonton, AB, 2008
Performance without pain = productivity: data layout and collective communication in UPC
Rajesh Nishtala, George Almasi, and Calin Cascaval
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'08, Salt Lake City, UT, February 2008
Modeling optimistic concurrency using quantitative dependence analysis
Christoph von Praun, Rajesh Bordawekar, and Calin Cascaval
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'08, Salt Lake City, UT, February 2008
Concurrency Control with Data Coloring
Luis Ceze, Christoph von Praun, Calin Cascaval, Pablo Montesinos, and Josep Torrellas
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, Seattle, WA, March 2008
Multidimensional blocking in UPC
Christopher Barton, Calin Cascaval, George Almasi, Rahul Garg, Jose Nelson Amaral, and Montse Farreras
Proceedings of the 20th International Workshop on Languages and Compilers for Parallel Computing, LCPC'07, Urbana, IL, 2007
Implicit parallelism with ordered transactions
Christoph von Praun, Luis Ceze, and Calin Cascaval
Proceedings of the 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'07, San Jose, CA, March 2007
A Characterization of Shared Data Access Patterns in UPC Programs
Christopher Barton, Calin Cascaval, and Jose Nelson Amaral
Proceedings of the 19th International Workshop on Languages and Compilers for Parallel Computing, LCPC'06, New Orleans, LA, 2006
Bulk Disambiguation of Speculative Threads in Multiprocessors
Luis Ceze, James Tuck, Calin Cascaval, and Josep Torrellas
Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA 2006, Boston, MA, June 2006
Shared Memory Programming for Large Scale Machines
Christopher Barton, Calin Cascaval, George Almasi, Yili Zheng, Montse Farreras, Siddhartha Chatterjee, Jose Nelson Amaral
Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, PLDI 2006, Ottawa, Canada, June 2006
Performance and environment monitoring for Continuous Program Optimization
Calin Cascaval, Evelyn Duesterwald, Peter F. Sweeney, Robert W. Wisniewski
IBM Journal of Research and Development, Volume 50, Number 2/3, March 2006
Multiple Page Size Modeling and Optimization
Calin Cascaval, Evelyn Duesterwald, Peter F. Sweeney, and Robert W. Wisniewski
Proceedings of the Fourteenth International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), Sept. 2005, St. Louis, MO
Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture
David Rodenas, Xavier Martorell, Eduard Ayguade, Jesus Labarta, George Almasi, Calin Cascaval, Jose Castanos, Jose E. Moreira
Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), April 2005, Denver, CO.
Performance and Environment Monitoring for Whole-System Characterization and Optimization
Robert W. Wisniewski, Peter F. Sweeney, Kartik Sudeep, Matthias Hauswirth, Evelyn Duesterwald, Calin Cascaval, and Reza Azimi
The first Watson Conference on Interaction between Architecture, Circuits, and Compilers, Oct. 2004, Yorktown Heights, NY.
Characterizing and Predicting Program Behavior and its Variability
Evelyn Duesterwald, Calin Cascaval, and Sandhya Dwarkadas
Proceedings of the Twelfth International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), Sept. 2003, New Orleans, LA.
An Overview of the Blue Gene/L System Software Organization
George Almasi, Ralph Bellofatto, Calin Cascaval, Jose G. Castanos, Luis Ceze, Paul Crumley, C. Christopher Erway, Joseph Gagliano, Derek Lieber, Jose E. Moreira, Alda Sanomiya, and Karin Strauss
Proceedings of the International Conference on Parallel and Distributed Computing (Euro-Par 2003) Aug. 2003, Klagenfurt, Austria.
Estimating Cache Misses and Locality Using Stack Distances
Calin Cascaval and David A. Padua
Proceedings of the International Conference on Supercomputing (ICS 2003), June 2003, San Francisco, California, USA.
Evaluation of OpenMP for the Cyclops Multithreaded Architecture
George Almasi, Eduard Ayguade, Calin Cascaval, Jose Castanos, Jesus Labarta, Francisco Martinez, Xavier Martorell, Jose Moreira
Proceedings of the Workshop on OpenMP Applications and Tools (WOMPAT 2003), Jun. 2003, Toronto, Canada.
Full Circle: Simulating Linux Clusters on Linux Clusters
Luis Ceze, Karin Strauss, George Almasi, Patrick Bohrer, Jose R. Brunheroto, Calin Cascaval, Jose G. Castanos, Derek Lieber, Jose E. Moreira, Alda Sanomiya, and Eugen Schenfeld
Proceedings of the Fourth LCI International Conference on Linux Clusters, Jun. 2003, San Jose, CA.
System Management in the BlueGene/L Supercomputer
G. Almasi, L. Bachega, R. Bellofatto, J. Brunheroto, C. Cascaval, J. Castanos, P. Crumley, C. Erway, J. Gagliano, D. Lieber, P. Mindlin, J.E. Moreira, R.K. Sahoo, A. Sanomiya, E. Schenfeld, R. Swetz, M. Bae, G. Laib, K. Ranganathan, Y. Aridor, T. Domany, Y. Gal, O. Goldshmidt, E. Shmueli
Proceedings of the Third Workshop on Massively Parallel Processing (WMPP 2003), Apr. 2003, Nice, France.
An Overview of the BlueGene/L Supercomputer
N. Adiga et al.
In Supercomputing, Nov, 2002
Dissecting Cyclops: A Detailed Analysis of a Multithreaded Architecture
George Almasi, Calin Cascaval, Jose G. Castanos, Monty Denneau, Derek Lieber, Jose E. Moreira, and Henry S. Warren, Jr.
Proceedings of the Workshop on Chip MultiProcessor: Processor Architecture and Memory Hierarchy Related Issues (MEDEA 2002), Sept. 2002, Charlottesville, VA
Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer
George S. Almasi, Calin Cascaval, Jose G. Castanos, Monty Denneau, Wilm Donath, Maria Eleftheriou, Mark Giampapa, Howard Ho, Derek Lieber, Jose E. Moreira, Dennis Newns, Marc Snir, Henry S. Warren, Jr.
International Journal of Parallel Programming (IJPP), 30(4), Aug. 2002.
Calculating Stack Distances Efficiently
George Almasi, Calin Cascaval, and David A. Padua
Proceedings of the Workshop on Memory System Performance (MSP 2002), June 2002, Berlin, Germany
A survey of compiler techniques for energy efficient computing
Calin Cascaval, Jose G. Castanos, Derek Lieber, and Jose E. Moreira
Austin Conference on Energy-Efficient Design, Feb. 2002, Austin, TX.
Evaluation of a Multithreaded Architecture for Cellular Computing
Calin Cascaval, Jose G. Castanos, Luis Ceze, Monty Denneau, Manish Gupta, Derek Lieber, Jose E. Moreira, Karin Strauss, Henry S. Warren, Jr.
Proceedings of the 8th International Symposium on High-Performance Computer Architecture, Feb 2002, Cambridge, MA, USA
Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflop Computer
George S. Almasi, Calin Cascaval, Jose G. Castanos, Monty Denneau, Wilm Donath, Maria Eleftheriou, Mark Giampapa, Howard Ho, Derek Lieber, Jose E. Moreira, Dennis Newns, Marc Snir, Henry S. Warren, Jr.
Proceedings of the International Conference on Supercomputing (ICS 2001), June 2001, Sorrento, Italy
Blue Gene: A vision for protein science using a petaflop supercomputer
IBM Blue Gene team
IBM Systems Journal, Volume 40, Number 2, 2001
Compile-time Based Performance Prediction
Calin Cascaval, Luiz DeRose, David Padua, Daniel Reed
Proceedings of the Twelfth International Workshop on Languages and Compilers for Parallel Computing (LCPC99).
MATmarks: A Shared Memory Environment for MATLAB Programming
George Almasi, Calin Cascaval, and David A. Padua
Poster presentation to HPDC'99.
PACT - A Software Package to Manage Projects and Coordinate People
K.J. Cleetus, Calin Cascaval, and K. Matsuzaki
Proceedings of the fifth WETICE, Stanford, CA, June 1996, published by the IEEE Computer Society Press.
Web* - A Technology to Make Information Available on the Web
George Almasi, Anca Suvaiala, Ion Muslea, Calin Cascaval, Tad Davis, V. "Juggy" Jagannathan
Proceedings of the fourth workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Berkeley Springs, West Virginia, April 1995, published by the IEEE Computer Society Press.
TclDii: A TCL Interface to the Orbix(TM) Dynamic Invocation Interface
George Almasi, Anca Suvaiala, Cristian Goina, Calin Cascaval, V. "Juggy" Jagannathan
OOPSLA 1995
A Collaborative Environment for Independent Verification and Validation of Software
Raghu Karinthi, Kankanahalli Srinivas, Sumitra Reddy, Ramana Reddy, Calin Cascaval, Walter Jackson, Srinivasan Venkatraman, Honglan Zheng
Proceedings of the third workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Morgantown, West Virginia, April 1994, published by the IEEE Computer Society Press.

Books Co-Edited

Calin Cascaval and Pablo Montesinos-Ortego (Eds.) Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science Vol. 8664, Springer-Verlag, 2014.
Calin Cascaval, Pedro Trancoso, and Viktor Prasanna (Eds.) International Journal of Parallel Programming. Volume 41 Number 3
George Almasi, Calin Cascaval and Peng Wu (Eds.) Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science 4382, Springer-Verlag, 2007.

Issued Patents

The most up-to-date list of patents is available from the US Patent Office (38 as of July 20, 2021).

US 10,592,292: Method and apparatus for optimized execution using resource utilization maps, Reshadi, Salamat, Cascaval, Fowler, Ermolinskiy, Rychlik.
US 10,169,105: Method for simplified task-based runtime for efficient parallel computing, Zhao, Montesinos Ortego, Raman, Robatmili, Cascaval.
US 10,114,681: Identifying enhanced synchronization operation outcomes to improve runtime operations, Suarez Gracia, Cascaval, Zhao, Kumar, Natarajan, Raman.
US 10,067,865: System and method for allocating memory to dissimilar memory devices using quality of service, De, Stewart, Cascaval, Chun.
US 10,063,585: Methods and systems for automated anonymous crowdsourcing of characterized device behaviors, Salajegheh, Mahmoudi, Sridhara, Christodorescu, Cascaval.
US 10,013,554: Time varying address space layout randomization, Gathala, Cascaval, Gupta.
US 9,819,687: Reducing web browsing overheads with external code certification, Ceze, Cascaval, Reshadi.
US 9,804,893: Method and apparatus for optimized execution using resource utilization maps, Reshadi, Salamat, Cascaval, Fowler, Ermolinskiy, Rychlik.
US 9,798,528: Software solution for cooperative memory-side and processor-side data prefetching, Gao, Cascaval, Kielstra, Tremaine, Wazlowski, Zhang.
US 9,740,504: Hardware acceleration for inline caches in dynamic languages, Robatmili, Cascaval, Kedlaya, Suarez Gracia.
US 9,733,978: Data management for multiple processing units using data transfer costs, Suarez Gracia, Kumar, Natarajan, Hastantram, Cascaval, Zhao.
US 9,710,388: Hardware acceleration for inline caches in dynamic languages, Robatmili, Cascaval, Kedlaya, Suarez Gracia.
US 9,632,569: Directed event signaling for multiprocessor systems, Suarez Gracia, Zhao, Montesinos Ortego, Cascaval, Xenidis.
US 9,501,328: Method for exploiting parallelism in task-based systems using an iteration space splitter, Robatmili, Aga, Suarez Gracia, Raman, Natarajan, Cascaval, Montesinos Ortego, Zhao.
US 9,372,836: HTML5 I-frame extension, Reshadi, Cascaval.
US 9,171,097: Memoizing web-browsing computation with DOM-based isomorphism, Ceze, Cascaval, Wang, Mahan, Dhillon, Ruotsi, Mandyam.
US 9,092,327: System and method for allocating memory to dissimilar memory devices using quality of service, De, Stewart, Cascaval, Chun.
US 9,003,380: Execution of dynamic languages via metadata extraction, Cascaval, Reshadi.
US 8,886,887: Uniform external and internal interfaces for delinquent memory operations to facilitate cache optimization, Cascaval, Gao, Martin, Mendell.
US 8,595,443: Varying a data prefetch size based upon data usage, Arimilli, Cascaval, Sinharoy, Speight, Zhang.
US 8,572,341: Overflow handling of speculative store buffers, Blundell, Cain, Cascaval, Michael.
US 8,539,486: Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode, Cain, Cascaval, Michael.
US 8,510,237: Machine learning method to identify independent tasks for parallel layout in web browsers, Cascaval, Sampson, Wang.
US 8,392,694: System and method for software initiated checkpoint, Blundell, Cain, Cascaval, Michael.
US 8,266,381: Varying an amount of data retrieved from memory based upon an instruction hint, Arimilli, Cascaval, Sinharoy, Speight, Zhang.
US 8,255,913: Notification to task of completion of GSM operations by initiator node, Arimilli, Blackmore, Cascaval, Rajamony.
US 8,255,626: Atomic commit predicated on consistency of watches, Blundell, Cain, Cascaval, Michael.
US 8,250,307: Sourcing differing amounts of prefetch data in response to data prefetch requests, Arimilli, Cascaval, Sinharoy, Speight, Zhang.
US 8,239,879: Notification to task of completion of GSM operations at target node, Arimilli, Blackmore, Cascaval, Rajamony.
US 8,136,103: Combining static and dynamic compilation to remove delinquent loads, Cascaval, Gao, Kielstra, Stoodley.
US 8,122,439: Method and computer program product for dynamically and precisely discovering deliquent memory operations, Cascaval, Gao, Yotov.
US 7,954,094: Method for improving performance of executable code, Cascaval, Chatterjee, Duesterwald, Kielstra, Stoodley.
US 7,610,266: Method for vertical integrated performance and environment monitoring, Cascaval, Duesterwald, Sweeney, and Wisniewski.
US 7,596,680: System and method for encoding and decoding architecture registers, Cascaval and Chatterjee.
US 7,380,086: Scalable runtime system for global address space languages on shared and distributed memory machines, Archambault, Bolmarcich, Cascaval, Chatterjee, Eleftheriou, Mak.
US 7,376,808: Method and system for predicting the performance benefits of mapping subsets of application data to multiple page sizes, Cascaval, Duesterwald, Sweeney, Wisniewski.
US 7,289,939: Mechanism for on-line prediction of future performance measurements in a computer system, Cascaval, Duesterwald, and Dwarkadas.
US 7,072,805: Mechanism for on-line prediction of future performance measurements in a computer system, Cascaval, Duesterwald, and Dwarkadas.

Service

Steering Committee: Principles and Practice of Parallel Programming (PPoPP), Chair (2012-2018), Member (2011-); International Conference on Computing Frontiers (2011-2014)
Editorial: IEEE Micro Special Issue on Mobile Systems, Feb 2015
PhD Committee Member: Jiho Choi, PhD, UIUC, 2018; Daniel Ahn, PhD, UIUC, 2012; Luis Ceze, PhD, UIUC, 2007
PhD Student Mentor: Luis Ceze, UIUC
Christopher (Kit) Barton, Univ. of Alberta
Karin Strauss, UIUC
Technical Program Committee: ASPLOS: Architectural Support for Programming Languages and Operating Systems (2013-2016, 2025-2026)
CGO: Code Generation and Optimization (2004)
CPC: Compilers for Parallel Computing (2009)
HotPar: USENIX Workshop on Hot Topics in Parallelism (2012)
HPCA: IEEE International Symposium on High-Performance Computer Architecture (2026)
ICPP: International Conference on Parallel Processing (2008)
ICS: International Conference on Supercomputing (2012)
IEEE Micro Top Picks (2006, 2007)
IPDPS: International Parallel & Distributed Processing Symposium (2011)
ISCA: International Symposium on Computer Architecture (2016)
LCPC: Languages and Compilers for Parallel Computing (2005-2009, 2012-2015, 2022)
MICRO: International Symposium on Microarchitecture (2009)
PACT: International Conference on Parallel Architectures and Compilation Techniques (2010, 2015, 2016, 2021-2023)
PGAS: Partitioned Global Address Space Programming Models (2009, 2010)
PLDI: Programming Language Design and Implementation (2014, 2023, 2025)
PPoPP: Principles and Practice of Parallel Programming (2009, 2013)
SC: Supercomputing (2007)
Other workshops
Organizing Committee: PACT 2023 (Program Chair)
LCPC 2006, 2013 (General Chair)
SBAC-PAD 2012 (Program Vice Chair)
Computing Frontiers 2011 (General Chair)
PPoPP 2011 (General Chair)
ICPP 2011 (Program Vice Chair)
CPC 2009 (General Co-Chair)
PPoPP 2008 (Publications Chair)
PACT 2007 (Finance Chair)
PPoPP 2006 (Local Arrangements Chair)
External Review Committee: ASPLOS (2010, 2012, 2017, 2019, 2021)
ISCA (2010, 2017, 2024)
PLDI (2015)
PPoPP (2010, 2019)
National Science Foundation (NSF) panelist

Guest Editor for IEEE Micro Special Issue on Mobile Systems.
Mentored 3 PhD students and several colleagues at IBM.
Program Chair, PACT 2023
Steering Committee chair, PPoPP (2012-2018)
General Chair PPoPP 2011, Computing Frontiers 2011, LCPC 2013, CPC 2009.
Program Committee Vice-Chair ICPP 2011.
Program Committee Member for ASPLOS, CGO, HPCA, ICPP, ICS, IPDPS, LCPC, MICRO, PACT, PGAS, PLDI, PPoPP, SBAC-PAD, SC, and many workshops.
Organizing Committee Member for PPoPP, PACT, LCPC, and SBAC-PAD.
Panelist for NSF.
Reviewer for numerous journals and technical conferences.