Dr. Călin Caşcaval is Director of Engineering at Google
Research, leading research in scalable distributed systems.
He is an experienced technical executive with strategic vision and proven
team building, research and product delivery of computer systems. He
identified industry trends and defined, built, and delivered
first-of-a-kind prototypes and products, including: the first P4 production
compiler and networking stack (at Barefoot Networks), the first mobile
heterogeneous computing runtime and parallel browser (at Qualcomm
Research), system software for the Blue Gene family of supercomputers
and the first UPC compiler to scale to hundreds of thousands of
processors (at IBM Research). Călin has over 50 peer-reviewed
publications, 37 awarded patents and is an IEEE Fellow and an ACM
|Director of Engineering,
||February 2020 - Present
| Google LLC.,
Mountain View, CA.
|Leading research in scalable
|Sr. Director, Compilers
||September 2016 - February 2020
|Barefoot Networks, an Intel Company,
Palo Alto, CA.
|Led the development and
productization of language, compilers and tools for programmable
- Led the development and productization of the first
domain-specific networking stack using the P4
language on the Tofino family
of programmable packet processing ASICs. Managed the compilers and
- Led the development of the P4
Language: worked with the p4.org community to evolve the P4
language to support customer-specific requirements; acted as
co-chair of the P4 Architecture Working Group, responsible for defining the Portable Switch Architecture.
- Defined the strategy around P4 tools and designed, developed and
productized the P4 Insight visualization tool.
- Worked with internal Barefoot Networks teams to define the architecture of several generations of Tofino.
|Sr. Director of Engineering
(2013-2016), Director (2009-2013)
||October 2009 - September 2016
Qualcomm Research Silicon Valley, Santa Clara, CA.
|Led Qualcomm's Power Aware Computing
- Led projects on heterogeneous mobile computing, including the
Heterogeneous Compute SDK, and parallel libraries, e.g.,
best-in-class ARM math libraries.
- Led the development of the first
browser and parallel
- Responsible for management, mentoring, and hiring.
- Led collaborations with academia (UC Berkeley, UIUC, UT Austin,
University of Washington, Georgia Tech).
|Manager, Programming Models and
Tools for Scalable Systems Group
||September 2004 - October 2009
IBM T.J. Watson Research Center, Yorktown Heights, NY.
|Job responsibilities included leading
research projects with globally distributed teams. Developed and
contributed to the IBM Research strategy in compilers and systems
Led the PERCS
(DARPA HPCS) Compilers team. As part of this effort we explored
and developed a number of technologies to improve programmer
- Continuous Program
Optimization (CPO): combine static and dynamic compilation and allow
statically compiled languages to be executed using a managed
runtime, in order to continuously monitor and adapt to changes in
the execution environment. We developed the CPO vision and
implemented several optimization prototypes that take as input
system wide monitoring and optimize an application across
runs. Papers published in PACT 2005, IBM Systems Journal, and PAC2 2004.
- The xlUPC
compiler: led the design and development of a UPC compiler for
IBM platforms, including IBM pSeries and Blue Gene. The xlUPC
compiler is the first compiler and runtime system that scales PGAS
programs to hundreds of thousands of threads. The compiler was
also used in the HPC
Challenge Productivity Competition, and won the award every
year we participated (2005, 2006, 2008, 2009).
- Initiated and supervised the development of math libraries for
high performance linear algebra.
- Drove the design and development of tracing libraries for the
monitoring of the full execution stack. Contributed to
the performance counter design in the IBM pSeries
processors, OS tracing infrastructure, and application monitoring
Led exploratory projects in parallel programming models and
parallel languages to improve programmer productivity:
- Asynchronous execution to improve parallel execution: load
balancing through work stealing, performance portability,
- Transactional Memory and Thread Level Speculation -- using
speculative execution to improve programmability and efficiency
Mentoring IBM employees, PhD students and student interns.
Extensive collaborations with academia.
|Research Staff Member
||July 2000 - September 2004
IBM T.J. Watson Research Center, Yorktown Heights, NY.
Participated in the design and development of system software and
performance analysis tools for massively parallel systems: Blue Gene
(DARPA HPCS). Participated in the initial phases of the
project, initially as a member of the compiler team, and starting in
2004, leading it. Established the research agenda and demonstrated
the initial feasibility for the Continuous Program Optimization, UPC
compiler, Linear Algebra Compiler, etc.
- Blue Gene.
Designed and developed a highly multithreaded simulator for the Blue
Gene/C chip. Participated in the design and developed an initial
prototype for the job launching system on Blue Gene/L. Designed and
developed an initial prototype of the xlUPC compiler for the Blue
Gene/L architecture. Designed and evaluated Thread Level Speculation
support for Blue Gene/Q. Application evaluation across multiple Blue
- Performance monitoring tools. Designed and developed internal
tools for performance analysis using the hardware performance
monitoring counters and provided feedback to the Power architecture
performance team. Developed a methodology for performance monitoring
and adaptation based on histories (see PACT 2003).
|Graduate Research Assistant
||August 1996 - June 2000
Computer Science Department, Univ. of Illinois at
Urbana-Champaign, Urbana, IL.
|Conducted research in
and Delphi projects, working on compile-time performance
prediction, data locality, and parallel programming models.
- Extended the Polaris parallelizing compiler with code generation
passes for TreadMarks
Messages. Maintained the Polaris development environment.
- Developed MATmarks, an
environment for parallel programming with MATLAB based on
TreadMarks. Obtained linear speedups for several applications on a
network of workstations.
- Worked on data locality and false sharing characterization of
applications running on multiprocessor programs. Metrics derived from
this study are to be used for both driving compiler optimizations for
locality and program performance prediction for new architectures.
- Developed a compile-time model for cache memory hierarchies that
is able to predict program memory behavior within 5% of the actual
||May 1995 - July 1996
|CyberMarche, Inc., Morgantown WV
Responsible for the design, implementation, and testing of systems
targeted towards knowledge gathering and sharing for project management.
Additional responsibilities included system administration for
a computer network consisting of Sun and IBM-PC computers.
- Enterprise Engineering Knowledge Base (EEKB), a project
targeted towards knowledge gathering and sharing in engineering
- Project Assessment and Coordination for Teams (PACT), a Project
Management System (PMS) to be used by Hitachi, Ltd.
||August 1993 - May 1995
Engineering Research Center-WVU, Morgantown, WV
Conducted research in communication scheduling algorithms and
- Designed and implemented a Software Project Model which
provides a customizable view of a project for each team involved in
software development and verification. The work was part of the
Independent Verification and Validation of Software (IV&V) Project, a
NASA-sponsored effort to develop a collaborative environment for teams
of engineers involved in software production.
- Developed an object-oriented gateway to access Oracle services
from CORBA-compliant clients as part of the Information Sharing System
(ISS) Project, a CORBA-based system for transparent access to
information stored in heterogeneous repositories. Also as part of this
project, implemented a Scheme interface to Orbix's Dynamic Invocation
Interface. The work involved programming in C++, Scheme, Oracle,
Orbix, and Web technologies on Unix platforms.
- The research for the Master's Thesis involved design and
implementation of several communication scheduling algorithms.
||June 1991 - August 1993
|IPA (Institute for Design in
Automation), Cluj-Napoca, Romania
Designed and developed production software.
- Designed and developed an object-oriented Digital
Components Simulator to be used in a Digital Boards Testing
System sold in France.
- Devised and implemented a Cartographic System for Survey
Five most relevant recent publications:
- Deoptimization for
dynamic language JITs on typed, stack-based virtual
Madhukar N. Kedlaya, Behnam Robatmili, Calin
Cascaval, Ben Hardekopf
Virtual Execution Environments (VEE 2014), Mar 2014. Best Paper Award.
- Zoomm: A Parallel Web Browser Engine for Multicore Mobile
Calin Cascaval, Seth Fowler, Pablo Montesinos, Wayne
Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and
Proceedings of The 18th ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming (PPoPP 2013),
Shenzhen, China, Feb 2013.
- How Much Parallelism is There in Irregular Applications?
Milind Kulkarni, Martin Burtscher, R. Inkulu, Keshav Pingali, Calin
Proceedings of the 14th ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, PPoPP'09,
Raleigh, NC, February 2009.
- Disambiguation of Speculative Threads in Multiprocessors
Luis Ceze, James Tuck, Calin Cascaval, and Josep Torrellas
Proceedings of the 33rd Annual International
Symposium on Computer Architecture, ISCA 2006, Boston, MA, June
Guest Editor for IEEE Micro Special Issue on Mobile Systems.
Mentored 3 PhD students and several colleagues at IBM.
Steering Committee Chair for PPoPP (2012-present).
General Chair PPoPP 2011, Computing Frontiers 2011, LCPC 2013, CPC 2009.
Program Committee Vice-Chair ICPP 2011.
Program Committee Member for ASPLOS, CGO, ICPP, ICS, IPDPS, LCPC,
MICRO, PACT, PGAS, PLDI, PPoPP, SBAC-PAD, SC, and many workshops.
Organizing Committee Member for PPoPP, PACT, LCPC, and SBAC-PAD.
Panelist for NSF.
Reviewer for numerous journals and technical conferences.