Dr. Călin Caşcaval is Director of Engineering at Google
Research, leading research in scalable distributed systems.
He is an experienced technical executive with strategic vision and a proven
record of team building, research, and product delivery in computer systems. He
has identified industry trends and defined, built, and delivered
first-of-a-kind prototypes and products, including the first P4 production
compiler and networking stack (at Barefoot Networks), the first mobile
heterogeneous computing runtime and parallel browser (at Qualcomm
Research), system software for the Blue Gene family of supercomputers
and the first UPC compiler to scale to hundreds of thousands of
processors (at IBM Research). Călin has over 50 peer-reviewed
publications and 38 awarded patents. He is an IEEE Fellow and an ACM
Senior Member.
Director of Engineering,
Google Research
| February 2020 - Present |
Google LLC.,
Auckland, New Zealand |
Leading research in scalable
distributed systems. |
|
Sr. Director, Compilers
and Tools
| September 2016 - February 2020 |
Barefoot Networks, an Intel Company,
Palo Alto, CA. |
Led the development and
productization of languages, compilers, and tools for programmable
network devices. |
- Led the development and productization of the first
domain-specific networking stack using the P4
language on the Tofino family
of programmable packet processing ASICs. Managed the compilers and
tools teams.
- Led the development of the P4
Language: worked with the p4.org community to evolve the P4
language to support customer-specific requirements; acted as
co-chair of the P4 Architecture Working Group, responsible for defining the Portable Switch Architecture.
- Defined the strategy around P4 tools and designed, developed and
productized the P4 Insight visualization tool.
- Worked with internal Barefoot Networks teams to define the architecture of several generations of Tofino.
|
|
Sr. Director of Engineering
(2013-2016), Director (2009-2013)
| October 2009 - September 2016 |
Qualcomm
Research Silicon Valley, Santa Clara, CA. |
Led Qualcomm's Power Aware Computing
strategy. |
- Led projects on heterogeneous mobile computing, including the
Snapdragon
Heterogeneous Compute SDK, and parallel libraries, e.g., best-in-class
ARM math libraries
(Snapdragon
Math Libraries).
- Led the development of the first
end-to-end parallel
browser and parallel
JavaScript engine for mobile devices.
- Responsible for management, mentoring, and hiring.
- Led collaborations with academia (UC Berkeley, UIUC, UT Austin,
University of Washington, Georgia Tech).
|
|
Manager, Programming Models and
Tools for Scalable Systems Group
| September 2004 - October 2009 |
IBM
T.J. Watson Research Center, Yorktown Heights,
NY. |
Job responsibilities included leading
research projects with globally distributed teams. Developed and
contributed to the IBM Research strategy in compilers and systems
software. |
Led the PERCS
(DARPA HPCS) Compilers team. As part of this effort we explored
and developed a number of technologies to improve programmer
productivity:
- Continuous Program Optimization (CPO): combine static and dynamic
compilation and allow statically compiled languages to be executed using
a managed runtime, in order to continuously monitor and adapt to changes
in the execution environment. We developed the CPO vision and
implemented several optimization prototypes that take system-wide
monitoring data as input and optimize an application across
runs. Papers published
in PACT
2005, IBM Systems
Journal, and PAC2 2004.
- The xlUPC
compiler: led the design and development of a UPC compiler for
IBM platforms, including IBM pSeries and Blue Gene. The xlUPC
compiler is the first compiler and runtime system that scales PGAS
programs to hundreds of thousands of threads. The compiler was
also used in the HPC
Challenge Productivity Competition, and won the award every
year we participated (2005, 2006, 2008, 2009).
- Initiated and supervised the development of math libraries for
high performance linear algebra.
- Drove the design and development of tracing libraries for the
monitoring of the full execution stack. Contributed to
the performance counter design in the IBM pSeries
processors, the OS tracing infrastructure, and application monitoring.
Led exploratory projects in parallel programming models and
parallel languages to improve programmer productivity:
- Asynchronous execution to improve parallel execution: load
balancing through work stealing, performance portability,
acceleration support
- Transactional Memory and Thread Level Speculation -- using
speculative execution to improve programmability and efficiency
Mentored IBM employees, PhD students, and student interns.
Extensive collaborations with academia.
|
|
Research Staff Member
| July 2000 - September 2004 |
IBM
T.J. Watson Research Center, Yorktown Heights,
NY. |
Participated in the design and development of system software and
performance analysis tools for massively parallel systems: Blue Gene
and PERCS. |
- PERCS (DARPA HPCS): Participated in the initial phases of the project,
first as a member of the compiler team and, starting in 2004, as its
lead. Established the research agenda and demonstrated the initial
feasibility of Continuous Program Optimization, the UPC compiler, the Linear
Algebra Compiler, etc.
- Blue Gene.
Designed and developed a highly multithreaded simulator for the Blue
Gene/C chip. Participated in the design and developed an initial
prototype for the job launching system on Blue Gene/L. Designed and
developed an initial prototype of the xlUPC compiler for the Blue
Gene/L architecture. Designed and evaluated Thread Level Speculation
support for Blue Gene/Q. Evaluated applications across multiple Blue
Gene families.
- Performance monitoring tools. Designed and developed internal tools
for performance analysis using the hardware performance monitoring counters
and provided feedback to the Power architecture performance team. Developed
a methodology for performance monitoring and adaptation based on
histories
(see PACT 2003
paper).
|
|
Graduate Research Assistant
| August 1996 - June 2000 |
Computer
Science Department, Univ. of Illinois at
Urbana-Champaign, Urbana, IL. |
Conducted research in
the Polaris
and Delphi projects, working on compile-time performance
prediction, data locality, and parallel programming models.
|
- Extended the Polaris parallelizing compiler with code generation
passes for TreadMarks
and Fast
Messages. Maintained the Polaris development environment.
- Developed MATmarks, an
environment for parallel programming with MATLAB based on
TreadMarks. Obtained linear speedups for several applications on a
network of workstations.
- Worked on data locality and false sharing characterization of
applications running on multiprocessor systems. Metrics derived from
this study are to be used for both driving compiler optimizations for
locality and program performance prediction for new architectures.
- Developed a compile-time model for cache memory hierarchies that
is able to predict program memory behavior within 5% of the actual
cache behavior.
|
|
Research Associate
| May 1995 - July 1996 |
CyberMarche, Inc., Morgantown, WV |
Responsible for the design, implementation, and testing of systems
targeted towards knowledge gathering and sharing for project management.
|
- Enterprise Engineering Knowledge Base (EEKB), a project
targeted towards knowledge gathering and sharing in engineering
environments.
- Project Assessment and Coordination for Teams (PACT), a Project
Management System (PMS) to be used by Hitachi, Ltd.
Additional responsibilities included system administration for
a computer network consisting of Sun and IBM-PC computers.
|
|
Graduate Research
Assistant
| August 1993 - May 1995 |
Concurrent
Engineering Research Center-WVU, Morgantown, WV |
Conducted research in communication scheduling algorithms and
software verification. |
- Designed and implemented a Software Project Model which
provides a customizable view of a project for each team involved in
software development and verification. The work was part of the
Independent Verification and Validation of Software (IV&V) Project, a
NASA-sponsored effort to develop a collaborative environment for teams
of engineers involved in software production.
- Developed an object-oriented gateway to access Oracle services
from CORBA-compliant clients as part of the Information Sharing System
(ISS) Project, a CORBA-based system for transparent access to
information stored in heterogeneous repositories. Also as part of this
project, implemented a Scheme interface to Orbix's Dynamic Invocation
Interface. The work involved programming in C++, Scheme, Oracle,
Orbix, and Web technologies on Unix platforms.
- The research for the Master's Thesis involved design and
implementation of several communication scheduling algorithms.
|
|
Research Associate
| June 1991 - August 1993 |
IPA (Institute for Design in
Automation), Cluj-Napoca, Romania |
Designed and developed production software. |
- Designed and developed an object-oriented Digital
Components Simulator to be used in a Digital Boards Testing
System sold in France.
- Devised and implemented a Cartographic System for Survey
Measurements.
|
|
Guest Editor for IEEE Micro Special Issue on Mobile Systems.
Mentored 3 PhD students and several colleagues at IBM.
Program Chair, PACT 2023.
Steering Committee Chair, PPoPP (2012-2018).
General Chair, PPoPP 2011, Computing Frontiers 2011, LCPC 2013, CPC 2009.
Program Committee Vice-Chair, ICPP 2011.
Program Committee Member for ASPLOS, CGO, ICPP, ICS, IPDPS, LCPC,
MICRO, PACT, PGAS, PLDI, PPoPP, SBAC-PAD, SC, and many workshops.
Organizing Committee Member for PPoPP, PACT, LCPC, and SBAC-PAD.
Panelist for NSF.
Reviewer for numerous journals and technical conferences.