# Computational Science

### General Papers

• This article is part of the Encyclopedia of Applied and Computational Mathematics. It is an overview of organizational strategies employed in scientific codes, and of how particular programming language constructs are used to implement them. I give examples from C, C++, Fortran, and Python, and from libraries written in each of them.

## Automated Finite Element Discretization and Solution of Nonlinear FEM Systems

### Linear and Nonlinear Preconditioning

Barry Smith, Peter Brune, Xuemin Tu, Jed Brown, and I are working on approaches to nonlinear preconditioning. The most important aspect here is composability, which is exactly what makes KSP/PC so useful: complex, optimal solvers can be built from simple pieces. We would like to carry this design over to the nonlinear problem. Our ideas for linear multiphysics preconditioning are contained in this paper and in some follow-on work by Peter and Jed on very efficient Jacobian approximation and lagging. We now have a PETSc implementation of nonlinear preconditioning, e.g., SNESSetNPC() sets a nonlinear preconditioner, and the implementation supports both left and right preconditioning, as well as additive and multiplicative combinations of nonlinear solvers. Our paper detailing the preconditioners has just been submitted. You can also see an example of me using this in a recent talk.

This paper describes the PETSc approach to linear preconditioning for multiphysics systems. We show that our existing preconditioners can be composed to generate all previously published block preconditioners, and also that this relationship can be inverted so that block preconditioners can be used on each level of multigrid or FAS. This talk demonstrates how to construct these preconditioners for the Stokes equation, and here I construct an unstructured FAS cycle for the p-Laplacian using only command line options. We have the ability to attach auxiliary operators to the DM so that they can be used to precondition blocks of the system. Building on the FEM pieces described below, we will soon be able to form analytic approximations to operator blocks.

For several years now, I have been developing the BIBEE approximation for boundary integral operators with Jay Bardhan. This strategy approximates the surface-to-surface operator directly, and thus can provide good bounds for both the large and small eigenvalues. It is detailed in the section on biological applications below.
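Returning to the nonlinear solver composition described at the start of this section, here is a minimal sketch (not code from the submitted paper; it assumes an outer SNES whose residual and options are already configured) of attaching a nonlinear preconditioner through the PETSc API:

```c
#include <petscsnes.h>

/* Minimal sketch: attach an inner SNES as a nonlinear preconditioner for an
   outer solver. The same composition can be selected entirely from the command
   line using the npc_ options prefix, e.g. -npc_snes_type nrichardson. */
PetscErrorCode AttachNonlinearPC(SNES snes)
{
  SNES           npc;
  PetscErrorCode ierr;

  ierr = SNESGetNPC(snes, &npc);CHKERRQ(ierr);            /* PETSc creates the inner solver on demand    */
  ierr = SNESSetType(npc, SNESNRICHARDSON);CHKERRQ(ierr); /* use nonlinear Richardson as the inner solve */
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);          /* command-line options can still override     */
  return 0;
}
```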

### Discretization

I have recently introduced PETSc abstractions for quadrature, function spaces, dual spaces, and finite elements. Good examples of their use in PETSc are the Poisson equation with a nonlinear coefficient and the Stokes equation, on both simplicial and hexahedral meshes. Jed and I have also produced a second-order TVD finite volume example that can solve advection, shallow water, and the Euler equations with a variety of limiters and least-squares reconstruction. The discretization interface plays well with the DM interface for topology representation. This lets the user specify boundary conditions on any marked part of the boundary, automatically refine the mesh in parallel, create interpolation operators for the mesh hierarchy, and extract submanifolds for computation or visualization. In fact, SNES ex12 showcases an automatically assembled FAS (nonlinear multigrid) scheme for the p-Laplacian. The user need only specify the equations and boundary conditions as pointwise functions, and PETSc can assemble the rest. This is very much like the FEniCS approach. For example, the pointwise functions for Stokes are:

```c
void f1u(PetscScalar u[], PetscScalar gradU[], PetscScalar a[], PetscScalar gradA[], PetscReal x[], PetscScalar f1[])
{
  PetscInt comp, d;

  for (comp = 0; comp < Ncomp; ++comp) {
    for (d = 0; d < dim; ++d) f1[comp*dim+d] = gradU[comp*dim+d]; /* (grad v, grad u) */
    f1[comp*dim+comp] -= u[Ncomp];                                /* (grad v, -p)     */
  }
}

void f1p(PetscScalar u[], PetscScalar gradU[], PetscScalar a[], PetscScalar gradA[], PetscReal x[], PetscScalar f0[])
{
  PetscInt d;

  f0[0] = 0.0;
  for (d = 0; d < dim; ++d) f0[0] += gradU[d*dim+d];
}
```

And the boundary condition can be specified using:

```c
void quadratic_u_2d(const PetscReal x[], PetscScalar *u, void *ctx)
{
  u[0] = x[0]*x[0] + x[1]*x[1];
  u[1] = 2.0*x[0]*x[0] - 2.0*x[0]*x[1];
}
```

I collaborate with Ridgway Scott, Andy Terrel, Peter Brune, and Robert Kirby on automation of finite element discretization, as well as on better integration of mesh abstraction with discretization and solver routines. I am also involved with the FEniCS project at Simula Research, working in particular with Hans Petter Langtangen and Anders Logg. The overarching theme for this work is that automation in scientific computing, exemplified by compilers, should be raised to a higher level of abstraction. By confining ourselves to a well-understood domain, in this case finite element methods, we can make the symbolic problem tractable. A prime example of this program is the FEniCS Form Compiler (FFC), which I use to generate GPU integration code in this paper.
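To make the pointwise-function interface above more concrete, here is a hypothetical sketch (not code from SNES ex12) of the flux term for the p-Laplacian, written in the same style as the Stokes functions; `dim` and the exponent `p` are assumed to be available as constants:

```c
/* Hypothetical sketch: the p-Laplacian flux term |grad u|^{p-2} grad u for a
   single scalar field, in the same pointwise style as f1u above.
   "dim" and the exponent "p" are assumed constants. */
void f1_pLaplacian(PetscScalar u[], PetscScalar gradU[], PetscScalar a[], PetscScalar gradA[], PetscReal x[], PetscScalar f1[])
{
  PetscScalar sqrGradU = 0.0;
  PetscInt    d;

  for (d = 0; d < dim; ++d) sqrGradU += gradU[d]*gradU[d];
  for (d = 0; d < dim; ++d) f1[d] = PetscPowScalar(sqrGradU, 0.5*(p - 2.0))*gradU[d];
}
```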

### PETSc

I have been a member of the PETSc Team since 1997. PETSc is an open-source software package designed for the solution of partial differential equations and used in applications as diverse as surgery and environmental remediation. PETSc is the most complete, most flexible, most powerful, and easiest-to-use platform for solving the nonlinear algebraic equations arising from the discretization of partial differential equations. It is designed to allow engineers and scientists to perform large-scale numerical simulations of physical phenomena rapidly and efficiently. Representative simulations to date include fluid flow for aircraft, ship, and automobile design; blood flow simulation for medical device design; porous media flow for oil reservoir simulation and groundwater contamination modeling; materials properties; economic modeling; structural mechanics for construction design; combustion modeling; and nuclear fission and fusion for energy development. These simulations allow the effects of design decisions to be evaluated and compared, including cost benefits, safety concerns, and environmental impact.

PETSc has enabled simulations by industry, universities, and research laboratories, from desktop systems to the most powerful supercomputers, and for critical applications affecting the nation's energy, environment, and health. A testament to its widespread recognition is that PETSc has had over 400 downloads per month during the past three years. A Google search for "PETSc" on January 6, 2009, showed 199,000 pages referencing PETSc, and a search of Google Books found 662 books that reference it. PETSc won an R&D 100 Award in 2009, and PETSc solvers were cited as one of the Top Ten Computational Science Accomplishments of the U.S. Department of Energy (DOE) in 2008. Two HPC hardware companies, Cray and SiCortex, provide and support PETSc with their products. Several commercial simulation packages, including FIDAP 8.5, TigerCAD, and RF3P, use PETSc for their algebraic solvers, as do products from the companies Tech-X and Actel. Microsoft has distributed previous versions of PETSc with its HPC software suite, the Microsoft Cluster CD. PETSc has been used by Boeing for computational fluid dynamics simulations and by Shell for solving inverse problems in oil reservoir management. CFDResearch in Huntsville, Alabama, uses PETSc in its fluid simulations. PETSc solvers have also been used at the U.S. Army Engineer Research and Development Center and by the South Florida Water Management District for modeling the Everglades.

### Papers

• This paper explains our framework for combining nonlinear solvers to generate scalable multilevel solvers for complex multiphysics problems. We develop both left and right nonlinear preconditioning, and discuss how previous work fits into our scheme. We have a composable implementation in PETSc so that all these new solvers can be built dynamically from the command line, just like PETSc linear preconditioners. We show examples of convergence acceleration using lid-driven cavity flow, large deformation elasticity, and the p-Laplacian.
• This paper describes the PETSc approach to multiphysics preconditioning. Using the PCFIELDSPLIT class, we are able to produce all the well-known block, or "physics-based", preconditioners. Furthermore, our design allows these block preconditioners to commute with multigrid, so that we can provide segregated KKT smoothers using the same infrastructure. In fact, the blocks given to PCFIELDSPLIT can be arbitrary, so this can also incorporate domain decomposition.
• In this paper, we introduce a novel splitting of the geometric and analytic parts of an element matrix. The geometric part changes for each cell, but is independent of the particular weak form. The analytic part depends on the weak form, but is independent of the mesh and can be calculated once and stored. We also address the question of optimizing the evaluation of these matrices. By finding redundant computations, we are able to significantly reduce the cost of building local stiffness matrices for the Laplace operator and for the trilinear form for Navier--Stokes operators. For the Laplace operator in two space dimensions, we have developed a heuristic graph algorithm that searches for such redundancies and generates code for computing the local stiffness matrices. Up to cubics, we are able to build the stiffness matrix on any triangle in less than one multiply-add pair per entry. Up to sixth degree, we can do it in less than about two pairs. Preliminary low-degree results for Poisson and Navier-Stokes operators in three dimensions are also promising.
• In this paper, we describe the implementation of general finite element methods using the Sieve API. The intention is to write a single engine capable of supporting simplicial and hexahedral meshes, in any dimension, with an arbitrary element and weak form, coupled to scalable PETSc solvers. We believe this is possible using Sieve, FEniCS, and PETSc.
• We have written a paper providing a provably O(N) multigrid method based upon the coarsening developed by Dafna Talmor at CMU with her advisor Gary Miller and Shang-Hua Teng. The 3D implementation cannot yet be proven to be O(N), but our results exhibit the optimal behavior.
• In this paper, we describe the Sieve API for unstructured meshes, both representation and manipulation. It is very concise and can handle any cell complex, or anything that can be described with a Hasse diagram. As an example, we describe distributing meshes over a set of parallel processes, and also associated data, to demonstrate the power of the interface. We give examples of both hexahedral and simplicial meshes in 2D and 3D.
• In this paper, we present some of the mathematical abstractions employed by automated modeling that allow a user to switch between finite elements, linear solvers, mesh refinement and geometry, and weak forms with very few modifications to the code. To evaluate the modularity provided by one of these abstractions, namely switching finite elements, we provide a numerical study based upon many different discretizations of the Stokes equations. We also analyze the quality of the approximation with several measures.
• In Exponential grids in high-dimensional space, Peter Brune, Ridgway Scott, and I show how to use prismatic elements in high dimensions to solve some simple problems in electronic structure explicitly.

## GPU Computing

The combination of low power consumption, large memory bandwidth, high flop rate, and low cost makes GPUs a promising platform for desktop scientific computing. Their large latency is problematic for strong scaling, but there are many candidate applications for this type of throughput device, especially since interconnect latency looks set to drop rapidly. Moreover, it is likely that a large fraction of future supercomputers will also employ accelerators (as 3 of the top ten machines in the world already do). Over the last four decades of scientific computing, the most successful strategy for accommodating new hardware has been the production of high-quality libraries, such as MPI and PETSc. We propose to develop not only parallel algebraic solvers, but also discretization and multiphysics libraries, suitable for hybrid machines with significant accelerator components. Our strategy is to use a combination of low-level library routines and code generation techniques to produce implementations of our higher-level API. We use the Thrust and Cusp libraries from NVIDIA to provide linear algebra, shown here, and more basic operations such as sorting and keyed reductions, used in this paper. For operations that are more variable, such as those that depend on the weak form, we use code generation. The key point is to circumscribe the domain so that the generation process becomes tractable. In generating FEM integration routines, we produce not only different block sizes, but also different organizations of memory writes and computation. With Andy Terrel and Karl Rupp, we are exploring code transformations which will enable us to automatically optimize more general calculations. In particular, we are reorganizing traversal, and using code transformation tools like Ignition, to explore quadrature-based methods for FEM residual and Jacobian evaluation. In our upcoming paper, we show that excellent vectorization is possible with CUDA and OpenCL, even for forms with variable coefficients.

### Papers

• We have just completed a paper with Karl Rupp and Andy Terrel using a pattern we call thread transposition, which allows us to evaluate integrals efficiently using quadrature without reductions among threads, while simultaneously vectorizing over quadrature points and basis functions. We apply this to FEM residual and Jacobian evaluation.
• We are working on a paper with Andy Terrel and Nathan Bell on assembly of sparse matrices from element matrices on the GPU.
• We have published a paper with Andy Terrel in TOMS on integration of weak forms on affine mesh geometries using exact integration rather than quadrature. We use the geometric-analytic splitting first developed in this paper, and focus on low-order discretizations, which currently lack efficient integration strategies. We show results for the P1 Laplacian in 3D and the elasticity operator in 2D, and achieve more than 100 GF.
• A description of the GPU implementation in PETSc was written with Victor Minden and Barry Smith. PETSc is organized as a class library with classes for vectors, matrices, Krylov methods, preconditioners, nonlinear solvers, and differential equation integrators. A new subclass of the vector class has been introduced that performs its operations on NVIDIA GPUs, along with a new sparse matrix subclass that performs matrix-vector products on the GPU. The Krylov methods, nonlinear solvers, and integrators in PETSc run unchanged in parallel using these new subclasses, which can be used transparently from existing PETSc application codes in C, C++, Fortran, or Python. The implementation uses the Thrust and Cusp C++ packages from NVIDIA.
• We developed an efficient method to calculate a high-dimensional (6D) contraction with substantial memory reuse. This kind of operation arises in the computation of a nonlinear averaging operation used in classical DFT for biological systems; in particular, it calculates the reference density in the RFD method of D. Gillespie.
• Larry Zheng developed a structured MG code which solves the variable-viscosity Stokes equations using the discretization method of T. Gerya.
• We have published a paper using FMM to solve a boundary element problem arising in bioelectrostatics on 512 GPUs of the DEGIMA cluster at Nagasaki. The BIBEE approximation is used to accelerate the BEM solve. The applications demonstrated include the electrostatics of protein--drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of lysozyme molecules. The parallel scalability of the software was studied using 128 nodes, each with 4 GPUs. Delicate tuning has resulted in strong scaling with a parallel efficiency of 0.5 for 512 GPUs. The largest application run, with over 20 million atoms and one billion unknowns, required only one minute on 512 GPUs.

## The Fast Multipole Method and Radial Basis Function Interpolation

I collaborate with Felipe Cruz on the PetFMM code for simulation using the fast multipole method. The main idea is that a very simple serial implementation of FMM can be combined with optimization techniques from graph partitioning to produce an efficient and scalable parallel FMM. PetFMM completely reuses the serial code, and also uses templating to optimize the implementation: it templates over the structure used to describe sources and targets, the neighborhood stencil, the expansion order, and the dimension. We are able to guide the optimization routine with good theoretical estimates of the communication and computation complexity; in fact, this paper will show that optimal communication complexity is achievable with PetFMM. I also work with Rio Yokota and Andreas Klöckner on a version of FMM which will automatically generate expansions given the kernel function. We believe that deriving these expansions by hand is the step that most severely limits wide adoption of the method. The implementation uses sympy for symbolic manipulation and code generation, and PyCUDA for kernel execution and organization. I also collaborate with Rio Yokota and Lorena Barba on the PetRBF code for parallel radial basis function interpolation. PetRBF uses a scalable preconditioning strategy based on small ASM blocks tailored to the function overlap. This code can interface with FMM libraries, or other fast evaluation schemes, to support large-scale simulation.

### Papers

• We have a publication detailing the main algorithms and code structure of PetFMM. It uses a novel application of graph partitioning to parallelize the tree traversal, which permits complete reuse of the serial code in the parallel implementation. Fast algorithms for the computation of $N$-body problems can be broadly classified into mesh-based interpolation methods and hierarchical or multiresolution methods. To the latter class belongs the well-known fast multipole method (FMM), which offers $\mathcal{O}(N)$ complexity. The FMM is a complex algorithm, and the programming difficulty associated with it has arguably diminished its impact by being a barrier to adoption. This paper presents an extensible parallel library for $N$-body interactions utilizing the FMM algorithm, designed with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based $N$-body algorithms in parallel, including both work and communication estimates. With this model, we are able to provide automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communication. Using a client application that computes the velocity induced by $N$ vortex particles in two dimensions, ample verification and testing of the library was performed. Strong scaling results are presented with 10 million particles on up to 256 processors, including both speedup and parallel efficiency. The largest problem size run with the PetFMM library at this point was 64 million particles on 64 processors, and the library currently achieves over 85% parallel efficiency for 64 processes. The performance study, computational model, and application demonstrations presented in this paper are limited to 2D; however, the current code is capable of large, parallel 3D runs. The software library is open source under the PETSc license, which is even less restrictive than the BSD license; this guarantees the maximum impact to the scientific community and encourages peer-based collaboration for extensions and applications.
• We have written a paper with Dave May on calculating the gravity anomaly of a density configuration using PetFMM. We describe three approaches for computing a gravity signal from a density anomaly. The first approach consists of the classical "summation" technique, whilst the remaining two methods solve the Poisson problem for the gravitational potential using either a finite element (FE) discretization employing a multilevel preconditioner, or a Green's function evaluated with the fast multipole method (FMM). The methods utilizing the Poisson formulation described here differ from previously published approaches used in gravity modeling in that they are optimal, meaning that both the memory and the computational time required scale linearly with respect to the number of unknowns in the potential field. Additionally, all of the implementations presented here are developed such that the computations can be performed in a massively parallel, distributed-memory computing environment. Through numerical experiments, we compare the methods on the basis of their discretization error, CPU time, and parallel scalability. We demonstrate the parallel scalability of all these techniques by running forward models with up to $10^8$ voxels on thousands of cores.
• We have written a paper detailing the preconditioning method used to solve the RBF interpolation problem. We developed a parallel algorithm for radial basis function (RBF) interpolation that exhibits $\mathcal{O}(N)$ complexity, requires $\mathcal{O}(N)$ storage, and scales excellently up to a thousand processes. The algorithm uses a GMRES iterative solver with a restricted additive Schwarz method (RASM) as a preconditioner and a fast matrix-vector algorithm. Previous fast RBF methods --- achieving at most $\mathcal{O}(N\log N)$ complexity --- were developed using multiquadric and polyharmonic basis functions. In contrast, the present method uses Gaussians with a small variance with respect to the domain, but with sufficient overlap. This is a common choice in particle methods for fluid simulation, our main target application. The fast decay of the Gaussian basis function allows rapid convergence of the iterative solver even when the subdomains in the RASM are very small. At the same time, we show that the accuracy of the interpolation can achieve machine precision. The present method was implemented in parallel using the PETSc library. Numerical experiments demonstrate its capability in problems of RBF interpolation with more than 50 million data points, timing at 106 seconds (19 iterations for an error tolerance of $10^{-15}$) on 1024 processors of a Blue Gene/L (700 MHz PowerPC processors). The parallel code is freely available as open source.
• We are writing a paper with Dave May about the scalability of PetFMM.
• We are writing a paper about the optimality of partitioning in PetFMM.

# Applications

## Crustal Dynamics

I am a co-author of the PyLith code with Brad Aagaard and Charles Williams. We have developed open-source finite-element software for 2-D and 3-D dynamic and quasi-static modeling of crustal deformation. This software, PyLith (current release 2.0), can be used for quasi-static viscoelastic-plastic modeling or for dynamic spontaneous rupture and/or ground-motion modeling. Unstructured and structured finite-element discretizations provide support for spatial scales ranging from tens of meters to hundreds of kilometers, with temporal scales ranging from milliseconds to minutes in dynamic problems and from minutes to thousands of years in quasi-static problems. PyLith development is part of the NSF-funded Computational Infrastructure for Geodynamics (CIG), and the software runs on a wide variety of platforms (laptops, workstations, clusters, massively parallel machines). Binaries (for Linux, Darwin, and Windows systems) and source code are available from geodynamics.org. PyLith uses a suite of general, parallel graph data structures from PETSc, called DMPlex, for storing and manipulating finite-element meshes. This permits use of a variety of 2-D and 3-D cell types, including triangles, quadrilaterals, hexahedra, and tetrahedra. It also allows faithful representation of faults, automatic parallel refinement, and selection of arbitrary surfaces and volumes for output. Current features include prescribed fault ruptures with multiple earthquakes and aseismic creep, spontaneous fault ruptures with a variety of fault constitutive models, time-dependent Dirichlet and Neumann boundary conditions, absorbing boundary conditions, time-dependent point forces, and gravitational body forces. PyLith supports infinitesimal and small-strain formulations for linear elastic rheologies, linear and generalized Maxwell viscoelastic rheologies, power-law viscoelastic rheologies, and Drucker-Prager elastoplastic rheologies. Current development focuses on coupling quasi-static and dynamic simulations to resolve multi-scale deformation across the entire seismic cycle, and on coupling elasticity to heat and/or fluid flow.

### Papers

• Our upcoming paper will detail the use of quadratic elements, an improved fault formulation, and a general finite element residual formulation.
• The current PyLith paper concerns the fault friction implementation. Each aspect of the model, algorithms, and code is examined to show the design decisions that were made to support the fault model. We show good algorithmic scaling, although it is not yet constant, and good parallel scalability outside of the ML AMG setup phase.

## Magma Dynamics

I am working with Richard Katz and Marc Spiegelman on magma dynamics problems, including mid-ocean ridge spreading. Magmatic plate boundaries (mid-ocean ridges, subduction zones) and intra-plate hot spots are among the most important and complex dynamic and chemical systems in the solid earth. While volumetrically small, these regions have global consequences for mantle flow and plate tectonics through the spontaneous localization of weak plate boundaries. Magmatism in these regions also provides the dominant mechanism for geochemical fractionation, mixing, and sampling of the Earth's mantle. Many of the observational constraints on both plate-boundary processes and global mantle dynamics come from regional studies of lavas from mid-ocean ridges, subduction zones, and oceanic hot spots, which form focus areas for many initiatives such as GeoPRISMS, EarthScope, and the core EAR and MGG Geophysics programs.

Optimal preconditioners for many multiphysics problems, such as variable-viscosity Stokes and thermal convection, arise by combining existing optimal preconditioners for the individual blocks of the Jacobian associated with a given physics. For instance, a classic Stokes preconditioner applies multigrid to the momentum balance and a scaling to the mass balance, combining them multiplicatively using one block elimination step. The PETSc composable solvers allow a user to define different preconditioners for each equation block, and also different combination methods between sets of equations, including Schur complements, all of which can be specified dynamically on the command line without any code changes. Thus, the same code that runs full LU to verify a solution can also scale optimally on the largest machines available. Moreover, we can choose to split at various levels: multigrid could be used for the overall Stokes system, with splitting used inside each smoother (so-called segregated KKT smoothers), inverting the classical scheme above. Finally, we can use split schemes to precondition the fully coupled problem.
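As a rough illustration (a minimal sketch, not code from any of the papers here; it assumes index sets `isU` and `isP` identifying the velocity and pressure unknowns have already been built), the programmatic analogue of these command-line choices looks like:

```c
#include <petscksp.h>

/* Minimal sketch: set up a Schur-complement fieldsplit preconditioner for a
   Stokes-type saddle point system. "isU" and "isP" are assumed index sets for
   the velocity and pressure unknowns. Equivalent choices can be made on the
   command line with -pc_type fieldsplit -pc_fieldsplit_type schur, with the
   per-block solvers configured via the -fieldsplit_velocity_* and
   -fieldsplit_pressure_* option prefixes. */
PetscErrorCode SetupStokesFieldSplit(KSP ksp, IS isU, IS isP)
{
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "velocity", isU);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "pressure", isP);CHKERRQ(ierr);
  ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* allow command-line overrides */
  return 0;
}
```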

### Papers

• In this paper, we describe the implementation of a subduction simulation using PETSc, and describe the mapping of geodynamics problems to PETSc components. Geodynamics simulations are characterized by rheological nonlinearity, localization, three-dimensional effects, and separate but interacting length scales. These features represent a challenge for computational science. We discuss how a leading software framework for advanced scientific computing (the Portable Extensible Toolkit for Scientific Computation, PETSc) can facilitate the development of geodynamics simulations. To illustrate our use of PETSc, we describe simulations of (i) steady-state, non-Newtonian passive flow and thermal structure beneath a mid-ocean ridge, (ii) magmatic solitary waves in the mantle, and (iii) the formation of localized bands of high porosity in a two-phase medium being deformed under simple shear. We highlight two supplementary features of PETSc, structured storage of application parameters and self-documenting output, that are especially useful for geodynamics simulations.

## Electrostatics and Classical Density Functional Theory for Biological Systems

Electrostatic interactions play an essential role in the structure and function of biomolecules (proteins, DNA, cell membranes, etc.). One of the most challenging aspects of understanding these interactions is the fact that biologically active molecules are almost always in solution---that is, they are surrounded by water molecules and dissolved ions. These solvent molecules add many thousands or even millions more degrees of freedom to any theoretical study, many of which are of only secondary importance for the investigations of interest. We model the solvent using a discontinuous dielectric coefficient, and solve the Poisson equation directly using a boundary-integral formulation and the boundary-element method (BEM) to determine the induced charge distribution on the molecular surface, which accounts for the change in polarization charge across the dielectric boundary. I am collaborating with Jay Bardhan to develop the BIBEE approximation algorithm for electrostatics problems with a discontinuous dielectric coefficient in complex geometries. BIBEE approximates the operator arising from a boundary integral discretization of the Poisson equation, giving rigorous upper bounds and effective lower bounds for the solvation energy.

I am also collaborating with Dirk Gillespie and Robert Eisenberg at Rush University Medical Center on the modeling of biological ion channels using classical density functional theory of fluids. Since its inception 30 years ago, classical density functional theory (DFT) of fluids has developed into a fast and accurate theoretical tool for understanding the fundamental physics of inhomogeneous fluids. To determine the structure of a fluid, DFT minimizes a free energy functional of the fluid density by solving the Euler-Lagrange equations for the inhomogeneous density profiles of all the particle species. When solving on a computer, the density can be discretized (called free minimization), or a parameterized functional form such as Gaussians can be assumed (called parameterized minimization). This approach has been used to model freezing, electrolytes, colloids, and charged polymers in confining geometries and at liquid-vapor interfaces. DFT is different from direct particle simulations, where the trajectories of many particles are followed over long times to compute averaged quantities of interest (e.g., density profiles); DFT computes these ensemble-averaged quantities directly. However, developing an accurate DFT is difficult and not straightforward. In fact, new, more accurate DFTs are still being developed for such fundamental systems as hard-sphere fluids, electrolytes, and polymers. Our group has already applied one-dimensional DFT to biological problems involving ion channel permeation, successfully matching and predicting experimental data. We have extended this to 3D using efficient numerical schemes for functional evaluation and nonlinear system solution.
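In schematic form (the textbook classical DFT statement, not the particular functional used in our work), the Euler-Lagrange condition for each species $i$ reads $$\frac{\delta \Omega[\{\rho_j\}]}{\delta \rho_i(\mathbf{r})} = 0 \quad\Longleftrightarrow\quad k_B T \ln\!\left(\Lambda_i^3 \rho_i(\mathbf{r})\right) + \frac{\delta F_{\mathrm{ex}}[\{\rho_j\}]}{\delta \rho_i(\mathbf{r})} + V_i^{\mathrm{ext}}(\mathbf{r}) = \mu_i,$$ where $\Omega$ is the grand potential, $F_{\mathrm{ex}}$ is the excess free energy functional, $V_i^{\mathrm{ext}}$ is the external potential, $\Lambda_i$ is the thermal de Broglie wavelength, and $\mu_i$ is the chemical potential. The coupled nonlinear equations for the profiles $\rho_i(\mathbf{r})$ are the nonlinear system referred to above.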

### Software

1. Ellipsoidal Potential Theory contains both MATLAB and Python implementations of ellipsoidal harmonic expansions for solving problems of potential theory using separation of variables.
2. The DFT development repository

## Nonlinear Wave Mechanics

I am collaborating with David Ketcheson, Aron Ahmadia, and Kyle Mandli on nonlinear wave mechanics and the development of the PyClaw framework, of which PetClaw is the parallel extension. The PyClaw design is based upon a loosely coupled model for simulation software, in which computational operations are wrapped in a higher-level language, here Python, and spliced together. The glue in Python is numpy, a lightweight library which allows no-copy access across languages to multidimensional arrays. With this, Clawpack could be joined to PETSc efficiently using only Python code: it took 100 lines of Python to parallelize PyClaw, subclassing only three classes.

### Papers

• We have written a paper describing the architecture of PyClaw, which is based on PETSc and Clawpack. The simplicity, capability, and performance of this approach are demonstrated through application to example problems in shallow water flow, compressible flow, and elasticity. We show excellent scaling up to 65,536 cores on KAUST's Shaheen BG/P.
• PetClaw: A scalable parallel nonlinear wave propagation solver for Python, Amal Alghamdi, Aron Ahmadia, David I. Ketcheson, Matthew G. Knepley, Kyle T. Mandli, and Lisandro Dalcin, ACM/SIGSIM HPC 2011 (SpringSim '11), April, 2011.

## Fracture Mechanics

I am collaborating with Blaise Bourdin to produce a scalable code to simulate fracture mechanics. In the area of brittle fracture, there has been tremendous progress in recent years, but some issues remain problematic. The most widely accepted theories, based on Griffith's criterion, are limited to the propagation of an isolated, pre-existing crack along a given path. Extending Griffith's theory into a global minimization principle, while preserving its essence, the concept of energy restitution between surface and bulk terms, G. Francfort and J.-J. Marigo proposed a new formulation for the brittle fracture problem, capable of predicting the creation of new cracks, their paths, and their interactions, in two and three space dimensions. The essence of this model is the global minimization, over all crack sets $K$ and all admissible displacement fields $u$, of an energy of the form $$\mathcal{E} = \int_{\Omega \setminus K} W(\mathrm{e}(u))\,dx + \alpha \mathcal{H}^{N-1}(K\cap \bar{\Omega}),$$ where $W$ denotes an elastic potential and $\mathcal{H}^{N-1}$ denotes the $(N-1)$-dimensional Hausdorff measure. The numerical minimization of such a functional is of course challenging. The numerical experiments on this page are based on the concept of variational approximation of functionals by $\Gamma$-convergence, and are described in more detail here. Below, I have embedded some videos of our largest runs to date. They simulate thermal fracture arising from Newtonian cooling; the simulation is fully three-dimensional, uses 24M elements, and runs on 2400 processors of Lonestar at TACC. The full simulation was completed in just 6 hours.
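One standard regularization of this type (written here schematically; the functional used in our computations may differ in its details) is the Ambrosio-Tortorelli approximation, in which the crack set $K$ is replaced by a smooth phase field $v$ taking values near $0$ on the crack and near $1$ elsewhere, $$\mathcal{E}_\varepsilon(u,v) = \int_{\Omega} v^2\, W(\mathrm{e}(u))\,dx + \alpha \int_{\Omega} \left( \frac{(1-v)^2}{4\varepsilon} + \varepsilon\, |\nabla v|^2 \right) dx,$$ which $\Gamma$-converges to $\mathcal{E}$ as $\varepsilon \to 0$, so that minimizers of $\mathcal{E}_\varepsilon$ approximate minimizers of the fracture energy.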

# The Wonderful People Who Pay My Salary

• NSF Grant OCI-1147680: SI2-SSE: SPIKE-An Implementation of a Recursive Divide-and-Conquer Parallel Strategy for Solving Large Systems of Linear Equations
• U.S. DOE Contract DE-AC02-06CH11357: Extending PETSc's Composable Hierarchically Nested Linear Solvers
• NSF Grant EAR-0949446: The Computational Infrastructure for Geodynamics
• U.S. DOE Contract DE-AC01-06CH11357: Nonlinear Algorithms to Circumvent the Memory Bandwidth Limitations of Implicit PDE Simulations
• U.S. Army Research Office Contract W911NF-09-1-0488: Dielectric Interfaces in Classical Density Functional Theory
• NSF Grant OCI-0850680: Mechanical Transformation of Knowledge to Libraries


# Papers

Community: My Mendeley Profile and ORCID site.

## Peer-Reviewed Conference Papers

• Preliminary Implementation of PETSc Using GPUs, with Victor Minden and Barry Smith, GPU Solutions to Multi-scale Problems in Science and Engineering, Lecture Notes in Earth System Sciences, pp. 131-140, 2013. [Springer] [ANL Preprint]
• Accurate Evaluation of Local Averages on GPGPUs, with Dmitry A. Karpeev and Peter R. Brune, GPU Solutions to Multi-scale Problems in Science and Engineering, Lecture Notes in Earth System Sciences, pp. 487-501, 2013. [Springer]
• GPU Implementation of Multigrid Solver for Stokes equation with strongly variable viscosity, with Larry Zheng, Taras V. Gerya, David A. Yuen, Huai Zhang, and Yaolin Shi, GPU Solutions to Multi-scale Problems in Science and Engineering, Lecture Notes in Earth System Sciences, pp. 321-333, 2013. [Springer] [MSI]
• Why scientists and engineers need GPUs today, with David A. Yuen, GPU Solutions to Multi-scale Problems in Science and Engineering, Lecture Notes in Earth System Sciences, pp. 3-11, 2013. [Springer]
• Composable linear solvers for multiphysics, with Jed Brown, Dave A. May, Lois C. McInnes, Barry F. Smith, 11th International Symposium on Parallel and Distributed Computing (ISPDC 2012), 2012. Also Preprint ANL/MCS-P2017-0112, Argonne National Laboratory. [ANL]
• Amal Alghamdi, Aron Ahmadia, David I. Ketcheson, Matthew G. Knepley, Kyle T. Mandli, and Lisandro Dalcin. PetClaw: A scalable parallel nonlinear wave propagation solver for Python. In Proceedings of SpringSim 2011. ACM, 2011.
• Blaise Bourdin, Matthew G. Knepley, and C. Maurini. Numerical simulation of reservoir stimulation - a variational approach. In Proceedings of the 37th Stanford Geothermal Workshop, Stanford, CA, 2010. [SGW]
• Blaise Bourdin, Matthew G. Knepley, and C. Maurini. Secondary thermal cracks in EGS: a variational approach. In Proceedings of the 34th Annual Meeting of the Geothermal Resources Council, Sacramento, CA, 2010. [LSU].
• Felipe A. Cruz, Lorena A. Barba, and Matthew G. Knepley. Fast multipole method for particle interactions: an open source parallel library component. In Tromeur-Dervout et. al., editor, Proceedings of ParCFD2008. Elsevier, 2008.
• Charles A. Williams, Carl Gable, Bradford H. Hager, Brendan Meade, Brad Aagaard, and Matthew G. Knepley. Modeling of multiple earthquake cycles in southern California using the SCEC community fault model. In Proceedings of Geosciences '08, Wellington, NZ, November 2008.
• Matthew G. Knepley, Vivek Sarin, and Ahmed H. Sameh. Multilevel preconditioning for parallel CFD. In International Conference On Preconditioning Techniques For Large Sparse Matrix Problems In Industrial Applications, Minneapolis, MN, June 1999.
• Matthew Knepley, Ahmed H. Sameh, and Vivek Sarin. Design of large scale parallel simulations. In Proceedings of Parallel CFD '99. Elsevier, 1999.
• Matthew Knepley and Vivek Sarin. Algorithm development for large scale computing. In Proceedings of the SIAM Workshop on Object Oriented Methods for Inter-operable Scientific and Engineering Computing. SIAM, 1999. [Google Books].
• Matthew Knepley and Denis Vanderstraeten. Parallel building blocks for finite element simulations: Application to solid-liquid mixture flows. In Proceedings of Parallel CFD '97, pages 281-287. Elsevier, 1998.
• Matthew Knepley, Ahmed H. Sameh, and Vivek Sarin. Parallel simulation of particulate flows. In Solving Irregularly Structured Problems in Parallel, volume 1457 of Lecture Notes in Computer Science, pages 226-237, 1998.

## Book Chapters

• A.R. Terrel, R.C. Kirby, M.G. Knepley, L.R. Scott, and G.N. Wells, Finite elements for incompressible fluids, in A. Logg, K.A. Mardal, and G. N. Wells, editors, Automated Solution of Differential Equations by the Finite Element Method, Springer-Verlag, Lecture Notes in Computational Science and Engineering, Vol. 84, 2011.
• R.C. Kirby, M.G. Knepley, A. Logg, L.R. Scott, and A.R. Terrel, Discrete optimization of finite element matrix evaluation, in A. Logg, K.A. Mardal, and G. N. Wells, editors, Automated Solution of Differential Equations by the Finite Element Method, Springer-Verlag, Lecture Notes in Computational Science and Engineering, Vol. 84, 2011.
• Matthew G. Knepley, Richard F. Katz, and Barry Smith. Developing a geodynamics simulator with PETSc. In Are Magnus Bruaset and Aslak Tveito, editors, Numerical Solution of Partial Differential Equations on Parallel Computers, volume 51 of Lecture Notes in Computational Science and Engineering, pages 413-438. Springer Berlin Heidelberg, 2006. 10.1007/3-540-31619-1_12.

## Other Conference Papers and Technical Reports

• Matthew G. Knepley, Jed Brown, Karl Rupp, and Barry Smith, Achieving High Performance with Unified Residual Evaluation, 2013. [arXiv]
• Barry Smith, Lois Curfman McInnes, Emil Constantinescu, Mark Adams, Satish Balay, Jed Brown, Matthew G. Knepley, and Hong Zhang. PETSc's software strategy for the design space of composable extreme-scale solvers. Preprint ANL/MCS-P2059-0312, Argonne National Laboratory, 2012. DOE Exascale Research Conference, April 16-18, 2012, Portland, OR. [ANL]
• Peter Brune, Matthew Knepley, Barry Smith, and Xuemin Tu. Composing scalable nonlinear solvers. Preprint ANL/MCS-P2010-0112, Argonne National Laboratory, 2012. [ANL]
• Peter R. Brune, Matthew G. Knepley, and L. Ridgway Scott. Exponential grids in high-dimensional space. Technical Report TR-2011-07, University of Chicago, December 2011. [UC]
• David I. Ketcheson, Aron Ahmadia, and Matthew G. Knepley. Conference review: High performance computing and hybrid programming concepts for hyperbolic PDE codes. SIAM News, 44(7), September 2011. [SIAM]
• Satish Balay, Jed Brown, Kris Buschelman, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Barry F. Smith, and Hong Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 3.3, Argonne National Laboratory, 2012. [ANL]
• Charles A. Williams, Brad Aagaard, and Matthew G. Knepley. PyLith: A finite-element code for modeling quasi-static and dynamic crustal deformation. In Eos Transactions of the AGU. American Geophysical Union, 2011. Fall Meeting Supplemental, Abstract DI14A-08.
• Robert C. Kirby, Matthew G. Knepley, and L. Ridgway Scott. Languages and compilers for variational forms. Technical Report TR-2010-09, University of Chicago, October 2010. [UC].
• Robert C. Kirby, Matthew G. Knepley, and L. Ridgway Scott. Evaluation of the action of finite element operators. Technical Report TR-2010-08, University of Chicago, October 2010. [UC].
• M. G. Knepley, D. A. Karpeev, R. S. Eisenberg, and D. Gillespie. Energetics of Calcium Selectivity: A Three-Dimensional Classical Density Functional Theory Approach. Biophysical Journal, 96:661, February 2009.
• Dave A. May, Matthew G. Knepley, and Michael Gurnis. CitcomSX: Robust preconditioning in CitcomS via PETSc. In Eos Transactions of the AGU. American Geophysical Union, 2009. Fall Meeting Supplemental, Abstract P31A-A1241.
• Liang Zheng, Taras Gerya, David A. Yuen, Matthew G. Knepley, Huai Zhang, and Yaolin Shi. GPU implementation of Stokes equation with strongly variable coefficients. In Eos Transactions of the AGU. American Geophysical Union, 2010. Fall Meeting Supplemental, Abstract IN41A-1350.
• David A. Yuen, Matthew G. Knepley, Gordon Erlebacher, and Grady B. Wright. The coming role of GPU in computational geodynamics. In Eos Transactions of the AGU. American Geophysical Union, 2009. Fall Meeting Supplemental, Abstract DI22A-05.
• Charles A. Williams, Brad Aagaard, and Matthew G. Knepley. PyLith: A finite-element code for modeling quasi-static and dynamic crustal deformation. In Eos Transactions of the AGU. American Geophysical Union, 2007. Fall Meeting Supplemental, Abstract T21B-1798.
• C. Zhang, M. G. Knepley, D. A. Yuen, and Y. Shi. Two new approaches in solving the nonlinear shallow water equations for tsunamis. Preprint ANL/MCS-P1459-0907, ANL, September 2007.
• Matthew G. Knepley and Dmitry A. Karpeev. Mesh algorithms for PDE with Sieve I: Mesh distribution. Technical Report ANL/MCS-P1455-0907, Argonne National Laboratory, February 2007. [ANL]
• Charles A. Williams, Brad Aagaard, and Matthew G. Knepley. Development of software for studying earthquakes across multiple spatial and temporal scales by coupling quasi-static and dynamic simulations. In Eos Transactions of the AGU, volume 86. American Geophysical Union, 2005. Fall Meeting Supplemental, Abstract S53A-1072.
• Matthew G. Knepley and Dmitry A. Karpeev. Flexible representation of computational meshes. Technical Report ANL/MCS-P1295-1005, Argonne National Laboratory, October 2005. [ANL]
• Robert C. Kirby, Matthew G. Knepley, and L. Ridgway Scott. Optimal evaluation of finite element matrices. Technical Report TR-2004-04, University of Chicago, May 2004. [UC]
• Andrew Cleary and Matthew G. Knepley. Solvers as operators. Technical Report UCRL-ID-135342, Lawrence Livermore National Laboratory, 1999.

## Presentations

• All my PETSc Tutorial presentations are here
• 2014
• 2013
• 2012
• 2011
• 2010
• 2009
• December: AGU Fall Meeting, San Francisco, CA
• November: Sharing Data and Code in Computational Science, New Haven, CT
• October: NSF-NAIS Intelligent Software Workshop, Edinburgh, Scotland
• September: Department of Mathematics Colloquium, Louisiana State University, Baton Rouge, LA
• August: International Workshop on Geodynamical Phenomena, Suzdal, Russia
• July: HPC Group, Shanghai Supercomputing Center, Shanghai, China
• July: International Workshop on Modern Computational Geoscience Frontiers, GUCAS, Beijing, China
• March: Path to Petascale (GPU Meeting), University of Illinois Urbana-Champaign, IL
• March: SIAM CS&E, Miami, FL
• 2008
• August: ICES Seminar, University of Texas at Austin, TX
• July: SIAM Annual Meeting, San Diego, CA
• July: Workshop for Advancing Numerical Modeling of Mantle Convection and Lithospheric Dynamics, University of California Davis, CA
• June: Workshop on Numerical Modeling of Crustal Deformation and Earthquake Faulting, Colorado School of Mines, Golden, CO
• June: Sandia CSRI Workshop on Next-Generation Scalable Applications, Albuquerque, NM
• March: Workshop on Automating the Development of Scientific Computing Software, Louisiana State University, Baton Rouge, LA
• February: AuScope Inaugural Conference, Monash University, Victoria, Australia
• 2007
• October: The Role of Symbolic, Numeric and Algebraic Computation in CDI, NSF, Washington D.C.
• October: CIG Adaptive Mesh Refinement Workshop, University of Colorado Boulder, CO
• October: Mathematics Seminar, University of Duisburg-Essen, Essen, Germany
• October: Special Semester on Biological Computing, University of Linz, Linz, Austria
• August: VLAB Seminar, University of Minnesota, Minneapolis, MN
• June: Biomedical Flows Workshop, Simula Research, Oslo, Norway
• 2006
• December: Supercomputing Institute Seminar, University of Minnesota, Minneapolis, MN
• November: CBC Seminar, Simula Research, Oslo, Norway
• November: FEniCS 06, TU Delft, Delft, Netherlands
• September: Multiphysics Simulation Workshop, Idaho National Laboratory, Idaho Falls, ID
• August: Magma Dynamics Workshop, Columbia University, New York, NY
• July: SIAM Annual Meeting, Boston, MA
• June: Fault Systems Workshop, Colorado School of Mines, Golden, CO
• March: Compressible Convection Workshop, Purdue University, West Lafayette, IN
• 2005
• November: CIG Science Steering Committee Meeting, California Institute of Technology, Pasadena, CA
• October: FEniCS 05, Toyota Technical Institute, Chicago, IL
• October: CIG Meeting, Monash University, Melbourne, Australia
• September: Computer Science Seminar, Indiana University, Bloomington, IN
• July: Short-Term Crustal Dynamics Workshop, Los Alamos National Laboratory, Los Alamos, NM
• June: Mantle Convection Workshop, University of Colorado Boulder, CO
• May: CIG Executive Committee Meeting, University of California Berkeley, CA
• April: Parallel Computing Workshop, University of Houston, Houston, TX
• March: CIG Meeting, California Institute of Technology, Pasadena, CA
• February: SIAM CS&E, Orlando, FL
• 2004
• October: Mathematics and Computer Science Seminar, Argonne National Laboratory, Lemont, IL
• October: CIG Meeting, Monash University, Melbourne, Australia
• October: CRI Seminar, Purdue University, West Lafayette, IN
• August: Domain Specific Languages for PDE Constrained Optimization, Argonne National Laboratory, Lemont, IL
• July: Computer Science Seminar, Carnegie-Mellon University, Pittsburgh, PA
• June: Climate Simulation Colloquium, University of Chicago, Chicago, IL
• May: Parallel CFD, Gran Canaria, Canary Islands
• February: APAM Lecture, Columbia University, New York, NY
• January: CIG Kickoff Meeting, LAX, Los Angeles, CA

Matthew G. Knepley
Computation Institute
University of Chicago
Searle Chemistry Laboratory
5735 S. Ellis Avenue
Chicago, IL 60637
Work: (773) 41-EULER
Cell: (773) 450-3622
