Matt Knepley's HomePage

Computational Science

General Papers

Automated Finite Element Discretization and Solution of Nonlinear FEM Systems

Linear and Nonlinear preconditioning

Barry Smith, Peter Brune, Xuemin Tu, Jed Brown, and I are working on approaches to nonlinear preconditioning. Our approach is described in this paper. The most important aspect here is composability. This is exactly what makes KSP/PC so useful. Complex, optimal solvers can be built from simple pieces. We would like to carry over this design to the nonlinear problem. Our ideas for linear multiphysics preconditioning are contained in this paper and some follow-on work by Peter and Jed about very efficient Jacobian approximation and lagging. We now have a PETSc implementation of nonlinear preconditioning, e.g. SNESSetNPC() sets a nonlinear preconditioner, which support both left and right preconditioning, and also additive and multiplicative combinations of nonlinear solvers. Our paper detailing the preconditioners, has just been submitted. You can also see an example of me using this in a recent talk.

This paper describes the PETSc approach to linear preconditioning for multiphysics systems. We show that our existing preconditioners can be composed to generate all previously published block preconditioners, and also to invert this relationship so that block preconditioners can be used on each level of multigrid or FAS. This talk decmonstrates how to construct these preconditioners for the Stokes equation, and here I construct an unstructured FAS cycle for the p-Laplacian using only command line options. We have the ability to attach auxiliary operators to the DM so that they can be used for preconditioning blocks of the system. Building on the FEM pieces described below, we will soon be able to form analytic approximations to operator blocks.

For several years now, I have developed the BIBEE approximation for boundary integral operators with Jay Bardhan. This strategy approximates the surface-to-surface operator directly, and thus can provide good bounds for both the large and small eignevalues. This is detailed in the section on biological applications below.

Mesh Representation and Manipulation

I developed the unstructured mesh support in PETSc, which is embodied by the DMPlex class. DMPlex can represent any CW complex, which is a much wider class of discretized surfaces than typically used in simulations, e.g. it can handle non-orientable and non-manifold surfaces, and uses hybrid meshes with prisms in PyLith. DMPlex can automatically create faces and edges for cell-vertex meshes. In PyLith, it automaitcally inserts prismatic cells for use in the cohesive cell formulation of fault mechanics. It can also extract surfaces and volumes, create ghost cells, find boundaries, and orient meshes in parallel. It interfaces with most common mesh formats, and can output to HDF5 in parallel. You might want to configure with

--download-triangle --with-ctetgen
--download-chaco --download-metis --download-parmetis
for automatic mesh generation, partitioning, and output.

It interfaces with the discretization code described below, and is intended for multiphysics simulation. You can run a full set of PDE tests using the new Python test system

python2.7 ./config/ check src/snes/examples/tutorials/ex12.c
python2.7 ./config/ check src/snes/examples/tutorials/ex62.c
The test suite exhibits all the traditional preconditioners for the iso-viscous Stokes problem using PC FieldSplit (command lines), as well as matrix-free application of the operator combined with user-defined preconditioning, along with an FAS example using the p-Laplacian.

For several years now, we have been exploring unstructured geometric coarsening, and its application to unstructured multigrid methods. Peter has taken the lead on this work. Our recent paper shows simple applications of these ideas, but we would like to extend this to incorporate AMG ideas and also to larger and more complex problems. This work rests on the DMPlex (Sieve) API for unstructured mesh representation and manipulation, detailed in this paper. Here is another paper by Anders Logg discussing his implementation of these ideas in FEniCS.


I have recently introduced PETSc abstractions for quadrature, function spaces, dual spaces, and finite elements. Some good examples of this in PETSc are the Poisson equation with nonlinear coefficient and the Stokes equation on both simplicial and hexhedral meshes. Jed and I have also produced a 2nd order TVD finite volume example that can solve advection, shallow water, and the Euler equations with a variety of limiters and least-squares reconstruction.

The discretization interface plays well with the DM interface for topology representation. This lets the user specify boundary conditions on any marked part of the boundary, automatically refine the mesh in parallel, create interpolation operators for the mesh hierarchy, and extract submanifolds from computation or visualization. In fact, SNES ex12 showcases an automatically assembled FAS (nonlinear multigrid) scheme for the p-Laplacian. The user need only specify the equations and boundary conditions as pointwise functions, and PETSc can assemble the rest. This is very much like the FEniCS approach. For example, the functions for Stokes:

void f1u(PetscScalar u[], PetscScalar gradU[], PetscScalar a[], PetscScalar gradA[],
         PetscReal x[], PetscScalar f1[]) {
  for (comp = 0; comp < Ncomp; ++comp) {
    for (d = 0; d < dim; ++d) f1[comp*dim+d] = gradU[comp*dim+d]; /* (grad v, grad u) */
    f1[comp*dim+comp] -= u[Ncomp];                                /* (grad v, -p      */
void f1p(PetscScalar u[], PetscScalar gradU[], PetscScalar a[], PetscScalar gradA[],
         PetscReal x[], PetscScalar f0[]) {
  f0[0] = 0.0;
  for (d = 0; d < dim; ++d) f0[0] += gradU[d*dim+d];
And the boundary condition can be specified using:
void quadratic_u_2d(const PetscReal x[], PetscScalar *u, void *ctx)
  u[0] = x[0]*x[0] + x[1]*x[1];
  u[1] = 2.0*x[0]*x[0] - 2.0*x[0]*x[1];

I collaborate with Ridgway Scott, Andy Terrel, Peter Brune, and Robert Kirby on automation of finite element discretization, as well as better integration of mesh abstraction with discretization and solver routines. I am also involved with the FEniCS project at Simula Research, working in particular with Hans Petter Langtangen and Anders Logg. The overarching theme for this work is that automation in scientific computing, exemplified by compilers, should be raised to a higher level of abstraction. By confining ourselves to a well understood domain, in this case finite element methods, we can make the symbolic problem tractable. A prime example of this program is the FEniCS Form Compiler (FFC) which I use to generate GPU integration code in this paper.


I have been a member of the PETSc Team since 1997. PETSc is an open-source software package designed for the solution of partial differential equations and used in diverse applications from surgery to environmental remediation. PETSc is the most complete, most flexible, most powerful, and easiest-to-use platform for solving the nonlinear algebraic equations arising from the discretization of partial differential equations. PETSc is designed to allow engineers and scientists to perform large-scale numerical simulations of physical phenomena rapidly and efficiently. Representative simulations to date include fluid flow for aircraft, ship, and automobile design; blood flow simulation for medical device design; porous media flow for oil reservoir simulation for energy development and groundwater contamination modeling; materials properties; economic modeling; structural mechanics for construction design; combustion modeling; and nuclear fission and fusion for energy development. These simulations allow the effects of design decisions to be evaluated and compared, including cost benefits, safety concerns, and environmental impact. PETSc has enabled simulations by industry, universities, and research laboratories, from desktop systems to the most powerful supercomputers, and for critical applications affecting the nation's energy, environment, and health. A testament to the widespread recognition of this package is the fact that PETSc has had over 400 downloads per month during the past three years. A Google search for "PETSc" on January 6, 2009, showed 199,000 pages referencing PETSc. In addition, a search of Google books found 662 books that reference PETSc.

PETSc won the SIAM/ACM Prize in Computational Science and Engineering in 2015, and the R & D 100 Award in 2009. PETSc solvers were cited as one of the Top Ten Computational Science Accomplishments of the U.S. Department of Energy (DOE) in 2008. Two HPC hardware companies, Cray and SiCortex, provide and support PETSc with their products. Several commercial simulation packages, including FIDAP 8.5, TigerCAD, and RF3P, use PETSc for their algebraic solvers. Also, products from the companies Tech-X and Actel use PETSc. Microsoft has distributed previous versions of PETSc with its HPC software suite, the Microsoft Cluster CD. PETSc has been used by Boeing for computational fluid dynamics simulations and Shell for solving inverse problems for oil reservoir management. CFDResearch in Huntsville, Alabama, uses PETSc in its fluid simulations. PETSc solvers have also been used at the U.S. Army Engineering Research and Development Center and by the South Florida Water Management District modeling the Everglades.



GPU Computing

The combination of low power consumption, large memory bandwidth, high flop rate, and low cost make GPUs a promising platform for desktop scientific computing. The large latency is problematic for strong scaling, but there are many candidates for this type of throughput device. especially since interconect latency looks set to drop rapidly. Moreover, it is likely that a large fraction of future supercomputers will also employ accelerators (as 3 of the top ten machines in the world already do). In the last 4 decades of scientific computing, the most successful strategy for accomodating new hardware is the production of high quality libraries, such as MPI and PETSc. We propose to develop not only parallel algebraic solvers, but discretization and multiphysics libraries, suitable for hybrid machines with significant accelerator components.

Our strategy is to use a combination of low-level library routines and code generation techniques to produce implementations for our higher level API. We use the Thrust and Cusp libraries from NVIDIA to provide linear algebra, shown here, and more basic operations such as sorting and keyed reductions used in this paper. For operations that a more variable, such as those that depend on the weak form, we use code generation. The key point is to circumscribe the domain so that the generation process becomes tractable. In generating FEM integration routines, we produce not only different block sizes, but different organization of memory writes and computation.

With Andy Terrel and Karl Rupp, we are exploring code transformations which will enable us to automatically optimize more general calculations. In particular, we are reorganizing traversal, and using code transformation tools like Ignition, to explore quadrature-based methods for FEM residual and Jacobian evaluation. In our upcoming paper, we show that excellent vectorization is possible with CUDA and OpenCL, even for forms with variable coefficient.


The Fast Multipole Method and Radial Basis Function Interpolation

I collaborate with Felipe Cruz on the PetFMM code for simulation using the fast multipole method. The main idea is that a very simple serial implementation of FMM can be combined with optimization techniques from graph partitioning to produce an efficient and scalable parallel FMM. PetFMM completely reuses the serial code, and also uses templating to optimize the implementation. It templates over the structure used to describe source and targets, the neighborhood stencil, the expansion order, and the dimension. We are able to guide the optimization routine with good theoretical estimates of the communication and computation complexity. In fact, this paper will show that optimal communication complexity is achievable with PetFMM.

I also work with Rio Yokota and Andreas Klöckner on a version of FMM which will autogenerate expansions given the kernel function. We believe this is the step that most severely limits the wide adoption of the method. The implementation uses sympy for symbolic manipulation and code generation, and PyCUDA for kernel execution and organization.

I also collaborate with Rio Yokota and Lorena Barba on the PetRBF code for parallel, radial basis function interpolation. PetRBF uses a scalable preconditioning strategy which uses small ASM blocks tailored to the function overlap. This code can interface with FMM libraries, or other fast evaluation schemes, to support large scale simulation.




Crustal Dynamics

I am co-author of the PyLith code with Brad Aagaard and Charles Williams. We have developed open-source finite-element software for 2-D and 3-D dynamic and quasi-static modeling of crustal deformation. This software, PyLith (current release is 2.0) can be used for quasi-static viscoelastic-plastic modeling or dynamic spontaneous rupture and/or ground-motion modeling. Unstructured and structured finite-element discretizations provide support for spatial scales ranging from tens of meters to hundreds of kilometers with temporal scales in dynamic problems ranging from milliseconds to minutes and temporal scales in quasi-static problems ranging from minutes to thousands of years. PyLith development is part of the NSF funded Computational Infrastructure for Geodynamics (CIG) and the software runs on a wide variety of platforms (laptops, workstations, clusters, massively parallel machines). Binaries (Linux, Darwin, and Windows systems) and source code are available from PyLith uses a suite of general, parallel, graph data structures from PETSc called DMPlex for storing and manipulating finite-element meshes. This permits use of a variety of 2-D and 3-D cell types including triangles, quadrilaterals, hexahedra, and tetrahedra. It also allows faithful representation of faults, automatic parallel refinement, and selection of arbitrary surfaces and volumes for output.

Current features include prescribed fault ruptures with multiple earthquakes and aseismic creep, spontaneous fault ruptures with a variety of fault constitutive models, time-dependent Dirichlet and Neumann boundary conditions, absorbing boundary conditions, time-dependent point forces, and gravitational body forces. PyLith supports infinitesimal and small strain formulations for linear elastic rheologies, linear and generalized Maxwell viscoelastic rheologies, power-law viscoelastic rheologies, and Drucker-Prager elastoplastic rheologies. Current development focuses on coupling quasi-static and dynamic simulations to resolve multi-scale deformation across the entire seismic cycle and coupling elasticity to heat and/or fluid flow.



  1. The PyLith Homepage
  2. The PyLith development repository
  3. The PyLith buildbot

Magma Dynamics

I am working with Richard Katz and Marc Spiegelman on magma dynamics problems, including the mid-ocean ridge spreading shown on the left. Magmatic plate boundaries (mid ocean ridges, subduction zones) and intra-plate hot spots are among the most important and complex dynamic and chemical systems in the solid earth. While volumetrically small, these regions have global consequences for mantle flow and plate tectonics through the spontaneous localization of weak plate boundaries. Magmatism in these regions also provides the dominant mechanism for geochemical fractionation, mixing and sampling of the Earth's mantle. Many of the observational constraints on both plate-boundary processes and global mantle dynamics come from regional studies of lavas from mid-ocean ridges, subduction zones and oceanic hot spots and form focus areas for many initiatives such as GeoPRISMS, EarthScope and core EAR and MGG Geophysics programs.

Optimal preconditioners for many multiphysics problems, such as variable-viscosity Stokes and thermal convection, arise by combining existing optimal preconditioners for individual blocks of the Jacobian, associated with a given physics. For instance, a classic Stokes preconditioner applies multigrid to the momentum balance and scaling to the mass balance, combining them multiplicatively, using one block elimination step. The PETSc composable solvers allow a user to define different preconditioners for each equation block, and also different combination methods, inclduing Schur complements, between sets of equations, and it can all be specified dynamically on the command line without any code changes. Thus, the same code that runs full LU to verify a solution can also scale optimally on the largest machines available. Moreover, we can choose to split at various levels. Multigrid could be used for the overall Stokes system, and then splitting can be used for each smoother (so-called segregated KKT smoothers), inverting the classical scheme above. Finally, we can use split schemes to precondition the fully-coupled problem.


Electrostatics and Classical Density Functional Theory for Biological Systems

Electrostatic interactions play an essential role in the structure and function of biomolecules (proteins, DNA, cell membranes, etc.). One of the most challenging aspects for understanding these interactions is the fact that biologically active molecules are almost always in solution---that is, they are surrounded by water molecules and dissolved ions. These solvent molecules add many thousands or even millions more degrees of freedom to any theoretical study, many of which are of only secondary importance for investigations of interest. Model the solvent using a discontinuous dielectric coefficient, and solve the Poisson equation directly using a boundary-integral formulation and the boundary-element method, BEM, to determine the induced charge distribution on the molecular surface which accounts for the change in polarization charge across the dielectric boundary.

I am collaborating with Jay Bardhan to develop the BIBEE approximation algorithm for electrostatics problems with discontinuous dielectric coefficient in complex geometries. BIBEE approximates the operator arising from a boundary integral discretization of the Poisson equation, giving rigorous upper bounds and effective lower bounds for the solvation energy.

I am collaborating with Dirk Gillespie and Robert Eisenberg at Rush University Medical Center on the modeling of biological ion channels using classical Density Functional Theory of fluids. Since its inception 30 years ago, classical density functional theory (DFT) of fluids has developed into a fast and accurate theoretical tool to understand the fundamental physics of inhomogeneous fluids. To determine the structure of a fluid, DFT minimizes a free energy functional of the fluid density by solving the Euler-Lagrange equations for the inhomogeneous density profiles of all the particle species When solving on a computer, the density can be discretized (called free minimization) or a parameterized function form such as Guassians can be assumed (called parameterized minimization). This approach has been used to model freezing, electrolytes, colloids, and charged polymers in confining geometries and at liquid-vapor interfaces. DFT is different from direct particle simulations where the trajectories of many particles are followed over long times to compute averaged quantities of interest (e.g., density profiles). DFT computes these ensemble-averaged quantities directly. However, developing an accurate DFT is difficult and not straightforward. In fact, new, more accurate DFTs are still being developed for such fundamental systems as hard-sphere fluids, electrolytes, and polymers.

Our group has already applied one dimensional DFT to biological problems involving ion channel permeation, successfully matching and predicting experimental data. We have extended this to 3D using efficient numerical schemes for functional evaluation and nonlinear system solution.



  1. Ellipsoidal Potential Theory contains both a MATLAB and Python implementation of ellipsoidal harmonic expansions for solving problems of potential theory using separation of variables.
  2. The DFT development repository

Nonlinear Wave Mechanics

I am collaborating with David Ketcheson, Aron Ahmadia and Kyle Mandli on nonlinear wave mechanics and the development of the PyClaw framework, of which PetClaw is the parallel extension. The PyClaw design is based upon a loosely-coupled model for simulation software, in which computational operations are wrapped in a higher level language, here Python, and spliced together. The glue in Python is numpy, a lightweight library which allows no-copy access across languages to multidimensionl arrays. With this, ClawPack could be joined to PETSc efficiently using only Python code. It took 100 lines of Python to parallelize PyClaw, subclassing only three classes.



  1. The PyClaw Homepage

Fracture Mechanics

I am collaborating with Blaise Bourdin to produce a scalable code to simulate fracture mechanics. In the area of brittle fracture, there has seen tremendous progress over the recent years, but some issues remain problematic. The most widely accepted theories, based on Griffith's criterion, are limited to the propagation of an isolated, pre-existing crack along a given path. Extending Griffith's theory into a global minimization principle, while preserving its essence, the concept of energy restitution in between surface and bulk terms, G. Francfort and J.-J. Marigo proposed a new formulation for the brittle fracture problem, capable of predicting the creation of new cracks, their path, and their interactions, in two and three space dimensions.

The essence of this model is the global minimization over all crack sets \(K\) and all admissible displacement fields \(u\) of an energy similar to
\mathcal{E}= \int_{\Omega \setminus K} W(\mathrm{e}(u))\,dx + \alpha \mathcal{H}^{N-1}(K\cap \bar{\Omega})
where \(W\) denotes an elastic potential and \(\mathcal{H}^{N-1}\) denotes the \(N-1\)-dimensional Hausdorff measure. The numerical minimization of such a functional is of course challenging. The numerical experiments on this page are based on the concept of variational approximations of functionals by \(Gamma\)-convergence, and are described in more detail here.

Below, I have embedded some videos of our largest runs to date. This simulates thermal fracture arising from Newtonian cooling. It is fully three dimensional, use 24M elements, and runs on 2400 processors of LoneStar at TACC. The full simulation was completed in just 6 hours.

The Wonderful People Who Pay My Salary


Matt at


Community: My Mendeley Profile and ORCID site.

In Progress

Submitted Papers

Published Journal Papers

Peer-Reviewed Conference Papers

Book Chapters

Other Conference Papers and Technical Reports

Academic Service

Meetings Organized


Matthew G. Knepley
Computation Institute
University of Chicago
Searle Chemistry Laboratory
5735 S. Ellis Avenue
Chicago, IL 60637
Work: (773) 41-EULER
Cell: (773) 450-3622

Unique Visitors
View My Stats