N-th-order Accurate, Distributed Interpolation Library

The research contained herein yielded an open source interpolation library implemented in and designed for use with the Python programming language. This library, named smbinterp, yields an interpolation to an arbitrary degree of accuracy. The smbinterp module was designed to be mesh agnostic. A plugin system was implemented that allows end users to conveniently and consistently present their numerical results to the library for rapid prototyping and integration. The library includes modules that allow for its use in high-performance parallel computing environments. These modules were implemented using builtin Python modules to simplify deployment. This implementation was found to scale linearly to approximately 180 participating compute processes.


Introduction and Background
As engineers attempt to find numeric solutions to large physical problems, simulations involving multiple physical models or phenomena, known as multiphysics simulations, must be employed. This type of simulation often involves the coupling of disparate computer codes. When modeling physically different phenomena the numeric models used to find solutions to these problems employ meshes of varying topology and density in their implementation. For example, the unstructured/structured mesh interfaces seen in the combustor/turbo machinery interface [Sha01], or the coupling of Reynolds-Averaged Navier-Stokes and Large Eddy Simulation (RANS/LES) codes in Computational Fluid Dynamics (CFD) [Med06]. A similar situation with disparate meshes arises in the analysis of helicopter blade wake and vortex interactions, as for example when using the compressible flow code SUmb and the incompressible flow code CDP [Hah06]. When this is the case, and the mesh elements do not align, the engineer must perform interpolation from the upstream code to the downstream code.
Frameworks exist that perform interpolation for multiphysics simulations. In general, frameworks of this variety try to solve two problems. First, the framework should rapidly calculate the interpolation. Secondly, the interpolation should be accurate.
CHIMPS (Coupler for High-performance Integrated Multi-Physics Simulations) is a Fortran API (with Python bindings) that implements an efficient Distributed Alternating Digital Tree for rapid distributed data lookup [Alo06], [Hah09]. By default, CHIMPS can only provide the user with linear (second-order accurate) interpolations. While CHIMPS can provide third-order and higher accurate interpolations, it is not automatic; higherorder interpolations are only performed if the engineer supplies the CHIMPS API with higher-order terms. If this information is unavailable, then CHIMPS can only yield linear interpolations.
Another interpolation framework exists that can perform automatic higher-order interpolation. AVUSINTERP [Gal06] (Air Vehicles Unstructured/Structured Interpolation Tool) is a tool that provides linear and quadratic interpolations requiring only the physical values at points in a donor mesh, i.e. no a priori knowledge of higher-order terms. While this framework implements a superior interpolation scheme to the tri-linear interpolation found in CHIMPS, AVUSINTERP was not implemented in a parallel fashion, nor does it allow for the engineer to arbitrarily choose the order of the interpolation past third-order accuracy.
The research presented herein describes the development of a library that is a union of the best parts of the aforementioned tools. Namely, this research provides a library, named smbinterp, that implements the interpolation of a physical value from a collection of donor points to a destination point and performs this interpolation to an arbitrary degree of accuracy. The library can perform this interpolation in both two-and three-dimensional space. Also, the library was designed and implemented to be used in a high-performance parallel computing environment. The smbinterp library is implemented as a python module that builds upon the numpy and scipy libraries and presents an API for use in multiphysics simulation integration. The library is released under the GPL, and project is available on github [smbinterp].

Method
The numerical method implemented in smbinterp was first proposed by Baker [Bak03]. This interpolation method comprises the adjustment of a linear interpolation by a least squares estimate of higher-order terms. The Baker interpolation of the physical value of interest (denoted q) to the point Ξ is defined by: where q linear is the linear interpolation, and f (Ξ) is an estimation of the higher-order error terms. The following explanation is specific to two-dimensional space; three-dimensional space is treated in [McQ11]. The participating geometry required to implement this method in two spatial dimensions is shown in figure 1. The blue points (R) and green points (S) represent points in a source mesh, and the red point Ξ is the point to which an interpolation is desired. R represents a simplex that surrounds the destination point Ξ, and S 1..m is a collection of extra points surrounding the simplex R. The triangles A 1 − A 3 represent the areas formed by Ξ and R. Barycentric coordinates , denoted φ j (Ξ), are used to perform the linear interpolation. In geometric terms, the barycentric coordinates of a point in a simplex are the values of the normalized areas A j /A total opposite the vertex R j in the simplex R.
The barycentric coordinates define the influence that each point in the simplex R contributes to the linear interpolation. In other words, the ratio of A j /A total represents the influence from 0 ≤ φ ≤ 1 that q(R j ) has over the linear interpolant. If Ξ = R j , the value of q linear (Ξ) should then be influenced entirely by the known value of q(R j ). If Ξ is placed in such a way as to give , the value q(R j ) at each point R j contributes equally to the calculated value of q linear (Ξ).
The linear interpolant, which requires the simplex R and Ξ as inputs, is defined as where N + 1 is the number of points in a simplex (3 in twodimensional space, and 4 in three-dimensional space). The values of the basis functions φ j (Ξ) is the only unknown in equation 2.
To solve for φ j (Ξ) a system of linear equations will be defined involving the points in the simplex R j , Ξ, and equation 2. If q(Ξ) is a constant, q 1 = q 2 = q 3 = q linear = q constant , and equation 2 can be modified by dividing by q constant , that is: ( Furthermore, the basis functions must be calculated so that equation 2 also interpolates geometric location of the point Ξ, hence The values of the basis functions φ j (Ξ) can be found by solving the following system of linear equations involving equations 3, 4 and 5: which yields the values for φ j (Ξ), providing a solution for equation 2.
At this point the first of two unknowns in equation 1 have been solved, however the least squares approximation of error terms f (Ξ) remains unknown. If q(Ξ) is evaluated at any of the points R j in the simplex, then q(R j ) is exact, and there is no need for an error adjustment at R j , hence f (Ξ) = 0. Similarly, if q(Ξ) is being evaluated along any of the opposite edges to R ι of the simplex R, the error term should have no influence from φ ι (Ξ), as A ι = 0. This condition is satisfied when expressing the error terms using the linear basis functions as In equation 7 the three double products of basis functions are the set of distinct products of basis functions that are quadratic in the two spatial dimensions x and y, and zero when evaluated at each of the verticies in R. This term represents a thirdorder accurate approximation for the error up to and including the quadratic terms. This equation introduces three unknowns whose values must be solved, namely a, b, and c.
Recall that S k , k = 1, 2, . . . , m is the set of m points surrounding Ξ that are not in the simplex R j . A least squares system of equations is defined using the values of the basis functions at these points, the values of a linear extrapolation for each of those points using the simplex R, and the values of a, b, and c in equation 7. Define A as (a, b, c) T . Applying least squares theory a, b, and c are found by inverting the following 3 × 3 matrix: The matrix B is defined using the identical basis function pattern as in equation 7. Denote φ j (S k ) as the value of φ j evaluated using equation 2 and the data point S k (in lieu of Ξ). The matrix B in equation 8 is thus defined: The value of q(S k ) is known a priori (values of q at each point S k in the donor mesh). The value of q linear (S k ) (the linear extrapolant) can also be calculated using equation 2. Define w in equation 8 as Equation 8 is populated with the information from each of the surrounding points in S k , then the unknown A can be calculated. Knowing A, equation 7 is evaluated for f (Ξ). Subsequently the previously calculated value of q linear (Ξ) and the recently calculated value of f (Ξ) are used to solve equation 1 for q(Ξ).
There exist known limitations to this least squares-based interpolation method. First a change in vertex stencil will generally yield a discontinuity in interpolation results. While this property makes this method insufficient for graphical applications, it has been shown to yield sufficiently accurate results to be used in engineering applications [Bak03], [Gal06].
Secondly, while solutions to the linear system in equation 2 are well-behaved, certain vertex configurations can lead to a singular system of equations in equation 7. These pathological vertex configurations occur when more than n − 1 of the extra points lie on one extended edge of the simplex R [Bak03]. If this occurs, the covariance matrix B T B will be singular, the solution will not be unique, and the error approximation will not generally aid in improving the interpolation.
Extension of this method into three dimensions is non-trivial, and is explained in depth in [McQ11]. A pattern exists to define any error approximation function f (Ξ) and covariance matrix B T B parametrized by order of approximation and dimension. Define ν as the desired order of accuracy less one (i.e. for cubic interpolation ν is 3). As defined above, N is the spatial degree. The pattern for the combinations of basis functions that are used to define f (Ξ) is collection of ν-th ordered combinations of N + 1 basis functions φ j that are unique and non-duplicate, triplicate, etc. The following code implements this pattern: 1 from itertools import product The dynamic calculation of the basis function pattern in this fashion is powerful, in that it can be calculated for any arbitrary ν, and for any spatial dimension (although only N of 2 and 3 are dealt with herein). However, for each point Ξ the calculation of the pattern must be performed once for the calculation of f (Ξ) and once per extra point S k participating in the current interpolation for each row in the B matrix. There is only one valid pattern per set of inputs N and ν, which must both remain constant throughout a single interpolation. The calculation of the pattern is a computationally intensive operation, and so a caching mechanism has been implemented in smbinterp that only calculates the pattern if it has not been previously calculated. This concept is known as memoization, and is implemented using the following function wrapper: 1 from functools import wraps Baker's method gives a reasonable interpolation solution for a general cloud of points. However, the method suggested by Baker for the vertex selection algorithm for the terms R and S k consists of simply selecting the points nearest Ξ. While this is the most general point selection algorithm, it can lead to the aforementioned pathological vertex configurations. This configuration is prevalent when the source mesh is composed of a regular grid of vertices, and must be addressed if the method is to yield a good interpolation.
Furthermore a mesh may have been designed to capture the gradient information, and therefore the mesh topology should be respected. Simply selecting the closest points to Ξ would yield inferior results. By selecting the more topologically (according to the mesh) adjacent points the information intended to be captured in the mesh's design will be preserved.  The cells and cells_for_verts data structures are used when searching for a containing simplex. The structures are populated with connectivity information before a round of interpolations. The method employed in the default implementation for the location of the containing simplex in an upstream mesh is straight forward: first the spatial tree structure is used to find the location of the nearest vertex to the point of interest, then the cells are recursively visited in topologically adjacent order and tested for inclusion of the point Ξ.
The selection of the extra points S k is also implemented in the base grid class. The default algorithm simply queries the kdtree structure for (N + 1) + m points and discards the points that are already in the simplex R.
Plugins are defined as classes that inherit from the base grid object, and that implement the requisite functionality to populate the cells and cells_for_vert data structures. If either of the default simplex and vertex selection methods do not provide the desired functionality they could be overridden in the derived class to provide a more tuned R and S k selection algorithms. This gives engineers complete control over point selection and makes the interpolation library mesh agnostic.
A parallel mechanism for calculating q(Ξ) was implemented in smbinterp. As is illustrated in figure 2, a stream of requested interpolations are presented to a queuing mechanism that then distributes the task of calculating the interpolations to a set of minions. The server.py application implements the four queues required to implement this method: a queue for tasks to be performed, a queue for results, and two queues for orchestrating the control of a round of interpolations between a master and a set of minions. Masters and minions authenticate and connect to these four queues to accomplish the tasks shown in the flowchart in figure 2. The master.py script is responsible for orchestrating the submission of interpolations and events associated with starting and stopping a set of interpolations. Each of the minions has access to the entire domain and are responsible for performing the interpolations requested by the end user.
The crux of the solution lies in providing the minions with a steady stream of work, and a pipeline into which the resultant interpolations can be returned. The mechanism developed in smbinterp uses built-in Python modules to minimize the deployment expense. The multiprocessing module provides a manager class which facilitates the access of general objects to authenticated participants over a network. The built-in Queue objects, which implement a multi-producer, multi-consumer first-in-first-out queue, are presented to the minions and masters using the functionality in the manager class.

Results and Discussion of Results
The root mean square (RMS) of the errors was used to determine the accuracy of the smbinterp module. A continuous function whose values varied smoothly in the test domain was required to calculate the error; the following equation was used: q(x, y) = (sin (xπ) cos (yπ)) 2 . (11) A plot of this function is found in figure 3. Each error ε i was calculated as the difference between the actual value (from equation 11) and calculated interpolations (at each point in the destination domain using smbinterp), or ε i (Ξ) = q exact (Ξ) − q calculated (Ξ).

Fig. 3: Plot of Equation 11
A mesh resolution study was performed to determine how the RMS of error varied with mesh density. The source mesh was generated using gmsh, and the lowest-resolution mesh is shown in figure 4. The results of this study are show in figure 5. A collection of 1000 random points were used as the destination for interpolation. Fig. 4: Lowest-resolution test mesh Figure 5 plots the relationship between mesh spacing and RMS of error of all interpolations in the collection of destination vertexes. The x-axis represents the spacing between the regular mesh elements. The y-axis was calculated by performing interpolation from each resolution of mesh to a static collection of random points. The lines in each plot are representative of the slope that each collection of data should follow if the underlying numerical method is truly accurate to the requested degree of accuracy. As an example, the collection of points for ν of 2 should be third-order accurate, and should follow a line with slope of 3; this is closely demonstrated in the plots.  Figure 5 shows the results of the resolution study for the twodimensional test case meshes. The three dimensional test case meshes yielded similar results and are presented in [McQ11]. As the meshes were refined the RMS of error decreased. The fourth-and sixth-order results (ν of 3 and 5) matched the slope lines almost exactly, whereas the third-and fifth-order results were slightly lower than expected for that level of accuracy.
As mesh element size decreased, the RMS of error decreased as well. The RMS of error for the highest ν decreased more than that of the lowest ν. The RMS of error of the most coarse mesh (far right) ranges within a single order of magnitude, whereas the RMS of errors at the most fine spacing (far left) span four orders of magnitude. The results exhibit a slight banding, or unevenness between each order. Also, the data very closely matches the plotted lines of slope, indicating that the order of accuracy is indeed provided using this numerical method. The rate at which error decreases as the average mesh element size decreases in figure 5 is indicative of the order of accuracy of the numerical method implemented in smbinterp. There is slight banding for the two-dimensional meshes between quadratic and cubic interpolation, and again for quartic an quintic interpolation. While this indicates that the method does not perfectly interpolate to those orders of accuracy, in general increasing the ν parameter of the smbinterp library provides a more accurate interpolation. Furthermore, the cases where the points diverge from the slope of appropriate order, the divergence occurs in a favorable direction (i.e. less error). Also, the fine meshes experience a more significant decrease in RMS of error than the coarse meshes while increasing the order of approximation, ν. While this is an intuitive result, it emphasizes the notion that mesh density should be chosen to best match the underlying physical systems and to provide optimally accurate results.
The parallel algorithm employed by smbinterp was found to scale quasi-linearly to approximately 180 participating minion.py processes. Speedup is defined as the ratio of time to execute an algorithm sequentially (T 1 ) divided by the time to execute the algorithm with p processors [WSU], or S p = T 1 T p . A parallel algorithm is considered to have ideal speedup if S p = p.
A more meaningful parameter for instrumenting the performance of a parallel algorithm is known as the efficiency of the algorithm, denoted E p . Efficiency of a parallel algorithm is defined as the speedup divided by the number of participating processors, or E p = T p p . The efficiency of an algorithm ranges from 0 to 1, and is shown for smbinterp in figure 6. The parallelization algorithm employed by the smbinterp library has near-linear speedup up to approximately 128 participating minions. It has an efficiency above 90 percent up to 181 participating nodes, but the efficiency drops substantially when using more minions. If an algorithm does not have an efficiency of 1, it is usually indicative of communication overhead or bottlenecks of some form. It was observed that the cpu utilization of the server.py script increased linearly up to 181 minions (CPU utilization of 200%), but then did not increase past that point. The implementation of the server.py script represents the bottleneck of this implementation.

Conclusions
The smbinterp module was developed to provide a highperformance interpolation library for use in multiphysics simulations. The smbinterp module provides an interpolation for a cloud of points to an arbitrary order of accuracy. It was shown, via a mesh resolution study, that the algorithm (and implementation thereof) provides the the end user with the expected level of accuracy, i.e. when performing cubic interpolation, the results are fourth-order accurate, quartic interpolation is fifth-order accurate, etc.
The smbinterp module was designed to be mesh agnostic. A plugin system was implemented that allows end users to conveniently and consistently present their numerical results to the library for rapid prototyping and integration.
The smbinterp module was designed with parallel computing environments in mind. The library includes modules that allow for its use in high-performance computing environments. These modules were implemented using built-in Python modules to simplify deployment. This implementation was found to scale linearly approximately 180 participating compute processes. It is suggested to replace the queuing mechanism with a more highperformance queuing library (e.g. ØMQ) and a more advanced participant partitioning scheme to allow the library to scale past this point.