Design and Implementation of pyPRISM: A Polymer Liquid-State Theory Framework

In this work, we describe the code structure, implementation, and usage of a Python-based, open-source framework, pyPRISM, for conducting polymer liquid-state theory calculations. Polymer Reference Interaction Site Model (PRISM) theory describes the equilibrium spatial-correlations, thermodynamics, and structure of liquid-like polymer systems and macromolecular materials. pyPRISM provides data structures, functions, and classes that streamline predictive PRISM calculations and can be extended for other tasks such as the coarse-graining of atomistic simulation force-fields or the modeling of experimental scattering data. The goal of providing this framework is to reduce the barrier to correctly and appropriately using PRISM theory and to provide a platform for rapid calculations of the structure and thermodynamics of polymeric fluids and polymer nanocomposites.


Introduction
Free and open-source (FOSS) scientific software lowers the barriers to applying theoretical techniques by codifying complex approaches into usable tools that can be leveraged by non-experts. Here, we describe the implementation and structure of pyPRISM, a Python tool which implements Polymer Reference Interaction Site Model (PRISM) theory. [MGIJ + 18], [dis] PRISM theory is an integral equation formalism that describes the structure and thermodynamics of polymer liquids. [SC87] Despite the successful application of PRISM theory to study a variety of complex soft-matter systems, [SC94] its use has been limited compared to other theory and simulation methods that have available open-source tools, such as Self-Consistent Field Theory (SCFT), [psc], [ [RM11]. Some important factors contributing to this reduced usage are the complexities associated with implementing PRISM theory and the lack of an available open-source codebase. Our previous publication, [MGIJ + 18], focused primarily on the theoretical aspects of the method and presented several case studies to illustrate the utility of PRISM theory. In this work, we focus more specifically on the practical implementation and usage of PRISM theory within the pyPRISM framework. In the following sections, we will briefly discuss the basics of PRISM theory, Fig. 1: A schematic representation of the components of a coarse grained polymer nanocomposite made up of polymer chains and large spherical nanoparticles. This system is the focus of reference [HS05]. In this example, there are two site-types: a monomer site-type (M) in green and a nanoparticle site-type (P) in yellow. Also labeled are the polymer-polymer intra-molecular (Ω M,M (r)) and inter-molecular correlation functions (H M,M (r) and C M,M (r)).
our implementation of the theory in pyPRISM, our approach toward educating the scientific community about PRISM theory and pyPRISM, and finally our view for the future of the tool.

PRISM Theory
For a detailed discussion of PRISM theory, as well as a review of key applications of the theory, we direct the reader to our previous publication. [MGIJ + 18] Here, we briefly highlight the salient points of PRISM theory in order to help motivate the design of our class structure.
PRISM theory describes the spatial correlations in a liquidlike polymer system made up of spherical interacting "sites." The category of liquid-like polymers includes melts, blends, solutions, and nanocomposites of both homopolymers and copolymers. Within these systems, PRISM is able to handle varying chain chemistry, monomer sequence, and topology. The traditional PRISM formalism is spherically symmetric, which in general prevents the use of PRISM to study glassy, crystalline, phaseseparated or otherwise non-isotropic materials. While there is a modified PRISM formalism for oriented (liquid-crystalline) materials, [OS05a], [OS05b], [PS00], [PS99] those modifications are outside the scope of the current work. Figure 1 shows a schematic of a polymer nanocomposite that could be studied with PRISM theory using a two-site model.

D R A F T
In general, PRISM sites represent a segment of a molecule or polymer chain, similar to the atoms or coarse-grained beads that comprise an MD or MC simulation. Unlike these simulation methods, PRISM treats all of the sites of a given type as indistinguishable and does not track the individual positions of each site in space. Instead, the structure of the system is described through average spatial correlation functions. The fundamental PRISM equation for multi-component systems is represented in Fourierspace as a matrix equation of the site-site spatial correlation In this expression,Ĥ(k) is the inter-molecular total correlation function matrix,Ĉ(k) is the inter-molecular direct correlation function matrix, andΩ(k) is the intra-molecular correlation function matrix.Ω(k) describes the how the monomers within a molecule are connected and placed,Ĥ(k) andĈ(k) describe how the molecules are placed in space relative to one another.
The key difference betweenĤ(k) andĈ(k) is that the former includes many-body effects, while the latter does not. Knowledge ofĤ(k),Ĉ(k), andΩ(k) for a given system allows one to calculate a range of important structural and thermodynamic parameters, e.g., structure factors, radial distribution functions, second virial coefficients, Flory-Huggins χ parameters, bulk isothermal compressibilities, and spinodal decomposition temperatures. Each of the variables in Equation 1 represents a function of wavenumber k which returns an n × n matrix, with n being the number of site-types in the calculation. Each element of a correlation function matrix (e.g.,Ĥ α,β (k)) represents the value of that correlation function between site types α and β at a given wavenumber k. These correlation function matrices are symmetric, therefore there are n(n+1) 2 independent site-type pairs and correlation function values in each correlation function matrix. The nanocomposite in Figure 1 is modeled using n = 2 site-types which yields three independent site-type pairs: polymer-polymer, polymer-particle, and particle-particle.
Equation 1, as written, has one unspecified degree of freedom for each site-type pair, therefore additional mathematical relationships must be supplied to obtain a solution. These relationships are called closures and are derived in various ways from fundamental liquid-state theory. Closures are also how the chemistry of a system is specified via pairwise interaction potentials U α,β (r). For example, one widely-used closure is the Percus-Yevick closure shown below where Γ(r) is defined in real-space as While the PRISM equation can be solved analytically [SC94] in select cases, we focus on a more generalizable numerical approach here. Figure 2 shows a schematic of our approach. For all sitetypes or site-type pairs, the user provides input values forΩ α,β (k), site-site pair potentials U α,β (r), site-type densities ρ α , and an initial guess for all Γ α,β (r). After the user supplies all necessary parameters and input correlation functions, pyPRISM applies a numerical optimization routine, such as a Newton-Krylov method, [new] to minimize a self-consistent cost function. The details of this cost function were discussed in our previous work. [MGIJ + 18] After the cost function is minimized, the PRISM equation is considered "solved" and the resultant correlation functions can be used for calculations.
pyPRISM Overview pyPRISM defines a scripting API (application programming interface) that allows users to conduct calculations and numerically solve the PRISM equation for a range of liquid-like polymer systems. All of the theoretical details of PRISM theory are encapsulated in classes and methods which allow users to specify parameters and input correlation functions by name e.g., PercusYevick for Equation 2. Furthermore, the structure of these classes was kept as simple as possible so that novice scientific programmers could easily extend pyPRISM by implementing new closures, potentials, or intra-molecular correlation functions. These code structure of pyPRISM is shown in schematically in Figure 3 and is discussed in the Implementation Section. Providing a scripting API rather than an "input file"-based scheme gives users the ability to use the full power of Python for complex PRISM-based calculations. For example, one could use parallelized loops to fill a database with PRISM results using Python's built-in support for thread or process pools. Alternatively, pyPRISM could easily be coupled to a simulation engine by calling the engine via a subprocess, processing the engine output, and then feeding that output to to a pyPRISM calculation. The pyPRISM API is demonstrated in the Example pyPRISM Script section by modeling the system shown in Figure 1.
While experts in PRISM theory likely will need little guidance on how to appropriately apply pyPRISM, we also would like to make pyPRISM accessible to the widest possible audience.  Fig. 3: Overview of codebase and class organization. A full description of the codebase classes and methods can be found in the online documentation. [pyPa].

$ pip install pyPRISM
Full installation instructions can be found in the documentation.
[pyPa] Implementation Figure 3 shows an overview of the available classes and functions in pyPRISM and how they relate categorically. To begin, we consider the core data structures listed in the left column of the figure.
Parameters and data in PRISM theory fall into two categories: those that define the properties of a single site-type (e.g., density, diameter) and those that define properties for a site-type pair (e.g., closure, potential, intra-molecular correlation functions). pyPRISM defines two base container classes based on this concept, both of which inherit from a parent pyPRISM. Table class: pyPRISM.ValueTable and pyPRISM.PairTable. These classes store numerical and non-numerical data, support complex iteration, and provide a .check() method that is used to ensure that all parameters are fully specified. Both pyPRISM. Table  subclasses also support setting multiple pair-data at once, thereby making scripts easier to maintain via reduced visual noise and repetition. Additionally, pyPRISM.ValueTable automatically invokes matrix symmetry when a user sets an off-diagonal pair, assigning the α, β and β , α pairs automatically.
,v)) 41 42 # The above loop prints the following: In some cases where additional logic or error checking is needed, we have created more specialized container classes. For example, both the site volumes and the site-site contact distances are functions of the individual site diameters. The pyPRISM.Diameter class contains multiple pyPRISM. Table objects which are dynamically updated as the user defines site-type diameters. The pyPRISM.Density class was created for analogous reasons so that the pair-density matrix, ρ pair α,β = ρ α ρ β the site-density matrix, and the total site density, can all be calculated dynamically as the user specifies or modifies the individual site-type densities ρ α . An additional specialized container is pyPRISM.Domain. This class specifies the discretized real-and Fourier-space grids over which the PRISM equation is solved and is instantiated by specifying the length (i.e., number of gridpoints) and grid spacing in real-or Fourier space (i.e., dr or dk). An important detail of the PRISM cost function mentioned above is that correlation functions need to be transformed to and from Fourier space during the cost function evaluation. pyPRISM.Domain also contains the Fast Fourier Transform (FFT) methods needed to efficiently carry out these transforms. The mathematics behind these FFT methods, which are implemented as Type II and III Discrete Sine Transforms (DST-II and DST-III), are discussed in our previous work. [MGIJ + 18] The pyPRISM.System class contains multiple pyPRISM.ValueTable and pyPRISM.PairTable objects in addition to the specialized container classes described above. The goal of the pyPRISM.System class is to be a supercontainer that can validate that a system is fully and correctly specified before allowing the user to attempt to solve the PRISM equation.
While pyPRISM.System primarily houses input property tables, pyPRISM.PRISM represents a fully specified PRISM calculation and contains the cost function to be numerically minimized. The correlation functions shown in Equation 1 are stored in the pyPRISM.PRISM object as pyPRISM.MatrixArray objects, which are similar to pyPRISM.ValueTable objects but with a focus on mathematics rather than storage. pyPRISM.MatrixArray objects can only contain numerical data and provide many operators and methods which simplify PRISM theory mathematics. In particular, they satisfy the need for easy access to both the matrix and pair-function representations of the correlation functions, shown schematically in Figure 4 . The former is necessary for carrying out the mathematics of the PRISM equation (Equation 1) and the latter for performing Fourier transformations of the individual pairfunctions. The pyPRISM.MatrixArray objects also carry out a number of run-time error checks including ensuring that both MatrixArray objects involved in a binary operations (such as addition) are in the same space (real or Fourier). The core data structure underlying the pyPRISM.MatrixArray is a threedimensional Numpy ndarray of m n × n matrices, where m is the length of the pyPRISM.Domain.    The pyPRISM.PRISM object is solved by calling the .solve() method which invokes a numerical algorithm to minimize the output of the .cost() method by varying the input Γ α,β (r). Once a pyPRISM.PRISM object is numerically solved, it can be passed to a calculator that processes the optimized correlation functions and returns various structural and thermodynamic data. The current list of available calculators is shown in the rightmost column of Figure 3 and is fully described in the documentation. [pyPa] Beyond the core data structures, pyPRISM defines classes which are meant to represent various theoretical equations or ideas. Classes which inherit from pyPRISM.Potential, pyPRISM.Closure, or pyPRISM.Omega represent interaction potentials, theoretical closures, or intra-molecular correla- tion functionsΩ α,β (k), respectively. These properties must be specified for all site-type pairs before a pyPRISM.PRISM object can be created. To ensure that users can easily add new potentials, closures, andΩ α,β (k) to the codebase, we have kept the programming interface contract of these classes as simple as possible: Subclasses must inherit from the proper parent class and implement a .calculate() method.

Example Discussion
The code above shows how to use pyPRISM to calculate the properties of a polymer nanocomposite made of linear polymer chains and spherical nanoparticles. This system is shown schematically in Figure 1 and is fully described in reference [HS05]. The results of this calculation are plotted in Figure 5. In this section, we will discuss the details of this example in a line by line fashion as we specify all inputs shown in Figure 2 and then solve the PRISM equation. All pyPRISM calculations begin by first importing the pyPRISM library, and then creating a pyPRISM.System object. The first argument to the pyPRISM.System constructor is the names of the site-types for the calculation. In this case, we have two sitetypes which we (arbitrarily) call polymer and particle. Optionally, the constructor allows that the thermal energy level, k B T , be specified. Next a pyPRISM.Domain object is created with length=4096 grid-points and a grid spacing of dr=0.1.
Note that all parameters in pyPRISM are specified in a reduced unit system commonly called Lennard-Jones units. In this scheme, a characteristic length d c , mass m c , and energy e c are specified. All other units are then specified in terms of these characteristic units. For example, if d c = 1 nm, the grid spacing in the above code would be dr = 0.1d c = 0.1 nm. See [FB02] for more information on the Lennard-Jones reduced unit scheme. Next, site-type diameters and number densities are specified for both site-types in units of d c and beads per d 3 c , respectively. Qualitatively, these specifications imply that we are considering a dilute concentration of nanoparticles dissolved in a polymer matrix made up of polymer sites of significantly smaller diameter. D R A F T U polymer,polymer (r) and U particle,particle (r) pair potentials are specified as athermal hard sphere interactions, while the U polymer,particle (r) potential is an exponential attractive interaction. This configuration describes a dense melt-like polymer nanocomposite where the polymer chains are attracted to and adhere to (wet) the nanoparticle surface. The α and ε parameters in the pyPRISM.potential.Expontential constructor control the range and strength of the exponential attraction. 31 sys.closure['polymer',['polymer','particle']] = \ 32 pyPRISM.closure.PercusYevick() 33 sys.closure['particle','particle'] = \ 34 pyPRISM.closure.HyperNettedChain() To demonstrate one utility of the pyPRISM.PairTable data structure, here we have specified both the polymer-polymer and polymer-particle closure in a single line. Both pair-data are specified to the Percus-Yevick closure, while the particle-particle closure is set to be the hypernetted chain closure. In this code-block and those above, note how the subclasses of pyPRISM.Omega, pyPRISM.Potential and pyPRISM.Closure are used to easily specify complex theoretical constructs. Once the minimization completes, a pyPRISM.PRISM object is returned which contains the final solutions for H(r) and C(r) along with all input parameters and data. The pyPRISM.PRISM object is then passed through the pyPRISM.calculate.pair_correlation and pyPRISM.calculate.chi calculators. Both of these methods return pyPRISM.ValueTables, which can be subscripted to access the individual pair-functions. In the example, we extract the particle-particle pair correlation function, g particle,particle (r) and the particle-polymer χ particle,polymer parameter.
While it would be feasible to study this polymer nanocomposite system via simulation methods such as MD or MC, the use of PRISM theory offers some distinct advantages. PRISM theory does not suffer from finite-size or equilibration effects, both of which limit simulation methods. Furthermore, a simulation of sufficient size to study the large nanoparticles and relatively long polymer chains in this example would require many hours to days of CPU or GPU time from a supercomputing resource. This is due to the computational expense of evaluating the pairwise interactions at each simulated configuration and the many millions of configurations that must be generated in order to properly equilibrate and sample such a nanocomposite. In contrast, PRISM theory can be numerically solved in seconds even on modest hardware such as a laptop computer. This is because, unlike MD or MC, solving PRISM theory does not involve generating molecular configurations, but rather is a set of integral equations which are numerically solved for the spatial correlation functions,  H α,β (r) and C α,β (r). This numerical solution process is briefly described above at the end of the PRISM Theory section and is described in detail in Section II.E of [MGIJ + 18]. In addition to the computational performance benefits of PRISM theory over MD or MC, once the full set of pairwise spatial correlation functions is solved for, a variety of properties can quickly be screened without having to process large simulation trajectories.
PRISM theory provides a powerful alternative or complement to traditional simulation approaches, but we should note that it is not without limitation. There are restrictions on the types of systems and thermodynamic state points to which PRISM theory can be applied and the numerical closures are approximations and therefore sources of error. See Section IV.D of [MGIJ + 18] for a discussion on the known limitations of PRISM theory.

Pedagogy
It is our goal to create a central platform for polymer liquid state theorists while also lowering the barriers to using PRISM theory for the greater polymer science community. Towards this effort, we have identified two primary challenges: 1) The process of understanding and numerically solving PRISM theory is complex and filled with pitfalls and opportunities for error. 2) Many of those who would benefit most from PRISM theory do not have a strong programming background.
Our strategy to address both of these challenges is a strong focus on providing pedagogical resources to users. To start, we have put significant effort into our documentation. Every page of the API documentation [pyPa] contains a written description of the theory being implemented, all necessary mathematics, descriptions of all input and output parameters, links to any relevant journal articles, and a detailed and relevant example. While including these features in our documentation is not a new idea, we are focusing on providing these resources immediately upon release and iterating based on user feedback to improve the clarity and scope of the information provided.
Moving beyond API documentation, we also have created knowledgebase materials which provide more nuanced information about using and numerically solving PRISM theory. This D R A F T knowledgebase includes everything from concise lists of systems and properties that can be studied with pyPRISM to tips and tricks for reaching convergence of the numerical solver. In reference to Challenge 2 above, we also recognize that a significant barrier for non-experts to use these tools is the installation process. Our installation documentation [pyPa] attempts to be holistic and provide detailed instructions for the several different ways that users can install pyPRISM.
We have also created a self-guided tutorial to PRISM theory and pyPRISM in the form of a series of Jupyter notebooks.
[pyPb], [jup] The tutorial notebooks are designed to target a wide audience with varied programming and materials science expertise, with topics ranging from a basic introduction to Python to how to add new features to pyPRISM. The tutorial also has several case study-focused notebooks which walk users through the process of reproducing PRISM results from the literature. Figure 6 shows our recommendations for how users of different backgrounds and skill levels might move through the tutorial. In order to ensure the widest audience possible can take advantage of this tutorial, we have also set up a binder instance [pyPc], which allows users to try out pyPRISM and run the tutorial instantly in a web-browser without installing any software. This feature should also benefit users who might be hampered by Challenge 2 above.

Future Directions
While pyPRISM is a step forward in providing a central platform for polymer liquid-state theory calculations, we intend to significantly extend the tool beyond its release state. The most obvious avenue for extension will be to add new potentials, closures, and intra-molecular correlation functions Ω α,β (k) to the codebase. As described above, we hope that a significant portion of these classes will be contributed by users. Where analytical expressions forΩ α,β (k) do not exist, they can also be calculated from simulation trajectories. While we do provide a Cython-enhanced tool to do the calculation, we also plan to add features to more easily couple pyPRISM to common MD and MC simulation packages.
[hoo], [lam], [sim], [cas] These linkages would also make it easier for users to carry out the Self-Consistent PRISM (SCPRISM) method. [MGIJ + 18] PRISM theory also has advanced applications that are not possible in the current pyPRISM workflow. One example is the use of PRISM theory to translate a detailed atomistic simulation model to a less detailed, less computationally expensive coarsegrained model in a methodology called Integral Equation Coarse Graining (IECG). [DG17b], [DG17a], [MCLG12], [YSNG04] We plan to provide utilities in the pyPRISM codebase that aid in carrying out this method. PRISM theory can also be used to model or fit neutron and X-ray scattering data. In particular, PRISM theory can be used to take existing scattering models for single particles or polymer chains and model the effects of intermolecular interactions. This approach would greatly extend the applicability of existing scattering models, which on their own are only valid in the infinitely dilute concentration limit, but could be combined with pyPRISM to model higher concentrations.

Summary
pyPRISM is an open-source tool with the goal of facilitating the usage of PRISM theory, a polymer liquid-state theory. Compared to more widely-used simulation methods such as MD and MC, PRISM theory is significantly more computationally efficient, does not need to be equilibrated, and does not suffer from finite size effects. pyPRISM lowers the barriers to using PRISM theory by providing a simple scripting interface for setting up and numerically solving the theory. Furthermore, in order to ensure users correctly and appropriately use pyPRISM, we have created extensive pedagogical materials in the form of API documentation, knowledgebase materials, and Jupyter-notebook powered tutorials. D R A F T