Organic Molecules in Space : Insights from the NASA Ames Molecular Database in the era of the James Webb Space Telescope

We present the software tool pyPAHdb to the scientific astronomical community, which is used to characterize emission from one of the most prevalent types of organic molecules in space, namely polycyclic aromatic hydrocarbons (PAHs). It leverages the detailed studies of organic molecules done at the NASA Ames Research Center. pyPAHdb is a streamlined Python version of the NASA Ames PAH IR Spectroscopic Database (PAHdb; www.astrochemistry.org/pahdb) suite of IDL tools. PAHdb has been extensively used to analyze and interpret the PAH signature from a plethora of emission sources, ranging from solar-system objects to entire galaxies. pyPAHdb decomposes astronomical PAH emission spectra into contributing PAH sub-classes in terms of charge and size using a database-fitting technique. The inputs for the fit are spectra constructed using the spectroscopic libraries of PAHdb and take into account the detailed photo-physics of the PAH excitation/emission process.


Polycyclic aromatic hydrocarbons
Polycyclic aromatic hydrocarbons (PAHs) are a class of molecules found throughout the Universe that drive many critical astrophysical processes.They dominate the mid-infrared (IR) emission of many astronomical objects, as they absorb ultraviolet (UV) photons and re-emit that energy through a series of IR emission features between 3-20 µm.They are seen in reflection nebulae, protoplanetary disks, the diffuse interstellar medium (ISM), planetary nebulae, and entire galaxies (e.g., Figure 1), among other environments.Structurally, they are composed of a hexagonal carbon lattice (see Figure 2); taken as an entire family, they are by far the largest known molecules in space.PAHs are exceptionally stable, allowing them to survive the harsh conditions amongst a remarkably wide variety of astronomical objects.

The role of astronomical PAHs
Thanks to their ubiquity, PAH IR emission signatures are routinely used by astronomers as probes of object type and astrophysical Fig. 1: A combined visible light-IR image from the Spitzer Space Telescope of the galaxy Messier-82 (M82), also known as the Cigar galaxy because of its cigar-like shape in visible light.The red region streaming away from the galaxy into intergalactic space traces the IR emission from PAHs.Credits: NASA/JPL-Caltech/C.Engelbracht (Steward Observatory) and the SINGS team.
processes.For example, the PAH IR signature is used as an indicator of star formation in high redshift galaxies [RPD + 14] and to differentiate between black hole and starburst engines in galactic nuclei [GLS + 98].Those astronomers who study star and planet formation use the IR PAH signature as an indicator of the geometry of circumstellar disks [MWB + 01] [BPM + 09].
PAHs are believed to form in the circumstellar ejecta of latetype stars, after which they become part of the ISM as the material travels away from the star.Over time, PAHs are incorporated into dense clouds, wherein they participate in ongoing chemistry and are eventually brought into newly-forming star and budding planetary systems.
They play important roles in circumstellar processes and the diffuse ISM by modulating radiation fields and influencing charge balance.Once incorporated into dense molecular clouds, they can dominate cloud cooling and promote H 2 formation.PAHs also control the large-scale ionization balance and thereby the coupling of magnetic fields to the gas.Through their influence on the forces supporting clouds against gravity, PAHs also affect the process of star formation itself.They are a major contributor to the heating of diffuse atomic gas in the ISM and thereby the physical conditions in such environments and its structure.
The unique properties of PAHs, coupled with their spectroscopic response to changing astrophysical conditions and their ability to convert UV photons to IR radiation, makes them powerful probes of astronomical objects at all stages of the stellar life cycle.Notably, they allow astronomers to probe properties of diffuse media in regions not normally accessible.
NASA Ames PAH IR Spectroscopic Database (PAHdb) The Astrophysics & Astrochemistry Laboratory at NASA Ames Research Center [NAS] provides data and tools for analyzing and interpreting astronomical PAH spectra.The NASA Ames PAH IR Spectroscopic Database (PAHdb; [BRBA18] [BBR + 14]) is the culmination of more than 30 years of laboratory and computational research carried out at the NASA Ames Research Center to test and refine the astronomical PAH model.PAHdb consists of three components (all under the moniker of "PAHdb"): the spectroscopic libraries, the website (see Figure 2), and the suite of off-line IDL 1 tools.PAHdb has the world's foremost collection of PAH spectra.
PAHdb is highly cited and is used to characterize and understand organic molecules in our own Galaxy and external galaxies.The database includes a set of innovative astronomical models and tools that enables astronomers to probe and quantitatively analyze the state of the PAH population.For instance, one can derive PAH ionization balance, size, structure, and composition and tie these to the prevailing local astrophysical conditions (e.g., electron density, parameters of the radiation field, etc.) [BBA16] [BBA18].

NASA's next great observatory for PAH research: JWST
The next great leap forward for IR astronomy is the the James Webb Space Telescope (JWST).JWST is NASA's next flagship observatory and the successor to the exceptionally successful Hubble Space Telescope (www.nasa.gov/hubble)and Spitzer Space Telescope (www.nasa.gov/spitzer).JWST is being developed through a collaboration between NASA, the European Space Agency (ESA) and the Canadian Space Agency (CSA).The telescope features a primary mirror with a diameter of 6.5 m and carries four science instruments.These instruments will observe the Universe with unprecedented resolution and sensitivity in the near-and mid-IR.The observatory is expected to launch early 2021.
As part of an awarded JWST Early Release Science (ERS) program 2 , we are developing a Python-based toolkit for quickly analyzing PAH emission in IR spectroscopic data pyPAHdb: a tool designed for JWST The purpose of pyPAHdb is to derive astronomical parameters directly from JWST observations, but the tool is not limited to JWST observations alone.pyPAHdb is the light version of a full suite of Python software tools 3 that is currently being developed, which is an analog of the off-line IDL tools 4 .A feature comparison is made in Table 1 (see also Section "The underlying PAH photophysics").pyPAHdb will enable PAH experts and non-experts    1: Feature comparison between pyPAHdb and the full suites of off-line IDL/Python tools.Note: NNLS is non-negative least squares; FWHM is full-width at half-maximum of an emission profile; "uncertainties" in this context refers to handling observational spectroscopic uncertainties.
alike to analyze and interpret astronomical PAH emission spectra.pyPAHdb analyzes spectroscopic observations (including spectral maps) and characterizes the PAH emission using a database-fitting approach, providing the PAH ionization and size fractions.
The package is imported using the following statement: import pypahdb 3. AmesPAHdbPythonSuite: github.com/PAHdb/AmesPAHdbPythonSuite 4. AmesPAHdbIDLSuite: github.com/PAHdb/AmesPAHdbIDLSuite(2) An over-sampled pre-computed matrix of PAH spectra is loaded and interpolated onto the wavelength grid of the astronomical observations.Database-fitting is performed using non-negative least-squares (NNLS), which yields the contribution of an individual PAH molecule to the total fit.As a result, we obtain a breakdown of the model fit in terms of PAH charge and size.
(3) The results are written to disk as a single FITS file and a PDF summarizing the model fit (one page per pixel, if a spectral cube is provided as input).
The general program methodology is encapsulated in the flowchart presented in Figure 3 and is as follows: (1) Read-in a file containing spectroscopic PAH observations of an astronomical object.This functionality is provided by the class observation, which is implemented in observation.py.It is the responsibility of the user to ensure all non-PAH emission components have been removed from the spectrum.The class uses a fallthrough try-except chain to attempt to read the given filename using the facilities provided by astropy.io.
The spectroscopic data is stored as a class attribute as a spectrum object, which holds the data in terms of abscissa and ordinate values using numpy arrays.The units associated with the abscissa and ordinate values are, in the case of a FITS file, determined from the accompanying header, which itself is also stored as a class attribute.The spectral coordinate system is interpreted from FITS header keywords following the specification by [GCVA06].The spectrum class is implemented in spectrum.pyand provides functionality to convert between different coordinate representations.Below is example Python code demonstrating the use of the class.
The file NGC7023-NW-BRIGHT.txt_pahs.txt in this demonstration can be found in the examples directory that is part of the pyPAHdb package.The output of the following code-block is shown in Figure 3. (2) Decompose the observed PAH emission into contributions from different PAH subclasses, here charge and size.This functionality is provided by the class decomposer, which is implemented in decomposer.py.The class takes as input a spectrum object, of which it creates a deep copy and calls its spectrum.convertunitstomethod to convert the abscissa units to wavenumber.Subsequently, a pre-computed numpy matrix of highly oversampled PAH emission spectra stored as a pickle is loaded from file.Utilizing numpy.interp, each of the PAH emission spectra, represented by a single column in the pre-computed matrix, is interpolated onto the frequency grid (in wavenumber) of the input spectrum.This process is parallelized using the multiprocessing package.optimize.nnls is used to perform a nonnegative least-squares (NNLS) fit of the pre-computed spectra to the input spectra.NNLS is chosen because it is appropriate to the problem, fast, and always converges.
The solution vector (weights) is stored as an attribute and considered private.Combining lazy instantiation and Python's @property, the results of the fit and the breakdown can be retrieved.In case the input spectrum represents a spectral cube and where possible, the calculations are parallelized across each pixel using, again, the multiprocessing package.Below is example code demonstrating the use of the class and extends the previous code-block.The output of the code-block is shown in Figure 3. (3) Produce output to file given a decomposer object.This functionality is provided by the class writer, which is implemented in writer.py,and serves to summarize the results from the decomposer class so that a user may assess the quality of the fit and store the PAH characteristics of their astronomical observations.The class uses astropy.fits to write the PAH characteristics to a FITS file and the matplotlib package to generate a PDF summarizing the results.The class will attempt to incorporate relevant information from any (FITS) header provided.Below is example code demonstrating the use of the class, which extends the previous code-block.The size breakdown part of the generated PDF output is shown in Figure 3.
pah.writer(result, header=obs.header) It is anticipated that pyPAHdb will constitute an effective and useful tool of an astronomer's toolbox, handling thousands of spectra.Therefore, performance is of importance.To measure performance, a spectral cube containing the PAH emission spectra at some 210 pixel locations is analyzed with pyPAHdb (see also Section "Demonstration").To put the measurement in context, it is compared to analyzing the same spectral cube using the off-line IDL tools.In this comparison the analysis with pyPAHdb is 15 times faster at four seconds as compared to using the IDL tools, when tested on a 2.8 GHz Intel Core i7 MacBook Pro with 16 GB of memory.

The underlying PAH photo-physics
To analyze astronomical PAH emission spectra with the absorption data contained in PAHdb's libraries, the PAHdb data need to be turned into emission spectra.As discussed in the previous section, pyPAHdb hides the underlying photo-physics in a precomputed matrix that is read-in by the decomposer class.The pre-computed matrix is constructed using the full Python suite and takes modeled, highly-over-sampled PAH emission spectra from version 3.00 of the library of computed spectra.This matrix uses the data on a collection of "astronomical" PAHs, which include those PAHs that have more than 20 carbon atoms, have no hetero-atom substitutions except for possibly nitrogen, have no aliphatic side groups, and are not fully dehydrogenated.In addition, the fullerenes C 60 and C 70 are added.
While several more sophisticated emission models are available in the full Python suite, here a PAH's emission spectrum is calculated from the vibrational temperature it reaches after absorbing a single 7 eV photon and making use of the thermal approximation (e.g., [STA93] and [VPM + 01]).Table 1 highlights some of the differences between pyPAHdb and the full suite of IDL/Python tools.
The spectral intensity I j (ν), in erg s -1 cm -1 mol -1 , from a mol of the j th PAH is thus calculated as: with ν the frequency in cm -1 , h Planck's constant in erg s, c the speed-of-light in cm s -1 , ν i the frequency of mode i in cm -1 , σ i the integrated absorption cross-section for mode i in cm mol -1 , k Boltzmann's constant in erg K -1 , T the vibrational temperature in K, and φ (ν) is the frequency dependent emission profile in cm.The sum is taken over all n modes and the emission profile is assumed Gaussian with a full-width at half-maximum (FWHM) of 15 cm -1 .Note that before applying the emission profile, a redshift of 15 cm -1 is applied to each of the band positions (ν i ) to mimic some anharmonic effects.This redshift value is currently the best estimate from laboratory experiments (see e.g., the discussion in [BBA13]).The vibrational temperature attained after absorbing a single 7 eV photon is calculated by the molecule's heat capacity.The heat capacity, C V in erg K, of a molecular system can be described in terms of isolated harmonic oscillators by: where g(ν) is known as the density of states and describes the distribution of vibrational modes.However due to the discrete nature of the modes, the density of states is just a sum of δfunctions: The vibrational temperature is ultimately calculated by solving: where E in is the energy of the absorbed photon-here this is 7 eV. ) from version 3.00 of the library of computed spectra of PAHdb.After applying the PAH emission model, but before the convolution with the emission profile, the blue spectrum is obtained.The final spectrum is shown in orange.For display purposes the profiles have been given a FWHM of 45 cm -1 .
In Python, in the full suite, Equation 4 is solved using rootfinding with scipy.optimize.brentq.The integral is calculated with scipy.optimize.quad.
Figure 4 illustrates the process on the spectrum of the coronene cation (C 24 H 12 + ), which reaches a vibrational temperature of 1406 K after absorbing a single 7 eV photon.

Demonstration
As a more sophisticated demonstration of pyPAHdb's utility, we analyze a spectral cube dataset of the reflection nebula NGC 7023, as constructed from Spitzer Space Telescope observations.This data cube is overlaid on a visible-light image of NGC 7023 from the Hubble Space Telescope in Figure 5, left panel [BBA18].
The spectral cube is aligned such that, in these observations, we observe the transition from diffuse, ionized/atomic species (e.g., HI) near the exciting star to dense, molecular material (e.g., H 2 ) more distant from the star.The transition zone between the two is the photodissociation region, where PAHs have a strong presence.The properties of the PAH molecules are known to vary across these boundaries, since they are exposed to harsh radiation in the exposed cavity of the diffuse zone, and shielded in the molecular region.

Summary
The data and tools provided through PAHdb have proven to be valuable assets for the astronomical community for analyzing and interpreting PAH emission spectra.The launch of JWST in 2021 will usher a new era of astronomical PAH research.In the context of an awarded JWST Early Release Science program, we are developing pyPAHdb as a key data analysis tool to facilitate quick and effective analysis of PAH emission spectra.While this tool is being developed with JWST in mind, it is not limited to JWST data: it currently supports spectra from Spitzer Space Telescope, Infrared Space Observatory, and any user-defined spectrum.pyPAHdb is in active development and will be finalized well before JWST's launch.

Fig. 2 :
Fig.2: Screenshot of the NASA Ames PAH IR Spectroscopic Database website located at www.astrochemistry.org/pahdb/.Shown here are the details and vibrational spectrum for the PAH molecule ovalene (C 32 H 14 ).Additionally, each vibrational transition is animated and can be inspected for ease of interpretation (shown in the lower-right).

Fig. 3 :
Fig. 3: pyPAHdb flowchart.(1)Astronomical spectroscopic data is loaded, whether represented in FITS or ASCII files.(2) An over-sampled pre-computed matrix of PAH spectra is loaded and interpolated onto the wavelength grid of the astronomical observations.Database-fitting is performed using non-negative least-squares (NNLS), which yields the contribution of an individual PAH molecule to the total fit.As a result, we obtain a breakdown of the model fit in terms of PAH charge and size.(3) The results are written to disk as a single FITS file and a PDF summarizing the model fit (one page per pixel, if a spectral cube is provided as input).

Fig. 4 :
Fig. 4: Demonstration of applying the simple PAH emission model as outlined in Equations 1-4 to the 0 K spectrum of coronene (in black; C 24 H 12 +) from version 3.00 of the library of computed spectra of PAHdb.After applying the PAH emission model, but before the convolution with the emission profile, the blue spectrum is obtained.The final spectrum is shown in orange.For display purposes the profiles have been given a FWHM of 45 cm -1 .

Fig. 5 :
Fig. 5: Left: An image of the reflection nebula NGC 7023 as obtained by the Hubble Space Telescope.Overlaid is a pixel grid representing a spectral cube of observations taken with the Spitzer Space Telescope; each pixel contains an infrared spectrum.In this figure, the exciting star is just beyond the lower left corner.We are observing a photodissociation region boundary: the material in the lower half of the figure is diffuse and exposed to the star; the material in the upper (right) half is molecular and shielded from the star.The diagonal boundary separating the two zones is clearly visible.PAHs are common in these environments.Figure adapted from [BBA18].Right: We display PAH ionization across the NGC 7023 (white grid in left panel), using pyPAHdb.Here, an ionization fraction of 1 means all PAHs are ionized, while 0 means all are neutral.Note that in the diffuse, exposed cavity (lower half) the PAHs are on average more ionized than in the denser molecular zone (upper half).