A Real-Time 3D Audio Simulator for Cognitive Hearing Science

This paper describes the development of a 3D audio simulator for use in cognitive hearing science studies and also for general 3D audio experimentation. The framework that the simulator is built upon is pyaudio_helper, which is a module of the package scikit-dsp-comm. The simulator runs in a Jupyter notebook and makes use of Jupyter widgets for interactive control of audio source positioning in 3D space. 3D audio has application in virtual reality and in hearing assistive device (HAD) research and development. At its core the simulator uses digital filters to represent the sound pressure wave propagation path from the sound source to each ear canal of a human subject. Digital filters of 200 coefficients each for the left and right ears are stored in a look-up table as a function of the azimuth and elevation angles of the impinging sound source.


Introduction
In cognitive hearing science, binaural hearing models how sound pressure waves arrive at either ear drum, at the end of the ear canal, or in the case of typical measurements, at the entry to the ear canal, both as a function of the arrival angle in 3D (azimuth and elevation) and radial distance. A tutorial on 3D audio can be found at [HCI]. This leads to the need for the head-related impulse response (HRIR) (time domain) or head-related transfer function (HRTF) (frequency domain) for a particular human subject. Traditionally, human subjects are placed in an anechoic chamber with a sound source placed at, e.g., one meter from the head and then moved relative to the subject's head over a range of azimuth and elevation angles, with the HRIR measured at each angle. The 3D simulator described here uses a database of HRIRs from the University of California, Davis, originally in the Center for Image Processing and Integrated Computing (CIPIC) [CIPICHRTF], to describe a given subject. In the pyaudio_helper application the HRIR at a given angle is represented by two (left and right ear) 200-coefficient digital filters that the sound source audio is passed through. Here the database for each subject holds 25 azimuth and 50 elevation angles to approximate continuous sound source 3D locations.
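As an illustration of how such an angle-indexed look-up table can be used, the sketch below picks the nearest stored grid point for a requested arrival angle. The grid values here are stand-ins (the actual CIPIC azimuth grid is nonuniform); only the nearest-neighbor selection idea is taken from the text.

```python
import numpy as np

# Stand-in angle grids: 25 azimuths and 50 elevations, in degrees.
# (Illustrative values only; the actual CIPIC azimuth grid is nonuniform.)
azimuths = np.linspace(-80, 80, 25)
elevations = -45 + 5.625 * np.arange(50)

def nearest_hrir_index(az, el):
    """Return (azimuth, elevation) grid indices of the stored
    right/left HRIR filter pair closest to the requested angle."""
    i_az = int(np.argmin(np.abs(azimuths - az)))
    i_el = int(np.argmin(np.abs(elevations - el)))
    return i_az, i_el
```

The returned index pair would then select the corresponding 200-coefficient filter pair from the subject's database entry.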
Obtaining individual HRTFs is a challenge in itself and the subject of much research.
In a related research project, deep learning is being investigated as a means to fit a human subject to the CIPIC HRTF database of subjects, based on 27 upper-torso anthropometric measurements of the subject. As a simple solution, we can also consider using a simple spherical head model and its corresponding HRTF, which makes use of spherical harmonics to solve for the sound pressure magnitude and phase at any location on the sphere surface. A frequency sweep of magnitude and phase is then inverse Fourier transformed to obtain the HRIR. The ultimate intent of the simulator is to serve as a clinical tool for blind sound source localization experiments. Human subjects will be exposed to several different HRIR models, where at least one model is a personalized fit based on deep learning using anthropometrics and/or a finite element wave equation solution using a 3D rendering of the subject's shoulders and head. A 3D rendering of a subject can be obtained using photogrammetry, which estimates three-dimensional coordinates of points on an object from a collection of photographic images taken from different positions.
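The sweep-to-HRIR step can be sketched as follows, assuming the sweep is available as one-sided complex HRTF samples from DC to Nyquist. The function name, tap count, and fade-out window are illustrative assumptions, not the project's actual code:

```python
import numpy as np

def hrtf_to_hrir(H_onesided, n_taps=200):
    """Inverse Fourier transform a one-sided complex HRTF frequency
    sweep (DC..Nyquist) to the time domain, then truncate to n_taps
    coefficients with a fade-out to limit truncation artifacts."""
    h = np.fft.irfft(H_onesided)            # real impulse response
    fade = np.hanning(2 * n_taps)[n_taps:]  # decaying half-window
    return h[:n_taps] * fade
```

A quick sanity check is to round-trip a known short impulse response through np.fft.rfft and confirm the peak location survives.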

3D Geometry
To produce a synthesized 3D audio sound field, we start with a geometry where the center of the coordinate frame is the intersection between the subject's mid-sagittal (vertical median) plane and the line connecting the left and right ear canals. This is referred to as being head-centered. The coordinate systems used in this paper are shown in Figure 1. The primary head-centered system has cartesian coordinates labeled (x, y, z) and associated cylindrical coordinates (r_xy, φ_az, h_y) (black labels in Figure 1). The cylindrical coordinates will be used in the Jupyter notebook apps presented later as the interface for GUI controls to conveniently position the audio source about a subject's head. A secondary head-centered system, used by CIPIC, has cartesian coordinates labeled (x_1, x_2, x_3) and associated spherical coordinates (r, φ, θ) (purple labels in Figure 1). The first coordinate system is motivated by [Fitzpatrick], and its usage is explained in detail in the section FIR Filter Coefficient Set Selection. The second system is referred to by CIPIC as the interaural-polar coordinate system (IPCS), which is used to index into the HRIR filter pairs which produce the right and left audio outputs.
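The cylindrical-to-cartesian mapping behind the GUI controls can be sketched as below. This is a sketch only: the zero-azimuth reference axis and which cartesian axis plays the vertical role are assumptions here, since they depend on Figure 1.

```python
import numpy as np

def cyl_to_cart(r_h, phi_az_deg, h):
    """Cylindrical GUI controls -> head-centered cartesian (x, y, z).
    Assumes azimuth measured from +x in the horizontal plane, with
    height h along the vertical axis (axis roles are assumptions)."""
    phi = np.radians(phi_az_deg)
    x = r_h * np.cos(phi)
    z = r_h * np.sin(phi)
    return x, h, z   # y taken as the vertical (top-of-head) axis

def cart_to_cyl(x, y, z):
    """Inverse mapping back to the slider coordinates."""
    return float(np.hypot(x, z)), float(np.degrees(np.arctan2(z, x))), y
```

A round trip through both functions should return the original slider values, which makes the pair easy to unit test.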
The 3D audio rendering provided by the simulator developed in this paper relies on the 1250 HRIR measurements taken using the geometrical configuration shown in Figure 2. A total of 45 subjects are contained in the CIPIC HRIR database, both human and mannequin.

Fig. 1: The primary head-centered coordinate system, (x, y, z), used in the 3D audio simulator, along with the secondary system, (x_1, x_2, x_3), used by CIPIC via IPCS and spherical coordinates (r, φ, θ).

Real-Time Signal Processing
In this section we briefly describe the role real-time digital signal processing (DSP) plays in implementing the 3D audio simulator. A top-level block diagram of the 3D audio simulator is shown in Figure 4. For an audio source positioned at (x, y, z) relative to the head center, the appropriate HRIR right and left channel digital filter coefficients are utilized, along with gain scaling to account for radial distance relative to 1 m and a parallax correction factor. Gain scaling and parallax correction are taken from [Fitzpatrick] and are explained in more detail in the following section of this paper.
To implement the filtering action we use the pyaudio_helper framework [Wickert] of Figure 5, which interfaces to the audio subsystem of a personal computer. The framework supports real-time signal processing, in particular filtering using core signal processing functions of scipy.signal [ScipySignal]. In mathematical terms, the output signals that drive the right and left headphone channels are

    y_R[n] = G_R · sum_{k=0}^{199} h_R[k] x[n−k],    (1)
    y_L[n] = G_L · sum_{k=0}^{199} h_L[k] x[n−k],    (2)

where h_R[k] and h_L[k] are the right/left HRIR filter coefficients, x[n] is the audio source signal, and G_R and G_L are amplitude scale factors. Producing real-time filtering with pyaudio_helper requires [Wickert] (i) creating an instance of the DSP_io_stream class by assigning valid PC audio input and output device ports to it, (ii) defining a callback function to process the input signal sample frames into right/left output sample frames according to (1) and (2), and (iii) calling the method interactive_stream() to start streaming. All of the code for the 3D simulator is developed in a Jupyter notebook for prototyping ease. Since [Wickert] details steps (i)-(iii), in the code snippet below we focus on the key filtering expressions in the callback and describe the playback of a geometrically positioned noise source via headphones:
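The notebook listing itself is not reproduced in this extract; the per-frame filtering at the heart of the callback can be sketched as follows. This is a sketch only: the frame interface and helper names are assumptions, with scipy.signal.lfilter used with explicit state (zi) so the FIR convolution of (1) and (2) continues seamlessly across frames.

```python
import numpy as np
from scipy import signal

def make_hrir_callback(hR, hL, GR, GL):
    """Return a frame-processing function implementing (1) and (2):
    y_R[n] = G_R * sum_k h_R[k] x[n-k], and similarly for the left.
    (Hypothetical helper; the real callback is driven by
    pyaudio_helper's DSP_io_stream.)"""
    ziR = np.zeros(len(hR) - 1)  # filter state carried between frames
    ziL = np.zeros(len(hL) - 1)
    def callback(x_frame):
        nonlocal ziR, ziL
        yR, ziR = signal.lfilter(hR, 1.0, GR * x_frame, zi=ziR)
        yL, ziL = signal.lfilter(hL, 1.0, GL * x_frame, zi=ziL)
        return yR, yL
    return callback
```

In the real app the coefficient sets hR/hL and gains GR/GL are swapped as the source moves, and the right/left frames are interleaved for output by pyaudio_helper.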

FIR Filter Coefficient Set Selection
To finally render 3D audio requires selection of the appropriate right/left filter coefficient set and, if needed, range correction. For the special case of the source on the 1 m CIPIC reference sphere, we simply pick the coefficient set that lies closest to the desired IPCS angle pair (φ, θ).
For the more typical case of the source range r = sqrt(x² + y² + z²) ≠ 1, more processing is required. The approach taken here follows the methodology of [Fitzpatrick] by using the primary cartesian coordinates of Figure 1 to additionally perform parallax correction and source range amplitude correction. At distance r from a point source the sound wave energy diverges as 1/r², so in terms of wave amplitude we include a scale factor of 1/r. Here the inverse distance correction also takes into account the fact that the entry to the ear canal is offset from the head center by the mean head radius R. The second correction factor is parallax, which is graphically depicted in Figure 6 for the special case of a source in the horizontal plane and directly in front of the head. Both corrections are addressed in detail in [Fitzpatrick]. For a source not on the unit sphere, sound parallax requires an adjustment in the HRIR coefficients, unique to the right and left ears. If we extend rays from the right and left ears that pass through the sound source location and then touch the unit sphere, the required azimuth values will be shifted to locations on either side of the true source azimuth. The HRIR values where these rays contact the unit sphere, respectively, perform the needed parallax correction. The actual database entries utilized are those that are closest to the intersection points. The class ss_mapping2CIPIChrir() takes the source location, (x, y, z), and using the single method cart2ipcs(self, x, y, z), produces the parallax-corrected right and left HRIR filter coefficients and range amplitude scaling factors. The code is listed below:

```python
class ss_mapping2CIPIChrir(object):
    """
    A class for sound source mapping to the CIPIC HRIR database.
    CIPIC uses the interaural polar coordinate system (IPCS). The
    reference sphere for the head-related transfer function (HRTF)
    measurements/head-related impulse responses (HRIR) has a 1 m
    radius.

    Mark Wickert June 2018, updated June 2019
    """

    def cart2ipcs(self, x, y, z):
        """
        Map cartesian source coordinates (x, y, z) to the CIPIC
        interaural polar coordinate system (IPCS) for easy access to
        the CIPIC HRIR. Parallax error is also dealt with, so two
        azimuth values are found. To fit IPCS the cartesian
        coordinates are defined as follows:

        (0,0,0) <--> center of head
        (1,0,0) <--> unit vector pointing outward from the right on a
                     line passing from left to right through the left
                     and right ear (pinna) ear canals
        (0,1,0) <--> unit vector pointing out through the top of the
                     head
        (0,0,1) <--> unit vector straight out through the back of the
                     head, such that a right-handed coordinate system
                     is formed
        """
        # First solve for the parameter t, which is used
        # to describe parametrically the location of the
        # source at (x,y,z) on a line connecting the
        # right or left ear canal entry point to the
        # unit sphere.
        ...
```

In the __init__ method all the right/left filter coefficients for the chosen subject database entry are copied into class attributes, and look-up tables (LUTs) are populated in terms of IPCS angles to ease selecting the needed right/left filters. Note in particular that the scale factors self.tR and self.tL are the inverse distance wave amplitude correction factors representing G_R and G_L in (1) and (2), respectively.
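The parametric-line step described in the comment above can be illustrated as follows. This is a sketch under assumed names: the ear location and the selection of the positive quadratic root (the intersection in the direction of the source) are the author's geometry restated, not the project's actual code.

```python
import numpy as np

def parallax_point(ear, src):
    """Intersect the ray from an ear-canal entry point through the
    source location with the 1 m (unit) sphere. Returns the sphere
    point whose stored HRIR is used, plus the ear-to-source distance
    for the 1/r amplitude scaling."""
    e = np.asarray(ear, dtype=float)
    s = np.asarray(src, dtype=float)
    d = s - e                    # ray direction, ear -> source
    a = d @ d
    b = e @ d
    c = e @ e - 1.0              # ear lies inside the unit sphere
    t = (-b + np.sqrt(b * b - a * c)) / a  # positive root: forward along ray
    return e + t * d, float(np.sqrt(a))    # sphere point, ear-to-source distance
```

For a source already on the unit sphere the intersection coincides with the source itself, which is a handy correctness check.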

3D Audio Simulator Notebook Apps
For human subject testing and general audio virtual reality experiments, two applications (apps) that run in the Jupyter notebook were created. The first allows the user to statically locate an audio source in space, while the second creates a time-varying, moving audio source. For human subject tests the static source is of primary interest. Both apps have a GUI slider interface that uses the cylindrical coordinates described in Figure 1 to control the position of the source.

Static Sound Source
The first and foremost purpose of the 3D audio simulator is to be able to statically position an audio source and then ask a human subject where the source is located (localization). This is a cognitive experiment and can serve many purposes. One purpose in the present research is to see how well the HRIR utilized in the simulator matches the subject's true HRIR. As mentioned in the introduction, an ongoing study is to estimate an individualized HRIR using deep learning. The Jupyter Widgets slider interface for this app is shown in Figure 7.

Dynamic Sound Source Along a Trajectory
From a virtual reality perspective, we were also interested in giving a subject a moving sound source experience via headphones. In this case we consider an orbit-like sound source trajectory. The trajectory, as shown in Figure 8, is a circular orbit with parameters of roll, pitch, and height, relative to the ear canal centerline. The Jupyter Widgets slider interface for this app is shown in Figure 9.

Spherical Head Model as a Simple Reference HRIR
In blind testing of human subjects it is also of interest to offer other HRIR solutions, e.g., the [KEMAR] mannequin head or a simple spherical head [Duda] and [Bogelein]. In this section we consider a spherical head model with the intent of using the results of [Duda] to allow the construction of a CIPIC-like database entry that can be used in the 3D audio simulator described earlier in this paper.

General Pressure Wave Solution
As a starting point, the acoustics text [Beranek] provides a solution for the resultant sound pressure at any point in space when a sinusoidal plane wave sound pressure source impinges upon a rigid sphere of radius R, centered at the coordinate system origin. Rotationally symmetric spherical coordinates, r and θ, are appropriate here. First consider the incident plane wave p_I(r, θ_i), in the expansion

    p_I(r, θ_i) = p_0 sum_{n=0}^{∞} (2n+1)(−j)^n j_n(kr) P_n(cos θ_i),    (3)

where θ_i is the incidence angle between the plane wave and measurement point, P_n(x) is the nth-order Legendre polynomial, j_n(x) is the nth-order spherical Bessel function of the first kind, k = 2π f/c is the wavenumber, with f the frequency in Hz and c = 344.4 m/s the propagation velocity in air. We set the incident wave complex pressure p_0 = 1∠0° for convenience. Finally, solve for the scattered wave, p_S(r, θ_i), by applying boundary conditions; see [Beranek] for details. The resultant wave is the sum of the incident and scattered waves as given below:

    p(r, θ_i) = p_I(r, θ_i) + p_S(r, θ_i)
              = p_0 sum_{n=0}^{∞} (2n+1)(−j)^n [ j_n(kr) − (j′_n(kR)/h^(2)′_n(kR)) h^(2)_n(kr) ] P_n(cos θ_i),    (4)

where j′_n(x) is the spherical Bessel function of the first kind derivative, h^(2)_n(kr) is the nth-order spherical Hankel function of the second kind, and h^(2)′_n(kr) is the corresponding derivative. Figure 10 shows the pressure magnitude at 2000 Hz for R = 8.75 cm, for the plane wave traveling along the +z axis. For plotting convenience, the axes z and w = sqrt(x² + y²) are cylindrical coordinates, as the sphere has axial symmetry. To be clear, z and w are related to the original spherical coordinates of the math model by r = sqrt(w² + z²) and cos θ_i = z/sqrt(w² + z²). The calculations required to evaluate (4), and thus create the plot of Figure 10, conveniently make use of functions in scipy.special. This is shown in the (partial) code listing below:

```python
def pS(w, z, f, R=0.0875, N=50):
    """
    Scattered field from a rigid sphere
    w = radial comp in cylind coord
    Approximate ear canal locations on sphere in the
    horizontal plane (set-back from front)
    """
    ...
```

Fig. 12: Solving for the angle between the source and a ray extending from the right and left ears, also showing a set-back of the ear canals by ±100° from the front of the head.
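Since the listing above is truncated in this extract, a self-contained sketch of evaluating (4) with scipy.special is given below. The function name and signature are assumptions; the e^{jωt} time convention is assumed, so the outgoing wave is the second-kind spherical Hankel function h2_n(x) = j_n(x) − j·y_n(x).

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def p_total(r, theta_i, f, R=0.0875, c=344.4, N=50, p0=1.0):
    """Total (incident + scattered) complex pressure per (4) outside
    a rigid sphere of radius R for an impinging plane wave."""
    k = 2 * np.pi * f / c
    n = np.arange(N + 1)
    jn = spherical_jn(n, k * r)
    jnp_R = spherical_jn(n, k * R, derivative=True)
    h2 = spherical_jn(n, k * r) - 1j * spherical_yn(n, k * r)
    h2p_R = (spherical_jn(n, k * R, derivative=True)
             - 1j * spherical_yn(n, k * R, derivative=True))
    Pn = eval_legendre(n, np.cos(theta_i))
    terms = (2 * n + 1) * (-1j) ** n * (jn - jnp_R / h2p_R * h2) * Pn
    return p0 * terms.sum()
```

The rigid-sphere boundary condition, ∂p/∂r = 0 at r = R, provides a convenient numerical sanity check on the series evaluation.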
Fig. 14: Example right/left HRIR plots (subject HRIR for source φ_az = 130°, h_y = 0, r_xy = 1 m) for a particular arrival angle pulled from the CIPIC-like database entry created for a radius 8.75 cm sphere.
Casual listening tests with a coarse-fit human subject from CIPIC and the simple spherical model indicate both similarities and differences. Coarse localization is similar between the two, but the spherical model seems sterile, that is, the sound seems unnatural. The fact that coarse localization is present is an indication that the database is correct. Additional testing is planned.

Conclusions and Future Work
Development of the real-time signal processing aspect of the 3D audio simulator was a relatively simple task. This is a perfect application for the pyaudio_helper code module of scikit-dsp-comm. Working through the details of the coordinate transformations, and the gain and parallax corrections on the geometry side, was a more complex undertaking. Likewise, working with the spherical head model calculations, first in the frequency domain (HRTF) and then in the time domain (HRIR), was the most complex. The fact that recursions can be used to evaluate the needed special functions for the sound pressure on the surface of a sphere makes the generation of a CIPIC-like database entry take only a few minutes of compute time.
Informal testing of human subjects has gone well. Precise localization experiments using the static app have not been attempted just yet, as a formal pool of human subjects has yet to be gathered. The virtual reality aspects of the dynamic app have received many positive comments from informal testing with those interested in 3D audio.
For future research, this simulator will be used to evaluate personalized HRIR fitting to human subjects, based on their upper torso anthropometrics. For the case of the spherical head, it is of interest to consider alternative HRIR grids. The 1 m radius, 1250-point grid of angle pairs is no longer a limitation. For close-range sound localization a different grid of angle pairs, with r < 1 m, can be used. This would make filter switching on the real-time DSP side of things finer grained, and hence more natural.
The Jupyter notebooks used in the analysis and development of this paper can be found on GitHub [3D_Audio].This will give open access to anyone interested in trying out the simulator.

Fig. 2: The CIPIC audio source locations, effectively on a 1 m radius sphere, used to obtain 1250 HRIR measurements for each of 45 subjects (only the right hemisphere locations shown).

Fig. 3: Example right/left HRIR plots for a particular arrival angle pulled from CIPIC for subject 165.
The 200 coefficients of the right and left HRIR are equivalent to the coefficients in a finite impulse response (FIR) digital filter, which produces a discrete-time output signal or sequence y_R[n]/y_L[n] from a single audio source signal x[n]. All of the signals are processed at a sampling rate of f_s = 44.1 kHz, as this is the rate used in forming the CIPIC database.

Fig. 4: Real-time DSP filtering with coefficients determined by the audio source (x, y, z) location.

Fig. 5: The pyaudio_helper framework for real-time DSP in the Jupyter notebook.

Fig. 6: Parallax correction geometry for three possible source locations in the horizontal plane: A < 1 m, B = 1 m, and C > 1 m, directly in front of the head.

Fig. 7: Jupyter notebook for static positioning of the audio test source.

Fig. 8: The sound source trajectory utilized in the dynamic sound source app.

Fig. 9: Jupyter notebook for setting the parameters of a sound source moving along a trajectory with prescribed motion characteristics.

Fig. 13: A block diagram depicting the steps involved in calculating the HRIR right and left channel impulse responses, h_R[n] and h_L[n], starting from CIPIC source angles, (θ_CIPIC, φ_CIPIC), ear canal set-back angles, (φ_R, φ_L), and the sphere radius R.