'''
Intermediate numpy
- array types (coersion rules used in ufuncs)
- array scalars
- broadcasting
- advanced indexing
- the where function
- examples of avoiding loops
- ieee special numbers and error handling
- masked arrays
Documentation on most of these can be found in the numpy.doc module docstrings
(except the loop examples)
************************
Array types: type coercion rules used in expressions (verbal discussion only)
Main point is that rules are different for array, array binary operators than
array, scalar combinations.
************************
Array scalars
Indexing single elements does NOT return Python scalars, but instead
"array scalars" which act in all respects (except for precision and
type identity) like scalars, but also support array methods.
************************
Broadcasting
>>> import numpy as np
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
General Broadcasting Rules
==========================
When operating on two arrays, NumPy compares their shapes element-wise.
It starts with the trailing dimensions, and works its way forward. Two
dimensions are compatible when
1) they are equal, or
2) one of them is 1
A (4d array): 8 x 1 x 6 x 1
B (3d array): 7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
Here are some more examples::
A (2d array): 5 x 4
B (1d array): 1
Result (2d array): 5 x 4
A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4
A (3d array): 15 x 3 x 5
B (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 1
Result (3d array): 15 x 3 x 5
Here are examples of shapes that do not broadcast::
A (1d array): 3
B (1d array): 4 # trailing dimensions do not match
A (2d array): 2 x 1
B (3d array): 8 x 4 x 3 # second from last dimensions mismatch
An example of broadcasting in practice::
>>> x = np.arange(4)
>>> xx = x.reshape(4,1)
>>> y = np.ones(5)
>>> z = np.ones((3,4))
>>> x.shape
>>> y.shape
>>> x + y # not broadcastable
>>> xx.shape
>>> y.shape
>>> (xx + y).shape # broadcastable
>>> xx + y
>>> x.shape
>>> z.shape
>>> (x + z).shape # broadcastable
>>> x + z
Outer sum example
>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a[:, np.newaxis] + b # newaxis adds a new axis of size 1 to make it match
****************************
Zen of numpy
Avoid many loops over arrays. Often none are needed if you just use the right
array tools.
Extreme example:
>>> import pyfits
>>> import time
>>> import numpy as np
>>> im = pyfits.getdata('acs.fits')
>>> im.shape
(2048, 4096)
>>> sum = 0.
>>> def sum1(data):
... sum = 0.
... for j in range(data.shape[0]):
... for i in range(data.shape[1]):
... sum = sum + data[j,i]
... return sum
>>> t0=time.time(); sum1(im); print time.time()-t0
CPU times: user 9.64 s, sys: 0.12 s, total: 9.76 s
Wall time: 9.86 s
2253506224.15
>>> t0=time.time(); im.sum(); print time.time()-t0
CPU times: user 0.15 s, sys: 0.00 s, total: 0.15 s
Wall time: 0.15 s
Example 1: Linear interpolation
(simple)
Given an array (x) of independent variable values,
and a corresponding array of function values (y) defining
a sampled function, for an array of various x values (xx), compute
the corresponding interpolated values for the sampled function.
>>> x = np.arange(10.)*2 # 10 x values from 0 to 18
>>> y = x**2 # sample a parabola
>>> import numpy.random as ra
>>> xx = ra.uniform(0., 18., size=(100,)) # 100 random numbers between 0 and 18
>>> xind = np.searchsorted(x, xx) - 1 # indices where x is the next value
>>> # below that for xx
>>> xfract = (xx - x[xind])/(x[xind+1]-x[xind]) # fractional distance for the
>>> # particular xx value between the
>>> # bracketing x values
>>> yy = y[xind] + xfract*(y[xind+1]-y[xind]) # interpolate
Example 2: Radial profile example
(trickier) Illustrates how to deal with ordering and binning issues
Compute the average radial profile for an image (a galaxy in this case).
By that, what is the average image intensity as a function of radius from
the specified point.
>>> im = pyfits.getdata('pix.fits') # use the M51 image
>>> # first compute the radius for each pixel in the image
>>> y, x = np.indices(im.shape) # y, x have same shape as im, and values in each
>>> # are their respective y and x locations.
>>> r = np.sqrt((x-257.)**2 + (y-258.)**2) # radius for each pixel in im
>>> ind = np.argsort(r.flat) # indices for sorted radii (need to use with im)
>>> sr = r.flat[ind] # the sorted radii
>>> sim = im.flat[ind] # image values sorted by radii
>>> ri = sr.astype(np.int16) # integer part of radii (bin size = 1)
>>> # The particularly tricky part, must average values within each radii bin
>>> # Start by looking for where radii change values
>>> deltar = ri[1:] - ri[:-1] # assume all radii represented (more work if not)
>>> rind = np.where(deltar)[0] # location of changed radius
>>> nr = rind[1:] - rind[:-1] # number of pixels in radius bin
>>> csim = np.cumsum(sim, dtype=np.float64) # cumulative sum for increasing radius
>>> # total in one bin is simply difference between cumulative sum for adjacent bins
>>> tbin = csim[rind[1:]] - csim[rind[:-1]] # sum for image values in radius bins
>>> radialprofile = tbin/nr # compute average for each bin
>>> import pylab as pl
>>> pl.semilogy(radialprofile)
Sorting makes this slower than a corresonding C version would be, but not
horribly so.
Example 3: Random ensemble simulation example
Illustrates how to deal with varying numbers of iterations for different array
elements.
Given a coin with probability of heads of p, what are the odds of tossing
two heads in a row before 2 tails? An analytical solution is possible, but do this
by Monte Carlo technique.
>>> p = 0.4 # probability of heads
>>> n = 1000000 # initial ensemble size
>>> n2heads = 0; n2tails = 0 # totals for each outcome
>>> import numpy.random as ra
>>> prev = ra.rand(n) < p # true if heads, false if not
>>> while n > 0:
... next = ra.rand(n) < p
... notdone = (next != prev) # where they are equal, we have two in a row!
... n2heads += np.sum(prev & next, dtype=np.int64) # careful about sums over bools!
... n2tails += np.sum(np.logical_not(prev | next), dtype=np.int64)
... prev = next[notdone] # keep only those not finished and iterate
... n = len(prev)
... print n
...
>>> n2heads, n2tails
(336967, 663033)
Analytical result is: p**2*(1+q)/(1-p*q) where q = 1-p
For p=0.4, expected answer is 336842.105
The basic technique is to winnow the list of cases down as the end condition is met.
The only looping is over coin tosses, not elements in the array. There is some
inefficiency when the number of remaining cases is small (so if there is a long
tail of few remaining cases, then this will be more inefficient)
Example 4: Thermal diffusion solution example
Solve Laplace's equation iteratively. Construct a 500x500 grid, set a 100x100
pixel boundary condition of 100 at the center and 0 at the edges.
>>> grid = np.zeros((500,500), np.float32)
>>> prevgrid = grid.copy()
>>> grid[200:300,200:300] = 100.
>>> done = False
>>> i = 0
>>> while not done:
... i += 1
... prevgrid[:,:] = grid
... # new value is average of 4 neighboring values
... grid[1:-1,1:-1] = 0.25*(
... grid[:-2,1:-1]
... + grid[2:,1:-1]
... + grid[1:-1,:-2]
... + grid[1:-1,2:])
... grid[200:300,200:300] = 100. # restore boundary condition
... diffmax = np.abs(grid - prevgrid).max()
... print i, diffmax
... if diffmax < 0.1: done = True
...
>>> pl.clf(); pl.imshow(grid)
Example 5: Finding nearest neighbors
Given a list of coordinates (e.g., x, y Cartesian) find the coordinate nearest to
each coordinate in the list. Illustrate with small example:
>>> x = np.array([1.1, 1.8, 7.3, 3.4])
>>> y = np.array([2.3, 9.3, 1.5, 5.7])
>>> # compute all possible combinations
>>> deltax = x - np.reshape(x, (len(x),1))
>>> deltay = y - np.reshape(y, (len(y),1))
>>> dist = np.sqrt(deltax**2 + deltay**2)
>>> dist = dist + np.identity(len(x)) * dist.max() # eliminate self matching
>>> # dist is the matrix of distances from one coordinate to any other
>>> print np.argmin(dist, axis=0) # the closest poionts corresponding to each coord
[3, 3, 3, 1]
Notes:
Scales poorly with size, both in speed and memory (n**2)
Standard problem for which many have better algorithms for large data sets
*************************
Numerical error handling.
IEEE-754 floating point special values:
Special values defined in numpy: nan, inf,
NaNs can be used as a poor-man's mask (if you don't care what the original value was)
Note, cannot use equality to test NaNs. E.g., np.where(myarr == np.nan)
nan == nan is always False! Use special numpy functions instead.
>>> np.nan == np.nan
False
>>> myarr = np.array([1., 0., np.nan, 3.])
>>> myarr[myarr == np.nan] = 0. # doesn't work
>>> myarr
array([ 1., 0., NaN, 3.])
>>> myarr[np.isnan(myarr)] = 0. # use this instead find
>>> myarr
array([ 1., 0., 0., 3.])
Other related special value functions:
isinf(): True if value is inf
isfinite(): True if not nan or inf
nan_to_num(): Map nan to 0, inf to max float, -inf to min float
The following corresponds to the usual functions except that nans are excluded from
the results.
nansum()
nanmax()
nanmin()
nanargmax()
nanargmin()
>>> x = np.arange(10.)
>>> x[3] = np.nan
>>> x.sum()
nan
>>> np.nansum(x)
42.0
How numpy handles numerical exceptions
Default is to "warn"
But this can be changed, and it can be set individually for different kinds
of exceptions. The different behaviors are:
'ignore' : ignore completely
'warn' : print a warning (once only)
'raise' : raise an exception
'call' : call a user-supplied function (set using seterrcall())
These behaviors can be set for all kinds of errors or specific ones:
all: apply to all numeric exceptions
invalid: when NaNs are generated
divide: divide by zero (for integers as well!)
overflow: floating point overflows
underflow: floating point underflows
These behaviors are set on a per-thead basis
Examples:
>>> oldsettings = np.seterr(all='warn')
>>> np.zeros(5,dtype=np.float32)/0.
invalid value encountered in divide
>>> j = np.seterr(under='ignore')
>>> np.array([1.e-100])**10
>>> j = np.seterr(invalid='raise')
>>> np.sqrt(np.array([-1.]))
FloatingPointError: invalid value encountered in sqrt
>>> def errorhandler(errstr, errflag):
... print "saw stupid error!"
>>> np.seterrcall(errorhandler)
>>> j = np.seterr(all='call')
>>> np.zeros(5, dtype=np.int32)/0
FloatingPointError: invalid value encountered in divide
saw stupid error!
>>> j = np.seterr(**oldsettings) # restore previous error handling settings
*****************************
Masked Arrays
Uses 2 arrays to represent the data, one with the data, the other is a boolean mask
True or nonzero values in the mask indicate bad data. The mask propagates
automatically through ufuncs and other functions. numpy.ma contains masked array
related utility functions. Limited use of masked arrays by other modules (e.g.,
matplotlib)
>>> marr = np.ma.masked_array([1,2,3,4,5], mask=[0,0,1,0,0])
>>> marr
masked_array(data = [1 4 -- 16 25],
mask = [False False True False False],
fill_value=999999)
>>> marr[0]
1
>>> marr[2]
masked_array(data = --,
mask = True,
fill_value=1e+20)
>>> np.ma.isMaskedArray(marr)
True
>>> pl.clf(); pl.plot(marr)
'''