'''
Setup
>>> import pyfits
>>> import pylab as pl
>>> import numpy as np

*************************

Memory management issues

1) memory management can be trickier with numpy than with Matlab or IDL
   - simply deleting or reassigning a variable that points to an array usually frees the memory
     >>> im = pyfits.getdata('pix.fits')
     >>> del im # frees image memory  
     >>> im = 0 # so does this
     
   - additional references complicate this:
     >>> im = pyfits.getdata('pix.fits')
     >>> sim = im
     >>> im = 0 # image still in memory since sim refers to it
     >>> sim = 0 # now it is freed
     
   - as do views:
     >>> im = pyfits.getdata('pix.fits')
     >>> sim = im[:100,:50]
     >>> im = 0 # memory not freed yet
     >>> sim = 0 # now it is

   - as does use in other containers:
     >>> im = pyfits.getdata('pix.fits')
     >>> mylist = [1, 'hello', im]
     >>> im = 0 # not freed since it is still in list
     >>> mylist[2] = 0 # now freed
     
   - or use with other modules that save references internally:
     >>> im = pyfits.getdata('pix.fits')
     >>> pl.imshow(im.astype(np.float))
     >>> im = 0 # not freed since pylab still refers to it internally
     >>> pl.cla() # now it is freed
     
Note that IPython retains many references to results (for example in its
output history), so if you are using much memory, IPython may complicate
releasing it.
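
You can check how many references an array has with sys.getrefcount (a
minimal sketch; in a plain Python session the count includes the temporary
reference created by the function call itself):

>>> import sys
>>> im = pyfits.getdata('pix.fits')
>>> sys.getrefcount(im) # the name im, plus the call's temporary reference
2
>>> sim = im[:100,:50]
>>> sys.getrefcount(im) # the view sim also holds a reference (via sim.base)
3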

**************************************

Memory fragmentation may be a problem (though we've never seen it in our code).
- solution: reuse existing arrays within loops,
  - e.g., im[:,:] = newdata # if the same shape
- use the output argument of ufuncs (see the sketch below).
  - e.g., instead of "arr1 + arr2"
    - np.add(arr1, arr2, arr2) # puts result in arr2 instead of creating a new array
      - prone to type downcasting problems
      - leads to ugly expressions
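
A minimal sketch of the output-argument idiom (arr1 and arr2 are just
illustrative arrays; note that an integer output array would downcast the
float result):

>>> arr1 = np.arange(5, dtype=np.float32)
>>> arr2 = np.ones(5, dtype=np.float32)
>>> np.add(arr1, arr2, arr2) # result written into arr2; no temporary created
array([ 1.,  2.,  3.,  4.,  5.], dtype=float32)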
      
Vectorizing algorithms may lead to large memory use from many large temporary arrays.
- Solution: loop over large sub-arrays (few iterations, block-sized temporaries);
  see the sketch below.
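
A minimal sketch of block-wise processing (the operation is a hypothetical
scaling; the temporaries are only the size of one block rather than of the
whole image):

>>> im = pyfits.getdata('pix.fits').astype(np.float32)
>>> out = np.empty(im.shape, dtype=np.float32)
>>> step = 64
>>> for i in range(0, im.shape[0], step):
...     block = im[i:i+step]
...     out[i:i+step] = 2.*block + 1. # block-sized temporaries only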

************************

Most of the following comes from the numpy.doc module (to appear in v1.2).

=====================================
Structured Arrays (aka Record Arrays)
=====================================

Introduction
============

Numpy provides powerful capabilities for creating arrays of structs or records.
These arrays permit one to manipulate the data either by records or by the
fields of the record. A simple example will show what is meant:

>>> x = np.zeros((2,), dtype='i4,f4,a10')
>>> x[:] = [(1,2.,'Hello'),(2,3.,"World")]
>>> x
array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
  dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])

Here we have created a one-dimensional array of length 2. Each element of
this array is a record that contains three items, a 32-bit integer, a 32-bit
float, and a string of length 10 or less. If we index this array at the second
position we get the second record: ::

>>> x[1]
(2,3.,"World")

The interesting aspect is that we can reference the different fields of the
array simply by indexing the array with the string representing the name of
the field. In this case the fields have received the default names of 'f0', 'f1'
and 'f2'.

>>> y = x['f1']
>>> y
array([ 2.,  3.], dtype=float32)
>>> y[:] = 2*y
>>> y
array([ 4.,  6.], dtype=float32)
>>> x
array([(1, 4.0, 'Hello'), (2, 6.0, 'World')],
      dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])

In these examples, y is a simple float array consisting of the second field
of each record. But it is not a copy of the data in the structured array;
instead it is a view, sharing exactly the same data. Thus, when we updated
this array by doubling its values, the structured array showed the
corresponding values as doubled as well. Likewise, if one changes a record,
the field view changes: ::

>>> x[1] = (-1,-1.,"Master")
>>> x
array([(1, 4.0, 'Hello'), (-1, -1.0, 'Master')],
   dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])
>>> y
array([ 4., -1.], dtype=float32)

Defining Structured Arrays
==========================

The definition of a structured array is done entirely through the dtype object.
There are a **lot** of different ways one can define the fields of a
record. Some of the variants are there to provide backward compatibility with
Numeric, numarray, or other modules, and should not be used except for
such purposes. These will be so noted. One defines records by specifying
the structure in one of 4 general ways, using an argument (as supplied to a
dtype function keyword or to the dtype constructor itself) in the form of a:
1) string, 2) tuple, 3) list, or 4) dictionary. Each of these will be briefly
described.

1) String argument (as used in the above examples).
In this case, the constructor expects a comma-separated list of type
specifiers, optionally with extra shape information.
The type specifiers can take 4 different forms: ::

  a) b1, i1, i2, i4, i8, u1, u2, u4, u8, f4, f8, c8, c16, a<n>
     (representing bytes, ints, unsigned ints, floats, complex and
      fixed length strings of specified byte lengths)
  b) int8,...,uint8,...,float32, float64, complex64, complex128
     (this time with bit sizes)
  c) older Numeric/numarray type specifications (e.g. Float32).
     Don't use these in new code!
  d) Single character type specifiers (e.g., H for unsigned short ints).
     Avoid using these unless you must. Details can be found in the
     Numpy book.

These different styles can be mixed within the same string (but why would you
want to do that?). Furthermore, each type specifier can be prefixed
with a repetition number, or a shape. In these cases an array
element is created, i.e., an array within a record. That array
is still referred to as a single field. An example: ::

>>> x = np.zeros(3, dtype='3int8, float32, (2,3)float64')
>>> x
array([([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
       ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
       ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
       dtype=[('f0', '|i1', 3), ('f1', '>f4'), ('f2', '>f8', (2, 3))])

Using strings to define the record structure precludes naming the fields
in the original definition. The names can be changed afterwards, however,
as shown later.

2) Tuple argument: The only relevant tuple case for record structures is
when a structure is mapped onto an existing data type. This is done by
pairing, in a tuple, the existing data type with a matching dtype
definition (using any of the variants described here). As an example
(this one uses a definition based on a list, so see 3) for further
details): ::

>>> x = np.zeros(3, dtype=('i4',[('r','u1'), ('g','u1'), ('b','u1'), ('a','u1')]))
>>> x
array([0, 0, 0])
>>> x['r']
array([0, 0, 0], dtype=uint8)

In this case, an array is produced that looks and acts like a simple int32 array,
but also has definitions for fields that each use only one byte of the int32 (a
bit like Fortran equivalencing).
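
A short sketch of writing through one of these field views (no output is
shown for the int32 values since the result depends on the machine byte
order):

>>> x['r'] = 255 # sets one byte of each int32 through the field view
>>> x['r']
array([255, 255, 255], dtype=uint8)
>>> x[0] # the underlying int32 changed too; its value is byte-order dependent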

3) List argument: In this case the record structure is defined with a list of
tuples. Each tuple has 2 or 3 elements specifying: 1) The name of the field
('' is permitted), 2) the type of the field, and 3) the shape (optional).
For example:

>>> x = np.zeros(3, dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])
>>> x
array([(0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
       (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
       (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]])],
       dtype=[('x', '>f4'), ('y', '>f4'), ('value', '>f4', (2, 2))])

4) Dictionary argument: two different forms are permitted. The first consists
of a dictionary with two required keys ('names' and 'formats'), each having an
equal-sized list of values. The format list contains any type/shape specifier
allowed in other contexts. The names must be strings. There are two optional
keys: 'offsets' and 'titles'. Each must be a list matching the two required
ones, where the offsets are integer byte offsets for each field and the
titles are objects containing metadata for each field (these do not have
to be strings); the value None is permitted. As an example: ::

>>> x = np.zeros(3, dtype={'names':['col1', 'col2'], 'formats':['i4','f4']})
>>> x
array([(0, 0.0), (0, 0.0), (0, 0.0)],
      dtype=[('col1', '>i4'), ('col2', '>f4')])

The other dictionary form permitted is a dictionary of name keys with tuple
values specifying type, offset, and an optional title.

>>> x = np.zeros(3, dtype={'col1':('i1',0,'title 1'), 'col2':('f4',1,'title 2')})
>>> x
array([(0, 0.0), (0, 0.0), (0, 0.0)],
      dtype=[(('title 1', 'col1'), '|i1'), (('title 2', 'col2'), '>f4')])

Accessing and modifying field names
===================================

The field names are an attribute of the dtype object defining the record structure.
For the last example: ::

>>> x.dtype.names
('col1', 'col2')
>>> x.dtype.names = ('x', 'y')
>>> x
array([(0, 0.0), (0, 0.0), (0, 0.0)],
     dtype=[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])
>>> x.dtype.names = ('x', 'y', 'z') # wrong number of names
<type 'exceptions.ValueError'>: must replace all names at once with a sequence of length 2

Accessing field titles
======================

The field titles provide a standard place to put associated info for fields.
They do not have to be strings.

>>> x.dtype.fields['x'][2]
'title 1'

****************************

Object arrays

Object arrays permit all the array structural operations (indexing, reshaping,
etc.), but not the numerical ones (e.g., no ufuncs).

>>> fo = open('pix.fits')
>>> objarr = np.array(['hello', fo])
>>> objarr
array([hello, <open file 'pix.fits', mode 'r' at 0x506d890>], dtype=object)
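
A short illustration of the structural/numerical distinction (the addition
fails because file objects do not define '+'):

>>> reshaped = objarr.reshape((2,1)) # structural operations work fine
>>> objarr + objarr # raises TypeError since file objects do not support '+'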

Character arrays

Fixed-width string fields. The difference from character fields in record
arrays is that chararrays support the standard string methods. E.g.,

>>> strarr = np.char.array(['hello', 'cruel', 'dastardly', 'world'])
>>> strarr
chararray(['hello', 'cruel', 'dastardly', 'world'], 
      dtype='|S9')
>>> strarr.find('o')
array([ 4, -1, -1,  1])
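
The other string methods work elementwise as well, e.g.:

>>> strarr.upper()
chararray(['HELLO', 'CRUEL', 'DASTARDLY', 'WORLD'],
      dtype='|S9')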

****************************

Example of data exchange with PIL

Illustrates how separate modules can support data interchange without having
explicit dependencies on each other.

>>> from PIL import Image
>>> pim = Image.open('lenna.png')
>>> im = np.array(pim) # using the array interface to seamlessly convert PIL image
>>> pl.clf(); pl.imshow(im[::-1,:,:])

That was easy...
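
The exchange works in the other direction too; a minimal sketch using
Image.fromarray (available in PIL 1.1.6 and later; the output filename is
hypothetical):

>>> pim2 = Image.fromarray(im) # numpy array back to a PIL image
>>> pim2.save('lenna2.png')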

****************************

Interfacing to C/C++/Fortran

So many choices...

Only going to survey the choices. Little detail on how each works.

1) Bare metal, wrap your own C-code manually.
- Plusses:
  - Efficient
  - No dependencies on other tools
- Minuses:
  - Lots of learning overhead:
    - need to learn basics of Python C API
    - need to learn basics of numpy C API
    - need to learn how to handle reference counting and love it.
  - Reference counting often difficult to get right.
    - getting it wrong leads to memory leaks, and worse, segfaults
  - API will change for Python 3.0!

2) pyrex
- Plusses:
  - avoid learning C APIs
  - no dealing with reference counting
  - can code in pseudo-Python and generate C code
  - can also interface to existing C code
  - should shield you from changes to the Python C API
  - has become pretty popular within the Python community
- Minuses:
  - Can write code in non-standard forms which may become obsolete
  - Not as flexible as manual wrapping
  - Maintainers not receptive to adding new features
  
Thus:

3) cython - a fork of pyrex created to add features needed for SAGE
  - being considered as the standard scipy/numpy wrapping tool
  - fast indexing support for arrays
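
As a flavor of the pyrex/cython approach, a minimal sketch of a .pyx file
(hypothetical module; the cdef declarations let the loop compile to plain C):

  # csum.pyx (hypothetical)
  def csum(int n):
      cdef int i
      cdef long total
      total = 0
      for i in range(n):
          total += i
      return total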
  
4) ctypes
- Plusses:
  - part of Python standard library
  - good for interfacing to existing shared libraries, particularly
    Windows DLLs
  - avoids API/reference counting issues
  - good numpy support: arrays have all these in their ctypes attribute:
  
  a.ctypes.data              a.ctypes.get_strides
  a.ctypes.data_as           a.ctypes.shape
  a.ctypes.get_as_parameter  a.ctypes.shape_as
  a.ctypes.get_data          a.ctypes.strides
  a.ctypes.get_shape         a.ctypes.strides_as
  
- Minuses:
  - can't be used for writing code to be turned into C extensions; it's only
    a wrapper tool (a brief usage sketch follows)
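
A minimal ctypes sketch (the library name and C function are hypothetical;
assume libimage.so exports void scale(double *data, int n, double f)):

>>> import ctypes
>>> lib = ctypes.CDLL('./libimage.so') # hypothetical shared library
>>> a = np.arange(10, dtype=np.float64)
>>> lib.scale(a.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
...           ctypes.c_int(a.size), ctypes.c_double(2.0)) # scales a in place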
    
5) SWIG (automatic wrapper generator)
- Plusses:
  - around a long time
  - multiple scripting language support
  - C++ support
  - Good for wrapping large (many functions) existing C libraries
- Minuses:
  - generates lots of code between Python and the C code
    - can cause performance problems that are nearly impossible to optimize out
  - interface files can be hard to write
  - doesn't necessarily avoid reference counting issues or needing to know APIs
  
6) Weave
- Plusses:
  - Phenomenal tool
  - can turn many numpy expressions into C code
  - dynamic compiling and loading of generated C code
  - can embed pure C code in Python module and have weave extract, generate interfaces
    and compile, etc.
- Minuses:
  - Future uncertain--lacks a champion
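
A small taste of weave (the blitz call compiles the numpy expression to C
the first time it runs, avoiding the temporary from "a + b"):

>>> from scipy import weave
>>> a = np.arange(1000.); b = np.arange(1000.); c = np.empty(1000)
>>> weave.blitz("c = a + b") # expression evaluated in compiled C code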
  
7) Psyco
- Plusses:
  - Turns pure python into efficient machine code through jit-like optimizations
  - very fast when it optimizes well
- Minuses:
  - Only runs on x86 processors
  - Doesn't do much for numpy?
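
Typical psyco usage is a one-liner near the top of a script (it JIT-compiles
the pure-Python functions executed afterwards):

>>> import psyco
>>> psyco.full() # optimize everything from here on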
  
Fortran: The clear choice is f2py. (Pyfort is an older alternative, but is no
longer supported.)
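
A minimal f2py sketch (sub.f is a hypothetical Fortran source file defining
subroutine dscale); first build the module from the shell:

  f2py -c -m fmod sub.f # builds the extension module fmod

>>> import fmod
>>> print fmod.dscale.__doc__ # f2py generates a docstring with the signature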

C++:

1) CXX
2) Boost.python
3) SWIG
4) Sage has used cython to wrap C++ (not pretty, but it can be done)
5) SIP (used mainly in PyQT)
     
'''