Tutorials Schedule:

The Tutorials Schedule (July 16th & 17th) is in its final stages of confirmation. There may be changes made to the schedule between now and the conference.

Monday - July 16th

08:00 AM - 12:00 PM
  Room 105 (Introductory/Intermediate): Introduction to NumPy and Matplotlib - Jones, Eric (Video)
  Room 106 (Advanced): scikit-learn - Vanderplas, Jake

01:00 PM - 05:00 PM
  Room 105 (Introductory/Intermediate): HDF5 is for lovers - Scopatz, Anthony (Video)
  Room 106 (Advanced): Advanced Matplotlib - May, Ryan


Introduction to NumPy and Matplotlib - Eric Jones

Bio

Eric has a broad background in engineering and software development and leads Enthought's product engineering and software design. Prior to co-founding Enthought, Eric worked with numerical electromagnetics and genetic optimization in the Department of Electrical Engineering at Duke University. He has taught numerous courses on the use of Python for scientific computing and serves as a member of the Python Software Foundation. He holds M.S. and Ph.D. degrees in electrical engineering from Duke University and a B.S.E. in mechanical engineering from Baylor University.

Description

NumPy is the most fundamental package for scientific computing with Python. It adds to the Python language a data structure (the NumPy array) that has access to a large library of mathematical functions and operations, providing a powerful framework for fast computations in multiple dimensions. NumPy is the basis for all of the SciPy packages, which vastly extend the computational and algorithmic capabilities of Python, as well as for many visualization tools such as Matplotlib, Chaco, and Mayavi.

This tutorial will teach students the fundamentals of NumPy, including fast vector-based calculations on NumPy arrays and the origin of their efficiency, followed by a short introduction to the Matplotlib plotting library. In the final section, more advanced concepts will be introduced, including structured arrays, broadcasting, and memory mapping.
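
As a rough illustration of the concepts named in this description (vector-based calculation, broadcasting, structured arrays, and memory mapping), the sketch below uses only standard NumPy calls. It is not taken from the tutorial materials; the sample data, the field names, and the temporary file name are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Vectorized calculation: one expression operates on the whole array,
# with the loop running in compiled code rather than in Python.
x = np.linspace(0.0, 1.0, 5)
y = x ** 2 + 1.0

# Broadcasting: a (3, 1) column combines with a (4,) row to give a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row

# Structured array: each element is a record with named, typed fields.
# The field names and values here are hypothetical sample data.
stars = np.array(
    [("Sirius", -1.46), ("Vega", 0.03)],
    dtype=[("name", "U10"), ("magnitude", "f8")],
)
brightest = stars[np.argmin(stars["magnitude"])]["name"]

# Memory mapping: the array is backed by a file on disk instead of RAM.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
mm = np.memmap(path, dtype="float64", mode="w+", shape=(1000,))
mm[:] = np.arange(1000)
mm.flush()
readback = np.memmap(path, dtype="float64", mode="r", shape=(1000,))
total = float(readback.sum())
```

Each of the four pieces is independent, so any one of them can be pasted into an IPython session on its own.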

Required Packages

This tutorial requires Python 2.6+ or 3.1+, NumPy 1.6.1+, IPython 0.11+, and Matplotlib 1.0+ to be installed on your laptop. All of these packages are available in various one-click installers, including EPDFree.

In addition:

  1. Download and unpack the tutorial files.
    [ introduction_numpy_matplotlib.zip | 5.7MB ]
  2. To test whether your installation is working, follow the instructions on page 7 of the manual. The speed of light folder is inside the class folder, at student/demo/speed_of_light/.
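
Independently of the test described in the manual, a short script can report which versions of the required packages are installed. The sketch below is not part of the tutorial files: the helper name `installed_version` is made up, it assumes each package exposes a `__version__` attribute (NumPy, IPython, and Matplotlib do), and it relies on `importlib`, which is not available on Python 2.6.

```python
import importlib
import sys


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


print("python", sys.version.split()[0])
for name in ("numpy", "IPython", "matplotlib"):
    print(name, installed_version(name) or "NOT INSTALLED")
```

Compare the printed versions against the minimums listed under Required Packages.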


scikit-learn - Jake Vanderplas

Bio

Jake Vanderplas is an NSF postdoctoral research fellow, working jointly between the Astronomy and Computer Science departments at the University of Washington, and is interested in topics at the intersection of large-scale machine learning and wide-field astronomical surveys. He is co-author of the book “Statistics, Data Mining, and Machine Learning in Astronomy”, which will be published by Princeton University Press later this year. In the Python world, Jake is the author of AstroML and a maintainer of scikit-learn and SciPy. He gives regular talks and tutorials at various Python conferences, and occasionally blogs his thoughts and his code at Pythonic Perambulations: http://jakevdp.github.com.

Description

Machine learning has been getting a lot of buzz lately, and many software libraries have been created which implement machine learning routines. scikit-learn is a Python package built on NumPy and SciPy which implements a wide variety of machine learning algorithms, useful for everything from facial recognition to optical character recognition to automated classification of astronomical images. This tutorial will begin with a crash course in machine learning and introduce participants to several of the most common learning techniques for classification, regression, and visualization. Building on this background, we will explore several applications of these techniques to scientific data (in particular, galaxy, star, and quasar data from the Sloan Digital Sky Survey) and learn some basic astrophysics along the way. From these examples, tutorial participants will gain the knowledge and experience needed to successfully solve a variety of machine learning and statistical data mining problems with Python.
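
To give a flavor of what a classification task looks like in scikit-learn, here is a minimal sketch. It is not taken from the tutorial materials: it uses the small handwritten-digits dataset bundled with scikit-learn as a stand-in for optical character recognition, the choice of a nearest-neighbors classifier is arbitrary, and it assumes a current scikit-learn release (the `sklearn.model_selection` module did not exist in the 2012 versions).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 8x8 images of handwritten digits, flattened to 64 features per sample.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# scikit-learn estimators share the same fit / predict / score pattern.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Fraction of held-out digits that are labeled correctly.
accuracy = clf.score(X_test, y_test)
print("held-out accuracy: %.3f" % accuracy)
```

Swapping in a different estimator usually means changing only the line that constructs `clf`.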

Outline

This follows the general outline of the online tutorial I've prepared for scikit-learn (see link above). For the purpose of SciPy 2012, I plan to convert most of this material into IPython notebooks for interactive instruction, though an updated version of the web page will be available as well.

Packages Required

NumPy, SciPy, and scikit-learn (bleeding-edge versions are not necessary; I'll make everything compatible with the Ubuntu distro versions).

IPython, *including the recently released IPython notebook*. Participants should be able to run "ipython notebook" from the command line and see the IPython dashboard in their web browser (version 2+ is fine).

Note that an EPDFree installation contains all the necessary dependencies, with the exception of scikit-learn.

For installation instructions, click here.


HDF5 is for lovers - Anthony Scopatz

Bio

Anthony Scopatz is a computational nuclear engineer / physicist and post-doctoral scholar at the FLASH Center at the University of Chicago. His initial workshop teaching experience came from instructing bootcamps for The Hacker Within, a peer-led teaching organization at the University of Wisconsin. Out of this grew a collaboration teaching Software Carpentry bootcamps in partnership with Greg Wilson. During his tenure at Enthought, Inc., Anthony taught many week-long courses (approx. 1 per month) on scientific computing in Python.

Description

HDF5 is a hierarchical, binary database format that has become a *de facto* standard for scientific computing. While the specification may be used in a relatively simple way (persistence of static arrays), it also supports several high-level features that prove invaluable. These include chunking, ragged data, extensible data, parallel I/O, compression, complex selection, and in-core calculations. Moreover, HDF5 bindings exist for almost every language, including two Python libraries (PyTables and h5py).
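
Three of the features listed here (chunking, compression, and extensible data) can be sketched in a few lines of PyTables. This is an illustrative example rather than material from the tutorial: the file name, the node name `readings`, and the chunk shape are arbitrary, and it uses the PyTables 3 spelling of the API (`open_file`, `create_earray`), whereas PyTables 2.3 spelled these `openFile` and `createEArray`.

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Compression is applied per chunk as data is written.
filters = tables.Filters(complevel=5, complib="zlib")

with tables.open_file(path, mode="w") as h5:
    # An EArray is extensible along the dimension whose length is 0.
    readings = h5.create_earray(
        h5.root,
        "readings",
        atom=tables.Float64Atom(),
        shape=(0, 3),
        filters=filters,
        chunkshape=(256, 3),
    )
    # Append two batches; the dataset grows on disk each time.
    readings.append(np.zeros((100, 3)))
    readings.append(np.ones((50, 3)))

with tables.open_file(path, mode="r") as h5:
    node = h5.root.readings
    n_rows = node.nrows
    # Slicing the node reads only the selected rows into memory.
    tail_sum = float(node[100:150, :].sum())
```

The chunk shape is one of the main tuning knobs: it sets the unit in which data is compressed and read, so it is usually matched to the expected access pattern.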

This tutorial will discuss tools, strategies, and hacks for squeezing every ounce of performance out of HDF5 in new or existing projects. It will also go over fundamental limitations in the specification and provide creative and subtle strategies for getting around them. Overall, this tutorial will show how HDF5 plays nicely with all parts of an application, making both the code and the data faster and smaller. With such powerful features at the developer's disposal, what is not to love?!

This tutorial is targeted at a more advanced audience with prior knowledge of Python and NumPy. Knowledge of C or C++ and basic HDF5 is recommended but not required.

Packages Required

This tutorial requires Python 2.7, IPython 0.12+, NumPy 1.5+, and PyTables 2.3+. ViTables (http://vitables.org/) and Matplotlib are also recommended. All of these may be found in Linux package managers, and they are also available through EPD or easy_install. ViTables may need to be installed independently.


Advanced Matplotlib - Ryan May

Bio

Ryan May is a software engineer at Enterprise Electronics Corporation and a doctoral student in the School of Meteorology at the University of Oklahoma. His primary interest in Python is in its application to data visualization and to the rapid development and testing of signal processing techniques for weather radar. He has been a developer for the Matplotlib project since 2008 and gave an introductory Matplotlib tutorial at SciPy 2010. Among Ryan's contributions to Matplotlib are improvements to its spectral analysis routines, wind barb support (for the meteorological community), and, most recently, simplified support for creating and saving animations.

Description

Matplotlib is one of the main plotting libraries in use within the scientific Python community. This tutorial covers advanced features of the Matplotlib library, including many recent additions: laying out axes, animation support, Basemap (for plotting on maps), and other tweaks for creating aesthetic plots. The goal of this tutorial is to expose attendees to several of the chief sub-packages within Matplotlib, helping users make full use of the library's capabilities. Additionally, attendees will be walked through a "grab-bag" of tweaks that increase the aesthetic appeal of figures. Attendees should be familiar with creating basic plots in Matplotlib as well as basic use of NumPy for manipulating data.
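
Two of the topics mentioned here, laying out axes and animation support, can be previewed with the sketch below. It is not drawn from the tutorial itself: the panel arrangement, the plotted data, and the `update` function are made up, and it assumes a reasonably recent Matplotlib (the `figure=` argument to `GridSpec` is newer than the 1.1.0 release listed for the tutorial).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 5))
gs = GridSpec(2, 2, figure=fig)

# Axes layout: one wide panel across the top row, two panels below.
ax_top = fig.add_subplot(gs[0, :])
ax_left = fig.add_subplot(gs[1, 0])
ax_right = fig.add_subplot(gs[1, 1])

x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax_top.plot(x, np.sin(x))
ax_left.hist(np.cos(x), bins=20)
ax_right.scatter(np.sin(x), np.cos(x), s=4)


def update(frame):
    """Shift the sine wave by a phase that depends on the frame number."""
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)


# The animation calls update() once per frame. Saving it as a movie
# would additionally need an encoder such as ffmpeg.
anim = FuncAnimation(fig, update, frames=50, interval=40, blit=True)
```

With an interactive backend, calling `plt.show()` instead of selecting "Agg" would display the animated figure.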

Packages Required

Matplotlib 1.1.0 (this may be bumped if another release comes out), Basemap, and NumPy. ffmpeg or mencoder can optionally be used to save animations as movie files.

Attendees need to have Matplotlib and Basemap installed. EPDFree is a good start, since matplotlib is included. Instructions for installing Basemap can be found at:

If needed, instructions for installing Matplotlib are located at: