Using geospatial data with python
Kelsey Jordahl - Enthought, Inc.
Kelsey Jordahl has extensive experience with geographical and geophysical datasets. He holds a Ph.D. in marine geophysics from the MIT/Woods Hole Oceanographic Institution Joint Program in Oceanography. As a developer with Enthought, he has worked with investment banks and other clients on grid computing and data optimization. He has years of college teaching experience, ranging from introductory science classes to graduate level seismology. He currently teaches weeklong python training courses for scientific and financial audiences.
Geographically referenced data is important in many scientific fields, and working with spatial data has become widespread in other domains as well (e.g. Google Maps, geolocated tweets, Foursquare check-ins). Python has become an increasingly important language for working with geospatial data. In this tutorial, students will get hands-on experience working with common geospatial formats using open source python libraries.
Python bindings are available for (nearly) all the standard libraries for working with geospatial data (proprietary and open source). Some of these libraries (including PROJ.4 and GDAL) will be discussed and used in this tutorial, along with more "pythonic" packages for accessing them, such as Shapely. Using spatially-aware databases will be discussed, with examples and an exercise using PostGIS, an extension to PostgreSQL. Python scripting extensions to Geographic Information Systems (GIS) packages such as QGIS and ArcGIS will be briefly discussed.
This tutorial should be accessible to anyone who has a basic understanding of NumPy and matplotlib. Prior familiarity with SQL database queries and the python DB API will be helpful for the PostGIS section.
1 map projections ~~~~~~~~~~~~~~~~~~ /10 min/
introduction to map projections
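As a concrete picture of what a map projection does, the sketch below implements the spherical Mercator projection and its inverse using only the standard library. It is an illustration under simplifying assumptions (a spherical Earth, no datum handling), not how PROJ.4 is implemented; the function names and the sample coordinates are chosen for this example.

```python
import math

# Sphere radius in metres commonly used for spherical ("web") Mercator
EARTH_RADIUS_M = 6378137.0


def mercator_forward(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to spherical Mercator x/y in metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4.0 + phi / 2.0))
    return x, y


def mercator_inverse(x, y, radius=EARTH_RADIUS_M):
    """Recover longitude/latitude in degrees from spherical Mercator x/y in metres."""
    lon_deg = math.degrees(x / radius)
    lat_deg = math.degrees(2.0 * math.atan(math.exp(y / radius)) - math.pi / 2.0)
    return lon_deg, lat_deg


# Round trip for an illustrative point (approximately Austin, Texas)
x, y = mercator_forward(-97.74, 30.27)
print(x, y)
print(mercator_inverse(x, y))
```

The forward and inverse functions are exact inverses of each other on the sphere, so the round trip recovers the input up to floating point error. Real projection libraries additionally account for the ellipsoid and the datum of the coordinates.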
/10 min + 15 min exercise/
pyproj is a python interface to the venerable and powerful PROJ.4 library. PROJ.4 can handle transformations between many map projections and datums, and is the standard engine for such transformations in many open source GIS programs.
exercise: convert state plane coordinates to/from latitude & longitude
/10 min + 10 min exercise/
Basic map projection plotting with matplotlib. Basemap will be used for data visualization in later exercises as well.
exercise: Plot political boundaries on several different projections.
Cartopy is a newer library for map plotting in python, also built on matplotlib.
2 geographical data ~~~~~~~~~~~~~~~~~~~~
2.1 data formats
/20 min intro/
Raster and vector data, common proprietary formats (e.g. shapefiles), and interchange formats (e.g. GeoJSON).
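To make the interchange format concrete, here is a minimal sketch that builds a GeoJSON FeatureCollection of points with the standard library json module. The helper names and the sample records are hypothetical and invented for illustration.

```python
import json


def point_feature(lon, lat, properties):
    """Return a GeoJSON Feature for a point; GeoJSON orders coordinates as [lon, lat]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": dict(properties),
    }


def feature_collection(features):
    """Wrap an iterable of Feature dicts in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Hypothetical sample records (name, lon, lat), invented for illustration only
records = [("site A", -97.74, 30.27), ("site B", -71.09, 42.36)]
collection = feature_collection(
    point_feature(lon, lat, {"name": name}) for name, lon, lat in records
)
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Because GeoJSON is plain JSON, vector data in this form can be read and written without any geospatial library, which is part of what makes it convenient for interchange.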
/10 min + 10 min exercise/
GDAL (Geospatial Data Abstraction Library) is the standard open source library for converting between raster geospatial formats. The OGR simple features library for translating vector formats also lives within the GDAL source tree. Python bindings are available for GDAL/OGR.
exercise: convert a shapefile to GeoJSON
/15 min + 30 min exercise/
Shapely is a python library for interacting with simple geometry objects (points, lines, polygons). Shapely has no knowledge of map projections and does its calculations in the coordinate system of the data. It implements the Simple Features standard (ISO 19125) for geometry objects.
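The practical consequence of doing calculations in the coordinate system of the data can be seen with a plain planar area computation. The sketch below uses the shoelace formula in pure python as a stand-in for a planar geometry library; it is not Shapely's implementation, and the two one-degree cells are hypothetical inputs.

```python
def planar_area(ring):
    """Area of a simple polygon from its (x, y) vertices via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Two hypothetical one-degree cells given as (lon, lat) vertices
equator_cell = [(0, 0), (1, 0), (1, 1), (0, 1)]
arctic_cell = [(0, 70), (1, 70), (1, 71), (0, 71)]

# Both results are in "square degrees", not in a ground unit such as km^2
print(planar_area(equator_cell))
print(planar_area(arctic_cell))
```

Both cells have the same planar area in square degrees even though the cell at 70 degrees north covers far less ground. To get meaningful areas or distances, the coordinates should first be transformed to a suitable projected coordinate system.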
Exercise: Grouping geolocated tweets by zip code and census block. Plotting heatmaps on a map.
/30 min + 30 min exercise/
PostGIS is an extension to the popular and powerful open source database PostgreSQL. PostGIS brings spatial awareness to your SQL data and allows you to perform GIS calculations with SQL queries.
Other databases (both SQL and NoSQL) have geospatial extensions and will be mentioned in context (e.g. Microsoft SQL Server, Oracle, CouchDB, MongoDB). PostGIS is arguably the most powerful of the open geospatial databases, and officially supported by the Open Source Geospatial Foundation (OSGeo).
topics covered will include:
2.4.1 Connecting to a PostGIS database with
2.4.2 Converting latitude and longitude fields to geographical points
2.4.3 Setting and converting coordinate systems
2.4.4 Aggregation and geographic calculations with queries
2.4.5 GEOMETRY and GEOGRAPHY data types
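Some of the topics above can be sketched as SQL issued through a python DB API connection. The table and column names (tweets, lon, lat, geom) are hypothetical, and the statements assume a PostGIS version that supports the geometry(Point, 4326) column syntax. The helpers accept any DB API connection, for example one opened with psycopg2.

```python
# Hypothetical table "tweets" with numeric "lon" and "lat" columns (illustration only)
ADD_GEOMETRY_SQL = "ALTER TABLE tweets ADD COLUMN geom geometry(Point, 4326)"

# Build points from longitude/latitude and tag them with SRID 4326 (WGS 84)
SET_POINTS_SQL = (
    "UPDATE tweets SET geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326)"
)

# Casting to geography makes ST_DWithin measure the distance in metres;
# the %s placeholders are the parameter style accepted by psycopg2
NEARBY_COUNT_SQL = (
    "SELECT count(*) FROM tweets "
    "WHERE ST_DWithin(geom::geography, "
    "ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, %s)"
)


def build_point_column(conn):
    """Add a point geometry column and fill it from the lon/lat columns."""
    cur = conn.cursor()
    cur.execute(ADD_GEOMETRY_SQL)
    cur.execute(SET_POINTS_SQL)
    conn.commit()
    cur.close()


def count_within(conn, lon, lat, radius_m):
    """Count rows whose point lies within radius_m metres of (lon, lat)."""
    cur = conn.cursor()
    cur.execute(NEARBY_COUNT_SQL, (lon, lat, radius_m))
    (count,) = cur.fetchone()
    cur.close()
    return count
```

Passing the query values as parameters rather than formatting them into the SQL string lets the database driver handle quoting. The geography cast illustrates the difference between the two data types: GEOMETRY calculations are planar in the units of the coordinate system, while GEOGRAPHY calculations are done on the spheroid.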
Exercise: Loading tweet data into a PostGIS database. Using PostGIS queries to group data and do calculations. (Students who have PostGIS installed on their laptops can create the database locally. Others can connect to a port on the instructor's machine.)
3 plugins for GIS software ~~~~~~~~~~~~~~~~~~~~~~~~~~~
Quantum GIS is an open source Geographic Information System. QGIS itself was developed in C++, but has an API for python plugins and an interactive python console. Developing a full QGIS plugin will be beyond the scope of this tutorial, but a short example of using a python plugin in QGIS will be shown.
Since version 9.0 (released in 2004), python has been a supported scripting language in the industry standard geographical information system, ESRI ArcGIS. Recent versions have brought increased power and flexibility to the scripting layer, and python is an important part of the ArcGIS ecosystem. Interfacing with ArcGIS is beyond the scope of this tutorial (and would require participants to have an ArcGIS license), but capabilities and examples of python scripting with ArcGIS will be briefly discussed.
4 Conclusion ~~~~~~~~~~~~~ /10 min/
Wrap-up, touching on topics not covered (e.g. web visualization with D3/Leaflet, GeoDjango).
required packages: pyproj, gdal, shapely, psycopg2
optional packages: PostGIS, QGIS, cartopy