Installing Python Modules for Kalpana

This post was contributed by CCHT collaborator Jason Fleming. This is a LONG post, but as he notes, it is a stream-of-consciousness account of how he compiled the modules needed by Kalpana. Hopefully his experience will be helpful to new users. Once the modules have been compiled, Kalpana is very easy to use. So please don’t be discouraged!

The actual script does not require installation; it is merely executed. However, it relies on several Python modules that most users (or their IT supporters) will need to install before working with Kalpana. These modules are:

  • matplotlib – main Python module used for data visualization.
  • pylab – imports plotting and numerics libraries into a single namespace.
  • shapely – used to construct geometric objects like Points, Polygons and LineStrings.
  • fiona – for writing .shp files.
  • netCDF4 – reading and writing netCDF files.
  • datetime – dates and time calculation, manipulation, and formatting.
  • time – contains time related functions for measuring the performance of the code itself.
  • numpy – to facilitate scientific computing; used primarily in Kalpana for working with n-dimensional numpy arrays, which are ideal for storing large amounts of data.
  • collections – provides OrderedDict, a dictionary subclass that remembers the order in which entries were added (an ordinary dictionary does not).
  • simplekml – writing kml (Google Earth) files.
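
As a brief illustration of the OrderedDict behavior mentioned in the list above, the following sketch builds a small ordered mapping. The key names are hypothetical and are not taken from the Kalpana source. Note that in Python 3.7 and later, plain dictionaries also preserve insertion order, so the distinction matters mainly for Python 2.7, which Kalpana targets.

```python
from collections import OrderedDict

# Hypothetical attribute names, for illustration only (not from Kalpana).
record = OrderedDict()
record["minimum"] = 0.0
record["maximum"] = 1.5
record["units"] = "m"

# The keys come back in the order in which they were added.
field_order = list(record.keys())
print(field_order)
```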

In the process of installing Python modules on a desktop platform, as well as an HPC platform, I’ve found that the process can be very challenging. The biggest issue is conflicting module version dependencies, which can be handled by the use of virtual execution environments for Python (explained below).

The remaining issues are the required versions of the modules, the installation infrastructure, the underlying version of Python (Kalpana requires Python 2.7), and the supporting non-Python libraries on the host platform. These issues are not always easily addressed, and the error messages that result from module installation failures are often so non-intuitive that the installer must make educated guesses about the root cause of the issue and rely heavily on Google searches to provide clues about how to proceed.

As a result, the exact issues I encountered with the installation of Python modules for supporting Kalpana may or may not be encountered by others. There seems to be a strong dependency on the individual platform. Because of the unpredictable nature of the issues, this installation guide was not written as a set of step-by-step instructions, but rather as a stream of consciousness narrative of what steps I took, what errors resulted, and how I went about resolving them.

Finally, the reader should be encouraged by the fact that I was ultimately successful in getting the supporting Python modules installed in each case.

This post starts with instructions for how to install a virtual environment, which can be used to run Kalpana and provide access to different packages. Then the instructions are given for how to install the necessary Python modules: (1) on a desktop computer, and (2) in an HPC environment.

Installing the Virtual Environment

When installing the libraries supporting Kalpana on an HPC cluster or desktop Linux machine, conflicts can arise between the Python libraries required by Kalpana and the Python libraries required by other Python programs. This issue can be resolved by using the Python package virtualenv, which can be used to create different virtual Python environments for different packages.

A great introduction to virtualenv is available. If it is not already installed, go ahead and install it via pip. Then create a virtual environment for running Kalpana called kalpanaenv:

jason:~/Kalpana$ virtualenv kalpanaenv
New python executable in kalpanaenv/bin/python
Installing distribute........done.
Installing pip...............done.

The virtualenv package installs pip into the new virtual environment by default. Now activate this virtual environment by running the following command:

jason:~/Kalpana$ source kalpanaenv/bin/activate

I used pip to install and manage Python packages. After some trial and error, I found that it may be best to immediately upgrade the version of pip that virtualenv installed in the virtual environment:

pip install --upgrade pip

Installing Kalpana on a Desktop Computer

The next step for running Kalpana on your desktop machine requires the installation of Python libraries to satisfy its dependencies. My desktop machine has Python version 2.7.3 installed, and Kalpana requires Python 2.7, although Python 2.6 may work as well.
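
Before installing anything, it can help to confirm which interpreter version is actually in use. The following is a minimal sketch that encodes the version requirement stated above; the function name and the status labels ("supported", "may work", "untested") are hypothetical and chosen only for this example.

```python
import sys


def python_version_status(version_info=None):
    """Classify an interpreter version against the requirement stated above."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    if major_minor == (2, 7):
        return "supported"   # Kalpana requires Python 2.7
    if major_minor == (2, 6):
        return "may work"    # the text says 2.6 may work as well
    return "untested"        # anything else is outside what the text covers


print("Python {0}.{1}: {2}".format(
    sys.version_info[0], sys.version_info[1], python_version_status()))
```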

The Python module dependencies are documented at the top of the Kalpana source code as follows:

import matplotlib
import pylab as pl
import matplotlib.pyplot as pplot
from shapely.geometry import mapping, Polygon, LineString, LinearRing, Point
import fiona
import netCDF4
import datetime
import time
import numpy as np
import collections
import simplekml
import math

Installing the matplotlib Python Module

When installing matplotlib into my Kalpana virtual environment, I got the following error message:

Downloading/unpacking matplotlib
Downloading matplotlib-2.0.0b1.tar.gz (53.2Mb): 53.2Mb downloaded
Running egg_info for package matplotlib
The required version of distribute (>=0.6.28) is not available,
and can't be installed while this script is running. Please
install a more recent version first, using
'easy_install -U distribute'.

So I ran the command:

easy_install -U distribute

and it seemed to run successfully. I reran:

pip install matplotlib

and it seemed to install successfully and also seemed to install numpy in the process.

Installing the pylab Python Module

Next, I tried to install pylab with:

pip install pylab

and got the following error message:

pip install pylab
Downloading/unpacking pylab
Could not find any downloads that satisfy the requirement pylab
No distributions at all found for pylab
Storing complete log in /home/jason/.pip/pip.log

Looking in /home/jason/.pip/pip.log, I found the following error messages:

Skipping link
any.whl#md5=a49eb20cdd1ce3c2ceebcb5588cddb62 (from; unknown archive format: .whl
Could not find any downloads that satisfy the requirement pylab

After some research focused on the error message “unknown archive format: .whl”, I started to wonder whether my installed version of setuptools was out of date. I tried to upgrade setuptools with:

pip install --upgrade setuptools

but it seemed to already be at the latest version. So instead I tried to upgrade pip the same way (as suggested in the introduction):

pip install --upgrade pip

which seemed to be successful. Then I retried:

pip install pylab

and it successfully downloaded, built, and installed pylab and its dependencies.

Installing the fiona Python Module

When I ran:

pip install fiona

I ran into the following error message:

/usr/bin/pip:5: UserWarning: Module dap was already imported from None,
but /usr/lib/python2.7/dist-packages is being added to sys.path

After a search, I found a recommendation to remove python-dap using my system package manager. I did so, and this resolved the error message about module dap. I retried the fiona install and encountered the following error message:

WARNING:root:Failed to get options via gdal-config:
[Errno 2] No such file or directory

This error occurred because I did not have the gdal-config executable. In order to resolve this, I installed libgdal1-dev (and its many dependencies) using the system package manager (in my case, Synaptic on Ubuntu 12.04LTS).

After another retry, the fiona package did install, or at least it claimed to be successful, but there were many warnings associated with the compile. And when I ran:

import fiona

at the Python prompt, I confirmed that there was an error with the fiona install because I got the following error message:

ImportError: /usr/local/lib/python2.7/dist-packages/fiona/
undefined symbol: OGR_L_GetName

After some searching, I concluded that fiona requires GDAL version 1.8+. I ran gdal-config and the output indicated that I currently have GDAL version 1.7.3. In order to upgrade to the latest GDAL using my system’s package management software, I issued the following sequence of commands:
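
The version comparison described above can also be scripted. The sketch below runs gdal-config --version, parses the result, and compares it with the 1.8 minimum that I concluded fiona needs. The helper names are hypothetical, and the code assumes that gdal-config prints a plain dotted version string such as 1.7.3.

```python
import re
import subprocess

MINIMUM_GDAL = (1, 8)  # minimum version concluded in the text above


def parse_version(text):
    """Turn a string such as '1.7.3' into a tuple of ints: (1, 7, 3)."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_gdal_version():
    """Return the version reported by gdal-config, or None if unavailable."""
    try:
        output = subprocess.check_output(["gdal-config", "--version"])
    except (OSError, subprocess.CalledProcessError):
        return None  # gdal-config is missing or exited with an error
    return parse_version(output.decode("ascii", "replace"))


def gdal_is_new_enough(version, minimum=MINIMUM_GDAL):
    """Compare version tuples; a missing version counts as too old."""
    return version is not None and version >= minimum


found = installed_gdal_version()
print("GDAL version found: {0}".format(found))
print("New enough for fiona: {0}".format(gdal_is_new_enough(found)))
```

With the version I had at this point, parse_version("1.7.3") gives (1, 7, 3), which compares as older than (1, 8).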

sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get update
sudo pip install fiona --upgrade

When I retried the installation of fiona, it still emitted many warnings during compilation, but after completing the compile and install, I successfully executed the ‘import fiona’ command at the Python prompt.

Installing the Other Python Modules

Finally I was able to install shapely, netCDF4, datetime, and simplekml with pip without encountering any further error messages. The time, collections, and math modules appear to be included in my Python distribution and therefore did not need to be installed with pip.

On the first attempt to run Kalpana, the execution failed immediately with the following error message:

Traceback (most recent call last):
File "./", line 8, in <module>
from shapely.geometry import mapping, Polygon, LineString,
LinearRing, Point
ImportError: cannot import name LinearRing

My attempted fix was to upgrade shapely:

sudo pip install shapely --upgrade

which resulted in Kalpana running successfully.

Installing Kalpana on HPC Platform

The integration process on an HPC platform (Hatteras at RENCI) was broadly similar to the process for my desktop Linux machine, except for the significant complications resulting from my lack of administrative privileges on Hatteras. Fortunately, the sysadmins at RENCI were very supportive.

After researching how to approach Python module installation in an HPC environment where I do not have root access, I decided that the first step was to request that the Hatteras administrators install pip and virtualenv. Pip is the Python package manager, and virtualenv allows me to create different Python environments for the NCFS on Hatteras for different projects, without having administrative privileges. The RENCI systems administrators promptly installed both of these programs, as well as virtualenvwrapper.

I created a new directory /projects/ncfs/apps/kalpana, changed to that directory and created a new Python environment:

[ncfs@ht1 kalpana]$ virtualenv env
New python executable in env/bin/python
Installing setuptools, pip...done.

Then I activated the new Python environment with the command:

[ncfs@ht1 kalpana]$ source /projects/ncfs/apps/kalpana/env/bin/activate

and verified that I was using this virtual Python environment whenever I executed pip or the Python interpreter:

(env)[ncfs@ht1 kalpana]$ which python
(env)[ncfs@ht1 kalpana]$ which pip
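
The same verification can be made from inside the interpreter. The following is a minimal sketch with a hypothetical function name. It relies on the fact that older virtualenv releases set sys.real_prefix inside an environment, while newer virtualenv releases and Python 3's venv make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and Python 3's venv change sys.prefix and
    # keep the original location in sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("executable: {0}".format(sys.executable))
print("prefix:     {0}".format(sys.prefix))
print("virtual environment active: {0}".format(in_virtual_environment()))
```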

I started up the Python interpreter to see which of the Python modules I needed were already available. The results indicated that datetime, time, collections, and math were already available in my virtual Python environment, but matplotlib, pylab, shapely, fiona, netCDF4, numpy, and simplekml were not.
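
Rather than typing each import by hand, the availability check can be scripted. This is a sketch only: the module list is copied from the import statements quoted earlier in this post, the function name is hypothetical, and any exception raised during import is treated as "not available" (which also catches broken builds that fail at import time).

```python
import importlib

# Module names taken from the import list at the top of the Kalpana source.
REQUIRED_MODULES = [
    "matplotlib", "pylab", "shapely", "fiona", "netCDF4", "datetime",
    "time", "numpy", "collections", "simplekml", "math",
]


def check_modules(names):
    """Map each module name to True if it imports without error, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Covers a missing module as well as a build that fails on import.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules(REQUIRED_MODULES).items()):
        print("{0:<12} {1}".format(name, "available" if ok else "MISSING"))
```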

Installing the matplotlib and numpy Python Modules

I attempted to install matplotlib with the following:

pip install matplotlib

The screen messages indicated that matplotlib was being downloaded; this was quickly followed by additional screen messages indicating that other packages (which I assumed were dependencies) were also being downloaded and compiled. However, this compilation and installation process seemed to fail with the following error message:

running build_ext
building 'matplotlib.ft2font' extension
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-vZ7N1V/matplotlib/", line 268, in <module>
File "/usr/lib64/python2.6/distutils/", line 152, in setup
File "/usr/lib64/python2.6/distutils/", line 975, in run_commands
File "/usr/lib64/python2.6/distutils/", line 995, in run_command
ImportError: /projects/ncfs/apps/kalpana/env/lib/python2.6/
site-packages/numpy/core/ undefined symbol:

After some research, I concluded that the attempted build and installation of numpy must have used a different compiler than the one that was used for Python originally. Specifically, it seemed that pip tried to build numpy (to satisfy matplotlib dependency on numpy) with the Intel C compiler.

Just to be sure, I checked to see if pip had successfully installed numpy:

(env)[ncfs@ht1 kalpana]$ pip install numpy
Requirement already satisfied (use --upgrade to upgrade):
numpy in ./env/lib/python2.6/site-packages
(env)[ncfs@ht1 kalpana]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/ncfs/apps/kalpana/env/lib/python2.6/site-packages/
numpy/", line 170, in <module>
from . import add_newdocs
ImportError: /projects/ncfs/apps/kalpana/env/lib/python2.6/site-packages/
numpy/core/ undefined symbol: __intel_sse2_strcpy

From this, I concluded that pip had not built and/or installed numpy successfully, and that the system Python had been built using gcc, not the Intel C compiler. Based on the final error message, it seemed likely that pip had attempted to use the Intel C compiler (icc) to compile numpy.

I really wanted to avoid any changes to the system-installed Python environment (actually, I wanted to change as little of the underlying HPC environment as possible). I decided that the presence of the Intel compiler suite in my loaded modules (system modules, not Python modules) had somehow triggered pip to try to use these compilers to build new Python modules, rather than the gcc and gfortran compilers that were originally used to build Python for this HPC platform. If that were the case, then I should be able to resolve the issue by unloading those Intel compiler modules and starting over with my virtualenv setup.

So at that point, I deactivated the virtualenv Python environment and deleted the env directory. I unloaded system modules as follows:

[ncfs@ht1 kalpana]$ module unload netcdf/4.1.3_intel-14.0.3
[ncfs@ht1 kalpana]$ module unload mvapich2/2.0_intel-14.0.3_nemesis
[ncfs@ht1 kalpana]$ module unload intelfort/14.0.3
[ncfs@ht1 kalpana]$ module unload intelc/14.0.3

and set environment variables for CC and FC as follows:

[ncfs@ht1 kalpana]$ CC=gcc
[ncfs@ht1 kalpana]$ FC=gfortran

with the aim of setting up a virtual Python environment whose modules would be compiled with gcc and gfortran and therefore be compatible with the system Python installation.
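
One way to cross-check the compiler situation is to ask the interpreter itself. The sketch below (function and key names are hypothetical) reports which compiler built the running Python and which CC and FC values are visible to it. Keep in mind that only exported variables are passed to child processes, so a variable assigned in the shell without export will show up here as None unless it had already been exported.

```python
import os
import platform
import sysconfig


def compiler_report():
    """Collect compiler hints for the interpreter and the current environment."""
    return {
        # Compiler that built the running interpreter, e.g. 'GCC 4.4.7 ...'.
        "python_built_with": platform.python_compiler(),
        # C compiler recorded when Python was built; may be None on some platforms.
        "configured_cc": sysconfig.get_config_var("CC"),
        # Only variables exported by the shell appear in os.environ.
        "env_CC": os.environ.get("CC"),
        "env_FC": os.environ.get("FC"),
    }


for key, value in sorted(compiler_report().items()):
    print("{0:<18} {1}".format(key, value))
```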

Then I created a new virtual environment using the same sequence of commands as previously:

[ncfs@ht1 kalpana]$ virtualenv env
[ncfs@ht1 kalpana]$ source /projects/ncfs/apps/kalpana/env/bin/activate
(env)[ncfs@ht1 kalpana]$ pip install numpy

The build and install process for numpy proceeded, with seemingly many fewer compiler warnings, and the build finished with the following informational message:

Successfully installed numpy-1.9.1

I performed a smoke test to see if numpy had been installed correctly:

(env)[ncfs@ht1 kalpana]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy
<module 'numpy' from '/projects/ncfs/apps/kalpana/env/lib/python2.6/
>>> numpy.__version__

and concluded that numpy had most likely been built and installed correctly.

From there, I reattempted the build and installation of the matplotlib Python module. The compile seemed to proceed with relatively few warnings and ended with the following informational message:

Successfully installed matplotlib-1.4.2 mock-1.0.1 nose-1.3.4
pyparsing-2.0.3 python-dateutil-2.4.0 pytz-2014.10 six-1.9.0

The smoke test of the matplotlib module also completed successfully:

(env)[ncfs@ht1 kalpana]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib
<module 'matplotlib' from '/projects/ncfs/apps/kalpana/env/lib/
>>> matplotlib.__version__

Installing the pylab Python Module

Next, I attempted to build and install pylab:

(env)[ncfs@ht1 kalpana]$ pip install pylab
Collecting pylab
Could not find any downloads that satisfy the requirement pylab
No distributions at all found for pylab

After some research, I realized that pylab is actually part of the matplotlib module, rather than a separate component to be built and installed. I confirmed that I could successfully start up a Python interpreter and execute “import pylab” without error.

Installing the shapely Python Module

The next dependency to install was the shapely module, which is used for geometric operations. However, this module failed to install with the following error messages:

(env)[ncfs@ht1 kalpana]$ pip install shapely
Collecting shapely
Downloading Shapely-1.5.3.tar.gz (258kB)
100% |################################| 262kB 1.1MB/s
Numpy or Cython not available, shapely.vectorized submodule not
being built.
pkg_resources/ DeprecationWarning: `require`
parameter is deprecated. Use EntryPoint._load instead.
Installing collected packages: shapely
Running install for shapely
Numpy or Cython not available, shapely.vectorized submodule not
being built.
error: None
Complete output from command /projects/ncfs/apps/kalpana/
env/bin/python -c "import setuptools, tokenize;
exec(compile(getattr(tokenize, 'open', open)(__file__).read().
replace('\r\n', '\n'), __file__, 'exec'))" install --record
--single-version-externally-managed --compile --install-headers
Numpy or Cython not available, shapely.vectorized submodule not
being built.

running install

error: None

Command "/projects/ncfs/apps/kalpana/env/bin/python -c
"import setuptools, tokenize;
exec(compile(getattr(tokenize, 'open', open)(__file__).read().
replace('\r\n', '\n'), __file__, 'exec'))" install --record
--single-version-externally-managed --compile --install-headers
failed with error code 1 in /tmp/pip-build-O0S9mu/shapely

After some research, I found that the shapely module relies on the GEOS software library (version 3.3+), which is written in C++ and must be installed by system administrators. In order to check to see if GEOS had been installed on Hatteras at RENCI, I ran the following command:

(env)[ncfs@ht1 kalpana]$ geos-config
-bash: geos-config: command not found
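
The same kind of check can be run for several helper programs at once. The following is a minimal sketch that searches the directories on PATH for geos-config and gdal-config, the two configuration tools mentioned in this post. The function name is hypothetical, and the search is written out by hand so that it also runs on Python 2.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of an executable found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


for tool in ("geos-config", "gdal-config"):
    location = find_on_path(tool)
    print("{0}: {1}".format(tool, location if location else "not found"))
```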

As a result, I submitted a request to the sysadmins at RENCI to install the GEOS library. They promptly performed the install of GEOS version 3.3.2, and I retried the installation of the shapely module. The compile appeared to generate a lot of warnings, but completed with the following message:

Successfully installed shapely-1.5.3

I performed a smoke test to see if I could import the module, just in case:

(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shapely
>>> shapely
<module 'shapely' from '/projects/ncfs/apps/kalpana/env/lib/python2.6/site-
>>> shapely.__version__

Installing the fiona Python Module

In order to install the fiona module (used to write shapefiles), I first checked to see if the system GDAL library was available by executing “which gdal-config”. I found that it was not, so I emailed a request to the RENCI sysadmins to see if they could install GDAL version 1.8+, including the development files. They found that they needed to build and install the library from source (fortunately they used gcc to do so) and they even created a system module.

I loaded this system module and then re-tried the fiona module install. The process completed with a success message, so I performed a smoke test:

(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ module load gdal/1.11.1_gcc
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ which gdal-config
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ pip install fiona
Collecting fiona
Using cached Fiona-1.4.8.tar.gz
Installing fio script to /projects/ncfs/apps/kalpana/env/bin
Successfully installed fiona-1.4.8
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import fiona
>>> fiona
<module 'fiona' from '/projects/ncfs/apps/kalpana/env/lib/python2.6/site-
>>> fiona.__version__

Based on the above, it appeared that the fiona module had built and installed successfully.

Installing the netCDF4 Python Module

My first attempt at installing the netCDF4 module produced the following results:

(env)[ncfs@ht1 nhcConsensus]$ pip install netCDF4
Collecting netCDF4
Downloading netCDF4-1.1.3.tar.gz (628kB)
100% |################################| 630kB 1.2MB/s
HDF5_DIR environment variable not set, checking some standard locations ..
checking /home/ncfs ...
checking /usr/local ...
checking /sw ...
checking /opt ...
checking /opt/local ...
checking /usr ...
Traceback (most recent call last):
File "<string>", line 20, in <module>
File "/tmp/pip-build-vOPUHd/netCDF4/", line 216, in <module>
raise ValueError('did not find HDF5 headers')
ValueError: did not find HDF5 headers
Complete output from command python egg_info:
reading from setup.cfg...
Traceback (most recent call last):
File "<string>", line 20, in <module>
File "/tmp/pip-build-vOPUHd/netCDF4/", line 216, in <module>
raise ValueError('did not find HDF5 headers')
ValueError: did not find HDF5 headers
Command "python egg_info" failed with error code 1 in

From this I concluded that I needed to compile HDF5 using the gcc and gfortran compilers on Hatteras, so that the netCDF4 module could use the HDF5 library. I started by making a copy of the existing HDF5 source code, for use in the new build:

(env)[ncfs@ht1 hdf5]$ tar xvjf hdf5-1.8.10.tar.bz2
(env)[ncfs@ht1 hdf5]$ mv hdf5-1.8.10 hdf5-1.8.10-hatteras-gcc
(env)[ncfs@ht1 hdf5]$ cd hdf5-1.8.10-hatteras-gcc
(env)[ncfs@ht1 hdf5-1.8.10-hatteras-gcc]$ ./configure --enable-fortran
--enable-fortran2003 --enable-cxx

After configuring the build, I typed make, then make test, then make install. Then I set the environment variable to the path to this HDF5 installation so that pip could find it:


However, when I retried pip install netCDF4, I got the same error message about the HDF5 environment variable not being set. So then I tried:

export HDF5_DIR=/projects/ncfs/apps/hdf5/hdf5-1.8.10-hatteras-gcc/hdf5

and then retried pip install netCDF4. The issue with HDF5 was resolved successfully, but this time, I got a different error message:

NETCDF4_DIR environment variable not set, checking standard locations.....
ValueError: did not find netCDF version 4 headers
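
Both of these errors are about header files that the build could not locate, so it may save a retry to check the paths before running pip. The sketch below assumes the conventional layout in which an install prefix contains an include directory holding hdf5.h or netcdf.h. That layout is an assumption on my part rather than something taken from the netCDF4 build script, and the function names are hypothetical.

```python
import os


def has_header(prefix, header):
    """Return True if <prefix>/include/<header> exists (assumed layout)."""
    if not prefix:
        return False
    return os.path.isfile(os.path.join(prefix, "include", header))


def check_build_prefixes(environ=None):
    """Report whether HDF5_DIR and NETCDF4_DIR point at usable install prefixes."""
    if environ is None:
        environ = os.environ
    return {
        "HDF5_DIR": has_header(environ.get("HDF5_DIR"), "hdf5.h"),
        "NETCDF4_DIR": has_header(environ.get("NETCDF4_DIR"), "netcdf.h"),
    }


print(check_build_prefixes())
```

A value of False for either entry means that the variable is unset or that the expected header was not found under that prefix.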

From this I concluded that I needed to download the NetCDF source code and build it myself using the gcc compiler. As shown in the following sequence, I downloaded and compiled NetCDF version 4.3.2 (the latest stable version), set two environment variables to point to the HDF5 installation I had just built, and then built the software:

(env)[ncfs@ht1 netcdf]$ pwd
(env)[ncfs@ht1 netcdf]$ curl
pub/netcdf/netcdf-4.3.2.tar.gz > netcdf-4.3.2.tar.gz
(env)[ncfs@ht1 netcdf]$ tar xvzf netcdf-4.3.2.tar.gz
(env)[ncfs@ht1 netcdf]$ mv netcdf-4.3.2 netcdf-4.3.2-hatteras-gcc-build
(env)[ncfs@ht1 netcdf]$ mkdir netcdf-4.3.2-hatteras-gcc
(env)[ncfs@ht1 netcdf]$ cd netcdf-4.3.2-hatteras-gcc-build
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc-build]$ export
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc-build]$ export
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc-build]$ ./configure
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc-build]$ make
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc-build]$ make check
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc-build]$ make install

With NetCDF now in place, I was able to set the environment variable so that pip could find NetCDF, and then retried the pip command to build the netCDF4 module:

export NETCDF4_DIR=/projects/ncfs/apps/netcdf/netcdf-4.3.2-hatteras-gcc
pip install netCDF4

The build and install of the netCDF4 module seemed to progress with only a single warning and finished with the following informational message:

Successfully installed netCDF4-1.1.3

I performed a smoke test on the netCDF4 module to confirm that it could at least be imported and could recite its version number:

(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import netCDF4
>>> netCDF4
<module 'netCDF4' from '/projects/ncfs/apps/kalpana/env/lib/python2.6/site-
>>> netCDF4.__version__

From the above, I concluded that the netCDF4 module most likely installed correctly.

Installing the simplekml Python Module

I was able to install the final Python module, simplekml, without issues:

(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ pip install simplekml
Collecting simplekml
Downloading (51kB)
100% |################################| 53kB 1.2MB/s
packages/pkg_resources/ DeprecationWarning:
`require` parameter is deprecated. Use EntryPoint._load instead.
Installing collected packages: simplekml
Running install for simplekml
Successfully installed simplekml-1.2.5
(env)[ncfs@ht1 netcdf-4.3.2-hatteras-gcc]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simplekml
>>> simplekml
<module 'simplekml' from '/projects/ncfs/apps/kalpana/env/lib/
>>> simplekml.__version__

With that, I was ready to test the production of shapefiles and KMZ files on Hatteras.


The primary challenge for the integration of the Kalpana code for shapefile and Google Earth products generation came from the Python module dependencies in the Kalpana code. However, I was able to successfully satisfy these dependencies in cooperation with the RENCI systems administrators.