Data Analysis Resources
Python Books
Here is a list of books related to data analysis that I have found useful in my class.
- Janert, P. K. (2011), Data Analysis with Open Source Tools, is my preferred book for teaching the basic concepts of data analysis. Available on-line through the Purdue library. Also available for download via IT eBooks.
- McKinney, W. (2012), Python for Data Analysis, introduces the Python Data Analysis Library (pandas), a very useful set of tools, especially for working with time series data. Available on-line through the Purdue library.
Statistical Tools
- Python based tools
- SciPy - statistical analysis tools for science in Python - The stats module in SciPy contains a large number of probability distributions and a growing library of statistical functions, and is especially useful for studying probability functions.
- StatsModels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
- Scikit-learn is a Python module for machine learning and data mining. It contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.
- R Project for Statistical Computing
- Free distribution that is installed on most Linux/Unix systems and is also available for other platforms. It is available through Anaconda.
- Based on the S language, which was also the basis for S-PLUS, an older proprietary statistical analysis package. The R project is often cited as an example of open source software development displacing a commercial product.
- Introduction to R
- RPy module for Python provides access to many R project statistical tools
- SAS statistical tools
- Proprietary software package, available through Purdue University license for installation on University systems.
- Information on licensing for faculty, staff and students can be found through the Purdue IT Software Licensing and Distribution page
- SAS Tutorials for Statistical Data Analysis
- SPSS statistical tools
- SPSS is favored by the social science research community.
- Proprietary software package, available through Purdue University license for installation on University systems.
- Information on licensing for faculty, staff and students can be found through the Purdue IT Software Licensing and Distribution page
- Tutorial: Analyzing with SPSS Statistics - basic tutorial from IBM.
- Here is a useful page on Social Science Data and Statistics Resources focused on SPSS from Tufts University.
- Lots of other on-line courses available if you want to pay, including one through Purdue Global.
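As a quick illustration of the Python based tools listed above, here is a minimal sketch using the stats module in SciPy. The sample data are synthetic and the variable names are made up for this example; it is meant only to show the general pattern of fitting a distribution, evaluating a probability, and running a simple test.

```python
import numpy as np
from scipy import stats

# Synthetic sample (illustrative only): 200 draws from a normal distribution.
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=10.0, scale=2.0, size=200)

# Fit a normal distribution to the sample (maximum likelihood estimates).
fit_mean, fit_std = stats.norm.fit(sample)

# Probability of a value falling below 12 under the fitted distribution.
p_below_12 = stats.norm.cdf(12.0, loc=fit_mean, scale=fit_std)

# One-sample t-test: is the sample mean consistent with 10?
t_result = stats.ttest_1samp(sample, popmean=10.0)

print(f"fitted mean = {fit_mean:.2f}, fitted std = {fit_std:.2f}")
print(f"P(X < 12) = {p_below_12:.3f}")
print(f"t = {t_result.statistic:.3f}, p = {t_result.pvalue:.3f}")
```

StatsModels and Scikit-learn follow a similar fit-then-inspect pattern, although their interfaces differ in detail.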
Environmental Metrics
- Literature on Metrics found to be Environmental Indicators
- Streamflow Metrics
- Spatial Metrics
- Tools for Calculating Metrics
Tools for Working with Date and Time Information
- Shared Date and Time Tools - these libraries are now supported only for C language applications, since datetime support is well integrated into Python and its submodules.
- Python tools for working with date and time
- Working with date and time in the Generic Mapping Tools (GMT)
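To give a sense of the date and time support built into Python, the sketch below uses only the standard library datetime module. The timestamp and its format string are assumptions chosen for illustration, not values from any particular data set.

```python
from datetime import datetime, timedelta

# Parse a timestamp string (the format string is an assumption about the input).
start = datetime.strptime("2014-02-10 06:30", "%Y-%m-%d %H:%M")

# Date arithmetic: add 36 hours.
later = start + timedelta(hours=36)

# Day of year, often needed for streamflow or climate records.
day_of_year = later.timetuple().tm_yday

# Elapsed time between the two timestamps, in hours.
elapsed_hours = (later - start).total_seconds() / 3600.0

print(later.strftime("%Y-%m-%d %H:%M"))
print(day_of_year, elapsed_hours)
```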
Tools for Working with Common Data Formats
- ASCII text files
- Software packages
- Excel
- Python tools
- Reading and Writing files in Python
- Text read and write functions in NumPy
- Text read and write functions in pandas
- Software packages
- Binary files
- NetCDF files
- HDF and HDF5 files
- GRiB files
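For the ASCII text case, the following sketch shows one way to read delimited text with NumPy and pandas and to write it back out. The column names and values are hypothetical, and an in-memory string stands in for a file on disk so the example runs without any input files.

```python
import io

import numpy as np
import pandas as pd

# Small in-memory CSV standing in for a file on disk (values are made up).
csv_text = "date,flow_cms\n2014-02-10,12.5\n2014-02-11,14.0\n2014-02-12,13.2\n"

# NumPy: read the numeric column only, skipping the header row.
flows = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=1)

# pandas: read the whole table, parsing the date column into the index.
table = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Write the table back out as text; with no path given, to_csv returns a string.
round_trip = table.to_csv()

print(flows.mean())
print(table)
```

With real data, a file path can be passed in place of the io.StringIO object in both np.loadtxt and pd.read_csv.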
Graphical Analysis Tools
- Using matplotlib to create MATLAB-like plots and graphical analysis
- Pyplot tutorial - A limited tutorial, but the way it imports the pylab library makes it much easier to understand what is part of the module.
- A more complete tutorial - Assumes pyplot is a stand-alone module. As of February 10, 2014, you will need to import pylab using the command "from matplotlib.pyplot import *" for the tutorial examples to work.
- Gallery - Images of matplotlib figures, and the source code used to make them.
- The PyGMT interface for the Generic Mapping Tools for processing geospatial and geophysical data and making publication-quality maps and figures without having to learn Linux.
- PyNGL is a Python interface to the high quality 2D scientific visualizations in the NCAR Command Language (NCL).
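A basic matplotlib plot can be sketched as follows. This version imports pyplot under its own name rather than with a wildcard import, which makes it clear which functions come from the module; the curves and labels are arbitrary examples, and the non-interactive "Agg" backend is an assumption made so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

# Create a figure with a single set of axes and draw two curves on it.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.legend()

# Save to an in-memory buffer; pass a filename such as "example.png" to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```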
Geospatial Tools
- Working with Python in ArcGIS
- PyGMT interface for the Generic Mapping Tools
- PyQGIS scripting language for QGIS (Quantum GIS)