Data Resources
Reading Material on Databases and Data Formats
There are many good resources on databases and on the ever-changing search for a common data format in science. Here are a few links to resources that I have found useful.
- A short history of databases from Wikipedia
- A reasonable description of the difference between a flat file (database) and a relational database
- A fairly comprehensive listing of scientific formats for a variety of fields can be found at Just Solve the File Format Problem
Database Formats
There are many common database file formats. Here is a list of the most common, with an indication of their most likely source.
- The dbf file
- DBF is a file format used by databases such as dBase, Visual FoxPro, and FoxBase+.
- It is an older format that incorporates a binary header with plain text data.
- Many current database programs can still read this format, and modules to read dbf files exist for most programming languages.
- The gdb file
- The latest ArcGIS geodatabase (gdb) format.
- The mdb file
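To make the "binary header with plain text data" description of the dbf format concrete, here is a minimal sketch that decodes the first 12 bytes of a DBF header using only the Python standard library. It assumes the commonly documented dBase III layout (version byte, last-update date, record count, header length, record length, all little-endian). The function name `parse_dbf_header` and the sample bytes are hypothetical and built only for illustration; for real files a dedicated dbf module is the better choice.

```python
import struct


def parse_dbf_header(data: bytes) -> dict:
    """Decode the leading fields of a dBase-style DBF header (assumed layout)."""
    if len(data) < 32:
        raise ValueError("a DBF header is at least 32 bytes long")
    # "<" = little-endian, no padding; B = 1 byte, I = 4-byte unsigned, H = 2-byte unsigned
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack(
        "<4BIHH", data[:12]
    )
    return {
        "version": version,
        # Assumption: the year byte is stored as an offset from 1900.
        "last_update": (1900 + yy, mm, dd),
        "n_records": n_records,
        "header_len": header_len,  # the plain text records start at this offset
        "record_len": record_len,  # bytes per fixed-width record
    }


# Synthetic header for illustration only: 3 records of 11 bytes each.
sample = struct.pack("<4BIHH", 0x03, 124, 5, 17, 3, 65, 11) + bytes(20)
header = parse_dbf_header(sample)
print(header)
```

Once the header length and record length are known, each record can be sliced out of the file as fixed-width text, which is why so many languages have small dbf readers.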
Database Software
This is not a comprehensive list, but provides some links to commonly used database software:
- Open Source
- SQLite - a free, open source, embedded SQL database engine that reads and writes ordinary disk files without the need for a separate server process. It is therefore not a heavy-duty database like PostgreSQL or Oracle, but because it uses a dialect of the SQL query language it offers a low threshold for entry into database use and development.
- SQL (Structured Query Language) - the primary database query language. Many implementations, both proprietary and open source, are available (see PostgreSQL below), each customizing the basic structured language.
- PostgreSQL home page: http://www.postgresql.org/
- PostGIS home page: http://postgis.refractions.net/
- Proprietary
- Access - The database software that is included in Microsoft Office.
- There are plenty of short videos on YouTube to help learn the Access database.
- An Access Database tutorial from Tutorials Point
- INFO - the original geospatial database behind ArcInfo, still present in ArcGIS though it is no longer being updated
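As a small illustration of the SQLite entry above (an embedded engine that works on an ordinary disk file with no server process), the following sketch uses Python's built-in sqlite3 module. The file name `samples.sqlite`, the table `samples`, and its columns are made up for this example.

```python
import os
import sqlite3
import tempfile

# The whole database lives in this one ordinary file (hypothetical name).
db_path = os.path.join(tempfile.mkdtemp(), "samples.sqlite")

conn = sqlite3.connect(db_path)  # creates the file if it does not exist
conn.execute("CREATE TABLE samples (site TEXT, depth_m REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("A", 1.5), ("B", 3.0), ("A", 2.5)],
)
conn.commit()

# Standard SQL works as it would against a server-based database.
rows = conn.execute(
    "SELECT site, AVG(depth_m) FROM samples GROUP BY site ORDER BY site"
).fetchall()
conn.close()

database_is_a_file = os.path.isfile(db_path)
print(rows, database_is_a_file)
```

Because there is nothing to install or administer, this is often the quickest way to practice SQL before moving to a server-based system.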
Tutorials for Working with Databases in Python
- SQLite tools
- Standard Python documentation for the sqlite3 module - select the version that matches your Python installation
- Official pandas documentation for SQL queries
- Tutorial for Working with SQLite Databases using Python and Pandas
- Tutorial for working with very large databases using Python and SQLite. Works with a database that is too large to be read completely into a pandas dataframe.
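The idea behind working with a database that does not fit in memory can be sketched with the standard library alone: fetch rows in fixed-size batches and keep only running totals. This is not the linked tutorial's code. The table `readings`, the column `value`, and the function `mean_in_batches` are hypothetical, and a small in-memory database stands in for a large file. pandas offers the same pattern through the `chunksize` argument of `read_sql_query`.

```python
import sqlite3


def mean_in_batches(conn, batch_size=1000):
    """Average the 'value' column without loading the whole table at once."""
    cursor = conn.execute("SELECT value FROM readings")
    total, count = 0.0, 0
    while True:
        batch = cursor.fetchmany(batch_size)  # at most batch_size rows in memory
        if not batch:
            break
        total += sum(value for (value,) in batch)
        count += len(batch)
    return total / count if count else None


# Small stand-in for a large on-disk database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?)",
    ((float(i),) for i in range(10_000)),
)
result = mean_in_batches(conn, batch_size=512)
conn.close()
print(result)
```

Where possible, pushing the aggregation into SQL itself (for example `SELECT AVG(value)`) avoids moving the rows into Python at all; batching is for processing that SQL cannot express.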
Scientific Data Formats
Descriptions of Common Scientific Data Formats
- Overview of Common Scientific Data Formats used for Earth Sciences
- Wikipedia's Take on Data Formats
Links to Specific Data Formats
- NetCDF home page: http://www.unidata.ucar.edu/software/netcdf/
- HDF home page: http://hdf.ncsa.uiuc.edu/
Tutorials for Working with Common Scientific Formats in Python
- General guidance for reading multiple data file format types with Python
- Flat ASCII files
- NetCDF files
- HDF5 files (HDF and HDF5 files are not the same thing)