From: [email protected] (Ilana Stern)
Newsgroups: sci.data.formats,news.answers,sci.answers
Subject: Scientific Data Format Information FAQ
Archive-name: sci-data-formats
Last-modified: 13 Oct 1995
This is the FAQ for the sci.data.formats newsgroup. Contents:
1) How to use this document
2) How to get a current copy of this document
3) Resources for format information
4) Resources for visualization software information
5) How to use the data retrieval methods
6) Why isn't my favorite format on this list?
Each (major) section has a "Subject:" line, so you can search on the subject title above to find the section quickly.
This article is copyright (c) 1995 by Ilana Stern. It may be freely distributed provided that this copyright notice and the information on retrieving a current copy are not removed.
Comments, corrections, or additions should be sent to Ilana Stern at [email protected].
Most FAQ (Frequently Asked Questions) documents list many questions and their answers. This FAQ is (mostly) devoted to answering only one question:
"Where can I find documentation and software for [X] data format?"
As the amount of information available over the networks has been increasing, so have the methods by which this information can be obtained. No longer is direct usage of FTP the only, or even the most frequent, method of obtaining data; we now have Gopher, Wais, and WWW, as well as many site-specific interfaces. Because the information itself may be accessible in many different ways, this FAQ will identify resources in terms of URLs (Uniform Resource Locators). This will also help us convert this FAQ to a hypertext document, so that it can be used with a WWW browser to go directly to any of the listed sources.
Here's a glossary, so you can decode the URLs if necessary to reach the sites:
<URL:ftp://host.name.domain/[directory/[filename]]> ftp site
<URL:http://host.name.domain/[directory/[filename]]> www server
<URL:telnet://host.name.domain/> telnet site
<URL:gopher://host.name.domain/[directory/[filename]]> gopher server
<URL:wais://host.name.domain/> wais server
<URL:news:newsgroup.name> newsgroup
So, for example, if a document is available at ftp://ncardata.ucar.edu/ it means that you should ftp to ncardata.ucar.edu, and the information is in the top-level directory.
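For readers who would rather decode these URLs with a program, here is a minimal sketch in Python. It assumes the standard urllib.parse module; the function name decode_url and the METHODS table are made up for this illustration and are not part of any tool mentioned in this FAQ.

```python
from urllib.parse import urlsplit

# Retrieval method implied by each URL scheme in the glossary above.
METHODS = {
    "ftp": "ftp site",
    "http": "www server",
    "telnet": "telnet site",
    "gopher": "gopher server",
    "wais": "wais server",
    "news": "newsgroup",
}

def decode_url(text):
    """Split a FAQ-style URL into (method, host, path). Hypothetical helper."""
    # Strip the optional <URL:...> wrapper used throughout this document.
    if text.startswith("<URL:") and text.endswith(">"):
        text = text[5:-1]
    parts = urlsplit(text)
    method = METHODS.get(parts.scheme, "unknown")
    if parts.scheme == "news":
        # news: URLs name a newsgroup rather than a host.
        return method, None, parts.path
    # An empty path means the top-level directory.
    return method, parts.hostname, parts.path or "/"
```

Calling decode_url("<URL:ftp://ncardata.ucar.edu/>") gives the method, the host to connect to, and "/" for the top-level directory, matching the example above.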
If you don't know what these information retrieval methods are, see the section "How to use the data retrieval methods".
If you are reading this document after 11 Oct 1995, you are reading an outdated copy. A current copy of this document can be obtained by anonymous FTP to <URL:ftp://rtfm.mit.edu/pub/usenet/news.answers/sci-data-formats>. If you don't know what FTP is, see the section "How to use the data retrieval methods".
If you can't use FTP, send email to [email protected] with the following line in the body of the message:
send /pub/usenet/news.answers/sci-data-formats
A current somewhat hypertext version of this document can be obtained from <URL:http://www.smartpages.com/faqs/sci-data-formats/faq.html>. Real hypertext versions are available at <URL:http://ncardata.ucar.edu/dss/faq/data-formats.html>, <URL:http://fits.cv.nrao.edu/traffic/scidataformats/faq.html>, and (for European users in particular) at <URL:http://info.mcc.ac.uk/CGU/Visualisation/sdf.html>. If you would like to archive this FAQ in either hypertext or plaintext format, and want to receive a new copy automatically at every update, please send me email.
1) CDF
2) FITS
3) GRIB
4) HDF
5) netCDF
6) VICAR
7) PDS
8) Miscellaneous graphics formats
9) SAIF
10) SDTS
11) HDS
12) MedFileS
13) CXF
14) JCAMP
15) CIF
16) OpenMath
17) GeoTIFF
18) DLG-3
19) DEM
CDF (Common Data Format) is a library and toolkit for storing,
manipulating, and accessing multi-dimensional data sets. The basic component of
CDF is a software programming interface that gives a device-independent view of
the CDF data model.
All software and related information, including a FAQ and
hypertext User's Guide with searchable WAIS index, are available from the WWW
site: <URL:http://nssdc.gsfc.nasa.gov/cdf/cdf_home.html>.
A user's guide and software are on <URL:ftp://nssdca.gsfc.nasa.gov/cdf.dir/>
for VMS and <URL:ftp://ncgl.gsfc.nasa.gov/pub/cdf/>
for all others.
A recent paper for CDF is available from <URL:ftp://ncgl.gsfc.nasa.gov/pub/cdf/doc/papers/CDF-nssdc.ps.Z>.
A mailing list, [email protected], exists for discussion of CDF.
To subscribe, please send email to "[email protected]" with the
command "SUBSCRIBE cdf-users" in the body of your message.
Questions can be
directed to [email protected].
A client-server software layer
called CSCDF, which can be used with the CDF library to provide applications
access to remote CDF datasets, can be obtained from its author, Hillel
Steinberg, by email at [email protected].
FITS (Flexible Image Transport System) is the standard data interchange
and archival format of the worldwide astronomy community. The NOST Standard and
User's Guide, some software, and test files are available from <URL:ftp://nssdc.gsfc.nasa.gov/pub/fits>.
The site <URL:http://fits.cv.nrao.edu/> has other
software and a different set of test files, and electronic copies of FITS
proposals that are under development or in the international approval process.
Archives of the USENET newsgroups sci.data.formats, sci.astro.fits (which is
devoted to discussion of FITS), and others that are of interest to astronomers
can be found here. This site is also accessible via ftp at <URL:ftp://fits.cv.nrao.edu/fits>.
The "FITS Support Office", which contains many useful documents and links to
other information, is at <URL:http://www.gsfc.nasa.gov/astro/fits/fits_home.html>.
A WAIS index that can be searched for FITS information is at <URL:http://info.cern.ch:8001/fits.cv.nrao.edu:210/nrao-fits>.
If you've searched all these resources and still have questions, you can
direct them to [email protected].
GRIB (GRid In Binary) is the World Meteorological Organization (WMO)
standard for gridded meteorological data. Unfortunately it is still not very
"standard", as some organizations use their own versions. A format description
for WMO GRIB, and software to read general GRIB grids, can be found at
<URL:ftp://ncardata.ucar.edu/libraries/grib/>.
The format description can also be found at <URL:ftp://nic.fb4.noaa.gov/pub/nws/nmc/docs/gribguide/guide.txt>.
If you need GRIB to read ECMWF data, the above format description, the
ECMWF-specific parameter table, and a list of differences between the WMO and
the ECMWF versions of GRIB are in <URL:ftp://ncardata.ucar.edu/datasets/ds111.2/format>.
Read code can be found in <URL:ftp://ncardata.ucar.edu/datasets/ds111.2/software>.
If all else fails, contact Ilana Stern at [email protected].
HDF (Hierarchical Data Format) is a self-defining file format for
transfer of various types of data between different machines. The HDF library
contains interfaces for storing and retrieving compressed or uncompressed raster
images with palettes, and an interface for storing and retrieving n-Dimensional
scientific datasets together with information about the data, such as labels,
units, formats, and scales for all dimensions.
Source code and documentation
are on <URL:ftp://ftp.ncsa.uiuc.edu/HDF>. Some
general information on HDF, including a FAQ, is available from <URL:http://www.ncsa.uiuc.edu/SDG/Software/HDF/HDFIntro.html>.
The HDF WWW information server, with links to the above plus an in-progress
HTML reference manual, is on <URL:http://hdf.ncsa.uiuc.edu:8001/>.
NetCDF (Network Common Data Form) is an interface for scientific data
access which implements a machine-independent, self-describing, extendible file
format. All netCDF information is available via the WWW site <URL:http://www.unidata.ucar.edu/packages/netcdf/>.
Source code and documentation for the netCDF data access library are
available from <URL:ftp://ftp.unidata.ucar.edu/pub/netcdf/netcdf.tar.Z>.
A FAQ is available from <URL:http://www.unidata.ucar.edu/packages/netcdf/faq.html>
or in text from <URL:ftp://ftp.unidata.ucar.edu/pub/netcdf/FAQ>.
Past netCDF support inquiries have been archived and can be searched from
<URL:gopher://groucho.unidata.ucar.edu/7waissrc:/systems/netcdf/unidata-support-netcdf.src>.
The netCDF User's Guide is available as a hypertext (HTML) document from
<URL:http://www.unidata.ucar.edu/packages/netcdf/guide.txn_toc.html>,
in compressed PostScript at <URL:ftp://ftp.unidata.ucar.edu/pub/netcdf/guide.ps.Z>,
or in source form with the netCDF source distribution.
A recent paper
(Jenter and Signell, 1992) which provides a good introduction to netCDF is
available as <URL:ftp://crusty.er.usgs.gov/pub/netcdf.asce.ps>.
A visual browser for netCDF format data files is available from <URL:ftp://ftp.unidata.ucar.edu/pub/netcdf/contrib/ncview.tar.Z>.
A mailing list, [email protected], exists for discussion of the
netCDF interface, and for announcements of netCDF news: to subscribe, send a
message to [email protected] containing the line: "subscribe
netcdfgroup". The archives of netcdfgroup are available from <URL:ftp://ftp.unidata.ucar.edu/mail-archives/netcdfgroup>,
and can be searched at <URL:wais://wais.unidata.ucar.edu:210/netcdf-group.src>.
For more information, contact [email protected].
VICAR (Video Image Communication and Retrieval) is a collection of image
processing programs supported by the Multimission Image Processing Laboratory
(MIPL) at the Jet Propulsion Laboratory (JPL), for use in manipulating and
analyzing spacecraft images. The image format used by VICAR programs, and for
all or most data from JPL-managed missions, is referred to as VICAR format. An
independent third-party description of the VICAR image format is available at
<URL:ftp://lager.geo.brown.edu/pub/doc/vicar_fmt.txt>.
A much more comprehensive and official description of the VICAR image format
was recently spotted at <URL:http://www-mipl.jpl.nasa.gov/vic_file_fmt.html>.
Contact [email protected] for more information.
In recent years, the Planetary Data System (PDS) has been responsible for
archiving space mission data on CD-ROM media, using its own self-describing
data format, variously known as PDS or ODL (Object Description Language). At
least some of the current projects (e.g. Magellan, Galileo) are using the PDS
format as a "pointer" to detached VICAR-format imagery on the mission CD-ROM
volumes.
The PDS Standards Reference Document can be found at <URL:http://stardust.jpl.nasa.gov/stdref/stdref.htm>.
For more information, contact [email protected].
SAIF (Spatial Archive and Interchange Format) is a Canadian standard for
the exchange of geographic data. It uses an object-oriented data model, and
consists of definitions of the underlying building blocks, including tuples,
sets, lists, enumerations, and primitives.
A company has formed to provide
tools and training for the SAIF data standard. Safe Software may be contacted by
email at [email protected] or by phone at either (604) 241-4424 or (604)
583-2016. They maintain a WWW page for SAIF at <URL:http://www.wimsey.com/~infosafe/saif/saifHome.html>
which will be continually updated.
The SAIF specification is also available
by FTP at <URL:ftp://s2k-ftp.cs.berkeley.edu/pub/sequoia/schema/STANDARDS/SAIF>
and <URL:ftp://moon.cecer.army.mil/ogis/related/SAIF3.1>.
There is a SAIF Mailing List: send email to "[email protected]" with the
subject "SAIF Request" to be added to the list.
HDS (Hierarchical Data System) is a freely available database system. It
is particularly suited to the storage of large multi-dimensional arrays (with
their ancillary data) where efficiency of access is a requirement. It is
presently used in astronomy, for storing (in particular) images, spectra and
time series.
Documentation, and information on obtaining the source code, is
available at <URL:http://star-www.rl.ac.uk/> or in a LaTeX
document at <URL:ftp://starlink-ftp.rl.ac.uk/pub/doc/star-docs/sun92.tex>.
CXF provides representation of chemical substances and queries, including
atoms, fragments, molecules, and reactions. Also available are various substance
types, including organics, inorganics, polymers, salts, hydrates,
multi-component mixtures and biosequences.
The specification is available at
<URL:ftp://info.cas.org/pub/cxf>.
For more information, interested users should contact Thomas Steckert
([email protected]) or Joseph Mockus ([email protected]). Questions and comments
also are welcome.
JCAMP is a draft standard for spectral data (IR & NMR) and chemical
information which is related to netCDF. Some references:
JCAMP-DX for NMR, A. N. Davies, P. Lampen, Applied Spectroscopy, 1993, 47, 1093-1099
A proposed European Implementation of the JCAMP-DX Format, D. N. Rutledge, P. Mcintyre, Chemometrics and Intelligent Laboratory Systems, 1992, 16, 95-101
JCAMP-DX, A standard format for exchange of infrared-spectra in computer readable form, J. G. Grasselli, Pure and Applied Chemistry, 1991, 63, 1781-1792
JCAMP-CS, A standard exchange format for chemical-structure information, J. Gasteiger, B. M. P. Hendriks, P. Hoever, C. Jochum, H. Somberg, Applied Spectroscopy, 1991, 45, 4-11
Also, see the December 1994 issue of Applied Spectroscopy.
A viewer is at <URL:http://wwwchem.uwimona.edu.jm:1104/software/jcampdx.html>.
The mass spectrometry standard is available at <URL:ftp://ftp.pe-nelson.com/andi-MS/ms_doc.zip> (192.52.153.11).
CIF (Crystallographic Information File) is becoming standard in the
crystallography world and related fields: <URL:http://www.iucr.ac.uk/cif/home.html>
The OpenMath effort aims at developing a standard exchange format for
mathematical objects (such as formulae processed by computer algebra systems).
The OpenMath home page is located at <URL:http://www.uni-koeln.de/themen/Computeralgebra/OpenMath/index.html>
A new set of TIFF tag extensions for georeferencing raster data within
TIFF 6.0, GeoTIFF, was announced July 1995. Information is available at
<URL:http://www-mipl.jpl.nasa.gov/cartlab/geotiff/geotiff.html>
and specifications and source code are available via ftp at <URL:ftp://mtritter.jpl.nasa.gov/pub/geotiff/>.
A mailing list for discussion of the development of this standard is
[email protected]; to subscribe, send email to
[email protected] with "subscribe geotiff your-name-here" as the
body of the message.
The Digital Line Graph (DLG) format is used by USGS to store geographical
vector data. Documentation on this format is available at <URL:ftp://spectrum.xerox.com/depts/markc/demtools/demwork/dlg/doc/dlgguide.txt.Z>.
A Digital Elevation Model (DEM) consists of a sampled array of elevations
for ground positions that are normally at regularly spaced intervals. <URL:http://edcwww.cr.usgs.gov/glis/hyper/guide/1_dgr_dem>
has information about this format (along with data availability) from the USGS.
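To make the idea of "a sampled array of elevations at regularly spaced intervals" concrete, here is a rough sketch in Python. It is NOT the USGS DEM file layout (see the guide above for that); the class name, the field names, and the convention that rows run in the direction of increasing y are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ElevationGrid:
    """Hypothetical in-memory model of a regular elevation grid."""
    origin_x: float                 # ground x of the sample at row 0, col 0
    origin_y: float                 # ground y of the sample at row 0, col 0
    spacing: float                  # distance between neighbouring samples
    elevations: List[List[float]]   # elevations[row][col]

    def ground_position(self, row: int, col: int) -> Tuple[float, float]:
        # Regular spacing means a position follows directly from the indices.
        return (self.origin_x + col * self.spacing,
                self.origin_y + row * self.spacing)

    def sample(self, row: int, col: int) -> Tuple[float, float, float]:
        # Return (x, y, elevation) for one grid sample.
        x, y = self.ground_position(row, col)
        return x, y, self.elevations[row][col]
```

Because the spacing is regular, only the origin and the spacing need to be stored alongside the elevations; the ground position of every sample can be computed from its row and column.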
Many visualization software packages exist which are intended to be used
with data in one or more of these standard formats. Here are pointers to some
lists of information about this software. (Note that this is somewhat outside
the scope of this document, which is really only intended to discuss data
formats, but I think this will be useful to many.)
Brief descriptions of, and
pointers to, software that can be used with netCDF are at <URL:http://www.unidata.ucar.edu/packages/netcdf/utilities.html>.
A page of links to many scientific visualization and graphics software
packages is at <URL:http://www.msi.umn.edu/SciVis/Packages/packages.html>.
A page of links to both graphics software and various scientific data format
descriptions is at <URL:http://sslab.colorado.edu:2222/sw_list.html>.
An article comparing several scientific visualization techniques and
packages is available at <URL:http://www.sara.nl/Consumer.Report/Report.html>.
This section only describes FTP and telnet in any detail; for other
methods, FTP sites are given, so you can get information on them yourself.
How to use FTP
FTP (File Transfer Protocol) allows transfer of files between two computers which are on the Internet. To access the FTP areas listed here, at your system prompt type "ftp" followed by the name of the desired system. For example, to access ncardata.ucar.edu you'd type
ftp ncardata.ucar.edu
Use "anonymous" as your login and your email address as the password (if requested).
[Note: quotes ("like this") are used to set off names of directories and files, or commands you'd type, and are not part of these names.]
Not all FTP systems accept the same commands, but here's a list of the most useful:
ls      list files in the current directory.
cd      change directory, e.g. "cd wx" changes to the wx directory.
binary  sets binary mode. Use for retrieving non-text files, such as compressed (.Z) files.
ascii   sets ascii mode (the default). Use for retrieving text.
get     retrieves a file, e.g. "get readme" gets a file called readme.
bye     exits FTP.
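The same session can be scripted. The following is a minimal sketch using Python's standard ftplib module; the function names split_ftp_url and fetch, and the placeholder password, are assumptions made for this example. It assumes the URL names a file (not a directory), and fetch needs a working network connection and a site that still accepts anonymous logins.

```python
import posixpath
from ftplib import FTP
from urllib.parse import urlsplit

def split_ftp_url(url):
    """Return (host, directory, filename) for an ftp:// URL naming a file."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return parts.hostname, directory.lstrip("/"), filename

def fetch(url, email="guest@example.invalid"):
    """Fetch one file in binary mode, mirroring the manual commands above."""
    host, directory, filename = split_ftp_url(url)
    ftp = FTP(host)                    # like typing "ftp host"
    ftp.login("anonymous", email)      # anonymous login, email as password
    if directory:
        ftp.cwd(directory)             # like "cd directory"
    with open(filename, "wb") as out:
        # retrbinary selects binary mode and retrieves: "binary" then "get"
        ftp.retrbinary("RETR " + filename, out.write)
    ftp.quit()                         # like "bye"
    return filename
```

Binary mode is used for everything here because it is safe for both text and non-text files when the local and remote systems agree on line endings; switch to a text-mode retrieval only if you know you need line-ending conversion.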
If you can't seem to connect to the site, check to see if it is a telnet site. If it is, follow the instructions in the following section instead.
If you can't FTP from your site, use one of the following ftp-by-mail servers:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Send an e-mail message to the closest address, with the lines:
reply [email protected] <- with your email address
connect ncardata.ucar.edu <- for example
cd datasets/ds111.2/software
get access_sun.f
quit
For complete instructions, send a one-line message reading "help" to the server. Please don't ask me for help!
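If you send such requests often, the message body can be generated. This is a small sketch in Python; the function name ftpmail_request and the placeholder address are made up for this example, and it only covers the five commands shown above (the "help" reply from your server is the authority on what it actually accepts).

```python
def ftpmail_request(reply_to, host, directory, filenames):
    """Build the body of an ftp-by-mail request. Hypothetical helper."""
    lines = ["reply " + reply_to, "connect " + host]
    if directory:
        lines.append("cd " + directory)
    # One "get" line per file wanted from that directory.
    lines.extend("get " + name for name in filenames)
    lines.append("quit")
    return "\n".join(lines) + "\n"

# Reproduces the example request above, with a placeholder reply address.
body = ftpmail_request("you@example.invalid", "ncardata.ucar.edu",
                       "datasets/ds111.2/software", ["access_sun.f"])
```

The resulting text goes in the body of the message to the ftp-by-mail server; sending the mail itself is left to your usual mailer.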
How to use telnet
Type "telnet" followed by the name or IP number of the desired system. These publicly accessible systems generally allow you to log in but put you in a restricted shell, from which only a certain menu of commands is available. The description for the site will include the login to use.
If you can't seem to connect to the site, re-check its description in the document; if it's an FTP site, follow the instructions in the previous section instead.
Gopher information
Available by ftp at ftp://rtfm.mit.edu/pub/usenet/news.answers/gopher-faq.
Wais information
Available by ftp at ftp://rtfm.mit.edu/pub/usenet/news.answers/wais-faq/getting-started.
WWW information
Available by ftp at ftp://rtfm.mit.edu/pub/usenet/news.answers/www/faq. WWW is so easy to use that you might as well just hop in and try it, so ask your sysadmin if you have a WWW browser such as NCSA Mosaic or Netscape.
If you don't see a format you're interested in here, it could be for one of three reasons. First of all, there are a lot of formats which are out of the scope of this newsgroup: it ain't named sci.data.formats for nuthin', you know. Formats used in commercial spreadsheet and word-processing software aren't scientific data formats, and aren't discussed in this group.
Second, it may be that nobody has told the FAQ organizer where to find information on that format. So ask the newsgroup -- and if you do get a response, please let me know what it is!
Finally, you may ask on the net, and hear nothing, because the data format description just isn't publicly available. For most scientific data formats, this is a Bad Thing, and most archivists and scientists want to have their format information available. If you have such information, but don't have resources to make it available, please ask around and see if you can get it into an FTP area or other resource. Please don't publicize private or proprietary formats without the permission of the author, though.