The gridfile class is the core class used to organize climate data. The class creates gridfile objects, which catalogue data stored in various data source files. Each catalogue can span multiple data files and allows users to load any subset of the catalogued data using a common interface.
Each gridfile object catalogues data on an abstract N-dimensional data grid. This data grid does not actually exist. Rather, the data grid is an abstraction that describes how the data in different files fits together. The dimensions of this grid and their sizes are defined using a gridMetadata object.
The dimensions of the grid are associated with the metadata from the gridMetadata object. This way, each data point on the grid is associated with a particular set of metadata values - essentially, a metadata coordinate:
An empty, 3-dimensional gridfile catalogue with dimensional metadata.
gridfile requires that each data point is associated with a unique metadata coordinate. Thus, although it’s fine to repeat metadata values across different dimensions, each individual dimension must use unique metadata values along its length.
Because of this uniqueness, each data point on the grid is associated with a unique set of metadata values. This allows users to load specific data arrays from the grid by querying the associated metadata values. We’ll look at loading data in more detail in the next coding session.
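The role of uniqueness can be illustrated with a small sketch. This is plain Python written for illustration, not DASH code: the function names `build_index` and `locate` and the example metadata values are hypothetical. It shows why unique values along each dimension let a metadata coordinate identify exactly one grid point, while the same value may still appear in two different dimensions.

```python
# Illustrative sketch only: these names are hypothetical and not part of DASH.

def build_index(values):
    """Map each metadata value along one dimension to its position."""
    index = {}
    for position, value in enumerate(values):
        if value in index:
            # A repeated value would make the metadata coordinate ambiguous.
            raise ValueError(f"duplicate metadata value: {value!r}")
        index[value] = position
    return index

# Dimensional metadata for a small 3-D grid (lat x lon x time).
# The value 0 appears in both lat and lon, which is fine:
# uniqueness is only required within a single dimension.
metadata = {
    "lat": [-45, 0, 45],
    "lon": [0, 90, 180, 270],
    "time": [1850, 1851],
}
indices = {dim: build_index(values) for dim, values in metadata.items()}

def locate(**coordinate):
    """Translate a metadata coordinate into grid indices, in dimension order."""
    return tuple(indices[dim][coordinate[dim]] for dim in metadata)

position = locate(lat=0, lon=180, time=1851)
```

Here `position` holds one index per dimension. If a dimension repeated a value, `build_index` would raise an error, since the lookup could no longer return a single location.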
When you create a new gridfile catalogue, its N-dimensional grid is initially empty. However, you can fill the grid with values by adding data source files to the catalogue. A data source file is a file with some data saved in it. When you add a source file to a catalogue, the contents of the source file are associated with a portion of the N-dimensional grid.
A 3-dimensional gridfile catalogue with two data source files.
Currently, DASH supports the following data file formats: NetCDF, MAT-files, delimited text files, and NetCDF files accessed via OPeNDAP URLs.
Each gridfile catalogue is stored in a file with a .grid extension. This allows you to reuse data catalogues between different coding sessions. Note that .grid files do not store the actual data for a gridded dataset (only metadata and the location of source files), so datasets are not duplicated when added to a catalogue.
Now we’ll take a quick look at some features of the gridfile class. This section is just meant as an overview - we’ll go into further depth in the next open coding session.
One of the most useful features of gridfile catalogues is the ability to load data from any portion of the catalogue using the load command. This command allows users to load data arrays that may span multiple files, and even multiple file formats, without needing to interact with any of the individual files.
Furthermore, the load command allows users to load specific subsets of the data catalogue by querying specific metadata values. The use of human-readable metadata, rather than array indices or other syntaxes, helps make code more readable and easier to use.
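The idea of loading by metadata across several source files can be sketched as follows. This is a conceptual Python illustration, not the gridfile load command: the `load` function, the file names, and the data values are all hypothetical, and returning NaN for grid points that no source covers is an assumption of this sketch.

```python
# Illustrative sketch only; not DASH code. Each "source file" is a plain dict,
# and the file names and values are made up for the example.
time_metadata = [1850, 1851, 1852, 1853]  # the grid's time dimension

# Two sources in different formats, each covering part of the time dimension.
sources = [
    {"name": "run_part1.nc", "time": [1850, 1851], "data": [14.1, 14.3]},
    {"name": "run_part2.mat", "time": [1852, 1853], "data": [14.2, 14.6]},
]

def load(requested_times):
    """Return values for the requested time metadata, whichever source holds them."""
    loaded = []
    for t in requested_times:
        if t not in time_metadata:
            raise KeyError(f"{t} is not in the grid's time metadata")
        for source in sources:
            if t in source["time"]:
                loaded.append(source["data"][source["time"].index(t)])
                break
        else:
            # Assumption of this sketch: uncovered grid points load as NaN.
            loaded.append(float("nan"))
    return loaded

# The request is phrased in metadata values and spans both source files.
values = load([1851, 1852])
```

The caller never names a file or an array index. The query is written entirely in terms of the time metadata, and the lookup decides which source supplies each value.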
You can specify fill values and/or a valid data range for .grid catalogues. If you specify a fill value, data matching the fill value is converted to NaN when loaded. Similarly, if you specify a valid data range, values outside of the range are converted to NaN upon load. When setting fill values and valid ranges, you can set values for the entire catalogue, or for data in individual source files.
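The effect of a fill value and a valid range can be sketched in a few lines. This is a conceptual Python illustration rather than DASH code: the function name `apply_fill_and_range` and the example values are hypothetical, and treating the range bounds as inclusive is an assumption of the sketch.

```python
import math

def apply_fill_and_range(values, fill_value=None, valid_range=None):
    """Replace fill values and out-of-range values with NaN (illustrative only)."""
    cleaned = []
    for v in values:
        if fill_value is not None and v == fill_value:
            cleaned.append(math.nan)  # matches the fill value
        elif valid_range is not None and not (valid_range[0] <= v <= valid_range[1]):
            cleaned.append(math.nan)  # outside the valid range (bounds assumed inclusive)
        else:
            cleaned.append(v)
    return cleaned

# Temperatures in kelvin with one fill value and one implausible value.
raw = [285.2, -9999.0, 291.7, 1200.0]
cleaned = apply_fill_and_range(raw, fill_value=-9999.0, valid_range=(150.0, 350.0))
```

In this example the second and fourth values become NaN, while the two plausible temperatures pass through unchanged.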
gridfile allows you to apply various mathematical transformations to data loaded from a catalogue. The class currently supports:
Addition: A+X
Multiplication: A*X
Linear transform: A+B*X
Exponential: exp(X)
Power: X^A
Natural log: ln(X)
Base-10 log: log10(X)
These transformations are often useful for converting the units of loaded data. You can apply transformations to an entire gridfile catalogue, or to data in individual source files.
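As a rough illustration of how such transformations convert units, the sketch below applies the formulas listed above to a few values. It is plain Python, not DASH code: the `transform` function and the keys in `TRANSFORMS` are hypothetical labels chosen for this example, and A and B are passed as ordinary parameters.

```python
import math

# Illustrative sketch only: the keys below are hypothetical labels,
# not the option names used by gridfile.
TRANSFORMS = {
    "plus": lambda x, a: a + x,           # A + X
    "times": lambda x, a: a * x,          # A * X
    "linear": lambda x, a, b: a + b * x,  # A + B*X
    "exp": lambda x: math.exp(x),         # exp(X)
    "power": lambda x, a: x ** a,         # X^A
    "ln": lambda x: math.log(x),          # ln(X)
    "log10": lambda x: math.log10(x),     # log10(X)
}

def transform(values, kind, *params):
    """Apply one of the listed transformations to every value."""
    f = TRANSFORMS[kind]
    return [f(x, *params) for x in values]

# Unit conversions: kelvin to Celsius (addition), then Celsius to Fahrenheit (linear).
kelvin = [273.15, 283.15, 293.15]
celsius = transform(kelvin, "plus", -273.15)
fahrenheit = transform(celsius, "linear", 32.0, 1.8)
```

Addition covers an offset such as kelvin to Celsius, while the linear transform handles conversions that need both a scale and an offset.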
gridfile includes support for applying arithmetic operations across multiple catalogues. This functionality is similar to the NetCDF Operators (NCO) commands, but can be applied to any gridfile catalogue, regardless of the formats of data source files. These arithmetic commands are often used when a climate variable of interest must be calculated using multiple output variables from a climate model.
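A typical case is a variable that a model writes as two separate outputs, for example total precipitation as the sum of a convective and a large-scale component. The sketch below illustrates the idea in plain Python. It is not DASH code: the `combine` function, the dict layout, and the numbers are hypothetical, and requiring identical metadata in both catalogues is an assumption of the sketch.

```python
import operator

def combine(a, b, op):
    """Apply a binary operation element-wise to two catalogues (illustrative only)."""
    # Assumption of this sketch: both catalogues must share the same metadata.
    if a["time"] != b["time"]:
        raise ValueError("catalogues must share the same metadata")
    return {
        "time": list(a["time"]),
        "data": [op(x, y) for x, y in zip(a["data"], b["data"])],
    }

# Two model output variables on the same grid (values are made up).
convective = {"time": [1850, 1851], "data": [1.5, 2.0]}
large_scale = {"time": [1850, 1851], "data": [0.5, 1.25]}

# Total precipitation is the sum of the two components.
total = combine(convective, large_scale, operator.add)
```

Because the operation works on catalogued values rather than on files, it would apply equally if the two variables were stored in different file formats.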
The gridfile class supports a number of other methods that we will not detail in this tutorial. However, you can read about all supported gridfile commands by entering dash.doc("gridfile") in the MATLAB console.