stateVector Indices
As previously mentioned, the stateVector class uses data from gridfiles to build a state vector ensemble. However, in most cases, we will only need a small subset of the data in a gridfile. Thus, when you design the template for a state vector ensemble, you’ll use various types of indices to select the relevant data and implement design concepts like sequences.
On this page, we’ll give an overview of the various types of indices used to build state vector ensembles. We’ll also illustrate how these indices interact, in order to better explain how the stateVector class functions under the hood. In general, state vector indices are defined within the scope of individual variables. Each variable may implement different sets of indices, and the indices are interpreted relative to the variable’s associated gridfile.
State Indices
State indices indicate the elements along a state dimension that will be included in the state vector. For a given variable, each state dimension has an accompanying set of state indices. These state indices refer to the elements along the dimension in the variable’s gridfile.
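As a minimal sketch, suppose a variable uses a latitude dimension as a state dimension. The lat values below are hypothetical metadata invented for illustration, not values from any particular gridfile. State indices that restrict the variable to the Northern Hemisphere could be built as follows:

```matlab
% Hypothetical latitude metadata along a gridfile dimension (assumed values)
lat = (-87.5:5:87.5)';

% State indices: the elements along the lat dimension to include in the state vector
stateIndices = find(lat > 0);

% The metadata values at the selected elements
selectedLat = lat(stateIndices);
```

Here the state indices are positions along the dimension in the gridfile, while selectedLat holds the metadata at those positions.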
Reference Indices
Reference indices indicate the elements along an ensemble dimension that can be used to select ensemble members. Essentially, these indices point to metadata values that serve as reference points for potential members of an ensemble. It is important here to distinguish between reference indices and reference points. A reference index points to a particular element along an ensemble dimension. A reference point is the metadata value associated with that element.
As an example, say I have a variable with time as an ensemble dimension. The time dimension in the variable’s gridfile extends from 850 CE to 2005 CE at monthly resolution:
time = datetime(850,1,15) : calmonths(1) : datetime(2005,12,15);
Suppose I want to always select ensemble members from the month of June. I could use:
juneIndices = month(time)==6;
to create reference indices for the variable. These reference indices point to the 1156 June months along the time dimension:
These June time steps are the reference points for building an ensemble. In this example, ensemble members could be selected from any of June 850, June 851, June 852, …, June 2003, June 2004, June 2005.
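The distinction between reference indices and reference points can be made concrete with a short sketch. The variable names refIndices and refPoints are chosen here for illustration only:

```matlab
% Monthly time metadata from the example above
time = datetime(850,1,15) : calmonths(1) : datetime(2005,12,15);

% Logical reference indices marking every June along the time dimension
juneIndices = month(time) == 6;

% Reference indices as positions along the dimension
refIndices = find(juneIndices);

% Reference points: the metadata values stored at those positions
refPoints = time(refIndices);
```

The reference indices are positions (6, 18, 30, ...), whereas the reference points are the datetime values for June 850, June 851, and so on.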
How DASH selects ensemble members
Thus far, we have only discussed reference indices in the context of a single variable. However, as previously mentioned, reference indices are defined individually for each variable in a state vector. This raises the question: How does stateVector deal with multiple sets of reference indices?
Ultimately, the class relies on the reference points for each variable (that is, the metadata values associated with each variable's reference indices) rather than on the reference indices themselves. When selecting ensemble members, the stateVector class uses the following algorithm:
1. Obtain the reference points for each variable.
2. Determine the set of reference points common to all variables in the state vector.
3. Discard any reference points not in this set.
The class will then select ensemble members from the remaining reference points, which are common to all variables in the state vector. This ensures that every ensemble member is complete, even when gridfile catalogues have different ranges along a dimension.
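The following sketch illustrates the idea of this algorithm for two variables. It is not the stateVector implementation. The variable names and the year ranges are assumptions made for illustration, with reference points represented as plain years for brevity:

```matlab
% Hypothetical reference points (years) for two variables whose gridfiles
% cover different time ranges (assumed values)
pointsA = 850:2005;     % variable A
pointsB = 1000:2100;    % variable B

% Keep only the reference points shared by every variable
common = intersect(pointsA, pointsB);

% Map the common reference points back to each variable's own reference indices
[~, indexA] = ismember(common, pointsA);
[~, indexB] = ismember(common, pointsB);
```

The same reference point (the year 1000, say) sits at index 151 for variable A but at index 1 for variable B, which is why the comparison is made on metadata values rather than on the indices.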
Important
By default, the stateVector class will ensure that all data within an ensemble member correspond to the same reference points. This way, all variables represent the same time step / model run / etc.
At this point, you may be wondering: If stateVector only cares about metadata values, then why bother with reference indices? Why not just use the metadata values directly?
As we will see in the next two sections, using indices will allow us to implement other design features. Specifically, they will let us implement sequences and means along ensemble dimensions.
Sequence Indices
As previously discussed, you may want an ensemble dimension to exhibit some structure down the state vector. Typically, this occurs when a state vector should include data from several points in time. You can use sequence indices to implement these features. Like state and reference indices, sequence indices are defined individually for the dimensions of each variable. Since only ensemble dimensions can exhibit a sequence, you can only define sequence indices for ensemble dimensions.
Each set of sequence indices lists a series of offsets. These offsets are applied to the reference indices of the ensemble dimension to yield a set of adjusted indices, which we refer to as sequence elements. These sequence elements indicate the elements along the dimension that should be used in each potential ensemble member.
Building off the example in Figure 2, let’s say I want the state vector to include data from each of June, July, and August, so that my desired ensemble resembles the following:
Since the reference indices point to individual June months, the sequence indices should indicate the offsets between each June reference point and the associated June, July, and August. Since each June is the reference point, the offset for June (along the time dimension) is 0. Since each July occurs one time step after its associated June, the offset for July is 1. Similarly, the offset for August is 2. This gives our sequence indices:
sequenceIndices = [0 1 2];
Using these sequence indices, each ensemble member will include data from each of these three sequence elements:
Sequence indices are quite flexible. They do not need to be evenly spaced (and don’t even need to be sorted). They can contain negative elements (for time steps before a reference point), and do not need to include 0. Note that if the sequence indices do not contain zero, then the variable will not contain data from the reference point itself. It will only contain data from the indicated sequence elements.
Note
The stateVector class will only select ensemble members that allow for complete sequences. The class will discard any ensemble members in which a sequence requires data outside of the associated gridfile.
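As a rough sketch of how offsets turn reference indices into sequence elements, and of how incomplete sequences get discarded, consider the code below. It only illustrates the concept and is not the stateVector implementation. The offsets are hypothetical: the -6 selects the December before each June, so that one sequence falls outside the gridfile. The code relies on implicit expansion, which assumes MATLAB R2016b or later.

```matlab
nTime = 13872;                   % monthly time steps from 850 CE through 2005 CE
refIndices = (6:12:nTime)';      % reference indices: every June (column vector)
sequenceIndices = [-6 0 1 2];    % hypothetical offsets: previous December, then JJA

% One row per reference index, one column per sequence element
elements = refIndices + sequenceIndices;

% Keep only the reference indices whose full sequence lies inside the gridfile
complete = all(elements >= 1 & elements <= nTime, 2);
elements = elements(complete, :);
```

June 850 is dropped because its sequence would need a December before the start of the gridfile. The first surviving row points to December 850 and to June, July, and August of 851.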
Mean/Total Indices
Implementing a mean or total over state dimensions is relatively straightforward, as stateVector can simply implement a mean/total over the data elements included in the state vector. However, you may also want to implement means/totals over ensemble dimensions. For example, you may want each ensemble member to implement a temporal mean, or a mean over multiple model runs. We will use mean/total indices to implement means/totals over ensemble dimensions.
Mean/total indices function similarly to sequence indices and also list a series of offsets applied to reference indices. However, stateVector then takes a mean/sum over the indicated data elements, rather than implementing a sequence.
Returning to the example figures, let’s say I want the state vector to implement a June-July-August seasonal mean, rather than a sequence. In this case, the desired state vector ensemble would resemble the following:
Since the reference indices point to individual June months, the mean indices should indicate the offsets between each June reference point and the associated June, July, and August included in the mean. Since each June is the reference point, the offset for June (along the time dimension) is 0. Since each July occurs one time step after its associated June, the offset for July is 1. Similarly, the offset for August is 2. This gives our mean indices:
meanIndices = [0 1 2];
Using these mean indices, each ensemble member will implement a mean over the associated three elements:
As with sequence indices, mean/total indices do not need to be evenly spaced or sorted. They may contain negative values and do not need to contain zero. Note that if the mean/total indices do not contain zero, then the variable will not contain data from the reference point itself. It will only contain data from the indicated mean/total elements.
Note
The stateVector class will only select ensemble members that allow for complete means. The class will discard any ensemble members in which a mean requires data outside of the associated gridfile.
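The sketch below shows one way to picture a mean over an ensemble dimension. It is an illustration under stated assumptions rather than the stateVector implementation. The data vector is a placeholder in which each value equals its own time index, so the result is easy to check by hand. Implicit expansion (MATLAB R2016b or later) is assumed.

```matlab
nTime = 13872;                   % monthly time steps from 850 CE through 2005 CE
data = (1:nTime)';               % placeholder data: each value equals its time index
refIndices = (6:12:nTime)';      % reference indices: every June (column vector)
meanIndices = [0 1 2];           % offsets for June, July, and August

% One row per reference index, one column per element in the mean
elements = refIndices + meanIndices;

% Discard reference indices whose mean would need data outside the gridfile
complete = all(elements >= 1 & elements <= nTime, 2);
elements = elements(complete, :);

% Take the mean across the indicated elements; use sum(...) for a total
jjaMean = mean(data(elements), 2);
```

With the placeholder data, the first June-July-August mean is the average of time indices 6, 7, and 8.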
Combining sequences and means/totals
In some cases, you may want a state vector to implement a sequence of means/totals. For example, a series of seasonal means/totals within a year, or a series of decadal means/totals following a climate event. When this occurs, you will need to use both sequence indices and mean/total indices. When you provide both sets of indices, stateVector uses the following procedure to build an ensemble member:
1. Locate the ensemble member’s reference point.
2. Apply the sequence index offsets to locate the sequence elements.
3. Apply the mean/total index offsets to each individual sequence element.
The procedure is summarized in the following figure:
Returning to the example, let’s say I want each state vector to implement a moving, three-month seasonal mean from June-July-August (JJA) to September-October-November (SON). In this case, the state vector ensemble would resemble the following:
To implement this, we’ll start by using sequence indices to locate the starting month of each seasonal mean. Since the reference indices point to individual June months, the sequence indices should indicate the offsets between each June reference point and the starting months of the four seasonal means: June, July, August, and September. This gives us:
sequenceIndices = [0 1 2 3];
Next, we’ll use mean indices to locate the months that should be used in each seasonal mean. Since the sequence elements point to the starting month of each seasonal mean, the mean indices should indicate the offset between each starting month and the months used in the seasonal mean. This gives us:
meanIndices = [0 1 2];
Using these indices, the first sequence element will implement a JJA seasonal mean:
The second sequence element will be a JAS seasonal mean:
and so on.
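The three-step procedure can be sketched by stacking the offsets along separate array dimensions. This is an illustrative sketch under stated assumptions, not the stateVector implementation. The data vector is a placeholder in which each value equals its own time index, and implicit expansion (MATLAB R2016b or later) is assumed.

```matlab
nTime = 13872;                              % monthly time steps from 850 CE through 2005 CE
data = (1:nTime)';                          % placeholder data: each value equals its time index
refIndices = (6:12:nTime)';                 % N x 1: every June
sequenceIndices = [0 1 2 3];                % 1 x S: starting month of each seasonal mean
meanIndices = reshape([0 1 2], 1, 1, []);   % 1 x 1 x M: months within each mean

% Reference index + sequence offset + mean offset, giving an N x S x M array
elements = refIndices + sequenceIndices + meanIndices;

% Discard reference indices that would need data outside the gridfile
complete = all(all(elements >= 1 & elements <= nTime, 2), 3);
elements = elements(complete, :, :);

% Average over the third dimension: one seasonal mean per sequence element
seasonal = mean(data(elements), 3);         % N x S
```

For the first reference point, the four columns of seasonal hold the JJA, JAS, ASO, and SON means, which are 7, 8, 9, and 10 for the placeholder data.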