Coding 8

Goal

In this session, we will use the kalmanFilter class to run an assimilation.

Step 1: Create kalmanFilter object

We’ll start by using the kalmanFilter command to create an object. The syntax is:

obj = kalmanFilter

where obj is the new kalmanFilter object. You can optionally use the second input to label the object:

obj = kalmanFilter(label)

where label is a string.

Tip

You can also use the kalmanFilter.label command to label the object at any later point.

Here, we’ll create a kalmanFilter object for the NTREND demo. We’ll also apply a label to the object:

label = "NTREND Demo";
kf = kalmanFilter(label);

Inspecting the object:

disp(kf)
kalmanFilter with properties:

            Label: NTREND Demo

     Observations: none
            Prior: none
        Estimates: none
    Uncertainties: none

        Returning:
            Mean

we can see the initialized Kalman filter. None of the four essential inputs (observations, prior, estimates, and uncertainties) have been set yet, and the object will only update and return the ensemble mean when running the Kalman filter algorithm.

Here, we’ll create a kalmanFilter object for the LGM temperature assimilation. We’ll also apply a label to the object:

label = "LGM Demo";
kf = kalmanFilter(label);

Inspecting the object:

disp(kf)
kalmanFilter with properties:

            Label: LGM Demo

     Observations: none
            Prior: none
        Estimates: none
    Uncertainties: none

        Returning:
            Mean

we can see the initialized Kalman filter. None of the four essential inputs (observations, prior, estimates, and uncertainties) have been set yet, and the object will only update and return the ensemble mean when running the Kalman filter algorithm.

Step 2: Essential Inputs

Next, we’ll use the prior, observations, estimates, and uncertainties commands to provide essential inputs to the Kalman filter. All four commands use a similar base syntax - each command takes a data array as input, and outputs an updated kalmanFilter object:

obj = obj.prior(X)
obj = obj.observations(Y)
obj = obj.estimates(Ye)
obj = obj.uncertainties(R)
X

The prior may either be provided via an ensemble object or as a data matrix. If using a matrix, each row should be an element of the state vector, and each column should be an ensemble member.

Y

The proxy observations should be a data matrix. Each row holds a particular proxy record and each column holds the values for an assimilation time step. You can use a NaN value when a proxy record does not have values for a particular time step.

Ye

The proxy estimates should be a matrix with one row per proxy record, and one column per ensemble member.

R

The proxy uncertainties should be a data matrix holding either error-variances or a full error-covariance matrix. If using error variances, the uncertainties should be a column vector with one row per proxy record. If using error-covariances, the matrix should be symmetric with one row and one column per proxy record.

You can also modify these commands to use different values in different assimilation time steps. (For example, to use an evolving prior). We will not discuss this syntax in the tutorial, but you can read about it in the DASH documentation.
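As a quick reference for the dimension conventions above, here is a sketch with small synthetic arrays. All of the sizes and values below are made up purely for illustration; in the demos, the real inputs come from the ensemble object, the gridfile catalogue, and the PSM outputs:

```matlab
% Hypothetical sizes, for illustration only
nState = 100;   % state vector elements
nEns   = 20;    % ensemble members
nSite  = 5;     % proxy records
nTime  = 50;    % assimilation time steps

% Shapes expected by the four essential inputs
X  = randn(nState, nEns);    % prior: state vector x ensemble members
Y  = randn(nSite,  nTime);   % observations: proxy records x time steps (use NaN for missing values)
Ye = randn(nSite,  nEns);    % estimates: proxy records x ensemble members
R  = rand(nSite, 1);         % uncertainties: error variances as a column vector

% Provide them to a filter object
kf = kalmanFilter("Shape demo");
kf = kf.prior(X);
kf = kf.observations(Y);
kf = kf.estimates(Ye);
kf = kf.uncertainties(R);
```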

Select Reconstruction Targets

When providing an assimilation prior, the prior only needs to contain state vector variables that represent reconstruction targets. Since we already generated proxy estimates, we don’t need to assimilate variables used only as forward model inputs. You can use the ensemble.useVariables command to restrict an ensemble object to specific state vector variables. The syntax for the command is:

obj = obj.useVariables(variables)
variables

The input should list the names or indices of specific variables in the state vector ensemble.

obj

The output is an updated ensemble object.

We’ll use the four input commands to provide the essential data values for our assimilation. We’ll start by providing the prior. Our prior will consist of the reconstruction targets T and T_index. We’ll use the ensemble.useVariables command to limit the reconstruction to these two variables:

% Get an ensemble object
ens = ensemble('ntrend');

% Restrict the object to reconstruction target variables
variables = ["T", "T_index"];
ens = ens.useVariables(variables);

% Provide the ensemble to the Kalman filter
kf = kf.prior(ens);

Next, we’ll provide the proxy records. The proxy records are catalogued in ntrend.grid, so we’ll first use the gridfile.load command to load them as a data array:

% Load the proxy records
proxies = gridfile('ntrend');
Y = proxies.load;

% Provide the proxy records to the Kalman filter
kf = kf.observations(Y);

Next, we’ll provide the proxy estimates (Ye). We generated these proxy estimates in the previous open coding session:

% Provide the proxy estimates
kf = kf.estimates(Ye);

Finally, we’ll provide proxy uncertainties - specifically, error variances. These values are provided in the ntrend.mat file, and were produced by (1) running the forward models on instrumental observations, and then (2) comparing the instrumental proxy estimates to the real instrumental proxy records:

% Load the proxy uncertainties
data = load('ntrend.mat', 'R');
R = data.R;

% Provide the uncertainties to the Kalman filter
kf = kf.uncertainties(R);

Inspecting the updated kalmanFilter object:

disp(kf)
kalmanFilter with properties:

                  Label: NTREND Demo

           Observations: set
                  Prior: static
              Estimates: set
          Uncertainties: variances

      Observation Sites: 54
    State Vector Length: 4321
       Ensemble Members: 1156
                 Priors: 1
             Time Steps: 1262

              Returning:
                  Mean

we can see that the Kalman filter now includes all four essential inputs. It uses a static (time-independent) prior, and error variances for the uncertainties. The output also shows a few key sizes, such as the number of observation sites, priors, and assimilation time steps.

We’ll use the four input commands to provide the essential data values for our assimilation. We’ll start by providing the prior using an ensemble object:

% Get the ensemble object
ens = ensemble('lgm');

% Provide the ensemble to the Kalman filter
kf = kf.prior(ens);

Next, we’ll provide the proxy records. The proxy records are catalogued in uk37.grid, so we’ll first use the gridfile.load command to load them as a data array:

% Load the proxy records
proxies = gridfile('uk37');
Y = proxies.load;

% Provide the proxy records to the Kalman filter
kf = kf.observations(Y);

Next, we’ll provide the proxy estimates (Ye) and uncertainties (R). We generated both of these in the previous coding session:

% Provide proxy estimates and uncertainties
kf = kf.estimates(Ye);
kf = kf.uncertainties(R);

Inspecting the updated kalmanFilter object:

disp(kf)
kalmanFilter with properties:

                  Label: LGM Demo

           Observations: set
                  Prior: static
              Estimates: set
          Uncertainties: variances

      Observation Sites: 89
    State Vector Length: 122880
       Ensemble Members: 16
                 Priors: 1
             Time Steps: 1

              Returning:
                  Mean

we can see that the Kalman filter now includes all four essential inputs. It uses a static (time-independent) prior, and error variances for the uncertainties. The output also shows a few key sizes, such as the number of observation sites, priors, and assimilation time steps.

Step 3: Covariance Adjustments

In this tutorial, we’ll focus on covariance localization, but feel free to try out other covariance adjustments. You can implement localization using the kalmanFilter.localize command. Its syntax is:

obj = obj.localize(wloc, yloc)

where wloc and yloc are the localization weights between (1) the proxy estimates and state vector elements, and (2) the proxy estimates and one another. The dash.localize package contains functions for calculating localization weights. We’ll use the gc2d function, which implements a Gaspari-Cohn 5th order polynomial in 2 dimensions (this is a standard localization scheme for paleoclimate DA). The syntax for this command is:

[wloc, yloc] = dash.localize.gc2d(stateCoordinates, proxyCoordinates, R)
stateCoordinates

The first input lists the latitude-longitude coordinate of each element in the state vector. This is a matrix with one row per state vector element, and 2 columns. The first column is latitude, and the second is longitude.

proxyCoordinates

The second input lists the latitude-longitude coordinate of each proxy site. This is a matrix with one row per proxy record and 2 columns. As before, the first column is latitude, and the second is longitude.

R

The third input lists the localization radius in kilometers.
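For intuition, the weights returned by gc2d taper smoothly from 1 at zero distance to 0 beyond the localization radius, following the Gaspari-Cohn 5th-order piecewise polynomial. The following is only an illustrative sketch of that taper, not DASH's implementation; the exact distance and cutoff conventions used by gc2d may differ (here the cutoff c is taken as R/2, a common choice):

```matlab
% Gaspari-Cohn 5th-order taper as a function of distance d (km) and radius R (km).
% Assumes cutoff c = R/2; DASH's internal convention may differ.
function w = gcTaper(d, R)
    c = R / 2;
    z = d ./ c;
    w = zeros(size(z));

    % Inner branch: 0 <= z <= 1
    in = z <= 1;
    w(in) = -0.25*z(in).^5 + 0.5*z(in).^4 + (5/8)*z(in).^3 ...
            - (5/3)*z(in).^2 + 1;

    % Outer branch: 1 < z <= 2 (weights reach 0 at z = 2, i.e. d = R)
    out = z > 1 & z <= 2;
    w(out) = (1/12)*z(out).^5 - 0.5*z(out).^4 + (5/8)*z(out).^3 ...
             + (5/3)*z(out).^2 - 5*z(out) + 4 - (2/3)./z(out);
end
```

Conceptually, wloc applies this taper to the great-circle distances between state vector elements and proxy sites, and yloc applies it to the distances between pairs of proxy sites.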

In this demo, we’ll implement covariance localization with a localization radius of 15,000 km. We’ll start by using the metadata in ntrend.grid to obtain the proxy coordinates:

% Load the metadata for each site
site = gridfile('ntrend').metadata.site;
lats = str2double(site(:,2));
lons = str2double(site(:,3));

% Get the proxy coordinates
proxyCoordinates = [lats, lons];

Next, we’ll use the ensembleMetadata.latlon command to return latitude-longitude coordinates for each element in the state vector. (Note that we’ll first use the ensemble.useVariables command to restrict the ensemble to the T and T_index reconstruction targets):

% Get the ensemble object for the reconstruction targets
ens = ensemble('ntrend');
ens = ens.useVariables(["T", "T_index"]);

% Get the ensembleMetadata object
ensMeta = ens.metadata;

% Get the latitude-longitude coordinates for the state vector
stateCoordinates = ensMeta.latlon;

With these coordinates, we’ll use the dash.localize.gc2d function to calculate localization weights:

% Calculate localization weights for a 15000 km radius
R = 15000;
[wloc, yloc] = dash.localize.gc2d(stateCoordinates, proxyCoordinates, R);

Finally, we’ll provide the localization weights to the kalmanFilter object:

kf = kf.localize(wloc, yloc);

Inspecting the updated object:

disp(kf)
kalmanFilter with properties:

                  Label: NTREND Demo

           Observations: set
                  Prior: static
              Estimates: set
          Uncertainties: variances

      Observation Sites: 54
    State Vector Length: 4321
       Ensemble Members: 1156
                 Priors: 1
             Time Steps: 1262

             Covariance:
                  Localization

              Returning:
                  Mean

we can see that the object will now implement covariance localization when running a Kalman filter.

In this demo, we’ll implement covariance localization with a localization radius of 22,000 km. We’ll start by using the metadata in uk37.grid to obtain the proxy coordinates:

% Load the metadata for each site
site = gridfile('uk37').metadata.site;
lats = str2double(site(:,2));
lons = str2double(site(:,3));

% Get the proxy coordinates
proxyCoordinates = [lats, lons];

Next, we’ll use the ensembleMetadata.latlon command to return latitude-longitude coordinates for each element in the state vector. Our SST variable is located on a tripolar grid, so DASH treats the climate model output as a collection of unique spatial sites. The spatial metadata is thus organized along the “site” dimension, rather than “lat” and “lon”. Because of this, we’ll use the first input to the latlon command to indicate that the first column of the “site” metadata corresponds to latitude, and the second column to longitude:

% Get the ensembleMetadata object
ensMeta = ensemble('lgm').metadata;

% Get the latitude-longitude coordinates
siteColumns = [1 2];
stateCoordinates = ensMeta.latlon(siteColumns);

With these coordinates, we’ll use the dash.localize.gc2d function to calculate localization weights:

% Calculate localization weights for a 22000 km radius
R = 22000;
[wloc, yloc] = dash.localize.gc2d(stateCoordinates, proxyCoordinates, R);

Finally, we’ll provide the localization weights to the kalmanFilter object:

kf = kf.localize(wloc, yloc);

Inspecting the updated object:

disp(kf)
kalmanFilter with properties:

                  Label: LGM Demo

           Observations: set
                  Prior: static
              Estimates: set
          Uncertainties: variances

      Observation Sites: 89
    State Vector Length: 122880
       Ensemble Members: 16
                 Priors: 1
             Time Steps: 1

             Covariance:
                  Localization

              Returning:
                  Mean

we can see that the object will now implement covariance localization when running a Kalman filter.

Step 4: Select outputs

As mentioned, you can use various commands to indicate that the Kalman filter should return specific outputs. In this tutorial, we’ll focus on the kalmanFilter.variance and kalmanFilter.deviations commands, which share a similar syntax. Use:

obj = obj.variance(true)

to return the variance of the posterior ensemble and:

obj = obj.deviations(true)

to return the full ensemble deviations. These outputs will be labeled as Avar and Adev in the Kalman filter output.

You can also use the kalmanFilter.percentiles command to return specific percentiles of the posterior ensemble. Here the syntax is:

obj = obj.percentiles(percentages)

where the percentages input is a vector of percentages for which to compute percentiles. The percentiles will be labeled as Aperc in the Kalman filter output.

In this demo, we’ll return the variance and quartiles of the posterior ensemble. First, we’ll use the variance command to indicate that the algorithm should update the ensemble deviations and return ensemble variance:

kf = kf.variance(true);

Then, we’ll use the percentiles command to also request the quartiles of the ensemble as output:

percentages = [25 50 75];
kf = kf.percentiles(percentages);

Inspecting the updated object:

disp(kf)
kalmanFilter with properties:

                  Label: NTREND Demo

           Observations: set
                  Prior: static
              Estimates: set
          Uncertainties: variances

      Observation Sites: 54
    State Vector Length: 4321
       Ensemble Members: 1156
                 Priors: 1
             Time Steps: 1262

             Covariance:
                  Localization

              Returning:
                  Mean
                  Variance
                  Percentiles (3)

we can see that the object will return the ensemble variance, as well as the three requested percentiles, when running the Kalman filter algorithm.

In this demo, we’ll return the full ensemble deviations. This is possible because we are only assimilating a single time step, so the output ensemble won’t be too large. We’ll use the deviations command to indicate that the algorithm should update and return the ensemble deviations:

kf = kf.deviations(true);

Inspecting the updated object:

disp(kf)
kalmanFilter with properties:

                  Label: LGM Demo

           Observations: set
                  Prior: static
              Estimates: set
          Uncertainties: variances

      Observation Sites: 89
    State Vector Length: 122880
       Ensemble Members: 16
                 Priors: 1
             Time Steps: 1

             Covariance:
                  Localization

              Returning:
                  Mean
                  Deviations

we can see that the object will now return both the ensemble mean and deviations when running a Kalman filter.

Step 5: Run the filter

We’re finally ready to run the Kalman filter algorithm. We’ll do this using the kalmanFilter.run command. The base syntax for this command is:

output = obj.run;

The output is a struct with fields for the requested outputs. These may include:

  • Amean: The updated ensemble mean

  • Adev: The updated ensemble deviations

  • Avar: The variance of the posterior ensemble

  • Aperc: Requested percentiles of the posterior ensemble

as well as other output fields.

We’ll use the run command to run the Kalman filter algorithm. As a reminder, the object will implement covariance localization with a localization radius of 15,000 km. Also, the algorithm should return the updated ensemble mean, ensemble variance, and ensemble quartiles for each assimilated time step:

output = kf.run;

Examining the output:

disp(output)
struct with fields:

             Amean: [4321×1262 single]
  calibrationRatio: [54×1262 single]
              Avar: [4321×1262 single]
             Aperc: [4321×3×1262 single]

we can see it includes the updated ensemble mean for each of the 1262 assimilated time steps (Amean), the ensemble variance in each time step (Avar), and the 3 ensemble quartiles in each time step (Aperc). The output also includes the calibration ratio for each proxy record in each time step.
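The calibration ratio compares each observation-minus-estimate innovation to its expected variance, so time-averaged values far from 1 can flag a misspecified R or an over/under-dispersive prior. As a quick diagnostic (a hedged sketch, assuming the output struct above; the thresholds are arbitrary):

```matlab
% Time-mean calibration ratio for each proxy record.
% Values near 1 suggest the assumed uncertainties are roughly consistent.
meanRatio = mean(output.calibrationRatio, 2, 'omitnan');

% Flag proxy records with unusually large or small mean ratios
suspect = find(meanRatio > 5 | meanRatio < 0.2);
```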

We’ll use the run command to run the Kalman filter algorithm. As a reminder, the object will implement covariance localization with a localization radius of 22,000 km. Also, the algorithm should return both the updated ensemble mean and deviations:

output = kf.run;

Examining the output:

disp(output)
struct with fields:

             Amean: [122880×1 double]
  calibrationRatio: [89×1 double]
              Adev: [122880×16 double]

we can see it includes the updated ensemble mean (Amean), and the updated deviations from the ensemble mean for each ensemble member (Adev). The output also includes the calibration ratio for each proxy record in each time step.

Step 6: Regrid state vector variables

At this point, you’ll typically want to start mapping and visualizing the assimilation outputs. However, the assimilated variables are still organized as state vectors, which can hinder visualization. You can use the ensembleMetadata.regrid command to (1) extract a variable from a state vector, and (2) return the variable to its original data grid. The base syntax is:

[V, metadata] = obj.regrid(variable, X)
variable

The first input is the name or index of a variable in the state vector.

X

The second input is a data array with a dimension that proceeds along the state vector. Most of the output fields from kalmanFilter.run use the first dimension as the state vector dimension.

obj

The object is an ensembleMetadata object for the prior ensemble.

V

The first output is the regridded variable. The state vector dimension will be reshaped to the original grid dimensions. All other data dimensions are left unaltered.

metadata

The second output is a gridMetadata object for the regridded variable. Note that this metadata is only for the reshaped state vector dimension. Other data array dimensions (such as assimilation time steps) are not included in this metadata.

A note on tripolar grids: DASH usually represents tripolar grids as a collection of unique spatial sites. The regrid command will reshape these spatial sites into a single site dimension, so you’ll often want to reshape the regridded site dimension to the size of the original tripolar output.

Here, we’ll regrid the updated ensemble mean of the temperature field (the T variable):

% Get the ensembleMetadata object for the prior
ens = ensemble('ntrend');
ens = ens.useVariables(["T", "T_index"]);
ensMeta = ens.metadata;

% Regrid the ensemble mean of the temperature field
[T, metadata] = ensMeta.regrid("T", output.Amean);

Investigating the output:

siz = size(T)
siz =
         144          30        1262

we can see that the regridded output has dimensions of (longitude x latitude x assimilation time steps). The returned metadata:

disp(metadata)
gridMetadata with metadata:

  lon: [144×1 double]
  lat: [30×1 double]

includes values for the regridded lon and lat dimensions. The metadata does not include values for the third dimension, because assimilation time steps were not a dimension of the original data grid.

Here, we don’t actually need to regrid the output, for two reasons. First, we assimilated a single state vector variable, so there is no need to extract the variable from the rest of the state vector. Second, because our variable is on a tripolar grid, it has a single “site” spatial dimension, so regridding would simply return the variable in its current state-vector shape. Instead, we will use Matlab’s reshape command to return the tripolar grid to its original shape. Note that the original tripolar model output was organized on a 320 x 384 spatial grid:

% Reshape the updated SST field
SST = reshape(output.Amean, 320, 384);

We will also want to reshape the latitude and longitude metadata. We can use the ensembleMetadata object for the prior to obtain this metadata:

% Get the latitude and longitude metadata
ensMeta = ensemble('lgm').metadata;
latlon = ensMeta.latlon([1 2]);

% Reshape the metadata
lat = reshape(latlon(:,1), 320, 384);
lon = reshape(latlon(:,2), 320, 384);

We can now use the reshaped variable and metadata with various mapping utilities.

Step 7: Visualize!

That’s it, the assimilation is complete! Try visualizing some of the outputs. Plotting data is outside the scope of DASH, so use whatever mapping and visualization tools you prefer; Matlab includes many built-in plotting resources, and many more are available online.
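For example, here is a minimal sketch of a filled-contour plot of the updated NTREND temperature field, assuming the regridded T array and metadata object from Step 6. This uses only built-in Matlab plotting, not DASH:

```matlab
% Plot the updated ensemble-mean temperature in the final time step.
% T is (lon x lat x time), so transpose the slice for contourf,
% which expects rows to follow the y (latitude) axis.
figure;
contourf(metadata.lon, metadata.lat, T(:, :, end)', 20, 'LineColor', 'none');
colorbar;
xlabel('Longitude');
ylabel('Latitude');
title('Updated ensemble mean temperature (final time step)');
```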

Full Demo

This section recaps all the essential code from the demos and may be useful as a quick reference.

% Load the proxy data catalogue, and the prior ensemble/its metadata
proxies = gridfile('ntrend');
ens = ensemble('ntrend');
ens = ens.useVariables(["T", "T_index"]);
ensMeta = ens.metadata;

% Create a kalman filter object
kf = kalmanFilter("NTREND Demo");

% Collect essential inputs
X = ens;
Y = proxies.load;
% Ye    (from PSM.estimate)
R = load('ntrend.mat','R').R;

% Provide essential inputs to the filter
kf = kf.prior(X);
kf = kf.observations(Y);
kf = kf.estimates(Ye);
kf = kf.uncertainties(R);

% Get localization weights
stateCoordinates = ensMeta.latlon;
proxyCoordinates = str2double( proxies.metadata.site(:,2:3) );
radius = 15000;
[wloc, yloc] = dash.localize.gc2d(stateCoordinates, proxyCoordinates, radius);

% Add localization to the filter
kf = kf.localize(wloc, yloc);

% Return ensemble variance and percentiles
percentages = [25 50 75];
kf = kf.percentiles(percentages);
kf = kf.variance(true);

% Run the filter
output = kf.run;

% Extract and regrid variables
[T, metadata] = ensMeta.regrid("T", output.Amean);

% Load the proxy data catalogue, and the prior ensemble/its metadata
proxies = gridfile('uk37');
ens = ensemble('lgm');
ensMeta = ens.metadata;

% Create a Kalman filter object
kf = kalmanFilter('LGM Demo');

% Collect essential inputs
X = ens;
Y = proxies.load;
% Ye     (from PSM.estimate)
% R      (from PSM.estimate)

% Provide essential inputs to the filter
kf = kf.prior(ens);
kf = kf.observations(Y);
kf = kf.estimates(Ye);
kf = kf.uncertainties(R);

% Compute localization weights
stateCoordinates = ensMeta.latlon([1 2]);
proxyCoordinates = str2double( proxies.metadata.site(:, 2:3) );
radius = 22000;
[wloc, yloc] = dash.localize.gc2d(stateCoordinates, proxyCoordinates, radius);

% Add localization to the filter
kf = kf.localize(wloc, yloc);

% Return the full ensemble deviations
kf = kf.deviations(true);

% Run the filter
output = kf.run;

% Reshape tripolar output
gridSize = [320 384];
SST = reshape(output.Amean, gridSize);
lat = reshape(stateCoordinates(:,1), gridSize);
lon = reshape(stateCoordinates(:,2), gridSize);