Coding 10

Goal

Use the optimalSensor class to implement an optimal sensor analysis.

Step 1: Create optimalSensor object

We’ll start by using the optimalSensor command to create an object. The syntax is:

obj = optimalSensor

where obj is the new optimalSensor object. You can optionally use the second input to label the object:

obj = optimalSensor(label)

where label is a string.

Tip

You can also use the optimalSensor.label command to label the object at any later point.
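For example, a label can be applied after construction. (This sketch assumes the usual DASH convention that calling the label command with no inputs returns the current label):

```matlab
% Create an unlabeled optimalSensor, then apply a label later
os = optimalSensor;
os = os.label("My Analysis");

% Calling label with no inputs returns the current label
currentLabel = os.label;
disp(currentLabel)
```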

Here, we’ll create an optimalSensor object for the NTREND demo. We’ll also apply a label to the object:

label = "NTREND Demo";
os = optimalSensor(label);

Inspecting the object:

disp(os)
optimalSensor with properties:

            Label: NTREND Demo

           Metric: none
        Estimates: none
    Uncertainties: none

we can see the initialized optimalSensor object. None of the essential inputs have been set yet.

Step 2: Essential Inputs

Next, we’ll use the metric, estimates, and uncertainties commands to provide the essential inputs to the optimal sensor. All three commands use a similar base syntax - each command takes a data array as input, and outputs an updated optimalSensor object:

obj = obj.metric(X)
obj = obj.estimates(Ye)
obj = obj.uncertainties(R)
X

The climate metric should be a vector with one element per ensemble member.

Ye

The proxy estimates should be a matrix with one row per proxy record, and one column per ensemble member.

R

The proxy uncertainties should be a data matrix holding either error-variances or a full error-covariance matrix. If using error variances, the uncertainties should be a column vector with one row per proxy record. If using error-covariances, the matrix should be symmetric with one row and one column per proxy record.
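To make these shapes concrete, here is a small synthetic sketch. The sizes and random values are purely illustrative and are not from the NTREND demo:

```matlab
nSite = 10;     % number of proxy records
nEns  = 100;    % number of ensemble members

X  = rand(1, nEns);       % climate metric: one element per ensemble member
Ye = rand(nSite, nEns);   % proxy estimates: one row per record, one column per member
R  = rand(nSite, 1);      % error variances: a column vector with one row per record

% Provide the essential inputs to a new optimalSensor object
obj = optimalSensor("Synthetic sketch");
obj = obj.metric(X);
obj = obj.estimates(Ye);
obj = obj.uncertainties(R);
```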

We’ll use the input commands to provide the data values for our optimal sensor analyses. We’ll start by providing the climate metric. Here, we’ll use the T_index variable from our ensemble. We can use the ensemble.useVariables and ensemble.load commands to load this metric directly. (You can find a detailed discussion of the useVariables command here: Select Reconstruction Targets):

% Load the "T_index" variable from the ensemble
ens = ensemble('ntrend');
ens = ens.useVariables("T_index");
T_index = ens.load;

% Use as the optimal sensor metric
os = os.metric(T_index);

We previously generated the proxy estimates using the PSM.estimate command, so we’ll provide those next:

% Provide proxy estimates
os = os.estimates(Ye);

Finally, we’ll provide proxy uncertainties - specifically, error variances. These values are provided in the ntrend.mat file, and were produced by (1) running the forward models on instrumental observations, and then (2) comparing the instrumental proxy estimates to the real instrumental proxy records:

% Load the proxy uncertainties
data = load('ntrend.mat', 'R');
R = data.R;

% Provide the uncertainties to the optimal sensor
os = os.uncertainties(R);

Inspecting the updated object:

disp(os)
optimalSensor with properties:

                Label: NTREND Demo

               Metric: set
            Estimates: set
        Uncertainties: set (variances)

    Observation Sites: 54
     Ensemble Members: 1156

we can see that the optimal sensor now has all three essential inputs. The output also shows two key sizes - the number of observation sites, and the number of ensemble members.

Step 3: run command

We’ll next use the optimalSensor.run command to run the greedy algorithm. The base syntax is:

[optimalSites, variance, metric] = obj.run
optimalSites

This vector indicates which site was the “optimal” site for each iteration of the algorithm. Each element holds the index of one of the sites in the proxy network. These indices are relative to the input proxy network, and are not affected by the decreasing size of the network with each algorithm iteration.

variance

This vector indicates the variance of the climate metric after each iteration of the algorithm.

metric

This matrix indicates the climate metric after each algorithm iteration. Each row holds the ensemble at the end of an iteration, and each column holds the values for an ensemble member.
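Conceptually, each iteration of the greedy algorithm scores every remaining site by the metric variance it would remove, selects the highest-scoring site, and uses it to update the metric. The scoring step for a single site s can be sketched as follows; this mirrors the standard optimal-sensor variance-reduction formula and is not the toolbox's actual implementation:

```matlab
% Schematic scoring of site s (illustrative, not toolbox code)
% X: 1 x nEns metric, Ye: nSite x nEns estimates, R: nSite x 1 error variances
Xdev = X - mean(X);                 % metric deviations
Ydev = Ye(s,:) - mean(Ye(s,:));     % estimate deviations for site s
nEns = numel(X);

covXY    = (Xdev * Ydev') / (nEns - 1);        % covariance of metric with estimates
deltaVar = covXY^2 / (var(Ye(s,:)) + R(s));    % variance removed by site s
```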

Here, we’ll run the greedy algorithm:

[optimalSites, variance, metric] = os.run;

Examining the first 5 optimal sites:

optimalSites(1:5)
 2
 3
52
 7
 4

we can see that proxy site 2 was the optimal site in the network. After site 2 is used to update the climate metric, site 3 is the next-most optimal site. After site 3 is used to update the metric, site 52 is the next most valuable, and so on.

We can use the variance output to examine how the variance of the metric changes as each of these sites is assimilated:

variance(1:5)
0.27035
0.26537
0.26079
0.25839
0.25629
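A quick way to examine these results is to plot the metric variance against the number of assimilated sites:

```matlab
% Plot the declining variance of the metric over the greedy iterations
plot(variance, '-o')
xlabel('Number of assimilated proxy sites')
ylabel('Variance of the climate metric')
title('Greedy algorithm: variance reduction')
```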

Step 4: evaluate command

Next, we’ll use the evaluate command to assess each proxy site’s ability to reduce variance when it is the only proxy in the network. This allows us to quantify a proxy’s influence without the confounding effects of other, covarying proxy sites. The syntax for this command is:

deltaVariance = os.evaluate
deltaVariance

This vector indicates the total reduction in variance caused by each individual proxy site in the network.

Note that the total variance reduced by a proxy network will be smaller than the sum of deltaVariance. This is because covariance between proxy sites reduces the effects of individual proxies.

Here, we’ll run the evaluation algorithm:

deltaVariance = os.evaluate
54×1 single column vector

  0.0035785
   0.010422
  0.0075898
  ...
  0.0045732
 1.7153e-06
  0.0018802

If we rank the proxy sites by their ability to reduce variance:

[~, rank] = sort(deltaVariance, 'descend')

and examine the 5 best proxy sites:

rank(1:5)
 2
 3
 4
52
 8

we can see that the sites differ slightly from the results of the greedy algorithm. In particular, sites 4 and 8 appear to have a higher rank in the evaluation than in the greedy algorithm. This is because sites 4 and 8 covary strongly with sites 2 and 3. Because of this covariance, these sites share a portion of the total variance reduction. However, in the greedy algorithm, all of this shared variance reduction is assigned to sites 2 and 3, because these sites are selected earlier in the algorithm. Consequently, sites 4 and 8 receive a lower rank.

By contrast, the evaluate command allows us to begin to disentangle the effects of these covariances. We can see that the individual influence of site 4 is the third-highest in the network, and site 8 has the fifth-highest influence.
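We can illustrate this covariance effect with a small synthetic example. Here, two proxies carry nearly the same information, so their individual variance reductions overlap and their sum overstates the joint effect. The formulas are the standard optimal-sensor expressions, not toolbox code:

```matlab
rng(0)
nEns = 1000;
X  = randn(1, nEns);             % climate metric
Y1 = X + 0.1*randn(1, nEns);     % proxy estimate 1, closely tied to the metric
Y2 = X + 0.1*randn(1, nEns);     % proxy estimate 2, covaries strongly with Y1
R  = [0.5; 0.5];                 % proxy error variances

% Variance removed by each proxy alone: cov(X,Y)^2 / (var(Y) + R)
C1 = cov(X, Y1);  d1 = C1(1,2)^2 / (var(Y1) + R(1));
C2 = cov(X, Y2);  d2 = C2(1,2)^2 / (var(Y2) + R(2));

% Variance removed when both proxies are used together
Y = [Y1; Y2];
c = (X - mean(X)) * (Y - mean(Y, 2))' / (nEns - 1);   % 1 x 2 covariance vector
joint = c / (cov(Y') + diag(R)) * c';                 % quadratic form c*(C+R)^-1*c'

% The joint reduction is smaller than the sum of the individual reductions
fprintf('Individual sum: %.3f,  Joint: %.3f\n', d1 + d2, joint)
```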

Step 5: update command

The update command allows you to assess the total reduction in variance for a proxy network. As mentioned, the evaluate command deliberately removes the effects of proxy covariance, so it cannot accurately assess the total variance reduction for a network. Similarly, each iteration of the greedy algorithm updates the proxy estimates linearly via the Kalman gain; this is not appropriate for nonlinear forward models or for covarying proxy uncertainties, so the run command should also not be used to evaluate the total reduction of variance. Instead, use the update command, which accounts for these factors. The syntax is:

[variance, metric] = os.update
variance

This scalar reports the final variance of the metric after the full proxy network is used to update the ensemble.

metric

This vector returns the updated metric across the ensemble.

Here, we’ll run the update:

[variance, metric] = os.update;

Inspecting the final variance:

disp(variance)
0.24022

we can see that assimilating the entire proxy network will reduce metric variance to 0.24022.
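To put this number in context, we can compare it to the initial variance of the metric. This sketch assumes the T_index variable from Step 2 is still in the workspace; note that normalization conventions may differ slightly from the toolbox's (MATLAB's var uses the unbiased N-1 normalization by default):

```matlab
% Fraction of metric variance removed by the full proxy network
initialVariance = var(T_index);
percentRemoved  = 100 * (initialVariance - variance) / initialVariance;
fprintf('The network removes %.1f%% of the metric variance\n', percentRemoved)
```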