improver.calibration.ensemble_calibration module

This module defines all the “plugins” specific for Ensemble Model Output Statistics (EMOS).

Ensemble Model Output Statistics (EMOS)

Ensemble Model Output Statistics (EMOS), otherwise known as Nonhomogeneous Gaussian Regression (NGR), is a technique for calibrating an ensemble forecast.

Estimating EMOS coefficients

Following Gneiting et al., 2005, Ensemble Model Output Statistics for a normal distribution can be represented by the equation:

\[\mathcal{N}(a + b_1X_1 + ... + b_mX_m, c + dS^{2})\]

where the location parameter is a bias-corrected weighted average of the ensemble member forecasts, or alternatively, where the location parameter is a bias-corrected ensemble mean:

\[\mathcal{N}(a + b\bar{X}, c + dS^{2})\]

If a different distribution is required, for example, using a truncated normal distribution for wind speed, then the equations remain the same, apart from updating the distribution chosen. The preferred distribution will depend upon the variable being calibrated.

The a, b, c and d coefficients within the equations above are computed by minimising the Continuous Ranked Probability Score (CRPS), in terms of \(\alpha, \beta, \gamma\) and \(\delta\) (explained below). The distribution is accounted for through the formulation of the CRPS that is minimised e.g. please see Gneiting et al., 2005 for an example using a normal distribution and Thorarinsdottir and Gneiting, 2010 for an example using a truncated normal distribution.

Distribution description

A normal (Gaussian) distribution is often represented using the syntax:

\[\mathcal{N}(\mu,\,\sigma^{2})\]

where \(\mu\) is mean and \(\sigma^{2}\) is the variance. The normal distribution is a special case, where \(\mu\) can be interpreted as both the mean and the location parameter and \(\sigma^{2}\) can be interpreted as both the variance and the square of the scale parameter. For an alternative distribution, such as a truncated normal distribution that has been truncated to lie within 0 and infinity, the distribution can be represented as:

\[\mathcal{N^0}(\mu,\,\sigma^{2})\]

In this case, the \(\mu\) is strictly interpreted as the location parameter and \(\sigma^{2}\) is strictly interpreted as the square of the scale parameter.

What is the location parameter?

The location parameter indicates the shift in the distribution from the “centre” of the standard normal.

What is the scale parameter?

The scale parameter indicates the width in the distribution. If the scale parameter is large, then the distribution will be broader. If the scale is smaller, then the distribution will be narrower.

Implementation details

In this implementation, we will choose to define the distributions using the scale parameter (as this matches scipy’s expectation), rather than the square of the scale parameter:

\[\mathcal{N}(\mu,\,\sigma)\]

The full equation when estimating the EMOS coefficients using the ensemble mean is therefore:

\[\mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})\]

This matches the equations noted in Allen et al., 2021.

Estimating EMOS coefficients using the ensemble mean

If the predictor is the ensemble mean, coefficients are estimated as \(\alpha, \beta, \gamma\) and \(\delta\) based on the equation:

\[\mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})\]

where N is a chosen distribution and values of a, b, c and d are solved in the format of \(\alpha, \beta, \gamma\) and \(\delta\), see the equations below.

\[a = \alpha\]
\[b = \beta\]
\[c = \gamma^2\]
\[d = \delta^2\]

The \(\gamma\) and \(\delta\) values are squared to ensure c and d are positive and therefore more interpretable.

Estimating EMOS coefficients using the ensemble realizations

If the predictor is the ensemble realizations, coefficients are estimated for \(\alpha, \beta, \gamma\) and \(\delta\) based on the equation:

\[\mathcal{N}(a + b_1X_1 + ... + b_mX_m, \sqrt{c + dS^{2}})\]

where N is a chosen distribution, the values of a, b, c and d relate to alpha, beta, gamma and delta through the equations above with the exception that \(b=\beta^2\), and the number of beta terms depends on the number of realizations provided. The beta, gamma, and delta values are squared to ensure that b, c and d are positive values and therefore are more easily interpretable. Specifically for the b term, the squaring ensures that the the b values can be interpreted as a weighting for each realization.

Applying EMOS coefficients

The EMOS coefficients represent adjustments to the ensemble mean and ensemble variance, in order to generate the location and scale parameters that, for the chosen distribution, minimise the CRPS. The coefficients can therefore be used to construct the location parameter, \(\mu\), and scale parameter, \(\sigma\), for the calibrated forecast from today’s ensemble mean, or ensemble realizations, and the ensemble variance.

\[ \begin{align}\begin{aligned}\mu = a + b\bar{X}\\\sigma = \sqrt{c + dS^{2}}\end{aligned}\end{align} \]

Note here that this procedure holds whether the distribution is normal, i.e. where the application of the EMOS coefficients to the raw ensemble mean results in a calibrated location parameter that is equivalent to a calibrated ensemble mean (e.g. for screen temperature), and where the distribution is e.g. truncated normal (e.g. for wind speed). For a truncated normal distribution, the result of applying the EMOS coefficients to an uncalibrated forecast is a location parameter and scale parameter describing the calibrated distribution.

class ApplyEMOS(percentiles=None)[source]

Bases: PostProcessingPlugin

Class to calibrate an input forecast given EMOS coefficients

__init__(percentiles=None)[source]

Initialise class.

Parameters:

percentiles (Optional[Sequence]) – The set of percentiles used to create the calibrated forecast.

_abc_impl = <_abc_data object>
_check_additional_field_sites(forecast, additional_fields)[source]

Check that the forecast and additional fields have matching sites.

Parameters:
  • forecast – Uncalibrated forecast as probabilities, percentiles or realizations

  • additional_fields – Additional fields to be used as forecast predictors.

Raises:

ValueError – If the sites mismatch between the forecast and additional fields.

_convert_to_realizations(forecast, realizations_count, ignore_ecc_bounds)[source]

Convert an input forecast of probabilities or percentiles into pseudo-realizations.

Parameters:
  • forecast (Cube) –

  • realizations_count (Optional[int]) – Number of pseudo-realizations to generate from the input forecast

  • ignore_ecc_bounds (bool) –

Return type:

Cube

Returns:

Cube with pseudo-realizations

_format_forecast(template, randomise, random_seed)[source]

Generate calibrated probability, percentile or realization output in the desired format.

Parameters:
  • template (Cube) – A template cube containing the coordinates and metadata expected on the calibrated forecast.

  • randomise (bool) – If True, order realization output randomly rather than using the input forecast. If forecast type is not realizations, this is ignored.

  • random_seed (Optional[int]) – For realizations input if randomise is True, random seed for generating re-ordered percentiles. If randomise is False, the random seed may still be used for splitting ties.

Return type:

Cube

Returns:

Calibrated forecast

static _get_attribute(coefficients, attribute_name, optional=False)[source]

Get the value for the requested attribute, ensuring that the attribute is present consistently across the cubes within the coefficients cubelist.

Parameters:
  • coefficients (CubeList) – EMOS coefficients

  • attribute_name (str) – Name of expected attribute

  • optional (bool) – Indicate whether the attribute is allowed to be optional.

Return type:

Optional[Any]

Returns:

Returns None if the attribute is not present. Otherwise, the value of the attribute is returned.

Raises:

ValueError – If coefficients do not share the expected attributes.

_get_input_forecast_type(forecast)[source]

Identifies whether the forecast is in probability, realization or percentile space.

Parameters:

forecast (Cube) –

process(forecast, coefficients, additional_fields=None, land_sea_mask=None, prob_template=None, realizations_count=None, ignore_ecc_bounds=True, tolerate_time_mismatch=False, predictor='mean', randomise=False, random_seed=None)[source]

Calibrate input forecast using pre-calculated coefficients

Parameters:
  • forecast (Cube) – Uncalibrated forecast as probabilities, percentiles or realizations

  • coefficients (CubeList) – EMOS coefficients

  • additional_fields (Optional[CubeList]) – Additional fields to be used as forecast predictors.

  • land_sea_mask (Optional[Cube]) – Land sea mask where a value of “1” represents land points and “0” represents sea. If set, allows calibration of land points only.

  • prob_template (Optional[Cube]) – A cube containing a probability forecast that will be used as a template when generating probability output when the input format of the forecast cube is not probabilities i.e. realizations or percentiles.

  • realizations_count (Optional[int]) – Number of realizations to use when generating the intermediate calibrated forecast from probability or percentile inputs

  • ignore_ecc_bounds (bool) – If True, allow percentiles from probabilities to exceed the ECC bounds range. If input is not probabilities, this is ignored.

  • tolerate_time_mismatch (bool) – If True, tolerate a mismatch in validity time and forecast period for coefficients vs forecasts. Use with caution!

  • predictor (str) – Predictor to be used to calculate the location parameter of the calibrated distribution. Value is “mean” or “realizations”.

  • randomise (bool) – Used in generating calibrated realizations. If input forecast is probabilities or percentiles, this is ignored.

  • random_seed (Optional[int]) – Used in generating calibrated realizations. If input forecast is probabilities or percentiles, this is ignored.

Return type:

Cube

Returns:

Calibrated forecast in the form of the input (ie probabilities percentiles or realizations)

class CalibratedForecastDistributionParameters(predictor='mean')[source]

Bases: BasePlugin

Class to calculate calibrated forecast distribution parameters given an uncalibrated input forecast and EMOS coefficients.

__init__(predictor='mean')[source]

Create a plugin that uses the coefficients created using EMOS from historical forecasts and corresponding truths and applies these coefficients to the current forecast to generate location and scale parameters that represent the calibrated distribution at each point.

Parameters:

predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

_abc_impl = <_abc_data object>
_calculate_location_parameter_from_mean()[source]

Function to calculate the location parameter when the ensemble mean at each grid point is the predictor.

Further information is available in the module level docstring.

Return type:

ndarray

Returns:

Location parameter calculated using the ensemble mean as the predictor.

_calculate_location_parameter_from_realizations()[source]

Function to calculate the location parameter when the ensemble realizations are the predictor.

Further information is available in the module level docstring.

Return type:

ndarray

Returns:

Location parameter calculated using the ensemble realizations as the predictor.

_calculate_scale_parameter()[source]

Calculation of the scale parameter using the ensemble variance adjusted using the gamma and delta coefficients calculated by EMOS.

Further information is available in the module level docstring.

Return type:

ndarray

Returns:

Scale parameter for defining the distribution of the calibrated forecast.

_create_output_cubes(location_parameter, scale_parameter)[source]

Creation of output cubes containing the location and scale parameters.

Parameters:
  • location_parameter (ndarray) – Location parameter of the calibrated distribution.

  • scale_parameter (ndarray) – Scale parameter of the calibrated distribution.

Return type:

Tuple[Cube, Cube]

Returns:

  • Location parameter of the calibrated distribution with associated metadata.

  • Scale parameter of the calibrated distribution with associated metadata.

_diagnostic_match()[source]

Check that the forecast diagnostic matches the coefficients used to construct the coefficients.

Raises:

ValueError – If the forecast diagnostic and coefficients cube diagnostic does not match.

Return type:

None

_spatial_domain_match()[source]

Check that the domain of the current forecast and coefficients cube match for gridded forecasts. For spot forecasts, the spatial domain check is skipped.

Raises:

ValueError – If the points or bounds of the specified axis of the current_forecast and coefficients_cube do not match.

Return type:

None

process(current_forecast, coefficients_cubelist, additional_fields=None, landsea_mask=None, tolerate_time_mismatch=False)[source]

Apply the EMOS coefficients to the current forecast, in order to generate location and scale parameters for creating the calibrated distribution.

Parameters:
  • current_forecast (Cube) – The cube containing the current forecast.

  • coefficients_cubelist (CubeList) – CubeList of EMOS coefficients where each cube within the cubelist is for a separate EMOS coefficient e.g. alpha, beta, gamma, delta.

  • additional_fields (Optional[CubeList]) – Additional fields to be used as forecast predictors.

  • landsea_mask (Optional[Cube]) – The optional cube containing a land-sea mask. If provided sea points will be masked in the output cube. This cube needs to have land points set to 1 and sea points to 0.

  • tolerate_time_mismatch (Optional[bool]) – If True, tolerate a mismatch in validity time and forecast period for coefficients vs forecasts. Use with caution!

Return type:

Tuple[Cube, Cube]

Returns:

  • Cube containing the location parameter of the calibrated distribution calculated using either the ensemble mean or the ensemble realizations. The location parameter represents the point at which a resulting PDF would be centred.

  • Cube containing the scale parameter of the calibrated distribution calculated using either the ensemble mean or the ensemble realizations. The scale parameter represents the statistical dispersion of the resulting PDF, so a larger scale parameter will result in a broader PDF.

class ContinuousRankedProbabilityScoreMinimisers(predictor, tolerance=0.02, max_iterations=1000, point_by_point=False)[source]

Bases: BasePlugin

Minimise the Continuous Ranked Probability Score (CRPS) Calculate the optimised coefficients for minimising the CRPS based on assuming a particular probability distribution for the phenomenon being minimised.

The number of coefficients that will be optimised depend upon the initial guess. The coefficients will be calculated either using all points provided or coefficients will be calculated separately for each point. Minimisation is performed using the Nelder-Mead algorithm for 200 iterations to limit the computational expense. Note that the BFGS algorithm was initially trialled but had a bug in comparison to comparative results generated in R.

BAD_VALUE = 999999.0
TOLERATED_PERCENTAGE_CHANGE = 5
__init__(predictor, tolerance=0.02, max_iterations=1000, point_by_point=False)[source]

Initialise class for performing minimisation of the Continuous Ranked Probability Score (CRPS).

Parameters:
  • predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

  • tolerance (float) – The tolerance for the Continuous Ranked Probability Score (CRPS) calculated by the minimisation. The CRPS is in the units of the variable being calibrated. The tolerance is therefore representative of how close to the actual value are we aiming to forecast for a particular variable. Once multiple iterations result in a CRPS equal to the same value within the specified tolerance, the minimisation will terminate.

  • max_iterations (int) – The maximum number of iterations allowed until the minimisation has converged to a stable solution. If the maximum number of iterations is reached, but the minimisation has not yet converged to a stable solution, then the available solution is used anyway, and a warning is raised. If the predictor_of_mean is “realizations”, then the number of iterations may require increasing, as there will be more coefficients to solve for.

  • point_by_point (bool) – If True, coefficients are calculated independently for each point within the input cube by minimising each point independently.

_abc_impl = <_abc_data object>
_calculate_percentage_change_in_last_iteration(allvecs)[source]

Calculate the percentage change that has occurred within the last iteration of the minimisation. If the percentage change between the last iteration and the last-but-one iteration exceeds the threshold, a warning message is printed.

Parameters:

allvecs (List[ndarray]) – List of numpy arrays containing the optimised coefficients, after each iteration.

Warns:

Warning – If a satisfactory minimisation has not been achieved.

Return type:

None

_minimise_caller(minimisation_function, initial_guess, forecast_predictor_data, truth_data, forecast_var_data, sqrt_pi)[source]

Call scipy minimize with the options provided.

Parameters:
  • minimisation_function (Callable) –

  • initial_guess (ndarray) –

  • forecast_predictor

  • truth

  • forecast_var

  • sqrt_pi (float) – Square root of pi for minimisation.

Return type:

OptimizeResult

Returns:

A single set of coefficients with the order [alpha, beta, gamma, delta].

_normal_crps_preparation(initial_guess, forecast_predictor, truth, forecast_var)[source]

Prepare for the CRPS calculation by computing estimates for the location parameter (mu), scale parameter (sigma), normalised prediction error (xz) and the corresponding CDF and PDF, assuming a normal distribution.

Parameters:
Return type:

Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Returns:

The location parameter (mu), scale parameter (sigma), normalised prediction error (xz) and the corresponding CDF and PDF, assuming a normal distribution.

_prepare_forecasts(forecast_predictors)[source]

Prepare forecasts to be a consistent shape for minimisation by broadcasting static predictors along the time dimension and flattening the spatiotemporal dimensions.

Parameters:

forecast_predictors (CubeList) – The forecast predictors to be reshaped.

Return type:

ndarray

Returns:

Reshaped array with a first dimension representing the flattened spatiotemporal dimensions and an optional second dimension for flattened non-spatiotemporal dimensions (e.g. realizations).

_process_points_independently(minimisation_function, initial_guess, forecast_predictors, truth, forecast_var, sqrt_pi)[source]

Minimise each point along the spatial dimensions independently to create a set of coefficients for each point. The coefficients returned can be either gridded (i.e. separate dimensions for x and y) or for a list of sites where x and y share a common dimension.

Parameters:
  • minimisation_function (Callable) – Function to use when minimising.

  • initial_guess (ndarray) –

  • forecast_predictor

  • truth (Cube) –

  • forecast_var (Cube) –

  • sqrt_pi (float) –

Return type:

ndarray

Returns:

Separate optimised coefficients for each point. The shape of the coefficients array is (number of coefficients, length of spatial dimensions). Order of coefficients is [alpha, beta, gamma, delta]. Multiple beta values can be provided if either realizations are provided as the predictor, or if additional predictors are provided.

_process_points_together(minimisation_function, initial_guess, forecast_predictors, truth, forecast_var, sqrt_pi)[source]

Minimise all points together in one minimisation to create a single set of coefficients.

Parameters:
  • minimisation_function (Callable) – Function to use when minimising.

  • initial_guess (ndarray) –

  • forecast_predictor

  • truth (Cube) –

  • forecast_var (Cube) –

  • sqrt_pi (float) –

Return type:

ndarray

Returns:

The optimised coefficients. Order of coefficients is [alpha, beta, gamma, delta]. Multiple beta values can be returned if either realizations are provided as the predictor, or if additional predictors are provided.

calculate_normal_crps(initial_guess, forecast_predictor, truth, forecast_var, sqrt_pi)[source]

Calculate the CRPS for a normal distribution.

Scientific Reference: Gneiting, T. et al., 2005. Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation. Monthly Weather Review, 133(5), pp.1098-1118.

Parameters:
  • initial_guess (ndarray) – List of optimised coefficients. Order of coefficients is [alpha, beta, gamma, delta]. Multiple beta values can be provided if either realizations are provided as the predictor, or if additional predictors are provided.

  • forecast_predictor (ndarray) – Data to be used as the predictor, either the ensemble mean or the ensemble realizations of the predictand variable and additional static predictors, as required.

  • truth (ndarray) – Data to be used as truth.

  • forecast_var (ndarray) – Ensemble variance data.

  • sqrt_pi (float) – Square root of Pi

Return type:

float

Returns:

CRPS for the current set of coefficients. This CRPS is a mean value across all points.

calculate_truncated_normal_crps(initial_guess, forecast_predictor, truth, forecast_var, sqrt_pi)[source]

Calculate the CRPS for a truncated normal distribution with zero as the lower bound.

Scientific Reference: Thorarinsdottir, T.L. & Gneiting, T., 2010. Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. Journal of the Royal Statistical Society. Series A: Statistics in Society, 173(2), pp.371-388.

Parameters:
  • initial_guess (ndarray) – List of optimised coefficients. Order of coefficients is [alpha, beta, gamma, delta]. Multiple beta values can be provided if either realizations are provided as the predictor, or if additional predictors are provided.

  • forecast_predictor (ndarray) – Data to be used as the predictor, either the ensemble mean or the ensemble realizations of the predictand variable and additional static predictors, as required.

  • truth (ndarray) – Data to be used as truth.

  • forecast_var (ndarray) – Ensemble variance data.

  • sqrt_pi (float) – Square root of Pi

Return type:

float

Returns:

CRPS for the current set of coefficients. This CRPS is a mean value across all points.

process(initial_guess, forecast_predictors, truth, forecast_var, distribution)[source]

Function to pass a given function to the scipy minimize function to estimate optimised values for the coefficients.

Further information is available in the module level docstring.

Parameters:
  • initial_guess (ndarray) – List of optimised coefficients. Order of coefficients is [alpha, beta, gamma, delta]. Multiple beta values can be provided if either realizations are provided as the predictor, or if additional predictors are provided.

  • forecast_predictors (CubeList) – CubeList containing the fields to be used as the predictor. These will include the ensemble mean or realizations of the predictand variable and additional static predictors, as required.

  • truth (Cube) – Cube containing the field, which will be used as truth.

  • forecast_var (Cube) – Cube containing the field containing the ensemble variance.

  • distribution (str) – String used to access the appropriate function for use in the minimisation within self.minimisation_dict.

Return type:

ndarray

Returns:

The optimised coefficients following the order [alpha, beta, gamma, delta]. If point_by_point is False, then one set of coefficients are generated. If point_by_point is True, then the leading dimension of the numpy array is the length of the spatial dimensions within the forecast and truth cubes. Each set of coefficients are appropriate for a particular point. If realizations or static additional predictors are provided, then multiple values for beta will be generated.

Raises:

KeyError – If the distribution is not supported.

Warns:

Warning – If the minimisation did not converge.

class EstimateCoefficientsForEnsembleCalibration(distribution, point_by_point=False, use_default_initial_guess=False, desired_units=None, predictor='mean', tolerance=0.02, max_iterations=1000, proportion_of_nans=0.5)[source]

Bases: BasePlugin

Class focussing on estimating the optimised coefficients for ensemble calibration.

__init__(distribution, point_by_point=False, use_default_initial_guess=False, desired_units=None, predictor='mean', tolerance=0.02, max_iterations=1000, proportion_of_nans=0.5)[source]

Create an ensemble calibration plugin that, for Nonhomogeneous Gaussian Regression, calculates coefficients based on historical forecasts and applies the coefficients to the current forecast.

Further information is available in the module level docstring.

Parameters:
  • distribution (str) – Name of distribution. Assume that a calibrated version of the current forecast could be represented using this distribution.

  • point_by_point (bool) – If True, coefficients are calculated independently for each point within the input cube by creating an initial guess and minimising each grid point independently. Please note this option is memory intensive and is unsuitable for gridded input. Using a default initial guess may reduce the memory overhead option.

  • use_default_initial_guess (bool) – If True, use the default initial guess. The default initial guess assumes no adjustments are required to the initial choice of predictor to generate the calibrated distribution. This means coefficients of 1 for the multiplicative coefficients and 0 for the additive coefficients. If False, the initial guess is computed.

  • desired_units (Union[Unit, str, None]) – The unit that you would like the calibration to be undertaken in. The current forecast, historical forecast and truth will be converted as required.

  • predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

  • tolerance (float) – The tolerance for the Continuous Ranked Probability Score (CRPS) calculated by the minimisation. The CRPS is in the units of the variable being calibrated. The tolerance is therefore representative of how close to the actual value are we aiming to forecast for a particular variable. Once multiple iterations result in a CRPS equal to the same value within the specified tolerance, the minimisation will terminate.

  • max_iterations (int) – The maximum number of iterations allowed until the minimisation has converged to a stable solution. If the maximum number of iterations is reached, but the minimisation has not yet converged to a stable solution, then the available solution is used anyway, and a warning is raised. If the predictor_of_mean is “realizations”, then the number of iterations may require increasing, as there will be more coefficients to solve for.

  • proportion_of_nans (float) – The proportion of the matching historic forecast-truth pairs that are allowed to be NaN.

_abc_impl = <_abc_data object>
static _add_predictor_coords(template_cube, forecast_predictors)[source]

Add predictor index and predictor name coordinates to the beta coefficient template cube to support the use of additional predictors.

Parameters:
  • template_cube (Cube) – A template cube for storing the optimised beta coefficients.

  • forecast_predictors (CubeList) – CubeList where each cube contains a separate forecast predictor

Return type:

Cube

Returns:

A cube with the predictor_index and predictor_name coordinates added. Single value dimension coordinates are converted to non-dimension coordinates.

_create_cubelist(optimised_coeffs, historic_forecasts, forecast_predictors)[source]

Create a cubelist by combining the optimised coefficients and the appropriate metadata. The units of the alpha and gamma coefficients match the units of the historic forecast. If the predictor is the realizations, then the beta coefficient cube contains a realization coordinate. :type optimised_coeffs: ndarray :param optimised_coeffs: :type historic_forecasts: Cube :param historic_forecasts: Historic forecasts from the training dataset. :type forecast_predictors: CubeList :param forecast_predictors:

Return type:

CubeList

Returns:

CubeList constructed using the coefficients provided and using metadata from the historic_forecasts cube. Each cube within the cubelist is for a separate EMOS coefficient e.g. alpha, beta, gamma, delta.

static _create_temporal_coordinates(historic_forecasts)[source]

Create forecast reference time and forecast period coordinates for the EMOS coefficients cube.

Parameters:

historic_forecasts (Cube) – Historic forecasts from the training dataset.

Return type:

List[Coord]

Returns:

List of the temporal coordinates.

_get_spatial_associated_coordinates(historic_forecasts)[source]

Set-up the spatial dimensions and coordinates for the EMOS coefficients cube.

Parameters:

historic_forecasts (Cube) – Historic forecasts from the training dataset.

Return type:

Tuple[List[int], List[Coord]]

Returns:

List of the spatial dimensions to retain within the coefficients cube and a list of the auxiliary coordinates that share the same dimension as the spatial coordinates.

_set_attributes(historic_forecasts)[source]

Set attributes for use on the EMOS coefficients cube.

Parameters:

historic_forecasts (Cube) – Historic forecasts from the training dataset.

Return type:

Dict[str, Any]

Returns:

Attributes for an EMOS coefficients cube including “diagnostic standard name”, “distribution”, “shape_parameters” and an updated title.

_validate_distribution()[source]

Validate that the distribution supplied has a corresponding method for minimising the Continuous Ranked Probability Score.

Raises:

ValueError – If the distribution requested is not supported.

Return type:

None

compute_initial_guess(truths, forecast_predictor, predictor, number_of_realizations)[source]

Function to compute initial guess of the alpha, beta, gamma and delta components of the EMOS coefficients by linear regression of the forecast predictor and the truths, if requested. Otherwise, default values for the coefficients will be used.

If the predictor is “mean”, then the order of the initial_guess is [alpha, beta, gamma, delta]. Otherwise, if the predictor is “realizations” then the order of the initial_guess is [alpha, beta0, beta1, beta2, gamma, delta], where the number of beta variables will correspond to the number of realizations. In this example initial guess with three beta variables, there will correspondingly be three realizations.

The default values for the initial guesses are in [alpha, beta, gamma, delta] ordering:

  • For the ensemble mean, the default initial guess: [0, 1, 0, 1] assumes that the raw forecast is skilful and the expected adjustments are small.

  • For the ensemble realizations, the default initial guess is effectively: [0, 1/3., 1/3., 1/3., 0, 1], such that each realization is assumed to have equal weight.

If linear regression is enabled, the alpha and beta coefficients associated with the ensemble mean or ensemble realizations are modified based on the results from the linear regression fit.

Parameters:
  • truths (ndarray) – Array containing the truth fields.

  • forecast_predictor (ndarray) – The predictors are the historic forecasts processed to be either in the form of the ensemble mean or the ensemble realizations and any additional predictors.

  • predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

  • number_of_realizations (Optional[int]) – Number of realizations within the forecast predictor. If no realizations are present, this option is None.

Return type:

List[float]

Returns:

List of coefficients to be used as initial guess. Order of coefficients is [alpha, beta, gamma, delta]. Multiple beta values can be provided if either realizations are provided as the predictor, or if additional predictors are provided.

create_coefficients_cubelist(optimised_coeffs, historic_forecasts, forecast_predictors)[source]

Create a cubelist for storing the coefficients computed using EMOS.

Examples

For a cubelist containing coefficients calculated using Ensemble Model Output Statistics:

0: emos_coefficient_gamma / (K)        (scalar cube)
1: emos_coefficient_beta / (1)         (scalar cube)
2: emos_coefficient_alpha / (K)        (scalar cube)
3: emos_coefficient_delta / (1)        (scalar cube)

An example cube is therefore:

emos_coefficient_gamma / (K)        (scalar cube)
     Scalar coordinates:
          forecast_period: 43200 seconds
          forecast_reference_time: 2017-06-05 03:00:00
          projection_x_coordinate: -159000.0 m, bound=(-358000.0, 40000.0) m
          projection_y_coordinate: -437000.0 m, bound=(-636000.0, -238000.0) m
          time: 2017-06-05 15:00:00
     Attributes:
          Conventions: CF-1.5
          diagnostic_standard_name: air_temperature
          mosg__model_configuration: uk_ens
Parameters:
  • optimised_coeffs (Union[List[float], ndarray]) – Array or list of optimised coefficients. Order of coefficients is [alpha, beta, gamma, delta]. Multiple beta values can be provided if either realizations are provided as the predictor, or if additional predictors are provided.

  • historic_forecasts (Cube) – Historic forecasts from the training dataset.

  • forecast_predictors (CubeList) – The predictors are the historic forecasts processed to be either in the form of the ensemble mean or the ensemble realizations and any additional predictors.

Return type:

CubeList

Returns:

CubeList constructed using the coefficients provided and using metadata from the historic_forecasts cube. Each cube within the cubelist is for a separate EMOS coefficient e.g. alpha, beta, gamma, delta.

Raises:

ValueError – If the number of coefficients in the optimised_coeffs does not match the expected number.

guess_and_minimise(truths, historic_forecasts, forecast_predictors, forecast_var, number_of_realizations)[source]

Function to consolidate calls to compute the initial guess, compute the optimised coefficients using minimisation and store the resulting coefficients within a CubeList.

Parameters:
  • truths (Cube) – Truths from the training dataset.

  • historic_forecasts (Cube) – Historic forecasts from the training dataset. These are used as a template cube for creating the coefficient cube.

  • forecast_predictors (CubeList) – The predictors are the historic forecasts processed to be either in the form of the ensemble mean or the ensemble realizations and any additional predictors.

  • forecast_var (Cube) – Variance of the forecast for use in the minimisation.

  • number_of_realizations (Optional[int]) – Number of realizations within the forecast predictor. If no realizations are present, this option is None.

Return type:

CubeList

Returns:

CubeList constructed using the coefficients provided and using metadata from the historic_forecasts cube. Each cube within the cubelist is for a separate EMOS coefficient e.g. alpha, beta, gamma, delta.

static mask_cube(cube, landsea_mask)[source]

Mask the input cube using the given landsea_mask. Sea points are filled with nans and masked.

Parameters:
  • cube (Cube) – A cube to be masked, on the same grid as the landsea_mask. The last two dimensions on this cube must match the dimensions in the landsea_mask cube.

  • landsea_mask (Cube) – A cube containing a land-sea mask. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Raises:

IndexError – if the cube and landsea_mask shapes are not compatible.

Return type:

None

process(historic_forecasts, truths, additional_fields=None, landsea_mask=None)[source]

Using Nonhomogeneous Gaussian Regression/Ensemble Model Output Statistics, estimate the required coefficients from historical forecasts.

The main contents of this method is:

  1. Check that the predictor is valid.

  2. Filter the historic forecasts and truths to ensure that these inputs match in validity time.

  3. Apply unit conversion to ensure that the historic forecasts and truths have the desired units for calibration.

  4. Calculate the variance of the historic forecasts. If the chosen predictor is the mean, also calculate the mean of the historic forecasts.

  5. If a land-sea mask is provided then mask out sea points in the truths and predictor from the historic forecasts.

  6. Calculate initial guess at coefficient values by performing a linear regression, if requested, otherwise default values are used.

  7. Perform minimisation.

Parameters:
  • historic_forecasts (Cube) – Historic forecasts from the training dataset.

  • truths (Cube) – Truths from the training dataset.

  • additional_fields (Optional[CubeList]) – Additional fields to use as supplementary predictors.

  • landsea_mask (Optional[Cube]) – The optional cube containing a land-sea mask. If provided, only land points are used to calculate the coefficients. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Return type:

CubeList

Returns:

CubeList constructed using the coefficients provided and using metadata from the historic_forecasts cube. Each cube within the cubelist is for a separate EMOS coefficient e.g. alpha, beta, gamma, delta.

Raises:
  • ValueError – If either the historic_forecasts or truths cubes were not passed in.

  • ValueError – If the units of the historic and truth cubes do not match.