improver.ensemble_copula_coupling.ensemble_copula_coupling module

This module defines the plugins required for Ensemble Copula Coupling.

class ConvertLocationAndScaleParameters(distribution='norm', shape_parameters=None)[source]

Bases: object

Base class to support the plugins that compute percentiles and probabilities from location and scale parameters.

__init__(distribution='norm', shape_parameters=None)[source]

Initialise the class.

In order to construct percentiles or probabilities from the location and scale parameters, the distribution of the resulting output needs to be selected. For use with the outputs from EMOS, where minimising the CRPS assumes that the forecast follows a particular distribution, the same distribution should be selected as was used for the CRPS minimisation. The conversion from the location and scale parameters to percentiles and probabilities relies upon functionality within scipy.stats.

Parameters:
  • distribution (str) – Name of a distribution supported by scipy.stats.

  • shape_parameters (Optional[ndarray]) – For use with distributions in scipy.stats (e.g. truncnorm) that require the specification of shape parameters to be able to define the shape of the distribution. For the truncated normal distribution, the shape parameters should be appropriate for the distribution constructed from the location and scale parameters provided. Please note that for use with calculate_truncated_normal_crps(), the shape parameters for a truncated normal distribution with a lower bound of zero should be [0, np.inf].

_rescale_shape_parameters(location_parameter, scale_parameter)[source]

Rescale the shape parameters for the desired location and scale parameters for the truncated normal distribution. The shape parameters for any other distribution will remain unchanged.

For the truncated normal distribution, if the shape parameters are not rescaled, then scipy.stats.truncnorm will assume that the shape parameters are appropriate for a standard normal distribution. As the aim is to construct a distribution using specific values for the location and scale parameters, the assumption of a standard normal distribution is not appropriate. Therefore the shape parameters are rescaled using the equations:

\[\begin{aligned}a_{rescaled} &= (a - location\_parameter)/scale\_parameter\\b_{rescaled} &= (b - location\_parameter)/scale\_parameter\end{aligned}\]

Please see scipy.stats.truncnorm for some further information.

Parameters:
  • location_parameter (ndarray) – Location parameter to be used to scale the shape parameters.

  • scale_parameter (ndarray) – Scale parameter to be used to scale the shape parameters.

Return type:

None
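As a sketch of the rescaling described above, the following uses scipy.stats.truncnorm directly with hypothetical location and scale values and the [0, np.inf] shape parameters mentioned for a distribution bounded below at zero:

```python
import numpy as np
from scipy.stats import truncnorm

# Hypothetical values: shape parameters [0, inf] for a truncated normal
# bounded below at zero, with chosen location and scale parameters.
a, b = 0.0, np.inf
location_parameter = 2.0
scale_parameter = 1.5

# Rescale the bounds into standard-normal units, as scipy.stats.truncnorm
# expects, following the equations above.
a_rescaled = (a - location_parameter) / scale_parameter
b_rescaled = (b - location_parameter) / scale_parameter

dist = truncnorm(a_rescaled, b_rescaled,
                 loc=location_parameter, scale=scale_parameter)
# No probability mass lies below the lower bound of zero.
print(dist.cdf(0.0))
```

Without the rescaling step, truncnorm would interpret the bounds [0, inf] as being in standard-normal units, producing a different distribution from the one intended.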

class ConvertLocationAndScaleParametersToPercentiles(distribution='norm', shape_parameters=None)[source]

Bases: BasePlugin, ConvertLocationAndScaleParameters

Plugin focusing on generating percentiles from location and scale parameters. In combination with the EnsembleReordering plugin, this is Ensemble Copula Coupling.

_location_and_scale_parameters_to_percentiles(location_parameter, scale_parameter, template_cube, percentiles)[source]

Function returning percentiles based on the supplied location and scale parameters.

Parameters:
  • location_parameter (Cube) – Location parameter of calibrated distribution.

  • scale_parameter (Cube) – Scale parameter of the calibrated distribution.

  • template_cube (Cube) – Template cube containing either a percentile or realization coordinate. All coordinates apart from the percentile or realization coordinate will be copied from the template cube. Metadata will also be copied from this cube.

  • percentiles (List[float]) – Percentiles at which to calculate the value of the phenomenon.

Return type:

Cube

Returns:

Cube containing the values for the phenomenon at each of the percentiles requested.

Raises:

ValueError – If any of the resulting percentile values are nans and these nans are not caused by a scale parameter of zero.

process(location_parameter, scale_parameter, template_cube, no_of_percentiles=None, percentiles=None)[source]

Generate ensemble percentiles from the location and scale parameters.

Parameters:
  • location_parameter (Cube) – Cube containing the location parameters.

  • scale_parameter (Cube) – Cube containing the scale parameters.

  • template_cube (Cube) – Template cube containing either a percentile or realization coordinate. All coordinates apart from the percentile or realization coordinate will be copied from the template cube. Metadata will also be copied from this cube.

  • no_of_percentiles (Optional[int]) – Integer defining the number of percentiles that will be calculated from the location and scale parameters.

  • percentiles (Optional[List[float]]) – List of percentiles that will be generated from the location and scale parameters provided.

Return type:

Cube

Returns:

Cube for calibrated percentiles. The percentile coordinate is always the zeroth dimension.

Raises:

ValueError – If “no_of_percentiles” and “percentiles” are both supplied as keyword arguments.
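The core of this conversion can be sketched with scipy.stats alone: evaluate the inverse CDF (ppf) of the chosen distribution, parameterised by the location and scale grids, at each requested percentile. The grids and values below are hypothetical, and the real plugin operates on iris Cubes rather than bare arrays:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 2x2 grids of location (mean) and scale (standard deviation)
# parameters for a normal distribution.
location = np.array([[280.0, 281.0], [282.0, 283.0]])
scale = np.array([[1.0, 1.5], [2.0, 0.5]])

percentiles = [25.0, 50.0, 75.0]
# Evaluate the inverse CDF (ppf) at each percentile for every grid point;
# the percentile dimension leads, matching the plugin's output ordering.
result = np.stack(
    [norm.ppf(p / 100.0, loc=location, scale=scale) for p in percentiles]
)
print(result.shape)  # (3, 2, 2)
```

For the normal distribution the 50th percentile recovers the location parameter exactly; other distributions (e.g. truncnorm) would substitute their own ppf.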

class ConvertLocationAndScaleParametersToProbabilities(distribution='norm', shape_parameters=None)[source]

Bases: BasePlugin, ConvertLocationAndScaleParameters

Plugin to generate probabilities relative to given thresholds from the location and scale parameters of a distribution.

_check_template_cube(cube)[source]

The template cube is expected to contain a leading threshold dimension followed by spatial (y/x) dimensions for a gridded cube. For a spot template cube, the spatial dimensions are not expected to be dimension coordinates. If the cube contains the expected dimensions, a threshold leading order is enforced.

Parameters:

cube (Cube) – A cube whose dimensions are checked to ensure they match what is expected.

Raises:

ValueError – If cube is not of the expected dimensions.

Return type:

None

static _check_unit_compatibility(location_parameter, scale_parameter, probability_cube_template)[source]

The location parameter, scale parameter, and threshold values come from three different cubes. This is a sanity check to ensure the units are as expected, converting units of the location parameter and scale parameter if possible.

Parameters:
  • location_parameter (Cube) – Cube of location parameter values.

  • scale_parameter (Cube) – Cube of scale parameter values.

  • probability_cube_template (Cube) – Cube containing threshold values.

Raises:

ValueError – If units of input cubes are not compatible.

Return type:

None

_location_and_scale_parameters_to_probabilities(location_parameter, scale_parameter, probability_cube_template)[source]

Function returning probabilities relative to provided thresholds based on the supplied location and scale parameters.

Parameters:
  • location_parameter (Cube) – Predictor for the calibrated forecast location parameter.

  • scale_parameter (Cube) – Scale parameter for the calibrated forecast.

  • probability_cube_template (Cube) – A probability cube that has a threshold coordinate, where the probabilities are defined as above or below the threshold by the spp__relative_to_threshold attribute. This cube matches the desired output cube format.

Return type:

Cube

Returns:

Cube containing the data expressed as probabilities relative to the provided thresholds in the way described by spp__relative_to_threshold.

process(location_parameter, scale_parameter, probability_cube_template)[source]

Generate probabilities from the location and scale parameters of the distribution.

Parameters:
  • location_parameter (Cube) – Cube containing the location parameters.

  • scale_parameter (Cube) – Cube containing the scale parameters.

  • probability_cube_template (Cube) – A probability cube that has a threshold coordinate, where the probabilities are defined as above or below the threshold by the spp__relative_to_threshold attribute. This cube matches the desired output cube format.

Return type:

Cube

Returns:

A cube of diagnostic data expressed as probabilities relative to the thresholds found in the probability_cube_template.
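The probability calculation can be sketched with scipy.stats: for “above”-threshold probabilities, evaluate the survival function (1 − CDF) of the distribution at each threshold. The grids and thresholds below are hypothetical, and the real plugin reads them from iris Cubes:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical location and scale grids and a set of thresholds.
location = np.array([[280.0, 281.0], [282.0, 283.0]])
scale = np.array([[1.0, 1.5], [2.0, 0.5]])
thresholds = [278.0, 281.0, 284.0]

# For "above"-threshold probabilities, evaluate the survival function
# (1 - CDF) of the distribution at each threshold; for "below", the CDF
# itself would be used instead.
probs_above = np.stack(
    [norm.sf(t, loc=location, scale=scale) for t in thresholds]
)
print(probs_above.shape)  # (3, 2, 2)
```

The leading threshold dimension mirrors the expected template cube layout, and the probabilities decrease monotonically as the threshold rises.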

class ConvertProbabilitiesToPercentiles(ecc_bounds_warning=False, mask_percentiles=False, skip_ecc_bounds=False)[source]

Bases: BasePlugin

Class for generating percentiles from probabilities. In combination with the Ensemble Reordering plugin, this is a variant of Ensemble Copula Coupling.

This class includes the ability to interpolate between probabilities specified using multiple thresholds in order to generate the percentiles, see Figure 1 from Flowerdew, 2014.

Scientific Reference: Flowerdew, J., 2014. Calibrated ensemble reliability whilst preserving spatial structure. Tellus Series A, Dynamic Meteorology and Oceanography, 66, 22662.

__init__(ecc_bounds_warning=False, mask_percentiles=False, skip_ecc_bounds=False)[source]

Initialise the class.

Parameters:
  • ecc_bounds_warning (bool) – If true and ECC bounds are exceeded by the percentile values, a warning will be generated rather than an exception. Defaults to False.

  • mask_percentiles (bool) – A boolean determining whether the final percentiles should be masked. If True, wherever the percentile exceeds the probability of the diagnostic existing, the output percentile is masked. The probability of being below the final threshold in forecast_probabilities is used as the probability of the diagnostic existing. For example, if at some grid square the probability of cloud base being below 15000 m (the highest threshold) is 0.7, then every percentile above the 70th would be masked.

  • skip_ecc_bounds (bool) – If true, the use of the ECC bounds is skipped. This has the effect that percentiles outside of the range given by the input percentiles will be computed by nearest neighbour interpolation from the nearest available percentile, rather than using linear interpolation between the nearest available percentile and the ECC bound.

_add_bounds_to_thresholds_and_probabilities(threshold_points, probabilities_for_cdf, bounds_pairing)[source]

Pad the threshold_points with the lower and upper bounds of the distribution for a given phenomenon, and pad the forecast probabilities with 0 and 1.

Parameters:
  • threshold_points (ndarray) – Array of threshold values used to calculate the probabilities.

  • probabilities_for_cdf (ndarray) – Array containing the probabilities used for constructing a cumulative distribution function, i.e. probabilities below threshold.

  • bounds_pairing (Tuple[int, int]) – Lower and upper bound to be used as the ends of the cumulative distribution function.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • Array of threshold values padded with the lower and upper bound of the distribution.

  • Array containing the probabilities padded with 0 and 1 at each end.

Raises:

ValueError – If the thresholds exceed the ECC bounds for the diagnostic and self.ecc_bounds_warning is False.

Warns:

Warning – If the thresholds exceed the ECC bounds for the diagnostic and self.ecc_bounds_warning is True.
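The padding step can be sketched with numpy, using hypothetical thresholds, probabilities, and ECC bounds (the real bounds come from per-diagnostic lookup tables within IMPROVER):

```python
import numpy as np

# Hypothetical thresholds, below-threshold probabilities, and ECC bounds.
threshold_points = np.array([273.0, 278.0, 283.0])
probabilities_for_cdf = np.array([0.1, 0.6, 0.9])
bounds_pairing = (233.15, 333.15)

# Anchor the CDF: pad the thresholds with the distribution bounds and the
# probabilities with 0 and 1 at each end.
padded_thresholds = np.concatenate(
    [[bounds_pairing[0]], threshold_points, [bounds_pairing[1]]]
)
padded_probs = np.concatenate([[0.0], probabilities_for_cdf, [1.0]])
print(padded_thresholds, padded_probs)
```

The padded arrays define a complete CDF running from probability 0 at the lower bound to probability 1 at the upper bound, ready for interpolation.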

_probabilities_to_percentiles(forecast_probabilities, percentiles)[source]

Conversion of probabilities to percentiles through the construction of a cumulative distribution function. This is effectively constructed by linear interpolation from the probabilities associated with each threshold to a set of percentiles.

Parameters:
  • forecast_probabilities (Cube) – Cube with a threshold coordinate.

  • percentiles (ndarray) – Array of percentiles, at which the corresponding values will be calculated.

Return type:

Cube

Returns:

Cube containing values for the required diagnostic e.g. air_temperature at the required percentiles.

Raises:

NotImplementedError – If the threshold coordinate has an spp__relative_to_threshold attribute that is not either “above” or “below”.

Warns:

Warning – If the probability values are not ascending, so the resulting cdf is not monotonically increasing.
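The inversion of the CDF described above can be sketched with numpy.interp, here using hypothetical padded thresholds and probabilities:

```python
import numpy as np

# A CDF sampled at padded thresholds (hypothetical values): probabilities
# of being below each threshold, anchored at 0 and 1 by the ECC bounds.
padded_thresholds = np.array([233.15, 273.0, 278.0, 283.0, 333.15])
padded_probs = np.array([0.0, 0.1, 0.6, 0.9, 1.0])

percentiles = np.array([25.0, 50.0, 75.0])
# Invert the CDF by linear interpolation: for each target probability,
# find the value at which the CDF reaches it.
values = np.interp(percentiles / 100.0, padded_probs, padded_thresholds)
print(values)  # [274.5 277.  280.5]
```

numpy.interp requires the probabilities to be ascending, which is why a non-monotonic CDF triggers the warning documented above.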

process(forecast_probabilities, no_of_percentiles=None, percentiles=None, sampling='quantile')[source]
  1. Concatenates cubes with a threshold coordinate.

  2. Creates a list of percentiles.

  3. Accesses the lower and upper bound pair to find the ends of the cumulative distribution function.

  4. Converts the threshold coordinate into values at a set of percentiles using linear interpolation; see Figure 1 from Flowerdew, 2014.

Parameters:
  • forecast_probabilities (Cube) – Cube containing a threshold coordinate.

  • no_of_percentiles (Optional[int]) – Number of percentiles. If None and percentiles is not set, the number of thresholds within the input forecast_probabilities cube is used as the number of percentiles. This argument is mutually exclusive with percentiles.

  • percentiles (Optional[List[float]]) – The desired percentile values in the interval [0, 100]. This argument is mutually exclusive with no_of_percentiles.

  • sampling (str) –

    Type of sampling of the distribution to produce a set of percentiles e.g. quantile or random.

    Accepted options for sampling are:

    • Quantile: A regular set of equally-spaced percentiles aimed at dividing a Cumulative Distribution Function into blocks of equal probability.

    • Random: A random set of ordered percentiles.

Return type:

Cube

Returns:

Cube with forecast values at the desired set of percentiles. The threshold coordinate is always the zeroth dimension.

Raises:

ValueError – If both no_of_percentiles and percentiles are provided.
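The two sampling options can be sketched as follows; the quantile formula shown is one plausible formulation of equal-probability spacing, not necessarily the exact expression used internally:

```python
import numpy as np

n = 4  # hypothetical number of percentiles requested

# "quantile" sampling: equally spaced percentiles that divide the CDF
# into blocks of equal probability (one plausible formulation).
quantile_percentiles = 100 * np.arange(1, n + 1) / (n + 1)

# "random" sampling: a sorted set of random percentiles.
rng = np.random.default_rng(0)
random_percentiles = np.sort(100 * rng.random(n))
print(quantile_percentiles)  # [20. 40. 60. 80.]
```

Either set of percentiles is then fed into the CDF inversion to produce the output values.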

class EnsembleReordering[source]

Bases: BasePlugin

Plugin for applying the reordering step of Ensemble Copula Coupling, in order to generate ensemble realizations with multivariate structure from percentiles. The percentiles are assumed to be in ascending order.

Reference: Schefzik, R., Thorarinsdottir, T.L. & Gneiting, T., 2013. Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling. Statistical Science, 28(4), pp.616-640.

static _check_input_cube_masks(post_processed_forecast, raw_forecast)[source]

Checks that if the raw_forecast is masked the post_processed_forecast is also masked. The code supports the post_processed_forecast being masked even if the raw_forecast isn’t masked, but not vice versa.

If both post_processed_forecast and raw_forecast are masked checks that both input cubes have the same mask applied to each x-y slice.

Parameters:
  • post_processed_forecast (Cube) – The cube containing the post-processed forecast realizations.

  • raw_forecast (Cube) – The cube containing the raw (not post-processed) forecast.

Raises:
  • ValueError – If only the raw_forecast is masked.

  • ValueError – If the post_processed_forecast does not have the same mask on all x-y slices.

  • ValueError – If the raw_forecast x-y slices do not all have the same mask as the post_processed_forecast.

static _recycle_raw_ensemble_realizations(post_processed_forecast_percentiles, raw_forecast_realizations, percentile_coord_name)[source]

Function to determine whether there is a mismatch between the number of percentiles and the number of raw forecast realizations. If more percentiles are requested than ensemble realizations, then the ensemble realizations are recycled. This assumes that the identity of the ensemble realizations within the raw ensemble forecast is random, such that the raw ensemble realizations are exchangeable. If fewer percentiles are requested than ensemble realizations, then only the first n ensemble realizations are used.

Parameters:
  • post_processed_forecast_percentiles (Cube) – Cube for post-processed percentiles. The percentiles are assumed to be in ascending order.

  • raw_forecast_realizations (Cube) – Cube containing the raw (not post-processed) forecasts.

  • percentile_coord_name (str) – Name of required percentile coordinate.

Return type:

Cube

Returns:

Cube for the raw ensemble forecast, where the raw ensemble realizations have either been recycled or constrained, depending upon the number of percentiles present in the post-processed forecast cube.
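The recycling logic can be sketched on realization indices alone, using hypothetical counts (the real method slices iris Cubes along the realization coordinate):

```python
import numpy as np

# Hypothetical counts: five percentiles requested from three raw
# ensemble realizations.
n_percentiles = 5
realization_numbers = np.array([0, 1, 2])

if n_percentiles > len(realization_numbers):
    # Recycle the realizations cyclically until there are enough, relying
    # on the exchangeability assumption described above.
    selected = np.resize(realization_numbers, n_percentiles)
else:
    # Otherwise use only the first n realizations.
    selected = realization_numbers[:n_percentiles]
print(selected)  # [0 1 2 0 1]
```

np.resize repeats the array cyclically, which matches the recycling behaviour described: realizations 0 and 1 are reused to make up the shortfall.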

process(post_processed_forecast, raw_forecast, random_ordering=False, random_seed=None)[source]

Reorder post-processed forecast using the ordering of the raw ensemble.

Parameters:
  • post_processed_forecast (Cube) – The cube containing the post-processed forecast realizations.

  • raw_forecast (Cube) – The cube containing the raw (not post-processed) forecast.

  • random_ordering (bool) – If random_ordering is True, the post-processed forecasts are reordered randomly, rather than using the ordering of the raw ensemble.

  • random_seed (Optional[int]) – If random_seed is an integer, the integer value is used for the random seed. If random_seed is None, no random seed is set, so the random values generated are not reproducible.

Return type:

Cube

Returns:

Cube containing the new ensemble realizations where all points within the dataset have been reordered in comparison to the input percentiles. This cube contains the same ensemble realization numbers as the raw forecast.

static rank_ecc(post_processed_forecast_percentiles, raw_forecast_realizations, random_ordering=False, random_seed=None)[source]

Function to apply Ensemble Copula Coupling. This ranks the post-processed forecast realizations based on a ranking determined from the raw forecast realizations.

Parameters:
  • post_processed_forecast_percentiles (Cube) – Cube for post-processed percentiles. The percentiles are assumed to be in ascending order.

  • raw_forecast_realizations (Cube) – Cube containing the raw (not post-processed) forecasts. The probabilistic dimension is assumed to be the zeroth dimension.

  • random_ordering (bool) – If random_ordering is True, the post-processed forecasts are reordered randomly, rather than using the ordering of the raw ensemble.

  • random_seed (Optional[int]) – If random_seed is an integer, the integer value is used for the random seed. If random_seed is None, no random seed is set, so the random values generated are not reproducible.

Return type:

Cube

Returns:

Cube for post-processed realizations where at a particular grid point, the ranking of the values within the ensemble matches the ranking from the raw ensemble.
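The reordering at a single grid point can be sketched with a double argsort, using hypothetical values (the real method applies this slice-by-slice across the cube):

```python
import numpy as np

# Hypothetical values at a single grid point: post-processed percentile
# values in ascending order, and raw ensemble realization values.
post_processed = np.array([1.2, 1.5, 1.9, 2.4])
raw = np.array([2.0, 0.5, 3.1, 1.7])

# Rank each raw member (0 = smallest), then assign the sorted
# post-processed values so their ordering mirrors the raw ensemble.
ranks = np.argsort(np.argsort(raw))
reordered = post_processed[ranks]
print(reordered)  # [1.9 1.2 2.4 1.5]
```

After reordering, the smallest post-processed value sits where the raw ensemble had its smallest member, and so on up the ranking, which is how ECC transfers the raw ensemble's rank structure onto the calibrated values.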

class RebadgePercentilesAsRealizations[source]

Bases: BasePlugin

Class to rebadge percentiles as ensemble realizations. This allows the quantisation to percentiles to be completed without a subsequent EnsembleReordering step to restore spatial correlations, if that step is not required.

static process(cube, ensemble_realization_numbers=None)[source]

Rebadge percentiles as ensemble realizations. The ensemble realization numbering will depend upon the number of percentiles in the input cube i.e. 0, 1, 2, 3, …, n-1, if there are n percentiles.

Parameters:
  • cube (Cube) – Cube containing a percentile coordinate, which will be rebadged as ensemble realization.

  • ensemble_realization_numbers (Optional[ndarray]) – An array containing the ensemble numbers required in the output realization coordinate. Default is None, meaning the realization coordinate will be numbered 0, 1, 2 … n-1 for n percentiles on the input cube.

Return type:

Cube

Returns:

Cube with the percentile coordinate rebadged as a realization coordinate.

Raises:

InvalidCubeError – If the realization coordinate already exists on the cube.

class RebadgeRealizationsAsPercentiles(optimal_crps_percentiles=False)[source]

Bases: BasePlugin

Class to rebadge realizations as percentiles.

__init__(optimal_crps_percentiles=False)[source]

Initialise the class.

Parameters:

optimal_crps_percentiles (Optional[bool]) – If True, percentiles are computed following the recommendation of Bröcker, 2012 for optimising the CRPS using the equation: q = (i-0.5)/N, i=1,…,N, where N is the number of realizations. If False, percentiles are computed as equally spaced following the equation: q = i/(1+N), i=1,…,N. Defaults to False.

References

Bröcker, J. (2012), Evaluating raw ensembles with the continuous ranked probability score. Q.J.R. Meteorol. Soc., 138: 1611-1617. https://doi.org/10.1002/qj.1891

Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta–RSM Short-Range Ensemble Forecasts. Mon. Wea. Rev., 125, 1312–1327, https://doi.org/10.1175/1520-0493(1997)125<1312:VOERSR>2.0.CO;2.
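The two percentile formulations from the parameter description can be computed directly, here for a hypothetical ensemble of four realizations:

```python
import numpy as np

n = 4  # hypothetical number of realizations
i = np.arange(1, n + 1)

# Bröcker (2012) percentiles, optimising the CRPS: q = (i - 0.5) / N.
optimal = 100 * (i - 0.5) / n
# Equally spaced alternative: q = i / (1 + N).
equal = 100 * i / (1 + n)
print(optimal, equal)  # [12.5 37.5 62.5 87.5] [20. 40. 60. 80.]
```

The Bröcker percentiles sit at the centres of equal-probability blocks, whereas the equally spaced set leaves equal gaps at both tails.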

process(cube)[source]

Convert a cube of realizations into percentiles by sorting the cube along the realization dimension and rebadging the realization coordinate as a percentile coordinate.

Parameters:

cube (Cube) – Cube containing realizations.

Return type:

Cube

Returns:

Cube containing percentiles.

class ResamplePercentiles(ecc_bounds_warning=False, skip_ecc_bounds=False)[source]

Bases: BasePlugin

Class for resampling percentiles from an existing set of percentiles. In combination with the Ensemble Reordering plugin, this is a variant of Ensemble Copula Coupling.

This class includes the ability to linearly interpolate from an input set of percentiles to a different output set of percentiles.

__init__(ecc_bounds_warning=False, skip_ecc_bounds=False)[source]

Initialise the class.

Parameters:
  • ecc_bounds_warning (bool) – If true and ECC bounds are exceeded by the percentile values, a warning will be generated rather than an exception. Defaults to False.

  • skip_ecc_bounds (bool) – If true, the usage of the ECC bounds is skipped. This has the effect that percentiles outside of the range given by the input percentiles will be computed by nearest neighbour interpolation from the nearest available percentile, rather than using linear interpolation between the nearest available percentile and the ECC bound.

_add_bounds_to_percentiles_and_forecast_at_percentiles(percentiles, forecast_at_percentiles, bounds_pairing)[source]

Pad the percentiles at their lower and upper ends for a given phenomenon, and pad the forecast values using the constant lower and upper bounds.

Parameters:
  • percentiles (ndarray) – Array of percentiles from a Cumulative Distribution Function.

  • forecast_at_percentiles (ndarray) – Array containing the underlying forecast values at each percentile.

  • bounds_pairing (Tuple[int, int]) – Lower and upper bound to be used as the ends of the cumulative distribution function.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • Percentiles

  • Forecast at percentiles with endpoints

Raises:
  • ValueError – If the percentile points are outside the ECC bounds and self.ecc_bounds_warning is False.

  • ValueError – If the percentiles are not in ascending order.

Warns:

Warning – If the percentile points are outside the ECC bounds and self.ecc_bounds_warning is True.

_interpolate_percentiles(forecast_at_percentiles, desired_percentiles, percentile_coord_name)[source]

Interpolation of the forecast from an initial set of percentiles to a new set of percentiles, constructed by linearly interpolating between the original percentiles.

Parameters:
  • forecast_at_percentiles (Cube) – Cube containing a percentile coordinate.

  • desired_percentiles (ndarray) – Array of the desired percentiles.

  • percentile_coord_name (str) – Name of required percentile coordinate.

Return type:

Cube

Returns:

Cube containing values for the required diagnostic e.g. air_temperature at the required percentiles.
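The interpolation can be sketched with numpy.interp, using hypothetical forecast values and percentile sets (chosen so the desired percentiles lie within the input range, avoiding the ECC-bound padding step):

```python
import numpy as np

# Hypothetical forecast values at an initial set of percentiles.
input_percentiles = np.array([20.0, 50.0, 80.0])
forecast_at_percentiles = np.array([275.0, 278.0, 281.0])

# Linearly interpolate to the desired set of percentiles.
desired_percentiles = np.array([25.0, 50.0, 75.0])
resampled = np.interp(
    desired_percentiles, input_percentiles, forecast_at_percentiles
)
print(resampled)  # [275.5 278.  280.5]
```

Desired percentiles outside the input range would instead be handled via the ECC bounds (or nearest-neighbour interpolation when skip_ecc_bounds is set), as described in the class initialiser.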

process(forecast_at_percentiles, no_of_percentiles=None, sampling='quantile', percentiles=None)[source]
  1. Creates a list of percentiles, if not provided.

  2. Accesses the lower and upper bound pair of the forecast values, in order to specify lower and upper bounds for the percentiles.

  3. Interpolates the percentile coordinate into an alternative set of percentiles using linear interpolation.

Parameters:
  • forecast_at_percentiles (Cube) – Cube expected to contain a percentile coordinate.

  • no_of_percentiles (Optional[int]) – Number of percentiles. If None, the number of percentiles within the input forecast_at_percentiles cube is used as the number of percentiles.

  • sampling (Optional[str]) –

    Type of sampling of the distribution to produce a set of percentiles e.g. quantile or random.

    Accepted options for sampling are:

    • Quantile: A regular set of equally-spaced percentiles aimed at dividing a Cumulative Distribution Function into blocks of equal probability.

    • Random: A random set of ordered percentiles.

  • percentiles (Optional[List]) – List of the desired output percentiles.

Return type:

Cube

Returns:

Cube with forecast values at the desired set of percentiles. The percentile coordinate is always the zeroth dimension.

Raises:

ValueError – If the supplied percentiles are not between 0 and 100.