improver.ensemble_copula_coupling.utilities module#

This module defines the utilities required for Ensemble Copula Coupling plugins.

class CalculatePercentilesFromIntensityDistribution(distribution='gamma', nan_mask_value=0.0, scale_percentiles_to_probability_lower_bound=False)[source]#

Bases: BasePlugin

Plugin to calculate percentiles at which provided intensity values would fall, according to a fitted probability distribution (currently only gamma).

__init__(distribution='gamma', nan_mask_value=0.0, scale_percentiles_to_probability_lower_bound=False)[source]#

Initialise the plugin.

Parameters:
  • distribution (str) – Type of distribution to fit (currently only ‘gamma’ is supported).

  • nan_mask_value (Optional[float]) – Value to mask as NaN before calculating mean and std. This option might be most useful for a diagnostic, such as precipitation rate, where there is a high frequency of zero values. If None, no masking is performed. Default is 0.0.

  • scale_percentiles_to_probability_lower_bound (bool) – If True, the minimum value of the calculated percentiles will be set to the minimum CDF probability implied by the input probabilities, rather than zero. This has the effect of restricting the percentiles to the non-zero part of the distribution, which is useful when there is a high probability of zero values (e.g., for precipitation). When False, percentiles are calculated over the full [0, 1] range, regardless of the input probabilities. Default is False.

_calculate_percentiles_from_intensity_distribution(probability_cube, intensity_cube)[source]#

For each spatial location, calculate the percentiles at which the provided intensity values would fall, according to a fitted probability distribution.

Parameters:
  • probability_cube (Cube) – Cube containing probability data at a range of thresholds.

  • intensity_cube (Cube) – Cube containing the intensity data to be mapped to percentiles.

Returns:

An array of percentiles (CDF values) for each intensity at each location.

References

Scheuerer, M., and T. M. Hamill, 2015: Statistical Postprocessing of Ensemble Precipitation Forecasts by Fitting Censored, Shifted Gamma Distributions. Mon. Wea. Rev., 143, 4578–4596, https://doi.org/10.1175/MWR-D-15-0061.1.

Wilks, D. S., 2019: Statistical Methods in the Atmospheric Sciences, Academic Press.
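The nan_mask_value description refers to a mean and standard deviation, which suggests a moment-based fit. The following is a minimal sketch of method-of-moments gamma parameter estimation with masking, using only NumPy. It is not taken from the plugin's source: the helper name gamma_moments is hypothetical, and the plugin's actual fitting procedure may differ.

```python
import numpy as np


def gamma_moments(values, nan_mask_value=0.0):
    """Estimate gamma shape and scale by the method of moments (illustrative)."""
    data = np.asarray(values, dtype=float)
    if nan_mask_value is not None:
        # Exclude the masked value (e.g. dry points) from the moments.
        data = np.where(data == nan_mask_value, np.nan, data)
    mean = np.nanmean(data)
    var = np.nanvar(data)
    # For a gamma distribution: mean = shape * scale, variance = shape * scale**2.
    shape = mean**2 / var
    scale = var / mean
    return shape, scale


# Two zero values are masked, so only 1.0, 2.0 and 3.0 contribute.
shape, scale = gamma_moments([0.0, 0.0, 1.0, 2.0, 3.0])
```

With the zeros masked, the fitted parameters reproduce the mean of the non-zero values (shape * scale equals 2.0 here).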

static _scale_percentiles_to_probability_lower_bound(percentiles, probabilities, nan_mask_value=0.0)[source]#

Rescale percentiles based on the provided probabilities, with optional NaN masking. The rescaling uses the minimum probability after converting the probabilities from “greater than” a given threshold to “less than” that threshold. As a result, when the percentiles are later used to sample the probabilities, they will not sample below this minimum probability. This is helpful for distributions with a high frequency of zero values, where the probability of, for example, the precipitation being less than a particular value can be large. The rescaling therefore focuses the percentiles on the non-zero part of the distribution.

Parameters:
  • percentiles (ndarray) – Array of percentiles to be rescaled.

  • probabilities (ndarray) – Array of probabilities for scaling.

  • nan_mask_value (Optional[float]) – Value to mask as NaN before rescaling. If None, no masking is performed.

Returns:

Rescaled percentiles array.
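The rescaling described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes a linear mapping of [0, 1] onto [p_min, 1], takes the minimum over the whole array rather than per location, and omits the nan_mask_value handling. The function name rescale_to_lower_bound is hypothetical.

```python
import numpy as np


def rescale_to_lower_bound(percentiles, exceedance_probabilities):
    """Map percentiles from [0, 1] onto [p_min, 1] (illustrative only)."""
    # Convert "greater than" probabilities to "less than" (CDF) values.
    below = 1.0 - np.asarray(exceedance_probabilities, dtype=float)
    p_min = np.nanmin(below)
    # Linear rescaling: 0 maps to p_min and 1 stays at 1.
    return p_min + np.asarray(percentiles, dtype=float) * (1.0 - p_min)


# The largest exceedance probability is 0.4, so p_min is 0.6.
scaled = rescale_to_lower_bound([0.0, 0.5, 1.0], [0.4, 0.2, 0.05])
```

In this example no rescaled percentile falls below 0.6, the probability of being below the lowest threshold.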

process(probability_cube, intensity_cube)[source]#

Public interface to calculate percentiles from intensity distribution.

Parameters:
  • probability_cube (Cube) – Cube containing probability data at a range of thresholds.

  • intensity_cube (Cube) – Cube containing the intensity data to be mapped to percentiles.

Return type:

ndarray

Returns:

A 3D array of percentiles (CDF values) for each intensity at each location.

choose_set_of_percentiles(no_of_percentiles, sampling='quantile', probability_cube=None, intensity_cube=None, distribution='gamma', nan_mask_value=0.0, scale_percentiles_to_probability_lower_bound=True)[source]#

Create a set of percentiles using the chosen sampling technique.

Parameters:
  • no_of_percentiles (int) – Number of percentiles.

  • sampling (str) –

    Type of sampling of the distribution to produce a set of percentiles e.g. quantile, random or transformation.

    Accepted options for sampling are:

    • Quantile: A regular set of equally-spaced percentiles aimed at dividing a Cumulative Distribution Function into blocks of equal probability. This is the default option.

    • Random: A random set of ordered percentiles.

    • Transformation: A set of percentiles generated by applying a transformation to the distribution.

  • probability_cube (Optional[Cube]) – Cube containing probability data at a range of thresholds.

  • intensity_cube (Optional[Cube]) – Cube containing the intensity data to be mapped to percentiles.

  • distribution (Optional[str]) – Type of distribution to fit (currently only ‘gamma’ is supported).

  • nan_mask_value (Optional[float]) – Value to mask as NaN before calculating mean and std. If None, no masking is performed. Default is 0.0.

  • scale_percentiles_to_probability_lower_bound (bool) – If True, the minimum value of the calculated percentiles will be set to the minimum CDF probability implied by the input probabilities, rather than zero. This has the effect of restricting the percentiles to the non-zero part of the distribution, which is useful when there is a high probability of zero values (e.g., for precipitation). When False, percentiles are calculated over the full [0, 1] range, regardless of the input probabilities. Default is True.

Return type:

ndarray

Returns:

Percentiles calculated using the sampling technique specified as a numpy array.

Raises:

ValueError – if the sampling option is not one of the accepted options.

References

For further details, see:

Flowerdew, J., 2014. Calibrating ensemble reliability whilst preserving spatial structure. Tellus, Series A: Dynamic Meteorology and Oceanography, 66(1), pp.1-20.

Schefzik, R., Thorarinsdottir, T.L. & Gneiting, T., 2013. Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling. Statistical Science, 28(4), pp.616-640.
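The quantile and random sampling options can be illustrated with a short NumPy sketch. It assumes that the equally spaced percentiles sit at i / (n + 1) for i = 1..n and that the result is expressed on a 0 to 100 scale. Both are assumptions rather than confirmed behaviour of choose_set_of_percentiles, and the transformation option is not covered. The function name sketch_percentiles and the seed argument are hypothetical.

```python
import numpy as np


def sketch_percentiles(no_of_percentiles, sampling="quantile", seed=None):
    """Return a set of percentiles on a 0-100 scale (illustrative only)."""
    n = no_of_percentiles
    # Assumed end points: the first and last of n equal-probability blocks.
    low, high = 1.0 / (n + 1), n / (n + 1.0)
    if sampling == "quantile":
        fractions = np.linspace(low, high, n)
    elif sampling == "random":
        rng = np.random.default_rng(seed)
        # Random draws are sorted so that the percentiles are ordered.
        fractions = np.sort(rng.uniform(low, high, n))
    else:
        raise ValueError(f"Sampling option not covered by this sketch: {sampling}")
    return fractions * 100.0


quantiles = sketch_percentiles(3)  # three equally spaced percentiles
randoms = sketch_percentiles(5, sampling="random", seed=0)
```

Under these assumptions, three quantile percentiles fall at 25, 50 and 75, splitting the distribution into four blocks of equal probability.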

concatenate_2d_array_with_2d_array_endpoints(array_2d, low_endpoint, high_endpoint)[source]#

For a 2d array, add a 2d array as the lower and upper endpoints. The concatenation that adds the lower and upper endpoints to the 2d array is performed along the second (index 1) dimension.

Parameters:
  • array_2d (ndarray) – 2d array of values

  • low_endpoint (float) – Number used to create a 2d array of a constant value as the lower endpoint.

  • high_endpoint (float) – Number used to create a 2d array of a constant value as the upper endpoint.

Return type:

ndarray

Returns:

2d array of values after padding with the low_endpoint and high_endpoint.
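The padding described here can be reproduced with plain NumPy. The sketch below shows the equivalent operation on example values; it is not the function's source code.

```python
import numpy as np

array_2d = np.array([[1.0, 2.0], [3.0, 4.0]])
low_endpoint, high_endpoint = 0.0, 10.0

# Build one constant column per endpoint, matching the number of rows.
low_column = np.full((array_2d.shape[0], 1), low_endpoint)
high_column = np.full((array_2d.shape[0], 1), high_endpoint)

# Concatenate along the second dimension (axis=1).
padded = np.concatenate((low_column, array_2d, high_column), axis=1)
```

Each row gains one leading and one trailing value, so a (2, 2) input becomes a (2, 4) output.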

create_cube_with_percentile_index(indices, template_cube, cube_data, cube_unit=None)[source]#

Create a cube with a percentile_index coordinate based on a template cube. The resulting cube will have an extra percentile_index coordinate compared with the template cube. The shape of the cube_data should be the shape of the desired output cube.

Parameters:
  • indices (Union[List[int], ndarray]) – Indices to use for the percentile_index coordinate. There should be the same number of indices as the first dimension of cube_data.

  • template_cube (Cube) – Cube to copy metadata from.

  • cube_data (ndarray) – Data to insert into the template cube. The shape of the cube_data, excluding the dimension associated with the percentile_index coordinate, should be the same as the shape of template_cube.

  • cube_unit (Union[Unit, str, None]) – The units of the data within the cube, if different from those of the template_cube.

Return type:

Cube

Returns:

Cube containing a percentile_index coordinate as the leading dimension (or scalar percentile_index coordinate if single-valued)

create_cube_with_percentiles(percentiles, template_cube, cube_data, cube_unit=None)[source]#

Create a cube with a percentile coordinate based on a template cube. The resulting cube will have an extra percentile coordinate compared with the template cube. The shape of the cube_data should be the shape of the desired output cube.

Parameters:
  • percentiles (Union[List[float], ndarray]) – Ensemble percentiles. There should be the same number of percentiles as the first dimension of cube_data.

  • template_cube (Cube) – Cube to copy metadata from.

  • cube_data (ndarray) – Data to insert into the template cube. The shape of the cube_data, excluding the dimension associated with the percentile coordinate, should be the same as the shape of template_cube. For example, template_cube shape is (3, 3, 3), whilst the cube_data is (10, 3, 3, 3), where there are 10 percentiles.

  • cube_unit (Union[Unit, str, None]) – The units of the data within the cube, if different from those of the template_cube.

Return type:

Cube

Returns:

Cube containing a percentile coordinate as the leading dimension (or scalar percentile coordinate if single-valued)

get_bounds_of_distribution(bounds_pairing_key, desired_units)[source]#

Gets the bounds of the distribution and converts the units of the bounds_pairing to the desired_units.

This method gets the bounds values and units from the imported dictionaries: BOUNDS_FOR_ECDF and units_of_BOUNDS_FOR_ECDF. The units of the bounds are converted to be the desired units.

Parameters:
  • bounds_pairing_key (str) – Name of key to be used for the BOUNDS_FOR_ECDF dictionary, in order to get the desired bounds_pairing.

  • desired_units (Unit) – Units to which the bounds_pairing will be converted.

Return type:

ndarray

Returns:

Lower and upper bound to be used as the ends of the empirical cumulative distribution function, converted to have the desired units.

Raises:

KeyError – If the bounds_pairing_key is not within the BOUNDS_FOR_ECDF dictionary.

insert_lower_and_upper_endpoint_to_1d_array(array_1d, low_endpoint, high_endpoint)[source]#

For a 1d array, add a lower and upper endpoint.

Parameters:
  • array_1d (ndarray) – 1d array of values

  • low_endpoint (float) – Number to use as the lower endpoint.

  • high_endpoint (float) – Number to use as the upper endpoint.

Return type:

ndarray

Returns:

1d array of values padded with the low_endpoint and high_endpoint.
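An equivalent operation in plain NumPy, shown on example values as an illustration rather than as the function's source code:

```python
import numpy as np

array_1d = np.array([10.0, 50.0, 90.0])
low_endpoint, high_endpoint = 0.0, 100.0

# Prepend the lower endpoint and append the upper endpoint.
padded = np.concatenate(([low_endpoint], array_1d, [high_endpoint]))
```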

interpolate_multiple_rows_same_x(*args)[source]#

For each row i of fp, do the equivalent of np.interp(x, xp, fp[i, :]).

Calls a fast numba implementation where numba is available (see improver.ensemble_copula_coupling.numba_utilities.fast_interp_same_x) and calls the native Python implementation otherwise (see slow_interp_same_x()).

Parameters:
  • x – 1-D array

  • xp – 1-D array, sorted in non-decreasing order

  • fp – 2-D array with len(xp) columns

Returns:

2-D array with shape (len(fp), len(x)), with each row i equal to np.interp(x, xp, fp[i, :])
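The row-wise equivalence stated above can be written directly as a loop over np.interp. This is a reference sketch of the documented behaviour, not the numba implementation; the function name interp_rows_same_x is hypothetical.

```python
import numpy as np


def interp_rows_same_x(x, xp, fp):
    """Interpolate every row of fp at the same x, sharing the same xp."""
    result = np.empty((fp.shape[0], len(x)), dtype=float)
    for i in range(fp.shape[0]):
        result[i] = np.interp(x, xp, fp[i, :])
    return result


x = np.array([0.5, 1.5])
xp = np.array([0.0, 1.0, 2.0])
fp = np.array([[0.0, 10.0, 20.0], [5.0, 5.0, 15.0]])
rows = interp_rows_same_x(x, xp, fp)
```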

interpolate_multiple_rows_same_y(*args)[source]#

For each row i of xp, do the equivalent of np.interp(x, xp[i], fp).

Calls a fast numba implementation where numba is available (see improver.ensemble_copula_coupling.numba_utilities.fast_interp_same_y) and calls the native Python implementation otherwise (see slow_interp_same_y()).

Parameters:
  • x – 1-d array

  • xp – n * m array, each row must be in non-decreasing order

  • fp – 1-d array with length m

Returns:

n * len(x) array where each row i is equal to np.interp(x, xp[i], fp)
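Here the roles are reversed relative to the same-x case: each row has its own xp, while x and fp are shared. A reference sketch of the documented behaviour follows; the function name interp_rows_same_y is hypothetical.

```python
import numpy as np


def interp_rows_same_y(x, xp, fp):
    """Interpolate at the same x for every row of xp, sharing the same fp."""
    return np.stack([np.interp(x, xp[i], fp) for i in range(xp.shape[0])])


x = np.array([1.0, 3.0])
xp = np.array([[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]])
fp = np.array([0.0, 50.0, 100.0])
rows = interp_rows_same_y(x, xp, fp)
```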

restore_non_percentile_dimensions(array_to_reshape, original_cube, n_percentiles)[source]#

Reshape a 2d array, so that it has the dimensions of the original cube, whilst ensuring that the probabilistic dimension is the first dimension.

Parameters:
  • array_to_reshape (ndarray) – The array that requires reshaping. This has dimensions “percentiles” by “points”, where “points” is a flattened array of all the other original dimensions that needs reshaping.

  • original_cube (Cube) – Cube slice containing the desired shape to be reshaped to, apart from the probabilistic dimension. This would typically be expected to be either [time, y, x] or [y, x].

  • n_percentiles (int) – Length of the required probabilistic dimension (“percentiles”).

Return type:

ndarray

Returns:

The array after reshaping.

Raises:
  • ValueError – If the probabilistic dimension is not the first on the original_cube.

  • CoordinateNotFoundError – If the input_probabilistic_dimension_name is not a coordinate on the original_cube.
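The shape arithmetic behind this reshape can be shown with NumPy alone. The sketch below uses a plain tuple in place of the cube's shape and does not reproduce the function's coordinate checks; the names and values are illustrative.

```python
import numpy as np

n_percentiles = 3
original_shape = (2, 4)  # stands in for the shape of a [y, x] cube slice

# A "percentiles" by "points" array, where points = 2 * 4 flattened locations.
array_to_reshape = np.arange(n_percentiles * 2 * 4).reshape(n_percentiles, -1)

# Keep the probabilistic dimension first and unflatten the remaining points.
restored = array_to_reshape.reshape((n_percentiles,) + original_shape)
```

Flattening the trailing dimensions of the result again recovers the original 2d array, so no values are reordered.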

slow_interp_same_x(x, xp, fp)[source]#

For each row i of fp, calculate np.interp(x, xp, fp[i, :]). Note that a fast_interp_same_x version of this function exists that uses numba. This slow version is retained as a fallback for when numba is not available.

Parameters:
  • x (ndarray) – 1-D array

  • xp (ndarray) – 1-D array, sorted in non-decreasing order

  • fp (ndarray) – 2-D array with len(xp) columns

Return type:

ndarray

Returns:

2-D array with shape (len(fp), len(x)), with each row i equal to np.interp(x, xp, fp[i, :])

slow_interp_same_y(x, xp, fp)[source]#

For each row i of xp, do the equivalent of np.interp(x, xp[i], fp). Note that a fast_interp_same_y version of this function exists that uses numba. This slow version is retained as a fallback for when numba is not available.

Parameters:
  • x (ndarray) – 1-d array

  • xp (ndarray) – n * m array, each row must be in non-decreasing order

  • fp (ndarray) – 1-d array with length m

Return type:

ndarray

Returns:

n * len(x) array where each row i is equal to np.interp(x, xp[i], fp)

slow_interp_same_y_2d(x, xp, fp)[source]#

For each row i of xp, do the equivalent of np.interp(x[i], xp[i], fp). Note that a fast_interp_same_y_2d version of this function exists that uses numba. This slow version is retained as a fallback for when numba is not available.

Parameters:
  • x (ndarray) – 2-d array, shape (n, k)

  • xp (ndarray) – n * m array, each row must be in non-decreasing order

  • fp (ndarray) – 1-d array with length m

Return type:

ndarray

Returns:

n * k array where each row i is equal to np.interp(x[i], xp[i], fp)
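In the 2-d case each row of x is paired with the matching row of xp. A reference sketch of the documented behaviour follows; the function name interp_rows_same_y_2d is hypothetical.

```python
import numpy as np


def interp_rows_same_y_2d(x, xp, fp):
    """Interpolate row i of x against row i of xp, sharing the same fp."""
    return np.stack([np.interp(x[i], xp[i], fp) for i in range(xp.shape[0])])


x = np.array([[1.0, 2.0], [2.0, 4.0]])
xp = np.array([[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]])
fp = np.array([0.0, 50.0, 100.0])
rows = interp_rows_same_y_2d(x, xp, fp)
```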

slow_interp_same_y_nd(x, xp, fp)[source]#

Dispatch to functions that handle 1D or 2D x inputs. Note that a fast_interp_same_y_nd version of this function exists that uses numba. This slow version is retained as a fallback for when numba is not available.

Parameters:
  • x (ndarray) – 1-D or 2-D array

  • xp (ndarray) – n * m array, each row must be in non-decreasing order

  • fp (ndarray) – 1-D array with length m

Return type:

ndarray

Returns:

If x is 1-D, returns n * len(x) array where each row i is equal to np.interp(x, xp[i], fp).

If x is 2-D, returns n * k array where each row i is equal to np.interp(x[i], xp[i], fp).

Raises:

ValueError – If x is not 1-D or 2-D.
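The dispatch on the dimensionality of x might look like the following sketch. It mirrors the documented return values and error, but it is not the library's source; the function name interp_same_y_nd and the error message are hypothetical.

```python
import numpy as np


def interp_same_y_nd(x, xp, fp):
    """Dispatch on x.ndim: shared x for 1-D input, per-row x for 2-D input."""
    x = np.asarray(x)
    if x.ndim == 1:
        # The same x is used for every row of xp.
        return np.stack([np.interp(x, xp_row, fp) for xp_row in xp])
    if x.ndim == 2:
        # Row i of x is paired with row i of xp.
        return np.stack(
            [np.interp(x_row, xp_row, fp) for x_row, xp_row in zip(x, xp)]
        )
    raise ValueError(f"x must be 1-D or 2-D, but has {x.ndim} dimensions")


xp = np.array([[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]])
fp = np.array([0.0, 50.0, 100.0])
from_1d = interp_same_y_nd(np.array([1.0, 3.0]), xp, fp)
from_2d = interp_same_y_nd(np.array([[1.0, 2.0], [2.0, 4.0]]), xp, fp)
```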