improver.calibration.reliability_calibration module

Reliability calibration plugins.

class AggregateReliabilityCalibrationTables[source]

Bases: BasePlugin

This plugin enables the aggregation of multiple reliability calibration tables, and/or the aggregation over coordinates in the tables.

static _check_frt_coord(cubes)[source]

Check that the reliability calibration tables do not have overlapping forecast reference time bounds. Overlapping bounds indicate that some of the same forecast data has contributed to more than one table, so aggregating them would double count these contributions.

Parameters:

cubes (Union[List[Cube], CubeList]) – The list of reliability calibration tables for which the forecast reference time coordinates should be checked.

Raises:

ValueError – If the bounds overlap.

Return type:

None
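The overlap check described above can be sketched in plain Python. This is an illustration only, not the plugin's code: the function name `check_no_overlap` and the representation of bounds as `(lower, upper)` numbers (for example hours) are assumptions, as is the choice to treat bounds that merely touch as overlapping.

```python
from typing import List, Tuple


def check_no_overlap(bounds: List[Tuple[float, float]]) -> None:
    """Raise ValueError if any two (lower, upper) bounds overlap.

    Hypothetical helper: each tuple stands in for the forecast reference
    time bounds of one reliability calibration table.
    """
    ordered = sorted(bounds)
    # After sorting by lower bound, any overlap shows up between neighbours.
    for (_, prev_upper), (next_lower, _) in zip(ordered, ordered[1:]):
        # Assumption: a shared end point counts as an overlap.
        if next_lower <= prev_upper:
            raise ValueError("Forecast reference time bounds overlap.")


# Two tables covering separate periods pass silently.
check_no_overlap([(48, 72), (0, 24)])
```

Sorting first keeps the check linear in the number of tables after the sort, because only neighbouring bounds need comparing.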

process(cubes, coordinates=None)[source]

Aggregate the input reliability calibration table cubes and return the result.

Parameters:
  • cubes (Union[CubeList, List[Cube]]) – The cube or cubes containing the reliability calibration tables to aggregate.

  • coordinates (Optional[List[str]]) – A list of coordinates over which to aggregate the reliability calibration table using summation. If the argument is None and a single cube is provided, this cube will be returned unchanged.

Return type:

Cube

Returns:

Aggregated cube
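Aggregation by summation can be illustrated with nested lists standing in for cubes. This is a minimal sketch under stated assumptions: each table is a list of three rows (observation_count, sum_of_forecast_probabilities, forecast_count), each row holds one value per probability bin, and the name `aggregate_tables` is hypothetical.

```python
from typing import List

Table = List[List[float]]  # rows x probability bins


def aggregate_tables(tables: List[Table]) -> Table:
    """Element-wise sum of equally shaped reliability tables."""
    return [
        [sum(values) for values in zip(*rows)]  # sum matching bins
        for rows in zip(*tables)  # group matching rows across tables
    ]


table_a = [[1, 2], [0.5, 1.5], [2, 3]]
table_b = [[0, 1], [0.25, 0.5], [1, 1]]
combined = aggregate_tables([table_a, table_b])
```

Because every row is a count or a sum, adding tables is equivalent to having built one table from all of the contributing forecasts.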

class ApplyReliabilityCalibration(point_by_point=False)[source]

Bases: PostProcessingPlugin

A plugin for the application of reliability calibration to probability forecasts. This calibration is designed to improve the reliability of probability forecasts without significantly degrading their resolution.

The method implemented here is described in Flowerdew J. 2014. Calibration is always applied as long as there are at least two bins within the input reliability table.

References: Flowerdew J. 2014. Calibrating ensemble reliability whilst preserving spatial structure. Tellus, Ser. A Dyn. Meteorol. Oceanogr. 66.

__init__(point_by_point=False)[source]

Initialise class for applying reliability calibration.

Parameters:

point_by_point (bool) – Whether to calibrate each point in the input cube independently. Utilising this option requires that each spatial point in the forecast cube has a corresponding spatial point in the reliability table. Please note this option is memory intensive and is unsuitable for gridded input.

_apply_calibration(forecast, reliability_table)[source]

Apply reliability calibration to a forecast.

Parameters:
  • forecast (Cube) – The forecast to be calibrated.

  • reliability_table (Union[Cube, CubeList]) – The reliability table to use for applying calibration.

Return type:

Cube

Returns:

The forecast cube following calibration.

_apply_point_by_point_calibration(forecast, reliability_table)[source]

Apply point by point reliability calibration by iteratively picking a spatial coordinate within the forecast cube, extracting the forecast at that point and the reliability table corresponding to that point, then passing the extracted forecast and reliability table to _get_calibrated_forecast().

Parameters:
  • forecast (Cube) – The forecast to be calibrated.

  • reliability_table (CubeList) – The reliability table to use for applying calibration.

Return type:

Cube

Returns:

The forecast cube following calibration.

_calculate_reliability_probabilities(reliability_table)[source]

Calculates forecast probabilities and observation frequencies from the reliability table. If fewer than two bins are provided, None is returned for both, as no calibration can be applied. Fewer than two bins can occur due to repeated combination of undersampled probability bins; see ManipulateReliabilityTable.

Parameters:

reliability_table (Cube) – A reliability table for a single threshold from which to calculate the forecast probabilities and observation frequencies.

Return type:

Tuple[Optional[ndarray], Optional[ndarray]]

Returns:

Tuple containing forecast probabilities calculated by dividing the sum of forecast probabilities by the forecast count and observation frequency calculated by dividing the observation count by the forecast count.
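The two divisions described above can be sketched as follows. The function name and the use of plain lists (one value per probability bin) are assumptions for illustration, and the sketch assumes every forecast count is non-zero.

```python
from typing import List, Optional, Tuple


def reliability_probabilities(
    observation_count: List[float],
    sum_of_forecast_probabilities: List[float],
    forecast_count: List[float],
) -> Tuple[Optional[List[float]], Optional[List[float]]]:
    """Return (forecast probabilities, observation frequencies) per bin."""
    if len(forecast_count) < 2:
        # Fewer than two bins: nothing to interpolate between.
        return None, None
    forecast_probability = [
        total / count
        for total, count in zip(sum_of_forecast_probabilities, forecast_count)
    ]
    observation_frequency = [
        observed / count
        for observed, count in zip(observation_count, forecast_count)
    ]
    return forecast_probability, observation_frequency


probabilities, frequencies = reliability_probabilities(
    [10, 60], [20.0, 80.0], [100, 100]
)
```

In this example the first bin has a mean forecast probability of 0.2 but the event occurred only 10% of the time, so forecasts in that bin are over-confident.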

_ensure_monotonicity_across_thresholds(cube)[source]

Ensures that probabilities change monotonically relative to thresholds in the expected order, e.g. exceedance probabilities always remain the same or decrease as the threshold values increase, below threshold probabilities always remain the same or increase as the threshold values increase.

Parameters:

cube (Cube) – The probability cube for which monotonicity is to be checked and enforced. This cube is modified in place.

Raises:

ValueError – Threshold coordinate lacks the spp__relative_to_threshold attribute.

Warns:

UserWarning – If the probabilities must be sorted to reinstate expected monotonicity following calibration.

Return type:

None
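The idea can be sketched for the probabilities at a single point, ordered by increasing threshold value. This is an illustration, not the plugin's code: the function name is hypothetical, the accepted comparison strings are assumed values of the spp__relative_to_threshold attribute, and the sketch returns a new list rather than modifying a cube in place.

```python
import warnings
from typing import List

# Assumed attribute values; treat these as illustrative.
EXCEEDANCE = ("greater_than", "greater_than_or_equal_to")
BELOW = ("less_than", "less_than_or_equal_to")


def enforce_monotonicity(
    probabilities: List[float], relative_to_threshold: str
) -> List[float]:
    """Sort probabilities so they vary monotonically with threshold."""
    if relative_to_threshold in EXCEEDANCE:
        expected = sorted(probabilities, reverse=True)  # non-increasing
    elif relative_to_threshold in BELOW:
        expected = sorted(probabilities)  # non-decreasing
    else:
        raise ValueError(f"Unrecognised comparison: {relative_to_threshold!r}")
    if expected != list(probabilities):
        warnings.warn(
            "Probabilities were sorted to restore monotonicity.", UserWarning
        )
    return expected
```

For exceedance probabilities `[0.9, 0.5, 0.6]` the sketch returns `[0.9, 0.6, 0.5]` and issues a warning, since calibration had broken the expected ordering.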

static _extract_matching_reliability_table(forecast, reliability_table)[source]

Extract the reliability table with a threshold coordinate matching the forecast cube. If no matching reliability table is found, an exception is raised.

Parameters:
  • forecast (Cube) – The forecast to be calibrated.

  • reliability_table (Union[Cube, CubeList]) – The reliability table to use for applying calibration.

Return type:

Cube

Returns:

A reliability table with a threshold coordinate that matches the forecast cube.

Raises:

ValueError – If no matching reliability table is found.

static _interpolate(forecast_threshold, reliability_probabilities, observation_frequencies)[source]

Perform interpolation of the forecast probabilities using the reliability table data to produce the calibrated forecast. Where necessary linear extrapolation will be applied. Any mask in place on the forecast_threshold data is removed and reapplied after calibration.

Parameters:
  • forecast_threshold (Union[MaskedArray, ndarray]) – The forecast probabilities to be calibrated.

  • reliability_probabilities (ndarray) – Probabilities taken from the reliability tables.

  • observation_frequencies (ndarray) – Observation frequencies that relate to the reliability probabilities, taken from the reliability tables.

Return type:

Union[MaskedArray, ndarray]

Returns:

The calibrated forecast probabilities. The final results are clipped to ensure any extrapolation has not yielded probabilities outside the range 0 to 1.
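A minimal sketch of piecewise linear interpolation with linear extrapolation and clipping is given below. It is not the plugin's implementation: the function name `calibrate` is hypothetical, plain lists replace arrays, mask handling is omitted, and the reliability probabilities are assumed to be strictly increasing with at least two entries.

```python
from typing import List


def calibrate(
    forecast: List[float],
    reliability_probabilities: List[float],
    observation_frequencies: List[float],
) -> List[float]:
    """Map forecast probabilities onto observed frequencies."""
    xs, ys = reliability_probabilities, observation_frequencies
    calibrated = []
    for probability in forecast:
        # Find the segment containing the value. The first and last
        # segments are reused to extrapolate beyond the table.
        i = 0
        while i < len(xs) - 2 and probability > xs[i + 1]:
            i += 1
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        value = ys[i] + slope * (probability - xs[i])
        # Clip so extrapolation cannot leave the range 0 to 1.
        calibrated.append(min(1.0, max(0.0, value)))
    return calibrated


result = calibrate([0.0, 0.5, 1.0], [0.25, 0.75], [0.5, 1.0])
```

Here the forecast of 1.0 extrapolates to 1.25 before clipping, which shows why the final clip is needed.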

process(forecast, reliability_table)[source]

Apply reliability calibration to a forecast. The reliability table and the forecast cube must share an identical threshold coordinate.

Parameters:
  • forecast (Cube) – The forecast to be calibrated.

  • reliability_table (Union[Cube, CubeList]) – The reliability table to use for applying calibration.

Return type:

Cube

Returns:

The forecast cube following calibration.

class ConstructReliabilityCalibrationTables(n_probability_bins=5, single_value_lower_limit=False, single_value_upper_limit=False)[source]

Bases: BasePlugin

A plugin for creating and populating reliability calibration tables.

__init__(n_probability_bins=5, single_value_lower_limit=False, single_value_upper_limit=False)[source]

Initialise class for creating reliability calibration tables. These tables include data columns entitled observation_count, sum_of_forecast_probabilities, and forecast_count, defined below.

n_probability_bins:

The total number of probability bins required in the reliability tables. If single value limits are turned on, these are included in this total.

single_value_lower_limit:

Mandates that the lowest bin should be single valued, with a small precision tolerance, defined as 1.0E-6. The bin is thus 0 to 1.0E-6.

single_value_upper_limit:

Mandates that the highest bin should be single valued, with a small precision tolerance, defined as 1.0E-6. The bin is thus (1 - 1.0E-6) to 1.

_add_reliability_tables(forecast, truth, threshold_reliability)[source]

Add reliability tables. The presence of a masked truth is handled separately to ensure support for a mask that changes with validity time.

Parameters:
  • forecast (Cube) – An array containing data over an xy slice for a single validity time and threshold.

  • truth (Cube) – An array containing a thresholded gridded truth at an equivalent validity time to the forecast array.

  • threshold_reliability (MaskedArray) – The current reliability table that will be added to.

Return type:

Union[MaskedArray, ndarray]

Returns:

An array containing reliability table data for a single time and threshold. The leading dimension corresponds to the rows of a calibration table, the second dimension to the number of probability bins, and the trailing dimensions are the spatial dimensions of the forecast and truth cubes (which are equivalent).

_create_probability_bins_coord()[source]

Construct a dimension coordinate describing the probability bins of the reliability table.

Return type:

DimCoord

Returns:

A dimension coordinate describing probability bins.

_create_reliability_table_coords()[source]

Construct coordinates that describe the reliability table rows. These are observation_count, sum_of_forecast_probabilities, and forecast_count. The order used here is the order in which the table data is populated, so these must remain consistent with the _populate_reliability_bins function.

Return type:

Tuple[DimCoord, AuxCoord]

Returns:

  • A numerical index dimension coordinate.

  • An auxiliary coordinate that assigns names to the index coordinates, where these names correspond to the reliability table rows.

_create_reliability_table_cube(forecast, threshold_coord)[source]

Construct a reliability table cube and populate it with the provided data. The returned cube will include a forecast_reference_time coordinate, which will be the maximum range of bounds of the input forecast reference times, with the point value set to the latest of those in the inputs. It will further include the forecast period, threshold coordinate, and spatial coordinates from the forecast cube.

Parameters:
  • forecast (Cube) – A cube slice across the spatial dimensions of the forecast data. This slice provides the time and threshold values that relate to the reliability_table_data.

  • threshold_coord (DimCoord) – The threshold coordinate.

Return type:

Cube

Returns:

A reliability table cube.

static _define_metadata(forecast_slice)[source]

Define metadata that is specifically required for reliability table cubes, whilst ensuring any mandatory attributes are also populated.

Parameters:

forecast_slice (Cube) – The source cube from which to get pre-existing metadata of use.

Return type:

Dict[str, str]

Returns:

A dictionary of attributes that are appropriate for the reliability table cube.

_define_probability_bins(n_probability_bins, single_value_lower_limit, single_value_upper_limit)[source]

Define equally sized probability bins for use in a reliability table. The range 0 to 1 is divided into ranges to give n_probability_bins. If single_value_lower_limit and / or single_value_upper_limit are True, additional bins corresponding to values of 0 and / or 1 will be created, each with a width defined by self.single_value_tolerance.

Parameters:
  • n_probability_bins (int) – The total number of probability bins desired in the reliability tables. This number includes the extrema bins (equals 0 and equals 1) if single value limits are turned on, in which case the minimum number of bins is 3.

  • single_value_lower_limit (bool) – Mandates that the lowest bin should be single valued, with a small precision tolerance, defined as 1.0E-6. The bin is thus 0 to 1.0E-6.

  • single_value_upper_limit (bool) – Mandates that the highest bin should be single valued, with a small precision tolerance, defined as 1.0E-6. The bin is thus (1 - 1.0E-6) to 1.

Return type:

ndarray

Returns:

An array of 2-element arrays that contain the bounds of the probability bins. These bounds are non-overlapping, with adjacent bin boundaries spaced at the smallest representable interval.

Raises:

ValueError – If trying to use both single_value_lower_limit and single_value_upper_limit with 2 or fewer probability bins.
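The bin construction can be sketched as below. This is an approximation under stated assumptions rather than the plugin's code: it works in Python double precision using `math.nextafter` (Python 3.9 or later), returns a list of tuples instead of an array, and assumes at least one equally sized bin remains once the single value bins are accounted for.

```python
import math
from typing import List, Tuple

SINGLE_VALUE_TOLERANCE = 1.0e-6


def define_probability_bins(
    n_probability_bins: int,
    single_value_lower_limit: bool = False,
    single_value_upper_limit: bool = False,
) -> List[Tuple[float, float]]:
    """Return (lower, upper) bounds for each probability bin."""
    if single_value_lower_limit and single_value_upper_limit:
        if n_probability_bins <= 2:
            raise ValueError(
                "Cannot use both single value limits with 2 or fewer bins."
            )
    # The requested total includes any single value bins.
    n_equal = n_probability_bins - single_value_lower_limit - single_value_upper_limit
    edges = [i / n_equal for i in range(n_equal + 1)]
    # Each bin ends at the largest float below the next bin's lower bound.
    bins = [
        (edges[i], math.nextafter(edges[i + 1], 0.0)) for i in range(n_equal)
    ]
    bins[-1] = (bins[-1][0], 1.0)
    if single_value_lower_limit:
        bins[0] = (math.nextafter(SINGLE_VALUE_TOLERANCE, 1.0), bins[0][1])
        bins.insert(0, (0.0, SINGLE_VALUE_TOLERANCE))
    if single_value_upper_limit:
        upper_start = 1.0 - SINGLE_VALUE_TOLERANCE
        bins[-1] = (bins[-1][0], math.nextafter(upper_start, 0.0))
        bins.append((upper_start, 1.0))
    return bins


bins = define_probability_bins(5, True, True)
```

With both limits enabled and five bins requested, three equally sized bins sit between the two single value bins.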

_populate_masked_reliability_bins(forecast, truth)[source]

Support populating the reliability table bins with a masked truth. If a masked truth is provided, a masked reliability table is returned.

Parameters:
  • forecast (ndarray) – An array containing data over an xy slice for a single validity time and threshold.

  • truth (MaskedArray) – An array containing a thresholded gridded truth at an equivalent validity time to the forecast array.

Return type:

MaskedArray

Returns:

An array containing reliability table data for a single time and threshold. The leading dimension corresponds to the rows of a calibration table, the second dimension to the number of probability bins, and the trailing dimensions are the spatial dimensions of the forecast and truth cubes (which are equivalent).

_populate_reliability_bins(forecast, truth)[source]

For a spatial slice at a single validity time and threshold, populate a reliability table using the provided truth.

Parameters:
  • forecast (Union[MaskedArray, ndarray]) – An array containing data over a spatial slice for a single validity time and threshold.

  • truth (Union[MaskedArray, ndarray]) – An array containing a thresholded gridded truth at an equivalent validity time to the forecast array.

Return type:

MaskedArray

Returns:

An array containing reliability table data for a single time and threshold. The leading dimension corresponds to the rows of a calibration table, the second dimension to the number of probability bins, and the trailing dimension(s) are the spatial dimension(s) of the forecast and truth cubes (which are equivalent).
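The structure of the returned table can be illustrated with lists. This is a sketch, not the plugin's implementation: the function name is hypothetical, the spatial slice is flattened to a list of points, masks are ignored, and a forecast is assumed to belong to a bin when it lies between the bin bounds inclusive. The example bins are also illustrative.

```python
from typing import List, Tuple

Bins = List[Tuple[float, float]]


def populate_reliability_bins(
    forecast: List[float], truth: List[int], bins: Bins
) -> List[List[List[float]]]:
    """Return a table indexed as [row][probability bin][point]."""
    observation_count, probability_sum, forecast_count = [], [], []
    for lower, upper in bins:
        # Assumption: membership is inclusive at both bounds.
        in_bin = [lower <= value <= upper for value in forecast]
        observation_count.append(
            [float(obs) if inside else 0.0 for inside, obs in zip(in_bin, truth)]
        )
        probability_sum.append(
            [value if inside else 0.0 for inside, value in zip(in_bin, forecast)]
        )
        forecast_count.append([1.0 if inside else 0.0 for inside in in_bin])
    # Row order: observation_count, sum_of_forecast_probabilities, forecast_count.
    return [observation_count, probability_sum, forecast_count]


table = populate_reliability_bins(
    [0.25, 0.75, 0.75], [0, 1, 0], [(0.0, 0.49), (0.5, 1.0)]
)
```

Keeping the point dimension means tables for different validity times can be summed later without losing spatial detail.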

process(historic_forecasts, truths, aggregate_coords=None)[source]

Slice data over threshold and time coordinates to construct reliability tables. These are summed over time to give a single table for each threshold, constructed from all the provided historic forecasts and truths. If a masked truth is provided, a masked reliability table is returned. If the mask within the truth varies at different timesteps, any point that is unmasked for at least one timestep will have unmasked values within the reliability table. At any given timestep, a historic forecast point is therefore only used if it has a corresponding valid truth point at that timestep.

Examples

The reliability calibration tables returned by this plugin are structured as shown below:

reliability_calibration_table / (1) (air_temperature: 2; table_row_index: 3; probability_bin: 5; projection_y_coordinate: 970; projection_x_coordinate: 1042)
     Dimension coordinates:
          air_temperature                           x                   -                   -                           -                             -
          table_row_index                           -                   x                   -                           -                             -
          probability_bin                           -                   -                   x                           -                             -
          projection_y_coordinate                   -                   -                   -                           x                             -
          projection_x_coordinate                   -                   -                   -                           -                             x
     Auxiliary coordinates:
          table_row_name                            -                   x                   -                           -                             -
     Scalar coordinates:
          forecast_reference_time: 2017-11-11 00:00:00, bound=(2017-11-10 00:00:00, 2017-11-11 00:00:00)
          forecast_period: 68400 seconds
     Attributes:
          institution: Met Office
          source: Met Office Unified Model
          title: Reliability calibration data table

The leading dimension is threshold (here shown with air_temperature thresholds). The table row index dimension corresponds to the expected reliability table rows. These rows are named in the associated table_row_name coordinate, with the names being observation_count, sum_of_forecast_probabilities, and forecast_count. The probability bins column will have a size that corresponds to the user provided n_probability_bins. The last two dimensions are the spatial coordinates on which the forecast and truth data have been provided.

Note that the probability bins are defined as a series of non-overlapping ranges. Adjacent bin boundaries are spaced at the smallest representable interval, so that no probability can fall into a gap between bins. The probability bins may include single value limits if the user chooses, where these are bins with a width of 1.0E-6, at 0 (0 to 1.0E-6) and 1 ((1 - 1.0E-6) to 1). These finite widths ensure that float precision errors do not prevent values being allocated to these bins. The comparison operators away from these single limit bins are effectively “greater than” the lower bound of the probability bin, and “less than or equal to” the upper bound.

Note that the forecast and truth data used are probabilistic, i.e. they have already been thresholded relative to the thresholds of interest, using the comparison operator required. As such this plugin is agnostic as to whether the data is thresholded below or above a given diagnostic threshold.

historic_forecasts and truths should have matching validity times.

Parameters:
Return type:

Cube

Returns:

A cubelist of reliability table cubes, one for each threshold in the historic forecast cubes.

Raises:

ValueError – If the forecast and truth cubes have differing threshold coordinates.

class ManipulateReliabilityTable(minimum_forecast_count=200, point_by_point=False)[source]

Bases: BasePlugin

A plugin to manipulate the reliability tables before they are used to calibrate a forecast. x and y coordinates on the reliability table must be collapsed. The result is a reliability diagram with monotonic observation frequency.

Steps taken are:

1. If any bin contains less than the minimum forecast count then try combining this bin with whichever neighbour has the lowest sample count. This process is repeated for all bins that are below the minimum forecast count criterion.

2. If non-monotonicity of the observation frequency is detected, try combining a pair of bins that appear non-monotonic. Only a single pair of bins is combined.

3. If non-monotonicity of the observation frequency remains after trying to combine a single pair of bins, replace non-monotonic bins by assuming a constant observation frequency.

__init__(minimum_forecast_count=200, point_by_point=False)[source]

Initialise class for manipulating a reliability table.

Parameters:
  • minimum_forecast_count (int) – The minimum number of forecast counts in a forecast probability bin for it to be used in calibration. The default value of 200 is that used in Flowerdew 2014.

  • point_by_point (bool) – Whether to process each point in the input cube independently. Please note this option is memory intensive and is unsuitable for gridded input.

Raises:

ValueError – If minimum_forecast_count is less than 1.

References

Flowerdew J. 2014. Calibrating ensemble reliability whilst preserving spatial structure. Tellus, Ser. A Dyn. Meteorol. Oceanogr. 66.

static _assume_constant_observation_frequency(observation_count, forecast_count)[source]

Decide which end bin (highest probability bin or lowest probability bin) has the highest sample count. Iterate through the observation frequency from the end bin with the highest sample count to the end bin with the lowest sample count. Whilst iterating, compare each pair of bins and, if a pair is non-monotonic, replace the value of the bin closer to the lowest sample count end bin with the value of the bin that is closer to the higher sample count end bin. Then calculate the new observation count required to give a monotonic observation frequency.

Parameters:
  • observation_count (ndarray) – Observation count extracted from reliability table.

  • forecast_count (ndarray) – Forecast count extracted from reliability table.

Return type:

ndarray

Returns:

Observation count computed from a monotonic observation frequency.
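The procedure can be sketched with lists holding one value per probability bin. This is an illustration under assumptions: the function name is hypothetical, all forecast counts are assumed non-zero, and when both end bins have the same forecast count the sketch starts from the lowest probability bin.

```python
from typing import List


def assume_constant_observation_frequency(
    observation_count: List[float], forecast_count: List[float]
) -> List[float]:
    """Return observation counts giving a monotonic observation frequency."""
    frequency = [obs / count for obs, count in zip(observation_count, forecast_count)]
    indices = list(range(len(frequency)))
    # Start from whichever end bin is better sampled (assumed tie-break: low end).
    from_low_end = forecast_count[0] >= forecast_count[-1]
    if not from_low_end:
        indices.reverse()
    for near, far in zip(indices, indices[1:]):
        # "near" is the bin closer to the better-sampled end.
        if from_low_end:
            non_monotonic = frequency[far] < frequency[near]
        else:
            non_monotonic = frequency[far] > frequency[near]
        if non_monotonic:
            frequency[far] = frequency[near]
    # Convert the adjusted frequencies back into observation counts.
    return [freq * count for freq, count in zip(frequency, forecast_count)]


adjusted = assume_constant_observation_frequency(
    [0, 500, 250, 750, 1000], [1000, 1000, 1000, 1000, 1000]
)
```

In this example the third bin's frequency of 0.25 falls below its neighbour's 0.5, so it is raised to 0.5 and its observation count becomes 500.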

_combine_bin_pair(observation_count, forecast_probability_sum, forecast_count, probability_bin_coord)[source]

Combine a pair of bins when non-monotonicity of the observation frequency is detected. Iterate top-down from the highest forecast probability bin to the lowest probability bin when combining the bins. Only allow a single pair of bins to be combined.

Parameters:
  • observation_count (ndarray) – Observation count extracted from reliability table.

  • forecast_probability_sum (ndarray) – Forecast probability sum extracted from reliability table.

  • forecast_count (ndarray) – Forecast count extracted from reliability table.

  • probability_bin_coord (DimCoord) – Original probability bin coordinate.

Return type:

Tuple[ndarray, ndarray, ndarray, DimCoord]

Returns:

Tuple containing the updated observation count, forecast probability sum, forecast count and probability bin coordinate.
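A sketch of combining a single non-monotonic pair is shown below. It is not the plugin's code: the names are hypothetical, lists replace arrays, and the probability bin coordinate is represented only by a list of `(lower, upper)` bounds.

```python
from typing import List, Tuple

Bounds = List[Tuple[float, float]]


def merge_pair(values: List[float], upper: int) -> List[float]:
    """Replace values[upper - 1] and values[upper] with their sum."""
    return values[: upper - 1] + [values[upper - 1] + values[upper]] + values[upper + 1 :]


def combine_bin_pair(
    observation_count: List[float],
    forecast_probability_sum: List[float],
    forecast_count: List[float],
    bounds: Bounds,
) -> Tuple[List[float], List[float], List[float], Bounds]:
    """Combine the first non-monotonic pair found, searching top-down."""
    frequency = [obs / count for obs, count in zip(observation_count, forecast_count)]
    for upper in range(len(frequency) - 1, 0, -1):
        if frequency[upper] < frequency[upper - 1]:
            observation_count = merge_pair(observation_count, upper)
            forecast_probability_sum = merge_pair(forecast_probability_sum, upper)
            forecast_count = merge_pair(forecast_count, upper)
            merged = (bounds[upper - 1][0], bounds[upper][1])
            bounds = bounds[: upper - 1] + [merged] + bounds[upper + 1 :]
            break  # only one pair is ever combined
    return observation_count, forecast_probability_sum, forecast_count, bounds


obs, prob_sum, counts, new_bounds = combine_bin_pair(
    [0, 250, 500, 1000, 750],
    [0, 250, 500, 750, 1000],
    [1000, 1000, 1000, 1000, 1000],
    [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1.0)],
)
```

The top two bins are merged here because the highest bin's observation frequency (0.75) is below that of the bin beneath it (1.0).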

_combine_undersampled_bins(observation_count, forecast_probability_sum, forecast_count, probability_bin_coord)[source]

Combine bins that are under-sampled i.e. that have a lower forecast count than the minimum_forecast_count, so that information from these poorly-sampled bins can contribute to the calibration. If multiple bins are below the minimum forecast count, the bin closest to meeting the minimum_forecast_count criterion is combined with whichever neighbour has the lowest sample count. A new bin is then created by summing the neighbouring pair of bins. This process is repeated for all bins that are below the minimum forecast count criterion.

Parameters:
  • observation_count (ndarray) – Observation count extracted from reliability table.

  • forecast_probability_sum (ndarray) – Forecast probability sum extracted from reliability table.

  • forecast_count (ndarray) – Forecast count extracted from reliability table.

  • probability_bin_coord (DimCoord) – Original probability bin coordinate.

Return type:

Tuple[ndarray, ndarray, ndarray, DimCoord]

Returns:

Tuple containing the updated observation count, forecast probability sum, forecast count and probability bin coordinate.
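The merging loop can be sketched on the forecast counts alone. This is a simplified illustration with a hypothetical function name: the real method also updates the observation counts, forecast probability sums and bin coordinate in step with each merge, and how ties are resolved is an assumption here.

```python
from typing import List


def combine_undersampled_bins(
    forecast_count: List[float], minimum_forecast_count: float
) -> List[float]:
    """Merge neighbouring bins until none is under-sampled, if possible."""
    counts = list(forecast_count)
    while len(counts) > 1 and min(counts) < minimum_forecast_count:
        undersampled = [
            i for i, count in enumerate(counts) if count < minimum_forecast_count
        ]
        # The under-sampled bin closest to meeting the criterion.
        target = max(undersampled, key=lambda i: counts[i])
        neighbours = [i for i in (target - 1, target + 1) if 0 <= i < len(counts)]
        # Combine with whichever neighbour has the lowest sample count.
        partner = min(neighbours, key=lambda i: counts[i])
        upper = max(target, partner)
        counts = counts[: upper - 1] + [counts[upper - 1] + counts[upper]] + counts[upper + 1 :]
    return counts


merged_counts = combine_undersampled_bins([100, 150, 300, 400, 500], 200)
```

The loop stops either when every bin meets the minimum count or when only one bin remains, which matches the case where combining all bins is still insufficient.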

static _create_new_bin_coord(probability_bin_coord, upper)[source]

Create a new probability_bin coordinate by combining two adjacent points on the probability_bin coordinate. This matches the combination of the data for the two bins.

Parameters:
  • probability_bin_coord (DimCoord) – Original probability bin coordinate.

  • upper (int) – Upper index of pair.

Return type:

DimCoord

Returns:

Probability bin coordinate with updated points and bounds where a pair of bins have been combined to create a single bin.

_enforce_min_count_and_montonicity(rel_table_slice)[source]

Apply the steps needed to produce a reliability diagram on a single slice of reliability table cube.

Parameters:

rel_table_slice (Cube) – The reliability table slice to be manipulated. The only coordinates expected on this cube are a table_row_index coordinate and corresponding table_row_name coordinate and a probability_bin coordinate.

Return type:

Cube

Returns:

Processed reliability table slice, with reliability steps applied.

static _extract_reliability_table_components(reliability_table)[source]

Extract reliability table components from a cube.

Parameters:

reliability_table (Cube) – A reliability table to be manipulated.

Return type:

Tuple[ndarray, ndarray, ndarray, DimCoord]

Returns:

Tuple containing the observation count, forecast probability sum, forecast count and probability bin coordinate extracted from the reliability table.

static _sum_pairs(array, upper)[source]

Returns a new array where a pair of values in the original array have been replaced by their sum. Combines the value in the upper index with the value in the upper-1 index.

Parameters:
  • array (ndarray) – Array to be modified.

  • upper (int) – Upper index of pair.

Return type:

ndarray

Returns:

Array where a pair of values has been replaced by their sum.

static _update_reliability_table(reliability_table, observation_count, forecast_probability_sum, forecast_count, probability_bin_coord)[source]

Update the reliability table data and the probability bin coordinate.

Parameters:
  • reliability_table (Cube) – A reliability table to be manipulated.

  • observation_count (ndarray) – Observation count extracted from reliability table.

  • forecast_probability_sum (ndarray) – Forecast probability sum extracted from reliability table.

  • forecast_count (ndarray) – Forecast count extracted from reliability table.

  • probability_bin_coord (DimCoord) – Original probability bin coordinate.

Return type:

Cube

Returns:

Updated reliability table.

process(reliability_table)[source]

Apply the steps needed to produce a reliability diagram with a monotonic observation frequency.

Parameters:

reliability_table (Cube) – A reliability table to be manipulated. The only coordinates expected on this cube are a threshold coordinate, a table_row_index coordinate and corresponding table_row_name coordinate and a probability_bin coordinate.

Return type:

CubeList

Returns:

CubeList containing a reliability table cube for each threshold in the input reliability table. For tables where monotonicity has been enforced the probability_bin coordinate will have one fewer bin than the tables that were already monotonic. If under-sampled bins have been combined, then the probability_bin coordinate will have been reduced until all bins have more than the minimum_forecast_count if possible; a single under-sampled bin will be returned if combining all bins is still insufficient to reach the minimum_forecast_count.