improver.calibration.utilities module

This module defines the utilities used by the plugins specific to ensemble calibration.

_ceiling_fp(cube)

Find the forecast period points rounded up to the next hour.

Parameters: cube (Cube) – Cube with a forecast_period coordinate.
Return type: ndarray
Returns: The forecast period points in units of hours after rounding the points up to the next hour.
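The rounding described above can be illustrated without iris. This is a minimal sketch, not the IMPROVER implementation: the helper name ceiling_hours is hypothetical, and it assumes the forecast period points are supplied in seconds.

```python
import math


def ceiling_hours(forecast_period_seconds):
    """Round forecast periods given in seconds up to the next whole hour."""
    # Assumption: the input points are in seconds; the output is in hours.
    return [math.ceil(seconds / 3600) for seconds in forecast_period_seconds]


# 1 h stays at 1 h, 1 h 15 min becomes 2 h, 2 h 0 min 1 s becomes 3 h.
print(ceiling_hours([3600, 4500, 7201]))  # [1, 2, 3]
```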

broadcast_data_to_time_coord(cubelist)

Ensure that the data from all cubes within a cubelist is of the required shape by broadcasting the data from cubes without a time coordinate along the time dimension taken from other input cubes that do have a time coordinate. In the case where none of the input cubes have a time coordinate that is a dimension coordinate, which may occur when using a very small training dataset, the data is returned without being broadcast.

Parameters: cubelist (CubeList) – The cubelist from which the data will be extracted and broadcast along the time dimension as required.
Return type: List[ndarray]
Returns: The data taken from cubes within a cubelist where cubes without a time coordinate have had their data broadcast along the time dimension (with this time dimension provided by other input cubes with a time dimension) to ensure that the data within each numpy array within the output list has the same shape. If a time dimension coordinate is not present on any of the cubes, no broadcasting occurs.
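The broadcasting behaviour can be sketched with plain numpy arrays. The function below is a hypothetical stand-in, not the library code: it assumes time is the leading axis and that the caller states which arrays carry a time axis through the has_time flags.

```python
import numpy as np


def broadcast_along_time(arrays, has_time):
    """Broadcast arrays that lack a leading time axis to match those that have one."""
    time_shapes = [a.shape for a, flag in zip(arrays, has_time) if flag]
    if not time_shapes:
        # No array carries a time dimension, so the data is returned unchanged.
        return list(arrays)
    target = time_shapes[0]
    return [
        a if flag else np.broadcast_to(a, target)
        for a, flag in zip(arrays, has_time)
    ]


forecast = np.zeros((3, 2, 2))           # (time, y, x)
altitude = np.arange(4.0).reshape(2, 2)  # (y, x), no time axis
result = broadcast_along_time([forecast, altitude], [True, False])
print([r.shape for r in result])  # [(3, 2, 2), (3, 2, 2)]
```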

check_data_sufficiency(historic_forecasts, truths, point_by_point, proportion_of_nans)

Check whether there is sufficient valid data (i.e. values that are not NaN) within the historic forecasts and truths, in order to robustly compute EMOS coefficients.

Parameters:

historic_forecasts (Cube) – Cube containing historic forecasts.
truths (Cube) – Cube containing truths.
point_by_point (bool) – If True, coefficients are calculated independently for each point within the input cube by creating an initial guess and minimising each grid point independently.
proportion_of_nans (float) – The proportion of the matching historic forecast-truth pairs that are allowed to be NaN.

Raises:

ValueError – If point_by_point is True and the proportion of NaNs at a site is higher than allowable.
ValueError – If the proportion of NaNs is higher than allowable when considering all sites.
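The all-sites check can be pictured as a NaN count over matched pairs. The sketch below is illustrative only: check_nan_proportion is a hypothetical helper working on flat lists of floats, and it treats a pair as invalid when either the forecast or the truth is NaN, which is an assumption about how pairs are counted.

```python
import math


def check_nan_proportion(forecasts, truths, allowed_proportion):
    """Raise ValueError if too many forecast-truth pairs contain a NaN."""
    pairs = list(zip(forecasts, truths))
    # Assumption: a pair is invalid if either member is NaN.
    invalid = sum(1 for f, t in pairs if math.isnan(f) or math.isnan(t))
    proportion = invalid / len(pairs)
    if proportion > allowed_proportion:
        raise ValueError(
            f"Proportion of NaNs ({proportion}) exceeds the allowed {allowed_proportion}."
        )
    return proportion


nan = float("nan")
forecasts = [280.0, nan, 282.0, 283.0]
truths = [281.0, 280.5, nan, 284.0]
print(check_nan_proportion(forecasts, truths, 0.5))  # 0.5
```

With an allowed proportion of 0.25, the same inputs would raise a ValueError, because two of the four pairs are invalid.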

check_forecast_consistency(forecasts)

Checks that the forecast cubes have a consistent forecast reference time hour and a consistent forecast period.

Parameters:

forecasts (Cube) –

Raises:

ValueError – Forecast cubes have differing forecast reference time hours
ValueError – Forecast cubes have differing forecast periods

Return type:

None

check_predictor(predictor)

Check the predictor at the start of the process methods in relevant ensemble calibration plugins, to avoid having to check and raise an error later. Also, lowercase the string.

Parameters: predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.
Return type: str
Returns: The predictor string in lowercase.
Raises: ValueError – If the predictor is not valid.
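The validate-and-lowercase pattern described above might look like the following. This is a sketch rather than the library source; the names VALID_PREDICTORS and normalise_predictor are hypothetical, while the two accepted values are taken from the parameter description.

```python
VALID_PREDICTORS = ("mean", "realizations")


def normalise_predictor(predictor):
    """Return the predictor in lowercase, or raise ValueError if unsupported."""
    predictor = predictor.lower()
    if predictor not in VALID_PREDICTORS:
        raise ValueError(
            f"Predictor '{predictor}' is not one of {VALID_PREDICTORS}."
        )
    return predictor


print(normalise_predictor("Mean"))  # mean
```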

convert_cube_data_to_2d(forecast, coord='realization', transpose=True)

Convert data from an N-dimensional cube into a 2D numpy array. The result can be transposed, if required.

Parameters:

forecast (Cube) – N-dimensional cube to be reshaped.
coord (str) – This dimension is retained as the second dimension by default, and the leading dimension if “transpose” is set to False.
transpose (bool) – If True, the resulting flattened data is transposed. This will transpose a 2d array of the format [coord, :] to [:, coord]. If coord is not a dimension on the input cube, the resulting array will be 2d with items of length 1.

Return type:

ndarray

Returns:

Reshaped 2d array.
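The reshaping can be shown on a bare numpy array. This sketch assumes the chosen coordinate (for example realization) is already the leading axis of the data; to_2d is a hypothetical helper, not the library function.

```python
import numpy as np


def to_2d(data, transpose=True):
    """Flatten every axis except the leading one, then optionally transpose."""
    flattened = data.reshape(data.shape[0], -1)  # [coord, :]
    return flattened.T if transpose else flattened  # [:, coord] if transposed


data = np.arange(12).reshape(3, 2, 2)  # (realization, y, x)
print(to_2d(data).shape)                   # (4, 3)
print(to_2d(data, transpose=False).shape)  # (3, 4)
```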

create_unified_frt_coord(forecast_reference_time)

Constructs a single forecast reference time coordinate from a multi-valued coordinate. The new coordinate records the maximum range of bounds of the input forecast reference times, with the point value set to the latest of those in the inputs.

Parameters: forecast_reference_time (DimCoord) – The forecast_reference_time coordinate to be used in the coordinate creation.
Return type: DimCoord
Returns: A dimension coordinate containing the forecast reference time coordinate with suitable bounds. The coordinate point is that of the latest contributing forecast.
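The point-and-bounds logic can be sketched with datetime objects in place of a DimCoord. The helper unify_reference_times is hypothetical, and it assumes the input reference times carry no bounds of their own, so the range is taken from the points.

```python
from datetime import datetime


def unify_reference_times(times):
    """Return (point, (lower_bound, upper_bound)) for several reference times."""
    # Assumption: inputs have no bounds, so the bounds span the input points.
    return max(times), (min(times), max(times))


times = [
    datetime(2017, 11, 10, 3),
    datetime(2017, 11, 11, 3),
    datetime(2017, 11, 12, 3),
]
point, bounds = unify_reference_times(times)
print(point)   # 2017-11-12 03:00:00
print(bounds)
```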

filter_non_matching_cubes(historic_forecast, truth)

Filter the historic forecast and truth so that they contain matching validity times. This ensures that any mismatch between the historic forecasts and truth is dealt with. If multiple time slices of the historic forecast match the same truth slice, only the first truth slice is kept to avoid duplicate truth slices, which would prevent the truth cubes from being merged. This can occur when processing a cube with a multi-dimensional time coordinate. If a historic forecast time slice contains only NaNs, then this time slice is also skipped. This can occur when processing a multi-dimensional time coordinate where some of the forecast reference time and forecast period combinations do not typically occur, so may be filled with NaNs.

Parameters:

historic_forecast (Cube) – Cube of historic forecasts that potentially contains a mismatch compared to the truth.
truth (Cube) – Cube of truth that potentially contains a mismatch compared to the historic forecasts.

Return type:

Tuple[Cube, Cube]

Returns:

Cube of historic forecasts where any mismatches with the truth cube have been removed.
Cube of truths where any mismatches with the historic_forecasts cube have been removed.

Raises:

ValueError – The filtering has found no matches in validity time between the historic forecasts and the truths.
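The matching rules can be sketched with dictionaries keyed by validity time. This is a simplified illustration under stated assumptions: filter_matching_times is a hypothetical helper, each validity time appears once per input, and the duplicate-truth handling for multi-dimensional time coordinates is not modelled.

```python
import math


def filter_matching_times(forecasts, truths):
    """Keep only forecast and truth entries that share a validity time.

    Both inputs are hypothetical dicts mapping a validity time string
    to a list of values.
    """
    kept_forecasts, kept_truths = {}, {}
    for time, values in forecasts.items():
        if time not in truths:
            continue  # no truth with this validity time
        if all(math.isnan(v) for v in values):
            continue  # forecast slice holds no valid data
        kept_forecasts[time] = values
        kept_truths[time] = truths[time]
    if not kept_forecasts:
        raise ValueError("No matching validity times between forecasts and truths.")
    return kept_forecasts, kept_truths


nan = float("nan")
forecasts = {
    "2017-11-10T03:00Z": [280.0, 281.0],
    "2017-11-10T06:00Z": [nan, nan],      # all NaN, skipped
    "2017-11-10T09:00Z": [282.0, 283.0],  # no matching truth, skipped
}
truths = {
    "2017-11-10T03:00Z": [279.5, 281.2],
    "2017-11-10T06:00Z": [280.0, 280.1],
    "2017-11-10T12:00Z": [283.0, 284.0],  # no matching forecast, dropped
}
kept_forecasts, kept_truths = filter_matching_times(forecasts, truths)
print(list(kept_forecasts))  # ['2017-11-10T03:00Z']
```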

flatten_ignoring_masked_data(data_array, preserve_leading_dimension=False)

Flatten an array, selecting only valid data if the array is masked. There is also the option to reshape the resulting array so that it has the same leading dimension as the input array, with the other dimensions flattened. It is assumed that each slice along the leading dimension is masked in the same way. This functionality is used in EstimateCoefficientsForEnsembleCalibration when realizations are used as predictors.

Parameters:

data_array (Union[MaskedArray, ndarray]) – An array or masked array to be flattened. If it is masked and the leading dimension is preserved, the mask must be the same for every slice along the leading dimension.
preserve_leading_dimension (bool) – Default False. If True the flattened array is reshaped so it has the same leading dimension as the input array. If False the returned array is 1D.

Return type:

ndarray

Returns:

A flattened array containing only valid data. The array is either 1D or, if the leading dimension is preserved, 2D. In the latter case the leading dimension is the same as that of the input data_array.

Raises:

ValueError – If preserving the leading dimension and the mask on the input array is not the same for every slice along the leading dimension.
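The behaviour described above can be approximated with numpy.ma. The following is a sketch of the idea rather than the library source; flatten_valid is a hypothetical name.

```python
import numpy as np


def flatten_valid(data, preserve_leading_dimension=False):
    """Flatten an array, dropping masked points if the array is masked."""
    if np.ma.is_masked(data):
        mask = np.ma.getmaskarray(data)
        if preserve_leading_dimension and not (mask == mask[0]).all():
            raise ValueError("The mask differs between slices of the leading dimension.")
        result = data.compressed()  # 1D array of the unmasked values
    else:
        result = np.asarray(data).flatten()
    if preserve_leading_dimension:
        result = result.reshape(data.shape[0], -1)
    return result


# Two slices along the leading dimension, each with the same point masked.
data = np.ma.masked_array(
    np.arange(8.0).reshape(2, 2, 2),
    mask=[[[True, False], [False, False]]] * 2,
)
print(flatten_valid(data))        # [1. 2. 3. 5. 6. 7.]
print(flatten_valid(data, True))  # 2 rows of 3 valid values each
```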

forecast_coords_match(first_cube, second_cube)

Determine whether two cubes have equivalent forecast_period and forecast_reference_time coordinates, within an accepted leniency. The forecast period is rounded up to the next hour to support calibrating subhourly forecasts with coefficients derived for on-the-hour forecast periods. For the forecast reference time, only the hour is checked.

Parameters:

first_cube (Cube) – First cube to compare.
second_cube (Cube) – Second cube to compare.

Raises:

ValueError – The two cubes are not equivalent.

Return type:

None
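The lenient comparison can be sketched as follows. The representation is hypothetical: each input is a (forecast period in seconds, forecast reference time) tuple standing in for a cube, and coords_match is not the library function.

```python
import math
from datetime import datetime


def coords_match(first, second):
    """Raise ValueError unless rounded-up periods and reference hours agree."""
    first_period, first_frt = first
    second_period, second_frt = second
    # Forecast periods are compared after rounding up to whole hours.
    if math.ceil(first_period / 3600) != math.ceil(second_period / 3600):
        raise ValueError("Forecast periods differ after rounding up to the hour.")
    # Only the hour of the forecast reference time is compared, not the date.
    if first_frt.hour != second_frt.hour:
        raise ValueError("Forecast reference time hours differ.")


subhourly = (6300, datetime(2017, 11, 12, 3))     # 1 h 45 min lead time
coefficients = (7200, datetime(2017, 11, 10, 3))  # 2 h lead time, earlier date
coords_match(subhourly, coefficients)  # passes silently
```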

get_frt_hours(forecast_reference_time)

Returns a set of integer representations of the hour of the forecast reference time.

Parameters: forecast_reference_time (DimCoord) – The forecast_reference_time coordinate to extract the hours from.
Return type: Set[int]
Returns: A set of integer representations of the forecast reference time hours.
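As a rough illustration, the same idea applied to a list of datetime objects (standing in for the coordinate points) reduces to a set comprehension. The helper name reference_time_hours is hypothetical.

```python
from datetime import datetime


def reference_time_hours(reference_times):
    """Return the set of hours found in a sequence of reference times."""
    return {time.hour for time in reference_times}


times = [
    datetime(2017, 11, 10, 3),
    datetime(2017, 11, 11, 3),
    datetime(2017, 11, 11, 9),
]
print(reference_time_hours(times) == {3, 9})  # True
```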

merge_land_and_sea(calibrated_land_only, uncalibrated)

Merge data that has been calibrated over the land with uncalibrated data. Calibrated data will have masked data over the sea which will need to be filled with the uncalibrated data.

Parameters:

calibrated_land_only (Cube) – A cube that has been calibrated over the land, with sea points masked out. Either realizations, probabilities or percentiles. Data is modified in place.
uncalibrated (Cube) – A cube of uncalibrated data with valid data over the sea. Either realizations, probabilities or percentiles. Dimension coordinates must be the same as the calibrated_land_only cube.

Raises:

ValueError – If input cubes do not have the same input dimensions.

Return type:

None
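The fill operation can be sketched on numpy arrays. Unlike the function documented above, which modifies the calibrated cube in place and returns None, this hypothetical helper returns a new array, and it compares array shapes rather than dimension coordinates.

```python
import numpy as np


def fill_masked_from(calibrated, uncalibrated):
    """Fill masked (sea) points of calibrated data with uncalibrated values."""
    if calibrated.shape != uncalibrated.shape:
        raise ValueError("Input arrays do not have the same shape.")
    mask = np.ma.getmaskarray(calibrated)
    # Take uncalibrated values where the calibrated data is masked.
    return np.where(mask, uncalibrated, np.ma.getdata(calibrated))


calibrated = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0]],
    mask=[[False, True], [False, True]],  # second column is sea
)
uncalibrated = np.array([[10.0, 20.0], [30.0, 40.0]])
merged = fill_masked_from(calibrated, uncalibrated)
print(merged.tolist())  # [[1.0, 20.0], [3.0, 40.0]]
```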