improver.categorical.modal_code module

Module containing a plugin to calculate the modal category in a period.

class ModalCategory(decision_tree, model_id_attr=None, record_run_attr=None)

Bases: BasePlugin

Plugin that returns the modal category over the period spanned by the input data. In the case of a tie in the mode values, scipy returns the smaller value. The opposite is desirable here, as the significance (importance) of the weather code categories generally increases with the value. To achieve this, the categories are subtracted from an arbitrary larger number before the mode is calculated, and this operation is reversed before the final output is returned.
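The tie-breaking trick described above can be illustrated with a small NumPy sketch. It is not the plugin's code: it uses np.unique plus argmax in place of scipy (both resolve ties to the smallest value), and the helper names and the offset of 1000 are hypothetical choices for illustration only.

```python
import numpy as np


def mode_smallest(values):
    # Like scipy.stats.mode, ties resolve to the smallest value:
    # np.unique returns sorted values and argmax picks the first maximum count.
    vals, counts = np.unique(values, return_counts=True)
    return vals[np.argmax(counts)]


def mode_prefer_largest(values, offset=1000):
    # Hypothetical helper: subtracting from a constant reverses the ordering,
    # so the "smallest" tied value of the flipped data is the largest original code.
    flipped = offset - np.asarray(values)
    # Reverse the subtraction to recover the original code.
    return offset - mode_smallest(flipped)


codes = [1, 1, 7, 7, 3]
print(mode_smallest(codes))        # 1
print(mode_prefer_largest(codes))  # 7
```

Because subtraction from any constant reverses the order of the values, the exact size of the offset does not affect which code wins a tie.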

If there are many different categories for a single point over the time spanned by the input cubes, the returned mode may not be robust. Given the preference for more significant categories explained above, a 12-hour period with 12 different categories, one of which is severe, will return that severe category to describe the whole period. This is unlikely to be a good representation. In these cases, grouping is used to try to select a suitable category (e.g. a rain shower if the codes include a mix of rain showers and dynamic rain) by providing a more robust mode. The lowest-numbered (least significant) member of the group is returned as the code, reflecting the lower certainty in the forecasts.

Where there are different categories available for night and day, the modal code returned is always a day code, regardless of the times covered by the input cubes.

__init__(decision_tree, model_id_attr=None, record_run_attr=None)

Set up the plugin and create an aggregator instance for reuse.

Parameters:
  • decision_tree (Dict) – The decision tree used to generate the categories and which contains the mapping of day and night categories and of category groupings.

  • model_id_attr (Optional[str]) – Name of attribute recording source models that should be inherited by the output cube. The source models are expected as a space-separated string.

  • record_run_attr (Optional[str]) – Name of attribute used to record models and cycles used in constructing the categories.

_code_groups()

Determines code groupings from the decision tree.

Return type:

Dict
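A minimal sketch of how code groupings might be collected from a decision tree. The tree layout used here is an assumption: each node is a dict, leaf nodes carry a "leaf" key holding the category code, and grouped leaves carry an optional "group" key. The function name code_groups and the example tree are hypothetical and do not reproduce the plugin's implementation.

```python
from collections import defaultdict


def code_groups(decision_tree):
    # Hypothetical: collect the codes of all leaves sharing a "group" label.
    groups = defaultdict(list)
    for node in decision_tree.values():
        if "leaf" in node and "group" in node:
            groups[node["group"]].append(node["leaf"])
    # Sort each group so its least significant (lowest) code comes first.
    return {name: sorted(codes) for name, codes in groups.items()}


# Hypothetical tree: two grouped rain leaves, one ungrouped leaf, one branch node.
tree = {
    "heavy_rain": {"leaf": 15, "group": "rain"},
    "light_rain": {"leaf": 12, "group": "rain"},
    "sunny": {"leaf": 1},
    "check_cloud": {"if_true": "sunny", "if_false": "heavy_rain"},
}
print(code_groups(tree))  # {'rain': [12, 15]}
```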

_group_codes(modal, cube)

In instances where the mode returned is not significant, i.e. the category chosen occurs infrequently in the period, the codes can be grouped to yield a more definitive period code. Given the uncertainty, the least significant category (lowest number in a group that is found in the data) is used to replace the other data values that belong to that group prior to recalculating the modal code.

The modal cube is modified in place.

Parameters:
  • modal (Cube) – The modal categorical cube which contains UNSET_CODE_INDICATOR values that need to be replaced with a more definitive period code.

  • cube (Cube) – The original input data. Data relating to unset points will be grouped and the mode recalculated.
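The grouping step can be sketched on plain NumPy arrays rather than cubes. In this sketch, data has shape (time, point), modal holds one value per point and is modified in place, and only points equal to the sentinel are regrouped. The sentinel value -99, the groups mapping, and the function name are hypothetical; the sketch also does not reapply the 30% threshold after grouping, which may differ from the plugin.

```python
import numpy as np

UNSET_CODE_INDICATOR = -99  # hypothetical sentinel value for this sketch


def group_codes(modal, data, groups):
    # Only revisit points whose mode was judged too uncertain.
    for point in np.flatnonzero(modal == UNSET_CODE_INDICATOR):
        column = data[:, point].copy()
        for members in groups.values():
            in_group = np.isin(column, members)
            if in_group.any():
                # Replace every group member with the least significant
                # (lowest) member actually present in the data.
                column[in_group] = column[in_group].min()
        # Recalculate the mode, preferring the larger code on ties.
        vals, counts = np.unique(column, return_counts=True)
        modal[point] = vals[counts == counts.max()].max()


data = np.array([[1, 10], [1, 11], [1, 12], [3, 13]])  # (time, point)
modal = np.array([1, UNSET_CODE_INDICATOR])
group_codes(modal, data, {"rain": [10, 11, 12, 13]})
print(modal.tolist())  # [1, 10]
```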

static _set_blended_times(cube)

Updates the time coordinates so that the time point is at the end of the time bounds; blend_time and forecast_reference_time (if present) are set to the end of the bound period and their bounds are removed; and forecast_period is updated to match.

Return type:

None
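The coordinate updates can be sketched with datetime objects in place of Iris coordinates. Each coordinate here is a hypothetical dict with "point" and "bounds" entries, and forecast_period is assumed to be the time point minus the forecast_reference_time point, in seconds. Only the forecast_period point is computed; how the real plugin handles forecast_period bounds is not stated above.

```python
from datetime import datetime


def set_blended_times(coords):
    # Hypothetical layout: name -> {"point": datetime, "bounds": (start, end) or None}.
    time = coords["time"]
    # Move the time point to the end of its bounds; the time bounds are kept.
    time["point"] = time["bounds"][1]
    for name in ("blend_time", "forecast_reference_time"):
        coord = coords.get(name)
        if coord is None:
            continue  # coordinate not present
        if coord["bounds"] is not None:
            coord["point"] = coord["bounds"][1]
        coord["bounds"] = None  # bounds are removed
    # Assumed: forecast_period recomputed from the updated time and reference time.
    frt = coords["forecast_reference_time"]["point"]
    coords["forecast_period"] = {"point": (time["point"] - frt).total_seconds()}


coords = {
    "time": {"point": datetime(2024, 1, 1, 6),
             "bounds": (datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 12))},
    "forecast_reference_time": {"point": datetime(2023, 12, 31, 21),
                                "bounds": (datetime(2023, 12, 31, 18), datetime(2024, 1, 1, 0))},
}
set_blended_times(coords)
print(coords["forecast_period"]["point"])  # 43200.0
```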

_unify_day_and_night(cube)

Remove distinction between day and night codes so they can each contribute when calculating the modal code. The cube of categorical data is modified in place with all night codes made into their daytime equivalents.

Parameters:

cube (Cube) – A cube of categorical data.
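A minimal sketch of mapping night codes onto their daytime equivalents with NumPy. The night_to_day mapping and its code values are hypothetical; in the plugin the mapping comes from the decision tree. Unlike the plugin, which modifies the cube in place, this sketch returns a new array.

```python
import numpy as np


def unify_day_and_night(data, night_to_day):
    original = np.asarray(data)
    unified = original.copy()
    for night_code, day_code in night_to_day.items():
        # Masks are built from the original values so replacements cannot chain.
        unified[original == night_code] = day_code
    return unified


# Hypothetical mapping: night code 0 -> day code 1, night code 2 -> day code 3.
print(unify_day_and_night([0, 1, 2, 3, 7], {0: 1, 2: 3}).tolist())  # [1, 1, 3, 3, 7]
```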

mode_aggregator(data, axis)

An aggregator for use with iris to calculate the mode along the specified axis. If the modal value selected comprises less than 30% of the data along the dimension being collapsed, the value is set to the UNSET_CODE_INDICATOR to indicate that the uncertainty was too high to return a mode.

Parameters:
  • data (ndarray) – The data for which a mode is to be calculated.

  • axis (int) – The axis / dimension over which to calculate the mode.

Return type:

ndarray

Returns:

The data array collapsed over axis, containing the calculated modes.
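The aggregator's behaviour can be sketched in NumPy: collapse one axis to its mode, prefer the larger code on ties, and fall back to a sentinel when the mode covers less than 30% of the values. The sentinel -99 and the threshold parameter are hypothetical. Picking the maximum among tied codes directly is used here as an equivalent alternative to the subtraction trick described for the class.

```python
import numpy as np

UNSET_CODE_INDICATOR = -99  # hypothetical sentinel value for this sketch


def mode_aggregator(data, axis, threshold=0.3):
    data = np.asarray(data)
    # Move the collapsed axis to the end and flatten the rest into rows.
    moved = np.moveaxis(data, axis, -1)
    rows = moved.reshape(-1, moved.shape[-1])
    out = np.empty(rows.shape[0], dtype=data.dtype)
    for i, row in enumerate(rows):
        vals, counts = np.unique(row, return_counts=True)
        top = counts.max()
        # Among equally frequent codes, take the largest (most significant).
        mode = vals[counts == top].max()
        # Too uncertain if the mode makes up less than the threshold fraction.
        out[i] = mode if top / row.size >= threshold else UNSET_CODE_INDICATOR
    return out.reshape(moved.shape[:-1])


data = np.array([[1, 2, 1], [1, 5, 2], [1, 2, 3], [3, 5, 4]])  # (time, point)
print(mode_aggregator(data, axis=0).tolist())  # [1, 5, -99]
```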

process(cubes)

Calculate the modal categorical code, with handling for edge cases.

Parameters:

cubes (CubeList) – A list of categorical cubes at different times. A modal code will be calculated over the time coordinate to return the most common code, which is taken to be the best representation of the whole period.

Return type:

Cube

Returns:

A single categorical cube with time bounds that span those of the input categorical cubes.