improver.calibration.rainforest_training module

RainForests model training plugin.

class TrainRainForestsModel(model_config_dict, training_data, observation_column, training_columns, lightgbm_params=None)

Bases: BasePlugin

__init__(model_config_dict, training_data, observation_column, training_columns, lightgbm_params=None)

Initialise the options used when training models.

Parameters:
  • model_config_dict (dict[int, dict[str, dict[str, str]]]) – Dictionary describing the high-level RainForests model structure:
      - the top-level key gives the lead time in hours,
      - the next-level key gives the threshold,
      - the corresponding values locate the associated model files.

  • training_data (DataFrame) – Combined data set used to train models.

  • observation_column (str) – Name of the column in the data set that the models are trained to predict.

  • training_columns (list[str]) – Names of the columns in the data set used as training inputs.

  • lightgbm_params (dict | None) – Optional. Parameters passed to the LightGBM training library. Any parameters given here override the defaults.

The model_config_dict dictionary has the format:

    {
        "24": {
            "0.000010": {
                "lightgbm_model": "<path_to_lightgbm_model_object>",
                "treelite_model": "<path_to_treelite_model_object>"
            },
            "0.000050": {
                "lightgbm_model": "<path_to_lightgbm_model_object>",
                "treelite_model": "<path_to_treelite_model_object>"
            },
            "0.000100": {
                "lightgbm_model": "<path_to_lightgbm_model_object>",
                "treelite_model": "<path_to_treelite_model_object>"
            }
        }
    }

The keys specify the lead times and model threshold values, while the associated values are the paths to the corresponding tree-model objects for that lead time and threshold.
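As a minimal sketch of how a dictionary in this format could be assembled, the snippet below builds the nested structure for a list of lead times and thresholds. The helper name build_model_config, the "models" directory, the file-name pattern and the file extensions are all hypothetical and chosen only for illustration. The string lead-time keys follow the example format shown above.

```python
from pathlib import Path


def build_model_config(model_dir, lead_times, thresholds):
    """Build a nested config dict keyed by lead time, then by threshold.

    Hypothetical helper: the file-name pattern and extensions are assumptions.
    """
    config = {}
    for lead_time in lead_times:
        per_threshold = {}
        for threshold in thresholds:
            # Six decimal places reproduces keys such as "0.000010".
            key = f"{threshold:.6f}"
            stem = f"lead_{lead_time:03d}_threshold_{key}"
            per_threshold[key] = {
                "lightgbm_model": str(Path(model_dir) / f"{stem}.txt"),
                "treelite_model": str(Path(model_dir) / f"{stem}.so"),
            }
        # String keys, matching the example format in this document.
        config[str(lead_time)] = per_threshold
    return config


config = build_model_config("models", [24], [0.00001, 0.00005, 0.0001])
```

The resulting dictionary could then be passed as model_config_dict, for example after being written to and read back from a JSON file.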

_abc_impl = <_abc._abc_data object>

_train_model(threshold, model_path)

Train a model for a particular threshold and save it to disk.

Parameters:
  • threshold (float) – Threshold for which the observation column is trained.

  • model_path (Path) – Full file path where the model should be saved.

Return type:

None
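Since the default training objective is 'binary', training for a threshold presumably means turning the observation column into a 0/1 target. The sketch below illustrates that idea only. The helper name binary_labels is hypothetical, and the use of >= rather than > for the comparison is an assumption, not something this page confirms.

```python
def binary_labels(observations, threshold):
    """Return 1 where the observation reaches the threshold, else 0.

    Hypothetical helper; the inclusive comparison (>=) is an assumption.
    """
    return [int(value >= threshold) for value in observations]


# Example observation values, invented for illustration.
labels = binary_labels([0.0, 0.00005, 0.0002], threshold=0.00005)
```

A target built this way, together with the training columns, is the kind of input a binary classifier would be fitted to before being saved to model_path.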

params = {'num_leaves': 5, 'objective': 'binary', 'seed': 0}
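The override behaviour described for lightgbm_params can be sketched as a plain dictionary merge. This is an illustration of the documented semantics, not the plugin's actual code. The helper name merge_params and the override values are hypothetical.

```python
DEFAULT_PARAMS = {"num_leaves": 5, "objective": "binary", "seed": 0}


def merge_params(defaults, overrides=None):
    """Return a copy of the defaults with any user overrides applied on top."""
    merged = dict(defaults)  # copy so the defaults are never mutated
    if overrides:
        merged.update(overrides)
    return merged


# Hypothetical overrides: change one default and add one new parameter.
merged = merge_params(DEFAULT_PARAMS, {"num_leaves": 10, "learning_rate": 0.05})
```

Keys absent from the overrides keep their default values, so passing lightgbm_params=None leaves the defaults unchanged.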
process(lead_time, thresholds)

Train models for a set of threshold values.

Parameters:
  • lead_time (int) – Lead time in hours of the training data. Used to retrieve model paths from the config data.

  • thresholds (list[str]) – Threshold values for which the observation column is trained. Formatted to match the keys used in the model_config object.

Return type:

None
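The lookup that process performs can be pictured as follows: for each threshold key, find the model path under the given lead time. This is a hedged sketch rather than the plugin's implementation. The helper name model_paths_for and the example paths are hypothetical, and the conversion of lead_time to a string key assumes the config uses string keys as in the example format.

```python
from pathlib import Path


def model_paths_for(model_config, lead_time, thresholds):
    """Map each threshold (as a float) to its LightGBM model path.

    Hypothetical helper; assumes string lead-time keys in the config.
    """
    lead_time_config = model_config[str(lead_time)]
    return {
        float(threshold): Path(lead_time_config[threshold]["lightgbm_model"])
        for threshold in thresholds
    }


# Example config with invented paths, for illustration only.
example_config = {
    "24": {
        "0.000010": {"lightgbm_model": "models/threshold_0.000010.txt"},
        "0.000100": {"lightgbm_model": "models/threshold_0.000100.txt"},
    }
}
paths = model_paths_for(example_config, 24, ["0.000010", "0.000100"])
```

Keeping the thresholds as strings until after the lookup avoids floating-point formatting mismatches with the config keys.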