improver.clustering.clustering module

Plugins to perform clustering on DataFrames using scikit-learn or kmedoids.

class FitClustering(clustering_method, **kwargs)

Bases: BasePlugin

Class to perform clustering on DataFrames using scikit-learn or kmedoids.

This plugin provides a unified interface for applying various clustering algorithms to pandas DataFrames. It supports clustering methods from scikit-learn’s cluster module as well as the KMedoids algorithm from the kmedoids package. The plugin automatically selects the appropriate package based on the specified clustering method:

  • “KMedoids”: Uses the kmedoids package

  • All other methods: Uses sklearn.cluster
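The package selection described above can be sketched roughly as follows. This is an illustrative sketch only, not the plugin's actual implementation: the helper names `select_package` and `resolve_clustering_class` are hypothetical, and the lookup via `importlib` and `getattr` is an assumption about how such a dispatch might be written.

```python
import importlib


def select_package(clustering_method: str) -> str:
    """Return the name of the package expected to provide the method.

    Hypothetical helper: "KMedoids" maps to the kmedoids package,
    every other name maps to sklearn.cluster.
    """
    return "kmedoids" if clustering_method == "KMedoids" else "sklearn.cluster"


def resolve_clustering_class(clustering_method: str):
    """Look up the clustering class by name (assumed dispatch logic).

    Raises ValueError if the selected package has no class of that name,
    mirroring the error documented for the plugin.
    """
    package = select_package(clustering_method)
    module = importlib.import_module(package)
    try:
        return getattr(module, clustering_method)
    except AttributeError as err:
        raise ValueError(
            f"Clustering method '{clustering_method}' not found in {package}"
        ) from err
```

Under these assumptions, `resolve_clustering_class("DBSCAN")` would return the `sklearn.cluster.DBSCAN` class, while a misspelt name would raise `ValueError`.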

__init__(clustering_method, **kwargs)

Initialise the clustering plugin.

Parameters:
  • clustering_method (str) – The name of the clustering method to use. Must be either “KMedoids” (from kmedoids package) or a valid clustering class name from sklearn.cluster (e.g., “KMeans”, “DBSCAN”, “AgglomerativeClustering”).

  • **kwargs (Any) – Additional keyword arguments to pass to the clustering algorithm. These are method-specific parameters. Common examples:

      • n_clusters (int): Number of clusters (for KMeans, AgglomerativeClustering)

      • random_state (int): Random seed for reproducibility

    Refer to the scikit-learn or kmedoids documentation for the complete list of parameters for each clustering method.

Raises:

ValueError – If the specified clustering method is not found in the sklearn.cluster or kmedoids packages.

process(df)

Apply the clustering method to the DataFrame. Fits the specified clustering algorithm to the input DataFrame and returns the fitted clustering model.

Parameters:

df (DataFrame) – The input DataFrame to cluster. Each row represents a sample and each column represents a feature. The DataFrame should contain numeric data suitable for the chosen clustering algorithm.

Return type:

Any

Returns:

A fitted clustering model object from either sklearn.cluster or kmedoids. The returned object will have at minimum a labels_ attribute containing the cluster assignment for each sample. Additional attributes depend on the specific clustering method used (e.g., cluster_centers_ for KMeans, core_sample_indices_ for DBSCAN).

Raises:

ValueError – If the specified clustering method is neither “KMedoids” nor a class found in sklearn.cluster.
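To illustrate the kind of object process returns, the sketch below fits KMeans from sklearn.cluster directly on a small DataFrame and inspects labels_ and cluster_centers_. It assumes the plugin forwards its keyword arguments to the clustering class and calls fit on the DataFrame. The plugin call shown in the comment is inferred from the signatures on this page and is not run here, and the toy data is made up for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data: rows are samples, columns are numeric features.
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
        "y": [0.0, 0.1, 0.0, 5.0, 5.1, 5.0],
    }
)

# Assumed plugin equivalent (not executed in this sketch):
#   model = FitClustering("KMeans", n_clusters=2, random_state=0, n_init=10).process(df)
model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(df)

# labels_ holds one cluster assignment per input row.
labels = model.labels_

# cluster_centers_ is specific to centroid-based methods such as KMeans:
# one row per cluster, one column per feature.
centers = model.cluster_centers_
```

Because the two groups of points are well separated, the first three rows share one label and the last three share the other. Which integer is assigned to which group is not guaranteed.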