improver.clustering.clustering module
Plugins to perform clustering on DataFrames using scikit-learn or kmedoids.
- class FitClustering(clustering_method, **kwargs)
Bases: BasePlugin
Class to perform clustering on DataFrames using scikit-learn or kmedoids.
This plugin provides a unified interface for applying various clustering algorithms to pandas DataFrames. It supports clustering methods from scikit-learn’s cluster module as well as the KMedoids algorithm from the kmedoids package. The plugin selects the package from the specified clustering method:
- “KMedoids”: uses the kmedoids package
- All other methods: use sklearn.cluster
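The selection rule above can be sketched as a name lookup. This is a minimal illustration, not the plugin's actual implementation: the helper name `resolve_clustering_class` is hypothetical, and raising ValueError when the kmedoids package is not installed is an assumption made for this sketch.

```python
import importlib

import sklearn.cluster


def resolve_clustering_class(clustering_method: str):
    """Hypothetical helper: map a method name to a clustering class."""
    if clustering_method == "KMedoids":
        # KMedoids is taken from the separate kmedoids package.
        try:
            kmedoids = importlib.import_module("kmedoids")
        except ImportError as err:
            # Assumption for this sketch: report a missing package as ValueError.
            raise ValueError("KMedoids requires the kmedoids package") from err
        return kmedoids.KMedoids
    # Every other name is looked up in sklearn.cluster.
    try:
        return getattr(sklearn.cluster, clustering_method)
    except AttributeError as err:
        raise ValueError(
            f"'{clustering_method}' not found in sklearn.cluster"
        ) from err


kmeans_cls = resolve_clustering_class("KMeans")
# Method-specific keyword arguments are passed straight to the class.
model = kmeans_cls(n_clusters=3, random_state=0)
```

Resolving the class by name keeps the plugin's interface to a single string plus keyword arguments, so the same call shape can serve any algorithm in sklearn.cluster.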
- __init__(clustering_method, **kwargs)
Initialise the clustering plugin.
- Parameters:
clustering_method (str) – The name of the clustering method to use. Must be either “KMedoids” (from the kmedoids package) or a valid clustering class name from sklearn.cluster (e.g., “KMeans”, “DBSCAN”, “AgglomerativeClustering”).
**kwargs (Any) – Additional keyword arguments to pass to the clustering algorithm. These are method-specific parameters. Common examples:
- n_clusters (int): Number of clusters (for KMeans, AgglomerativeClustering)
- random_state (int): Random seed for reproducibility
Refer to the scikit-learn or kmedoids documentation for the complete list of parameters for each clustering method.
- Raises:
ValueError – If the specified clustering method is not found in sklearn.cluster or kmedoids packages.
- process(df)
Apply the clustering method to the DataFrame. Fits the specified clustering algorithm to the input DataFrame and returns the fitted clustering model.
- Parameters:
df (DataFrame) – The input DataFrame to cluster. Each row represents a sample and each column represents a feature. The DataFrame should contain numeric data suitable for the chosen clustering algorithm.
- Return type:
- Returns:
A fitted clustering model object from either sklearn.cluster or kmedoids. The returned object will have at minimum a labels_ attribute containing the cluster assignment for each sample. Additional attributes depend on the specific clustering method used (e.g., cluster_centers_ for KMeans, core_sample_indices_ for DBSCAN).
- Raises:
ValueError – If the specified clustering method is neither “KMedoids” nor found in sklearn.cluster.
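To illustrate the kind of object process returns, the sketch below fits scikit-learn's KMeans directly on a small DataFrame and reads the labels_ and cluster_centers_ attributes. The data, the column names and the n_init setting are made up for this example. The equivalent plugin call, shown in a comment, is inferred from the signatures documented above and is not executed here.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Illustrative data: two well-separated groups of three samples each.
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
        "y": [0.0, 0.2, 0.1, 10.0, 10.2, 10.1],
    }
)

# Assumed equivalent using the plugin (requires improver to be installed):
#   from improver.clustering.clustering import FitClustering
#   model = FitClustering("KMeans", n_clusters=2, random_state=0).process(df)
model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(df)

labels = model.labels_            # one cluster index per row of df
centers = model.cluster_centers_  # KMeans-specific: one centre per cluster

# Attach the assignments to the original samples for inspection.
clustered = df.assign(cluster=labels)
print(clustered)
```

Only labels_ is described as available for every method. Attributes such as cluster_centers_ exist only for algorithms that define them, so code that handles arbitrary methods should check for them before use.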