Nearest Centroid Method

Nearest Centroid Method classical method implementation.

This section contains the Nearest Centroid Method (NCM) implementation for classification tasks. NCM is a classification algorithm that assigns to samples the label of the class of training samples whose mean (centroid) is closest to the sample.

class TALENT.model.classical_methods.ncm.NCMMethod(args, is_regression)

Bases: classical_methods

construct_model(model_config=None)
fit(data, info, train=True, config=None)
metric(predictions, labels, y_info)
predict(data, info, model_name)
class TALENT.model.classical_methods.ncm.NCMMethod

Nearest Centroid Method for classification tasks.

Key Features:

  • Uses sklearn’s NearestCentroid for classification

  • Assigns labels based on closest class centroid

  • Supports both binary and multiclass classification

  • Automatically handles data preprocessing including normalization and encoding

  • Saves trained model to pickle file for later use

  • Simple and interpretable classification method

Algorithm:

The Nearest Centroid Method assigns to samples the label of the class of training samples whose mean (centroid) is closest to the sample. It is a simple classification algorithm that works well when classes are well-separated.

__init__(args, is_regression)

Initialize the NCM method.

Parameters:

  • args (object) – Configuration arguments containing model settings

  • is_regression (bool) – Whether the task is regression (True) or classification (False)

construct_model(model_config=None)

Construct the NCM model instance.

Parameters:

  • model_config (dict, optional) – Model configuration parameters for NCM

Model Creation:

  • Creates NearestCentroid classifier

  • Configures parameters like metric, shrink_threshold, etc.

fit(data, info, train=True, config=None)

Train the NCM model on the provided data.

Parameters:

  • data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels

  • info (dict) – Dataset information

  • train (bool, default=True) – Whether to train the model or just load from checkpoint

  • config (dict, optional) – Additional configuration parameters

Returns:

  • time_cost (float) – Training time in seconds

Training Process:

  1. Data Preprocessing: Handles missing values, categorical encoding, normalization

  2. Model Training: Computes class centroids from training data

  3. Model Saving: Saves the trained model to disk for later use

predict(data, info, model_name)

Make predictions using the trained NCM model.

Parameters:

  • data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels

  • info (dict) – Dataset information

  • model_name (str) – Name of the model for saving/loading

Returns:

  • test_logit (array-like) – Test predictions (class labels for classification)

Prediction Process:

  1. Data Preprocessing: Applies same preprocessing as training data

  2. Model Loading: Loads the trained NCM model

  3. Prediction: Finds closest centroid for each sample

  4. Output: Returns predicted class labels

Evaluation Metrics:

  • For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics

References:

[1] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media.