Nearest Centroid Method

Nearest Centroid Method classical method implementation.

This section contains the Nearest Centroid Method (NCM) implementation for classification tasks. NCM is a classification algorithm that assigns to samples the label of the class of training samples whose mean (centroid) is closest to the sample.

class TALENT.model.classical_methods.ncm.NCMMethod(args, is_regression)

Bases: classical_methods

construct_model(model_config=None)

fit(data, info, train=True, config=None)

metric(predictions, labels, y_info)

predict(data, info, model_name)

class TALENT.model.classical_methods.ncm.NCMMethod

Nearest Centroid Method for classification tasks.

Key Features:

Uses sklearn’s NearestCentroid for classification
Assigns labels based on closest class centroid
Supports both binary and multiclass classification
Automatically handles data preprocessing including normalization and encoding
Saves trained model to pickle file for later use
Simple and interpretable classification method

Algorithm:

The Nearest Centroid Method assigns to samples the label of the class of training samples whose mean (centroid) is closest to the sample. It is a simple classification algorithm that works well when classes are well-separated.

__init__(args, is_regression)

Initialize the NCM method.

Parameters:

args (object) – Configuration arguments containing model settings
is_regression (bool) – Whether the task is regression (True) or classification (False)

construct_model(model_config=None)

Construct the NCM model instance.

Parameters:

model_config (dict, optional) – Model configuration parameters for NCM

Model Creation:

Creates NearestCentroid classifier
Configures parameters like metric, shrink_threshold, etc.

fit(data, info, train=True, config=None)

Train the NCM model on the provided data.

Parameters:

data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels
info (dict) – Dataset information
train (bool, default=True) – Whether to train the model or just load from checkpoint
config (dict, optional) – Additional configuration parameters

Returns:

time_cost (float) – Training time in seconds

Training Process:

Data Preprocessing: Handles missing values, categorical encoding, normalization
Model Training: Computes class centroids from training data
Model Saving: Saves the trained model to disk for later use

predict(data, info, model_name)

Make predictions using the trained NCM model.

Parameters:

data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels
info (dict) – Dataset information
model_name (str) – Name of the model for saving/loading

Returns:

test_logit (array-like) – Test predictions (class labels for classification)

Prediction Process:

Data Preprocessing: Applies same preprocessing as training data
Model Loading: Loads the trained NCM model
Prediction: Finds closest centroid for each sample
Output: Returns predicted class labels

Evaluation Metrics:

For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics

References:

[1] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media.