Naive Bayes

Naive Bayes classical method implementation.

This section contains the Naive Bayes implementation for classification tasks. Naive Bayes is a probabilistic classifier based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features.

class TALENT.model.classical_methods.naivebayes.NaiveBayesMethod(args, is_regression)

Bases: NCMMethod

construct_model(model_config=None)
class TALENT.model.classical_methods.naivebayes.NaiveBayesMethod

Naive Bayes method for classification tasks.

Key Features:

  • Uses sklearn’s GaussianNB for classification

  • Probabilistic classifier based on Bayes’ theorem

  • Supports both binary and multiclass classification

  • Automatically handles data preprocessing including normalization and encoding

  • Saves trained model to pickle file for later use

  • Provides probability predictions

Algorithm:

Naive Bayes is a probabilistic classifier based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.

__init__(args, is_regression)

Initialize the Naive Bayes method.

Parameters:

  • args (object) – Configuration arguments containing model settings

  • is_regression (bool) – Whether the task is regression (True) or classification (False)

construct_model(model_config=None)

Construct the Naive Bayes model instance.

Parameters:

  • model_config (dict, optional) – Model configuration parameters for Naive Bayes

Model Creation:

  • Creates GaussianNB classifier

  • Configures parameters like priors, var_smoothing, etc.

fit(data, info, train=True, config=None)

Train the Naive Bayes model on the provided data.

Parameters:

  • data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels

  • info (dict) – Dataset information

  • train (bool, default=True) – Whether to train the model or just load from checkpoint

  • config (dict, optional) – Additional configuration parameters

Returns:

  • time_cost (float) – Training time in seconds

Training Process:

  1. Data Preprocessing: Handles missing values, categorical encoding, normalization

  2. Model Training: Fits the Naive Bayes model

  3. Model Saving: Saves the trained model to disk for later use

predict(data, info, model_name)

Make predictions using the trained Naive Bayes model.

Parameters:

  • data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels

  • info (dict) – Dataset information

  • model_name (str) – Name of the model for saving/loading

Returns:

  • test_logit (array-like) – Test predictions (probabilities for classification)

Prediction Process:

  1. Data Preprocessing: Applies same preprocessing as training data

  2. Model Loading: Loads the trained Naive Bayes model

  3. Prediction: Generates probability predictions using Bayes’ theorem

  4. Output: Returns probabilities for classification

Evaluation Metrics:

  • For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics

References:

[1] Rish, I. (2001). An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence (Vol. 3, No. 22, pp. 41-46).