Random Forest

Random Forest classical method implementation.

This section documents the Random Forest implementation for classification and regression tasks. Random Forest is an ensemble learning method that constructs multiple decision trees and outputs the mode of the classes predicted by the individual trees for classification, or the mean of their predictions for regression.

class TALENT.model.classical_methods.randomforest.RandomForestMethod(args, is_regression)

Bases: KnnMethod

Random Forest method for classification and regression tasks using an ensemble of decision trees.

Key Features:

  • Uses sklearn’s RandomForestClassifier for classification and RandomForestRegressor for regression

  • Inherits common functionality from the KnnMethod class

  • Supports both binary and multiclass classification

  • Automatically handles data preprocessing including normalization and encoding

  • Saves the trained model to a pickle file for later use

  • Ensemble method that combines multiple decision trees

Algorithm:

Random Forest builds multiple decision trees during training and outputs the class that is the mode of the classes predicted by individual trees for classification, or the mean prediction for regression.
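
The aggregation rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the TALENT or sklearn code: the function names `aggregate_classification` and `aggregate_regression` are hypothetical, and the per-tree predictions are hard-coded. Note that sklearn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than by a hard vote, so the majority vote here follows the textbook description.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the most common class label among the per-tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Return the average of the per-tree numeric predictions."""
    return mean(tree_outputs)


# Hard-coded per-tree predictions for a single sample (illustrative only).
votes = ["cat", "dog", "cat", "cat", "dog"]
outputs = [2.0, 3.0, 4.0]

print(aggregate_classification(votes))
print(aggregate_regression(outputs))
```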

__init__(args, is_regression)

Initialize the Random Forest method.

Parameters:

  • args (object) – Configuration arguments containing model settings

  • is_regression (bool) – Whether the task is regression (True) or classification (False)

construct_model(model_config=None)

Construct the Random Forest model instance.

Parameters:

  • model_config (dict, optional) – Model configuration parameters for Random Forest

Model Creation:

  • For classification: creates RandomForestClassifier

  • For regression: creates RandomForestRegressor

  • Configures ensemble parameters such as the number of trees and the maximum tree depth
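
The selection logic above might look roughly like the following sketch. The helper `build_forest` is hypothetical and is not the actual `construct_model` body. It also assumes that `model_config` is a flat dict of sklearn keyword arguments, which may differ from the layout TALENT uses.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def build_forest(is_regression, model_config=None):
    """Hypothetical helper: choose the estimator class by task type."""
    # Assumption: model_config maps directly onto sklearn keyword arguments.
    params = dict(model_config or {})
    estimator_cls = RandomForestRegressor if is_regression else RandomForestClassifier
    return estimator_cls(**params)


config = {"n_estimators": 50, "max_depth": 8, "random_state": 0}
clf = build_forest(False, config)
reg = build_forest(True, config)
print(type(clf).__name__, type(reg).__name__)
```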

fit(data, info, train=True, config=None)

Train the Random Forest model on the provided data.

Parameters:

  • data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels

  • info (dict) – Dataset information

  • train (bool, default=True) – Whether to train the model or just load it from a checkpoint

  • config (dict, optional) – Additional configuration parameters

Returns:

  • time_cost (float) – Training time in seconds

Training Process:

  1. Data Preprocessing: Handles missing values, categorical encoding, and normalization

  2. Model Training: Fits the Random Forest ensemble to the training data

  3. Model Saving: Saves the trained model to disk for later use
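
As a rough illustration of steps 2 and 3, the sketch below trains a small forest on synthetic data, measures the elapsed time, and pickles the result. The helper `fit_and_save`, the file name `forest.pkl`, and the synthetic data are all assumptions for the example. Preprocessing (step 1) is assumed to have already produced a numeric feature matrix and is not shown.

```python
import os
import pickle
import tempfile
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_and_save(X_train, y_train, save_path, **forest_params):
    """Hypothetical helper: train a forest, pickle it, return elapsed seconds."""
    model = RandomForestClassifier(**forest_params)
    start = time.time()
    model.fit(X_train, y_train)
    time_cost = time.time() - start
    with open(save_path, "wb") as f:
        pickle.dump(model, f)
    return time_cost


# Synthetic, already-preprocessed data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0, 1] * 50)

save_path = os.path.join(tempfile.mkdtemp(), "forest.pkl")
time_cost = fit_and_save(X, y, save_path, n_estimators=10, random_state=0)
print(f"trained in {time_cost:.3f}s, saved to {save_path}")
```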

predict(data, info, model_name)

Make predictions using the trained Random Forest model.

Parameters:

  • data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels

  • info (dict) – Dataset information

  • model_name (str) – Name of the model for saving/loading

Returns:

  • test_logit (array-like) – Test predictions (probabilities for classification, values for regression)

Prediction Process:

  1. Data Preprocessing: Applies the same preprocessing that was used for the training data

  2. Model Loading: Loads the trained Random Forest model

  3. Prediction: Generates predictions using the ensemble

  4. Output: Returns probabilities for classification or values for regression
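
Steps 2 to 4 could be sketched as follows. The helper `load_and_predict` is hypothetical and the model is trained inline only so that the example runs on its own. The sketch assumes the checkpoint is a pickled sklearn estimator, and that classification output comes from `predict_proba` while regression output comes from `predict`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def load_and_predict(model_path, X_test, is_regression):
    """Hypothetical helper: load a pickled forest and return its predictions."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    if is_regression:
        return model.predict(X_test)      # one value per sample
    return model.predict_proba(X_test)    # one probability per class per sample


# Train and save a tiny classifier so the example is self-contained.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.array([0, 1] * 30)
model_path = os.path.join(tempfile.mkdtemp(), "forest.pkl")
with open(model_path, "wb") as f:
    pickle.dump(RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y), f)

test_logit = load_and_predict(model_path, X[:5], is_regression=False)
print(test_logit.shape)
```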

Evaluation Metrics:

  • For regression: returns MAE, R2, RMSE metrics

  • For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics
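
For reference, the listed metrics can be computed by hand as in the sketch below. The function names are hypothetical, and the sketch assumes that Avg_Precision, Avg_Recall, and F1 are macro averages over classes. The source does not state the averaging scheme, so treat that as an assumption rather than the library's definition.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, R2, and RMSE from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "R2": r2, "RMSE": math.sqrt(mse)}


def classification_metrics(y_true, y_pred):
    """Compute accuracy plus macro-averaged precision, recall, and F1.

    Assumption: macro averaging (unweighted mean over classes).
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    k = len(labels)
    return {
        "Accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "Avg_Precision": sum(precisions) / k,
        "Avg_Recall": sum(recalls) / k,
        "F1": sum(f1s) / k,
    }


reg_scores = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
clf_scores = classification_metrics([0, 0, 1, 1], [0, 1, 1, 1])
print(reg_scores)
print(clf_scores)
```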

References:

[1] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.