Random Forest

Random Forest classical method implementation.

This section documents the Random Forest implementation for classification and regression tasks. Random Forest is an ensemble learning method that constructs multiple decision trees and combines their outputs: the mode of the classes predicted by the individual trees for classification, or the mean of their predictions for regression.

class TALENT.model.classical_methods.randomforest.RandomForestMethod(args, is_regression)

Bases: KnnMethod

Random Forest method for classification and regression tasks using an ensemble of decision trees.

Key Features:

Uses sklearn’s RandomForestClassifier for classification and RandomForestRegressor for regression
Inherits common functionality from the KnnMethod class
Supports both binary and multiclass classification
Automatically handles data preprocessing, including normalization and encoding
Saves the trained model to a pickle file for later use
Combines the predictions of multiple decision trees

Algorithm:

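Random Forest builds multiple decision trees during training. For classification it outputs the class that is the mode of the classes predicted by the individual trees; for regression it outputs the mean of the individual predictions. Note that sklearn's RandomForestClassifier combines trees by averaging their predicted class probabilities and selecting the class with the highest average, rather than by counting hard votes.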
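The aggregation step can be checked directly against scikit-learn. The following is a minimal sketch, not TALENT code: it uses synthetic data and arbitrary settings (25 trees, fixed seeds) and recomputes the forest output from the individual trees in `estimators_`.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: average the per-tree class probabilities, then take the argmax.
X_clf, y_clf = make_classification(n_samples=200, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_clf, y_clf)
tree_probas = np.stack([tree.predict_proba(X_clf) for tree in clf.estimators_])
mean_proba = tree_probas.mean(axis=0)
manual_labels = clf.classes_[mean_proba.argmax(axis=1)]

# Regression: average the per-tree predictions.
X_reg, y_reg = make_regression(n_samples=200, n_features=8, random_state=0)
reg = RandomForestRegressor(n_estimators=25, random_state=0).fit(X_reg, y_reg)
manual_values = np.mean([tree.predict(X_reg) for tree in reg.estimators_], axis=0)

# Both manual aggregates should agree with the forest's own output.
clf_matches = np.allclose(mean_proba, clf.predict_proba(X_clf))
reg_matches = np.allclose(manual_values, reg.predict(X_reg))
print(clf_matches, reg_matches)
```

Averaging probabilities and majority voting usually agree, but they can differ when trees are individually uncertain, which is why the distinction matters when interpreting the returned probabilities.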

__init__(args, is_regression)

Initialize the Random Forest method.

Parameters:

args (object) – Configuration arguments containing model settings
is_regression (bool) – Whether the task is regression (True) or classification (False)

construct_model(model_config=None)

Construct the Random Forest model instance.

Parameters:

model_config (dict, optional) – Model configuration parameters for Random Forest

Model Creation:

For classification: creates RandomForestClassifier
For regression: creates RandomForestRegressor
Configures ensemble parameters such as the number of trees and the maximum tree depth
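The model-creation step might look roughly like the sketch below. `build_random_forest` and `example_config` are hypothetical names introduced for illustration, and the assumption that `model_config` maps directly onto scikit-learn keyword arguments is not confirmed by this page; the actual layout of TALENT's configuration may differ.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def build_random_forest(is_regression, model_config=None):
    """Hypothetical helper: choose the estimator class from the task type."""
    params = dict(model_config or {})  # copy so the caller's dict is not mutated
    estimator_cls = RandomForestRegressor if is_regression else RandomForestClassifier
    return estimator_cls(**params)


# Assumed example settings; the keys are standard scikit-learn parameters.
example_config = {"n_estimators": 200, "max_depth": 12, "min_samples_leaf": 2}
clf = build_random_forest(is_regression=False, model_config=example_config)
reg = build_random_forest(is_regression=True, model_config=example_config)
print(type(clf).__name__, type(reg).__name__)
```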

fit(data, info, train=True, config=None)

Train the Random Forest model on the provided data.

Parameters:

data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels
info (dict) – Dataset information
train (bool, default=True) – Whether to train the model or just load from checkpoint
config (dict, optional) – Additional configuration parameters

Returns:

time_cost (float) – Training time in seconds

Training Process:

Data Preprocessing: Handles missing values, categorical encoding, normalization
Model Training: Fits the Random Forest ensemble to the training data
Model Saving: Saves the trained model to disk for later use
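The three training steps above can be sketched with plain scikit-learn as follows. This is an illustration under stated assumptions, not the TALENT implementation: the synthetic `(N, C, y)` data, the choice of mean imputation, standard scaling, and ordinal encoding, and the file name `random_forest.pkl` are all made up for the example.

```python
import os
import pickle
import tempfile
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Synthetic stand-in for the (N, C, y) tuple.
rng = np.random.default_rng(0)
N = rng.normal(size=(120, 3))
N[rng.random(N.shape) < 0.05] = np.nan          # inject some missing values
C = rng.choice(["a", "b", "c"], size=(120, 2))  # two categorical columns
y = rng.integers(0, 2, size=120)

# 1. Data preprocessing: impute, normalize, and encode (assumed policies).
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
encoder = OrdinalEncoder()
N_clean = scaler.fit_transform(imputer.fit_transform(N))
X = np.hstack([N_clean, encoder.fit_transform(C)])

# 2. Model training, timed to produce a value like time_cost.
start = time.time()
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
time_cost = time.time() - start

# 3. Model saving: pickle the fitted estimator (hypothetical file name).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "random_forest.pkl")
with open(checkpoint_path, "wb") as f:
    pickle.dump(model, f)
```

In practice the fitted imputer, scaler, and encoder need to be kept as well, so that the same transformations can be reapplied at prediction time.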

predict(data, info, model_name)

Make predictions using the trained Random Forest model.

Parameters:

data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels
info (dict) – Dataset information
model_name (str) – Name of the model for saving/loading

Returns:

test_logit (array-like) – Test predictions (probabilities for classification, values for regression)

Prediction Process:

Data Preprocessing: Applies same preprocessing as training data
Model Loading: Loads the trained Random Forest model
Prediction: Generates predictions using the ensemble
Output: Returns probabilities for classification or values for regression
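A minimal sketch of the prediction path, assuming a model pickled as in training: the preprocessing fitted on the training split is reused (transform only, no refitting), the model is unpickled, and the output is class probabilities or regression values depending on the task. The in-memory buffer stands in for the pickle file on disk, and the variable names are illustrative rather than TALENT's own.

```python
import io
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training side: fit preprocessing and model, then serialize the model.
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(scaler.transform(X_train), y_train)
buffer = io.BytesIO()  # stands in for the pickle file on disk
pickle.dump(model, buffer)

# Prediction side: load the model and apply the same fitted preprocessing.
buffer.seek(0)
loaded = pickle.load(buffer)
X_test_scaled = scaler.transform(X_test)  # transform only, never refit on test data

is_regression = False
if is_regression:
    test_logit = loaded.predict(X_test_scaled)        # one value per sample
else:
    test_logit = loaded.predict_proba(X_test_scaled)  # one probability per class
print(test_logit.shape)
```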

Evaluation Metrics:

For regression: returns MAE, R2, RMSE metrics
For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics
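The listed metrics can be computed with `sklearn.metrics` roughly as sketched below. The helper names are hypothetical, and two assumptions are made that this page does not confirm: precision, recall, and F1 are macro-averaged over classes, and class labels are the integers 0..K-1 so that the argmax over probability columns yields the predicted label.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)


def regression_metrics(y_true, y_pred):
    """Hypothetical helper returning MAE, R2, and RMSE."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }


def classification_metrics(y_true, proba):
    """Hypothetical helper; assumes labels 0..K-1 and macro averaging."""
    y_pred = np.argmax(proba, axis=1)
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Avg_Precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "Avg_Recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "F1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }


reg_scores = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
clf_scores = classification_metrics(
    [0, 1, 1, 0],
    np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]),
)
print(reg_scores)
print(clf_scores)
```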

References:

[1] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.