XGBoost

Implementation of XGBoost as one of TALENT's classical methods.

This section contains the XGBoost implementation for classification and regression tasks. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.

class TALENT.model.classical_methods.xgboost.XGBoostMethod(args, is_regression)

Bases: classical_methods

XGBoost method for classification and regression tasks using gradient boosting.

Key Features:

Uses XGBoost library for gradient boosting implementation
Supports both classification and regression tasks
Handles missing values automatically
Provides feature importance analysis
Supports early stopping to prevent overfitting
Efficient implementation with parallel processing

Algorithm:

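XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. Compared with plain gradient boosting, its training objective includes explicit regularization terms (a penalty on the number of leaves and L1/L2 penalties on leaf weights) to control overfitting.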
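To make the role of regularization concrete, the following minimal sketch computes the optimal leaf weight and the split gain from the XGBoost paper [1], given the summed first-order gradients (G) and second-order gradients (H) of the samples in a node. The function names are illustrative and are not part of TALENT or the xgboost package; the L1 term is omitted for brevity.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -grad_sum / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node into a left and a right child.

    gain = 0.5 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda)
                  - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    children = score(g_left, h_left) + score(g_right, h_right)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (children - parent) - gamma
```

A larger reg_lambda shrinks leaf weights toward zero, and a larger gamma raises the gain a split must reach before it is worth making.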

__init__(args, is_regression)

Initialize the XGBoost method.

Parameters:

args (object) – Configuration arguments containing model settings
is_regression (bool) – Whether the task is regression (True) or classification (False)

construct_model(model_config=None)

Construct the XGBoost model instance.

Parameters:

model_config (dict, optional) – Model configuration parameters for XGBoost

Model Creation:

For classification: creates XGBClassifier
For regression: creates XGBRegressor
Configures boosting parameters like learning rate, max depth, etc.
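The selection logic above can be sketched as follows. This is a hypothetical illustration, not TALENT's actual code: the helper names and the default values are assumptions, while XGBClassifier, XGBRegressor and the n_estimators, learning_rate and max_depth parameters come from the xgboost scikit-learn interface.

```python
# Illustrative defaults only; these are not the values TALENT ships with.
DEFAULT_PARAMS = {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 6}


def estimator_name(is_regression):
    """Pick the xgboost estimator class name for the task type."""
    return "XGBRegressor" if is_regression else "XGBClassifier"


def merge_params(model_config=None):
    """Overlay user-supplied settings on the defaults without mutating them."""
    params = dict(DEFAULT_PARAMS)
    params.update(model_config or {})
    return params


def construct_model_sketch(is_regression, model_config=None):
    """Build the estimator; requires the xgboost package to be installed."""
    import xgboost  # imported lazily so the helpers above work without it

    estimator_cls = getattr(xgboost, estimator_name(is_regression))
    return estimator_cls(**merge_params(model_config))
```

For example, construct_model_sketch(False, {"max_depth": 3}) would build an XGBClassifier with a maximum tree depth of 3 and the remaining defaults unchanged.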

fit(data, info, train=True, config=None)

Train the XGBoost model on the provided data.

Parameters:

data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels
info (dict) – Dataset information
train (bool, default=True) – Whether to train the model or just load from checkpoint
config (dict, optional) – Additional configuration parameters

Returns:

time_cost (float) – Training time in seconds

Training Process:

Data Preprocessing: Handles missing values, categorical encoding, normalization
Model Training: Fits the XGBoost model with gradient boosting
Model Saving: Saves the trained model to disk for later use
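The training steps above might look roughly like the sketch below. It is written under several assumptions: the features in N and C are already encoded as numbers, the model exposes fit and save_model (as the xgboost scikit-learn estimators do), and MeanRegressor is a stand-in defined here only so the example runs without xgboost. How TALENT actually preprocesses data and stores checkpoints may differ.

```python
import json
import time


class MeanRegressor:
    """Stand-in estimator with the same fit/predict/save_model surface."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def save_model(self, path):
        with open(path, "w") as f:
            json.dump({"mean": self.mean_}, f)


def merge_features(N, C):
    """Join numerical and (already encoded) categorical columns row by row."""
    if N is None:
        return [list(row) for row in C]
    if C is None:
        return [list(row) for row in N]
    return [list(n_row) + list(c_row) for n_row, c_row in zip(N, C)]


def fit_sketch(model, data, save_path):
    """Fit the model, save it to disk, and return the training time in seconds."""
    N, C, y = data
    X = merge_features(N, C)
    start = time.time()
    model.fit(X, y)
    time_cost = time.time() - start  # only the fit call is timed
    model.save_model(save_path)
    return time_cost
```

Only the call to fit is timed here, so the returned value excludes feature merging and saving; whether TALENT's time_cost covers the same span is not stated in this section.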

predict(data, info, model_name)

Make predictions using the trained XGBoost model.

Parameters:

data (tuple) – Tuple containing (N, C, y) where N is numerical features, C is categorical features, y is labels
info (dict) – Dataset information
model_name (str) – Name of the model for saving/loading

Returns:

test_logit (array-like) – Test predictions (probabilities for classification, values for regression)

Prediction Process:

Data Preprocessing: Applies same preprocessing as training data
Model Loading: Loads the trained XGBoost model
Prediction: Generates predictions using the gradient boosting model
Output: Returns probabilities for classification or values for regression
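The output step can be illustrated with a small dispatch function. This is a sketch, assuming the loaded model follows the scikit-learn convention used by XGBClassifier and XGBRegressor (predict_proba for class probabilities, predict for values). The stub classes are hypothetical and exist only to keep the example runnable without xgboost.

```python
class StubRegressor:
    """Hypothetical stand-in for a fitted regressor."""

    def predict(self, X):
        return [0.0 for _ in X]


class StubClassifier:
    """Hypothetical stand-in for a fitted binary classifier."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


def predict_sketch(model, X, is_regression):
    """Return predicted values for regression, class probabilities otherwise."""
    if is_regression:
        return model.predict(X)
    return model.predict_proba(X)
```

With a real XGBClassifier, predict_proba returns one row per sample and one column per class.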

Evaluation Metrics:

For regression: returns MAE, R2, RMSE metrics
For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics
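For reference, the listed metrics can be computed as in the sketch below. It assumes that Avg_Precision, Avg_Recall and F1 are macro averages over classes; this section does not state which averaging TALENT uses, so treat that choice as an assumption. The function names are illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, R2 and RMSE for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # raises ZeroDivisionError if y_true is constant
    return {"MAE": mae, "R2": r2, "RMSE": rmse}


def classification_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(precisions)
    return {
        "Accuracy": accuracy,
        "Avg_Precision": sum(precisions) / k,
        "Avg_Recall": sum(recalls) / k,
        "F1": sum(f1s) / k,
    }
```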

References:

[1] Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794).