XGBoost
XGBoost classical method implementation.
This section contains the XGBoost implementation for classification and regression tasks. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
- class TALENT.model.classical_methods.xgboost.XGBoostMethod(args, is_regression)
Bases: classical_methods
- construct_model(model_config=None)
- fit(data, info, train=True, config=None)
- predict(data, info, model_name)
- class TALENT.model.classical_methods.xgboost.XGBoostMethod
XGBoost method for classification and regression tasks using gradient boosting.
Key Features:
Uses XGBoost library for gradient boosting implementation
Supports both classification and regression tasks
Handles missing values natively by learning a default split direction for missing entries
Provides feature importance analysis
Supports early stopping to prevent overfitting
Efficient implementation with multi-threaded (parallel) tree construction
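Early stopping halts boosting once a validation metric has not improved for a fixed number of rounds; XGBoost exposes this as its early_stopping_rounds setting. The minimal sketch below reproduces only that stopping rule on a precomputed list of per-round validation losses. The helper name `early_stopping_round` is hypothetical, and it assumes lower loss is better; it is not TALENT's or XGBoost's code.

```python
from typing import List, Tuple


def early_stopping_round(val_losses: List[float], patience: int) -> Tuple[int, float]:
    """Return (best_round, best_loss) under a patience-based stopping rule.

    Hypothetical helper: assumes lower loss is better and that boosting stops
    once `patience` consecutive rounds fail to improve on the best loss so far.
    """
    if patience < 1:
        raise ValueError("patience must be >= 1")
    best_round, best_loss = 0, float("inf")
    for round_idx, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = round_idx, loss
        elif round_idx - best_round >= patience:
            break  # no improvement for `patience` rounds: stop boosting
    return best_round, best_loss


# Validation loss improves, then plateaus; with patience=2 the loop stops at
# round 5, so the later improvement at round 6 is never reached.
losses = [0.70, 0.55, 0.48, 0.47, 0.49, 0.50, 0.46]
print(early_stopping_round(losses, patience=2))  # (3, 0.47)
```

A larger patience tolerates longer plateaus at the cost of extra boosting rounds; with patience=5 the same sequence would keep going and select round 6.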
Algorithm:
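XGBoost is an implementation of gradient-boosted decision trees designed for speed and performance [1]. Compared with classic gradient boosting, its training objective adds a regularization term that penalizes the number of leaves and the squared magnitude of the leaf weights, which helps control overfitting.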
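In the formulation of Chen and Guestrin [1], each tree f is penalized by Ω(f) = γT + ½λ‖w‖², where T is the number of leaves and w the vector of leaf weights. With a second-order expansion of the loss using per-row gradients g_i and Hessians h_i, and with G and H denoting the sums of g_i and h_i over the rows in a node, the optimal leaf weight is w* = −G / (H + λ) and the gain of a split is ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ)] − γ. The plain-Python sketch below evaluates these formulas for squared-error loss; it illustrates the math only and is not XGBoost's or TALENT's implementation.

```python
from typing import List, Tuple


def squared_error_grad_hess(y_true: List[float], y_pred: List[float]) -> Tuple[List[float], List[float]]:
    # For l(y, y_hat) = 0.5 * (y - y_hat)^2: g = y_hat - y and h = 1.
    grads = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0] * len(y_true)
    return grads, hess


def leaf_weight(G: float, H: float, reg_lambda: float) -> float:
    # Optimal leaf weight w* = -G / (H + lambda); lambda shrinks it toward 0.
    return -G / (H + reg_lambda)


def split_gain(G_L: float, H_L: float, G_R: float, H_R: float,
               reg_lambda: float, gamma: float) -> float:
    # Gain = 1/2 [score(L) + score(R) - score(L + R)] - gamma,
    # where score(G, H) = G^2 / (H + lambda).
    def score(G: float, H: float) -> float:
        return G * G / (H + reg_lambda)

    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [0.0] * 4  # predictions before this boosting round
g, h = squared_error_grad_hess(y_true, y_pred)

# A single leaf holding all four rows.
print(leaf_weight(sum(g), sum(h), reg_lambda=0.0))  # 2.5 (mean residual)
print(leaf_weight(sum(g), sum(h), reg_lambda=1.0))  # 2.0 (shrunk by lambda)

# Candidate split: rows {0, 1} go left, rows {2, 3} go right.
GL, HL = sum(g[:2]), sum(h[:2])
GR, HR = sum(g[2:]), sum(h[2:])
print(split_gain(GL, HL, GR, HR, reg_lambda=0.0, gamma=0.0))  # 2.0
print(split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0) < 0)  # True
```

With λ = 1 the same split has negative gain and would not be made, which is exactly how the regularization term discourages overly fine trees.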
- __init__(args, is_regression)
Initialize the XGBoost method.
Parameters:
args (object) – Configuration arguments containing model settings
is_regression (bool) – Whether the task is regression (True) or classification (False)
- construct_model(model_config=None)
Construct the XGBoost model instance.
Parameters:
model_config (dict, optional) – Model configuration parameters for XGBoost
Model Creation:
For classification: creates XGBClassifier
For regression: creates XGBRegressor
Configures boosting hyperparameters such as the learning rate and the maximum tree depth
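The model-creation step amounts to picking an estimator class and its keyword arguments. The sketch below shows one plausible shape of that logic; the helper `build_xgb_params`, the `DEFAULTS` values, and the merge order are hypothetical and may differ from TALENT's construct_model. The parameter names (n_estimators, learning_rate, max_depth, reg_lambda, objective) and objective strings are real XGBoost settings.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical defaults for illustration only; TALENT's actual values may differ.
DEFAULTS: Dict[str, Any] = {
    "n_estimators": 500,
    "learning_rate": 0.1,
    "max_depth": 6,
    "reg_lambda": 1.0,
}


def build_xgb_params(is_regression: bool, n_classes: int = 2,
                     model_config: Optional[Dict[str, Any]] = None) -> Tuple[str, Dict[str, Any]]:
    """Return (estimator_class_name, kwargs) for an XGBoost sklearn-style estimator.

    User-supplied model_config entries override the defaults.
    """
    params = dict(DEFAULTS)
    params.update(model_config or {})
    if is_regression:
        name = "XGBRegressor"
        params.setdefault("objective", "reg:squarederror")
    else:
        name = "XGBClassifier"
        params.setdefault("objective", "binary:logistic" if n_classes == 2 else "multi:softprob")
    return name, params


name, params = build_xgb_params(is_regression=True, model_config={"max_depth": 4})
print(name)                 # XGBRegressor
print(params["max_depth"])  # 4
```

With xgboost installed, the estimator could then be created as `getattr(xgboost, name)(**params)`.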
- fit(data, info, train=True, config=None)
Train the XGBoost model on the provided data.
Parameters:
data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels
info (dict) – Dataset information
train (bool, default=True) – Whether to train the model (True) or only load it from a checkpoint (False)
config (dict, optional) – Additional configuration parameters
Returns:
time_cost (float) – Training time in seconds
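The returned time_cost is a wall-clock training duration. The source does not say which clock is used; the sketch below assumes `time.perf_counter` around the fit call. `MeanRegressor` is a hypothetical stand-in with a scikit-learn-style fit/predict interface, so the example runs without xgboost; an XGBRegressor or XGBClassifier could be passed to `timed_fit` in its place.

```python
import time
from typing import Any, List, Sequence, Tuple


class MeanRegressor:
    """Hypothetical stand-in estimator: always predicts the training mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.mean_] * len(X)


def timed_fit(model: Any, X: Any, y: Any) -> Tuple[Any, float]:
    """Fit `model` on (X, y) and return it with the elapsed time in seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    time_cost = time.perf_counter() - start
    return model, time_cost


model, time_cost = timed_fit(MeanRegressor(), [[0.0], [1.0]], [1.0, 3.0])
print(model.predict([[5.0]]))  # [2.0]
print(f"training took {time_cost:.6f} s")
```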
Training Process:
Data Preprocessing: Handles missing values, encodes categorical features, and applies normalization
Model Training: Fits the XGBoost model with gradient boosting
Model Saving: Saves the trained model to disk for later use
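The preprocessing step above names three operations but not the specific methods. The sketch below makes one assumed set of choices for the (N, C) feature blocks: mean imputation plus z-score standardization for numerical columns, and ordinal encoding with a dedicated code for missing categories. The class `SimplePreprocessor` and the `MISSING` token are hypothetical; the point is that statistics are fitted on the training data and then reused unchanged at prediction time.

```python
import math
from typing import Dict, List, Optional, Sequence

MISSING = "__missing__"  # hypothetical token for absent categorical values


class SimplePreprocessor:
    """Hypothetical preprocessing for numerical (N) and categorical (C) blocks."""

    def fit(self, N: Sequence[Sequence[float]],
            C: Sequence[Sequence[Optional[str]]]) -> "SimplePreprocessor":
        self.means: List[float] = []
        self.stds: List[float] = []
        for col in zip(*N):
            observed = [v for v in col if not math.isnan(v)]
            mean = sum(observed) / len(observed) if observed else 0.0
            var = sum((v - mean) ** 2 for v in observed) / len(observed) if observed else 0.0
            self.means.append(mean)
            self.stds.append(math.sqrt(var) or 1.0)  # avoid division by zero
        self.codes: List[Dict[str, int]] = []
        for col in zip(*C):
            levels = sorted({MISSING if v is None else v for v in col})
            self.codes.append({v: i for i, v in enumerate(levels)})
        return self

    def transform(self, N: Sequence[Sequence[float]],
                  C: Sequence[Sequence[Optional[str]]]) -> List[List[float]]:
        rows = []
        for num_row, cat_row in zip(N, C):
            # Impute NaN with the training mean, then standardize.
            num = [((m if math.isnan(v) else v) - m) / s
                   for v, m, s in zip(num_row, self.means, self.stds)]
            # Categories unseen during fit are mapped to -1.
            cat = [float(codes.get(MISSING if v is None else v, -1))
                   for v, codes in zip(cat_row, self.codes)]
            rows.append(num + cat)
        return rows


N = [[1.0, float("nan")], [3.0, 4.0], [5.0, 8.0]]
C = [["a"], [None], ["b"]]
pre = SimplePreprocessor().fit(N, C)
X = pre.transform(N, C)
print(X[1])  # [0.0, -1.0, 0.0]
```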
- predict(data, info, model_name)
Make predictions using the trained XGBoost model.
Parameters:
data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels
info (dict) – Dataset information
model_name (str) – Name of the model for saving/loading
Returns:
test_logit (array-like) – Test predictions (probabilities for classification, values for regression)
Prediction Process:
Data Preprocessing: Applies the same preprocessing that was fitted on the training data
Model Loading: Loads the trained XGBoost model
Prediction: Generates predictions using the gradient boosting model
Output: Returns probabilities for classification or values for regression
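For classification, the probabilities come from transforming the model's raw output (the margin): XGBoost's binary:logistic objective applies a logistic sigmoid to one margin per row, and multi:softprob applies a softmax to one margin per class. The sketch below reproduces only that transformation in plain Python; `margins_to_proba` is a hypothetical helper, not part of XGBoost or TALENT.

```python
import math
from typing import List, Sequence, Union


def sigmoid(margin: float) -> float:
    # Numerically stable logistic function.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


def softmax(margins: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating to avoid overflow.
    top = max(margins)
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def margins_to_proba(margins: Sequence[Union[float, Sequence[float]]],
                     n_classes: int) -> List[List[float]]:
    """Binary: one margin per row -> [P(class 0), P(class 1)].
    Multiclass: one margin per class per row -> softmax over classes."""
    if n_classes == 2:
        return [[1.0 - sigmoid(m), sigmoid(m)] for m in margins]
    return [softmax(row) for row in margins]


print(margins_to_proba([0.0], n_classes=2))  # [[0.5, 0.5]]
```

The predicted label is then the index of the largest probability in each row.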
Evaluation Metrics:
For regression: returns MAE, R2, RMSE metrics
For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics
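The metric names above can be computed as in the sketch below. The averaging scheme is not stated in the source; this sketch assumes Avg_Precision, Avg_Recall, and F1 are unweighted (macro) averages over classes, which is an assumption, and the helper names are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),  # undefined for constant y
        "RMSE": math.sqrt(ss_res / n),
    }


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    k = len(classes)
    return {
        "Accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "Avg_Precision": sum(precisions) / k,  # macro average (assumed)
        "Avg_Recall": sum(recalls) / k,        # macro average (assumed)
        "F1": sum(f1s) / k,                    # macro F1 (assumed)
    }


reg = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
cls = classification_metrics([0, 0, 1, 1], [0, 1, 1, 1])
print(reg["R2"], cls["Accuracy"])  # 0.5 0.75
```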
References:
[1] Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).