CatBoost
CatBoost classical method implementation.
This section documents the CatBoost implementation for classification and regression tasks. CatBoost is a gradient boosting algorithm that handles categorical features natively and typically performs well with its default hyperparameters, so it needs little tuning.
- class TALENT.model.classical_methods.catboost.CatBoostMethod(args, is_regression)
Bases: classical_methods
CatBoost method for classification and regression tasks, using gradient boosting with native support for categorical features.
Key Features:
Uses the CatBoost library as the gradient boosting backend
Handles categorical features natively, without requiring one-hot encoding
Supports both classification and regression tasks
Provides feature importance analysis
Reduces overfitting through ordered boosting and built-in regularization (for example, L2 regularization of leaf values)
Efficient implementation with optional GPU training
Algorithm:
CatBoost [1] is a gradient boosting algorithm that uses ordered boosting and ordered target statistics for encoding categorical features. Both techniques avoid the target leakage (prediction shift) of standard gradient boosting, which reduces overfitting and improves prediction quality.
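To make the categorical-feature idea concrete, here is a simplified, pure-Python sketch of ordered target statistics in the spirit of [1]. Rows are visited in a random permutation, and each row's category is encoded as (sum of targets of earlier rows with the same category + a·p) / (number of such earlier rows + a), where p is a prior and a its weight. The function name and defaults are illustrative only. CatBoost's own implementation is more elaborate (for example, it uses several permutations).

```python
import random


def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode one categorical column with ordered target statistics.

    Row i is encoded as (sum of targets of earlier same-category rows + a * prior)
    / (number of earlier same-category rows + a), where "earlier" refers to a
    random permutation. A row's own target never enters its own encoding.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)
        # Only now add row i's target, so it affects later rows only.
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded


cats = ["red", "blue", "red", "red", "blue"]
ys = [1, 0, 1, 0, 1]
encoded = ordered_target_statistics(cats, ys, prior=sum(ys) / len(ys))
```

The first occurrence of each category always receives the prior, because no earlier same-category targets exist yet.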
- __init__(args, is_regression)
Initialize the CatBoost method.
Parameters:
args (object) – Configuration arguments containing model settings
is_regression (bool) – Whether the task is regression (True) or classification (False)
- construct_model(model_config=None)
Construct the CatBoost model instance.
Parameters:
model_config (dict, optional) – Model configuration parameters for CatBoost
Model Creation:
For classification: creates CatBoostClassifier
For regression: creates CatBoostRegressor
Configures boosting parameters such as the learning rate and tree depth
- fit(data, info, train=True, config=None)
Train the CatBoost model on the provided data.
Parameters:
data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels
info (dict) – Dataset information
train (bool, default=True) – Whether to train the model or just load from checkpoint
config (dict, optional) – Additional configuration parameters
Returns:
time_cost (float) – Training time in seconds
Training Process:
Data Preprocessing: Handles missing values, categorical encoding, normalization
Model Training: Fits the CatBoost model with gradient boosting
Model Saving: Saves the trained model to disk for later use
- predict(data, info, model_name)
Make predictions using the trained CatBoost model.
Parameters:
data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels
info (dict) – Dataset information
model_name (str) – Name of the model for saving/loading
Returns:
test_logit (array-like) – Test predictions (probabilities for classification, values for regression)
Prediction Process:
Data Preprocessing: Applies same preprocessing as training data
Model Loading: Loads the trained CatBoost model
Prediction: Generates predictions using the gradient boosting model
Output: Returns probabilities for classification or values for regression
Evaluation Metrics:
For regression: returns MAE, R2, RMSE metrics
For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics
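These metrics can be computed from the predictions with NumPy alone. This sketch assumes class labels are integers 0..K-1 aligned with the probability columns, and that Avg_Precision, Avg_Recall and F1 are macro averages over classes; TALENT's own averaging choice may differ. Both function names are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, R2 and RMSE for regression predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }


def classification_metrics(y_true, proba):
    """Accuracy plus macro-averaged precision, recall and F1 from class probabilities."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    y_pred = np.argmax(proba, axis=1)
    precisions, recalls, f1s = [], [], []
    for k in range(proba.shape[1]):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        p = tp / (tp + fp) if tp + fp else 0.0  # zero division -> 0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "Accuracy": float(np.mean(y_pred == y_true)),
        "Avg_Precision": float(np.mean(precisions)),
        "Avg_Recall": float(np.mean(recalls)),
        "F1": float(np.mean(f1s)),
    }
```

For example, `classification_metrics([0, 1, 1], [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])` predicts classes `[0, 1, 0]`, giving an accuracy of 2/3 and macro precision and recall of 0.75 each.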
References:
[1] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31.