LightGBM
LightGBM classical method implementation.
This section documents the LightGBM implementation for classification and regression tasks. LightGBM is a gradient boosting framework that uses tree-based learning algorithms and is designed to be distributed and efficient. Its stated advantages are faster training and higher efficiency, lower memory usage, better accuracy, support for parallel and GPU learning, and the ability to handle large-scale data.
- class TALENT.model.classical_methods.lightgbm.LightGBMMethod(args, is_regression)
Bases: XGBoostMethod
LightGBM method for classification and regression tasks using gradient boosting.
Key Features:
Uses the LightGBM library for gradient boosting
Supports both classification and regression tasks
Handles missing values automatically
Provides feature importance analysis
Supports early stopping to prevent overfitting
Efficient implementation with parallel processing
GPU acceleration support
Algorithm:
LightGBM builds an ensemble of decision trees by gradient boosting: each new tree is fitted using the gradients of the loss with respect to the current ensemble's predictions. Trees are grown leaf-wise, and split finding operates on histograms of binned feature values. The original paper [1] additionally introduces Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to reduce training cost on large datasets.
- __init__(args, is_regression)
Initialize the LightGBM method.
Parameters:
args (object) – Configuration arguments containing model settings
is_regression (bool) – Whether the task is regression (True) or classification (False)
- construct_model(model_config=None)
Construct the LightGBM model instance.
Parameters:
model_config (dict, optional) – Model configuration parameters for LightGBM
Model Creation:
For classification: creates LGBMClassifier
For regression: creates LGBMRegressor
Configures boosting parameters like learning rate, max depth, etc.
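The selection logic described above can be sketched as follows. This is a minimal illustration, not the TALENT implementation: `estimator_name`, `merge_config`, `build_estimator`, and the default values are hypothetical, while `LGBMClassifier`, `LGBMRegressor`, and the `learning_rate`, `max_depth`, and `n_estimators` parameters belong to LightGBM's scikit-learn API. The `lightgbm` import is deferred so the helper functions can be used without the library installed.

```python
def estimator_name(is_regression):
    """Return the name of the LightGBM scikit-learn estimator for the task."""
    return "LGBMRegressor" if is_regression else "LGBMClassifier"


def merge_config(model_config=None):
    """Overlay user-supplied parameters on illustrative defaults."""
    # Illustrative defaults only; not necessarily the values TALENT uses.
    defaults = {"learning_rate": 0.1, "max_depth": -1, "n_estimators": 100}
    return {**defaults, **(model_config or {})}


def build_estimator(is_regression, model_config=None):
    """Instantiate LGBMRegressor or LGBMClassifier with the merged parameters."""
    import lightgbm  # deferred import: only needed when a model is built

    estimator_cls = getattr(lightgbm, estimator_name(is_regression))
    return estimator_cls(**merge_config(model_config))
```

Under these assumptions, `build_estimator(False, {"learning_rate": 0.05})` would return an `LGBMClassifier` with the learning rate overridden and the other defaults unchanged.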
- fit(data, info, train=True, config=None)
Train the LightGBM model on the provided data.
Parameters:
data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels
info (dict) – Dataset information
train (bool, default=True) – Whether to train the model or just load from checkpoint
config (dict, optional) – Additional configuration parameters
Returns:
time_cost (float) – Training time in seconds
Training Process:
Data Preprocessing: Handles missing values, categorical encoding, normalization
Model Training: Fits the LightGBM model with gradient boosting
Model Saving: Saves the trained model to disk for later use
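The preprocessing and timing steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the TALENT pipeline: `impute_mean`, `ordinal_encode`, and `timed_fit` are hypothetical helpers, mean imputation and ordinal encoding are only one possible choice for each step, and a stand-in callable replaces the real model fit. Model saving is omitted.

```python
import math
import time


def impute_mean(column):
    """Replace None/NaN entries with the mean of the observed values."""
    observed = [v for v in column if v is not None and not math.isnan(v)]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if v is None or math.isnan(v) else v for v in column]


def ordinal_encode(column):
    """Map each distinct category to an integer code in order of first appearance."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in column]


def timed_fit(fit_fn, *args):
    """Run fit_fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fit_fn(*args)
    return result, time.perf_counter() - start


# Made-up data: one numerical column with a missing value, one categorical column.
numeric = impute_mean([1.0, None, 3.0])
categorical = ordinal_encode(["red", "blue", "red"])
rows = [list(pair) for pair in zip(numeric, categorical)]

# Stand-in for model.fit(X, y); any callable can be timed this way.
_, time_cost = timed_fit(len, rows)
```

Here `time_cost` plays the role of the returned training time in seconds.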
- predict(data, info, model_name)
Make predictions using the trained LightGBM model.
Parameters:
data (tuple) – Tuple (N, C, y), where N holds the numerical features, C the categorical features, and y the labels
info (dict) – Dataset information
model_name (str) – Name of the model for saving/loading
Returns:
test_logit (array-like) – Test predictions (probabilities for classification, values for regression)
Prediction Process:
Data Preprocessing: Applies same preprocessing as training data
Model Loading: Loads the trained LightGBM model
Prediction: Generates predictions using the gradient boosting model
Output: Returns probabilities for classification or values for regression
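For classification, the returned probabilities can be turned into predicted labels by taking the highest-scoring class in each row. A minimal sketch, assuming `test_logit` is a sequence of per-class probability rows (the exact array type and shape are not stated here); `probabilities_to_labels` is a hypothetical helper and the values are made up:

```python
def probabilities_to_labels(probabilities):
    """Return the index of the highest-probability class for each row."""
    # On ties, max() keeps the first (lowest) class index.
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


test_logit = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]  # made-up probabilities
labels = probabilities_to_labels(test_logit)
```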
Evaluation Metrics:
For regression: returns MAE, R2, RMSE metrics
For classification: returns Accuracy, Avg_Precision, Avg_Recall, F1 metrics
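As a reference for how the regression metrics and accuracy are defined, they can be computed in plain Python as below. This is a sketch using the standard definitions, not the code TALENT runs; `regression_metrics` and `accuracy` are hypothetical names. Precision, recall, and F1 are left out because the averaging scheme is not specified here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, R2 and RMSE from paired targets and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    # Assumes y_true is not constant; otherwise ss_tot is zero.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "R2": r2, "RMSE": rmse}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration.
reg = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
acc = accuracy([0, 1, 1], [0, 1, 0])
```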
References:
[1] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., ... & Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.