Method Base
- class TALENT.model.methods.base.Method(args, is_regression)
Bases:
object- Parameters
args – argparse object
is_regression – bool, whether the task is regression or not
- data_format(is_train=True, N=None, C=None, y=None)
Format the data for training or testing.
- Parameters
is_train – bool, whether the data is for training or testing
N – dict, numerical data
C – dict, categorical data
y – dict, labels
- fit(data, info, train=True, config=None)
Fit the method to the data.
- Parameters
data – tuple, (N, C, y)
info – dict, information about the data
train – bool, whether to train the method
config – dict, configuration for the method
- Returns
float, time cost
- metric(predictions, labels, y_info)
Compute the evaluation metric.
- Parameters
predictions – np.ndarray, predictions
labels – np.ndarray, labels
y_info – dict, information about the labels
- Returns
tuple, (metric, metric_name)
- predict(data, info, model_name)
Predict the results of the data.
- Parameters
data – tuple, (N, C, y)
info – dict, information about the data
model_name – str, name of the model
- Returns
tuple, (loss, metric, metric_name, predictions)
- reset_stats_withconfig(config)
Reset the training statistics with a new configuration.
- Parameters
config – dict, new configuration
- train_epoch(epoch)
Train the model for one epoch.
- Parameters
epoch – int, the current epoch
- validate(epoch)
Validate the model.
- Parameters
epoch – int, the current epoch
- TALENT.model.methods.base.check_softmax(logits)
Check if the logits are already probabilities, and if not, convert them to probabilities.
- Parameters
logits – np.ndarray of shape (N, C) with logits
- Returns
np.ndarray of shape (N, C) with probabilities
Utility Functions
- TALENT.model.methods.base.check_softmax(logits)
Check if the logits are already probabilities, and if not, convert them to probabilities.
Parameters:
logits (np.ndarray) – Array of shape (N, C) with logits
Returns:
np.ndarray – Array of shape (N, C) with probabilities
Note:
This function checks if the input values are already in the [0, 1] range and sum to 1. If not, it applies softmax transformation with numerical stability (subtracting max before exp).
Core Method Class
- class TALENT.model.methods.base.Method
Abstract base class for all machine learning methods in TALENT.
This class provides a unified interface for training, validation, and prediction across all deep learning and classical machine learning models in TALENT.
Attributes:
args (argparse.Namespace) – Command line arguments and configuration
is_regression (bool) – Whether the task is regression
D (Dataset) – Dataset object containing features and labels
train_step (int) – Current training step counter
val_count (int) – Counter for validation without improvement
continue_training (bool) – Whether to continue training
timer (Timer) – Timer for tracking training time
trlog (dict) – Training log containing loss, best results, etc.
model (torch.nn.Module) – The neural network model (to be implemented by subclasses)
optimizer (torch.optim.Optimizer) – Optimizer for training
criterion (callable) – Loss function
Methods:
- __init__(args, is_regression)
Initialize the method with arguments and task type.
Parameters:
args (argparse.Namespace) – Command line arguments and configuration
is_regression (bool) – Whether the task is regression
Initialization:
Sets up training statistics and logging
Initializes device (CPU/GPU)
Sets up training log with appropriate best result tracking
- reset_stats_withconfig(config)
Reset training statistics with a new configuration.
Parameters:
config (dict) – New configuration dictionary
Actions:
Resets random seeds for reproducibility
Clears training step counter and validation counter
Resets training log with new configuration
Reinitializes timer
- data_format(is_train=True, N=None, C=None, y=None)
Format and preprocess data for training or testing.
Parameters:
is_train (bool, optional) – Whether data is for training. Defaults to True.
N (dict, optional) – Numerical features dictionary. Defaults to None.
C (dict, optional) – Categorical features dictionary. Defaults to None.
y (dict, optional) – Target labels dictionary. Defaults to None.
Processing Pipeline:
Training Mode: * Handle missing values (NaN processing) * Process labels (standardization for regression, encoding for classification) * Apply numerical feature encoding (PLE, Unary, etc.) * Apply categorical feature encoding (ordinal, one-hot, etc.) * Apply normalization to numerical features * Create DataLoaders for training and validation * Set up loss function
Testing Mode: * Apply same preprocessing using fitted encoders and normalizers * Create DataLoader for testing * Prepare test data tensors
- fit(data, info, train=True, config=None)
Fit the method to the training data.
Parameters:
data (tuple) – Tuple of (N, C, y) where N=numerical, C=categorical, y=labels
info (dict) – Dataset information including task type and feature counts
train (bool, optional) – Whether to train the model. Defaults to True.
config (dict, optional) – Configuration dictionary. Defaults to None.
Returns:
float – Total training time in seconds
Training Process:
Initialize dataset and extract features
Format data for training
Construct model (implemented by subclasses)
Set up optimizer (AdamW)
Train for specified number of epochs
Save best model and training log
- predict(data, info, model_name)
Make predictions on test data.
Parameters:
data (tuple) – Tuple of (N, C, y) test data
info (dict) – Dataset information
model_name (str) – Name of the saved model file
Returns:
tuple – (loss, metrics, metric_names, predictions) where: * loss: Test loss value * metrics: List of evaluation metrics * metric_names: Names of the metrics * predictions: Model predictions
Prediction Process:
Load trained model weights
Format test data using fitted preprocessors
Run inference on test set
Compute evaluation metrics
Return results
- train_epoch(epoch)
Train the model for one epoch.
Parameters:
epoch (int) – Current epoch number
Training Loop:
Set model to training mode
Iterate through training batches
Forward pass and compute loss
Backward pass and update weights
Log training progress
Update training statistics
- validate(epoch)
Validate the model on validation set.
Parameters:
epoch (int) – Current epoch number
Validation Process:
Set model to evaluation mode
Run inference on validation set
Compute validation metrics
Check for improvement
Save best model if improved
Implement early stopping (20 epochs without improvement)
Save training log
- metric(predictions, labels, y_info)
Compute evaluation metrics based on task type.
Parameters:
predictions (np.ndarray) – Model predictions
labels (np.ndarray) – Ground truth labels
y_info (dict) – Label information including processing policy
Returns:
tuple – (metrics, metric_names) where: * metrics: List of computed metric values * metric_names: Names of the metrics
Metrics by Task Type:
Regression: * MAE (Mean Absolute Error) * R² (Coefficient of determination) * RMSE (Root Mean Squared Error)
Binary Classification: * Accuracy * Balanced Recall * Macro Precision * F1 Score * Log Loss * AUC (Area Under ROC Curve)
Multi-class Classification: * Accuracy * Balanced Recall * Macro Precision * Macro F1 Score * Log Loss * Macro AUC (One-vs-Rest)
Abstract Methods
The following methods must be implemented by subclasses:
- TALENT.model.methods.base.construct_model()
Construct the neural network model architecture.
Implementation Required:
Subclasses must implement this method to create their specific model architecture. The model should be assigned to self.model and should accept numerical and categorical features as separate inputs.
Expected Model Interface:
def forward(self, X_num, X_cat): # X_num: numerical features tensor or None # X_cat: categorical features tensor or None # Return: predictions tensor pass
Usage Example
from TALENT.model.methods.base import Method
import torch.nn as nn
class MyModel(Method):
def construct_model(self):
# Define your model architecture
self.model = nn.Sequential(
nn.Linear(self.d_in, 128),
nn.ReLU(),
nn.Linear(128, self.d_out)
)
# Usage
method = MyModel(args, is_regression=True)
time_cost = method.fit(train_data, info)
loss, metrics, metric_names, predictions = method.predict(test_data, info, 'best-val')