Method Base

class TALENT.model.methods.base.Method(args, is_regression)

Bases: object

Parameters
  • args – argparse object

  • is_regression – bool, whether the task is regression or not

data_format(is_train=True, N=None, C=None, y=None)

Format the data for training or testing.

Parameters
  • is_train – bool, whether the data is for training or testing

  • N – dict, numerical data

  • C – dict, categorical data

  • y – dict, labels

fit(data, info, train=True, config=None)

Fit the method to the data.

Parameters
  • data – tuple, (N, C, y)

  • info – dict, information about the data

  • train – bool, whether to train the method

  • config – dict, configuration for the method

Returns

float, training time in seconds

metric(predictions, labels, y_info)

Compute the evaluation metric.

Parameters
  • predictions – np.ndarray, predictions

  • labels – np.ndarray, labels

  • y_info – dict, information about the labels

Returns

tuple, (metrics, metric_names)

predict(data, info, model_name)

Make predictions on the given data.

Parameters
  • data – tuple, (N, C, y)

  • info – dict, information about the data

  • model_name – str, name of the model

Returns

tuple, (loss, metrics, metric_names, predictions)

reset_stats_withconfig(config)

Reset the training statistics with a new configuration.

Parameters

config – dict, new configuration

train_epoch(epoch)

Train the model for one epoch.

Parameters

epoch – int, the current epoch

validate(epoch)

Validate the model.

Parameters

epoch – int, the current epoch


Utility Functions

TALENT.model.methods.base.check_softmax(logits)

Check if the logits are already probabilities, and if not, convert them to probabilities.

Parameters:

  • logits (np.ndarray) – Array of shape (N, C) with logits

Returns:

  • np.ndarray – Array of shape (N, C) with probabilities

Note:

This function treats the input as probabilities only if every value lies in the [0, 1] range and each row sums to 1. Otherwise, it applies a softmax transformation, subtracting the row-wise maximum before exponentiation for numerical stability.
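The check described above can be sketched in NumPy as follows. This is an illustrative reimplementation, not the library's source; the tolerance of 1e-5 for "sums to 1" is an assumption.

```python
import numpy as np


def check_softmax(logits: np.ndarray) -> np.ndarray:
    """Return row-wise probabilities for an (N, C) array (illustrative sketch)."""
    # Treat the input as probabilities only if every value lies in [0, 1]
    # and every row sums to 1 within a small (assumed) tolerance.
    in_range = np.all((logits >= 0) & (logits <= 1))
    rows_sum_to_one = np.allclose(logits.sum(axis=-1), 1.0, atol=1e-5)
    if in_range and rows_sum_to_one:
        return logits
    # Numerically stable softmax: subtracting the row maximum does not change
    # the result mathematically but keeps np.exp from overflowing.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)
```

For example, `check_softmax(np.array([[1000.0, 1001.0]]))` returns roughly `[[0.2689, 0.7311]]` without overflow, while an array whose rows are already valid probabilities is returned unchanged.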

Core Method Class

class TALENT.model.methods.base.Method

Abstract base class for the deep learning methods in TALENT.

This class provides a unified interface for training, validation, and prediction that its subclasses share; each subclass supplies its own network architecture.

Attributes:

  • args (argparse.Namespace) – Command line arguments and configuration

  • is_regression (bool) – Whether the task is regression

  • D (Dataset) – Dataset object containing features and labels

  • train_step (int) – Current training step counter

  • val_count (int) – Counter for validation without improvement

  • continue_training (bool) – Whether to continue training

  • timer (Timer) – Timer for tracking training time

  • trlog (dict) – Training log containing loss, best results, etc.

  • model (torch.nn.Module) – The neural network model (built by subclasses in construct_model())

  • optimizer (torch.optim.Optimizer) – Optimizer for training

  • criterion (callable) – Loss function

Methods:

__init__(args, is_regression)

Initialize the method with arguments and task type.

Parameters:

  • args (argparse.Namespace) – Command line arguments and configuration

  • is_regression (bool) – Whether the task is regression

Initialization:

  • Sets up training statistics and logging

  • Initializes device (CPU/GPU)

  • Sets up training log with appropriate best result tracking
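"Appropriate best result tracking" depends on the metric direction: regression errors improve downward, classification accuracy improves upward. A minimal sketch, where the helper name `init_trlog`, its dictionary keys, and the infinite sentinel values are assumptions for illustration:

```python
def init_trlog(args_dict: dict, is_regression: bool) -> dict:
    """Hypothetical helper: build an empty training log."""
    trlog = {
        "args": args_dict,   # snapshot of the run configuration
        "train_loss": [],    # losses appended during training
        "best_epoch": 0,     # epoch of the best validation result so far
    }
    # Start from the worst possible value so the first validation always improves:
    # regression (lower is better) starts at +inf, classification at -inf.
    trlog["best_res"] = float("inf") if is_regression else float("-inf")
    return trlog
```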

reset_stats_withconfig(config)

Reset training statistics with a new configuration.

Parameters:

  • config (dict) – New configuration dictionary

Actions:

  • Resets random seeds for reproducibility

  • Clears training step counter and validation counter

  • Resets training log with new configuration

  • Reinitializes timer
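The four reset actions above can be sketched with a small stand-in class. `TrainingStats` is hypothetical, and the seed is assumed to be stored on the object (in the real class it would come from the run arguments). Only the Python and NumPy generators are re-seeded here; a PyTorch-based method would also need `torch.manual_seed`.

```python
import random
import time

import numpy as np


class TrainingStats:
    """Hypothetical stand-in for the statistics that Method resets."""

    def __init__(self, is_regression: bool, seed: int = 0):
        self.is_regression = is_regression
        self.seed = seed  # assumed to come from the run arguments
        self.reset_stats_withconfig({})

    def reset_stats_withconfig(self, config: dict) -> None:
        # 1. Re-seed the random generators so runs with the same config repeat.
        #    A PyTorch-based method would also call torch.manual_seed(self.seed).
        random.seed(self.seed)
        np.random.seed(self.seed)
        # 2. Clear the training-step and no-improvement counters.
        self.train_step = 0
        self.val_count = 0
        self.continue_training = True
        # 3. Start a fresh log that records the new configuration.
        self.trlog = {"config": dict(config), "train_loss": [], "best_epoch": 0}
        # 4. Restart the timer.
        self.start_time = time.perf_counter()
```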

data_format(is_train=True, N=None, C=None, y=None)

Format and preprocess data for training or testing.

Parameters:

  • is_train (bool, optional) – Whether data is for training. Defaults to True.

  • N (dict, optional) – Numerical features dictionary. Defaults to None.

  • C (dict, optional) – Categorical features dictionary. Defaults to None.

  • y (dict, optional) – Target labels dictionary. Defaults to None.

Processing Pipeline:

  • Training Mode:
      * Handle missing values (NaN processing)
      * Process labels (standardization for regression, encoding for classification)
      * Apply numerical feature encoding (PLE, Unary, etc.)
      * Apply categorical feature encoding (ordinal, one-hot, etc.)
      * Apply normalization to numerical features
      * Create DataLoaders for training and validation
      * Set up loss function

  • Testing Mode:
      * Apply same preprocessing using fitted encoders and normalizers
      * Create DataLoader for testing
      * Prepare test data tensors
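The key contract of the pipeline is that statistics are fitted in training mode and reused unchanged in testing mode. The sketch below shows this for the label step only; the class `LabelProcessor` is hypothetical, and the mean/std standardization and sorted-integer encoding are assumed concrete choices for "standardization" and "encoding".

```python
import numpy as np


class LabelProcessor:
    """Hypothetical sketch: fit label preprocessing on train, reuse it on test."""

    def __init__(self, is_regression: bool):
        self.is_regression = is_regression

    def fit_transform(self, y_train: np.ndarray) -> np.ndarray:
        if self.is_regression:
            # Standardize targets; keep mean/std so errors can be rescaled later.
            # Assumes y_train is not constant (std > 0).
            self.mean = float(y_train.mean())
            self.std = float(y_train.std())
            return (y_train - self.mean) / self.std
        # Classification: map each distinct label to an integer 0..K-1.
        self.classes = np.unique(y_train)
        return np.searchsorted(self.classes, y_train)

    def transform(self, y: np.ndarray) -> np.ndarray:
        # Testing mode: reuse the statistics fitted on the training split.
        if self.is_regression:
            return (y - self.mean) / self.std
        # Assumes every test label was already seen during training.
        return np.searchsorted(self.classes, y)
```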

fit(data, info, train=True, config=None)

Fit the method to the training data.

Parameters:

  • data (tuple) – Tuple of (N, C, y) where N=numerical, C=categorical, y=labels

  • info (dict) – Dataset information including task type and feature counts

  • train (bool, optional) – Whether to train the model. Defaults to True.

  • config (dict, optional) – Configuration dictionary. Defaults to None.

Returns:

  • float – Total training time in seconds

Training Process:

  • Initialize dataset and extract features

  • Format data for training

  • Construct model (implemented by subclasses)

  • Set up optimizer (AdamW)

  • Train for specified number of epochs

  • Save best model and training log
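The epoch loop and timing in fit() can be sketched as a plain driver function. `run_training` is hypothetical; the documented `continue_training` attribute plays the role of the stop flag, which this sketch models as the return value of `validate` instead.

```python
import time
from typing import Callable


def run_training(train_epoch: Callable[[int], None],
                 validate: Callable[[int], bool],
                 max_epoch: int) -> float:
    """Hypothetical driver: run epochs until max_epoch or early stop; return seconds."""
    start = time.perf_counter()
    for epoch in range(max_epoch):
        train_epoch(epoch)
        # validate() returns False once training should stop (e.g. early stopping).
        if not validate(epoch):
            break
    return time.perf_counter() - start
```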

predict(data, info, model_name)

Make predictions on test data.

Parameters:

  • data (tuple) – Tuple of (N, C, y) test data

  • info (dict) – Dataset information

  • model_name (str) – Name of the saved model file

Returns:

  • tuple – (loss, metrics, metric_names, predictions) where:
      * loss: Test loss value
      * metrics: List of evaluation metrics
      * metric_names: Names of the metrics
      * predictions: Model predictions

Prediction Process:

  • Load trained model weights

  • Format test data using fitted preprocessors

  • Run inference on test set

  • Compute evaluation metrics

  • Return results
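Running inference on the test set usually means evaluating the model batch by batch and concatenating the outputs. A framework-free sketch, where `predict_in_batches`, the `model_fn` callable, and the default batch size are assumptions:

```python
from typing import Callable

import numpy as np


def predict_in_batches(model_fn: Callable[[np.ndarray], np.ndarray],
                       X: np.ndarray, batch_size: int = 1024) -> np.ndarray:
    """Hypothetical sketch: run inference batch by batch and stitch the outputs."""
    # Assumes X is non-empty; np.concatenate fails on an empty list.
    outputs = []
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]
        outputs.append(model_fn(batch))  # forward pass only, no weight updates
    return np.concatenate(outputs, axis=0)
```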

train_epoch(epoch)

Train the model for one epoch.

Parameters:

  • epoch (int) – Current epoch number

Training Loop:

  • Set model to training mode

  • Iterate through training batches

  • Forward pass and compute loss

  • Backward pass and update weights

  • Log training progress

  • Update training statistics
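The real method runs this loop with PyTorch (`self.model`, `self.optimizer`, `self.criterion`). The NumPy version below is a framework-free sketch of the same steps on a hypothetical linear model with MSE loss and plain gradient descent standing in for the optimizer:

```python
import numpy as np


def train_epoch(w: np.ndarray, X: np.ndarray, y: np.ndarray,
                lr: float = 0.1, batch_size: int = 32, seed: int = 0):
    """One epoch of mini-batch gradient descent on a linear model with MSE loss.

    Returns the updated weights and the mean training loss over the epoch.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # shuffle, as a training DataLoader would
    losses = []
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        pred = xb @ w                              # forward pass
        err = pred - yb
        losses.append(float(np.mean(err ** 2)))    # compute loss (MSE)
        grad = 2.0 * xb.T @ err / len(idx)         # backward pass: dMSE/dw
        w = w - lr * grad                          # weight update
    return w, float(np.mean(losses))               # loss statistic for logging
```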

validate(epoch)

Validate the model on validation set.

Parameters:

  • epoch (int) – Current epoch number

Validation Process:

  • Set model to evaluation mode

  • Run inference on validation set

  • Compute validation metrics

  • Check for improvement

  • Save best model if improved

  • Implement early stopping (20 epochs without improvement)

  • Save training log
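The improvement check and early-stopping rule can be sketched as a small state object. `EarlyStopper` is hypothetical; it reads "20 epochs without improvement" as stopping once 20 consecutive validations fail to beat the best result, and uses the documented names `val_count` and `continue_training` for its counters.

```python
class EarlyStopper:
    """Hypothetical sketch of the improvement check and patience rule."""

    def __init__(self, is_regression: bool, patience: int = 20):
        self.is_regression = is_regression
        self.patience = patience
        # Regression metrics improve downward, classification metrics upward.
        self.best_res = float("inf") if is_regression else float("-inf")
        self.best_epoch = 0
        self.val_count = 0
        self.continue_training = True

    def update(self, epoch: int, val_res: float) -> bool:
        """Record one validation result; return True if it is a new best."""
        if self.is_regression:
            improved = val_res < self.best_res
        else:
            improved = val_res > self.best_res
        if improved:
            self.best_res, self.best_epoch = val_res, epoch
            self.val_count = 0  # a caller would save the best model here
        else:
            self.val_count += 1
            if self.val_count >= self.patience:
                self.continue_training = False
        return improved
```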

metric(predictions, labels, y_info)

Compute evaluation metrics based on task type.

Parameters:

  • predictions (np.ndarray) – Model predictions

  • labels (np.ndarray) – Ground truth labels

  • y_info (dict) – Label information including processing policy

Returns:

  • tuple – (metrics, metric_names) where:
      * metrics: List of computed metric values
      * metric_names: Names of the metrics

Metrics by Task Type:

  • Regression:
      * MAE (Mean Absolute Error)
      * R² (Coefficient of determination)
      * RMSE (Root Mean Squared Error)

  • Binary Classification:
      * Accuracy
      * Balanced Recall (balanced accuracy, i.e. the mean of per-class recalls)
      * Macro Precision
      * F1 Score
      * Log Loss
      * AUC (Area Under the ROC Curve)

  • Multi-class Classification:
      * Accuracy
      * Balanced Recall
      * Macro Precision
      * Macro F1 Score
      * Log Loss
      * Macro AUC (One-vs-Rest)
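The regression metrics and the first two classification metrics reduce to short formulas. The NumPy sketch below spells them out; in practice scikit-learn's `mean_absolute_error`, `r2_score` and `balanced_accuracy_score` compute the same quantities. The `std` argument assumes regression targets were standardized (see the processing policy in `y_info`) and rescales MAE and RMSE back to the original units; R² is scale-invariant and needs no rescaling.

```python
import numpy as np


def regression_metrics(pred: np.ndarray, y: np.ndarray, std: float = 1.0):
    """MAE, R², RMSE; `std` rescales errors if targets were standardized (assumption)."""
    err = pred - y
    mae = float(np.mean(np.abs(err))) * std
    rmse = float(np.sqrt(np.mean(err ** 2))) * std
    r2 = 1.0 - float(np.sum(err ** 2) / np.sum((y - y.mean()) ** 2))
    return (mae, r2, rmse), ("MAE", "R2", "RMSE")


def accuracy_and_balanced_recall(probs: np.ndarray, y: np.ndarray):
    """Accuracy and balanced recall (mean of per-class recalls) from (N, C) probabilities."""
    pred = probs.argmax(axis=-1)
    acc = float(np.mean(pred == y))
    recalls = [np.mean(pred[y == k] == k) for k in np.unique(y)]
    return acc, float(np.mean(recalls))
```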

Abstract Methods

The following methods must be implemented by subclasses:

TALENT.model.methods.base.Method.construct_model()

Construct the neural network model architecture.

Implementation Required:

Subclasses must implement this method to create their specific model architecture. The model should be assigned to self.model and should accept numerical and categorical features as separate inputs.

Expected Model Interface:

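from typing import Optional

import torch

def forward(self, X_num: Optional[torch.Tensor], X_cat: Optional[torch.Tensor]) -> torch.Tensor:
    # X_num: numerical features tensor, or None if the dataset has no numerical features
    # X_cat: categorical features tensor, or None if the dataset has no categorical features
    # Return: predictions tensor
    ...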

Usage Example

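import torch.nn as nn
from TALENT.model.methods.base import Method


class TwoLayerNet(nn.Module):
    # Hypothetical model that follows the forward(X_num, X_cat) interface above
    def __init__(self, d_in, d_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_out))

    def forward(self, X_num, X_cat=None):
        # This sketch uses numerical features only and ignores X_cat
        return self.net(X_num)


class MyModel(Method):
    def construct_model(self):
        # n_num_features and d_out are assumed to be set by fit() / data_format()
        self.model = TwoLayerNet(self.n_num_features, self.d_out)


# Usage (args, train_data, test_data and info must be prepared beforehand)
method = MyModel(args, is_regression=True)
time_cost = method.fit(train_data, info)
loss, metrics, metric_names, predictions = method.predict(test_data, info, 'best-val')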