Utils

class TALENT.model.utils.Averager

Bases: object

A simple averager.

add(x)

x: float, value to be added

item()

class TALENT.model.utils.Timer

Bases: object

measure(p=1)

Measure the time elapsed since the timer was created.

p: int, period of printing the time

TALENT.model.utils.ensure_path(path, remove=True)

Ensure a path exists.

path: str, path to the directory
remove: bool, whether to remove the directory if it exists

TALENT.model.utils.get_classical_args()

Get the arguments for classical models.

Returns: tuple, (args, default_para, opt_space)

TALENT.model.utils.get_deep_args()

Get the arguments for deep learning models.

Returns: tuple, (args, default_para, opt_space)

TALENT.model.utils.get_device() → torch.device

TALENT.model.utils.get_method(model)

Get the method class.

model: str, model name
Returns: class, method class

TALENT.model.utils.load_config(args, config=None, config_name=None)

Load the config file.

args: argparse.Namespace, arguments
config: dict, config file
config_name: str, name of the config file
Returns: argparse.Namespace, arguments

TALENT.model.utils.merge_sampled_parameters(config, sampled_parameters)

Merge the sampled hyper-parameters.

config: dict, configuration
sampled_parameters: dict, sampled hyper-parameters

TALENT.model.utils.mkdir(path)

Create a directory if it does not exist.

path: str, path to the directory

TALENT.model.utils.pprint(x)

TALENT.model.utils.rmse(y, prediction, y_info)

y: np.ndarray, ground truth
prediction: np.ndarray, prediction
y_info: dict, information about the target variable
Returns: float, root mean squared error

TALENT.model.utils.sample_parameters(trial, space, base_config)

Sample hyper-parameters.

trial: optuna.trial.Trial, trial
space: dict, search space
base_config: dict, base configuration
Returns: dict, sampled hyper-parameters

TALENT.model.utils.set_gpu(x)

Set the environment variable CUDA_VISIBLE_DEVICES.

x: str, GPU id

TALENT.model.utils.set_seeds(base_seed: int, one_cuda_seed: bool = False) → None

Set random seeds for reproducibility.

base_seed: int, base seed
one_cuda_seed: bool, whether to set one seed for all GPUs

TALENT.model.utils.show_results(args, info, metric_name, loss_list, results_list, time_list)

Show the results for deep learning models.

args: argparse.Namespace, arguments
info: dict, information about the dataset
metric_name: list, names of the metrics
loss_list: list, list of losses
results_list: list, list of results
time_list: list, list of times

TALENT.model.utils.show_results_classical(args, info, metric_name, results_list, time_list)

Show the results for classical models.

args: argparse.Namespace, arguments
info: dict, information about the dataset
metric_name: list, names of the metrics
results_list: list, list of results
time_list: list, list of times

TALENT.model.utils.tune_hyper_parameters(args, opt_space, train_val_data, info)

Tune hyper-parameters.

args: argparse.Namespace, arguments
opt_space: dict, search space
train_val_data: tuple, training and validation data
info: dict, information about the dataset
Returns: argparse.Namespace, arguments

File and Path Utilities

TALENT.model.utils.mkdir(path)

Create a directory if it does not exist.

Parameters:

path (str) – Path to the directory to create

Raises:

OSError – If directory creation fails for reasons other than already existing
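
The behaviour described above can be approximated with the standard library. This is a minimal sketch, not TALENT's implementation; the name mkdir_sketch is hypothetical, and it uses os.makedirs, which also creates missing parent directories (whether TALENT does so is not stated here).

```python
import os
import tempfile


def mkdir_sketch(path):
    """Create `path` if it is missing; do nothing if the directory exists."""
    # exist_ok=True suppresses the error for an existing directory only;
    # other failures (for example, missing permissions) still raise OSError.
    os.makedirs(path, exist_ok=True)


# Calling it twice on the same path is safe.
demo_dir = os.path.join(tempfile.mkdtemp(), "results")
mkdir_sketch(demo_dir)
mkdir_sketch(demo_dir)
```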

TALENT.model.utils.set_gpu(x)

Set environment variable CUDA_VISIBLE_DEVICES to specify which GPU to use.

Parameters:

x (str) – GPU ID to use (e.g., "0", "1", "0,1")

Example:

set_gpu("0")  # Use GPU 0
set_gpu("0,1")  # Use GPUs 0 and 1
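
As a rough sketch of the mechanism (the name set_gpu_sketch is hypothetical and this is not TALENT's code), setting the variable amounts to one assignment in os.environ. It only takes effect if it runs before the process initializes CUDA.

```python
import os


def set_gpu_sketch(x):
    """Restrict the CUDA devices visible to this process."""
    # Must run before any CUDA context is created to have an effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = x


set_gpu_sketch("0,1")
```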

TALENT.model.utils.ensure_path(path, remove=True)

Ensure a path exists, optionally removing existing directory.

Parameters:

path (str) – Path to the directory
remove (bool, optional) – Whether to remove the directory if it exists. Defaults to True.

Note:

If the path exists and remove=True, the user is prompted for confirmation before the directory is removed.
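
The prompt-then-remove flow can be sketched as follows. This is an illustration under assumptions, not the library's code: ensure_path_sketch and its confirm argument are hypothetical (confirm exists only so the demo can run without a real prompt), and the wording of the prompt and the rule that any answer other than "n" means yes are guesses.

```python
import os
import shutil
import tempfile


def ensure_path_sketch(path, remove=True, confirm=input):
    """Make sure `path` exists as a directory, optionally recreating it empty."""
    if os.path.exists(path):
        if remove:
            # Ask before deleting; any answer other than "n" counts as yes (assumed).
            answer = confirm(f"{path} exists, remove? ([y]/n) ")
            if answer.strip().lower() != "n":
                shutil.rmtree(path)
                os.makedirs(path)
    else:
        os.makedirs(path)


# Demo with scripted answers instead of a real prompt.
run_dir = os.path.join(tempfile.mkdtemp(), "run")
ensure_path_sketch(run_dir)                          # directory is created
open(os.path.join(run_dir, "old.txt"), "w").close()
ensure_path_sketch(run_dir, confirm=lambda _: "n")   # user declines: contents kept
kept = os.listdir(run_dir)
ensure_path_sketch(run_dir, confirm=lambda _: "y")   # user accepts: directory emptied
wiped = os.listdir(run_dir)
```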

Random Seed and Device Management

TALENT.model.utils.set_seeds(base_seed, one_cuda_seed=False)

Set random seeds for reproducibility across all random number generators.

Parameters:

base_seed (int) – Base seed value (must be 0 <= base_seed < 2^32 - 10000)
one_cuda_seed (bool, optional) – Whether to set one seed for all GPUs. Defaults to False.

Note:

Sets seeds for Python random, NumPy, PyTorch CPU, and PyTorch CUDA generators. Each generator gets a different seed derived from the base_seed.
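
One way to derive a distinct seed per generator is to add small offsets to base_seed. The sketch below assumes offsets of +1, +2, and +3 and a per-GPU increment; these values and the name set_seeds_sketch are illustrative, not confirmed details of TALENT. NumPy and PyTorch are imported lazily so the sketch still runs when they are not installed. The upper bound on base_seed plausibly leaves headroom so that derived seeds stay below 2^32.

```python
import random


def set_seeds_sketch(base_seed, one_cuda_seed=False):
    """Seed each generator with a distinct value derived from base_seed."""
    assert 0 <= base_seed < 2**32 - 10000
    random.seed(base_seed)
    try:
        import numpy as np
        np.random.seed(base_seed + 1)       # offset is an assumption
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(base_seed + 2)        # offset is an assumption
    cuda_seed = base_seed + 3
    if one_cuda_seed:
        # Same seed on every GPU; silently ignored when CUDA is unavailable.
        torch.cuda.manual_seed_all(cuda_seed)
    elif torch.cuda.is_available():
        # A different seed per GPU.
        for i in range(torch.cuda.device_count()):
            with torch.cuda.device(i):
                torch.cuda.manual_seed(cuda_seed + i)


# Re-seeding with the same base seed reproduces the same random stream.
set_seeds_sketch(42)
first = random.random()
set_seeds_sketch(42)
second = random.random()
```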

TALENT.model.utils.get_device()

Get the appropriate device (GPU or CPU) for PyTorch operations.

Returns:

torch.device – CUDA device if available, otherwise CPU device

Evaluation Metrics

TALENT.model.utils.rmse(y, prediction, y_info)

Calculate Root Mean Squared Error (RMSE) for regression tasks.

Parameters:

y (np.ndarray) – Ground truth values
prediction (np.ndarray) – Predicted values
y_info (dict) – Information about the target variable, including normalization policy

Returns:

float – RMSE value, adjusted for normalization if applicable

Note:

If y_info['policy'] is 'mean_std', the RMSE is multiplied by the standard deviation to denormalize the result.
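
The denormalization step can be illustrated with a small sketch. It uses plain Python sequences so that it runs without NumPy, whereas the documented function takes np.ndarray inputs. The name rmse_sketch, the key 'std' for the stored standard deviation, and the policy value 'none' are assumptions made for the example.

```python
import math


def rmse_sketch(y, prediction, y_info):
    """Root mean squared error, rescaled if targets were standardized."""
    squared_errors = [(a - b) ** 2 for a, b in zip(y, prediction)]
    result = math.sqrt(sum(squared_errors) / len(squared_errors))
    if y_info.get("policy") == "mean_std":
        # Errors on standardized targets are in units of one standard
        # deviation, so multiplying by std restores the original scale.
        result *= y_info["std"]  # key name 'std' is an assumption
    return result


y_true = [0.0, 1.0, 2.0]
y_pred = [0.0, 1.0, 4.0]
raw = rmse_sketch(y_true, y_pred, {"policy": "none"})
scaled = rmse_sketch(y_true, y_pred, {"policy": "mean_std", "std": 3.0})
```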

Configuration Management

TALENT.model.utils.load_config(args, config=None, config_name=None)

Load configuration file for model training and save current arguments.

Parameters:

args (argparse.Namespace) – Command line arguments
config (dict, optional) – Pre-loaded configuration dictionary. Defaults to None.
config_name (str, optional) – Name for the saved config file. Defaults to None.

Returns:

argparse.Namespace – Updated arguments with loaded configuration

Note:

Automatically saves the current arguments to a JSON file in the save_path directory.
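
The saving step mentioned in the note might look like the sketch below. It covers only the JSON dump, not the config loading. The function name save_args_sketch, the default file name args.json, and the model_type attribute are hypothetical; only save_path comes from the description above.

```python
import argparse
import json
import os
import tempfile


def save_args_sketch(args, config_name="args.json"):
    """Write the attributes of `args` to a JSON file under args.save_path."""
    os.makedirs(args.save_path, exist_ok=True)
    target = os.path.join(args.save_path, config_name)
    with open(target, "w") as f:
        # vars() turns the Namespace into a plain dict that json can serialize.
        json.dump(vars(args), f, indent=4, sort_keys=True)
    return target


demo_args = argparse.Namespace(save_path=tempfile.mkdtemp(), model_type="mlp", seed=0)
saved_file = save_args_sketch(demo_args)
with open(saved_file) as f:
    reloaded = json.load(f)
```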

Hyperparameter Optimization

TALENT.model.utils.sample_parameters(trial, space, base_config)

Sample hyperparameters from the search space using Optuna trial.

Parameters:

trial (optuna.trial.Trial) – Optuna trial object for parameter sampling
space (dict) – Hyperparameter search space definition
base_config (dict) – Base configuration dictionary

Returns:

dict – Sampled hyperparameters

Special Distributions:

$mlp_d_layers – Special distribution for MLP layer dimensions
$d_token – Special distribution for transformer token dimensions
$d_ffn_factor – Special distribution for feedforward network factors
? – Optional parameters with default values

TALENT.model.utils.merge_sampled_parameters(config, sampled_parameters)

Merge sampled hyperparameters into the base configuration.

Parameters:

config (dict) – Base configuration to update
sampled_parameters (dict) – Sampled parameters to merge

Note:

Recursively merges nested dictionaries and overwrites existing parameters.
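
A recursive in-place merge of this kind can be sketched in a few lines. The name merge_sketch and the example keys are hypothetical, and the library's version may add checks that are not shown here.

```python
def merge_sketch(config, sampled_parameters):
    """Merge sampled values into config in place, descending into nested dicts."""
    for key, value in sampled_parameters.items():
        if isinstance(value, dict):
            # Recurse into the matching sub-dictionary, creating it if absent.
            merge_sketch(config.setdefault(key, {}), value)
        else:
            # Leaf value: overwrite whatever the base configuration held.
            config[key] = value


base = {"model": {"d_layers": [64, 64], "dropout": 0.0}, "training": {"lr": 1e-3}}
sampled = {"model": {"dropout": 0.2}, "training": {"lr": 3e-4}}
merge_sketch(base, sampled)
```

After the call, base keeps d_layers untouched while dropout and lr take the sampled values.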

Argument Parsing

TALENT.model.utils.get_classical_args()

Parse command line arguments for classical machine learning models.

Returns:

tuple – (args, default_para, opt_space), where:

args: Parsed arguments
default_para: Default parameter configurations
opt_space: Hyperparameter optimization space

Supported Models:

LogReg, NCM, RandomForest, xgboost, catboost, lightgbm
svm, knn, NaiveBayes, dummy, LinearRegression

Key Parameters:

normalization: Data normalization method
num_nan_policy: Policy for handling numerical missing values
cat_nan_policy: Policy for handling categorical missing values
cat_policy: Categorical encoding policy
num_policy: Numerical feature processing policy

TALENT.model.utils.get_deep_args()

Parse command line arguments for deep learning models.

Returns:

tuple – (args, default_para, opt_space), where:

args: Parsed arguments
default_para: Default parameter configurations
opt_space: Hyperparameter optimization space

Supported Models:

mlp, resnet, ftt, node, autoint, tabpfn, tangos, saint
tabcaps, tabnet, snn, ptarl, danets, dcn2, tabtransformer
dnnr, switchtab, grownet, tabr, modernNCA, hyperfast
bishop, realmlp, protogate, mlp_plr, excelformer, grande
amformer, tabptm, trompt, tabm, PFN-v2, t2gformer
tabautopnpnet, tabicl

Results Display

TALENT.model.utils.show_results_classical(args, info, metric_name, results_list, time_list)

Display results for classical machine learning models.

Parameters:

args (argparse.Namespace) – Training arguments
info (dict) – Dataset information
metric_name (list) – Names of evaluation metrics
results_list (list) – List of results from multiple trials
time_list (list) – List of training times

Output:

Prints formatted results including mean, standard deviation, and GPU information.

TALENT.model.utils.show_results(args, info, metric_name, loss_list, results_list, time_list)

Display results for deep learning models.

Parameters:

args (argparse.Namespace) – Training arguments
info (dict) – Dataset information
metric_name (list) – Names of evaluation metrics
loss_list (list) – List of training losses
results_list (list) – List of results from multiple trials
time_list (list) – List of training times

Output:

Prints formatted results including mean loss, metrics, and GPU information.

Hyperparameter Tuning

TALENT.model.utils.tune_hyper_parameters(args, opt_space, train_val_data, info)

Perform hyperparameter optimization using Optuna.

Parameters:

args (argparse.Namespace) – Training arguments
opt_space (dict) – Hyperparameter search space
train_val_data (tuple) – Training and validation data
info (dict) – Dataset information

Returns:

argparse.Namespace – Updated arguments with optimized hyperparameters

Features:

Uses TPE sampler for efficient optimization
Supports both regression (minimize) and classification (maximize) objectives
Automatically saves best configuration to JSON file
Handles model-specific parameter adjustments

Model Factory

TALENT.model.utils.get_method(model)

Get the method class for a given model name.

Parameters:

model (str) – Model name

Returns:

class – Method class for the specified model

Raises:

NotImplementedError – If the model is not yet implemented

Supported Models:

All deep learning and classical models supported by TALENT.
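
A name-to-class factory with the documented error behaviour can be sketched with a dictionary lookup. The placeholder classes, the registry, and the name get_method_sketch are all hypothetical; how TALENT actually maps names to classes is not shown in this reference.

```python
class MLPMethodSketch:
    """Placeholder standing in for a deep learning method class."""


class XGBoostMethodSketch:
    """Placeholder standing in for a classical method class."""


# Hypothetical registry mapping model names to method classes.
_REGISTRY = {
    "mlp": MLPMethodSketch,
    "xgboost": XGBoostMethodSketch,
}


def get_method_sketch(model):
    """Return the method class registered under `model`."""
    try:
        return _REGISTRY[model]
    except KeyError:
        raise NotImplementedError(f"Model '{model}' is not yet implemented") from None


method_cls = get_method_sketch("mlp")
method = method_cls()  # the caller instantiates the returned class
```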

Utility Classes

class TALENT.model.utils.Averager

A simple averager for tracking running averages.

Methods:

add(x)

Add a value to the running average.

Parameters:

x (float) – Value to add

item()

Get the current average value.

Returns:

float – Current running average
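
A running average can be kept without storing every value by updating the mean incrementally. The class below is a minimal sketch of that idea; AveragerSketch and its attribute names are illustrative and not taken from TALENT.

```python
class AveragerSketch:
    """Track the mean of all values added so far."""

    def __init__(self):
        self.n = 0      # number of values seen
        self.v = 0.0    # current mean

    def add(self, x):
        # Fold the new value into the existing mean.
        self.v = (self.v * self.n + x) / (self.n + 1)
        self.n += 1

    def item(self):
        return self.v


avg = AveragerSketch()
for loss in (1.0, 2.0, 3.0):
    avg.add(loss)
mean_loss = avg.item()
```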

class TALENT.model.utils.Timer

A timer for measuring elapsed time.

Methods:

measure(p=1)

Measure elapsed time since timer creation.

Parameters:

p (int, optional) – Period for time formatting. Defaults to 1.

Returns:

str – Formatted time string (e.g., "30s", "2m", "1.5h")
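
The formatting suggested by the example strings can be sketched as below. The thresholds (60 seconds, 3600 seconds), the rounding, and the choice to divide the elapsed time by p are assumptions; TimerSketch and format_elapsed are hypothetical names. The sketch follows the "since timer creation" description.

```python
import time


def format_elapsed(seconds):
    """Format a duration in seconds as a short string such as '30s' or '1.5h'."""
    seconds = int(seconds)
    if seconds >= 3600:
        return "{:.1f}h".format(seconds / 3600)
    if seconds >= 60:
        return "{}m".format(round(seconds / 60))
    return "{}s".format(seconds)


class TimerSketch:
    def __init__(self):
        self.start = time.time()

    def measure(self, p=1):
        # Dividing by p is an assumption about how the period is applied.
        return format_elapsed((time.time() - self.start) / p)


timer = TimerSketch()
elapsed = timer.measure()
```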

Debugging Utilities

TALENT.model.utils.pprint(x)

Pretty print an object using the PrettyPrinter.

Parameters:

x (any) – Object to print