TabPTM

A general method for tabular data that standardizes heterogeneous datasets using meta-representations, allowing a pre-trained model to generalize to unseen datasets without additional training.
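To illustrate why a meta-representation makes heterogeneous datasets comparable, here is a minimal sketch of one neighborhood-based construction, in the spirit of the reference paper's title. Each instance is described by its sorted distances to the nearest reference points of each class. The function name neighborhood_meta_representation, the Euclidean metric, and the parameter k are assumptions for illustration, not necessarily TabPTM's exact recipe.

```python
import numpy as np


def neighborhood_meta_representation(x, centers, k):
    """Hypothetical sketch (not necessarily TabPTM's exact construction).

    x       -- one instance, shape (d,)
    centers -- list with one (m_c, d) array of reference points per class
    k       -- number of nearest distances kept per class (assumes k <= m_c)

    Returns a vector of length n_classes * k whose size does not depend on d.
    """
    parts = []
    for class_centers in centers:
        # Euclidean distance from x to every reference point of this class.
        dists = np.linalg.norm(class_centers - x, axis=1)
        # Keep the k smallest distances in ascending order.
        parts.append(np.sort(dists)[:k])
    return np.concatenate(parts)


rng = np.random.default_rng(0)
# Two hypothetical datasets with 3 and 7 features, each with 2 classes of 5 reference points.
centers_a = [rng.normal(size=(5, 3)) for _ in range(2)]
centers_b = [rng.normal(size=(5, 7)) for _ in range(2)]
meta_a = neighborhood_meta_representation(rng.normal(size=3), centers_a, k=3)
meta_b = neighborhood_meta_representation(rng.normal(size=7), centers_b, k=3)
print(meta_a.shape, meta_b.shape)  # (6,) (6,)
```

Because the output length depends only on the number of classes and k, a single downstream model could consume instances from both datasets even though their feature counts differ.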

Functions

def prepare_meta_feature(X, Y, args)

Prepares class centers for classification tasks by sampling from training data.

Parameters:

X (dict) - Feature arrays for each dataset split (keys: ‘train’, ‘val’, ‘test’).
Y (dict) - Labels for dataset splits.
args - Command-line arguments (must contain centers_num and seed).

Returns:

centers (list) - List of numpy arrays where each array contains sampled centers for a class.
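A minimal sketch of per-class center sampling, assuming centers are drawn without replacement from the training split and that args only needs centers_num and seed. The name sample_class_centers, the use of numpy's default_rng, and capping the sample at the class size are assumptions; the real prepare_meta_feature may sample differently.

```python
from types import SimpleNamespace

import numpy as np


def sample_class_centers(X, Y, args):
    """Hypothetical sketch: sample up to args.centers_num training rows per class."""
    rng = np.random.default_rng(args.seed)  # seeded for reproducibility
    X_train, y_train = X["train"], Y["train"]
    centers = []
    for c in np.unique(y_train):  # classes in sorted order
        idx = np.flatnonzero(y_train == c)
        # Sample without replacement, capped at the class size (an assumption).
        k = min(args.centers_num, idx.size)
        chosen = rng.choice(idx, size=k, replace=False)
        centers.append(X_train[chosen])
    return centers


# Toy data: 10 rows, 2 features, classes alternating 0/1.
X = {"train": np.arange(20, dtype=float).reshape(10, 2)}
Y = {"train": np.array([0, 1] * 5)}
args = SimpleNamespace(centers_num=3, seed=0)
centers = sample_class_centers(X, Y, args)
print(len(centers), centers[0].shape)  # 2 (3, 2)
```

SimpleNamespace stands in for the parsed command-line arguments, since the sketch only reads attributes from args.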

def prepare_meta_feature_regression(X, Y, args, dataname=None, is_meta=False)

Prepares reference points for regression tasks by sampling data points and attaching their target values.

Parameters:

X (dict) - Feature arrays for each dataset split.
Y (dict) - Target values for dataset splits.
args - Command-line arguments (must contain centers_num and seed).
dataname (str, optional, default is None) - Dataset name.
is_meta (bool, optional, default is False) - Whether this is meta-data.

Returns:

centers (np.ndarray) - Sampled data points concatenated with targets.
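A minimal sketch of the regression variant, assuming points are sampled without replacement from the training split and the target is appended as the last column. The name sample_regression_points and the column order are assumptions, and the sketch ignores dataname and is_meta because their effect is not described here.

```python
from types import SimpleNamespace

import numpy as np


def sample_regression_points(X, Y, args):
    """Hypothetical sketch: sample training rows and append each row's target."""
    rng = np.random.default_rng(args.seed)
    X_train, y_train = X["train"], Y["train"]
    k = min(args.centers_num, len(y_train))  # cap at the training-set size (assumption)
    idx = rng.choice(len(y_train), size=k, replace=False)
    # Features first, target as the last column (assumed layout).
    return np.concatenate([X_train[idx], y_train[idx].reshape(-1, 1)], axis=1)


X = {"train": np.arange(24, dtype=float).reshape(8, 3)}
Y = {"train": X["train"][:, 0] * 10}  # toy target tied to the first feature
args = SimpleNamespace(centers_num=5, seed=0)
points = sample_regression_points(X, Y, args)
print(points.shape)  # (5, 4)
```

Keeping features and target in one array means each sampled point carries its own label, which a regression meta-representation can use in place of per-class centers.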

def to_tensors(data: ArrayDict) -> Dict[str, torch.Tensor]

Converts numpy arrays in a dictionary to PyTorch tensors.

Parameters:

data (dict) - Dictionary with numpy array values.

Returns:

dict - Dictionary with PyTorch tensors.
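A minimal sketch matching the documented signature, assuming ArrayDict is an alias for a dictionary of numpy arrays (the alias definition below is an assumption). It uses torch.as_tensor, which keeps the numpy dtype and avoids a copy for CPU arrays; the actual body may differ.

```python
from typing import Dict

import numpy as np
import torch

ArrayDict = Dict[str, np.ndarray]  # assumed alias: split name -> numpy array


def to_tensors(data: ArrayDict) -> Dict[str, torch.Tensor]:
    # torch.as_tensor reuses the numpy buffer when dtype and device allow it.
    return {k: torch.as_tensor(v) for k, v in data.items()}


arrays = {"train": np.zeros((2, 3)), "test": np.array([1, 2, 3], dtype=np.int64)}
tensors = to_tensors(arrays)
print(tensors["train"].dtype, tensors["test"].dtype)  # torch.float64 torch.int64
```

Because memory is shared, later in-place changes to the numpy arrays are visible through the tensors; call .clone() if an independent copy is needed.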

class TabPTMData(Dataset)

Dataset class for tabular data with numerical features.

Parameters:

dataset - Dataset object (must have is_regression attribute).
X (dict) - Feature splits.
Y (dict) - Label splits.
y_info - Label information.
part (str) - Data split (‘train’, ‘val’, ‘test’).

Methods:

get_dim_in(self) - Returns the input feature dimension.
get_categories(self) - Returns categorical feature information (always None for this class).
__len__(self) - Returns the number of samples in the dataset.
__getitem__(self, i) - Retrieves a data sample and its label.
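A simplified, hypothetical stand-in for TabPTMData showing how the listed methods fit together with a PyTorch DataLoader. It takes an is_regression flag directly instead of a dataset object, omits y_info, and assumes float targets for regression and integer class indices otherwise; the class name SplitDataset is illustrative only.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class SplitDataset(Dataset):
    """Hypothetical, simplified stand-in for TabPTMData: one split, numerical features only."""

    def __init__(self, is_regression, X, Y, part):
        self.X = torch.as_tensor(X[part], dtype=torch.float32)
        # Assumption: float targets for regression, integer class indices otherwise.
        self.Y = torch.as_tensor(Y[part], dtype=torch.float32 if is_regression else torch.long)

    def get_dim_in(self):
        return self.X.shape[1]  # number of numerical features

    def get_categories(self):
        return None  # no categorical features in this sketch

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        return self.X[i], self.Y[i]


X = {"train": np.random.default_rng(0).normal(size=(10, 3))}
Y = {"train": np.array([0, 1] * 5)}
ds = SplitDataset(is_regression=False, X=X, Y=Y, part="train")
xb, yb = next(iter(DataLoader(ds, batch_size=4)))
print(len(ds), ds.get_dim_in(), xb.shape, yb.shape)  # 10 3 torch.Size([4, 3]) torch.Size([4])
```

The DataLoader's default collation stacks the per-sample tensors returned by __getitem__ into batch tensors.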

References:

Han-Jia Ye, Qi-Le Zhou, Huai-Hong Yin, De-Chuan Zhan, and Wei-Lun Chao. Rethinking Pre-Training in Tabular Data: A Neighborhood Embedding Perspective. arXiv:2311.00055 [cs.LG], 2025. https://arxiv.org/abs/2311.00055