TabPTM
A general method for tabular data that standardizes heterogeneous datasets using meta-representations, allowing a pre-trained model to generalize to unseen datasets without additional training.
Functions
def prepare_meta_feature(X, Y, args)
Prepares class centers for classification tasks by sampling from training data.
Parameters:
X (dict) - Dataset splits (keys: ‘train’, ‘val’, ‘test’).
Y (dict) - Labels for dataset splits.
args - Command-line arguments (must contain centers_num and seed).
Returns:
centers (list) - List of numpy arrays where each array contains sampled centers for a class.
def prepare_meta_feature_regression(X, Y, args, dataname=None, is_meta=False)
Prepares sampled data points for regression tasks.
Parameters:
X (dict) - Dataset splits.
Y (dict) - Target values for dataset splits.
args - Command-line arguments (must contain centers_num and seed).
dataname (str, optional, Default is None) - Dataset name.
is_meta (bool, optional, Default is False) - Whether this is meta-data.
Returns:
centers (np.ndarray) - Sampled data points concatenated with targets.
def to_tensors(data: ArrayDict) -> Dict[str, torch.Tensor]
Converts numpy arrays in a dictionary to PyTorch tensors.
Parameters:
data (dict) - Dictionary with numpy array values.
Returns:
dict - Dictionary with PyTorch tensors.
class TabPTMData(Dataset)
Dataset class for tabular data with numerical features.
Parameters:
dataset - Dataset object (must have is_regression attribute).
X (dict) - Feature splits.
Y (dict) - Label splits.
y_info - Label information.
part (str) - Data split (‘train’, ‘val’, ‘test’).
Methods:
get_dim_in(self) - Returns the input feature dimension.
get_categories(self) - Returns categorical feature information (always None for this class).
__len__(self) - Returns the number of samples in the dataset.
__getitem__(self, i) - Retrieves a data sample and its label.
References:
Han-Jia Ye, Qi-Le Zhou, Huai-Hong Yin, De-Chuan Zhan, and Wei-Lun Chao. Rethinking Pre-Training in Tabular Data: A Neighborhood Embedding Perspective. arXiv:2311.00055 [cs.LG], 2025. https://arxiv.org/abs/2311.00055