TData

A PyTorch Dataset implementation designed for handling tabular data with numerical and categorical features.

Functions

class TData(Dataset)

A PyTorch Dataset implementation for tabular data with numerical and categorical features.

Parameters:

  • is_regression (bool) - Flag indicating if the task is regression. Used to determine label processing and metadata.

  • X (Tuple[Optional[Tensor], Optional[Tensor]]) - Tuple containing numerical and categorical features. - X[0]: Numerical features (shape: [n_samples, n_num_features]). - X[1]: Categorical features (shape: [n_samples, n_cat_features]).

  • Y (Dict[str, Tensor]) - Dictionary with labels for train/val/test splits. Keys must include part.

  • y_info (Dict[str, Any]) - Metadata about the labels. For regression, typically contains mean and std for de-normalization.

  • part (str) - Data split to use. Must be one of [‘train’, ‘val’, ‘test’].

def get_dim_in(self) -> int

Returns the input feature dimension.

Returns:

  • int - Number of numerical features (self.X_num.shape[1]), or 0 if self.X_num is None.

def get_categories(self) -> Optional[List[int]]

Returns categorical feature cardinality information.

Returns:

  • Optional[List[int]] - List where each element is the number of unique categories for a categorical feature. Returns None if self.X_cat is None.

def __len__(self) -> int

Returns the number of samples in the dataset.

Returns:

  • int - Number of samples in the dataset (len(self.Y)).

def __getitem__(self, i) -> Tuple[Union[Tensor, Tuple[Tensor, Tensor]], Tensor]

Retrieves a data sample and its label.

Parameters:

  • i (int) - Sample index.

Returns:

  • Tuple[Union[Tensor, Tuple[Tensor, Tensor]], Tensor] - - data: Feature tensor(s):

    • If both numerical and categorical features exist: (self.X_num[i], self.X_cat[i]).

    • If only categorical features exist: self.X_cat[i].

    • If only numerical features exist: self.X_num[i].

    • label: Corresponding target label (self.Y[i]).