TData

A PyTorch Dataset implementation designed for handling tabular data with numerical and categorical features.

Functions

class TData(Dataset)

A PyTorch Dataset implementation for tabular data with numerical and categorical features.

Parameters:

is_regression (bool) - Flag indicating if the task is regression. Used to determine label processing and metadata.
X (Tuple[Optional[Tensor], Optional[Tensor]]) - Tuple containing numerical and categorical features.
- X[0]: Numerical features (shape: [n_samples, n_num_features]).
- X[1]: Categorical features (shape: [n_samples, n_cat_features]).
Y (Dict[str, Tensor]) - Dictionary mapping split names (train/val/test) to label tensors. It must contain a key equal to the value of part.
y_info (Dict[str, Any]) - Metadata about the labels. For regression, typically contains mean and std for de-normalization.
part (str) - Data split to use. Must be one of ['train', 'val', 'test'].
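
The constructor arguments described above could be assembled roughly as follows. This is a minimal sketch under stated assumptions: the import path of TData is not given here, so the final call is left as a comment; the y_info keys "mean" and "std" are assumed names for the statistics mentioned above; and X is assumed to already hold the rows of the chosen split.

```python
import torch

n_samples = 6

# X[0]: numerical features, X[1]: categorical features (integer codes).
X_num = torch.randn(n_samples, 3)
X_cat = torch.randint(0, 4, (n_samples, 2))
X = (X_num, X_cat)

# Regression labels, normalized with statistics that are kept in y_info
# so predictions can be de-normalized later. Key names are assumptions.
y_train = torch.randn(n_samples)
y_info = {"mean": y_train.mean().item(), "std": y_train.std().item()}
Y = {"train": (y_train - y_info["mean"]) / y_info["std"]}

part = "train"  # must be a key of Y

# With TData imported from wherever it lives in your project:
# dataset = TData(is_regression=True, X=X, Y=Y, y_info=y_info, part=part)
```

Either element of X may be None when the dataset has only one kind of feature, as the Optional types in the signature indicate.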

def get_dim_in(self) -> int

Returns the input feature dimension.

Returns:

int - Number of numerical features (self.X_num.shape[1]), or 0 if self.X_num is None.

def get_categories(self) -> Optional[List[int]]

Returns categorical feature cardinality information.

Returns:

Optional[List[int]] - List where each element is the number of unique categories for a categorical feature. Returns None if self.X_cat is None.
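
To illustrate what the returned list looks like, the sketch below counts the distinct values in each categorical column. The helper count_categories is hypothetical and is not the library's implementation, which may compute cardinalities differently (for example from the maximum code).

```python
from typing import List, Optional

import torch


def count_categories(X_cat: Optional[torch.Tensor]) -> Optional[List[int]]:
    """Hypothetical helper: number of unique values per categorical column."""
    if X_cat is None:
        return None
    return [len(torch.unique(X_cat[:, j])) for j in range(X_cat.shape[1])]


X_cat = torch.tensor([[0, 1], [1, 1], [2, 0], [0, 1]])
print(count_categories(X_cat))  # [3, 2]
print(count_categories(None))   # None
```

A list like this is commonly used to size one embedding table per categorical feature.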

def __len__(self) -> int

Returns the number of samples in the dataset.

Returns:

int - Number of samples in the dataset.

def __getitem__(self, i) -> Tuple[Union[Tensor, Tuple[Tensor, Tensor]], Tensor]

Retrieves a data sample and its label.

Parameters:

i (int) - Index of the sample to retrieve.

Returns:

Tuple[Union[Tensor, Tuple[Tensor, Tensor]], Tensor] - A (data, label) pair:
- data: Feature tensor(s):
  - If both numerical and categorical features exist: (self.X_num[i], self.X_cat[i]).
  - If only categorical features exist: self.X_cat[i].
  - If only numerical features exist: self.X_num[i].
- label: Corresponding target label (self.Y[i]).
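
The three return shapes above matter when the dataset is wrapped in a DataLoader. The sketch below uses a hypothetical stand-in class, TabularSketch, that mirrors only the documented behavior of get_dim_in, __len__, and __getitem__; it is not the TData source and takes the label tensor directly rather than a dictionary of splits.

```python
from typing import Optional

import torch
from torch import Tensor
from torch.utils.data import DataLoader, Dataset


class TabularSketch(Dataset):
    """Hypothetical stand-in mirroring the documented TData behavior."""

    def __init__(self, X_num: Optional[Tensor], X_cat: Optional[Tensor], Y: Tensor):
        assert X_num is not None or X_cat is not None, "need at least one feature block"
        self.X_num, self.X_cat, self.Y = X_num, X_cat, Y

    def get_dim_in(self) -> int:
        # Number of numerical features, or 0 when there are none.
        return 0 if self.X_num is None else self.X_num.shape[1]

    def __len__(self) -> int:
        return len(self.Y)

    def __getitem__(self, i):
        if self.X_num is not None and self.X_cat is not None:
            data = (self.X_num[i], self.X_cat[i])
        elif self.X_cat is not None:
            data = self.X_cat[i]
        else:
            data = self.X_num[i]
        return data, self.Y[i]


ds = TabularSketch(torch.randn(8, 3), torch.randint(0, 4, (8, 2)), torch.randn(8))
loader = DataLoader(ds, batch_size=4)

# With both feature blocks present, each batch unpacks into a pair of
# feature tensors plus the labels.
(x_num, x_cat), y = next(iter(loader))
print(x_num.shape, x_cat.shape, y.shape)
```

When only one feature block is present, each batch yields a single feature tensor instead of a pair, so the training loop has to unpack accordingly.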