num_embeddings

Functions for computing and validating bin edges for numerical feature discretization, together with the encoding modules that consume those edges: piecewise linear, unary, Johnson, and bins encodings.

Functions

def _check_bins(bins: List[Tensor]) -> None

Validates the structure of bin edges to ensure they meet requirements for discretization.

Parameters:

bins (List[Tensor]) - List of tensors, where each tensor contains bin edges for a feature.

Raises:

ValueError - If bins is empty, or if any tensor is not 1D, has fewer than 2 edges, contains non-finite values, or is not sorted.

Warns:

UserWarning - If a feature has exactly 2 bin edges (a single bin), which is equivalent to MinMax scaling.
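
The checks listed above can be sketched as follows. This is a minimal illustration, not the library's code: the name check_bins_sketch and the error messages are hypothetical, and reading "sorted" as "strictly increasing" is an assumption.

```python
import warnings
from typing import List

import torch
from torch import Tensor


def check_bins_sketch(bins: List[Tensor]) -> None:
    """Hypothetical re-implementation of the documented checks."""
    if len(bins) == 0:
        raise ValueError("bins must not be empty")
    for i, edges in enumerate(bins):
        if edges.ndim != 1:
            raise ValueError(f"bins[{i}] must be a 1D tensor")
        if len(edges) < 2:
            raise ValueError(f"bins[{i}] must contain at least 2 edges")
        if not torch.isfinite(edges).all():
            raise ValueError(f"bins[{i}] contains non-finite values")
        # Assumption: "sorted" means strictly increasing (no repeated edges).
        if (edges[:-1] >= edges[1:]).any():
            raise ValueError(f"bins[{i}] must be sorted in increasing order")
        if len(edges) == 2:
            # A single bin is allowed, but it only rescales the feature.
            warnings.warn(
                f"bins[{i}] defines a single bin (equivalent to MinMax scaling)",
                UserWarning,
            )


# Passes silently: one feature with two bins.
check_bins_sketch([torch.tensor([0.0, 0.5, 1.0])])
```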

def compute_bins(
    X: torch.Tensor,
    n_bins: int = 48,
    *,
    tree_kwargs: Optional[Dict[str, Any]] = None,
    y: Optional[Tensor] = None,
    regression: Optional[bool] = None,
    verbose: bool = False,
) -> List[Tensor]

Computes bin edges for numerical features using either quantile-based or tree-based methods.

Parameters:

X (torch.Tensor) - 2D tensor of shape (n_samples, n_features) containing numerical features.
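n_bins (int, optional, Default is 48) - Target number of bins per feature. The actual count may differ, for example because the returned edges are unique.
tree_kwargs (Optional[Dict[str, Any]]) - If provided, enables tree-based binning and is passed as keyword arguments to the decision tree. If None, quantile-based binning is used.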
y (Optional[Tensor]) - 1D tensor of labels (required for tree-based binning).
regression (Optional[bool]) - Whether the task is regression (required for tree-based binning).
verbose (bool, optional, Default is False) - If True, uses tqdm to show progress for tree-based binning.

Returns:

List[Tensor] - List of tensors, where each tensor contains sorted, unique bin edges for a feature, with shape (n_edges,).
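
For intuition, quantile-based binning can be sketched with torch.quantile. The helper below (quantile_bins_sketch, a hypothetical name) illustrates the general technique only. It is not the actual implementation of compute_bins, and it leaves out the tree-based path and all input validation.

```python
from typing import List

import torch
from torch import Tensor


def quantile_bins_sketch(X: Tensor, n_bins: int) -> List[Tensor]:
    """Hypothetical helper: per-feature bin edges at evenly spaced quantiles."""
    # n_bins bins need n_bins + 1 edges.
    levels = torch.linspace(0.0, 1.0, n_bins + 1, dtype=X.dtype)
    bins = []
    for column in X.T:  # iterate over features
        edges = torch.quantile(column, levels)
        # Repeated values make neighbouring quantiles coincide, so duplicates
        # are dropped; torch.unique also returns the edges sorted.
        bins.append(torch.unique(edges))
    return bins


X = torch.tensor(
    [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 5.0], [4.0, 5.0]]
)
bins = quantile_bins_sketch(X, n_bins=4)
# The second feature has only two distinct values, so it ends up with
# fewer edges than the first one.
print([edges.shape for edges in bins])
```

The second feature shows why the requested n_bins is only a target: features with few distinct values yield fewer unique edges.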

Classes

class _PiecewiseLinearEncodingImpl(nn.Module)

Internal implementation of piecewise linear encoding. Not recommended for direct use, because its outputs contain infinite values; use PiecewiseLinearEncoding instead.

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature (generated by compute_bins).

class PiecewiseLinearEncoding(nn.Module)

Wrapper for _PiecewiseLinearEncodingImpl that sanitizes outputs (removes infinite values).

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
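
To illustrate what piecewise linear encoding computes, here is a minimal single-feature sketch. It follows the common formulation in which bins below the value encode as 1, bins above it encode as 0, the bin containing the value encodes its fractional position, and the first and last bins extrapolate linearly for out-of-range values. The function name and the single-feature interface are hypothetical, and the module itself may differ in details.

```python
import torch
from torch import Tensor


def piecewise_linear_encode_sketch(x: Tensor, edges: Tensor) -> Tensor:
    """Encode a 1D batch of ONE feature; returns shape (len(x), n_bins)."""
    left = edges[:-1]
    width = edges[1:] - edges[:-1]
    # Position of each value inside every bin, in units of that bin's width.
    ratio = (x[:, None] - left) / width
    if len(width) == 1:
        # A single bin: plain linear rescaling, nothing to clamp.
        return ratio
    return torch.cat(
        [
            ratio[:, :1].clamp(max=1.0),    # first bin: unbounded below
            ratio[:, 1:-1].clamp(0.0, 1.0),  # middle bins: clamped to [0, 1]
            ratio[:, -1:].clamp(min=0.0),   # last bin: unbounded above
        ],
        dim=1,
    )


edges = torch.tensor([0.0, 1.0, 3.0])  # two bins: [0, 1] and [1, 3]
x = torch.tensor([0.5, 2.0, -1.0, 4.0])
print(piecewise_linear_encode_sketch(x, edges))
```

For example, the value 2.0 lies past the first bin and halfway through the second, so it encodes as (1.0, 0.5).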

class _UnaryEncodingImpl(nn.Module)

Internal implementation of unary encoding, converting feature values into binary indicators of bin membership.

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.

class UnaryEncoding(nn.Module)

Wrapper for _UnaryEncodingImpl that sanitizes outputs.

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
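
One common reading of unary encoding is a thermometer code: every bin up to and including the one that contains the value is switched on. The sketch below implements that reading for a single feature. The function name is hypothetical, and whether this module uses exactly this layout is an assumption.

```python
import torch
from torch import Tensor


def unary_encode_sketch(x: Tensor, edges: Tensor) -> Tensor:
    """Thermometer-code a 1D batch of ONE feature; shape (len(x), n_bins)."""
    n_bins = len(edges) - 1
    # Bin index = number of interior edges that are <= the value, so values
    # outside the outer edges fall into the first or last bin.
    idx = (x[:, None] >= edges[1:-1]).sum(dim=1)
    positions = torch.arange(n_bins, device=x.device)
    # Switch on bins 0..idx for every value.
    return (positions <= idx[:, None]).to(x.dtype)


edges = torch.tensor([0.0, 1.0, 3.0])  # two bins
x = torch.tensor([-1.0, 0.5, 1.0, 4.0])
print(unary_encode_sketch(x, edges))
```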

class _JohnsonEncodingImpl(nn.Module)

Internal implementation of Johnson encoding, which represents the discretized feature values with Johnson codes (the bit patterns produced by a twisted ring counter).

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.

class JohnsonEncoding(nn.Module)

Wrapper for _JohnsonEncodingImpl that sanitizes outputs.

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
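
A Johnson code with n bits has 2n states in which ones shift in from one side and zeros then follow, so neighbouring states differ in exactly one bit. The sketch below maps the bin index of a single feature to such a code. The function name is hypothetical, and the mapping from bins to code states is an assumption, not necessarily what this module does.

```python
import torch
from torch import Tensor


def johnson_encode_sketch(x: Tensor, edges: Tensor) -> Tensor:
    """Johnson-code a 1D batch of ONE feature; shape (len(x), n_bits)."""
    n_bins = len(edges) - 1
    # n bits provide 2n states, so ceil(n_bins / 2) bits are enough.
    n_bits = (n_bins + 1) // 2
    # Bin index = number of interior edges that are <= the value.
    idx = (x[:, None] >= edges[1:-1]).sum(dim=1)[:, None]
    pos = torch.arange(n_bits, device=x.device)
    # State k sets the bits at positions [max(0, k - n_bits), min(k, n_bits)).
    code = (pos < idx) & (pos >= idx - n_bits)
    return code.to(x.dtype)


edges = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])  # four bins -> two bits
x = torch.tensor([0.5, 1.5, 2.5, 3.5])
print(johnson_encode_sketch(x, edges))
```

Compared with a thermometer code, this layout needs roughly half as many output dimensions for the same number of bins.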

class _BinsEncodingImpl(nn.Module)

Internal implementation of bins encoding, which discretizes each feature by simple binning against its bin edges.

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.

class BinsEncoding(nn.Module)

Wrapper for _BinsEncodingImpl that sanitizes outputs.

Parameters:

bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
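
Simple binning of a single feature can be sketched with torch.bucketize, which looks up the bin index of each value. The function name is hypothetical, and the output format of the module itself (indices, one-hot vectors, or something else) is not specified here, so both variants shown are assumptions.

```python
import torch
import torch.nn.functional as F
from torch import Tensor


def bin_index_sketch(x: Tensor, edges: Tensor) -> Tensor:
    """Return the bin index of each value of ONE feature."""
    # Only interior edges act as boundaries, so values outside the outer
    # edges fall into the first or last bin. With right=True, a value equal
    # to an interior edge is assigned to the bin on its right.
    return torch.bucketize(x, edges[1:-1], right=True)


edges = torch.tensor([0.0, 1.0, 3.0])  # two bins
x = torch.tensor([-1.0, 0.5, 1.0, 2.0, 4.0])
idx = bin_index_sketch(x, edges)
# Optional: expand the indices into one-hot vectors, one column per bin.
one_hot = F.one_hot(idx, num_classes=len(edges) - 1)
print(idx, one_hot.shape)
```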