num_embeddings
Functions for computing and validating bin edges for numerical feature discretization, together with the encoding modules built on those bins (piecewise linear, unary, Johnson, and bins encoding).
Functions
def _check_bins(bins: List[Tensor]) -> None
Validates the structure of bin edges to ensure they meet requirements for discretization.
Parameters:
bins (List[Tensor]) - List of tensors, where each tensor contains bin edges for a feature.
Raises:
ValueError - If bins are empty, not 1D tensors, have fewer than 2 edges, contain non-finite values, or are unsorted.
Warns:
UserWarning - If a feature has exactly 2 bin edges (a single bin, which is equivalent to MinMax scaling). This is emitted as a warning, not raised as an exception.
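The checks listed above can be sketched in plain Python. This is a minimal illustration rather than the module's implementation: check_bins_sketch is a hypothetical name, flat lists of floats stand in for 1D tensors (so the dimensionality check has no counterpart here), and treating duplicate edges as unsorted is an assumption.

```python
import math
import warnings
from typing import List, Sequence


def check_bins_sketch(bins: List[Sequence[float]]) -> None:
    """Hypothetical list-based stand-in for the validation rules above."""
    if not bins:
        raise ValueError("bins must not be empty")
    for i, edges in enumerate(bins):
        if len(edges) < 2:
            raise ValueError(f"feature {i}: at least 2 bin edges are required")
        if not all(math.isfinite(e) for e in edges):
            raise ValueError(f"feature {i}: bin edges must be finite")
        # Assumption: edges must be strictly increasing, so duplicates
        # are rejected together with unsorted edges.
        if any(a >= b for a, b in zip(edges, edges[1:])):
            raise ValueError(f"feature {i}: bin edges must be sorted")
        if len(edges) == 2:
            # A single bin is allowed, but only worth a warning.
            warnings.warn(
                f"feature {i}: only one bin (equivalent to MinMax scaling)",
                UserWarning,
            )
```

Keeping the single-bin case as a warning lets callers proceed with degenerate features while still being told about them.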
def compute_bins(
X: torch.Tensor,
n_bins: int = 48,
*,
tree_kwargs: Optional[Dict[str, Any]] = None,
y: Optional[Tensor] = None,
regression: Optional[bool] = None,
verbose: bool = False,
) -> List[Tensor]
Computes bin edges for numerical features using either quantile-based or tree-based methods.
Parameters:
X (torch.Tensor) - 2D tensor of shape (n_samples, n_features) containing numerical features.
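n_bins (int, optional, Default is 48) - Target number of bins per feature. A feature may end up with fewer bins, because only unique edges are kept.
tree_kwargs (Optional[Dict[str, Any]]) - If provided, tree-based binning is used and these keyword arguments are passed to the decision tree. If None, quantile-based binning is used.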
y (Optional[Tensor]) - 1D tensor of labels (required for tree-based binning).
regression (Optional[bool]) - Whether the task is regression (required for tree-based binning).
verbose (bool, optional, Default is False) - If True, uses tqdm to show progress for tree-based binning.
Returns:
List[Tensor] - List of tensors, where each tensor contains sorted, unique bin edges for a feature, with shape (n_edges,).
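To illustrate why the actual bin count may differ from n_bins, here is a minimal sketch of quantile-based edge computation for a single feature. The helper quantile_bin_edges is hypothetical, works on plain Python lists instead of tensors, and uses linear interpolation between order statistics; the module's exact quantile rule may differ.

```python
from typing import List


def quantile_bin_edges(values: List[float], n_bins: int) -> List[float]:
    """Hypothetical helper: edges at the k/n_bins quantiles of one feature."""
    if n_bins < 1 or not values:
        raise ValueError("need n_bins >= 1 and at least one value")
    data = sorted(values)
    last = len(data) - 1
    edges = []
    for k in range(n_bins + 1):
        pos = k / n_bins * last      # fractional index into the sorted data
        lo = int(pos)
        hi = min(lo + 1, last)
        # Linear interpolation between the two neighbouring order statistics.
        edges.append(data[lo] + (data[hi] - data[lo]) * (pos - lo))
    # Keep only sorted, unique edges; repeated values collapse here,
    # which is how a feature can end up with fewer than n_bins bins.
    return sorted(set(edges))
```

For a feature with many repeated values, several quantiles coincide and are merged, so the result can contain fewer than n_bins + 1 edges.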
Classes
class _PiecewiseLinearEncodingImpl(nn.Module)
Internal implementation of piecewise linear encoding (not recommended for direct use, as outputs contain infinite values).
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature (generated by compute_bins).
class PiecewiseLinearEncoding(nn.Module)
Wrapper for _PiecewiseLinearEncodingImpl that sanitizes outputs (removes infinite values).
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
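As a rough illustration of piecewise linear encoding for one scalar value, the sketch below returns one component per bin: 1 for bins entirely below the value, 0 for bins entirely above it, and the fractional position inside the bin that contains it. The function ple_encode is hypothetical and clamps every bin to [0, 1]; the module may treat the outermost bins differently.

```python
from typing import List


def ple_encode(x: float, edges: List[float]) -> List[float]:
    """Hypothetical scalar version of piecewise linear encoding."""
    encoded = []
    for left, right in zip(edges, edges[1:]):
        # Position of x relative to this bin: < 0 below it, > 1 above it.
        ratio = (x - left) / (right - left)
        # Assumption: every bin, including the first and last, is clamped.
        encoded.append(min(1.0, max(0.0, ratio)))
    return encoded
```

With edges [0, 1, 2, 4], the value 1.5 fills the first bin completely, the second bin halfway, and leaves the third empty.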
class _UnaryEncodingImpl(nn.Module)
Internal implementation of unary encoding, converting feature values into binary indicators of bin membership.
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
class UnaryEncoding(nn.Module)
Wrapper for _UnaryEncodingImpl that sanitizes outputs.
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
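One common reading of unary encoding is a thermometer code: every bin up to and including the one that contains the value is set to 1. The sketch below follows that reading as an assumption; unary_encode is a hypothetical name, and the module's exact convention (for example at bin boundaries or outside the edge range) is not specified here.

```python
from bisect import bisect_right
from typing import List


def unary_encode(x: float, edges: List[float]) -> List[int]:
    """Hypothetical thermometer-style encoding of one scalar value."""
    n_bins = len(edges) - 1
    # Index of the bin containing x; out-of-range values are clipped
    # to the first or last bin (an assumption of this sketch).
    idx = min(max(bisect_right(edges, x) - 1, 0), n_bins - 1)
    return [1 if j <= idx else 0 for j in range(n_bins)]
```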
class _JohnsonEncodingImpl(nn.Module)
Internal implementation of Johnson encoding, using Johnson codes for feature discretization.
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
class JohnsonEncoding(nn.Module)
Wrapper for _JohnsonEncodingImpl that sanitizes outputs.
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
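For readers unfamiliar with Johnson codes, the sketch below generates the standard Johnson (twisted ring) counter sequence, in which a code of a given width has twice that many states and consecutive states differ in exactly one bit. The function johnson_code is hypothetical, and how the module maps bin indices onto these codes is an assumption not confirmed by this reference.

```python
from typing import List


def johnson_code(k: int, width: int) -> List[int]:
    """Hypothetical helper: k-th state of a Johnson counter of given width."""
    n_states = 2 * width
    if not 0 <= k < n_states:
        raise ValueError(f"k must be in [0, {n_states})")
    if k <= width:
        # First half: ones fill in from the left.
        return [1] * k + [0] * (width - k)
    # Second half: zeros fill in from the left.
    return [0] * (k - width) + [1] * (n_states - k)
```

Compared with a one-hot code, this representation needs roughly half as many components for the same number of bins while keeping neighbouring bins close in Hamming distance.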
class _BinsEncodingImpl(nn.Module)
Internal implementation of bins encoding, using simple binning for feature discretization.
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
class BinsEncoding(nn.Module)
Wrapper for _BinsEncodingImpl that sanitizes outputs.
Parameters:
bins (List[Tensor]) - List of 1D tensors, where each tensor represents bin edges for a feature.
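The simplest form of binning maps each value to the index of the bin that contains it. The sketch below shows that lookup with a binary search over the edges; bin_index is a hypothetical helper, and both the output format and the clipping of out-of-range values are assumptions, not the module's documented behavior.

```python
from bisect import bisect_right
from typing import List


def bin_index(x: float, edges: List[float]) -> int:
    """Hypothetical helper: index of the bin [edges[i], edges[i + 1]) holding x."""
    n_bins = len(edges) - 1
    # bisect_right counts the edges that are <= x; subtracting 1 gives the bin.
    idx = bisect_right(edges, x) - 1
    # Assumption: values outside the edge range go to the nearest bin.
    return min(max(idx, 0), n_bins - 1)
```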