ExcelFormer
ExcelFormer is a deep learning model for tabular data prediction. It combines three components: a semi-permeable attention module that addresses rotational invariance, data augmentation tailored to tabular data, and an attentive feedforward network. Together these are intended to make it a reliable choice across diverse datasets.
Classes and Functions
class Lambda(nn.Module)
Lambda layer that applies a custom function.
Parameters:
f (callable) - Function to apply.
Input:
x - Input tensor.
Output:
Tensor - Result of applying function f to input.
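Example (sketch):
A minimal sketch of how such a layer might be written and used, assuming it only stores the callable and calls it in forward. The name LambdaSketch and the flatten example are illustrative and not taken from the library source.

```python
from typing import Callable

import torch
from torch import Tensor, nn


class LambdaSketch(nn.Module):
    """Hypothetical stand-in for Lambda: wraps a callable as a module."""

    def __init__(self, f: Callable[[Tensor], Tensor]) -> None:
        super().__init__()
        self.f = f

    def forward(self, x: Tensor) -> Tensor:
        # Apply the stored function to the input unchanged otherwise.
        return self.f(x)


# Wrapping a plain function lets it sit inside nn.Sequential.
model = nn.Sequential(
    LambdaSketch(lambda x: x.flatten(start_dim=1)),  # (5, 3, 4) -> (5, 12)
    nn.Linear(12, 2),
)
out = model(torch.zeros(5, 3, 4))
print(out.shape)  # torch.Size([5, 2])
```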
class RMSNorm(nn.Module)
Root Mean Square Layer Normalization.
Parameters:
d (int) - Model size.
p (float, optional, Default is -1.0) - Partial RMSNorm parameter.
eps (float, optional, Default is 1e-5) - Epsilon value.
bias (bool, optional, Default is False) - Whether to use bias term.
Input:
x (Tensor) - Input tensor.
Output:
Tensor - Normalized tensor.
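Example (sketch):
The following sketch shows one common way RMSNorm is computed. It assumes the behavior of the widely used reference implementation: the input is divided by its root mean square over the last dimension and multiplied by a learned per-feature scale, and a value of p in (0, 1] estimates the RMS from only the first p * d features. Whether this library follows those details exactly is an assumption; RMSNormSketch is a hypothetical name.

```python
import torch
from torch import Tensor, nn


class RMSNormSketch(nn.Module):
    """Hypothetical RMSNorm sketch; not the library's own code."""

    def __init__(self, d: int, p: float = -1.0, eps: float = 1e-5,
                 bias: bool = False) -> None:
        super().__init__()
        self.d, self.p, self.eps = d, p, eps
        self.scale = nn.Parameter(torch.ones(d))
        self.offset = nn.Parameter(torch.zeros(d)) if bias else None

    def forward(self, x: Tensor) -> Tensor:
        if 0.0 < self.p <= 1.0:
            # Partial RMSNorm (assumed): use only the first k features.
            k = max(1, int(self.d * self.p))
            norm = x[..., :k].norm(2, dim=-1, keepdim=True)
        else:
            # Default p = -1.0: use all d features.
            k = self.d
            norm = x.norm(2, dim=-1, keepdim=True)
        rms = norm * k ** -0.5  # ||x||_2 / sqrt(k)
        out = self.scale * x / (rms + self.eps)
        return out if self.offset is None else out + self.offset


torch.manual_seed(0)
x = torch.randn(4, 8)
y = RMSNormSketch(8)(x)
# With the initial scale of ones, each row of y has an RMS close to 1.
rms_y = y.detach().pow(2).mean(dim=-1).sqrt()
```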
class ScaleNorm(nn.Module)
Scale normalization layer.
Parameters:
d (int) - Model dimension.
eps (float, optional, Default is 1e-5) - Epsilon value.
clamp (bool, optional, Default is False) - Whether to clamp norms.
Input:
x (Tensor) - Input tensor.
Output:
Tensor - Scaled and normalized tensor.
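Example (sketch):
A sketch of scale normalization under the usual formulation: each vector is divided by its L2 norm and multiplied by a single learned scalar, initialized here to sqrt(d). The treatment of clamp (clamping the norm to at least eps instead of adding eps) is an assumption, and ScaleNormSketch is a hypothetical name.

```python
import torch
from torch import Tensor, nn


class ScaleNormSketch(nn.Module):
    """Hypothetical ScaleNorm sketch; not the library's own code."""

    def __init__(self, d: int, eps: float = 1e-5, clamp: bool = False) -> None:
        super().__init__()
        # A single learned scalar, assumed to start at sqrt(d).
        self.scale = nn.Parameter(torch.tensor(d ** 0.5))
        self.eps = eps
        self.clamp = clamp

    def forward(self, x: Tensor) -> Tensor:
        norms = x.norm(2, dim=-1, keepdim=True)
        # Assumed meaning of clamp: bound the norm from below by eps
        # rather than adding eps to it.
        norms = norms.clamp(min=self.eps) if self.clamp else norms + self.eps
        return self.scale * x / norms


torch.manual_seed(0)
x = torch.randn(4, 8)
y = ScaleNormSketch(8)(x)
# Every output row has an L2 norm close to sqrt(8) at initialization.
row_norms = y.detach().norm(2, dim=-1)
```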
def reglu(x: Tensor) -> Tensor
ReGLU activation function.
Parameters:
x (Tensor) - Input tensor.
Returns:
Tensor - ReGLU output.
def geglu(x: Tensor) -> Tensor
GEGLU activation function.
Parameters:
x (Tensor) - Input tensor.
Returns:
Tensor - GEGLU output.
def tanglu(x: Tensor) -> Tensor
TanGLU activation function.
Parameters:
x (Tensor) - Input tensor.
Returns:
Tensor - TanGLU output.
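Example (sketch):
The three gated activations above share one pattern: split the last dimension in half and multiply the first half by an activation of the second half. The sketch below assumes that convention, including which half is the gate. The _sketch names are hypothetical, and the tanh gate for TanGLU is an assumption based on the name.

```python
import torch
import torch.nn.functional as F
from torch import Tensor


def reglu_sketch(x: Tensor) -> Tensor:
    a, b = x.chunk(2, dim=-1)  # requires an even last dimension
    return a * F.relu(b)


def geglu_sketch(x: Tensor) -> Tensor:
    a, b = x.chunk(2, dim=-1)
    return a * F.gelu(b)


def tanglu_sketch(x: Tensor) -> Tensor:
    a, b = x.chunk(2, dim=-1)
    return a * torch.tanh(b)  # assumed gate for TanGLU


x = torch.tensor([[1.0, 2.0, -3.0, 4.0]])
# a = [1, 2], b = [-3, 4]; relu(b) = [0, 4], so the product is [0, 8].
print(reglu_sketch(x))  # tensor([[0., 8.]])
```

Note that the output has half as many features as the input, so the preceding linear layer must produce twice the desired hidden width.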
class ReGLU(nn.Module)
ReGLU activation module.
Input:
x (Tensor) - Input tensor.
Output:
Tensor - ReGLU output.
class GEGLU(nn.Module)
GEGLU activation module.
Input:
x (Tensor) - Input tensor.
Output:
Tensor - GEGLU output.
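Example (sketch):
The module forms are convenient inside nn.Sequential. The sketch below builds a small feedforward block around a ReGLU-style module, assuming the module halves the last dimension as the functional form does. ReGLUSketch and the layer sizes are illustrative only.

```python
import torch
import torch.nn.functional as F
from torch import Tensor, nn


class ReGLUSketch(nn.Module):
    """Hypothetical module wrapper around the ReGLU gating pattern."""

    def forward(self, x: Tensor) -> Tensor:
        a, b = x.chunk(2, dim=-1)
        return a * F.relu(b)


d, hidden = 16, 32
ffn = nn.Sequential(
    nn.Linear(d, 2 * hidden),  # twice the width: half values, half gates
    ReGLUSketch(),             # (…, 2 * hidden) -> (…, hidden)
    nn.Linear(hidden, d),
)
out = ffn(torch.randn(4, d))
print(out.shape)  # torch.Size([4, 16])
```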
def make_optimizer(optimizer: str, parameter_groups, lr: float, weight_decay: float) -> optim.Optimizer
Creates an optimizer with specified parameters.
Parameters:
optimizer (str) - Optimizer type.
parameter_groups - Parameter groups.
lr (float) - Learning rate.
weight_decay (float) - Weight decay.
Returns:
optim.Optimizer - Configured optimizer.
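Example (sketch):
The accepted optimizer names are not listed above, so the sketch below only illustrates the general idea of a name-based factory with the same signature. The registry contents ("adam", "adamw", "sgd") and the name make_optimizer_sketch are assumptions; the real function may also accept names such as the RAdam and AdaBelief classes documented below.

```python
from torch import nn, optim


def make_optimizer_sketch(optimizer: str, parameter_groups, lr: float,
                          weight_decay: float) -> optim.Optimizer:
    # Hypothetical registry using only built-in torch optimizers.
    registry = {"adam": optim.Adam, "adamw": optim.AdamW, "sgd": optim.SGD}
    try:
        cls = registry[optimizer.lower()]
    except KeyError:
        raise ValueError(f"Unknown optimizer: {optimizer!r}") from None
    return cls(parameter_groups, lr=lr, weight_decay=weight_decay)


model = nn.Linear(4, 1)
# Parameter groups can override defaults, e.g. no weight decay on the bias.
groups = [
    {"params": [model.weight]},
    {"params": [model.bias], "weight_decay": 0.0},
]
opt = make_optimizer_sketch("adamw", groups, lr=1e-3, weight_decay=1e-5)
```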
class RAdam(optim.Optimizer)
Rectified Adam optimizer.
Parameters:
params - Model parameters.
lr (float, optional, Default is 1e-3) - Learning rate.
betas (tuple, optional, Default is (0.9, 0.999)) - Beta parameters.
eps (float, optional, Default is 1e-8) - Epsilon value.
weight_decay (float, optional, Default is 0) - Weight decay.
degenerated_to_sgd (bool, optional, Default is True) - Whether to degenerate to SGD.
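Example (sketch):
The sketch below illustrates the variance rectification term that gives Rectified Adam its name, following the formulas from the RAdam paper. It is written in plain Python for clarity and is not the library's code. The function name and the threshold of 5 are assumptions (the paper uses 4, while some implementations use 5).

```python
import math
from typing import Optional


def radam_rectification(step: int, beta2: float = 0.999,
                        threshold: float = 5.0) -> Optional[float]:
    """Return the rectification factor r_t, or None if the variance
    of the adaptive learning rate is not yet considered tractable."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t < threshold:
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )


early = radam_rectification(1)      # too early: no rectified step
late = radam_rectification(10_000)  # approaches 1 as training proceeds
```

While the factor is unavailable in the first few steps, an optimizer with degenerated_to_sgd enabled would typically fall back to a momentum-only update without the adaptive denominator, and otherwise skip the update. That reading of the flag is an assumption based on its name.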
class AdaBelief(optim.Optimizer)
AdaBelief optimizer.
Parameters:
params - Model parameters.
lr (float, optional, Default is 1e-3) - Learning rate.
betas (tuple, optional, Default is (0.9, 0.999)) - Beta parameters.
eps (float, optional, Default is 1e-16) - Epsilon value.
weight_decay (float, optional, Default is 0) - Weight decay.
amsgrad (bool, optional, Default is False) - Whether to use AMSGrad.
weight_decouple (bool, optional, Default is True) - Whether to decouple weight decay.
fixed_decay (bool, optional, Default is False) - Whether to use fixed decay.
rectify (bool, optional, Default is True) - Whether to use rectification.
degenerated_to_sgd (bool, optional, Default is True) - Whether to degenerate to SGD.
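Example (sketch):
AdaBelief differs from Adam mainly in its second-moment estimate: it tracks the squared deviation of the gradient from its running mean rather than the squared gradient. The scalar sketch below shows one update under that formulation, with decoupled weight decay. It omits amsgrad and rectify, and the function name and state layout are hypothetical.

```python
import math


def adabelief_step(p: float, grad: float, state: dict, lr: float = 1e-3,
                   betas: tuple = (0.9, 0.999), eps: float = 1e-16,
                   weight_decay: float = 0.0, weight_decouple: bool = True,
                   fixed_decay: bool = False) -> float:
    """One AdaBelief update for a single scalar parameter (sketch)."""
    beta1, beta2 = betas
    state["step"] += 1
    t = state["step"]
    if weight_decouple:
        # Decoupled decay shrinks the parameter directly; with
        # fixed_decay the shrinkage is not multiplied by lr.
        p *= 1.0 - (weight_decay if fixed_decay else lr * weight_decay)
    else:
        grad = grad + weight_decay * p  # classic L2 penalty
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    diff = grad - state["m"]  # deviation from the gradient "belief"
    state["s"] = beta2 * state["s"] + (1.0 - beta2) * diff * diff + eps
    m_hat = state["m"] / (1.0 - beta1 ** t)  # bias correction
    s_hat = state["s"] / (1.0 - beta2 ** t)
    return p - lr * m_hat / (math.sqrt(s_hat) + eps)


state = {"step": 0, "m": 0.0, "s": 0.0}
p_new = adabelief_step(1.0, 2.0, state, lr=0.1)
# First step: m_hat = 2.0 and sqrt(s_hat) is about 1.8,
# so the parameter moves by roughly 0.1 * 2.0 / 1.8.
```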
References:
Jintai Chen, Jiahuan Yan, Qiyuan Chen, Danny Ziyi Chen, Jian Wu, Jimeng Sun. ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data. arXiv:2301.02819 [cs.LG], 2024. https://arxiv.org/abs/2301.02819