ExcelFormer

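A deep learning model for tabular data prediction. It combines a semi-permeable attention module, which limits the influence of less informative features in order to break the rotational invariance of standard neural networks; data augmentation tailored to tabular data; and an attentive feedforward network. Together, these components aim to make it perform reliably across diverse datasets.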

Classes and Functions

class Lambda(nn.Module)

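Lambda layer that wraps an arbitrary callable f as an nn.Module, so that a plain function can be used inside nn.Sequential and other module containers.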

Parameters:

  • f (callable) - Function to apply.

Input:

  • x - Input tensor.

Output:

  • Tensor - Result of applying function f to input.

class RMSNorm(nn.Module)

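Root Mean Square Layer Normalization: divides each vector by its root mean square (without subtracting the mean) and then applies a learnable per-feature scale.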

Parameters:

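  • d (int) - Model dimension (size of the last axis of x).

  • p (float, optional, Default is -1.0) - Fraction of the d features used to estimate the RMS (partial RMSNorm, valid range [0, 1]); the default disables partial RMSNorm.

  • eps (float, optional, Default is 1e-5) - Small constant added to the RMS for numerical stability.

  • bias (bool, optional, Default is False) - Whether to add a learnable offset after scaling.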

Input:

  • x (Tensor) - Input tensor.

Output:

  • Tensor - Normalized tensor.
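The RMSNorm computation can be sketched on a single feature vector with plain Python lists, so it runs without PyTorch. This is a minimal sketch, not the library's code: it assumes, as in the widely used reference RMSNorm implementation, that eps is added to the RMS and that partial RMSNorm estimates the RMS from the first int(d * p) features. The names rms_norm, scale, and offset are hypothetical stand-ins for the module and its learnable parameters.

```python
import math


def rms_norm(x, scale, p=-1.0, eps=1e-5, offset=None):
    """Sketch of RMSNorm applied to one feature vector x (a list of floats).

    scale (and the optional offset) stand in for the learnable per-feature
    parameters; p in [0, 1] enables partial RMSNorm, other values disable it.
    """
    d = len(x)
    if 0.0 <= p <= 1.0:
        # Partial RMSNorm: estimate the RMS from the first int(d * p) features
        # (at least one, a guard added for this sketch).
        k = max(1, int(d * p))
    else:
        k = d
    rms = math.sqrt(sum(v * v for v in x[:k]) / k)
    # Normalize every feature by the (partial) RMS; no mean is subtracted.
    normed = [v / (rms + eps) for v in x]
    out = [g * v for g, v in zip(scale, normed)]
    if offset is not None:
        out = [o + b for o, b in zip(out, offset)]
    return out


# Usage: with unit scale and eps=0, the output has a mean square of 1
# (up to rounding).
y = rms_norm([3.0, 4.0], scale=[1.0, 1.0], eps=0.0)
```

Unlike LayerNorm, no mean is subtracted, so only the scale of each vector is normalized; this is also why bias is off by default.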

class ScaleNorm(nn.Module)

Scale normalization layer: divides each vector by its L2 norm and multiplies it by a single learnable scalar.

Parameters:

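  • d (int) - Model dimension; the learnable scale is initialized to sqrt(d).

  • eps (float, optional, Default is 1e-5) - Small constant for numerical stability.

  • clamp (bool, optional, Default is False) - If True, clamp the norms to at least eps; otherwise add eps to them.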

Input:

  • x (Tensor) - Input tensor.

Output:

  • Tensor - Scaled and normalized tensor.
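A minimal sketch of the ScaleNorm computation on one vector, using plain Python lists. It assumes the single learnable gain g is initialized to sqrt(d), as in the original ScaleNorm implementation; scale_norm and g are hypothetical names, not the library's API.

```python
import math


def scale_norm(x, g, eps=1e-5, clamp=False):
    """Sketch of ScaleNorm on one vector x: g * x / ||x||_2.

    g stands in for the single learnable scalar gain.
    """
    norm = math.sqrt(sum(v * v for v in x))
    # clamp=True bounds the norm below by eps; otherwise eps is added to it.
    norm = max(norm, eps) if clamp else norm + eps
    return [g * v / norm for v in x]


d = 4
g = math.sqrt(d)  # assumed initialization of the gain
y = scale_norm([1.0, 1.0, 1.0, 1.0], g, clamp=True)
print(y)  # [1.0, 1.0, 1.0, 1.0]
```

With g = sqrt(d), the output has L2 norm close to sqrt(d), i.e. a root mean square close to 1, so at initialization ScaleNorm behaves much like RMSNorm while learning only one parameter.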

def reglu(x: Tensor) -> Tensor

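ReGLU activation function: splits x in half along the last dimension into a and b and returns a * ReLU(b), so the output's last dimension is half that of x.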

Parameters:

  • x (Tensor) - Input tensor.

Returns:

  • Tensor - ReGLU output.

def geglu(x: Tensor) -> Tensor

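GEGLU activation function: splits x in half along the last dimension into a and b and returns a * GELU(b), so the output's last dimension is half that of x.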

Parameters:

  • x (Tensor) - Input tensor.

Returns:

  • Tensor - GEGLU output.

def tanglu(x: Tensor) -> Tensor

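TanGLU activation function: splits x in half along the last dimension into a and b and returns a * tanh(b), so the output's last dimension is half that of x.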

Parameters:

  • x (Tensor) - Input tensor.

Returns:

  • Tensor - TanGLU output.
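The three gated activations differ only in the gate nonlinearity. The sketch below works on plain Python lists, assuming (as in common implementations built on chunk(2, dim=-1)) that the first half of x is the value a and the second half is the gate b. The underscore-prefixed helpers are hypothetical, and GELU is the exact erf-based form, which is PyTorch's default.

```python
import math


def _split_halves(x):
    """Split the last (here: only) dimension into value a and gate b."""
    if len(x) % 2:
        raise ValueError("the last dimension of x must have even size")
    half = len(x) // 2
    return x[:half], x[half:]


def _gated(x, gate):
    a, b = _split_halves(x)
    return [ai * gate(bi) for ai, bi in zip(a, b)]


def _relu(v):
    return max(v, 0.0)


def _gelu(v):
    # Exact (erf-based) GELU.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def reglu(x):
    return _gated(x, _relu)


def geglu(x):
    return _gated(x, _gelu)


def tanglu(x):
    return _gated(x, math.tanh)


print(reglu([1.0, 2.0, -1.0, 3.0]))  # [0.0, 6.0]
```

Because the gate consumes half of the input features, a feedforward layer feeding one of these activations typically projects to twice the desired hidden width.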

class ReGLU(nn.Module)

ReGLU activation module: applies reglu to its input, for use inside nn.Sequential and other module containers.

Input:

  • x (Tensor) - Input tensor.

Output:

  • Tensor - ReGLU output.

class GEGLU(nn.Module)

GEGLU activation module: applies geglu to its input, for use inside nn.Sequential and other module containers.

Input:

  • x (Tensor) - Input tensor.

Output:

  • Tensor - GEGLU output.

def make_optimizer(optimizer: str, parameter_groups, lr: float, weight_decay: float) -> optim.Optimizer

Creates an optimizer with specified parameters.

Parameters:

  • optimizer (str) - Name of the optimizer to create.

  • parameter_groups - Parameters or parameter groups to optimize, in any form accepted by torch.optim optimizers.

  • lr (float) - Learning rate.

  • weight_decay (float) - Weight decay.

Returns:

  • optim.Optimizer - Configured optimizer.

class RAdam(optim.Optimizer)

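Rectified Adam (RAdam) optimizer, which rectifies the variance of the adaptive learning rate during the early steps of training.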

Parameters:

  • params - Model parameters.

  • lr (float, optional, Default is 1e-3) - Learning rate.

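  • betas (tuple, optional, Default is (0.9, 0.999)) - Coefficients for the running averages of the gradient and its square.

  • eps (float, optional, Default is 1e-8) - Term added to the denominator for numerical stability.

  • weight_decay (float, optional, Default is 0) - Weight decay.

  • degenerated_to_sgd (bool, optional, Default is True) - If True, use an SGD-with-momentum update in the early steps where the variance of the adaptive learning rate cannot be rectified; if False, skip the update in those steps.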
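RAdam's distinguishing step is the rectification term that decides, at each step t, whether the adaptive update is used. A minimal sketch of that term alone, following the formulas of the RAdam paper; the threshold of 5 on rho_t is an assumption taken from the reference RAdam code (the paper states the condition as rho_t > 4), and radam_rectification is a hypothetical name.

```python
import math


def radam_rectification(step, beta2=0.999, threshold=5.0):
    """Return RAdam's variance rectification term r_t, or None when rho_t
    (the approximated length of the simple moving average) is below the
    threshold. None marks the steps in which degenerated_to_sgd decides
    between an SGD-with-momentum update and no update at all."""
    if step < 1:
        raise ValueError("steps are counted from 1")
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t < threshold:
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )


for t in (1, 10, 100, 1000):
    print(t, radam_rectification(t))
```

With the default beta2 = 0.999, the first few steps return None; afterwards r_t stays below 1 and approaches 1 as t grows, so the adaptive step is damped most when its variance estimate is least reliable.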

class AdaBelief(optim.Optimizer)

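AdaBelief optimizer, which scales each step by the deviation of the gradient from its exponential moving average (the "belief" in the observed gradient) rather than by the raw squared gradient.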

Parameters:

  • params - Model parameters.

  • lr (float, optional, Default is 1e-3) - Learning rate.

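  • betas (tuple, optional, Default is (0.9, 0.999)) - Coefficients for the running averages of the gradient and of its squared deviation.

  • eps (float, optional, Default is 1e-16) - Term added for numerical stability.

  • weight_decay (float, optional, Default is 0) - Weight decay.

  • amsgrad (bool, optional, Default is False) - Whether to use the AMSGrad variant, which keeps the maximum of past second-moment estimates.

  • weight_decouple (bool, optional, Default is True) - If True, apply decoupled (AdamW-style) weight decay directly to the parameters; otherwise add weight decay to the gradient as L2 regularization.

  • fixed_decay (bool, optional, Default is False) - If True (and weight_decouple is True), the decoupled weight decay is not scaled by the learning rate.

  • rectify (bool, optional, Default is True) - Whether to apply RAdam-style rectification of the adaptive learning rate.

  • degenerated_to_sgd (bool, optional, Default is True) - When rectify is True, whether to fall back to SGD-with-momentum updates in the early steps where rectification is not possible.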
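A single AdaBelief step on a scalar parameter, written in plain Python. This is a minimal sketch, not the library's implementation: it omits AMSGrad and rectification (amsgrad=False, rectify=False), assumes decoupled weight decay (weight_decouple=True), and adds eps both to the belief term and to the denominator as in the AdaBelief reference code. adabelief_step and the state keys are hypothetical.

```python
import math


def adabelief_step(theta, grad, state, lr=1e-3, betas=(0.9, 0.999),
                   eps=1e-16, weight_decay=0.0, fixed_decay=False):
    """One AdaBelief update of a scalar parameter theta; returns the new value.

    state holds the step count "t", the gradient EMA "m", and the belief
    EMA "s" of (grad - m)^2; it is updated in place.
    """
    beta1, beta2 = betas
    # Decoupled weight decay: shrink theta directly (not scaled by lr if fixed).
    theta *= 1.0 - (weight_decay if fixed_decay else lr * weight_decay)
    state["t"] += 1
    t = state["t"]
    m = beta1 * state["m"] + (1.0 - beta1) * grad
    # Unlike Adam, s tracks how far the gradient deviates from its EMA m.
    s = beta2 * state["s"] + (1.0 - beta2) * (grad - m) ** 2 + eps
    state["m"], state["s"] = m, s
    m_hat = m / (1.0 - beta1 ** t)  # bias corrections, as in Adam
    s_hat = s / (1.0 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(s_hat) + eps)


# Usage: a few steps on f(theta) = theta**2, whose gradient is 2 * theta.
state = {"t": 0, "m": 0.0, "s": 0.0}
theta = 1.0
for _ in range(3):
    theta = adabelief_step(theta, 2.0 * theta, state, lr=0.1)
```

The only change from Adam is the second-moment line: when gradients agree with their running mean, (grad - m)^2 is small, the denominator shrinks, and the optimizer takes a larger step.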

References:

Jintai Chen, Jiahuan Yan, Qiyuan Chen, Danny Ziyi Chen, Jian Wu, Jimeng Sun. ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data. arXiv:2301.02819 [cs.LG], 2024. https://arxiv.org/abs/2301.02819