**AMFormer**
============

A token-based method that adapts the Transformer architecture to tabular data. It runs additive and multiplicative attention in parallel and uses prompt tokens to constrain feature interactions.

Functions
~~~~~~~~~

.. code-block:: python

   class GEGLU(nn.Module)

GEGLU (GELU-gated linear unit) activation function.

**Input:**

* **x** *(Tensor)* - Input tensor.

**Output:**

* **Tensor** - GEGLU output.

.. code-block:: python

   def FeedForward(dim, mult=4, dropout=0.)

Creates a feedforward network with GEGLU activation.

**Parameters:**

* **dim** *(int)* - Input/output dimension.
* **mult** *(int, optional, Default is 4)* - Multiplier applied to ``dim`` to obtain the hidden dimension.
* **dropout** *(float, optional, Default is 0.)* - Dropout rate.

**Returns:**

* **nn.Sequential** - Feedforward network.

.. code-block:: python

   class Attention(nn.Module)

Multi-head attention mechanism.

**Parameters:**

* **heads** *(int, optional, Default is 8)* - Number of attention heads.
* **dim** *(int, optional, Default is 64)* - Input dimension.
* **dropout** *(float, optional, Default is 0.)* - Dropout rate.
* **inner_dim** *(int, optional, Default is 0)* - Inner dimension; 0 means the same as ``dim``.

**Input:**

* **x** *(Tensor)* - Input tensor.
* **attn_out** *(bool, optional, Default is False)* - Whether to also return the attention weights.

**Output:**

* **Tensor** - Attention output, or a tuple ``(output, attention_weights)`` if ``attn_out=True``.

.. code-block:: python

   class MemoryBlock(nn.Module)

Memory block with a grouped attention mechanism.

**Parameters:**

* **token_num** *(int)* - Number of tokens.
* **heads** *(int)* - Number of attention heads.
* **dim** *(int)* - Input dimension.
* **attn_dropout** *(float)* - Attention dropout rate.
* **cluster** *(bool)* - Whether to use clustering.
* **target_mode** *(str)* - Target mode for attention.
* **groups** *(int)* - Number of groups.
* **num_per_group** *(int)* - Number of tokens per group.
* **use_cls_token** *(bool)* - Whether to use a CLS token.
* **sum_or_prod** *(str, optional)* - Selects the sum (additive) or product (multiplicative) operation.
* **qk_relu** *(bool, optional, Default is False)* - Whether to apply ReLU in the query-key computation.

**Input:**

* **x** *(Tensor)* - Input tensor.

**Output:**

* **Tensor** - Memory block output.

.. code-block:: python

   class Transformer(nn.Module)

Transformer model built from memory blocks.

**Parameters:**

* **dim** *(int)* - Input dimension.
* **depth** *(int)* - Number of transformer layers.
* **heads** *(int)* - Number of attention heads.
* **attn_dropout** *(float)* - Attention dropout rate.
* **ff_dropout** *(float)* - Feedforward dropout rate.
* **use_cls_token** *(bool)* - Whether to use a CLS token.
* **groups** *(int)* - Number of groups.
* **sum_num_per_group** *(int)* - Number of tokens per group for the sum operation.
* **prod_num_per_group** *(int)* - Number of tokens per group for the product operation.
* **cluster** *(bool)* - Whether to use clustering.
* **target_mode** *(str)* - Target mode for attention.
* **token_num** *(int)* - Number of tokens.
* **token_descent** *(bool, optional, Default is False)* - Whether to use token descent.
* **use_prod** *(bool, optional, Default is True)* - Whether to use the product operation.
* **qk_relu** *(bool, optional, Default is False)* - Whether to apply ReLU in the query-key computation.

**Input:**

* **x** *(Tensor)* - Input tensor.

**Output:**

* **Tensor** - Transformer output.

.. code-block:: python

   class NumericalEmbedder(nn.Module)

Numerical feature embedder.

**Parameters:**

* **dim** *(int)* - Embedding dimension.
* **num_numerical_types** *(int)* - Number of numerical feature types.

**Input:**

* **x** *(Tensor)* - Numerical feature tensor.

**Output:**

* **Tensor** - Embedded numerical features.

**References:**

Cheng, Y., Hu, R., Ying, H., Shi, X., Wu, J., & Lin, W. (2024). Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning. arXiv:2402.02334.
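The ``GEGLU``, ``FeedForward`` and ``NumericalEmbedder`` entries above describe standard building blocks. Below is a minimal PyTorch sketch of them. It is not taken from the AMFormer source and assumes the common formulation: ``GEGLU`` splits the last dimension in half and gates one half with GELU of the other, ``FeedForward`` applies ``LayerNorm`` before its first projection, and ``NumericalEmbedder`` learns one weight vector and one bias vector per numerical feature. The shapes in the demo are arbitrary.

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.nn.functional as F


   class GEGLU(nn.Module):
       """GELU-gated linear unit: values * GELU(gates), halves taken from the last dim."""

       def forward(self, x: torch.Tensor) -> torch.Tensor:
           # Assumes the last dimension is even: first half = values, second half = gates.
           x, gates = x.chunk(2, dim=-1)
           return x * F.gelu(gates)


   def FeedForward(dim: int, mult: int = 4, dropout: float = 0.0) -> nn.Sequential:
       # Assumption: pre-LayerNorm, as in common FT-Transformer-style code.
       # The first projection outputs dim * mult * 2 because GEGLU halves the width.
       return nn.Sequential(
           nn.LayerNorm(dim),
           nn.Linear(dim, dim * mult * 2),
           GEGLU(),
           nn.Dropout(dropout),
           nn.Linear(dim * mult, dim),
       )


   class NumericalEmbedder(nn.Module):
       """Embeds each scalar feature x_i as x_i * w_i + b_i (one learned w_i, b_i per feature)."""

       def __init__(self, dim: int, num_numerical_types: int):
           super().__init__()
           self.weights = nn.Parameter(torch.randn(num_numerical_types, dim))
           self.biases = nn.Parameter(torch.randn(num_numerical_types, dim))

       def forward(self, x: torch.Tensor) -> torch.Tensor:
           # (batch, num_features) -> (batch, num_features, 1), broadcast against (num_features, dim)
           x = x.unsqueeze(-1)
           return x * self.weights + self.biases


   if __name__ == "__main__":
       torch.manual_seed(0)
       x_num = torch.randn(8, 5)  # 8 rows, 5 numerical features
       tokens = NumericalEmbedder(dim=16, num_numerical_types=5)(x_num)
       out = FeedForward(dim=16)(tokens)
       print(tokens.shape, out.shape)  # torch.Size([8, 5, 16]) torch.Size([8, 5, 16])

Doubling the first projection to ``dim * mult * 2`` lets ``GEGLU`` return a tensor of width ``dim * mult``, which the second projection maps back to ``dim``, so the block preserves the token shape.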