TALENT: A Tabular Analytics and Learning Toolbox

Welcome to TALENT, a comprehensive machine learning toolbox designed to enhance model performance on tabular data.

TALENT integrates advanced deep learning models, classical algorithms, and efficient hyperparameter tuning, and offers robust preprocessing capabilities to optimize learning from tabular datasets. The toolbox is user-friendly and adaptable, catering to both novice and expert data scientists.

Important

If you use any content of this repo for your work, please make sure to cite the relevant papers as described in the Citing TALENT section below.

Citing TALENT

If you use TALENT in your research, please consider citing the following works:

@article{ye2024closerlookdeeplearning,
         title={A Closer Look at Deep Learning on Tabular Data},
         author={Han-Jia Ye and Si-Yang Liu and Hao-Run Cai and Qi-Le Zhou and De-Chuan Zhan},
         journal={arXiv preprint arXiv:2407.00956},
         year={2024}
}

@article{liu2024talenttabularanalyticslearning,
         title={TALENT: A Tabular Analytics and Learning Toolbox},
         author={Si-Yang Liu and Hao-Run Cai and Qi-Le Zhou and Han-Jia Ye},
         journal={arXiv preprint arXiv:2407.04057},
         year={2024}
}

What’s New

Here are the recent updates to TALENT:

  • [2025-06]🌟 Add TabAutoPNPNet (Electronics 2025).

  • [2025-06]🌟 Add TabICL (ICML 2025). The current code is based on TabICL v0.1.2.

  • [2025-05]🌟 Check out our three papers MMTU, Tabular-Temporal-Shift, and BETA accepted at ICML 2025!

  • [2025-04]🌟 Check out our new survey Representation Learning for Tabular Data: A Comprehensive Survey (Repo). We organize existing methods into three main categories according to their generalization capabilities: specialized, transferable, and general models. This provides a comprehensive taxonomy of deep tabular representation methods. 🚀🚀🚀

  • [2025-02]🌟 Add T2Gformer (AAAI 2023).

  • [2025-02]🌟 Add TabPFN v2 (Nature).

  • [2025-02]🌟 Thanks to Hengzhe Zhang for providing a Scikit-Learn compatible wrapper for TALENT!

  • [2025-01]🌟 Check out our new baseline ModernNCA (ICLR 2025), inspired by traditional Neighborhood Components Analysis, which outperforms both tree-based and other deep tabular models, while also reducing training time and model size! 🚀🚀🚀

  • [2025-01]🌟 Check out our latest version of the benchmark paper for updated and expanded results and analysis!

  • [2025-01]🌟 We have curated and released new benchmark datasets, along with updated results on these datasets across a broader range of methods. This update focuses on improving dataset quality, including removing duplicates and correcting tasks where binary classification was mistakenly treated as regression. We have also separated out the larger datasets, forming the basic benchmark (300 datasets: 120 binary classification, 80 multi-class classification, and 100 regression) and the large benchmark (22 datasets).

  • [2024-12]🌟 Add TabM (ICLR 2025).

  • [2024-09]🌟 Add Trompt (ICML 2023).

  • [2024-09]🌟 Add AMFormer (AAAI 2024).

  • [2024-08]🌟 Add GRANDE (ICLR 2024).

  • [2024-08]🌟 Add Excelformer (KDD 2024).

  • [2024-08]🌟 Add MLP_PLR (NeurIPS 2022).

  • [2024-07]🌟 Add RealMLP.

  • [2024-07]🌟 Add ProtoGate (ICML 2024).

  • [2024-07]🌟 Add BiSHop (ICML 2024).

  • [2024-06]🌟 Check out our new baseline ModernNCA, inspired by traditional Neighborhood Components Analysis, which outperforms both tree-based and other deep tabular models, while also reducing training time and model size!

  • [2024-06]🌟 Check out our benchmark paper on tabular data, which uses our toolbox to provide a comprehensive and fair evaluation of classical and deep tabular methods!

Note

If you want to view benchmark results, please visit: https://6sy666.github.io/TALENT-Results/

To explore default hyperparameters and search spaces of methods in this toolbox, check: https://6sy666.github.io/TALENT-Configs/

Contents

Acknowledgements