You don't need to. Just sample 1% of the data, let the model do the feature engineering, then ask the model to replicate everything in Pandas (or R, or whatever else) so you can run it on the full dataset.
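A rough sketch of that workflow, in case it helps. Everything here is an assumption for illustration: the `revenue` and `units` columns, the synthetic data, and `engineer_features`, which stands in for whatever Pandas code the model hands back after looking at the sample.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature code, standing in for what the model proposed."""
    out = df.copy()
    # Avoid division by zero by treating zero units as missing.
    out["price_per_unit"] = out["revenue"] / out["units"].replace(0, np.nan)
    out["log_revenue"] = np.log1p(out["revenue"])
    return out


# Synthetic stand-in for the full dataset (invented columns and values).
rng = np.random.default_rng(0)
full = pd.DataFrame({
    "revenue": rng.uniform(0, 1000, size=10_000),
    "units": rng.integers(0, 50, size=10_000),
})

# Step 1: take a 1% sample, which is all the model needs to see.
sample = full.sample(frac=0.01, random_state=0)

# Step 2: check the replicated feature code on the sample.
sample_features = engineer_features(sample)

# Step 3: run the same deterministic code on the full dataset.
full_features = engineer_features(full)
```

The point is that the model only ever touches the sample; the full run is plain Pandas, so it is cheap and reproducible.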
hodgehog11 20 hours ago [-]
On the one hand, this is impressive. TabPFN was already state of the art and is seriously shaking up Bayesian prediction for tabular data (which is almost everything).
On the other hand, perhaps it is just me, but I do not feel that this is an acceptable form of benchmark reporting in this domain. TabArena actually has multiple metrics, since ELO does not properly quantify the degree of improvement. The fact that these are not displayed here should give pause. Also the results section in the GitHub is a dumpster fire.
Eridrus 19 hours ago [-]
GitHub Repo: Please see the results folder
Results folder: Here's some undocumented parquet files
Definitely feels like they're hiding the ball lol.
If they had good benchmarks they'd talk about them.
Not comparing to tuned xgboost is also a warning sign.
nok22kon 14 hours ago [-]
wouldn't xgboost be covered under autogluon? not perfect, but not missing either
Eridrus 3 hours ago [-]
Honestly, I don't really know AutoGluon. If it does xgboost tuning, that's good.
I do still think ELO scores are a way to obscure results though. For all we know this is like 0.1% better than a "normal" approach on like 70% of tasks and a tire fire on others.
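To make that concrete, here is a toy example with invented numbers. It is not how TabArena computes its ratings; it just uses the standard Elo expected-score formula to show that a win/loss-based rating ignores the size of each win or loss.

```python
import math

# Hypothetical per-task error rates (lower is better); all numbers are invented.
baseline = [0.200] * 10
candidate = [0.1998] * 7 + [0.400] * 3  # 0.1% better on 7 tasks, twice as bad on 3

# A win/loss view only records who was better on each task.
wins = sum(c < b for c, b in zip(candidate, baseline))
win_rate = wins / len(baseline)

# Standard Elo expected score: E = 1 / (1 + 10 ** (-gap / 400)).
# Inverting it gives the rating gap implied by a given win rate.
elo_gap = 400 * math.log10(win_rate / (1 - win_rate))

# A magnitude-aware view: mean relative change in error across tasks.
mean_rel_change = sum((c - b) / b for c, b in zip(candidate, baseline)) / len(baseline)

print(f"win rate: {win_rate:.0%}, implied Elo gap: {elo_gap:+.0f}")
print(f"mean relative change in error: {mean_rel_change:+.1%}")
```

Here the candidate wins 7 of 10 tasks and gets a comfortable positive Elo gap, while its average error is substantially worse than the baseline. That is why per-task metrics matter alongside the rating.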
nok22kon 2 hours ago [-]
AutoGluon does their own benchmarking, the default is Elo, but you can switch to other metrics:
https://news.ycombinator.com/item?id=48689744
https://huggingface.co/spaces/TabArena/leaderboard
xgboost is listed. they say "tuned" but who knows what that means, and it's below CatBoost and LightGBM
as an aggregator of models (trees, neural networks, clustering, ...) AutoGluon doesn't really have a dog in this fight