13 priors, 6 model templates, and 9 open-source Prior-Data Fitted Network (PFN) projects to fork, learn from, or build on.
Priors, models, and external projects
Priors

The reference PFN prior. Random linear functions with Gaussian noise: the simplest demonstrable PFN training task.
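A minimal sketch of one draw from this prior; the function name, weight distribution, and input range below are illustrative choices, not taken from any PFN codebase.

```python
import numpy as np

def sample_linear_task(n_points=50, noise_std=0.1, rng=None):
    """One task draw from a linear prior: y = w*x + b + eps."""
    rng = np.random.default_rng() if rng is None else rng
    w, b = rng.normal(size=2)                 # fresh linear function per task
    x = rng.uniform(-1.0, 1.0, n_points)
    y = w * x + b + rng.normal(0.0, noise_std, n_points)
    return x, y
```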
Classification analogue of linear regression. Each task is a fresh draw of the two-moons geometry — the PFN learns a generic 2D classifier.
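A possible task sampler, assuming scikit-learn's make_moons plus a random rotation, scale, and noise level per task so each draw is genuinely fresh; the wrapper function itself is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_moons

def sample_moons_task(n_points=100, rng=None):
    """Two-moons geometry with a randomised pose per task."""
    rng = np.random.default_rng() if rng is None else rng
    X, y = make_moons(n_samples=n_points, noise=rng.uniform(0.05, 0.2),
                      random_state=int(rng.integers(1 << 31)))
    theta = rng.uniform(0.0, 2 * np.pi)       # random rotation per task
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return X @ R.T * rng.uniform(0.5, 2.0), y
```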
Real-world-shaped time series. Random stationary AR(2) coefficients per task; the PFN learns to forecast stationary autoregressive series without per-series fitting.
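One way to sample such a task, rejection-sampling the coefficients from the AR(2) stationarity triangle; the function name and ranges are illustrative.

```python
import numpy as np

def sample_ar2_task(T=200, rng=None):
    """Stationary AR(2) series: y[t] = phi1*y[t-1] + phi2*y[t-2] + eps[t]."""
    rng = np.random.default_rng() if rng is None else rng
    while True:  # stationarity: phi1 + phi2 < 1, phi2 - phi1 < 1, |phi2| < 1
        phi1, phi2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
        if phi1 + phi2 < 1 and phi2 - phi1 < 1 and abs(phi2) < 1:
            break
    y = np.zeros(T)
    eps = rng.normal(0.0, 1.0, T)
    for t in range(2, T):
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]
    return y
```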
Random sparse linear structural causal models. The PFN outputs an adjacency matrix — pure structure discovery, no fitted edges.
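A sketch of one plausible sampler: drawing a strictly lower-triangular weight matrix guarantees a DAG, and permuting the variable order hides it from the model. All names and hyperparameters here are assumptions.

```python
import numpy as np

def sample_linear_scm_task(n_samples=200, d=5, p_edge=0.3, rng=None):
    """Sparse linear SCM; the training target is the binary adjacency matrix."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.tril(rng.random((d, d)) < p_edge, k=-1)        # sparse DAG edges
    W = mask * rng.uniform(0.5, 2.0, (d, d)) * rng.choice([-1, 1], (d, d))
    X = np.zeros((n_samples, d))
    for j in range(d):                                       # ancestral sampling
        X[:, j] = X @ W[j] + rng.normal(0.0, 1.0, n_samples)
    order = rng.permutation(d)                               # hide the ordering
    adjacency = (W != 0).astype(int)
    return X[:, order], adjacency[np.ix_(order, order)]
```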
Random polynomial functions up to degree D. Generalises linear regression with curvature.
Functions sampled from a Gaussian process with an RBF kernel. The PFN learns to perform GP regression at inference time without solving the kernel linear system.
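A minimal sampler sketch: each function is one draw from a multivariate normal whose covariance is the RBF Gram matrix. Lengthscale and noise values are illustrative.

```python
import numpy as np

def sample_gp_task(n_points=50, lengthscale=0.3, noise_std=0.05, rng=None):
    """Zero-mean GP draw with an RBF kernel; the PFN only sees (x, y) pairs."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.sort(rng.uniform(-1.0, 1.0, n_points))
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / lengthscale**2)
    f = rng.multivariate_normal(np.zeros(n_points), K + 1e-6 * np.eye(n_points))
    return x, f + rng.normal(0.0, noise_std, n_points)
```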
Random Gaussian mixtures in D dimensions. Trains a PFN that performs Bayes-optimal classification on well-separated mixtures.
Logistic regression with pairwise feature interactions baked in. Trains a PFN that picks up cross-feature signal automatically.
The simplest temporal prior. Sinusoids with random amplitude, frequency, and phase per task.
Classic decomposition: slow trend plus periodic component plus noise. Demo-worthy for retail / energy / web traffic.
Textbook conjugate prior. Because the posterior is available in closed form, it provides direct evidence that the PFN has learned Bayesian inference.
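The entry does not name the conjugate pair, so the sketch below assumes Beta-Bernoulli: sample one task, then compute the closed-form posterior that a well-trained PFN's predictive distribution should reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 2.0, 2.0, 20              # assumed Beta(2, 2) prior
p = rng.beta(alpha, beta)                  # latent parameter for this task
obs = rng.binomial(1, p, n)                # context the PFN conditions on
k = obs.sum()
# Exact posterior is Beta(alpha + k, beta + n - k); its mean is the
# Bayes-optimal point prediction the PFN should approach.
posterior_mean = (alpha + k) / (alpha + beta + n)
print(f"true p = {p:.3f}, posterior mean = {posterior_mean:.3f}")
```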
Two-level normal model. Groups share information through a population mean — the canonical multi-level setup.
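A sketch of one task draw under this prior; the hyperparameter values and the flattened (group id, y) layout are illustrative assumptions.

```python
import numpy as np

def sample_hierarchical_task(n_groups=5, n_per_group=10, rng=None):
    """Two-level normal model: mu -> group means theta_g -> observations y."""
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.normal(0.0, 1.0)                              # population mean
    theta = rng.normal(mu, 0.5, n_groups)                  # groups share mu
    y = rng.normal(theta[:, None], 0.3, (n_groups, n_per_group))
    groups = np.repeat(np.arange(n_groups), n_per_group)
    return np.column_stack([groups, y.ravel()]), theta
```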
Specialised SCM prior with chain-shaped DAGs. Faster to learn than full Erdős–Rényi DAGs and matches a common scientific use case.
Model templates

Smallest model that consistently solves linear-regression-shaped priors. A good first pass for any 1D scalar-output task.
Same backbone as the regression baseline, but the scalar head emits one logit per point. Paired with the two-moons / GMM priors.
Deeper backbone for sine / AR / seasonal priors. Wider d_model and four attention layers give enough capacity for non-trivial dynamics.
Outputs a d×d adjacency matrix from N×d observations. The default architecture for linear-SCM / chain-SCM discovery priors.
Wider, deeper tabular backbone for harder priors (hierarchical models, GMM with many classes, interaction-heavy logistic).
For causal effect estimation tasks where the output is a real-valued estimate per query. Pairs with potential-outcome priors.
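To make the templates above concrete, here is a minimal PFN-style backbone in PyTorch: context points are embedded as (x, y) pairs, queries as x alone, and a transformer encoder emits one scalar per query. Sizes, names, and the absence of attention masking are simplifications, not the templates' actual configurations.

```python
import torch
import torch.nn as nn

class MiniPFN(nn.Module):
    """Sketch of a scalar-output PFN backbone (illustrative sizes)."""
    def __init__(self, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed_xy = nn.Linear(2, d_model)   # context points carry (x, y)
        self.embed_x = nn.Linear(1, d_model)    # queries carry x only
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)       # one scalar per query point

    def forward(self, x_ctx, y_ctx, x_qry):
        # x_ctx: (B, N, 1), y_ctx: (B, N, 1), x_qry: (B, M, 1)
        ctx = self.embed_xy(torch.cat([x_ctx, y_ctx], dim=-1))
        qry = self.embed_x(x_qry)
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        # NOTE: a real PFN masks attention so queries cannot see each other.
        return self.head(h[:, x_ctx.shape[1]:])  # predictions at query slots

model = MiniPFN()
out = model(torch.randn(8, 50, 1), torch.randn(8, 50, 1), torch.randn(8, 10, 1))
```

Training then amounts to sampling fresh tasks from any of the priors above and minimising the prediction loss at the query positions.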
External projects

AutoML Freiburg's maintained PFN library: the canonical implementation for training transformer-based PFNs that approximate Bayesian prediction. Foundation for TabPFN, PFNs4BO, LC-PFN, and most downstream PFN work.
In-context-learning transformer that predicts on small tabular datasets in seconds, no per-dataset training. v2 was published in Nature (2025) and matches or beats tuned tree ensembles.
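Basic usage follows the tabpfn package's scikit-learn-style fit/predict interface (pip install tabpfn); the dataset here is an arbitrary example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()
clf.fit(X_tr, y_tr)          # no gradient training; fit stores the context
print(clf.predict_proba(X_te)[:3])
```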
Frames forecasting as tabular regression and runs TabPFN v2 with lightweight feature engineering for zero-shot point + probabilistic forecasts. Handles exogenous features (weather, holidays) without preprocessing.
Foundational hypernetwork trained on synthetic tabular tasks: prompted with a training set, it emits the weights of a small child neural network without gradient descent. Faster inference than TabPFN.
ICML 2023 implementation using PFNs as surrogates for Bayesian optimisation, replacing Gaussian processes with a pre-trained transformer that predicts posteriors in one forward pass.
ICML 2024. Uses a PFN as a freeze-thaw surrogate, predicting learning-curve continuations to decide which configurations to keep training. Anytime-efficient hyperparameter search.
NeurIPS 2023. Predicts the posterior over a learning curve's continuation given a few initial points, using a PFN trained on a parametric curve prior. Drop-in surrogate for early-stopping.
Permissive BSD-licensed tabular foundation model in the PFN family. Strong benchmark results plus a forecast sub-module derived from TabPFN-TS.
ICLR 2025. PFN applied to RNA kinetics: a transformer trained on synthetic kinetic priors directly predicts the CDF of first-passage times, replacing expensive Kinfold simulations.