Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning

doi:10.48550/arxiv.2402.15734

Preprint

Wuyang Chen, +5 more

- 24 Feb 2024

TL;DR: Data-efficient operator learning via unsupervised pretraining and in-context learning significantly reduces the data requirements for PDE operator learning, improving generalizability and performance.

Figures

Figure 10: Visualization of FNO reconstructions of unlabeled PDE data on the Poisson (“Pois.”), Helmholtz (“Helm.”), 2D Diffusion-Reaction (“D.R.”), and 2D incompressible Navier-Stokes (“N.S.”) equations during MAE pretraining. (Mask ratio: 0.1 for Poisson, Helmholtz, and 2D Diffusion-Reaction equations; 0.7 for incompressible Navier-Stokes.) In masks, only white areas are visible to the model during pretraining.

Figure 11: Visualization of FNO reconstructions of unlabeled PDE data on the 2D incompressible Navier-Stokes equations during MAE pretraining, with mask ratios from 0.1 to 0.9.
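The masking described in Figures 10 and 11 can be sketched roughly as follows. This sketch rests on several assumptions. It applies pixel-wise random masking on the H×W grid, although the paper may mask patches instead. True marks a visible ("white") point. Hidden points are zeroed before the field is fed to the operator. The reconstruction loss is taken over hidden points only, but the paper's exact loss may differ. All function names here are hypothetical.

```python
import numpy as np

def random_visible_mask(h, w, mask_ratio, rng):
    """Boolean (h, w) mask: True = visible ("white"), False = hidden.

    Assumption: pixel-wise masking; the paper may use patch-wise masks.
    """
    if not 0.0 < mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in (0, 1)")
    n_total = h * w
    n_hidden = int(round(mask_ratio * n_total))
    visible = np.ones(n_total, dtype=bool)
    # Hide a uniformly random subset of exactly n_hidden grid points
    hidden_idx = rng.choice(n_total, size=n_hidden, replace=False)
    visible[hidden_idx] = False
    return visible.reshape(h, w)

def mask_field(u, visible):
    """Zero out hidden grid points of a (C, H, W) field; visible points are unchanged."""
    return u * visible[None, :, :]

def masked_mse(pred, target, visible):
    """Mean squared error on hidden points only (an assumed loss choice)."""
    hidden = ~visible
    diff = (pred - target)[:, hidden]  # shape (C, number of hidden points)
    return float(np.mean(diff ** 2))
```

For example, `random_visible_mask(64, 64, 0.7, np.random.default_rng(0))` hides 70% of a 64×64 grid. This corresponds to the Navier-Stokes setting in Figure 10, while 0.1 corresponds to the Poisson, Helmholtz, and Diffusion-Reaction settings.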

Table 3: Simulation time costs on 2D Incompressible Navier-Stokes (“N.S.”) from the PINO Dataset [9] and Reaction-Diffusion (“R.D.”) from PDEBench [10]. “Re”: Reynolds number. “Du, Dv”: diffusion coefficients. N: number of samples. T: temporal resolution. H × W: spatial resolution. C: input channels (1 for the vorticity in N.S., 2 for the velocities u, v in R.D.).

Table 2: Hyperparameters for pretraining and training/fine-tuning. “N.S.”: 2D Incompressible Navier-Stokes. “DAdapt”: learning rate set adaptively by D-adaptation [83]. “ns”: total number of simulated training samples. The batch size is set to “min(32, ns)” because the total number of training samples may be fewer than 32.
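The batch-size rule in Table 2 caps the batch at 32 but never exceeds the available data. A small sketch of that rule follows. It assumes, hypothetically, that the last partial batch is kept rather than dropped when counting steps per epoch.

```python
import math

def batch_size(ns, cap=32):
    """Batch size rule from Table 2: min(32, ns)."""
    if ns < 1:
        raise ValueError("need at least one training sample")
    return min(cap, ns)

def steps_per_epoch(ns, cap=32):
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    return math.ceil(ns / batch_size(ns, cap))
```

With `ns = 8`, the batch holds all 8 samples and one step covers the epoch. With `ns = 1000`, the batch size is 32 and an epoch takes 32 steps.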

Figure 1: Overview of our framework for data-efficient neural operator learning (with our contributions highlighted in red). Stage 1: Unsupervised pretraining only on unlabeled PDE data. Stage 2: Fine-tuning with reduced PDE simulation costs. Stage 3: Test-time in-context learning to improve the neural operator’s out-of-distribution performance, without additional training costs.
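Stage 3 uses no gradient updates at test time. To make the idea concrete, here is a hypothetical kernel-weighted residual correction. It is a swapped-in illustrative scheme, not necessarily the paper's in-context learning procedure. A few solved demonstration pairs are passed through the frozen operator. Demonstrations whose predictions resemble the query's prediction get larger softmax weights. The weighted residuals (true solution minus prediction) are then added to the query prediction. The function name and the temperature `tau` are assumptions.

```python
import numpy as np

def in_context_correct(operator, query, demo_inputs, demo_solutions, tau=1.0):
    """Training-free correction of operator(query) using solved demonstrations.

    Hypothetical scheme: softmax weights over negative squared distances
    between predictions, applied to each demonstration's residual.
    """
    pred_q = operator(query)                                  # query prediction
    preds_d = np.stack([operator(x) for x in demo_inputs])    # (K, ...) demo predictions
    sols_d = np.stack(demo_solutions)                         # (K, ...) true demo solutions
    # Squared L2 distance between the query prediction and each demo prediction
    axes = tuple(range(1, preds_d.ndim))
    dist = np.sum((preds_d - pred_q[None]) ** 2, axis=axes)
    # Numerically stable softmax over demonstrations
    logits = -dist / tau
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Weighted average of residuals, shape matches pred_q
    residuals = sols_d - preds_d
    correction = np.tensordot(weights, residuals, axes=1)
    return pred_q + correction
```

The weights sum to one. If every demonstration shares the same systematic error (for example, a frozen operator `2 * x` applied where the true operator is `2 * x + 1`), the correction removes that offset exactly. In general, the correction only helps when the demonstrations are representative of the query's distribution shift.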

Figure 8: Comparison between our unsupervised pretraining method versus MoCo v2 [16].

Citations

Journal Article•10.48550/arxiv.2403.17728

Masked Autoencoders are PDE Learners

Anthony Zhou, +1 more

- 26 Mar 2024

- arXiv.org

TL;DR: Masked autoencoders are a novel technique for learning latent representations of PDEs, improving generalizability and performance on unseen equations.

Journal Article•10.48550/arxiv.2407.20801

AhmedML: High-Fidelity Computational Fluid Dynamics Dataset for Incompressible, Low-Speed Bluff Body Aerodynamics

Neil Ashton, +3 more

- 30 Jul 2024

TL;DR: This paper presents AhmedML, a high-fidelity CFD dataset for incompressible, low-speed bluff body aerodynamics, comprising 500 geometric variations of the Ahmed Car Body, with simulation results in open-source formats, enabling machine learning method development and reproducibility.

Journal Article•10.1016/j.icheatmasstransfer.2025.109964

ResFouriONet: A residual Fourier operator network with synthetic data generation for real-time laser-induced bioheat transfer modeling

Aditya Roy, +2 more

- 04 Nov 2025

- International Communications in Heat and...

Journal Article•10.48550/arxiv.2505.13755

Panda: A pretrained forecast model for universal representation of chaotic dynamics

Jeffrey Lai, +2 more

- 19 May 2025

- arXiv.org

TL;DR: Researchers introduce Panda, a pre-trained model for universal representation of chaotic dynamics, trained on a synthetic dataset of 20,000 systems, exhibiting zero-shot forecasting and emergent ability to predict partial differential equations without retraining.

Preprint•10.48550/arxiv.2406.08473

Strategies for Pretraining Neural Operators

Aiguo Zhou, +3 more

- 12 Jun 2024

TL;DR: Pretraining neural operators for physics prediction improves generalizability and performance, but its effectiveness depends on model and dataset choices. Transfer learning and physics-based pretraining strategies are most effective. Data augmentations and fine-tuning in scarce data regimes further enhance performance.

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Proceedings Article•10.1109/CVPR.2009.5206848

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

- 20 Jun 2009

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

•Posted Content

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, +11 more

- 22 Oct 2020

- arXiv: Computer Vision and Pattern Recog...

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

•Posted Content

A Simple Framework for Contrastive Learning of Visual Representations

Ting Chen, +3 more

- 13 Feb 2020

- arXiv: Learning

TL;DR: It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.

...