Preprint (DOI: 10.48550/arxiv.2402.15734)
Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning
Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, Michael W. Mahoney
- 24 Feb 2024
TL;DR: Data-efficient operator learning via unsupervised pretraining and in-context learning significantly reduces the data requirements for PDE operator learning, improving generalizability and performance.
Abstract: Recent years have witnessed the promise of coupling machine learning methods and physical domain-specific insight for solving scientific problems based on partial differential equations (PDEs). However, being data-intensive, these methods still require a large amount of PDE data. This reintroduces the need for expensive numerical PDE solutions, partially undermining the original goal of avoiding these expensive simulations. In this work, seeking data efficiency, we design unsupervised pretraining and in-context learning methods for PDE operator learning. To reduce the need for training data with simulated solutions, we pretrain neural operators on unlabeled PDE data using reconstruction-based proxy tasks. To improve out-of-distribution performance, we further assist neural operators in flexibly leveraging in-context learning methods, without incurring extra training costs or designs. Extensive empirical evaluations on a diverse set of PDEs demonstrate that our method is highly data-efficient, more generalizable, and even outperforms conventional vision-pretrained models.
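The reconstruction-based proxy task can be illustrated with a masked-autoencoder-style objective on a single unlabeled input field. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes pixel-wise random masking of a 2D field, zero-filling of hidden pixels, and a mean-squared error scored only on hidden pixels (the usual MAE convention). The paper's exact masking pattern and loss may differ, and the helper names `random_mask`, `masked_input`, and `masked_reconstruction_loss` are hypothetical.

```python
import numpy as np


def random_mask(shape, mask_ratio, rng):
    """Boolean mask: True = visible to the model, False = hidden.

    Hypothetical helper; pixel-wise masking is assumed for illustration.
    """
    n = int(np.prod(shape))
    n_hidden = int(round(mask_ratio * n))
    visible = np.ones(n, dtype=bool)
    # Hide a uniformly random subset of exactly n_hidden pixels.
    visible[rng.choice(n, size=n_hidden, replace=False)] = False
    return visible.reshape(shape)


def masked_input(field, visible):
    # Zero-fill hidden pixels before the field reaches the operator (assumed).
    return np.where(visible, field, 0.0)


def masked_reconstruction_loss(field, reconstruction, visible):
    # Mean squared error over hidden pixels only (MAE-style convention, assumed).
    hidden = ~visible
    if not hidden.any():
        return 0.0
    diff = reconstruction[hidden] - field[hidden]
    return float(np.mean(diff ** 2))


# Usage on a random stand-in for one unlabeled PDE input field.
rng = np.random.default_rng(0)
u = rng.standard_normal((64, 64))
visible = random_mask(u.shape, mask_ratio=0.7, rng=rng)
x = masked_input(u, visible)
# A neural operator (e.g. an FNO) would map x to a reconstruction; the masked
# input itself is used here as a trivial placeholder prediction.
loss = masked_reconstruction_loss(u, x, visible)
print(f"hidden fraction: {1 - visible.mean():.3f}, loss: {loss:.3f}")
```

In actual pretraining, the placeholder prediction would be replaced by the output of the neural operator, and this loss would be minimized over many unlabeled fields, so no simulated solutions are needed at this stage.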
Figures

Figure 10: Visualization of FNO reconstructions of unlabeled PDE data on the Poisson (“Pois.”), Helmholtz (“Helm.”), 2D Diffusion-Reaction (“D.R.”), and 2D incompressible Navier-Stokes (“N.S.”) equations during MAE pretraining. (Mask ratio: 0.1 for Poisson, Helmholtz, and 2D Diffusion-Reaction equations; 0.7 for incompressible Navier-Stokes.) In masks, only white areas are visible to the model during pretraining. 
Figure 11: Visualization of FNO reconstructions of unlabeled PDE data on the 2D incompressible Navier-Stokes equations during MAE pretraining, with mask ratios from 0.1 to 0.9.
Table 3: Simulation time costs on 2D incompressible Navier-Stokes (“N.S.”) from the PINO dataset [9] and Reaction-Diffusion (“R.D.”) from PDEBench [10]. “Re”: Reynolds number. “Du, Dv”: diffusion coefficients. N: number of samples. T: temporal resolution. H × W: spatial resolution. C: input channels (1 for the vorticity in N.S.; 2 for the fields u, v in R.D.).
Table 2: Hyperparameters for pretraining and training/fine-tuning. “N.S.”: 2D incompressible Navier-Stokes. “DAdapt”: adaptive learning rate via D-Adaptation [83]. “ns”: total number of simulated training samples. The batch size is “min(32, ns)” because the total number of training samples may be fewer than 32.
Figure 1: Overview of our framework for data-efficient neural operator learning (our contributions are highlighted in red). Stage 1: Unsupervised pretraining on unlabeled PDE data only. Stage 2: Fine-tuning with reduced simulation costs for PDE data. Stage 3: Test-time in-context learning to improve the neural operator’s out-of-distribution performance without additional training costs.
Figure 8: Comparison between our unsupervised pretraining method versus MoCo v2 [16].
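The settings quoted in the Figure 10, Table 2, and Table 3 captions can be collected into a small configuration sketch. Only the mask ratios, the min(32, ns) batch-size rule, and the channel counts come from the captions; the dictionary keys and the function name `finetune_batch_size` are hypothetical, treating Table 3's “R.D.” as the same dataset as Figure 10's “D.R.” is an assumption, and other hyperparameters (learning rates, epochs) are not reproduced because they appear only in the table images.

```python
# MAE pretraining mask ratios, from the Figure 10 caption (keys are hypothetical).
MASK_RATIO = {
    "poisson": 0.1,
    "helmholtz": 0.1,
    "diffusion_reaction_2d": 0.1,
    "navier_stokes_2d": 0.7,
}

# Input channels C, from the Table 3 caption: vorticity for N.S.; the two
# fields u, v for R.D. (assumed to be the same dataset as "D.R." above).
INPUT_CHANNELS = {
    "navier_stokes_2d": 1,
    "diffusion_reaction_2d": 2,
}


def finetune_batch_size(ns: int, cap: int = 32) -> int:
    """Batch size rule from Table 2: min(32, ns).

    ns is the number of simulated training samples; when fewer than 32 are
    available, the whole training set forms a single batch.
    """
    if ns < 1:
        raise ValueError("ns must be at least 1")
    return min(cap, ns)


for ns in (8, 32, 512):
    print(ns, finetune_batch_size(ns))
```

Capping the batch size at ns matters most in the low-data regimes the paper targets, where only a handful of simulated samples may be available for fine-tuning.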
Citations
Masked Autoencoders are PDE Learners
Anthony Zhou, Amir Barati Farimani
TL;DR: Masked autoencoders are a novel technique for learning latent representations of PDEs, improving generalizability and performance on unseen equations.
AhmedML: High-Fidelity Computational Fluid Dynamics Dataset for Incompressible, Low-Speed Bluff Body Aerodynamics
Neil Ashton, Danielle C. Maddix, Samuel Gundry, Parisa M. Shabestari
- 30 Jul 2024
TL;DR: This paper presents AhmedML, a high-fidelity CFD dataset for incompressible, low-speed bluff body aerodynamics, comprising 500 geometric variations of the Ahmed Car Body, with simulation results in open-source formats, enabling machine learning method development and reproducibility.
ResFouriONet: A residual Fourier operator network with synthetic data generation for real-time laser-induced bioheat transfer modeling
Aditya Roy, Andrew DuPlissis, Adela Ben-Yakar
Panda: A pretrained forecast model for universal representation of chaotic dynamics
TL;DR: Researchers introduce Panda, a pre-trained model for universal representation of chaotic dynamics, trained on a synthetic dataset of 20,000 systems, exhibiting zero-shot forecasting and emergent ability to predict partial differential equations without retraining.
Strategies for Pretraining Neural Operators
Aiguo Zhou, Cooper Lorsung, AmirPouya Hemmasian, Amir Barati Farimani
- 12 Jun 2024
TL;DR: Pretraining neural operators for physics prediction improves generalizability and performance, but its effectiveness depends on model and dataset choices. Transfer learning and physics-based pretraining strategies are most effective. Data augmentations and fine-tuning in scarce data regimes further enhance performance.
References
• Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
• Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
ImageNet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
- 20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
• Posted Content
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
• Posted Content
A Simple Framework for Contrastive Learning of Visual Representations
TL;DR: It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.