A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

doi:10.48550/arXiv.2303.04226

Journal Article • 10.48550/arXiv.2303.04226


Yihan Cao, +5 more

- 07 Mar 2023

- arXiv.org

- Vol. abs/2303.04226

307

TL;DR: This paper provides a comprehensive review of the history of generative models, their basic components, and recent advances in Artificial Intelligence Generated Content (AIGC) from both unimodal and multimodal interaction.

Abstract: Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind their impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by humans, and generating content according to the model's knowledge and the intent information. In recent years, large-scale models have become increasingly important in AIGC because they provide better intent extraction and, thus, improved generation results. As the training data and model size grow, the distribution that the model can learn becomes more comprehensive and closer to reality, leading to more realistic, high-quality content generation. This survey provides a comprehensive review of the history of generative models, their basic components, and recent advances in AIGC from both unimodal and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and related models for text and images. From the perspective of multimodality, we introduce cross-applications between these modalities. Finally, we discuss the existing open problems and future challenges in AIGC.


Figures

Fig. 4. Categories of pre-trained LLMs. Black lines represent information flow in bidirectional models, while gray lines represent left-to-right information flow. Encoder models, e.g., BERT, are trained with context-aware objectives. Decoder models, e.g., GPT, are trained with autoregressive objectives. Encoder-decoder models, e.g., T5 and BART, combine the two, using context-aware structures as encoders and left-to-right structures as decoders.
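The difference in information flow described in Fig. 4 can be made concrete with attention masks. The following is a minimal Python sketch, not taken from the survey. The function names are hypothetical, and it assumes the common convention that `mask[i][j]` is `True` when position `i` may attend to (receive information from) position `j`.

```python
"""Attention masks for the three pre-trained LLM families (illustrative sketch)."""

Mask = list[list[bool]]


def bidirectional_mask(n: int) -> Mask:
    # Encoder-style (e.g. BERT): every position attends to every position.
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> Mask:
    # Decoder-style (e.g. GPT): position i attends only to positions 0..i (left-to-right).
    return [[j <= i for j in range(n)] for i in range(n)]


def cross_attention_mask(n_dec: int, n_enc: int) -> Mask:
    # Encoder-decoder (e.g. T5, BART): each decoder position may read every encoder output;
    # the decoder's own self-attention would still use causal_mask(n_dec).
    return [[True] * n_enc for _ in range(n_dec)]


def render(mask: Mask) -> str:
    # "x" = information may flow from column j to row i, "." = blocked.
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    print(render(causal_mask(4)))
```

Running it prints the lower-triangular pattern `x...`, `xx..`, `xxx.`, `xxxx` (one row per line). This is the left-to-right flow of decoder models. `render(bidirectional_mask(4))` would instead be all `x`.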

Fig. 15. General procedure of prompt learning, illustrated with emotion detection examples. First, the user constructs a prompt that fits the problem well, optionally using in-context learning and chain-of-thought (CoT) prompting to improve performance. Then, an LLM generates suitable words for the blank in the prompt. Finally, a verbalizer projects the generated word onto a specific classification category.
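The three steps in Fig. 15 can be sketched as follows:

1. Build a cloze prompt.
2. Score candidate words for the blank.
3. Map the words to categories with a verbalizer.

This is a minimal Python illustration, not the survey's implementation. The template, the label words in `VERBALIZER`, and `toy_scorer` are all hypothetical. `toy_scorer` is a crude keyword stand-in for an LLM; a real system would use the model's probability for each word at the blank. Chain-of-thought prompting is omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical cloze template; "[MASK]" is the blank the LLM is asked to fill.
TEMPLATE = "{text} I felt [MASK]."

# Hypothetical verbalizer: each emotion category is mapped to a few label words.
VERBALIZER: Dict[str, List[str]] = {
    "joy": ["happy", "glad"],
    "anger": ["angry", "furious"],
    "sadness": ["sad", "down"],
}

# A scorer returns how likely `word` is to fill the blank in `prompt`.
Scorer = Callable[[str, str], float]


def build_prompt(text: str, demos: Optional[List[Tuple[str, str]]] = None) -> str:
    """Step 1: fill the template; optional (text, word) demos act as in-context examples."""
    lines = [TEMPLATE.format(text=t).replace("[MASK]", w) for t, w in (demos or [])]
    lines.append(TEMPLATE.format(text=text))
    return "\n".join(lines)


def classify(text: str, score_word: Scorer,
             demos: Optional[List[Tuple[str, str]]] = None) -> str:
    prompt = build_prompt(text, demos)
    # Step 2: score every candidate label word for the blank.
    # Step 3: the verbalizer projects word scores onto categories (best word per category).
    label_scores = {
        label: max(score_word(prompt, w) for w in words)
        for label, words in VERBALIZER.items()
    }
    return max(label_scores, key=label_scores.get)


def toy_scorer(prompt: str, word: str) -> float:
    """Hypothetical stand-in for an LLM: keyword matching on the final (query) line only."""
    query = prompt.splitlines()[-1].lower()
    cues = {"happy": ["won", "great"], "angry": ["cheated", "rude"], "sad": ["lost", "miss"]}
    return float(sum(cue in query for cue in cues.get(word, [])))


if __name__ == "__main__":
    print(classify("I lost my keys and missed the train.", toy_scorer))  # sadness
```

Taking the best-scoring label word per category is one simple verbalizer choice. Averaging or summing word scores per category are common alternatives.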

Fig. 10. Two types of to-language decoder models: jointly trained models and frozen models. Jointly trained models are normally trained end-to-end, while frozen models normally keep the language decoder frozen and train only the image encoder.

Fig. 1. Examples of AIGC in image generation. Text instructions are given to OpenAI's DALL-E-2 model, which generates two images according to the instructions.

Fig. 5. Statistics of model size [52] and training speed¹ across different models and computing devices.

Fig. 14. A relation graph of current research areas, applications, and related companies, where dark blue circles represent research areas, light blue circles represent applications, and green circles represent companies.

Citations

Journal Article • 10.17705/1thci.00005

The Reader-to-Leader Framework: Motivating Technology-Mediated Social Participation

Jenny Preece, +1 more

- 31 Mar 2009

TL;DR: The Reader-to-Leader Framework is offered, aimed at helping researchers, designers, and managers understand what motivates technology-mediated social participation, to improve interface design and social support for their companies, government agencies, and non-governmental organizations.


409

Journal Article • 10.48550/arxiv.2308.05734

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Haohe Liu, +9 more

- 10 Aug 2023

- arXiv.org

TL;DR: A framework that utilizes the same learning method for speech, music, and sound effect generation, introduces a general representation of audio called "language of audio" (LOA), and performs self-supervised audio generation learning with a latent diffusion model conditioned on LOA.


141

Journal Article • 10.1109/JPROC.2020.3034808

Artificial-Intelligence-Driven Customized Manufacturing Factory: Key Technologies, Applications, and Challenges

Jiafu Wan, +5 more

- 01 Apr 2021

TL;DR: The state-of-the-art AI technologies, namely machine learning, multiagent systems, Internet of Things, big data, and cloud-edge computing, are surveyed, and experimental results demonstrate that AI-assisted customized manufacturing (CM) offers the possibility of higher production flexibility and efficiency.


115

Journal Article • 10.1016/j.jksuci.2023.101675

Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions

Shahab Saquib Sohail, +7 more

- 26 Jul 2023

- Journal of King Saud University - Comput...

TL;DR: A comprehensive review of over 100 Scopus-indexed publications on ChatGPT is presented, aiming to provide a taxonomy of ChatGPT research and explore its applications, and identifies potential future directions for ChatGPT research.


111

...


References

Proceedings Article

BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling

Lars Maaløe, +3 more

- 01 Jan 2019

TL;DR: The Bidirectional-Inference Variational Autoencoder (BIVA) uses a skip-connected generative model and an inference network formed by a bidirectional stochastic inference path.


Posted Content

Knowledge Enhanced Contextual Word Representations

Matthew E. Peters, +6 more

- 09 Sep 2019

- arXiv: Computation and Language

TL;DR: After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task and downstream performance on relationship extraction, entity typing, and word sense disambiguation.


Posted Content

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

Yu Zhang, +8 more

- 09 Jul 2019

- arXiv: Computation and Language

TL;DR: A multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that can produce high-quality speech in multiple languages and transfer voices across languages, e.g., between English and Mandarin.


Proceedings Article • 10.48550/arXiv.2206.04624

Factuality Enhanced Language Models for Open-Ended Text Generation

Nayeon Lee, +5 more

- 09 Jun 2022

TL;DR: This work measures and improves the factual accuracy of large-scale LMs for open-ended text generation, and proposes a factuality-enhanced training method that uses TopicPrefix for better awareness of facts and sentence completion as the training objective, which can vastly reduce the factual errors.


Posted Content

Multi-Generator Generative Adversarial Nets

Quan Hoang, +3 more

- 08 Aug 2017

- arXiv: Learning

TL;DR: A new approach to training Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapse problem. The authors develop a theoretical analysis proving that, at equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of the generators' distributions and the empirical data distribution is minimal, hence effectively avoiding mode collapse.


...

