A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

doi:10.48550/arXiv.2303.04226

Journal Article•10.48550/arXiv.2303.04226

Yihan Cao, +5 more

- 07 Mar 2023

- arXiv.org

- Vol. abs/2303.04226

307

TL;DR: This paper provides a comprehensive review of the history of generative models, their basic components, and recent advances in Artificial Intelligence Generated Content (AIGC) from the perspectives of unimodal and multimodal interaction.

Abstract: Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind their impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by humans, and generating content according to the model's knowledge and that intent information. In recent years, large-scale models have become increasingly important in AIGC, as they provide better intent extraction and thus improved generation results. With the growth of data and model size, the distribution that the model can learn becomes more comprehensive and closer to reality, leading to more realistic and higher-quality content generation. This survey provides a comprehensive review of the history of generative models, their basic components, and recent advances in AIGC from the perspectives of unimodal and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and related models for text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.

Figures

Fig. 4. Categories of pre-trained LLMs. Black lines represent information flow in bidirectional models, while gray lines represent left-to-right information flow. Encoder models, e.g., BERT, are trained with context-aware objectives. Decoder models, e.g., GPT, are trained with autoregressive objectives. Encoder-decoder models, e.g., T5 and BART, combine the two, using context-aware structures as encoders and left-to-right structures as decoders.
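The two information flows in the Fig. 4 caption can be illustrated with attention masks. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and it assumes the common convention that entry (i, j) of a mask is true when position i is allowed to attend to position j.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def show(mask):
    """Render a mask as rows of '1' (visible) and '.' (hidden)."""
    return "\n".join(" ".join("1" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    print("bidirectional (encoder-style):")
    print(show(bidirectional_mask(4)))
    print("left-to-right (decoder-style):")
    print(show(causal_mask(4)))
```

Under this reading, an encoder-decoder model would apply the first kind of mask on the encoder side and the second on the decoder side.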

Fig. 15. General procedure of prompt learning for emotion detection examples. First, the user needs to construct a prompt that fits the problem well; the user can also use in-context learning and chain-of-thought (CoT) to help improve performance. Then, an LLM generates suitable words for the blank space in the prompt. Finally, a verbalizer projects the generated word to a specific classification category.
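The three steps in the Fig. 15 caption (build a prompt, let a model fill the blank, map the word to a class) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the template, the cue words, the verbalizer entries, and every function name are hypothetical, and `toy_fill_blank` is a keyword-counting stand-in for a real LLM. Chain-of-thought prompting is not modeled.

```python
# Hypothetical prompt template with a blank for the model to fill.
TEMPLATE = "Review: {text} Overall, I felt [MASK]."

# Hypothetical verbalizer: generated word -> classification category.
VERBALIZER = {"happy": "joy", "sad": "sadness", "angry": "anger"}

# Cue words used only by the toy stand-in model below.
CUES = {
    "happy": ("love", "great", "wonderful"),
    "sad": ("lost", "miss", "cry"),
    "angry": ("rude", "broken", "refund"),
}


def build_prompt(text, demonstrations=()):
    """Step 1: build the prompt, optionally prepending in-context examples."""
    parts = [TEMPLATE.format(text=t).replace("[MASK]", w) for t, w in demonstrations]
    parts.append(TEMPLATE.format(text=text))
    return "\n".join(parts)


def toy_fill_blank(prompt):
    """Step 2 (stand-in for an LLM): pick the word whose cues best match the query line."""
    query = prompt.splitlines()[-1].lower()
    scores = {word: sum(cue in query for cue in cues) for word, cues in CUES.items()}
    return max(scores, key=scores.get)


def classify(text, demonstrations=()):
    """Step 3: project the generated word onto a category with the verbalizer."""
    word = toy_fill_blank(build_prompt(text, demonstrations))
    return VERBALIZER.get(word, "unknown")


if __name__ == "__main__":
    print(classify("The waiter was rude and the plate was broken."))
    print(classify("I love this wonderful film."))
```

In a real system the toy function would be replaced by a call to a language model; the toy version ignores the in-context demonstrations, whereas an actual LLM would condition on them.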

Fig. 10. Two types of to-language decoder models: jointly-trained models and frozen models. Jointly-trained models are normally trained end-to-end, while frozen models normally keep the language decoder frozen and only train the image encoder.

Fig. 1. Examples of AIGC in image generation. Text instructions are given to the OpenAI DALL-E-2 model, which generates two images according to the instructions.

Fig. 5. Statistics of model size [52] and training speed across different models and computing devices.

Fig. 14. A relation graph of current research areas, applications, and related companies, where dark blue circles represent research areas, light blue circles represent applications, and green circles represent companies.

Citations

Journal Article•10.17705/1thci.00005

The Reader-to-Leader Framework: Motivating Technology-Mediated Social Participation

Jenny Preece, +1 more

- 31 Mar 2009

TL;DR: The Reader-to-Leader Framework is offered, aimed at helping researchers, designers, and managers understand what motivates technology-mediated social participation, to improve interface design and social support for their companies, government agencies, and non-governmental organizations.

409

Journal Article•10.48550/arxiv.2308.05734

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Haohe Liu, +9 more

- 10 Aug 2023

- arXiv.org

TL;DR: A framework that utilizes the same learning method for speech, music, and sound effect generation, and introduces a general representation of audio, called "language of audio" (LOA), and performs self-supervised audio generation learning with a latent diffusion model conditioned on LOA.

141

Journal Article•10.1109/JPROC.2020.3034808

Artificial-Intelligence-Driven Customized Manufacturing Factory: Key Technologies, Applications, and Challenges

Jiafu Wan, +5 more

- 01 Apr 2021

TL;DR: The experimental results have demonstrated that the AI-assisted CM offers the possibility of higher production flexibility and efficiency, and the state-of-the-art AI technologies, that is, machine learning, multiagent systems, Internet of Things, big data, and cloud-edge computing, are surveyed.

115

Journal Article•10.1016/j.jksuci.2023.101675

Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions

Shahab Saquib Sohail, +7 more

- 26 Jul 2023

- Journal of King Saud University - Comput...

TL;DR: A comprehensive review of over 100 Scopus-indexed publications on ChatGPT is presented, aiming to provide a taxonomy of ChatGPT research and explore its applications, and identifies potential future directions for ChatGPT research.

111

References

Journal Article

Truncated Diffusion Probabilistic Models

Huangjie Zheng, +3 more

- arXiv.org

TL;DR: Experimental results show that the truncated diffusion probabilistic models provide consistent improvements over the non-truncated ones in terms of the generation performance and the number of required inverse diffusion steps.

36

Proceedings Article•10.1145/3219819.3219977

TextTruth: An Unsupervised Approach to Discover Trustworthy Information from Multi-Sourced Text Data

Hengtong Zhang, +4 more

- 19 Jul 2018

TL;DR: A novel truth discovery method, named "TextTruth", which jointly groups the keywords extracted from the answers of a specific question into multiple interpretable factors, and infers the trustworthiness of both answer factors and answer providers.

32

Journal Article•10.3390/e25040633

How Much Is Enough? A Study on Diffusion Times in Score-Based Generative Models

Giulio Franzese, +6 more

- 10 Jun 2022

- Entropy

TL;DR: This work shows how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process, and suggests a new method to improve quality and efficiency of both training and sampling, by adopting smaller diffusion times.

29

Journal Article•10.1177/21582440221082130

Artificial Intelligence-Generated and Human Expert-Designed Vocabulary Tests: A Comparative Study

Luo Yunjiu, +2 more

- 01 Jan 2022

- SAGE Open

TL;DR: The results of think-aloud data indicated that the AI-generated items and expert-designed items might assess different constructs: the former elicited test takers' bottom-up test-taking strategies, while the latter seemed more likely to trigger test takers' rote memorization ability.

21

Posted Content

Semantic Graph Parsing with Recurrent Neural Network DAG Grammars

Federico Fancellu, +3 more

- 30 Sep 2019

- arXiv: Computation and Language

TL;DR: Recurrent neural network DAG grammars is presented, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction.

20
