Journal Article · DOI: 10.48550/arXiv.2303.04226
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
TL;DR: This paper provides a comprehensive review of the history of generative models, their basic components, and recent advances in Artificial Intelligence Generated Content (AIGC) from the perspectives of unimodal and multimodal interaction.
Abstract: Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind their impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by humans, and generating the content according to the model's knowledge and the intent information. In recent years, large-scale models have become increasingly important in AIGC as they provide better intent extraction and thus improved generation results. With the growth of data and the size of the models, the distribution that the model can learn becomes more comprehensive and closer to reality, leading to more realistic and high-quality content generation. This survey provides a comprehensive review of the history of generative models, their basic components, and recent advances in AIGC from unimodal and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and relevant models of text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.
Figures

Fig. 4. Categories of pre-trained LLMs. Black lines represent information flow in bidirectional models, while gray lines represent left-to-right information flow. Encoder models, e.g. BERT, are trained with context-aware objectives. Decoder models, e.g. GPT, are trained with autoregressive objectives. Encoder-decoder models, e.g. T5 and BART, combine the two, using context-aware structures as encoders and left-to-right structures as decoders.
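The two kinds of information flow in the Fig. 4 caption can be pictured as attention masks. The following is a minimal sketch, not taken from the survey: the function names `bidirectional_mask` and `causal_mask` are hypothetical, and real models implement masking inside their attention layers rather than with nested lists.

```python
from typing import List

Mask = List[List[bool]]


def bidirectional_mask(n: int) -> Mask:
    """Encoder-style (context-aware) flow: every position may attend to every other."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> Mask:
    """Decoder-style (left-to-right) flow: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Print both masks for a 4-token sequence; 1 marks an allowed connection.
    for name, mask in (("bidirectional", bidirectional_mask(4)), ("causal", causal_mask(4))):
        print(name)
        for row in mask:
            print(" ".join("1" if allowed else "0" for allowed in row))
```

Under this picture, an encoder-decoder model would use the first mask in its encoder and the second in its decoder.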
Fig. 15. General procedure of prompt learning, illustrated with emotion detection examples. First, the user needs to construct a prompt that fits the problem well; the user can also use in-context learning and chain-of-thought (CoT) to help improve the performance. Then, an LLM generates suitable words for the blank space in the prompt. Finally, a verbalizer projects the generated word to a specific classification category.
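The three steps in the Fig. 15 caption (build a prompt, fill the blank, apply a verbalizer) can be sketched as follows. Everything here is an illustrative assumption: the template text, the word-to-label table, and in particular `toy_fill_blank`, which is a keyword rule standing in for a real LLM call.

```python
from typing import Callable, Dict

# Hypothetical prompt template; "[MASK]" marks the blank the model should fill.
TEMPLATE = "Review: {text} Overall, I felt [MASK]."

# Hypothetical verbalizer: maps a generated word to a classification category.
VERBALIZER: Dict[str, str] = {
    "happy": "positive",
    "great": "positive",
    "sad": "negative",
    "angry": "negative",
}


def build_prompt(text: str) -> str:
    """Step 1: wrap the input in a prompt that fits the task."""
    return TEMPLATE.format(text=text)


def toy_fill_blank(prompt: str) -> str:
    """Step 2 (stand-in for an LLM): choose a word for the blank with a keyword rule."""
    return "happy" if "love" in prompt.lower() else "sad"


def classify(text: str, fill_blank: Callable[[str], str] = toy_fill_blank) -> str:
    """Step 3: project the generated word onto a category via the verbalizer."""
    word = fill_blank(build_prompt(text))
    return VERBALIZER.get(word, "unknown")


if __name__ == "__main__":
    print(classify("I love this movie."))
    print(classify("The plot was dull."))
```

Passing a different `fill_blank` callable is where an actual model would plug in; the verbalizer stays a plain lookup either way.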
Fig. 10. Two types of to-language decoder models: jointly-trained models and frozen models. Jointly-trained models are normally trained end-to-end, while frozen models normally keep the language decoder frozen and only train the image encoder. 
Fig. 1. Examples of AIGC in image generation. Text instructions are given to the OpenAI DALL-E-2 model, and it generates two images according to the instructions.
Fig. 5. Statistics of model size [52] and training speed across different models and computing devices.
Fig. 14. A relation graph of current research areas, applications, and related companies, where dark blue circles represent research areas, light blue circles represent applications, and green circles represent companies.
Citations
A Survey of Large Language Models
Wayne Xin Zhao,Kun Zhou,Junyi Li,Tianyi Tang,Xiaolei Wang,Yupeng Hou,Yingqian Min,Beichen Zhang,Junjie Zhang,Zican Dong,Yifan Du,Chen Yang,Yushuo Chen,Zhongyong Chen,Jinhao Jiang,Ruiyang Ren,Yifan Li,Xinyu Tang,Zikang Liu,Peiyu Liu,Jian-Yun Nie,Ji-Rong Wen +21 more
TL;DR: Large language models (LLMs), obtained by pre-training Transformer models over large-scale corpora, show strong capabilities in solving various NLP tasks.
The Reader-to-Leader Framework: Motivating Technology-Mediated Social Participation
Jenny Preece,Ben Shneiderman +1 more
- 31 Mar 2009
TL;DR: The Reader-to-Leader Framework is offered, aimed at helping researchers, designers, and managers understand what motivates technology-mediated social participation, to improve interface design and social support for their companies, government agencies, and non-governmental organizations.
409
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu,Qiao Tian,Yiitan Yuan,Xubo Liu,Xinhao Mei,Qiuqiang Kong,Yuping Wang,Wenwu Wang,Yuxuan Wang,Mark D. Plumbley +9 more
TL;DR: A framework that uses the same learning method for speech, music, and sound effect generation. It introduces a general representation of audio, called "language of audio" (LOA), and performs self-supervised audio generation learning with a latent diffusion model conditioned on LOA.
141
Artificial-Intelligence-Driven Customized Manufacturing Factory: Key Technologies, Applications, and Challenges
Jiafu Wan,Xiaomin Li,Hong-Ning Dai,Andrew Kusiak,Miguel Martinez-Garcia,Di Li +5 more
- 01 Apr 2021
TL;DR: The experimental results have demonstrated that the AI-assisted CM offers the possibility of higher production flexibility and efficiency, and the state-of-the-art AI technologies, that is, machine learning, multiagent systems, Internet of Things, big data, and cloud-edge computing, are surveyed.
Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions
Shahab Saquib Sohail,Faiza Farhat,Yassine Himeur,Mohammad Nadeem,Dag Øivind Madsen,Yashbir Singh,Shadi Atalla,Wathiq Mansoor +7 more
TL;DR: A comprehensive review of over 100 Scopus-indexed publications on ChatGPT is presented, aiming to provide a taxonomy of ChatGPT research and explore its applications, and identifies potential future directions for ChatGPT research.
111
References
•Proceedings Article
BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling
Lars Maaløe,Marco Fraccaro,Valentin Liévin,Ole Winther +3 more
- 01 Jan 2019
TL;DR: The Bidirectional-Inference Variational Autoencoder (BIVA) uses a skip-connected generative model and an inference network formed by a bidirectional stochastic inference path.
•Posted Content
Knowledge Enhanced Contextual Word Representations
Matthew E. Peters,Mark Neumann,Robert L. Logan,Roy Schwartz,Vidur Joshi,Sameer Singh,Noah A. Smith +6 more
TL;DR: After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task and downstream performance on relationship extraction, entity typing, and word sense disambiguation.
•Posted Content
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang,Ron Weiss,Heiga Zen,Yonghui Wu,Zhifeng Chen,RJ Skerry-Ryan,Ye Jia,Andrew Rosenberg,Bhuvana Ramabhadran +8 more
TL;DR: A multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high-quality speech in multiple languages and to transfer voices across languages, e.g. English and Mandarin.
Factuality Enhanced Language Models for Open-Ended Text Generation
Nayeon Lee,Wei Ping,Peng Xu,Md. Mostofa Ali Patwary,Mohammad Shoeybi,B. Catanzaro +5 more
- 09 Jun 2022
TL;DR: This work measures and improves the factual accuracy of large-scale LMs for open-ended text generation, and proposes a factuality-enhanced training method that uses TopicPrefix for better awareness of facts and sentence completion as the training objective, which can vastly reduce the factual errors.
•Posted Content
Multi-Generator Generative Adversarial Nets
TL;DR: A new approach to training Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapse problem. The authors develop a theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of the generators' distributions and the empirical data distribution is minimal, hence effectively avoiding mode collapse.
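To make the quantity in this summary concrete, here is a small sketch of the Jensen-Shannon divergence for discrete distributions, applied to a toy mixture of generators. It is an illustration only and does not reproduce the paper's training procedure; the helper names `kl`, `jsd`, and `mixture` and the two-mode toy data are assumptions.

```python
import math
from typing import List, Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mixture(dists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted mixture of several generators' output distributions."""
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(len(dists[0]))]


if __name__ == "__main__":
    data = [0.5, 0.5]          # toy data distribution with two equally likely modes
    gen_a = [1.0, 0.0]         # a generator collapsed onto the first mode
    gen_b = [0.0, 1.0]         # a generator collapsed onto the second mode
    print("single generator JSD:", jsd(gen_a, data))
    print("mixture JSD:", jsd(mixture([gen_a, gen_b], [0.5, 0.5]), data))
```

In this toy case each generator alone misses a mode, while their equal-weight mixture matches the data distribution exactly, so its JSD is zero.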