Journal Article, DOI: 10.48550/arXiv.2303.04226
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
TL;DR: This paper provides a comprehensive review of the history of generative models, their basic components, and recent advances in Artificial Intelligence Generated Content (AIGC) from the perspectives of unimodal and multimodal interaction.
Abstract: Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind their impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC), which involves the creation of digital content, such as images, music, and natural language, through AI models. The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace. AIGC is achieved by extracting and understanding intent information from instructions provided by humans, and generating content according to the model's knowledge and that intent information. In recent years, large-scale models have become increasingly important in AIGC because they provide better intent extraction and thus improved generation results. With the growth of data and model size, the distribution that a model can learn becomes more comprehensive and closer to reality, leading to more realistic and higher-quality content generation. This survey provides a comprehensive review of the history of generative models, their basic components, and recent advances in AIGC from the perspectives of unimodal and multimodal interaction. From the perspective of unimodality, we introduce the generation tasks and related models for text and image. From the perspective of multimodality, we introduce the cross-application between the modalities mentioned above. Finally, we discuss the existing open problems and future challenges in AIGC.
Figures

Fig. 4. Categories of pre-trained LLMs. Black lines represent information flow in bidirectional models, while gray lines represent left-to-right information flow. Encoder models, e.g. BERT, are trained with context-aware objectives. Decoder models, e.g. GPT, are trained with autoregressive objectives. Encoder-decoder models, e.g. T5 and BART, combine the two, using context-aware structures as encoders and left-to-right structures as decoders. 
Fig. 15. General procedure of prompt learning for emotion detection examples. First, the user needs to construct a prompt that fits the problem well; the user can also use in-context learning and chain-of-thought (CoT) to help improve performance. Then, an LLM generates suitable words for the blank space in the prompt. Finally, a verbalizer projects the generated word to a specific classification category. 
Fig. 10. Two types of to-language decoder models: jointly-trained models and frozen models. Jointly-trained models are normally trained end-to-end, while frozen models normally keep the language decoder frozen and only train the image encoder. 
Fig. 1. Examples of AIGC in image generation. Text instructions are given to the OpenAI DALL-E-2 model, which generates two images according to the instructions.
Fig. 5. Statistics of model size [52] and training speed across different models and computing devices. 
Fig. 14. A relation graph of current research areas, applications, and related companies, where dark blue circles represent research areas, light blue circles represent applications, and green circles represent companies.
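The three steps in the Fig. 15 caption (construct a prompt, let a model fill the blank, map the filled word to a category with a verbalizer) can be sketched in a few lines of Python. This is only an illustrative sketch, not the survey's implementation: the template, the word-to-label table, and `toy_fill_blank` (a keyword-based stand-in for a real LLM) are all hypothetical names introduced here.

```python
from typing import Callable, Dict

# Hypothetical prompt template; "[MASK]" marks the blank the model fills in.
TEMPLATE = "Text: {text} Overall, I felt [MASK]."

# Hypothetical verbalizer: projects generated words onto emotion categories.
VERBALIZER: Dict[str, str] = {
    "happy": "joy",
    "glad": "joy",
    "sad": "sadness",
    "upset": "sadness",
}


def build_prompt(text: str) -> str:
    """Step 1: wrap the input text in the prompt template."""
    return TEMPLATE.format(text=text)


def toy_fill_blank(prompt: str) -> str:
    """Step 2 (stand-in for an LLM): choose a word for the blank.

    A real system would query a language model here; this toy version
    only looks for a few keyword cues so the example stays self-contained.
    """
    lowered = prompt.lower()
    if any(cue in lowered for cue in ("love", "wonderful", "enjoy")):
        return "happy"
    if any(cue in lowered for cue in ("hate", "terrible", "awful")):
        return "sad"
    return "unsure"


def classify(text: str, fill_blank: Callable[[str], str] = toy_fill_blank) -> str:
    """Steps 1-3: prompt construction, blank filling, verbalizer lookup."""
    word = fill_blank(build_prompt(text))
    # Step 3: words outside the verbalizer table map to "unknown".
    return VERBALIZER.get(word, "unknown")


if __name__ == "__main__":
    for sample in ("I love this movie.", "The food was terrible."):
        print(sample, "->", classify(sample))
```

Passing a different `fill_blank` callable is the point where an actual model would be plugged in; in-context examples or a chain-of-thought preamble would be added to the template in `build_prompt`.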
Citations
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhongyong Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen
TL;DR: Large language models (LLMs), obtained by pre-training Transformer models over large-scale corpora, show strong capabilities in solving various NLP tasks.
The Reader-to-Leader Framework: Motivating Technology-Mediated Social Participation
Jenny Preece, Ben Shneiderman
- 31 Mar 2009
TL;DR: The Reader-to-Leader Framework is offered, aimed at helping researchers, designers, and managers understand what motivates technology-mediated social participation, to improve interface design and social support for their companies, government agencies, and non-governmental organizations.
409
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu, Qiao Tian, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley
TL;DR: A framework that utilizes the same learning method for speech, music, and sound effect generation, introduces a general representation of audio called "language of audio" (LOA), and performs self-supervised audio generation learning with a latent diffusion model conditioned on LOA.
141
Artificial-Intelligence-Driven Customized Manufacturing Factory: Key Technologies, Applications, and Challenges
Jiafu Wan, Xiaomin Li, Hong-Ning Dai, Andrew Kusiak, Miguel Martinez-Garcia, Di Li
- 01 Apr 2021
TL;DR: The experimental results have demonstrated that the AI-assisted CM offers the possibility of higher production flexibility and efficiency, and the state-of-the-art AI technologies, that is, machine learning, multiagent systems, Internet of Things, big data, and cloud-edge computing, are surveyed.
Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions
Shahab Saquib Sohail, Faiza Farhat, Yassine Himeur, Mohammad Nadeem, Dag Øivind Madsen, Yashbir Singh, Shadi Atalla, Wathiq Mansoor
TL;DR: A comprehensive review of over 100 Scopus-indexed publications on ChatGPT is presented, aiming to provide a taxonomy of ChatGPT research, explore its applications, and identify potential future directions for ChatGPT research.
111
References
Journal Article
Truncated Diffusion Probabilistic Models
TL;DR: Experimental results show that truncated diffusion probabilistic models provide consistent improvements over non-truncated ones in terms of generation performance and the number of required inverse diffusion steps.
TextTruth: An Unsupervised Approach to Discover Trustworthy Information from Multi-Sourced Text Data
Hengtong Zhang, Yaliang Li, Fenglong Ma, Jing Gao, Lu Su
- 19 Jul 2018
TL;DR: A novel truth discovery method, named "TextTruth", which jointly groups the keywords extracted from the answers of a specific question into multiple interpretable factors, and infers the trustworthiness of both answer factors and answer providers.
32
How Much Is Enough? A Study on Diffusion Times in Score-Based Generative Models
Giulio Franzese, Simone Rossi, Lixuan Yang, Alessandro Finamore, Dario Rossi, Maurizio Filippone, Pietro Michiardi
TL;DR: This work shows how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process, and suggests a new method to improve quality and efficiency of both training and sampling, by adopting smaller diffusion times.
Artificial Intelligence-Generated and Human Expert-Designed Vocabulary Tests: A Comparative Study
Luo Yunjiu, Wei Wei, Ying Zheng
TL;DR: The results of think-aloud data indicated that the AI-generated items and expert-designed items might assess different constructs: the former elicited test takers’ bottom-up test-taking strategies, while the latter seemed more likely to trigger test takers’ rote memorization ability.
21
Posted Content
Semantic Graph Parsing with Recurrent Neural Network DAG Grammars
TL;DR: Recurrent neural network DAG grammars is presented, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction.
20