Preprint, DOI: 10.48550/arxiv.2406.16386
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Yuting Wan,Chaozheng Wang,Yi Dong,Wenxuan Wang,Shuqing Li,Yintong Huo,Michael R. Lyu +6 more
- 24 Jun 2024
TL;DR: Converting webpage designs into UI code by hand is time-consuming. DCGen, a divide-and-conquer-based approach, mitigates element omission, distortion, and misarrangement in MLLM-generated UI code by focusing on smaller visual segments of the screenshot.
Abstract: Websites are critical in today's digital world, with over 1.11 billion currently active and approximately 252,000 new sites launched daily. Converting website layout design into functional UI code is a time-consuming yet indispensable step of website development. Manual methods of converting visual designs into functional code present significant challenges, especially for non-experts. To explore automatic design-to-code solutions, we first conduct a motivating study on GPT-4o and identify three types of issues in generating UI code: element omission, element distortion, and element misarrangement. We further reveal that a focus on smaller visual segments can help multimodal large language models (MLLMs) mitigate these failures in the generation process. In this paper, we propose DCGen, a divide-and-conquer-based approach to automate the translation of webpage design to UI code. DCGen starts by dividing screenshots into manageable segments, generating descriptions for each segment, and then reassembling them into complete UI code for the entire screenshot. We conduct extensive testing with a dataset comprised of real-world websites and various MLLMs and demonstrate that DCGen achieves up to a 14% improvement in visual similarity over competing methods. To the best of our knowledge, DCGen is the first segment-aware prompt-based approach for generating UI code directly from screenshots.
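The divide, describe, and reassemble pipeline outlined in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the toy grayscale grid `TOY`, the `Segment` class, the rule of splitting on fully blank rows and columns, and the `describe` callback (a stand-in for an MLLM call) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy screenshot: rows of grayscale pixel values


@dataclass
class Segment:
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive
    direction: str = ""  # "column" or "row" once the segment has been split
    children: List["Segment"] = field(default_factory=list)


def _content_runs(img: Image, seg: Segment, horizontal: bool,
                  background: int) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of non-blank rows (or columns) inside seg."""
    lo, hi = (seg.top, seg.bottom) if horizontal else (seg.left, seg.right)
    runs, start = [], None
    for i in range(lo, hi):
        if horizontal:
            blank = all(img[i][x] == background for x in range(seg.left, seg.right))
        else:
            blank = all(img[y][i] == background for y in range(seg.top, seg.bottom))
        if not blank and start is None:
            start = i
        elif blank and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, hi))
    return runs


def divide(img: Image, seg: Segment, horizontal: bool = True,
           depth: int = 2, background: int = 0) -> Segment:
    """Divide step: split on blank separator lines, alternating direction."""
    if depth == 0:
        return seg
    runs = _content_runs(img, seg, horizontal, background)
    if len(runs) < 2:  # nothing to split this way; try the other direction
        return divide(img, seg, not horizontal, depth - 1, background)
    seg.direction = "column" if horizontal else "row"
    for start, end in runs:
        if horizontal:
            child = Segment(start, seg.left, end, seg.right)
        else:
            child = Segment(seg.top, start, seg.bottom, end)
        seg.children.append(divide(img, child, not horizontal, depth - 1, background))
    return seg


def leaves(seg: Segment) -> List[Segment]:
    """Collect the unsplit segments in reading order."""
    if not seg.children:
        return [seg]
    return [leaf for child in seg.children for leaf in leaves(child)]


def conquer(seg: Segment, describe: Callable[[Segment], str]) -> str:
    """Conquer step: describe each leaf, then wrap siblings in flex containers."""
    if not seg.children:
        return describe(seg)  # hypothetical stand-in for an MLLM call
    inner = "".join(conquer(child, describe) for child in seg.children)
    return f'<div style="display:flex;flex-direction:{seg.direction}">{inner}</div>'


# Toy layout: a full-width header above two side-by-side blocks.
TOY: Image = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
```

Calling `divide(TOY, Segment(0, 0, 5, 5))` yields three leaf segments (the header and the two blocks), and `conquer` nests their per-segment output inside a column container that holds a row container. A real system would replace `describe` with a model call on the cropped image region.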
Figures

Table 5: Performance metrics across language modality (%). 
Table 4: GPT-4o high-level performance (%) 
Figure 1: Annotation tool. Annotators are provided with an original webpage (left) with a bounding box specifying the visual element and a generated webpage (right). 
Table 2: Distribution of common mistakes (one element can have multiple mistakes). 
Figure 2: Examples of error cases (bottom). 
Figure 4: The framework of DCGen.
Citations
Bridging Design and Development with Automated Declarative UI Code Generation
Ting Zhou,Yanjie Zhao,Xinyi Hou,Xiaoyu Sun,Kai Chen,Sheng Wang +5 more
- 17 Sep 2024
TL;DR: Researchers propose DeclarUI, an automated approach that leverages computer vision, multimodal large language models, and compiler-driven optimization to generate high-quality declarative UI code from designs, outperforming baselines and state-of-the-art models on React Native, Flutter, and ArkUI frameworks.
DeclarUI: Bridging Design and Development with Automated Declarative UI Code Generation
Ti Zhou,Yanjie Zhao,Xinyi Hou,Xiaoyu Sun,Kai Chen,Haoyu Wang +5 more
- 19 Jun 2025
LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
Yi Gui,Zhen Li,Zhongyi Zhang,Gaozhou Wang,Tamarit Lv,Guohong Jiang,Yi Liu,Dongping Chen,Yao Wan,Hongyu Zhang,Wenbin Jiang,Xuanhua Shi,Hai Jin +12 more
- 01 Aug 2025
TL;DR: LaTCoder, a novel design-to-code approach, enhances layout preservation in webpage design using a Chain-of-Thought reasoning-inspired method, achieving significant improvements in automatic metrics and human preference evaluation on complex layouts with multiple backbone MLLMs.
LLM-Augmented Ticket Aggregation for Low-cost Mobile OS Defect Resolution
Yongqian Sun,Bowen Hao,X. Wang,Chenyu Zhao,Y. Zhao,Binpeng Shi,Shenglin Zhang,Ge Qiao,Wenhu Li,Hua Wei,Dan Pei +10 more
- 23 Jun 2025
TL;DR: This study proposes TixFusion, an LLM-augmented framework for low-cost mobile OS defect resolution, using unsupervised clustering and LLM-based information extraction to aggregate duplicate tickets, outperforming existing methods with reduced labor cost.
DynEx: Dynamic Code Synthesis with Structured Design Exploration for Accelerated Exploratory Programming
Jenny Ma,Karthik Sreedhar,Vivian Liu,Sitong Wang,Pedro Alejandro Perez,Riya Sahni,Lydia B. Chilton +6 more
- 01 Oct 2024
TL;DR: DynEx, an LLM-based method, guides users through a structured Design Matrix for accelerated exploratory programming, enabling design exploration and iterative implementation, resulting in more complex and varied prototypes compared to existing tools.
References
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni,Salim Roukos,Todd Ward,Wei-Jing Zhu +3 more
- 06 Jul 2002
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai,Junnan Li,Dongxu Li,Anthony Meng Huat Tiong,Junqi Zhao,Weisheng Wang,Bo Li,Pascale Fung,Steven Hoi +8 more
TL;DR: In this article, the authors conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models and introduce an instruction-aware Query Transformer that extracts informative features tailored to the given instruction.
PaLM-E: An Embodied Multimodal Language Model
Danny Driess,Fei Xia,Mehdi Sajjadi,Corey Lynch,Aakanksha Chowdhery,Brian Ichter,Ayzaan Wahid,Jonathan James Richard Tompson,Quan Vuong,Tianhe Yu,Wenrong Huang,Yevgen Chebotar,Pierre Sermanet,Daniel Duckworth,Sergey Levine,Vincent Vanhoucke,Karol Hausman,Marc Toussaint,Klaus Greff,Andy Zeng,Igor Mordatch,Peter R. Florence +21 more
TL;DR: In this paper, an embodied language model is proposed that directly incorporates real-world continuous sensor modalities into language models and establishes the link between words and percepts for embodied reasoning tasks, including sequential robotic manipulation planning, visual question answering, and captioning.
Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation
TL;DR: In this paper, the authors empirically validate that the ranks of domains in each of the lists are easily altered, in the case of Alexa through as little as a single HTTP request.
LIMA: Less Is More for Alignment
Chunting Zhou,Pengfei Liu,Srinivasan Iyer,Jiao Sun,Yuning Mao,Xuezhe Ma,Avia Efrat,Ping Yu,Lili Yu,Susan Zhang,Gargi Ghosh,M. Lewis,Luke Zettlemoyer,Omer Levy +13 more
TL;DR: This paper trains LIMA, a 65B-parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling.