Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach

Preprint, doi:10.48550/arxiv.2406.16386


Yuting Wan, +6 more

- 24 Jun 2024


TL;DR: Converting webpage designs into UI code by hand is time-consuming. DCGen is a divide-and-conquer-based approach that mitigates common failures in automatic UI code generation from screenshots by having the model focus on smaller visual segments.

Abstract: Websites are critical in today's digital world, with over 1.11 billion currently active and approximately 252,000 new sites launched daily. Converting website layout designs into functional UI code is a time-consuming yet indispensable step of website development. Manual methods of converting visual designs into functional code present significant challenges, especially for non-experts. To explore automatic design-to-code solutions, we first conduct a motivating study on GPT-4o and identify three types of issues in generating UI code: element omission, element distortion, and element misarrangement. We further reveal that a focus on smaller visual segments can help multimodal large language models (MLLMs) mitigate these failures in the generation process. In this paper, we propose DCGen, a divide-and-conquer-based approach to automate the translation of webpage design to UI code. DCGen starts by dividing screenshots into manageable segments, then generates descriptions for each segment, and finally reassembles them into complete UI code for the entire screenshot. We conduct extensive testing with a dataset composed of real-world websites and various MLLMs and demonstrate that DCGen achieves up to a 14% improvement in visual similarity over competing methods. To the best of our knowledge, DCGen is the first segment-aware prompt-based approach for generating UI code directly from screenshots.
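The divide, describe, and reassemble pipeline in the abstract can be illustrated with a minimal sketch. The abstract does not say how DCGen chooses segment boundaries, what prompts it sends, or how it merges the results, so everything below is an assumption for illustration only: the screenshot is a toy grid of grayscale integers (255 = white background), segments are horizontal bands separated by runs of blank rows (a hypothetical heuristic), `describe` is a hypothetical stand-in for a per-segment MLLM call, and reassembly simply stacks the fragments in top-to-bottom order. The names `divide`, `generate_ui_code`, and `Segment` are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Pixel = int              # hypothetical: grayscale value, 255 = white background
Image = List[List[Pixel]]


@dataclass
class Segment:
    top: int      # first row of the band (inclusive)
    bottom: int   # end of the band (exclusive)
    rows: Image   # pixel rows belonging to the band


def is_blank(row: Sequence[Pixel], background: Pixel = 255) -> bool:
    """A row is blank if every pixel equals the background value."""
    return all(p == background for p in row)


def divide(image: Image, min_gap: int = 1, background: Pixel = 255) -> List[Segment]:
    """Split an image into horizontal bands separated by at least `min_gap`
    consecutive blank rows. Assumed heuristic, not DCGen's published rule."""
    segments: List[Segment] = []
    start: Optional[int] = None  # row where the current non-blank band began
    gap = 0                      # length of the current run of blank rows
    for i, row in enumerate(image):
        if is_blank(row, background):
            gap += 1
            if start is not None and gap >= min_gap:
                end = i - gap + 1  # band ends at the first blank row of the run
                segments.append(Segment(start, end, image[start:end]))
                start = None
        else:
            if start is None:
                start = i
            gap = 0
    if start is not None:
        end = len(image) - gap  # drop a short trailing blank run
        segments.append(Segment(start, end, image[start:end]))
    return segments


def generate_ui_code(image: Image, describe: Callable[[Image], str],
                     min_gap: int = 1) -> str:
    """Divide the screenshot, ask `describe` (a hypothetical MLLM stand-in)
    for a code fragment per segment, then stack fragments in reading order."""
    parts = [describe(seg.rows) for seg in divide(image, min_gap)]
    body = "\n".join(f'  <div class="segment">{p}</div>' for p in parts)
    return f"<body>\n{body}\n</body>"


if __name__ == "__main__":
    W, X = 255, 0  # background and "ink" pixels
    screenshot = [
        [X, X, X],  # header band
        [W, W, W],
        [X, W, X],  # content band
        [X, X, X],
        [W, W, W],
        [X, W, W],  # footer band
    ]
    fake_mllm = lambda rows: f"{len(rows)}-row block"
    print(generate_ui_code(screenshot, fake_mllm))
```

With the toy input, the three bands yield fragments for a 1-row, 2-row, and 1-row block. Splitting only along rows keeps the sketch short; a fuller version could recurse into each band and split along columns as well, but whether DCGen does so is not stated in the abstract.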


Figures

Table 5: Performance metrics across language modality (%).

Table 4: GPT-4o high-level performance (%)

Figure 1: Annotation tool. Annotators are provided with an original webpage (left) with a bounding box specifying the visual element and a generated webpage (right).

Table 2: Distribution of common mistakes (one element can have multiple mistakes).

Figure 2: Examples of error cases (bottom).

Citations

Journal Article•10.48550/arxiv.2409.11667

Bridging Design and Development with Automated Declarative UI Code Generation

Ting Zhou, +5 more

- 17 Sep 2024

TL;DR: Researchers propose DeclarUI, an automated approach that leverages computer vision, multimodal large language models, and compiler-driven optimization to generate high-quality declarative UI code from designs, outperforming baselines and state-of-the-art models on React Native, Flutter, and ArkUI frameworks.


Journal Article•10.1145/3715726

DeclarUI: Bridging Design and Development with Automated Declarative UI Code Generation

Ti Zhou, +5 more

- 19 Jun 2025

Journal Article•10.1145/3711896.3737016

LaTCoder: Converting Webpage Design to Code with Layout-as-Thought

Yi Gui, +12 more

- 01 Aug 2025

TL;DR: LaTCoder, a novel design-to-code approach, enhances layout preservation in webpage design using a Chain-of-Thought reasoning-inspired method, achieving significant improvements in automatic metrics and human preference evaluation on complex layouts with multiple backbone MLLMs.


Journal Article•10.1145/3696630.3728547

LLM-Augmented Ticket Aggregation for Low-cost Mobile OS Defect Resolution

Yongqian Sun, +10 more

- 23 Jun 2025

TL;DR: This study proposes TixFusion, an LLM-augmented framework for low-cost mobile OS defect resolution, using unsupervised clustering and LLM-based information extraction to aggregate duplicate tickets, outperforming existing methods with reduced labor cost.


Journal Article•10.48550/arxiv.2410.00400

DynEx: Dynamic Code Synthesis with Structured Design Exploration for Accelerated Exploratory Programming

Jenny Ma, +6 more

- 01 Oct 2024

TL;DR: DynEx, an LLM-based method, guides users through a structured Design Matrix for accelerated exploratory programming, enabling design exploration and iterative implementation, resulting in more complex and varied prototypes compared to existing tools.


