Improving Code Extraction from Coding Screencasts Using a Code-Aware Encoder-Decoder Model

doi:10.1109/ase56229.2023.00184

Proceedings Article10.1109/ase56229.2023.00184

Improving Code Extraction from Coding Screencasts Using a Code-Aware Encoder-Decoder Model

Abdulkarim Malkadi, +2 more

- 11 Sep 2023

pp 1492-1504

1

TL;DR: An empirical evaluation on both screenshots and screencast frames shows that CodeT5-OCRfix outperforms baseline code extraction models and is also more time-efficient, improving the state-of-the-art in code extraction techniques from screencasts and images.

Abstract: Accurate automatic code extraction from tutorial videos is crucial for software developers seeking to reuse the code contained in these videos. Current methods using optical character recognition (OCR) often yield inaccurate results due to code complexity and variations in screencast formats. To address this issue, we introduce CodeT5-OCRfix, an approach that leverages the pre-trained code-aware large language model CodeT5 to enhance code extraction accuracy by post-processing OCRed code. We first collect a large and diverse dataset of source code screenshots captured from more than 10K Java projects from GitHub. We then apply the most widely used OCR engine for the task of code extraction from videos, Tesseract, on these screenshots and collect the OCRed code along with the ground truth code extracted from the Java files. We built a training dataset of more than 585K pairs of OCRed and ground truth code pairs, which we then used to fine-tune CodeT5, obtaining our model CodeT5-OCRfix. An empirical evaluation on both screenshots and screencast frames shows that CodeT5-OCRfix outperforms baseline code extraction models and is also more time-efficient. Our approach therefore improves the state-of-the-art in code extraction techniques from screencasts and images.

Chat with Paper

AI Agents for this Paper

Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps

Citations

Journal Article•10.3390/math12071036

Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models

Mohammad D. Alahmadi, +1 more

- 30 Mar 2024

- Mathematics

TL;DR: The efficacy of image super-resolution (SR) techniques, namely, enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames are investigated and significant improvements in OCR accuracy with the use of SR are revealed.

...read moreread less

2

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

...read moreread less

94.2K

Preprint•10.48550/arxiv.1706.03762

Attention Is All You Need

Ashish Vaswani, +7 more

- 01 Jan 2017

Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

...read moreread less

51.8K

•Posted Content

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, +8 more

- 23 Oct 2019

- arXiv: Learning

TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

...read moreread less

12.9K