Evaluating Code Comment Generation With Summarized API Docs

doi:10.1109/nlbse59153.2023.00019

Journal Article

Bilel Matmti, +1 more

- 01 May 2023

pp 60-63

TL;DR: This paper evaluates how summarizing the API Docs with an extractive text summarization technique, TextRank, affects the overall performance of API2Com, and confirms the inverse correlation between the number of APIs and the model's performance.

Abstract: Code comment generation is the task of generating a high-level natural language description for a given code snippet. API2Com is a comment generation model designed to leverage Application Programming Interface Documentation (API Docs) as an external knowledge resource. Shahbazi et al. [1] showed that API Docs might help increase the model's performance. However, the model's ability to generate pertinent comments deteriorates as the number of APIs used in a method increases, because the documentation added to the input becomes lengthy. In this paper, we evaluate how summarizing the API Docs with an extractive text summarization technique, TextRank, affects the overall performance of API2Com. The results of our experiments on the same Java dataset confirm the inverse correlation between the number of APIs and the model's performance: as the number of APIs increases, the performance metrics tend to deteriorate for both configurations of the model, with and without API Docs summarization using TextRank. The experiments also show that the number of APIs affects the capacity of TextRank to improve the model's performance. For example, with 8 APIs, TextRank summarization improved the model's BLEU score by 18% on average, but performance tends to decrease as the number of APIs increases. This leaves open the research question of which combination of model configuration and documentation length works best.
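
The abstract relies on TextRank to shorten API Docs before they reach API2Com. The sketch below illustrates TextRank-style extractive summarization as described by Mihalcea and Tarau (2004): sentences become graph nodes, edges are weighted by word overlap normalized by log sentence length, and a weighted PageRank iteration (damping d = 0.85) ranks the sentences. It is a minimal sketch under stated assumptions, not the paper's implementation: the regex sentence splitter, the lowercase word sets without stemming or stop-word removal, the fixed budget of k sentences, and the function names `split_sentences`, `tokenize`, `similarity`, and `textrank_summary` are all hypothetical choices made for illustration.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    # Assumption: naive splitter that breaks after '.', '!' or '?' plus whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> set[str]:
    # Assumption: lowercase alphanumeric word set; no stemming or stop-word removal.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    # TextRank overlap: shared words divided by log(|a|) + log(|b|).
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    # Guard: two one-word sentences give log(1) + log(1) = 0.
    return overlap / denom if denom > 0 else float(overlap)


def textrank_summary(text: str, k: int = 2, d: float = 0.85,
                     iters: int = 100, tol: float = 1e-6) -> list[str]:
    """Return the k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Weighted, undirected sentence graph (symmetric matrix, no self-loops).
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        # Weighted PageRank update; w[j][i] > 0 implies out_sum[j] > 0.
        new = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                 for j in range(n) if w[j][i] > 0)
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # Hypothetical API doc text, written for this example only.
    doc = ("Returns a new string that is a substring of this string. "
           "The substring begins at the specified beginIndex and extends "
           "to the end of this string. "
           "Throws IndexOutOfBoundsException if beginIndex is negative. "
           "This method is thread safe.")
    print(textrank_summary(doc, k=2))
```

Returning the selected sentences in their original order keeps each summary readable. How large k should be per API, or per input token budget, is a free parameter here and relates to the open question the abstract raises about documentation length.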

Citations

Journal Article•10.1145/3617946.3617957

Summary of the 2nd Natural Language-based Software Engineering Workshop (NLBSE 2023)

Sebastiano Panichella, +1 more

- 17 Oct 2023

TL;DR: A summary of the 2nd edition of the Natural Language-Based Software Engineering Workshop (NLBSE) is presented. The workshop comprised three full papers, four short/position papers, five tool competition/demonstration papers, and two keynote talks (Automated Bug Management, and Trends and Opportunities in the Application of Large Language Models: the Quest for Maximum Effect), followed by extensive discussion among NLBSE participants.

References

Proceedings Article

TextRank: Bringing Order into Text

Rada Mihalcea, +1 more

- 01 Jul 2004

TL;DR: TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.

Proceedings Article•10.18653/V1/2020.FINDINGS-EMNLP.139

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Zhangyin Feng, +10 more

- 19 Feb 2020

TL;DR: CodeBERT is a pre-trained model for natural language code search and code documentation generation, trained with a hybrid objective function that incorporates the pre-training task of replaced token detection, which detects plausible alternatives sampled from generators.

Proceedings Article•10.18653/V1/P16-1195

Summarizing Source Code using a Neural Attention Model

Srinivasan Iyer, +3 more

- 01 Aug 2016

TL;DR: This paper presents the first completely data-driven approach for generating high-level summaries of source code, which uses Long Short-Term Memory (LSTM) networks with attention to produce sentences that describe C# code snippets and SQL queries.

Posted Content

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search

Hamel Husain, +4 more

- 20 Sep 2019

- arXiv: Learning

TL;DR: The paper describes the methodology used to obtain the corpus and expert labels, as well as several simple baseline solutions for the task.

Journal Article•10.1016/J.ESWA.2020.113679

Automatic text summarization: A comprehensive survey

Wafaa S. El-Kassas, +4 more

- 01 Mar 2021

- Expert Systems With Applications

TL;DR: This research provides researchers with a comprehensive survey of automatic text summarization (ATS), presenting its different aspects: approaches, methods, building blocks, techniques, datasets, evaluation methods, and future research directions.

...