Journal Article • DOI: 10.1109/nlbse59153.2023.00019
Evaluating Code Comment Generation With Summarized API Docs
Bilel Matmti, Fatemeh Amini Fard
- 01 May 2023
pp 60-63
TL;DR: This paper evaluates how summarizing API Docs with TextRank, an extractive text summarization technique, affects the overall performance of API2Com, and confirms the inverse correlation between the number of APIs and the model's performance.
Abstract: Code comment generation is the task of generating a high-level natural language description for a given code snippet. API2Com is a comment generation model designed to leverage Application Programming Interface Documentation (API Docs) as an external knowledge resource. Shahbazi et al. [1] showed that API Docs can help increase the model's performance. However, as the number of APIs used in a method increases, the documentation added to the input grows longer, and the model's ability to generate pertinent comments deteriorates. In this paper, we evaluate how summarizing the API Docs with TextRank, an extractive text summarization technique, impacts the overall performance of API2Com. Our experiments on the same Java dataset confirm the inverse correlation between the number of APIs and the model's performance: as the number of APIs increases, the performance metrics tend to deteriorate for both configurations of the model, with and without TextRank summarization of the API Docs. The experiments also show that the number of APIs affects TextRank's capacity to improve the model's performance. For example, with 8 APIs, TextRank summarization improved the model's BLEU score by 18% on average, but performance tends to decrease as the number of APIs increases. This points to an open research question: finding the best combination of model configuration and documentation length.
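TextRank (Mihalcea and Tarau, listed under References) extracts a summary by building a graph whose nodes are sentences and whose edge weights measure word overlap, then running a weighted PageRank-style iteration and keeping the top-ranked sentences. The sketch below illustrates that extractive step on a single API doc string using only the standard library. It is a minimal sketch under stated assumptions: the regex sentence splitter, the tokenizer, the helper names, and the defaults (`k=2`, damping `d=0.85`) are our choices, not the configuration used in the paper, and the API doc text in the usage example is made up.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a: list[str], b: list[str]) -> float:
    # TextRank sentence similarity: shared words / (log|a| + log|b|).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def textrank_summary(text: str, k: int = 2, d: float = 0.85,
                     max_iter: int = 100, tol: float = 1e-6) -> list[str]:
    """Return the k highest-ranked sentences of text, in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    n = len(sentences)
    tokens = [tokenize(s) for s in sentences]
    # Undirected weighted graph as a symmetric matrix, with no self-loops.
    w = [[0.0 if i == j else similarity(tokens[i], tokens[j]) for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # Weighted PageRank update: WS(i) = (1-d) + d * sum_j w[j][i] / out(j) * WS(j).
        new_scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Keep the top-k sentences but restore document order for readability.
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Made-up API doc text, for illustration only.
    doc = ("Appends the specified element to the end of this list. "
           "Returns true if this list changed as a result of the call. "
           "The list grows automatically when its capacity is exceeded. "
           "See also the add method that inserts an element at a specified position.")
    for sentence in textrank_summary(doc, k=2):
        print(sentence)
```

Returning the selected sentences in their original order keeps the shortened documentation readable; in a pipeline like the one the abstract describes, each API's doc would be summarized this way before being added to the model input, which caps the documentation length as the number of APIs grows.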
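The abstract reports results as BLEU scores (for example, the 18% average improvement with 8 APIs) but does not say which BLEU variant was used. As an assumption, the sketch below computes a smoothed sentence-level BLEU-4 for one generated comment against one reference comment, adding one to the matched and total n-gram counts for n > 1 (Lin and Och-style smoothing, a variant often used in code summarization work). Treat it as an illustration of the metric, not as the paper's evaluation script; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of the n-grams (as tuples) in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: list[str], candidate: list[str], max_n: int = 4) -> float:
    """Smoothed sentence-level BLEU against a single reference comment."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a candidate n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(len(candidate) - n + 1, 0)
        if n > 1:
            # Add-one smoothing for higher-order n-grams avoids log(0).
            matches, total = matches + 1, total + 1
        if matches == 0:
            # Only reachable for n == 1: no unigram overlap, so the score is zero.
            return 0.0
        # Geometric mean of the n-gram precisions, with uniform weights 1/max_n.
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


if __name__ == "__main__":
    ref = "returns the size of the list".split()
    hyp = "returns the size".split()
    print(round(sentence_bleu(ref, hyp), 4))  # prints 0.3679
```

In the usage example every n-gram precision is 1, so the score is just the brevity penalty exp(1 - 6/3) = exp(-1): the metric rewards overlap with the reference but also penalizes comments that are too short.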
Citations
Summary of the 2nd Natural Language-based Software Engineering Workshop (NLBSE 2023)
Sebastiano Panichella, Andrea Di Sorbo
- 17 Oct 2023
TL;DR: This is a summary of the 2nd edition of the Natural Language-based Software Engineering Workshop (NLBSE), which comprised three full papers, four short/position papers, five tool competition/demonstration papers, and two keynote talks (Automated Bug Management, and Trends and Opportunities in the Application of Large Language Models: the Quest for Maximum Effect), followed by extensive discussion among NLBSE participants.
References
• Proceedings Article
TextRank: Bringing Order into Text
Rada Mihalcea, Paul Tarau
- 01 Jul 2004
TL;DR: TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou
- 19 Feb 2020
TL;DR: CodeBERT is a pre-trained model for natural language code search and code documentation generation, trained with a hybrid objective function that incorporates the replaced token detection pre-training task, which detects plausible alternatives sampled from generators.
Summarizing Source Code using a Neural Attention Model
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer
- 01 Aug 2016
TL;DR: This paper presents the first completely data-driven approach for generating high-level summaries of source code, which uses Long Short-Term Memory (LSTM) networks with attention to produce sentences that describe C# code snippets and SQL queries.
• Posted Content
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
TL;DR: The methodology used to obtain the corpus and expert labels, as well as a number of simple baseline solutions for the task are described.
Automatic text summarization: A comprehensive survey
TL;DR: This research provides a comprehensive survey for the researchers by presenting the different aspects of ATS: approaches, methods, building blocks, techniques, datasets, evaluation methods, and future research directions.