PyTorrent: A Python Library Corpus for Large-scale Language Models.

Open Access • Posted Content

- 04 Oct 2021

3

TL;DR: PyTorrent, introduced in this paper, is a large-scale collection of both semantic and natural language resources intended to support active Software Engineering research areas such as code reuse and code comprehensibility. It contains 218,814 Python package libraries from PyPI and Anaconda environments.

Citations

Journal Article•10.1145/3630010

How Important Are Good Method Names in Neural Code Generation? A Model Robustness Perspective

Guang Yang, +5 more

- 14 Mar 2024

- ACM Transactions on Software Engineering...

TL;DR: The importance of good method names in neural code generation is demonstrated. A novel approach, RADAR, is proposed to enhance the robustness of PCGMs against adversarial method name attacks.

6

•Posted Content

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models.

Mehdi Bahrami, +6 more

- 16 Oct 2021

- arXiv: Software Engineering

TL;DR: This paper introduces augmented code (AugmentedCode) retrieval, which takes advantage of existing information within the code and constructs an augmented programming language to improve the performance of code retrieval models.

1

•Proceedings Article•10.1109/saner56733.2023.00067

Constructing Temporal Networks of OSS Programming Language Ecosystems

- 01 Mar 2023

TL;DR: In this paper, the authors and projects in OSS programming language ecosystems are represented as nodes in a collaboration graph, which enables various forms of social network analysis at the scale of language ecosystems. The ecosystems' evolution is captured by slicing each network into 30 historical snapshots.

References

Proceedings Article•10.18653/V1/N19-1423

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

24.6K

UCI Machine Learning Repository

A. Asuncion

- 01 Jan 2007

24.3K

•Proceedings Article

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, +20 more

- 01 Jan 2019

TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

10.3K

•Posted Content

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Victor Sanh, +3 more

- 02 Oct 2019

- arXiv: Computation and Language

TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned to perform well on a wide range of tasks like its larger counterparts. It introduces a triple loss combining language modeling, distillation and cosine-distance losses.

7.3K

Journal Article•10.1109/TSE.1976.233837

A Complexity Measure

Thomas J. McCabe

- 01 Jul 1976

- IEEE Transactions on Software Engineerin...

TL;DR: Several properties of the graph-theoretic complexity are proved which show, for example, that complexity is independent of physical size and depends only on the decision structure of a program.

6K

Related Papers (5)

Enabling Empirical Research: A Corpus of Large-Scale Python Systems

What Makes an Open Source Code Popular on Git Hub

LensKit for Python: Next-Generation Software for Recommender System Experiments

Collective Intelligence for Smarter API Recommendations in Python

Cross-project code clones in GitHub