Published February 7, 2019 | Version v1
Dataset Open

Source Code Embeddings

  • 1. Athens University of Economics and Business

Description

A set of six pretrained fastText models for semantic representations of source code. 

Each model was trained on high-quality GitHub repositories whose primary language is one of Java, Python, C++, C#, C, or PHP. To collect the training data, 13,144 repositories were cloned and 2,402,790,348 lines of code were read and preprocessed, producing a total of 944,467,560 tokens of clean training data.
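As an illustration of how embeddings of this kind might be queried, the following minimal sketch parses vectors in fastText's plain-text `.vec` layout (a header line with vocabulary size and dimension, then one token and its vector per line) and ranks tokens by cosine similarity. The record does not state file names or formats, so the `.vec` layout, the function names, and the toy vectors are assumptions made for illustration, not the authors' tooling.

```python
import io
import math


def load_vec(stream):
    """Parse fastText text-format vectors: header 'count dim', then 'token v1 ... vdim'."""
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, token, k=3):
    """Return the k tokens most similar to `token`, most similar first."""
    query = vectors[token]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != token]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy data standing in for a real model file (hypothetical values).
TOY_VEC = """4 3
list 0.9 0.1 0.0
array 0.8 0.2 0.1
socket 0.0 0.1 0.9
port 0.1 0.0 0.8
"""

if __name__ == "__main__":
    vecs = load_vec(io.StringIO(TOY_VEC))
    print(nearest(vecs, "list", k=2))
```

For a real text-format file, `load_vec` could be given an open file handle instead of the in-memory string. If the models are distributed in fastText's binary format, the `fasttext` Python package's `load_model` function would be the more likely route.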

For further details refer to the following paper: 

Efstathiou, V., Spinellis, D., 2019. "Semantic Source Code Models Using Identifier Embeddings". In 16th International Conference on Mining Software Repositories: Data Showcase Track. MSR '19.

Files

Files (13.2 GB)

MD5 checksum                      Size
0a1797b09aa8020deaea4096e2dad518  3.1 GB
0331aa4fad384854552f79b9f7d382dc  2.6 GB
68de3f02881ff244033ee7a4fc4a7135  1.6 GB
f6701447ee02802c8dcc35a76c40d661  2.8 GB
2e691933bd4b7a5114cf09a06c91c1ed  1.4 GB
85127ad0f34bfeb1a17edf2cead912e8  1.6 GB
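The MD5 checksums listed above can be used to confirm that a multi-gigabyte download arrived intact. The following is a minimal sketch using only the Python standard library; the function names and the example file name are hypothetical, since the record does not show the file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the expected hex string, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_checksum("model.bin", "0a1797b09aa8020deaea4096e2dad518")` would return `True` only if the hypothetical local file `model.bin` is byte-for-byte identical to the 3.1 GB file in the list.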

Additional details

Funding

European Commission
CROSSMINER – Developer-Centric Knowledge Mining from Large Open-Source Software Repositories (grant 732223)

Views: 791
Downloads: 456

Details

DOI
10.5281/zenodo.2558730
DOI Badge

Markdown

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2558730.svg)](https://doi.org/10.5281/zenodo.2558730)

reStructuredText

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.2558730.svg
   :target: https://doi.org/10.5281/zenodo.2558730

HTML

<a href="https://doi.org/10.5281/zenodo.2558730"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.2558730.svg" alt="DOI"></a>

Image URL

https://zenodo.org/badge/DOI/10.5281/zenodo.2558730.svg

Target URL

https://doi.org/10.5281/zenodo.2558730
Resource type
Dataset
Publisher
Zenodo

Rights

  • The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created
February 7, 2019
Modified
February 2, 2021
