Source Code Embeddings
Affiliation: Athens University of Economics and Business
Description
A set of six pretrained fastText models for semantic representations of source code.
Each model was trained on high-quality GitHub repositories whose primary language is one of Java, Python, C++, C#, C, or PHP. To collect the training data, 13,144 repositories were cloned and 2,402,790,348 lines of code were read from their source files and preprocessed, yielding a total of 944,467,560 tokens of clean training data.
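As a rough illustration of how such identifier embeddings might be queried, the sketch below parses fastText's textual `.vec` layout (a header line with the vocabulary size and dimension, then one token and its vector per line) and ranks tokens by cosine similarity. It assumes the downloaded model is available in, or has been converted to, that textual layout, which this record does not state. The helper names `read_vec`, `cosine`, and `nearest`, and the tiny in-memory sample, are hypothetical and only stand in for a real model file.

```python
import io
import math


def read_vec(stream, limit=None):
    """Parse the textual .vec layout: a 'count dim' header, then 'token v1 v2 ...' lines."""
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for i, line in enumerate(stream):
        if limit is not None and i >= limit:
            break  # optionally read only the first `limit` tokens of a large file
        parts = line.rstrip().split(" ")
        token, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines instead of failing
        vectors[token] = [float(v) for v in values]
    return dim, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(vectors, token, k=3):
    """Return the k tokens whose vectors are most similar to `token`."""
    query = vectors[token]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != token]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


# Hypothetical toy data; a real model would be opened with open(path, encoding="utf-8").
SAMPLE = """4 3
list 1.0 0.0 0.0
array 0.9 0.1 0.0
socket 0.0 1.0 0.0
thread 0.0 0.0 1.0
"""

dim, vectors = read_vec(io.StringIO(SAMPLE))
print(nearest(vectors, "list", k=1))
```

For files of several gigabytes, passing a `limit` keeps memory use bounded; a dedicated library such as fastText or gensim would likely be a better fit for full-scale use.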
For further details refer to the following paper:
Efstathiou, V., Spinellis, D., 2019. "Semantic Source Code Models Using Identifier Embeddings". In 16th International Conference on Mining Software Repositories: Data Showcase Track. MSR '19.
Files (13.2 GB)
File (MD5 checksum) | Size |
---|---|
0a1797b09aa8020deaea4096e2dad518 | 3.1 GB |
0331aa4fad384854552f79b9f7d382dc | 2.6 GB |
68de3f02881ff244033ee7a4fc4a7135 | 1.6 GB |
f6701447ee02802c8dcc35a76c40d661 | 2.8 GB |
2e691933bd4b7a5114cf09a06c91c1ed | 1.4 GB |
85127ad0f34bfeb1a17edf2cead912e8 | 1.6 GB |
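Because the files are several gigabytes each, it can be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. Since the file names are not shown in this listing, it checks whether a file's digest matches any of the six listed checksums; the helper names `md5_of_file` and `is_listed` are hypothetical, and the temporary file only stands in for a real download.

```python
import hashlib
import os
import tempfile

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "0a1797b09aa8020deaea4096e2dad518",
    "0331aa4fad384854552f79b9f7d382dc",
    "68de3f02881ff244033ee7a4fc4a7135",
    "f6701447ee02802c8dcc35a76c40d661",
    "2e691933bd4b7a5114cf09a06c91c1ed",
    "85127ad0f34bfeb1a17edf2cead912e8",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's digest matches one of the listed checksums."""
    return md5_of_file(path) in EXPECTED_MD5


# Demonstration on a small temporary file standing in for a downloaded model.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"not a real model file")
    demo_path = tmp.name

print(md5_of_file(demo_path), is_listed(demo_path))
os.remove(demo_path)
```

Reading in chunks keeps memory use constant regardless of file size, which matters for multi-gigabyte model files.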
Details
- DOI: 10.5281/zenodo.2558730 (https://doi.org/10.5281/zenodo.2558730)
- Resource type: Dataset
- Publisher: Zenodo
Rights
Creative Commons Attribution 4.0 International
The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.
Technical metadata
- Created: February 7, 2019
- Modified: February 2, 2021