Published March 5, 2019 | Version 1.0.0
Dataset Open

The Software Heritage Graph Dataset

  • 1. Inria, France
  • 2. Athens University of Economics and Business, Greece
  • 3. University Paris Diderot and Inria, France

Description

Software Heritage is the largest existing public archive of software source
code and accompanying development history: it currently spans more than five
billion unique source code files and one billion unique commits, coming from
more than 80 million software projects.

This is the Software Heritage graph dataset: a fully-deduplicated
Merkle DAG representation of the Software Heritage archive. The dataset links
together file content identifiers, source code directories, Version Control
System (VCS) commits tracking evolution over time, up to the full states of VCS
repositories as observed by Software Heritage during periodic crawls. The
dataset’s contents come from major development forges (including GitHub and
GitLab), FOSS distributions (e.g., Debian), and language-specific package
managers (e.g., PyPI).  Crawling information is also included, providing
timestamps about when and where all archived source code artifacts have been
observed in the wild.
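
To make the idea of a fully-deduplicated Merkle DAG concrete, here is a minimal sketch. It assumes a toy identifier scheme (SHA-1 over a type tag plus payload) and a hypothetical `MerkleStore` class; neither is the actual Software Heritage identifier format or API. It only illustrates why identical files and directories shared between projects are stored once.

```python
import hashlib


def node_id(kind: str, payload: bytes) -> str:
    """Toy identifier: hash of the node type and its payload (illustrative only)."""
    return hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()


class MerkleStore:
    """Hypothetical in-memory store of content, directory, and commit nodes."""

    def __init__(self):
        self.nodes = {}  # identifier -> (kind, data)

    def add_content(self, data: bytes) -> str:
        nid = node_id("content", data)
        self.nodes.setdefault(nid, ("content", data))  # no-op if already stored
        return nid

    def add_directory(self, entries: dict) -> str:
        # A directory's identifier depends on the names and identifiers of its children.
        payload = "\n".join(f"{name} {child}" for name, child in sorted(entries.items()))
        nid = node_id("directory", payload.encode())
        self.nodes.setdefault(nid, ("directory", dict(entries)))
        return nid

    def add_commit(self, directory: str, parents: list, message: str) -> str:
        # A commit points at a root directory and at its parent commits.
        payload = "\n".join([directory, *parents, message])
        nid = node_id("commit", payload.encode())
        self.nodes.setdefault(nid, ("commit", (directory, tuple(parents), message)))
        return nid


store = MerkleStore()

# Two unrelated projects that happen to ship the same LICENSE file.
license_a = store.add_content(b"MIT License text")
license_b = store.add_content(b"MIT License text")
readme_a = store.add_content(b"Project A")
readme_b = store.add_content(b"Project B")

dir_a = store.add_directory({"LICENSE": license_a, "README": readme_a})
dir_b = store.add_directory({"LICENSE": license_b, "README": readme_b})

commit_a = store.add_commit(dir_a, [], "initial import of A")
commit_b = store.add_commit(dir_b, [], "initial import of B")
```

Because identifiers are derived from content, the shared LICENSE file yields a single node that both directories reference, while the differing README files keep the two directories and commits distinct.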

The Software Heritage graph dataset is available in multiple formats, including
downloadable CSV dumps and Apache Parquet files for local use, as well as a
public instance on the Amazon Athena interactive query service for
ready-to-use analytical processing.

By accessing the dataset, you agree with the Software Heritage Ethical Charter
for using the archive data, and the terms of use for bulk access.

If you use this dataset for research purposes, please cite the following paper:

  • Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli.
    The Software Heritage Graph Dataset: Public software development under one roof.
    In Proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Co-located with ICSE 2019.

You can also refer to the above paper for more information on the dataset and for sample queries.

Files

Files (2.5 TB)

MD5 checksum                      Size
d1f77570664ab7cce3baba7e4fe1f706  2.1 kB
b12c4f438ddfd219ab5958150442a4b8  230.7 GB
359c0e800f17bc76b7b031cff9c96b7a  516.4 GB
b1d77d90920c3b33c3a7508eba985b47  108.7 GB
4ebdd81c88f65c5114508749f6d4b261  186.0 GB
5d60cb3a1107a7a7f40c1483101349c3  417.5 MB
dc1f47d0dd34aceb630a25efcb555876  2.3 GB
46a517c78774291aac94d824b0a42cef  3.1 GB
1ae2608cb289849d9513b53e31dcafeb  96.8 MB
44e59add29889f7c56e3f89d820d4f59  1.5 GB
ea5eabc59ad881d0419657b8a817fbdf  107.4 GB
76e8c0721b14cfa1ff552e501a92cff9  50.9 GB
d857028e82c258085153c1e79f485236  4.3 MB
b3119f604d4c9126cddf5552427f0bc4  1.6 GB
739488ef204729e99750ad65b716f635  5.2 GB
458ebf45f39dca7692b4972833a78d90  4.2 GB
6fa71cd1515bc7be0d2a23a7563633ea  3.5 kB
685c6f77d4e7ec296a64fecc52fb4168  290.8 GB
03187f06b0d1c57eb6a90bfaea77ac9b  490.9 GB
941bc475e88a1009245fb96a87ad212e  120.8 GB
a6acd0be1536ce6d727ab74bd12f02ee  202.6 GB
12fdf82e1c3451e81dd9137c6a728b00  405.4 MB
fc1968d3ec5c14d541749cdc22cfa898  1.5 GB
0683cba1bf83c57b8d45ff09b564442a  3.3 GB
f7c92d038c0990428fc8d6ff2e08a518  53.3 MB
074a622e9bff1fe0c384c3fc70677b07  1.5 GB
c541834922e8efb23801fdc87ecf6338  113.4 GB
ec5fce2327d9318d02d35716d9c6f097  40.2 GB
c499fcad3804dad1713211bd55fd9d03  4.1 MB
e395b3475c6e6830962399ef2d9c1fac  1.7 GB
6aaa27a7bf1d42f03f72d21c3f516e7a  5.9 GB
e93b3587f65417da717dc1d90d41db42  3.3 GB
e641430767869bef5bfa6c606cf4c3f3  189 Bytes
0e22533457eeff25035fa6f286d89fdc  3.2 kB
4dcab9a3a848fbf6fe8b5f51e45e3a63  13.1 MB
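
Given the size of these files, it may be worth checking a download against its published MD5 checksum before using it. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` and the file name in the usage comment are hypothetical, since the listing above does not include file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte dumps do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Return True if the file's MD5 matches the published checksum.
    Accepts the checksum with or without the 'md5:' prefix used in the listing."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical file name; pair each file with its own checksum from the listing):
# ok = verify_download("downloaded_file.bin", "md5:4dcab9a3a848fbf6fe8b5f51e45e3a63")
```

MD5 is used here only because it is the checksum the record publishes; it detects corrupted or truncated transfers but is not suitable as a security guarantee.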
Details

DOI: 10.5281/zenodo.2583978
Resource type: Dataset
Publisher: Zenodo
Conference: Mining Software Repositories (MSR), Montreal, QC, Canada, 26-27 May 2019

Rights

  • The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 12, 2019
Modified: January 24, 2020
