RePEc: getting the metadata
RePEc is highly decentralized and pulled together from many sources. It can thus be quite difficult to get all you need even though it is freely available. This document should guide you to get what you need. Some data is available on request only, in particular when privacy concerns come into play. Email addresses are released under no circumstances.
Note that the archives participating in RePEc, as well as the people volunteering with RePEc, do so with the understanding that the collected data will be put to good use. This does not include commercial use. If you want to use RePEc data commercially, please first contact RePEc. Typically, we would require substantial contributions of data to RePEc for a commercial use to have a chance of being tolerated.
We strongly discourage you from scraping the data from the websites. This puts unnecessary strain on our servers, and we have repeatedly noticed misconfigured scraping scripts running amok. You are also very unlikely to get complete data that way.
Principles
The basic metadata is provided by publishers. Every RePEc service gets the metadata directly from the publishers and massages it in various ways for its users. Some services then provide additional data and make it available.
Anyone interested in handling RePEc metadata should become familiar with the Guilford Protocol, which defines how the data files can be found and are structured, and the ReDIF format, which defines the metadata fields and conventions.
Publisher data
Each publisher holds its RePEc metadata on its web or anonymous ftp server. The addresses are listed in their archive templates. All those templates are listed at the RePEc:all archive. This is the standard way to acquire the core RePEc data: get the metadata from each of the publisher archives. The remi software is very useful in acquiring this metadata from all the archives. ReDIF-perl is useful to interpret the data.
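To give a feel for what interpreting the data involves, here is a minimal Python sketch that reads ReDIF text into records. It is not a substitute for ReDIF-perl and does no validation. It assumes only the basic layout of the format: each template begins with a Template-Type field, fields are written as "Name: value", fields may repeat, and indented lines continue the previous value. The function name parse_redif and the sample templates (including the archive code xxx) are made up for illustration.

```python
import re

# A field line looks like "Field-Name: value". Indented lines never match,
# so they are treated as continuations of the previous value.
FIELD_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")


def parse_redif(text):
    """Return a list of templates, each a dict of lowercased field name -> list of values."""
    templates = []
    current = None   # template being filled
    last_key = None  # most recent field, for continuation lines
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # blank lines carry no data in this sketch
        match = FIELD_RE.match(line)
        if match:
            key, value = match.group(1).lower(), match.group(2)
            if key == "template-type":
                current = {}
                templates.append(current)
            if current is None:
                continue  # ignore anything before the first template
            current.setdefault(key, []).append(value)
            last_key = key
        elif current is not None and last_key is not None:
            # Continuation line: append to the latest value of the last field.
            joined = current[last_key][-1] + " " + line.strip()
            current[last_key][-1] = joined.strip()
    return templates


# Hypothetical sample data, not taken from any real archive.
SAMPLE = """Template-Type: ReDIF-Paper 1.0
Author-Name: Jane Doe
Author-Name: John Roe
Title: A hypothetical working paper
  with a title continued on a second line
Handle: RePEc:xxx:wpaper:0001

Template-Type: ReDIF-Paper 1.0
Author-Name: Ann Example
Title: Another hypothetical paper
Handle: RePEc:xxx:wpaper:0002
"""

records = parse_redif(SAMPLE)
for record in records:
    print(record["handle"][0], "|", "; ".join(record["author-name"]))
```

Values are kept as lists because fields such as Author-Name legitimately repeat. A known limitation of this sketch: an unindented continuation line that happens to start with a word followed by a colon would be misread as a new field, which is one reason to prefer ReDIF-perl for real work.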
One can also access all the data in one place. There is, however, no guarantee that it is accurate or up-to-date. Only the publisher archives can guarantee that:
- ReDIF format
- AMF format
- OAI/PMH (sometimes flaky)
- Rsync
Person data
Basic metadata about people registered through the RePEc Author Service is available through the RePEc:per archive. Note that it does not contain citation data and needs to have the full contents of RePEc metadata to be interpretable, as it contains RePEc handles throughout.
Any additional person data is subject to privacy requirements.
Citation data
The citation data from the CitEc project can be obtained in two ways:
- AMF format
- plain lists of handles
Ranking data
The various impact factors for journals and series are available for all years and the last ten years. Download and abstract view numbers can be found at LogEc. Instructions for programmatic access to this data are here. The big files with historic data for the latter for each month are here. Additional ranking data is available at IDEAS, including historic data. Note that due to privacy concerns, data beyond what can be "screenscraped" is only available on request, and generally only in anonymized form for research purposes. We will work with national bodies if they want to use RePEc rankings for evaluation purposes.
Other data
There are two sources if you want to know which paper has been disseminated through which NEP report: the first and the second.
The data output from the CollEc project is here.
We link different versions of the same work to each other. The database with those links is found here.
Handles are supposed to be permanent, but sometimes series or journals move to a different archive. To translate the handles, you want to use this.
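As an illustration of what translating handles amounts to, the following Python sketch rewrites the archive and series parts of a handle when a series has moved. The layout of the actual translation data is not described here, so the MOVED_SERIES table, its archive and series codes, and the function names are all hypothetical. The only thing assumed about handles is the usual shape RePEc:archive:series:item, where the item part may itself contain colons.

```python
def split_handle(handle):
    """Split 'RePEc:archive:series:item' into four parts; item may contain colons."""
    parts = handle.split(":", 3)
    if len(parts) != 4 or parts[0] != "RePEc":
        raise ValueError("not an item handle: %r" % handle)
    return parts


def translate_handle(handle, moved_series):
    """Return the handle with archive and series replaced if the series has moved.

    moved_series maps (old_archive, old_series) to (new_archive, new_series).
    Handles of series that did not move are returned unchanged.
    """
    prefix, archive, series, item = split_handle(handle)
    new_archive, new_series = moved_series.get((archive, series), (archive, series))
    return ":".join([prefix, new_archive, new_series, item])


# Hypothetical mapping: series 'wpaper' moved from archive 'old' to archive 'new'.
MOVED_SERIES = {("old", "wpaper"): ("new", "wpaper")}

print(translate_handle("RePEc:old:wpaper:v:1:y:2020", MOVED_SERIES))
print(translate_handle("RePEc:abc:journl:42", MOVED_SERIES))
```

Splitting with a limit of three separators keeps item identifiers such as v:1:y:2020 intact, so only the archive and series codes are ever rewritten.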
The EconPapers syntax checker reports on syntax errors and warnings for ReDIF templates. It also checks the URLs of all links in the templates; look for results by archive in the files starting with "url_" in this directory.
There is more compiled data, but it may not be available because it has never been requested.
API
An API is now available. This tool is meant to be a substitute if the above do not work. For example, an API is not suited to downloading all the data, but rather to fetching specific slices at regular intervals or making repeated quick calls for small bits of data. An API limited to citation and reference data is also available through CitEc.