Python wrapper for the arXiv API
lukasschwab/arxiv.py
Python wrapper for the arXiv API.
arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

    $ pip install arxiv

In your Python script, include the line

    import arxiv


    import arxiv

    # Construct the default API client.
    client = arxiv.Client()

    # Search for the 10 most recent articles matching the keyword "quantum."
    search = arxiv.Search(
        query="quantum",
        max_results=10,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )

    results = client.results(search)

    # `results` is a generator; you can iterate over its elements one by one...
    for r in client.results(search):
        print(r.title)
    # ...or exhaust it into a list. Careful: this is slow for large results sets.
    all_results = list(results)
    print([r.title for r in all_results])

    # For advanced query syntax documentation, see the arXiv API User Manual:
    # https://arxiv.org/help/api/user-manual#query_details
    search = arxiv.Search(query="au:del_maestro AND ti:checkerboard")
    first_result = next(client.results(search))
    print(first_result)

    # Search for the paper with ID "1605.08386v1"
    search_by_id = arxiv.Search(id_list=["1605.08386v1"])
    # Reuse client to fetch the paper, then print its title.
    first_result = next(client.results(search_by_id))
    print(first_result.title)


    import arxiv

    big_slow_client = arxiv.Client(
        page_size=1000,
        delay_seconds=10.0,
        num_retries=5,
    )

    # Prints 1000 titles before needing to make another request.
    for result in big_slow_client.results(arxiv.Search(query="quantum")):
        print(result.title)

To inspect this package's network behavior and API logic, configure a `DEBUG`-level logger.

    >>> import logging, arxiv
    >>> logging.basicConfig(level=logging.DEBUG)
    >>> client = arxiv.Client()
    >>> paper = next(client.results(arxiv.Search(id_list=["1605.08386v1"])))
    INFO:arxiv.arxiv:Requesting 100 results at offset 0
    INFO:arxiv.arxiv:Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100
    DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): export.arxiv.org:443
    DEBUG:urllib3.connectionpool:https://export.arxiv.org:443 "GET /api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100&user-agent=arxiv.py%2F1.4.8 HTTP/1.1" 200 979

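If root-level `DEBUG` output is too noisy (it also surfaces `urllib3` messages, as in the session above), one option is to raise verbosity for this package alone. The following is a minimal sketch using only the standard library. It assumes the package's loggers live under the `arxiv` name, which the `arxiv.arxiv` prefix in the log lines suggests; `enable_arxiv_debug_logging` is a hypothetical helper, not part of the package.

```python
import logging


def enable_arxiv_debug_logging() -> logging.Logger:
    """Turn on DEBUG output for the arxiv logger hierarchy only."""
    # Install a root handler at WARNING so other libraries stay quiet.
    # (basicConfig is a no-op if the root logger already has handlers.)
    logging.basicConfig(level=logging.WARNING)
    # Assumption: the package logs under the "arxiv" logger hierarchy.
    package_logger = logging.getLogger("arxiv")
    package_logger.setLevel(logging.DEBUG)
    return package_logger


logger = enable_arxiv_debug_logging()
```

Child loggers such as `arxiv.arxiv` inherit the `DEBUG` level from their parent, so their records still reach the root handler.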
A `Client` specifies a reusable strategy for fetching results from arXiv's API. For most use cases the default client should suffice.
Client configurations specify pagination and retry logic. Reusing a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.
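To make the roles of `page_size`, `delay_seconds`, and `num_retries` concrete, here is a rough standalone sketch of paginated fetching with a pause between requests and a bounded number of retries. It is not this package's implementation: `paged_results`, `fetch_page`, and `fake_fetch` are hypothetical names, the default values are illustrative, and the network is replaced by an in-memory stand-in so the sketch runs offline.

```python
import time
from typing import Callable, Iterator, List


def paged_results(
    fetch_page: Callable[[int, int], List[str]],
    page_size: int = 100,
    delay_seconds: float = 3.0,
    num_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items page by page, pausing between requests and retrying failures."""
    offset = 0
    first_request = True
    while True:
        page: List[str] = []
        for attempt in range(num_retries + 1):
            # Wait before every request except the very first one.
            if not first_request:
                sleep(delay_seconds)
            first_request = False
            try:
                page = fetch_page(offset, page_size)
                break
            except ConnectionError:
                # Give up once the retry budget is spent.
                if attempt == num_retries:
                    raise
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size


def fake_fetch(offset: int, size: int) -> List[str]:
    """Hypothetical stand-in for a network call over 250 items."""
    return [f"paper-{i}" for i in range(offset, min(offset + size, 250))]


pauses: List[float] = []  # records each delay instead of actually sleeping
titles = list(paged_results(fake_fetch, page_size=100, sleep=pauses.append))
print(len(titles), len(pauses))
```

Because the delay is applied inside the generator, every consumer sharing the same strategy is throttled the same way, which is the motivation for reusing one client.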
A `Search` specifies a search of arXiv's database. Use `Client.results` to get a generator yielding `Result`s.
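Query strings such as `au:del_maestro AND ti:checkerboard` combine field prefixes with Boolean operators, following the arXiv API query syntax. As a small sketch, a helper like the one below could assemble them. `build_query` is a hypothetical function, not something this package provides; only the resulting string would be passed as the `query` argument of `Search`.

```python
from typing import Dict


def build_query(fields: Dict[str, str], operator: str = "AND") -> str:
    """Join field-prefixed terms, e.g. {"au": "del_maestro"} -> "au:del_maestro"."""
    if operator not in {"AND", "OR"}:
        raise ValueError(f"unsupported operator: {operator}")
    terms = []
    for prefix, value in fields.items():
        # Quote multi-word values so they are searched as a phrase.
        term = f'"{value}"' if " " in value else value
        terms.append(f"{prefix}:{term}")
    return f" {operator} ".join(terms)


query = build_query({"au": "del_maestro", "ti": "checkerboard"})
print(query)  # au:del_maestro AND ti:checkerboard
```

Building the string separately keeps the search definition easy to log and unit test before any request is made.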
The `Result` objects yielded by `Client.results` include metadata about each paper and helper methods for downloading their content.
The meaning of the underlying raw data is documented in the arXiv API User Manual: Details of Atom Results Returned.
`Result` also exposes helper methods for downloading papers: `Result.download_pdf` and `Result.download_source`.
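Result metadata can be read like ordinary attributes. The sketch below formats a one-paper summary; it assumes attribute names `title`, `authors` (objects with a `name`), `published` (a `datetime`), and `pdf_url`, so check them against the `Result` documentation for your installed version. `summarize` is a hypothetical helper, and the stand-in object holds placeholder values so the example runs without a network call.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def summarize(result) -> str:
    """Format title, authors, publication date, and PDF link on two lines."""
    authors = ", ".join(author.name for author in result.authors)
    return f"{result.title} ({authors}, {result.published:%Y-%m-%d})\n{result.pdf_url}"


# Placeholder stand-in for a Result; every value here is made up for illustration.
stand_in = SimpleNamespace(
    title="Placeholder title",
    authors=[SimpleNamespace(name="A. Author"), SimpleNamespace(name="B. Author")],
    published=datetime(2020, 1, 2, tzinfo=timezone.utc),
    pdf_url="https://example.org/placeholder.pdf",
)
print(summarize(stand_in))
```

With the real package, the same function would be called on each element yielded by `Client.results`.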