Commit 830ae02 (1 parent: 838c723)

Fix formatting, add links.

1 file changed: api_tips.md (+21, -8 lines)
@@ -23,7 +23,7 @@ Our advice is split into three sections:

### Pick the right service level.

Consider using our “[Polite](https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service)” or “[Plus](https://www.crossref.org/services/metadata-retrieval/metadata-plus/)” versions of the REST API.

What does this mean?

@@ -43,7 +43,6 @@ Note that, in asking you to self-identify, we are not asking you to completely g

And finally, if you are using our REST API for a production service that requires high predictability, *you should really consider using our paid-for “Plus” service.* This service gets you an authentication token which, in turn, directs your requests to a reserved pool of servers that are extremely predictable.

### Understand the performance characteristics of REST API queries.

If you are using the API for simple reference matching, and are not doing any post-validation (e.g. your own ranking of the returned results), then just ask for the first two results (`rows=2`). This allows you to identify the best result and ignore any where there is a tie in score on the first two results (e.g. an inconclusive match). If you *are* analyzing and ranking the results yourself, then you can probably get away with just requesting five results (`rows=5`). Anything beyond that is very unlikely to be a match. In either case, restricting the number of rows returned will be more efficient for you and for the API.
@@ -65,33 +64,43 @@ http://api.crossref.org/works?query="Toward a Unified Theory of High-Energy Meta

Using the plain `query` parameter will search the entire record, including funder and other non-bibliographic elements. This means that it will also match any record that includes the query text in these other elements, resulting in many, many false positives and distorted scores.

If you are trying to match references, the simplest approach is the best. Just use the `query.bibliographic` parameter. It restricts the matching to the bibliographic metadata, and the default sort order and scoring mechanism will reliably list the best match first. Restricting the number of rows to `2` allows you to check whether there is an ambiguous match (e.g. a “tie” in the scores of the first two items returned; see above tip). So the best way to do the above query is like this:

```
http://api.crossref.org/works?query.bibliographic="Toward a Unified Theory of High-Energy Metaphysics, Josiah Carberry 2008-08-13"&rows=2
```
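The `query.bibliographic` and `rows=2` advice can be sketched together in Python (stdlib only; the tie-handling policy, function names, and the `User-Agent` identity are illustrative assumptions, not part of the API):

```python
import json
import urllib.parse
import urllib.request

API = "https://api.crossref.org/works"

def pick_unambiguous(items):
    """Return the top-scoring item, or None when the first two scores
    tie (an inconclusive match, per the rows=2 tip above)."""
    if not items:
        return None
    if len(items) > 1 and items[0]["score"] == items[1]["score"]:
        return None
    return items[0]

def match_reference(citation, mailto="you@example.org"):
    # Restrict matching to bibliographic metadata and request two rows only.
    qs = urllib.parse.urlencode({"query.bibliographic": citation, "rows": 2})
    # Self-identify for "Polite" service; the mailto address is a placeholder.
    req = urllib.request.Request(
        f"{API}?{qs}",
        headers={"User-Agent": f"example-app/0.1 (mailto:{mailto})"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        items = json.load(resp)["message"]["items"]
    hit = pick_unambiguous(items)
    return hit["DOI"] if hit else None
```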
### Optimise your requests and pay attention to errors.

If you have an overall error (`4XX` + `5XX`) rate >= 10%, seriously, please *stop* your script and figure out what is going on. Don’t just leave it hammering the API and generating errors; you will just be making other users (and Crossref staff) miserable until you fix your script.

<hr/>

If you get a `404` (not found) when looking up a DOI, do not just endlessly poll Crossref to see if it ever resolves correctly. First check to make sure the DOI is a Crossref DOI. If it is not a Crossref DOI, you can stop checking it with us and try checking it with another registration agency’s API. You can check the registration agency to which a DOI belongs as follows:

```
https://api.crossref.org/works/{doi}/agency
```
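A minimal sketch of that agency lookup (stdlib only; the `message.agency.id` response path is what the endpoint returns today, but treat the function names here as illustrative):

```python
import json
import urllib.parse
import urllib.request

def agency_url(doi):
    # URL-encode the DOI first: DOIs may contain "/" and other reserved characters.
    return "https://api.crossref.org/works/%s/agency" % urllib.parse.quote(doi, safe="")

def registration_agency(doi):
    """Return the agency id (e.g. "crossref" or "datacite") for a DOI."""
    with urllib.request.urlopen(agency_url(doi), timeout=30) as resp:
        return json.load(resp)["message"]["agency"]["id"]
```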
<hr/>

Adhere to rate limits. We rate limit by IP, so *yes*, you can “get around” the rate limit by running your scripts on multiple machines with different IPs, but then all you are doing is being inconsiderate of other users. And that makes us grumpy. You won’t like us when we are grumpy. There can be other good reasons to run your scripts on multiple machines with different IPs, but if you do, please continue to respect the overall rate limit by restricting each process to working at an appropriate sub-rate of the overall rate limit.

<hr/>

Check your errors and respond to them. If you get an error, particularly a timeout error, a rate limit error (`429`), or a server error (`5XX`), do not just repeat the request or immediately move on to the next request; back off your request rate. Ideally, back off exponentially. There are lots of libraries that make this very easy. Since a lot of our API users seem to use Python, here are links to a few libraries that allow you to do this properly:

- [Backoff](https://pypi.org/project/backoff/)
- [Retry](https://pypi.org/project/retry/)

But there are similar libraries for Java, Javascript, R, Ruby, PHP, Clojure, Golang, Rust, etc.
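What those libraries do for you can be hand-rolled in a few lines; this sketch (stdlib only, names are mine) shows the exponential-backoff-with-jitter shape:

```python
import random
import time

def with_backoff(request_fn, max_tries=5, base=1.0):
    """Call request_fn, retrying on any exception and sleeping
    base * (2 ** attempt + jitter) seconds between attempts."""
    for attempt in range(max_tries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of tries: surface the error to the caller
            time.sleep(base * (2 ** attempt + random.random()))
```

In practice you would catch only retryable errors (timeouts, `429`, `5XX`) rather than bare `Exception`.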
<hr/>

Make sure you URL-encode DOIs. DOIs can contain lots of characters that need to be escaped properly. We see lots of errors that are simply the result of people not taking care to properly encode their requests. Don’t be one of those people.
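In Python, `urllib.parse.quote` with `safe=""` does this correctly (the DOI below is an illustrative made-up string, chosen only to show characters that must be escaped):

```python
import urllib.parse

# An illustrative DOI-like string containing characters that need escaping.
doi = "10.5555/ab(cd)<ef>;2-#"
encoded = urllib.parse.quote(doi, safe="")  # safe="" also escapes "/"
url = "https://api.crossref.org/works/" + encoded
```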
<hr/>

Cache the results of your requests. We know a lot of our users are extracting DOIs from references or other sources and then looking up their metadata. This means that, often, they will end up looking up metadata for the same DOI multiple times. We recommend that, at a minimum, you cache the results of your requests so that subsequent requests for the same resource don’t hit the API directly. Again, there are some very easy ways to do this using standard libraries. In Python, for example, the following libraries allow you to easily add caching to any function with just a single line of code:

- [Requests-cache](https://pypi.org/project/requests-cache/)
- [Diskcache](https://pypi.org/project/diskcache/)
- [Cachew](https://github.com/karlicoss/cachew#what-is-cachew)

There are similar libraries for other languages.
@@ -100,4 +109,8 @@ If you are using the Plus API, make sure that you are making intelligent use of

<hr/>

Managing the snapshot can be cumbersome as it is inconveniently large-ish. Remember that you do *not have to uncompress and unarchive the snapshot in order to use it.* Most major programming languages have libraries that allow you to open and read files directly from a compressed archive. For example:

- [tarfile](https://docs.python.org/3/library/tarfile.html)

If you parallelize the process of reading data from the snapshot and loading it into your database, you should be able to scale the process linearly with the number of cores you are able to take advantage of.
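Reading directly from the archive with `tarfile` looks like this (a sketch; the snapshot's internal file layout is an assumption, so treat the member names as illustrative):

```python
import tarfile

def iter_snapshot(path):
    """Yield (name, bytes) for each file in a .tar.gz archive, reading
    members directly from the compressed archive (no unarchiving)."""
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()
```

Workers parallelizing the load step can then each be handed a slice of the member names.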
