
Commit 830ae02


Fix formatting, add links.

1 parent 838c723, commit 830ae02

1 file changed


`api_tips.md`

Lines changed: 21 additions & 8 deletions

Original file line number	Diff line number	Diff line change
`@@ -23,7 +23,7 @@ Our advice is split into three sections:`
`23`	`23`
`24`	`24`	`### Pick the right service level.`
`25`	`25`
`26`		`-Consider using our “Polite” or “Plus” versions of the REST API.`
	`26`	`+Consider using our “[Polite](https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service)” or “[Plus](https://www.crossref.org/services/metadata-retrieval/metadata-plus/)” versions of the REST API.`
`27`	`27`
`28`	`28`	`What does this mean?`
`29`	`29`
`@@ -43,7 +43,6 @@ Note that, in asking you to self-identify, we are not asking you to completely g`
`43`	`43`
`44`	`44`	`And finally, if you are using our REST API for a production service that requires high predictability, you should really consider using our paid-for “Plus” service. This service gets you an authentication token which, in turn, directs your requests to a reserved pool of servers that are extremely predictable.`
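To make the service levels concrete, here is a minimal Python sketch of a client that self-identifies for the Polite pool and can optionally attach a Plus token. It rests on two assumptions that this diff does not state: that Polite requests identify themselves with a `mailto` query parameter and a `User-Agent` containing a contact address, and that the Plus token travels in a `Crossref-Plus-API-Token: Bearer …` header. Check both against the current API documentation. The function name, tool name and email address are hypothetical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

API_ROOT = "https://api.crossref.org/works"


def build_works_request(
    params: Dict[str, str],
    contact_email: str,
    tool: str = "example-matcher/0.1",  # hypothetical tool name/version
    plus_token: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a /works request that identifies its caller.

    Assumption: the Polite pool is selected by a mailto parameter and/or a
    User-Agent containing "mailto:"; the Plus header name is also assumed.
    """
    # Copy the caller's parameters and append the contact address.
    query = dict(params, mailto=contact_email)
    url = f"{API_ROOT}?{urlencode(query)}"
    headers = {"User-Agent": f"{tool} (mailto:{contact_email})"}
    if plus_token is not None:
        # Assumed header for the paid-for Plus service.
        headers["Crossref-Plus-API-Token"] = f"Bearer {plus_token}"
    return url, headers
```

The returned pair can be passed straight to `urllib.request.Request(url, headers=headers)` or to any other HTTP client.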
`45`	`45`
`46`		`-`
`47`	`46`	`### Understand the performance characteristics of REST API queries.`
`48`	`47`
`49`	`48`	If you are using the API for simple reference matching, and are not doing any post-validation (e.g. your own ranking of the returned results), then just ask for the first two results (`rows=2`). This allows you to identify the best result and to ignore cases where the first two results tie on score (i.e. an inconclusive match). If you are analyzing and ranking the results yourself, then you can probably get away with requesting just five results (`rows=5`). Anything beyond that is very unlikely to be a match. In either case, restricting the number of rows returned will be more efficient for you and for the API.
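The tie check described in this tip takes only a few lines. This Python sketch assumes it receives the `items` list from a `/works` search response, with each item carrying a numeric `score` and the best match first. The function name and the optional `tolerance` (exact equality by default) are illustrative choices, not part of the API.

```python
from typing import Any, Dict, List, Optional


def best_unambiguous_match(
    items: List[Dict[str, Any]], tolerance: float = 0.0
) -> Optional[Dict[str, Any]]:
    """Return the top item from a rows=2 search, or None if there is no clear winner.

    Assumes each item has a numeric "score" and items are sorted best-first.
    """
    if not items:
        return None  # nothing matched at all
    if len(items) >= 2 and abs(items[0]["score"] - items[1]["score"]) <= tolerance:
        return None  # the first two results tie: treat as an inconclusive match
    return items[0]
```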
`@@ -65,33 +64,43 @@ http://api.crossref.org/works?query="Toward a Unified Theory of High-Energy Meta`
`65`	`64`	Using the plain `query` parameter will search the entire record, including funder and other non-bibliographic elements. This means that it will also match any record that includes the query text in these other elements, resulting in many, many false positives and distorted scores.
`66`	`65`
`67`	`66`	If you are trying to match references, the simplest approach is the best. Just use the `query.bibliographic` parameter. It restricts the matching to the bibliographic metadata, and the default sort order and scoring mechanism will reliably list the best match first. Restricting the number of rows to `2` allows you to check whether there is an ambiguous match (i.e. a “tie” in the scores of the first two items returned; see the tip above). So the best way to do the above queries is like this:
	`67`	`+`
`68`	`68`	```
`69`	`69`	`http://api.crossref.org/works?query.bibliographic="Toward a Unified Theory of High-Energy Metaphysics, Josiah Carberry 2008-08-13"&rows=2`
`70`	`70`	```
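The example URL is written unencoded for readability. In code it is safer to let a library do the encoding. Here is a minimal Python sketch using only the `query.bibliographic` and `rows` parameters named in this tip. The helper name is hypothetical, and it passes the reference text without the surrounding double quotes.

```python
from urllib.parse import urlencode


def bibliographic_query_url(reference: str, rows: int = 2) -> str:
    """Build a /works URL that matches against bibliographic metadata only."""
    params = {"query.bibliographic": reference, "rows": str(rows)}
    # urlencode escapes spaces, commas and other reserved characters.
    return "https://api.crossref.org/works?" + urlencode(params)
```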
`71`	`71`
`72`	`72`	`### Optimise your requests and pay attention to errors.`
`73`	`73`
`74`	`74`	If your overall error rate (`4XX` + `5XX`) is 10% or more, seriously, please stop your script and figure out what is going on. Don’t just leave it hammering the API and generating errors: you will just be making other users (and Crossref staff) miserable until you fix your script.
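A script can enforce this on itself by tracking its error rate as it goes and halting once the rate crosses the threshold. Here is a minimal Python sketch. The 10% threshold and the 4XX/5XX definition come from the tip above. The class name and the `min_requests` warm-up (so a single early failure does not stop the run) are illustrative assumptions.

```python
class ErrorRateGuard:
    """Raise once the share of 4XX/5XX responses reaches a threshold."""

    def __init__(self, threshold: float = 0.10, min_requests: int = 50) -> None:
        self.threshold = threshold
        self.min_requests = min_requests  # illustrative warm-up before judging
        self.total = 0
        self.errors = 0

    def record(self, status: int) -> None:
        """Record one HTTP status; raise RuntimeError if the error rate is too high."""
        self.total += 1
        if 400 <= status <= 599:
            self.errors += 1
        if self.total >= self.min_requests and self.errors / self.total >= self.threshold:
            raise RuntimeError(
                f"error rate {self.errors / self.total:.0%} over {self.total} requests; "
                "stop and investigate"
            )
```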
	`75`	`+`
`75`	`76`	`<hr/>`
	`77`	`+`
`76`	`78`	If you get a `404` (not found) when looking up a DOI, do not just endlessly poll Crossref to see if it ever resolves correctly. First check whether the DOI is a Crossref DOI. If it is not, you can stop checking it with us and try checking it with another registration agency’s API. You can check the registration agency to which a DOI belongs as follows:
	`79`	`+`
`77`	`80`	```
`78`	`81`	`https://api.crossref.org/works/{doi}/agency`
`79`	`82`	```
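Here is a sketch of this check in Python, using only the standard library. It assumes the `/works/{doi}/agency` response carries the agency identifier at `message.agency.id` (for example `crossref`). Verify that shape against a live response before relying on it. The function names are hypothetical. `urlopen` raises `urllib.error.HTTPError` on a `4XX`/`5XX` status, and the caller should handle that rather than retry forever.

```python
import json
import urllib.request
from urllib.parse import quote


def agency_url(doi: str) -> str:
    # Percent-encode the DOI; quote() leaves '/' unescaped by default,
    # matching the /works/{doi}/agency template.
    return f"https://api.crossref.org/works/{quote(doi)}/agency"


def agency_id(response: dict) -> str:
    """Extract the registration agency id from a parsed /agency response (assumed shape)."""
    return response["message"]["agency"]["id"]


def is_crossref_doi(doi: str, timeout: float = 30.0) -> bool:
    """Network call: True if Crossref reports itself as the DOI's registration agency."""
    with urllib.request.urlopen(agency_url(doi), timeout=timeout) as resp:
        return agency_id(json.load(resp)) == "crossref"
```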
	`83`	`+`
`80`	`84`	`<hr/>`
	`85`	`+`
`81`	`86`	Adhere to rate limits. We rate limit by IP, so yes, you can “get around” the rate limit by running your scripts on multiple machines with different IPs, but then all you are doing is being inconsiderate of other users. And that makes us grumpy. You won’t like us when we are grumpy. There can be other good reasons to run your scripts on multiple machines with different IPs, but if you do, please continue to respect the overall rate limit by restricting each process to an appropriate share of it.
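To respect the overall limit while spreading work across several processes or machines, each worker can throttle itself to an equal share of the total. This Python sketch assumes you already know the overall limit: the value is not given here, so `overall_per_second` is a placeholder you must fill in yourself. It also assumes work is split evenly across `n_workers`. The class name is hypothetical. The clock and sleep functions can be injected so the throttle can be tested.

```python
import time
from typing import Callable


class SubRateThrottle:
    """Space out calls so that n_workers together stay under overall_per_second."""

    def __init__(
        self,
        overall_per_second: float,
        n_workers: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if overall_per_second <= 0 or n_workers < 1:
            raise ValueError("need a positive rate and at least one worker")
        # Each worker gets an equal slice of the overall rate.
        self.interval = n_workers / overall_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> None:
        """Block until this worker may send its next request."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Each worker calls `wait()` immediately before every request.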
	`87`	`+`
`82`	`88`	`<hr/>`
	`89`	`+`
`83`	`90`	Check your errors and respond to them. If you get an error, particularly a timeout error, a rate limit error (`429`), or a server error (`5XX`), do not just repeat the request or immediately move on to the next request; back off your request rate. Ideally, back off exponentially. There are lots of libraries that make this very easy. Since a lot of our API users seem to use Python, here are links to a few libraries that allow you to do this properly:
`84`		`-- Backoff`
`85`		`-- Retry`
	`91`	`+`
	`92`	`+- [Backoff](https://pypi.org/project/backoff/)`
	`93`	`+- [Retry](https://pypi.org/project/retry/)`
	`94`	`+`
`86`	`95`	`But there are similar libraries for Java, JavaScript, R, Ruby, PHP, Clojure, Golang, Rust, etc.`
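The linked packages wrap this pattern in a decorator. To show what happens underneath, here is a dependency-free Python sketch of exponential backoff with "full jitter", meaning each wait is a random delay between zero and an exponentially growing cap. The retryable conditions (timeouts, `429`, `5XX`) follow the tip above. The `fetch` callable, the `(status, body)` return shape, the retry budget and the delay constants are illustrative assumptions. Adapt the caught exception to whatever your HTTP client raises on a timeout.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status, body): a simplified stand-in


def _retryable(status: int) -> bool:
    # Rate limited (429) or server error (5XX), per the tip above.
    return status == 429 or 500 <= status <= 599


def fetch_with_backoff(
    fetch: Callable[[], Response],
    max_tries: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call fetch(), backing off exponentially on timeouts, 429s and 5XXs."""
    for attempt in range(max_tries):
        try:
            status, body = fetch()
            if not _retryable(status):
                return status, body  # success, or an error that retrying will not fix
        except TimeoutError:
            pass  # treat a timeout like any other retryable error
        if attempt < max_tries - 1:
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads clients' retries apart
    raise RuntimeError(f"giving up after {max_tries} attempts")
```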
`87`	`96`	`<hr/>`
`88`	`97`	`Make sure you URL-encode DOIs. DOIs can contain lots of characters that need to be escaped properly. We see lots of errors that are simply the result of people not taking care to properly encode their requests. Don’t be one of those people.`
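In Python, `urllib.parse.quote` handles the escaping. The DOI below is a made-up example under the `10.5555` prefix. It was chosen only because it contains characters such as `<`, `>`, `#` and `;` that break an unencoded URL. Leaving `/` unescaped is an assumption that matches the `/works/{doi}` form used elsewhere in these tips. The helper name is hypothetical.

```python
from urllib.parse import quote


def works_url(doi: str) -> str:
    """Build a /works/{doi} URL with the DOI percent-encoded."""
    # quote() escapes everything except letters, digits, '_.-~' and the safe '/'.
    # Without it, a raw '#' would start a URL fragment and silently truncate the DOI.
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


print(works_url("10.5555/abc<def>#1;2"))  # hypothetical DOI
# https://api.crossref.org/works/10.5555/abc%3Cdef%3E%231%3B2
```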
`89`	`98`	`<hr/>`
`90`	`99`	Cache the results of your requests. We know a lot of our users are extracting DOIs from references or other sources and then looking up their metadata. This means that, often, they will end up looking up metadata for the same DOI multiple times. We recommend that, at a minimum, you cache the results of your requests so that subsequent requests for the same resource don’t hit the API directly. Again, there are some very easy ways to do this using standard libraries. In Python, for example, the following libraries allow you to easily add caching to any function with just a single line of code:
`91`	`100`
`92`		`-- Requests-cache`
`93`		`-- Diskcache`
`94`		`-- Cachew`
	`101`	`+- [Requests-cache](https://pypi.org/project/requests-cache/)`
	`102`	`+- [Diskcache](https://pypi.org/project/diskcache/)`
	`103`	`+- [Cachew](https://github.com/karlicoss/cachew#what-is-cachew)`
`95`	`104`
`96`	`105`	`There are similar libraries for other languages.`
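If you would rather not add a dependency, the standard library's `functools.lru_cache` gives you an in-memory cache. Unlike the libraries above, it does not persist anything to disk. Here is a minimal Python sketch. The `fetch` function passed in is hypothetical. Lower-casing the DOI before caching relies on DOIs being case-insensitive, so `10.5555/ABC` and `10.5555/abc` share one cache entry. Cached dicts are shared between callers, so treat them as read-only.

```python
import functools
from typing import Callable


def make_cached_lookup(
    fetch: Callable[[str], dict], maxsize: int = 100_000
) -> Callable[[str], dict]:
    """Wrap a DOI -> metadata function so repeated DOIs never hit the API twice."""

    @functools.lru_cache(maxsize=maxsize)
    def _cached(normalized_doi: str) -> dict:
        return fetch(normalized_doi)

    def lookup(doi: str) -> dict:
        # DOIs are case-insensitive, so normalize before using them as cache keys.
        return _cached(doi.strip().lower())

    return lookup
```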
`97`	`106`
`@@ -100,4 +109,8 @@ If you are using the Plus API, make sure that you are making intelligent use of`
`100`	`109`
`101`	`110`	`<hr/>`
`102`	`111`
`103`		`-Managing the snapshot can be cumbersome as it is inconveniently large-ish. Remember that you do not have to uncompress and unarchive the snapshot in order to use it. Most major programming languages have libraries that allow you to open and read files directly from a compressed archive. If you parallelize the process of reading data from the snapshot and loading it into your database, you should be able to scale the process linearly with the number of cores you are able to take advantage of.`
	`112`	`+Managing the snapshot can be cumbersome as it is inconveniently large-ish. Remember that you do not have to uncompress and unarchive the snapshot in order to use it. Most major programming languages have libraries that allow you to open and read files directly from a compressed archive. For example:`
	`113`	`+`
	`114`	`+- [tarfile](https://docs.python.org/3/library/tarfile.html)`
	`115`	`+`
	`116`	`+If you parallelize the process of reading data from the snapshot and loading it into your database, you should be able to scale the process linearly with the number of cores you are able to take advantage of.`
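Here is a Python sketch that reads records straight out of the compressed snapshot with the standard `tarfile` module, without unpacking it to disk. It assumes, without confirming it here, that the snapshot is a gzipped tar whose members are JSON files (optionally gzipped themselves), each holding an `items` list. Adjust the parsing to the snapshot's actual layout. The sketch does not parallelize the load. One option is to hand each member's bytes to a `multiprocessing.Pool` for parsing.

```python
import gzip
import json
import tarfile
from typing import IO, Any, Dict, Iterator, Optional


def iter_snapshot_records(
    path: Optional[str] = None, fileobj: Optional[IO[bytes]] = None
) -> Iterator[Dict[str, Any]]:
    """Yield metadata records from a .tar.gz snapshot without extracting it.

    "r|gz" opens the archive as a forward-only stream, so memory use is bounded
    by the largest member rather than by the whole archive.
    """
    with tarfile.open(name=path, fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            name = member.name
            if not (name.endswith(".json") or name.endswith(".json.gz")):
                continue  # skip anything that is not (compressed) JSON
            handle = tar.extractfile(member)
            if handle is None:
                continue
            raw = handle.read()  # must be read before advancing to the next member
            if name.endswith(".gz"):
                raw = gzip.decompress(raw)
            # Assumed layout: each file is {"items": [record, ...]}.
            for record in json.loads(raw).get("items", []):
                yield record
```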

