Interesting datasets you could use with Algolia


algolia/datasets


Welcome to the Algolia repository of datasets.

The goal of this repository is to help you build something with Algolia, even if you don't have data of your own. Maybe you just want to try the API, or maybe one of these datasets will inspire you to build something of your own.

What you'll find in this repository

Ready-made indices

Each directory of the repository holds JSON files that contain both the actual records and the index configuration. You can use them to push data to your own application or to configure your settings through the API.
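The following Python sketch illustrates the first use case. It loads one dataset directory and splits its records into batches before upload.

It rests on a few assumptions:

- The file names `records.json` and `settings.json` are assumed, so check the actual names in each directory.
- The batch size of 1000 is an arbitrary choice, not an Algolia requirement.
- The upload itself is left to whichever Algolia API client you use.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def load_dataset(
    directory: str,
    records_file: str = "records.json",  # assumed file name, check each directory
    settings_file: str = "settings.json",  # assumed file name, check each directory
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Read the records list and the index settings object from one dataset directory."""
    base = Path(directory)
    records = json.loads((base / records_file).read_text(encoding="utf-8"))
    settings = json.loads((base / settings_file).read_text(encoding="utf-8"))
    return records, settings


def batches(records: list[dict[str, Any]], size: int = 1000) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of at most `size` records (1000 is an arbitrary default)."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each batch can then be passed to your API client's "save objects" call, and the settings dict to its "set settings" call. Check the client's documentation for the exact method names, since they differ between client versions.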

Most of them also come with credentials to query this data directly from our servers. In that case you won't need to push anything, but you will be limited to querying: the API key we share only allows reading data, not updating it.
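For the read-only case, here is a minimal sketch that builds a query against Algolia's REST search endpoint using only the Python standard library. The endpoint is `POST /1/indexes/{indexName}/query` on the `{appId}-dsn.algolia.net` host. The application ID, API key and index name used below are placeholders; substitute the credentials shipped with the dataset you want to query.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(
    app_id: str,
    search_api_key: str,  # the read-only key shared with a dataset
    index_name: str,
    query: str,
    hits_per_page: int = 10,
) -> urllib.request.Request:
    """Build (but do not send) a search request for Algolia's REST API."""
    url = (
        f"https://{app_id}-dsn.algolia.net/1/indexes/"
        f"{urllib.parse.quote(index_name, safe='')}/query"
    )
    # Search parameters are sent as a URL-encoded string inside the JSON body.
    params = urllib.parse.urlencode({"query": query, "hitsPerPage": hits_per_page})
    body = json.dumps({"params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Algolia-Application-Id": app_id,
            "X-Algolia-API-Key": search_api_key,
            "Content-Type": "application/json",
        },
    )
```

To send it, call `urllib.request.urlopen(request)` and read the `hits` array from the JSON response. An official API client would typically also handle retries and fallback hosts, which this sketch does not.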

Links to publicly available raw datasets

But there is much more data out there than what is in those files. That's why we've compiled a list of interesting potential data sources. Each comes in its own format (sometimes a zip file to download, sometimes an API to query).

They're here to give you ideas of what you could build with Algolia. If you ever build something with any of these data sources, let us know; we'd love to see what you did. If you know of another good data source, we're open to Pull Requests as well :)

Academic Papers

http://academictorrents.com/

15.49TB of research data available in torrent format.

Archive.org

https://archive.org/

Non-profit library of millions of free books, movies, software, music, and websites.

Amazon

https://aws.amazon.com/datasets/

Datasets publicly hosted on AWS, including the population of Japan, Wikipedia page traffic, metadata about a million songs, the social graph of Marvel superheroes, OpenStreetMap and much more.

Awesome JSON Datasets

https://github.com/jdorfman/awesome-json-datasets

A curated list of awesome JSON datasets that don't require authentication. It covers a wide range of topics, from food to gaming to quotes. Some links may be outdated.

Awesome Public Datasets

https://github.com/awesomedata/awesome-public-datasets

Another extensive list of links to various datasets and APIs.

APIs

Here is a Gist referencing curated APIs that could contain interesting data: https://gist.github.com/soopa/8225112

And a dedicated repo: https://github.com/toddmotto/public-apis

CommonCrawl

http://commoncrawl.org/

7 years of crawled web data: millions of pages and trillions of links between them. Sort of an open source Google index.

Gallica

http://gallica.bnf.fr/

French National Library (BNF) online archives: books, maps, press.

IMDB

http://www.imdb.com/interfaces

List of all actors, movies, shows, etc. from IMDB.

Kaggle

https://www.kaggle.com/datasets

Large list of datasets for machine learning

Marvel

List of all superheroes and supervillains of the Marvel universe, extracted from several sources (Wikipedia, the Marvel API, DBpedia) and aggregated into Algolia records.

N-grams

http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html

List of n-grams extracted from the Google Books corpus. Not a regular dataset, but still worth noting.

OpenStreetMap

http://wiki.openstreetmap.org/wiki/Downloading_data

Open source alternative to Google Maps: geolocated points of interest, streets, etc.

Project Gutenberg

https://www.gutenberg.org/

Public domain books, available as both HTML and ebooks. They do not all follow the same format, so custom parsing is needed.
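A common first parsing step is to cut the license header and footer that surround the book text.

The sketch below assumes the `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THE PROJECT GUTENBERG EBOOK ... ***` marker lines found in many plain-text files. Some files use different wording, so treat the patterns as a starting point rather than a complete parser.

After stripping, it splits the body into paragraphs, which could each become a record.

```python
import re

# Assumed marker lines; real files vary, so adjust these patterns as needed.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END markers, or the whole text if absent."""
    start = START_RE.search(text)
    body_start = start.end() if start else 0
    # Only look for the END marker after the START marker.
    end = END_RE.search(text, body_start)
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
```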

Photography

https://github.com/pixelastic/landscapes-data/tree/master/data/earthporn

All landscape photography published on r/earthporn up until Dec 12th, 2022. Includes popularity and LQIP (low quality image placeholder).

Public APIs

https://github.com/public-apis-dev/public-apis
https://publicapis.dev/

A collaborative list of public APIs for developers. Be careful not to use the deprecated public-apis/public-apis repository (see this issue for details).

Vogue

http://dh.library.yale.edu/projects/vogue/

All issues of Vogue magazine, from 1892 to 2016, including covers and pages. About 6TB of data.

Wikipedia

https://dumps.wikimedia.org/

Downloadable extracts of Wikipedia, in Wikitext with metadata as embedded XML.
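Dumps are large, so streaming is usually preferable to loading the whole XML tree. This sketch uses `xml.etree.ElementTree.iterparse` to yield one `{"title", "text"}` dict per `<page>`. It ignores the XML namespace, whose export schema version differs between dumps.

It assumes a single-revision dump such as `pages-articles`. With multiple revisions, the last `<text>` seen wins. The Wikitext is returned raw, without any markup parsing.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix so the code works across export schema versions."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: IO[bytes]) -> Iterator[dict[str, str]]:
    """Stream a MediaWiki XML dump and yield one record per <page>."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        record = {"title": "", "text": ""}
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                record["title"] = child.text or ""
            elif name == "text":  # raw Wikitext of the revision
                record["text"] = child.text or ""
        yield record
        # Release the finished page's children to keep memory usage bounded.
        elem.clear()
```

For a compressed dump, `bz2.open(path, "rb")` can be passed as the source.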
