Knowledge graph of the English language between 1800 and 2019 using open source data, Python 3, and ffmpeg.


GraphTechnologyDevelopers/english-words-knowledge-graph

Watch the entire English language blossom from Wiktionary + Google Books N-grams, rendered as a living, breathing prefix galaxy.

How this repo is put together

  • Zero-config takeover – ./setup.sh spins up the virtualenv, fetches every dataset, caches the heavy lifts, and ships final MP4/GIF output.
  • Radial growth cinematics – the trie erupts from the core alphabet, framing decades of linguistic evolution as a neon fractal.
  • Repeatable science – every artifact (lemmata, first-year inference, trie counts, layouts) checkpoints to disk and into a reusable tarball for instant re-renders.
  • Battle-tested – streams 26 full 1-gram shards, handles 1.4GB Wiktionary dumps, and renders 220 frames in glorious 1080p.

Share it, remix it, drop it in your next data-viz thread.

Quickstart

cd /Users/grey/Projects/graph-visualizations
bash setup.sh

The script will:

  1. Create/upgrade venv/ with Python 3.
  2. Download Wiktionary + Google Books 1-gram shards (a-z).
  3. Extract English lemmas, infer first-use years, aggregate prefix counts.
  4. Render 220 radial frames (outputs/frames/frame-0000.png to frame-0219.png).
  5. Encode outputs/english_trie_timelapse.mp4 and a share-ready GIF.

Rerun the script anytime; artifact caching means future passes jump straight to rendering.
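The caching behaviour described above can be pictured with a small sketch. This is not the repo's actual code: `run_stage` and its signature are hypothetical, and the only assumption is that a stage is skipped when its output file already exists on disk.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_stage(output: Path, build: Callable[[Path], None]) -> bool:
    """Run `build` only when `output` is missing; return True if it ran.

    Hypothetical helper for illustration, not part of the repo.
    """
    if output.exists():
        return False  # cached artifact found, skip the work
    output.parent.mkdir(parents=True, exist_ok=True)
    build(output)
    return True


# Demo in a throwaway directory: the second call hits the cache.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "artifacts" / "lemmas" / "lemmas.tsv"
    first = run_stage(target, lambda p: p.write_text("word\n"))
    second = run_stage(target, lambda p: p.write_text("word\n"))

print(first, second)
```

Checking for the output file is the simplest possible cache key; it does not notice when inputs change, so deleting the artifact is the way to force a rebuild under this scheme.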

Anatomy

| Stage | Script | Output |
| --- | --- | --- |
| Lemma extraction | src/ingest/wiktionary_extract.py | artifacts/lemmas/lemmas.tsv |
| First-year inference | src/ingest/ngram_first_year.py | artifacts/years/first_years.tsv |
| Prefix aggregation | src/build/build_prefix_trie.py | artifacts/trie/prefix_counts.jsonl |
| Layout generation | src/viz/layout.py | artifacts/layout/prefix_positions.json (legacy back-compat) |
| Frame rendering | src/viz/render_frames.py | outputs/frames/ |
| Encoding | src/viz/encode.py | outputs/english_trie_timelapse.mp4 + .gif |
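To give a feel for the prefix aggregation stage, here is a minimal sketch that counts, per year, how many newly seen words fall under each prefix. It is an illustration only: the function name, the `max_depth` cutoff, and the JSON field names are assumptions, and the real schema of prefix_counts.jsonl may differ.

```python
import json
from collections import Counter


def prefix_counts_by_year(first_years, max_depth=3):
    """Count, per (year, prefix), the words first seen in that year.

    `first_years` is an iterable of (word, year) pairs. Hypothetical
    helper; the repo's build_prefix_trie.py may work differently.
    """
    counts = Counter()
    for word, year in first_years:
        # Every prefix of the word up to max_depth gets one more member.
        for depth in range(1, min(len(word), max_depth) + 1):
            counts[(year, word[:depth])] += 1
    return counts


# Made-up sample data, only to exercise the function.
sample = [("graph", 1878), ("grape", 1800), ("gram", 1800)]
counts = prefix_counts_by_year(sample, max_depth=3)

# Emit one JSON object per line, in the spirit of a .jsonl artifact.
for (year, prefix), n in sorted(counts.items()):
    print(json.dumps({"year": year, "prefix": prefix, "count": n}))
```

Keeping counts keyed by year lets a renderer accumulate them frame by frame, which is one plausible way to drive a growth animation.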

Render Only (after initial run)

source venv/bin/activate
python -m src.viz.render_frames artifacts/trie/prefix_counts.jsonl outputs/frames
python -m src.viz.encode outputs/frames outputs/english_trie_timelapse.mp4 outputs/english_trie_timelapse.gif

Use flags such as --min-radius, --max-radius, --base-edge-alpha, or --start-progress to tune the vibe.

Neo4j Playground (Optional)

Load artifacts/years/first_years.tsv to explore in Neo4j (Community & Enterprise safe):

:param batch => $rows;
UNWIND $rows AS row
WITH row
WHERE row.word IS NOT NULL AND row.word <> ""
MERGE (w:Word {text: row.word})
SET w.first_year = CASE WHEN row.first_year = "" THEN NULL ELSE toInteger(row.first_year) END;
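The query expects a `rows` parameter holding one map per word. A minimal sketch of building that list in Python follows, assuming the TSV has a header row with `word` and `first_year` columns (inferred from the field names in the query, not confirmed by the repo); `load_rows` is a hypothetical helper.

```python
import csv
import io


def load_rows(handle):
    """Read a tab-separated file into a list of {word, first_year} dicts.

    Assumes a header row naming the columns `word` and `first_year`.
    Missing years are kept as "" so the Cypher CASE can map them to NULL.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {"word": r["word"], "first_year": r["first_year"] or ""}
        for r in reader
    ]


# Made-up in-memory sample standing in for first_years.tsv.
sample = io.StringIO("word\tfirst_year\nabacus\t1800\nzeitgeist\t\n")
rows = load_rows(sample)
print(rows)
```

The resulting list can be handed to whichever Neo4j client is in use as the value of the `rows` parameter; for large files, sending it in slices keeps each transaction small.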

Documentation

Full documentation is available at the project documentation site.

To run the documentation site locally:

cd docs
bundle install --path vendor/bundle
bundle exec jekyll serve --baseurl ""

Visit http://localhost:4000 to view the site locally.

Share-Worthy Ideas

  • Drop the GIF in language history threads (#linguistics #dataart).
  • Remix the radial layout with alternative color ramps or depth cutoffs.
  • Pair the timelapse with poetry readings for maximum feels.

Credits

  • Wiktionary community & Google Books N-gram team for open data.
  • You, for showing the world how beautifully language grows.

Community

For more open source software and content on Knowledge Graphs, GNNs, and Graph Databases, join our community on X!

