GraphTechnologyDevelopers/english-words-knowledge-graph


Knowledge graph of the English language between 1800 and 2019 using open source data, Python 3, and ffmpeg.

graphtechnologydevelopers.github.io/english-words-knowledge-graph/

Folders and files

.github/workflows
docs
neo4j/cypher
src
.gitignore
README.md
requirements.txt
setup.sh


English Lexicon Time Machine

Watch the entire English language blossom from Wiktionary + Google Books N-grams, rendered as a living, breathing prefix galaxy.

How this repo is put together

Zero-config takeover – ./setup.sh spins up the virtualenv, fetches every dataset, caches the heavy lifts, and ships final MP4/GIF output.
Radial growth cinematics – the trie erupts from the core alphabet, framing decades of linguistic evolution as a neon fractal.
Repeatable science – every artifact (lemmata, first-year inference, trie counts, layouts) checkpoints to disk and into a reusable tarball for instant re-renders.
Battle-tested – streams 26 full 1-gram shards, handles 1.4 GB Wiktionary dumps, and renders 220 frames in glorious 1080p.

Share it, remix it, drop it in your next data-viz thread.

Quickstart

cd /Users/grey/Projects/graph-visualizations
bash setup.sh

The script will:

Create/upgrade venv/ with Python 3.
Download Wiktionary + Google Books 1-gram shards (a–z).
Extract English lemmas, infer first-use years, aggregate prefix counts.
Render 220 radial frames (outputs/frames/frame-0000.png → frame-0219.png).
Encode outputs/english_trie_timelapse.mp4 and a share-ready GIF.
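To picture the first-use inference step, imagine scanning yearly counts for each word and keeping the earliest year that clears a noise threshold. The sketch below is an illustration only: the function name `infer_first_years`, the `(word, year, count)` row shape, the lowercasing, and the `min_count` threshold are all assumptions, not the actual behavior of `src/ingest/ngram_first_year.py`.

```python
from typing import Dict, Iterable, Tuple


def infer_first_years(
    rows: Iterable[Tuple[str, int, int]],
    min_count: int = 1,
) -> Dict[str, int]:
    """Return the earliest year in which each word reaches min_count matches.

    rows is a hypothetical, simplified view of 1-gram data: (word, year, count).
    """
    first: Dict[str, int] = {}
    for word, year, count in rows:
        if count < min_count:
            continue  # assumed heuristic: treat very rare hits as noise
        key = word.lower()  # assumed normalization: fold case variants together
        if key not in first or year < first[key]:
            first[key] = year
    return first


sample = [
    ("telephone", 1849, 2),
    ("telephone", 1835, 1),
    ("Telephone", 1876, 40),
    ("radio", 1907, 12),
]
print(infer_first_years(sample, min_count=2))
```

Raising `min_count` trades recall for robustness: a single stray match in an early scan no longer sets a word's first year.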

Rerun the script anytime. Artifact caching means future passes jump straight to rendering.
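One common way to get this kind of caching is to skip a stage whenever its output file already exists. The sketch below shows that pattern under stated assumptions: `run_stage`, its `force` flag, and the temporary-file rename are hypothetical and are not taken from `setup.sh` or the scripts in `src/`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_stage(output: Path, build: Callable[[Path], None], force: bool = False) -> bool:
    """Run build() unless output already exists; return True if the stage ran."""
    if output.exists() and not force:
        return False  # cached artifact found, skip the expensive work
    output.parent.mkdir(parents=True, exist_ok=True)
    tmp = output.with_name(output.name + ".tmp")
    build(tmp)           # write to a temporary file first
    tmp.replace(output)  # rename so a crash never leaves a half-written artifact
    return True


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "artifacts" / "lemmas" / "lemmas.tsv"

    def write_lemmas(path: Path) -> None:
        path.write_text("word\n")  # stand-in for the real extraction

    print(run_stage(target, write_lemmas))  # first pass builds the artifact
    print(run_stage(target, write_lemmas))  # second pass reuses it
```

Writing to a temporary name and renaming at the end keeps an interrupted run from being mistaken for a finished artifact on the next pass.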

Anatomy

Stage	Script	Output
Lemma extraction	`src/ingest/wiktionary_extract.py`	`artifacts/lemmas/lemmas.tsv`
First-year inference	`src/ingest/ngram_first_year.py`	`artifacts/years/first_years.tsv`
Prefix aggregation	`src/build/build_prefix_trie.py`	`artifacts/trie/prefix_counts.jsonl`
Layout generation	`src/viz/layout.py`	`artifacts/layout/prefix_positions.json` (legacy back-compat)
Frame rendering	`src/viz/render_frames.py`	`outputs/frames/`
Encoding	`src/viz/encode.py`	`outputs/english_trie_timelapse.mp4` + `.gif`
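As a rough picture of the prefix aggregation stage, the sketch below counts how many words share each prefix, grouped by first-use year, and emits one JSON object per line. The record fields (`prefix`, `year`, `count`), the `max_depth` cutoff, and both function names are assumptions for illustration; the real schema of `artifacts/trie/prefix_counts.jsonl` may differ.

```python
import json
from collections import Counter
from typing import Dict, Iterator


def prefix_counts(first_years: Dict[str, int], max_depth: int = 3) -> Counter:
    """Count words per (prefix, first_year) for prefixes up to max_depth letters."""
    counts: Counter = Counter()
    for word, year in first_years.items():
        for depth in range(1, min(len(word), max_depth) + 1):
            counts[(word[:depth], year)] += 1
    return counts


def to_jsonl(counts: Counter) -> Iterator[str]:
    """Yield one JSON record per (prefix, year) pair, in sorted order."""
    for (prefix, year), n in sorted(counts.items()):
        yield json.dumps({"prefix": prefix, "year": year, "count": n})


words = {"tea": 1800, "team": 1850, "ten": 1800}
for line in to_jsonl(prefix_counts(words)):
    print(line)
```

Keeping counts keyed by year lets a renderer accumulate them frame by frame instead of rescanning the word list.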

Render Only (after initial run)

source venv/bin/activate
python -m src.viz.render_frames artifacts/trie/prefix_counts.jsonl outputs/frames
python -m src.viz.encode outputs/frames outputs/english_trie_timelapse.mp4 outputs/english_trie_timelapse.gif

Use flags such as --min-radius, --max-radius, --base-edge-alpha, or --start-progress to tune the vibe.
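To build intuition for what a minimum and maximum radius could control in a radial trie layout, here is a minimal sketch that places a prefix by depth (radius) and alphabetical position (angle). It is a guess at the general technique, not the logic in `src/viz/layout.py`: `polar_position`, its defaults, and the base-26 angle scheme are hypothetical, and the real flags may behave differently.

```python
import math
from typing import Tuple


def polar_position(
    prefix: str,
    max_depth: int,
    min_radius: float = 0.1,
    max_radius: float = 1.0,
) -> Tuple[float, float]:
    """Return (x, y) for a non-empty lowercase a-z prefix on a radial layout."""
    depth = len(prefix)
    # Radius grows linearly with depth, from min_radius up to max_radius.
    t = (depth - 1) / max(max_depth - 1, 1)
    radius = min_radius + t * (max_radius - min_radius)
    # Read the prefix as a base-26 fraction so children stay inside
    # the angular wedge of their parent prefix.
    fraction = 0.0
    for i, ch in enumerate(prefix, start=1):
        fraction += (ord(ch) - ord("a")) / 26 ** i
    fraction += 0.5 / 26 ** depth  # centre the node within its own wedge
    angle = 2 * math.pi * fraction
    return radius * math.cos(angle), radius * math.sin(angle)


for p in ("a", "ab", "zzz"):
    x, y = polar_position(p, max_depth=3)
    print(p, round(x, 3), round(y, 3))
```

With this scheme, widening the gap between the two radii spreads deep prefixes further from the core alphabet ring.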

Neo4j Playground (Optional)

Load artifacts/years/first_years.tsv to explore in Neo4j (Community & Enterprise safe):

:param batch => $rows;
UNWIND $rows AS row
WITH row
WHERE row.word IS NOT NULL AND row.word <> ""
MERGE (w:Word {text: row.word})
SET w.first_year = CASE WHEN row.first_year = "" THEN NULL ELSE toInteger(row.first_year) END;
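The query expects a list of rows, each with `word` and `first_year` fields. The sketch below shows one way to build that list from a tab-separated file using only the standard library. It assumes the TSV has a header row with columns named `word` and `first_year` (inferred from the query, not confirmed by the repo), and `load_rows` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated file with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        # Missing values become "" so the CASE branch in the query maps them to NULL.
        {"word": r.get("word") or "", "first_year": r.get("first_year") or ""}
        for r in reader
    ]


# In practice the handle would come from open("artifacts/years/first_years.tsv").
demo = io.StringIO("word\tfirst_year\ntelephone\t1849\nzyzzyva\t\n")
rows = load_rows(demo)
print(rows)
```

The resulting list can then be supplied as the `rows` parameter through whichever Neo4j client you use.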

Documentation

Full documentation is available at the project documentation site.

To run the documentation site locally:

cd docs
bundle install --path vendor/bundle
bundle exec jekyll serve --baseurl ""

Visit http://localhost:4000 to view the site locally.

Share-Worthy Ideas

Drop the GIF in language history threads (#linguistics #dataart).
Remix the radial layout with alternative color ramps or depth cutoffs.
Pair the timelapse with poetry readings for maximum feels.

Credits

Wiktionary community & Google Books N-gram team for open data.
You, for showing the world how beautifully language grows.

Community

For more open source software and content on Knowledge Graphs, GNNs, and Graph Databases, join our community on X!
