This repository was archived by the owner on May 28, 2025. It is now read-only.

saattrupdan/autopoet (public archive)

Build poems from text sources

Folders and files

History: 133 commits

data
.gitignore
README.md
core.py
process_gutsyls.ipynb
requirements.txt
sents-to-small-sents.ipynb
syllablecounter.py
test_model.ipynb
trumptweets.py

AutoPoet

Build poems from text sources.

Todos

Fetch Trump tweet data
Generate vocabulary from Trump tweets
Build baseline model to count syllables in English words
Optimise syllable counter
Use model to build Haikus from Trump tweets
Build progressive web app that generates poems
Enable working with live tweets
Enable working with other text sources

Syllable model

A large part of this project was to develop a model that counts syllables in English words.

The syllable counter is trained on (a slightly modified version of) the Gutenberg syllable corpus, consisting of ~170,000 English words split into syllables. The process_gutsyls notebook converts these into a format which is more convenient for our purposes. The raw dataset can be freely downloaded here, and the preprocessed versions used for this project can be found here.
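
As a rough illustration of what such a conversion could look like, the sketch below turns a syllable-split word into per-character labels. The separator (`-`), the function name `syllable_start_labels` and the label scheme (1 for a character that begins a syllable, 0 otherwise) are assumptions made for this example; the notebook itself may use a different format.

```python
from typing import List, Tuple


def syllable_start_labels(split_word: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Turn a syllable-split word into (plain word, per-character labels).

    A label of 1 marks a character that begins a syllable, 0 any other
    character. Both the separator and this label scheme are assumptions.
    """
    chars: List[str] = []
    labels: List[int] = []
    starts_syllable = True  # the first character always opens a syllable
    for ch in split_word:
        if ch == sep:
            starts_syllable = True  # the next character opens a new syllable
            continue
        chars.append(ch)
        labels.append(1 if starts_syllable else 0)
        starts_syllable = False
    return "".join(chars), labels


word, labels = syllable_start_labels("syl-la-ble")
print(word, labels, sum(labels))  # syllable [1, 0, 0, 1, 0, 1, 0, 0] 3
```

With labels of this kind, the number of syllables in a word is simply the sum of its labels, which fits a model that outputs one number per character.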

The model is a recurrent neural network that works at the character level, with the following rough architecture:

1. Embed the characters into 64-dimensional vectors
2. Process the characters through three bidirectional GRU layers, each having 2x128 = 256 hidden units
3. Process the GRU outputs through a time-distributed dense layer with 256 hidden units followed by a ReLU activation
4. Finally project the outputs from the dense layer down to a single dimension across time, outputting a sequence of real numbers between 0 and 1, of the same length as we started with
5. To get the syllable count, we sum up the probabilities and round to the nearest integer
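
To make the data flow concrete, the layers listed above can be traced as a sequence of tensor shapes. This is only a bookkeeping sketch in plain Python, not the actual model code: the function name `trace_shapes`, the stage names and the batch-first layout are assumptions for illustration, while the sizes 64, 128 and 256 are the ones given above.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def trace_shapes(batch_size: int, seq_len: int, embed_dim: int = 64,
                 gru_units: int = 128, dense_units: int = 256) -> List[Tuple[str, Shape]]:
    """List the (assumed batch-first) shape after each stage of the model."""
    bidirectional = 2 * gru_units  # forward and backward states are concatenated
    stages: List[Tuple[str, Shape]] = [
        ("character ids", (batch_size, seq_len)),
        ("embedding", (batch_size, seq_len, embed_dim)),
    ]
    for i in range(1, 4):  # three stacked bidirectional GRU layers
        stages.append((f"bidirectional GRU {i}", (batch_size, seq_len, bidirectional)))
    stages.append(("time-distributed dense + ReLU", (batch_size, seq_len, dense_units)))
    stages.append(("per-character probability", (batch_size, seq_len)))
    stages.append(("syllable count (sum over time, rounded)", (batch_size,)))
    return stages


# An 8-character word such as "syllable", processed on its own
for name, shape in trace_shapes(batch_size=1, seq_len=8):
    print(f"{name:45s} {shape}")
```

The sequence length is preserved all the way to the per-character probabilities, which is what allows the final step to sum over characters.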

To get a more detailed view of the model's architecture, see syllablecounter.py, and check out core.py for an idea of how the model is trained.

This model currently achieves a 96.89% validation accuracy.

The reason we sum up (aggregates of) the probabilities in point (5), rather than rounding each probability first, is to handle the situation where the model is unsure which of two consecutive characters begins a new syllable. Both characters will then have probabilities of roughly 50%, so together they contribute a single syllable to the sum rather than two.
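
A small numerical sketch shows the difference between the two strategies. The probabilities below are made up for illustration and are not outputs of the trained model; the function names are likewise hypothetical.

```python
import math
from typing import Sequence


def count_by_summing(probs: Sequence[float]) -> int:
    """Sum the per-character probabilities, then round half up."""
    # math.floor(x + 0.5) is used because Python's round() rounds halves to even
    return math.floor(sum(probs) + 0.5)


def count_by_rounding_first(probs: Sequence[float]) -> int:
    """Threshold each probability at 0.5 first, then count the hits."""
    return sum(1 for p in probs if p >= 0.5)


# Hypothetical outputs for a five-character, two-syllable word where the
# model cannot decide whether the second syllable starts at character 3 or 4.
slightly_below = [0.98, 0.03, 0.45, 0.47, 0.02]
slightly_above = [0.98, 0.03, 0.55, 0.53, 0.02]

for probs in (slightly_below, slightly_above):
    print(count_by_summing(probs), count_by_rounding_first(probs))
# prints "2 1" and then "2 3"
```

Summing gives two syllables in both cases, whereas rounding first undercounts when both uncertain values fall just below 0.5 and overcounts when both fall just above it.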
