saattrupdan/autopoet


Build poems from text sources.

Todos

  • Fetch Trump tweet data
  • Generate vocabulary from Trump tweets
  • Build baseline model to count syllables in English words
  • Optimise syllable counter
  • Use model to build Haikus from Trump tweets
  • Build progressive web app that generates poems
  • Enable working with live tweets
  • Enable working with other text sources

Syllable model

A large part of this project was to develop a model that counts syllables in English words.

The syllable counter is trained on a (slightly modified) version of the Gutenberg syllable corpus, which consists of ~170,000 English words split into syllables. The process_gutsyls notebook converts these into a format that is more convenient for our purposes. The raw dataset can be freely downloaded here, and the preprocessed versions used for this project can be found here.
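Because the model outputs one number per character, each word needs a per-character target. The sketch below shows one plausible encoding, assuming (hypothetically) that syllables are separated by hyphens, as in "syl-la-ble". The real corpus and the process_gutsyls notebook may use a different separator and layout; `syllable_targets` is an illustrative helper, not part of the project.

```python
# Hypothetical input format: syllables separated by "-", e.g. "syl-la-ble".
# The actual Gutenberg corpus / process_gutsyls output may differ.


def syllable_targets(hyphenated: str, sep: str = "-") -> tuple[str, list[int]]:
    """Return the plain word and one 0/1 label per character,
    where 1 marks the first character of a syllable."""
    syllables = [s for s in hyphenated.split(sep) if s]
    word = "".join(syllables)
    labels: list[int] = []
    for syllable in syllables:
        labels.append(1)                           # first character starts a syllable
        labels.extend([0] * (len(syllable) - 1))   # the rest of the syllable does not
    return word, labels


word, labels = syllable_targets("syl-la-ble")
print(word, labels)  # syllable [1, 0, 0, 1, 0, 1, 0, 0]
```

With this encoding the sum of the labels equals the syllable count, which matches how the count is read off the model's outputs.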

The model is a recurrent neural network that works at the character level, with the following rough architecture:

  1. Embed the characters into 64-dimensional vectors
  2. Process the characters through three bidirectional GRU layers, each having 2x128 = 256 hidden units
  3. Process the GRU outputs through a time-distributed dense layer with 256 hidden units, followed by a ReLU activation
  4. Finally, project the dense layer's output at each time step down to a single dimension, giving a sequence of real numbers between 0 and 1 with the same length as the input word: one probability per character that it begins a new syllable
  5. To get the syllable count, sum up these probabilities and round to the nearest integer
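To give a feel for the size of the network described above, here is a rough parameter count. It is a sketch under stated assumptions: the character vocabulary size (`VOCAB_SIZE = 30`) is hypothetical, and each GRU gate is assumed to have two bias vectors, which is the convention of PyTorch's `nn.GRU` and of Keras' `GRU` with `reset_after=True`. The project's own framework settings may give slightly different totals.

```python
# Rough parameter count for the architecture in steps 1-4.
# Assumptions: VOCAB_SIZE is hypothetical; GRUs use two bias vectors per gate.
VOCAB_SIZE = 30    # hypothetical alphabet size, not stated in the project
EMBED_DIM = 64
GRU_UNITS = 128    # per direction, so 2 x 128 = 256 outputs per layer
GRU_LAYERS = 3
DENSE_UNITS = 256


def gru_params(input_dim: int, units: int) -> int:
    """Parameters of one unidirectional GRU layer (3 gates)."""
    return 3 * (units * (input_dim + units) + 2 * units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


def count_parameters() -> dict[str, int]:
    counts = {"embedding": VOCAB_SIZE * EMBED_DIM}
    input_dim = EMBED_DIM
    for layer in range(GRU_LAYERS):
        # Bidirectional: a forward and a backward GRU, outputs concatenated.
        counts[f"bigru_{layer + 1}"] = 2 * gru_params(input_dim, GRU_UNITS)
        input_dim = 2 * GRU_UNITS
    # Time-distributed layers share weights across time steps,
    # so these counts do not depend on word length.
    counts["dense"] = dense_params(input_dim, DENSE_UNITS)
    counts["output"] = dense_params(DENSE_UNITS, 1)
    return counts


counts = count_parameters()
for name, n in counts.items():
    print(f"{name:10s} {n:>9,d}")
print(f"{'total':10s} {sum(counts.values()):>9,d}")
```

Under these assumptions the three bidirectional GRU layers hold about 742k of the roughly 810k parameters, so the recurrent stack dominates the model's size.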

For a more detailed view of the model's architecture, see syllablecounter.py, and check out core.py for an idea of how the model is trained.

This model currently achieves a 96.89% validation accuracy.

The reason we sum up (aggregates of) the probabilities in point (5), rather than first rounding each probability, is to handle the situation where the model is unsure which of two consecutive characters begins a new syllable. Both characters will then get probabilities of around 50%, which together add up to a single syllable rather than two.
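A small worked comparison makes the argument concrete. The probabilities below are made up for illustration, and "rounding first" is taken to mean thresholding each probability at 0.5; neither function is taken from the project's code.

```python
# Two ways to turn per-character "begins a syllable" probabilities into a count.
# The probability values are invented for illustration only.


def count_sum_then_round(probs: list[float]) -> int:
    """Point (5): add the probabilities, then round once."""
    return round(sum(probs))


def count_round_then_sum(probs: list[float]) -> int:
    """Alternative: threshold each probability at 0.5, then count."""
    return sum(1 for p in probs if p >= 0.5)


# Model unsure whether the second syllable starts at character 3 or 4.
unsure_high = [0.98, 0.02, 0.51, 0.52, 0.03]
unsure_low = [0.98, 0.02, 0.49, 0.48, 0.03]

print(count_sum_then_round(unsure_high))  # 2
print(count_round_then_sum(unsure_high))  # 3: the boundary is counted twice
print(count_sum_then_round(unsure_low))   # 2
print(count_round_then_sum(unsure_low))   # 1: the boundary is lost entirely
```

Summing first lets two half-confident neighbours share one syllable between them, whereas rounding first either double-counts or drops that boundary depending on which side of 0.5 the probabilities fall.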
