eternagame/EternaBrainPublic


Deep learning to solve RNA design puzzles

License

MIT license


Folders and files

draw_rna
graphs
rna-prediction
tests
updates
.gitattributes
.gitignore
LICENSE.md
README.md
config-ViennaRNA.md
eternabrain_logo.png
eternabrain_logo.svg
eternamoves-select.zip
experts.txt
requirements.txt
sql-exploration.txt
table4.txt
test_table_2_4.md


EternaBrain

Using Eterna data to understand and predict how players solve RNA folding puzzles.

These data are move sets graciously donated by Eterna players to accelerate scientific research into RNA design
Neural networks to learn how top players solve Eterna puzzles and to predict solutions to RNA folding puzzles
Unsupervised learning to group Eterna players based on their style of solving RNA folding puzzles

Author

Rohan Koodli

Notes

Link to paper on PLoS Computational Biology, link to cover art.
EternaBrain 1.2 is the version referenced in the above paper, and supports Python 2. EternaBrain 2.0 supports both Python 2 and 3.
Many thanks to the contributing authors who provided guidance, ran benchmarks, and did testing: Benjamin Keep, Katherine Coppess, Fernando Portela, and Rhiju Das.
This software is freely available for non-commercial use. Royalties for commercial use will come back to fund research on Eterna, administered by Stanford University. Please see the license.

Benchmarks

Eterna100

61/100 puzzles solved

Dependencies

Python: numpy, tensorflow, pandas, seaborn, matplotlib, scikit-learn

Conda: viennarna (run conda install -c bioconda viennarna; afterwards, python -c "import RNA" should run without any errors)

RNAfold version 1.8.5 from ViennaRNA (see config-ViennaRNA.md for installation instructions)

R: ggplot2, reshape2

Using a pretrained model

Configure RNAfold and enter the correct path to Vienna 1.8.5 in the path field in predict_pm.py, then run python predict_pm.py "<valid structure in dot-bracket notation>".
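predict_pm.py expects a valid dot-bracket string. The sketch below is a minimal, hypothetical pre-check (the helpers pair_table and is_valid_dot_bracket are not part of EternaBrain); it only verifies the allowed characters and bracket balance, not whether the structure is thermodynamically designable.

```python
def pair_table(dot_bracket):
    """Map each position to the index of its base-pairing partner (-1 if unpaired).

    Raises ValueError on characters other than '(', ')' and '.', or on
    unbalanced parentheses. (Hypothetical helper, not part of EternaBrain.)
    """
    partners = [-1] * len(dot_bracket)
    stack = []  # positions of '(' still waiting for their ')'
    for i, ch in enumerate(dot_bracket):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            if not stack:
                raise ValueError("unmatched ')' at position {}".format(i))
            j = stack.pop()
            partners[i], partners[j] = j, i
        elif ch != '.':
            raise ValueError("invalid character {!r} at position {}".format(ch, i))
    if stack:
        raise ValueError("unmatched '(' at position {}".format(stack[-1]))
    return partners


def is_valid_dot_bracket(dot_bracket):
    """True if dot_bracket is non-empty and syntactically well formed."""
    if not dot_bracket:
        return False
    try:
        pair_table(dot_bracket)
    except ValueError:
        return False
    return True


print(pair_table('((..))'))         # [5, 4, -1, -1, 1, 0]
print(is_valid_dot_bracket('(()'))  # False
```

A check like this rejects malformed input before RNAfold is ever invoked; it does not enforce folding constraints such as a minimum hairpin loop size.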

Generating your own data and CNNs

Step 1: Generate the training data

The following steps curate a subset of the training data, "eternamoves-select", which trains an effective CNN move predictor with reasonable test accuracy.

Selecting expert solutions

Go to experts.py and modify the variables content and uidList. content holds the IDs of the puzzles you want movesets for, and uidList holds the user IDs of the players you want movesets from. You can specify these manually or use helper functions to retrieve them: getPid() retrieves all single-state puzzles, and experience() retrieves all players with experience above a given threshold.

Example:

    content = getPid()          # all single-state puzzles
    uidList = experience(3000)  # the top 70 experts, or the top 1 percent of all players
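The README does not show how experience() is implemented. Conceptually it filters players by an experience threshold; the sketch below illustrates that filtering on an in-memory mapping, assuming a hypothetical dict of user ID to experience points. The real function retrieves its data from Eterna, which is not reproduced here, and select_experts is an illustrative name only.

```python
def select_experts(player_experience, threshold):
    """Return IDs of players whose experience is above threshold, most experienced first.

    player_experience: hypothetical dict mapping user ID -> experience points.
    """
    experts = [uid for uid, exp in player_experience.items() if exp > threshold]
    return sorted(experts, key=lambda uid: player_experience[uid], reverse=True)


# Usage with made-up numbers (not real Eterna data):
players = {1: 5000, 2: 200, 3: 3500, 4: 3000}
print(select_experts(players, 3000))  # [1, 3]; player 4 is not above the threshold
```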

Or, if you want fewer puzzles and more experts, you can read in teaching-puzzle-ids.txt, which contains 92 key puzzles:

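    import os

    # Read the IDs of the 92 key teaching puzzles, one per line
    with open(os.getcwd() + '/movesets/teaching-puzzle-ids.txt') as f:
        progression = f.readlines()
    progression = [x.strip() for x in progression]    # drop trailing newlines
    progression = [int(x) for x in progression if x]  # convert to integer puzzle IDs, skipping blank lines
    # Add further puzzle IDs to the training set
    progression.extend([6502966, 6502968, 6502973, 6502976, 6502984, 6502985, 6502993,
                        6502994, 6502995, 6502996, 6502997, 6502998, 6502999, 6503000])
    content = progression
    uidList = experience(1000)  # players with experience above 1000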

Selecting the fastest solutions

Go to fastest.py and modify content and max_moves. content takes the same inputs as above, and max_moves is an integer specifying the maximum number of moves a solution may have.

Example:

    content = getPid()  # all single-state puzzles
    max_moves = 30      # all solutions in under 30 moves
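The selection fastest.py performs can be pictured as a filter on move count. The sketch below is an illustration under assumed data layouts, not the script's actual code: solutions is a hypothetical dict keyed by (user ID, puzzle ID) with a list of moves as the value, each move assumed to be a (position, base) tuple. "Under max_moves" is read as a strict comparison, which fastest.py may or may not use.

```python
def select_fastest(solutions, max_moves):
    """Keep only solutions that use fewer than max_moves moves.

    solutions: hypothetical dict mapping (user ID, puzzle ID) -> list of moves,
    where each move is assumed to be a (position, base) tuple.
    """
    return {key: moves for key, moves in solutions.items() if len(moves) < max_moves}


def fewest_moves_per_puzzle(solutions):
    """For each puzzle ID, return the (user ID, moves) pair with the shortest move list."""
    best = {}
    for (uid, pid), moves in solutions.items():
        if pid not in best or len(moves) < len(best[pid][1]):
            best[pid] = (uid, moves)
    return best


# Made-up example: two solutions to puzzle 10, one to puzzle 11
sols = {(1, 10): [(0, 'G')] * 5, (2, 10): [(3, 'C')] * 40, (3, 11): [(1, 'U')] * 12}
print(sorted(select_fastest(sols, 30)))  # [(1, 10), (3, 11)]
```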

Step 2: Training the convolutional neural network (CNN)

EternaBrain uses a convolutional neural network (CNN). Run both baseCNN.py (the base predictor) and locationCNN.py (the location predictor). Specify the path and name of your pickled data files here:

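    import os
    import pickle

    real_X, real_y, pids = [], [], []
    for pid in content:
        try:
            # Features and labels for each puzzle are pickled in separate files
            feats = pickle.load(open(os.getcwd() + '/pickles/X-exp-loc-' + str(pid), 'rb'))
            ybase = pickle.load(open(os.getcwd() + '/pickles/y-exp-base-' + str(pid), 'rb'))
            yloc = pickle.load(open(os.getcwd() + '/pickles/y-exp-loc-' + str(pid), 'rb'))
            # Append each move's location label to its feature vector
            for i in range(len(feats)):
                feats[i].append(yloc[i])
            real_X.extend(feats)
            real_y.extend(ybase)
            pids.append(feats)
        except IOError:
            continue  # skip puzzles with no pickled data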
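The loop above reads three pickle files per puzzle ID. If you generate your own data, a minimal sketch for writing files under the same naming scheme and reading them back might look like this. The helper names are hypothetical, and the data layout (feats as a list of per-move feature lists, ybase and yloc as one label per move) is inferred from the loop rather than documented.

```python
import os
import pickle
import tempfile

# File-name prefixes taken from the README's loading loop; everything else is illustrative.
PREFIXES = ('X-exp-loc-', 'y-exp-base-', 'y-exp-loc-')


def save_puzzle_pickles(directory, pid, feats, ybase, yloc):
    """Write feats, ybase and yloc to <directory>/<prefix><pid>."""
    for prefix, obj in zip(PREFIXES, (feats, ybase, yloc)):
        with open(os.path.join(directory, prefix + str(pid)), 'wb') as f:
            pickle.dump(obj, f)


def load_training_data(directory, pids):
    """Mirrors the README loop (without the pids list): append location labels to features."""
    real_X, real_y = [], []
    for pid in pids:
        try:
            loaded = []
            for prefix in PREFIXES:
                with open(os.path.join(directory, prefix + str(pid)), 'rb') as f:
                    loaded.append(pickle.load(f))
        except IOError:
            continue  # no pickled data for this puzzle
        feats, ybase, yloc = loaded
        for i in range(len(feats)):
            feats[i].append(yloc[i])
        real_X.extend(feats)
        real_y.extend(ybase)
    return real_X, real_y


if __name__ == '__main__':
    tmp = tempfile.mkdtemp()
    # Made-up example: two moves with two features each
    save_puzzle_pickles(tmp, 42, [[0.1, 0.2], [0.3, 0.4]], [1, 3], [5, 7])
    X, y = load_training_data(tmp, [42, 99])  # puzzle 99 has no files and is skipped
    print(X)  # [[0.1, 0.2, 5], [0.3, 0.4, 7]]
    print(y)  # [1, 3]
```

Using with blocks closes each file promptly, which the one-line pickle.load(open(...)) calls in the loop leave to the garbage collector.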

Specify the name and directory where you want the model to be saved here:

    # Save the trained weights and the graph definition (TensorFlow 1.x Saver API)
    saver.save(sess, os.getcwd() + '/models/base/baseCNN')
    saver.export_meta_graph(os.getcwd() + '/models/base/baseCNN.meta')

Step 3: Predicting

Load your models into the appropriate locations for the base predictor and location predictor in predict_pm.py. Specify the target RNA secondary structure, the starting nucleotide sequence, and the path to Vienna in DOT_BRACKET, NUCLEOTIDES, and path. Also specify the current energy and target energy in current_energy and target_energy (ce and te in the snippet below; both default to 0 kcal/mol).

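    import os

    DOT_BRACKET = '((((....))))'     # target secondary structure
    path = os.getcwd() + '/RNAfold'  # path to the Vienna 1.8.5 RNAfold binary
    len_puzzle = len(DOT_BRACKET)
    NUCLEOTIDES = 'A' * len_puzzle   # starting sequence: all adenines
    ce = 0.0  # current energy
    te = 0.0  # target energy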

You can specify the minimum fraction of the puzzle you want the CNN to solve (on its own, the CNN generally cannot solve long puzzles). This fraction is measured by how much of the current structure matches the target structure. Once the sequence reaches the specified threshold, or the CNN completes the maximum number of moves, the sequence is passed to the Single Action Playout (SAP), which runs a Monte Carlo Tree Search to determine which mutations bring the RNA molecule closer to the target secondary structure.

    MIN_THRESHOLD = 0.65  # fraction of the target structure the CNN must match before handing off to SAP
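The README does not spell out how the match is computed. One plausible definition, used here purely as an assumption, is the fraction of positions at which the current dot-bracket structure agrees with the target. The hypothetical helpers below compute that and apply the hand-off rule described above (switch to SAP once the threshold is reached or the move budget is used up); predict_pm.py may compute the match differently.

```python
def structure_match_fraction(current, target):
    """Fraction of positions where two equal-length dot-bracket strings agree."""
    if len(current) != len(target):
        raise ValueError('structures must have the same length')
    if not target:
        raise ValueError('structures must be non-empty')
    matches = sum(1 for c, t in zip(current, target) if c == t)
    return matches / float(len(target))


def should_hand_off_to_sap(current, target, moves_made, max_moves, min_threshold=0.65):
    """True once the CNN has matched enough of the target or used its move budget."""
    return (structure_match_fraction(current, target) >= min_threshold
            or moves_made >= max_moves)


target = '((((....))))'
print(structure_match_fraction('............', target))         # about 0.33: only the 4 loop positions match
print(should_hand_off_to_sap('((((......))', target, 10, 100))  # True (10 of 12 positions match)
```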

Now, you can run the model and it will attempt to find a nucleotide sequence that will fold into the secondary structure provided.

Key Puzzles

Multi-state puzzles

6892343 - 6892348, 7254756 - 7254761
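To train on these puzzles, the ranges can be expanded into a list and assigned to content (for example in experts.py). A small sketch, assuming both ends of each range are inclusive; the helper expand_id_ranges is hypothetical.

```python
def expand_id_ranges(ranges):
    """Expand inclusive (first, last) puzzle-ID ranges into a flat list of IDs."""
    ids = []
    for first, last in ranges:
        ids.extend(range(first, last + 1))
    return ids


# The two multi-state ranges listed above, read as inclusive
MULTI_STATE_PUZZLES = expand_id_ranges([(6892343, 6892348), (7254756, 7254761)])
print(len(MULTI_STATE_PUZZLES))  # 12
```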

Key Players

8627, 55836, 231387, 42833

About

software.eternagame.org/

Releases: 5 (latest: EternaBrain 2.0.1, Feb 4, 2025)
