Quadtree - gradient-boosted decision tree model used to predict guanine quadruplexes in DNA sequences
patrikkaura/quadtree
Quadtree is a gradient-boosted decision tree model that predicts guanine quadruplexes (G-quadruplexes) in DNA sequences. It is built on top of the LightGBM Python library. Each base in the sequence is encoded according to a given encoding prescription. The model was trained for use with a sliding window, so it analyses the whole sequence window by window. The model can be used as a Python script or through the preview website quadtree.vercel.app.
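The sliding-window idea described above can be illustrated with a minimal sketch. The integer encoding, the window size and the step below are all hypothetical: the actual encoding prescription and window length used by Quadtree are defined in the repository and are not reproduced here.

```python
from typing import Iterator, List, Tuple

# Hypothetical encoding table, for illustration only; not Quadtree's real prescription.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(sequence: str) -> List[int]:
    """Map each base to an integer; unknown symbols become -1."""
    return [ENCODING.get(base, -1) for base in sequence.upper()]


def sliding_windows(
    sequence: str, window: int, step: int = 1
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (start offset, encoded window) pairs covering the sequence."""
    encoded = encode(sequence)
    for start in range(0, len(encoded) - window + 1, step):
        yield start, encoded[start:start + window]


# Each encoded window would be one feature vector handed to the classifier.
windows = list(sliding_windows("GGGAGGGTGGG", window=4, step=1))
```

In a real pipeline each window would be scored by the trained model, and the scores compared against the threshold.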
quadtree
└─ web -> preview website source code
└─ python
   └─ model -> lightgbm model params
   └─ train -> example files how training was performed
   └─ quadtree.py -> predictor
- lightgbm==3.3.2
- numpy==1.21.2
Before use, install the requirements:
pip install -r requirements.txt
from quadtree import Quadtree
model = Quadtree()
- sequence as a string (maximum length is not limited)
- threshold (recommended value is 0.2)
- quadnet model file path
result = model.analyse(sequence='ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACC...', model_path='/path/to/quadnet_model.txt', score_threshold=0.1)
Results are returned in the form shown below, which can be loaded into a pandas DataFrame.
import pandas as pd
df = pd.DataFrame(result)
| | index | position | sequence | length |
---|---|---|---|---|
0 | 0 | 907 | GCAACAATGGCTGATCCAGAAGGTACAGACGGGGAGGGCACGGGTTGTAACGGCTGGTTTTATGTACAAGCTATTGTAGACAAAAAAACAGGAGATGTAATATCA | 105 |
1 | 1 | 1184 | GAGGCAGCACAGAAAACAGTCCATTAGGGGAGCGGCTGGAGGTGGATACAGAGTTAAGTCCACGGTTACAAGAAATATCTTTAAATAGTGGGCAGA | 96 |
2 | 2 | 1389 | ATGTAGTGGCGGCAGTACGGAGGCTATAGACAACGGGGGCACAGAGGGCAACAACAGCAGTGTAGACGGTACAAGTGACAATAGCAATATAGAAAATGTAAATCCAC | 107 |
3 | 3 | 1635 | AGATTGGGTTACAGCTATATTTGGAGTAAACCCAACAATAGCAGAAGGATTTAAAACACTAATACAGCCATTTAT | 75 |
4 | 4 | 2229 | AATAGATGAAGGGGGAGATTGGAGACCAATAGTGCAATTCCTGCGATACCAACAAATAGAGTTTATAACATTTTTAG | 77 |
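As a rough illustration of post-processing such results, the sketch below takes the `position` and `length` values from the table and derives an interval for each hit. It assumes that `position` is the start offset of the reported region, which the README does not state explicitly; the helper `to_interval` is hypothetical and not part of Quadtree.

```python
# position/length pairs copied from the example output table.
rows = [
    {"position": 907, "length": 105},
    {"position": 1184, "length": 96},
    {"position": 1389, "length": 107},
    {"position": 1635, "length": 75},
    {"position": 2229, "length": 77},
]


def to_interval(row):
    """Return (start, end) with an exclusive end, ASSUMING position is the start offset."""
    return (row["position"], row["position"] + row["length"])


intervals = [to_interval(row) for row in rows]

# Pick the longest reported region.
longest = max(rows, key=lambda row: row["length"])
```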
These parameters were used to train the LightGBM model:
LGBM Classifier | value |
---|---|
colsample bytree | 0.817574864502621 |
learning rate | 0.03744835808549148 |
max bin | 127 |
min child sample | 3 |
number of estimators | 1000 |
number of leaves | 74 |
regularization alpha | 0.0033803043003857677 |
regularization lambda | 0.7013136087939289 |
objective | binary |
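For readers who want to reuse these settings, the sketch below maps the table onto the keyword names of LightGBM's scikit-learn style `LGBMClassifier`. The mapping of the table labels to these keyword names is an assumption on our part, and this is not the repository's training script; the training data and fitting step are omitted.

```python
# Table values mapped onto LGBMClassifier keyword names (mapping is assumed).
params = {
    "colsample_bytree": 0.817574864502621,
    "learning_rate": 0.03744835808549148,
    "max_bin": 127,
    "min_child_samples": 3,
    "n_estimators": 1000,
    "num_leaves": 74,
    "reg_alpha": 0.0033803043003857677,
    "reg_lambda": 0.7013136087939289,
    "objective": "binary",
}

try:
    from lightgbm import LGBMClassifier

    # Unfitted estimator; call classifier.fit(X, y) with your own encoded windows.
    classifier = LGBMClassifier(**params)
except (ImportError, OSError):
    # lightgbm is not installed or could not be loaded in this environment.
    classifier = None
```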
- Patrik Kaura - Main developer - patrikkaura
This project is licensed under the MIT License - see the LICENSE file for details.