forked from scollins83/teal_deer
DeepLearningSky/teal_deer
Currently just a hacking notebook. However, the notebook scrapes text from a directory of academic research PDFs and then runs LDA on it to prioritize reading. The dataset for this run included just a handful of papers on chatbots from arXiv. The OCR portion relies on: https://github.com/euske/pdfminer/blob/master/tools/pdf2txt.py
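As a rough illustration of the LDA-for-prioritization idea, here is a minimal sketch in plain Python. It is not the notebook's code: it assumes the PDF text has already been extracted into strings, and it uses a small collapsed Gibbs sampler rather than any particular library. The names `tokenize`, `fit_lda`, `top_words`, and `rank_by_topic`, the stopword list, and the hyperparameter defaults are all hypothetical choices made for this example.

```python
import random
import re

# Tiny illustrative stopword list (an assumption, not from the notebook).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "from"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens of 3+ letters, drop stopwords."""
    return [w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOPWORDS]


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents.

    Returns (theta, topic_word, vocab): per-document topic proportions,
    per-topic word counts, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (sum(row) + num_topics * alpha) for c in row]
        for row in doc_topic
    ]
    return theta, topic_word, vocab


def top_words(topic_word, vocab, topic, n=5):
    """Return the n most frequent words assigned to a topic."""
    order = sorted(range(len(vocab)), key=lambda i: topic_word[topic][i], reverse=True)
    return [vocab[i] for i in order[:n]]


def rank_by_topic(theta, names, topic):
    """Sort documents by their weight on one topic, highest first."""
    return sorted(zip(names, (row[topic] for row in theta)), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    texts = {
        "paper_a.txt": "Chatbot dialogue systems generate a response for each user turn.",
        "paper_b.txt": "Gradient descent training of neural network weights.",
    }
    docs = [tokenize(t) for t in texts.values()]
    theta, topic_word, vocab = fit_lda(docs, num_topics=2, iterations=50)
    print(top_words(topic_word, vocab, topic=0))
    print(rank_by_topic(theta, list(texts), topic=0))
```

Reading the top words of each topic and then ranking papers by their weight on the topic of interest is one way to turn the topic model into a reading order. A library implementation would be a more practical choice for a real corpus.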
In process:
Adding a text summarization feature to try to generate abstracts or short summaries for large blocks of text (i.e., an abstract for the rest of a paper). That way, papers could be not only prioritized but summarized as well.
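To make the summarization goal concrete, here is a minimal sketch of an extractive baseline in plain Python. It is only a stand-in: it scores sentences by average word frequency and keeps the top few, which is a much simpler technique than generating a new abstract, and it is not the approach this project has committed to. The function names `split_sentences` and `summarize`, the stopword list, and the `max_sentences` parameter are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "for", "on", "with"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    body = (
        "Chatbots answer user questions. "
        "Chatbots use dialogue models to answer questions. "
        "The weather was nice. "
        "Dialogue models need training data."
    )
    print(summarize(body, max_sentences=2))
```

A baseline like this is mostly useful as a point of comparison once a learned summarizer is in place.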
Planned updates - See project tab as well:
- Finish the OCR-from-PDF portion
- Complete the text summarization portion. Thanks to Siraj Raval for making the video: https://www.youtube.com/watch?v=ogrJaOIuBx4
- Clean up into Python scripts with test suites
- Experiment with other front-end use cases, e.g., a Slack bot is currently underway (notebook to be added later).
- Add a CI framework into this repo.
- Cartoon for a fun logo :-)