Submission for HackDataKIBots 2018 - Web crawler combined with document analysis


manuel-lang/Autonomous-Semantic-Search-Engine


A search engine that autonomously crawls documents from a given domain, including its subdomains, analyzes them, and renders them in a search frontend. This implementation demonstrates the functionality using Stanford University's website. This project was implemented during the Next Iteration Hackathon 2018.

Web crawler

Python implementation with Scrapy here.
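
The crawl is described as covering a given domain together with its subdomains. Scrapy spiders can express this kind of restriction through their `allowed_domains` attribute; the standalone sketch below only illustrates the underlying host check with the standard library. The helper names `in_scope` and `filter_links` are hypothetical and are not taken from this repository.

```python
from urllib.parse import urlparse


def in_scope(url: str, root_domain: str) -> bool:
    """Return True if the URL's host is root_domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    root = root_domain.lower().lstrip(".")
    # Require an exact match or a "." boundary so that "notstanford.edu"
    # is not mistaken for a subdomain of "stanford.edu".
    return host == root or host.endswith("." + root)


def filter_links(links, root_domain):
    """Keep in-scope links in their original order, dropping duplicates."""
    seen = set()
    kept = []
    for link in links:
        if in_scope(link, root_domain) and link not in seen:
            seen.add(link)
            kept.append(link)
    return kept


# Illustrative input only; these URLs are not taken from the project.
candidates = [
    "https://stanford.edu/",
    "https://cs.stanford.edu/people",
    "https://example.com/stanford.edu",
    "https://cs.stanford.edu/people",
]
in_scope_links = filter_links(candidates, "stanford.edu")
```

Matching on the parsed hostname rather than on the raw URL string avoids false positives when the domain name merely appears in a path or as a suffix of an unrelated host.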

Document analysis

Python implementation using Watson NLU (for Named Entities, Keywords), gensim (for Summarization and Semantic Representation) and a custom Document Type classifier (Random Forest, with sklearn). Title, a thumbnail and embedded images are also extracted from documents. See notebooks for specific implementations.
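
As a rough illustration of the Document Type classifier idea, the sketch below trains a Random Forest on text features with sklearn. The TF-IDF features, the toy training texts, and the label names are all assumptions made for this example; the features and classes actually used are in the project's notebooks.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical): a few snippets per document type.
train_texts = [
    "Abstract. We propose a novel method and evaluate it on benchmarks.",
    "In this paper we study convergence and present experimental results.",
    "Course syllabus: weekly schedule, grading policy, and office hours.",
    "Lecture schedule, homework deadlines, and required textbooks.",
    "Press release: the university announces a new research center.",
    "News: faculty member receives an award at the annual ceremony.",
]
train_labels = ["paper", "paper", "syllabus", "syllabus", "news", "news"]

# TF-IDF turns each text into a sparse vector; the forest classifies it.
# random_state is fixed so repeated runs build the same trees.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(train_texts, train_labels)

# Predict the type of an unseen snippet; the result is one of the labels above.
predicted = model.predict(["Homework deadlines and grading policy for the course"])
```

With so little training data the prediction is not meaningful in itself; the point is the shape of the pipeline, where crawled document text goes in and a document type label comes out.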

Web frontend

A React frontend that displays the information, with additional image information from Bing Image Search, here.

brew Dependencies
