


fferegrino/zeldaKG


         _     _       _   _______ 
        | |   | |     | | / /  __ \
 _______| | __| | __ _| |/ /| |  \/
|_  / _ \ |/ _` |/ _` |    \| | __ 
 / /  __/ | (_| | (_| | |\  \ |_\ \
/___\___|_|\__,_|\__,_\_| \_/\____/

a TLOZ inspired knowledge graph.

  • Step 0: Gather a lot of wiki pages (check whether you can use a tool like HTTrack). In this case, I downloaded a copy of the whole Zeldapedia and Zelda Wiki.

  • Step 1: If you did not configure your crawlers/copiers correctly, the previous step might have fetched a lot of useless pages, such as user pages, templates or even forum pages. The purpose of this step is to reduce the number of files to process by filtering out documents whose names start with "User_", "Category_Zeldapedians_", "Message_Wall_" and similar. In this cleaning stage, the real content of each page (in Wikia, the contents of the article tag) is extracted and the surrounding site template is discarded.

  • Step 2: Information Extraction

    • Title-Link relationship: extract the relationship between each file and the title of the article it represents into two dataframes. See the Title-Link relationship notebook.
    • Infobox extraction: extract raw relationships between entities from the infobox of each page. The relationships are generated as JSON objects that are interpreted in the next step. There is one version for gamepedia and one for wikia.
    • Merge infobox sources: in this step the infobox data from the sources is merged, yielding information such as Gender, Race, Appearances, and many more. See the Merge sources notebook.
    • Text extraction using spaCy: the text of each article is analysed with the spaCy package to extract raw relationships between a Resource and names (in the text_extraction notebook). These are then processed again to keep only relationships between Resources that exist in our graph; this happens in text_extraction_processing.
  • Step 3: Insertion into neo4j
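
Two of the steps above lend themselves to a short sketch: the filename filter of Step 1 and the grounding pass at the end of Step 2. This is not the code from the notebooks, only a minimal illustration. The function names, the (subject, relation, object) tuple shape and the sample values are assumptions made for the example; only the three filename prefixes come from the description above.

```python
# Prefixes named in Step 1; extend the tuple to cover the "and similar" cases.
NON_CONTENT_PREFIXES = ("User_", "Category_Zeldapedians_", "Message_Wall_")


def is_content_page(filename: str) -> bool:
    """Return True when a downloaded file looks like a real article (Step 1)."""
    # str.startswith accepts a tuple and matches any of its members.
    return not filename.startswith(NON_CONTENT_PREFIXES)


def ground_relations(raw_relations, known_resources):
    """Keep only triples whose subject and object are both known Resources (Step 2).

    raw_relations: iterable of (subject, relation, object_name) tuples
                   (hypothetical shape, chosen for this sketch).
    known_resources: collection of Resource names already in the graph.
    """
    known = set(known_resources)
    return [
        (subject, relation, obj)
        for subject, relation, obj in raw_relations
        if subject in known and obj in known
    ]


# Made-up sample data, only to show how the two helpers are called.
files = ["Link.html", "User_Someone.html", "Message_Wall_Someone.html"]
articles = [name for name in files if is_content_page(name)]

raw = [("Link", "WIELDS", "Master Sword"), ("Link", "MENTIONS", "Someone Unknown")]
grounded = ground_relations(raw, {"Link", "Master Sword"})
```

Matching on exact names is the simplest possible grounding rule; the real processing may well normalise names or resolve aliases first.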

The graph is neither accurate nor complete. I'm just playing around, since I want to learn a bit more about neo4j while performing information extraction and building a knowledge graph.
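
For the insertion into neo4j (Step 3), one possible approach is to turn each grounded triple into a parameterised Cypher MERGE statement. The sketch below only builds the query text and its parameters; it does not connect to a database. The Resource label, the name property, the helper names and the sample triple are all assumptions for illustration, not the repository's actual schema.

```python
import re

# Cypher cannot take a relationship type as a query parameter, so the type is
# interpolated into the text and must be validated first.
_REL_TYPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_merge_query(rel_type: str) -> str:
    """Return a Cypher statement that merges two nodes and one relationship."""
    if not _REL_TYPE.fullmatch(rel_type):
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    return (
        "MERGE (a:Resource {name: $subject})\n"
        "MERGE (b:Resource {name: $object})\n"
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


def to_statement(triple):
    """Map a (subject, relation, object) triple to (query, parameters)."""
    subject, rel_type, obj = triple
    return build_merge_query(rel_type), {"subject": subject, "object": obj}


# Made-up sample triple, only to show the call.
query, params = to_statement(("Link", "APPEARS_IN", "Ocarina of Time"))
```

MERGE rather than CREATE keeps the load idempotent, so re-running it does not duplicate nodes or relationships. With the official Python driver, the resulting pair could then be passed to a session's run method.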

