A python parser for DBLP dataset
26hzhang/DBLPParser
This is a Python parser for the DBLP dataset. The XML dump of the dataset can be downloaded here from the DBLP Homepage.
This parser requires the dtd file, so make sure you have both the dblp-XXX.xml (dataset) and dblp-XXX.dtd files. Both the xml and dtd files must be in the same directory, and the name of the dtd file must match the name given in the <!DOCTYPE> tag of the xml file. You can check that name with the head dblp-XXX.xml command, which prints output like the following:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE dblp SYSTEM "dblp-2017-08-29.dtd">
    <dblp>
    <phdthesis mdate="2016-05-04" key="phd/dk/Heine2010">
    <author>Carmen Heine</author>
    <title>Modell zur Produktion von Online-Hilfen.</title>
    ...
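The same check can be scripted. The following is a minimal sketch, not part of the parser itself: the helper names `find_dtd_name` and `dtd_is_in_place` are hypothetical, and it assumes the `<!DOCTYPE>` declaration appears within the first few kilobytes of the file in the `SYSTEM "..."` form shown above.

```python
import os
import re


def find_dtd_name(xml_path, probe_bytes=4096):
    """Return the DTD file name declared in <!DOCTYPE ... SYSTEM "...">, or None."""
    # Only the head of the file is read, much like the `head` command.
    with open(xml_path, 'rb') as f:
        head = f.read(probe_bytes).decode('iso-8859-1')
    match = re.search(r'<!DOCTYPE\s+\w+\s+SYSTEM\s+"([^"]+)"', head)
    return match.group(1) if match else None


def dtd_is_in_place(xml_path):
    """True if the declared DTD file sits in the same directory as the XML file."""
    dtd_name = find_dtd_name(xml_path)
    if dtd_name is None:
        return False
    xml_dir = os.path.dirname(os.path.abspath(xml_path))
    return os.path.isfile(os.path.join(xml_dir, dtd_name))
```

Calling `dtd_is_in_place('dataset/dblp.xml')` before running the parser would report a missing or misnamed dtd file early, instead of failing during parsing.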
Sample usage of the parser:
    def main():
        dblp_path = 'dataset/dblp.xml'
        save_path = 'article.json'
        try:
            context_iter(dblp_path)
            log_msg("LOG: Successfully loaded \"{}\".".format(dblp_path))
        except IOError:
            log_msg("ERROR: Failed to load file \"{}\". Please check your XML and DTD files.".format(dblp_path))
            exit()
        parse_article(dblp_path, save_path, save_to_csv=False)  # default save as json format
Some extracted results:
Number of publications of each type:
Number of occurrences of each attribute across all publications:
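As an illustration of how such counts can be produced, here is a minimal sketch using only the Python standard library. It is not the parser's own implementation. It assumes that "attributes" means the child elements of each record (author, title, and so on), and apart from the phdthesis record shown earlier, the sample records are hypothetical. The full dump relies on character entities declared in the dtd file, which the standard-library parser does not load, so this sketch is meant for small entity-free samples only.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny in-memory sample; the two <article> records are made up for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<dblp>
<phdthesis mdate="2016-05-04" key="phd/dk/Heine2010">
<author>Carmen Heine</author>
<title>Modell zur Produktion von Online-Hilfen.</title>
</phdthesis>
<article key="hypothetical/a1">
<author>A. Author</author><author>B. Author</author>
<title>First hypothetical article.</title><year>2001</year>
</article>
<article key="hypothetical/a2">
<author>C. Author</author>
<title>Second hypothetical article.</title><year>2002</year>
</article>
</dblp>
"""


def count_types_and_fields(source):
    """Count record types (children of the root) and the fields inside them."""
    type_counts, field_counts = Counter(), Counter()
    depth = 0
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            depth += 1
            continue
        if depth == 2:  # closing tag of a publication record
            type_counts[elem.tag] += 1
            for child in elem:
                field_counts[child.tag] += 1
            elem.clear()  # free the record's children once counted
        depth -= 1
    return type_counts, field_counts


types, fields = count_types_and_fields(io.BytesIO(SAMPLE))
print(dict(types))
print(dict(fields))
```

Parsing incrementally and clearing each record after counting keeps memory use flat, which matters for a file the size of the DBLP dump.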