zjunlp/DeepKE


[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

License

MIT license


A Deep Learning Based Knowledge Extraction Toolkit
for Knowledge Graph Construction

DeepKE is a knowledge extraction toolkit for knowledge graph construction, supporting cnSchema, low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction. We provide documentation, an online demo, a paper, slides and a poster for beginners.

❗Want to use Large Language Models with DeepKE? Try DeepKE-LLM and OneKE, have fun!
❗Want to train supervised models? Try Quick Start: we provide NER models (e.g., LightNER (COLING'22), W2NER (AAAI'22)), relation extraction models (e.g., KnowPrompt (WWW'22)), relational triple extraction models (e.g., ASP (EMNLP'22), PRGC (ACL'21), PURE (NAACL'21)), and release off-the-shelf models at DeepKE-cnSchema. Have fun!
We recommend using Linux; if you use Windows, please use \\ in file paths.
If HuggingFace is inaccessible, please consider using wisemodel or ModelScope.

If you encounter any issues while installing DeepKE or DeepKE-LLM, please check the Tips or promptly submit an issue, and we will help you resolve the problem!

What's New

June, 2025 We integrate the MCP service tools into DeepKE, enabling knowledge extraction through large language models (LLMs) as tool callers for lightweight models.
December, 2024 We open source the OneKE knowledge extraction framework, supporting multi-agent knowledge extraction across various scenarios.
April, 2024 We release a new bilingual (Chinese and English) schema-based information extraction model called OneKE, based on Chinese-Alpaca-2-13B.
Feb, 2024 We release IEPile, a large-scale (0.32B tokens), high-quality, bilingual (Chinese and English) Information Extraction (IE) instruction dataset, along with two models trained on IEPile: baichuan2-13b-iepile-lora and llama2-13b-iepile-lora.
Sep, 2023 We release InstructIE, a bilingual (Chinese and English) Information Extraction (IE) instruction dataset for the Instruction-based Knowledge Graph Construction task (Instruction-based KGC), as detailed here.
June, 2023 We update DeepKE-LLM to support knowledge extraction with KnowLM, ChatGLM, the LLaMA series, the GPT series, etc.
Apr, 2023 We have added new models, including CP-NER (IJCAI'23), ASP (EMNLP'22), PRGC (ACL'21) and PURE (NAACL'21), provided event extraction capabilities (Chinese and English), and offered compatibility with higher versions of Python packages (e.g., Transformers).
Feb, 2023 We have supported using an LLM (GPT-3) with in-context learning (based on EasyInstruct) and data generation, and added the NER model W2NER (AAAI'22).

Previous News

Nov, 2022 Add data annotation instructions for entity recognition and relation extraction, automatic labelling of weakly supervised data (entity extraction and relation extraction), and optimized multi-GPU training.
Sept, 2022 The paper DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population has been accepted by the EMNLP 2022 System Demonstration Track.
Aug, 2022 We have added data augmentation (Chinese, English) support for low-resource relation extraction.
June, 2022 We have added multimodal support for entity and relation extraction.
May, 2022 We have released DeepKE-cnschema with off-the-shelf knowledge extraction models.
Jan, 2022 We have released the paper DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population.
Dec, 2021 We have added a dockerfile to create the environment automatically.
Nov, 2021 The demo of DeepKE, supporting real-time extraction without deployment or training, has been released.
The documentation of DeepKE, containing details such as source code and datasets, has been released.
Oct, 2021 pip install deepke
The code of deepke-v2.0 has been released.
Aug, 2019 The code of deepke-v1.0 has been released.
Aug, 2018 The DeepKE project started and the code of deepke-v0.1 was released.

Prediction Demo

There is a demonstration of prediction. The GIF file was created with Terminalizer. Get the code.

Model Framework

DeepKE contains a unified framework for named entity recognition, relation extraction and attribute extraction, the three knowledge extraction functions.
Each task can be implemented in different scenarios. For example, relation extraction can be performed in standard, low-resource (few-shot), document-level and multimodal settings.
Each application scenario comprises three components: Data, including Tokenizer, Preprocessor and Loader; Model, including Module, Encoder and Forwarder; and Core, including Training, Evaluation and Prediction.
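To make the Data / Model / Core split concrete, here is a minimal, hypothetical Python sketch. None of the class or method names below (DataComponent, ModelComponent, CoreComponent, load, forward, train, predict, evaluate) come from DeepKE's code, and the "model" is a toy trigger-word lookup standing in for a neural encoder; only the division of responsibilities mirrors the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataComponent:
    """Hypothetical Data stage: Tokenizer -> Preprocessor -> Loader."""

    def tokenize(self, sentence: str) -> List[str]:
        # Character-level tokens, a common choice for Chinese text.
        return list(sentence)

    def preprocess(self, tokens: List[str]) -> List[str]:
        # Drop whitespace tokens; real preprocessors also map tokens to ids.
        return [t for t in tokens if not t.isspace()]

    def load(self, sentences: List[str]) -> List[List[str]]:
        return [self.preprocess(self.tokenize(s)) for s in sentences]


@dataclass
class ModelComponent:
    """Hypothetical Model stage: a toy trigger lookup instead of Module/Encoder/Forwarder."""
    triggers: Dict[str, str] = field(default_factory=dict)  # trigger word -> relation label

    def forward(self, tokens: List[str]) -> str:
        text = "".join(tokens)
        for trigger, label in self.triggers.items():
            if trigger in text:
                return label
        return "None"


@dataclass
class CoreComponent:
    """Hypothetical Core stage: Training, Evaluation and Prediction."""
    data: DataComponent
    model: ModelComponent

    def train(self, examples: List[Tuple[str, str, str]]) -> None:
        # "Training" just memorizes (trigger, label) from (sentence, trigger, label) examples.
        for _sentence, trigger, label in examples:
            self.model.triggers[trigger] = label

    def predict(self, sentences: List[str]) -> List[str]:
        return [self.model.forward(tokens) for tokens in self.data.load(sentences)]

    def evaluate(self, sentences: List[str], gold: List[str]) -> float:
        predictions = self.predict(sentences)
        return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


core = CoreComponent(DataComponent(), ModelComponent())
core.train([("《岳父也是爹》是王军执导的电视剧", "执导", "导演")])
print(core.predict(["许鞍华执导的电影"]))  # → ['导演']
```

Keeping the three stages as separate objects means a scenario can swap its model while reusing the same data loading and training loop, which is the point of the shared framework.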

Quick Start

DeepKE-LLM

In the era of large models, DeepKE-LLM utilizes a completely new environment dependency.

```
conda create -n deepke-llm python=3.9
conda activate deepke-llm
cd example/llm
pip install -r requirements.txt
```

Please note that the requirements.txt file is located in the example/llm folder.

DeepKE-MCP-Tools

We integrate the MCP (Model Context Protocol) service tools into DeepKE, enabling knowledge extraction through large language models (LLMs) as tool callers for lightweight models.

The MCP service has been deployed and is accessible at URL.
For local deployment, refer to the README for detailed operational procedures.

DeepKE

DeepKE supports pip install deepke.
The steps below take fully supervised relation extraction as an example.
DeepKE supports both manual and docker image environment configuration; choose whichever suits your setup.
We highly recommend installing DeepKE in a Linux environment.

🔧Manual Environment Configuration

Step1 Download the basic code

git clone --depth 1 https://github.com/zjunlp/DeepKE.git

Step2 Create a virtual environment using Anaconda and enter it.

```
conda create -n deepke python=3.8
conda activate deepke
```

Install DeepKE with source code

```
pip install -r requirements.txt
python setup.py install
python setup.py develop
```

Install DeepKE with pip (NOT recommended!)
```
pip install deepke
```
- Please make sure that your pip version is <= 24.0.
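Since the pip route above requires pip <= 24.0, it can help to check this before installing. The following is a minimal standard-library sketch; the helper names release_tuple and pip_version_ok are hypothetical, and the comparison looks only at the numeric release part (suffixes such as b1 or .post1 are ignored), which is a simplification rather than full PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Parse the leading numeric release of a version string, e.g. '23.3.2' -> (23, 3, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", v.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def pip_version_ok(v: str, limit: str = "24.0") -> bool:
    """True if the release part of v is <= limit (pre/post-release suffixes are ignored)."""
    a, b = release_tuple(v), release_tuple(limit)
    width = max(len(a), len(b))
    # Pad with zeros so that '24' and '24.0' compare as equal.
    return a + (0,) * (width - len(a)) <= b + (0,) * (width - len(b))


if __name__ == "__main__":
    try:
        installed = version("pip")
    except PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        verdict = "OK" if pip_version_ok(installed) else "newer than 24.0"
        print(f"pip {installed}: {verdict}")
```

If the check reports a newer pip, it can be downgraded with python -m pip install "pip<=24.0" before running pip install deepke.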

Step3 Enter the task directory

cd DeepKE/example/re/standard

Step4 Download the dataset, or follow the annotation instructions to obtain data

```
wget 121.41.117.246:8080/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
```

Many data formats are supported; details are given in each task's section.

Step5 Training (parameters for training can be changed in the conf folder)

We support visual parameter tuning with wandb.

python run.py

Step6 Prediction (parameters for prediction can be changed in the conf folder)

Modify the path of the trained model in predict.yaml. The model must be given as an absolute path, such as xxx/checkpoints/2019-12-03_17-35-30/cnn_epoch21.pth.
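Locating the right checkpoint by hand can be tedious. A minimal sketch, assuming the trained models are written as .pth files somewhere under a checkpoints directory (as in the example path above); the helper latest_checkpoint is hypothetical and not part of DeepKE. It picks the most recently modified file and returns its absolute path, which you can then paste into predict.yaml.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: str, pattern: str = "*.pth") -> Optional[str]:
    """Return the absolute path of the most recently modified checkpoint, or None."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.rglob(pattern) if p.is_file()]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    # Resolve to an absolute path, since predict.yaml expects one.
    return str(newest.resolve())


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever run.py wrote its checkpoints.
    print(latest_checkpoint("checkpoints"))
```

Using modification time rather than the epoch number in the file name keeps the helper independent of any particular naming scheme; if you want the best epoch rather than the latest, pick the file yourself.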

python predict.py

❗NOTE: If you encounter any errors, please refer to the Tips or submit a GitHub issue.

🐳Building With Docker Images

Step1 Install the Docker client

Install Docker and start the Docker service.

Step2 Pull the docker image and run the container

```
docker pull zjunlp/deepke:latest
docker run -it zjunlp/deepke:latest /bin/bash
```

The remaining steps are the same as Step 3 onwards in Manual Environment Configuration.

❗NOTE: You can refer to the Tips to speed up installation.

Requirements

DeepKE

python == 3.8

torch>=1.5,<=1.11
hydra-core==1.0.6
tensorboard==2.4.1
matplotlib==3.4.1
transformers==4.26.0
jieba==0.42.1
scikit-learn==0.24.1
seqeval==1.2.2
opt-einsum==3.3.0
wandb==0.12.7
ujson==5.6.0
huggingface_hub==0.11.0
tensorboardX==2.5.1
nltk==3.8
protobuf==3.20.1
numpy==1.21.0
ipdb==0.13.11
pytorch-crf==0.7.2
tqdm==4.66.1
openai==0.28.0
Jinja2==3.1.2
datasets==2.13.2
pyhocon==0.3.60
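Because DeepKE pins exact versions, a quick way to spot mismatches in an existing environment is to compare installed versions against the pins. A hypothetical standard-library sketch (check_pins is not a DeepKE utility); it checks only a few of the exact pins listed above and skips torch, whose requirement is a range rather than an exact version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional

# A subset of the exact pins listed above (package name -> required version).
PINS: Dict[str, str] = {
    "hydra-core": "1.0.6",
    "transformers": "4.26.0",
    "scikit-learn": "0.24.1",
    "numpy": "1.21.0",
    "protobuf": "3.20.1",
}


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               get_version: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """Return human-readable mismatches; an empty list means every pin matches."""
    problems = []
    for name, wanted in pins.items():
        found = get_version(name)
        if found is None:
            problems.append(f"{name}: not installed (need {wanted})")
        elif found != wanted:
            problems.append(f"{name}: found {found}, need {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all checked pins match"]:
        print(line)
```

Passing a different get_version function (for example a plain dict's get method) makes the helper easy to test without installing anything.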

Introduction of Four Functions

1. Named Entity Recognition

Named entity recognition seeks to locate and classify named entities mentioned in unstructured text into predefined categories such as person names, organizations, locations, etc.

The data is stored in .txt files. Some instances are shown below (users can label data with tools such as Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):

| Sentence | Person | Location | Organization |
| --- | --- | --- | --- |
| 本报北京9月4日讯记者杨涌报道：部分省区人民日报宣传发行工作座谈会9月3日在4日在京举行。 | 杨涌 | 北京 | 人民日报 |
| 《红楼梦》由王扶林导演，周汝昌、王蒙、周岭等多位专家参与制作。 | 王扶林，周汝昌，王蒙，周岭 | | |
| 秦始皇兵马俑位于陕西省西安市,是世界八大奇迹之一。 | 秦始皇 | 陕西省，西安市 | |
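The table lists entities per sentence, while training files label individual tokens. Assuming a character-per-line "char tag" layout with BIO tags such as B-PER / I-PER / B-LOC (an assumption; check the task README for the exact format and tag set), the following hypothetical sketch shows how token labels map back to the entity columns above. The functions parse_conll and bio_to_entities are illustrative, not DeepKE APIs.

```python
from typing import Dict, List, Optional, Tuple


def parse_conll(text: str) -> Tuple[List[str], List[str]]:
    """Parse lines of 'char tag' (whitespace-separated) into parallel lists."""
    chars, tags = [], []
    for line in text.strip().splitlines():
        ch, tag = line.split()
        chars.append(ch)
        tags.append(tag)
    return chars, tags


def bio_to_entities(chars: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group BIO-tagged characters into entity strings keyed by entity type."""
    entities: Dict[str, List[str]] = {}
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        nonlocal current, current_type
        if current_type is not None:
            entities.setdefault(current_type, []).append("".join(current))
        current, current_type = [], None

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and tag[2:] == current_type:
            current.append(ch)
        else:
            # "O", or an I- tag that does not continue the open entity, closes it.
            flush()
    flush()
    return entities


# First 14 characters of the third example sentence, tagged by hand.
SAMPLE = """秦 B-PER
始 I-PER
皇 I-PER
兵 O
马 O
俑 O
位 O
于 O
陕 B-LOC
西 I-LOC
省 I-LOC
西 B-LOC
安 I-LOC
市 I-LOC"""

print(bio_to_entities(*parse_conll(SAMPLE)))
# → {'PER': ['秦始皇'], 'LOC': ['陕西省', '西安市']}
```

The B- tag on 西 is what separates the two adjacent locations 陕西省 and 西安市; without it they would merge into one span.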

Read the detailed process in each task's README.
- STANDARD (Fully Supervised)
  We support LLM and provide the off-the-shelf model DeepKE-cnSchema-NER, which extracts entities in cnSchema without training.
  Step1 Enter DeepKE/example/ner/standard. Download the dataset.
```
wget 121.41.117.246:8080/Data/ner/standard/data.tar.gz
tar -xzvf data.tar.gz
```
  Step2 Training
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- FEW-SHOT
  Step1 Enter DeepKE/example/ner/few-shot. Download the dataset.
```
wget 121.41.117.246:8080/Data/ner/few_shot/data.tar.gz
tar -xzvf data.tar.gz
```
  Step2 Training in the low-resource setting
  The directory where the model is loaded and saved, as well as the configuration parameters, can be customized in the conf folder.
```
python run.py +train=few_shot
```
  Users can modify load_path in conf/train/few_shot.yaml to use an existing model.
  Step3 Add - predict to conf/config.yaml, set load_path to the model path and write_path to the path where the predicted results are saved in conf/predict.yaml, and then run python predict.py
```
python predict.py
```
- MULTIMODAL
  Step1 Enter DeepKE/example/ner/multimodal. Download the dataset.
```
wget 121.41.117.246:8080/Data/ner/multimodal/data.tar.gz
tar -xzvf data.tar.gz
```
  We use RCNN-detected objects and visual grounding objects from the original images as local visual information, obtained via faster_rcnn and onestage_grounding respectively.
  Step2 Training in the multimodal setting
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To start from a previously trained model, set load_path in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

2. Relation Extraction

Relation extraction is the task of extracting semantic relations between entities from unstructured text.

The data is stored in .csv files. Some instances are shown below (users can label data with tools such as Doccano or MarkTool, or use Weak Supervision with DeepKE to obtain data automatically):

| Sentence | Relation | Head | Head_offset | Tail | Tail_offset |
| --- | --- | --- | --- | --- | --- |
| 《岳父也是爹》是王军执导的电视剧，由马恩然、范明主演。 | 导演 | 岳父也是爹 | 1 | 王军 | 8 |
| 《九玄珠》是在纵横中文网连载的一部小说，作者是龙马。 | 连载网站 | 九玄珠 | 1 | 纵横中文网 | 7 |
| 提起杭州的美景，西湖总是第一个映入脑海的词语。 | 所在城市 | 西湖 | 8 | 杭州 | 2 |
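In the rows above, each offset is the 0-based character index at which the entity starts in the sentence. Before training on hand-labelled data, it can be worth checking that every offset actually points at its entity. A minimal sketch using the csv module; it assumes the column names shown in the table (the real files may use different capitalization or carry extra columns such as entity types, so adjust the keys), and offset_errors is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List

SAMPLE = """Sentence,Relation,Head,Head_offset,Tail,Tail_offset
《岳父也是爹》是王军执导的电视剧，由马恩然、范明主演。,导演,岳父也是爹,1,王军,8
《九玄珠》是在纵横中文网连载的一部小说，作者是龙马。,连载网站,九玄珠,1,纵横中文网,7
提起杭州的美景，西湖总是第一个映入脑海的词语。,所在城市,西湖,8,杭州,2
"""


def offset_errors(rows: List[Dict[str, str]]) -> List[str]:
    """Report rows where an entity does not appear at its stated character offset."""
    errors = []
    for i, row in enumerate(rows):
        sentence = row["Sentence"]
        for ent_col, off_col in (("Head", "Head_offset"), ("Tail", "Tail_offset")):
            entity, start = row[ent_col], int(row[off_col])
            if sentence[start:start + len(entity)] != entity:
                errors.append(f"row {i}: {ent_col} {entity!r} not found at offset {start}")
    return errors


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(offset_errors(rows))  # → []
```

DictReader looks columns up by name, so extra columns in a real file do not break the check.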

❗NOTE: If one relation holds between several entity types, the entity types can be prefixed to the relation in the input.
Read the detailed process in each task's README.
- STANDARD (Fully Supervised)
  We support LLM and provide the off-the-shelf model DeepKE-cnSchema-RE, which extracts relations in cnSchema without training.
  Step1 Enter the DeepKE/example/re/standard folder. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/standard/data.tar.gz
tar -xzvf data.tar.gz
```
  Step2 Training
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- FEW-SHOT
  Step1 Enter DeepKE/example/re/few-shot. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/few_shot/data.tar.gz
tar -xzvf data.tar.gz
```
  Step 2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To start from a previously trained model, set train_from_saved_model in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- DOCUMENT
  Step1 Enter DeepKE/example/re/document. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/document/data.tar.gz
tar -xzvf data.tar.gz
```
  Step2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To start from a previously trained model, set train_from_saved_model in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```
- MULTIMODAL
  Step1 Enter DeepKE/example/re/multimodal. Download the dataset.
```
wget 121.41.117.246:8080/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
```
  We use RCNN-detected objects and visual grounding objects from the original images as local visual information, obtained via faster_rcnn and onestage_grounding respectively.
  Step2 Training
  - The dataset and parameters can be customized in the data folder and conf folder respectively.
  - To start from a previously trained model, set load_path in conf/train.yaml to the path where that model was saved. The path for logs generated during training can be customized with log_dir.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

3. Attribute Extraction

Attribute extraction is the task of extracting attributes of entities from unstructured text.

The data is stored in .csv files. Some instances are shown below:

| Sentence | Att | Ent | Ent_offset | Val | Val_offset |
| --- | --- | --- | --- | --- | --- |
| 张冬梅，女，汉族，1968年2月生，河南淇县人 | 民族 | 张冬梅 | 0 | 汉族 | 6 |
| 诸葛亮，字孔明，三国时期杰出的军事家、文学家、发明家。 | 朝代 | 诸葛亮 | 0 | 三国时期 | 8 |
| 2014年10月1日许鞍华执导的电影《黄金时代》上映 | 上映时间 | 黄金时代 | 19 | 2014年10月1日 | 0 |
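When labelling new attribute data by hand, the offsets (0-based character indices, as in the rows above) can be computed rather than counted. A minimal sketch; make_ae_row is a hypothetical helper that reuses the column names from the table, and it locates each span by its first occurrence with str.find, so sentences where the entity or value appears more than once need manual checking.

```python
from typing import Dict


def make_ae_row(sentence: str, attribute: str, entity: str, value: str) -> Dict[str, object]:
    """Build one attribute-extraction row, locating offsets by first occurrence."""
    ent_offset = sentence.find(entity)
    val_offset = sentence.find(value)
    if ent_offset < 0 or val_offset < 0:
        raise ValueError("entity or value does not occur in the sentence")
    return {
        "Sentence": sentence,
        "Att": attribute,
        "Ent": entity,
        "Ent_offset": ent_offset,
        "Val": value,
        "Val_offset": val_offset,
    }


row = make_ae_row("张冬梅，女，汉族，1968年2月生，河南淇县人", "民族", "张冬梅", "汉族")
print(row["Ent_offset"], row["Val_offset"])  # → 0 6
```

The computed offsets for this sentence match the first row of the table above.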

Read the detailed process in each task's README.
- STANDARD (Fully Supervised)
  Step1 Enter the DeepKE/example/ae/standard folder. Download the dataset.
```
wget 121.41.117.246:8080/Data/ae/standard/data.tar.gz
tar -xzvf data.tar.gz
```
  Step2 Training
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step3 Prediction
```
python predict.py
```

4. Event Extraction

Event extraction is the task of extracting event types, event trigger words and event arguments from unstructured text.
The data is stored in .tsv files; some instances are shown below:

| Sentence | Event type | Trigger | Role | Argument |
| --- | --- | --- | --- | --- |
| 据《欧洲时报》报道，当地时间27日，法国巴黎卢浮宫博物馆员工因不满工作条件恶化而罢工，导致该博物馆也因此闭门谢客一天。 | 组织行为-罢工 | 罢工 | 罢工人员 | 法国巴黎卢浮宫博物馆员工 |
| | | | 时间 | 当地时间27日 |
| | | | 所属组织 | 法国巴黎卢浮宫博物馆 |
| 中国外运2019年上半年归母净利润增长17%：收购了少数股东股权 | 财经/交易-出售/收购 | 收购 | 出售方 | 少数股东 |
| | | | 收购方 | 中国外运 |
| | | | 交易物 | 股权 |
| 美国亚特兰大航展13日发生一起表演机坠机事故，飞行员弹射出舱并安全着陆，事故没有造成人员伤亡。 | 灾害/意外-坠机 | 坠机 | 时间 | 13日 |
| | | | 地点 | 美国亚特兰 |
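In the table above, one event spans several rows: only the first row carries the sentence, event type and trigger, and the following rows add further (role, argument) pairs. A hypothetical sketch of folding such flattened rows back into nested events, assuming five tab-separated columns in the order shown and a blank Sentence cell marking a continuation row; the downloaded DuEE files may use a different layout (for example JSON), so treat the Event class and group_event_rows as illustrations only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    sentence: str
    event_type: str
    trigger: str
    arguments: List[Tuple[str, str]] = field(default_factory=list)  # (role, argument)


def group_event_rows(lines: List[str]) -> List[Event]:
    """Fold 5-column rows into events; a blank Sentence cell continues the previous event."""
    events: List[Event] = []
    for line in lines:
        sentence, event_type, trigger, role, argument = line.split("\t")
        if sentence:
            events.append(Event(sentence, event_type, trigger))
        elif not events:
            raise ValueError("continuation row appears before any event row")
        events[-1].arguments.append((role, argument))
    return events


rows = [
    "中国外运2019年上半年归母净利润增长17%：收购了少数股东股权\t财经/交易-出售/收购\t收购\t出售方\t少数股东",
    "\t\t\t收购方\t中国外运",
    "\t\t\t交易物\t股权",
]
events = group_event_rows(rows)
print(events[0].trigger, events[0].arguments)
# → 收购 [('出售方', '少数股东'), ('收购方', '中国外运'), ('交易物', '股权')]
```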

Read the detailed process in each task's README.
- STANDARD (Fully Supervised)
  Step1 Enter the DeepKE/example/ee/standard folder. Download the dataset.
```
wget 121.41.117.246:8080/Data/ee/DuEE.zip
unzip DuEE.zip
```
  Step 2 Training
  The dataset and parameters can be customized in the data folder and conf folder respectively.
```
python run.py
```
  Step 3 Prediction
```
python predict.py
```

Tips

1. Using the nearest mirror speeds up installation: in China, the THU mirror speeds up the installation of Anaconda, and the aliyun mirror speeds up pip install XXX.

2. If you encounter ModuleNotFoundError: No module named 'past', run pip install future.

3. Installing pretrained language models online is slow. We recommend downloading pretrained models before use and saving them in the pretrained folder. Read the README.md in each task directory for the specific requirements on saving pretrained models.

4. The old version of DeepKE is in the deepke-v1.0 branch; switch to that branch to use it. The old version has been fully migrated to standard relation extraction (example/re/standard).

5. If you want to modify the source code, install DeepKE from source; otherwise, your modifications will not take effect. See the issue.

6. More related work on low-resource knowledge extraction can be found in Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective.

7. Make sure you install the exact versions pinned in requirements.txt.

To do

In the next version, we plan to release a stronger LLM for KE.

Meanwhile, we will offer long-term maintenance to fix bugs, solve issues and meet new requests. If you have any problems, please open an issue.

Reading Materials

Data-Efficient Knowledge Graph Construction, 高效知识图谱构建 (Tutorial on CCKS 2022) [slides]

Efficient and Robust Knowledge Graph Construction (Tutorial on AACL-IJCNLP 2022) [slides]

PromptKG Family: a Gallery of Prompt Learning & KG-related Research Works, Toolkits, and Paper-list [Resources]

Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective [Survey][Paper-list]

Related Toolkit

Doccano, MarkTool, LabelStudio: Data Annotation Toolkits

LambdaKG: A library and benchmark for PLM-based KG embeddings

EasyInstruct: An easy-to-use framework to instruct Large Language Models


Citation

Please cite our paper if you use DeepKE in your work:

```
@inproceedings{EMNLP2022_Demo_DeepKE,
  author    = {Ningyu Zhang and
               Xin Xu and
               Liankuan Tao and
               Haiyang Yu and
               Hongbin Ye and
               Shuofei Qiao and
               Xin Xie and
               Xiang Chen and
               Zhoubo Li and
               Lei Li},
  editor    = {Wanxiang Che and
               Ekaterina Shutova},
  title     = {DeepKE: {A} Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population},
  booktitle = {{EMNLP} (Demos)},
  pages     = {98--108},
  publisher = {Association for Computational Linguistics},
  year      = {2022},
  url       = {https://aclanthology.org/2022.emnlp-demos.10}
}
```

Contributors

Ningyu Zhang, Haofen Wang, Fei Huang, Feiyu Xiong, Liankuan Tao, Xin Xu, Honghao Gui, Zhenru Zhang, Chuanqi Tan, Qiang Chen, Xiaohan Wang, Zekun Xi, Xinrong Li, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Peng Wang, Yuqi Zhu, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Jing Chen, Yuqi Zhu, Yujie Luo, Shumin Deng, Wen Zhang, Guozhou Zheng, Huajun Chen

Community Contributors: Shuo Shen, Zhoutian Shao, Wei Hu, thredreams, eltociear, Ziwen Xu, Rui Huang, Xiaolong Weng

Other Knowledge Extraction Open-Source Projects

About


deepke.zjukg.cn/

Releases: 9 (latest: DeepKE 2.2.7, Sep 21, 2023)

