README.md (+10 -5: 10 additions & 5 deletions)
@@ -13,9 +13,9 @@ ACE2005-toolkit
 │ │ └── ...
 │ └── index.html
 ├── cache_data (empty before run)
-│ ├── Arabic
-│ ├── Chinese
-│ └── English
+│ ├── Arabic/
+│ ├── Chinese/
+│ └── English/
 ├── filelist (train/dev/test doc files)
 │ ├── ace.ar.dev
 │ ├── ace.ar.test
@@ -27,7 +27,11 @@ ACE2005-toolkit
 │ ├── ace.zh.test
 │ └── ace.zh.train
 │
-├── output(final output, empty before run)
+├── output (final output, empty before run)
+│ ├── BIO (BIO output)
+│ │ ├── train/
+│ │ ├── test/
+│ │ └── dev/
 │ └── ...
 ├── udpipe (udpipe files)
 │ ├── arabic-padt-ud-2.5-191206
@@ -47,7 +51,8 @@ ACE2005-toolkit
 2. Install all the requirements by `pip install -r requirements.txt`;
 3. Start preprocess by `bash run.sh en`, `en` can be replaced by `zh` or `ar`;
 4. Enter `n` to get data divided by filelist, or enter `y` and `train/dev/test rate` (e.g. `0.8 0.1 0.1`) to get data divided by sentences;
-5. The final output will in `output/`.
+5. Enter `y` to get transform the data into BIO-type format, the transformed data will be in `output/BIO/`
+6. The final output will be in `output/`.
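To illustrate what dividing by sentences with a `train/dev/test rate` such as `0.8 0.1 0.1` could look like, here is a minimal sketch. It is not the toolkit's own code: the function name `split_by_rate` is hypothetical, and it assumes the sentences are cut into contiguous blocks without shuffling, which the README does not specify.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_rate(
    sentences: Sequence[T],
    rates: Tuple[float, float, float] = (0.8, 0.1, 0.1),
) -> Tuple[List[T], List[T], List[T]]:
    """Split sentences into train/dev/test blocks by the given rates.

    Hypothetical helper: contiguous split, no shuffling (an assumption).
    """
    if any(r < 0 for r in rates):
        raise ValueError("rates must be non-negative")
    if abs(sum(rates) - 1.0) > 1e-9:
        raise ValueError("rates must sum to 1")

    n = len(sentences)
    train_end = round(n * rates[0])
    dev_end = train_end + round(n * rates[1])

    train = list(sentences[:train_end])
    dev = list(sentences[train_end:dev_end])
    # Whatever remains goes to test, so no sentence is dropped by rounding.
    test = list(sentences[dev_end:])
    return train, dev, test


if __name__ == "__main__":
    demo = [f"sentence {i}" for i in range(10)]
    train, dev, test = split_by_rate(demo, (0.8, 0.1, 0.1))
    print(len(train), len(dev), len(test))  # prints: 8 1 1
```

Assigning the remainder to the test block keeps the three parts a complete partition of the input even when the rates do not divide the sentence count evenly.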
 #### Output format
 The output will save separately in `output/`, each file can be loaded by `json.loads()`. After loading, the data will be in `python list` type, each line will be in `python dict` type: