Read and modify constituency trees in Rust.

sebpuetz/lumberjack

lumberjack

Read and process constituency trees in various formats.

Install:

  • From crates.io:
cargo install lumberjack-utils
  • From GitHub:
cargo install --git https://github.com/sebpuetz/lumberjack

Usage as standalone:

  • Convert a treebank in NEGRA export 4 format to bracketed TueBa V2 format:
lumberjack-conversion --input_file treebank.negra --input_format negra \
    --output_format tueba --output_file treebank.tueba --projectivize
  • Retain only the root node, NPs and PPs and print to simple bracketed format:
echo "NP PP" > filter_set.txt
lumberjack-conversion --input_file treebank.simple --input_format simple \
    --output_format tueba --output_file treebank.filtered \
    --filter filter_set.txt
  • Convert a treebank from simple bracketed format to CONLLX format and annotate parent tags of terminals as features:
lumberjack-conversion --input_file treebank.simple --input_format simple \
    --output_format conllx --output_file treebank.conll --parent
  • Modifications in the following order:
  1. Reattach all terminals with part-of-speech starting with $ to the root node
  2. Remove all nonterminals except the root, Ss, NPs, PPs and VPs
  3. Assign unique identifiers based on the closest S to terminals
  4. Insert nodes with the label "label" above terminals that aren't dominated by NP or PP
  5. Annotate the label of the parent node on terminals
  6. Print to CONLLX format with annotations
echo "S VP NP PP" > filter_set.txt
echo "NP PP" > insert_set.txt
echo "S" > id_set.txt
lumberjack-conversion --input_file treebank.simple --input_format simple \
    --output_format conllx --filter filter_set.txt \
    --insertion_set insert_set.txt --insertion_label label \
    --id_set id_set.txt --reattach '$' \
    --parent parent --output_file treebank.conllx
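
To make the filtering and parent-annotation steps above more concrete, here is a minimal standalone sketch using only the Rust standard library. It does not use lumberjack's types or API: the `Node` enum, the `nt` and `t` helpers, and the functions `filter_nonterminals`, `filter_below` and `parent_tags` are all hypothetical names made up for this illustration, and the real tool may handle edge cases differently.

```rust
use std::collections::HashSet;

/// Toy constituency tree for illustration; this is not lumberjack's `Tree`.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Terminal { form: String, pos: String },
    NonTerminal { label: String, children: Vec<Node> },
}

/// Hypothetical helper: build a nonterminal node.
fn nt(label: &str, children: Vec<Node>) -> Node {
    Node::NonTerminal { label: label.to_string(), children }
}

/// Hypothetical helper: build a terminal node.
fn t(form: &str, pos: &str) -> Node {
    Node::Terminal { form: form.to_string(), pos: pos.to_string() }
}

/// Drop every nonterminal whose label is not in `keep` and attach its
/// children to the closest retained ancestor. The root is always kept.
fn filter_nonterminals(root: Node, keep: &HashSet<&str>) -> Node {
    match root {
        Node::NonTerminal { label, children } => Node::NonTerminal {
            label,
            children: children
                .into_iter()
                .flat_map(|child| filter_below(child, keep))
                .collect(),
        },
        terminal => terminal,
    }
}

/// Return the nodes that replace `node` after filtering.
fn filter_below(node: Node, keep: &HashSet<&str>) -> Vec<Node> {
    match node {
        Node::NonTerminal { label, children } => {
            let kept: Vec<Node> = children
                .into_iter()
                .flat_map(|child| filter_below(child, keep))
                .collect();
            if keep.contains(label.as_str()) {
                vec![Node::NonTerminal { label, children: kept }]
            } else {
                // Removed node: its surviving children move up one level.
                kept
            }
        }
        terminal => vec![terminal],
    }
}

/// Collect `(form, pos, parent label)` for each terminal, left to right.
fn parent_tags(root: &Node) -> Vec<(String, String, String)> {
    fn walk(node: &Node, parent: &str, out: &mut Vec<(String, String, String)>) {
        match node {
            Node::Terminal { form, pos } => {
                out.push((form.clone(), pos.clone(), parent.to_string()))
            }
            Node::NonTerminal { label, children } => {
                for child in children {
                    walk(child, label, out);
                }
            }
        }
    }
    let mut out = Vec::new();
    walk(root, "", &mut out);
    out
}
```

With a keep set of NP and PP, the toy tree (ROOT (S (NP (ADJP big) dogs) (VP bark))) becomes (ROOT (NP big dogs) bark): the terminals under NP report NP as their parent, and "bark" reports ROOT.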

Usage as Rust library:

  • Read and projectivize trees from NEGRA format and print to simple bracketed format:
use std::fs::File;
use std::io::BufReader;

use lumberjack::io::{NegraReader, PTBFormat};
use lumberjack::Projectivize;

fn print_negra(path: &str) {
    let file = File::open(path).unwrap();
    let reader = NegraReader::new(BufReader::new(file));
    for tree in reader {
        let mut tree = tree.unwrap();
        // Make the tree projective before printing it.
        tree.projectivize();
        println!("{}", PTBFormat::Simple.tree_to_string(&tree).unwrap());
    }
}
  • Filter non-terminal nodes from trees in a treebank and print to simple bracketed format:
use lumberjack::{io::PTBFormat, Tree, TreeOps, util::LabelSet};

fn filter_nodes(iter: impl Iterator<Item = Tree>, set: LabelSet) {
    for mut tree in iter {
        tree.filter_nonterminals(|tree, nt| set.matches(tree[nt].label()))
            .unwrap();
        println!("{}", PTBFormat::Simple.tree_to_string(&tree).unwrap());
    }
}
  • Convert a treebank in simple bracketed format to CONLLX with the constituency structure encoded in the features field:
use conllx::graph::Sentence;
use lumberjack::io::Encode;
use lumberjack::{Tree, TreeOps, UnaryChains};

fn to_conllx(iter: impl Iterator<Item = Tree>) {
    for mut tree in iter {
        tree.collapse_unary_chains().unwrap();
        tree.annotate_absolute().unwrap();
        println!("{}", Sentence::from(&tree));
    }
}
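
The last example collapses unary chains before encoding the tree. As a rough idea of what that step means, the following standalone sketch merges chains of single-child nonterminals on a toy tree. It is not lumberjack's implementation: the `Node` enum, the `nt` and `t` helpers, the `delim` parameter, the `to_bracketed` rendering and the choice to leave terminals untouched are all assumptions made for this illustration.

```rust
/// Toy constituency tree for illustration; this is not lumberjack's `Tree`.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Terminal { form: String, pos: String },
    NonTerminal { label: String, children: Vec<Node> },
}

/// Hypothetical helper: build a nonterminal node.
fn nt(label: &str, children: Vec<Node>) -> Node {
    Node::NonTerminal { label: label.to_string(), children }
}

/// Hypothetical helper: build a terminal node.
fn t(form: &str, pos: &str) -> Node {
    Node::Terminal { form: form.to_string(), pos: pos.to_string() }
}

/// Merge each chain of single-child nonterminals into one node whose label
/// joins the labels of the chain with `delim`. Terminals are never merged.
fn collapse_unary_chains(node: Node, delim: &str) -> Node {
    match node {
        Node::NonTerminal { mut label, mut children } => {
            // Absorb the only child for as long as it is a nonterminal.
            while children.len() == 1 && matches!(children[0], Node::NonTerminal { .. }) {
                if let Some(Node::NonTerminal {
                    label: child_label,
                    children: grandchildren,
                }) = children.pop()
                {
                    label.push_str(delim);
                    label.push_str(&child_label);
                    children = grandchildren;
                }
            }
            // Recurse into whatever children remain below the merged node.
            let children = children
                .into_iter()
                .map(|child| collapse_unary_chains(child, delim))
                .collect();
            Node::NonTerminal { label, children }
        }
        terminal => terminal,
    }
}

/// Render the toy tree in a simple bracketed notation.
fn to_bracketed(node: &Node) -> String {
    match node {
        Node::Terminal { form, pos } => format!("({} {})", pos, form),
        Node::NonTerminal { label, children } => {
            let inner: Vec<String> = children.iter().map(to_bracketed).collect();
            format!("({} {})", label, inner.join(" "))
        }
    }
}
```

With "_" as the delimiter, the toy tree (ROOT (S (NP (NX (N dogs))) (VP (V bark)))) is rendered as (ROOT_S (NP_NX (N dogs)) (VP (V bark))). Keeping the original labels inside the merged label is what would allow such a step to be reversed later.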
