Software > Stanford Parser

About

A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Their development was one of the biggest breakthroughs in natural language processing in the 1990s. You can try out our parser online.

Package contents

This package is a Java implementation of probabilistic natural language parsers, both highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser. The original version of this parser was mainly written by Dan Klein, with support code and linguistic grammar development by Christopher Manning. Extensive additional work (internationalization and language-specific modeling, flexible input/output, grammar compaction, lattice parsing, k-best parsing, typed dependencies output, user support, etc.) has been done by Roger Levy, Christopher Manning, Teg Grenager, Galen Andrew, Marie-Catherine de Marneffe, Bill MacCartney, Anna Rafferty, Spence Green, Huihsin Tseng, Pi-Chuan Chang, Wolfgang Maier, and Jenny Finkel.

The lexicalized probabilistic parser implements a factored product model, with separate PCFG phrase structure and lexical dependency experts, whose preferences are combined by efficient exact inference, using an A* algorithm. Alternatively, the software can be used simply as an accurate unlexicalized stochastic context-free grammar parser. Either of these yields a statistical parsing system with good performance. A GUI is provided for viewing the phrase structure tree output of the parser.

As well as providing an English parser, the parser can be and has been adapted to work with other languages. A Chinese parser based on the Chinese Treebank, a German parser based on the Negra corpus and Arabic parsers based on the Penn Arabic Treebank are also included. The parser has also been used for other languages, such as Italian, Bulgarian, and Portuguese.

The parser provides Universal Dependencies (v1) and Stanford Dependencies output as well as phrase structure trees. Typed dependencies are otherwise known as grammatical relations. This style of output is available only for English and Chinese. For more details, please refer to the Stanford Dependencies webpage and the Universal Dependencies v1 documentation. (See also the current Universal Dependencies documentation, but we are yet to update to it.)

Shift-reduce constituency parser

As of version 3.4 in 2014, the parser includes the code necessary to run a shift-reduce parser, a much faster constituent parser with competitive accuracy. Models for this parser are linked below.

Neural-network dependency parser

In version 3.5.0 (October 2014) we released a high-performance dependency parser powered by a neural network. The parser outputs typed dependency parses for English and Chinese. The models for this parser are included in the general Stanford Parser models package.

Dependency scoring

The package includes a tool for scoring generic dependency parses, in the class edu.stanford.nlp.trees.DependencyScoring. The tool scores dependency trees, computing F1 and labeled attachment scores. The included usage message describes in detail how to use the tool.
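
As a rough illustration of what attachment scoring measures, the following sketch compares a predicted dependency tree with a gold one and reports the fraction of tokens whose head (and, for the labeled score, relation label) is correct. It is not the DependencyScoring class itself: the names AttachmentScoreSketch, Arc, and attachmentScore are hypothetical and introduced only for this example, and the real tool's input formats and options are those given in its usage message.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not the DependencyScoring class shipped with the parser. */
final class AttachmentScoreSketch {

    /** One dependency arc: dependent token index, head token index (0 = root), relation label. */
    static final class Arc {
        final int dependent;
        final int head;
        final String label;

        Arc(int dependent, int head, String label) {
            this.dependent = dependent;
            this.head = head;
            this.label = label;
        }
    }

    /**
     * Fraction of tokens whose predicted head matches the gold head.
     * When labeled is true, the relation label must match as well.
     * Both lists are assumed to hold one arc per token, in the same token order.
     */
    static double attachmentScore(List<Arc> gold, List<Arc> predicted, boolean labeled) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("Gold and predicted trees differ in length");
        }
        if (gold.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            Arc g = gold.get(i);
            Arc p = predicted.get(i);
            if (g.dependent != p.dependent) {
                throw new IllegalArgumentException("Arcs are not aligned at position " + i);
            }
            boolean headMatches = g.head == p.head;
            boolean labelMatches = g.label.equals(p.label);
            if (headMatches && (!labeled || labelMatches)) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    public static void main(String[] args) {
        // Toy three-token sentence "officials said today"; the prediction mislabels one arc.
        List<Arc> gold = Arrays.asList(
            new Arc(1, 2, "nsubj"), new Arc(2, 0, "root"), new Arc(3, 2, "tmod"));
        List<Arc> predicted = Arrays.asList(
            new Arc(1, 2, "nsubj"), new Arc(2, 0, "root"), new Arc(3, 2, "dobj"));
        System.out.println("Unlabeled attachment score: " + attachmentScore(gold, predicted, false));
        System.out.println("Labeled attachment score:   " + attachmentScore(gold, predicted, true));
    }
}
```

In the toy example every head is correct but one label is wrong, so the labeled score comes out lower than the unlabeled one.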

Usage notes

The current version of the parser requires Java 8 or later. (You can also download an old version of the parser: version 1.4, which runs under JDK 1.4; version 2.0, which runs under JDK 1.5; or version 3.4.1, which runs under JDK 1.6. Those distributions are no longer supported.) The parser also requires a reasonable amount of memory (at least 100MB to run as a PCFG parser on sentences up to 40 words in length; typically around 500MB of memory to be able to parse similarly long typical-of-newswire sentences using the factored model).

The parser is available for download, licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation, a Java parsing GUI, and a Java API.

The download is a 261 MB zipped file (mainly consisting of included grammar data files). If you unpack the zip file, you should have everything needed. Simple scripts are included to invoke the parser on a Unix or Windows system. For another system, you merely need to similarly configure the classpath.

Licensing

The parser code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gift funding: use this form and write "Stanford NLP Group open source software" in the Special Instructions.

Citing the Stanford Parser

The main technical ideas behind how these parsers work appear in these papers. Feel free to cite one or more of the following papers or people depending on what you are using. Since the parser is regularly updated, we appreciate it if papers with numerical results reflecting parser performance mention the version of the parser being used!

For the neural-network dependency parser:
Danqi Chen and Christopher D. Manning. 2014. A Fast and Accurate Dependency Parser using Neural Networks. Proceedings of EMNLP 2014.
For the Compositional Vector Grammar parser (starting at version 3.2):
Richard Socher, John Bauer, Christopher D. Manning and Andrew Y. Ng. 2013. Parsing With Compositional Vector Grammars. Proceedings of ACL 2013.
For the Shift-Reduce Constituency parser (starting at version 3.4):
This parser was written by John Bauer. You can thank him and cite the web page describing it: https://nlp.stanford.edu/software/srparser.html. You can also cite the original research papers of others mentioned on that page.
For the PCFG parser (which also does POS tagging):
Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. Proceedings of the 41st Meeting of the Association for Computational Linguistics, pp. 423-430.
For the factored parser (which also does POS tagging):
Dan Klein and Christopher D. Manning. 2003. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), Cambridge, MA: MIT Press, pp. 3-10.
For the Universal Dependencies representation:
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In LREC 2016.
For the English Universal Dependencies converter and the enhanced English Universal Dependencies representation:
Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks. In LREC 2016.
For the (English) Stanford Dependencies representation:
Marie-Catherine de Marneffe, Bill MacCartney and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006.
For the German parser:
Anna Rafferty and Christopher D. Manning. 2008. Parsing Three German Treebanks: Lexicalized and Unlexicalized Baselines. In ACL Workshop on Parsing German.
For the Chinese Parser:
Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? ACL 2003, pp. 439-446.
For the Chinese Stanford Dependencies:
Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, and Christopher D. Manning. 2009. Discriminative Reordering with Chinese Grammatical Relations Features. In Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation.
For the Arabic parser:
Spence Green and Christopher D. Manning. 2010. Better Arabic Parsing: Baselines, Evaluations, and Analysis. In COLING 2010.
For the French parser:
Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning. 2011. Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French. In EMNLP 2011.
For the Spanish parser:
Most of the work on Spanish was by Jon Gauthier. There is no published paper, but you can thank him and/or cite this webpage: https://nlp.stanford.edu/software/spanish-faq.html

Questions about the parser?

  1. If you're new to parsing, you can start by running the GUI to try out the parser. Scripts are included for Linux (lexparser-gui.sh) and Windows (lexparser-gui.bat).
  2. Take a look at the Javadoc lexparser package documentation and LexicalizedParser class documentation. (Point your web browser at the index.html file in the included javadoc directory and navigate to those items.)
  3. Look at the parser FAQ for answers to common questions.
  4. If none of that helps, please see our email guidelines for instructions on how to reach us for further assistance.

Download



The standard download includes models for Arabic, Chinese, English, French, German, and Spanish. There are additional models we do not release with the standalone parser, including shift-reduce models, that can be found in the models jars for each language. Below are links to those jars.


Extensions: Packages by others using the parser

Java

  • tydevi: Typed Dependency Viewer that makes a picture of the Stanford Dependencies analysis of a sentence. By Bernard Bou.
  • DependenSee: A Dependency Parse Visualisation Tool that makes pictures of Stanford Dependency output. By Awais Athar. (GitHub)
  • GATE plug-in. By the GATE Team (esp. Adam Funk).
  • GrammarScope: grammatical relation browser. GUI, especially focusing on grammatical relations (typed dependencies), including an editor. By Bernard Bou.

PHP

  • PHP-Stanford-NLP. Supports POS Tagger, NER, Parser. By Anthony Gentile (agentile).

Python/Jython

Ruby

.NET / F# / C#

OS X

  • If you use Homebrew, you can install the Stanford Parser with: brew install stanford-parser

Release history


Version 4.2.0 (2020-11-17): Retrain English models with treebank fixes. Models: arabic, chinese, english, french, german, spanish
Version 4.0.0 (2020-05-22): Model tokenization updated to UDv2.0. Models: arabic, chinese, english, french, german, spanish
Version 3.9.2 (2018-10-17): Updated for compatibility. Models: arabic, chinese, english, french, german, spanish
Version 3.9.1 (2018-02-27): New French and Spanish UD models, misc. UD enhancements, bug fixes. Models: arabic, chinese, english, french, german, spanish
Version 3.8.0 (2017-06-09): Updated for compatibility. Models: arabic, chinese, english, french, german, spanish
Version 3.7.0 (2016-10-31): New UD models. Models: arabic, chinese, english, french, german, spanish
Version 3.6.0 (2015-12-09): Updated for compatibility. Models: chinese, english, french, german, spanish
Version 3.5.2 (2015-04-20): Switch to universal dependencies. Models: shift-reduce parser models
Version 3.5.1 (2015-01-29): Dependency parser fixes and model improvements. Models: shift-reduce parser models
Version 3.5.0 (2014-10-31): Upgrade to Java 8; add neural-network dependency parser. Models: shift-reduce parser models
Version 3.4.1 (2014-08-27): Add Spanish models. Models: shift-reduce parser models
Version 3.4 (2014-06-16): Shift-reduce parser, dependency improvements, French parser uses CC tagset. Models: shift-reduce parser models
Version 3.3.1 (2014-01-04): English dependency "infmod" and "partmod" combined into "vmod", other minor dependency improvements
Version 3.3.0 (2013-11-12): English dependency "attr" removed, other dependency improvements, imperative training data added
Version 3.2.0 (2013-06-20): New CVG based English model with higher accuracy
Version 2.0.5 (2013-04-05): Dependency improvements, -nthreads option, ctb7 model
Version 2.0.4 (2012-11-12): Improved dependency code extraction efficiency, other dependency changes
Version 2.0.3 (2012-07-09): Minor bug fixes
Version 2.0.2 (2012-05-22): Some models now support training with extra tagged, non-tree data
Version 2.0.1 (2012-03-09): Caseless English model included, bugfix for enforced tags
Version 2.0 (2012-02-03): Threadsafe!
Version 1.6.9 (2011-09-14): Improved recognition of imperatives, dependencies now explicitly include a root, parser knows osprey is a noun
Version 1.6.8 (2011-06-19): New French model, improved foreign language models, bug fixes
Version 1.6.7 (2011-05-18): Minor bug fixes
Version 1.6.6 (2011-04-20): Internal code and API changes (ArrayLists rather than Sentence; use of CoreLabel objects) to match tagger and CoreNLP
Version 1.6.5 (2010-11-30): Further improvements to English Stanford Dependencies and other minor changes
Version 1.6.4 (2010-08-20): More minor bug fixes and improvements to English Stanford Dependencies and question parsing
Version 1.6.3 (2010-07-09): Improvements to English Stanford Dependencies and question parsing, minor bug fixes
Version 1.6.2 (2010-02-26): Improvements to Arabic parser models, and to English and Chinese Stanford Dependencies
Version 1.6.1 (2008-10-26): Slightly improved Arabic and German parsing, and Stanford Dependencies
Version 1.6 (2007-08-19): Added Arabic, k-best PCFG parsing; improved English grammatical relations
Version 1.5.1 (2006-06-11): Improved English and Chinese grammatical relations; fixed UTF-8 handling
Version 1.5 (2005-07-21): Added grammatical relations output; fixed bugs introduced in 1.4
Version 1.4 (2004-03-24): Made PCFG faster again (by FSA minimization); added German support
Version 1.3 (2003-09-06): Made parser over twice as fast; added tokenization options
Version 1.2 (2003-07-20): Halved PCFG memory usage; added support for Chinese
Version 1.1 (2003-03-25): Improved parsing speed; included GUI, improved PCFG grammar
Version 1.0 (2002-12-05): Initial release

Sample input and output

The parser can read various forms of plain text input and can output various analysis formats, including part-of-speech tagged text, phrase structure trees, and a grammatical relations (typed dependency) format. For example, consider the text:

The strongest rain ever recorded in India shut down the financial hub of Mumbai, snapped communication lines, closed airports and forced thousands of people to sleep in their offices or walk home during the night, officials said today.

The following output shows part-of-speech tagged text, then a context-free phrase structure grammar representation, and finally a typed dependency representation. All of these are different views of the output of the parser.

The/DT strongest/JJS rain/NN ever/RB recorded/VBN in/IN India/NNP
shut/VBD down/RP the/DT financial/JJ hub/NN of/IN Mumbai/NNP ,/,
snapped/VBD communication/NN lines/NNS ,/, closed/VBD airports/NNS
and/CC forced/VBD thousands/NNS of/IN people/NNS to/TO sleep/VB in/IN
their/PRP$ offices/NNS or/CC walk/VB home/NN during/IN the/DT night/NN
,/, officials/NNS said/VBD today/NN ./.

(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))

det(rain-3, The-1)
amod(rain-3, strongest-2)
nsubj(shut-8, rain-3)
nsubj(snapped-16, rain-3)
nsubj(closed-20, rain-3)
nsubj(forced-23, rain-3)
advmod(recorded-5, ever-4)
partmod(rain-3, recorded-5)
prep_in(recorded-5, India-7)
ccomp(said-40, shut-8)
prt(shut-8, down-9)
det(hub-12, the-10)
amod(hub-12, financial-11)
dobj(shut-8, hub-12)
prep_of(hub-12, Mumbai-14)
conj_and(shut-8, snapped-16)
ccomp(said-40, snapped-16)
nn(lines-18, communication-17)
dobj(snapped-16, lines-18)
conj_and(shut-8, closed-20)
ccomp(said-40, closed-20)
dobj(closed-20, airports-21)
conj_and(shut-8, forced-23)
ccomp(said-40, forced-23)
dobj(forced-23, thousands-24)
prep_of(thousands-24, people-26)
aux(sleep-28, to-27)
xcomp(forced-23, sleep-28)
poss(offices-31, their-30)
prep_in(sleep-28, offices-31)
xcomp(forced-23, walk-33)
conj_or(sleep-28, walk-33)
dobj(walk-33, home-34)
det(night-37, the-36)
prep_during(walk-33, night-37)
nsubj(said-40, officials-39)
root(ROOT-0, said-40)
tmod(said-40, today-41)

This output was generated with the command:

java -mx200m edu.stanford.nlp.parser.lexparser.LexicalizedParser -retainTMPSubcategories -outputFormat "wordsAndTags,penn,typedDependencies" englishPCFG.ser.gz mumbai.txt
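
The typed dependency lines in the sample output follow the pattern relation(governor-index, dependent-index), so they are straightforward to post-process. The sketch below reads such lines with a regular expression, assuming one dependency per line. The class TypedDependencyLineSketch and its Dependency type are hypothetical helpers written for this illustration; they are not part of the parser's API, which exposes dependencies as Java objects directly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading typedDependencies text output; not part of the parser. */
final class TypedDependencyLineSketch {

    /** One parsed line, e.g. nsubj(shut-8, rain-3). */
    static final class Dependency {
        final String relation;
        final String governor;
        final int governorIndex;
        final String dependent;
        final int dependentIndex;

        Dependency(String relation, String governor, int governorIndex,
                   String dependent, int dependentIndex) {
            this.relation = relation;
            this.governor = governor;
            this.governorIndex = governorIndex;
            this.dependent = dependent;
            this.dependentIndex = dependentIndex;
        }
    }

    // relation(word-index, word-index); the word groups may themselves contain hyphens.
    private static final Pattern LINE =
        Pattern.compile("([^\\s()]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)");

    /** Parses a single dependency line, or throws if the line does not fit the pattern. */
    static Dependency parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a typed dependency line: " + line);
        }
        return new Dependency(
            m.group(1),
            m.group(2), Integer.parseInt(m.group(3)),
            m.group(4), Integer.parseInt(m.group(5)));
    }

    public static void main(String[] args) {
        // Three lines taken from the sample output above.
        List<String> lines = Arrays.asList(
            "nsubj(said-40, officials-39)",
            "root(ROOT-0, said-40)",
            "tmod(said-40, today-41)");
        for (String line : lines) {
            Dependency d = parse(line);
            System.out.println(d.relation + ": " + d.governor + " -> " + d.dependent);
        }
    }
}
```

Token indices are kept alongside the words because the same word form can occur more than once in a sentence; the index is what identifies the token.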
