A dependency parse shows the grammatical dependencies between the words in a sentence. For example, in the sentence The cat wore a hat, the root of the sentence is the verb, wore, and both the subject, the cat, and the object, a hat, are its dependents. The dependency parse is useful in many NLP tasks because it exposes the grammatical structure of the sentence: the subject, the main verb, the object, and so on. It can then be used in downstream processing.
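To make the head and dependent relationship concrete, here is a minimal sketch that writes out the arcs for The cat wore a hat by hand. The labels (det, nsubj, dobj, ROOT) follow the scheme used by spaCy's English models, but the arcs are typed in manually as an assumption, not produced by a parser, and find_by_label is a hypothetical helper.

```python
# Hand-written dependency arcs for "The cat wore a hat".
# ASSUMPTION: these arcs are typed in for illustration, not parser output.
# Each entry is (word, dependency label, head word).
arcs = [
    ("The", "det", "cat"),
    ("cat", "nsubj", "wore"),
    ("wore", "ROOT", "wore"),  # spaCy represents the root as its own head
    ("a", "det", "hat"),
    ("hat", "dobj", "wore"),
]


def find_by_label(arcs, label):
    """Return the words whose dependency label equals `label` (hypothetical helper)."""
    return [word for word, dep, _ in arcs if dep == label]


root = find_by_label(arcs, "ROOT")[0]
subject = find_by_label(arcs, "nsubj")[0]
direct_object = find_by_label(arcs, "dobj")[0]
print(root, subject, direct_object)  # wore cat hat
```

Both cat and hat point to wore as their head, which is what it means for the subject and object to be dependents of the root.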
- Run the file and language utility notebooks:
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
- Define the sentence we will be parsing:
sentence = 'I have seldom heard him mention her under any other name.'
- Define a function that will print the word, its grammatical function embedded in the dep_ attribute, and the explanation of that attribute. The dep_ attribute of the Token object shows the grammatical function of the word in the sentence:
def print_dependencies(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, "\t", token.dep_, "\t", spacy.explain(token.dep_))
- Now, let's use this function on our sentence. We can see that the verb heard is the ROOT word of the sentence, with all other words depending on it, directly or indirectly:
print_dependencies(sentence, small_model)
The result should be as follows:
I	nsubj	nominal subject
have	aux	auxiliary
seldom	advmod	adverbial modifier
heard	ROOT	root
him	nsubj	nominal subject
mention	ccomp	clausal complement
her	dobj	direct object
under	prep	prepositional modifier
any	det	determiner
other	amod	adjectival modifier
name	pobj	object of preposition
.	punct	punctuation
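One way to read the output above is to group the tokens by their dependency label. The following sketch does this in plain Python using the (token, label) pairs copied from the printed result, so it does not need a loaded model. The rows list and the by_label name are assumptions introduced for illustration.

```python
from collections import defaultdict

# (token, dep_) pairs copied by hand from the printed output above.
# ASSUMPTION: this is a transcription of that output, not a fresh parse.
rows = [
    ("I", "nsubj"), ("have", "aux"), ("seldom", "advmod"), ("heard", "ROOT"),
    ("him", "nsubj"), ("mention", "ccomp"), ("her", "dobj"), ("under", "prep"),
    ("any", "det"), ("other", "amod"), ("name", "pobj"), (".", "punct"),
]

# Group the tokens by dependency label, keeping sentence order within each group.
by_label = defaultdict(list)
for text, dep in rows:
    by_label[dep].append(text)

print(by_label["ROOT"])   # ['heard']
print(by_label["nsubj"])  # ['I', 'him']
```

The sentence has two nsubj tokens, I and him. The label alone does not say which verb each one belongs to. That information lives in the head links, which the ancestors and children attributes expose.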
- To explore the dependency parse structure, we can use the attributes of the Token class. Using the ancestors and children attributes, we can get the tokens that this token depends on and the tokens that depend directly on it, respectively. The function to print the ancestors is as follows:
def print_ancestors(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.ancestors])
- Now, let's use this function on our sentence:
print_ancestors(sentence, small_model)
The output will be as follows. In the result, we see that heard has no ancestors since it is the main word in the sentence. All other words depend on it, and in fact, contain heard in their ancestor lists.
The dependency chain can be seen by following the ancestor links for each word. For example, if we look at the word name, we see that its ancestors are under, mention, and heard. The immediate parent of name is under, the parent of under is mention, and the parent of mention is heard. A dependency chain will always lead to the root, or the main word, of the sentence:
I ['heard']
have ['heard']
seldom ['heard']
heard []
him ['mention', 'heard']
mention ['heard']
her ['mention', 'heard']
under ['mention', 'heard']
any ['name', 'under', 'mention', 'heard']
other ['name', 'under', 'mention', 'heard']
name ['under', 'mention', 'heard']
. ['heard']
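The chain described above can be reproduced without a parser by storing, for each word, the position of its immediate parent and then walking upward until the root is reached. This is a sketch of the idea, not of spaCy's internals. The heads list is an assumption read off the ancestor lists above (the first ancestor of each word is its parent), and the ancestors function is a hypothetical stand-in for the ancestors attribute.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]

# Index of each word's immediate parent, transcribed from the output above.
# ASSUMPTION: hand-derived for illustration; the root (heard, index 3) points to itself.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]


def ancestors(i):
    """Follow parent links from word i up to the root, collecting each parent."""
    chain = []
    while heads[i] != i:  # stop once we reach the root
        i = heads[i]
        chain.append(words[i])
    return chain


print(ancestors(words.index("name")))   # ['under', 'mention', 'heard']
print(ancestors(words.index("heard")))  # []
```

Every chain ends with heard because the loop only stops at the word that is its own parent.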
- To see all the children, use the following function. This function prints out each word and the words that depend directly on it, its children:
def print_children(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.children])
- Now, let's use this function on our sentence:
print_children(sentence, small_model)
The result should be as follows. Now, the word heard has a list of words that depend on it since it is the main word in the sentence:
I []
have []
seldom []
heard ['I', 'have', 'seldom', 'mention', '.']
him []
mention ['him', 'her', 'under']
her []
under ['name']
any []
other []
name ['any', 'other']
. []
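Children are the inverse of the parent relation: if each word records its parent, the children of a word are all the words that name it as their parent. The following plain-Python sketch builds that inverse map in one pass. The heads list is an assumption transcribed from the children lists above, and the children dictionary is a hypothetical stand-in for the children attribute.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]

# Index of each word's immediate parent.
# ASSUMPTION: hand-derived from the output above; the root (index 3) points to itself.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]

# Invert the parent links: map each word index to the words that depend directly on it.
children = {i: [] for i in range(len(words))}
for i, head in enumerate(heads):
    if head != i:  # the root is not a child of itself
        children[head].append(words[i])

for i, word in enumerate(words):
    print(word, children[i])
```

Only direct dependents appear in each list, which is why name is listed under under but not under mention or heard.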
- We can also see left and right children in separate lists. In the following function, we print the children as two separate lists, left and right. This can be useful when doing grammatical transformations in the sentence:
def print_lefts_and_rights(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.lefts], [t.text for t in token.rights])
- Let's use this function on our sentence:
print_lefts_and_rights(sentence, small_model)
The result should be as follows:
I [] []
have [] []
seldom [] []
heard ['I', 'have', 'seldom'] ['mention', '.']
him [] []
mention ['him'] ['her', 'under']
her [] []
under [] ['name']
any [] []
other [] []
name ['any', 'other'] []
. [] []
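The split into left and right children depends only on word order: a child is a left child if it appears before its parent in the sentence and a right child if it appears after. A minimal sketch of that rule follows. The heads list is an assumption transcribed from the output above, and lefts_and_rights is a hypothetical helper, not the spaCy attribute.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]

# Index of each word's immediate parent.
# ASSUMPTION: hand-derived from the output above; the root (index 3) points to itself.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]


def lefts_and_rights(i):
    """Split the direct dependents of word i by their position relative to it."""
    lefts = [words[j] for j, head in enumerate(heads) if head == i and j < i]
    rights = [words[j] for j, head in enumerate(heads) if head == i and j > i]
    return lefts, rights


print(lefts_and_rights(words.index("heard")))
print(lefts_and_rights(words.index("mention")))
```

For mention, the subject him falls on the left and the object her on the right, which is the kind of positional information a grammatical transformation would need.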
- We can also see the subtree rooted at each token, that is, the token together with all the words that depend on it directly or indirectly, by using this function:
def print_subtree(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.subtree])
- Let's use this function on our sentence:
print_subtree(sentence, small_model)
The result should be as follows. From the subtree rooted at each word, we can see the grammatical phrases that appear in the sentence, such as the noun phrase, any other name, and the prepositional phrase, under any other name:
I ['I']
have ['have']
seldom ['seldom']
heard ['I', 'have', 'seldom', 'heard', 'him', 'mention', 'her', 'under', 'any', 'other', 'name', '.']
him ['him']
mention ['him', 'mention', 'her', 'under', 'any', 'other', 'name']
her ['her']
under ['under', 'any', 'other', 'name']
any ['any']
other ['other']
name ['any', 'other', 'name']
. ['.']
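A subtree can be thought of as a word plus the subtrees of all its children, returned in sentence order. The sketch below implements that recursion in plain Python and joins the result into a phrase. The heads list is an assumption transcribed from the output above, and subtree and phrase are hypothetical helpers that only mimic the behavior of the subtree attribute for this one sentence.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]

# Index of each word's immediate parent.
# ASSUMPTION: hand-derived from the output above; the root (index 3) points to itself.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]


def subtree(i):
    """Return the indices of word i and all its descendants, in sentence order."""
    nodes = [i]
    for j, head in enumerate(heads):
        if head == i and j != i:  # j is a direct child of i
            nodes.extend(subtree(j))
    return sorted(nodes)


def phrase(word):
    """Join the subtree rooted at `word` into a string (assumes `word` occurs once)."""
    return " ".join(words[j] for j in subtree(words.index(word)))


print(phrase("name"))   # any other name
print(phrase("under"))  # under any other name
```

Using the subtree of a preposition or a noun in this way gives a simple means of pulling whole phrases out of a sentence.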