The Google Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can also use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app.
Read more on the Google Natural Language API
The Natural Language API offers several natural language understanding analyses. You can request them individually; by default, all of them are returned. The available analyses are:
- Entity analysis - Finds named entities (currently proper names and common nouns) in the text along with entity types, salience, mentions for each entity, and other properties. If possible, it also returns metadata about that entity, such as a Wikipedia URL.
- Syntax - Analyzes the syntax of the text and provides sentence boundaries and tokenization along with part of speech tags, dependency trees, and other properties.
- Sentiment - The overall sentiment of the text, represented by a magnitude in [0, +inf] and a score between -1.0 (negative sentiment) and 1.0 (positive sentiment).
- Content Classification - Analyzes a document and returns a list of content categories that apply to the text found in the document. A complete list of content categories is available in the Natural Language API documentation.
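To request a single analysis rather than all of them, `gl_nlp()` accepts an `nlp_type` argument. The following is a minimal sketch, not a definitive recipe: it assumes the package is installed and that the `GL_AUTH` environment variable points at a service account JSON file. The helper name `analyse_sentiment_only` and the example sentence are hypothetical.

```r
# Hypothetical helper: request only sentiment analysis for a character vector.
# Assumes googleLanguageR is installed and credentials are available.
analyse_sentiment_only <- function(texts) {
  stopifnot(is.character(texts), length(texts) > 0)
  googleLanguageR::gl_nlp(texts, nlp_type = "analyzeSentiment")
}

# Only call the API when the package and credentials appear to be configured
if (requireNamespace("googleLanguageR", quietly = TRUE) &&
    nzchar(Sys.getenv("GL_AUTH"))) {
  googleLanguageR::gl_auth(Sys.getenv("GL_AUTH"))
  result <- analyse_sentiment_only("The new release is fast and pleasant to use.")
  print(result$documentSentiment)
}
```

Requesting one analysis at a time can be useful when only part of the response is needed, since the default asks for every analysis listed above.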
Demo for Entity Analysis
You can pass a vector of text which will call the API for each element. The return is a list of responses, each response being a list of tibbles holding the different types of analysis.
```r
library(googleLanguageR)

# random text from Wikipedia
texts <- c("Norma is a small constellation in the Southern Celestial Hemisphere between Ara and Lupus, one of twelve drawn up in the 18th century by French astronomer Nicolas Louis de Lacaille and one of several depicting scientific instruments. Its name refers to a right angle in Latin, and is variously considered to represent a rule, a carpenter's square, a set square or a level. It remains one of the 88 modern constellations. Four of Norma's brighter stars make up a square in the field of faint stars. Gamma2 Normae is the brightest star with an apparent magnitude of 4.0. Mu Normae is one of the most luminous stars known, but is partially obscured by distance and cosmic dust. Four star systems are known to harbour planets. ",
           "Solomon Wariso (born 11 November 1966 in Portsmouth) is a retired English sprinter who competed primarily in the 200 and 400 metres.[1] He represented his country at two outdoor and three indoor World Championships and is the British record holder in the indoor 4 × 400 metres relay.")

# one API call per element of the vector
nlp_result <- gl_nlp(texts)
```
Each text has its own entry in the returned tibbles:
```r
str(nlp_result, max.level = 2)
List of 7
 $ sentences        :List of 2
  ..$ :'data.frame':  7 obs. of  4 variables:
  ..$ :'data.frame':  1 obs. of  4 variables:
 $ tokens           :List of 2
  ..$ :'data.frame':  139 obs. of  17 variables:
  ..$ :'data.frame':  54 obs. of  17 variables:
 $ entities         :List of 2
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  52 obs. of  9 variables:
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  8 obs. of  9 variables:
 $ language         : chr [1:2] "en" "en"
 $ text             : chr [1:2] "Norma is a small constellation in the Southern Celestial Hemisphere between Ara and Lupus, one of twelve drawn "| __truncated__ "Solomon Wariso (born 11 November 1966 in Portsmouth) is a retired English sprinter who competed primarily in th"| __truncated__
 $ documentSentiment:Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  2 obs. of  2 variables:
  ..$ magnitude: num [1:2] 2.4 0.1
  ..$ score    : num [1:2] 0.3 0.1
 $ classifyText     :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  1 obs. of  2 variables:
  ..$ name      : chr "/Science/Astronomy"
  ..$ confidence: num 0.93
```
Sentence structure and sentiment:
```r
## sentences structure
nlp_result$sentences[[2]]

content
1 Solomon Wariso (born 11 November 1966 in Portsmouth) is a retired English sprinter who competed primarily in the 200 and 400 metres.[1] He represented his country at two outdoor and three indoor World Championships and is the British record holder in the indoor 4 × 400 metres relay.
  beginOffset magnitude score
1           0       0.1   0.1
```
Information on what words (tokens) are within each text:
```r
# word tokens data
str(nlp_result$tokens[[1]])
'data.frame':  139 obs. of  17 variables:
 $ content       : chr  "Norma" "is" "a" "small" ...
 $ beginOffset   : int  0 6 9 11 17 31 34 38 47 57 ...
 $ tag           : chr  "NOUN" "VERB" "DET" "ADJ" ...
 $ aspect        : chr  "ASPECT_UNKNOWN" "ASPECT_UNKNOWN" "ASPECT_UNKNOWN" "ASPECT_UNKNOWN" ...
 $ case          : chr  "CASE_UNKNOWN" "CASE_UNKNOWN" "CASE_UNKNOWN" "CASE_UNKNOWN" ...
 $ form          : chr  "FORM_UNKNOWN" "FORM_UNKNOWN" "FORM_UNKNOWN" "FORM_UNKNOWN" ...
 $ gender        : chr  "GENDER_UNKNOWN" "GENDER_UNKNOWN" "GENDER_UNKNOWN" "GENDER_UNKNOWN" ...
 $ mood          : chr  "MOOD_UNKNOWN" "INDICATIVE" "MOOD_UNKNOWN" "MOOD_UNKNOWN" ...
 $ number        : chr  "SINGULAR" "SINGULAR" "NUMBER_UNKNOWN" "NUMBER_UNKNOWN" ...
 $ person        : chr  "PERSON_UNKNOWN" "THIRD" "PERSON_UNKNOWN" "PERSON_UNKNOWN" ...
 $ proper        : chr  "PROPER" "PROPER_UNKNOWN" "PROPER_UNKNOWN" "PROPER_UNKNOWN" ...
 $ reciprocity   : chr  "RECIPROCITY_UNKNOWN" "RECIPROCITY_UNKNOWN" "RECIPROCITY_UNKNOWN" "RECIPROCITY_UNKNOWN" ...
 $ tense         : chr  "TENSE_UNKNOWN" "PRESENT" "TENSE_UNKNOWN" "TENSE_UNKNOWN" ...
 $ voice         : chr  "VOICE_UNKNOWN" "VOICE_UNKNOWN" "VOICE_UNKNOWN" "VOICE_UNKNOWN" ...
 $ headTokenIndex: int  1 1 4 4 1 4 9 9 9 5 ...
 $ label         : chr  "NSUBJ" "ROOT" "DET" "AMOD" ...
 $ value         : chr  "Norma" "be" "a" "small" ...
```
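As a rough illustration of working with the token table, the sketch below counts part-of-speech tags and pulls out the `value` column for content words. So that it runs without API credentials, it rebuilds only the first four rows by hand from the `str()` output above; the object name `tokens` is a stand-in for `nlp_result$tokens[[1]]`. The `value` column appears to hold the lemma (for example `be` for `is`), which is an inference from the output rather than something stated here.

```r
# Stand-in for the first four rows of nlp_result$tokens[[1]],
# copied from the str() output (only a subset of the 17 columns)
tokens <- data.frame(
  content = c("Norma", "is", "a", "small"),
  tag     = c("NOUN", "VERB", "DET", "ADJ"),
  label   = c("NSUBJ", "ROOT", "DET", "AMOD"),
  value   = c("Norma", "be", "a", "small"),
  stringsAsFactors = FALSE
)

# Count tokens per part-of-speech tag
tag_counts <- table(tokens$tag)
print(tag_counts)

# Keep only content words (nouns, verbs, adjectives) and take their `value`
content_tags <- c("NOUN", "VERB", "ADJ")
lemmas <- tokens$value[tokens$tag %in% content_tags]
print(lemmas)
```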
What entities within the text have been identified, with an optional Wikipedia URL if it is available:
```r
nlp_result$entities
[[1]]
# A tibble: 52 x 9
   name           type         salience mid   wikipedia_url magnitude score beginOffset mention_type
   <chr>          <chr>           <dbl> <chr> <chr>             <dbl> <dbl>       <int> <chr>
 1 angle          OTHER         0.0133  NA    NA                  0     0           261 COMMON
 2 Ara            ORGANIZATION  0.0631  NA    NA                  0     0            76 PROPER
 3 astronomer     NA           NA       NA    NA                 NA    NA           144 COMMON
 4 carpenter      PERSON        0.0135  NA    NA                  0     0           328 COMMON
 5 constellation  OTHER         0.150   NA    NA                  0     0            17 COMMON
 6 constellations OTHER         0.0140  NA    NA                  0.9   0.9         405 COMMON
 7 distance       OTHER         0.00645 NA    NA                  0     0           649 COMMON
 8 dust           OTHER         0.00645 NA    NA                  0.3  -0.3         669 COMMON
 9 field          LOCATION      0.00407 NA    NA                  0.6  -0.6         476 COMMON
10 French         LOCATION      0.0242  NA    NA                  0     0           137 PROPER
# ... with 42 more rows

[[2]]
# A tibble: 8 x 9
  name                type         salience mid         wikipedia_url    magnitude score beginOffset mention_type
  <chr>               <chr>           <dbl> <chr>       <chr>                <dbl> <dbl>       <int> <chr>
1 British             LOCATION       0.0255 NA          NA                     0     0           226 PROPER
2 country             LOCATION       0.0475 NA          NA                     0     0           155 COMMON
3 English             OTHER          0.0530 NA          NA                     0     0            66 PROPER
4 Portsmouth          LOCATION       0.0530 /m/0619_    https://en.wiki…       0     0            41 PROPER
5 record holder       PERSON         0.0541 NA          NA                     0     0           234 COMMON
6 Solomon Wariso      ORGANIZATION   0.156  /g/120x5nf6 https://en.wiki…       0     0             0 PROPER
7 sprinter            PERSON         0.600  NA          NA                     0     0            74 COMMON
8 World Championships EVENT          0.0113 NA          NA                     0.1   0.1         195 PROPER
```
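Two common follow-up steps on an entity table are ranking by salience and keeping only entities that were linked to a knowledge graph ID (`mid`). The sketch below shows both in base R. To stay runnable without API access, it copies four rows from the second tibble above into a plain data frame; `entities` is a stand-in for `nlp_result$entities[[2]]`, and only some of the nine columns are reproduced.

```r
# Stand-in for a few rows of nlp_result$entities[[2]], copied from the output
entities <- data.frame(
  name         = c("Portsmouth", "Solomon Wariso", "sprinter", "World Championships"),
  type         = c("LOCATION", "ORGANIZATION", "PERSON", "EVENT"),
  salience     = c(0.0530, 0.156, 0.600, 0.0113),
  mid          = c("/m/0619_", "/g/120x5nf6", NA, NA),
  mention_type = c("PROPER", "PROPER", "COMMON", "PROPER"),
  stringsAsFactors = FALSE
)

# Order entities from most to least salient
by_salience <- entities[order(entities$salience, decreasing = TRUE), ]
print(by_salience$name)

# Keep only the names of entities that carry a knowledge graph ID (mid)
linked <- entities[!is.na(entities$mid), "name"]
print(linked)
```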
Sentiment of the entire text:
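In a live session the document-level sentiment sits in `nlp_result$documentSentiment`, one row per input text. The sketch below rebuilds that table by hand from the `str()` output shown earlier, so it runs without calling the API, and adds a coarse label per text. The helper `label_sentiment` and its 0.25 cut-off are illustrative assumptions, not something the API or the package defines.

```r
# Stand-in for nlp_result$documentSentiment, rebuilt from the str() output
# (the real object is a tibble; a plain data.frame is used here)
document_sentiment <- data.frame(
  magnitude = c(2.4, 0.1),
  score     = c(0.3, 0.1)
)

# Hypothetical helper: label a score as negative, neutral, or positive.
# The 0.25 cut-off is an illustrative assumption, not an API-defined value.
label_sentiment <- function(score, threshold = 0.25) {
  ifelse(score >= threshold, "positive",
         ifelse(score <= -threshold, "negative", "neutral"))
}

document_sentiment$label <- label_sentiment(document_sentiment$score)
print(document_sentiment)
```

The score gives the direction of the sentiment and the magnitude its overall strength, so a longer text such as the first one can have a high magnitude with only a mildly positive score.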
The category for the text, as defined by the API's list of content categories:
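In a live session the categories are in `nlp_result$classifyText`. The sketch below rebuilds the single row from the `str()` output shown earlier, so it runs without API credentials, then splits the category path into its levels and applies a confidence cut-off. The object name `classify_text` and the 0.5 cut-off are assumptions made for illustration.

```r
# Stand-in for nlp_result$classifyText, rebuilt from the str() output
classify_text <- data.frame(
  name       = "/Science/Astronomy",
  confidence = 0.93,
  stringsAsFactors = FALSE
)

# Split the category path into its levels, dropping the leading slash
category_levels <- strsplit(sub("^/", "", classify_text$name), "/", fixed = TRUE)[[1]]
print(category_levels)

# Keep only categories above a confidence cut-off (0.5 is an illustrative assumption)
confident <- classify_text[classify_text$confidence >= 0.5, ]
print(confident$name)
```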
The language for the text:
```r
nlp_result$language
# [1] "en" "en"
```
The original text that was passed in, to help when working with the output:
```r
nlp_result$text
[1] "Norma is a small constellation in the Southern Celestial Hemisphere between Ara and Lupus, one of twelve drawn up in the 18th century by French astronomer Nicolas Louis de Lacaille and one of several depicting scientific instruments. Its name refers to a right angle in Latin, and is variously considered to represent a rule, a carpenter's square, a set square or a level. It remains one of the 88 modern constellations. Four of Norma's brighter stars make up a square in the field of faint stars. Gamma2 Normae is the brightest star with an apparent magnitude of 4.0. Mu Normae is one of the most luminous stars known, but is partially obscured by distance and cosmic dust. Four star systems are known to harbour planets."
[2] "Solomon Wariso (born 11 November 1966 in Portsmouth) is a retired English sprinter who competed primarily in the 200 and 400 metres.[1] He represented his country at two outdoor and three indoor World Championships and is the British record holder in the indoor 4 × 400 metres relay."
```