Content Classification Tutorial

Audience

This tutorial is designed to let you quickly start exploring and developing applications with the Cloud Natural Language API. It is designed for people familiar with basic programming, though even without much programming knowledge, you should be able to follow along. Having walked through this tutorial, you should be able to use the Reference documentation to create your own basic applications.

This tutorial steps through a Natural Language application using Python code. The purpose here is not to explain the Python client libraries, but to explain how to make calls to the Natural Language API. Applications in Java and Node.js are essentially similar. Consult the Natural Language API Samples for samples in other languages (including the sample in this tutorial).

Prerequisites

This tutorial has several prerequisites:

You've set up a Cloud Natural Language project in the Google Cloud console.
You've set up your environment using Application Default Credentials.
You are familiar with Python programming.
You have set up your Python development environment. It is recommended that you have the latest version of Python, pip, and virtualenv installed on your system. For instructions, see the Python Development Environment Setup Guide for Google Cloud Platform.
You've installed the Google Cloud Client Library for Python.

Overview

This tutorial walks you through a basic Natural Language application that uses classifyText requests. A classifyText request classifies content into categories and returns a confidence score for each category, such as:

category: "/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons"
confidence: 0.6499999761581421

To see the list of all available category labels, see Categories.

In this tutorial, you will create an application to perform the following tasks:

Classify multiple text files and write the result to an index file.
Process input query text to find similar text files.
Process input query category labels to find similar text files.

The tutorial uses content from Wikipedia. You could create a similar application to process news articles, online comments, and so on.

Source Files

You can find the tutorial source code in the Python Client Library Samples on GitHub.

This tutorial uses sample source text from Wikipedia. You can find the sample text files in the resources/texts folder of the GitHub project.

Importing libraries

To use the Cloud Natural Language API, you must import the language module from the google-cloud-language library. The language.types module contains classes that are required for creating requests. The language.enums module is used to specify the type of the input text. This tutorial classifies plain text content (language.enums.Document.Type.PLAIN_TEXT).

To calculate the similarity between texts based on their resulting content classification, this tutorial uses numpy for vector calculations.

Python

To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Python API reference documentation.

To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import argparse
import json
import os

from google.cloud import language_v1
import numpy

Step 1. Classify content

You can use the Python client library to make a request to the Natural Language API to classify content. The Python client library encapsulates the details for requests to and responses from the Natural Language API.

The classify function in the tutorial calls the Natural Language API classifyText method by first creating an instance of the LanguageServiceClient class, and then calling the classify_text method of the LanguageServiceClient instance.

The tutorial classify function only classifies text content for this example. You can also classify the content of a web page by passing in the source HTML of the web page as the text and by setting the type parameter to language.enums.Document.Type.HTML.

For more information, see Classifying Content. For details about the structure of requests to the Natural Language API, see the Natural Language Reference.

Python

def classify(text, verbose=True):
    """Classify the input text into categories."""

    language_client = language_v1.LanguageServiceClient()

    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = language_client.classify_text(request={"document": document})
    categories = response.categories

    result = {}

    for category in categories:
        # Turn the categories into a dictionary of the form:
        # {category.name: category.confidence}, so that they can
        # be treated as a sparse vector.
        result[category.name] = category.confidence

    if verbose:
        print(text)
        for category in categories:
            print("=" * 20)
            print("{:<16}: {}".format("category", category.name))
            print("{:<16}: {}".format("confidence", category.confidence))

    return result

The returned result is a dictionary with the category labels as keys and the confidence scores as values, such as:

{
    "/Computers & Electronics": 0.800000011920929,
    "/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons": 0.6499999761581421
}
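Because the result is an ordinary dictionary, it can be post-processed with standard Python. The following is a minimal sketch and is not part of the tutorial script: the helper names top_category and filter_by_confidence, and the 0.7 threshold, are illustrative assumptions.

```python
def top_category(result):
    """Return the (label, confidence) pair with the highest confidence, or None."""
    if not result:
        return None
    label = max(result, key=result.get)
    return label, result[label]


def filter_by_confidence(result, threshold):
    """Keep only the categories whose confidence is at least `threshold`."""
    return {label: conf for label, conf in result.items() if conf >= threshold}


# Example dictionary in the shape returned by classify().
example = {
    "/Computers & Electronics": 0.800000011920929,
    "/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons": 0.6499999761581421,
}

print(top_category(example))
print(filter_by_confidence(example, threshold=0.7))  # 0.7 is an arbitrary cutoff
```

A suitable threshold depends on the application, so it is worth tuning against your own content.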

The tutorial Python script is organized so that it can be run from the command line for quick experiments. For example, you can run:

python classify_text_tutorial.py classify "Google Home enables users to speak voice commands to interact with services through the Home's intelligent personal assistant called Google Assistant. A large number of services, both in-house and third-party, are integrated, allowing users to listen to music, look at videos or photos, or receive news updates entirely by voice. "

Note: The content to be classified must have at least 20 tokens (words) in order for the Natural Language API to return a response.
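To avoid sending text that is too short, you could run a cheap local check before calling the API. The sketch below is an assumption-laden approximation: the helper name has_enough_tokens is hypothetical, and splitting on whitespace only roughly matches how the API counts tokens.

```python
MIN_TOKENS = 20  # Minimum token count stated in the note above.


def has_enough_tokens(text, minimum=MIN_TOKENS):
    """Roughly estimate whether `text` has at least `minimum` tokens.

    Whitespace splitting only approximates the API's own tokenization,
    so treat this as a cheap pre-check, not a guarantee.
    """
    return len(text.split()) >= minimum


short_text = "Google Home is a smart speaker."
print(has_enough_tokens(short_text))
```

Text close to the limit may still be accepted or rejected by the API, so the request itself should also be guarded with error handling.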

Step 2. Index multiple text files

The index function in the tutorial script takes, as input, a directory containing multiple text files and the path to a file where it stores the indexed output (the default file name is index.json). The index function reads the content of each text file in the input directory, and then passes the text files to the Cloud Natural Language API to be classified into content categories.

Python

def index(path, index_file):
    """Classify each text file in a directory and write
    the results to the index_file.
    """

    result = {}
    for filename in os.listdir(path):
        file_path = os.path.join(path, filename)

        if not os.path.isfile(file_path):
            continue

        try:
            with open(file_path) as f:
                text = f.read()
                categories = classify(text, verbose=False)

                result[filename] = categories
        except Exception:
            print(f"Failed to process {file_path}")

    with open(index_file, "w", encoding="utf-8") as f:
        f.write(json.dumps(result, ensure_ascii=False))

    print(f"Texts indexed in file: {index_file}")
    return result

The results from the Cloud Natural Language API for each file are organized into a single dictionary, serialized as a JSON string, and then written to a file. For example:

{
    "android.txt": {
        "/Computers & Electronics": 0.800000011920929,
        "/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons": 0.6499999761581421
    },
    "google.txt": {
        "/Internet & Telecom": 0.5799999833106995,
        "/Business & Industrial": 0.5400000214576721
    }
}
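The index maps each filename to its categories. For some lookups the reverse mapping, from category to files, is more convenient. The following sketch is not part of the tutorial script, and the helper name invert_index is hypothetical. It builds that reverse mapping from a dictionary in the shape shown above.

```python
from collections import defaultdict


def invert_index(index):
    """Map each category label to the files classified under it.

    `index` has the shape {filename: {category: confidence}}.
    Returns {category: [(filename, confidence), ...]}, with each list
    sorted by descending confidence.
    """
    inverted = defaultdict(list)
    for filename, categories in index.items():
        for label, confidence in categories.items():
            inverted[label].append((filename, confidence))
    for entries in inverted.values():
        entries.sort(key=lambda pair: pair[1], reverse=True)
    return dict(inverted)


# The same example index as above, written as a Python dictionary.
example_index = {
    "android.txt": {
        "/Computers & Electronics": 0.800000011920929,
        "/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons": 0.6499999761581421,
    },
    "google.txt": {
        "/Internet & Telecom": 0.5799999833106995,
        "/Business & Industrial": 0.5400000214576721,
    },
}

by_category = invert_index(example_index)
print(by_category["/Internet & Telecom"])
```

This only matches exact labels, so a file classified under a subcategory is not listed under its parent category.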

To index text files from the command line with the default output filename index.json, run the following command:

python classify_text_tutorial.py index resources/texts

Step 3. Query the index

Query with category labels

Once the index file (default file name: index.json) has been created, we can make queries to the index to retrieve some of the filenames and their confidence scores.

One way to do this is to use a category label as the query, which the tutorial accomplishes with the query_category function. The implementation of the helper functions, such as similarity, can be found in the classify_text_tutorial.py file. In your applications, the similarity scoring and ranking should be carefully designed around specific use cases.

Python

def query_category(index_file, category_string, n_top=3):
    """Find the indexed files that are the most similar to
    the query label.

    The list of all available labels:
    https://cloud.google.com/natural-language/docs/categories
    """

    with open(index_file) as f:
        index = json.load(f)

    # Make the category_string into a dictionary so that it is
    # of the same format as what we get by calling classify.
    query_categories = {category_string: 1.0}

    similarities = []
    for filename, categories in index.items():
        similarities.append((filename, similarity(query_categories, categories)))

    similarities = sorted(similarities, key=lambda p: p[1], reverse=True)

    print("=" * 20)
    print(f"Query: {category_string}\n")
    print(f"\nMost similar {n_top} indexed texts:")
    for filename, sim in similarities[:n_top]:
        print(f"\tFilename: {filename}")
        print(f"\tSimilarity: {sim}")
        print("\n")

    return similarities
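The query_category function above calls a similarity helper whose code is not reproduced in this section. The following is a minimal sketch of one plausible scoring scheme, offered as an assumption and not as the tutorial's confirmed implementation: each label is split into its path segments, and the two resulting sparse vectors are compared with cosine similarity. The helper name split_labels is hypothetical, and the sketch uses the standard math module in place of numpy so that it runs without extra dependencies.

```python
import math


def split_labels(categories):
    """Split each hierarchical label into its path segments.

    "/A/B" with confidence c becomes {"A": c, "B": c}, so texts that
    share a parent category still overlap. If a segment occurs under
    several labels, the last confidence seen is kept.
    """
    segments = {}
    for name, confidence in categories.items():
        for segment in name.split("/"):
            if segment:
                segments[segment] = confidence
    return segments


def similarity(categories1, categories2):
    """Cosine similarity between two {label: confidence} dictionaries."""
    vec1 = split_labels(categories1)
    vec2 = split_labels(categories2)
    norm1 = math.sqrt(sum(v * v for v in vec1.values()))
    norm2 = math.sqrt(sum(v * v for v in vec2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # An empty category set is treated as similar to nothing.
    dot = sum(conf * vec2.get(label, 0.0) for label, conf in vec1.items())
    return dot / (norm1 * norm2)


query = {"/Internet & Telecom/Mobile & Wireless": 1.0}
android = {
    "/Computers & Electronics": 0.800000011920929,
    "/Internet & Telecom/Mobile & Wireless/Mobile Apps & Add-Ons": 0.6499999761581421,
}
print(similarity(query, android))
```

For the android.txt entry shown in Step 2, this sketch gives a score of roughly 0.67, which is in line with the sample output for this query. That agreement does not establish that the tutorial script uses the same scheme, so consult classify_text_tutorial.py for the actual helper.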

For a list of all of the available categories, see Categories.

As before, you can call the query_category function from the command line:

python classify_text_tutorial.py query-category index.json "/Internet & Telecom/Mobile & Wireless"

You should see output similar to the following:

Query: /Internet & Telecom/Mobile & Wireless

Most similar 3 indexed texts:
  Filename: android.txt
  Similarity: 0.665573579045

  Filename: google.txt
  Similarity: 0.517527175966

  Filename: gcp.txt
  Similarity: 0.5

Query with text

Alternatively, you can query with text that may not be part of the indexed text. The tutorial query function is similar to the query_category function, with the added step of making a classifyText request for the text input and using the results to query the index file.

Python

def query(index_file, text, n_top=3):
    """Find the indexed files that are the most similar to
    the query text.
    """

    with open(index_file) as f:
        index = json.load(f)

    # Get the categories of the query text.
    query_categories = classify(text, verbose=False)

    similarities = []
    for filename, categories in index.items():
        similarities.append((filename, similarity(query_categories, categories)))

    similarities = sorted(similarities, key=lambda p: p[1], reverse=True)

    print("=" * 20)
    print(f"Query: {text}\n")
    for category, confidence in query_categories.items():
        print(f"\tCategory: {category}, confidence: {confidence}")
    print(f"\nMost similar {n_top} indexed texts:")
    for filename, sim in similarities[:n_top]:
        print(f"\tFilename: {filename}")
        print(f"\tSimilarity: {sim}")
        print("\n")

    return similarities

To do this from the command line, run:

python classify_text_tutorial.py query index.json "Google Home enables users to speak voice commands to interact with services through the Home's intelligent personal assistant called Google Assistant. A large number of services, both in-house and third-party, are integrated, allowing users to listen to music, look at videos or photos, or receive news updates entirely by voice. "

This prints something similar to the following:

Query: Google Home enables users to speak voice commands to interact with services through the Home's intelligent personal assistant called Google Assistant. A large number of services, both in-house and third-party, are integrated, allowing users to listen to music, look at videos or photos, or receive news updates entirely by voice.

  Category: /Internet & Telecom, confidence: 0.509999990463
  Category: /Computers & Electronics/Software, confidence: 0.550000011921

Most similar 3 indexed texts:
  Filename: android.txt
  Similarity: 0.600579500049

  Filename: google.txt
  Similarity: 0.401314790229

  Filename: gcp.txt
  Similarity: 0.38772339779

What's next

With the content classification API you can create other applications. For example:

Classify every paragraph in an article to see the transition between topics.
Classify timestamped content and analyze the trend of topics over time.
Compare content categories with content sentiment using the analyzeSentiment method.
Compare content categories with entities mentioned in the text.
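The first idea in the list above, classifying each paragraph of an article, can be sketched as follows. This is an illustration under stated assumptions: the helper name classify_paragraphs is hypothetical, paragraphs are assumed to be separated by blank lines, and fake_classify is a made-up stand-in for the tutorial's classify function so that the sketch runs without calling the API.

```python
def classify_paragraphs(article, classify_fn, min_tokens=20):
    """Classify each paragraph of `article` separately.

    `classify_fn` is any callable that maps text to a
    {category: confidence} dictionary, such as the tutorial's classify().
    Paragraphs with fewer than `min_tokens` whitespace-separated words
    are skipped, since very short content is not classified.
    Returns a list of (paragraph_position, categories) pairs.
    """
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    results = []
    for position, paragraph in enumerate(paragraphs):
        if len(paragraph.split()) < min_tokens:
            continue
        results.append((position, classify_fn(paragraph)))
    return results


def fake_classify(text):
    """Hypothetical stand-in for classify() so the sketch runs offline."""
    label = "/Food & Drink" if "recipe" in text else "/Travel"
    return {label: 0.9}  # Made-up confidence, for illustration only.


article = (
    " ".join(["recipe"] * 25)
    + "\n\nToo short.\n\n"
    + " ".join(["journey"] * 25)
)
for position, categories in classify_paragraphs(article, fake_classify):
    print(position, categories)
```

In a real application you would pass a function that calls the API, for example one that wraps classify with verbose set to False, and then compare the categories of consecutive paragraphs to spot topic changes.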

Additionally, other Google Cloud Platform products can be used to streamline your workflow:

In the sample application for this tutorial, we processed local text files, but you can modify the code to process text files stored in a Google Cloud Storage bucket by passing a Google Cloud Storage URI to the classify_text method.
In the sample application for this tutorial, we stored the index file locally, and each query is processed by reading through the whole index file. This means high latency if you have a large amount of indexed data or if you need to process numerous queries. Datastore is a natural and convenient choice for storing the index data.
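Short of moving the index to Datastore, a lighter mitigation for repeated queries within one process is to parse the index file only once. The sketch below is a different technique from the Datastore approach and is not part of the tutorial script. It uses functools.lru_cache from the standard library, and the helper name load_index is hypothetical.

```python
import functools
import json


@functools.lru_cache(maxsize=None)
def load_index(index_file):
    """Read and parse the index file once per path.

    Later calls with the same path return the already-parsed dictionary
    without touching the disk. The returned dictionary is shared between
    callers, so treat it as read-only.
    """
    with open(index_file, encoding="utf-8") as f:
        return json.load(f)
```

The cache does not notice when the file changes on disk, so call load_index.cache_clear() after re-running the index step. This reduces file parsing cost only; each query still scans every entry in the index.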

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.
