Create a content connector

A content connector is a software program used to traverse the data in an enterprise's repository and populate a data source. Google provides the following options for developing content connectors:

The Content Connector SDK. This is a good option if you are programming in Java. The Content Connector SDK is a wrapper around the REST API that lets you create connectors quickly. To create a content connector using the SDK, refer to Create a content connector using the Content Connector SDK.
A low-level REST API or API libraries. Use these options if you're not programming in Java, or if your codebase better accommodates a REST API or a library. To create a content connector using the REST API, refer to Create a content connector using the REST API.

A typical content connector performs the following tasks:

Reads and processes configuration parameters.
Pulls discrete chunks of indexable data, called "items," from the third-party content repository.
Combines ACLs, metadata, and content data into indexable items.
Indexes items to the Cloud Search data source.
(Optional) Listens to change notifications from the third-party content repository. Change notifications are converted into indexing requests to keep the Cloud Search data source in sync with the third-party repository. The connector only performs this task if the repository supports change detection.

Create a content connector using the Content Connector SDK

The following sections explain how to create a content connector using the Content Connector SDK.

Set up dependencies

You must include certain dependencies in your build file to use the SDK. The dependencies for each build environment are listed below:

Maven

    <dependency>
      <groupId>com.google.enterprise.cloudsearch</groupId>
      <artifactId>google-cloudsearch-indexing-connector-sdk</artifactId>
      <version>v1-0.0.3</version>
    </dependency>

Gradle

    compile group: 'com.google.enterprise.cloudsearch',
            name: 'google-cloudsearch-indexing-connector-sdk',
            version: 'v1-0.0.3'

Create your connector configuration

Every connector has a configuration file containing parameters used by the connector, such as the ID for your repository. Parameters are defined as key-value pairs, such as api.sourceId=1234567890abcdef.

The Google Cloud Search SDK contains several Google-supplied configuration parameters used by all connectors. You must declare the following Google-supplied parameters in your configuration file:

For a content connector, you must declare api.sourceId and api.serviceAccountPrivateKeyFile, as these parameters identify the location of your repository and the private key needed to access the repository.

Note: If your private key file is not a JSON key, you must also override api.serviceAccountId.

For an identity connector, you must declare api.identitySourceId, as this parameter identifies the location of your external identity source. If you are syncing users, you must also declare api.customerId as the unique ID for your enterprise's Google Workspace account.

Unless you want to override the default values of other Google-supplied parameters, you do not need to declare them in your configuration file. For additional information on the Google-supplied configuration parameters, such as how to generate certain IDs and keys, refer to Google-supplied configuration parameters.

You can also define your own repository-specific parameters for use in your configuration file.

Note: There is no strict naming requirement for the connector properties file, but we recommend saving the file using a .properties or .config extension.
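As a rough illustration of the key-value format and the required parameters described above, the following sketch parses a configuration with the JDK's java.util.Properties and reports which required content-connector keys are missing. The class ConnectorConfigCheck, the key file name PrivateKey.json, and the myrepo.rootFolder parameter are hypothetical and are not part of the SDK; the SDK performs its own configuration handling.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the SDK: checks a connector configuration
// for the parameters a content connector must declare.
class ConnectorConfigCheck {

  // Required keys for a content connector, as named in this document.
  static final List<String> REQUIRED_KEYS =
      List.of("api.sourceId", "api.serviceAccountPrivateKeyFile");

  // Parses key-value pairs written in java.util.Properties format.
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  // Returns the required keys that are absent or blank.
  static List<String> missingKeys(Properties props) {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED_KEYS) {
      String value = props.getProperty(key);
      if (value == null || value.trim().isEmpty()) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    String sample =
        "api.sourceId=1234567890abcdef\n"
            + "api.serviceAccountPrivateKeyFile=PrivateKey.json\n"
            + "# Hypothetical repository-specific parameter\n"
            + "myrepo.rootFolder=/docs\n";
    System.out.println("Missing keys: " + missingKeys(parse(sample)));
  }
}
```

A check like this can fail fast at startup with a clear message instead of surfacing a missing parameter later during indexing.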

Pass the configuration file to the connector

Set the system property config to pass the configuration file to your connector. You can set the property using the -D argument when starting the connector. For example, the following command starts the connector with the MyConfig.properties configuration file:

    java -classpath myconnector.jar;... -Dconfig=MyConfig.properties MyConnector

If this argument is missing, the SDK attempts to access a default configuration file named connector-config.properties.
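The lookup described above can be pictured with a minimal sketch that reads the config system property and falls back to the default file name. The class ConfigLocator is hypothetical and only mirrors the documented behavior; the SDK's internal implementation may differ.

```java
// Hypothetical helper illustrating the documented lookup; not SDK code.
class ConfigLocator {

  // Default file name the SDK is documented to fall back to.
  static final String DEFAULT_CONFIG_FILE = "connector-config.properties";

  // Returns the value of the "config" system property (set with -Dconfig=...),
  // or the default file name when the property is not set.
  static String resolveConfigPath() {
    return System.getProperty("config", DEFAULT_CONFIG_FILE);
  }

  public static void main(String[] args) {
    System.out.println("Using configuration file: " + resolveConfigPath());
  }
}
```

Running this class with -Dconfig=MyConfig.properties reports MyConfig.properties; running it without the argument reports the default name.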

Determine your traversal strategy

The primary function of a content connector is to traverse a repository and index its data. You must implement a traversal strategy based on the size and layout of data in your repository. You can design your own strategy or choose from the following strategies implemented in the SDK:

Full traversal strategy

A full traversal strategy scans the entire repository and blindly indexes every item. This strategy is commonly used when you have a small repository and can afford the overhead of doing a full traversal every time you index.

This traversal strategy is suitable for small repositories with mostly static, non-hierarchical data. You might also use this traversal strategy when change detection is difficult or not supported by the repository.
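To make the cost of this strategy concrete, here is a minimal in-memory sketch, assuming a repository modeled as a map from item ID to content. The class FullTraversalSketch is hypothetical and does not call the SDK; it only shows that every item is selected for indexing on every run, whether or not it changed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a full traversal; not SDK code.
class FullTraversalSketch {

  // Returns the IDs to send for indexing: every item, changed or not.
  static List<String> traverse(Map<String, String> repository) {
    return new ArrayList<>(repository.keySet());
  }

  public static void main(String[] args) {
    Map<String, String> repository = new LinkedHashMap<>();
    repository.put("doc1", "first document");
    repository.put("doc2", "second document");

    // Two consecutive runs over an unchanged repository index the same items.
    System.out.println(traverse(repository)); // prints [doc1, doc2]
    System.out.println(traverse(repository)); // prints [doc1, doc2]
  }
}
```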

List traversal strategy

A list traversal strategy scans the entire repository, including all child nodes, determining the status of each item. Then, the connector takes a second pass and only indexes items that are new or have been updated since the last indexing. This strategy is commonly used to perform incremental updates to an existing index (instead of having to do a full traversal every time you update the index).

This traversal strategy is suitable when change detection is difficult or not supported by the repository, you have non-hierarchical data, and you are working with very large data sets.
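The two passes can be sketched in memory as follows. This example assumes that an item's status is determined by comparing a content hash against the hash recorded at the last indexing run; the class ListTraversalSketch, its method names, and the hash strings are hypothetical and do not reflect the SDK's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a list traversal; it does not call the SDK.
class ListTraversalSketch {

  // First pass: compare every item's current hash with the hash recorded at
  // the last indexing run and collect the IDs that are new or modified.
  static List<String> findChangedIds(
      Map<String, String> currentHashes, Map<String, String> indexedHashes) {
    List<String> changed = new ArrayList<>();
    for (Map.Entry<String, String> entry : currentHashes.entrySet()) {
      String previous = indexedHashes.get(entry.getKey());
      if (previous == null || !previous.equals(entry.getValue())) {
        changed.add(entry.getKey());
      }
    }
    return changed;
  }

  // Second pass: "index" only the changed items by recording their new hashes.
  static void indexChanged(
      List<String> changedIds,
      Map<String, String> currentHashes,
      Map<String, String> indexedHashes) {
    for (String id : changedIds) {
      indexedHashes.put(id, currentHashes.get(id));
    }
  }

  public static void main(String[] args) {
    Map<String, String> indexed = new LinkedHashMap<>();
    indexed.put("doc1", "aaa");
    indexed.put("doc2", "bbb");

    Map<String, String> current = new LinkedHashMap<>();
    current.put("doc1", "aaa"); // unchanged
    current.put("doc2", "ccc"); // modified
    current.put("doc3", "ddd"); // new

    List<String> changed = findChangedIds(current, indexed);
    indexChanged(changed, current, indexed);
    System.out.println(changed); // prints [doc2, doc3]
  }
}
```

The first pass still touches every item, but it only compares lightweight status information; the expensive work of retrieving and indexing content is limited to the changed items.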

Graph traversal strategy

A graph traversal strategy scans the entire parent node, determining the status of each item. Then, the connector takes a second pass and indexes only the items in the root node that are new or have been updated since the last indexing. Finally, the connector passes any child IDs, then indexes the items in the child nodes that are new or have been updated. The connector continues recursively through all child nodes until all items have been addressed. Such traversal is typically used for hierarchical repositories where listing all IDs isn't practical.

This strategy is suitable if you have hierarchical data that needs to be crawled, such as a series of directories or web pages.
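The recursive descent through child nodes can be sketched with a simple work queue. This example assumes a hierarchy modeled as a map from parent ID to child IDs and a precomputed set of IDs that are new or updated; the class GraphTraversalSketch and its data are hypothetical, and the in-memory queue only stands in for whatever queueing the connector actually uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a graph traversal; it does not call the SDK.
class GraphTraversalSketch {

  // Walks the hierarchy from rootId. Each visited node is "indexed" only if it
  // is new or updated, and its child IDs are queued so the walk continues
  // until every reachable item has been addressed.
  static List<String> traverse(
      String rootId, Map<String, List<String>> children, Set<String> changedIds) {
    List<String> indexed = new ArrayList<>();
    Set<String> visited = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(rootId);
    while (!queue.isEmpty()) {
      String id = queue.remove();
      if (!visited.add(id)) {
        continue; // already handled; guards against cycles
      }
      if (changedIds.contains(id)) {
        indexed.add(id);
      }
      // Children are queued even when the parent itself is unchanged.
      queue.addAll(children.getOrDefault(id, List.of()));
    }
    return indexed;
  }

  public static void main(String[] args) {
    Map<String, List<String>> children =
        Map.of("root", List.of("a", "b"), "a", List.of("a1"));
    Set<String> changed = Set.of("root", "b", "a1");
    System.out.println(traverse("root", children, changed)); // prints [root, b, a1]
  }
}
```

Because only child IDs are passed along at each level, the connector never needs a complete list of all item IDs up front, which is the property that makes this approach fit large hierarchies.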

Note: The terms “item” and “document” are synonymous in this document and sample code.

Each of these traversal strategies is implemented by a template connector class in the SDK. While you can implement your own traversal strategy, these templates greatly speed up the development of your connector. To create a connector using a template, proceed to the section corresponding to your traversal strategy:

Create a full traversal connector using a template class

This section of the docs refers to code snippets from the FullTraversalSample example.

Implement the connector’s entry point

The entry point to a connector is the main() method. This method’s primary task is to create an instance of the Application class and invoke its start() method to run the connector.

Before calling application.start(), use the IndexingApplication.Builder class to instantiate the FullTraversalConnector template. The FullTraversalConnector accepts a Repository object whose methods you implement. The following code snippet shows how to implement the main() method:

FullTraversalSample.java
