Deploy to Data Hub Service

Data Hub Service

You can deploy your Data Hub project in the cloud instead of setting up your own MarkLogic cluster. The Data Hub Service (DHS) is a cloud-based solution that provides a preconfigured MarkLogic cluster in which you can run flows and from which you can serve harmonized data.

You can use MarkLogic Data Hub to develop and test your project locally (your development environment), then deploy it to a DHS cluster (your production environment). Alternatively, you can have both development and production environments in DHS instances and use Hub Central as your development tool.

Tip: You can have multiple services that use the same Data Hub project files. For example, you can set up one DHS project as a testing environment and another as your production environment, using the same project files in both environments.

In a DHS environment, the databases, app servers, and security roles are automatically set up. Admins can create user accounts.

The following configurations might be different between on-premises projects and DHS projects:

  • Roles: The DHS roles are automatically created as part of provisioning your DHS environment.
  • Database names: If database names are customized in the Data Hub environment, they might be different.
  • Gradle settings: The gradle.properties file contains some DHS-only settings, including mlIsHostLoadBalancer and mlIsProvisionedEnvironment, which are set to true to enable Data Hub to work correctly in DHS.
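
As a quick check of the Gradle settings described above, the following sketch tests whether a properties file sets both DHS-only properties to true. The function names are hypothetical and not part of Data Hub, and the matching assumes plain key=value lines with no inline comments.

```shell
# Return success if FILE contains a line of the form KEY=true.
# Assumption: simple "key=value" lines; other properties syntax is not handled.
has_true_setting() {
  local file="$1" key="$2"
  grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*true[[:space:]]*$" "$file"
}

# Check the two DHS-only settings named in the list above.
check_dhs_settings() {
  local file="$1" key status=0
  for key in mlIsHostLoadBalancer mlIsProvisionedEnvironment; do
    if has_true_setting "$file" "$key"; then
      echo "$key=true"
    else
      echo "$key is not set to true in $file" >&2
      status=1
    fi
  done
  return "$status"
}
```

Pass whichever properties file holds your DHS settings, for example check_dhs_settings gradle-dhs.properties. The function returns a nonzero status if either setting is missing or not true.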

The configurations for ports and load balancers for app servers are the same between on-premises projects and DHS projects.

To learn more about Data Hub Service (DHS), go to the Data Hub Service overview and the DHS-AWS documentation or the DHS-Azure documentation.

Before you begin

  • A Data Hub project that has been set up and tested locally
  • A provisioned MarkLogic Data Hub Service environment with Data Hub
    • For private endpoints, a bastion host inside a virtual network
    • Information from your DHS administrator:
      • Your DHS host name (typically, the curation endpoint)
      • REST curation endpoint URL (including port number) for testing
      • The username and password of the user account associated with the roles required to deploy to your DHS instance.

Procedure

  1. Copy your entire Data Hub project directory to the machine from which you will access the endpoints, and perform the following steps on that machine.
  2. Open a command-line window, and navigate to your Data Hub project root directory.
  3. Set up your gradle-dhs.properties file.
    1. Download the Gradle configuration file from your Data Hub Service instance to your project root.
      Note: By default, the downloaded file is named gradle-dhs.properties. If you use a different filename:
      • The filename must be in the format gradle-env.properties, where env is any string you want to represent an environment. For example, you can store the settings for your development environment in gradle-dev.properties.
      • Remember to update the value of the -PenvironmentName parameter to env in the Gradle commands in the following steps.
    2. Set the values for the usernames and passwords as indicated in the configuration file.
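
To illustrate the filename convention in the note above, here is a minimal sketch that derives the -PenvironmentName value from a properties filename. The helper name env_name_from_file is hypothetical and is not provided by Data Hub or Gradle.

```shell
# Print the environment name encoded in a gradle-<env>.properties filename.
# Returns 1 and prints nothing if the name does not follow that format.
env_name_from_file() {
  local name
  name=$(basename "$1")
  case "$name" in
    gradle-?*.properties)
      name=${name#gradle-}                 # drop the "gradle-" prefix
      printf '%s\n' "${name%.properties}"  # drop the ".properties" suffix
      ;;
    *)
      return 1
      ;;
  esac
}
```

With the default download, env_name_from_file gradle-dhs.properties yields dhs, which matches the -PenvironmentName=dhs parameter used in the commands in the following steps.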
  4. Deploy your modules and other resources, including indexes.

    Depending on the roles assigned to your user account, you can deploy different assets using the appropriate hubDeploy task.

    Important: To disable TDE (Template Driven Extraction) generation, set tdeGenerationDisabled to true when deploying the project artifacts.
    Role(s): data-hub-developer
    Use this Gradle task:
      macOS/Linux: ./gradlew hubDeployAsDeveloper -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeployAsDeveloper -PenvironmentName=dhs -i
    To deploy:
      • User modules and artifacts (entities, flows, mappings, and step definitions)
      • Alert configurations, rules, and actions
      • STAGING, FINAL, and JOBS database indexes
      • Scheduled tasks
      • Schemas
      • Temporal axes and collections
      • Triggers
      • Protected paths and query rolesets

    Role(s): data-hub-security-admin
    Use this Gradle task:
      macOS/Linux: ./gradlew hubDeployAsSecurityAdmin -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeployAsSecurityAdmin -PenvironmentName=dhs -i
    To deploy:
      • Definitions of custom roles and privileges with the following restrictions:
        • A custom role cannot inherit from any other role.
        • A custom role can only inherit privileges granted to the user creating the role.
        • A custom execute privilege must be assigned an action starting with http://datahub.marklogic.com/custom/.

    Role(s): Both data-hub-developer and data-hub-security-admin
    Use this Gradle task:
      macOS/Linux: ./gradlew hubDeploy -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeploy -PenvironmentName=dhs -i
    To deploy:
      • All of the above

    Role(s): Both data-hub-developer and data-hub-security-admin
    Use this Gradle task:
      macOS/Linux: ./gradlew hubDeployToReplica -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeployToReplica -PenvironmentName=dhs -i
    To deploy:
      • Configuration changes to the disaster recovery cluster
        Note: This task does not write to the databases.

    Learn more: Users and Roles

    Learn more about hubDeploy and hubDeployAsDeveloper.
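
The role-to-task mapping in this step can be sketched as a small lookup. The function name deploy_task_for_roles is hypothetical. It covers only the three hubDeploy variants for the primary cluster and leaves out hubDeployToReplica, which is a separate choice for the disaster recovery cluster.

```shell
# Print the Gradle deploy task that matches the roles given as arguments.
# Unrecognized role names are ignored; returns 1 if neither known role is present.
deploy_task_for_roles() {
  local dev=0 sec=0 role
  for role in "$@"; do
    case "$role" in
      data-hub-developer)      dev=1 ;;
      data-hub-security-admin) sec=1 ;;
    esac
  done
  if [ "$dev" -eq 1 ] && [ "$sec" -eq 1 ]; then
    echo hubDeploy
  elif [ "$dev" -eq 1 ]; then
    echo hubDeployAsDeveloper
  elif [ "$sec" -eq 1 ]; then
    echo hubDeployAsSecurityAdmin
  else
    return 1
  fi
}
```

On macOS or Linux, the result could then be passed to the wrapper, for example ./gradlew "$(deploy_task_for_roles data-hub-developer)" -PenvironmentName=dhs -i.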
  5. Run a flow with an ingestion step.

    You can use any of the following:

  6. Run a flow with a mapping step and/or a mastering step.
    macOS/Linux: ./gradlew hubRunFlow -PflowName=your-flow-name -PentityName=your-entity-name -PenvironmentName=dhs -i
    Windows: gradlew.bat hubRunFlow -PflowName=your-flow-name -PentityName=your-entity-name -PenvironmentName=dhs -i
    Important: If the value of a Gradle parameter contains a blank space, you must enclose the value in double quotation marks. If the value does not contain a blank space, you must not enclose the value in quotation marks.
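
The quoting rule above can be expressed as a small formatter. This is a sketch only: gradle_param is a hypothetical helper that prints the parameter as you would type it on the command line, adding double quotation marks only when the value contains a blank space.

```shell
# Print a -Pname=value argument, quoting the value only if it contains a space.
gradle_param() {
  local name="$1" value="$2"
  case "$value" in
    *" "*) printf '%s\n' "-P${name}=\"${value}\"" ;;  # value has a blank space
    *)     printf '%s\n' "-P${name}=${value}" ;;      # no blank space, no quotes
  esac
}
```

For example, gradle_param flowName "my flow" prints -PflowName="my flow", while gradle_param flowName myflow prints -PflowName=myflow.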
  7. Verify that your documents are in the databases.
    1. In the following URLs, replace OPERATIONS-REST-ENDPOINT-URL and CURATION-REST-ENDPOINT-URL with the appropriate endpoint URLs from your DHS administrator.
      Final database: http://OPERATIONS-REST-ENDPOINT-URL:8011/v1/search
      Staging database: http://CURATION-REST-ENDPOINT-URL:8010/v1/search

      Example: http://internal-mlaas-xxx-xxx-xxx.us-west-2.elb.amazonaws.com:8011/v1/search

      Tip: Narrow the search to return fewer items. See MarkLogic REST API Search.
    2. In a web browser, navigate to one of the URLs.
    The result is an XML list of all your documents in the database. Each item in the list includes the document's URI, path, and other metadata, as well as a preview of the content.
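
If you prefer to build the verification URLs in a script, the following sketch assembles them from an endpoint host. The function name search_url is hypothetical. The ports come from the URLs in this step. The pageLength query parameter, used here to return fewer items, is assumed to be supported by the /v1/search endpoint in your MarkLogic version, so confirm it against the REST API reference.

```shell
# Print the /v1/search URL for the final or staging database.
# Arguments: database (final|staging), endpoint host, optional page length (default 10).
search_url() {
  local database="$1" host="$2" page_length="${3:-10}" port
  case "$database" in
    final)   port=8011 ;;  # operations endpoint, FINAL database
    staging) port=8010 ;;  # curation endpoint, STAGING database
    *) echo "unknown database: $database" >&2; return 1 ;;
  esac
  printf 'http://%s:%s/v1/search?pageLength=%s\n' "$host" "$port" "$page_length"
}
```

You can open the printed URL in a browser as described above, or fetch it with a tool such as curl. How you authenticate depends on your DHS configuration and is not covered by this sketch.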

What to do next

If you update your flows after the initial project upload, you can redeploy your flow updates by running the role-appropriate hubDeploy* Gradle task again and then running the flows.