Import logs from Cloud Storage to Cloud Logging

Last reviewed 2025-02-19 UTC

This reference architecture describes how you can import logs that were previously exported to Cloud Storage back to Cloud Logging.

This reference architecture is intended for engineers and developers, including DevOps engineers, site reliability engineers (SREs), and security investigators, who want to configure and run the log importing job. This document assumes you are familiar with running Cloud Run jobs and with using Cloud Storage and Cloud Logging.

Architecture

The following diagram shows how Google Cloud services are used in this reference architecture:

Workflow diagram of log import from Cloud Storage to Cloud Logging.

This workflow includes the following components:

  • Cloud Storage bucket: Contains the previously exported logs that you want to import back to Cloud Logging. Because these logs were previously exported, they're organized in the expected export format.
  • Cloud Run job: Runs the import logs process (see the sketch after this list):
    • Reads the objects that store log entries from Cloud Storage.
    • Finds exported logs for the specified log ID, in the requested time range, based on the organization of the exported logs in the Cloud Storage bucket.
    • Converts the objects into Cloud Logging API LogEntry structures. Multiple LogEntry structures are aggregated into batches to reduce Cloud Logging API quota consumption. The architecture handles quota errors when necessary.
    • Writes the converted log entries to Cloud Logging. If you re-run the same job multiple times, duplicate entries can result. For more information, see Run the import job.
  • Cloud Logging: Ingests and stores the converted log entries. The log entries are processed as described in the Routing and storage overview.
    • The Logging quotas and limits apply, including the Cloud Logging API quotas and limits and a 30-day retention period. This reference architecture is designed to work with the default write quotas, with a basic retrying mechanism. If your write quota is lower than the default, the implementation might fail.
    • The imported logs aren't included in log-based metrics, because their timestamps are in the past. However, if you opt to use a label, the timestamp records the import time, and the logs are included in the metric data.
  • BigQuery: Uses SQL to run analytical queries on imported logs (optional). To import audit logs from Cloud Storage, this architecture modifies the log IDs; you must account for this renaming when you query the imported logs.
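To make the workflow concrete, the following Python sketch shows the general shape of such an import job. It is illustrative only, not the reference implementation: the bucket name, day prefix, project, batch size, and single-retry backoff are all assumptions. It relies on the export layout described above, where objects under LOG_ID/YYYY/MM/DD/ contain newline-delimited JSON LogEntry records.

```python
import time

from google.api_core.exceptions import ResourceExhausted
from google.cloud import storage
from google.cloud.logging_v2.services.logging_service_v2 import (
    LoggingServiceV2Client,
)
from google.cloud.logging_v2.types import LogEntry

BUCKET = "my-exported-logs"        # assumption: bucket holding the export
DAY_PREFIX = "syslog/2025/01/15/"  # export layout: LOG_ID/YYYY/MM/DD/
PROJECT = "my-project"             # assumption: destination project
BATCH_SIZE = 1000                  # assumption: entries per API call

storage_client = storage.Client()
logging_client = LoggingServiceV2Client()

def write_batch(batch: list[LogEntry]) -> None:
    """Write one batch, retrying once if the write quota is exhausted."""
    for attempt in range(2):
        try:
            logging_client.write_log_entries(request={"entries": batch})
            return
        except ResourceExhausted:
            if attempt == 0:
                time.sleep(60)  # simple backoff before the single retry
            else:
                raise

batch: list[LogEntry] = []
for blob in storage_client.list_blobs(BUCKET, prefix=DAY_PREFIX):
    # Exported objects hold newline-delimited JSON LogEntry records.
    for line in blob.download_as_text().splitlines():
        entry = LogEntry.from_json(line, ignore_unknown_fields=True)
        # Redirect the entry into the destination project's imported_logs log.
        entry.log_name = f"projects/{PROJECT}/logs/imported_logs"
        batch.append(entry)
        if len(batch) >= BATCH_SIZE:
            write_batch(batch)
            batch = []
if batch:
    write_batch(batch)
```

Batching entries into each write_log_entries call is what keeps Cloud Logging API quota consumption low, and the retry on ResourceExhausted mirrors the basic quota-error handling mentioned above.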

Use case

You might choose to deploy this architecture if your organization requires additional log analysis for incident investigations or other audits of past events. For example, you might want to analyze connections to your databases for the first quarter of the last year, as a part of a database access audit.

Design alternatives

This section describes alternatives to the default design shown in this reference architecture document.

Retention period and imported logs

Cloud Logging requires incoming log entries to have timestamps that don't exceed the 30-day retention period. Imported log entries with timestamps older than 30 days from the import time are not stored.

This architecture validates the date range set in the Cloud Run job to avoid importing logs that are older than 29 days, leaving a one-day safety margin.
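A minimal sketch of that validation, assuming the job receives its start date as a timezone-aware datetime; the constant and function names are hypothetical, not taken from the reference implementation:

```python
from datetime import datetime, timedelta, timezone

MAX_IMPORT_AGE_DAYS = 29  # 30-day retention minus a one-day safety margin

def validate_start_date(start_date: datetime) -> None:
    """Reject date ranges whose entries Cloud Logging would not store."""
    oldest_allowed = datetime.now(timezone.utc) - timedelta(days=MAX_IMPORT_AGE_DAYS)
    if start_date < oldest_allowed:
        raise ValueError(
            f"Start date {start_date:%Y-%m-%d} is more than "
            f"{MAX_IMPORT_AGE_DAYS} days old; such entries would be rejected."
        )
```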

To import logs older than 29 days, you need to make the following changes to the implementation code, and then build a new container image to use in the Cloud Run job configuration:

  • Remove the 30-day validation of the date range
  • Add the original timestamp as a user label to the log entry
  • Reset the timestamp label of the log entry to allow it to be ingested with the current timestamp
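A hedged sketch of the second and third changes, operating on the parsed JSON record before it is converted to a LogEntry; the label key original_timestamp is an assumption, not a key defined by the reference implementation:

```python
def relabel_entry(entry: dict) -> dict:
    """Preserve the original timestamp as a user label, then drop the
    timestamp field so Cloud Logging assigns the current ingestion time."""
    labels = entry.setdefault("labels", {})
    labels["original_timestamp"] = entry.get("timestamp", "")  # assumed key
    entry.pop("timestamp", None)
    return entry
```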

When you use this modification, you must use the labels field instead of the timestamp field in your Log Analytics queries. For more information about Log Analytics queries and samples, see Sample SQL queries.
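For example, a query through a BigQuery dataset linked to a Log Analytics bucket could order results by the preserved label rather than by the timestamp field. This is a sketch under stated assumptions: the project and dataset names are placeholders, the linked dataset is assumed to expose the _AllLogs view with a JSON labels column, and the original_timestamp label key matches the sketch above.

```python
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT JSON_VALUE(labels.original_timestamp) AS original_ts, log_name
    FROM `my-project.my_linked_dataset._AllLogs`
    WHERE JSON_VALUE(labels.original_timestamp) IS NOT NULL
    ORDER BY original_ts
    LIMIT 100
"""
for row in client.query(query).result():
    print(row.original_ts, row.log_name)
```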

Design considerations

The following guidelines can help you to develop an architecture that meets your organization's requirements.

Cost optimization

The cost for importing logs by using this reference architecture has multiple contributing factors.

You use the following billable components of Google Cloud:

  • Cloud Logging
  • Cloud Run
  • Cloud Storage
  • BigQuery (optional)

Consider the following factors that might increase costs:

  • Log duplication: To avoid additional log storage costs, don't run the import job with the same configuration multiple times.
  • Storage in additional destinations: To avoid additional log storage costs, disable routing policies at the destination project to prevent log storage in additional locations or forwarding logs to other destinations such as Pub/Sub or BigQuery.
  • Additional CPU and memory: If your import job times out, you might need to increase the import job CPU and memory in your import job configuration. Increasing these values might increase incurred Cloud Run costs.
  • Additional tasks: If the expected number of logs to be imported each day within the time range is high, you might need to increase the number of tasks in the import job configuration. The job splits the time range equally between the tasks, so each task processes a similar number of days from the range concurrently (see the sketch after this list). Increasing the number of tasks might increase incurred Cloud Run costs.
  • Storage class: If your Cloud Storage bucket's storage class is other than Standard, such as Nearline, Durable Reduced Availability (DRA), or Coldline, you might incur additional charges.
  • Data traffic between different locations: Configure the import job to run in the same location as the Cloud Storage bucket from which you import the logs. Otherwise, network egress costs might be incurred.
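The even split mentioned in the additional-tasks item can be pictured with the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables that Cloud Run sets for each task of a job. The interleaved splitting below is an assumption about how the range could be divided, not the reference implementation's exact logic.

```python
import os
from datetime import date, timedelta

def task_date_slice(start: date, end: date) -> list[date]:
    """Return the days from [start, end] that this task should process."""
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Interleaving gives every task a near-equal share of the range.
    return days[task_index::task_count]
```

With 10 tasks and a 31-day range, for example, each task processes three or four days concurrently with the others.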

To generate a cost estimate based on your projected usage, including Cloud Run jobs, use the pricing calculator.

Operational efficiency

This section describes considerations for managing analytical queries after the solution is deployed.

Log names and queries

Logs are stored in the project that is defined in the logName field of the log entry. To import the logs to the selected project, this architecture modifies the logName field of each imported log. The imported logs are stored in the selected project's default log bucket under the log ID imported_logs (unless the project has a log routing policy that changes the storage destination). The original value of the logName field is preserved in the labels field with the key original_logName.

You must account for the location of the original logName value when you query the imported logs. For more information about Log Analytics queries and samples, see Sample SQL queries.
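As an illustration, the following query recovers the original log names of imported entries through a BigQuery dataset linked to a Log Analytics bucket. The project and dataset names are placeholders, and the linked dataset is assumed to expose the _AllLogs view; imported_logs and original_logName are the identifiers described above.

```python
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT JSON_VALUE(labels.original_logName) AS original_log_name,
           COUNT(*) AS entry_count
    FROM `my-project.my_linked_dataset._AllLogs`
    WHERE log_name LIKE '%/logs/imported_logs'
    GROUP BY original_log_name
"""
for row in client.query(query).result():
    print(row.original_log_name, row.entry_count)
```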

Performance optimization

If the volume of logs that you're importing exceeds Cloud Run capacity limits, the job might time out before the import is complete. To prevent an incomplete data import, consider increasing the tasks value in the import job. Increasing CPU and memory resources can also help improve task performance when you increase the number of tasks.

Deployment

To deploy this architecture, see Deploy a job to import logs from Cloud Storage to Cloud Logging.

What's next

Contributors

Author: Leonid Yankulin | Developer Relations Engineer
