Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: check if cache latest files enabled for cache local file#8219

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
hengfeiyang merged 1 commit intomainfromfix/compactor-cache
Sep 1, 2025

Conversation

@hengfeiyang
Copy link
Contributor

@hengfeiyanghengfeiyang commentedSep 1, 2025
edited by github-actionsbot
Loading

User description

We also should check if cache latest files enabled when we try to cache local file to disk cache.


PR Type

Bug fix


Description

  • Guard caching withenabled flag

  • Apply to parquet merge paths

  • Apply to Tantivy index caching

  • Prevent unintended disk writes


Diagram Walkthrough

flowchart LR  CFG["Config: cache_latest_files.enabled"]  P1["parquet.rs merge_files"]  M1["compact/merge.rs merge_files (path 1)"]  M2["compact/merge.rs merge_files (path 2)"]  T1["tantivy create_tantivy_index"]  DISK["disk cache set(...)"]  CFG -- "enabled && cache_* && download_from_node" --> P1  CFG -- "enabled && cache_* && download_from_node" --> M1  CFG -- "enabled && cache_* && download_from_node" --> M2  CFG -- "enabled && cache_* && download_from_node" --> T1  P1 -- "conditional write" --> DISK  M1 -- "conditional write" --> DISK  M2 -- "conditional write" --> DISK  T1 -- "conditional write" --> DISK
Loading

File Walkthrough

Relevant files
Bug fix
parquet.rs
Guard parquet caching with global enable flag                       

src/job/files/parquet.rs

  • Addcfg.cache_latest_files.enabled to condition
  • Gate parquet disk caching on enabled flag
+4/-1     
merge.rs
Enforce enabled flag in merge caching paths                           

src/service/compact/merge.rs

  • Addenabled check to first merge upload path
  • Addenabled check to secondary merge upload path
  • Prevent cache write when caching disabled
+7/-2     
mod.rs
Respect enabled flag for Tantivy index caching                     

src/service/tantivy/mod.rs

  • Storeget_config() incfg
  • Addenabled check to index caching condition
+4/-2     

greptile-apps[bot] reacted with thumbs up emoji
@github-actions
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Redundant Clone

The code convertspuffin_bytes toBytes usingBytes::from(puffin_bytes.clone()). Ifto_puffin_bytes() already returns aVec<u8>, the clone may be unnecessary. Consider moving or reusing the buffer without cloning to avoid extra allocation.

infra::cache::file_data::disk::set(&idx_file_name,Bytes::from(puffin_bytes.clone())).await?;log::info!("file: {idx_file_name} file_data::disk::set success");
Config Retrieval

get_config() is called once and stored incfg here, but other parts of the module may still callget_config() repeatedly. Ensure consistency across the codebase to avoid mixed patterns and potential stale config reads if config can change at runtime.

let cfg =get_config();if cfg.cache_latest_files.enabled    && cfg.cache_latest_files.cache_index    && cfg.cache_latest_files.download_from_node{

@github-actions
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                   Impact
General
Remove unnecessary bytes clone

Avoid cloningpuffin_bytes when converting toBytes to prevent an unnecessary
allocation and copy, which is significant for large indexes. Convert the owned
vector directly intoBytes usingBytes::from(puffin_bytes) and adjust the variable
to be mutable if needed.

src/service/tantivy/mod.rs [70-77]

 let cfg = get_config(); if cfg.cache_latest_files.enabled     && cfg.cache_latest_files.cache_index     && cfg.cache_latest_files.download_from_node {-    infra::cache::file_data::disk::set(&idx_file_name, Bytes::from(puffin_bytes.clone()))+    infra::cache::file_data::disk::set(&idx_file_name, Bytes::from(puffin_bytes))         .await?;     log::info!("file: {idx_file_name} file_data::disk::set success"); }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies an avoidable clone ofpuffin_bytes and proposes moving it intoBytes::from, reducing allocations for large data; it's accurate and improves efficiency without altering logic.

Medium
Avoid cloning large buffer

Eliminate the extrabuf.clone() before caching to avoid duplicating potentially
large buffers. Sincebuf is created right before and not reused later in this scope,
move it intoset directly.

src/job/files/parquet.rs [799-805]

 if cfg.cache_latest_files.enabled     && cfg.cache_latest_files.cache_parquet     && cfg.cache_latest_files.download_from_node {-    infra::cache::file_data::disk::set(&new_file_key, buf.clone()).await?;+    infra::cache::file_data::disk::set(&new_file_key, buf).await?;     log::debug!("merge_files {new_file_key} file_data::disk::set success"); }
Suggestion importance[1-10]: 6

__

Why: Replacingbuf.clone() with movingbuf intoset saves memory and CPU; impact is moderate but beneficial ifbuf is not needed afterward, and matches the new hunk code exactly.

Low
Remove redundant buffer clone

Prevent unnecessary buffer cloning when writing to cache to save memory and CPU.
Passbuf by value intoset if it isn't needed afterward in the scope.

src/service/compact/merge.rs [934-940]

 if cfg.cache_latest_files.enabled     && cfg.cache_latest_files.cache_parquet     && cfg.cache_latest_files.download_from_node {-    infra::cache::file_data::disk::set(&new_file_key, buf.clone()).await?;+    infra::cache::file_data::disk::set(&new_file_key, buf).await?;     log::debug!("merge_files {new_file_key} file_data::disk::set success"); }
Suggestion importance[1-10]: 6

__

Why: Eliminatingbuf.clone() in this caching path avoids unnecessary duplication; it's a correct micro-optimization aligned with the surrounding code and conditions.

Low

Copy link
Contributor

@greptile-appsgreptile-appsbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Greptile Summary

This PR fixes a configuration logic bug in the cache latest files system by adding missing checks for the masterenabled flag across three critical files. The OpenObserve codebase has a hierarchical caching configuration system wherecache_latest_files.enabled serves as the master switch that should gate all caching operations, with specific sub-flags likecache_parquet,cache_index, anddownload_from_node controlling individual caching behaviors.

The affected files were inconsistently checking the configuration flags:

  • src/job/files/parquet.rs: Missingenabled check in the ingester's parquet file caching logic
  • src/service/compact/merge.rs: Missingenabled check in two locations within the compactor's merge operations
  • src/service/tantivy/mod.rs: Missingenabled check in the Tantivy search index caching logic

The fix ensures that all three components now follow the same pattern: first check ifcache_latest_files.enabled is true, then verify the specific feature flags (cache_parquet orcache_index), and finally confirmdownload_from_node is enabled. This creates consistent behavior where settingZO_CACHE_LATEST_FILES_ENABLED=false will properly disable all local file caching operations across the ingester, compactor, and search indexing components.

Additionally, the Tantivy module change includes a minor optimization by caching the config reference in a local variable instead of callingget_config() multiple times.

Confidence score: 5/5

  • This PR is safe to merge with minimal risk as it only adds missing configuration checks without changing core logic
  • Score reflects simple, consistent changes that fix a clear configuration bug with no breaking changes to existing functionality
  • No files require special attention as all changes follow the same straightforward pattern

3 files reviewed, no comments

Edit Code Review Bot Settings |Greptile

@hengfeiyanghengfeiyang merged commitf49a4a3 intomainSep 1, 2025
28 of 29 checks passed
@hengfeiyanghengfeiyang deleted the fix/compactor-cache branchSeptember 1, 2025 05:08
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@haohuaijinhaohuaijinhaohuaijin approved these changes

@Loaki07Loaki07Awaiting requested review from Loaki07

+1 more reviewer

@greptile-appsgreptile-apps[bot]greptile-apps[bot] left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@hengfeiyang@haohuaijin

[8]ページ先頭

©2009-2025 Movatter.jp