
importinto: scanning a large number of compressed files is slow #64770

Open
@joechenrh

Description

Enhancement

In some cases, individual files of the import data may be small, so a large number of files must be scanned, and reading about 1 million files can take more than an hour. Below is the relevant part of the log; as it shows, generating the subtasks took about 40 minutes.

[2025/11/29 05:37:51.623 +00:00] [INFO] [scheduler.go:309] ["on next subtasks batch"] [keyspaceName=SYSTEM] [task-id=1] [task-key=xxx] [curr-step=init] [next-step=encode] [node-count=18] [table-id=15]
[2025/11/29 06:14:22.097 +00:00] [INFO] [table_import.go:420] ["populate chunks start"] [keyspaceName=SYSTEM] [task-id=1] [task-key=xxx] [curr-step=init] [next-step=encode] [node-count=18] [table-id=15]

There are two things we can improve here:

  • Skip reading the files before submitting a task with global sort, since the files are read again by the DXF scheduler and the total file size is only used to calculate DXF node resources.
  • Decompress only a sample of the files to estimate an overall compression ratio, instead of opening every file.
