CN118964311B


Info

Publication number: CN118964311B
Application number: CN202410948684.1A
Authority: CN
Inventors: 高经郡; 高海玲; 谢晋
Original assignee: Beijing Kejie Technology Co., Ltd.
Current assignee: Beijing Kejie Technology Co., Ltd.
Priority date: 2024-07-16
Filing date: 2024-07-16
Publication date: 2025-02-11
Anticipated expiration: 2044-07-16
Also published as: CN118964311A

Abstract

Translated from Chinese

The invention relates to the field of file cleanup, and in particular to a system and method for scheduled cleanup of data lake files under a storage-compute separation architecture. The method comprises: scanning the files in the data lake and determining from the scan results which files need to be cleaned, yielding a number of orphan files; accumulating the orphan files to obtain an actual marked count, and cleaning the data lake in batches according to either the actual marked count or the initial cleanup period; for each cleanup batch, generating a cleanup set from the orphan files assigned to that batch and arranging the files in the set in a cleanup order; while cleaning in that order, monitoring the cleanup process in real time, checking in turn the consistency of the cleanup target data, the cleanup process data, and the cleanup operations, and selecting the corresponding process control mode according to the results; and periodically obtaining the cleanup results and adjusting the cleanup conditions by feedback according to those results.
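The batch trigger described in the abstract (clean when enough files have been marked, or when the cleanup period has elapsed) can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `CleanupScheduler`, the fields `count_threshold` and `period_seconds`, the largest-file-first ordering, and the threshold-halving feedback rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class CleanupScheduler:
    """Hypothetical batch trigger: fire on marked-file count or elapsed time."""

    count_threshold: int    # assumed trigger level for marked files
    period_seconds: float   # stands in for the "initial cleanup period"
    last_run: float = 0.0   # timestamp of the previous batch
    marked: list = field(default_factory=list)

    def mark(self, path: str) -> None:
        # Accumulate a file that the scan identified as needing cleanup.
        self.marked.append(path)

    def should_run(self, now: float) -> bool:
        # A batch is due when either the count or the period condition holds.
        return (len(self.marked) >= self.count_threshold
                or now - self.last_run >= self.period_seconds)

    def next_batch(self, now: float, sizes: dict) -> list:
        # Assumed ordering: largest files first. The abstract only says the
        # cleanup set is ordered, not by which criterion.
        batch = sorted(self.marked, key=lambda p: sizes.get(p, 0), reverse=True)
        self.marked = []
        self.last_run = now
        return batch

    def adjust(self, failed: int, total: int) -> None:
        # Assumed feedback rule: if more than 10% of deletions failed,
        # halve the count threshold so later batches are smaller.
        if total > 0 and failed / total > 0.1:
            self.count_threshold = max(1, self.count_threshold // 2)


scheduler = CleanupScheduler(count_threshold=2, period_seconds=100.0)
scheduler.mark("part-0001.parquet")
scheduler.mark("part-0002.parquet")
if scheduler.should_run(now=10.0):
    batch = scheduler.next_batch(
        now=10.0,
        sizes={"part-0001.parquet": 1, "part-0002.parquet": 5},
    )
    print(batch)
```

The real-time consistency checks and process control modes mentioned in the abstract are left out here, because the abstract gives no detail on how they are evaluated.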