CN119003533B - Real-time index construction and intelligent optimization method and system based on large model driving - Google Patents

Real-time index construction and intelligent optimization method and system based on large model driving

Info

Publication number: CN119003533B
Authority: CN (China)
Prior art keywords: index, construction, data, real, query
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202411472747.7A
Other languages: Chinese (zh)
Other versions: CN119003533A (en)
Inventor: 叶杨
Current Assignee: Shanghai Zhuochen Info Tech Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Shanghai Zhuochen Info Tech Co ltd
Application filed by Shanghai Zhuochen Info Tech Co ltd
Priority to CN202411472747.7A
Publication of CN119003533A
Application granted
Publication of CN119003533B


Abstract


The present invention discloses a large-model-driven method and system for real-time index construction and intelligent optimization, belonging to the field of database indexing. The method collects sample data and analyzes its characteristics and access patterns through data stream processing and dynamic preprocessing; fuses a pre-trained language model with a knowledge graph to perform deep semantic analysis of the sample data, and generates an index scheme with an adaptive hybrid indexing strategy; builds the index and optimizes its performance with a multi-objective optimization algorithm; and, through real-time monitoring and a machine learning prediction model, dynamically adjusts the index configuration so that the optimal index combination improves database query efficiency, yielding an intelligent and efficient index management strategy.

Description

Real-time index construction and intelligent optimization method and system based on large model driving
Technical Field
The invention belongs to the technical field of database indexing, and particularly relates to a real-time index construction and intelligent optimization method and system based on large model driving.
Background
An index is a data structure used to speed up database queries. In a relational database, an index sorts the values of one or more columns of a table and stores those values together with a list of logical pointers to the data pages where the corresponding rows are physically stored. Like the table of contents of a book, it lets specific information in the table be located quickly without scanning the entire table. Index construction and optimization play a vital role in modern database management and information retrieval systems. With the explosive growth of data volume and increasingly complex query requirements, how to build indexes efficiently and optimize them intelligently, so as to improve the speed and accuracy of data retrieval, has become an important research topic in databases and information technology.
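To make the table-of-contents analogy concrete, here is a minimal Python sketch of a single-column index: sorted key values paired with pointers to rows, searched by binary search instead of a full table scan. Row positions stand in for the page pointers described above, and the names `SortedIndex` and `full_scan` are illustrative only. Production relational systems typically use B+ tree variants rather than a flat sorted array, but the lookup principle is the same.

```python
import bisect


class SortedIndex:
    """Single-column index: sorted key values plus pointers (row positions) into the table."""

    def __init__(self, rows, column):
        pairs = sorted((row[column], pos) for pos, row in enumerate(rows))
        self.keys = [key for key, _ in pairs]          # the sorted "directory"
        self.pointers = [pos for _, pos in pairs]      # where each key's row lives

    def lookup(self, value):
        """Binary search finds the matching key range in O(log n), then returns its row positions."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return self.pointers[lo:hi]


def full_scan(rows, column, value):
    """Without an index, every row must be inspected: O(n)."""
    return [pos for pos, row in enumerate(rows) if row[column] == value]


table = [
    {"id": 1, "city": "Shanghai"},
    {"id": 2, "city": "Beijing"},
    {"id": 3, "city": "Shanghai"},
    {"id": 4, "city": "Shenzhen"},
]
city_index = SortedIndex(table, "city")
print(city_index.lookup("Shanghai"))          # [0, 2]
print(full_scan(table, "city", "Shanghai"))   # [0, 2]
```

Both calls return the same rows; the difference is that the index touches only the matching key range rather than every row.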
Early index construction and optimization methods relied mainly on the index creation commands provided by the database management system. Statistical language models predict the next word by computing the generation probability of word sequences; however, because they must estimate an exponential number of transition probabilities, they suffer from the curse of dimensionality on large-scale data. In addition, index updates usually lag behind data updates, which degrades query performance and makes it difficult to satisfy diverse query requirements.
The patent with publication number CN116955348A discloses a database index construction method and device. When a data insertion request is received, the data is inserted at the target position. If a first check finds that the global index has gained nodes, address space for each newly added node is allocated from dynamic random access memory (DRAM); if a second check finds that the parent of a newly added node resides in non-volatile memory, the new node is inserted into a shortcut index; and when the proportion of DRAM space in use reaches a preset ratio, newly added nodes are migrated from DRAM address space to non-volatile memory address space. Both the global index and the shortcut index use a skip list data structure. By adopting skip lists, this scheme optimizes the index structure of an in-memory database and reduces the index's DRAM consumption while maintaining fast data response.
The patent with publication number CN117971837A discloses a method for constructing a spatial index for real-scene three-dimensional models. It parses the tile data of the real-scene 3D model, analyzes its LOD (level of detail) hierarchy, and establishes a unified LOD hierarchy with unified root tiles and LOD parameters; it then restructures the LOD hierarchy so that the number of levels is a multiple of 3, computes the spatial bounding box of the model, re-divides the whole space top-down into a uniform quadtree, numbers the grid cells, and fills each level's tile data into the new cells to generate new tile files. By splitting the index tree of a large model into local sub-indexes of roughly equal size, this scheme makes loading convenient and fast, helps optimize memory allocation, reduces network load when index files are transmitted, and removes the need to query from the root node of the whole model, which greatly improves the model's random-read performance.
The prior art has the following problems: 1) insufficient intelligence and adaptivity; 2) limited sophistication and flexibility of index construction; and 3) lack of long-term learning and optimization capability.
Disclosure of Invention
To address these shortcomings, the invention provides a large-model-driven real-time index construction and intelligent optimization method and system. It collects and analyzes the characteristics and access patterns of sample data through data stream processing and dynamic preprocessing; fuses a pre-trained language model with a knowledge graph to perform deep semantic analysis of the sample data; generates an index scheme with an adaptive hybrid indexing strategy; builds the index and optimizes its performance with a multi-objective optimization algorithm; and dynamically adjusts the index configuration through real-time monitoring and a machine learning prediction model, so that the optimal index combination improves database query efficiency and index management becomes intelligent and efficient.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The real-time index construction and intelligent optimization method based on large model driving comprises the following steps:
Step S1, collecting sample data through data stream processing, performing dynamic adaptive preprocessing, analyzing the characteristics of the sample data, and evaluating its access pattern;
Step S2, predicting hot data from historical access records and current data characteristics using a statistical method, and automatically triggering a loading mechanism according to the prediction result so that hot data is loaded into memory in advance;
Step S3, loading a pre-trained language model and, together with a knowledge graph method, performing deep semantic analysis of the sample data; according to the analysis results, sample data characteristics, access patterns and hot data prediction results, and using the entity relationship information in the knowledge graph, applying an adaptive hybrid indexing strategy to generate a hybrid index scheme, and storing the scheme in an index database;
Step S4, building the index in the database system according to the hybrid index scheme, performing a multi-dimensional performance evaluation of the scheme with the pre-trained language model, automatically generating an index optimization scheme from the evaluation results using a multi-objective optimization algorithm, and applying the optimized index scheme;
Step S5, monitoring performance metrics of index construction and query processing in real time, and using a machine learning algorithm with historical query data and index performance data to predict the query optimization effect of different index combinations, thereby obtaining the optimal index combination.
Specifically, step S3 comprises:
S3.1, loading a pre-trained language model and its matching tokenizer, and using the tokenizer to tokenize and encode the input sample data;
S3.2, feeding the encoded data into the pre-trained language model, running forward propagation, and extracting the hidden state of the model's last layer as the deep semantic features of the sample data;
S3.3, obtaining embedded representations of the relevant entities and relations in the knowledge graph with DistMult, and combining these embeddings with the deep semantic features extracted by the pre-trained language model through concatenation and dimension transformation to form feature vectors.
Specifically, step S3 further comprises:
S3.4, collecting the database access logs and statistically analyzing them to obtain user access patterns and query requirements;
S3.5, extracting features from the feature vectors according to the user access patterns and query requirements to obtain index candidate features, and evaluating through simulation tests the influence factors $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ of the different index candidate features on query performance, where $\alpha_n$ denotes the influence factor of the $n$-th index candidate feature and $n$ is the number of index candidate features;
S3.6, sorting the influence factors in descending order according to the evaluation results and setting the minimum influence factor threshold to $\alpha_{\min}$;
if $\alpha_j > \alpha_{\min}$, obtaining the index feature set $X = \{x_1, x_2, \ldots, x_m\}$ and generating a hybrid index scheme from the index features and query requirements using the adaptive hybrid indexing strategy, where $x_m$ denotes the $m$-th index feature and $m$ is the number of index features.
Specifically, step S3 further comprises:
S3.7, introducing a dynamic index adjustment mechanism that monitors index query efficiency in real time under the hybrid index scheme and dynamically adjusts the index structure according to the monitoring results;
S3.8, predicting index query trends with a trained machine learning model, reading and analyzing the predicted trends to identify query hotspots and pattern changes, and, based on the predicted trends and the current indexes, evaluating whether the index structure or parameters need adjusting to optimize the adaptive hybrid indexing strategy;
S3.9, writing a construction script for the adaptive hybrid indexing strategy, executing it on a large-scale data set, and monitoring resource consumption and performance metrics during construction;
S3.10, after the new indexing strategy has been executed, storing the constructed hybrid index in the index database.
Specifically, the multi-objective optimization algorithm in step S4 comprises:
S4.1, setting the objective function and constraints for index optimization, and generating a set of initial index configuration schemes as the candidate solution set through a heuristic method, where the objective function $F(x_i)$ is:
$$F(x_i) = w_1 T(x_i) + w_2 S(x_i) + w_3 B(x_i)$$
where $x_i$ denotes the given $i$-th index configuration scheme, $T(x_i)$ the average or total response time required to execute query operations under scheme $x_i$, $S(x_i)$ the storage space occupied by the index structure under scheme $x_i$, $B(x_i)$ the time required to build the index under scheme $x_i$, and $w_1, w_2, w_3$ the weight coefficients;
S4.2, performing a multi-dimensional performance evaluation of each index configuration scheme in the candidate solution set with the pre-trained language model;
S4.3, according to the multi-dimensional performance evaluation results, selecting some solutions from the candidate solution set as parents to generate new solutions as offspring, and merging parents and offspring into a new solution set for evaluation.
Specifically, the multi-objective optimization algorithm in step S4 further comprises:
S4.4, stratifying the new solution set by the dominance relation so that each layer contains a group of mutually non-dominated solutions, yielding different non-dominated fronts, and initializing the crowding distance of every individual in each front to 0;
S4.5, sorting the individuals in each non-dominated front on each objective function, computing for each objective the difference between each individual's objective value and those of its adjacent individuals, and normalizing it;
S4.6, summing the normalized differences over all objective functions to obtain each individual's crowding distance, and selecting the individuals of the next generation by non-dominated rank and crowding distance for the next iteration;
S4.7, setting a maximum number of iterations; if it has been reached, stopping and obtaining the Pareto-optimal solutions, otherwise returning to step S4.3 and continuing to iterate;
S4.8, generating the index optimization scheme from the Pareto-optimal solutions.
Specifically, the hybrid index scheme in S3.6 is generated with hash table and bitmap indexing methods combined with the entity relationship information in the knowledge graph.
The large-model-driven real-time index construction and intelligent optimization system comprises a data processing module, an index generation module, an index optimization module and a monitoring module;
the data processing module is used to collect sample data and perform dynamic adaptive preprocessing;
the index generation module is used to create index structures from the preprocessed sample data and generate a hybrid index scheme;
the index optimization module is used to build indexes and to adjust and optimize index structures as required;
the monitoring module is used to monitor performance metrics of index construction and query processing in real time.
The index generation module comprises a model loading unit, a knowledge graph fusion unit, a strategy formulation unit and a storage unit;
the model loading unit is used to load the pre-trained language model and produce deep semantic representations;
the knowledge graph fusion unit is used to fuse an external knowledge graph or other external knowledge into the index construction process, enhancing the index's semantic understanding and improving query accuracy;
the strategy formulation unit is used to formulate the adaptive hybrid indexing strategy according to the analysis results, data characteristics and access patterns of the sample data;
the storage unit is used to generate an index scheme according to the adaptive hybrid indexing strategy, store it in the index database, physically store the indexes and manage the index files.
A computer-readable storage medium stores computer instructions which, when executed, perform the steps of the large-model-driven real-time index construction and intelligent optimization method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a large-model-driven real-time index construction and intelligent optimization system whose architecture, operating steps and workflow are optimized and improved; the system has a simple workflow and low investment, operating, production and working costs.
2. The invention provides a large-model-driven real-time index construction and intelligent optimization method with an adaptive indexing mode. Through deep semantic analysis, the adaptive hybrid indexing strategy, multi-dimensional performance evaluation and dynamic adjustment, it makes index construction more intelligent and efficient, improves index adaptivity, optimizes query performance, and improves both the real-time performance of data processing and the operating efficiency of the index construction and intelligent optimization system.
Drawings
FIG. 1 is a schematic diagram of a real-time index construction and intelligent optimization method based on large model driving in the invention;
FIG. 2 is a schematic flow chart of a real-time index construction and intelligent optimization method based on large model driving in the invention;
FIG. 3 is a flow chart of the implementation of the indexing scheme of the real-time index construction and intelligent optimization method based on large model driving in the invention;
FIG. 4 is a diagram of a real-time index building and intelligent optimization system architecture based on large model driving in accordance with the present invention.
Detailed Description
Example 1
Referring to FIGS. 1-3, the large-model-driven real-time index construction and intelligent optimization method according to this embodiment of the invention comprises the following steps:
Step S1, collecting sample data through data stream processing, performing dynamic adaptive preprocessing, analyzing the characteristics of the sample data, and evaluating its access pattern;
The sample data includes information on data distribution, access frequency and update frequency.
Further, step S1 comprises:
A1, setting up a data stream access point, configuring a preprocessing strategy library, and initializing the data analysis tools and the index construction system;
A2, receiving new data items from the data source in real time, applying dynamic adaptive preprocessing to them, and either storing the preprocessed sample data in a staging area or passing it directly to feature analysis;
A3, periodically performing statistical analysis, pattern recognition and correlation analysis on the sample data in the staging area, analyzing the access logs to mine access patterns and user behavior, and storing the results in an analysis result database for subsequent index construction and optimization.
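Steps A1 to A3 do not specify concrete statistics, so the following is only a sketch under assumptions: a hypothetical `StreamProfiler` that updates a running mean and variance with Welford's algorithm (so the stream never has to be stored) and counts reads and writes per key, covering the data distribution, access frequency and update frequency information mentioned above. The preprocessing strategy library itself is not modeled.

```python
import math
from collections import Counter


class StreamProfiler:
    """Hypothetical profiler fed one data item at a time (steps A2-A3)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0              # running sum of squared deviations (Welford)
        self.reads = Counter()      # key -> access count
        self.writes = Counter()     # key -> update count

    def observe_value(self, x):
        """Update mean and variance incrementally without keeping past values."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the values seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def record_access(self, key, is_write=False):
        (self.writes if is_write else self.reads)[key] += 1

    def snapshot(self, top=3):
        """Periodic analysis result (A3), e.g. to be written to the analysis result database."""
        return {
            "count": self.n,
            "mean": self.mean,
            "stddev": math.sqrt(self.variance),
            "most_accessed": self.reads.most_common(top),
            "most_updated": self.writes.most_common(top),
        }


profiler = StreamProfiler()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    profiler.observe_value(value)
for key in ["u1", "u1", "u2", "u1"]:
    profiler.record_access(key)
profiler.record_access("u2", is_write=True)
print(profiler.snapshot())
```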
Step S2, predicting hot data from historical access records and current data characteristics using a statistical method, and automatically triggering a loading mechanism according to the prediction result so that hot data is loaded into memory in advance;
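The source says only that a statistical method predicts hot data. As one possible stand-in (our assumption, not the patented method), the sketch below forecasts each key's next-window access count with an exponentially weighted moving average and preloads keys whose forecast reaches a threshold into an in-memory dictionary acting as the cache. `alpha`, `threshold` and the key names are illustrative.

```python
def ewma_forecast(history, alpha=0.5):
    """Exponentially weighted moving average of per-window access counts (recent windows weigh more)."""
    estimate = float(history[0])
    for count in history[1:]:
        estimate = alpha * count + (1 - alpha) * estimate
    return estimate


def predict_hot_keys(access_history, threshold, alpha=0.5):
    """Keys whose forecast next-window access count reaches the threshold, hottest first."""
    forecasts = {key: ewma_forecast(h, alpha) for key, h in access_history.items() if h}
    hot = [key for key, f in forecasts.items() if f >= threshold]
    return sorted(hot, key=lambda key: forecasts[key], reverse=True)


def preload(hot_keys, storage, cache):
    """Loading mechanism: copy predicted-hot records from slow storage into memory in advance."""
    for key in hot_keys:
        if key not in cache and key in storage:
            cache[key] = storage[key]
    return cache


# Illustrative access counts per monitoring window (oldest first).
history = {"order:17": [2, 8, 20], "order:42": [30, 5, 1], "user:9": [10, 10, 10]}
storage = {"order:17": "row-17", "order:42": "row-42", "user:9": "row-9"}
hot = predict_hot_keys(history, threshold=10)
cache = preload(hot, storage, {})
print(hot)   # ['order:17', 'user:9']
```

Here `order:42` is left out despite the largest total count, because its recent windows are declining; weighting recent windows more heavily is the point of the moving average.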
Step S3, loading a pre-trained language model and, together with a knowledge graph method, performing deep semantic analysis of the sample data; according to the analysis results, sample data characteristics, access patterns and hot data prediction results, and using the entity relationship information in the knowledge graph, applying an adaptive hybrid indexing strategy to generate a hybrid index scheme, and storing the scheme in an index database;
Step S4, building the index in the database system according to the hybrid index scheme, performing a multi-dimensional performance evaluation of the scheme with the pre-trained language model, automatically generating an index optimization scheme from the evaluation results using a multi-objective optimization algorithm, and applying the optimized index scheme;
Step S5, monitoring performance metrics of index construction and query processing in real time, and using a machine learning algorithm with historical query data and index performance data to predict the query optimization effect of different index combinations, thereby obtaining the optimal index combination;
Step S6, feeding new query data and index performance data back to the machine learning model and continuously updating it by incremental learning to improve prediction accuracy and adaptability; incremental learning is prior art in this field and not an inventive aspect of this application, so it is not described further here.
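Steps S5 and S6 name a machine learning algorithm and incremental learning without fixing either. A minimal sketch, assuming a linear latency model over on/off flags for each candidate index, trained one observation at a time with stochastic gradient descent: S6 corresponds to `update`, and S5 to enumerating combinations and keeping the lowest predicted latency. The `max_indexes` budget and the latency figures are hypothetical, made up purely for illustration.

```python
from itertools import product


class OnlineLatencyModel:
    """Linear model: latency ~ bias + sum(w_j * enabled_j), learned incrementally (step S6)."""

    def __init__(self, n_indexes, lr=0.05):
        self.w = [0.0] * n_indexes
        self.b = 0.0
        self.lr = lr

    def predict(self, combo):
        return self.b + sum(w * x for w, x in zip(self.w, combo))

    def update(self, combo, observed_latency):
        """One stochastic gradient step on squared error for a single new observation."""
        err = self.predict(combo) - observed_latency
        self.b -= self.lr * err
        self.w = [w - self.lr * err * x for w, x in zip(self.w, combo)]


def best_combination(model, n_indexes, max_indexes):
    """Step S5: enumerate on/off combinations within a (hypothetical) index budget, keep the cheapest."""
    combos = (c for c in product((0, 1), repeat=n_indexes) if sum(c) <= max_indexes)
    return min(combos, key=model.predict)


def synthetic_latency(combo):
    """Made-up ground truth for illustration: index 0 helps most, then index 2."""
    return 100.0 - 40.0 * combo[0] - 5.0 * combo[1] - 25.0 * combo[2]


model = OnlineLatencyModel(n_indexes=3)
for _ in range(2000):                         # simulated stream of monitoring observations
    for combo in product((0, 1), repeat=3):
        model.update(combo, synthetic_latency(combo))
print(best_combination(model, n_indexes=3, max_indexes=2))   # (1, 0, 1)
```

Exhaustive enumeration is only practical for a handful of candidate indexes; larger candidate sets would need a search strategy such as the multi-objective algorithm of step S4.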
Step S3 comprises:
S3.1, loading a pre-trained language model and its matching tokenizer, and using the tokenizer to tokenize and encode the input sample data;
GPT-3 is used as the pre-trained language model in the present invention.
Further, step S3.1 comprises:
(1) ensuring that the PyTorch and transformers libraries are installed in the environment, since the transformers library provides pre-trained language models and their tokenizers;
(2) loading the pre-trained language model with AutoModel from the transformers library and its matching tokenizer with AutoTokenizer;
(3) tokenizing the input sample data with the loaded tokenizer and converting the result into an ID sequence the model can understand (encoding), which typically involves splitting the text into subwords, adding special tokens such as [CLS] and [SEP], and mapping each token to its index in the vocabulary.
S3.2, feeding the encoded data into the pre-trained language model, running forward propagation, and extracting the hidden state of the model's last layer as the deep semantic features of the sample data;
Further, step S3.2 comprises:
(1) encoding the text data with the tokenizer into a format the pre-trained language model accepts, and converting the encoded data into tensors suitable as model input;
(2) passing the encoded data to the pre-trained language model, which performs forward propagation and outputs several results, including hidden states and attention weights;
(3) extracting the last layer's hidden state from the model output via output.last_hidden_state, i.e., the hidden state produced by the final Transformer layer after the model processes the input; such a model is usually a stack of Transformer layers, each of which applies a series of operations, including self-attention and a feed-forward network, to the input data or to the previous layer's output and emits its own hidden state;
(4) further processing the extracted hidden states as needed, for example by mean pooling or by taking the vector at a specific position, to obtain the deep semantic features of the sample data.
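Step (4) above can be illustrated with a dependency-free sketch of two common pooling choices over one sequence of last-layer hidden states. In Hugging Face transformers the inputs would come from `outputs = model(**inputs)` and `outputs.last_hidden_state`, as the comment in the block shows; here plain nested lists stand in so the pooling logic is self-contained. GPT-3 weights are not distributed through the transformers library, so actually running those calls would require substituting an openly available checkpoint (an assumption, not part of the source).

```python
def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors of one sequence, skipping padding positions (mask == 0).

    last_hidden_state: seq_len rows, each a list of hidden_size floats
    attention_mask:    seq_len flags (1 = real token, 0 = padding)
    """
    total = [0.0] * len(last_hidden_state[0])
    kept = 0
    for vector, keep in zip(last_hidden_state, attention_mask):
        if keep:
            kept += 1
            for j, value in enumerate(vector):
                total[j] += value
    if kept == 0:
        raise ValueError("sequence contains no unmasked tokens")
    return [t / kept for t in total]


def last_token(last_hidden_state, attention_mask):
    """Vector at the last real token, a common choice for left-to-right (GPT-style) models."""
    position = max(i for i, keep in enumerate(attention_mask) if keep)
    return list(last_hidden_state[position])


# With Hugging Face transformers (not executed here), the inputs would come from:
#   inputs = tokenizer(text, return_tensors="pt")
#   outputs = model(**inputs)
#   states = outputs.last_hidden_state[0].tolist()   # first (only) sequence in the batch
#   mask = inputs["attention_mask"][0].tolist()
states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]   # toy 3-token, 2-dimensional example
mask = [1, 1, 0]                                  # last position is padding
print(mean_pool(states, mask))    # [2.0, 3.0]
print(last_token(states, mask))   # [3.0, 4.0]
```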
S3.3, obtaining embedded representations of the relevant entities and relations in the knowledge graph with DistMult, and combining them with the deep semantic features extracted by the pre-trained language model through concatenation and dimension transformation to form feature vectors;
Further, step S3.3 comprises:
(1) preparing the knowledge graph triple data and the text data;
(2) loading a DistMult model and training it to obtain embedded representations of entities and relations;
(3) loading the pre-trained language model and extracting deep semantic features of the text data;
(4) concatenating the knowledge graph embeddings with the deep semantic features of the text, either directly or after a dimension transformation;
(5) forming the final feature vector.
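A minimal sketch of the fusion in steps (1) to (5), assuming the DistMult embeddings have already been trained (training is omitted): `distmult_score` is the standard DistMult trilinear score, and `fuse` concatenates the text features with one entity and one relation embedding, optionally applying a linear projection as the dimension transformation. Which entity and relation get attached to each sample, and how the projection matrix is learned, are not specified in the source; the values below are illustrative.

```python
def distmult_score(head, relation, tail):
    """DistMult score of a triple (h, r, t): sum over k of h_k * r_k * t_k."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def project(vector, matrix):
    """Dimension transformation: vector (length d) times matrix (d rows x out_dim columns)."""
    out_dim = len(matrix[0])
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector))) for j in range(out_dim)]


def fuse(text_features, entity_emb, relation_emb, projection=None):
    """Concatenate LM features with KG embeddings, optionally projecting to a fixed size."""
    combined = list(text_features) + list(entity_emb) + list(relation_emb)
    return project(combined, projection) if projection is not None else combined


# Toy 2-dimensional embeddings, assumed already trained.
entity_a, entity_b, rel = [1.0, 2.0], [3.0, 4.0], [1.0, 1.0]
print(distmult_score(entity_a, rel, entity_b))   # 11.0
features = fuse([0.5, 0.5], entity_a, rel)
print(features)                                   # [0.5, 0.5, 1.0, 2.0, 1.0, 1.0]
```

Because the DistMult score is symmetric in head and tail, it cannot distinguish a relation from its inverse; this is a known property of the model, worth keeping in mind when the knowledge graph contains directed relations.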
S3.4, collecting the database access logs and statistically analyzing them to obtain user access patterns and query requirements;
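The log format is not specified, so the sketch below assumes each access-log record has already been parsed into a dictionary with hypothetical `table`, `filters`, `order_by` and `latency_ms` fields. It summarizes how often each column is filtered or sorted on and how slow those queries are, a simple proxy for the access patterns and query requirements of S3.4.

```python
from collections import Counter, defaultdict


def analyze_access_log(entries):
    """Summarize filter/sort column usage and mean latency per (table, column).

    Each entry is a hypothetical, already-parsed log record such as
    {"table": "orders", "filters": ["status"], "order_by": ["created_at"], "latency_ms": 80.0}.
    """
    filter_freq, sort_freq = Counter(), Counter()
    latencies = defaultdict(list)
    for entry in entries:
        for column in entry.get("filters", []):
            key = (entry["table"], column)
            filter_freq[key] += 1
            latencies[key].append(entry["latency_ms"])
        for column in entry.get("order_by", []):
            sort_freq[(entry["table"], column)] += 1
    mean_latency = {key: sum(v) / len(v) for key, v in latencies.items()}
    return {"filter_freq": filter_freq, "sort_freq": sort_freq, "mean_latency_ms": mean_latency}


log = [
    {"table": "orders", "filters": ["status", "region"], "order_by": ["created_at"], "latency_ms": 120.0},
    {"table": "orders", "filters": ["status"], "latency_ms": 80.0},
    {"table": "users", "filters": ["email"], "latency_ms": 5.0},
]
summary = analyze_access_log(log)
print(summary["filter_freq"].most_common(1))   # [(('orders', 'status'), 2)]
```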
S3.5, extracting features from the feature vectors according to the user access patterns and query requirements to obtain index candidate features, and evaluating through simulation tests the influence factors $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ of the different index candidate features on query performance, where $\alpha_n$ denotes the influence factor of the $n$-th index candidate feature and $n$ is the number of index candidate features;
The influence factors cover the complexity of the search conditions, search time, data blocks hit, number of hits, sorting method and user satisfaction.
S3.6, sorting the influence factors in descending order according to the evaluation results and setting the minimum influence factor threshold to $\alpha_{\min}$;
if $\alpha_j > \alpha_{\min}$, obtaining the index feature set $X = \{x_1, x_2, \ldots, x_m\}$ and generating a hybrid index scheme from the index features and query requirements using the adaptive hybrid indexing strategy, where $x_m$ denotes the $m$-th index feature, $m$ is the number of index features, and $X$ is the set formed by the index candidate features whose influence factors are greater than $\alpha_{\min}$;
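S3.5 and S3.6 can be sketched as follows. The source lists the aspects an influence factor covers but gives no formula, so `influence_factor` assumes, purely for illustration, that the factor is the relative latency reduction measured when the candidate is indexed in a simulation test; `select_index_features` then ranks candidates in descending order and keeps those whose factor exceeds the minimum threshold. Feature names and numbers are made up.

```python
def influence_factor(baseline_ms, indexed_ms):
    """Hypothetical definition: relative latency reduction observed in a simulation test."""
    return (baseline_ms - indexed_ms) / baseline_ms


def select_index_features(factors, threshold):
    """Rank candidates by influence factor (descending) and keep those strictly above the threshold."""
    ranked = sorted(factors.items(), key=lambda item: item[1], reverse=True)
    return [name for name, factor in ranked if factor > threshold]


# Made-up simulation results: (latency without index, latency with index) in ms.
simulated = {
    "city": (200.0, 40.0),
    "user_id": (200.0, 80.0),
    "created_at": (200.0, 130.0),
    "status": (200.0, 190.0),
}
factors = {name: influence_factor(*ms) for name, ms in simulated.items()}
print(select_index_features(factors, threshold=0.3))   # ['city', 'user_id', 'created_at']
```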
S3.7, introducing a dynamic index adjustment mechanism that monitors index query efficiency in real time under the hybrid index scheme and dynamically adjusts the index structure according to the monitoring results;
S3.8, predicting index query trends with a trained machine learning model, reading and analyzing the predicted trends to identify query hotspots and pattern changes, and, based on the predicted trends and the current indexes, evaluating whether the index structure or parameters need adjusting to optimize the adaptive hybrid indexing strategy;
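S3.7 and S3.8 rely on a trained machine learning model that the source does not describe. As a deliberately simple stand-in (an assumption), the sketch fits a least-squares line to each column's query counts over recent monitoring windows, forecasts the next window, and proposes creating indexes for emerging hotspots and dropping indexes whose demand is forecast to fade. The thresholds `create_at` and `drop_below` are hypothetical tuning parameters.

```python
def linear_forecast(counts):
    """Fit count = a + b*t by least squares over t = 0..n-1 and evaluate at t = n."""
    n = len(counts)
    if n == 1:
        return float(counts[0])
    mean_t = (n - 1) / 2
    mean_c = sum(counts) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in enumerate(counts))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return mean_c + (cov / var) * (n - mean_t)


def plan_adjustments(window_counts, indexed, create_at, drop_below):
    """Compare each column's forecast query volume with the current index set."""
    create, drop = [], []
    for column, counts in window_counts.items():
        forecast = linear_forecast(counts)
        if column not in indexed and forecast >= create_at:
            create.append(column)      # emerging hotspot without an index
        elif column in indexed and forecast < drop_below:
            drop.append(column)        # index whose demand is fading
    return sorted(create), sorted(drop)


# Illustrative queries-per-window counts (oldest first).
windows = {"status": [10, 20, 30], "created_at": [50, 30, 10], "city": [5, 5, 5]}
print(plan_adjustments(windows, indexed={"created_at", "city"}, create_at=35, drop_below=1))
# (['status'], ['created_at'])
```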
S3.9, writing a construction script for the adaptive hybrid indexing strategy, executing it on a large-scale data set, and monitoring resource consumption and performance metrics during construction;
S3.10, after the new indexing strategy has been executed, storing the constructed hybrid index in the index database.
The multi-objective optimization algorithm in step S4 comprises:
S4.1, setting the objective function and constraints for index optimization, and generating a set of initial index configuration schemes as the candidate solution set through a heuristic method, where the objective function $F(x_i)$ is:
$$F(x_i) = w_1 T(x_i) + w_2 S(x_i) + w_3 B(x_i)$$
where $x_i$ denotes the given $i$-th index configuration scheme, $T(x_i)$ the average or total response time required to execute query operations under scheme $x_i$, $S(x_i)$ the storage space occupied by the index structure under scheme $x_i$, $B(x_i)$ the time required to build the index under scheme $x_i$, and $w_1, w_2, w_3$ the weight coefficients;
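A sketch of evaluating the weighted objective of S4.1 for a few candidate configurations. Because query time, storage and build time have different units, this version divides each term by its maximum over the candidates before weighting; that normalization, the default weights and the field names are assumptions, not part of the source.

```python
from dataclasses import dataclass

FIELDS = ("query_time_ms", "storage_mb", "build_time_s")


@dataclass
class IndexConfig:
    name: str
    query_time_ms: float   # average or total query response time
    storage_mb: float      # space occupied by the index structures
    build_time_s: float    # time needed to build the indexes


def weighted_objective(configs, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three costs, each first divided by its maximum over the candidates (lower is better)."""
    maxima = [max(getattr(c, f) for c in configs) or 1.0 for f in FIELDS]
    return {
        c.name: sum(w * getattr(c, f) / m for w, f, m in zip(weights, FIELDS, maxima))
        for c in configs
    }


candidates = [
    IndexConfig("A", 100.0, 50.0, 10.0),
    IndexConfig("B", 50.0, 100.0, 20.0),
    IndexConfig("C", 200.0, 10.0, 5.0),
]
scores = weighted_objective(candidates)
print(min(scores, key=scores.get))   # A
```

A single weighted score is convenient for ranking, but it hides trade-offs between the objectives; the Pareto-based steps that follow keep those trade-offs visible.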
S4.2, performing a multi-dimensional performance evaluation of each index configuration scheme in the candidate solution set with the pre-trained language model;
Further, step S4.2 comprises:
(1) collecting each index configuration scheme in the candidate solution set, making sure every scheme has a clear definition and parameters, and preparing an evaluation data set that covers multiple dimensions so that the effect of each configuration can be assessed comprehensively;
(2) preprocessing the text in the evaluation data set (cleaning, tokenization and stop-word removal) to ensure the quality of the text fed to the pre-trained language model;
(3) extracting features from the preprocessed text with the pre-trained language model, which outputs a deep semantic feature representation of each text sequence for use in subsequent evaluation tasks;
(4) defining performance evaluation metrics according to the evaluation requirements, such as query efficiency, response time, accuracy and recall, to quantify the performance of each index configuration scheme;
(5) associating the features extracted by the pre-trained language model with the index configuration schemes, and evaluating how the different schemes perform on a specific data set through simulated or actual queries;
(6) scoring or ranking each index configuration scheme with the defined metrics;
(7) analyzing the evaluation results to identify the best-performing index configuration schemes and their characteristics.
S4.3, selecting part of solutions from the candidate solution set to serve as a parent to generate a new solution to serve as a child according to a multi-dimensional performance evaluation result, and meanwhile, combining the parent and the child to form a new solution set to evaluate, wherein the genetic algorithm is the prior art content in the field and is not an inventive scheme of the application and is not repeated here;
S4.4, layering is carried out in the new solution set according to the dominant relation of the solutions, so that each layer comprises a group of solutions which are not dominant to each other, different non-dominant layers are obtained, and the crowding degree of all individuals in the same non-dominant layer is initialized to be 0;
S4.5, within each non-dominated front, the individuals are sorted on each objective function separately; for each objective, the difference in objective value between each individual and its adjacent individuals is computed and normalized;
S4.6, the normalized differences over all objective functions are summed to give each individual's crowding distance, and the individuals of the next-generation population are selected by non-dominated rank and crowding distance, after which the next iteration begins;
S4.7, a maximum number of iterations is set; when it is reached, iteration stops and the Pareto-optimal solutions are obtained; otherwise the process returns to step S4.3 and continues iterating;
S4.8, an index optimization scheme is generated from the Pareto-optimal solutions.
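Steps S4.3 to S4.8 follow the structure of NSGA-II, the standard non-dominated sorting genetic algorithm. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: a candidate index configuration is encoded as a 0/1 vector over six hypothetical candidate indexes, and two hypothetical objectives (estimated query cost and storage cost, computed from made-up per-index numbers in `BENEFIT`, `SIZE` and `BASE_COST`) are minimized. Binary tournament selection, uniform crossover, bit-flip mutation and random initialization are assumed choices that the text does not specify. Boundary individuals of each front receive infinite crowding distance, as in standard NSGA-II; the text does not state how boundaries are handled.

```python
import random

# Hypothetical toy instance: 6 candidate indexes, a solution is a 0/1 vector.
BENEFIT = [30, 25, 20, 10, 5, 2]  # assumed query-cost reduction per index
SIZE = [8, 6, 5, 3, 2, 1]         # assumed storage cost per index
BASE_COST = 100                   # assumed query cost with no optional index


def objectives(x):
    """Two minimized objectives: (estimated query cost, storage cost)."""
    cost = BASE_COST - sum(b for b, on in zip(BENEFIT, x) if on)
    storage = sum(s for s, on in zip(SIZE, x) if on)
    return (cost, storage)


def dominates(a, b):
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def non_dominated_sort(objs):
    """S4.4: return fronts (lists of indices) of mutually non-dominated solutions."""
    beats = [[] for _ in objs]  # solutions that p dominates
    count = [0] * len(objs)     # number of solutions dominating p
    fronts = [[]]
    for p, fp in enumerate(objs):
        for q, fq in enumerate(objs):
            if dominates(fp, fq):
                beats[p].append(q)
            elif dominates(fq, fp):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        nxt = []
        for p in fronts[-1]:
            for q in beats[p]:
                count[q] -= 1
                if count[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
    return fronts[:-1]


def crowding_distance(front, objs):
    """S4.5 and S4.6: sum over objectives of normalized gaps between neighbours."""
    dist = {p: 0.0 for p in front}  # initialized to 0 as in S4.4
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda p: objs[p][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


def make_offspring(pop, rank, dist, rng, p_mut=0.1):
    """S4.3: binary tournament on (rank, crowding), uniform crossover, bit-flip mutation."""
    def pick():
        a, b = rng.sample(range(len(pop)), 2)
        return pop[a] if (rank[a], -dist[a]) <= (rank[b], -dist[b]) else pop[b]

    children = []
    for _ in range(len(pop)):
        p1, p2 = pick(), pick()
        child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
        children.append([1 - g if rng.random() < p_mut else g for g in child])
    return children


def nsga2(pop_size=12, max_iter=30, seed=0):
    rng = random.Random(seed)
    # S4.1 uses a heuristic initial set; random vectors are a stand-in here.
    pop = [[rng.randint(0, 1) for _ in BENEFIT] for _ in range(pop_size)]
    for _ in range(max_iter):  # S4.7: stop after a maximum number of iterations
        objs = [objectives(x) for x in pop]
        rank, dist = {}, {}
        for r, front in enumerate(non_dominated_sort(objs)):
            dist.update(crowding_distance(front, objs))
            rank.update({p: r for p in front})
        merged = pop + make_offspring(pop, rank, dist, rng)  # parents + children
        objs = [objectives(x) for x in merged]
        pop = []
        for front in non_dominated_sort(objs):  # fill by rank, then by crowding
            cd = crowding_distance(front, objs)
            pop += [merged[p] for p in sorted(front, key=lambda p: -cd[p])]
        pop = pop[:pop_size]
    objs = [objectives(x) for x in pop]
    # S4.8: the first front of the final population approximates the Pareto set.
    return sorted({tuple(pop[p]) for p in non_dominated_sort(objs)[0]}, key=objectives)


for config in nsga2():
    print(config, objectives(config))
```

Because every added index lowers query cost but consumes storage, the result is a set of trade-off configurations rather than a single answer; turning one of them into a concrete index optimization scheme is what S4.8 leaves to the system.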
In S3.6, the hybrid index scheme is generated using hash-table and bitmap index methods, combined with the entity-relation information in the knowledge graph.
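One possible reading of this hash-table plus bitmap combination is sketched below, purely as an illustration: a hash table answers exact-key lookups, per-value bitmaps answer conjunctions of equality predicates on low-cardinality attributes with a bitwise AND, and an adjacency map standing in for knowledge-graph entity relations widens a query to related entities. The class, its methods and the `entity` field are hypothetical; the text does not describe how the three parts are wired together.

```python
from collections import defaultdict


class HybridIndex:
    """Hypothetical hybrid index: a hash table for exact-key lookups, bitmaps for
    low-cardinality attributes, and an entity-relation map taken from a knowledge graph."""

    def __init__(self):
        self.rows = []                   # row id -> record
        self.by_key = {}                 # hash table: primary key -> row id
        self.bitmaps = defaultdict(int)  # (attribute, value) -> int used as a bitset
        self.related = defaultdict(set)  # entity -> neighbouring entities in the KG

    def insert(self, key, record, bitmap_attrs):
        row_id = len(self.rows)
        self.rows.append(record)
        self.by_key[key] = row_id
        for attr in bitmap_attrs:
            self.bitmaps[(attr, record[attr])] |= 1 << row_id  # set bit row_id

    def add_relation(self, head, tail):
        self.related[head].add(tail)

    def get(self, key):
        """Average O(1) point lookup through the hash table."""
        row_id = self.by_key.get(key)
        return None if row_id is None else self.rows[row_id]

    def where(self, **conditions):
        """Conjunction of equality predicates as a bitwise AND of bitmaps."""
        bits = -1  # all bits set: no restriction yet
        for attr, value in conditions.items():
            bits &= self.bitmaps.get((attr, value), 0)
        return [rec for i, rec in enumerate(self.rows) if (bits >> i) & 1]

    def expand(self, entity, **conditions):
        """Matching rows whose entity is `entity` or one of its KG neighbours."""
        targets = {entity} | self.related.get(entity, set())
        return [rec for rec in self.where(**conditions) if rec["entity"] in targets]


idx = HybridIndex()
idx.insert("o1", {"entity": "phone", "region": "east", "status": "paid"}, ["region", "status"])
idx.insert("o2", {"entity": "tablet", "region": "east", "status": "open"}, ["region", "status"])
idx.insert("o3", {"entity": "phone", "region": "west", "status": "paid"}, ["region", "status"])
idx.add_relation("phone", "tablet")
print([r["entity"] for r in idx.where(region="east", status="paid")])  # ['phone']
print(len(idx.expand("phone", region="east")))  # 2
```

Python integers serve as arbitrary-length bitsets here; a production bitmap index would typically use compressed bitmaps instead.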
Example 2
Referring to FIG. 4, in another embodiment of the present invention, a real-time index construction and intelligent optimization system driven by a large model comprises a data processing module, an index generation module, an index optimization module and a monitoring module;
the data processing module is used to collect sample data and perform dynamic adaptive preprocessing on it, ensuring data quality and consistency;
the index generation module is used to create index structures from the preprocessed sample data and generate a hybrid index scheme, which speeds up data queries and reduces search time and resource consumption;
the index optimization module is used to carry out index construction and to adjust and optimize the index structure as needed, so that it adapts to changes in the data and to evolving query patterns;
the monitoring module is used to monitor the performance indicators of index construction and query processing in real time, so that problems can be detected and resolved promptly, ensuring the stability and reliability of the index construction and intelligent optimization system.
The data processing module comprises a data stream processing unit, a dynamic preprocessing unit, a feature analysis unit and an access pattern evaluation unit;
the data stream processing unit receives and processes the sample data in the data stream in real time and can quickly identify and handle newly arriving data;
the dynamic preprocessing unit preprocesses the sample data (for example data cleaning, missing-value handling and data transformation), dynamically adjusting the preprocessing according to the characteristics and access patterns of the sample data to ensure data quality and improve consistency and accuracy;
the feature analysis unit analyzes the statistical and distribution characteristics of the sample data and performs feature selection and feature engineering, reducing redundant information and strengthening the features used for indexing;
the access pattern evaluation unit analyzes the access frequency and access patterns of users or systems, predicts query demand, and guides index construction and optimization so that the index structure meets the query requirements.
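As an illustration of how an access pattern evaluation unit might turn access frequency into a hot-data forecast (compare step S2 in claim 1, which predicts hot data from historical access records and preloads it into memory), the sketch below keeps an exponentially decayed access count per key and ranks keys by it. The decay model, the half-life parameter and the class name are assumptions; the text only says "statistical methods".

```python
import math
from collections import defaultdict


class AccessTracker:
    """Hypothetical access-pattern tracker: per-key access counts with exponential
    time decay, so recent bursts outweigh old activity when ranking hot keys."""

    def __init__(self, half_life_s=60.0):
        self.decay = math.log(2) / half_life_s  # score halves every half_life_s seconds
        self.score = defaultdict(float)
        self.last_seen = {}

    def record(self, key, now):
        if key in self.last_seen:
            self.score[key] *= math.exp(-self.decay * (now - self.last_seen[key]))
        self.score[key] += 1.0
        self.last_seen[key] = now

    def current(self, key, now):
        if key not in self.last_seen:
            return 0.0
        return self.score[key] * math.exp(-self.decay * (now - self.last_seen[key]))

    def hot_keys(self, now, k):
        """Top-k keys by decayed score: candidates for preloading into memory."""
        return sorted(self.last_seen, key=lambda key: self.current(key, now), reverse=True)[:k]


tracker = AccessTracker(half_life_s=10.0)
for t in range(5):
    tracker.record("orders:42", t)  # early burst of accesses
tracker.record("orders:7", 100)
tracker.record("orders:7", 101)
print(tracker.hot_keys(now=101, k=1))  # ['orders:7']
```

Although `orders:42` was accessed more often in total, its accesses are about 97 seconds (almost ten half-lives) old, so the two recent accesses to `orders:7` dominate.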
The index generation module comprises a model loading unit, a knowledge graph fusion unit, a strategy formulation unit and a storage unit;
the model loading unit loads the pre-trained language model and produces deep semantic representations;
the knowledge graph fusion unit integrates an external knowledge graph or other external knowledge into the index construction process, strengthening the semantic understanding of the index and improving query accuracy;
the strategy formulation unit formulates the adaptive hybrid indexing strategy from the analysis results, data characteristics and access patterns of the sample data;
the storage unit generates the index scheme according to the adaptive hybrid indexing strategy and stores it in the index database, implementing physical storage of the indexes and managing the index files.
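The knowledge graph fusion described here is made concrete in S3.3 of claim 2: entity and relation embeddings are obtained with DistMult and combined with the language-model features by concatenation and a dimension transformation. The sketch below shows only those two operations, under assumptions: the embedding values are made up rather than trained, and a random matrix `W` stands in for a learned linear projection. DistMult scores a triple as the trilinear product of the head, relation and tail vectors.

```python
import numpy as np


def distmult_score(h, r, t):
    """DistMult plausibility of a triple (h, r, t): sum_k h_k * r_k * t_k."""
    return float(np.sum(h * r * t))


def fuse(text_feature, entity_emb, relation_emb, W):
    """Concatenate the LM feature with KG embeddings, then apply a linear dimension transform W."""
    x = np.concatenate([text_feature, entity_emb, relation_emb])
    return W @ x


# Made-up 2-d KG embeddings and a 4-d "LM feature"; real ones would be learned.
h, r, t = np.array([1.0, 2.0]), np.array([0.5, 1.0]), np.array([2.0, 3.0])
print(distmult_score(h, r, t))  # 7.0  (1*0.5*2 + 2*1*3)
W = np.random.default_rng(0).standard_normal((4, 8))  # stand-in for a learned projection
z = fuse(np.ones(4), h, r, W)
print(z.shape)  # (4,)
```

One known limitation of DistMult is that its score is symmetric in head and tail, so it cannot distinguish a relation from its inverse; whether that matters depends on the relations in the knowledge graph.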
The index optimization module comprises an index construction unit, a performance evaluation unit, an optimization algorithm unit and an optimization implementation unit;
the index construction unit builds the indexes in the database system according to the hybrid index scheme;
the performance evaluation unit tests and evaluates the hybrid index scheme using the pre-trained language model and collects query execution time and resource usage metrics such as query speed and storage space;
the optimization algorithm unit uses a multi-objective optimization algorithm to rebuild or merge index entries and delete unnecessary ones, improving index performance and reducing maintenance cost;
the optimization implementation unit applies the optimized index scheme and updates the index structure.
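The metrics collected by the performance evaluation unit feed the objective function of S4.1 in claim 5, which weights query response time, index storage space and index build time. Read as a weighted sum, and with symbol names assumed for this sketch, it is $F(X_i) = \omega_1 T_q(X_i) + \omega_2 S(X_i) + \omega_3 T_b(X_i)$. The code evaluates it for three hypothetical configurations; the field names, the default weights and the numbers are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class IndexConfig:
    name: str
    query_ms: float    # T_q: average query response time
    storage_mb: float  # S: storage occupied by the index structures
    build_s: float     # T_b: index build time


def weighted_cost(c, w1, w2, w3):
    """F(X_i) = w1*T_q + w2*S + w3*T_b; the weights also absorb the unit differences."""
    return w1 * c.query_ms + w2 * c.storage_mb + w3 * c.build_s


def best_config(configs, w1=1.0, w2=0.1, w3=0.5):
    """Configuration with the lowest weighted cost under the given (assumed) weights."""
    return min(configs, key=lambda c: weighted_cost(c, w1, w2, w3))


configs = [
    IndexConfig("A", query_ms=120, storage_mb=50, build_s=10),  # F = 120 + 5 + 5 = 130
    IndexConfig("B", query_ms=40, storage_mb=400, build_s=60),  # F = 40 + 40 + 30 = 110
    IndexConfig("C", query_ms=80, storage_mb=100, build_s=20),  # F = 80 + 10 + 10 = 100
]
print(best_config(configs).name)  # C
```

Shifting weight toward storage and build time (for example `w1=0.1, w2=1.0, w3=1.0`) makes A the winner instead, so in practice the weights encode the deployment's priorities.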
The monitoring module comprises a performance monitoring unit, a prediction model unit and a dynamic adjustment unit;
the performance monitoring unit monitors performance indicators of index construction and query processing in real time, such as response time, CPU utilization and throughput;
the prediction model unit uses machine learning algorithms to predict the query optimization effect of different index combinations, so that resources can be allocated in advance and performance degradation prevented;
the dynamic adjustment unit dynamically adjusts the configuration of the index construction and intelligent optimization system, such as the index strategy and query optimization parameters, according to the monitoring and prediction results, ensuring the stability and efficiency of the system.
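The prediction model unit's task (predicting the query optimization effect of different index combinations and picking the best one, also step S5 of claim 1) can be sketched with an ordinary least-squares model standing in for the unspecified machine learning algorithm. Each historical observation is encoded as 0/1 indicators of which indexes existed, the model predicts latency, and all combinations within an assumed budget of at most `max_indexes` indexes are enumerated. The data, the budget and the function names are hypothetical.

```python
import itertools

import numpy as np


def fit_latency_model(X, y):
    """Least squares for latency ~ b0 + sum_j b_j * x_j (x_j = 1 if index j exists)."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, combo):
    return float(coef[0] + np.dot(coef[1:], combo))


def best_combination(coef, n_indexes, max_indexes):
    """Enumerate index combinations within the budget; return the fastest predicted one."""
    combos = [c for c in itertools.product((0, 1), repeat=n_indexes) if sum(c) <= max_indexes]
    return min(combos, key=lambda c: predict(coef, np.array(c)))


# Hypothetical history: latency (ms) observed under each combination of 3 indexes.
X = np.array(list(itertools.product((0, 1), repeat=3)), dtype=float)
y = 100 - X @ np.array([30.0, 10.0, 20.0])
coef = fit_latency_model(X, y)
print(best_combination(coef, n_indexes=3, max_indexes=2))  # (1, 0, 1)
```

A linear model assumes index benefits simply add up; interactions between indexes (for example two indexes serving the same queries) would need interaction features or a non-linear model, and exhaustive enumeration only scales to a handful of candidates.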
Example 3
A computer-readable storage medium storing computer instructions which, when executed, perform the steps of the large-model-driven real-time index construction and intelligent optimization method; the storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to these embodiments, which are illustrative rather than restrictive. Those of ordinary skill in the art may make variations, modifications, substitutions and alterations to them without departing from the spirit and scope of the invention, and all of these fall within its protection.

Claims (10)

1. A real-time index construction and intelligent optimization method driven by a large model, characterized by comprising:
Step S1: collecting sample data by stream processing and performing dynamic adaptive preprocessing on it, while analyzing the characteristics of the sample data and evaluating its access patterns;
Step S2: using statistical methods to predict hot data from historical access records and current data characteristics, and, according to the hot-data prediction results, automatically triggering a loading mechanism that loads the hot data into memory in advance;
Step S3: loading a pre-trained language model and, combined with a knowledge graph method, performing deep semantic analysis on the sample data; based on the analysis results, the sample data characteristics, the access patterns and the hot-data prediction results, adopting an adaptive hybrid indexing strategy that incorporates the entity-relation information in the knowledge graph to generate a hybrid index scheme, and storing it in an index database;
Step S4: building the indexes in the database system according to the hybrid index scheme, performing a multi-dimensional performance evaluation of the hybrid index scheme using the pre-trained language model, automatically generating an index optimization scheme from the evaluation results with a multi-objective optimization algorithm, and implementing the optimized index scheme;
Step S5: monitoring the performance indicators of index construction and query processing in real time, and using machine learning algorithms on historical query data and index performance data to predict the query optimization effect of different index combinations, thereby obtaining the optimal index combination.

2. The method according to claim 1, wherein step S3 specifically comprises:
S3.1: loading the pre-trained language model and its matching tokenizer, and using the tokenizer to tokenize and encode the input sample data;
S3.2: feeding the encoded data into the pre-trained language model, running its forward pass, and extracting the hidden state of the last layer as the deep semantic features of the sample data;
S3.3: obtaining embeddings of the relevant entities and relations in the knowledge graph with DistMult, and combining these entity and relation embeddings with the deep semantic features extracted by the pre-trained language model through concatenation and dimension transformation to form a feature vector.

3. The method according to claim 2, wherein step S3 further comprises:
S3.4: collecting database access logs and performing statistical analysis on them to obtain user access patterns and query requirements;
S3.5: extracting features from the feature vector according to the user access patterns and query requirements to obtain index candidate features, and evaluating through simulation tests the influence factors $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ of the different index candidate features on query performance, where $\alpha_n$ denotes the influence factor of the $n$-th index candidate feature and $n$ denotes the number of index candidate features;
S3.6: according to the evaluation results, sorting the influence factors in descending order and setting a minimum influence-factor threshold $\alpha_{\min}$; if $\alpha_k \ge \alpha_{\min}$, the $k$-th candidate is retained, giving the index features $\{f_1, f_2, \ldots, f_m\}$, where $f_m$ denotes the $m$-th index feature and $m$ denotes the number of index features; based on the index features and the query requirements, generating a hybrid index scheme with the adaptive hybrid indexing strategy.

4. The method according to claim 3, wherein step S3 further comprises:
S3.7: introducing a dynamic index adjustment mechanism that, based on the hybrid index scheme, monitors index query efficiency in real time and dynamically adjusts the index structure according to the monitoring results;
S3.8: using a trained machine learning model to predict index query trends, interpreting and analyzing the predicted trends to identify query hot spots and pattern changes, and, based on the predicted trends and the current indexes, evaluating whether the index structure or parameters need to be adjusted to optimize the adaptive hybrid indexing strategy;
S3.9: writing a construction script for the adaptive hybrid indexing strategy, executing it on a large-scale data set, and monitoring resource consumption and performance indicators during construction;
S3.10: after the new indexing strategy has been executed, storing the constructed hybrid index in the index database.

5. The method according to claim 4, wherein the multi-objective optimization algorithm in step S4 specifically comprises:
S4.1: setting the objective function and constraints of index optimization, and generating a set of initial index configuration schemes by heuristic methods as the candidate solution set, the objective function being:
$$F(X_i) = \omega_1 T_q(X_i) + \omega_2 S(X_i) + \omega_3 T_b(X_i)$$
where $X_i$ denotes the $i$-th index configuration scheme, $T_q(X_i)$ the average or total response time of query operations under $X_i$, $S(X_i)$ the storage space occupied by the index structure under $X_i$, $T_b(X_i)$ the time required to build the index under $X_i$, and $\omega_1$, $\omega_2$, $\omega_3$ are weight coefficients;
S4.2: performing a multi-dimensional performance evaluation of each index configuration scheme in the candidate solution set using the pre-trained language model;
S4.3: according to the multi-dimensional evaluation results, using a genetic algorithm to select some solutions from the candidate solution set as parents and generate new solutions as offspring, merging the parents and offspring into a new solution set, and evaluating it.

6. The method according to claim 5, wherein the multi-objective optimization algorithm in step S4 further comprises:
S4.4: layering the new solution set by the dominance relation between solutions so that each layer contains a group of mutually non-dominated solutions, yielding different non-dominated fronts, and initializing the crowding distance of all individuals within each front to 0;
S4.5: sorting the individuals of each non-dominated front on each objective function separately and, for each objective, computing and normalizing the difference in objective value between each individual and its adjacent individuals;
S4.6: summing the normalized differences over all objective functions to obtain each individual's crowding distance, and selecting the individuals of the next-generation population by non-dominated rank and crowding distance for the next iteration;
S4.7: setting a maximum number of iterations; if it is reached, stopping and obtaining the Pareto-optimal solutions, otherwise returning to step S4.3 to continue iterating;
S4.8: generating an index optimization scheme from the Pareto-optimal solutions.

7. The method according to claim 6, wherein the hybrid index scheme in S3.6 is generated using hash-table and bitmap index methods combined with the entity-relation information in the knowledge graph.

8. A real-time index construction and intelligent optimization system driven by a large model, for implementing the method of any one of claims 1 to 7, characterized by comprising a data processing module, an index generation module, an index optimization module and a monitoring module, wherein:
the data processing module is configured to collect sample data and perform dynamic adaptive preprocessing;
the index generation module is configured to create index structures from the preprocessed sample data and generate a hybrid index scheme;
the index optimization module is configured to carry out index construction and to adjust and optimize the index structure as needed;
the monitoring module is configured to monitor the performance indicators of index construction and query processing in real time.

9. The system according to claim 8, wherein the index generation module comprises a model loading unit, a knowledge graph fusion unit, a strategy formulation unit and a storage unit, wherein:
the model loading unit is configured to load the pre-trained language model and produce deep semantic representations;
the knowledge graph fusion unit is configured to integrate an external knowledge graph or other external knowledge into the index construction process, strengthening the semantic understanding of the index and improving query accuracy;
the strategy formulation unit is configured to formulate the adaptive hybrid indexing strategy from the analysis results, data characteristics and access patterns of the sample data;
the storage unit is configured to generate the index scheme according to the adaptive hybrid indexing strategy and store it in the index database, implementing physical storage of the index and managing index files.

10. A computer-readable storage medium storing computer instructions which, when executed, perform the steps of the real-time index construction and intelligent optimization method driven by a large model according to any one of claims 1 to 7.
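Steps S3.5 and S3.6 of claim 3 reduce to scoring each index candidate feature, sorting the influence factors in descending order and keeping those at or above a minimum threshold. The sketch below assumes, for illustration only, that an influence factor is the relative latency reduction observed when an index on that feature is simulated; the claims do not define how the factor is computed. Feature names, latencies and the threshold are made up.

```python
def influence_factors(base_ms, latency_with_index_ms):
    """Hypothetical influence factor: relative latency reduction with a simulated index."""
    return {f: (base_ms - t) / base_ms for f, t in latency_with_index_ms.items()}


def select_index_features(influence, threshold):
    """S3.6: sort influence factors in descending order; keep those at or above the threshold."""
    ranked = sorted(influence.items(), key=lambda kv: kv[1], reverse=True)
    return [feature for feature, a in ranked if a >= threshold]


# Hypothetical simulation results: 200 ms baseline without any candidate index.
alpha = influence_factors(200.0, {"user_id": 20.0, "created_at": 120.0, "status": 170.0, "note": 196.0})
print(select_index_features(alpha, threshold=0.1))  # ['user_id', 'created_at', 'status']
```

Raising the threshold trades index coverage for lower storage and maintenance cost; with `threshold=0.5` only `user_id` would survive.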
CN202411472747.7A | priority date 2024-10-22 | filing date 2024-10-22 | Real-time index construction and intelligent optimization method and system based on large model driving | Active | CN119003533B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411472747.7A, CN119003533B (en) | 2024-10-22 | 2024-10-22 | Real-time index construction and intelligent optimization method and system based on large model driving

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411472747.7A, CN119003533B (en) | 2024-10-22 | 2024-10-22 | Real-time index construction and intelligent optimization method and system based on large model driving

Publications (2)

Publication Number | Publication Date
CN119003533A (en) | 2024-11-22
CN119003533B (en) | 2024-12-27

Family

ID=93491928

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202411472747.7A, Active, CN119003533B (en) | Real-time index construction and intelligent optimization method and system based on large model driving | 2024-10-22 | 2024-10-22

Country Status (1)

Country | Link
CN (1) | CN119003533B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119311700B (en)* | 2024-12-18 | 2025-04-01 | 杭州电子科技大学 | A nearest neighbor retrieval method based on automatic construction of neighbor graph index
CN119961265B (en)* | 2025-01-13 | 2025-09-19 | 上海异工同智信息科技有限公司 | Data table processing method, device, storage medium and electronic equipment
CN119782359A (en)* | 2025-03-11 | 2025-04-08 | 智慧足迹数据科技有限公司 | Data query optimization method, system, database and electronic device based on machine learning

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN117876016A (en)* | 2024-03-12 | 2024-04-12 | 江西开放大学 | Distributed market data acquisition management system
CN117932074A (en)* | 2023-12-08 | 2024-04-26 | 北京国电通网络技术有限公司 | Audit knowledge mapping system based on digital audit platform

Family Cites Families (2)

Publication number | Priority date | Publication date | Assignee | Title
US11948560B1 (en)* | 2019-11-07 | 2024-04-02 | Kino High Coursey | Method for AI language self-improvement agent using language modeling and tree search techniques
US20220230628A1 (en)* | 2021-01-20 | 2022-07-21 | Microsoft Technology Licensing, Llc | Generation of optimized spoken language understanding model through joint training with integrated knowledge-language module


Also Published As

Publication number | Publication date
CN119003533A (en) | 2024-11-22

Similar Documents

Publication | Title
CN119003533B (en) | Real-time index construction and intelligent optimization method and system based on large model driving
CN110110858B (en) | Automatic machine learning method based on reinforcement learning
CN103336791B (en) | Hadoop-based fast rough set attribute reduction method
CN110674636B (en) | Power consumption behavior analysis method
CN101093559A (en) | Method for constructing expert system based on knowledge discovery
CN111581454A (en) | Depth map compression algorithm-based parallel query expression prediction system and method
CN113342597B (en) | System fault prediction method based on Gaussian mixture hidden Markov model
CN114662793B (en) | Business process remaining time prediction method and system based on interpretable hierarchical model
CN118733006B (en) | Script file generation method, task processing method and electronic device
CN101546290B (en) | A Method of Improving the Prediction Accuracy of Class Hierarchy Quality in Object-Oriented Software
CN116226281A (en) | Automatic database partition method and system based on deep map compression algorithm
CN119673329A (en) | A method, device and medium for predicting the amount of geological storage of carbon dioxide
CN103365923A (en) | Method and device for assessing partition schemes of database
CN118428246B (en) | Information distributed management method and system for data driven modeling
CN113326343B (en) | Road network data storage method and system based on multi-level grids and file indexes
CN119622389A (en) | A method and system for generating typical scenarios of microgrids
CN113743453A (en) | Population quantity prediction method based on random forest
KR20220099745A (en) | A spatial decomposition-based tree indexing and query processing methods and apparatus for geospatial blockchain data retrieval
CN118855447A (en) | Intelligent optimization method, system and medium for oilfield infill well location
CN117667606A (en) | High-performance computing cluster energy consumption prediction method and system based on user behavior
Xiao et al. | A novel method for intelligent reasoning of machining step sequences based on deep reinforcement learning
CN113240161B (en) | Method and device for establishing net present value prediction model, storage medium and electronic equipment
CN116128162A (en) | Method, system and storage medium for predicting initial productivity of fracturing well based on small sample
RU2745492C1 (en) | Method and system for the search for analogues of oil and gas fields
Li et al. | Parameters optimization of back propagation neural network based on memetic algorithm coupled with genetic algorithm

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
