Movatterモバイル変換


[0]ホーム

URL:


CN113934976A - A Multithreaded Loopback Enumeration Algorithm Based on NumPy - Google Patents

A Multithreaded Loopback Enumeration Algorithm Based on NumPy
Download PDF

Info

Publication number
CN113934976A
CN113934976ACN202111060114.1ACN202111060114ACN113934976ACN 113934976 ACN113934976 ACN 113934976ACN 202111060114 ACN202111060114 ACN 202111060114ACN 113934976 ACN113934976 ACN 113934976A
Authority
CN
China
Prior art keywords
path
iteration
thread
index
numpy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111060114.1A
Other languages
Chinese (zh)
Other versions
CN113934976B (en
Inventor
李刚
张雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing UniversityfiledCriticalNanjing University
Priority to CN202111060114.1ApriorityCriticalpatent/CN113934976B/en
Publication of CN113934976ApublicationCriticalpatent/CN113934976A/en
Application grantedgrantedCritical
Publication of CN113934976BpublicationCriticalpatent/CN113934976B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Images

Classifications

Landscapes

Abstract

Translated fromChinese

本发明公开了一种基于NumPy的多线程回环枚举算法,首先进行数据预处理,将多源异构数据处理成符合要求的点集和边集的图数据,并进一步构造NumPy邻接矩阵;主线程方面,分别计算所有环的路径索引,每次迭代计算生成本次迭代相对于前一次迭代的路径索引,并发送至辅助线程进行展开;迭代完成后,主线程接收辅助线程发送的所有环的路径;辅助线程接收主线程传递的路径索引,结合存储的节点间的路径数据获得环的具体路径,当迭代结束时,将所有的环结果发送回主线程。本发明提供的多线程回环枚举算法,利用高效的NumPy库,可以进一步提高算法的执行效率。

Figure 202111060114

The invention discloses a multi-thread loopback enumeration algorithm based on NumPy. First, data preprocessing is performed, multi-source heterogeneous data is processed into graph data of point sets and edge sets that meet the requirements, and a NumPy adjacency matrix is further constructed; In terms of threads, the path indexes of all rings are calculated separately, and the path indexes of this iteration relative to the previous iteration are generated by each iteration, and sent to the auxiliary thread for expansion; after the iteration is completed, the main thread receives all the rings sent by the auxiliary thread. Path; the auxiliary thread receives the path index passed by the main thread, and combines the stored path data between nodes to obtain the specific path of the ring. When the iteration ends, it sends all ring results back to the main thread. The multi-thread loopback enumeration algorithm provided by the present invention can further improve the execution efficiency of the algorithm by using the efficient NumPy library.

Figure 202111060114

Description

Multithreading loopback enumeration algorithm based on NumPy
Technical Field
The invention relates to the technical field of graph data processing, in particular to a multithreading loopback enumeration algorithm based on NumPy.
Background
The loop enumeration problem for graph data is a common problem in the field of graph analysis, and is often used in graph analysis scenarios such as reconnaissance and criminal investigation. In most real-world scenarios, the length of the ring to be enumerated is limited, and specifying an upper limit on the length of the enumerated ring can greatly mitigate computational costs. The existing loop enumeration algorithm is calculated based on the data of the structural graph and has the defects of large memory consumption and long calculation time.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the background art, the invention provides a multithreading loopback enumeration algorithm based on NumPy, which improves the efficiency of loop calculation through a multithreading technology and a NumPy library and provides an efficient solution method for loop calculation under different scenes
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:
a multithreading loopback enumeration algorithm based on NumPy comprises the following steps:
step S1, preprocessing data; processing multi-source heterogeneous data into graph data of a point set and an edge set meeting requirements through an ETL process, and further constructing an adjacency matrix based on the graph data;
step S2, calculating a path index by the main thread; the main thread executes n iterations, respectively calculates path indexes of all rings with the lengths of 1-n, generates a path index of the current iteration relative to the previous iteration through each iteration calculation, and sends the path index to the auxiliary thread for expansion; after the iteration is finished, the main thread receives all the paths of the rings sent by the auxiliary thread;
step S3, the assistant thread receives the path index sent by the main thread, and combines the stored path between the nodes and the path index to obtain the path of the ring; when the iteration is over, the auxiliary thread sends the paths of all the rings to the main thread.
Further, the constructing of the adjacency matrix in step S1 specifically includes:
acquiring an adjacent matrix of nodes from the graph data according to the input graph data and the maximum length of a ring to be calculated, constructing the adjacent matrix into a Boolean NumPy matrix, representing that the nodes are connected by adopting True, and representing that the nodes are not connected by adopting False; constructing a path dictionary with the distance of 1 hop between nodes from graph data, wherein keys are the ids of a starting node and a terminating node of an edge, and the value is the id of the edge; taking the maximum length of the ring to be calculated as the total iteration number N; the generated path dictionary is used as an initial parameter to start the auxiliary thread.
Further, the step S2 of calculating the path index by the main thread specifically includes:
s2.1, calculating a dot product of an adjacent matrix of i-1 hop and an initial adjacent matrix in the last iteration turn based on a dot operator of NumPy, and taking the dot product as the adjacent matrix of the ith hop;
s2.2, acquiring the position of a non-zero element in the row and column of the ith hop of adjacent matrix by using a nonzero method of NumPy; expressing the position in a row vector and column vector mode, acquiring row data of an adjacent matrix of i-1 jump by taking the row vector as a coordinate, and acquiring initial column data of the adjacent matrix by taking the column vector as a column coordinate;
s2.3, searching the position where the row data and the column data are equal through the AND operation and the where operator of the NumPy; obtaining a path index of the current turn according to the position where the row data and the column data are equal, and transmitting the current iteration turn i and the path index to an auxiliary thread for expansion; the diagonal element in the result of the dot product in step S2.1 is reset to 0.
Further, the iterative process executed by the assist thread in step S3 specifically includes:
taking an edge in graph data as an initial value of an inter-node path stored by an auxiliary thread, acquiring a path index obtained by each iteration and the total number of times of iteration by the auxiliary thread from a main thread, acquiring a path value between nodes of the current iteration according to the path and the index between the nodes stored by the auxiliary thread, selecting all loop paths obtained in the current iteration from all the inter-node path values, and updating the node path stored by the auxiliary thread into the inter-node path value obtained by the current iteration; and after the iteration is finished, sending the paths of all the rings to the main thread, and finishing the execution of the auxiliary thread.
Further, the auxiliary thread obtains a message queue of a transmission path index and a message queue of a transmission loop path result from the main thread;
firstly, the auxiliary thread obtains an initial path index from a message queue of the path index as an expansion result of the path index of the step 0 iteration; in a specific iteration process, the auxiliary thread obtains a current iteration turn and a path index from a message queue of the path index; the path index refers to the path index in the last iteration;
expanding the path index of the iteration according to the path index of the previous iteration, wherein the key of the path index represents the initial node and the final node of the path, and the value of the path index with the same initial node and final node is used as the path of the ring and is added into a list representing all the paths of the ring; updating a variable representing the path index of the previous step according to the path expansion result;
and when the current iteration id is more than or equal to the maximum iteration times, transmitting the list representing all loop paths to the main thread through the message queue for transmitting the loop path result, and finishing the loop.
Further, the ETL in the step S1 refers to obtaining satisfactory data from the multi-source heterogeneous data through extraction, conversion, and loading.
Has the advantages that:
the multithreading loopback enumeration algorithm based on the NumPy is provided, discovery and expansion of a loopback are decoupled by means of multithreading technology, and execution efficiency of the algorithm is further improved by means of efficient operation of a NumPy library. In order to deal with data differences of different scenes, a link of multi-source data preprocessing is designed, and the general type of the algorithm is further improved.
Drawings
FIG. 1 is a flow chart of the NumPy-based multi-threaded loopback enumeration algorithm of the present invention;
FIG. 2 is a flow chart of main thread and assist thread execution in an embodiment of the invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The multithreading loopback enumeration algorithm based on the NumPy is shown in figure 1 and specifically comprises the following steps:
step S1, preprocessing data; and processing the multi-source heterogeneous data into graph data of a point set and an edge set meeting requirements through an ETL process, and further constructing an adjacency matrix based on the graph data. The ETL process here refers to obtaining satisfactory data from multi-source heterogeneous data through various ways such as extraction, conversion, and loading. The finally constructed adjacency matrix is a NumPy matrix.
Acquiring an adjacent matrix of nodes from the graph data according to the input graph data and the maximum length of a ring to be calculated, constructing the adjacent matrix into a Boolean NumPy matrix, representing that the nodes are connected by adopting True, and representing that the nodes are not connected by adopting False; constructing a path dictionary with the distance of 1 hop between nodes from graph data, wherein keys are the ids of a starting node and a terminating node of an edge, and the value is the id of the edge; taking the maximum length of the ring to be calculated as the total iteration number N; the generated path dictionary is used as an initial parameter to start the auxiliary thread.
Step S2, calculating a path index by the main thread; the main thread executes n iterations, respectively calculates path indexes of all rings with the lengths of 1-n, generates a path index of the current iteration relative to the previous iteration through each iteration calculation, and sends the path index to the auxiliary thread for expansion; and after the iteration is finished, the main thread receives all the paths of the rings sent by the auxiliary thread. In particular, the amount of the solvent to be used,
and S2.1, calculating the dot product of the initial adjacency matrix and the adjacency matrix of the i-1 hop in the last iteration turn by using a dot operator based on the NumPy, and taking the dot product as the adjacency matrix of the i-th hop.
And calculating the i power of the adjacent matrix of the graph by using the i-1 power result of the last iteration and the adjacent matrix of the graph to perform dot product, wherein all the matrixes are stored by adopting a Boolean NumPy matrix, and the result is recorded as matrix _ i.
S2.2, acquiring the position of a nonzero element in matrix _ i by using a nonzero method of NumPy; its position is recorded with a row vector rows and a column vector columns. Extracting rows in matrix _ i according to row vectors rows to generate new row vectors new _ rows, extracting columns of an original image adjacent matrix according to column vectors, and generating new column vectors new _ columns; splitting a row vector new _ rows by using unique, cumsum and split operators of NumPy and performing grouping perfect operation on a column vector new _ columns, wherein the split row vector is new _ rows _ unique, and the grouped column vector is new _ columns _ group;
step S2.3, generating a new path index according to the new row vector and the new column vector, wherein the specific method comprises the following steps: taking elements of new _ rows _ unique and new _ columns _ group, packing the elements into tuples (row _ index, columns _ list) one by one, and generating new path indexes by taking (row [ row _ index ], columns [ row _ index ]) as keys, (row [ row _ index ], columns _ list, columns [ row _ index ]) as values, wherein (row [ row _ index ], column _ list) represents (row number, column number list) of the previous path dictionary, (column _ list, columns [ row _ index ]) represents (row number list, column number) of the initial path dictionary; initiating and sending the current round and the path index generated in the last step to an auxiliary thread; all diagonal elements of matrix _ i are set to 0, and the next round of the adjacency matrix dot product matrix _ (i-1) is updated with matrix _ i.
Step S3, the assistant thread receives the path index sent by the main thread, and combines the stored path between the nodes and the path index to obtain the path of the ring; when the iteration is over, the auxiliary thread sends the paths of all the rings to the main thread.
Taking an edge in graph data as an initial value of an inter-node path stored by an auxiliary thread, acquiring a path index obtained by each iteration and the total number of times of iteration by the auxiliary thread from a main thread, acquiring a path value between nodes of the current iteration according to the path and the index between the nodes stored by the auxiliary thread, selecting all loop paths obtained in the current iteration from all the inter-node path values, and updating the node path stored by the auxiliary thread into the inter-node path value obtained by the current iteration; and after the iteration is finished, sending the paths of all the rings to the main thread, and finishing the execution of the auxiliary thread. In particular, the amount of the solvent to be used,
the helper thread accepts several parameters as follows: the total number of iterations N, the queue path _ message _ queue of the transmission path index and the queue cycles _ queue of all rings transmitted. The auxiliary thread acquires an initial path dictionary from the path _ message _ queue as a path index of the step 0 iteration. And in the process of each iteration, the auxiliary thread acquires the updated path index of the main thread from the queue, expands the current path index by combining with the path dictionary of the previous iteration round, acquires the path dictionary of the current round, extracts all rings from the path dictionary, and adds the rings into a list representing all ring paths. The iteration ends, sending the results of all loops back to the main thread.
Further, obtaining the id of the current iteration and the path index of the current iteration from the path _ message _ queue; the path index is a variable of a dictionary type, represents an index of a path dictionary of the previous iteration, and expands the value of the path dictionary according to the variable, wherein the expanding method comprises the steps of obtaining a path list from the path dictionary of the previous iteration according to (rows [ row _ index ], column _ list ], obtaining the path list from an initial path dictionary according to (column _ list, columns [ row _ index ]), carrying out Cartesian product on the two lists, and storing the result as a new value in the path index of the current iteration round; assigning a variable of a path dictionary representing the previous round as an updated path index, acquiring all loop paths from the updated path index, and storing the loop paths into a list representing all loop paths; and judging whether the obtained id of the current round is larger than the total iteration number N, and if so, sending a list representing all the loop paths back to the main thread.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (6)

1. A multithreading loopback enumeration algorithm based on NumPy is characterized by comprising the following steps:
step S1, preprocessing data; processing multi-source heterogeneous data into graph data of a point set and an edge set meeting requirements through an ETL process, and further constructing an adjacency matrix based on the graph data;
step S2, calculating a path index by the main thread; the main thread executes n iterations, respectively calculates path indexes of all rings with the lengths of 1-n, generates a path index of the current iteration relative to the previous iteration through each iteration calculation, and sends the path index to the auxiliary thread for expansion; after the iteration is finished, the main thread receives all the paths of the rings sent by the auxiliary thread;
step S3, the assistant thread receives the path index sent by the main thread, and combines the stored path between the nodes and the path index to obtain the path of the ring; when the iteration is over, the auxiliary thread sends the paths of all the rings to the main thread.
2. The NumPy-based multithread loopback enumeration algorithm as recited in claim 1, wherein the constructing the adjacency matrix in step S1 specifically comprises:
acquiring an adjacent matrix of nodes from the graph data according to the input graph data and the maximum length of a ring to be calculated, constructing the adjacent matrix into a Boolean NumPy matrix, representing that the nodes are connected by adopting True, and representing that the nodes are not connected by adopting False; constructing a path dictionary with the distance of 1 hop between nodes from graph data, wherein keys are the ids of a starting node and a terminating node of an edge, and the value is the id of the edge; taking the maximum length of the ring to be calculated as the total iteration number N; the generated path dictionary is used as an initial parameter to start the auxiliary thread.
3. The NumPy-based multithread loopback enumeration algorithm of claim 2, wherein the calculating the path index by the main thread in the step S2 specifically comprises:
s2.1, calculating a dot product of an adjacent matrix of i-1 hop and an initial adjacent matrix in the last iteration turn based on a dot operator of NumPy, and taking the dot product as the adjacent matrix of the ith hop;
s2.2, acquiring the position of a non-zero element in the row and column of the ith hop of adjacent matrix by using a nonzero method of NumPy; expressing the position in a row vector and column vector mode, acquiring row data of an adjacent matrix of i-1 jump by taking the row vector as a coordinate, and acquiring initial column data of the adjacent matrix by taking the column vector as a column coordinate;
s2.3, searching the position where the row data and the column data are equal through the AND operation and the where operator of the NumPy; obtaining a path index of the ith iteration according to the position where the row data and the column data are equal, and transmitting the current iteration turn i and the path index to an auxiliary thread for expansion; the diagonal element in the result of the dot product in step S2.1 is reset to 0.
4. The NumPy-based multithread loopback enumeration algorithm as recited in claim 3, wherein the iterative process performed by the assist thread in step S3 specifically comprises:
taking an edge in graph data as an initial value of an inter-node path stored by an auxiliary thread, acquiring a path index obtained by each iteration and the total number of times of iteration by the auxiliary thread from a main thread, acquiring a path value between nodes of the current iteration according to the path and the index between the nodes stored by the auxiliary thread, selecting all loop paths obtained in the current iteration from all the inter-node path values, and updating the node path stored by the auxiliary thread into the inter-node path value obtained by the current iteration; and after the iteration is finished, sending the paths of all the rings to the main thread, and finishing the execution of the auxiliary thread.
5. The NumPy-based multithreading loopback enumeration algorithm of claim 4, wherein the assistant thread obtains a message queue of a pass path index and a message queue of a pass loop path result from a main thread;
firstly, the auxiliary thread obtains an initial path index from a message queue of the path index as an expansion result of the path index of the step 0 iteration; in a specific iteration process, the auxiliary thread obtains a current iteration turn and a path index from a message queue of the path index; the path index refers to the path index in the last iteration;
expanding the path index of the iteration according to the path index of the previous iteration, wherein the key of the path index represents the initial node and the final node of the path, and the value of the path index with the same initial node and final node is used as the path of the ring and is added into a list representing all the paths of the ring; updating a variable representing the path index of the previous step according to the path expansion result;
and when the current iteration id is more than or equal to the maximum iteration times, transmitting the list representing all loop paths to the main thread through the message queue for transmitting the loop path result, and finishing the loop.
6. The NumPy-based multithread loopback enumeration algorithm of claim 1, wherein the ETL in the step S1 refers to obtaining qualified data from multi-source heterogeneous data through extraction, conversion and loading.
CN202111060114.1A2021-09-102021-09-10 A multi-threaded loop enumeration algorithm based on NumPyActiveCN113934976B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202111060114.1ACN113934976B (en)2021-09-102021-09-10 A multi-threaded loop enumeration algorithm based on NumPy

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN202111060114.1ACN113934976B (en)2021-09-102021-09-10 A multi-threaded loop enumeration algorithm based on NumPy

Publications (2)

Publication NumberPublication Date
CN113934976Atrue CN113934976A (en)2022-01-14
CN113934976B CN113934976B (en)2025-06-13

Family

ID=79275598

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202111060114.1AActiveCN113934976B (en)2021-09-102021-09-10 A multi-threaded loop enumeration algorithm based on NumPy

Country Status (1)

CountryLink
CN (1)CN113934976B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN1609792A (en)*2003-10-242005-04-27微软公司Programming interface for a computer program
CN1700211A (en)*2004-05-212005-11-23微软公司Method and system for graph analysis and synchronization
US20080208915A1 (en)*2007-02-282008-08-28Numenta, Inc.Episodic Memory With A Hierarchical Temporal Memory Based System
CN103440238A (en)*2012-03-092013-12-11辉达公司Fully parallel in-place construction of 3D acceleration structures in a graphics processing unit
EP3321796A1 (en)*2016-11-152018-05-16Palantir Technologies Inc.Multi-platform interface framework
CN112783503A (en)*2021-01-182021-05-11中山大学NumPy operation accelerated optimization method based on Arm framework

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN1609792A (en)*2003-10-242005-04-27微软公司Programming interface for a computer program
CN1700211A (en)*2004-05-212005-11-23微软公司Method and system for graph analysis and synchronization
US20080208915A1 (en)*2007-02-282008-08-28Numenta, Inc.Episodic Memory With A Hierarchical Temporal Memory Based System
CN103440238A (en)*2012-03-092013-12-11辉达公司Fully parallel in-place construction of 3D acceleration structures in a graphics processing unit
EP3321796A1 (en)*2016-11-152018-05-16Palantir Technologies Inc.Multi-platform interface framework
CN112783503A (en)*2021-01-182021-05-11中山大学NumPy operation accelerated optimization method based on Arm framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
束学渊;汪立新;: "联合循环平稳特征PCA与XGBoost的频谱感知", 计算机应用与软件, no. 04, 12 April 2020 (2020-04-12)*

Also Published As

Publication numberPublication date
CN113934976B (en)2025-06-13

Similar Documents

PublicationPublication DateTitle
Wu et al.Mixed precision quantization of convnets via differentiable neural architecture search
Zhong et al.Practical block-wise neural network architecture generation
Pavone et al.Clonal selection: an immunological algorithm for global optimization over continuous spaces
US9939792B2 (en)Systems and methods to adaptively select execution modes
Gajpal et al.An ant colony algorithm for scheduling in flowshops with sequence-dependent setup times of jobs
CN113836174B (en)Asynchronous SQL (structured query language) connection query optimization method based on reinforcement learning DQN (direct-to-inverse) algorithm
CN111324630B (en)MPI-based neural network architecture search parallelization method and equipment
Yan et al.Hm-nas: Efficient neural architecture search via hierarchical masking
CN108021507B (en)Parallel path searching method and device for symbol execution
JP6325762B1 (en) Information processing apparatus, information processing method, and information processing program
CN109815537B (en) A time-based prediction-based high-throughput material simulation calculation optimization method
CN109074348A (en) Apparatus and iterative methods for iterative clustering of input datasets
JP6208018B2 (en) Image recognition algorithm combination selection device
CN118171233A (en)Attention mechanism-based space-time fusion map generation method facing multimode traffic
Hayashi et al.Assembly sequence optimization of spatial trusses using graph embedding and reinforcement learning
CN113934976A (en) A Multithreaded Loopback Enumeration Algorithm Based on NumPy
CN119861762A (en)Initial pressure optimizing method and device for operation of thermal power generating unit
CN114254117A (en) A Knowledge Graph Reasoning Method Based on Data Augmentation and Adaptive Negative Sampling
US9946779B2 (en)Pipleline merge operations using source data and multiple destination data structures
CN108122033B (en) Training method of neural network and neural network obtained by the training method
Shen et al.Bs-nas: Broadening-and-shrinking one-shot nas with searchable numbers of channels
CN104570759A (en)Fast binary tree method for point location problem in control system
WO2021068529A1 (en)Image recognition method and apparatus, computer device and storage medium
CN119988236B (en) Intelligent equipment software testing method based on convolutional neural network combined with genetic algorithm
Detkov et al.Reparameterization through spatial gradient scaling

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp