HK40027310B - Method and apparatus for processing task, and computer-readable storage medium - Google Patents

Method and apparatus for processing task, and computer-readable storage medium

Info

Publication number
HK40027310B
Authority
HK
Hong Kong
Prior art keywords
task
target
matrix
processing unit
processing
Prior art date
Application number
HK42020017548.7A
Other languages
Chinese (zh)
Other versions
HK40027310A (en
Inventor
严石伟
李明耀
丁凯
蒋楠
Original Assignee
腾讯科技(深圳)有限公司
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of HK40027310A
Publication of HK40027310B


Description

Task processing method and device and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a task processing method and apparatus, and a computer-readable storage medium.
Background
With the rapid development of information technology, traditional retail and electronic commerce increasingly struggle to meet people's consumption requirements. Innovative technologies such as big data, cloud computing, and artificial intelligence play an increasingly important role in the retail field and have given rise to an intelligent retail mode. In the field of intelligent retail, user information can be acquired and processed in real time through a deep learning computation framework in artificial intelligence technology, thereby obtaining passenger flow information.
In the related art, when user information is processed through a deep learning computation framework, a plurality of image processing tasks need to be processed simultaneously. To ensure computation speed, these tasks can be concentrated on a GPU (Graphics Processing Unit) for dedicated computation.
In the research and practice process of the related art, the inventors of the present application found that, in the prior art, when a plurality of image processing tasks are all concentrated on a GPU for processing, the load of the GPU is too large, so that task processing is blocked, the task processing speed is affected, and the task processing efficiency is too low.
Disclosure of Invention
The embodiment of the application provides a task processing method and device and a computer readable storage medium, which can improve task processing efficiency.
The embodiment of the application provides a task processing method, which comprises the following steps:
determining at least one object processing task for the target image;
determining a task type of each of the object processing tasks;
determining the task level of the object processing task according to the task type;
acquiring first load information of an image processing unit and second load information of a central processing unit, wherein the processing priority of the image processing unit to an object processing task is higher than that of the central processing unit;
determining a target load ratio based on the first load information and the second load information;
when the target load ratio is larger than a preset threshold, determining a task level range needing to be processed by the image processing unit according to a threshold interval where the target load ratio is located;
and distributing the object processing tasks with the task levels within the task level range to the image processing unit for processing, and distributing the object processing tasks with the task levels not within the task level range to the central processing unit for processing.
Correspondingly, an embodiment of the present application further provides a task processing device, including:
a first determination unit for determining at least one object processing task of a target image;
a second determining unit configured to determine a task type of each of the object processing tasks;
a third determining unit, configured to determine a task level of the object processing task according to the task type;
an acquisition unit configured to acquire first load information of an image processing unit and second load information of a central processing unit, wherein the image processing unit has a higher processing priority for an object processing task than the central processing unit;
a fourth determining unit configured to determine a target load ratio based on the first load information and the second load information;
a fifth determining unit, configured to determine, when the target load ratio is greater than a preset threshold, a task level range that needs to be processed by the image processing unit according to a threshold interval in which the target load ratio is located;
and a distribution unit, configured to distribute the object processing tasks with the task levels within the task level range to the image processing unit for processing, and distribute the object processing tasks with the task levels not within the task level range to the central processing unit for processing.
In some embodiments, the processing subunit may be configured to: acquire a plurality of data vectors corresponding to each subtask in a target subtask queue and a feature matrix corresponding to each data vector; aggregate the plurality of data vectors to obtain a data matrix; aggregate the plurality of feature matrices to obtain a target feature matrix, and obtain a target subtask queue after matrix aggregation according to the data matrix and the target feature matrix; acquire a shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue;
and transmit the shared database and the target subtask queue after matrix aggregation to the image processing unit through a video memory bandwidth, and process the target subtask queue after matrix aggregation based on the shared database.
In some embodiments, the fifth determining unit may include:
a dividing subunit, configured to divide a threshold interval greater than the preset threshold into a first threshold interval and a second threshold interval, where the second threshold interval is greater than the first threshold interval;
a first determining subunit, configured to determine that a task level range that needs to be processed by the image processing unit includes a first task level and a second task level if a threshold interval in which the target load ratio is located is the first threshold interval;
and the second determining subunit is configured to determine that the task level range that needs to be processed by the image processing unit includes the first task level if the threshold interval in which the target load ratio is located is the second threshold interval.
In some embodiments, the fourth determining unit may include:
and the calculating subunit is configured to perform ratio calculation on the first load information and the second load information to obtain a target load ratio between the first load information and the second load information.
In some embodiments, the task processing device may further include:
and the processing unit is used for distributing all object processing tasks to the image processing unit for processing when the target load ratio is smaller than a preset threshold value.
Accordingly, the embodiment of the present application further provides a computer-readable storage medium, where a plurality of instructions are stored, and the instructions are suitable for being loaded by a processor to perform the task processing method described above.
In the embodiments of the present application, at least one object processing task of a target image is determined; a task type of each object processing task is determined; the task level of the object processing task is determined according to the task type; first load information of an image processing unit and second load information of a central processing unit are acquired; a target load ratio is determined based on the first load information and the second load information; when the target load ratio is larger than a preset threshold, a task level range to be processed by the image processing unit is determined according to a threshold interval where the target load ratio is located; and the object processing tasks with the task levels within the task level range are distributed to the image processing unit for processing, while the object processing tasks with the task levels not within the task level range are distributed to the central processing unit for processing. In this way, the task type of each object processing task is determined and the load ratio between the different processing units is monitored; when the load ratio meets a certain condition, object processing tasks of different task levels are distributed to different processing units according to the load ratio for parallel processing, which relieves the processing pressure on the processing units and greatly improves task processing efficiency.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of a scenario of a task processing system according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a first task processing method according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating a second task processing method according to an embodiment of the present application.
Fig. 4 is a task scheduling diagram of a task processing method according to an embodiment of the present application.
Fig. 5 is a schematic view of subtask aggregation in a task processing method according to an embodiment of the present disclosure.
Fig. 6 is a block diagram of a first task processing device according to an embodiment of the present application.
Fig. 7 is a block diagram of a second task processing device according to an embodiment of the present application.
Fig. 8 is a block diagram of a third task processing device according to an embodiment of the present application.
Fig. 9 is a block diagram of a fourth task processing device according to an embodiment of the present application.
Fig. 10 is a block diagram of a fifth task processing device according to an embodiment of the present application.
Fig. 11 is a block diagram of a sixth task processing device according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a task processing method and device, a computer readable storage medium and a terminal. Specifically, the embodiment of the application provides a task processing device suitable for computer equipment. The computer device may be a terminal or a server, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited herein.
Referring to fig. 1, fig. 1 is a schematic view of a scene of a task processing system provided in an embodiment of the present application, and the task processing system includes a camera device and a network device (the task processing system may further include a plurality of camera devices, a specific number of the camera devices is not limited herein), the camera device and the network device may be connected through a communication network, and the communication network may include a wireless network and a wired network, where the wireless network includes one or a combination of multiple wireless wide area networks, wireless local area networks, wireless metropolitan area networks, and wireless personal area networks. The network includes network entities such as routers, gateways, etc., which are not shown in the figure. The camera device may perform information interaction with the network device through the communication network, for example, the camera device may capture a video stream in real time, and send the video stream to the network device through the communication network.
The task processing system may include a task processing device, which may be integrated in a network device such as a terminal or a server. As shown in fig. 1, the network device receives a video stream sent by the camera device and, while processing the video stream, receives an object processing task. It determines the task type of the object processing task and, from the task type, the task level. It also obtains first load information of an image processing unit and second load information of a central processing unit, and calculates the ratio of the first load information to the second load information to obtain a load ratio. According to the task level of the object processing task and the load ratio of the image processing unit to the central processing unit, it distributes the object processing task to the image processing unit or the central processing unit for processing. Because the image processing unit and the central processing unit can process object processing tasks at the same time, the task processing speed is increased and the task processing efficiency is improved.
It should be noted that the scenario diagram of the task processing system shown in fig. 1 is only an example, and the task processing system and the scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation on the technical solution provided in the embodiment of the present application, and as a person having ordinary skill in the art knows, with the evolution of the task processing system and the occurrence of a new business scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
Based on the above problems, embodiments of the present application provide a first task processing method, a first task processing device, and a computer-readable storage medium, which can effectively improve the task processing speed of a server. The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The embodiments of the present application provide a task processing method, which may be executed by a terminal or a server, and the embodiments of the present application describe a task processing method executed by a server as an example.
As shown in fig. 2, fig. 2 is a schematic flowchart of a first task processing method provided in the embodiment of the present application. The specific flow of the task processing method can be as follows:
101. at least one object processing task of the target image is determined.
The embodiment of the present application can be applied to a large shopping mall or a supermarket. Monitoring cameras can be installed at key positions of the mall or supermarket to capture customer behavior information in real time and generate video. By processing the customer behavior information in the video, information such as passenger flow data, customer identities, and shopping tracks is provided to the mall and its shops. When the customer behavior information is processed, the system background service outputs final information such as passenger flow and identity for the customer through effective packaging of a Software Development Kit (SDK), reasonable scheduling of request tasks, and efficient, highly available processing of output data.
The object processing task may be an image processing task, and the image processing may refer to performing corresponding image processing by extracting a video frame of a video acquired by the camera device.
102. A task type for each object processing task is determined.
After the object processing task is received, its task type may be determined. The task type of the object processing task may be a type of image processing task. For example, image processing tasks may include face and body detection, face and body tracking, face and body optimization, face registration, body key points, face and body binding, face and body feature extraction, attribute extraction, retrieval, and the like.
103. And determining the task level of the object processing task according to the task type.
The object processing task may be of multiple task types, and the computational complexity differs between task types. Examples include extracting features of face and human-body trajectories, extracting attributes of those trajectories (for a face, gender and age; for a human body, a handbag or clothing), and trajectory retrieval. When the SDK performs feature extraction, attribute extraction, tracking, and retrieval, many elements of the face must be considered, so the computational complexity of face and human-body feature extraction, attribute extraction, and retrieval is high, and these tasks can be regarded as computationally intensive. For face and human-body detection and binding: face detection obtains a face bounding box, human-body detection obtains a human-body bounding box, and binding associates a detected face with the corresponding human body to establish that they belong to the same person. For face and human-body optimization, key points, and registration: optimization computes a face quality score, key points are the obtained face or human-body coordinates, and registration obtains five-point face information. The computational complexity of these tasks is low, and they can be regarded as non-computationally-intensive tasks.
In some embodiments, the step of determining a task level of the object processing task according to a computational complexity of the task type may include:
the object processing task is classified into a first task level, a second task level, or a third task level based on the task type, wherein the first task level ranks higher than the second task level, and the second task level ranks higher than the third task level.
Wherein the object processing task may be divided into a first task level, a second task level, and a third task level based on a computational complexity of the task type. The computational complexity of the first task level may be higher than the computational complexity of the second task level, which may be higher than the computational complexity of the third task level, each level corresponding to a respective type of processing task.
For example, object processing tasks may include types such as tracking, feature extraction, attribute extraction, retrieval, detection, binding, preference, registration, and keypoint extraction. Tasks of the tracking, feature extraction, attribute extraction, and retrieval types involve high computational complexity, so these task types can be assigned to the first task level. Tasks of the detection and binding types involve general computational complexity, so these task types can be assigned to the second task level. Tasks of the preference, registration, and keypoint types involve lower computational complexity, so these task types can be assigned to the third task level. For example, as shown in table 1:
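TABLE 1
Task level          Computational complexity   Task types
First task level    High                       Tracking, feature extraction, attribute extraction, retrieval
Second task level   General                    Detection, binding
Third task level    Low                        Preference, registration, keypoint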
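The level assignment described above can be sketched as a simple lookup. This is only an illustrative sketch: the names `TaskLevel`, `TASK_LEVEL_BY_TYPE`, and `task_level`, and the string labels for the task types, are assumptions introduced here and do not come from the application.

```python
from enum import IntEnum


class TaskLevel(IntEnum):
    """Hypothetical task levels; a smaller value means higher complexity."""
    FIRST = 1   # high computational complexity
    SECOND = 2  # general computational complexity
    THIRD = 3   # low computational complexity


# Illustrative labels for the task types named in the text.
TASK_LEVEL_BY_TYPE = {
    "tracking": TaskLevel.FIRST,
    "feature_extraction": TaskLevel.FIRST,
    "attribute_extraction": TaskLevel.FIRST,
    "retrieval": TaskLevel.FIRST,
    "detection": TaskLevel.SECOND,
    "binding": TaskLevel.SECOND,
    "preference": TaskLevel.THIRD,
    "registration": TaskLevel.THIRD,
    "keypoint": TaskLevel.THIRD,
}


def task_level(task_type: str) -> TaskLevel:
    """Return the task level for a task type; unknown types raise KeyError."""
    return TASK_LEVEL_BY_TYPE[task_type]
```

A static table keeps the mapping easy to audit; how unknown task types should be handled is not stated in the text, so the sketch simply raises an error.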
104. First load information of an image processing unit and second load information of a central processing unit are acquired, wherein the processing priority of the image processing unit for the object processing task is higher than that of the central processing unit.
The image processing unit and the central processing unit may be a GPU (Graphics Processing Unit) and a CPU (Central Processing Unit), respectively.
The GPU, also called a display core, visual processor, or display chip, is a microprocessor dedicated to image operations on personal computers, workstations, game machines, and some mobile devices (e.g., tablet computers and smart phones). It converts and drives the display information required by a computer system, provides line scanning signals to the display, and controls the display so that it displays correctly. It is an important element connecting the display to the personal computer mainboard and one of the important devices for human-computer interaction. As an important component of the computer host, the display card takes on the task of outputting display graphics.
The CPU is one of the main devices of an electronic computer and a core component of the computer. Its main functions are to interpret computer instructions and to process data in computer software. In all computer operations, the CPU is the core component responsible for reading, decoding, and executing instructions.
In the embodiment of the application, since most object processing tasks are image processing tasks, the processing is mainly performed on the GPU processor, and a small number of object processing tasks can be completed with the assistance of the CPU.
Thus, the first load information of the image processing unit may be the load of the GPU, and the second load information of the central processing unit may be the load of the CPU. The load relates the number of tasks a processor is actually processing to the number of tasks it can process simultaneously within a certain time. For example, if the processor can process 10 tasks simultaneously and is actually processing 4, the load of the current processor is 4:10, that is, 0.4. The load information reflects how busy the processor is: the higher the load information, the busier the processor; the lower the load information, the more idle the processor.
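The load calculation in the example above can be written as a short function. This is a minimal sketch; the function name `processor_load` and its parameter names are assumptions, and the application does not say how the task counts are obtained from the hardware.

```python
def processor_load(active_tasks: int, capacity: int) -> float:
    """Return the load as the fraction of simultaneous task slots in use.

    active_tasks: number of tasks currently being processed.
    capacity: number of tasks the processor can process simultaneously.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if active_tasks < 0:
        raise ValueError("active_tasks must be non-negative")
    return active_tasks / capacity


# The example from the text: 4 of 10 task slots in use.
example_load = processor_load(4, 10)
```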
105. A target load ratio is determined based on the first load information and the second load information.
The target load ratio is determined according to the load of the image processing unit and the load of the central processing unit, and reflects the relative load of the two units. When the target load ratio is high, the image processing unit is more heavily loaded and the central processing unit less heavily loaded; when the target load ratio is low, the image processing unit is less heavily loaded and the central processing unit more heavily loaded. In some embodiments, the step of determining the target load ratio according to the first load information and the second load information may include:
and calculating the ratio of the first load information to the second load information to determine the target load ratio.
Wherein, after acquiring the first load of the image processing unit and the second load of the central processing unit, the ratio of the first load to the second load can be calculated. For example, the first load of the image processing unit may be 0.4 and the second load of the central processing unit may be 0.4, and the target load ratio may be determined to be 1.
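A minimal sketch of the ratio calculation follows. The function name `target_load_ratio` is hypothetical, and the handling of a zero CPU load is an assumption made here, since the text does not cover that case.

```python
import math


def target_load_ratio(gpu_load: float, cpu_load: float) -> float:
    """Return the ratio of the first (GPU) load to the second (CPU) load."""
    if gpu_load < 0 or cpu_load < 0:
        raise ValueError("loads must be non-negative")
    if cpu_load == 0:
        # Assumption: an idle CPU with a busy GPU is treated as an
        # unbounded ratio; two idle units are treated as a ratio of 0.
        return 0.0 if gpu_load == 0 else math.inf
    return gpu_load / cpu_load


# The example from the text: both loads are 0.4, so the ratio is 1.
ratio = target_load_ratio(0.4, 0.4)
```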
106. And when the target load ratio is larger than a preset threshold, determining the task level range to be processed by the image processing unit according to the threshold interval where the target load ratio is located.
In the related art, the object processing tasks all run on the GPU, and the CPU resources of the system are essentially unused, which wastes CPU resources. In practice, some object processing tasks with lower computational complexity, for example tasks such as tracking, preference, and registration, run on the CPU at a speed similar to that on the GPU. Therefore, when the GPU is busy, some tasks with lower computational complexity may be allocated to the CPU for calculation, so as to increase task processing efficiency. The preset threshold is the value that defines the GPU as busy, for example 1.1. When the target load ratio exceeds the preset threshold, the GPU is at a higher load level relative to the CPU.
Furthermore, when the target load ratio exceeds the preset threshold only slightly, the GPU load exceeds the CPU load by a small margin, and object processing tasks at the task level with low computational complexity may be allocated to the central processing unit for processing. When the target load ratio exceeds the preset threshold by a large margin, the GPU load exceeds the CPU load to a greater degree, and object processing tasks at the task levels with low and general computational complexity may be allocated to the central processing unit for processing, which better relieves the pressure on the GPU and improves computation efficiency.
In some embodiments, assuming that the first load is 0.3 and the second load is 0.6, the target load ratio may be determined to be 0.5, which is less than the preset threshold. When the target load ratio is less than the preset threshold, the GPU load is low relative to the CPU load and the GPU is comparatively idle, so object processing tasks at all task levels (low, general, and high computational complexity) can be allocated to the image processing unit for processing, which improves task processing efficiency while the image processing unit operates normally.
In some embodiments, the step of determining the task level range that the image processing unit needs to process according to the threshold interval in which the target load ratio is located may include:
(1) dividing a threshold interval larger than a preset threshold into a first threshold interval and a second threshold interval, wherein the second threshold interval is larger than the first threshold interval;
(2) if the threshold interval in which the target load ratio is located is the first threshold interval, determining that the task level range needing to be processed by the image processing unit comprises a first task level and a second task level;
(3) and if the threshold interval in which the target load ratio is located is the second threshold interval, determining that the task level range needing to be processed by the image processing unit comprises the first task level.
The threshold interval greater than the preset threshold may be divided into a first threshold interval and a second threshold interval, where the first threshold interval may be smaller than the second threshold interval. For example, if the preset threshold is 1.1, the first threshold interval may be set to (1.1, 2), and the second threshold interval may be [2, +∞), where (1.1, 2) represents a threshold interval greater than 1.1 and less than 2, and [2, +∞) represents a threshold interval greater than or equal to 2.
Based on this, when the target load ratio is in the first threshold section, it indicates that the load of the image processing unit is larger than that of the central processing unit, but the usage rate of the image processing unit is still in a normal state. At this time, it may be determined that the task level range that the image processing unit needs to process includes the first task level and the second task level.
In some embodiments, when the target load ratio is in the second threshold interval, indicating that the load of the image processing unit is much larger than that of the central processing unit, the usage rate of the image processing unit exceeds the normal state. At this time, it may be determined that the task level range that the image processing unit needs to process includes the first task level.
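The interval logic described in the two paragraphs above can be sketched as follows, using the example values of 1.1 for the preset threshold and 2 for the boundary between the two intervals. The function name `gpu_task_levels` and the integer level labels are assumptions. The text does not state what happens when the ratio equals the preset threshold exactly; this sketch treats that case like a ratio below the threshold.

```python
from typing import FrozenSet


def gpu_task_levels(
    ratio: float,
    preset_threshold: float = 1.1,
    second_interval_start: float = 2.0,
) -> FrozenSet[int]:
    """Return the task levels (1, 2, 3) the image processing unit handles."""
    if ratio <= preset_threshold:
        # Assumption: equality is treated like "below the threshold",
        # so every task level stays on the image processing unit.
        return frozenset({1, 2, 3})
    if ratio < second_interval_start:
        # First threshold interval, e.g. (1.1, 2).
        return frozenset({1, 2})
    # Second threshold interval, e.g. [2, +inf).
    return frozenset({1})
```

Task levels outside the returned set are the ones that would be handed to the central processing unit.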
107. Distributing the object processing tasks with the task levels within the task level range to the image processing unit for processing, and distributing the object processing tasks with the task levels not within the task level range to the central processing unit for processing.
When the target load ratio is in the first threshold interval, the object processing tasks of the first task level and the second task level may be allocated to the image processing unit for processing, the object processing tasks of the third task level may be allocated to the central processing unit for processing, and the object processing tasks are processed by the image processing unit and the central processing unit together, so as to improve task processing efficiency.
Alternatively, when the target load ratio is within the second threshold interval, the object processing tasks at the first task level may be assigned to the image processing unit for processing, and the object processing tasks at the second task level and the third task level may be assigned to the central processing unit for processing. By scheduling the object processing tasks with lower task levels to the central processing unit, computing resources are used reasonably during multi-task parallel processing, and processing efficiency is improved.
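The distribution step can be sketched as a partition of tasks by level. The function name `dispatch`, the `(name, level)` task representation, and the `"gpu"`/`"cpu"` keys are assumptions made for illustration; actual submission of work to either unit is outside this sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple


def dispatch(
    tasks: Iterable[Tuple[str, int]],
    gpu_levels: Set[int],
) -> Dict[str, List[str]]:
    """Split (task name, task level) pairs between the two processing units."""
    assignment: Dict[str, List[str]] = {"gpu": [], "cpu": []}
    for name, level in tasks:
        # Levels inside the range go to the image processing unit,
        # all other levels go to the central processing unit.
        unit = "gpu" if level in gpu_levels else "cpu"
        assignment[unit].append(name)
    return assignment


tasks = [("retrieval", 1), ("detection", 2), ("registration", 3)]
first_interval = dispatch(tasks, {1, 2})  # ratio in the first interval
second_interval = dispatch(tasks, {1})    # ratio in the second interval
```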
In some embodiments, the step of assigning the object processing task having the task level within the task level range to the image processing unit for processing may include:
(1.1) acquiring a subtask queue corresponding to the object processing task, wherein the subtask queue comprises a plurality of subtasks;
(1.2) task aggregation is carried out on the subtasks with the same subtask attribute, and a target subtask queue after the task aggregation is generated;
and (1.3) carrying out matrix aggregation on the data matrix and the characteristic matrix of the target subtask queue, and processing the target subtask queue after matrix aggregation through the image processing unit.
The object processing task may be a plurality of object processing tasks, and each object processing task may include a plurality of subtasks. Based on this, the subtasks corresponding to all the object processing tasks are collected, and a subtask queue can be obtained.
For example, the object processing tasks may include a first object processing task, a second object processing task, and a third object processing task. The first object processing task may include a first subtask, a second subtask, and a third subtask; the second object processing task may include a fourth subtask, a fifth subtask, and a sixth subtask; the third object processing task may include a seventh subtask, an eighth subtask, a ninth subtask, and the like. Then, a subtask queue may be determined based on the subtasks corresponding to all object processing tasks, where the subtask queue includes: the first subtask, the second subtask, the third subtask, the fourth subtask, the fifth subtask, the sixth subtask, the seventh subtask, the eighth subtask, and the ninth subtask.
After the subtask queue is determined, task aggregation may be performed on subtasks with the same subtask attribute in the subtask queue, so as to generate an aggregated target subtask queue. Wherein the subtask attribute may indicate a type of the subtask.
For example, a first subtask, a second subtask, a third subtask, a fourth subtask, a fifth subtask, and so on may be included in the subtask queue, and an attribute of the subtask of the first subtask may be the first attribute, an attribute of the subtask of the second subtask may be the first attribute, an attribute of the third subtask may be the first attribute, an attribute of the fourth subtask may be the second attribute, and an attribute of the fifth subtask may be the third attribute. Based on this, it can be determined that the first subtask, the second subtask, and the third subtask have the same subtask attribute, and the subtasks having the same subtask attribute can be aggregated based on the subtask attribute, so that a target subtask queue after the task aggregation can be obtained, where the target subtask queue includes the first subtask, the second subtask, and the third subtask.
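The task aggregation described above can be sketched as a grouping of subtasks by their subtask attribute. This is only an illustrative sketch: the function name `aggregate_by_attribute` and the representation of a subtask as a `(name, attribute)` pair are assumptions, not part of the embodiment.

```python
from collections import defaultdict


def aggregate_by_attribute(subtask_queue):
    """Group subtasks that share the same subtask attribute.

    subtask_queue is a list of (subtask_name, subtask_attribute) pairs
    (a hypothetical representation chosen for this sketch).
    """
    groups = defaultdict(list)
    for name, attribute in subtask_queue:
        groups[attribute].append(name)
    return dict(groups)


subtask_queue = [
    ("subtask1", "attr1"),
    ("subtask2", "attr1"),
    ("subtask3", "attr1"),
    ("subtask4", "attr2"),
    ("subtask5", "attr3"),
]

# One target subtask queue per attribute.
target_queues = aggregate_by_attribute(subtask_queue)
```

Here `target_queues["attr1"]` holds the first, second, and third subtasks, matching the example above.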
After the target subtask queue is determined, matrix aggregation may be performed on the data matrix and the feature matrix of the target subtask queue, and the target subtask queue after matrix aggregation may be processed by the image processing unit. The data matrix may be a plurality of data to be processed by the subtask, and the feature matrix may be feature information of each data. For example, each image may include a plurality of pixels, each pixel is a datum, and a pixel datum includes a plurality of parameter information (color value, brightness value, etc.), and then the plurality of parameter information may be feature information of each datum.
In some embodiments, the processing, by the image processing unit, the target subtask queue after matrix aggregation may include:
(2.1) acquiring a shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue;
and (2.2) transmitting the shared database and the target subtask queue after matrix aggregation to the image processing unit through video memory bandwidth, and processing the target subtask queue after matrix aggregation based on the shared database.
The shared database may include the data information required by the subtasks in the target subtask queue during calculation. The shared databases required by subtasks with different subtask attributes may be different.
In the related art, the shared database needs to be acquired once each time a subtask in the target subtask queue is processed. When the number of subtasks in the target subtask queue is large, the shared database needs to be acquired many times, and loading the video memory data over the video memory bandwidth causes video memory bandwidth congestion, so that task processing waits too long on the congestion and the efficiency of task processing is affected. In the embodiment of the application, the subtasks in the target subtask queue are aggregated according to the preset number, so that the number of acquisitions of the shared database is reduced, the video memory bandwidth stays unobstructed when the video memory data are loaded, the time spent waiting on congestion is reduced, and the task processing efficiency can be improved.
After determining the shared database corresponding to the target subtask queue, the target subtask queue and the shared database corresponding to the target subtask queue may be transmitted to the image processing unit through a video memory bandwidth, and then, in the image processing unit, the subtasks in the target subtask queue may be processed based on the shared database.
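The effect of aggregation on the number of shared database acquisitions can be illustrated with simple arithmetic. The sketch below assumes that the shared database is transmitted exactly once per aggregated batch; the function name `shared_db_loads` and the numbers 20 and 5 are hypothetical.

```python
import math


def shared_db_loads(num_subtasks, batch_size):
    """Count shared database transfers when subtasks are processed in batches.

    Assumption of this sketch: one transfer of the shared database per batch.
    """
    if num_subtasks < 0 or batch_size < 1:
        raise ValueError("num_subtasks must be >= 0 and batch_size must be >= 1")
    return math.ceil(num_subtasks / batch_size)


# Without aggregation every subtask triggers its own acquisition.
loads_unbatched = shared_db_loads(20, 1)

# With a preset number of 5, the same 20 subtasks need far fewer acquisitions.
loads_batched = shared_db_loads(20, 5)
```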
The video memory bandwidth refers to the data transmission rate between the display chip and the video memory, measured in bytes per second. Video memory bandwidth is one of the most important factors determining the performance and speed of a video card. The video memory, also called a frame buffer, is used for storing rendering data that has been processed or is about to be fetched by the video card chip. Today, high-density operations are performed by the GPU on the video card, which further increases the dependence on the video memory. Since the video memory is on the video card, the speed and bandwidth of the video memory directly affect the overall speed of the video card.
In some embodiments, after the step of generating the target subtask queue after task aggregation, the method may further include:
(3.1) when it is detected that the number of subtasks in the target subtask queue meets the preset number, performing the step of carrying out matrix aggregation on the data matrix and the feature matrix of the target subtask queue;
and (3.2) when it is detected that the number of subtasks in the target subtask queue does not meet the preset number, waiting within the preset time for subtasks with the same subtask attribute to perform task aggregation, and then performing the step of carrying out matrix aggregation on the data matrix and the feature matrix of the target subtask queue.
The preset number may be the maximum number of subtasks for performing aggregation processing. For example, the number of the subtasks in the target subtask queue may be 5, and the preset number may be 5, at this time, the 5 subtasks in the target subtask queue may be aggregated. The data matrixes of the 5 subtasks can be aggregated to obtain an aggregated data matrix; and aggregating the feature matrixes of the 5 subtasks to obtain an aggregated feature matrix.
For another example, the number of the subtasks in the target subtask queue may be 2, and the preset number may be 5, where the number of the tasks in the target subtask queue does not satisfy the preset number, and in order to improve the parallel processing efficiency of the processing unit, a preset time may be set, and the subtasks with the same subtask attribute may be waited for aggregation within the preset time. The preset time may be 20ms (milliseconds), and when waiting for the aggregation of the subtasks with the same subtask attribute, there may be several cases: in the first case, when the waiting time reaches 10ms and the number of the subtasks with the same attribute reaches the preset number, the waiting is stopped, and the subtasks in the target subtask queue are aggregated; in the second case, when the waiting time reaches the preset time of 20ms, and the number of the subtasks with the same attribute still does not reach the preset number, the waiting is stopped, and the subtasks in the target subtask queue are aggregated.
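The waiting logic of steps (3.1) and (3.2) can be sketched as collecting a batch until either the preset number is reached or the preset time elapses. This is a minimal sketch under stated assumptions: pending subtasks with the same attribute arrive through a standard `queue.Queue`, and the helper name `collect_batch` is hypothetical.

```python
import queue
import time


def collect_batch(pending, preset_number, preset_time_s):
    """Collect up to preset_number subtasks, waiting at most preset_time_s.

    Returns early as soon as the preset number is reached (first case);
    otherwise returns whatever has arrived when the time is up (second case).
    """
    batch = []
    deadline = time.monotonic() + preset_time_s
    while len(batch) < preset_number:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


# Five subtasks are already queued: the preset number is met immediately.
full_queue = queue.Queue()
for i in range(5):
    full_queue.put(f"subtask{i + 1}")
full_batch = collect_batch(full_queue, preset_number=5, preset_time_s=0.02)

# Only two subtasks arrive: after 20 ms the partial batch is aggregated anyway.
partial_queue = queue.Queue()
partial_queue.put("subtask1")
partial_queue.put("subtask2")
partial_batch = collect_batch(partial_queue, preset_number=5, preset_time_s=0.02)
```

Bounding the wait keeps the added latency at no more than the preset time, at the cost of occasionally processing a batch smaller than the preset number.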
In some embodiments, the matrix aggregating all data matrices and feature matrices of the target subtask queue may include:
(4.1) acquiring a plurality of data vectors corresponding to each subtask in the target subtask queue and a feature matrix corresponding to each data vector;
(4.2) carrying out aggregation processing on the plurality of data vectors to obtain a data matrix;
and (4.3) aggregating the plurality of feature matrices to obtain a target feature matrix, and obtaining a target subtask queue after matrix aggregation according to the data matrix and the target feature matrix.
The data vectors represent data which needs to be processed by each subtask, wherein each data vector corresponds to a feature matrix, and the feature matrix comprises feature information of each data vector.
After the data vector of each subtask in the target task queue is obtained, the data vectors of all subtasks can be aggregated to obtain a data matrix of the target subtask queue; after the feature matrix of each data vector is obtained, aggregation processing can be performed on the feature matrices of all the data vectors to obtain a target feature matrix, and a target subtask queue subjected to matrix aggregation processing is obtained according to the data matrix and the target feature matrix.
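Steps (4.1) to (4.3) can be sketched with plain nested lists. The sketch assumes that the data vectors are stacked as rows of the data matrix and that the feature matrices are concatenated row-wise into the target feature matrix; the embodiment does not fix the concatenation axis, so this layout and the name `aggregate_matrices` are assumptions for illustration.

```python
def aggregate_matrices(subtasks):
    """Aggregate data vectors and feature matrices of all subtasks.

    Each subtask is a list of (data_vector, feature_matrix) pairs
    (a hypothetical representation chosen for this sketch).
    """
    data_matrix = []
    target_feature_matrix = []
    for subtask in subtasks:
        for data_vector, feature_matrix in subtask:
            # Each data vector becomes one row of the data matrix.
            data_matrix.append(list(data_vector))
            # Rows of every feature matrix are appended in the same order.
            target_feature_matrix.extend(list(row) for row in feature_matrix)
    return data_matrix, target_feature_matrix


subtask_a = [([1, 2], [[0.1, 0.2]]), ([3, 4], [[0.3, 0.4]])]
subtask_b = [([5, 6], [[0.5, 0.6]])]

data_matrix, target_feature_matrix = aggregate_matrices([subtask_a, subtask_b])
```

Because the data vectors and feature rows are appended in the same order, each row of the data matrix stays aligned with its feature information after aggregation.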
In some embodiments, the processing, by the image processing unit, the target subtask queue after matrix aggregation may include:
(5.1) acquiring a shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue;
and (5.2) transmitting the shared database, the data matrix and the target characteristic matrix to the image processing unit through a video memory bandwidth, and processing the data matrix and the target characteristic matrix based on the shared database.
The step of obtaining the shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue is described in the above step, and is not described herein again.
After the shared database corresponding to the subtasks in the target subtask queue is determined, the data matrix obtained by aggregating the data vectors, the target feature matrix obtained by aggregating the feature matrices, and the shared database may be transmitted to the image processing unit, and the image processing unit may perform the corresponding calculation processing on the data matrix and the target feature matrix based on the shared database.
The method comprises the steps of determining at least one object processing task of a target image; determining a task type of each object processing task; determining the task level of the object processing task according to the task type; acquiring first load information of an image processing unit and second load information of a central processing unit; determining a target load ratio based on the first load information and the second load information; when the target load ratio is larger than a preset threshold, determining a task level range to be processed by the image processing unit according to a threshold interval where the target load ratio is located; and distributing the object processing tasks with the task levels within the task level range to the image processing unit for processing, and distributing the object tasks with the task levels not within the task level range to the central processing unit for processing. Therefore, the load ratio formed among different processing units is monitored by determining the task type of the object processing task, and when the load ratio reaches a certain condition, the object processing tasks of different task levels are distributed to different processing units according to the load ratio for parallel processing, so that the processing pressure of the processing units is relieved, and the task processing efficiency is greatly improved.
Based on the above description, the task processing method of the present application will be further described below by way of example. In the present embodiment, the task processing device will be described by taking an example in which the task processing device is specifically integrated in a server. Referring to fig. 3, fig. 3 is a flowchart illustrating a second task processing method according to an embodiment of the present application. The specific process can be as follows:
201. the server acquires the object processing task and determines the task type of the object processing task.
The embodiment of the application can be applied to an intelligent retail system, in which face and human body information is collected through a camera and processed to determine passenger flow information. Processing the face and human body information involves a complete AI functional line including face detection, human body detection, tracking, preference, face registration, human body key points, face and human body binding, feature extraction, attribute extraction, retrieval, and the like. The AI functional line may be a task queue, and the task queue may include a plurality of object processing tasks; the processing of the face and human body information is completed by processing the plurality of object processing tasks.
Thus, the server can acquire the task from the request task queue as the object processing task. After the object processing task is acquired, the task type of the object processing task may be further determined. For the task types of the object processing tasks, reference is made to the above embodiments, and details are not repeated here.
202. The server determines the task level of the object processing task according to the task type of the object processing task.
The server may first obtain a correspondence between each sample task type and each sample level.
For example, the sample task types may include: tracking, feature, attribute, retrieval, detection, binding, preference, registration, key point, and the like, and the sample levels may include a first task level, a second task level, a third task level, and the like. The correspondence between the sample task types and the sample levels may be: tracking, feature, attribute, and retrieval correspond to the first task level; detection and binding correspond to the second task level; and preference, registration, and key point correspond to the third task level.
In an embodiment, the computation complexity of the first task level is higher than that of the second task level, and the computation complexity of the second task level is higher than that of the third task level. That is, the computation pressure on the processor when processing the tracking, feature, attribute, and retrieval task types is higher than the computation pressure when processing the detection and binding task types, and the computation pressure when processing the detection and binding task types is higher than the computation pressure when processing the preference, registration, and key point task types.
203. The server acquires the load information of the image processing unit and the load information of the central processing unit, and determines a target load ratio.
The image processing unit may be a GPU processing unit, the central processing unit may be a CPU processing unit, and the processing efficiency of the image processing unit on the object processing task may be better than that of the central processing unit on the object processing task.
After the load information of the image processing unit and the load information of the central processing unit are obtained, the ratio of the load information of the image processing unit to the load information of the central processing unit can be calculated to determine the target load ratio. The target load ratio represents the relative load relationship between the image processing unit and the central processing unit, that is, the proportion of tasks processed by each. When the target load ratio is higher, the load degree of the image processing unit is higher and the load degree of the central processing unit is relatively lower; when the target load ratio is lower, the load degree of the image processing unit is lower and the load degree of the central processing unit is relatively higher.
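A minimal sketch of the ratio calculation in step 203 follows. It assumes that each piece of load information is a non-negative number such as a usage rate, and it guards against an idle central processing unit with a small floor value; the function name `target_load_ratio` and the `epsilon` parameter are assumptions of this sketch and are not specified by the embodiment.

```python
def target_load_ratio(gpu_load, cpu_load, epsilon=1e-6):
    """Ratio of image processing unit load to central processing unit load.

    epsilon is a hypothetical floor that avoids division by zero when the
    central processing unit is idle.
    """
    if gpu_load < 0 or cpu_load < 0:
        raise ValueError("load values must be non-negative")
    return gpu_load / max(cpu_load, epsilon)


# Image processing unit busier than the central processing unit.
ratio_high = target_load_ratio(0.75, 0.5)

# Image processing unit less loaded than the central processing unit.
ratio_low = target_load_ratio(0.25, 0.5)
```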
Referring to fig. 4, fig. 4 is a schematic diagram illustrating task scheduling of a task processing method according to an embodiment of the present application. As shown in fig. 4, the upper part of the left side of fig. 4 represents the request task queue of the image processing unit, and the lower part represents the request task queue of the central processing unit. The labels "t1, t2, t3, t4, t5, t6, t7, t8" at the top of the left side of fig. 4 represent time periods, and the rectangular box below each time period represents the load state of the processing unit, that is, the load information. See the rectangular boxes at the bottom of fig. 4, where different rectangular boxes represent different load states. The load states may include a first load state, a second load state, a third load state, and a fourth load state, where the first load state represents the lowest load degree of the processing unit, the second load state represents a relatively low load degree, the third load state represents a relatively high load degree, and the fourth load state represents the highest load degree.
Referring to fig. 4, during the time period t1, there is no object processing task in the request task queue of the image processing unit and no object processing task in the request task queue of the central processing unit, and at this time, both the image processing unit and the central processing unit are in the first load state.
In the time period t2, the request task queue of the image processing unit includes the object processing task: detection 1. There is no object processing task in the request task queue of the central processing unit. At this time, the image processing unit is in the second load state, and the central processing unit is in the first load state.
In the time period t3, the request task queue of the image processing unit includes the object processing tasks: registration 1 and detection 2. There is no object processing task in the request task queue of the central processing unit. At this time, the image processing unit is in the second load state, and the central processing unit is in the first load state.
In the time period t4, the request task queue of the image processing unit includes the object processing tasks: preference 1, registration 2, detection 3, and detection 4. There is no object processing task in the request task queue of the central processing unit. At this time, the image processing unit is in the third load state, and the central processing unit is in the first load state.
In the time period t5, the request task queue of the image processing unit includes the object processing tasks: feature 1, preference 2, registration 3, registration 4, and detection 5. There is no object processing task in the request task queue of the central processing unit. At this time, the image processing unit is in the fourth load state, and the central processing unit is in the first load state.
In the time period t6, the request task queue of the image processing unit includes the object processing tasks: retrieval 1, feature 2, preference 3, preference 4, and registration 5. There is no object processing task in the request task queue of the central processing unit. At this time, the image processing unit is in the fourth load state, and the central processing unit is in the first load state.
In the time period t7, the request task queue of the image processing unit includes the object processing tasks: retrieval 2, feature 3, feature 4, and preference 5. There is no object processing task in the request task queue of the central processing unit. At this time, the image processing unit is in the third load state, and the central processing unit is in the first load state.
In the time period t8, the request task queue of the image processing unit includes the object processing tasks: retrieval 3, retrieval 4, and feature 5. There is no object processing task in the request task queue of the central processing unit. At this time, the image processing unit is in the second load state, and the central processing unit is in the first load state. The load state of the image processing unit and the load state of the central processing unit can be determined from the task queue of the image processing unit and the task queue of the central processing unit on the left side of fig. 4. In each time period, the target load ratio of the image processing unit to the central processing unit may be determined from the load state of the image processing unit and the load state of the central processing unit.
204. The server judges whether the target load ratio is larger than a first preset threshold value.
After determining the target load ratio, the server may determine whether the target load ratio is greater than a first preset threshold, where the first preset threshold may be 1.1, and is not limited herein.
For example, if the target load ratio is 1.5 and the first preset threshold is 1.1, it may be determined that the target load ratio is greater than the first preset threshold, and step 205 may be performed; for another example, if the target load ratio is 0.5 and the first preset threshold is 1.1, it may be determined that the target load ratio is smaller than the first preset threshold, and step 208 may be performed.
205. The server judges whether the target load ratio is larger than a second preset threshold value.
After it is determined that the target load ratio is greater than the first preset threshold, it may be continuously determined whether the target load ratio is greater than a second preset threshold, where the second preset threshold may be 2, and the second preset threshold is greater than the first preset threshold.
For example, if the target load ratio is 2.5 and the second preset threshold is 2, it may be determined that the target load ratio is greater than the second preset threshold, and step 206 may be performed; for another example, if the target load ratio is 1.5 and the second preset threshold is 2, it may be determined that the target load ratio is smaller than the second preset threshold, and step 207 may be performed.
206. And the server distributes the object processing tasks to the image processing unit and the central processing unit for processing according to the task level of the object processing tasks and a preset first distribution rule.
The preset first allocation rule may include: allocating the object processing tasks of the task types corresponding to the first task level to the image processing unit for processing, and allocating the object processing tasks of the task types corresponding to the second task level and the third task level to the central processing unit for processing.
For example, referring to fig. 4, the time periods t5 and t6 on the left side satisfy the condition that the load ratio of the image processing unit to the central processing unit is greater than the first preset threshold and greater than the second preset threshold. In the time period t5, the object processing tasks include: feature 1, preference 2, registration 3, registration 4, and detection 5, where feature 1 corresponds to the first task level, detection 5 corresponds to the second task level, and preference 2, registration 3, and registration 4 correspond to the third task level. The object processing tasks are allocated according to the first allocation rule, that is, the task scheduling in fig. 4 is executed. As shown on the right side of fig. 4, after the task scheduling, feature 1 and detection 5 are allocated to the image processing unit for processing, and preference 2, registration 3, and registration 4 are allocated to the central processing unit for processing. After the task scheduling, the load state of the image processing unit changes from the fourth load state to the second load state, and the load state of the central processing unit changes from the first load state to the second load state, so that task processing is balanced between the image processing unit and the central processing unit and the task processing efficiency can be improved.
207. And the server distributes the object processing tasks to the image processing unit and the central processing unit for processing according to the task level of the object processing tasks and a preset second distribution rule.
The preset second allocation rule may include: allocating the object processing tasks of the task types corresponding to the first task level and the second task level to the image processing unit for processing, and allocating the object processing tasks of the task types corresponding to the third task level to the central processing unit for processing.
For example, referring to fig. 4, the time periods t4 and t7 on the left side satisfy the condition that the load ratio of the image processing unit to the central processing unit is greater than the first preset threshold and less than the second preset threshold. In the time period t4, the object processing tasks include: preference 1, registration 2, detection 3, and detection 4, where preference 1 and registration 2 correspond to the third task level, and detection 3 and detection 4 correspond to the second task level. The object processing tasks are allocated according to the second allocation rule, that is, the task scheduling in fig. 4 is executed. As shown on the right side of fig. 4, after the task scheduling, preference 1 and registration 2 are allocated to the image processing unit for processing, and detection 3 and detection 4 are allocated to the central processing unit for processing. After the task scheduling, the load state of the image processing unit changes from the third load state to the second load state, and the load state of the central processing unit changes from the first load state to the second load state.
208. And the server distributes the object processing tasks to the image processing unit and the central processing unit for processing according to the task level of the object processing tasks and a preset third distribution rule.
The preset third allocation rule may include: allocating the object processing tasks of the task types corresponding to the first task level, the second task level, and the third task level to the image processing unit for processing.
For example, referring to fig. 4, the time period t8 on the left side satisfies the condition that the load ratio of the image processing unit to the central processing unit is less than the first preset threshold. In the time period t8, the object processing tasks include: retrieval 3, retrieval 4, and feature 5, all of which correspond to the first task level. The object processing tasks are allocated according to the third allocation rule, that is, the task scheduling in fig. 4 is executed. As shown on the right side of fig. 4, after the task scheduling, retrieval 3, retrieval 4, and feature 5 are allocated to the image processing unit for processing. After the task scheduling, the load state of the image processing unit remains the second load state; since the load state of the image processing unit is normal, the task processing efficiency is improved.
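The decision flow of steps 204 to 208 can be sketched as a selection of the task levels handled by the image processing unit. The default thresholds 1.1 and 2 are the example values given above; the function name `gpu_levels_for_ratio` and the integer encoding of the levels are hypothetical. The embodiment does not state how a ratio exactly equal to a threshold is handled, so this sketch assumes it is treated as not greater than the threshold.

```python
def gpu_levels_for_ratio(ratio, first_threshold=1.1, second_threshold=2.0):
    """Return the set of task levels allocated to the image processing unit.

    Levels not in the returned set are allocated to the central processing unit.
    """
    if ratio > second_threshold:
        # Step 206, first allocation rule: only the first task level.
        return {1}
    if ratio > first_threshold:
        # Step 207, second allocation rule: first and second task levels.
        return {1, 2}
    # Step 208, third allocation rule: all three task levels.
    return {1, 2, 3}


levels_heavy = gpu_levels_for_ratio(2.5)
levels_medium = gpu_levels_for_ratio(1.5)
levels_light = gpu_levels_for_ratio(0.5)
```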
In some embodiments, when the object processing tasks of the task types corresponding to the first task level are processed in the image processing unit, the subtask queues corresponding to the object processing tasks may be obtained, and the subtasks of object processing tasks of the same type may be aggregated. After the preset number of subtasks have been aggregated, matrix aggregation is performed on the preset number of subtasks to obtain an aggregated target subtask queue, and then the target subtask queue is processed.
Referring to fig. 5, fig. 5 is a schematic view of a subtask aggregation of a task processing method according to an embodiment of the present disclosure. The object processing Task may correspond to a sub-Task queue, and the sub-Task queue may include a plurality of sub-tasks, for example, the plurality of sub-tasks may be Task1 to Task n.
After the subtask queue is obtained, whether the number of subtasks in the subtask queue meets a preset number is judged, the preset number may be the maximum number of subtasks for task aggregation, and the preset number may be 10, which is not limited herein.
If the number of subtasks in the subtask queue meets the preset number, aggregation processing is performed on all subtasks in the subtask queue. Performing aggregation processing on all subtasks in the subtask queue may include the following. A plurality of data vectors corresponding to each subtask are obtained, that is, the data vectors u11 ... unm shown in fig. 5, where the data vectors may represent the different data to be processed in each subtask; matrix aggregation is performed on the data vectors, so that an aggregated data matrix is obtained. A feature matrix corresponding to each data vector is obtained, that is, the feature matrix shown in fig. 5, where the feature matrix may represent the data information of the data vector and the like; matrix aggregation is performed on the feature matrices of all the data vectors, so that an aggregated feature matrix is obtained.
Further, after the matrix aggregation processing is performed, a shared database corresponding to the subtasks in the subtask queue may be obtained, and the shared database, the aggregated data matrix, and the aggregated feature matrix are loaded to the image processing unit through the video memory bandwidth; the image processing unit then performs calculation processing on the subtask queue of the object processing task. The shared database, that is, the video memory data shown in fig. 5, includes various calculation data used for processing tasks.
In some embodiments, if the number of subtasks in the subtask queue is less than the preset number, an aggregation waiting time may be set, and the aggregation waiting time may be 20 ms. For example, the number of subtasks in the subtask queue may be 5 and the preset number may be 10. After it is determined that the number of subtasks in the subtask queue is less than the preset number, the method may wait, for up to the set aggregation waiting time, for other subtasks having the same attribute as the subtasks in the subtask queue. If the number of subtasks reaches the preset number before the set waiting time elapses, the waiting is stopped and all the subtasks are aggregated; for the specific aggregation processing steps, refer to the above description, which is not repeated here. If the number of subtasks has not reached the preset number when the set waiting time elapses, the waiting is stopped and the subtasks obtained so far are aggregated; for the specific aggregation processing steps, refer to the above description, which is not repeated here. By aggregating a plurality of subtasks in the object processing task, the gap between the video memory bandwidth and the computing power in the processing unit can be reduced, and the task processing efficiency is improved.
The method includes: determining at least one object processing task of a target image; determining a task type of each object processing task; determining the task level of the object processing task according to the task type; acquiring first load information of an image processing unit and second load information of a central processing unit; determining a target load ratio based on the first load information and the second load information; when the target load ratio is larger than a preset threshold, determining a task level range to be processed by the image processing unit according to the threshold interval in which the target load ratio is located; and distributing the object processing tasks whose task levels are within the task level range to the image processing unit for processing, and distributing the object processing tasks whose task levels are not within the task level range to the central processing unit for processing. In this way, the task level is obtained by determining the task type of each object processing task, and the load ratio between the different processing units is monitored; when the load ratio meets a certain condition, object processing tasks of different task levels are distributed to different processing units according to the load ratio for parallel processing, which relieves the processing pressure on the processing units and greatly improves the task processing efficiency.
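The final distribution step can be illustrated with a short Python sketch. It assumes a fixed lookup table from task type to task level and takes the task level range as an input; the task type names, the `ObjectTask` class, and the `allocate` function are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from task type to task level (1 = first task level).
TYPE_TO_LEVEL: Dict[str, int] = {
    "detection": 1,
    "tracking": 2,
    "attribute_extraction": 3,
}


@dataclass
class ObjectTask:
    name: str
    task_type: str


def allocate(tasks: List[ObjectTask],
             gpu_levels: Set[int]) -> Tuple[List[ObjectTask], List[ObjectTask]]:
    """Split tasks into (image processing unit, central processing unit) lists."""
    gpu_tasks: List[ObjectTask] = []
    cpu_tasks: List[ObjectTask] = []
    for task in tasks:
        level = TYPE_TO_LEVEL[task.task_type]  # task type -> task level
        if level in gpu_levels:
            gpu_tasks.append(task)   # level is within the task level range
        else:
            cpu_tasks.append(task)   # level is outside the task level range
    return gpu_tasks, cpu_tasks


tasks = [
    ObjectTask("t1", "detection"),
    ObjectTask("t2", "tracking"),
    ObjectTask("t3", "attribute_extraction"),
]
# With a task level range of {1, 2}, t1 and t2 stay on the image
# processing unit and t3 is moved to the central processing unit.
gpu_tasks, cpu_tasks = allocate(tasks, {1, 2})
```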
In order to better implement the task processing method provided by the embodiments of the present application, embodiments of the present application further provide a task processing device based on the task processing method. The terms are the same as those in the above task processing method, and details of implementation may refer to the description in the method embodiment.
Referring to fig. 6, fig. 6 is a block diagram of a first task processing device according to an embodiment of the present disclosure. The task processing device may be applied to a terminal such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch, or to a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms. The device includes:
a first determination unit 301 for determining at least one object processing task of a target image;
a second determining unit 302, configured to determine a task type of each of the object processing tasks;
a third determining unit 303, configured to determine a task level of the object processing task according to the task type;
an acquisition unit 304 configured to acquire first load information of an image processing unit and second load information of a central processing unit, wherein the image processing unit has a higher processing priority for an object processing task than the central processing unit;
a fourth determining unit 305 configured to determine a target load ratio based on the first load information and the second load information;
a fifth determining unit 306, configured to determine, when the target load ratio is greater than a preset threshold, a task level range that needs to be processed by the image processing unit according to a threshold interval where the target load ratio is located;
the allocating unit 307 is configured to allocate the object processing task whose task level is within the task level range to the image processing unit for processing, and allocate the object task whose task level is not within the task level range to the central processing unit for processing.
In some embodiments, referring to fig. 7, fig. 7 is a block diagram of a second task processing device according to an embodiment of the present disclosure. The allocating unit 307 may include:
the obtaining subunit 3071, configured to obtain a subtask queue corresponding to the object processing task, where the subtask queue includes multiple subtasks;
the aggregation subunit 3072 is configured to perform task aggregation on the subtasks with the same subtask attribute, and generate a target subtask queue after the task aggregation;
the processing subunit 3073 is configured to perform matrix aggregation on the data matrix and the feature matrix of the target subtask queue, and process the target subtask queue after matrix aggregation through the image processing unit.
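The task aggregation performed by the aggregation subunit can be sketched as a grouping step. This is only a sketch, assuming each subtask carries a single hashable attribute value; the `Subtask` type, the attribute strings, and `aggregate_by_attribute` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Subtask(NamedTuple):
    attribute: str  # subtask attribute used as the grouping key
    payload: str    # stand-in for the data to be processed


def aggregate_by_attribute(subtasks: List[Subtask]) -> Dict[str, List[Subtask]]:
    """Build one target subtask queue per subtask attribute."""
    queues: Dict[str, List[Subtask]] = defaultdict(list)
    for subtask in subtasks:
        queues[subtask.attribute].append(subtask)
    return dict(queues)


subtask_queue = [
    Subtask("feature_compare", "a"),
    Subtask("feature_extract", "b"),
    Subtask("feature_compare", "c"),
]
target_queues = aggregate_by_attribute(subtask_queue)
```

Each resulting queue holds only subtasks with the same attribute, which is the precondition for aggregating their data vectors into one matrix.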
In some embodiments, referring to fig. 8, fig. 8 is a block diagram of a third task processing device according to an embodiment of the present disclosure. The allocating unit 307 may further include:
the first execution subunit 3074, configured to, when detecting that the number of the subtasks in the target subtask queue satisfies the preset number, execute a step of performing matrix aggregation on the data matrix and the feature matrix of the target subtask queue;
the second executing subunit 3075 is configured to, when it is detected that the number of the subtasks in the target subtask queue does not meet the preset number, wait for the subtasks with the same subtask attribute to perform task aggregation within a preset time, and perform a step of performing matrix aggregation on the data matrix and the feature matrix of the target subtask queue.
In some embodiments, the processing subunit 3073 may be configured to: acquire a plurality of data vectors corresponding to each subtask in the target subtask queue and a feature matrix corresponding to each data vector; aggregate the plurality of data vectors to obtain a data matrix; and aggregate the plurality of feature matrices to obtain a target feature matrix, and obtain the matrix-aggregated target subtask queue according to the data matrix and the target feature matrix.
In some embodiments, the processing subunit 3073 may be configured to: acquire a plurality of data vectors corresponding to each subtask in the target subtask queue and a feature matrix corresponding to each data vector; aggregate the plurality of data vectors to obtain a data matrix; aggregate the plurality of feature matrices to obtain a target feature matrix, and obtain the matrix-aggregated target subtask queue according to the data matrix and the target feature matrix; acquire a shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue; and transmit the shared database, the data matrix, and the target feature matrix to the image processing unit through the video memory bandwidth, and process the data matrix and the target feature matrix based on the shared database.
In some embodiments, the processing subunit 3073 may be configured to: acquire a plurality of data vectors corresponding to each subtask in the target subtask queue and a feature matrix corresponding to each data vector; aggregate the plurality of data vectors to obtain a data matrix; aggregate the plurality of feature matrices to obtain a target feature matrix, and obtain the matrix-aggregated target subtask queue according to the data matrix and the target feature matrix; acquire a shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue; and transmit the shared database and the matrix-aggregated target subtask queue to the image processing unit through the video memory bandwidth, and process the matrix-aggregated target subtask queue based on the shared database.
In some embodiments, please refer to fig. 9, and fig. 9 is a block diagram illustrating a fourth task processing device according to an embodiment of the present disclosure. The fifth determining unit 306 may include:
a dividing subunit 3061, configured to divide the threshold interval greater than the preset threshold into a first threshold interval and a second threshold interval, where the second threshold interval is greater than the first threshold interval;
a first determining subunit 3062, configured to determine that the task level range that needs to be processed by the image processing unit includes a first task level and a second task level if the threshold interval in which the target load ratio is located is the first threshold interval;
a second determining subunit 3063, configured to determine that the task level range that needs to be processed by the image processing unit includes the first task level if the threshold interval in which the target load ratio is located is the second threshold interval.
In some embodiments, please refer to fig. 10, and fig. 10 is a block diagram illustrating a fifth task processing device according to an embodiment of the present disclosure. The fourth determining unit 305 may include:
a calculating subunit 3051, configured to perform ratio calculation on the first load information and the second load information to obtain a target load ratio between the first load information and the second load information.
In some embodiments, please refer to fig. 11, and fig. 11 is a block diagram illustrating a structure of a sixth task processing device according to an embodiment of the present disclosure. The task processing device may further include:
a processing unit 308, configured to, when the target load ratio is smaller than a preset threshold, allocate all object processing tasks to the image processing unit for processing.
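The threshold logic of units 305, 306, and 308 can be sketched together. This is a minimal sketch under stated assumptions: the target load ratio is taken to be the image processing unit load divided by the central processing unit load, a ratio exactly equal to a boundary is placed in the lower interval, and the threshold values 1.0 and 2.0 are hypothetical placeholders. None of these choices is fixed by the text.

```python
from typing import Set

ALL_LEVELS: Set[int] = {1, 2, 3}  # first, second and third task levels


def target_load_ratio(gpu_load: float, cpu_load: float) -> float:
    """Assumed definition: image processing unit load / CPU load."""
    if cpu_load <= 0:
        raise ValueError("cpu_load must be positive")
    return gpu_load / cpu_load


def gpu_level_range(ratio: float,
                    preset_threshold: float = 1.0,
                    interval_split: float = 2.0) -> Set[int]:
    """Task levels that the image processing unit should keep."""
    if ratio <= preset_threshold:
        # Not above the preset threshold: every task stays on the GPU.
        return set(ALL_LEVELS)
    if ratio <= interval_split:
        # First threshold interval: first and second task levels.
        return {1, 2}
    # Second (higher) threshold interval: first task level only.
    return {1}


light = gpu_level_range(target_load_ratio(0.3, 0.6))
medium = gpu_level_range(target_load_ratio(0.9, 0.6))
heavy = gpu_level_range(target_load_ratio(0.9, 0.3))
```

As the ratio rises, the range shrinks, so lower-level tasks are handed to the central processing unit first.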
The embodiment of the application discloses a task processing device in which the first determining unit 301 determines at least one object processing task of a target image; the second determining unit 302 determines a task type of each object processing task; the third determining unit 303 determines a task level of the object processing task according to the task type; the acquisition unit 304 acquires first load information of an image processing unit and second load information of a central processing unit, wherein the image processing unit has a higher processing priority for an object processing task than the central processing unit; the fourth determining unit 305 determines a target load ratio based on the first load information and the second load information; the fifth determining unit 306 determines, when the target load ratio is greater than a preset threshold, a task level range to be processed by the image processing unit according to the threshold interval in which the target load ratio is located; and the allocating unit 307 allocates the object processing tasks whose task levels are within the task level range to the image processing unit for processing, and allocates the object processing tasks whose task levels are not within the task level range to the central processing unit for processing. In this way, the task level is obtained by determining the task type of each object processing task, and the load ratio between the different processing units is monitored; when the load ratio meets a certain condition, object processing tasks of different levels are distributed to different processing units according to the load ratio for parallel processing, which relieves the processing pressure on the processing units and greatly improves the task processing efficiency.
An embodiment of the present application further provides a computer device. Fig. 12 shows a schematic structural diagram of a server according to an embodiment of the present application. Specifically:
the computer device may include components such as a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device structure illustrated in fig. 12 does not constitute a limitation on the computer device, which may include more or fewer components than those illustrated, combine some components, or use a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; optionally, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the server, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the respective components, and optionally, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are implemented through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, so as to implement the various method steps provided by the foregoing embodiments, as follows:
determining at least one object processing task for the target image;
determining a task type of each of the object processing tasks;
determining the task level of the object processing task according to the task type;
acquiring first load information of an image processing unit and second load information of a central processing unit, wherein the processing priority of the image processing unit to an object processing task is higher than that of the central processing unit;
determining a target load ratio based on the first load information and the second load information;
when the target load ratio is larger than a preset threshold, determining a task level range needing to be processed by the image processing unit according to a threshold interval where the target load ratio is located;
and distributing the object processing tasks with the task levels within the task level range to the image processing unit for processing, and distributing the object tasks with the task levels not within the task level range to the central processing unit for processing.
The embodiment of the application discloses a task processing method and device and a computer readable storage medium. The method includes: determining at least one object processing task of a target image; determining a task type of each object processing task; determining the task level of the object processing task according to the task type; acquiring first load information of an image processing unit and second load information of a central processing unit; determining a target load ratio based on the first load information and the second load information; when the target load ratio is larger than a preset threshold, determining a task level range to be processed by the image processing unit according to the threshold interval in which the target load ratio is located; and distributing the object processing tasks whose task levels are within the task level range to the image processing unit for processing, and distributing the object processing tasks whose task levels are not within the task level range to the central processing unit for processing. In this way, the task level is obtained by determining the task type of each object processing task, and the load ratio between the different processing units is monitored; when the load ratio meets a certain condition, object processing tasks of different task levels are distributed to different processing units according to the load ratio for parallel processing, which relieves the processing pressure on the processing units and greatly improves the task processing efficiency.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the task processing methods provided in the present application. For example, the instructions may perform the steps of:
determining at least one object processing task for the target image; determining a task type of each object processing task; determining the task level of the object processing task according to the task type; acquiring first load information of an image processing unit and second load information of a central processing unit, wherein the processing priority of the image processing unit to an object processing task is higher than that of the central processing unit; determining a target load ratio based on the first load information and the second load information; when the target load ratio is larger than a preset threshold, determining a task level range to be processed by the image processing unit according to a threshold interval where the target load ratio is located; and distributing the object processing tasks with the task levels within the task level range to the image processing unit for processing, and distributing the object tasks with the task levels not within the task level range to the central processing unit for processing.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The computer-readable storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any task processing method provided in the embodiments of the present application, the beneficial effects that can be achieved by any task processing method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
The task processing method, the task processing device, and the computer-readable storage medium provided in the embodiments of the present application are described in detail above, and a specific example is applied in the description to explain the principles and the embodiments of the present application, and the description of the above embodiments is only used to help understanding the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (13)

1. A task processing method, comprising:
determining at least one object processing task for the target image;
determining a task type of each of the object processing tasks;
determining the task level of the object processing task according to the task type;
acquiring first load information of an image processing unit and second load information of a central processing unit, wherein the processing priority of the image processing unit to an object processing task is higher than that of the central processing unit;
determining a target load ratio based on the first load information and the second load information;
when the target load ratio is larger than a preset threshold, determining a task level range needing to be processed by the image processing unit according to a threshold interval where the target load ratio is located;
if the priority level of the object processing task is within the task level range, acquiring a subtask queue corresponding to the object processing task, wherein the subtask queue comprises a plurality of subtasks;
task aggregation is carried out on subtasks with the same subtask attribute, and a target subtask queue after the task aggregation is generated;
performing matrix aggregation on the data matrix and the characteristic matrix of the target subtask queue, and processing the target subtask queue after matrix aggregation through the image processing unit;
and if the priority level of the object processing task is not in the task level range, distributing the object processing task to the central processing unit for processing.
2. The method of claim 1, further comprising, after the step of generating a task aggregated target subtask queue:
when detecting that the number of the subtasks of the target subtask queue meets a preset number, executing a step of performing matrix aggregation on a data matrix and a feature matrix of the target subtask queue;
and when detecting that the number of the subtasks of the target subtask queue does not meet the preset number, waiting for the subtasks with the same subtask attribute to perform task aggregation within preset time, and performing a step of performing matrix aggregation on the data matrix and the feature matrix of the target subtask queue.
3. The method of claim 2, wherein the step of matrix aggregating the data matrix and the feature matrix of the target subtask queue comprises:
acquiring a plurality of data vectors corresponding to each subtask in a target subtask queue and a feature matrix corresponding to each data vector;
performing aggregation processing on the plurality of data vectors to obtain a data matrix;
and aggregating the plurality of feature matrices to obtain a target feature matrix, and obtaining a target subtask queue after matrix aggregation according to the data matrix and the target feature matrix.
4. The method according to claim 3, wherein the step of processing the matrix aggregated target subtask queue by the image processing unit comprises:
acquiring a shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue;
and transmitting the shared database, the data matrix and the target characteristic matrix to the image processing unit through a video memory bandwidth, and processing the data matrix and the target characteristic matrix based on the shared database.
5. The method according to claim 1, wherein the step of processing the matrix aggregated target subtask queue by the image processing unit comprises:
acquiring a shared database corresponding to the target subtask queue according to the subtask attribute in the target subtask queue;
and transmitting the shared database and the target subtask queue after matrix aggregation to the image processing unit through a video memory bandwidth, and processing the target subtask queue after matrix aggregation based on the shared database.
6. The method of claim 1, wherein the task levels include a first task level, a second task level, and a third task level;
the step of determining the task level range to be processed by the image processing unit according to the threshold interval where the target load ratio is located includes:
dividing a threshold interval which is larger than the preset threshold into a first threshold interval and a second threshold interval, wherein the second threshold interval is larger than the first threshold interval;
if the threshold interval in which the target load ratio is located is the first threshold interval, determining that the task level range needing to be processed by the image processing unit comprises a first task level and a second task level;
and if the threshold interval in which the target load ratio is located is the second threshold interval, determining that the task level range needing to be processed by the image processing unit comprises a first task level.
7. The method of claim 1, wherein the step of determining a target load ratio based on the first load information and the second load information comprises:
and calculating the ratio of the first load information to the second load information to obtain a target load ratio between the first load information and the second load information.
8. The method of any one of claims 1-6, further comprising:
and when the target load ratio is smaller than a preset threshold value, distributing all object processing tasks to the image processing unit for processing.
9. A task processing apparatus, comprising:
a first determination unit for determining at least one object processing task of a target image;
a second determining unit configured to determine a task type of each of the object processing tasks;
a third determining unit, configured to determine a task level of the object processing task according to the task type;
an acquisition unit configured to acquire first load information of an image processing unit and second load information of a central processing unit, wherein the image processing unit has a higher processing priority for an object processing task than the central processing unit;
a fourth determining unit configured to determine a target load ratio based on the first load information and the second load information;
a fifth determining unit, configured to determine, when the target load ratio is greater than a preset threshold, a task level range that needs to be processed by the image processing unit according to a threshold interval in which the target load ratio is located;
the first allocation unit is used for acquiring a subtask queue corresponding to the object processing task if the priority level of the object processing task is within the task level range, wherein the subtask queue comprises a plurality of subtasks;
the aggregation unit is used for aggregating the subtasks with the same subtask attribute to generate a target subtask queue after task aggregation;
the processing unit is used for carrying out matrix aggregation on the data matrix and the characteristic matrix of the target subtask queue and processing the target subtask queue after the matrix aggregation through the image processing unit;
and the second allocation unit is used for allocating the object processing task to the central processing unit for processing if the priority level of the object processing task is not in the task level range.
10. The apparatus of claim 9, further comprising:
the first execution unit is used for executing the step of carrying out matrix aggregation on the data matrix and the characteristic matrix of the target subtask queue when detecting that the number of the subtasks of the target subtask queue meets the preset number;
and the second execution unit is used for waiting for subtasks with the same subtask attribute to perform task aggregation within preset time and executing the step of performing matrix aggregation on the data matrix and the feature matrix of the target subtask queue when detecting that the number of the subtasks of the target subtask queue does not meet the preset number.
11. The apparatus of claim 9, wherein the processing unit comprises:
the first acquiring subunit is used for acquiring a plurality of data vectors corresponding to each subtask in the target subtask queue and a feature matrix corresponding to each data vector;
the first processing subunit is used for carrying out aggregation processing on the plurality of data vectors to obtain a data matrix;
and the second processing subunit is used for aggregating the plurality of characteristic matrixes to obtain a target characteristic matrix, and obtaining a target subtask queue after matrix aggregation according to the data matrix and the target characteristic matrix.
12. The apparatus of claim 9, wherein the processing unit comprises:
the second acquiring subunit is used for acquiring a plurality of data vectors corresponding to each subtask in the target subtask queue and a feature matrix corresponding to each data vector;
the first aggregation subunit is used for aggregating the plurality of data vectors to obtain a data matrix;
the second aggregation subunit is used for aggregating the plurality of feature matrices to obtain a target feature matrix, and obtaining a target subtask queue after matrix aggregation according to the data matrix and the target feature matrix;
the third obtaining subunit is configured to obtain, according to the subtask attribute in the target subtask queue, a shared database corresponding to the target subtask queue;
and the transmission subunit is used for transmitting the shared database, the data matrix and the target characteristic matrix to the image processing unit through a video memory bandwidth, and processing the data matrix and the target characteristic matrix based on the shared database.
13. A computer-readable storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor for performing the steps of the task processing method according to any one of claims 1 to 8.
HK42020017548.7A  2020-10-10  Method and apparatus for processing task, and computer-readable storage medium  HK40027310B (en)

Publications (2)

Publication Number    Publication Date
HK40027310A (en)      2021-01-22
HK40027310B (en)      2021-04-30
