Disclosure of Invention
The invention provides a method, a device and an electronic device for detecting the people flow at a bus station. A camera installed at the bus station captures video of pedestrians in the waiting area, and an improved head detection network detects the pedestrians' heads in the waiting-area video to determine the number of pedestrians in the waiting area.
In a first aspect, the present invention provides a method for detecting the people flow at a bus station, including:
acquiring a real-time video image captured by a camera installed at the bus station;
segmenting the real-time video image according to the position of the waiting area in the video frame to obtain a real-time video image of the waiting area;
performing head detection on the real-time video image of the waiting area by using a pre-trained head detection network based on an anchor-free algorithm to obtain initial head detection boxes, wherein the anchor-free head detection network uses a MobileNet network with an added channel attention module as its backbone network;
removing falsely detected head detection boxes from the initial head detection boxes according to a preset de-duplication rule to obtain final head detection boxes; and
determining the people flow information of the bus station waiting area according to the number of final head detection boxes.
In an optional embodiment, segmenting the real-time video image according to the position of the waiting area in the video frame to obtain the real-time video image of the waiting area includes:
configuring image segmentation parameters according to the position of the waiting area in the video frame; and
segmenting the real-time video image according to the image segmentation parameters to obtain the real-time video image of the waiting area.
In an optional embodiment, removing falsely detected head detection boxes from the initial head detection boxes according to the preset de-duplication rule to obtain final head detection boxes includes:
removing overlapping head detection boxes from the initial head detection boxes by non-maximum suppression to obtain screened head detection boxes;
classifying the screened head detection boxes by their height in the image to obtain head detection box lists for different heights, and calculating the average box area within each list;
calculating, for each head detection box, the ratio of the absolute difference between its area and the average area of its list to that average area;
judging whether that ratio is larger than a preset threshold; and
if so, deleting the corresponding head detection box.
Further, before acquiring the real-time video image captured by the camera installed at the bus station, the method further includes:
acquiring head sample data and head labeling data;
processing the head sample data with a mosaic data enhancement method to obtain head training data; and
training the constructed head detection network with the head training data and the head labeling data to obtain the trained head detection network.
Further, training the constructed head detection network with the head training data follows this rule: when matching an anchor, the distance between the anchor's centre point and the centre point of the ground-truth box and the intersection-over-union (IoU) between the anchor and the ground-truth box are calculated, and the positive and negative anchor samples are determined according to preset hyper-parameters together with that centre-point distance and IoU.
In an alternative embodiment, the method further comprises:
sending the people flow information to a bus management platform to schedule bus shifts in real time.
In a second aspect, the present invention provides a people flow detection device for a bus station, including:
an acquisition module, configured to acquire real-time video images captured by a camera installed at the bus station;
a segmentation module, configured to segment the real-time video image according to the position of the waiting area in the video frame to obtain a real-time video image of the waiting area;
a detection module, configured to perform head detection on the real-time video image of the waiting area by using a pre-trained head detection network based on an anchor-free algorithm to obtain initial head detection boxes, wherein the anchor-free head detection network uses a MobileNet network with an added channel attention module as its backbone network;
a screening module, configured to remove falsely detected head detection boxes from the initial head detection boxes according to a preset de-duplication rule to obtain final head detection boxes; and
a determining module, configured to determine the people flow information of the bus station waiting area according to the number of final head detection boxes.
In an alternative embodiment, the device further comprises a sending module;
the sending module is configured to send the people flow information to a bus management platform so as to schedule bus shifts in real time.
In a third aspect, the present invention provides an electronic device comprising at least one processor and a memory;
The memory stores computer-executable instructions;
The at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the people flow detection method of any one of the first aspect.
The method, the device and the electronic device for detecting the people flow at a bus station acquire a real-time video image captured by a camera installed at the bus station; segment the real-time video image according to the position of the waiting area in the video frame to obtain a real-time video image of the waiting area; perform head detection on that image with a pre-trained head detection network based on an anchor-free algorithm, which uses a MobileNet network with an added channel attention module as its backbone network, to obtain initial head detection boxes; remove falsely detected boxes from the initial head detection boxes according to a preset de-duplication rule to obtain final head detection boxes; and determine the people flow information of the bus station waiting area according to the number of final boxes. Compared with the prior art, a camera installed at the bus station captures video of pedestrians in the waiting area, and the improved head detection network detects the pedestrians' heads in the video to determine their number. The people flow at the bus station can thus be determined in real time, providing data support for bus scheduling and improving both the accuracy of head detection and the efficiency of bus scheduling.
Detailed Description
The embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without inventive effort fall within the scope of the invention.
Urban bus scheduling is based on passenger flow. At present, passenger flow is determined mainly by counting riders on past bus runs to establish the flow on each bus route, and bus shifts are then set according to the flow on each route. This approach can only set bus shifts from historical route flow; it cannot learn the real-time passenger flow and adjust bus shifts automatically, so bus scheduling efficiency is low.
In recent years, with the rapid development of computer vision and of deep learning in particular, big-data and deep-learning methods can automatically monitor a bus station and obtain the number of people there. A people flow detection technique based on head detection can automatically detect the people flow at a bus station in real time, and is therefore of great significance for improving bus scheduling efficiency.
Fig. 1 is a schematic diagram of the scene architecture on which the present disclosure is based. As shown in fig. 1, this architecture may include a people flow detection device 1 and a camera 2.
The people flow detection device 1 is hardware or software that can interact with the camera 2 over a network, and can be used to perform the people flow detection method described in each of the embodiments below.
When the people flow detection device 1 is hardware, it may be an electronic device with computing capability. When the people flow detection device 1 is software, it may be installed in an electronic device with computing capability. Such electronic devices include, but are not limited to, servers, notebooks and desktop computers.
The camera 2 may be a hardware device with a shooting function, such as a bullet camera or a dome camera.
In an actual scenario, the people flow detection device 1 may be integrated or installed on the camera 2 and run directly on it, or it may be integrated or installed in a server that processes the station video to provide a people flow detection service for a bus scheduling system. In the latter case the camera 2 may be a device, such as a bullet camera or a dome camera, that can communicate and exchange data with the people flow detection device 1 over a network. The camera 2 can send the real-time video stream of the bus station to the people flow detection device 1, and the people flow detection device 1 detects the people flow in that stream by the following method.
The method, the device and the electronic device for detecting the people flow at a bus station provided by the application are further described below.
Example 1
Fig. 2 is a flow chart of a method for detecting the people flow at a bus station according to an embodiment of the disclosure. As shown in fig. 2, the method includes:
S21, acquiring a real-time video image captured by a camera installed at the bus station.
The camera can capture the waiting area of the bus station.
In this embodiment, the real-time video stream may be accessed over RTSP, and a video decoding API may be called to decode raw image frames from the stream data.
S22, segmenting the real-time video image according to the position of the waiting area in the video frame to obtain a real-time video image of the waiting area.
In this embodiment, because passengers taking the bus are mainly concentrated in the waiting area, preprocessing may keep only the waiting-area portion: the real-time video image is segmented according to the position of the waiting area in the video frame.
S23, performing head detection on the real-time video image of the waiting area by using a pre-trained head detection network based on an anchor-free algorithm to obtain initial head detection boxes, wherein the anchor-free head detection network uses a MobileNet network with an added channel attention module as its backbone network.
In this embodiment, the head detection network is built on an anchor-free algorithm. The anchor-free scheme generates anchors without using k-means clustering to pick anchor sizes, so it generalizes better. To balance the accuracy and real-time performance of head detection, the network uses a MobileNet backbone augmented with a channel attention module: adding the attention mechanism on top of MobileNet combines the advantages of both, greatly reducing inference time while improving accuracy by focusing on the informative regions of the feature map. MobileNet greatly reduces computation, and hence inference time, with negligible loss of network accuracy. Its key design is the depthwise-separable convolution, which consists of a depthwise convolution followed by a pointwise convolution. The depthwise convolution costs DF × DF × DC × DC × Ci multiply-accumulate operations; the pointwise convolution is a standard convolution with a 1×1 kernel and costs DF × DF × Ci × Co. A standard convolution costs DF × DF × DC × DC × Ci × Co, so the ratio of the depthwise-separable cost to the standard cost is

(DF × DF × DC × DC × Ci + DF × DF × Ci × Co) / (DF × DF × DC × DC × Ci × Co) = 1/Co + 1/DC^2,

where DF is the size of the feature map, DC is the size of the convolution kernel, and Ci and Co are the numbers of input and output channels, respectively. Assuming a 3×3 convolution kernel, i.e. DC = 3, the computation of the depthwise-separable convolution is reduced by roughly a factor of 9, and inference is correspondingly about 9 times faster.
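As a quick check of the cost argument above, the multiply-accumulate counts can be compared directly. The layer sizes below are hypothetical example values, not taken from this disclosure:

```python
def standard_conv_cost(DF, DC, Ci, Co):
    # Standard convolution: DF*DF output positions, each applying a
    # DC x DC x Ci kernel for every one of the Co output channels.
    return DF * DF * DC * DC * Ci * Co

def separable_conv_cost(DF, DC, Ci, Co):
    depthwise = DF * DF * DC * DC * Ci  # one DC x DC filter per input channel
    pointwise = DF * DF * Ci * Co       # 1x1 convolution mixing the channels
    return depthwise + pointwise

# Hypothetical layer sizes: 56x56 feature map, 3x3 kernel, 128 channels in/out.
DF, DC, Ci, Co = 56, 3, 128, 128
ratio = separable_conv_cost(DF, DC, Ci, Co) / standard_conv_cost(DF, DC, Ci, Co)
print(round(ratio, 4))  # equals 1/Co + 1/DC**2, roughly 1/9 for a 3x3 kernel
```

For any layer the ratio is exactly 1/Co + 1/DC^2, so with a 3×3 kernel the separable form costs a bit more than one ninth of the standard convolution.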
S24, removing falsely detected head detection boxes from the initial head detection boxes according to a preset de-duplication rule to obtain final head detection boxes.
In this embodiment, some of the initial head detection boxes may be false detections, and these can be removed according to a preset de-duplication rule. The rule is set according to the areas of head detection boxes at different heights: a box whose area differs too much from the average box area at the corresponding height is removed.
S25, determining the people flow information of the bus station waiting area according to the number of final head detection boxes.
In this embodiment, the number of final head detection boxes at the current moment is counted to obtain the people flow information of the bus station waiting area.
This embodiment provides a method for detecting the people flow at a bus station: acquiring a real-time video image captured by a camera installed at the bus station; segmenting the real-time video image according to the position of the waiting area in the video frame to obtain a real-time video image of the waiting area; performing head detection on that image with a pre-trained head detection network based on an anchor-free algorithm, which uses a MobileNet network with an added channel attention module as its backbone network, to obtain initial head detection boxes; removing falsely detected boxes from the initial head detection boxes according to a preset de-duplication rule to obtain final head detection boxes; and determining the people flow information of the bus station waiting area according to the number of final boxes. With this technical solution, real-time detection of the people flow at the bus station is realized, data support is provided for bus scheduling, and both the accuracy of people flow detection and the efficiency of bus scheduling are improved.
Based on the embodiment shown in fig. 2, an embodiment of the disclosure provides a specific method for segmenting the real-time video image, further describing step S22 of the foregoing embodiment. S22 includes:
S221, configuring image segmentation parameters according to the position of the waiting area in the video frame; and
S222, segmenting the real-time video image according to the image segmentation parameters to obtain the real-time video image of the waiting area.
In this embodiment, because cameras installed at different bus stops are positioned differently, the waiting area appears at a different position in each video frame, so corresponding image segmentation parameters can be configured per camera such that the segmented real-time video image retains only the waiting area. For example, in the frame captured by the camera at the No. 6 bus stop, the waiting area occupies the left side of the frame; the real-time video image is split at the middle, and the retained left half is the real-time video image of the No. 6 stop's waiting area. In the frame captured by the camera at the No. 8 bus stop, the waiting area occupies the right third of the frame; the image is split at the right-third position, and the retained right third is the real-time video image of the No. 8 stop's waiting area.
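The per-camera segmentation described above amounts to a configurable crop. A minimal sketch follows, in which the camera identifiers and width fractions are hypothetical configuration values:

```python
import numpy as np

# Hypothetical per-camera parameters: the (left, right) fraction of the
# frame width occupied by that stop's waiting area.
SEGMENT_PARAMS = {
    "stop6_cam": (0.0, 0.5),       # waiting area is the left half of the frame
    "stop8_cam": (2.0 / 3.0, 1.0), # waiting area is the right third
}

def crop_waiting_area(frame, camera_id):
    """Keep only the configured waiting-area columns of a decoded frame."""
    left, right = SEGMENT_PARAMS[camera_id]
    w = frame.shape[1]
    return frame[:, int(left * w):int(right * w)]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a video frame
print(crop_waiting_area(frame, "stop6_cam").shape)  # (720, 640, 3)
```

In practice the fractions (or pixel coordinates, or a polygon mask) would be calibrated once per installed camera and stored in its configuration.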
On the basis of the above technical solution, this embodiment provides a specific image segmentation scheme. Its advantage is that interference from head detections in other, invalid areas is avoided: only the people flow in the target area is considered, so the people flow at the bus station can be detected more quickly and accurately.
On the basis of the embodiment shown in fig. 2, an embodiment of the disclosure provides a specific method for removing falsely detected head detection boxes, further describing step S24 of the foregoing embodiment. S24 includes:
S241, removing overlapping head detection boxes from the initial head detection boxes by non-maximum suppression to obtain screened head detection boxes;
S242, classifying the screened head detection boxes by their height in the image to obtain head detection box lists for different heights, and calculating the average box area within each list;
S243, calculating, for each head detection box, the ratio of the absolute difference between its area and the average area of its list to that average area;
S244, judging whether that ratio is larger than a preset threshold; and
S245, if so, deleting the corresponding head detection box.
In this embodiment, because the initial head detection boxes may contain false detections, overlapping boxes are first removed by non-maximum suppression. The detected boxes are then classified by their height in the image, and boxes at the same height are put into the same list. For each list, the average box area is computed; for each box, the difference between its area and the average is taken, its absolute value is computed, and the ratio of that absolute value to the average area is calculated. If this ratio exceeds the preset threshold, the corresponding box is a false detection and is removed. The operation is repeated until every list has been processed.
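The de-duplication steps S241 to S245 can be sketched as follows. Boxes are (x1, y1, x2, y2) tuples, and the band height and deviation threshold are hypothetical hyper-parameters, not values fixed by this disclosure:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_thr=0.5):
    """S241: keep only the highest-scoring box of each overlapping group."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

def remove_outliers_by_height(boxes, band=80, thr=0.5):
    """S242-S245: group boxes into horizontal bands by their top edge, then
    drop any box whose area deviates from its band's average area by more
    than the fraction `thr` of that average."""
    bands = {}
    for b in boxes:
        bands.setdefault(b[1] // band, []).append(b)
    final = []
    for group in bands.values():
        areas = [(b[2] - b[0]) * (b[3] - b[1]) for b in group]
        avg = sum(areas) / len(areas)
        final += [b for b, a in zip(group, areas) if abs(a - avg) / avg <= thr]
    return final
```

A box kept by `nms` can still be a false detection, for example one far larger than its neighbours at the same image height; `remove_outliers_by_height` then applies the area rule to discard it.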
On the basis of the above technical solution, this embodiment provides a method for removing falsely detected head detection boxes. With this method, the false detections can be removed, so the people flow computed from the remaining boxes is more accurate.
On the basis of the above embodiment, fig. 3 is a flow chart of a method for training the head detection network according to the first embodiment of the present disclosure. Before step S21 of acquiring the real-time video image captured by the camera installed at the bus station, the method further includes a training stage for the head detection network, which, as shown in fig. 3, includes:
S31, acquiring head sample data and head labeling data.
In this embodiment, the head sample data are head images captured from different shooting angles, and the corresponding head labeling data are the head-box annotations for those images.
S32, processing the head sample data with a mosaic data enhancement method to obtain head training data.
In this embodiment, to enrich background information, the mosaic data enhancement method combines four training pictures into one. This greatly increases the number of positive samples in the combined picture and greatly enriches the background; moreover, each backpropagation step in training is then equivalent to training on four pictures, which greatly reduces training time and improves detection accuracy.
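A minimal sketch of the mosaic combination (four pictures tiled into one training picture). Real mosaic implementations also randomize the split point and remap the box annotations onto the combined picture; that bookkeeping is omitted here:

```python
import numpy as np

def mosaic(imgs, out_size=640):
    """Tile four images into the quadrants of one out_size x out_size canvas."""
    assert len(imgs) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    corners = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (y, x) in zip(imgs, corners):
        # Nearest-neighbour resize via index sampling (avoids a cv2 dependency).
        h, w = img.shape[:2]
        ys = np.arange(half) * h // half
        xs = np.arange(half) * w // half
        canvas[y:y + half, x:x + half] = img[ys][:, xs]
    return canvas
```

When the box annotations are remapped as well, one combined picture carries the positive samples of all four source pictures, which is what makes each backpropagation step equivalent to training on four pictures.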
S33, training the constructed head detection network with the head training data and the head labeling data to obtain the trained head detection network.
In this embodiment, the head training data are input into the head detection network to be trained, which processes them and outputs head detection box information. A loss function value is computed from the output box information and the head-box annotations and is backpropagated through every layer of the network, so that each layer's weights are updated according to the loss. These training steps are repeated until the head detection network converges.
When training the network, the strategy for matching positive and negative anchor samples also has an important influence on the detection result, so a new anchor matching rule is designed. In addition to using the intersection-over-union (IoU) between an anchor and a ground-truth box as the matching basis, the distance between the anchor's centre point and the ground-truth box's centre point is also considered. Specifically, when matching an anchor, the centre-point distance and the IoU between the anchor and the ground-truth box are calculated, and the positive and negative anchor samples are determined according to preset hyper-parameters together with that distance and IoU.
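A toy sketch of such a matching rule follows. The thresholds (`iou_pos`, `iou_neg`, `max_dist`) stand in for the preset hyper-parameters and are hypothetical values, not those of this disclosure:

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def center_dist(a, b):
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def assign(anchors, gts, iou_pos=0.5, iou_neg=0.3, max_dist=16.0):
    """Label each anchor +1 (positive), -1 (negative) or 0 (ignored):
    positive only if its best IoU exceeds iou_pos AND its centre lies
    within max_dist of a ground-truth centre; negative if its best IoU
    is below iou_neg; anything in between is ignored during training."""
    labels = []
    for a in anchors:
        best_iou = max((box_iou(a, g) for g in gts), default=0.0)
        best_dist = min((center_dist(a, g) for g in gts), default=float("inf"))
        if best_iou >= iou_pos and best_dist <= max_dist:
            labels.append(1)
        elif best_iou < iou_neg:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

The centre-distance condition rejects anchors that happen to overlap a ground-truth box well but are centred away from the head, which is the stated motivation for combining the two criteria.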
After the people flow information of the bus station waiting area has been determined, bus shifts can be scheduled according to the people flow at the station. On the basis of the above embodiment, fig. 4 is a flow chart of another method for detecting the people flow at a bus station according to the first embodiment of the disclosure. The method further includes:
S26, sending the people flow information to a bus management platform to schedule bus shifts in real time.
In this embodiment, the people flow information may be sent to the bus management platform over the network in real time, so that the platform adjusts bus shifts according to the people flow information and realizes bus scheduling.
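As an illustration of step S26, the sketch below serializes the head count as JSON and posts it over HTTP. The endpoint URL and field names are hypothetical, since the disclosure does not specify the platform's interface:

```python
import json
import time
from urllib import request

PLATFORM_URL = "http://bus-platform.example/api/passenger-flow"  # hypothetical

def build_report(station_id, head_count):
    """Serialize a people-flow report; the field names are illustrative."""
    payload = {
        "station_id": station_id,
        "waiting_count": head_count,
        "timestamp": int(time.time()),
    }
    return json.dumps(payload).encode("utf-8")

def send_report(body):
    """POST the report so the platform can adjust bus shifts."""
    req = request.Request(PLATFORM_URL, data=body,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req, timeout=5)

body = build_report("station-042", 17)
# send_report(body)  # commented out: requires a reachable platform endpoint
```

The report could equally be pushed over MQTT or a message queue; the essential point is that the per-frame head count reaches the scheduling platform with a station identifier and a timestamp.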
Example two
Fig. 5 is a schematic structural diagram of a people flow detection device for a bus station according to a second embodiment of the present disclosure. For ease of illustration, only the portions relevant to the embodiments of the present disclosure are shown. Referring to fig. 5, the people flow detection device for a bus station comprises:
an acquisition module 51, configured to acquire real-time video images captured by a camera installed at the bus station;
a segmentation module 52, configured to segment the real-time video image according to the position of the waiting area in the video frame to obtain a real-time video image of the waiting area;
a detection module 53, configured to perform head detection on the real-time video image of the waiting area by using a pre-trained head detection network based on an anchor-free algorithm to obtain initial head detection boxes, wherein the anchor-free head detection network uses a MobileNet network with an added channel attention module as its backbone network;
a screening module 54, configured to remove falsely detected head detection boxes from the initial head detection boxes according to a preset de-duplication rule to obtain final head detection boxes; and
a determining module 55, configured to determine the people flow information of the bus station waiting area according to the number of final head detection boxes.
This embodiment provides a people flow detection device for a bus station: the acquisition module acquires real-time video images captured by a camera installed at the bus station; the segmentation module segments the real-time video image according to the position of the waiting area in the video frame to obtain a real-time video image of the waiting area; the detection module performs head detection on that image with a pre-trained head detection network based on an anchor-free algorithm, which uses a MobileNet network with an added channel attention module as its backbone network, to obtain initial head detection boxes; the screening module removes falsely detected boxes from the initial head detection boxes according to a preset de-duplication rule to obtain final head detection boxes; and the determining module determines the people flow information of the bus station waiting area according to the number of final boxes. With this technical solution, real-time detection of the people flow at the bus station is realized, data support is provided for bus scheduling, and both the accuracy of people flow detection and the efficiency of bus scheduling are improved.
Optionally, the segmentation module 52 is specifically configured to:
configure image segmentation parameters according to the position of the waiting area in the video frame; and
segment the real-time video image according to the image segmentation parameters to obtain the real-time video image of the waiting area.
Optionally, the screening module 54 is specifically configured to:
remove overlapping head detection boxes from the initial head detection boxes by non-maximum suppression to obtain screened head detection boxes;
classify the screened head detection boxes by their height in the image to obtain head detection box lists for different heights, and calculate the average box area within each list;
calculate, for each head detection box, the ratio of the absolute difference between its area and the average area of its list to that average area;
judge whether that ratio is larger than a preset threshold; and
if so, delete the corresponding head detection box.
Optionally, the apparatus further comprises a network training module 56;
The network training module 56 is specifically configured to: acquire head sample data and head labeling data; process the head sample data with a mosaic data enhancement method to obtain head training data; and train the constructed head detection network with the head training data and the head labeling data to obtain the trained head detection network.
Optionally, training the constructed head detection network with the head training data follows this rule: when matching an anchor, the distance between the anchor's centre point and the centre point of the ground-truth box and the intersection-over-union (IoU) between the anchor and the ground-truth box are calculated, and the positive and negative anchor samples are determined according to preset hyper-parameters together with that centre-point distance and IoU.
Optionally, the apparatus further comprises a transmitting module 57;
the sending module 57 is configured to send the people flow information to a bus management platform so as to schedule bus shifts in real time.
The device can execute the method provided by any embodiment of the disclosure, and has the functional modules and beneficial effects corresponding to executing that method.
Example III
Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present disclosure, and as shown in fig. 6, an electronic device 60 according to the present embodiment may include a memory 61 and a processor 62.
a memory 61, for storing computer programs (such as application programs and functional modules implementing the above method for detecting the people flow at a bus station), computer instructions, and the like;
the above computer programs, computer instructions, etc. may be stored, in partitions, in one or more memories 61, and may be invoked by the processor 62;
a processor 62, for executing the computer programs stored in the memory 61 to realize the steps of the method in the above embodiment.
Reference may be made to the description of the foregoing method embodiments.
The memory 61 and the processor 62 may be separate structures or may be integrated into one structure. When the memory 61 and the processor 62 are separate structures, they may be coupled by a bus 63.
The electronic device of this embodiment may execute the technical solution in the method of the first embodiment; its specific implementation process and technical principle are described in the related descriptions of the first embodiment and are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical functional division, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the couplings or direct couplings or communication connections shown or discussed may be through some ports, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

The foregoing is merely illustrative of the present application; the present application is not limited thereto, and any variation or substitution that would readily occur to a person skilled in the art falls within its scope. Therefore, the protection scope of the application is subject to the protection scope of the claims.