Disclosure of Invention
In view of the technical problems in the prior art, the invention provides an airport passenger flow distribution prediction system based on airport WIFI AP records and flight scheduling records, to solve the problem of predicting the distribution of airport passenger flow. With the predicted spatio-temporal distribution of passengers, planning and arrangements can be made in advance, so that airport resources are used more effectively and airport service is improved.
The technical solution is an airport passenger flow distribution prediction method based on WIFI AP (wireless access point) records, comprising the following steps: acquiring the WIFI AP records from a control center; preprocessing the WIFI AP records; classifying the WIFI APs according to the number of access devices recorded for each WIFI AP (specifically, the variance of that number); constructing a training sample set for each class of WIFI AP and building a regression model from each training sample set; and constructing a test sample set and predicting the airport passenger flow distribution.
The preprocessing specifically comprises: performing missing value processing on the acquired WIFI AP records, in which missing data of a WIFI AP are filled with the average device connection count of that WIFI AP at the same moment over the preset number of days D preceding the missing data; performing dirty data processing by smoothing the filled data with an ARMA (autoregressive moving average) model; and performing data reduction on the WIFI AP data after dirty data processing, averaging the connection count over each preset time period T according to the formula:

$$r_i = \frac{1}{T}\sum_{j=1}^{T} x_{ij}$$

where $r_i$ is the device connection count of the ith time period after reduction, and $x_{ij}$ is the number of devices connected to the WIFI AP at the jth moment of the ith time period.
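The missing-value rule above can be illustrated with a minimal Python sketch. It assumes one WIFI AP's counts are already arranged as one list per day on a common grid of moments, with None marking a gap; the function name fill_missing and the fallback of 0 when no earlier day exists are illustrative assumptions, not part of the described method.

```python
from statistics import mean
from typing import List, Optional


def fill_missing(history: List[List[Optional[float]]], D: int) -> List[List[float]]:
    """Fill gaps in one WIFI AP's per-day count series (hypothetical helper).

    history[k][j] is the device connection count on day k at moment j,
    with None marking a missing value. A gap is replaced by the mean count
    at the same moment j over the up to D preceding days.
    """
    filled: List[List[float]] = []
    for k, day in enumerate(history):
        new_day: List[float] = []
        for j, value in enumerate(day):
            if value is None:
                # Earlier days are already filled, so every lookup has a value.
                previous = [filled[d][j] for d in range(max(0, k - D), k)]
                # Assumption: with no earlier day available, fall back to 0.
                value = mean(previous) if previous else 0.0
            new_day.append(value)
        filled.append(new_day)
    return filled
```

Filling day by day means a gap can draw on days that were themselves filled earlier, which keeps every lookup defined.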
Classifying the WIFI APs specifically comprises: calculating the variance of the device connection count of each WIFI AP, sorting the WIFI APs by variance in descending order, and dividing them into two classes by the 80/20 rule (Pareto principle), wherein the WIFI APs with smaller variance are first-class WIFI APs and those with larger variance are second-class WIFI APs.
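The variance-based 80/20 split can be sketched as follows. This is a minimal illustration assuming each AP's connection counts are available as one sequence keyed by an AP identifier; the use of population variance and rounding the 20% share to the nearest whole AP are assumptions, and split_by_variance is a hypothetical name.

```python
from statistics import pvariance
from typing import Dict, List, Tuple


def split_by_variance(
    series: Dict[str, List[float]], top_share: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Split WIFI APs into (set1, set2) by the 80/20 rule on count variance."""
    # Sort AP ids by the variance of their connection-count sequence, largest first.
    ranked = sorted(series, key=lambda ap: pvariance(series[ap]), reverse=True)
    n_second = round(len(ranked) * top_share)  # assumption: round to nearest AP
    set2 = ranked[:n_second]  # top 20% by variance: second class
    set1 = ranked[n_second:]  # remaining 80%: first class
    return set1, set2
```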
For the first-class WIFI APs, the data of the most recent D days are taken to build a first-class WIFI AP training set.
For the second-class WIFI APs, the data of the most recent D days are taken, and a second-class training set is constructed through label extraction and feature extraction. Features and a label together form a sample: the features describe the attributes of the sample, and the label is the target value annotated for the sample.
The second-class training set is constructed as follows: the device connection count y of the WIFI AP numbered i at moment j forms a sample x(i, j, F, y), where F denotes the features of the sample, which comprise three groups of sub-features: (1) historical features: for the same moment of the WIFI AP, the mean, minimum, maximum and variance of its connection count across days; (2) flight features: using the gate positions recorded in the flight schedule, the number of departures at each gate within preset time windows (10, 30, 60 and 120 minutes) is counted, associated with the positions of the WIFI APs, and merged into the sample; (3) position features: the area, floor, group number and coordinates of the WIFI AP.
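The historical features in (1) can be computed per moment with a short sketch like the one below. It assumes the AP's counts are stored as one list per day on a common grid of moments; history_features and the choice of population variance are assumptions for illustration.

```python
from statistics import mean, pvariance
from typing import Dict, List


def history_features(days: List[List[float]], j: int) -> Dict[str, float]:
    """Historical features of one WIFI AP at moment j (hypothetical helper).

    days[k][j] is the connection count on day k at moment j; statistics are
    taken across days for the same moment.
    """
    values = [day[j] for day in days]
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "var": pvariance(values),  # assumption: population variance
    }
```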
For the first-class WIFI APs, the predicted value $y_{ij}$ of the WIFI AP numbered i at moment j is calculated from the first-class WIFI AP training set according to the formula

$$y_{ij} = \frac{1}{D}\sum_{k=1}^{D} x_{ijk}$$

and the first-class WIFI AP regression model is constructed as

$$Y_1 = \{\, y_{ij} \mid i \in \mathrm{set1} \,\}$$

where $x_{ijk}$ is the number of devices connected to the WIFI AP numbered i at moment j on the kth day, and set1 is the set of first-class WIFI APs. The first-class model $Y_1$ yields the predicted device connection counts of the first-class WIFI APs.
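A minimal sketch of a first-class predictor follows, assuming y_ij is the mean of x_ijk over the last D days and that the data for each AP are stored as x[i][k][j]. The dictionary layout and the name predict_first_class are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def predict_first_class(
    x: Dict[str, List[List[float]]], set1: Iterable[str]
) -> Dict[Tuple[str, int], float]:
    """Y1: predicted count y_ij for every first-class AP i and moment j.

    x[i][k][j] is the count of AP i on day k (of the last D days) at moment j.
    Assumption: y_ij is the mean of x_ijk over those D days.
    """
    Y1: Dict[Tuple[str, int], float] = {}
    for i in set1:
        days = x[i]
        D = len(days)
        for j in range(len(days[0])):
            Y1[(i, j)] = sum(day[j] for day in days) / D
    return Y1
```

Only APs listed in set1 are predicted; the second-class APs are left to the second model.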
For the second-class WIFI APs, the variance of the device connection count is high. For these WIFI APs, labels and features are extracted from the data of the most recent D days before the forecast day to obtain the second-class training sample set. The predicted value $y_{ij}$ of the WIFI AP numbered i at moment j is calculated as

$$y_{ij} = h(x_{ij})$$

and the second-class regression model is constructed as

$$Y_2 = \{\, y_{ij} \mid i \in \mathrm{set2} \,\}$$

where $x_{ij}$ is the prediction sample, obtained in the same way as the samples of the second-class training set but with its label left empty; set2 is the set of second-class WIFI APs; and h is a GBDT regression model with optimal leaf splitting, trained on the second-class training set. The second-class model $Y_2$ yields the predicted device connection counts of the second-class WIFI APs.
The first-class and second-class models are integrated according to the formula $Y = Y_1 \cup Y_2$, and their prediction results are combined into the final prediction result. The prediction result is the device access count of each WIFI AP at each moment in the prediction period; from it, information such as the number and density of people in the area where each WIFI AP is located is obtained.
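Deriving area-level flow from per-AP predictions can be sketched as a grouped sum. This assumes device count is used as a proxy for people and that a mapping from AP to area (hypothetically named ap_area here) has been parsed from the AP labels. Computing density would additionally need each area's size, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Tuple


def area_flow(
    predictions: Dict[Tuple[str, int], float], ap_area: Dict[str, str]
) -> Dict[Tuple[str, int], float]:
    """Sum predicted device counts of all APs in the same area at each moment."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for (ap, j), count in predictions.items():
        totals[(ap_area[ap], j)] += count
    return dict(totals)
```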
Because the variances of the WIFI AP connection counts, once sorted, exhibit a long-tail distribution, the method divides the WIFI APs into two classes by the 80/20 rule and models each class separately; compared with a method based on a single prediction model, its prediction results are more accurate.
Detailed Description
The technical solutions of the present application are described clearly and completely below with reference to the accompanying drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present application, and the technical solutions and the scope of the claims of the present invention are not limited thereby. All other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of protection of the present application.
Fig. 1 is a flowchart of an airport passenger flow distribution prediction method based on an airport WIFI AP record and a flight scheduling record provided by the present invention, which specifically includes:
Acquiring the WIFI AP records and flight scheduling records from the control center, and selecting the records of the most recent preset number of days D (e.g., 30 days). A WIFI AP record comprises three columns: the first column is the label of the WIFI AP, which contains its inherent information, mainly the area, floor, group number and coordinates of the WIFI AP; the second column is the device connection count of the WIFI AP; and the third column is a timestamp. A flight scheduling record comprises four columns: flight number, scheduled take-off/landing time, actual take-off/landing time, and gate information.
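The two record layouts can be expressed as simple data classes, with a helper that keeps the last D days and groups WIFI AP records per AP. This is a sketch only: the class and field names, the label string format, and recent_by_ap are hypothetical, and the actual export format of the control center is not specified.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WifiApRecord:
    label: str          # AP label: area, floor, group number, coordinates
    connections: int    # device connection count
    timestamp: datetime


@dataclass(frozen=True)
class FlightRecord:
    flight_number: str
    scheduled_time: datetime         # scheduled take-off/landing time
    actual_time: Optional[datetime]  # actual take-off/landing time, if known
    gate: str


def recent_by_ap(
    records: List[WifiApRecord], now: datetime, D: int = 30
) -> Dict[str, List[WifiApRecord]]:
    """Keep the last D days of records and group them per AP, time-ordered."""
    cutoff = now - timedelta(days=D)
    by_ap: Dict[str, List[WifiApRecord]] = defaultdict(list)
    for r in records:
        if cutoff <= r.timestamp <= now:
            by_ap[r.label].append(r)
    for rows in by_ap.values():
        rows.sort(key=lambda r: r.timestamp)
    return dict(by_ap)
```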
Missing value processing is performed on the acquired WIFI AP records and flight scheduling records. Missing data of a WIFI AP are filled with the average device connection count of that WIFI AP at the corresponding moment over the preset number of days preceding the missing data.
Dirty data processing is then performed on the WIFI AP records after missing value processing, by smoothing the data with an ARMA model. For each WIFI AP, the input is its device access count over continuous time and the output is the ARMA-processed device access count over the same period; compared with the input, the output varies more smoothly over time. Data reduction is then performed on the WIFI AP data after dirty data processing: the WIFI AP connection count is averaged over each predetermined time period T (e.g., 10 minutes), so that one record is produced per period T. The device connection count $r_i$ of the ith predetermined time period after reduction is calculated as

$$r_i = \frac{1}{T}\sum_{j=1}^{T} x_{ij}$$

where $x_{ij}$ is the number of devices connected to the WIFI AP in the jth minute of the ith predetermined time period, with T expressed in minutes.
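The reduction step can be sketched as averaging consecutive blocks of T per-minute values. This minimal version assumes the smoothed series has exactly one value per minute and a length that is a multiple of T; reduce_by_period is a hypothetical name, and a real pipeline would need a policy for a trailing partial period.

```python
from typing import List


def reduce_by_period(per_minute: List[float], T: int = 10) -> List[float]:
    """Data reduction: r_i = (1/T) * sum of x_ij over the T minutes of period i."""
    if T <= 0 or len(per_minute) % T:
        raise ValueError("series length must be a positive multiple of T")
    return [sum(per_minute[s:s + T]) / T for s in range(0, len(per_minute), T)]
```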
For each WIFI AP, the variance of the device connection count is calculated; the WIFI APs are sorted by variance in descending order and divided into two classes by the 80/20 rule. The WIFI APs with smaller variance are first-class WIFI APs, and those with larger variance are second-class WIFI APs. The variance of a WIFI AP is computed as the variance of the sequence formed by its device access counts at all moments. Under the 80/20 rule, the 20% of WIFI APs with the largest variance are second-class WIFI APs, and the remaining 80% with smaller variance are first-class WIFI APs.
For the first-class WIFI APs, the data of the most recent D days are taken to build a first-class WIFI AP training set consisting of samples x(i, j, y), where i is the WIFI AP number, j is a moment, and y is the device connection count of the WIFI AP numbered i at moment j.
For the second-class WIFI APs, labels are extracted from the data of the most recent D days before the forecast day, the label being the device connection count of the WIFI AP at a given moment, and features are extracted from the acquired data, namely the WIFI AP records and the flight scheduling records.
The features comprise three parts:
(1) Historical features: for each moment of the WIFI AP, the mean, minimum, maximum and variance of its connection count at that moment across days.
(2) Flight features: flights are one of the main factors driving fluctuations in the connection count. Using the gate information of each flight, the number of flights taking off and landing at each gate within each preset time interval is counted; the gate positions are then associated with the WIFI AP positions and the counts merged to obtain the flight features.
(3) Position features: the area, floor, group number and coordinates of the WIFI AP.
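The flight features in (2) can be sketched as window counts. This version assumes that each flight event is reduced to a (gate, time) pair, that counting looks forward over [t, t + w) from the sample moment t, and that the gate-to-AP association is given as a set of gates per AP. The direction of the window, ap_gates, and flight_features are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Sequence, Set, Tuple

WINDOWS_MIN: Tuple[int, ...] = (10, 30, 60, 120)


def flight_features(
    t: datetime,
    ap_gates: Set[str],
    flights: Iterable[Tuple[str, datetime]],
    windows: Sequence[int] = WINDOWS_MIN,
) -> List[int]:
    """Count flight events at the AP's associated gates in [t, t + w) per window w.

    flights holds (gate, event time) pairs; ap_gates is a hypothetical
    gate-to-AP association for this AP.
    """
    events = [when for gate, when in flights if gate in ap_gates]
    return [
        sum(1 for when in events if t <= when < t + timedelta(minutes=w))
        for w in windows
    ]
```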
For the first-class WIFI APs, the device connection count $y_{ij}$ of the WIFI AP numbered i at moment j is calculated from the first-class WIFI AP training set according to the formula

$$y_{ij} = \frac{1}{D}\sum_{k=1}^{D} x_{ijk}$$

and the first-class WIFI AP regression model is constructed as

$$Y_1 = \{\, y_{ij} \mid i \in \mathrm{set1} \,\}$$

where $x_{ijk}$ is the number of devices connected to the WIFI AP numbered i at moment j on the kth day, and set1 is the set of first-class WIFI APs. Prediction with the first-class model $Y_1$ gives the result $P_1$, the predicted device connection count of each first-class WIFI AP at each moment.
For the second-class WIFI APs, the variance of the device connection count is high. For such WIFI APs, the predicted value $y_{ij}$ of the WIFI AP numbered i at moment j is calculated as

$$y_{ij} = h(x_{ij})$$

and the second-class regression model is constructed as

$$Y_2 = \{\, y_{ij} \mid i \in \mathrm{set2} \,\}$$

where $x_{ij}$ is the test sample, set2 is the set of second-class WIFI APs, and h is a GBDT regression model with optimal leaf splitting, trained on the second-class training set. Prediction with the second-class model $Y_2$ gives the result $P_2$, the device connection count of each second-class WIFI AP predicted by the model trained on the second-class WIFI AP training set.
The training set here is the second-class training set, i.e. the training sample set formed from the second-class WIFI APs. Training consists of inputting the training set and building a prediction model with the GBDT algorithm; the prediction set is then input and predicted with the built GBDT model. Each sample (record) consists of features and a label, with one group of features corresponding to one label, the label being the device connection count of the WIFI AP.
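The text names a GBDT with optimal leaf splitting but no library or hyperparameters. The sketch below shows only the core gradient-boosting loop for squared loss, fitting each new tree to the current residuals; it uses depth-1 trees (stumps) rather than deeper trees grown by best-leaf splitting, assumes all features are numeric (categorical position features would need encoding), and TinyGBDT with its parameters is a hypothetical illustration. A production system would more likely use an established GBDT library.

```python
from typing import List, Optional, Sequence, Tuple

# (feature index, threshold, value if x[f] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def _fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Depth-1 regression tree minimizing squared error on residuals r."""
    best: Optional[Tuple[float, Stump]] = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[f] <= thr]
            right = [ri for row, ri in zip(X, r) if row[f] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, (f, thr, lv, rv))
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1]


def _apply(stump: Stump, row: Sequence[float]) -> float:
    f, thr, lv, rv = stump
    return lv if row[f] <= thr else rv


class TinyGBDT:
    """Gradient boosting with squared loss and decision stumps (illustrative only)."""

    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "TinyGBDT":
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            # For squared loss the negative gradient is simply the residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = _fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * _apply(stump, row) for p, row in zip(pred, X)]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [
            self.base + self.learning_rate * sum(_apply(s, row) for s in self.stumps)
            for row in X
        ]
```

In this setting each training row would be a feature vector F of a second-class sample and each target its label y; prediction samples use the same features with no label.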
The first-class and second-class models are integrated according to the formula $Y = Y_1 \cup Y_2$.
The prediction results of the first-class and second-class models are integrated according to the formula $P = P_1 \cup P_2$ and serve as the final prediction result, namely the device access count of each WIFI AP at each moment.
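Because every WIFI AP belongs to exactly one class, the union of the two prediction results can be sketched as a dictionary merge keyed by (AP, moment) that rejects overlaps. The key layout and merge_predictions are assumptions for illustration.

```python
from typing import Dict, Tuple

Prediction = Dict[Tuple[str, int], float]


def merge_predictions(p1: Prediction, p2: Prediction) -> Prediction:
    """P = P1 ∪ P2; the two AP classes are disjoint, so keys must not overlap."""
    overlap = p1.keys() & p2.keys()
    if overlap:
        raise ValueError(f"AP/moment keys predicted by both models: {sorted(overlap)}")
    return {**p1, **p2}
```

Raising on overlap makes a misconfigured split between set1 and set2 visible instead of silently keeping one model's value.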