Online spectroscopic chromaticity measurement method for urban drinking water
Technical Field
The invention relates to chromaticity measurement technology for urban drinking water, and in particular to an online spectroscopic method for measuring the chromaticity of urban drinking water.
Background
Urban drinking water is the drinking and domestic water supplied to urban residents, so online measurement of its quality is particularly important. Chromaticity is a quantitative index of the color of natural water or of treated water of various kinds, and is one of the sensory indexes of water quality. Pure water is a colorless, odorless and tasteless transparent liquid; when certain substances are present (such as soluble organic substances, some inorganic ions, or colored suspended particles), the water takes on a color, and the degree of that color is its chromaticity.
Water color is traditionally measured by the platinum-cobalt standard colorimetric method: a color standard solution is prepared from potassium chloroplatinate (K2PtCl6) and cobalt chloride (CoCl2·6H2O), and the color produced when 1 L of water contains 2.419 mg of potassium chloroplatinate and 2.00 mg of cobalt chloride, that is, 1 mg of platinum (Pt) per liter, is defined as 1 degree. The method requires chemical reagents, is complicated to operate, readily causes secondary pollution, and cannot provide automatic, rapid, in-situ measurement of water chromaticity.
Traditional spectroscopic water quality detection mainly uses single-wavelength methods and multi-wavelength combination methods. The single-wavelength method measures the absorbance of the water at 550 nm and obtains the chromaticity by linear regression; because it uses very little of the available data, the prediction accuracy of the model is low. The common multi-wavelength combination method is the partial least squares (PLS) method, which simplifies the data by successively extracting principal components and then builds a regression model; it finds linear features well but cannot capture nonlinear features.
Disclosure of Invention
The invention provides an online spectroscopic chromaticity measurement method for urban drinking water. It addresses the following technical problems: existing chromaticity measurement requires chemical reagents, is complicated to operate, readily causes secondary pollution, and cannot provide automatic, rapid, in-situ measurement; and existing spectroscopic water quality detection either uses very little of the available data, which gives low prediction accuracy, or cannot capture nonlinear features.
To achieve this purpose, the invention provides the following technical scheme:
An online spectroscopic chromaticity measurement method for urban drinking water, characterized by comprising the following steps:
1) acquisition of spectral data
Measuring ultraviolet-visible-near infrared transmission spectrum curves of a plurality of measured water bodies and ultraviolet-visible-near infrared transmission spectrum curves of standard deionized water by adopting an ultraviolet-visible-near infrared full spectrum analysis module;
2) standard water reference ratio
The standard water reference ratio $I$ is obtained by the following formula:
$$I = \frac{I_1}{I_0}$$
where $I_1$ is the transmission spectrum of the measured water body and $I_0$ is the transmission spectrum of standard deionized water;
3) absorbance conversion
The standard water reference was converted to an absorbance spectrum by the following formula:
A=-log(I)
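Steps 2) and 3) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `absorbance`, the use of NumPy, and the base-10 logarithm are assumptions (the text writes only `log`), and the intensity values are made up.

```python
import numpy as np

def absorbance(sample_transmission, reference_transmission):
    """Convert transmission spectra to absorbance relative to deionized water."""
    i1 = np.asarray(sample_transmission, dtype=float)
    i0 = np.asarray(reference_transmission, dtype=float)
    ratio = i1 / i0            # step 2): I = I1 / I0, pixel by pixel
    return -np.log10(ratio)    # step 3): A = -log(I), base 10 assumed

# Hypothetical intensities: one reference spectrum, two measured samples.
i0 = np.array([1000.0, 2000.0, 4000.0])
i1 = np.array([[100.0, 2000.0, 400.0],
               [1000.0, 200.0, 4000.0]])
A = absorbance(i1, i0)
print(A)
```

Each row of `A` is then one absorbance spectrum, in the layout that step 4) expects. A pixel where the sample transmits one tenth of the reference has an absorbance of about 1.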
4) separation of training and validation sets
All the obtained absorbance spectra are divided into a training set and a test set, the training set containing more than 70% of the total absorbance spectrum samples. The absorbance spectra of the training set are recorded as a matrix $A_{mn}$ with $m$ rows and $n$ columns, where $m$ is the number of samples in the training set and $n$ is the number of pixels per sample; each row of $A_{mn}$ is the spectral curve of one sample, and each column holds the pixel values of all samples at the same spectral position. The $m$ chroma values corresponding to the $m$ training samples are recorded as a vector $c = [c_1, c_2, \dots, c_m]$;
5) Calculating an optimal segmentation variable j and an optimal segmentation point s
5.1) determining the optimal segmentation point of the first column
5.1.1) setting the segmentation variable $j_1$ to the first column vector $x_1 = [A_{11}, A_{21}, \dots, A_{m1}]$ and the segmentation point $s_1$ to 1;
5.1.2) dividing the first column vector into two sub-regions $R_1$ and $R_2$ according to the segmentation point $s_1$;
wherein: the number of elements in $R_1$ is $p = s_1 = 1$, and the element set of region $R_1$ is $R_1(j_1, s_1) = A_{11}$;
the number of elements in $R_2$ is $q = m - p$, and the element set of region $R_2$ is $R_2(j_1, s_1) = [A_{21}, A_{31}, \dots, A_{m1}]$;
5.1.3) calculating the predicted output values of the sub-regions $R_1$ and $R_2$,
wherein: $x_{1i} \in R_1(j_1, s_1) = A_{11}$;
$x_{2i} \in R_2(j_1, s_1) = [A_{21}, A_{31}, \dots, A_{m1}]$;
$f(\epsilon)$ is a preset function whose output value is the median of the elements of the vector $\epsilon$;
5.1.4) calculating $M(s_1)$;
5.1.5) replacing the segmentation point $s_1$ in turn with $s_2 = 2$ through $s_m = m$, and obtaining $M(s_2), \dots, M(s_m)$ respectively by the methods of step 5.1.2) to step 5.1.4);
5.1.6) taking the minimum value $M(s_w)$, $w \in [1, m]$, of $M(s_1), M(s_2), \dots, M(s_m)$; the $s_w$ corresponding to $M(s_w)$ is the optimal segmentation point of the segmentation variable $j_1$, recorded as $M(j_1, s_{w1})$;
5.2) determining the optimal segmentation points of the rest columns
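Traversing, by the method of step 5.1), from the segmentation variable $j_2$, which is the second column vector $x_2$, to the segmentation variable $j_n$, which is the $n$-th column vector $x_n$, the results being recorded as $M(j_2, s_{w2}), \dots, M(j_n, s_{wn})$;
wherein $x_2 = [A_{12}, A_{22}, \dots, A_{m2}]$;
$x_n = [A_{1n}, A_{2n}, \dots, A_{mn}]$;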
5.3) determining the optimal segmentation variable j and the optimal segmentation point s
Calculating the minimum value $M(j_a, s_{wb})$, $a \in [1, n]$, $b \in [1, m]$, of $M(j_1, s_{w1}), M(j_2, s_{w2}), \dots, M(j_n, s_{wn})$, and recording $j_a$ and $s_{wb}$ as the optimal segmentation variable $j$ and the optimal segmentation point $s$ respectively;
6) establishing a nonlinear full-spectrum colorimetric quantitative analysis model
6.1) dividing the column vector of the optimal segmentation variable $j$ into two sub-regions $R_1$ and $R_2$ according to the optimal segmentation point $s$;
wherein: the column vector determined by the optimal segmentation variable $j$ is denoted $x$, and $x_i$ is one element of $x$;
the number of elements in $R_1$ is $p = s$, and the element set of region $R_1$ is denoted $R_1(j, s)$, which consists of all $x_i$ in $x$ with $x_i \le s$;
the number of elements in $R_2$ is $q = m - p$, and the element set of region $R_2$ is denoted $R_2(j, s)$, which consists of all $x_i$ in $x$ with $x_i > s$;
6.2) determining the optimal segmentation points of the sub-regions $R_1(j, s)$ and $R_2(j, s)$ separately by the method of step 5.1), and, according to these optimal segmentation points, dividing the sub-region $R_1(j, s)$ into new secondary sub-regions and the sub-region $R_2(j, s)$ into new secondary sub-regions;
6.3) dividing each secondary sub-region by the method of step 5.1);
6.4) repeating step 6.3), dividing each resulting sub-region again until the number $q$ of all sub-regions after $n$-level division reaches a set value, where $q = 2^n$, which completes the establishment of the nonlinear full-spectrum chromaticity quantitative analysis model;
7) Testing
Inputting a test-set absorbance spectrum sample $B$ into the nonlinear full-spectrum chromaticity quantitative analysis model of step 6.4); if sample $B$ is assigned to sub-region $R_k$, $k \in \{1, \dots, q\}$, and the chroma values corresponding to the elements in $R_k$ are $c_k$, then the chroma value $z$ of sample $B$ is:
$z = f(c_k)$.
Further, in step 6.4), the n-level division is a five-level division, and the number of sub-regions is 32.
Further, in step 5.1.3), the statement that $f(\epsilon)$ is a preset function whose output value is the median of the elements of the vector specifically means:
all elements of the vector $\epsilon$ are sorted and recorded as a vector $\theta$; if the number of elements in $\epsilon$ is greater than 6, the three largest and three smallest elements of $\theta$ are removed and the remainder is recorded as a vector $\beta$, and the median $\pi$ of the elements of $\beta$ is the output value of $f(\epsilon)$; if the number of elements in $\epsilon$ is less than or equal to 6, the median of the elements of $\epsilon$ is the output value of $f(\epsilon)$.
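The trimmed median described above can be sketched as follows. The function name `f` follows the text; the use of Python's `statistics.median`, which averages the two middle values when the count is even, is an assumption, since the text does not say how an even count is handled. The input vectors are made up.

```python
from statistics import median

def f(values):
    """Trimmed median: with more than 6 values, drop the three smallest
    and the three largest before taking the median."""
    theta = sorted(values)        # sorted vector theta
    if len(theta) > 6:
        beta = theta[3:-3]        # vector beta after trimming both ends
        return median(beta)
    return median(theta)          # 6 or fewer values: plain median

print(f([2, 9, 4, 7, 1, 8, 3, 6, 5]))  # 9 values, beta = [4, 5, 6]
print(f([3, 1, 2]))                    # 3 values, no trimming
```

Trimming both ends before taking the median makes the output of a sub-region less sensitive to a few outlying chroma values.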
Further, in step 4), the total absorbance spectrum samples are also divided into a validation set;
a verification step A) is further included between step 6) and step 7): the validation set is input as samples into the nonlinear full-spectrum chromaticity quantitative analysis model, and the accuracy of the model on the validation set is verified.
Compared with the prior art, the invention has the advantages that:
1. The invention obtains a large volume of spectral data by measuring the ultraviolet-visible-near infrared transmission spectrum of the water body, and analyzes the spectral data to establish a nonlinear full-spectrum quantitative analysis model of water chromaticity, with which the chromaticity measurement is completed.
2. Compared with traditional single-wavelength and multi-wavelength modeling methods, the method makes fuller use of the spectral information, mines its nonlinear features in a high-dimensional space, and establishes a more accurate chromaticity quantitative analysis model, so the measurement accuracy is higher and the method is better suited to drinking water of relatively good quality.
3. The invention establishes a full-spectrum quantitative analysis model for water chromaticity prediction by measuring the transmission spectrum of the water body and predicts the chromaticity from the spectral data, with the advantages of high accuracy, high measurement speed and no secondary pollution.
4. The traditional linear modeling method, partial least squares, gives a correlation coefficient $R^2$ = 0.6293 and a mean square error MSE = 2.3469; the model established by the method of the invention gives $R^2$ = 0.9418 and MSE = 0.4474. The method therefore significantly improves the prediction accuracy of chromaticity, reducing the mean square error to about 19% of the PLS value.
Drawings
FIG. 1 is a flow chart of the online spectroscopic chromaticity measurement method for urban drinking water according to the invention;
FIG. 2 is a graph of the transmission spectra in the online spectroscopic chromaticity measurement method for urban drinking water according to the invention;
FIG. 3 shows the chromaticity prediction results in the online spectroscopic chromaticity measurement method for urban drinking water according to the invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
As shown in FIG. 1, the online spectroscopic chromaticity measurement method for urban drinking water according to the invention comprises the following steps:
1) acquisition of spectral data
Measuring the ultraviolet-visible-near infrared transmission spectrum curves of a plurality of measured water bodies and the ultraviolet-visible-near infrared transmission spectrum curve of standard deionized water by adopting an ultraviolet-visible-near infrared full spectrum analysis module, wherein the ultraviolet-visible-near infrared transmission spectrum curves are shown in figure 2;
2) standard water reference ratio
The standard water reference ratio $I$ is obtained by the following formula:
$$I = \frac{I_1}{I_0}$$
where $I_1$ is the transmission spectrum of the measured water body and $I_0$ is the transmission spectrum of standard deionized water;
3) absorbance conversion
The standard water reference was converted to absorbance spectrum a by the following formula:
A=-log(I)
4) separation of training and validation sets
All the obtained absorbance spectra are divided into a training set, a validation set and a test set, the training set containing more than 70% of the total absorbance spectrum samples; in this embodiment the training set accounts for 80% of the total samples. The absorbance spectra of the training set are recorded as a matrix $A_{mn}$ with $m$ rows and $n$ columns, where $m$ is the number of samples in the training set and $n$ is the number of pixels per sample; each row of $A_{mn}$ is the spectral curve of one sample, and each column holds the pixel values of all samples at the same spectral position. The $m$ chroma values corresponding to the $m$ training samples are recorded as a vector $c = [c_1, c_2, \dots, c_m]$; the chroma value of each sample is determined by the $n$ pixels of that sample, and the invention seeks a method of calculating the chroma value from the $n$ pixels;
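A possible way to perform this division in Python is sketched below. The 80% training share comes from this embodiment; the random shuffling, the fixed seed, and the equal division of the remaining 20% between validation and test sets are assumptions, as are the names `split_spectra`, `train_frac` and `val_frac`. The spectra are random placeholder data.

```python
import numpy as np

def split_spectra(A, c, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    order = np.random.default_rng(seed).permutation(len(A))
    n_train = int(round(train_frac * len(A)))
    n_val = int(round(val_frac * len(A)))
    idx_train = order[:n_train]
    idx_val = order[n_train:n_train + n_val]
    idx_test = order[n_train + n_val:]          # whatever remains
    return ((A[idx_train], c[idx_train]),
            (A[idx_val], c[idx_val]),
            (A[idx_test], c[idx_test]))

# Placeholder data: 50 spectra with 8 pixels each and one chroma value per spectrum.
A = np.random.default_rng(1).random((50, 8))
c = np.arange(50, dtype=float)
(train_A, train_c), (val_A, val_c), (test_A, test_c) = split_spectra(A, c)
print(train_A.shape, val_A.shape, test_A.shape)
```

Here `train_A` plays the role of the matrix with m rows and n columns, and `train_c` the role of the vector c.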
5) training by using the absorbance spectrum in the training set, and calculating the optimal segmentation variable j and the optimal segmentation point s
5.1) determining the optimal segmentation point of the first column
5.1.1) setting the segmentation variable $j_1$ to the first column vector $x_1 = [A_{11}, A_{21}, \dots, A_{m1}]$ and the segmentation point $s_1$ to 1;
5.1.2) dividing the first column vector into two sub-regions $R_1$ and $R_2$ according to the segmentation point $s_1$;
wherein: the number of elements in $R_1$ is $p = s_1 = 1$, and the element set of region $R_1$ is $R_1(j_1, s_1) = A_{11}$;
the number of elements in $R_2$ is $q = m - p$, and the element set of region $R_2$ is $R_2(j_1, s_1) = [A_{21}, A_{31}, \dots, A_{m1}]$;
5.1.3) calculating the predicted output values of the sub-regions $R_1$ and $R_2$,
wherein: $x_{1i} \in R_1(j_1, s_1) = A_{11}$;
$x_{2i} \in R_2(j_1, s_1) = [A_{21}, A_{31}, \dots, A_{m1}]$;
wherein $f(\epsilon)$ is a preset function defined as follows: all elements of the vector $\epsilon$ are sorted and recorded as a vector $\theta$; the three largest and three smallest elements of $\theta$ are removed and the remainder is recorded as a vector $\beta$; the median $\pi$ of the elements of $\beta$ is the output value of $f(\epsilon)$; when the number of elements in $\epsilon$ is less than or equal to 6, $f(\epsilon)$ is simply the median of the elements of $\epsilon$. For example, if $\epsilon_1 = [1,3,4,7,4,2,8,9,5,3,11]$, then $\theta = [1,2,3,3,4,4,5,7,8,9,11]$, $\beta = [3,4,4,5,7]$ and $f(\epsilon_1) = 4$; if $\epsilon_2 = [1,3,4,7,4]$, then $f(\epsilon_2) = 4$;
5.1.4) calculating $M(s_1)$,
in which: $x_{1i} \in R_1(j_1, s_1) = A_{11}$;
$x_{2i} \in R_2(j_1, s_1) = [A_{21}, A_{31}, \dots, A_{m1}]$;
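The formulas of steps 5.1.3) and 5.1.4) did not survive in this text. One plausible form is given below as a sketch only. It assumes that the predicted output of each sub-region is $f$ applied to the chroma values of the samples in that sub-region, which agrees with $z = f(c_k)$ in the testing step, and that $M$ is the squared-error criterion commonly used for regression trees, which the source does not confirm. The symbols $\hat{c}_1$ and $\hat{c}_2$ are introduced here for illustration.

```latex
\hat{c}_1 = f\bigl(\{\, c_i : x_{1i} \in R_1(j_1, s_1) \,\}\bigr), \qquad
\hat{c}_2 = f\bigl(\{\, c_i : x_{2i} \in R_2(j_1, s_1) \,\}\bigr)

M(s_1) = \sum_{x_{1i} \in R_1(j_1, s_1)} \bigl(c_i - \hat{c}_1\bigr)^2
       + \sum_{x_{2i} \in R_2(j_1, s_1)} \bigl(c_i - \hat{c}_2\bigr)^2
```

Under this reading, a small $M(s_1)$ means that the chroma values inside each sub-region lie close to that sub-region's predicted output, which is why the segmentation point with the smallest $M$ is selected.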
5.1.5) replacing the segmentation point $s_1$ in turn with $s_2 = 2$ through $s_m = m$, and obtaining $M(s_2), \dots, M(s_m)$ respectively by the methods of step 5.1.2) to step 5.1.4);
5.1.6) taking the minimum value $M(s_w)$, $w \in [1, m]$, of $M(s_1), M(s_2), \dots, M(s_m)$; the $s_w$ corresponding to $M(s_w)$ is the optimal segmentation point of the segmentation variable $j_1$, recorded as $M(j_1, s_{w1})$;
5.2) determining the optimal segmentation points of the rest columns
Traversing, by the method of step 5.1), from the segmentation variable $j_2$, which is the second column vector $x_2$, to the segmentation variable $j_n$, which is the $n$-th column vector $x_n$, the results being recorded as $M(j_2, s_{w2}), \dots, M(j_n, s_{wn})$;
wherein $x_2 = [A_{12}, A_{22}, \dots, A_{m2}]$;
$x_n = [A_{1n}, A_{2n}, \dots, A_{mn}]$;
5.3) determining the optimal segmentation variable j and the optimal segmentation point s
Calculating the minimum value $M(j_a, s_{wb})$, $a \in [1, n]$, $b \in [1, m]$, of $M(j_1, s_{w1}), M(j_2, s_{w2}), \dots, M(j_n, s_{wn})$, and recording $j_a$ and $s_{wb}$ as the optimal segmentation variable $j$ and the optimal segmentation point $s$ respectively;
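Step 5) as a whole amounts to an exhaustive search over columns and candidate segmentation points. The sketch below is one hedged reading of it, not the patented implementation. It assumes that a segmentation point is a threshold on the pixel value (samples at or below the threshold go to R1, the rest to R2, as in step 6.1)), that the candidate thresholds are the values observed in the column, and that M is the sum of squared deviations of the chroma values from each region's `f` output. The names `split_cost` and `best_split` are hypothetical, column indices are 0-based, and the data are made up.

```python
import numpy as np

def f(values):
    """Trimmed median from step 5.1.3)."""
    theta = np.sort(np.asarray(values, dtype=float))
    if theta.size > 6:
        theta = theta[3:-3]          # drop three smallest and three largest
    return float(np.median(theta))

def split_cost(c1, c2):
    # Assumed criterion: squared deviation from each region's f output.
    return float(np.sum((c1 - f(c1)) ** 2) + np.sum((c2 - f(c2)) ** 2))

def best_split(A, c):
    """Search every column j and candidate threshold s; return (j, s, M)."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    best = (None, None, np.inf)
    for j in range(A.shape[1]):
        x = A[:, j]
        for s in np.unique(x)[:-1]:  # the largest value would leave R2 empty
            low = x <= s
            m = split_cost(c[low], c[~low])
            if m < best[2]:
                best = (j, float(s), m)
    return best

# Hypothetical training data: 6 samples, 2 pixels; only pixel 1 tracks the chroma.
A = np.array([[0.30, 0.10], [0.10, 0.12], [0.20, 0.11],
              [0.15, 0.40], [0.25, 0.42], [0.35, 0.41]])
c = np.array([2.0, 2.0, 2.0, 9.0, 9.0, 9.0])
j, s, m = best_split(A, c)
print(j, s, m)
```

In this toy data the second pixel separates the low-chroma samples from the high-chroma samples exactly, so the search selects that column with a threshold of 0.12 and a cost of zero.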
6) establishing a nonlinear full-spectrum colorimetric quantitative analysis model
6.1) dividing the column vector of the optimal segmentation variable $j$ into two sub-regions $R_1$ and $R_2$ according to the optimal segmentation point $s$;
wherein: the column vector determined by the optimal segmentation variable $j$ is denoted $x$, and $x_i$ is one element of $x$;
the number of elements in $R_1$ is $p = s$, and the element set of region $R_1$ is denoted $R_1(j, s)$, which consists of all $x_i$ in $x$ with $x_i \le s$;
the number of elements in $R_2$ is $q = m - p$, and the element set of region $R_2$ is denoted $R_2(j, s)$, which consists of all $x_i$ in $x$ with $x_i > s$;
6.2) determining the optimal segmentation points of the sub-regions $R_1(j, s)$ and $R_2(j, s)$ separately by the method of step 5.1), and, according to these optimal segmentation points, dividing the sub-region $R_1(j, s)$ into two new secondary sub-regions and the sub-region $R_2(j, s)$ into two new secondary sub-regions, so that four secondary sub-regions are obtained;
6.3) dividing each secondary sub-region by the method of step 5.1);
6.4) repeating step 6.3), dividing each resulting sub-region again until the number $q$ of all sub-regions after $n$-level division reaches a set value, where $q = 2^n$; this embodiment divides the samples into five levels, giving 32 sub-regions in total; the value 32 is preset and gives the best effect here, and this completes the establishment of the nonlinear full-spectrum chromaticity quantitative analysis model;
7) verification
Inputting the validation set as samples into the nonlinear full-spectrum chromaticity quantitative analysis model of step 6.4), and verifying the accuracy of the model on the validation set; the results are shown in FIG. 3: the correlation coefficient $R^2$ = 0.9418 and the mean square error MSE = 0.4474;
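The two figures of merit can be computed as sketched below. The text does not define them, so this sketch assumes that MSE is the mean of the squared prediction errors and that $R^2$ is the coefficient of determination; the function names `mse` and `r_squared` and the numbers are illustrative only and are unrelated to the reported results.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between measured and predicted chroma."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up validation data: measured chroma versus model output.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(mse(measured, predicted), r_squared(measured, predicted))
```

With these made-up values one prediction is off by 1 degree, which gives an MSE of 0.25 and an $R^2$ of about 0.8.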
8) testing
Inputting an absorbance spectrum sample $B$ of the test set into the nonlinear full-spectrum chromaticity quantitative analysis model verified with the validation set, wherein sample $B$ is a vector with $n$ pixels and can be assigned directly to a sub-region $R_k$, $k \in \{1, \dots, 32\}$; the chroma values corresponding to the elements in $R_k$ are $c_k$, and the chroma value $z$ of sample $B$ is:
$z = f(c_k)$.
The invention uses full-spectrum data for modeling and uses the model for quantitative analysis of water chromaticity. Compared with traditional single-wavelength and multi-wavelength modeling methods, it makes fuller use of the spectral information, mines its nonlinear features in a high-dimensional space, and establishes a more accurate chromaticity quantitative analysis model. The traditional linear modeling method, partial least squares (PLS), gives a correlation coefficient $R^2$ = 0.6293 and a mean square error MSE = 2.3469; the method of the invention gives $R^2$ = 0.9418 and MSE = 0.4474. The analysis model based on the method of the invention therefore significantly improves the prediction accuracy of chromaticity: its mean square error is only 19% of that of the traditional PLS method.
The above is only a preferred embodiment of the invention and does not limit its technical solution; any modification made by those skilled in the art on the basis of the main technical idea of the invention falls within the technical scope of the invention.