Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a Cantonese lip reading identification method, a device and a storage medium, which can improve the identification accuracy of a trained model.
A first aspect of the invention provides a Cantonese lip reading identification method, which comprises the following steps:
acquiring a first Cantonese video clip;
cutting useless segments from the first Cantonese video clip to obtain a second Cantonese video clip, wherein the useless segments comprise segments with a human voice but no human figure and/or segments in which the human voice does not match the human image;
separating the video sequence and the audio sequence in the second Cantonese video clip, performing word segmentation on the audio sequence, generating word segmentation timestamps, and generating labels according to the segmented words and the word segmentation timestamps;
extracting face images from the video sequence, filtering out incomplete face images, and generating a sample sequence according to the filtered face images and the labels;
training a preset Cantonese lip reading identification model according to the sample sequence to obtain a trained Cantonese lip reading identification model;
and identifying a target video sequence according to the trained Cantonese lip reading identification model to obtain an identification result.
According to the embodiment of the invention, at least the following technical effects are achieved:
after the first Cantonese video clip is obtained, the segments with a human voice but no human figure and/or the segments in which the human voice does not match the human image are removed from the first Cantonese video clip to obtain the second Cantonese video clip, and a labelled sample sequence data set is generated from the second Cantonese video clip. The method can collect a word-level Cantonese lip reading sample sequence data set, filling the gap left by the current absence of a large-scale lip reading sample sequence data set; and because useless sequences have been removed from the video sequence, training the model on the sample sequence improves the recognition accuracy of the trained model.
According to some embodiments of the invention, before training the preset Cantonese lip reading recognition model according to the sample sequence, the method further comprises:
adding boundary information to the sample sequence, and encoding the sample sequence to which the boundary information has been added using Libjpeg.
According to some embodiments of the invention, the Cantonese lip reading recognition model includes a feature extraction network, an LSTM network, a three-layer BiGRU network and a mutual information maximization network, and the training process of the Cantonese lip reading recognition model includes:
extracting features from the sample sequence according to the feature extraction network, and setting a mutual information constraint between the features and the labels;
generating corresponding weights for different frames based on the labels according to the LSTM network;
classifying the features according to the three-layer BiGRU network to obtain an output result;
generating a global average feature according to the output result and the weights;
and maximizing the mutual information between the global average feature and the label according to the mutual information maximization network.
According to some embodiments of the present invention, the feature extraction network includes a 3D CNN network, a spatial max pooling layer, a ResNet network and a global average pooling layer connected in sequence, and the extracting of features from the sample sequence according to the feature extraction network and setting of a mutual information constraint between the features and the labels includes:
extracting initial features from the sample sequence according to the 3D CNN network;
compressing the initial features according to the spatial max pooling layer;
dividing the initial features into a plurality of parts, extracting the features of each part according to the ResNet network, and adding a mutual information constraint between the features and the labels;
and performing average pooling on the features to which the mutual information constraint has been added according to the global average pooling layer.
According to some embodiments of the invention, the ResNet network is a ResNet-34 network.
According to some embodiments of the present invention, the LSTM network includes an LSTM layer and a linear layer connected in sequence, and the formula for generating the corresponding weights for different frames based on the labels according to the LSTM network is:
$a_t = \mathrm{ReLU}(w_{linear} \times \mathrm{LSTM}(G)_t + b_{linear})$
wherein $G$ represents the output result of the spatial max pooling layer, $w_{linear}$ and $b_{linear}$ represent the parameters of the linear layer, $\mathrm{LSTM}(G)_t$ represents the hidden state of the LSTM layer at time step $t$, $\mathrm{ReLU}(\cdot)$ represents the ReLU function, and $a_t$ represents the weight of the frame sequence at time step $t$.
According to some embodiments of the invention, the optimization function of the mutual information maximization network comprises:
$Loss_{MI} = \mathbb{E}_{p(F,L)}[\log(MI(F,L))] + \mathbb{E}_{p(F)p(L)}[\log(1 - MI(F,L))]$
wherein $Loss_{MI}$ represents the optimization function of the global mutual information maximization network, $p(F,L)$ represents the joint distribution of the sample pair $(F,L)$, $p(F)p(L)$ represents the product of the marginal distributions of the sample pair $(F,L)$, $MI(F,L)$ represents the mutual information between $F$ and $L$, $\mathbb{E}$ denotes the mathematical expectation, $F$ represents the global average feature, and $L$ represents the label.
According to some embodiments of the invention, the loss function of the Cantonese lip reading recognition model comprises:
$Loss_{total} = Loss_{CE} + Loss_{MI}$
wherein $Loss_{CE}$ represents a cross entropy loss function computed over the classification scores of the labels $L_i$ output by the three-layer BiGRU network, and $c$ represents the total number of vocabulary classes in the sample sequence.
In a second aspect of the invention, an electronic device is provided, comprising at least one control processor and a memory communicatively coupled to the at least one control processor, the memory storing instructions executable by the at least one control processor to enable the at least one control processor to perform the Cantonese lip reading identification method described above.
In a third aspect of the present invention, there is provided a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the Cantonese lip reading identification method described above.
It should be noted that the advantages of the second and third aspects of the present invention over the prior art are the same as those of the Cantonese lip reading identification method described above, and will not be described in detail herein.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Referring to fig. 1 to 3, in one embodiment of the present invention, there is provided a Cantonese lip reading recognition method including the steps of:
Step S110, a first Cantonese video clip is obtained.
For example, the first Cantonese video clip is obtained by using the you-get tool (a video, picture and music download tool based on Python 3) to crawl Cantonese video programmes from the Internet, such as Cantonese news programmes, Cantonese variety shows, interviews with Cantonese speakers and talk shows. A minimal sketch of such a batch download is shown below.
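The following Python sketch illustrates one possible way to script the download with the you-get command-line tool; the URL list and the output directory are hypothetical placeholders, not sources used by the invention.

```python
# Illustrative sketch: batch-downloading Cantonese video programmes with you-get.
import subprocess
from pathlib import Path

def download_clips(urls, out_dir="raw_cantonese_clips"):
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        # you-get saves the video found at `url` into `out_dir`
        subprocess.run(["you-get", "-o", out_dir, url], check=True)

if __name__ == "__main__":
    download_clips([
        "https://example.com/cantonese_news_episode_1",   # hypothetical URL
        "https://example.com/cantonese_variety_show_1",   # hypothetical URL
    ])
```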
Step S120, useless segments are cut from the first Cantonese video clip to obtain a second Cantonese video clip, wherein the useless segments comprise segments with a human voice but no human figure and/or segments in which the human voice does not match the human image. Here, the useless segments in the first Cantonese video clip may be cut manually.
Step S130, the video sequence and the audio sequence in the second Cantonese video clip are separated, word segmentation is performed on the audio sequence, word segmentation timestamps are generated, and labels are generated according to the segmented words and the word segmentation timestamps.
For example, the iFlytek speech transcription tool is used to perform word segmentation on the audio sequence and generate the word segmentation timestamps; the video sequence and the audio sequence are named following the same naming scheme, and each video sequence and its corresponding audio sequence share the same name, which facilitates later pairing. All word segmentation timestamps are expanded by 0.02 s on both the left and the right, and labels are generated according to the video sequence name, the word segmentation timestamps, the pinyin of the segmented words and the order in which the words were generated. The audio sequence in this step is used to annotate the sample data set obtained later: the speech transcription tool processes the audio to produce the corresponding text, and this text is the textual content of the corresponding video sequence, i.e. it serves as the label. A sketch of this label construction appears below.
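The following sketch shows how the transcription output might be turned into label records; the input fields (word, pinyin, start and end in seconds) and the output label layout are assumptions made only for illustration and do not reproduce the transcription tool's actual format.

```python
# Illustrative sketch: building label records from transcribed word segments,
# expanding each timestamp by 0.02 s on both sides.
def build_labels(video_name, segments, pad=0.02):
    labels = []
    for idx, seg in enumerate(segments):
        start = max(0.0, seg["start"] - pad)   # expand 0.02 s to the left
        end = seg["end"] + pad                 # expand 0.02 s to the right
        labels.append({
            "video": video_name,               # same name as the paired video sequence
            "start": round(start, 3),
            "end": round(end, 3),
            "pinyin": seg["pinyin"],
            "word": seg["word"],
            "order": idx,                      # order in which the word was generated
        })
    return labels

# Example usage with one transcribed word (values are hypothetical)
labels = build_labels("clip_0001", [{"word": "你好", "pinyin": "nei5 hou2",
                                     "start": 1.20, "end": 1.56}])
```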
Step S140, face images are extracted from the video sequence, incomplete face images are filtered out, and a sample sequence is generated according to the filtered face images and the labels.
For example, the MediaPipe tool is used to extract the faces in the video sequence, and a filter is trained to filter out all images in which the face is detected abnormally (i.e. the cropped face is incomplete), finally yielding a sample sequence data set consisting of a plurality of face images. A sketch of this extraction is shown below.
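The sketch below uses MediaPipe face detection to crop faces from a video. The embodiment trains a dedicated filter to reject incomplete faces; here a simple bounding-box completeness check stands in for that filter, purely as an illustrative assumption.

```python
# Illustrative sketch: face extraction with MediaPipe and a simple completeness check.
import cv2
import mediapipe as mp

mp_face = mp.solutions.face_detection

def extract_faces(video_path):
    faces = []
    cap = cv2.VideoCapture(video_path)
    with mp_face.FaceDetection(min_detection_confidence=0.5) as detector:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if not result.detections:
                continue
            box = result.detections[0].location_data.relative_bounding_box
            # keep only boxes fully inside the frame (stand-in for the trained filter)
            if (box.xmin < 0 or box.ymin < 0
                    or box.xmin + box.width > 1 or box.ymin + box.height > 1):
                continue
            h, w = frame.shape[:2]
            x0, y0 = int(box.xmin * w), int(box.ymin * h)
            x1, y1 = int((box.xmin + box.width) * w), int((box.ymin + box.height) * h)
            faces.append(frame[y0:y1, x0:x1])
    cap.release()
    return faces
```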
Step S150, a preset Cantonese lip reading identification model is trained according to the sample sequence to obtain the trained Cantonese lip reading identification model.
Step S160, a target video sequence is identified according to the trained Cantonese lip reading identification model to obtain an identification result.
After the first Cantonese video clip is obtained, the segments with a human voice but no human figure and/or the segments in which the human voice does not match the human image are removed from the first Cantonese video clip to obtain the second Cantonese video clip, and a labelled sample image data set is generated from the second Cantonese video clip. The method can collect a word-level Cantonese lip reading sample data set, filling the gap left by the current absence of a large-scale lip reading sample data set; and because useless sequences have been removed from the video sequence, training the model on the sample sequence improves the recognition accuracy of the trained model.
In related schemes, two problems arise during model training: the boundary information is ambiguous, and key frames are not well distinguished from useless frames. When the boundary information is ambiguous, the useless frames near the boundary are difficult to reject; when key frames are not well distinguished from useless frames, the model cannot properly select key frames during feature extraction on the video sequence, lip reading is slow, and the recognition accuracy of the model is reduced by the influence of the non-key frames (useless frames).
Therefore, based on the above embodiment, the method further includes, before step S150, the steps of:
Step S1401, boundary information is added to the sample sequence, and the sample sequence to which the boundary information has been added is encoded using Libjpeg. By adding the boundary information, step S1401 solves the problem that useless frames at the boundary cannot be removed. The training data are encoded with Libjpeg in order to compress the data and thereby speed up the subsequent training process.
Existing schemes generally use a recognition model consisting of a ResNet-18 backbone and a GRU. Unlike these schemes, the preset Cantonese lip reading identification model comprises a feature extraction network, an LSTM network, a three-layer BiGRU network and a mutual information maximization network, and the training process of the Cantonese lip reading identification model comprises the following steps:
Step S1501, extracting features in the sample sequence according to the feature extraction network, and setting mutual information constraint between the features and the labels.
In one embodiment, the feature extraction network includes a ResNet-34 network and a global average pooling layer. Compared with the commonly used ResNet-18 network, the deeper ResNet-34 network can extract deeper features, and the global average pooling layer is used instead of a fully connected layer in order to structurally regularize the entire network and prevent overfitting.
In another embodiment, the feature extraction network comprises a 3D CNN network, a spatial max pooling layer, a ResNet-34 network and a global average pooling layer connected in sequence. Step S1501 specifically includes the steps of:
Step S15011, extracting initial features in the sample sequence according to the 3D CNN network.
Step S15012, the initial features are compressed according to the spatial max pooling layer.
Step S15013, the initial features are divided evenly into a plurality of parts, the features of each part are extracted according to the ResNet-34 network, and a mutual information constraint between the features and the labels is added.
Step S15014, the features to which the mutual information constraint has been added are average-pooled according to the global average pooling layer.
In steps S15011 and S15012, a 3D CNN network and a spatial max pooling layer are arranged at the front end of the ResNet-34 network. The 3D CNN network first performs feature extraction on the initial frames to achieve a preliminary temporal alignment, and spatial max pooling is then used to compress the features in the spatial domain; this processing yields a better recognition effect. In steps S15013 and S15014, the sequence features are divided evenly into T parts according to the number of frames of the input sequence (e.g., T frames), and the features of each part are extracted by the ResNet-34 network. To improve the ability to capture the fine-grained lip movements together with the corresponding labels, and thus improve the recognition accuracy of the model, a mutual information constraint is applied between the outputs of the ResNet-34 network and the labels. Note that mutual information is mostly used in feature extraction to measure the degree of association between feature items and categories; when the mutual information is maximized, the features correspond better, one to one, with their labels. The resulting features are then fed into the global average pooling layer, which structurally regularizes the entire network to prevent overfitting. A sketch of this front end follows.
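The following PyTorch sketch arranges the described front end (3D CNN, spatial max pooling, per-frame ResNet-34, global average pooling). Kernel sizes, strides, channel counts and the single-channel greyscale input are illustrative assumptions, and the mutual information constraint on the ResNet-34 outputs is applied separately (see the mutual information network sketch later), not inside this module.

```python
# Illustrative sketch of the visual front end under the stated assumptions.
import torch
import torch.nn as nn
import torchvision

class VisualFrontEnd(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv3d = nn.Sequential(                       # preliminary temporal alignment
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        self.spatial_pool = nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))
        resnet = torchvision.models.resnet34(weights=None)
        resnet.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        resnet.fc = nn.Identity()                          # global average pooling output, no FC head
        self.resnet = resnet

    def forward(self, x):                                  # x: (B, 1, T, H, W) mouth-region frames
        x = self.spatial_pool(self.conv3d(x))              # (B, 64, T, H', W')
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)      # split into T parts, one per frame
        f = self.resnet(x)                                 # per-part features via ResNet-34
        return f.view(b, t, -1)                            # (B, T, 512), fed to the BiGRU / LSTM branch
```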
Step S1502, corresponding weights are generated for different frames based on the labels according to the LSTM network.
An LSTM network, comprising an LSTM layer and a linear layer, is added at the output of the ResNet-34 network. In this embodiment, the purpose of the added LSTM network is to select key frames, that is, to assign different weights to different frames according to the labels so as to distinguish key frames from non-key frames. The weight may be any positive number, but the weight of a useless frame should be as close to 0 as possible. A ReLU function is therefore introduced to obtain the weights, as sketched below.
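A minimal sketch of the frame-weighting branch: an LSTM followed by a linear layer and ReLU produces one non-negative weight per frame, so the weights of useless frames can be pushed towards zero. The hidden size is an illustrative assumption.

```python
# Illustrative sketch of the LSTM + linear frame-weighting network.
import torch
import torch.nn as nn

class FrameWeightNet(nn.Module):
    def __init__(self, in_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.linear = nn.Linear(hidden, 1)

    def forward(self, g):                    # g: (B, T, in_dim) front-end features
        h, _ = self.lstm(g)                  # hidden state LSTM(G)_t for every time step t
        a = torch.relu(self.linear(h))       # a_t = ReLU(w_linear * LSTM(G)_t + b_linear)
        return a.squeeze(-1)                 # (B, T) per-frame weights
```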
Step S1503, the features are classified according to the three-layer BiGRU network to obtain an output result.
The output result of the feature extraction network is taken as input to the three-layer BiGRU network, which performs feature classification to obtain the output result of the three-layer BiGRU network.
Step S1504, a global average feature is generated according to the output result and the weights.
The global average feature is obtained by weighting the outputs of the three-layer BiGRU network with the weights.
Step S1505, the mutual information between the global average feature and the label is maximized according to the mutual information maximization network.
The global average feature and the one-hot vector of the label are concatenated, and the concatenation result is taken as the input of the global mutual information maximization network. In this embodiment, the mutual information maximization network consists of two linear layers and one sigmoid activation layer. The mutual information between the global average feature and a given label is maximized by the mutual information maximization network: if the global average feature and the label come from the same sample, the output of the global mutual information maximization network is close to 1 (positive sample); if they are not a paired sample, the output is close to 0 (negative sample). A sketch of this network is shown below.
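The following sketch implements a mutual information discriminator of this kind and the objective given later for $Loss_{MI}$. The intermediate ReLU between the two linear layers, the hidden size and the class count are illustrative assumptions; negative samples are formed by shuffling labels within the batch, which is one common choice rather than the claimed procedure.

```python
# Illustrative sketch of the mutual information maximization network and its objective.
import torch
import torch.nn as nn

class MINetwork(nn.Module):
    def __init__(self, feat_dim=512, num_classes=1000, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + num_classes, hidden),
            nn.ReLU(inplace=True),            # assumed non-linearity between the two linear layers
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, feat, one_hot_label):
        # concatenate the global average feature with the one-hot label vector
        return self.net(torch.cat([feat, one_hot_label], dim=-1)).squeeze(-1)

def mi_objective(mi_net, feat, one_hot_label, eps=1e-8):
    idx = torch.randperm(feat.size(0), device=feat.device)
    pos = mi_net(feat, one_hot_label)         # joint distribution p(F, L): paired, should score near 1
    neg = mi_net(feat, one_hot_label[idx])    # product of marginals: unpaired, should score near 0
    # Loss_MI = E_p(F,L)[log MI(F,L)] + E_p(F)p(L)[log(1 - MI(F,L))], maximized during training
    return (torch.log(pos + eps) + torch.log(1 - neg + eps)).mean()
```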
The method embodiment has the following beneficial effects:
(1) In the sample data set used to train the model, after the first Cantonese video clip is acquired, the segments with a human voice but no human figure and/or the segments in which the human voice does not match the human image are removed from the first Cantonese video clip to obtain the second Cantonese video clip, and a labelled sample sequence is generated from the second Cantonese video clip. Since useless sequences have been removed from the video sequence, the recognition accuracy of the trained model is improved.
(2) The traditional lip reading identification model adopts a ResNet-18 + GRU structure; the method adopts a ResNet-34 network, which is deeper than the ResNet-18 network and can therefore extract deeper features.
(3) In lip reading tasks, the boundary information is ambiguous and key frames are not well distinguished from useless frames. Before the samples in the sample data set are input to the model, boundary information is added to the sample sequence, which solves the problem that useless frames at the boundary cannot be removed. An LSTM network consisting of an LSTM layer and a linear layer is added at the back end of the feature extraction network, and key frames are selected by having the LSTM network generate weights, so that key frames and non-key frames can be correctly distinguished and the lip reading recognition accuracy of the model can be effectively improved.
(4) To improve the ability to capture the fine-grained lip movements together with the corresponding labels, and thus further improve the recognition accuracy of the model, the method also designs a mutual information maximization network consisting of two linear layers and a sigmoid activation layer. The global average feature is obtained by weighting the features output by the three-layer BiGRU network with the weights; the global average feature and the label are concatenated and used as the input of the global mutual information maximization network, which performs mutual information maximization between the global average feature and the label. If they form a paired sample, the output of the global mutual information maximization network is close to 1, otherwise it is close to 0. This can effectively improve the lip reading recognition accuracy of the model.
Referring to fig. 4 to 6, in order to facilitate understanding by those skilled in the art, according to one embodiment of the present invention, there is provided a Cantonese lip reading recognition method, comprising the steps of:
Step S210, a sample data set for Cantonese lip reading is constructed.
Step S2101, Cantonese television programmes are crawled from the Internet with the you-get tool to obtain video clips. The term "television programmes" as used herein includes, but is not limited to, Cantonese news programmes, Cantonese variety shows, interviews with Cantonese speakers and talk shows.
Step S2102, useless segments in the video clips are rejected. The useless segments here refer to segments that have a human voice but no human figure, or segments in which the human voice does not match the human image.
Step S2103, the audio sequence and the video sequence of each video clip are separated, a speech processing tool (e.g. the transcription function of iFlytek speech) is used to perform word segmentation on the audio sequence and generate timestamps, the video sequence and the audio sequence are named following the same naming scheme, and each video sequence and its corresponding audio sequence share the same name, which facilitates later pairing.
Step S2104, all word segmentation timestamps are expanded by 0.02 s, and labels are generated according to the video sequence name, the word segmentation timestamps, the pinyin of the segmented words and the word order.
Step S2105, the MediaPipe tool is used to extract the faces in the video sequence, and a filter is trained to filter out all images in which the identified face is incomplete.
Steps S2101 to S2105 effectively remove useless segments and also ensure that the collected data have environmental diversity and are closer to real-life scenes.
Step S220, a Cantonese lip reading identification model based on global mutual information maximization is constructed.
The Cantonese lip reading recognition model mainly comprises a backbone network combining a ResNet-34 network with a three-layer BiGRU network, and a global mutual information maximization network.
First, a 3D CNN layer and a spatial max pooling layer are added before the ResNet-34 network. The 3D CNN layer extracts features from the initial frames of the input video sequence to achieve a preliminary temporal alignment, and spatial max pooling is then used to compress the features in the spatial domain, which reduces the training time without affecting the recognition effect. Adding the 3D CNN layer and the spatial max pooling layer before the ResNet-34 network allows the Cantonese lip reading network model to achieve a better recognition effect.
Assuming the number of frames of the input video sequence is T, the sequence features are divided into T parts according to the frame count T, and the features of each part are extracted with the ResNet-34 network. To improve the ability to capture the fine-grained lip movements together with the corresponding label L, a mutual information constraint is imposed between the output of the ResNet-34 network and the label L. After the mutual information constraint is applied, the obtained features are input to a global average pooling layer; global average pooling is used here instead of a fully connected layer in order to structurally regularize the entire network and prevent overfitting.
At the output of the ResNet-34 network, an LSTM layer and a linear layer are added and together form an LSTM network. The added LSTM network assigns a different weight a to different frames according to the label L, so as to select key frames. The weight a may be any positive number, but the weight a of a useless frame should be as close to 0 as possible. ReLU is therefore introduced to obtain the weight a:
$a_t = \mathrm{ReLU}(w_{linear} \times \mathrm{LSTM}(G)_t + b_{linear})$
In the above equation, $G$ represents the output of the global average pooling layer, $w_{linear}$ and $b_{linear}$ represent the parameters of the linear layer, $\mathrm{LSTM}(G)_t$ represents the hidden state of the LSTM layer at time step $t$, and $a_t$ represents the weight of the frame sequence at time step $t$.
The final global average feature F is obtained from the output P of the three-layer BiGRU network and the weights a:
$F = \frac{1}{T}\sum_{t=1}^{T} a_t P_t$
where T represents the length of the entire video sequence, i.e. the total number of frames.
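A minimal sketch of this weighted average, assuming the BiGRU outputs and frame weights are already available as tensors; the averaging over T shown here follows the reconstructed formula above.

```python
# Illustrative sketch: weighting the BiGRU outputs P_t with the frame weights a_t.
import torch

def global_average_feature(p, a):
    # p: (B, T, D) three-layer BiGRU outputs, a: (B, T) frame weights
    return (a.unsqueeze(-1) * p).mean(dim=1)   # F = (1/T) * sum_t a_t * P_t
```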
The global average feature F and the one-hot vector of the label L are concatenated as the input of the global mutual information maximization network.
The global mutual information maximization network consists of two linear layers and a sigmoid activation layer, and the mutual information between the global average feature and a given label can be maximized through it. If the global average feature F and the label L come from the same sample, the output of the global mutual information maximization network should be as close to 1 as possible (positive sample); if they are not a paired sample, the output should be as close to 0 as possible (negative sample). The optimization function of the global mutual information maximization network can therefore be expressed as:
$Loss_{MI} = \mathbb{E}_{p(F,L)}[\log(MI(F,L))] + \mathbb{E}_{p(F)p(L)}[\log(1 - MI(F,L))]$
where $Loss_{MI}$ represents the optimization function of global mutual information maximization, $p(F,L)$ represents the joint distribution of the sample pair $(F,L)$, $p(F)p(L)$ represents the product of the marginal distributions of the sample pair $(F,L)$, $MI(F,L)$ represents the mutual information between the two variables $F$ and $L$, and $\mathbb{E}$ denotes the mathematical expectation.
After the output $P_t$ of the three-layer BiGRU network in the backbone passes through the linear layer, its dimension becomes c, where c represents the total number of vocabulary classes in the data set, and the final classification score of each label $L_i$ is taken from this c-dimensional output.
The loss function of the entire model is as follows:
$Loss_{total} = Loss_{CE} + Loss_{MI}$
where $Loss_{total}$ represents the total loss; the first term $Loss_{CE}$ is a cross entropy loss function computed over the labels $L_i$ (the i-th label) and their classification scores, and the second term $Loss_{MI}$ is the optimization function of the global mutual information maximization network.
Step S230, boundary information is added to the data in the data set, and the Cantonese lip reading recognition model is trained.
Step S2301, the processing of adding boundary information is performed. The video sequence is cropped to 88 × 88, and boundary information is then added to the video sequence according to the timestamp information. The boundary information is added as follows: the rounded value of OP (the start timestamp) × 25 (the video frame rate) is selected as the start frame, the 40 frames following it are taken as the input video sequence, and data shorter than 40 frames are screened out. Adding the boundary information solves the problem that useless frames at the boundary cannot be removed. A sketch of this step follows.
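The following sketch applies the boundary rule described above to a list of frames; the frame-extraction details and the use of OpenCV for resizing are illustrative assumptions.

```python
# Illustrative sketch: boundary-information clipping (88x88 crop, start frame = round(OP * 25),
# keep the next 40 frames, discard shorter sequences).
import cv2

def clip_with_boundary(frames, start_ts, fps=25, length=40, size=88):
    start = round(start_ts * fps)              # rounding of OP (start timestamp) x 25 (frame rate)
    window = frames[start:start + length]
    if len(window) < length:                   # screen out data shorter than 40 frames
        return None
    return [cv2.resize(f, (size, size)) for f in window]
```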
Step S2302, the video sequence is encoded with Libjpeg, training parameters such as the batch size and the number of epochs are set, and the encoded data are input into the Cantonese lip reading identification model for training to obtain the training weights and the trained Cantonese lip reading identification model. The purpose of the encoding is to compress the data after the boundary information has been added, thereby increasing the subsequent training speed. A sketch of the encoding step appears below.
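The sketch below compresses each frame to JPEG before training. It uses OpenCV's imencode, which in typical builds is backed by libjpeg or a compatible library; the quality setting and in-memory storage format are illustrative assumptions.

```python
# Illustrative sketch: JPEG-compressing the 88x88 frame sequence for faster training I/O.
import cv2
import numpy as np

def encode_sequence(frames, quality=90):
    encoded = []
    for f in frames:
        ok, buf = cv2.imencode(".jpg", f, [cv2.IMWRITE_JPEG_QUALITY, quality])
        if ok:
            encoded.append(buf.tobytes())      # compressed bytes stored for the data loader
    return encoded

def decode_frame(jpeg_bytes):
    return cv2.imdecode(np.frombuffer(jpeg_bytes, dtype=np.uint8), cv2.IMREAD_GRAYSCALE)
```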
Step S240, the target video sequence is identified through the trained Cantonese lip reading identification model.
Step S2401, the PySide toolkit is used to design the UI of the system, which is divided into three parts: a prompter area, a face display area and a result display area.
Step S2402, the training weights are loaded, the Cantonese lip reading identification model is loaded, and the buttons of the UI are bound to the corresponding functions in the code.
Step S2403, recognition is started, the target video sequence is collected through the face display area, and the face in the target video sequence is extracted using MediaPipe.
Step S2404, the network is loaded, and the target video sequence is processed using the Cantonese lip reading identification model.
Step S2405, the lip reading recognition result obtained by the Cantonese lip reading recognition model is displayed in the result display area. A minimal UI sketch is shown below.
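The following sketch lays out the three described areas with PySide6 widgets; the choice of PySide6, the widget types and the layout are illustrative assumptions only, and the recognition pipeline is not wired in.

```python
# Illustrative sketch: a minimal UI with prompter, face display and result display areas.
import sys
from PySide6.QtWidgets import QApplication, QWidget, QLabel, QPushButton, QVBoxLayout

class LipReadingUI(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Cantonese Lip Reading")
        self.prompter = QLabel("Prompter area")        # text the speaker reads aloud
        self.face_view = QLabel("Face display area")   # camera frames with the detected face
        self.result = QLabel("Result display area")    # recognised words appear here
        self.start_btn = QPushButton("Start recognition")
        layout = QVBoxLayout(self)
        for w in (self.prompter, self.face_view, self.result, self.start_btn):
            layout.addWidget(w)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    ui = LipReadingUI()
    ui.show()
    sys.exit(app.exec())
```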
Research on lip reading recognition is of great significance for the hearing impaired, and Cantonese is widely used in China's Guangdong and Guangxi provinces, in the Hong Kong and Macao regions and in Chinese communities worldwide, so the significance of research on Cantonese lip reading is self-evident. At present, no company or large research institution has released a large-scale Cantonese lip reading data set; the present application obtains word-level sample data by downloading network video resources and manually cutting and screening out useless clips. Because lip reading tasks suffer from ambiguous boundary information and poorly distinguished key frames and useless frames, the method first preprocesses the data in the sample data set and solves the problem that useless frames at the boundary cannot be removed by adding boundary information. The method improves on the existing model: the ResNet-18 network is replaced by a ResNet-34 network to extract deeper features; an LSTM network (consisting of an LSTM layer and a linear layer) is added at the output of the ResNet-34 network, and key frames are selected by having the LSTM network generate a weight a, where a is produced by ReLU so that the weight of useless frames can become 0. The features output by the three-layer BiGRU network are then weighted with the weight a to obtain the global average feature F. To improve the ability to capture the fine-grained lip movements together with the corresponding labels and further improve the recognition accuracy of the model, the global average feature F and the label L are concatenated as the input of the global mutual information maximization network, which then performs mutual information maximization on F and L: for a paired sample, the output of the global mutual information maximization network is close to 1, otherwise it is close to 0.
The method fills the gap left by the absence of a large-scale lip reading sample data set in the field of Cantonese lip reading, and in the Cantonese lip reading identification model used, the proposed combination of added boundary information and global mutual information maximization can effectively improve the accuracy of lip reading recognition.
Referring to fig. 7, the present application further provides a computer device 301, including a memory 310, a processor 320, and a computer program 311 stored in the memory 310 and capable of running on the processor, where the processor 320 implements the Cantonese lip reading recognition method as described above when executing the computer program 311.
The processor 320 and the memory 310 may be connected by a bus or other means.
Memory 310 acts as a non-transitory computer readable storage medium that may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, memory 310 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, memory 310 may optionally include memory located remotely from the processor to which the remote memory may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software program and instructions required to implement the Cantonese lip reading identification method of the above-described embodiments are stored in the memory, and when executed by the processor, the Cantonese lip reading identification method of the above-described embodiments is performed, for example, the method steps S110 to S160 in fig. 1 or the method steps S210 to S240 in fig. 4 described above are performed.
Referring to fig. 8, the present application also provides a computer-readable storage medium 401 storing computer-executable instructions 410, the computer-executable instructions 410 being for performing the Cantonese lip reading identification method as described above.
The computer-readable storage medium 401 stores computer-executable instructions 410, where the computer-executable instructions 410 are executed by a processor or controller, for example, by a processor in the above-described electronic device embodiment, and may cause the processor to perform the Cantonese lip reading identification method in the above-described embodiments, for example, performing the method steps S110 to S160 in fig. 1 or the method steps S210 to S240 in fig. 4 described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of data such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired data and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any data delivery media.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.