Natural scene Tibetan image data enhancement method based on Markov theory
Technical Field
The invention relates to the technical field of machine learning and natural language processing, in particular to a natural scene Tibetan image data enhancement method based on Markov theory.
Background
In the fields of machine learning and natural language processing, Tibetan-language data resources are scarce; in natural scenes in particular, corresponding Tibetan image data are lacking. Natural scene images also vary widely in style, differing in scene, brightness, contrast, color, and so on. These circumstances all hinder the collection and use of Tibetan data to some extent. Meanwhile, Tibetan picture databases are very rare, and research on Tibetan data enhancement is limited by the insufficient volume of data, which poses challenges for Tibetan-related research and applications.
Currently, the most commonly used image generation methods are based on the generative adversarial network (GAN) proposed by Goodfellow et al. in 2014. A GAN is composed of a generator, whose task is to generate an image from random noise, and a discriminator, whose task is to judge whether an image is real or generated. Through adversarial training, the generator can learn the distribution of the training data and thereby generate images similar to the training data.
However, GANs also have problems such as unstable training, mode collapse, and vanishing gradients. To address these problems, researchers have proposed many improved GAN models, such as DCGAN, WGAN, PGGAN, and StyleGAN, which improve the quality and diversity of the generated images to some extent, but limitations remain, such as limited resolution of the generated images, unclear details, and inconsistent semantics.
Disclosure of Invention
The invention aims to provide a natural scene Tibetan image data enhancement method based on Markov theory, which avoids the problems of GANs, achieves efficient parallelization and scalability, and generates images with high resolution and diversity.
The invention provides a natural scene Tibetan image data enhancement method based on Markov theory, which generates high-quality natural scene Tibetan images similar to the training data from random perturbations. Each pixel in an input image is regarded as a state, and all of these states together form a state space; in the Markov process, image processing involves transitioning from one state space to another, and when the state transition process converges, the method outputs the processed image.
In order to achieve the above object, the technical scheme of the present invention is as follows:
The method comprises the following: obtaining image information; preprocessing the images; defining a Markov process that adds perturbations to the original images; defining an inverse Markov process whose conditional probability is maximized by training a neural network; and finally generating the target image. Specifically, the method comprises the following steps:
Step S1, image processing: image information is acquired through field shooting, web crawling, and collection of public data sets; the collected Tibetan images are preprocessed, including unifying their sizes and correcting character skew; images with clear text content and distinct features are screened out; and the screened images are classified according to their image features;
Step S2, text processing: the text in the classified images is processed. The positions of Tibetan characters in the images are located through edge detection, and the text is then segmented, including line, word, and character segmentation, dividing the image into different text units. Features are extracted from the segmented text units, including structural features, shape features, and stroke features, to describe the form and attributes of each unit. The extracted features are classified using a machine learning or deep learning model to identify the category and content of the text units. The classification results are post-processed using Tibetan grammar rules, Tibetan dictionaries, and Tibetan corpus resources to correct and optimize the recognition results, finally yielding Tibetan text containing Tibetan semantic information;
Step S3, defining a Markov process whose initial state is the original training data, i.e. the original Tibetan images: for a given training data set, a time step sequence $t \in \{1, 2, \dots, T\}$ is set, where $T$ represents the total number of steps of the Markov process; for each time step $t$, a transition parameter $\beta_t$ is set, and perturbation information conforming to a certain distribution is gradually added to each training sample, so that the state matrix $x_T$ at the end of the Markov propagation process is close to a normal distribution;
Step S4, defining a reverse Markov process: the Markov chain is traversed in the reverse direction; a state transition probability matrix is defined to describe the state transition process, and given the current state, the next state is predicted according to the state transition probability until the specified number of steps is reached. That is, starting from the distribution finally obtained by the forward Markov process, the information of the image is gradually recovered according to the conditional probabilities, and the image finally becomes similar to the original training data;
Step S5, training a neural network model: given a current state space and the input Tibetan character semantic information, the next state is generated according to the learned state transition probabilities; the semantic information of the characters in the Tibetan pictures is input into the neural network model, and by maximizing the conditional probability of the reverse Markov process the network generates natural scene Tibetan images with the relevant characteristics;
Step S6, generating an image: the result obtained after the forward Markov process is taken as the initial state matrix, the Markov chain is traversed, and the trained neural network model decides the Markov state transition probabilities, finally yielding the generated image matrix.
Further, the step S3 specifically includes:
S301, a training data set $X = \{x^{(1)}, x^{(2)}, \dots, x^{(N)}\}$ is given, where $x^{(i)}$ represents the i-th training sample and $N$ represents the total number of training samples;
S302, a time step sequence $t \in \{1, 2, \dots, T\}$ is set, where $T$ represents the total number of steps of the Markov process;
S303, for each time step $t$, a state transition parameter $\beta_t$ is set, satisfying $0 < \beta_t < 1$, and $T$ is large enough that the state $x_T$ after the end of the Markov propagation process is close to obeying a normal distribution;
S304, for each training sample $x_0 = x^{(i)}$, perturbation information conforming to a certain distribution is added step by step according to the following formula to obtain a Markov chain $x_0, x_1, \dots, x_T$:

$x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t, \qquad t = 1, \dots, T$

where $\epsilon_t \sim \mathcal{N}(0, I)$ is independently and identically distributed Gaussian perturbation and $\beta_t$ is the state transition parameter.
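To make steps S301–S304 concrete, the following is a minimal Python sketch of the forward perturbation process, assuming Gaussian perturbation and the recursion reconstructed above; the linear schedule, image size, and value range are illustrative assumptions, not prescribed by the disclosure.

```python
import numpy as np

def forward_markov_chain(x0, betas, rng=None):
    """Gradually perturb an image x0 into (near-)Gaussian noise.

    x0    : np.ndarray, the original Tibetan image, scaled to [-1, 1]
    betas : 1-D sequence of state transition parameters beta_1..beta_T
    Returns the list of states [x_0, x_1, ..., x_T] (the Markov chain).
    """
    rng = rng or np.random.default_rng()
    chain = [x0]
    x = x0
    for beta_t in betas:
        eps = rng.standard_normal(x.shape)          # i.i.d. Gaussian perturbation
        x = np.sqrt(1.0 - beta_t) * x + np.sqrt(beta_t) * eps
        chain.append(x)
    return chain

# Example: a linear schedule over T = 1000 steps (an assumed choice)
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.random.default_rng(0).uniform(-1, 1, size=(64, 64))  # stand-in image
chain = forward_markov_chain(x0, betas)
```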
Further, for a single Markov state $x_{t-1}$ and its subsequent state $x_t$, the state transition probability in the state transition probability matrix is defined as:

$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$

where $\mathcal{N}(\cdot;\ \mu, \Sigma)$ represents a Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$, $I$ is an identity matrix of the same dimensions as the initial state $x_0$, $\beta_t$ is the state transition parameter, $T$ is the number of transition steps, and $q(x_t \mid x_{t-1})$ is the probability of reaching state $x_t$ from state $x_{t-1}$ through the state transition probability matrix.
Further, the step S4 specifically includes:
S401, a neural network model $p_\theta$ is set, where $\theta$ represents the parameters of the neural network model; its inputs are the current state $x_t$ and the time step $t$, and its output is the conditional probability distribution $p_\theta(x_{t-1} \mid x_t)$ of the previous-step state $x_{t-1}$;
S402, for each time step $t$, the conditional probability distribution of the inverse Markov process, i.e. the state transition probability from $x_t$ to $x_{t-1}$, is defined by the following formula:

$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$

where $\mu_\theta(x_t, t)$ and $\Sigma_\theta(x_t, t)$ respectively represent the mean and variance output by the neural network model;
S403, for each training sample $x_0 = x^{(i)}$, the joint probability distribution of the inverse Markov process is defined by the following formula; it constitutes the entire inverse Markov process, i.e. describes the Markov chain containing all $T$ steps of state transition probabilities:

$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$

where $p(x_T) = \mathcal{N}(x_T;\ 0, I)$ is a Gaussian distribution and $I$ is an identity matrix.
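A sketch of a single reverse transition under the conditional distribution of S402; the `model` interface, which returns a mean and a per-pixel variance, is a hypothetical stand-in for the neural network model described below.

```python
import torch

@torch.no_grad()
def reverse_step(model, x_t, t):
    """Sample x_{t-1} from the learned conditional p_theta(x_{t-1} | x_t).

    model : callable (x_t, t_batch) -> (mean, var); a hypothetical interface
    x_t   : current state matrix, shape (B, H, W)
    t     : integer time step
    """
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device, dtype=torch.long)
    mean, var = model(x_t, t_batch)
    if t == 1:                        # take the final step deterministically
        return mean
    noise = torch.randn_like(x_t)
    return mean + var.sqrt() * noise  # re-parameterized Gaussian sample
```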
Further, the neural network model comprises a sinusoidal position encoding layer, an encoder, a decoder, and a re-parameterization layer. The sinusoidal position encoding layer converts the time step $t$ into a sinusoidal position code $e_t$ and concatenates it with the state space matrix $x_t$ as the input to the encoder; the encoder encodes the current state space matrix $x_t$ and time step $t$ into a hidden vector $h$; the decoder decodes the hidden vector $h$ into the mean $\mu_\theta(x_t, t)$ and variance $\Sigma_\theta(x_t, t)$ of the previous-step image $x_{t-1}$; and the re-parameterization layer samples a state matrix $x_{t-1}$ from the mean and variance output by the neural network model as the output of the decoder.
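A compact PyTorch sketch of this architecture, assuming a fully connected encoder/decoder for brevity; all layer sizes are illustrative assumptions rather than the configuration of the invention. A model of this shape matches the `(mean, var)` interface assumed by the `reverse_step` sketch above, in which the re-parameterized sampling $x_{t-1} = \mu + \sigma\,\epsilon$ is carried out.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_encoding(t, dim):
    """Encode integer time steps t (shape (B,)) as sinusoidal vectors (B, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class DenoiseNet(nn.Module):
    """Encoder-decoder predicting the mean and variance of x_{t-1}."""
    def __init__(self, img_dim=64 * 64, t_dim=128, hidden=512):
        super().__init__()
        self.t_dim = t_dim
        self.encoder = nn.Sequential(
            nn.Linear(img_dim + t_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU())
        self.mean_head = nn.Linear(hidden, img_dim)    # decoder branch: mean
        self.logvar_head = nn.Linear(hidden, img_dim)  # decoder branch: log-variance

    def forward(self, x_t, t):
        e_t = sinusoidal_encoding(t, self.t_dim)       # sinusoidal position code
        h = self.encoder(torch.cat([x_t.flatten(1), e_t], dim=-1))
        mean = self.mean_head(h).view_as(x_t)
        var = self.logvar_head(h).view_as(x_t).exp()   # exp keeps variance positive
        return mean, var
```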
Further, the step S6 specifically includes:
S601, the result obtained after the forward Markov process is taken as the initial state matrix $x_T$;
S602, each time step $t$ is traversed in turn from the time step sequence $t = T, T-1, \dots, 1$, and the following operations are performed:
S6021, the current state space matrix $x_t$ and time step $t$ are taken as the input of the neural network model to obtain the conditional probability distribution $p_\theta(x_{t-1} \mid x_t)$ of the previous-step image $x_{t-1}$;
S6022, a state matrix $x_{t-1}$, i.e. the state of the next step, is sampled from this conditional probability distribution;
S603, after $T$ steps, the final image is generated and the finally generated image $x_0$ is returned.
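Putting S601–S603 together, a minimal generation loop under the same assumptions; `DenoiseNet` and `reverse_step` refer to the illustrative sketches above.

```python
import torch

@torch.no_grad()
def generate(model, shape, T=1000, device="cpu"):
    """Run the reverse Markov chain from pure noise to a generated image."""
    x = torch.randn(shape, device=device)   # initial state matrix x_T ~ N(0, I)
    for t in range(T, 0, -1):               # traverse t = T, T-1, ..., 1
        x = reverse_step(model, x, t)       # sample x_{t-1} from p_theta(. | x_t)
    return x                                # the finally generated image x_0

# Example: generate four 64x64 images with an (untrained) illustrative model
model = DenoiseNet()
images = generate(model, shape=(4, 64, 64), T=1000)
```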
The invention adopts a Markov model to realize Tibetan image data enhancement. The Markov model is a generative model that can produce high-quality images similar to the training data, and it comprises a forward process and a reverse process. The forward process gradually adds perturbation information conforming to a certain distribution to the training samples, so that the state matrix $x_T$ at the end of the Markov propagation process is close to a normal distribution. The reverse process defines a state transition probability matrix describing the transition process of the states; given the current state, the next state can be predicted according to the transition probability matrix, forming an inverse Markov chain, and a neural network model is used to learn the Markov state transition probabilities so as to gradually restore the original image. The model avoids problems common in generative models, such as adversarial training, mode collapse, and vanishing gradients; it achieves efficient parallelization and scalability; and it captures more detail when generating an image, thereby producing more realistic and finer images. Meanwhile, the method is relatively stable during training and is not prone to vanishing or exploding gradients, so the model is easy to train and optimize. The method is also highly controllable: fine control over the generated image is achieved by adjusting parameters such as the number of steps and the perturbation strength, meeting diversified user requirements.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the invention, and other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a natural scene Tibetan image data enhancement method based on Markov theory provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a Markov process provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a reverse Markov process provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a neural network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image generation step provided by an embodiment of the present invention;
FIG. 6 is an exemplary diagram of image generation experiment results provided by an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The invention relates to a natural scene Tibetan image data enhancement method based on Markov theory, which can recognize Tibetan semantic information and generate high-quality, multi-style images similar to the training data from random perturbations. The Markov process is a probabilistic-statistical method suitable for various tasks such as image segmentation and texture migration. In this method, each pixel in an input image is regarded as a state, all of which together form a state space, and image processing involves the system transitioning from one state space to another. When the state transition process converges, the method outputs the processed image. Image generation is an important research direction in computer vision and machine learning; its objective is to generate realistic images according to given conditions or unconditionally, and it has many application scenarios, such as image restoration, image enhancement, image style transfer, image animation, and image editing.
FIG. 1 is a flowchart of the natural scene Tibetan image data enhancement method based on Markov theory; as shown in the figure, the method comprises the following steps:
Step S1, image processing: image information is acquired through field shooting, web crawling, collection of public data sets, and similar means; the collected Tibetan images are preprocessed, including unifying their sizes and correcting character skew; images with clear text content and distinct features are screened out; and the screened images are classified according to image features, for example into scene categories such as social media, urban streets, garden scenery, highland snow mountains, and grassland lakes, which differ in brightness, contrast, color, and so on.
Step S2, text processing: the text in the classified images is processed. The positions of Tibetan characters in the images are located through edge detection, and the text is then segmented, including line, word, and character segmentation, dividing the image into different text units. Features are extracted from the segmented text units, including structural features, shape features, and stroke features, to describe the form and attributes of each unit. The extracted features are classified using machine learning or deep learning models, such as support vector machines, random forests, and convolutional neural networks, to identify the category and content of the text units. The classification results are post-processed using resources such as Tibetan grammar rules, Tibetan dictionaries, and Tibetan corpora to correct and optimize the recognition results, finally yielding Tibetan text containing Tibetan semantic information, which is used to supervise the neural network training in subsequent steps.
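As a rough illustration of the edge-detection-based localization just described, the following sketch uses OpenCV to find candidate text regions; the disclosure names no specific library, so the Canny thresholds, dilation kernel, and size filter are all assumptions.

```python
import cv2

def locate_text_regions(image_path, min_area=100):
    """Find candidate Tibetan text regions via edge detection and contours."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 50, 150)                  # assumed thresholds
    # Dilate so character strokes merge into connected text blocks
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
    blocks = cv2.dilate(edges, kernel, iterations=2)
    contours, _ = cv2.findContours(blocks, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) per region
    return [b for b in boxes if b[2] * b[3] >= min_area]
```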
Step S3, defining a Markov process: the initial state of the process is the original training data, i.e. the images originally containing Tibetan text; each state is an image in the model, and each moment is a time step. For a given training data set, a time step sequence $t \in \{1, 2, \dots, T\}$ is set, where $T$ represents the total number of steps of the Markov process. For each time step $t$, a transition parameter $\beta_t$ is set, and perturbation information conforming to a certain distribution is gradually added to each training sample, so that the state matrix $x_T$ at the end of the Markov propagation process is close to a normal distribution and a Markov chain $x_0, x_1, \dots, x_T$ is obtained. A Markov chain is a stochastic process whose defining property is that the current state depends only on the state at the previous moment and is independent of the earlier history. The meaning of this process is that the information of the image is gradually lost starting from the original training data, and the convergence result of the Markov process is close to a state of pure perturbation information.
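A convenient property of such Gaussian perturbation chains (standard for this family of models, and stated here as an assumption since the disclosure only gives the step-by-step recursion) is that $x_t$ can be sampled directly from $x_0$: with $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, one has $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$.

```python
import numpy as np

def sample_xt_directly(x0, betas, t, rng=None):
    """Jump straight from x_0 to x_t without simulating every intermediate step."""
    rng = rng or np.random.default_rng()
    alpha_bar = np.cumprod(1.0 - betas)[t - 1]   # cumulative product up to step t
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
```

As $t$ approaches $T$, $\bar\alpha_t \to 0$, which is precisely why the chain converges toward pure Gaussian perturbation.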
Step S4, defining a reverse Markov process: the Markov chain is traversed in the reverse direction, in preparation for the neural network model to learn the probability distribution and finally obtain the generated image. A state transition probability matrix $P$, which is a set of individual state transition probabilities, is defined to describe the state transition process; given the current state, the next state is predicted according to the state transition probability until the specified number of steps is reached. For a single Markov state $x_{t-1}$ in the state transition probability matrix and its subsequent state $x_t$, the state transition probability is defined as:

$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$

where $\mathcal{N}(\cdot;\ \mu, \Sigma)$ represents a Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$, $I$ is an identity matrix of the same dimensions as the initial state $x_0$, $\beta_t$ is the state transition parameter, $T$ is the number of transition steps, and $q(x_t \mid x_{t-1})$ is the probability of reaching state $x_t$ from state $x_{t-1}$ through the state transition probability matrix; together these probabilities form an inverse Markov chain. The meaning of this process is that, starting from the distribution obtained by the forward Markov process, the information of the image is gradually restored according to the conditional probabilities, and the image finally becomes similar to the original training data. The purpose is to allow the neural network model to learn the distribution of the training data so that images similar to the training data can be generated in reverse.
Step S5, training the neural network model: given a current state space and the input Tibetan character semantic information, the next state is generated according to the learned state transition probabilities. The semantic information of the characters in the Tibetan pictures is input into the neural network model, and by maximizing the conditional probability of the reverse Markov process the network generates natural scene Tibetan images with the relevant characteristics, thereby achieving the goal of data enhancement.
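One way to realize the "maximize the conditional probability of the reverse Markov process" objective of step S5 is to minimize the negative Gaussian log-likelihood of the true previous state under the model's predicted mean and variance. The sketch below does this for one randomly chosen step; the Tibetan semantic condition is omitted for brevity (in practice it would be embedded and fed to the network alongside $x_t$, an implementation assumption), and the `DenoiseNet` interface is the illustrative one sketched earlier.

```python
import torch

def train_step(model, x0, betas, optimizer):
    """One training step: maximize log p_theta(x_{t-1} | x_t) at a random t."""
    T = len(betas)
    t = torch.randint(1, T + 1, (1,)).item()
    # Simulate the forward chain up to x_{t-1} and x_t
    x = x0
    for s in range(t):
        x_prev = x
        x = (1.0 - betas[s]) ** 0.5 * x + betas[s] ** 0.5 * torch.randn_like(x)
    t_batch = torch.full((x0.shape[0],), t, dtype=torch.long)
    mean, var = model(x, t_batch)
    # Negative Gaussian log-likelihood of the true x_{t-1} (constants dropped)
    nll = 0.5 * (((x_prev - mean) ** 2) / var + var.log()).mean()
    optimizer.zero_grad()
    nll.backward()
    optimizer.step()
    return nll.item()
```

In practice one would jump directly to $x_{t-1}$ with the closed-form expression above and take a single perturbation step to obtain $x_t$, rather than simulating the whole chain; the loop here simply keeps the sketch self-contained.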
Step S6, generating an image: the result obtained after the forward Markov process is taken as the initial state matrix, the Markov chain is traversed, and the trained neural network model decides the Markov state transition probabilities, finally yielding the generated image matrix.
FIG. 2 is a schematic diagram of the Markov process provided by the present invention; as shown in the figure, it includes the following steps:
S301, a training data set $X = \{x^{(1)}, x^{(2)}, \dots, x^{(N)}\}$ is given, where $x^{(i)}$ represents the i-th training sample and $N$ represents the total number of training samples;
S302, a time step sequence $t \in \{1, 2, \dots, T\}$ is set, where $T$ represents the total number of steps of the Markov process;
S303, for each time step $t$, a state transition parameter $\beta_t$ is set, satisfying $0 < \beta_t < 1$, and $T$ is large enough that the state $x_T$ after the end of the Markov propagation process obeys a normal distribution;
S304, for each training sample $x_0 = x^{(i)}$, perturbation information conforming to a certain distribution is added step by step according to the following formula to obtain a Markov chain $x_0, x_1, \dots, x_T$:

$x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t, \qquad t = 1, \dots, T$

where $\epsilon_t \sim \mathcal{N}(0, I)$ is independently and identically distributed Gaussian perturbation and $\beta_t$ is the state transition parameter.
FIG. 3 is a schematic diagram of the inverse Markov process provided by the present invention; as shown in the figure, it includes the following steps:
S401, a neural network model $p_\theta$ is set, where $\theta$ represents the parameters of the neural network model; its inputs are the current state $x_t$ and the time step $t$, and its output is the conditional probability distribution $p_\theta(x_{t-1} \mid x_t)$ of the previous-step state $x_{t-1}$;
S402, for each time step $t$, the conditional probability distribution of the inverse Markov process, i.e. the state transition probability from $x_t$ to $x_{t-1}$, is defined by the following formula:

$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$

where $\mu_\theta(x_t, t)$ and $\Sigma_\theta(x_t, t)$ respectively represent the mean and variance output by the neural network model;
S403, for each training sample $x_0 = x^{(i)}$, the joint probability distribution of the inverse Markov process is defined by the following formula; it constitutes the entire inverse Markov process, i.e. describes the Markov chain containing all $T$ steps of state transition probabilities:

$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$

where $p(x_T) = \mathcal{N}(x_T;\ 0, I)$ is a Gaussian distribution and $I$ is an identity matrix.
FIG. 4 is a block diagram of the neural network model provided by the present invention. As shown in the figure, the model includes a sinusoidal position encoding layer, an encoder, a decoder, and a re-parameterization layer. The sinusoidal position encoding layer converts the time step $t$ into a sinusoidal position code $e_t$ and concatenates it with the state space matrix $x_t$ as the input to the encoder; the encoder encodes the current state space matrix $x_t$ and time step $t$ into a hidden vector $h$; the decoder decodes the hidden vector $h$ into the mean $\mu_\theta(x_t, t)$ and variance $\Sigma_\theta(x_t, t)$ of the previous-step image $x_{t-1}$; and the re-parameterization layer samples a state matrix $x_{t-1}$ from the mean and variance output by the neural network model as the output of the decoder.
FIG. 5 is a schematic diagram of the image generation step provided by the present invention; as shown in the figure, it comprises the following operations:
S601, the result obtained after the forward Markov process is taken as the initial state matrix $x_T$;
S602, each time step $t$ is traversed in turn from the time step sequence $t = T, T-1, \dots, 1$, and the following operations are performed:
S6021, the current state space matrix $x_t$ and time step $t$ are taken as the input of the neural network model to obtain the conditional probability distribution $p_\theta(x_{t-1} \mid x_t)$ of the previous-step image $x_{t-1}$;
S6022, a state matrix $x_{t-1}$, i.e. the state of the next step, is sampled from this conditional probability distribution;
S603, after $T$ steps, the final image is generated and the finally generated image $x_0$ is returned.
FIG. 6 shows example results of an image generation experiment using the method of the present invention, with images generated from different data sets, including MNIST, CIFAR-10, CelebA, and LSUN. As can be seen from the figure, the method of the present invention can generate high-quality images similar to the training data, with high resolution and diversity.
It should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that the technical solutions described in the above embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.