CN120215281B

Info

Publication number: CN120215281B
Application number: CN202510677737.5A
Authority: CN
Inventors: 刘英莉; 熊正; 杨玲; 沈韬; 袁海滨
Original assignee: Yunnan Tin Industry Co ltd; Kunming University of Science and Technology
Current assignee: Yunnan Tin Industry Co ltd; Kunming University of Science and Technology
Priority date: 2025-05-26
Filing date: 2025-05-26
Publication date: 2025-08-26
Anticipated expiration: 2045-05-26
Also published as: CN120215281A

Abstract

The invention relates to a dynamic optimization control method for a tin smelting process and belongs to the interdisciplinary field of metallurgical engineering and artificial intelligence. First, process data are collected and preprocessed: abnormal values are removed and a conditional GAN (cGAN) generates the missing values to fill gaps in the data. Next, an LSTM model is used to model the key state variables of the tin smelting process and capture their complex temporal dynamics, with metallurgical knowledge embedded as constraints. A data-driven state update model is then added to the reinforcement learning environment. Finally, a deep reinforcement learning algorithm is introduced, with the lance operating parameters as the action space and the state of the tin smelting process as the state space. The reward function incorporates predictions of key parameters, and the agent learns a dynamic optimization control strategy. The method shows marked advantages in dynamically optimizing tin purity and reducing CO emissions, and offers a new solution for intelligent, efficient production in the metallurgical industry.

Description

Dynamic optimization control method for tin smelting process

Technical Field

The invention relates to a dynamic optimization control method for a tin smelting process and belongs to the interdisciplinary field of metallurgical engineering and artificial intelligence.

Background

Tin smelting is an important process in nonferrous metallurgy, characterized by complex physicochemical reactions and strong coupling among many variables. Conventional tin smelting control generally relies on empirical rules and static optimization strategies, which cope poorly with the dynamic changes and nonlinear relationships inside the furnace. In practice these methods face several challenges: the dynamics of key indicators (such as tin purity and CO emission concentration) are hard to predict accurately, control strategies are hard to optimize in real time, and the result is high energy consumption, high emissions, and fluctuating product quality.

In recent years, with the rapid development of intelligent manufacturing, deep learning and reinforcement learning have offered new approaches to intelligent optimization of the tin smelting process. Recurrent neural networks and their improved variants (such as the long short-term memory network) perform well in time-series modeling, which makes accurate prediction of key smelting parameters possible, and deep reinforcement learning provides a theoretical basis for building dynamic optimization control strategies. However, purely data-driven machine learning models tend to ignore the physical constraints of metallurgical processes, such as stoichiometry and energy balance, which can make their predictions and optimization results unreliable.

It is therefore worthwhile to study a dynamic optimization control method that combines data-driven modeling with metallurgical knowledge: such a method can fully exploit the data available from the tin smelting process while embedding domain knowledge to improve the interpretability and accuracy of the model. By constructing a reinforcement learning environment with the lance operating parameters as the action space and the key states of the tin smelting process as the state space, and by designing a reward function that responds to changes in process conditions in real time, the agent can autonomously learn and dynamically optimize a control strategy that raises tin purity, reduces CO emissions, and minimizes energy consumption. This background lays the theoretical and practical foundation for intelligent optimization of the tin smelting process.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a dynamic optimization control method for a tin smelting process that optimizes the operating parameters of the smelting process in real time and thereby solves the problems described above.

The technical scheme of the invention is a dynamic optimization control method for a tin smelting process. First, relevant data are collected from the database of a tin smelting plant, including the state variables and the lance operating parameters at each moment of the smelting process. Second, the data are preprocessed: abnormal values recorded by the sensors are removed, and a conditional GAN (cGAN) generates the key variables that are measured at low frequency, filling the gaps in the data. Then a Long Short-Term Memory (LSTM) model is used to model the key state variables of the tin smelting process (such as tin purity and CO concentration) and capture their temporal dynamics, with metallurgical knowledge (such as the oxygen-coal stoichiometric ratio and the change in oxygen flow) embedded as constraints to improve the accuracy and reliability of the predictions. To construct the reinforcement learning environment, a data-driven state update model is adopted: the key state variables and the lance operating parameters are the inputs, a deep learning model predicts the state at the next moment, and metallurgical knowledge is incorporated to make the state transitions more reliable. Finally, a deep reinforcement learning algorithm is introduced, with the lance operating parameters as the action space and the state of the tin smelting process as the state space, and a reward function is designed to express the optimization targets of raising tin purity and reducing CO emissions. The agent thereby learns an optimal strategy that produces the best lance operating recommendations under different environmental conditions.

The method comprises the following specific steps:

Step1: collect relevant parameter data from the tin smelting process, including the state variable values and the lance operating parameter values at each moment in the smelting furnace, to provide a comprehensive data basis for subsequent analysis and modeling;

Step2: preprocess the parameter data by deleting abnormal values recorded by the sensors and using a cGAN to generate the variables whose measurement frequency is below a preset threshold, so as to fill the gaps in the data and improve data integrity;

Step3: model the tin purity and the CO concentration of the tin smelting process with an improved LSTM model that captures the temporal dynamics of the state variable values and the lance operating parameter values, improving the accuracy and reliability of prediction;

Step4: construct a reinforcement learning environment with a data-driven state update model that takes the state variable values and the lance operating parameter values of the tin smelting process as input and uses a deep learning model to predict the state values at the next moment, incorporating metallurgical knowledge constraints to improve the reliability of the state transitions and the realism of the environment model;

Step5: design a reward function that, based on the captured temporal dynamics, guides the reinforcement learning agent to learn in the direction of higher tin purity and lower CO emissions; the reward function feeds back the effect of the agent's decisions in real time and provides effective guidance for the optimization process;

Step6: introduce a deep reinforcement learning (DRL) algorithm, with the lance operating parameters as the action space and the state of the tin smelting process as the state space; the agent learns by interacting with the state update model and continuously optimizes the control strategy to achieve the dynamic optimization target, so that the strategy produces the best lance operating recommendations under different environmental states.

Step2 specifically comprises the following steps:

Step2.1: set thresholds for each variable according to the measuring range of the sensor and the experience of process experts, and directly remove abnormal values that exceed the thresholds, ensuring that the data passed to the subsequent modules are accurate and reliable;

Step2.2: divide the data, taking the complete high-frequency measured variables as the conditional input of the model and the low-frequency measured variables as the target variables to be generated, and generate the missing values with a cGAN model; the high-frequency measured variables are fed as the condition into both the generator and the discriminator to provide additional constraint information, improving the integrity and consistency of the data.
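The range check of Step2.1 can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the variable names and the limits in `LIMITS` are hypothetical placeholders for values that would come from the sensor ranges and from process experts.

```python
from typing import Dict, List, Tuple

# Hypothetical limits (assumption): real values would be set from the
# measuring range of each sensor and the experience of process experts.
LIMITS: Dict[str, Tuple[float, float]] = {
    "oxygen_flow": (0.0, 30000.0),
    "furnace_pressure": (-50.0, 50.0),
}


def remove_outliers(records: List[dict],
                    limits: Dict[str, Tuple[float, float]]) -> List[dict]:
    """Keep only records whose limited variables all lie inside [low, high]."""
    kept = []
    for rec in records:
        in_range = all(
            low <= rec[name] <= high
            for name, (low, high) in limits.items()
            if name in rec
        )
        if in_range:
            kept.append(rec)
    return kept


# Illustrative records; the second and third each violate one limit.
records = [
    {"oxygen_flow": 17832.0, "furnace_pressure": 3.49},
    {"oxygen_flow": -5.0, "furnace_pressure": 3.2},
    {"oxygen_flow": 17500.0, "furnace_pressure": 900.0},
    {"oxygen_flow": 17650.0, "furnace_pressure": 3.6},
]
clean = remove_outliers(records, LIMITS)
```

Dropping the whole record, rather than blanking a single field, follows the text's "directly remove"; a plant might instead mark the value as missing and let the cGAN of Step2.2 regenerate it.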

Step3 specifically comprises the following steps:

Step3.1: based on the existing industrial data of the tin smelting process, extract the core variables that reflect the dynamic changes in the furnace and generate new feature sequences of the smelting process through mathematical transformation; these features both enrich the data dimensions and provide more meaningful input for subsequent modeling;

Step3.2: embed physical constraints from the metallurgical domain in the loss function of the original LSTM model, using constraint penalty terms to guide the LSTM model toward predictions that conform to the actual process laws, which makes the predictions more physically sound and interpretable;

Step3.3: train the improved LSTM model to predict the index variables of the tin smelting process, capturing complex temporal dependencies through its gating mechanism and accurately modeling the dynamic interactions among the variables;

Step3.4: after training, verify the prediction performance of the model on an independent test data set, evaluate the prediction accuracy for tin purity and CO concentration, and save the LSTM model parameters to ensure that the model is applicable in industrial settings.

Step4 specifically comprises the following steps:

Step4.1: from the time-series data, obtain the state changes over consecutive time steps, including the state variables and the action variables, and organize the data into triplets:

$(s_t, a_t, s_{t+1})$

where $s_t$ is the state variable at the current moment, $a_t$ is the action variable at the current moment, and $s_{t+1}$ is the state variable at the next moment; this data format is the basis for training the state update model;

Step4.2: construct a state update model as a fully connected neural network that takes the current state $s_t$ and the current action $a_t$ as input and predicts the next state $s_{t+1}$, and train the state update model on the constructed triplet data;

Step4.3: evaluate the performance of the state update model on the validation set and save the state update model parameters that meet the preset index requirements, providing a reliable state update mechanism to be embedded in the reinforcement learning environment;

Step4.4: in the reinforcement learning environment, use the trained state update model parameters to predict the environment state at the next moment from the current environment state and the action currently executed, and use this prediction to guide the agent's action selection. Embedding the model improves the environment's ability to describe complex dynamic changes and gives the agent more realistic state feedback, guiding it toward better actions and enabling efficient control of the tin smelting process.
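The triplet construction of Step4.1 and the way a trained state update model drives the environment in Step4.4 can be sketched as follows. This is a minimal sketch under stated assumptions: `SmeltingEnv` and `toy_model` are hypothetical names, and `toy_model` is a stand-in callable, not the fully connected network described in Step4.2.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Transition = Tuple[State, Action, State]


def build_transitions(states: Sequence[State],
                      actions: Sequence[Action]) -> List[Transition]:
    """Pair each (s_t, a_t) with the state observed one time step later."""
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    return [(states[t], actions[t], states[t + 1])
            for t in range(len(states) - 1)]


class SmeltingEnv:
    """Hypothetical environment wrapper: the learned model is the dynamics."""

    def __init__(self, update_model: Callable[[State, Action], State],
                 initial_state: State) -> None:
        self._model = update_model
        self.state = initial_state

    def step(self, action: Action) -> State:
        # The next state comes from the model, not from the real furnace.
        self.state = self._model(self.state, action)
        return self.state


def toy_model(state: State, action: Action) -> State:
    """Stand-in for the trained network (assumption, for illustration only)."""
    return tuple(s + 0.1 * a for s, a in zip(state, action))


states = [(1.0,), (1.5,), (1.2,)]
actions = [(5.0,), (-3.0,), (0.0,)]
transitions = build_transitions(states, actions)

env = SmeltingEnv(toy_model, initial_state=(1.0,))
next_state = env.step((5.0,))
```

The last recorded action has no successor state, so a series of N records yields N - 1 triplets.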

Step5 specifically comprises the following steps:

Step5.1: set the optimization targets, namely raising tin purity and reducing CO emissions, together with auxiliary targets, including reducing energy consumption and keeping key process parameters such as furnace pressure stable, so that the multiple targets are optimized jointly and the tin smelting process runs efficiently;

Step5.2: use the saved LSTM model parameters to predict how the current adjustment of the process parameters will affect tin purity and CO emissions over a preset future time window, and use the prediction as the basis for computing the reward function, so that the reward accurately reflects the contribution of the optimization strategy to the long-term target;

Step5.3: design the reward function, whose formula is as follows:

[Formula image missing from the source text.]

where $R$ is the reward function; $w_1$, $w_2$, $w_3$ are weight coefficients that balance the importance of the multiple targets; $P_t$ is the tin purity at the current moment; $C_t$ is the carbon monoxide content at the current moment; $P_{t+1}$ is the tin purity at the next moment; $C_{t+1}$ is the carbon monoxide content at the next moment; and $E_{t+1}$ is the energy consumption at the next moment;

To improve real-time responsiveness and robustness, the reward function incorporates the LSTM-based predictions of key parameters, and the long-term optimization target of the tin smelting process is decomposed into several short-term sub-targets. The reward therefore reflects more accurately how the current process adjustment affects the future state, which guides the agent's learning and the continuous optimization of the control strategy.
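The formula for the reward is not legible in this text, so the sketch below shows only one form that is consistent with the listed quantities: it rewards a rise in tin purity, rewards a drop in CO, and penalizes energy use. The linear combination, the function name `reward`, and the default weights are all assumptions, not the patent's formula.

```python
def reward(purity_now: float, co_now: float,
           purity_next: float, co_next: float, energy_next: float,
           w_purity: float = 1.0, w_co: float = 1.0,
           w_energy: float = 0.1) -> float:
    """Assumed linear form: purity gain + CO reduction - energy penalty."""
    purity_gain = purity_next - purity_now   # positive when purity rises
    co_reduction = co_now - co_next          # positive when CO falls
    return (w_purity * purity_gain
            + w_co * co_reduction
            - w_energy * energy_next)


# Illustrative values only: one improving step and one worsening step.
r_good = reward(99.0, 10.0, 99.5, 8.0, 1.0)
r_bad = reward(99.0, 10.0, 98.5, 12.0, 1.0)
```

In practice the three terms have very different units and magnitudes, so the weights (or a normalization of each term) would need tuning before the agent sees a balanced signal.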

Step6 specifically comprises the following steps:

Step6.1: regulate the lance operating parameters with the DDPG algorithm, defining the operations executed by the lance in the smelting furnace as the action space and ten core variables of the smelting furnace as the state space: total concentrate amount, molten pool accumulation, oxygen content percentage, furnace bottom middle temperature, furnace bottom outer temperature, furnace bottom inner temperature, furnace temperature rise, furnace pressure, exhaust gas CO content, and total energy consumption;

Step6.2: use the Actor network of the DDPG algorithm to predict the optimal action for the current state, which adjusts the lance operating parameters, and use the Critic network to evaluate the Q value of the current strategy, which guides the optimization of the Actor;

Step6.3: store the interaction data between the reinforcement learning agent and the furnace environment in an experience pool and train on furnace states and lance operating parameters sampled at random from it; this breaks the temporal correlation, lets the agent learn operating experience that is not available from consecutive furnace-campaign data alone, and improves the training stability and generalization ability of the model;

Step6.4: add noise to strengthen the agent's exploration of the action space and avoid becoming trapped in local optima, and simulate plant fault conditions to complete the model training;

Step6.5: use the trained model for intelligent dynamic optimization control of the tin smelting process: state variables are monitored in real time, the model's optimization results are used to give the lance operator optimization suggestions, and the operating parameters are adjusted dynamically, thereby raising tin purity, reducing CO emissions, and maximizing the energy efficiency of the process.
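The experience pool of Step6.3 and the exploration noise of Step6.4 can be sketched with the standard library alone. The class and function names are hypothetical, and the Gaussian noise with clipping is an assumed choice: the text says only that noise is added, without naming a distribution.

```python
import random
from collections import deque
from typing import List, Sequence, Tuple

Transition = Tuple[tuple, tuple, float, tuple]  # (s, a, r, s_next)


class ReplayBuffer:
    """Fixed-size experience pool; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._buffer = deque(maxlen=capacity)

    def add(self, state: tuple, action: tuple,
            reward: float, next_state: tuple) -> None:
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int, rng=random) -> List[Transition]:
        # Uniform sampling without replacement breaks the time ordering.
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


def noisy_action(action: Sequence[float], sigma: float,
                 low: float = -1.0, high: float = 1.0,
                 rng=random) -> List[float]:
    """Add zero-mean Gaussian noise (assumed) and clip to the valid range."""
    return [min(max(a + rng.gauss(0.0, sigma), low), high) for a in action]


rng = random.Random(0)  # seeded only to make the example repeatable
pool = ReplayBuffer(capacity=3)
for t in range(5):
    pool.add((float(t),), (0.0,), 0.0, (float(t + 1),))
batch = pool.sample(2, rng=rng)
explored = noisy_action([0.9, -0.9, 0.0], sigma=0.5, rng=rng)
```

The actor and critic updates of Step6.2 would consume `batch`; they are omitted here because they depend on the network library in use.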

Step3.2 specifically comprises the following:

A metallurgical knowledge constraint is added to the loss function of the LSTM model. The constraint formula is as follows:

$L_{total} = L_{pred} + \lambda f$

where $L_{total}$ is the total loss, $L_{pred}$ is the prediction loss of the LSTM model, $\lambda$ is a constant, and $f$ is a function that incorporates metallurgical knowledge. Its specific formula is as follows:

[Formula image missing from the source text.]

where $\alpha$ is a constant and $\Delta$ is the difference between the predicted value and the theoretical value. The theoretical value of CO is calculated as follows:

[Formula image missing from the source text.]

where $Q_{coal}$ is the fuel coal flow rate, $k$ is a constant related to the calorific value of the fuel coal and the combustion efficiency, $Q_{O_2}$ is the oxygen flow rate, and $r$ is the theoretical stoichiometric ratio of the fuel coal.
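A physics-penalized loss of this kind can be sketched as below. The structure (prediction loss plus a weighted penalty on the gap between predicted and theoretical CO) follows the description above, but the squared-error penalty and the `co_theoretical` function are illustrative assumptions: the patent's own expressions are not legible in this text, and `co_theoretical` is a simple stand-in in which coal beyond what the oxygen can fully burn is assumed to produce CO.

```python
from typing import Sequence


def co_theoretical(coal_flow: float, oxygen_flow: float,
                   k: float, r: float) -> float:
    """Illustrative stand-in (assumption), with r taken as oxygen per unit coal."""
    excess_coal = max(coal_flow - oxygen_flow / r, 0.0)
    return k * excess_coal


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over two equally long sequences."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


def total_loss(co_pred: Sequence[float], co_true: Sequence[float],
               co_theory: Sequence[float],
               lam: float = 0.1, alpha: float = 1.0) -> float:
    """Prediction loss plus lam times the physics penalty."""
    prediction_loss = mse(co_pred, co_true)
    # Assumed penalty: alpha times the squared gap to the theoretical value.
    physics_penalty = alpha * mse(co_pred, co_theory)
    return prediction_loss + lam * physics_penalty


theory = [co_theoretical(10.0, 16.0, k=2.0, r=2.0)]
loss = total_loss([3.0], [2.0], [1.0], lam=0.5, alpha=1.0)
```

With `lam = 0` the loss reduces to the ordinary data-fitting loss, so `lam` sets how strongly the model is pulled toward the stoichiometric estimate when measurements and theory disagree.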

The beneficial effects of the invention are as follows:

(1) Dynamic optimization control capability: compared with existing static optimization methods, the invention achieves dynamic optimization control of the tin smelting process by combining a deep reinforcement learning algorithm with a data-driven model. The agent monitors the state in real time and autonomously optimizes the lance operating strategy to cope with complex process changes;

(2) Unlike traditional data-driven models, the invention introduces metallurgical physical constraints such as the oxygen-coal stoichiometric ratio and combines them with an improved recurrent neural network (RNN), accurately capturing the dynamics of key parameters and improving prediction accuracy and reliability;

(3) The invention adopts a data-driven state update model that accurately reflects the dynamic characteristics of tin smelting and provides a reliable state transition mechanism for reinforcement learning, which prevents the optimization strategy from failing because of accumulated environment errors and improves learning efficiency and optimization performance.

Drawings

FIG. 1 is an overall frame diagram of the present invention;

FIG. 2 compares the predicted and actual values of the total concentrate amount in an embodiment of the invention;

FIG. 3 compares the predicted and actual values of the molten pool accumulation in an embodiment of the invention;

FIG. 4 compares the predicted and actual values of the oxygen content percentage in an embodiment of the invention;

FIG. 5 compares the predicted and actual values of the furnace bottom middle temperature in an embodiment of the invention;

FIG. 6 compares the predicted and actual values of the furnace bottom outer temperature in an embodiment of the invention;

FIG. 7 compares the predicted and actual values of the furnace bottom inner temperature in an embodiment of the invention;

FIG. 8 compares the predicted and actual values of the furnace temperature rise in an embodiment of the invention;

FIG. 9 compares the predicted and actual values of the furnace pressure in an embodiment of the invention;

FIG. 10 compares the predicted and actual values of the exhaust gas CO content in an embodiment of the invention;

FIG. 11 compares the predicted and actual values of the total energy consumption in an embodiment of the invention.

Detailed Description

The invention will be further described with reference to the drawings and detailed description.

Embodiment 1: As shown in FIG. 1, the dynamic optimization control method for the tin smelting process comprises the following specific steps:

Step1: collect relevant parameter data from the tin smelting process, including the state variable values and the lance operating parameter values at each moment in the smelting furnace, to provide a comprehensive data basis for subsequent analysis and modeling.

Specifically, to verify the real-time dynamic optimization effect of the model in the tin smelting process, this embodiment uses actual operating data from a tin smelting plant and simulates the interaction between the reinforcement learning model and the industrial production environment to evaluate the improvement in tin purity, the reduction in CO emissions, and the optimization of energy consumption. Smelting data for 30 consecutive days were extracted from the data system of the plant's Ausmelt furnace at a time resolution of 1 minute. The data set contains 43,200 records in total, each with 10 state variables and 8 operating parameters, as listed in Tables 1 and 2.

Table 1. Action variables performed by the lance in the tin plant data

Action variable name	Unit
Total material flow	t/h
Fuel coal flow	kg/h
Coal-carrying air flow	Nm³/h
Coal-carrying air pressure	kPa
Lance back pressure	kPa
Lance position	mm
Oxygen flow	Nm³/h
Air flow	Nm³/h

Table 2. State variables of the Ausmelt furnace in the tin plant data

State variable name	Unit
Total concentrate amount	t
Molten pool accumulation	t
Oxygen concentration	%
Furnace bottom middle temperature	°C
Furnace bottom outer temperature	°C
Furnace bottom inner temperature	°C
Furnace temperature rise	°C
Furnace pressure	Pa
Exhaust gas CO content	ppm
Energy consumption	kW

Step2: preprocess the parameter data by deleting abnormal values recorded by the sensors and using a cGAN to generate the variables whose measurement frequency is below a preset threshold, so as to fill the gaps in the data and improve data integrity.

Step2.2: divide the data, taking the complete high-frequency measured variables (such as oxygen flow and furnace pressure) as the conditional input of the model and the low-frequency measured variables (such as tin purity) as the target variables to be generated, and generate the missing values with a cGAN model; the high-frequency measured variables are fed as the condition into both the generator and the discriminator to provide additional constraint information, improving the integrity and consistency of the data.
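The data division of Step2.2 can be sketched as below. This shows only how rows might be split and how the condition is attached to the generator and discriminator inputs; the networks themselves are omitted. All function names are hypothetical, and marking a missing low-frequency value with `None` is an assumption about how the gaps are stored.

```python
from typing import List, Optional, Sequence, Tuple

Condition = Tuple[float, ...]


def split_for_imputation(
    conditions: Sequence[Condition],
    targets: Sequence[Optional[float]],
) -> Tuple[List[Tuple[Condition, float]], List[int]]:
    """Separate rows with an observed target from rows that need generation."""
    train_pairs: List[Tuple[Condition, float]] = []
    missing_rows: List[int] = []
    for i, (cond, target) in enumerate(zip(conditions, targets)):
        if target is None:
            missing_rows.append(i)             # to be filled by the generator
        else:
            train_pairs.append((cond, target))  # used to train the cGAN
    return train_pairs, missing_rows


def generator_input(noise: Sequence[float], condition: Condition) -> List[float]:
    """The generator sees random noise concatenated with the condition."""
    return list(noise) + list(condition)


def discriminator_input(sample: float, condition: Condition) -> List[float]:
    """The discriminator judges a sample together with the same condition."""
    return [sample] + list(condition)


# Illustrative rows: (oxygen flow, furnace pressure) as the condition and
# tin purity as the low-frequency target, missing in the second row.
conditions = [(17832.0, 3.49), (17500.0, 3.2), (17650.0, 3.6)]
targets = [99.1, None, 99.3]
train_pairs, missing_rows = split_for_imputation(conditions, targets)
g_in = generator_input([0.5], conditions[missing_rows[0]])
d_in = discriminator_input(train_pairs[0][1], train_pairs[0][0])
```

Feeding the same condition to both networks is what makes the generated tin purity depend on the furnace readings at that minute rather than on the overall distribution alone.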

Step3: model the tin purity and the CO concentration of the tin smelting process with an improved LSTM model that captures the temporal dynamics of the state variable values and the lance operating parameter values, improving the accuracy and reliability of prediction.

Step3.1: based on the existing industrial data of the tin smelting process, extract the core variables that reflect the dynamic changes in the furnace (such as oxygen flow, fuel coal flow, and furnace pressure) and generate new feature sequences of the smelting process through mathematical transformation; these features both enrich the data dimensions and provide more meaningful input for subsequent modeling;

The new features comprise the change in oxygen flow, the change in fuel coal flow, and the oxygen-coal stoichiometric ratio. The two flow changes help the model capture the temporal dynamics and nonlinear relationships of the combustion reaction in the furnace, and the oxygen-coal stoichiometric ratio reflects the balance between oxygen supply and fuel demand. The specific calculation formulas are as follows:

$\Delta Q_{O_2}(t) = Q_{O_2}(t) - Q_{O_2}(t-1)$

where $\Delta Q_{O_2}(t)$ is the change in oxygen flow, $Q_{O_2}(t)$ is the oxygen flow at the current time step, and $Q_{O_2}(t-1)$ is the oxygen flow at the previous time step;

$\Delta Q_{coal}(t) = Q_{coal}(t) - Q_{coal}(t-1)$

where $\Delta Q_{coal}(t)$ is the change in fuel coal flow, $Q_{coal}(t)$ is the fuel coal flow at the current time step, and $Q_{coal}(t-1)$ is the fuel coal flow at the previous time step;

$\lambda_{O/C} = \dfrac{Q_{O_2} \, p_{O_2} \, \rho_{O_2}}{Q_{coal} \, p_{coal}}$

where $\lambda_{O/C}$ is the oxygen-coal stoichiometric ratio, $Q_{O_2}$ is the oxygen flow rate, $p_{O_2}$ is the oxygen purity, $\rho_{O_2}$ is the oxygen density, $Q_{coal}$ is the fuel coal flow rate, and $p_{coal}$ is the fuel coal purity.
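The three derived features can be computed as in the sketch below. The first-difference features follow directly from the definitions above. The ratio is an assumed form (mass flow of pure oxygen divided by mass flow of combustible coal) built from the five listed quantities, since the patent's own expression is not legible in this text; the function names are hypothetical.

```python
from typing import List, Sequence


def flow_deltas(flow: Sequence[float]) -> List[float]:
    """Change in flow between each time step and the one before it."""
    return [flow[t] - flow[t - 1] for t in range(1, len(flow))]


def oxy_coal_ratio(oxygen_flow: float, oxygen_purity: float,
                   oxygen_density: float,
                   coal_flow: float, coal_purity: float) -> float:
    """Assumed form: pure-oxygen mass flow over combustible-coal mass flow."""
    if coal_flow <= 0 or coal_purity <= 0:
        raise ValueError("coal flow and purity must be positive")
    oxygen_mass_flow = oxygen_flow * oxygen_purity * oxygen_density
    coal_mass_flow = coal_flow * coal_purity
    return oxygen_mass_flow / coal_mass_flow


# Illustrative one-minute readings (not plant data).
oxygen_series = [17800.0, 17832.0, 17810.0]
d_oxygen = flow_deltas(oxygen_series)

# Simple numbers chosen so the ratio is easy to check by hand.
ratio = oxy_coal_ratio(oxygen_flow=10.0, oxygen_purity=1.0,
                       oxygen_density=2.0, coal_flow=4.0, coal_purity=1.0)
```

The difference series is one element shorter than the input, so the first record of each series has no delta feature and would be dropped or padded before training.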

Step3.2: embed physical constraints from the metallurgical domain in the loss function of the original LSTM model, using constraint penalty terms (such as the deviation of the oxygen-coal stoichiometric ratio) to guide the LSTM model toward predictions that conform to the actual process laws, which makes the predictions more physically sound and interpretable;

Specifically, a metallurgical knowledge constraint is added to the loss function of the LSTM model, using the constraint formula given for Step3.2 above.

Step3.3: train the improved LSTM model to predict the index variables of the tin smelting process (such as tin purity and CO concentration), capturing complex temporal dependencies through its gating mechanism and accurately modeling the dynamic interactions among the variables;

Step4: construct a reinforcement learning environment with a data-driven state update model that takes the state variable values and the lance operating parameter values of the tin smelting process as input and uses a deep learning model to predict the state values at the next moment, incorporating metallurgical knowledge constraints to improve the reliability of the state transitions and the realism of the environment model.

Step4.3: evaluate the performance of the state update model on the validation set and save the state update model parameters that meet the preset index requirements, providing a reliable state update mechanism to be embedded in the reinforcement learning environment. The state update results are shown in FIGS. 2 to 11, which compare the values predicted by the reinforcement learning state update model with the actual values for ten states: total concentrate amount, molten pool accumulation, oxygen content percentage, furnace bottom middle temperature, furnace bottom outer temperature, furnace bottom inner temperature, furnace temperature rise, furnace pressure, exhaust gas CO content, and total energy consumption. The predicted value is the next-moment state predicted from the state and action at the current moment, and the actual value is the state actually observed at the next moment;
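The acceptance check of Step4.3 can be sketched as a per-variable error test on the validation set. The text does not name the metric or the thresholds, so the use of RMSE and every limit below are assumptions chosen for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual next states."""
    if len(pred) != len(actual) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(pred, actual))
    return math.sqrt(squared / len(pred))


def validate(preds: Dict[str, Sequence[float]],
             actuals: Dict[str, Sequence[float]],
             max_rmse: Dict[str, float]) -> Tuple[Dict[str, float], bool]:
    """Score each state variable and accept only if all are within limits."""
    scores = {name: rmse(preds[name], actuals[name]) for name in max_rmse}
    accepted = all(scores[name] <= max_rmse[name] for name in max_rmse)
    return scores, accepted


# Illustrative validation values and hypothetical limits (assumptions).
preds = {"furnace_pressure": [3.4, 3.6], "co_ppm": [2.0, 2.5]}
actuals = {"furnace_pressure": [3.5, 3.5], "co_ppm": [2.0, 2.0]}
limits = {"furnace_pressure": 0.2, "co_ppm": 0.3}
scores, accepted = validate(preds, actuals, limits)
```

Checking each of the ten state variables separately matters because they differ by orders of magnitude; a single pooled error would be dominated by the temperatures.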

Step5: design a reward function that, based on the captured temporal dynamics, guides the reinforcement learning agent to learn in the direction of higher tin purity and lower CO emissions; the reward function feeds back the effect of the agent's decisions in real time and provides effective guidance for the optimization process.

Step5.2: use the saved LSTM model parameters to predict how the current adjustment of the process parameters will affect tin purity and CO emissions over the next 3 minutes, and use the prediction as the basis for computing the reward function, so that the reward accurately reflects the contribution of the optimization strategy to the long-term target;

Step5.3: design the reward function, whose formula is as follows:

[Formula image missing from the source text.]

where $R$ is the reward function; $w_1$, $w_2$, $w_3$ are weight coefficients that balance the importance of the multiple targets; $P_t$ is the tin purity at the current moment; $C_t$ is the carbon monoxide content at the current moment; $P_{t+1}$ is the tin purity at the next moment; $C_{t+1}$ is the carbon monoxide content at the next moment; and $E_{t+1}$ is the energy consumption at the next moment.

Further, when the LSTM model predicts that the current adjustment of the operating parameters will raise tin purity and lower CO concentration, the reward function gives a positive reward; conversely, if the operation leads to a rise in CO concentration or a drop in tin purity, the reward function gives negative feedback.
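The three-minute look-ahead and the sign rule just described can be sketched as follows. `stub_predictor` is a hypothetical stand-in for the saved LSTM model, holding the action fixed over the window is an assumption, and the 17,000 Nm³/h switch point inside the stub is invented purely to produce two contrasting cases.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def rollout(predict_next: Callable[[State, State], State],
            state: State, action: State, horizon: int = 3) -> List[State]:
    """Apply the one-minute predictor repeatedly, holding the action fixed."""
    trajectory = []
    for _ in range(horizon):
        state = predict_next(state, action)
        trajectory.append(state)
    return trajectory


def feedback_sign(start: State, trajectory: List[State]) -> int:
    """+1 if purity rises and CO falls; -1 if purity falls or CO rises."""
    end = trajectory[-1]
    d_purity = end["tin_purity"] - start["tin_purity"]
    d_co = end["co"] - start["co"]
    if d_purity > 0 and d_co < 0:
        return 1
    if d_purity < 0 or d_co > 0:
        return -1
    return 0


def stub_predictor(state: State, action: State) -> State:
    """Stand-in for the LSTM (assumption): more oxygen helps, less hurts."""
    direction = 1.0 if action["oxygen_flow"] >= 17000.0 else -1.0
    return {"tin_purity": state["tin_purity"] + 0.1 * direction,
            "co": state["co"] - 0.2 * direction}


start = {"tin_purity": 99.0, "co": 2.0}
good = feedback_sign(start, rollout(stub_predictor, start, {"oxygen_flow": 17832.0}))
bad = feedback_sign(start, rollout(stub_predictor, start, {"oxygen_flow": 15000.0}))
```

A horizon of three one-minute steps matches the window stated in Step5.2; a longer window would compound the predictor's own error at every step.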

Step6.1: regulate the lance operating parameters with the DDPG algorithm, defining the operations executed by the lance in the smelting furnace (such as oxygen flow and coal-carrying air pressure) as the action space and ten core variables of the smelting furnace as the state space: total concentrate amount, molten pool accumulation, oxygen content percentage, furnace bottom middle temperature, furnace bottom outer temperature, furnace bottom inner temperature, furnace temperature rise, furnace pressure, exhaust gas CO content, and total energy consumption;
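A DDPG actor commonly emits values in [-1, 1], which must then be mapped to physical lance settings. The sketch below shows one way to do this for the eight action variables of Table 1. Every bound in `ACTION_BOUNDS` is a hypothetical placeholder, not a plant limit, and the linear mapping is an assumed design choice.

```python
from typing import Dict, List, Sequence, Tuple

# (name, low, high): hypothetical operating ranges (assumption).
ACTION_BOUNDS: List[Tuple[str, float, float]] = [
    ("total_material_flow_t_per_h", 0.0, 120.0),
    ("fuel_coal_flow_kg_per_h", 0.0, 12000.0),
    ("coal_carrying_air_flow_nm3_per_h", 0.0, 1.0),
    ("coal_carrying_air_pressure_kpa", 0.0, 60.0),
    ("lance_back_pressure_kpa", 0.0, 600.0),
    ("lance_position_mm", 6000.0, 8000.0),
    ("oxygen_flow_nm3_per_h", 0.0, 25000.0),
    ("air_flow_nm3_per_h", 0.0, 25000.0),
]


def scale_action(raw: Sequence[float],
                 bounds: Sequence[Tuple[str, float, float]]) -> Dict[str, float]:
    """Map each actor output from [-1, 1] linearly onto [low, high]."""
    if len(raw) != len(bounds):
        raise ValueError("one raw value is required per action variable")
    scaled = {}
    for value, (name, low, high) in zip(raw, bounds):
        clipped = min(max(value, -1.0), 1.0)   # guard against overshoot
        scaled[name] = low + (clipped + 1.0) * 0.5 * (high - low)
    return scaled


midpoint = scale_action([0.0] * 8, ACTION_BOUNDS)
extreme = scale_action([2.0] + [-1.0] * 7, ACTION_BOUNDS)
```

Clipping before scaling keeps every recommendation inside the configured range even when exploration noise pushes the actor output past the limits.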

Step7: verify the performance of the reinforcement learning model on the test data set and evaluate the improvement in tin purity, the reduction in CO emissions, and the optimization of energy consumption. The experimental results are as follows. Current furnace state: total concentrate amount 5.777605 t; molten pool accumulation 5.4335 t; oxygen concentration 1.2%; furnace bottom middle temperature 594.0 °C; furnace bottom outer temperature 518.0 °C; furnace bottom inner temperature 518.0 °C; furnace temperature rise 740.0 °C; furnace pressure 3.490 Pa; exhaust gas CO content 2.000 ppm; energy consumption 1.780 kW. Actions recommended by the algorithm: total material flow 86.5 t/h; fuel coal flow 9960 kg/h; coal-carrying air flow 0.106 Nm³/h; coal-carrying air pressure 32.5 kPa; lance back pressure 410 kPa; lance position 7031 mm; oxygen flow 17832 Nm³/h; air flow 17832 Nm³/h. When the agent sets the lance operating parameters according to the optimal strategy, tin purity rises by 1.5% on average and the CO emission concentration falls by 5%.

The invention has been described in detail above with reference to the drawings, but it is not limited to the embodiments described; within the knowledge of those skilled in the art, various changes can be made without departing from the spirit of the invention.