Disclosure of Invention
The invention aims to provide a flexible job shop scheduling optimization method with robot constraint, which solves the problems that the robot constraint cannot be effectively processed, the solution space is limited and the searching efficiency is low in the conventional shop scheduling, and achieves the purposes of effectively reducing the influence of the loading and unloading time of a robot on the production efficiency and improving the equipment utilization rate by optimizing a scheduling scheme.
The invention provides a flexible job shop scheduling optimization method with robot constraint, which is characterized by comprising the following steps,
Step 1, initializing parameters, and setting an evolution guidance population scale Np1, a knowledge driving population scale Np2 and a total running time t;
step 2, initializing a population, and randomly generating an initial population, wherein the initial population comprises an evolutionary guide population and a knowledge driven population;
Step 3, the population evolves, the evolution guiding population evolves by using individual competition strategy, self-evolution and co-evolution, and the knowledge driving population evolves by using individual pairing strategy and DQN evolution;
step 4, updating the population, namely combining the evolutionarily guided population and the knowledge driven population into a population, and updating the evolutionarily guided population and the knowledge driven population by utilizing the combined population;
5, CP auxiliary optimization, if an optimization condition is met, constructing a CP model, taking an individual with the largest completion time after population updating as an initial solution in the CP model, further optimizing by utilizing the global searching capability of the CP model, and if the optimization condition is not met, returning to the step 3, wherein the optimization condition is that the running time after population updating is equal to half of the running total time t;
and 6, checking a termination condition, and outputting a final solution if the termination condition is met, wherein the termination condition is that the running time after the CP auxiliary optimization is equal to the running total time t.
Further, in step 2, the population initialization is performed by cycling the individuals, including iterating the evolutionary guide population through cycling from 0 to Np1, iterating the knowledge-driven population through cycling from 0 to Np2, creating an individual per iteration, the individuals are composed of two vectors, a process ordering vector and a machine selection vector, initializing the process ordering vector, wherein the length of the process ordering vector is the total number of processes, the processes in the process ordering vector are randomly generated within the range of [0, N-1], N is the total number of workpieces, the number of times each workpiece appears in the process ordering vector is consistent with the number of processes contained in the workpiece, initializing the machine selection vector, the length of the machine selection vector is the total number of processes, randomly generating machines in the machine selection vector within the range of [0, M-1], and M is the total number of machines, if the processing time of the processes on the selected machines is 0, continuing to randomly generate the machines in the range of [0, M-1], until the processing time of the processes on the selected machines is not 0, after the initialization is completed, adding the generated number of Np1 to the individual population, and driving the individual populationp2.
Further, in step 3, the individual competition strategy is to classify the individuals in the evolutionary guide population according to the maximum completion time, divide the evolutionary guide population into a winner population and a loser population, the winner population and the loser population are Np/2 in scale, the individuals in the evolutionary guide population are ordered from small to large according to the maximum completion time, the first Np1/2 number of individuals are distributed to the winner population, the later Np1/2 number of individuals are distributed to the loser population, each individual in the winner population randomly performs self-evolution by using one of an exchange operator, an inversion operator and a reassignment operator, wherein the exchange operator and the inversion operator are applied to process ordering vectors, the reassignment operator is applied to machine selecting vectors, the winner population and the loser population cooperatively evolve by using a crossover operator, the crossover operator comprises a priority process crossover operator and a uniform crossover operator, each individual is randomly selected from the winner population and the loser population, the process ordering vectors of the two selected individuals use the priority crossover operator, the machine selecting vectors uses the uniform crossover operator, each individual crossover operator and each individual ordering vector performs self-evolution by each individual ordering vector, and all the process selecting the individual vectors.
Further, the operations of the swap operator, the invert operator, the reassign operator, the priority process crossover operator and the uniform crossover operator are as follows,
The operation process of the exchange operator is that two different positions rand1 and rand2 are randomly selected from the individual process sequence vectors, and the processes on the selected positions rand1 and rand2 are exchanged;
The operation process of the reversing operator is that two different positions rand1 and rand2 are randomly selected from the individual process sequence vectors, the position rand1 is at least 3 process intervals before the position rand2, the process sequence between the position rand1 and the position rand2 in the process sequence vectors is reversed, the midpoint position (rand1+rand2)/2 of a reversing interval is found, and the processes with the midpoint position of the reversing interval as symmetry are sequentially exchanged;
the operation process of the reassignment operator is that a position is randomly selected in the machine selection vector, and a machine is randomly selected from a selectable machine set of the procedure corresponding to the selected position to replace the original machine;
The operation process of the priority procedure crossing operator is that procedure ordering vectors of an individual pop1 and an individual pop2 are obtained, wherein the procedure ordering vectors are expressed as procedure sequences, a number num between [1, n-1] is randomly generated, n is expressed as the total number of workpieces, a num-sized workpiece set I0 is generated, num different workpieces are randomly selected from all the workpieces to be filled into the workpiece set I0, each position of the procedure sequences of the pop1 and the pop2 is traversed, and if a procedure belonging to the workpiece set I0 is found, corresponding procedures in the pop1 and the pop2 are sequentially exchanged;
The operation of the uniform crossover operator is to obtain machine selection vectors of the individual pops1 and2, wherein the machine selection vectors are represented as machine sequences, generate a number set B of size N, N represents the total number of processes, each position in the number set B is randomly filled with 0 or 1,0 and 1 is a flag for determining whether machines at corresponding positions in the machine sequences in pops1 and2 are swapped, wherein 0 represents swap, 1 represents no swap, N iterations are performed, each position of the machine sequences in pops1 and pop2 is sequentially traversed in order from the first position to the last position, and numbers at corresponding positions in the number set B are synchronously traversed, and if the number at the current position of the number set B is 0, machines at current positions on the machine sequences in pops1 and2 are swapped.
Further, in step 3, the individual pairing strategy is that all paired individual pairs are generated from the knowledge-driven population, each paired individual pair is composed of two different individuals, all individual pairs are combined together to form a paired population, DQN evolution is applied to the paired population, DQN selects a search operator for minimizing the maximum completion time according to the combined information of the process sorting vector and the machine selection vector of the individual pairs in the current paired population, the search operator comprises an exchange operator, an inversion operator, a reassignment operator, a job-based crossover operator, a two-point crossover operator and a multi-point crossover operator, wherein the exchange operator, the inversion operator and the job-based crossover operator are applied to the process sorting vector, and the reassignment operator, the two-point crossover operator and the multi-point crossover operator are applied to the machine selection vector.
Further, the operation steps of the operation-based crossover operator, the two-point crossover operator and the multi-point crossover operator are as follows,
The operation process of the crossover operator based on the operation is that the process sequencing vectors of the individual pops1 and the individual pops2 are obtained, wherein the process sequencing vectors are expressed as process sequences, a list C1 and a list C2 are initialized, a list C1 is used for storing the process sequences of the crossed pops1, a list C2 is used for storing the process sequences of the crossed pops2, all workpieces are randomly divided into two different sets and respectively marked as a workpiece set I1 and a workpiece set I2, each position of the process sequences of the pops2 and the pops2 is traversed, if the current process of the pops2 belongs to the I2, the process is copied to the corresponding position of the list C2, if the current process of the pops2 belongs to the I2, the process sequences of the pops2 are sequentially copied to the corresponding position of the list C2 according to the sequence in the order of the pops2, and for the processes of the pops2 which do not belong to the I2 are sequentially copied to the list2, the process sequences of the non-occupied position of the pops2 are sequentially copied to the list2 according to the sequence of the non-processing sequences of the pops2;
The two-point crossover operator is operated by obtaining machine selection vectors of individual pop1 and pop2, wherein the machine selection vectors are represented as machine sequences, initializing list C1 and list C2,C1 for storing machine sequences of post-crossover pop1, C2 for storing machine sequences of post-crossover pop2, randomly generating position rand1 and position rand2 on the machine sequences, position rand1 being at least 1 machine interval before position rand2, copying machines on machine sequence interval [ rand1,rand2 ] in pop1 to corresponding positions in list C1,rand2, copying machines on machine sequence interval [ rand1,rand2 ] in pop1,rand2 to corresponding positions in list C1,rand2, copying machines on machine sequence intervals [0, rand1,rand2 ] and [ rand1,rand2, N-1] to corresponding positions in list C1,rand2, copying machines on machine sequence intervals [ rand1,rand2 ] and [ rand1,rand2, assigning numbers to the total number of machines in list C1,rand2, and assigning numbers to the post-crossover machines in list C1,rand2;
The operation of the multipoint crossover operator is to obtain machine selection vectors for individual pops1 and2, where the machine selection vectors are expressed as a sequence of machines, the number num between random generation [0, N-1], N is the total number of processes, pops1 and pop2 are exchanged for machines in num times, one machine location is randomly selected at a time, and machines in selected positions of the sequence of machines in pops1 and pop2 are exchanged.
Further, in step 4, the specific process of population updating includes that the evolved leading population and the knowledge driving population are combined into a population, individuals in the combined population are ordered from small to large according to the maximum finishing time, all individuals in the evolved leading population and the knowledge driving population are removed, the first Np1 of individuals in the combined population are sequentially added into the evolved leading population, and the first Np2 of individuals in the combined population are sequentially added into the knowledge driving population.
Further, in step 5, the CP model contains the following constraint set,
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
The variables of the CP model comprise, I representing workpiece indexes; J represents the process index; A process set representing a workpiece i, L representing an index of a loading or unloading task, L representing a set of loading or unloading tasks, K representing a machine index, K representing a set of all machines, R representing a robot index, R representing a set of all robots, N representing a total number of workpieces, N representing a total number of processes, Oi,j representing a jth process of the workpiece i; a set of machines representing a processable process Oi,j; As a continuous decision variable for maximum finishing time,Is the first of the work pieces iInterval variables of task unloading of the individual procedures; As the interval variable of the process Oi,j,An optional process variable on machine k for process step Oi,j; The interval variable for loading or unloading a task in the process Oi,j is represented by a loading task when l=1, an unloading task when l=2,Optional interval variables for the loading or unloading tasks of process Oi,j on machine k; As an interval variable for the off-load task of process Oi,j,An interval variable for the loading task of process Oi,j+1; Interval variables for loading or unloading tasks integrating process Oi,j, and optional interval variables for processing on machine k; Decision variables for machine sequences, including assigned optional interval variables;Decision variables for robot sequences including assigned selectable interval variables;Decision variables for machine sequences, including assigned optional interval variablesAnd;An interval variable for the loading task of process Oi,j; Interval variable for task off-load for process Oi,j, constraint set (1) represents the goal of minimizing maximum completion timeFunction ofReturn interval variableEnd time of (2);
the constraint set (2) indicates that the process Oi,j can be performed on only one machine meeting the conditionsUpper working, functionRepresenting the variable in each intervalCan only select one optional interval variable;
The constraint set (3) indicates that the loading or unloading task/of the process Oi,j can be performed on only one machine which meets the conditionsUpper working, functionRepresenting the variable in each intervalCan only select one optional interval variable;
The constraint set (4) indicates that, for each step of the workpiece i, the loading task of step Oi,j+1 can be started only after the unloading task of step Oi,j preceding step Oi,j+1 is completed, a functionRepresenting variables between loading tasksVariable only between offload tasksStarting after finishing;
the constraint set (5) represents the constraint setAndThe two selectable interval variables form one selectable interval variable in time sequenceFunction ofThe function represents an optional inter-zone variableInvolving sets {,All of the current intervals in the },Is set {,Minimum start time of optional inter-zone variable in }, whileThe end time of (1) is set {,Maximum end time of optional inter-zone variable in };
the constraint set (6) represents that one process can only be processed on one machine, the functionOptional interval variable indicating the presence of process Oi, jIs the number of (3);
the constraint set (7) represents that the machine k can only process one procedure at the same time, and the functionRepresenting all the available selectable inter-zone variablesNon-overlapping;
the constraint set (8) indicates that the robot r can only execute one loading or unloading task at the same time, the functionRepresenting all the available selectable inter-zone variablesNon-overlapping;
The constraint set (9) indicates that the processing steps on machine k must be performed in sequence, a functionAll existing optional inter-zone variablesAndAre not overlapped with each other;
The constraint set (10) indicates that for each process of the workpiece i, the processing start time of the process Oi,j cannot be set to a function before the loading task of the process Oi,j is completedRepresenting interval variablesThe start time of (a) is not earlier than the interval variableEnd time of (2);
The constraint set (11) indicates that for each process of the workpiece i, the off-load task start time of process Oi,j cannot be before the processing end time of process Oi,j, a functionRepresenting interval variablesNot less than the interval variableEnd time of (2).
Compared with the prior art, the flexible job shop scheduling optimization method with robot constraint aims at minimizing the maximum finishing time, and has the following positive effects that (1) the invention provides a novel hybrid algorithm, namely an evolution algorithm (EA-DQN-CP) taking a depth Q network into consideration, assisted by a constraint planning model, and the evolution algorithm (EA-DQN) assisted by the depth Q network is combined with the Constraint Planning (CP) model, so that a high-quality solution is obtained by utilizing the EA-DQN at high efficiency, then the solution space is further optimized by means of the CP model, the quality of the solution is improved, and the solution with smaller maximum finishing time is obtained;
(2) In the invention, an evolution guide population and a knowledge driven population are respectively constructed, the overall optimization of the population is realized by the evolution guide population through individual competition, self-evolution and co-evolution, the risk of sinking into local optimum is reduced by the knowledge driven population through individual pairing and DQN evolution, and meanwhile, 6 search operators are designed, the advantages of the operators are comprehensively utilized, and the maximum finishing time of the obtained solution is minimized;
(3) The invention adopts the CP model which is suitable for the scheduling problem of the flexible job shop with robot constraint, and compared with the MILP model, the CP model can more comprehensively explore the solution space, more comprehensively consider various constraint conditions in the problem and more effectively process the complex relation of the job shop scheduling;
In conclusion, the flexible job shop scheduling problem with robot constraint is effectively solved. Through designing an EA-DQN-CP algorithm, namely an evolution algorithm taking a depth Q network into consideration assisted by a constraint planning model, the working procedure sequence of workpieces, machine selection and loading and unloading task sequence of a robot are reasonably arranged, and the method has the positive effects of improving the utilization rate of the machine and further improving the resource utilization efficiency of the whole workshop production.
Detailed Description
As shown in fig. 1, the optimization method for flexible job shop scheduling with robot constraint provided by the invention is mainly realized through the following processes.
Step 1, initializing parameters, and setting an evolution guidance population scale Np1, a knowledge driving population scale Np2 and a total running time t.
And step 2, initializing a population, and randomly generating an initial population, wherein the initial population comprises an evolutionary guided population and a knowledge driven population. Specifically, the population initialization is performed by initializing individuals in a loop, including iterating the evolutionary lead population through the loop from 0 to Np1, iterating the knowledge-driven population through the loop from 0 to Np2, creating one individual per iteration, each individual being composed of two vectors, a process ordering vector and a machine selection vector, respectively, the process ordering vector and the machine selection vector of the individual being illustrated in fig. 2, Oi,j representing a j-th process of the workpiece i, a numerical sequence on the process ordering vector representing a process sequence of the processes, e.g., a first number 1 representing a first process O1,1 of the workpiece 1, a second number 2 representing a first process O2,1 of the workpiece 2, a third number 3 representing a first process O3,1 of the workpiece 3, a fourth number 1 representing a second process O1,2 of the workpiece 1, a number on the machine selection vector representing a machine selected to process each of the workpieces, e.g., a first process O1,1 of the workpiece 1 is processed on the machine 2, a second number 4 representing a first process O858 of the workpiece 1, a number 4 representing a second process O858 3 of the workpiece 2 is processed on the machine 533, and a fourth number 4 representing a number O of the machine 423 is processed on the fourth number 2. The method comprises the steps of initializing a process sequence vector, wherein the length of the process sequence vector is the total number of processes, the processes in the process sequence vector are randomly generated within the range of [0, N-1], N is the total number of workpieces, the number of times of each workpiece appearing in the process sequence vector is consistent with the number of processes contained in the workpiece, initializing a machine selection vector, the length of the machine selection vector is the total number of the processes, the machines in the machine selection vector are randomly generated within the range of [0, M-1], M is the number of headquarters, if the processing time of the processes on the selected machines is 0, the machines in the machine selection vector are continuously randomly generated within the range of [0, M-1], until the processing time of the processes on the selected machines is not 0, after the initialization of individuals is completed, adding the generated individuals of Np1 into an evolutionary guidance population, and adding the generated individuals of Np2 into a knowledge driving population.
And 3, carrying out population evolution, namely, carrying out evolution on the evolution guide population by using an individual competition strategy, self-evolution and co-evolution, and carrying out evolution on the knowledge driven population by using an individual pairing strategy and DQN evolution. Specifically, the individual competition strategy is to classify individuals in the evolution guide population according to the maximum finishing time, divide the evolution guide population into a winner population and a loser population, the scale of the winner population and the loser population is Np/2, order the individuals in the evolution guide population from small to large according to the maximum finishing time, the first Np1/2 number of individuals are allocated to the winner population, the later Np1/2 number of individuals are allocated to the loser population, each individual in the winner population randomly uses one of an exchange operator, an inversion operator and a reassignment operator to perform self-evolution, wherein the exchange operator and the inversion operator are applied to process order vectors, the reassignment operator is applied to machine selection vectors, the winner population and the loser population perform collaborative evolution by using a crossover operator, the crossover operator is an operation mode of collaborative evolution of the winner population and the loser population, the crossover operator comprises a priority process crossover operator and a uniform crossover operator, each individual is randomly selected from the winner population and the loser population, each individual selected by using the crossover operator, the crossover operator and the crossover operator is uniformly selected by the crossover operator, and the machine selection vector is shown in the cross-order diagram 3, and the whole evolution guide vector is completed. Specifically, the operation steps of the exchange operator, the inversion operator, the reassignment operator, the priority procedure crossing operator and the uniform crossing operator are as follows:
The operation process of the exchange operator is that two different positions rand1 and rand2 are randomly selected from the individual process sequence vectors, and the processes on the selected positions rand1 and rand2 are exchanged;
The operation process of the reversing operator is that two different positions rand1 and rand2 are randomly selected from the individual process sequence vectors, the position rand1 is at least 3 process intervals before the position rand2, the process sequence between the position rand1 and the position rand2 in the process sequence vectors is reversed, the midpoint position (rand1+rand2)/2 of a reversing interval is found, and the processes with the midpoint position of the reversing interval as symmetry are sequentially exchanged;
the operation process of the reassignment operator is that a position is randomly selected in the machine selection vector, and a machine is randomly selected from a selectable machine set of the procedure corresponding to the selected position to replace the original machine;
The operation process of the priority procedure crossing operator is that procedure ordering vectors of an individual pop1 and an individual pop2 are obtained, wherein the procedure ordering vectors are expressed as procedure sequences, a number num between [1, n-1] is randomly generated, n is expressed as the total number of workpieces, a num-sized workpiece set I0 is generated, num different workpieces are randomly selected from all the workpieces to be filled into the workpiece set I0, each position of the procedure sequences of the pop1 and the pop2 is traversed, and if a procedure belonging to the workpiece set I0 is found, corresponding procedures in the pop1 and the pop2 are sequentially exchanged;
The operation process of the uniform crossover operator is that machine selection vectors of an individual pop1 and an individual pop2 are obtained, wherein the machine selection vectors are expressed as machine sequences, a digit set B with the size of N is generated, N represents the total procedure number, each position in the digit set B is randomly filled with 0 or 1,0 and 1 are marks for determining whether machines at corresponding positions of the machine sequences in the pop1 and the pop2 are exchanged, wherein 0 represents exchange, 1 represents no exchange, N times of iteration are carried out, each position of the pop1 and the pop2 machine sequences is sequentially traversed according to the sequence from the first position to the last position, numbers at corresponding positions in the digit set B are synchronously traversed, and if the numbers at the current position of the digit set B are 0, the machines at the current traversing positions on the machine sequences in the pop1 and the pop2 are exchanged;
The method comprises the steps of generating all paired individual pairs from a knowledge-driven population, combining all individual pairs together to form the paired population, applying DQN evolution to the paired population, selecting a search operator for minimizing the maximum finishing time according to the combined information of a process sorting vector and a machine selection vector of the individual pairs in the current paired population, wherein the search operator is an operation mode of the DQN evolution, and comprises an exchange operator, an inversion operator, a reassignment operator, an operation-based crossover operator, a two-point crossover operator and a multi-point crossover operator, wherein the exchange operator, the inversion operator and the operation-based crossover operator are applied to the process sorting vector, the reassignment operator, the two-point crossover operator and the multi-point crossover operator are applied to the machine selection vector, and the evolution process diagram of the knowledge-driven population is shown in figure 4. Specifically, the operation steps of the operation-based crossover operator, the two-point crossover operator and the multi-point crossover operator are as follows:
The operation process of the crossover operator based on the operation is that the process sequencing vectors of the individual pops1 and the individual pops2 are obtained, wherein the process sequencing vectors are expressed as process sequences, a list C1 and a list C2 are initialized, a list C1 is used for storing the process sequences of the crossed pops1, a list C2 is used for storing the process sequences of the crossed pops2, all workpieces are randomly divided into two different sets and respectively marked as a workpiece set I1 and a workpiece set I2, each position of the process sequences of the pops2 and the pops2 is traversed, if the current process of the pops2 belongs to the I2, the process is copied to the corresponding position of the list C2, if the current process of the pops2 belongs to the I2, the process sequences of the pops2 are sequentially copied to the corresponding position of the list C2 according to the sequence in the order of the pops2, and for the processes of the pops2 which do not belong to the I2 are sequentially copied to the list2, the process sequences of the non-occupied position of the pops2 are sequentially copied to the list2 according to the sequence of the non-processing sequences of the pops2;
The two-point crossover operator is operated by obtaining machine selection vectors of individual pop1 and pop2, wherein the machine selection vectors are represented as machine sequences, initializing list C1 and list C2,C1 for storing machine sequences of post-crossover pop1, C2 for storing machine sequences of post-crossover pop2, randomly generating position rand1 and position rand2 on the machine sequences, position rand1 being at least 1 machine interval before position rand2, copying machines on machine sequence interval [ rand1,rand2 ] in pop1 to corresponding positions in list C1,rand2, copying machines on machine sequence interval [ rand1,rand2 ] in pop1,rand2 to corresponding positions in list C1,rand2, copying machines on machine sequence intervals [0, rand1,rand2 ] and [ rand1,rand2, N-1] to corresponding positions in list C1,rand2, copying machines on machine sequence intervals [ rand1,rand2 ] and [ rand1,rand2, assigning numbers to the total number of machines in list C1,rand2, and assigning numbers to the post-crossover machines in list C1,rand2;
The operation of the multipoint crossover operator is to obtain machine selection vectors for individual pops1 and2, where the machine selection vectors are expressed as a sequence of machines, the number num between random generation [0, N-1], N is the total number of processes, pops1 and pop2 are exchanged for machines in num times, one machine location is randomly selected at a time, and machines in selected positions of the sequence of machines in pops1 and pop2 are exchanged.
And 4, updating the population, namely combining the evolutionary guided population and the knowledge driven population into a population, updating the evolutionary guided population and the knowledge driven population by using the combined population, and taking the updated population as a result after EA-DQN. The method comprises the steps of combining an evolutionary guided population and a knowledge driven population into a population, sorting individuals in the combined population from small to large according to maximum finishing time, removing all individuals in the evolutionary guided population and the knowledge driven population, sequentially adding the first Np1 individuals in the combined population into the evolutionary guided population, and sequentially adding the first Np2 individuals in the combined population into the knowledge driven population.
And 5, CP auxiliary optimization, if the optimization condition is met, constructing a CP model, taking an individual with the largest completion time after population updating as an initial solution in the CP model, further optimizing by utilizing the global searching capability of the CP model, taking the CP auxiliary optimization as the result of EA-DQN-CP, and if the optimization condition is not met, returning to the step 3. The connection mechanism diagram of the CP model and EA-DQN is shown in FIG. 5. Wherein, the optimization condition is that the running time after population updating is equal to half of the running total time t, the CP model comprises the following constraint set,
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
The variables of the CP model comprise, I representing workpiece indexes; J represents the process index; A process set representing a workpiece i, L representing an index of a loading or unloading task, L representing a set of loading or unloading tasks, K representing a machine index, K representing a set of all machines, R representing a robot index, R representing a set of all robots, N representing a total number of workpieces, N representing a total number of processes, Oi,j representing a jth process of the workpiece i; a set of machines representing a processable process Oi,j; As a continuous decision variable for maximum finishing time,Is the first of the work pieces iInterval variables of task unloading of the individual procedures; As the interval variable of the process Oi,j,An optional process variable on machine k for process step Oi,j; The interval variable for loading or unloading a task in the process Oi,j is represented by a loading task when l=1, an unloading task when l=2,Optional interval variables for the loading or unloading tasks of process Oi,j on machine k; As an interval variable for the off-load task of process Oi,j,An interval variable for the loading task of process Oi,j+1; Interval variables for loading or unloading tasks integrating process Oi,j, and optional interval variables for processing on machine k; Decision variables for machine sequences, including assigned optional interval variables;Decision variables for robot sequences including assigned selectable interval variables;Decision variables for machine sequences, including assigned optional interval variablesAnd;An interval variable for the loading task of process Oi,j; Interval variable for unloading task of process Oi,j;
the constraint set (1) represents the goal of minimizing the maximum finishing timeFunction ofReturn interval variableEnd time of (2);
the constraint set (2) indicates that the process Oi,j can be performed on only one machine meeting the conditionsUpper working, functionRepresenting the variable in each intervalCan only select one optional interval variable;
The constraint set (3) indicates that the loading or unloading task/of the process Oi,j can be performed on only one machine which meets the conditionsUpper working, functionRepresenting the variable in each intervalCan only select one optional interval variable;
The constraint set (4) indicates that, for each step of the workpiece i, the loading task of step Oi,j+1 can be started only after the unloading task of step Oi,j preceding step Oi,j+1 is completed, a functionRepresenting variables between loading tasksVariable only between offload tasksStarting after finishing;
the constraint set (5) represents the constraint setAndThe two selectable interval variables form one selectable interval variable in time sequenceFunction ofThe function represents an optional inter-zone variableInvolving sets {,All of the current intervals in the },Is set {,Minimum start time of optional inter-zone variable in }, whileThe end time of (1) is set {,Maximum end time of optional inter-zone variable in };
the constraint set (6) represents that one process can only be processed on one machine, the functionOptional interval variable indicating the presence of process Oi, jIs the number of (3);
the constraint set (7) represents that the machine k can only process one procedure at the same time, and the functionRepresenting all the available selectable inter-zone variablesNon-overlapping;
the constraint set (8) indicates that the robot r can only execute one loading or unloading task at the same time, the functionRepresenting all the available selectable inter-zone variablesNon-overlapping:
The constraint set (9) indicates that the processing steps on machine k must be performed in sequence, a functionAll existing optional inter-zone variablesAndAre not overlapped with each other;
The constraint set (10) indicates that for each process of the workpiece i, the processing start time of the process Oi,j cannot be set to a function before the loading task of the process Oi,j is completedRepresenting interval variablesThe start time of (a) is not earlier than the interval variableEnd time of (2);
The constraint set (11) indicates that for each process of the workpiece i, the off-load task start time of process Oi,j cannot be before the processing end time of process Oi,j, a functionRepresenting interval variablesNot less than the interval variableEnd time of (2).
And 6, checking a termination condition, and outputting a final solution if the termination condition is met. The termination condition is that the running time after the CP auxiliary optimization is equal to the running total time t.
The invention will be further described by way of a specific example.
The present invention runs on a computer equipped with an Intel CoreTM i7-12700 processor and an NVIDIA T600 GPU. Encoded in the PyCharm 2024.1.1 environment using Python, CP and CPLEX solvers are provided by IBM CPLEX Studio IDE 12.7.1. The total run time t is set to 2N seconds (N is the total number of steps in the corresponding instance), and all comparison algorithms are performed 10 times on each instance.
The invention is based on the commonly used reference examples MFJS-MFJS and MK01-MK10 in FJSP, and the examples after robot restraint are named RMFJS-RMFJS and RMK01-RMK10. Each example contains two robots, each robot responsible for half of the machines, with a loading and unloading task time set to 5s. To verify the effectiveness of the present invention in a mass production environment, the number of workpieces in examples RMFJS-RMFJS 10 and RMK01-RMK10 was doubled, creating new examples RLMFJS-RLMFJS 10 and RLMK01-RLMK10.
First, the validity of the CP model is verified, and the comparison result of the existing mixed integer linear programming model (MILP model) and constraint programming model (CP model) is shown in table 1:
Table 1 Mixed integer Linear programming model and constraint programming model comparison results
;
In table 1, the numbers of binary decision variables, continuous decision variables, and constraints are denoted by "NB", "NC", and "NCT", respectively. In addition, "NV" represents the number of section decision variables in the CP model. "Gap" represents the optimal Gap, and a value of 0 indicates that the solution with the smallest maximum completion time is found. "Cmax" represents the solution found within the Time limit and "Time" represents the CPU Time. Specifically, if the solution with the smallest maximum completion Time is obtained, the "Time" does not exceed the Time limit, otherwise, is equal to the Time limit.
According to table 1, it is shown that as the scale of the examples increases, the values of NB, NC and NTC in the MILP model, and the values of NV and NCT in the CP model, all increase substantially. The MILP model found the minimum maximum time-to-finish solution for RMFJS-RMFJS 06, the feasible solutions for RMFJS 07-RMFJS 10 and RLMFJS01-RLMFJS 05. Because of the relatively large scale of other examples, the MILP model cannot find a viable solution within a defined time. The CP model also found the solution with the minimum maximum finishing time of RMFJS-06, which was superior in speed to the MILP model. For the rest of the examples, the CP model obtains a solution that is smaller than the maximum completion time of the MILP model. In summary, the CP model is superior to the MILP model.
In order to prove the effectiveness of the CP auxiliary optimization method, the evolutionary algorithm (EA-DQN-CP) taking the depth Q network into consideration assisted by the constraint planning model and the evolutionary algorithm (EA-DQN) assisted by the depth Q network are compared, and the comparison result is shown in the table 2:
TABLE 2 comparison of EA-DQN with EA-DQN-CP
;
For the EA-DQN-CP of the present invention, the optimal configuration of the eight important parameters is determined experimentally. These parameters include evolutionary lead population size Np1, knowledge driven population size Np2, discount factor γ, learning rate lr, exploration rate ε, target Q network update frequency Q, empirical playback buffer size D, and batch size BS. The present invention was performed 10 times on the RMK10 example for each parameter configuration, and it was determined from the experimental results that the optimal parameter configuration was Np1=500,Np2 =10, γ=0.9, lr=0.1, ε=0.95, q=15, d=512, and bs=32.
According to Table 2, it is shown that "Best" represents the minimum value of the maximum finishing time for 10 times of repeated execution of each instance, "AVG" represents the average value of the maximum finishing time for 10 times of repeated execution of each instance, "Mean" represents the average value of each column, and among 40 instances, EA-DQN-CP is better than EA-DQN for Best and AVG indicators, indicating that there is a significant difference between EA-DQN and EA-DQN-CP. In summary, the CP-assisted optimization method enhances the search capability of EA-DQN. By combining the EA-DQN and CP models, the EA-DQN-CP takes advantage of the EA-DQN and CP models to achieve superior performance.
Comparing the CP model, EA-DQN and EA-DQN-CP method provided by the invention with the existing literature algorithm IGA (improved genetic algorithm) and QABC (artificial bee colony algorithm based on Q learning), and the comparison results are shown in Table 3:
table 3 IGA, QABC, CP, EA-DQN, EA-DQN-CP comparison results
;
For QABC, the population size, iteration limit, learning rate, and discount factor were set to 80,20,0.8 and 0.1, respectively. For IGA, the population size was 300, the crossover probability was 0.8, the mutation probability was 0.1, and the diversity check algebra was 300. In table 3, "LB" represents the result of the minimum maximum finishing time in the algorithm, and "ALB" represents the result of the minimum maximum finishing time average in the algorithm. As shown in Table 3, among the 40 examples, EA-DQN gave 1 minimum in Best. CP gets 21 minima in Best and 22 minima in AVG. EA-DQN-CP achieves 36 minima in Best and 35 minima in AVG. Therefore, the CP model, EA-DQN and EA-DQN-CP methods proposed by the invention are superior to the existing literature algorithms QABC and IGA for Best and AVG indexes.
In summary, the CP model, EA-DQN, and EA-DQN-CP are efficient methods for solving FJSP-RC. The CP model explores the complete solution space through the advanced searching technology, the EA-DQN realizes learning and evolution by utilizing the advantages of EA and DQN algorithms, and the EA-DQN-CP comprehensively utilizes the advantages of the CP model, EA and DQN algorithms, thereby generating excellent performance.