CN116208510B - A smart activation method for smart reflective surface elements based on deep reinforcement learning - Google Patents

A smart activation method for smart reflective surface elements based on deep reinforcement learning

Info

Publication number
CN116208510B
Authority
CN
China
Prior art keywords
network
intelligent
reflecting surface
reinforcement learning
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211598212.5A
Other languages
Chinese (zh)
Other versions
CN116208510A (en)
Inventor
庞宇
昝世明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202211598212.5A
Publication of CN116208510A
Application granted
Publication of CN116208510B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses an intelligent activation method for intelligent reflecting surface elements based on deep reinforcement learning, belonging to the fields of deep reinforcement learning and intelligent-reflecting-surface-assisted communication. The method comprises the steps of establishing the system model and target problem, setting the Markov decision model elements, constructing the algorithm framework, and designing the network structure. The system model is established according to the considered communication scene and the target problem to be solved is proposed; the setting of the Markov decision elements defines the states, actions and reward functions involved as the reinforcement learning agent interacts with the environment; a classical actor-critic deep reinforcement learning framework is adopted, in which an evaluation network assists the gradient update of the strategy network while the strategy network outputs actions; and the structures of the strategy network and the evaluation network are adjusted to remedy the insufficient extraction of channel state information by a fully-connected structure after the intelligent reflecting surface is introduced. The method reduces the iteration complexity of the traditional communication algorithm.

Description

Intelligent activation method for intelligent reflecting surface elements based on deep reinforcement learning
Technical Field
The invention belongs to the field of deep reinforcement learning and intelligent-reflecting-surface-assisted communication, and in particular relates to an intelligent activation method for intelligent reflecting surface elements based on deep reinforcement learning.
Background
Intelligent reflecting surfaces are currently considered an important technology that may be applied in future 6G wireless communication systems and are expected to surpass large-scale multi-antenna technology. An intelligent reflecting surface is typically composed of a large number of passive reflecting elements, each capable of producing a controllable amplitude and phase change in an incident signal. By densely deploying intelligent reflecting surfaces in a wireless network and intelligently coordinating their reflection angles, the signal transmission between a base station and user equipment can be flexibly reconfigured to achieve the intended objective. This provides a new means of fundamentally addressing channel fading and interference under various factors, and may enable a quantum leap in wireless communication capacity and reliability.
Deep reinforcement learning effectively combines the decision-making capability of reinforcement learning with the perception capability of deep learning, and has achieved a series of milestones in the field of artificial intelligence in recent years. Most current research on intelligent reflecting surfaces targets the phase optimization problem, but given the relatively large number of elements on an intelligent reflecting surface, the element activation problem also deserves attention. On the premise of guaranteeing the communication rate, activating only high-quality intelligent reflecting surface elements can greatly improve the energy efficiency of the surface. To solve this problem, the invention provides an intelligent reflecting surface element activation method combining deep reinforcement learning with a traditional communication algorithm, which is well suited to solving continuous mixed-integer optimization problems.
CN112019249A discloses an intelligent reflecting surface regulation method and device based on deep reinforcement learning. In that method, a strategy network generates a first action according to a first state; with the amplitude fixed, the first action is input into an optimization module and updated to obtain a second action and a first target value; the second action acts on the wireless environment to produce a second state, yielding a new sample that is stored in an experience pool; the strategy network and a value network train a DDPG on these samples, updating the actor's parameters by the policy gradient method; a third target value is determined from the first target value and a second target value generated by a target Q network, and the DNN of the online Q network is trained on the third target value to update its parameters. These steps repeat until the change in transmit power falls below a preset threshold, whereupon the network parameters that minimize the AP's transmit power are obtained and output. That method achieves stable and efficient learning in a shorter time and converges to the optimal target more quickly.
That patent mainly optimizes the transmit power of the base station and the phase and amplitude of the intelligent reflecting surface elements subject to a signal-to-noise-ratio constraint. The present invention instead addresses the energy loss of the intelligent reflecting surface. An intelligent reflecting surface is generally composed of many passive reflecting elements, and controllable phase and amplitude changes of an incident signal can be realized by adjusting these elements. The present invention considers a surface with a large number of elements and assumes that, to guarantee the communication rate between the base station and the users while reducing energy consumption, only a given number of elements can be activated. To reduce the solving complexity of the traditional communication algorithm, the invention combines deep reinforcement learning with the traditional communication algorithm: the traditional communication algorithm solves the element phase optimization that guarantees the communication rate, while a reinforcement learning strategy network outputs the activation strategy and selects the most suitable elements for activation. For feature extraction from the channel state, a bidirectional convolution and a densely connected fully-connected structure are adopted, accelerating convergence during network training.
Disclosure of Invention
The present invention aims to solve the above problems of the prior art by providing an intelligent activation method for intelligent reflecting surface elements based on deep reinforcement learning. The technical scheme of the invention is as follows:
An intelligent activation method for intelligent reflecting surface elements based on deep reinforcement learning comprises the following steps:
S1, establishing a system model according to an actual communication scene, and proposing a target problem to be solved according to the system model;
S2, abstracting the target problem into a Markov decision problem, and defining the basic elements involved as the reinforcement learning agent interacts with the environment, including the setting of actions, states and reward functions;
S3, constructing a deep reinforcement learning algorithm based on the classical actor-critic framework, comprising a strategy network and an evaluation network, wherein the strategy network outputs the action decisions of the agent, and the evaluation network evaluates the action taken by the strategy network in the current state and on that basis provides the gradient for the strategy network update;
S4, redesigning the network structure, adopting a bidirectional convolution and densely connected fully-connected structure to complete the extraction of channel state information. The channel state relates to factors such as the positional relationship among the base station, the intelligent reflecting surface and the users, their mutual interference and signal attenuation, so a neural network is used to fit the mapping between the channel state and the element activation of the intelligent reflecting surface, completing the intelligent activation of the elements.
Further, the intelligent reflecting surface is a passive intelligent reflecting surface that mainly serves as a relay device between the base station and the user equipment, assisting the communication between them.
Further, in step S1, aiming at the practical scene in which shielding leaves no line-of-sight communication between the base station and the users, an intelligent reflecting surface is deployed between them, a system model of intelligent-reflecting-surface-assisted communication is established, and the target problem to be solved is proposed according to the system model, specifically as follows:
The transmission rate of the kth user equipment can be expressed (reconstructed here in the standard MMSE form implied by the definitions that follow) as:
Rk = log2(1/ek), where ek = 1 − Pt·hk^H·(Pt·Σ_{j=1..K} hj·hj^H + σ²·IM)^(-1)·hk and hk = hd,k + G·Φ·hr,k,
wherein Φ represents the intelligent reflecting surface coefficient matrix, ek represents the mean square error between the information received at the receiving end and the information at the transmitting end, Pt is the user transmit power, σ² is the Gaussian noise power, IM is the standard identity matrix, and hd,k, hr,k and G represent the channel states from the base station to the user equipment, from the intelligent reflecting surface to the user equipment, and from the intelligent reflecting surface to the base station, respectively.
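For concreteness, the following minimal numerical sketch evaluates this rate under the MMSE form reconstructed above; it assumes the embodiment's dimensions (2 base station antennas, 24 reflecting elements, 2 users), i.i.d. Rayleigh channel draws, and illustrative power and noise values, so every name in it is an assumption rather than the patent's verbatim procedure.

```python
import numpy as np

M, N, K = 2, 24, 2            # BS antennas, IRS elements, users
Pt, sigma2 = 1.0, 1e-2        # transmit power and noise power (illustrative)

rng = np.random.default_rng(0)
# Rayleigh-fading draws: h_d (BS<->user), h_r (IRS<->user), G (IRS<->BS)
h_d = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
h_r = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
G = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# Coefficient matrix Phi: a binary activation vector v scales unit-modulus phases.
v = np.zeros(N); v[:8] = 1.0                         # activate a given number (8) of elements
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # phases assumed given here
Phi = np.diag(v * phases)

def user_rate(k):
    """Rate of user k under Rk = log2(1/ek) (reconstruction, not verbatim)."""
    H = h_d + G @ Phi @ h_r                          # effective cascaded channels, one column per user
    A = Pt * (H @ H.conj().T) + sigma2 * np.eye(M)
    e_k = 1.0 - Pt * np.real(H[:, k].conj() @ np.linalg.solve(A, H[:, k]))
    return np.log2(1.0 / e_k)

print(sum(user_rate(k) for k in range(K)))           # sum rate = reward in the MDP below
```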
Further, step S2 abstracts the target problem into a Markov decision problem and defines the basic elements involved as the reinforcement learning agent interacts with the environment, including the setting of actions, states and reward functions, specifically:
In the establishment of the Markov decision model, the state is set to the channel state at the current moment, the channel state information comprising a real part and an imaginary part. The action of the strategy is a one-dimensional vector v whose length equals the number of intelligent reflecting surface elements; assuming the number of elements is N, the action at any moment t is expressed as:
v(t)=[v1(t),v2(t),…vN(t)]
The reward reflects the performance on the target problem; according to the transmission rate formula of the users, the sum of the rates of all user equipment is taken as the reward function, which can be expressed as:
r(t) = R1(t) + R2(t) + … + RK(t)
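As a minimal sketch of how these Markov decision elements fit together, the following hypothetical environment class frames one interaction step: the state is a CSI tensor (real and imaginary parts), the action is the activation vector v(t), and the reward is the sum rate. The class name, the fixed activation budget L_ACTIVE and the stand-in rate routine are all assumptions.

```python
import numpy as np

N, K = 24, 2          # IRS elements, users (embodiment values)
L_ACTIVE = 8          # assumed fixed number of elements allowed to be active

class IRSActivationEnv:
    """Sketch of the MDP: state = CSI (real/imag), action = activation vector, reward = sum rate."""

    def __init__(self, rng=None):
        self.rng = rng or np.random.default_rng(0)

    def reset(self):
        # State: real and imaginary parts of the link channels (placeholder layout).
        self.csi = self.rng.standard_normal((2, N, K))
        return self.csi

    def step(self, v):
        # v is the one-dimensional activation vector v(t) = [v1(t), ..., vN(t)].
        assert v.shape == (N,) and v.sum() == L_ACTIVE
        reward = self._sum_rate(v)        # reward function: sum of all user rates
        next_state = self.reset()         # channels redrawn each slot (i.i.d. assumption)
        return next_state, reward

    def _sum_rate(self, v):
        # Stand-in: a real environment would evaluate sum_k Rk via the rate formula above.
        return float(np.count_nonzero(v))

env = IRSActivationEnv()
state = env.reset()
action = np.zeros(N); action[:L_ACTIVE] = 1.0
next_state, r = env.step(action)
```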
further, the step S3 constructs a deep reinforcement learning algorithm based on a classical presenter-reviewer framework, and the deep reinforcement learning algorithm comprises a policy network and an evaluation network, wherein the policy network is used for outputting action decisions of an agent, and the evaluation network is used for evaluating actions taken by the policy network in a current state and providing gradients of updating of the policy network on the basis of the actions, and the method specifically comprises the following steps:
The strategy network consists of three modules. Module 1 is a bidirectional convolution: it convolves the input link channels in the horizontal and vertical directions, then applies a 1×1 convolution to the outputs of the two directions, and finally merges the two outputs into a one-dimensional vector that is input to module 2. Module 2 is a two-layer densely connected fully-connected structure; the dense connections let the input of the current layer contain the features of all preceding layers, further improving the channel feature extraction and analysis capability. Module 3 uses two fully-connected layers to shape the network output; the output of the strategy network is a one-dimensional probability vector whose length equals the number of intelligent reflecting surface elements, a larger probability indicating a higher chance of being activated.
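A compact PyTorch sketch of this three-module strategy network follows. The 1×1 fusion convolutions, the two densely connected fully-connected layers, the width cap of 1024 and the Sigmoid output over the N element probabilities follow the description above; the direction-convolution channel count and the single-link input layout are placeholder assumptions.

```python
import torch
import torch.nn as nn

N, M = 24, 2   # IRS elements and BS antennas (embodiment values)

class StrategyNet(nn.Module):
    def __init__(self, feat: int = 256, width: int = 1024):
        super().__init__()
        # Module 1: bidirectional convolution over one (N x M) link channel,
        # real/imag planes in, outputs of both directions fused by 1x1 convolutions.
        self.conv_v = nn.Conv2d(2, 64, kernel_size=(N, 1))   # vertical pass
        self.conv_h = nn.Conv2d(2, 64, kernel_size=(1, M))   # horizontal pass
        self.fuse_v = nn.Conv2d(64, feat, kernel_size=1)
        self.fuse_h = nn.Conv2d(64, feat, kernel_size=1)
        in_dim = feat * M + feat * N
        # Module 2: two densely connected fully-connected layers -- each layer's
        # input contains the features of all preceding layers.
        self.fc1 = nn.Linear(in_dim, width)
        self.fc2 = nn.Linear(in_dim + width, width)
        # Module 3: two fully-connected layers shaping an N-element probability vector.
        self.head = nn.Sequential(nn.Linear(in_dim + 2 * width, width), nn.ReLU(),
                                  nn.Linear(width, N), nn.Sigmoid())
        self.act = nn.ReLU()

    def forward(self, csi):                        # csi: (batch, 2, N, M)
        v = self.fuse_v(self.act(self.conv_v(csi))).flatten(1)
        h = self.fuse_h(self.act(self.conv_h(csi))).flatten(1)
        x0 = torch.cat([v, h], dim=1)              # merged one-dimensional vector
        x1 = self.act(self.fc1(x0))
        x2 = self.act(self.fc2(torch.cat([x0, x1], dim=1)))
        return self.head(torch.cat([x0, x1, x2], dim=1))

probs = StrategyNet()(torch.randn(1, 2, N, M))     # per-element activation probabilities
```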
The evaluation network is likewise composed of three modules. Module 1 again adopts bidirectional convolution; compared with the strategy network, the input of the evaluation network additionally includes the action output by the strategy network, so after the 1×1 convolution the evaluation network merges in that action before feeding module 2. The structure of module 2 is identical to that of the strategy network. Module 3 uses two fully-connected layers to shape the output; the output of the evaluation network is a Q value, so it ends with a single scalar output.
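The evaluation network's distinguishing step, merging the strategy network's action into the channel features before the densely connected layers and ending in a single Q value, can be sketched as follows; the linear feature extractor stands in for the bidirectional convolution module, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

N = 24   # number of IRS elements

class EvaluationNet(nn.Module):
    def __init__(self, csi_dim: int = 2 * N, feat: int = 512, width: int = 1024):
        super().__init__()
        self.feat = nn.Linear(csi_dim, feat)                # stand-in for module 1
        self.fc1 = nn.Linear(feat + N, width)               # action merged with features
        self.fc2 = nn.Linear(feat + N + width, width)       # densely connected layer
        self.head = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                  nn.Linear(width, 1))      # single scalar Q value

    def forward(self, csi_flat, action):
        x = torch.relu(self.feat(csi_flat))
        x0 = torch.cat([x, action], dim=1)    # fuse the strategy network's action
        x1 = torch.relu(self.fc1(x0))
        x2 = torch.relu(self.fc2(torch.cat([x0, x1], dim=1)))
        return self.head(x2)

q = EvaluationNet()(torch.randn(4, 2 * N), torch.rand(4, N))
```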
Further, the bidirectional convolution module performs bidirectional convolution on the channel state of each link separately;
Module 2 adopts a two-layer densely connected fully-connected structure, and the width of all fully-connected layers is limited to 1024. For the activation functions, the action output of the strategy network uses Sigmoid to represent the activation probability of each intelligent reflecting surface element, and all other activations are ReLU. The size of the experience replay pool is 2^18, the learning rate of the neural networks is 2^-16, the variance of the exploration noise is 0.0001, the optimizer is Adam, and the discount coefficient is set to 0.
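Collected into a configuration sketch with the values as stated above (the container and field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    replay_size: int = 2 ** 18   # experience replay pool size
    lr: float = 2.0 ** -16       # neural network learning rate (~1.5e-5)
    noise_var: float = 0.0001    # exploration noise variance
    optimizer: str = "Adam"
    gamma: float = 0.0           # discount coefficient
    fc_width_max: int = 1024     # width cap for fully-connected layers

cfg = TrainConfig()
```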
The invention has the advantages and beneficial effects as follows:
1. Aiming at the difficulty traditional communication algorithms have with integer optimization, the invention provides a joint optimization scheme combining the traditional communication algorithm with deep reinforcement learning: the fitting capacity of a neural network replaces the complex iteration of the traditional communication algorithm, which is now only used to compute the phase adjustment of the intelligent reflecting surface elements.
2. Aiming at the structure of the channel state, a combination of bidirectional convolution and a densely connected fully-connected structure is proposed to extract channel features, greatly improving the extraction of channel state information and accelerating convergence during network training.
Drawings
FIG. 1 is a diagram of the algorithm framework of the overall preferred embodiment provided by the present invention;
FIG. 2 is a diagram of the system model considered by the present invention;
FIG. 3 is a block diagram of the strategy network and the evaluation network as a whole;
FIG. 4 is a block diagram of the bidirectional convolution;
FIG. 5 is a diagram of the densely connected fully-connected structure;
FIG. 6 is a graph comparing training curves of the bidirectional convolution and a fully-connected structure;
FIG. 7 is a graph comparing training curves for different total numbers of intelligent reflecting surface elements, given a fixed number of active elements.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
An intelligent activation method for intelligent reflecting surface elements based on deep reinforcement learning, characterized by comprising the following steps:
S1, establishing the system model and the target problem: the system model is established according to an actual communication scene, and the target problem to be solved is proposed according to the system model;
S2, the reinforcement learning problem is abstracted into a Markov decision problem, and the basic elements involved as the reinforcement learning agent interacts with the environment are defined, including the setting of actions, states and reward functions;
S3, a deep reinforcement learning algorithm is constructed based on the classical actor-critic framework and mainly comprises two parts: a strategy network outputs the action decisions of the agent, and an evaluation network evaluates the action taken by the strategy network in the current state and on that basis provides the gradient for the strategy network update;
S4, the introduction of the intelligent reflecting surface challenges the channel feature extraction of the strategy network and the evaluation network, so the network structure is redesigned to complete the extraction of channel state information.
Preferably, the intelligent reflecting surface mainly serves as a relay device between the base station and the user equipment to assist their communication. A multi-user uplink transmission scene is considered in which the line-of-sight channel between the base station and the user equipment is severely blocked, so the cascaded link via the intelligent reflecting surface assists in completing the communication task; all channel states follow a Rayleigh channel model. Unless specifically stated otherwise, the intelligent reflecting surfaces referred to in the present invention are all passive intelligent reflecting surfaces.
Preferably, in S2, according to the interaction model between the agent and the environment in a Markov decision process, at any moment t the agent takes an action according to the current environment state and obtains an instant reward, after which the environment state is updated to the next moment. Based on this interaction model and the target problem, the channel state information at each moment is set as the state of the environment; the action is a one-dimensional vector whose length equals the number of intelligent reflecting surface elements and indicates the current activation strategy; and since a reward function usually measures progress toward the target, the sum of the communication rates of the user equipment is taken as the reward.
Preferably, the deep reinforcement learning algorithm based on the actor-critic framework is the current mainstream. A strategy network focuses on the strategy itself and outputs the intelligent reflecting surface element activation strategy according to the current state. If it were updated by the policy gradient alone, the agent would usually need to complete an entire exploration trajectory, a requirement that cannot be met in practice. To improve the update speed of the strategy network, an evaluation network is added to evaluate the performance of the strategy network in real time; while evaluating the strategy, it also drives the strategy network to update in the direction the evaluation network indicates. It is worth mentioning that the phase of the intelligent reflecting surface is computed by the traditional communication algorithm, while the neural network acts as the decision maker selecting which intelligent reflecting surface elements to activate in the environment the agent interacts with, thereby completing the continuous mixed-integer optimization problem.
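The update described here can be sketched as a standard deterministic actor-critic step; with the discount coefficient set to 0 the critic's target is simply the observed reward. The function below assumes network objects and a sampled replay batch like those in the earlier sketches, and is not the patent's verbatim procedure.

```python
import torch

def update(actor, critic, actor_opt, critic_opt, s, a, r):
    """One actor-critic update on a replay batch (s, a, r); r has shape (batch, 1).

    With the discount coefficient set to 0 there is no bootstrap term,
    so the critic's TD target is just the observed reward.
    """
    # Critic step: regress Q(s, a) onto the reward.
    critic_loss = torch.nn.functional.mse_loss(critic(s, a), r)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor step: the critic provides the gradient -- maximize Q(s, actor(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```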
Preferably, since the introduction of the intelligent reflecting surface greatly increases the number and complexity of the channels, an extraction structure based on bidirectional convolution and dense connections is used to improve the network's ability to extract channel features, as shown in figs. 4 and 5. The convolution kernels in the bidirectional convolution are chosen according to the shape of each link's channel state, exploiting the translational invariance of the channel state. The densely connected structure compensates for insufficient network capacity on one hand, and on the other hand synthesizes all input features to further improve the network's feature extraction capability.
According to the intelligent-reflecting-surface-enhanced multi-user uplink multiple-input single-output system model considered in fig. 2, it is assumed that 2 single-antenna user equipments transmit information to a remote data center through a base station configured with 2 antennas; since no line-of-sight channel exists between the base station and the user equipments, communication can only be completed with the assistance of the intelligent reflecting surface between them. The target problem is to activate a given number of high-quality intelligent reflecting surface elements while guaranteeing the communication rate, so only the formulation necessary for solving the problem is given. The transmission rate of the kth user equipment can be expressed as:
Rk = log2(1/ek), with ek the mean square error defined earlier and hk = hd,k + G·Φ·hr,k the effective channel,
wherein Φ represents the intelligent reflecting surface coefficient matrix, IM represents the standard identity matrix, and hd,k, hr,k and G represent the channel states from the base station to the user equipment, from the intelligent reflecting surface to the user equipment, and from the intelligent reflecting surface to the base station, respectively.
In the establishment of the Markov decision model, the state is set to the channel state at the current moment, the channel state information comprising a real part and an imaginary part. The action of the strategy is a one-dimensional vector v whose length equals the number of intelligent reflecting surface elements; assuming the number of elements is N, the action at any moment t can be expressed as:
v(t)=[v1(t),v2(t),…vN(t)]
The reward generally needs to reflect performance on the target problem; based on the users' transmission rate formula, the sum of the rates of all user equipment is taken as the reward function, which can be expressed as:
r(t) = R1(t) + R2(t) + … + RK(t)
According to the overall algorithm framework in fig. 1, the strategy network outputs the activation decision for high-quality intelligent reflecting surface elements, i.e. the activation action made for the current environment state; the evaluation network evaluates the action of the current strategy network and, during the update of the strategy network, guides it to update in the direction that increases the evaluation network's output, i.e. it provides the gradient for the strategy network update. Meanwhile, to keep the strategy exploratory, exploration noise is added while the agent interacts with the environment to collect data, serving as the agent's trade-off between exploration and exploitation. The inputs of both the strategy network and the evaluation network contain the channel state information; in addition, the input of the evaluation network requires the action output by the strategy network, its output being an overall evaluation of the current state-action pair.
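One plausible realization of this action generation is sketched below: Gaussian exploration noise perturbs the strategy network's probability vector during data collection, and the L highest-scoring elements are activated, matching the fixed activation budget assumed earlier. The function name and defaults are illustrative (a noise standard deviation of 0.01 corresponds to the stated variance of 0.0001).

```python
import torch

def select_activation(probs: torch.Tensor, L: int, noise_std: float = 0.01,
                      explore: bool = True) -> torch.Tensor:
    """Turn the strategy network's probability vector into a binary activation vector.

    Gaussian exploration noise perturbs the probabilities during data collection
    (exploration); the L highest-scoring elements are activated (exploitation).
    """
    scores = probs + noise_std * torch.randn_like(probs) if explore else probs
    v = torch.zeros_like(probs)
    v[scores.topk(L).indices] = 1.0
    return v

v = select_activation(torch.rand(24), L=8)   # e.g. activate 8 of 24 elements
```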
According to the overall network framework in fig. 3, modules 1 and 2 mainly complete the channel state extraction and are the core of the network, while module 3 merely adapts the output dimension required by each network and is not described in detail. The focus here is on modules 1 and 2. The network structure in module 1 is mainly the bidirectional convolution of fig. 4, which has a stronger feature extraction capability than a fully-connected structure and helps the algorithm train faster; bidirectional convolution is adopted mainly because of the translational invariance of the actual channel state. In practice there are two main ways to implement the bidirectional convolution: one splices the channel state information of all links into a regular rectangle, which is convenient for convolution; the other performs the bidirectional convolution on the channel state of each link separately. Since the padding elements inevitably introduced by splicing would cause errors in the first way, the latter is adopted. Specifically, the total number of intelligent reflecting surface elements is set to 24, and the channel G from the intelligent reflecting surface to the base station is taken as an example to analyze the flow of the bidirectional convolution. As the name implies, the bidirectional convolution applies (24×1) and (1×2) convolution kernels to G in the two directions; the outputs of the two directions are then each passed through convolutions with kernel size (1×1) and 512 and 256 channels respectively, after which the two outputs are merged into a given output dimension and sent to module 2.
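A sketch of this flow for the link G, treated as a 24×2 tensor with real and imaginary planes: the (24×1) and (1×2) direction kernels and the 1×1 convolutions with 512 and 256 channels follow the text, while the direction-convolution channel count and the exact wiring of the two 1×1 layers are assumptions.

```python
import torch
import torch.nn as nn

class BiConvG(nn.Module):
    """Bidirectional convolution for the IRS-to-BS link G (24 elements x 2 antennas)."""
    def __init__(self, dir_ch: int = 64):   # direction-conv channel count is assumed
        super().__init__()
        self.vert = nn.Conv2d(2, dir_ch, kernel_size=(24, 1))  # (24x1) kernel
        self.horz = nn.Conv2d(2, dir_ch, kernel_size=(1, 2))   # (1x2) kernel
        # 1x1 convolutions with 512 and 256 channels on each direction's output.
        self.mix_v = nn.Sequential(nn.Conv2d(dir_ch, 512, 1), nn.ReLU(),
                                   nn.Conv2d(512, 256, 1))
        self.mix_h = nn.Sequential(nn.Conv2d(dir_ch, 512, 1), nn.ReLU(),
                                   nn.Conv2d(512, 256, 1))

    def forward(self, g):                    # g: (batch, 2, 24, 2) real/imag planes
        v = self.mix_v(torch.relu(self.vert(g))).flatten(1)
        h = self.mix_h(torch.relu(self.horz(g))).flatten(1)
        return torch.cat([v, h], dim=1)      # merged and passed on to module 2

features = BiConvG()(torch.randn(1, 2, 24, 2))
```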
Module 2 adopts the two-layer densely connected fully-connected structure shown in fig. 5. This connection avoids the problem of insufficient network capacity on one hand, and on the other hand further improves the extraction of the channel state, because in a dense connection the current layer comprehensively includes the features of all preceding layers. It is worth mentioning that the width of all fully-connected layers is limited to 1024; for the activation functions, the action output of the strategy network uses Sigmoid to represent the activation probability of each intelligent reflecting surface element, and all other activations are ReLU; the size of the experience replay pool is 2^18, the learning rate of the neural networks is 2^-16, the variance of the exploration noise is 0.0001, the optimizer is Adam, and the discount coefficient is set to 0.
FIG. 6 compares the training curves of the actual algorithm training process using the bidirectional convolution and a fully-connected structure. FIG. 7 shows the achievable sum rate of the user equipment with the total number of intelligent reflecting surface elements set to 18, 24 and 32, respectively.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims (4)

The strategy network is integrally formed by three modules, wherein module 1 is a bidirectional convolution that convolves the input link channels in the horizontal and vertical directions, then applies a 1×1 convolution kernel to the outputs of the two directions, and finally merges the outputs of the two directions into a one-dimensional vector input to module 2; module 2 is mainly a two-layer densely connected fully-connected structure, in which the dense connections let the input of the current layer contain the features of all preceding layers, further improving the channel feature extraction and analysis capability; module 3 adopts two fully-connected layers to adjust the network output, the output of the strategy network being a one-dimensional probability vector whose length equals the number of intelligent reflecting surface elements, with a larger probability indicating a higher possibility of being activated;
CN202211598212.5A | 2022-12-12 | 2022-12-12 | A smart activation method for smart reflective surface elements based on deep reinforcement learning | Active | CN116208510B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211598212.5A | 2022-12-12 | 2022-12-12 | A smart activation method for smart reflective surface elements based on deep reinforcement learning | CN116208510B (en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211598212.5A | 2022-12-12 | 2022-12-12 | A smart activation method for smart reflective surface elements based on deep reinforcement learning | CN116208510B (en)

Publications (2)

Publication Number | Publication Date
CN116208510A (en) | 2023-06-02
CN116208510B (en) | 2024-12-10

Family

Family ID: 86518151

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211598212.5A | A smart activation method for smart reflective surface elements based on deep reinforcement learning | 2022-12-12 | 2022-12-12 | Active | CN116208510B (en)

Country Status (1)

Country | Link
CN (1) | CN116208510B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113194488A (en)* | 2021-03-31 | 2021-07-30 | Xi'an Jiaotong University | Unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113489521A (en)* | 2021-05-26 | 2021-10-08 | University of Electronic Science and Technology of China | Intelligent united beam forming method for non-cell large-scale MIMO network assisted by reflecting surface
CN115103372A (en)* | 2022-06-17 | 2022-09-23 | Southeast University | A user scheduling method for multi-user MIMO systems based on deep reinforcement learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
DE102017219307B4 (en)* | 2017-10-27 | 2019-07-11 | Siemens Healthcare GmbH | Method and system for compensating motion artifacts by machine learning
CN111917509B (en)* | 2020-08-10 | 2023-04-18 | Army Engineering University of PLA | Multi-domain intelligent communication system and communication method based on channel-bandwidth joint decision
US20220292641A1 (en)* | 2021-03-04 | 2022-09-15 | Rensselaer Polytechnic Institute | Dynamic imaging and motion artifact reduction through deep learning
CN113162679B (en)* | 2021-04-01 | 2023-03-10 | Nanjing University of Posts and Telecommunications | DDPG algorithm-based IRS assisted unmanned aerial vehicle communication joint optimization method
CN113543176B (en)* | 2021-07-08 | 2023-06-27 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
CN114979801A (en)* | 2022-05-10 | 2022-08-30 | Shanghai University | Dynamic video abstraction algorithm and system based on bidirectional convolution long-short term memory network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Shiming Zan et al., "A Deep Reinforcement Learning Based Approach for Intelligent Reconfigurable Surface Elements Selection," 2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2022-09-12, pp. 2-6, Figs. 1, 3, 6, 8.*
Tong Bai et al., "Latency Minimization for Intelligent Reflecting Surface Aided Mobile Edge Computing," IEEE Journal on Selected Areas in Communications, vol. 38, no. 11, pp. 2666-2672, 2020-07-03.*

Also Published As

Publication number | Publication date
CN116208510A (en) | 2023-06-02

Similar Documents

Publication | Title
CN113162679B (en) | DDPG algorithm-based IRS (intelligent resilient software) assisted unmanned aerial vehicle communication joint optimization method
CN111181618B (en) | An intelligent reflective surface phase optimization method based on deep reinforcement learning
Li et al. | Path planning for cellular-connected UAV: A DRL solution with quantum-inspired experience replay
CN114727318B (en) | A method for improving the rate of multi-RIS communication networks based on MADDPG
CN114449584B (en) | Distributed computing unloading method and device based on deep reinforcement learning
CN113300749A (en) | Intelligent transmission beam optimization method based on machine learning enabling
CN118054828B (en) | Intelligent super-surface-oriented beam forming method, device, equipment and storage medium
Xiao et al. | Multi-scale attention based channel estimation for RIS-aided massive MIMO systems
CN119697663A (en) | A waveform generation method and system for an intelligent reflective surface auxiliary safety synaesthesia system based on reinforcement learning
CN115685054A (en) | Location estimation method, device and terminal
CN116208510B (en) | A smart activation method for smart reflective surface elements based on deep reinforcement learning
Lin et al. | Intelligent reflecting surface aided activity detection for massive access: Performance analysis and learning approach
CN119233323A (en) | Multi-IRS unmanned aerial vehicle general sense calculation integrated system resource allocation optimization method
CN118487649B (en) | STAR-RIS parameter configuration method for ultra-dense network
CN118869023A (en) | A space-time joint extrapolation method for 6G near-field non-stationary channels based on deep learning
CN119210529A (en) | A joint optimization method of RIS deployment and beamforming based on reinforcement learning
CN118900143A (en) | A RIS-assisted MISO system optimization method based on deep reinforcement learning
CN118449818A (en) | A frequency domain binary method for UAV data link anti-interference decision making
CN118282458A (en) | Beam training and power distribution method and system of near-field NOMA system
CN117478256A (en) | Multi-intelligent reflecting surface auxiliary unmanned aerial vehicle network rate maximization method in emergency scene
CN114697974B (en) | Network coverage optimization method and device, electronic equipment and storage medium
Wu et al. | GAI-Based Resource Management in RIS-Aided Next-Generation Network and Communication
CN119210539B (en) | A codebook-free near-field beamforming method, device, equipment, medium and product based on deep learning
Mei et al. | Multi-agent reinforcement learning based transmission scheme for IRS-assisted multi-UAV systems
KR102780783B1 | Sum data maximization method and system using RIS-based RSMA

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
