Kuznetsov et al., 2021
| Publication | Title | Publication Date |
|---|---|---|
| Kuznetsov et al. | Solving continuous control with episodic memory | |
| Hambly et al. | Recent advances in reinforcement learning in finance | |
| Chen et al. | Application of deep reinforcement learning on automated stock trading | |
| Rose et al. | A reinforcement learning approach to rare trajectory sampling | |
| Liu et al. | Practical deep reinforcement learning approach for stock trading | |
| Chen et al. | Generative inverse deep reinforcement learning for online recommendation | |
| Li et al. | Minimax-optimal multi-agent RL in Markov games with a generative model | |
| Liu et al. | Prioritized experience replay based on multi-armed bandit | |
| US20240169237A1 (en) | A computer implemented method for real time quantum compiling based on artificial intelligence | |
| Bhambri et al. | Reinforcement learning methods for Wordle: A POMDP/adaptive control approach | |
| Cini et al. | Deep reinforcement learning with weighted Q-Learning | |
| Taveeapiradeecharoen et al. | Dynamic model averaging for daily forex prediction: A comparative study | |
| Shi et al. | Multi actor hierarchical attention critic with RNN-based feature extraction | |
| Shakya et al. | A deep reinforcement learning approach for inventory control under stochastic lead time and demand | |
| Chua et al. | FedPEAT: Convergence of federated learning, parameter-efficient fine tuning, and emulator assisted tuning for artificial intelligence foundation models with mobile edge computing | |
| Karda et al. | Automation of noise sampling in deep reinforcement learning | |
| Tokmak et al. | PACSBO: Probably approximately correct safe Bayesian optimization | |
| Nguyen et al. | Nonmyopic multifidelity active search | |
| Nabati et al. | Representation-driven reinforcement learning | |
| Yang et al. | Continuous control for searching and planning with a learned model | |
| Bossens et al. | Lifetime policy reuse and the importance of task capacity | |
| Izadi et al. | Using rewards for belief state updates in partially observable Markov decision processes | |
| Refael et al. | LORENZA: Enhancing generalization in low-rank gradient LLM training via efficient zeroth-order adaptive SAM | |
| Marfaing et al. | Computer-assisted fraud detection, from active learning to reward maximization | |
| Yin et al. | Hashing over predicted future frames for informed exploration of deep reinforcement learning | |