Title picture

Borg-DQN

A Stream-Fueled Hive Mind for Reinforcement Learning.

This project originated as the portfolio assignment for the data engineering module DLMDSEDE02 at the International University of Applied Sciences. It demonstrates how to build a streaming, data-intensive application with a machine-learning focus.

Borg-DQN presents a distributed approach to reinforcement learning centered around a shared replay memory. Echoing the collective intelligence of the Borg from the Star Trek universe, the system enables individual agents to tap into a hive-mind-like pool of communal experiences to enhance learning efficiency and robustness.

The system adopts a containerized microservices architecture enhanced with real-time streaming capabilities. Agents employ Deep Q-Networks (DQN) within game containers for training on the Atari Pong environment from OpenAI Gym. The replay memory resides in a separate container as a Redis queue, with which agents interface via protocol buffer messages.

The architecture continuously streams agents' learning progress and replay memory metrics to Kafka, enabling instant analysis and visualization of learning trajectories and memory growth on a Kibana dashboard.

Getting Started

Requirements

The execution of Borg-DQN requires a working installation of Docker, as well as the nvidia-container-toolkit to pass through CUDA acceleration to the game container instances. Refer to the respective documentation for installation instructions (see the Links section below).

The development of the game and monitor containers furthermore requires a working Python 3.11 interpreter and poetry for dependency management.
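
With those in place, the dependencies of a service can be installed from its container directory using the standard poetry workflow (a generic example rather than project-specific tooling):

poetry install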

Starting Up

To start the application, run from the root directory:

docker compose up

Observe the learning progress and memory growth on the live dashboard.

To start the application with multiple game containers, run:

docker compose up --scale game=3

The Elasticsearch indices can also be inspected directly.

Persistence Features

Upon startup, game containers load the most recent model checkpoint from the model store location, while the replay memory is prefilled with persisted transitions.
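
A minimal sketch of what such checkpoint loading could look like (the model-store path and the .pt naming scheme are assumptions for illustration, not the project's actual layout):

```python
from pathlib import Path

import torch


def load_latest_checkpoint(model: torch.nn.Module, store: Path) -> bool:
    """Load the newest checkpoint from the model store, if one exists."""
    checkpoints = sorted(store.glob("*.pt"), key=lambda p: p.stat().st_mtime)
    if not checkpoints:
        return False  # fresh start: nothing persisted yet
    model.load_state_dict(torch.load(checkpoints[-1]))
    return True
```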

Architecture

The application follows an infrastructure-as-code (IaC) approach, wherein individual services run inside Docker containers whose configuration and interconnectivity are defined in a compose.yaml at the project's root directory.

Architecture diagram

The following sections give a short overview of each component of the application.

Game Container

The game container encapsulates an Atari Pong environment (OpenAI Gym) and a double deep Q-network agent (using PyTorch). The code is adapted from MERLIn, an earlier reinforcement learning project by pykong.

Pong screenshot
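
To illustrate the double-DQN idea the agent builds on (a generic sketch, not the project's actual training code): the online network selects the greedy action for the next state, while the separate target network evaluates it, which curbs the value overestimation of vanilla DQN.

```python
import torch


def double_dqn_targets(online, target, next_states, rewards, dones, gamma=0.99):
    """TD targets the double-DQN way: select with the online net, evaluate with the target net."""
    with torch.no_grad():
        best_actions = online(next_states).argmax(dim=1, keepdim=True)
        next_q = target(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones.float())
```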

Configuration

The game container instances can be configured via environment variables. The easiest way is to place a .env file at the project's root; keys must bear the prefix CONFIG_. For example, CONFIG_alpha=1e-2 configures the learning rate. For a complete list of configuration parameters, consult config.py.
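
The prefix convention could be implemented along these lines (an illustrative sketch, not the actual contents of config.py):

```python
import os


def load_config(prefix: str = "CONFIG_") -> dict[str, str]:
    """Collect all prefixed environment variables, stripping the prefix from the keys."""
    return {
        key[len(prefix):]: value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }
```

With CONFIG_alpha=1e-2 set, this would yield {"alpha": "1e-2"}, leaving type coercion to the consuming code.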

Serializing Game Transitions

The game container puts each game transition into the shared replay memory and samples minibatches from that memory. Protocol Buffers (protobuf for short) is used for serialization; it is fast and byte-safe, allowing for efficient transformation of the NumPy arrays of the game states.

This approach, however, requires the definition and maintenance of a .proto schema file, from which native Python code is derived:

```proto
syntax = "proto3";

package transition.proto;

message Transition {
  bytes state = 1;
  uint32 action = 2;
  float reward = 3;
  bytes next_state = 4;
  bool done = 5;
  ...
}
```
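
Working with the generated code might then look as follows (the module name, observation shape, and dtype are assumptions for illustration):

```python
import numpy as np

from transition_pb2 import Transition  # assumed name of the protoc-generated module

# Assumed observation shape/dtype; the real environment wrapper may differ.
state = np.zeros((4, 84, 84), dtype=np.uint8)
next_state = np.ones_like(state)

msg = Transition(
    state=state.tobytes(),  # NumPy arrays travel as raw bytes
    action=2,
    reward=1.0,
    next_state=next_state.tobytes(),
    done=False,
)
payload = msg.SerializeToString()  # byte-safe blob, ready for Redis

# The round trip on the consuming side is symmetric:
decoded = Transition.FromString(payload)
restored = np.frombuffer(decoded.state, dtype=np.uint8).reshape(4, 84, 84)
```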

Replay Memory

The shared replay memory employs Redis to hold game transitions. Redis is performant and allows storing the transitions as serialized protobuf messages due to its byte-safe characteristics.

Redis, however, does not natively provide the fixed-size queue demanded by the use case. The workaround used is to emulate queue behavior through client-side execution of the LTRIM command.
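
A minimal sketch of this pattern using redis-py (key name and capacity are illustrative):

```python
import random

import redis

r = redis.Redis(host="localhost", port=6379)
CAPACITY = 100_000  # illustrative replay-memory bound


def push_transition(payload: bytes) -> None:
    """Prepend the serialized transition, then trim the list back to capacity."""
    r.lpush("transitions", payload)
    r.ltrim("transitions", 0, CAPACITY - 1)


def sample_minibatch(batch_size: int = 32) -> list[bytes]:
    """Sample random transitions by index; LINDEX fetches one element at a time."""
    size = r.llen("transitions")
    indices = random.sample(range(size), min(batch_size, size))
    return [r.lindex("transitions", i) for i in indices]
```

In practice, the LPUSH and LTRIM pair would typically run inside a pipeline or MULTI block so the trim cannot lag behind concurrent writers.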

Memory Monitor

The memory monitor is a Python microservice that periodically polls the shared Redis memory for transition-count and memory-usage statistics and publishes those under a dedicated Kafka topic. While ready-made monitoring solutions exist, such as Kibana integrations, the memory monitor demonstrates using Kafka with multiple topics, the other topic carrying the training logs.
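
Its polling loop could be sketched as follows, assuming the redis-py and kafka-python client libraries (host names, key, topic, and interval are illustrative assumptions):

```python
import json
import time

import redis
from kafka import KafkaProducer

r = redis.Redis(host="redis", port=6379)
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    stats = {
        "timestamp": time.time(),
        "transition_count": r.llen("transitions"),  # number of stored transitions
        "memory_usage_bytes": r.memory_usage("transitions"),  # MEMORY USAGE of the key
    }
    producer.send("memory_monitoring", stats)
    time.sleep(5)  # illustrative polling interval
```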

Kafka

Apache Kafka is a distributed streaming platform that excels at handling high-throughput, fault-tolerant messaging. In Borg-DQN, Kafka serves as the middleware that decouples the data-producing game environments from the consuming analytics pipeline, allowing for robust scalability and the flexibility to introduce additional consumers without architectural changes. Specifically, Kafka channels logs into two distinct topics, 'training_log' and 'memory_monitoring', both serialized as JSON, ensuring structured and accessible data for any downstream systems.
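
That decoupling is what makes additional consumers cheap to add; for instance, a new service could subscribe to both topics without touching any producer (a generic kafka-python sketch, not part of the code base):

```python
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "training_log",
    "memory_monitoring",
    bootstrap_servers="kafka:9092",  # assumed broker address
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for record in consumer:
    print(record.topic, record.value)  # e.g., feed an alerting or archival sink
```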

ELK Stack

The ELK stack, comprising Elasticsearch, Logstash, and Kibana, serves as a battle-tested trio for managing, processing, and visualizing data in real time, making it ideal for observing training progress and replay memory growth in Borg-DQN. Elasticsearch is a search and analytics engine with robust database characteristics, allowing for quick retrieval and analysis of large datasets. Logstash seamlessly ingests data from Kafka through a declarative pipeline configuration, eliminating the need for custom code. Kibana leverages this integration to provide a user-customizable dashboard. With all components coming from Elastic, compatibility and stability are ensured.
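
Such a declarative pipeline could look roughly like this (a hedged sketch of a Logstash configuration; host names, index naming, and plugin options are assumptions, not the project's actual pipeline):

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics => ["training_log", "memory_monitoring"]
    codec => "json"
    decorate_events => true  # expose the source topic as event metadata
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "%{[@metadata][kafka][topic]}"  # one index per topic
  }
}
```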

Kibana screenshot

Development

Plans

  • Create external documentation, preferably using MkDocs
  • Allow game container instances to be individually configured (e.g., different epsilon values to address the exploitation-exploration tradeoff)
  • Upgrade the replay memory to one featuring prioritization of transitions

Contributions Welcome

If you like Borg-DQN and want to develop it further, feel free to fork the repository and open a pull request. 🤓

Links

  1. Borg Collective
  2. Docker Engine
  3. NVIDIA Container Toolkit
  4. Poetry Docs
  5. Redis Docs
  6. Apache Kafka
  7. ELK Stack
  8. Protocol Buffers
  9. Massively Parallel Methods for Deep Reinforcement Learning
    • a more intricate architecture than Borg-DQN, also featuring a shared replay memory
