Insights: ray-project/ray
Overview
1 Release published by 1 person
- ray-2.47.1 Ray-2.47.1
published
Jun 18, 2025
422 Pull requests merged by 84 people
- [DOC][Core] fix typo in Anti-pattern.
#54547 merged
Jul 12, 2025 - use wait_condition for verifying http response
#54522 merged
Jul 12, 2025 - [serve] reorganize how we handle the http receive task
#54543 merged
Jul 12, 2025 - [deps] upgrade python protobuf to 4
#54496 merged
Jul 12, 2025 - cherrypick #54518
#54561 merged
Jul 12, 2025 - cherrypick #54386
#54560 merged
Jul 12, 2025 - cherrypick #54511
#54559 merged
Jul 12, 2025 - cherrypick #54544
#54558 merged
Jul 12, 2025 - [release] veresion change to 2.48.0
#54557 merged
Jul 12, 2025 - [Core] Add file_mounts to azure example-minimal config
#54533 merged
Jul 12, 2025 - [core][telemetry/10] support custom gauge+counter+sum metrics
#53734 merged
Jul 12, 2025 - [Data] Add Expression Support & with_columns API
#54322 merged
Jul 12, 2025 - [core][autoscaler] add the missing readonly/example.yaml to the build
#54535 merged
Jul 12, 2025 - document unexpected queuing behavior in handle
#54542 merged
Jul 12, 2025 - [train] TrainStateActor periodically checks controller status and sets aborted
#53818 merged
Jul 12, 2025 - Revert "[core] Default state API address when in a connected worker"
#54549 merged
Jul 12, 2025 - [Serve.llm] Make llm serve endpoints compatible with vLLM serve frontend (4/N): Refactor LLMServer
#54484 merged
Jul 12, 2025 - [ci] kick forge refresh
#54544 merged
Jul 11, 2025 - only print log line once during shutdown
#54534 merged
Jul 11, 2025 - concat: Handle mixed Tensor types for structs
#54386 merged
Jul 11, 2025 - [core][gpu-objects] garbage collection
#53911 merged
Jul 11, 2025 - [wheel] limit build artifacts duplicated in example directory for ray_cpp
#54465 merged
Jul 11, 2025 - [core] Default state API address when in a connected worker
#54468 merged
Jul 11, 2025 - [core] enable the v2 autoscaler by default when the cluster is managed by KubeRay
#54518 merged
Jul 11, 2025 - [Serve] Set the docs path after app is initialized on the replica
#53463 merged
Jul 11, 2025 - [Doc][Cluster] Update Azure cluster docs
#54517 merged
Jul 11, 2025 - [ci] bumping uv binary version
#54514 merged
Jul 11, 2025 - [serve] Fix test_deploy on windows
#54511 merged
Jul 11, 2025 - [core][telemetry/09] record sum metric e2e
#53512 merged
Jul 11, 2025 - [core][telemetry/08-bis] api documentation + improvements
#54472 merged
Jul 11, 2025 - [release] remove dask from byod 3.9 deps
#54521 merged
Jul 11, 2025 - [serve] update test_request_timeout
#54519 merged
Jul 11, 2025 - [uv] Fix uv run parser for handling extra arguments
#54488 merged
Jul 10, 2025 - [core][autoscaler] fix: enable cloud_instance_id reusing in autoscaler v2
#54397 merged
Jul 10, 2025 - [core] Don't order retries for in-order actors to prevent deadlock
#54034 merged
Jul 10, 2025 - [serve] deflake test_e2e_preserve_prev_replicas
#54513 merged
Jul 10, 2025 - [Serve] Update timeout to 20 for test_deploy_bad_pip_package_deployment
#54510 merged
Jul 10, 2025 - increase timeout for wait condition
#54503 merged
Jul 10, 2025 - [serve] deflake test_replica_metrics_fields
#54493 merged
Jul 10, 2025 - Feat/add websocket support for di
#54490 merged
Jul 10, 2025 - [serve] fix test_standalone_2
#54508 merged
Jul 10, 2025 - Optimize get_live_deployments
#54454 merged
Jul 10, 2025 - Feat/fix callback tests
#54507 merged
Jul 10, 2025 - [Docs] Troubleshooting DeepSeek/multi-node GPU deployment on KubeRay
#54229 merged
Jul 10, 2025 - split _wrap_user_method_call into _wrap_request and _start_request
#54485 merged
Jul 10, 2025 - [core] Improve status messages and add comments about stale seq_no handling
#54470 merged
Jul 10, 2025 - [deps] Allow to call individual functions within install-dependencies
#54502 merged
Jul 10, 2025 - Updated stalebot to use unstale label instead of bounced.
#54506 merged
Jul 10, 2025 - [core][telemetry/12] record histogram metric e2e
#53927 merged
Jul 10, 2025 - [deps] core: drop opencensus-proto test dep
#54497 merged
Jul 10, 2025 - [runtime env]: Integrating ROCm Systems Profiler to Ray worker process
#48525 merged
Jul 10, 2025 - [release] update release test dependencies
#54494 merged
Jul 10, 2025 - [ci] stop using get.docker.com
#54487 merged
Jul 10, 2025 - [Serve] Add RouterConfig field to DeploymentConfig to configure RequestRouter
#53870 merged
Jul 10, 2025 - [serve] deflake test_metrics
#54482 merged
Jul 10, 2025 - [serve] deflake test_deployment_scheduler_with_comp_sched
#54479 merged
Jul 10, 2025 - [core] upgrade opentelemetry-sdk
#53745 merged
Jul 10, 2025 - [core] fix get_max_resources_from_cluster_config
#54455 merged
Jul 9, 2025 - [wheel] only call bazel once when building the wheel
#54476 merged
Jul 9, 2025 - Add __init__.py for prefix tree directories
#54480 merged
Jul 9, 2025 - [core][compiled graphs] Supporting allreduce on list of input nodes
#51047 merged
Jul 9, 2025 - [Data] Fixing map_groups issues
#54462 merged
Jul 9, 2025 - [Data] Re-enable sorting Ray Data tests
#54475 merged
Jul 9, 2025 - [core][tune] fix RayTaskError (de)serialization logic
#54396 merged
Jul 9, 2025 - [Data] Prevent Op fusion for streaming repartition
#54469 merged
Jul 9, 2025 - [serve] tests for log formatter
#54248 merged
Jul 9, 2025 - [serve] update get_application_url and tests
#54449 merged
Jul 9, 2025 - [serve] deflake test_autoscaling_policy_with_metr_disab
#54458 merged
Jul 9, 2025 - [Serve] Check multiple FastAPI ingress deployments in a single application
#53647 merged
Jul 9, 2025 - [serve.llm] Adaption of the change of vllm.PoolingOutput
#54467 merged
Jul 9, 2025 - migrate check_library_usage_telemetry to _common
#54355 merged
Jul 9, 2025 - [Core] Use std::move in cluster task manager constructor
#54413 merged
Jul 9, 2025 - [core] Delete old skipped tests
#54427 merged
Jul 9, 2025 - Add label selector observability to placement group tables and actor and task detail pages
#54292 merged
Jul 9, 2025 - [llm] bump vllm to 0.9.2
#54407 merged
Jul 9, 2025 - [ci] adding uv binary v0.7.19
#54437 merged
Jul 9, 2025 - [deps] upgrade datasets in release tests
#54425 merged
Jul 9, 2025 - [serve] skip test_proxy_disconnect_metrics on windows
#54441 merged
Jul 8, 2025 - Fix test_json CI failures
#54352 merged
Jul 8, 2025 - [Data] Avoid OOMs with read_json(..., lines=True)
#54436 merged
Jul 8, 2025 - update google tag container id
#54444 merged
Jul 8, 2025 - [train][doc] ray.train.report api docs should mention optional checkpoint_dir_name
#54391 merged
Jul 8, 2025 - [data] Extract backpressure-related code from ResourceManager as a policy
#54376 merged
Jul 8, 2025 - [serve] add request metadata to can_accept_request
#54429 merged
Jul 8, 2025 - [ci] add workspace status script
#54398 merged
Jul 8, 2025 - [deps] sync ml byod deps with global lock file
#54424 merged
Jul 8, 2025 - [serve] bump timeout for test_proxy
#54426 merged
Jul 8, 2025 - [wheel] remove pyarrow <18 restriction
#54405 merged
Jul 8, 2025 - [Autoscaler][V2] Use running node instances to rate-limit upscaling
#50414 merged
Jul 8, 2025 - [Refactor]Rename NCCL-related items to comm_backend
#51061 merged
Jul 8, 2025 - [core] Skip test_owner_assign_inner_object on Windows
#54383 merged
Jul 8, 2025 - [Data] Add pin_memory to iter_torch_batches
#53792 merged
Jul 8, 2025 - [serve] Remove usage of internal_api.memory_summary()
#54417 merged
Jul 8, 2025 - [serve] Remove usage of ray._private.state
#54140 merged
Jul 8, 2025 - [core] Inject reorder_wait_seconds for scheduling queue test
#54404 merged
Jul 8, 2025 - [Core] Use smart pointer in logging.cc
#54351 merged
Jul 8, 2025 - [core] Delete skip_flaky_core_test_premerge
#54382 merged
Jul 8, 2025 - migrate signature from _private to _common
#54357 merged
Jul 8, 2025 - [ci] release tests: remove unused app configs
#54410 merged
Jul 8, 2025 - [data] dask: disable smoke test
#54411 merged
Jul 8, 2025 - [Data] Fixing sort_benchmark to avoid offsets overflows with Pyarrow
#54390 merged
Jul 8, 2025 - [ci] upgrade rayci version to 0.16.0
#54392 merged
Jul 8, 2025 - updating raydepsets test job
#54387 merged
Jul 8, 2025 - [release] Make KubeRay test run nightly
#54243 merged
Jul 8, 2025 - [core][GPU Objects] Disable tensordict tests in macos ci
#54375 merged
Jul 8, 2025 - [data.llm] Return a batch of rows in the udf instead of row by row
#54329 merged
Jul 7, 2025 - [Data] Revisit async UDF handling in Ray Data
#54190 merged
Jul 7, 2025 - [core] Unskip test_placement_group_strict_spread
#54381 merged
Jul 7, 2025 - [core] Add deprecation warning and remove tests for _max_cpu_fraction_per_node
#54380 merged
Jul 7, 2025 - [core] Delete event_label
#54378 merged
Jul 7, 2025 - [core][autoscaler] make the autoscaler v2 work with the cluster launcher
#54230 merged
Jul 7, 2025 - [core] Remove test_get_locations_timeout
#54367 merged
Jul 7, 2025 - [data] dask: mark all dask-on-ray tests as manual
#54371 merged
Jul 7, 2025 - [train] Force abort on SIGINT spam and do not abort finished runs
#54188 merged
Jul 7, 2025 - [core] Deflake test_reconstruction_suppression
#54366 merged
Jul 7, 2025 - [core] Deflake test_actor_scheduling_not_block_with_placement_group
#54368 merged
Jul 7, 2025 - [train] Document ray.train.collective
#54340 merged
Jul 7, 2025 - [data.llm] Decouple max_tasks_in_flight from max_concurrent_batches
#54362 merged
Jul 7, 2025 - [ci] sync versions of pytest and pip-tools
#54315 merged
Jul 7, 2025 - reduce number of loops over request headers from 2 to 1
#54326 merged
Jul 7, 2025 - [data.llm] Log engine stats after each batch task is done.
#54360 merged
Jul 7, 2025 - [serve] deflake test_autoscaling_policy
#54336 merged
Jul 7, 2025 - [core] Get cloud provider with ray on kubernetes
#51793 merged
Jul 7, 2025 - [deps] remove time series and ludwig dependencies
#54316 merged
Jul 7, 2025 - [Data] - write_parquet enable both partition by & min_rows_per_file, max_rows_per_file
#53930 merged
Jul 7, 2025 - [core][GPU Objects] Add related tests for tensordict
#54286 merged
Jul 7, 2025 - [core] Don't try to monitor zipped files
#53151 merged
Jul 7, 2025 - [cpp] add explicit files for deps
#54311 merged
Jul 6, 2025 - [wheel] mac: upgrade arm64 wheel to macos 12
#54323 merged
Jul 5, 2025 - raydepsets scaffolding (package management tool)
#54265 merged
Jul 5, 2025 - [Train] Remove the subclass relationship between RunConfig and RunConfigV1
#54293 merged
Jul 5, 2025 - [serve] configure http options in controller
#54331 merged
Jul 4, 2025 - [Data] Fix flaky test_shuffle
#54339 merged
Jul 4, 2025 - [train] Add broadcast_from_rank_zero and barrier collectives
#54066 merged
Jul 4, 2025 - [serve] deflake test_multiplex
#54335 merged
Jul 4, 2025 - Updated stalebot to add bounced label and exempt labels
#54318 merged
Jul 3, 2025 - [core][task-manager/02] consolidate TaskManager interface
#54317 merged
Jul 3, 2025 - Fix test_runtime_env_container
#54330 merged
Jul 3, 2025 - [core] Fix gcs register actor callback check
#53634 merged
Jul 3, 2025 - [doc][kuberay] state rayStartParams is optional starting with KubeRay 1.4.0
#53943 merged
Jul 3, 2025 - [core] Skip generator reconstruction test
#54320 merged
Jul 3, 2025 - [core] Use SignalActor in test_hybrid_policy_threshold
#54312 merged
Jul 3, 2025 - [data] adapt dask on ray to the new dask task class
#54108 merged
Jul 3, 2025 - disable uvicorn message logger middleware
#54309 merged
Jul 3, 2025 - [serve] speed up CI tests
#54303 merged
Jul 3, 2025 - [serve] speed up test_deploy_app
#54304 merged
Jul 3, 2025 - [ci] remove min install on aarch64 mac
#54231 merged
Jul 3, 2025 - [setup] remove invoker in bazel_invoke
#54302 merged
Jul 3, 2025 - [serve] rebalance serve CI tests
#54296 merged
Jul 3, 2025 - [train][tune] add support for dynamically loading callbacks by environement variables
#54233 merged
Jul 3, 2025 - [core] add digest for opentelemetry proto
#54300 merged
Jul 3, 2025 - modify example names to be modality based
#54297 merged
Jul 3, 2025 - [serve] Add arguments
#54295 merged
Jul 3, 2025 - [data.llm][Bugfix] Fix doc to only support int concurrency
#54196 merged
Jul 2, 2025 - fix toctree for object detection README
#54290 merged
Jul 2, 2025 - add memory buffer logger to serve
#54269 merged
Jul 2, 2025 - [core][refactor/01] remove MarkTaskCanceled as a condition check
#54283 merged
Jul 2, 2025 - [core] Deflake test_hybrid_policy_threshold
#54271 merged
Jul 2, 2025 - [doc][core] fix reStructuredText formatting on Resources page
#53882 merged
Jul 2, 2025 - [Data] Refactor test_json_read_partitioned_with_filter to avoid actors
#54279 merged
Jul 2, 2025 - [docs; RLlib] Remove "new API stack" banner from all RLlib docs pages as its now the default.
#54282 merged
Jul 2, 2025 - [Data] Fix IcebergDatasink to properly generate individual file uuids
#52956 merged
Jul 2, 2025 - [docs] move directives to bottom of README.ipynb
#54250 merged
Jul 2, 2025 - add a env var to disable forceful replica shutdown
#54204 merged
Jul 2, 2025 - feat(runtime_env): add Azure Blob Storage support
#53135 merged
Jul 2, 2025 - [Data] Update release test datasets to us-west-2 buckets
#54258 merged
Jul 2, 2025 - [serve] Move pickle.dumps to handle request methods
#54259 merged
Jul 2, 2025 - [deps] upgrade pytest-virtualenv
#54214 merged
Jul 2, 2025 - [core] Fix "Check failed: it->second.num_retries_left == -1"
#54116 merged
Jul 2, 2025 - Update azure.md - Missing azure dependency
#49104 merged
Jul 2, 2025 - [core] Fix sanitizers for actor manager test
#54224 merged
Jul 1, 2025 - [core] Add static type hints for Actor methods
#54173 merged
Jul 1, 2025 - [core] Remove test_schedule_many_actors_and_normal_tasks
#54249 merged
Jul 1, 2025 - [core] Add timeouts to test_scheduling.py::test_hybrid_policy
#54176 merged
Jul 1, 2025 - [RLlib; Offline RL] Implement Offline Policy Evaluation (OPE) via Importance Sampling.
#53702 merged
Jul 1, 2025 - [serve] optimize code when user code is on same event loop
#54227 merged
Jul 1, 2025 - [core][GPU objects] Attach tensor transport to task args protobuf
#53935 merged
Jul 1, 2025 - Cursor/update ray docs Twitter link to X
#54238 merged
Jul 1, 2025 - Fix broken Ray Workflows documentation link in README.rst
#53136 merged
Jul 1, 2025 - [core][test] deflaky test_demand_report_when_scale_up by reducing workloads
#54183 merged
Jul 1, 2025 - [core] Deflake test_node_affinity_scheduling_strategy_soft_spill_on_unavailable
#54247 merged
Jul 1, 2025 - remove flaky marker from test
#44033 merged
Jul 1, 2025 - [doc][kuberay] add version skew warning for plugin and RayCluster
#53950 merged
Jul 1, 2025 - [RLlib] Bug fix: Failed EnvRunners are not restored if there is no local EnvRunner.
#54091 merged
Jul 1, 2025 - [core] Delete asyncio actor logic in in-order scheduling code
#54033 merged
Jul 1, 2025 - move Collector class to _common
#54180 merged
Jul 1, 2025 - Revert "Revert "remove extraneous index.rst file for e2e examples (part 2)""
#54234 merged
Jul 1, 2025 - [docker] Update latest Docker dependencies for 2.47.1 release
#54015 merged
Jul 1, 2025 - [Doc][KubeRay] remove head pod trailing hash and adjust volcano output
#53826 merged
Jul 1, 2025 - [serve] support running user code on same event loop
#54219 merged
Jun 30, 2025 - [data] split dask doc tests out into its own jobs
#54207 merged
Jun 30, 2025 - [core] Recover intermediate objects if needed while generator running
#53999 merged
Jun 30, 2025 - [CI][KubeRay] Update KubeRay CI Tests branch for KubeRay v1.4.0 release
#53984 merged
Jun 30, 2025 - [serve] extract proxy args in replica
#54216 merged
Jun 30, 2025 - [data] run dask tests seperately
#54195 merged
Jun 30, 2025 - [serve] take scope, receive, send for call http entrypoint
#54184 merged
Jun 30, 2025 - Feat/update backpressure tests
#54212 merged
Jun 30, 2025 - [Core] Fix test_aggregator_agent Failing Test on MacOS
#54217 merged
Jun 30, 2025 - [ci] unify macos build script across platforms
#54198 merged
Jun 30, 2025 - [core] Remove ray.wait for test_locality_aware_leasing_borrowed_objects
#54211 merged
Jun 30, 2025 - [core] Remove test_scheduling_performance.py
#54210 merged
Jun 30, 2025 - [data] Add timeout for test_arrow_block_scaling.py
#54155 merged
Jun 30, 2025 - [core] Fix race condition b/w object eviction & repinning for recovery.
#53934 merged
Jun 30, 2025 - [core] Manually run io service in actor manager tests to simulate different orderings
#54203 merged
Jun 30, 2025 - [core] Atomic num_tasks_submitted
#54200 merged
Jun 30, 2025 - [Doc][KubeRay] Kuberay gcs ft takes yaml file with version 1.4.0
#54192 merged
Jun 30, 2025 - [core] Task receiver cleanup
#54205 merged
Jun 30, 2025 - BLD: Automatically patch .bazelrc file for Windows 11 build
#53586 merged
Jun 30, 2025 - [Docs][KubeRay] Update all KubeRay version references for KubeRay 1.4.0 release
#53884 merged
Jun 30, 2025 - [serve] Move logic into user callable wrapper
#54177 merged
Jun 29, 2025 - [Doc] Convert configuring-autoscaling.ipynb back to markdown docs
#54111 merged
Jun 28, 2025 - [Docs][KubeRay] Convert rayservice-quick-start.ipynb back to markdown docs
#54138 merged
Jun 28, 2025 - [Doc][KubeRay] Convert raycluster-quick-start.ipynb back to markdown docs
#54125 merged
Jun 28, 2025 - [Doc][KubeRay] Add doc for running KubeRay dashboard
#53830 merged
Jun 28, 2025 - [ci] remove ci/keep_alive
#54079 merged
Jun 28, 2025 - [data] gather dask tests into single test files
#54163 merged
Jun 28, 2025 - [Data] Add TooManyRequests catch to BQ writer
#54000 merged
Jun 28, 2025 - [Data] Fix test_binary setup fixture that doesn't close file handles
#54028 merged
Jun 28, 2025 - [serve] Increase default uvicorn keep alive timeout
#54127 merged
Jun 27, 2025 - [doc] fix broken links in the vllm guide
#54161 merged
Jun 27, 2025 - [Docs][KubeRay] Delete KubeRay doctests
#54080 merged
Jun 27, 2025 - [Feat][Core] Implement Event Aggregator Agent
#53182 merged
Jun 27, 2025 - Feat/middleware callback support
#54106 merged
Jun 27, 2025 - [data] Handle HuggingFace parquet dataset resolve URLs
#54146 merged
Jun 27, 2025 - [data] Use write_dataset for partitioning & writing to file instead of custom implementation
#54052 merged
Jun 27, 2025 - Correct asyncio ref documentation for Python 3.11+
#54157 merged
Jun 27, 2025 - [core][test] fix flaky data races in NodeManagerTest
#54129 merged
Jun 27, 2025 - [Doc][KubeRay] verl example
#54114 merged
Jun 27, 2025 - [RLlib] Fix shapes in explained_variance for recurrent policies.
#54005 merged
Jun 27, 2025 - [ci] add cibase tags for ci base envs
#53755 merged
Jun 27, 2025 - Remove botocore dependency in Ray Serve LLM
#54156 merged
Jun 27, 2025 - (serve.llm) Remove test leakage from placement bundle logic
#53723 merged
Jun 27, 2025 - [data] split dask and modin tests
#54122 merged
Jun 26, 2025 - [Data] Fixing PyArrow overflow handling
#53971 merged
Jun 26, 2025 - [serve] split call_user_method
#54104 merged
Jun 26, 2025 - [Data] Handle Huggingface Integration CI test failures
#54128 merged
Jun 26, 2025 - [Data] Fix ActorPool autoscaler to properly scale up
#53983 merged
Jun 26, 2025 - use gtm datalayer directly, fix format
#54144 merged
Jun 26, 2025 - [package] remove __api__ in setup.py
#54143 merged
Jun 26, 2025 - [Minor][Fix][Core/Test] Fix test_actor_restart_on_node_failure wrong test logic without waiting
#54088 merged
Jun 26, 2025 - [data] fix repartitioning empty datasets
#54107 merged
Jun 26, 2025 - [Doc][KubeRay] revert kuberay-gcs-ft.ipynb to markdown
#54084 merged
Jun 26, 2025 - Fix sort_benchmark release test arg
#54145 merged
Jun 26, 2025 - [Doc][KubeRay] Convert rayjob-quick-start.ipynb back to markdown docs
#54093 merged
Jun 26, 2025 - [core] split dask and modin tests
#54121 merged
Jun 26, 2025 - [Core] Remove Unnecessary Checks in GRPC Server Shutdown Process
#53910 merged
Jun 26, 2025 - [core] Delete unused env vars
#54095 merged
Jun 26, 2025 - [Doc][KubeRay] Remove rayserve-dev-doc.md
#54057 merged
Jun 26, 2025 - [core] Bump timeout in test_ray_init
#54136 merged
Jun 26, 2025 - [core] Clean up unused FFs
#54139 merged
Jun 26, 2025 - [core] Fix GCS crash on duplicate MarkJobFinished RPCs due to network failures
#53951 merged
Jun 26, 2025 - [train] Remove usage of ray._private.state
#54142 merged
Jun 26, 2025 - [core] Deflake test_scheduling.py in client mode
#54137 merged
Jun 26, 2025 - [core] Fix test_basic_3.py in client mode
#54135 merged
Jun 26, 2025 - [serve] refactor _run_user_code
#54103 merged
Jun 26, 2025 - [Doc] vale ignores anchors of headers
#53580 merged
Jun 26, 2025 - set config for ua tag
#54112 merged
Jun 26, 2025 - [ci][docs] Add test tag rule for Vale files
#54118 merged
Jun 26, 2025 - [train] update beginner pytorch example
#54124 merged
Jun 26, 2025 - [Data] Bumped latest PA version to 20.0
#54123 merged
Jun 26, 2025 - [ci] fix missing dask tag in all tags list
#54113 merged
Jun 26, 2025 - [core][test] fix data races in NodeManagerTest
#54097 merged
Jun 25, 2025 - [core] Remove experimental "array" library
#54105 merged
Jun 25, 2025 - [core] Clean up test_locality_aware_leasing_borrowed_objects
#54086 merged
Jun 25, 2025 - [core][refactor] replace unnecessary shared_ptrs with unique_ptrs and references in raylet
#54062 merged
Jun 25, 2025 - [ci] fix mac ci by pinning cython version
#54061 merged
Jun 25, 2025 - [core] Deflake test_basic_3.py
#54083 merged
Jun 25, 2025 - remove final references to plasma_event_handler
#54085 merged
Jun 25, 2025 - [core] Deflake test_ray_init
#54094 merged
Jun 25, 2025 - [core] Deflake test_actor_restart
#54087 merged
Jun 25, 2025 - Updated stalebot to run every 12 hours.
#54041 merged
Jun 25, 2025 - [serve] Prefer localhost instead of host ip for microbenchmarks
#54092 merged
Jun 25, 2025 - [train] Driver SIGINT calls controller abort
#53600 merged
Jun 25, 2025 - [data] Split out long running scaling test
#54045 merged
Jun 25, 2025 - [core] Deflake test_actor_unavailable_conn_broken
#54090 merged
Jun 25, 2025 - [V2][Autoscaler] Fix numOfHosts > 1 slice termination logic
#54063 merged
Jun 25, 2025 - [V2][Autoscaler] Add cloud_instance_id to all V2 Austoscaler termination requests
#53938 merged
Jun 25, 2025 - Fix autoscaler recovery docker config to use node-specific settings
#53992 merged
Jun 25, 2025 - [data/preprocessors] Improve execution perf for One Hot encoding
#54022 merged
Jun 25, 2025 - [Docs][KubeRay] Update changes from KubeRay 1.3.2 to 1.4.0
#53886 merged
Jun 25, 2025 - [core] Fix comment
#53853 merged
Jun 25, 2025 - [ci] add -sSL for curl on node install
#54060 merged
Jun 25, 2025 - updating compile comment
#54058 merged
Jun 25, 2025 - Revert "remove extraneous index.rst file for e2e examples (part 2)"
#54051 merged
Jun 25, 2025 - [data] fix lint error in conftest.py
#54053 merged
Jun 25, 2025 - [serve] Use get_application_url in test_metrics
#54050 merged
Jun 24, 2025 - [ci] update anyscale layer
#54043 merged
Jun 24, 2025 - [serve.llm] Prefix aware router eviction thread improvements
#53957 merged
Jun 24, 2025 - [serve] Remove hardcoded urls from serve microbenchmarks
#54026 merged
Jun 24, 2025 - [core] fix detached actor being unexpectedly killed
#53562 merged
Jun 24, 2025 - [POC] fix test_metrics
#54037 merged
Jun 24, 2025 - [serve] Handle request with Semaphore
#54019 merged
Jun 24, 2025 - remove extraneous index.rst file for e2e examples (part 2)
#54023 merged
Jun 24, 2025 - [☀️] Fix repr for ray.ObjectRef, ray.ObjectRefGenerator types
#54011 merged
Jun 24, 2025 - [core][ci] Disable test db for container tests
#54031 merged
Jun 24, 2025 - [docker] Update latest Docker dependencies for 2.47.1 release
#54016 merged
Jun 23, 2025 - [core] improve assertion check in test_task_metrics
#53958 merged
Jun 23, 2025 - remove extraneous index.rst file for e2e-multimodal-ai-workloads
#54017 merged
Jun 23, 2025 - [Serve.llm] Remove ImageRetriever class and related tests from the LLM deployment module.
#53980 merged
Jun 23, 2025 - fix test_request_timeout timeout mismatch issue
#54010 merged
Jun 23, 2025 - fix gsat global
#54012 merged
Jun 23, 2025 - [train] Fix release test missing data key
#53963 merged
Jun 23, 2025 - [data] remove schema from release tests
#53956 merged
Jun 23, 2025 - [kuberay] log actionable err msg when required TPU node selectors missing
#53914 merged
Jun 23, 2025 - [core] Fix flaky test_state_api
#53975 merged
Jun 23, 2025 - [data] remove operator_fusion_benchmark
#53962 merged
Jun 23, 2025 - [Data] Add reading from Delta Lake tables and from Unity Catalog
#53701 merged
Jun 23, 2025 - test: refactor test_observability_helpers
#53875 merged
Jun 23, 2025 - [core] Remove actor task path in normal task submitter
#53996 merged
Jun 23, 2025 - [core] Rename GcsFunctionManager and use fake in test
#53973 merged
Jun 23, 2025 - [Serve.llm][P/D] Fix health check in prefill disagg
#53937 merged
Jun 22, 2025 - [Test][KubeRay] Update KubeRay version to v1.4.0 for autoscaler tests
#53974 merged
Jun 22, 2025 - [core] Fix ActorClass.remote return typing and expose Actor class methods to static analysis
#53986 merged
Jun 21, 2025 - [core] Use core worker client pool in GCS
#53654 merged
Jun 21, 2025 - [core] Revert container tests to medium size instance
#53966 merged
Jun 21, 2025 - Fix ray import error when both ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES are set
#53757 merged
Jun 20, 2025 - [core] Making NodeManager use ILocalTaskManager instead of TaskManager.
#53961 merged
Jun 20, 2025 - defer loading csat so gtag loads first
#53968 merged
Jun 20, 2025 - fix ga4 events
#53967 merged
Jun 20, 2025 - [train][template] Remove clock emoji which does not always render well
#53965 merged
Jun 20, 2025 - [core][gpu-objects] Support ray.get on the driver process for GPU objects
#53902 merged
Jun 19, 2025 - [kuberay] Update helm install command in prometheus doc to set serviceMonitor release=prometheus
#53952 merged
Jun 19, 2025 - [Docs] Fix async code in serving notebook
#53864 merged
Jun 19, 2025 - [core][rocm] Allow CUDA_VISIBLE_DEVICS and HIP_VISIBLE_DEVICES
#53531 merged
Jun 19, 2025 - [train][template] Pip install with python block instead
#53928 merged
Jun 19, 2025 - [Data] Refactor Planner to avoid storing plan-specific state
#53955 merged
Jun 19, 2025 - [core] Avoid unnecessary deserialization/serialization of CallerWorkerId
#53939 merged
Jun 19, 2025 - [serve] add ability to track child requests
#53941 merged
Jun 19, 2025 - [Doc][KubeRay] Add a doc for scheduler plugins
#53846 merged
Jun 19, 2025 - [core][telemetry/08] record counter metric e2e
#53449 merged
Jun 19, 2025 - [HashShuffle] - Add warnings for when there are insufficient resources for Aggregators
#53705 merged
Jun 19, 2025 - [Data] Join release tests
#53903 merged
Jun 19, 2025 - [docs][Serve] Add clarification for health check and FT of serve deployments
#53944 merged
Jun 19, 2025 - fastapi and streaming tests use get applications api
#53949 merged
Jun 19, 2025 - [RLlib; docs] Fix docstring example for custom MultiRLModule with shared encoder.
#53912 merged
Jun 19, 2025 - [Data] Prevent filename collisions on write
#53890 merged
Jun 19, 2025 - [data] fix flakey schema
#53901 merged
Jun 19, 2025 - [Data] Fixed BlockMetadata derivation for Read operator
#53908 merged
Jun 19, 2025 - [core] Fix flaky test_worker_exit_intended_user_exit
#53909 merged
Jun 19, 2025 - fix the bash code run error in notebook
#53900 merged
Jun 19, 2025 - [Docs] Fix issues with e2e audio tutorial
#53932 merged
Jun 19, 2025 - [train] Cleanups for training ingest benchmark
#53684 merged
Jun 19, 2025 - [train] add proper filtering to metrics
#53788 merged
Jun 18, 2025 - [cgraph] Avoid depending on torch CPU module for CPU-only actor
#53849 merged
Jun 18, 2025 - [train] expose training input/output in callbacks
#53869 merged
Jun 18, 2025 - Skip test_metrics_agent_with_open_telemetry on mac
#53917 merged
Jun 18, 2025 - [Docs] Add ServiceMonitor section and make some step optional in Grafana & Promethus page
#53474 merged
Jun 18, 2025 - [Docs][KubeRay] Update KubeRay operator installation references for all docs
#53885 merged
Jun 18, 2025 - [Core] Support AMD GPU MI3xx product line
#51802 merged
Jun 18, 2025 - [Doc][KubeRay] Update KubeRay operator installation reference
#53842 merged
Jun 18, 2025 - [Docs][KubeRay] Fix RayJob quickstart doc step 9 error
#53887 merged
Jun 18, 2025 - [Core] Use fd instead of handle for windows log redirection
#53852 merged
Jun 18, 2025 - Add dashboard visualizations for TPU metrics
#53898 merged
Jun 18, 2025 - [ObjectStore] Warn if object store is allocated < 50% of total memory for data workloads
#53857 merged
Jun 18, 2025 - [Data] Deprecate use_polars flag
#53867 merged
Jun 17, 2025 - [data] split test_all_to_all.py
#53865 merged
Jun 17, 2025 - add missing configs for object detection template
#53895 merged
Jun 17, 2025 - [core] Remove hardcoded flaky tests
#53888 merged
Jun 17, 2025 - [Serve][LLM] Simplify _prepare_engine_config()
#53704 merged
Jun 17, 2025 - [core][gpu-objects] Fix test_gpu_objects_nccl.py
#53874 merged
Jun 17, 2025 - [RLlib] MetricsLogger: Fix get/set_state to handle tensors in self.values.
#53514 merged
Jun 17, 2025 - [Data] Improve handling of mismatched columns
#53861 merged
Jun 17, 2025 - Fix pickle error with remote code models in vLLM Ray worker process
#53815 merged
Jun 17, 2025 - [train][template] Remove ineffective post build script and pip install instead
#53822 merged
Jun 17, 2025 - [core][gpu objects] Integrate single-controller collective APIs with GPU objects
#53720 merged
Jun 16, 2025 - [Data] Improve handling of pandas.NA
#53859 merged
Jun 16, 2025 - [devx] Fix 'uv run' command line parsing
#53838 merged
Jun 16, 2025 - [Data] Improve read_text trailing newline semantics
#53860 merged
Jun 16, 2025 - [Serve.llm][P/D] Support separate deployment config for PDProxy in Prefill disagg
#53821 merged
Jun 16, 2025 - [Doc][KubeRay] Remove vllm-rayservice.md and use Ray Serve LLM instead
#53844 merged
Jun 16, 2025 - add api to get application url
#53796 merged
Jun 16, 2025 - [Doc][KubeRay] Remove very old ResNet benchmark example
#53839 merged
Jun 16, 2025 - [release] Fix release tests
#53855 merged
Jun 16, 2025 - [Serve.llm] Disable TP=2 VLM batch test
#53825 merged
Jun 16, 2025 - [Doc][Fix] reveal the falsely hidden export command in the KubeRay GCS FT guide
#53832 merged
Jun 16, 2025 - [core][gpu-objects] Support intra-process communication
#53798 merged
Jun 16, 2025 - [Doc][KubeRay] Remove very old XGBoostTrainer example
#53837 merged
Jun 16, 2025 - [core] Release resources only after tasks have stopped executing
#53660 merged
Jun 16, 2025 - [core] Deflake test_multiprocessing.py
#53802 merged
Jun 16, 2025 - [core] Fix test_object_spilling.py on Windows
#53851 merged
Jun 16, 2025 - [KubeRay] Remove unused YAMLs
#53840 merged
Jun 16, 2025 - [chore] Change file mode of rayservice-no-ray-serve-replica.md from 755 to 644
#53843 merged
Jun 16, 2025 - fix AggregateFnV2 doc to state finalize instead of _finalize
#53835 merged
Jun 16, 2025 - [core] Fix GCS subscribers map race condition
#53781 merged
Jun 16, 2025 - [core] deleting unused code from plasma client
#53814 merged
Jun 16, 2025 - [core] Fix race condition in raylet graceful shutdown
#53762 merged
Jun 16, 2025 - [serve] Revert request timeout from serve instance fixtures
#53809 merged
Jun 16, 2025 - [Doc] Remove "Deploying a static Ray cluster without KubeRay"
#53833 merged
Jun 15, 2025 - [Doc] Small mistake in kuberay ingress
#53834 merged
Jun 15, 2025 - [ci] bazelize get_contributors script
#53743 merged
Jun 14, 2025 - [ci] First release test on GKE
#53390 merged
Jun 14, 2025 - Replace python setup.py bdist_wheel with pip wheel
#53458 merged
Jun 14, 2025 - [serve] Set route_prefix and docs_path when re-deploying app
#53753 merged
Jun 14, 2025 - Add tpu usage metrics to reporter_agent
#53678 merged
Jun 14, 2025 - [data] Refactor interface for actor_pool_map_operator
#53752 merged
Jun 13, 2025 - ray-llm container cu124 -> cu128 update
#53730 merged
Jun 13, 2025 - [dashboard] Fix retrieving IP address from the GPUProfilingManager on the dashboard agent
#53807 merged
Jun 13, 2025 - [ci/release] Trigger Ray release by running a Bazel binary
#52962 merged
Jun 13, 2025 - version change for 2.47.1
#53813 merged
Jun 13, 2025 - cherrypick #53671
#53812 merged
Jun 13, 2025 - [core] Move dependencies of NodeManger to main.cc for better testability
#53782 merged
Jun 13, 2025 - [core] Deflake `test_object_spilling.py`
#53803 merged
Jun 13, 2025 - [core] Deflake `test_state_api.py`
#53804 merged
Jun 13, 2025 - [tune] update BlockMetadata args in tests
#53791 merged
Jun 13, 2025 - [serve] Fix autoscaling metrics
#53778 merged
Jun 13, 2025 - pass route prefix to replica
#53777 merged
Jun 13, 2025 - [Serve] Call shared long poll client router registration in event loop
#53613 merged
Jun 13, 2025 - [core] Add timeout to `ray.get` call in `test_update_object_location_batch_failure`
#53805 merged
Jun 13, 2025 - [RLlib] Fix device check in `Learner`.
#53706 merged
Jun 13, 2025 - [core] Deflake `test_client_builder.py`
#53774 merged
Jun 13, 2025 - [core] Increase instance sizes for wheel / HA tests
#53783 merged
Jun 13, 2025
139 Pull requests opened by 80 people
- Bump tqdm from 4.64.1 to 4.66.3 in /python
#53820 opened
Jun 13, 2025 - [core] Ungracefully exit if the agent dies unexpectedly
#53847 opened
Jun 16, 2025 - [RLlib] Mixin Layer Design Sketch Up
#53850 opened
Jun 16, 2025 - [core] adding additional stats to the dump object store usage api.
#53856 opened
Jun 16, 2025 - [core] Cleanup naming in core worker scheduling queues
#53858 opened
Jun 16, 2025 - [core] Sleep to debug container test
#53862 opened
Jun 16, 2025 - Feat/ray serve middleware support
#53868 opened
Jun 17, 2025 - [dashboard] Support to overwrite the _client_max_size of http request entity
#53880 opened
Jun 17, 2025 - [ci] add python 3.13 ray docker image build
#53894 opened
Jun 17, 2025 - Bump gradio from 3.50.2 to 5.31.0 in /python/requirements
#53899 opened
Jun 17, 2025 - python depsets tool
#53904 opened
Jun 18, 2025 - [core] Move inner_publisher logic into gcsPublisher
#53905 opened
Jun 18, 2025 - [RLlib] Add missing colon to CUBLAS_WORKSPACE_CONFIG
#53913 opened
Jun 18, 2025 - [RLlib] Add missing documentation for SACConfig's training()
#53918 opened
Jun 18, 2025 - Update deletion policy for rayjob quick start
#53929 opened
Jun 18, 2025 - [serve] move test from test_grpc to test_proxy
#53933 opened
Jun 18, 2025 - Bump urllib3 from 1.26.19 to 2.5.0 in /python
#53936 opened
Jun 18, 2025 - [Data] Replaced `get_object_locations` with `get_local_object_locations`
#53942 opened
Jun 19, 2025 - finishing commit for issue #52113
#53964 opened
Jun 19, 2025 - tune: make Tune status/progress tables readable in dark mode
#53969 opened
Jun 20, 2025 - docs(data): fix broken Parameters table
#53972 opened
Jun 20, 2025 - [RLlib] Enhance SAC (new API stack) with discrete action support.
#53982 opened
Jun 20, 2025 - [Core] Add AcceleratorManager implementation for Rebellions NPU
#53985 opened
Jun 21, 2025 - [Doc] Update Istio service mesh graph
#53988 opened
Jun 21, 2025 - [Serve] Make replica scheduler backoff configurable #52871
#53991 opened
Jun 21, 2025 - Fixes default_dqn_torch_rl_module assuming the device is 'cpu'
#54004 opened
Jun 23, 2025 - Added openssl support for PPC64LE.
#54006 opened
Jun 23, 2025 - [dashboard] Clean up naming for GPU profiling module
#54009 opened
Jun 23, 2025 - [DONOTMERGE] Proof-of-concept for GPU objects + NIXL
#54024 opened
Jun 24, 2025 - Bump mlflow from 2.19.0 to 3.1.0 in /doc/source/ray-overview/examples/e2e-xgboost
#54027 opened
Jun 24, 2025 - Multimodal ai
#54029 opened
Jun 24, 2025 - [core][autoscaler][v1] add heartbeat timeout logic to determine node activity status
#54030 opened
Jun 24, 2025 - Bump mlflow from 2.22.0 to 3.1.0 in /python
#54032 opened
Jun 24, 2025 - gen test
#54046 opened
Jun 24, 2025 - update all 'Run on Anyscale' buttons to redirect to respective template preview pages
#54049 opened
Jun 24, 2025 - Add Azure Files support to persistent storage documentation
#54055 opened
Jun 24, 2025 - Adapt to vLLM reducing exports from the top level
#54099 opened
Jun 25, 2025 - [data] Remove asserts that test internal `ds._block_num_rows()`
#54109 opened
Jun 25, 2025 - vLLM ZMQ KVEvent Router
#54115 opened
Jun 25, 2025 - [core][cgraph] Export classes related to NCCL communicator
#54117 opened
Jun 26, 2025 - [core] fix checking for uv existence during ray_runtime setup
#54141 opened
Jun 26, 2025 - [RLlib] Fix checkpoints not having correct metrics
#54148 opened
Jun 26, 2025 - [core] Deflake `test_spread_scheduling_overrides_locality_aware_scheduling`
#54154 opened
Jun 26, 2025 - [Data] Fix examples in some Data user guides
#54158 opened
Jun 27, 2025 - [test] fix test not ending cluster; spelling mistake: tearDow -> tearDown
#54171 opened
Jun 27, 2025 - [Feat][Core] Don't count actor restarts due to node preemption towards max_restarts
#54175 opened
Jun 27, 2025 - [Core] Use Factory method to create gcs KV Manager
#54178 opened
Jun 27, 2025 - [core][raycheck/01] Fix "it != submissible_tasks_.end()"
#54179 opened
Jun 27, 2025 - [Feat][Core] Don't count task retries due to node preemption
#54182 opened
Jun 27, 2025 - [RLlib] - Increased default timesteps on two experiments.
#54185 opened
Jun 27, 2025 - Token-split prefix router
#54187 opened
Jun 27, 2025 - [Serve.llm][Prototype][WIP] Simplify LLMServer and inherit OpenAIServingChat behavior
#54189 opened
Jun 28, 2025 - [data] allow custom batcher for dataset iteration
#54193 opened
Jun 28, 2025 - [data.llm] Add release test to capture memory leak
#54194 opened
Jun 28, 2025 - [core] Normal task submitter cleanup
#54206 opened
Jun 30, 2025 - Feat/fix g rpc error code
#54218 opened
Jun 30, 2025 - [core] [wip] lazy sub
#54220 opened
Jun 30, 2025 - another round of mac debug
#54232 opened
Jul 1, 2025 - [core] Introduce `ShutdownCoordinator` and unified core worker shutdown entry points
#54244 opened
Jul 1, 2025 - [serve] refactor call_http_entrypoint
#54253 opened
Jul 1, 2025 - [DOC-127] MVP for OSS Ray labels
#54254 opened
Jul 1, 2025 - [core] Refactoring LocalObjectManager to have a cleaner API for pinning objects.
#54255 opened
Jul 1, 2025 - [core][gpu-objects] Move data transfers to a background thread
#54256 opened
Jul 1, 2025 - [train] Use FailurePolicy to handle resize failure
#54257 opened
Jul 1, 2025 - [core] Always access task finisher from submitters without mutex
#54262 opened
Jul 2, 2025 - [Core] Fixed the bug where the child process turned into a zombie process.
#54266 opened
Jul 2, 2025 - [Core] Fixed the bug where the head was unable to submit tasks after redis is turned on.
#54267 opened
Jul 2, 2025 - fix missing brace and add pytest
#54270 opened
Jul 2, 2025 - [RLlib] Switch Offline Data iteration to `iter_torch_batches`.
#54277 opened
Jul 2, 2025 - Run Docker builds as non-root user with scoped root access via `sudo`
#54285 opened
Jul 2, 2025 - MCP Ray Serve End to End Example
#54289 opened
Jul 2, 2025 - [data] Update Hudi integration to support incremental query
#54301 opened
Jul 3, 2025 - Enable field documentation with Pydantic
#54306 opened
Jul 3, 2025 - [RLlib; docs] Docs do-over (new API stack): `ConnectorV2` documentation (part II).
#54313 opened
Jul 3, 2025 - [RLlib] Fix IndexError on peek on empty Stats object without reduction
#54325 opened
Jul 3, 2025 - Updating Daft links in Ray documentation
#54328 opened
Jul 3, 2025 - [WIP][Core] Deflake test_gcs_fault_tolerance
#54334 opened
Jul 3, 2025 - [train] update `datasets` from `2.19.1` to `3.6.0`
#54338 opened
Jul 3, 2025 - [Tests] Improve error message on some test-failures because of `IndexError` in `test_utils.py`
#54343 opened
Jul 4, 2025 - [core] Correct bytes in flight when objects <5mb
#54349 opened
Jul 5, 2025 - [Core] use RunFnPeriodically for metrics report in GCS server
#54358 opened
Jul 6, 2025 - [data.llm] Make ray the distributed backend for vLLM stage
#54361 opened
Jul 7, 2025 - [java] encapsulation + resource immutability for option classes
#54370 opened
Jul 7, 2025 - [core] prevent sending SIGTERM after calling Worker::MarkDead
#54377 opened
Jul 7, 2025 - [ci] raydepsets: adding compile operation
#54389 opened
Jul 7, 2025 - [ci] raydepsets: adding config dataclass and config loading
#54394 opened
Jul 8, 2025 - [train] fail fast if pg cannot be met
#54402 opened
Jul 8, 2025 - Fix minor bug in sample code in documentation
#54403 opened
Jul 8, 2025 - [release][ci] First test for kuberay release test trigger path
#54415 opened
Jul 8, 2025 - [llm.data] Fix AttributeError for the shallow copy of data batch transfer
#54419 opened
Jul 8, 2025 - [Core] Avoid copy deque in cluster task manager
#54432 opened
Jul 8, 2025 - [WIP][Data] Fix Operators to make sure they doesn't produce empty blocks
#54435 opened
Jul 8, 2025 - Remove CPU profiler from Ray Service Replica, Proxy & Controller
#54438 opened
Jul 8, 2025 - [data] Allocate GPU resources in ResourceManager
#54445 opened
Jul 8, 2025 - Add ray.dataset.write_delta for supporting writes to Delta Lake
#54447 opened
Jul 8, 2025 - Give the option to make `target_max_block_size` nullable
#54450 opened
Jul 8, 2025 - [Data] Limit operator push down
#54457 opened
Jul 9, 2025 - WIP: Support nixl
#54459 opened
Jul 9, 2025 - [Core] Remove ineffectual TODO comment
#54464 opened
Jul 9, 2025 - Use upstream RayPrometheusStatLogger
#54471 opened
Jul 9, 2025 - [Core] Minor fixes in GCS health check manager
#54473 opened
Jul 9, 2025 - Train Benchmark: Add preserver_order
#54474 opened
Jul 9, 2025 - Add optional APIType filter to /api/serve/applications/ endpoint
#54478 opened
Jul 9, 2025 - Remove debug logs for when aggregators are ready
#54483 opened
Jul 9, 2025 - [Data] Fixed chained inplace assignment to prevent FutureWarning from Pandas
#54486 opened
Jul 9, 2025 - [data] Return schema of res in aggregations
#54489 opened
Jul 9, 2025 - test basic functionality
#54491 opened
Jul 10, 2025 - [train] add LightGBMTrainer user guide
#54492 opened
Jul 10, 2025 - [Core] Fix the issue where multiple multithreaded calls to ray.get may cause hanging.
#54495 opened
Jul 10, 2025 - [serve.llm] Pass dimensions of embedding request to vllm engine
#54499 opened
Jul 10, 2025 - [Data] Add option for enabling out-of-order execution to optimize data processing performance
#54504 opened
Jul 10, 2025 - [WIP] `TaskExecutionResult`
#54505 opened
Jul 10, 2025 - [serve.llm] Remove upstreamed workarounds
#54512 opened
Jul 10, 2025 - Fix bug in http_serve_head by using os.path.realpath instead of inval…
#54523 opened
Jul 11, 2025 - Fix get actor timeout multiplier
#54525 opened
Jul 11, 2025 - Handle missing 'chunks' key when Databricks UC query returns zero rows
#54526 opened
Jul 11, 2025 - [core][raycheck/01] Fix "it != submissible_tasks_.end()"
#54527 opened
Jul 11, 2025 - [Doc] Update deprecated `evaluation_strategy` parameter to `eval_strategy` in transformers examples
#54528 opened
Jul 11, 2025 - [core] attempting streaming generator hanging fix
#54529 opened
Jul 11, 2025 - [Serve] Fix windows test deploy apps flakiness
#54530 opened
Jul 11, 2025 - Fix backpressure gRPC error code
#54537 opened
Jul 11, 2025 - [Data] [Draft] introduce per-op config options to disable operator fusion
#54539 opened
Jul 11, 2025 - [dashboard] fix typos
#54550 opened
Jul 12, 2025 - [train][checkpoint] CheckpointManager and Worker both count checkpoints
#54555 opened
Jul 12, 2025 - debug
#54556 opened
Jul 12, 2025 - [Core] Minor fixes in gcs job manager
#54562 opened
Jul 12, 2025 - [ci] disable test db on release auto nightly run
#54563 opened
Jul 12, 2025 - Revert "[serve] reorganize how we handle the http receive task"
#54565 opened
Jul 12, 2025 - Feat/fix request replica context
#54566 opened
Jul 12, 2025 - [Core] Core Worker GetObjStatus GRPC Fault Tolerance
#54567 opened
Jul 12, 2025 - [ci] use compiled list for install
#54568 opened
Jul 12, 2025 - Feat/remove cpu profiler
#54569 opened
Jul 12, 2025 - [llm.serve] Add support of `list[int]` for `CompletionRequest.prompt`
#54570 opened
Jul 12, 2025
1,410 Issues closed by 70 people
- [Azure] Ray up for Azure fails
#48976 closed
Jul 12, 2025 - CI test linux://python/ray/tests:test_metrics_agent is consistently_failing
#48956 closed
Jul 12, 2025 - CI test linux://python/ray/air:test_integration_wandb is consistently_failing
#54553 closed
Jul 12, 2025 - [Core][autoscaler] autoscaler v2 tries to load default configs that do not exist on the image.
#54532 closed
Jul 12, 2025 - CI test linux://rllib:examples/connectors/flatten_observations_dict_space_impala is flaky
#49754 closed
Jul 12, 2025 - [Ray Data] Filtering function is very slow
#53493 closed
Jul 11, 2025 - [core][gpu-objects] Garbage collection for in-actor GPU objects
#51262 closed
Jul 11, 2025 - [core][gpu-objects] Actor sends the same ObjectRef twice to another actor
#51273 closed
Jul 11, 2025 - [Core] Ray Data job hanging with flooded Cancelling stale RPC with seqno 125 < 127 error
#50814 closed
Jul 11, 2025 - [core][autoscaler] Enable autoscaler v2 by default when running on KubeRay
#54226 closed
Jul 11, 2025 - [Serve] refactor serve code that sets `docs_path`
#53023 closed
Jul 11, 2025 - ray azure does not work out of the box
#52511 closed
Jul 11, 2025 - [<Ray component: Core|RLlib|etc...>] Anaconda free python in Docker images
#51991 closed
Jul 11, 2025 - CI test windows://python/ray/serve/tests:test_standalone is flaky
#48420 closed
Jul 11, 2025 - CI test windows://python/ray/serve/tests:test_logging is consistently_failing
#46043 closed
Jul 11, 2025 - [Serve] FastAPI ingress does not work with composable routers
#50373 closed
Jul 11, 2025 - [Serve] ingress decorator does not work with fastapi.APIRouter arg
#50372 closed
Jul 11, 2025 - CI test windows://python/ray/tests:test_node_labels is consistently_failing
#52307 closed
Jul 10, 2025 - [autoscaler][v2] Autoscaler stops working after the head node recovers with enabled FT
#54353 closed
Jul 10, 2025 - CI test linux://rllib:examples/metrics/custom_metrics_in_algorithm_training_step is flaky
#51870 closed
Jul 10, 2025 - [Serve] DeepSeek-R1 mode load stuck in H20
#50975 closed
Jul 10, 2025 - [data][bug] Dataset execution can be implicitly triggered when passing a dataset to an Actor.
#52549 closed
Jul 10, 2025 - [Data] Refactor `ParquetDatasink._write_partition_files` to use `pyarrow.parquet.write_to_dataset`
#50502 closed
Jul 10, 2025 - [Core] ray distributed debugger, always connecting to cluster..
#50682 closed
Jul 10, 2025 - [Data] __repr__ shouldn't trigger execution
#50361 closed
Jul 10, 2025 - [data] RefBundle doesn't always eagerly free data
#37910 closed
Jul 10, 2025 - [data -- read_iceberg] pickling error on UDF for dataset.groupby.map_batches
#54280 closed
Jul 10, 2025 - [Data] `test_hudi` flakes in CI 25% of the time
#50463 closed
Jul 10, 2025 - CI test windows://python/ray/serve/tests:test_deploy is consistently_failing
#46033 closed
Jul 10, 2025 - CI test linux://python/ray/serve/tests:test_standalone_2_with_compact_scheduling is flaky
#48338 closed
Jul 10, 2025 - CI test linux://python/ray/serve/tests:test_standalone_2 is flaky
#48403 closed
Jul 10, 2025 - [Serve] Specify different images for each deployment
#52994 closed
Jul 10, 2025 - [Serve] Optimize the _get_live_deployments function
#45793 closed
Jul 10, 2025 - CI test windows://python/ray/serve/tests:test_grpc is flaky
#46028 closed
Jul 10, 2025 - [Ray serve] Unable to serve meta-llama/Llama-3.1-8B-Instruct
#53663 closed
Jul 10, 2025 - [Serve] Unable to load meta-llama/Llama-3.3-70B-Instruct
#53571 closed
Jul 10, 2025 - [Dashboard] Refactor job / node / actor updating code
#16243 closed
Jul 10, 2025 - [Dashboard][event] Event API in Python.
#16250 closed
Jul 10, 2025 - [Dashboard][event] Event API in Java.
#16251 closed
Jul 10, 2025 - Ray Data: AssertionError when using repartition
#54434 closed
Jul 10, 2025 - [Train V2 + Tune] Exception in Tune while processing results from failed runs using TorchTrainer
#54379 closed
Jul 9, 2025 - [data][bug] repartition(target_num_rows_per_block) should not be fused with downstream op
#54448 closed
Jul 9, 2025 - [Serve] Multiple FastAPI ingress deployments in a single application are not disallowed
#53024 closed
Jul 9, 2025 - Release test gcp_cluster_launcher_full failed
#54460 closed
Jul 9, 2025 - Release test aws_cluster_launcher_full failed
#54452 closed
Jul 9, 2025 - [llm.serve] vllm.PoolingOutput no more has attr embedding
#54466 closed
Jul 9, 2025 - [Core] thread creation error, even with environment variables all set to 1
#54225 closed
Jul 9, 2025 - [Core] Too many threads in single pod on large-CPU-cores machine
#54422 closed
Jul 9, 2025 - Release test air_example_vicuna_13b_lightning_deepspeed_finetuning failed
#54308 closed
Jul 9, 2025 - Release test air_example_gptj_deepspeed_fine_tuning failed
#54133 closed
Jul 9, 2025 - Release test air_example_dolly_v2_lightning_fsdp_finetuning failed
#54307 closed
Jul 9, 2025 - CI test windows://python/ray/serve/tests:test_proxy is consistently_failing
#51464 closed
Jul 8, 2025 - Release test sort_fixed_size failed
#52650 closed
Jul 8, 2025 - Release test sort_autoscaling failed
#53546 closed
Jul 8, 2025 - [Autoscaler][V2] Autoscaler V2 does not honor 'Conservative' upscaling mode
#50259 closed
Jul 8, 2025 - CI test windows://python/ray/tests:test_object_assign_owner_client_mode is flaky
#54333 closed
Jul 8, 2025 - [core|serve] Migrate shared utilities from `ray._private` to `ray._common`
#53478 closed
Jul 8, 2025 - Release test dask_on_ray_large_scale_test_spilling failed
#54347 closed
Jul 8, 2025 - CI test linux://rllib:learning_tests_multi_agent_stateless_cartpole_ppo_multi_cpu is flaky
#47313 closed
Jul 7, 2025 - CI test linux://python/ray/data:test_json is flaky
#48150 closed
Jul 7, 2025 - Release test dask_on_ray_1tb_sort failed
#54356 closed
Jul 7, 2025 - Release test chaos_dask_on_ray_large_scale_test_spilling.aws failed
#54346 closed
Jul 7, 2025 - Release test chaos_dask_on_ray_large_scale_test_no_spilling.aws failed
#54345 closed
Jul 7, 2025 - CI test windows://python/ray/tests:test_object_store_metrics is flaky
#49514 closed
Jul 7, 2025 - [core][gpu-objects] Ability to register custom types for GPU data
#52340 closed
Jul 7, 2025 - CI test linux://python/ray/data:test_block_sizing is flaky
#54164 closed
Jul 5, 2025 - CI test linux://rllib:learning_tests_multi_agent_cartpole_dqn_multi_gpu is consistently_failing
#47234 closed
Jul 4, 2025 - CI test darwin://python/ray/tests:test_gcs_fault_tolerance is flaky
#43777 closed
Jul 4, 2025 - CI test linux://python/ray/tests:test_runtime_env_container is consistently_failing
#54319 closed
Jul 4, 2025 - CI test windows://python/ray/tests:test_implicit_resource is flaky
#43849 closed
Jul 3, 2025 - CI test darwin://python/ray/tests:test_streaming_generator_2 is consistently_failing
#54239 closed
Jul 3, 2025 - CI test darwin://python/ray/tests:test_scheduling_client_mode is consistently_failing
#54202 closed
Jul 3, 2025 - Release test sort.chaos failed
#49765 closed
Jul 2, 2025 - [Data] `dataset.write_iceberg` error
#52967 closed
Jul 2, 2025 - [Train] Support for Callbacks in Ray Train Training Loop
#54268 closed
Jul 2, 2025 - [Core] Add Azure Blob Storage Support for Ray runtime_env
#38316 closed
Jul 2, 2025 - CI test linux://src/ray/gcs/gcs_server/test:gcs_actor_manager_test is flaky
#54221 closed
Jul 2, 2025 - [Docker] [CI] Bump the GPU base image to a newer version
#54102 closed
Jul 2, 2025 - CI test linux://python/ray/tests:test_scheduling_client_mode is consistently_failing
#54160 closed
Jul 2, 2025 - Ray component: Core ray.init() fails on windows since #51731
#54151 closed
Jul 1, 2025 - ERROR services.py:1355 -- Failed to start the dashboard , return code 3221226505
#54165 closed
Jul 1, 2025 - [Core] ASSERTION FAILED: queue.num_items() == 0
#53510 closed
Jul 1, 2025 - [air/output] Jupyter notebook trial result table keeps swapping column order
#35838 closed
Jun 30, 2025 - [RLlib] Make Learner more standalone with regards to LearnerHyperparameters
#35788 closed
Jun 30, 2025 - [AIR] `on_trial_complete` callback hook happens before trial resources are freed
#35721 closed
Jun 30, 2025 - [core] Failed to close sockets in CoreWorker when crash.
#35681 closed
Jun 30, 2025 - Ray Data - Glob/wildcard in file path
#35499 closed
Jun 30, 2025 - [serve] Document how to silence access logs from GradioIngress
#35496 closed
Jun 30, 2025 - [RLlib] Windows CLI, cmd.exe, powershell parsing json arguments JSONDecodeError
#35492 closed
Jun 30, 2025 - [RayClient]large object transfer failure
#35448 closed
Jun 30, 2025 - [train] Simplify `test_transformers_trainer_steps::test_e2e_steps`
#35424 closed
Jun 30, 2025 - [Core] Reducing scheduling fragmentation
#35422 closed
Jun 30, 2025 - [Core, RLlib] Multi GPU RLlib experiment is unable to be scheduled.
#35409 closed
Jun 30, 2025 - [Job] Failed to schedule supervisor actor leads to job failure
#35387 closed
Jun 30, 2025 - [Job] Show submitter of a Job on the dashboard
#35367 closed
Jun 30, 2025 - [Serve] Support sync function for multiplexing
#35356 closed
Jun 30, 2025 - [AIR] [Train] train multiple instances simultaneously on machines with specified tags
#35333 closed
Jun 30, 2025 - <RLlib> What is the cause of the low CPU utilization in rllib PPO?
#35313 closed
Jun 30, 2025 - [Data] Infer the data schema in Ray Datasets
#35230 closed
Jun 30, 2025 - [dashboard] how to adjust ray dashboard refresh rate?
#35156 closed
Jun 30, 2025 - [RLlib] Better error handling when return shape from step() mismatch in utils._flatten_multidiscrete
#35113 closed
Jun 30, 2025 - The ray rsync-up cli reports no issue, but actually file is absent on remote side (Ray AWS cluster)
#35051 closed
Jun 30, 2025 - [Core] - GPU Support - Explanation of Results
#35048 closed
Jun 30, 2025 - [Data] Optimize `read_datasource` setup
#35029 closed
Jun 30, 2025 - [EC2 VM Cluster launcher] Document EC2 ssh key limit and workaround
#35020 closed
Jun 30, 2025 - [VM launcher] Ran `Ray status` after I sshed in to the head node and it printed "No cluster status"
#35017 closed
Jun 30, 2025 - [air/tune][multi-tenancy] Parallel runs can use the same experiment directory
#35006 closed
Jun 30, 2025 - Issue on page /cluster/vms/examples/ml-example.html
#34996 closed
Jun 30, 2025 - [AWS VM Cluster Launcher] AWS Cluster launcher installs nightly Ray by default
#34991 closed
Jun 30, 2025 - [CI] Fix minimal-install python 3.11: build wheel with unsupported tags.
#34980 closed
Jun 30, 2025 - [serve][docs] Add DAG building classes to the API reference
#34953 closed
Jun 30, 2025 - [AIR output] Rich table gets truncated when the terminal height is smaller than it
#34925 closed
Jun 30, 2025 - [AIR output] Format of trial table with Rich enabled.
#34923 closed
Jun 30, 2025 - [AIR output] "iteration" is shown in the output for RL users
#34918 closed
Jun 30, 2025 - [core] ray.kill doesn't guarantee resources are cleaned up
#34917 closed
Jun 30, 2025 - [Data] Add `fn_kwargs` to `BatchMapper`
#34852 closed
Jun 30, 2025 - Resource Allocation: Ray Core, Ray Client
#34816 closed
Jun 30, 2025 - [Jobs] Job agent recovers all running jobs on restart, not just those monitored by that agent
#34794 closed
Jun 30, 2025 - [Doc] Autogenerated "suggest an edit" link doesn't work
#34751 closed
Jun 30, 2025 - [Tune] thread limit resulting in the job failure in multi-tenancy usage
#34745 closed
Jun 30, 2025 - Ray Job
#34710 closed
Jun 30, 2025 - [docs][infra] automate checks for common link errors
#34681 closed
Jun 30, 2025 - [Ray Job] Auto-shutdown of the cluster when job finished
#34672 closed
Jun 30, 2025 - [Core] Ray.wait should return if task throw exception
#34653 closed
Jun 30, 2025 - [Core] ray2.3.1 gcs_server memory keeps increasing until OOM
#34619 closed
Jun 30, 2025 - [Runtime Env/Ray Job] Job submission fails when specifing local zip file as working dir
#34605 closed
Jun 30, 2025 - why ray.data.read_images cat not combine_chunks
#34563 closed
Jun 30, 2025 - [Core] Add support for cancelling descendants of a completed task
#34545 closed
Jun 30, 2025 - [Data] retrieve written paths from `Dataset.write_datasource`
#34444 closed
Jun 30, 2025 - [Docs Infra] [RLLib] Remove "<<<" from code blocks
#34439 closed
Jun 30, 2025 - [Serve] Production Guide: Add instruction for non-K8s on-premise clusters
#34437 closed
Jun 30, 2025 - [Serve] Ray Serve hangs and becomes unresponsive when calling ffmpeg in deployment
#34414 closed
Jun 30, 2025 - [Serve] Deployments page tasks history is full of system tasks. Not very useful
#34386 closed
Jun 30, 2025 - ImportError: cannot import name 'torch' from 'ray.rllib.train'
#34354 closed
Jun 30, 2025 - [core][state] Include job info for placement group
#34333 closed
Jun 30, 2025 - [Jobs] Use new API `is_head_node` to find head node
#34317 closed
Jun 30, 2025 - [Core] RFC: simplify CI testing
#34315 closed
Jun 30, 2025 - [air] Error while loading xgboost model in BatchPredictor
#34307 closed
Jun 30, 2025 - [RLlib] Unity 3d env tests are broken
#34290 closed
Jun 30, 2025 - [air/train] the logic to grab free ports for `tf_config` is potentially racy
#34271 closed
Jun 30, 2025 - [Core][Object Store] Push Manager: round for object manager client and FIFO for object
#34270 closed
Jun 30, 2025 - [air] xgboost/lightgbm trainer's validation result differ between online and offline
#34211 closed
Jun 30, 2025 - [tune] support viewing partial experiment result as tuning goes on
#34207 closed
Jun 30, 2025 - Issue on page /rllib/package_ref/algorithm.html
#34157 closed
Jun 30, 2025 - [Prometheus metrics util] Application level custom metrics aren't getting exported consistently
#34145 closed
Jun 30, 2025 - [Core] Actors not cleaning up resources correct because `force_kill=true`.
#34124 closed
Jun 30, 2025 - Ray Tune + ray xgboost running out of disk space
#34118 closed
Jun 30, 2025 - [Core][Tune]Trials hang when using Pytorch
#34028 closed
Jun 30, 2025 - [Data] `map_batches` hard to use and debug
#34007 closed
Jun 30, 2025 - [Core] improve garbage collection after job go out of scope
#34001 closed
Jun 30, 2025 - [Core] Timeout for unschedulable task due to unavailable workers
#33954 closed
Jun 30, 2025 - [Observability] Programmatically fetch prometheus metrics
#33940 closed
Jun 30, 2025 - [Ray AIR] Add more documentation about checkpointing
#33932 closed
Jun 30, 2025 - Ray Workflow
#33844 closed
Jun 30, 2025 - [Train] Intermittent `UnpicklingError` when loading estimator/preprocessor from checkpoint
#33815 closed
Jun 30, 2025 - [AIR output] Warnings for AIR_VERBOSITY is confusing
#33810 closed
Jun 30, 2025 - [air output] Aggregation of feedback for air output v2
#33803 closed
Jun 30, 2025 - [Core][Runtime Env] Document how to write custom runtime env plugin
#33746 closed
Jun 30, 2025 - Core: Can the ray core's scheduling mechanism support customized extensions?
#33735 closed
Jun 30, 2025 - [Ray init] Ray init method does not support pathlib.Path
#33672 closed
Jun 30, 2025 - [docs] improve user experience of the API ref
#33645 closed
Jun 30, 2025 - [RLLib] Collecting external experience
#33636 closed
Jun 30, 2025 - [runtime_env] Actors always depend global `pip` field for `runtime_env`
#33607 closed
Jun 30, 2025 - [Core] Raylet process not respecting `--node-ip-address`
#33554 closed
Jun 30, 2025 - [Tune] Support ExperimentAnalysis.dataframe(mode='mean')
#33540 closed
Jun 30, 2025 - [Train] `RunConfig` doesn't get propagated from the Tuner to the Trainer
#33539 closed
Jun 30, 2025 - [Core] std::bad_alloc error using ray.init()
#33525 closed
Jun 30, 2025 - [Core] `test_memory_deadlock` times out
#33491 closed
Jun 30, 2025 - [Core] Support binding worker processes to NUMA nodes
#33465 closed
Jun 30, 2025 - [Serve] Support for setting `working_dir` to a local directory in `RayService`
#33456 closed
Jun 30, 2025 - RLLIB - RE3 Exploration Algorithm - No GPU support f0r Dynamic TF V2
#33425 closed
Jun 30, 2025 - [client] kubernetes w ray client
#33367 closed
Jun 30, 2025 - [Train] Reporting metrics/checkpoints from multiple workers
#33360 closed
Jun 30, 2025 - [Data] `read_parquet` schema is incorrect (schema is a dict instead of a string)
#33279 closed
Jun 30, 2025 - [Ray status] confusing output about gpus and accelerators
#33272 closed
Jun 30, 2025 - [Serve] Enhance replica upgrade process.
#33192 closed
Jun 30, 2025 - [air output] Isolate/refactor/improve rllib related progress reporting logic
#33150 closed
Jun 30, 2025 - [Tune][wandb] Report tune experiments as a wandb `sweep`
#33142 closed
Jun 30, 2025 - [AIR][wandb] Add option to track artifact references in wandb if using cloud storage
#33130 closed
Jun 30, 2025 - [AIR][Tune] Add an option in `WandbLoggerCallback` to group wandb runs by config
#33084 closed
Jun 30, 2025 - [Serve] Support external storage for state
#33059 closed
Jun 30, 2025 - [Serve] Use the namespace of context instead of "serve" when the Controller gets all running Actors
#33057 closed
Jun 30, 2025 - [Serve] Specify replicas when scaling down
#33056 closed
Jun 30, 2025 - [Serve] Restart a batch of replicas by Actor names or replica tags
#33055 closed
Jun 30, 2025 - [Serve] Specify a batch of replicas to update their user_config
#33054 closed
Jun 30, 2025 - Ray Core Runtime Environments with tea.xyz
#33049 closed
Jun 30, 2025 - [Ray Tune] Support for continuing training when metrics are only reported from some of the workers
#33042 closed
Jun 30, 2025 - [Data] Cannot get the length of a tf dataset created from `ray_ds.to_tf`
#33004 closed
Jun 30, 2025 - [Data] Include image class id in the returned datasets of `ray.data.read_images()`.
#32989 closed
Jun 30, 2025 - [Datasets] Raise descriptive error if `iter_torch_batches` can't convert data
#32953 closed
Jun 30, 2025 - [Serve] Don't start Serve agent if Serve isn't installed
#32920 closed
Jun 30, 2025 - [Data]: `ds.take()` and `ds.iter_batches()` have unexpected different behavior for pd.Series columns
#32913 closed
Jun 30, 2025 - [Ray: Serve] Model Composition primitives should be part of Serve Core API docs.
#32837 closed
Jun 30, 2025 - [core][state] Task backend : already submitted cancelled task showing up as finished
#32826 closed
Jun 30, 2025 - [AIR][Tune] Make trial checkpoint + artifact upload happen atomically
#32823 closed
Jun 30, 2025 - [Tune] During multi-GPU training (using mp.spawn), ray.tune.report does not take effect.
#32810 closed
Jun 30, 2025 - [Tune] failure when using more than one GPU
#32760 closed
Jun 30, 2025 - [Runtime Env] Add docstring for public class methods and attributes
#32704 closed
Jun 30, 2025 - [tune] Add suggestions on when `reuse_actor` should be set to false.
#32698 closed
Jun 30, 2025 - [serve] serve run doesn't restart app successfully in some environments
#32633 closed
Jun 30, 2025 - [train] Big performance hit when TensorFlow trainer is not scheduled on head node
#32509 closed
Jun 30, 2025 - [doc][tune] clarify `Stopper`, what is `training_iteration`
#32497 closed
Jun 30, 2025 - [release] update our xgboost release test to catch issues like (see discription)
#32491 closed
Jun 30, 2025 - [Core] The remote function in the worker no longer runs after the head crashes
#32454 closed
Jun 30, 2025 - [RLlib] Special __common__ key in MultiAgent batches is not documented
#32399 closed
Jun 30, 2025 - [tune] update how trainable reports result/checkpoint to driver
#32380 closed
Jun 30, 2025 - [Datasets] The projection pushdown cannot work with hive style partitioning file path
#32301 closed
Jun 30, 2025 - [Core][utilization] some anti-pattern that not well supported by Ray core.
#32297 closed
Jun 30, 2025 - [tune/train] Provide actionable error messages for common thirdparty errors
#32232 closed
Jun 30, 2025 - [ci] Mirror external dependenies in CI
#32113 closed
Jun 30, 2025 - [Serve] ValueError: Message ray.serve.ReplicaConfig exceeds maximum protobuf size of 2GB
#32049 closed
Jun 30, 2025 - Serve build usage of click CLI library conflicts python argparse
#32001 closed
Jun 30, 2025 - [Serve] Version Support in 2.X API
#31928 closed
Jun 30, 2025 - [Train] User exceptions not propagated from remote cluster
#31913 closed
Jun 30, 2025 - [RLlib] AlgorithmConfig() defaults not used by build_sac_model when implementing custom model
#31783 closed
Jun 30, 2025 - [kubernetes/cluster] More guides on deployment
#31623 closed
Jun 30, 2025 - [core][state] ray log supporting regex searching
#31549 closed
Jun 30, 2025 - [Tune] Support NLopt search algorithms
#31492 closed
Jun 30, 2025 - [Rllib] Possible Redudant Code
#31463 closed
Jun 30, 2025 - [aws] ray submit --stop fails on aws
#31380 closed
Jun 30, 2025 - [Tune] Avoid insufficient resources warning if cluster is autoscaling
#31292 closed
Jun 30, 2025 - No worker logs in the dashboard after recreating the K8S Ray pods
#31288 closed
Jun 30, 2025 - [core] Please improve warning message for ip mismatch
#31264 closed
Jun 30, 2025 - [core][state] Refactor use of bounded LRU/FIFO buffer/map used in task backend
#31158 closed
Jun 30, 2025 - [core] Ray resources should be case-insensitive
#31087 closed
Jun 30, 2025 - [RayCluster]
#31041 closed
Jun 30, 2025 - [Serve] gRPCis should not allow route_prefix set
#30891 closed
Jun 30, 2025 - [General] Setup a "code walkthrough" meetup or tutorial
#30852 closed
Jun 30, 2025 - [RFC][core] Option to avoid scheduling tasks to nodes with disk full
#30843 closed
Jun 30, 2025 - [core] Enable greater control over log verbosity
#30832 closed
Jun 30, 2025 - [Tune] ability to specify search algorithm when using tune.run_experiments()
#30802 closed
Jun 30, 2025 - [RLlib] Deprecate the RLlib spaces that are duplications of gym spaces.
#30800 closed
Jun 30, 2025 - [Tune] Guard against users overriding internal `Trainable` methods
#30795 closed
Jun 30, 2025 - Ray Cluster Resources Issue
#30780 closed
Jun 30, 2025 - [Core] Worker leak
#30731 closed
Jun 30, 2025 - [RLlib] Default policy error in two trainer work flow
#30676 closed
Jun 30, 2025 - [core] Can't set working directory for runtime env in actor definition
#30666 closed
Jun 30, 2025 - [Tune] HeboSearch reproducible deterministic results
#30661 closed
Jun 30, 2025 - [core] Memory changes are not as expected when using ray.get()
#30615 closed
Jun 30, 2025 - [Tune] `fail_fast` marks all runs as terminated, making the experiment impossible to restore
#30584 closed
Jun 30, 2025 - [RLLib] Custom model with LSTM causes the auto wrapping to be partially executed
#30581 closed
Jun 30, 2025 - [Core|RayTrain] RuntimeError: Some workers returned results while others didn't
#30545 closed
Jun 30, 2025 - [Core] Overriding the default logging format for Worker logs
#30544 closed
Jun 30, 2025 - [AIR] Canonical way to determine whether the code is running in a Train/Tune session
#30536 closed
Jun 30, 2025 - [client][runtime_env] Inconsistent runs on ray client
#30518 closed
Jun 30, 2025 - [Core] ray.exceptions.RaySystemError: System error: buffer source array is read-only
#30505 closed
Jun 30, 2025 - [Core] Access violation on windows 11 when running modin workload
#30493 closed
Jun 30, 2025 - Critic Regularized Regression (CRR) model is getting error with Custom Environment (Offline RL)
#30411 closed
Jun 30, 2025 - [Docs] [Jobs] Add pros and cons of different ways of submitting a job
#30305 closed
Jun 30, 2025 - [air/horovod] horovod distributed worker creation may hang
#30276 closed
Jun 30, 2025 - [<Ray component: Workflow>] module 'ray.workflow' has no attribute 'HTTPListener'
#30248 closed
Jun 30, 2025 - [RLLIB][Torch] numerically unstable + mkl issue in torch.sqrt normc_initializer
#30191 closed
Jun 30, 2025 - [RLlib] RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)
#30164 closed
Jun 30, 2025 - [AIR][Tune] Provide user guide on how to build active learning on AIR
#30157 closed
Jun 30, 2025 - [AIR/Docs] Mention/warn that running a Trainer inside a custom Tune trainable is an anti-pattern
#30153 closed
Jun 30, 2025 - [Core] Reference leakage somewhere after ray.shutdown()
#30089 closed
Jun 30, 2025 - [Tune] Can't access all metrics for all trials
#30004 closed
Jun 30, 2025 - [core][dashboard] state api on worker nodes can not connect to dashboard url
#29959 closed
Jun 30, 2025 - [Jobs] Include requested and available resources in JobInfo status message
#29921 closed
Jun 30, 2025 - [RLlib] Add some metric for aync algos (e.g. APPO) that shows the total number of gradient updates
#29830 closed
Jun 30, 2025 - [AIR] Update pytorch training and prediction benchmark with numpy with updated metrics
#29743 closed
Jun 30, 2025 - [RLlib] Undesired memory growing when using convolutional neural network
#29699 closed
Jun 30, 2025 - [AIR] `XGBoostTrainer` gives misleading error if column missing
#29695 closed
Jun 30, 2025 - [RLLib Tests] : Included pytests in package as well as basic commands fail with ValueError
#29691 closed
Jun 30, 2025 - [RLlib] Benchmark bandit methods vs plain Thompson Sampling for a non-contextual MAB
#29528 closed
Jun 30, 2025 - [Autoscaler] Delete AWS resources created when launching Ray cluster upon cluster termination
#29499 closed
Jun 30, 2025 - [Ray Log_monitor]: close_all_files ProcessLookupError
#29452 closed
Jun 30, 2025 - Ray core: incorrect account of GPUs on ec2 ubuntu instance: g4dn.2xlarge
#29420 closed
Jun 30, 2025 - [core] GCS segfaults under OOM
#29336 closed
Jun 30, 2025 - [AIR] Add progress bar for training
#29314 closed
Jun 30, 2025 - [CI] A simple way to reproduce osx/linux/windows CI run failure locally
#29068 closed
Jun 30, 2025 - [Core] Is it possible to do asynchroneous task submission?
#29039 closed
Jun 30, 2025 - [doc][core] multiprocessing.Pool should document resource usage with ray_remote_args
#29004 closed
Jun 30, 2025 - [Train] Allow passing in placement group to trainer
#28924 closed
Jun 30, 2025 - [<Algorithm overview>]
#28915 closed
Jun 30, 2025 - Runtime Environment Dependencies- container per task
#28875 closed
Jun 30, 2025 - [Ray component: Core] Returning to much data from ray remote fails with no error
#28855 closed
Jun 30, 2025 - Issue on page /ray-core/examples/plot_parameter_server.html
#28854 closed
Jun 30, 2025 - [Datasets] Why does pydantic make training slower?
#28836 closed
Jun 30, 2025 - [Infra] Improve Ray client usability
#28790 closed
Jun 30, 2025 - [Core] Download Logs from Ray Dashboard
#28788 closed
Jun 30, 2025 - Ray Core: AttributeError: 'NoneType' object has no attribute 'enum_types_by_name'
#28779 closed
Jun 30, 2025 - [Tune] HyperOptSearch fails with nested config dicts and points_to_evaluate
#28753 closed
Jun 30, 2025 - Ray Deployment crashes in docker [<Ray component: Serve>]
#28732 closed
Jun 30, 2025 - [Ray Serve]: Testing out on local using Docker container
#28692 closed
Jun 30, 2025 - [Core] CloudPickle explain tool
#28585 closed
Jun 30, 2025 - [dashboard] Dashboard randomly not showing the status of worker nodes.
#28569 closed
Jun 30, 2025 - [AIR] Status updates still prints even with breakpoint
#28554 closed
Jun 30, 2025 - [AIR/Tune] Session report does not show the key for those not included in the first metrics report
#28549 closed
Jun 30, 2025 - [Core] dump the info and anaylze the data offline
#28496 closed
Jun 30, 2025 - [Core] Document what are the generic python code that's easily scalable.
#28487 closed
Jun 30, 2025 - [AIR] Refactor checkpoint encoding and decoding out of Backend to framework-specific Checkpoints
#28462 closed
Jun 30, 2025 - [Core] [RLlib] RLlib on Ray 2.0 not easily working on Colab
#28457 closed
Jun 30, 2025 - [Job Submission] Support env file input in ray.runtime_env.RuntimeEnv
#28453 closed
Jun 30, 2025 - [Tune] Adding DEHB
#28427 closed
Jun 30, 2025 - [serve] Gradio integration does surface error messages, runs indefinitely
#28399 closed
Jun 30, 2025 - [Ray: Core] Ray can hang when getting an ObjectRef from an unknown environment
#28341 closed
Jun 30, 2025 - [Core][RuntimeEnv]Make `job_submission_id` to a new index of GCS::JobTableData
#28337 closed
Jun 30, 2025 - [Jobs] Run jobs tests on Windows
#28316 closed
Jun 30, 2025 - [Core] Job stop should terminate runtime_env setup
#28221 closed
Jun 30, 2025 - [Core] log_to_driver=False does not suppress worker errors in ipython
#28216 closed
Jun 30, 2025 - [core][runtime envs] Ray should respect CUDA_VISIBLE_DEVICES if set in runtime env
#28215 closed
Jun 30, 2025 - [Core] ray dashboard <rayhost>:8265/nodes?view=details cpuPercent should contains actor's subprocess
#28100 closed
Jun 30, 2025 - [Core] Ray may hang if workers fail to start due to limited ports
#28071 closed
Jun 30, 2025 - [Core] Support retry_delay option in Ray tasks
#28015 closed
Jun 30, 2025 - [Core] Ray object primary copy transfer
#27985 closed
Jun 30, 2025 - [Core] allow customized error message for WorkerCrashedError
#27947 closed
Jun 30, 2025 - [RLlib] make policy evaluation support Attention nets
#27909 closed
Jun 30, 2025 - [tune] allow using (nested) data classes for search space definition
#27904 closed
Jun 30, 2025 - [Autoscaler][GCP] Autofill GCP node type resources
#27888 closed
Jun 30, 2025 - [Core] The Idle worker killing feature slows down tasks
#27863 closed
Jun 30, 2025 - [AIR] Support TorchRec trainer
#27794 closed
Jun 30, 2025 - [Dashboard] Dashboard agent cannot be started because the port is still occupied
#27736 closed
Jun 30, 2025 - [Tune/RLlib] log_to_file creates files, but doesn't write anything there
#27702 closed
Jun 30, 2025 - [RLLib] global_timestep not monotonic when when running concurrent episodes with ExternalEnv
#27669 closed
Jun 30, 2025 - [Dashboard] Ray Dashboard not showing the SpillWorker's actual memory usage
#27591 closed
Jun 30, 2025 - [Core] The actors got distributed to just a few nodes even with spread scheduling
#27577 closed
Jun 30, 2025 - [runtime_env] Add tests for all driver output (warnings, etc)
#27566 closed
Jun 30, 2025 - [Tune] TuneReportCheckpointCallback causes two checkpoints to made every time it is called.
#27524 closed
Jun 30, 2025 - [AIR] SettingWithCopyWarning for "A value is trying to be set on a copy of a slice from a DataFrame"
#27352 closed
Jun 30, 2025 - [ray dashboard] profile button not working
#27211 closed
Jun 30, 2025 - [Ray Train] Ray Train running slow when multiple workers executed
#27107 closed
Jun 30, 2025 - [workflow] We should give the storage a default value if it's not set in some way.
#27046 closed
Jun 30, 2025 - [State Observability][Log] Allow to ctrl + C when running logs API
#27008 closed
Jun 30, 2025 - [runtime env] local `working_dir` doesn't work with strongly-typed `RuntimeEnv`
#26984 closed
Jun 30, 2025 - [Core][State Observability] More fine-grained exceptions/error codes handling
#26974 closed
Jun 30, 2025 - [Core] Typing for .options for Ray Tasks
#26871 closed
Jun 30, 2025 - [State Observability] Support filter None value
#26820 closed
Jun 30, 2025 - [Core] Batch PinObjectIDs requests from Raylet client
#26796 closed
Jun 30, 2025 - [Train] feature request for catboost_ray
#26687 closed
Jun 30, 2025 - [AIR/Tune] Add a `ScalingConfig`-based API to `ResourceChangingScheduler`
#26538 closed
Jun 30, 2025 - [RLlib] CRR and CQL consume more cpus than reported
#26533 closed
Jun 30, 2025 - [Job] Job submission not following convention for quote
#26514 closed
Jun 30, 2025 - [RLlib] Unable to call ray.remote functions inside env/action dist
#26468 closed
Jun 30, 2025 - [Core] Observing Multiple Exceptions When Using Different Python Patch Versions
#26443 closed
Jun 30, 2025 - [RLlib] Use observations (input_dict) for exploration
#26437 closed
Jun 30, 2025 - [core][c++ worker] RayClusterModeTest.DefaultActorLifetimeTest timed out in macOS
#26435 closed
Jun 30, 2025 - [Ray component: Core] Enable better progress bar
#26426 closed
Jun 30, 2025 - [RLlib] Issue Regarding Future Warnings
#26424 closed
Jun 30, 2025 - [doc][Core | State Observability] Document usage of the rate limiting env variable in public doc
#26370 closed
Jun 30, 2025 - [Tune] NevergradSearch Budget Exception
#26305 closed
Jun 30, 2025 - How to color-code console output
#26226 closed
Jun 30, 2025 - [Core] [Quality] Live handle raises unnecessary exception when script ends
#26198 closed
Jun 30, 2025 - [RLlib]: SimpleQ TF2 is broken
#26192 closed
Jun 30, 2025 - [State Observability] Raise an exception if the state schema contains predicates.
#26125 closed
Jun 30, 2025 - [RLlib] server reports nan episodes and empty policy
#26048 closed
Jun 30, 2025 - [test][autoscaler] ModuleNotFoundError: No module named 'ray.tests'
#26023 closed
Jun 30, 2025 - [Tune] Ray Tune doesn't work inside Spark UDF
#26002 closed
Jun 30, 2025 - [Serve] A Deployment Graph with unfulfilled demands fails to scale Pods in Kubernetes
#25998 closed
Jun 30, 2025 - API server internal error message not useful
#25986 closed
Jun 30, 2025 - [Core][State Observability] Use a separate thread to run spill/restore
#25960 closed
Jun 30, 2025 - [RLlib] KeyError: simple_list_collector.py, line 950, in postprocess_episode
#25938 closed
Jun 30, 2025 - [RLLib] SampleBatch.update() doesn't update `added_keys`
#25937 closed
Jun 30, 2025 - [runtime env] Use namespace for internal KV storage
#25897 closed
Jun 30, 2025 - [Core?] Federation + data perimeters
#25846 closed
Jun 30, 2025 - [Core] RBAC + auditability
#25845 closed
Jun 30, 2025 - [Core] Arrow Flight Server doesn't work with Ray Actors due to two GRPC versions
#25774 closed
Jun 30, 2025 - [Core | State Observability ] Refactor summary/log SDK to use StateApiClient
#25746 closed
Jun 30, 2025 - [Serve] Deployment fails if name contains slashes
#25714 closed
Jun 30, 2025 - [RLlib] ModelCatagolg Selects Wrong Model for Nested Complex Observations
#25619 closed
Jun 30, 2025 - [Core][Observability] Ray memory should show more objects
#25463 closed
Jun 30, 2025 - [Dashboard] Error during render node with gpu and 4 hdds
#25437 closed
Jun 30, 2025 - [Ray Collective Lib] Enable CI
#25396 closed
Jun 30, 2025 - Core: deamonset feature request
#25334 closed
Jun 30, 2025 - [DeviceMesh][Collective] Support multiple tensors API
#25129 closed
Jun 30, 2025 - [Ray Air] nan in the tensorflow_linear_dataset_example.py
#25037 closed
Jun 30, 2025 - ray docker images do not have uvloop installed
#25023 closed
Jun 30, 2025 - Ray Tune: No console output is logged to Wandb.
#25011 closed
Jun 30, 2025 - [Core][RLlib][Tune] CUDA PTX error when training with Tune
#25001 closed
Jun 30, 2025 - Ray component: Core: PoolActor processes hanging
#24784 closed
Jun 30, 2025 - [RLlib] Duplicate custom metrics
#24731 closed
Jun 30, 2025 - [Serve] Asynchronous inference best practices
#24627 closed
Jun 30, 2025 - [tune] `progress_reporter.py` is messy and should be cleaned up
#24604 closed
Jun 30, 2025 - [aws][autoscaler] AWS: When using spot instances, always single availability zone is selected
#24310 closed
Jun 30, 2025 - [RLlib] PPO - ray.rllib.agents.ppo "Put Error"
#24307 closed
Jun 30, 2025 - [Ray Collective] Remove Redis store and LocalFile store from gloo mode.
#24288 closed
Jun 30, 2025 - [Autoscaler] upscaling_speed: 0 gets reset to 1
#24177 closed
Jun 30, 2025 - [RLlib] Categorical action dist incorrectly uses tf.random.categorical
#24055 closed
Jun 30, 2025 - [RLlib] Enable Training from Replay Buffer Larger than Memory
#23816 closed
Jun 30, 2025 - [RLlib] [Bug] IMPALA causes an OOM after a long running.
#23769 closed
Jun 30, 2025 - [BUG] Ray dashboard client failed to build
#23548 closed
Jun 30, 2025 - [RFC][Feature][Autoscaler][Core]Graceful draining of nodes while scale-down
#23522 closed
Jun 30, 2025 - [ml][Improvement] Improve messages to be “rank0, rank1” actors etc.
#23310 closed
Jun 30, 2025 - [Feature] [tune] create a mlflow run name from config params
#23228 closed
Jun 30, 2025 - [Feature][RLlib] Improve pytorch memory usage by disabling caching
#23077 closed
Jun 30, 2025 - [tune][Bug] Worker doesn't sync the logs to HDFS at the given interval
#23055 closed
Jun 30, 2025 - [Bug] AdaBelief optimizer crashes checkpoint restore
#22976 closed
Jun 30, 2025 - [Bug] Custom model with R2D2
#22747 closed
Jun 30, 2025 - [Bug] Resources displayed in Dashboard don't match cluster configuration
#22548 closed
Jun 30, 2025 - [Bug] Deletion of Ray clusters hangs while Ray operator is still up
#22505 closed
Jun 30, 2025 - Doing import ray breaks my logging [Bug]
#22312 closed
Jun 30, 2025 - [Feature][Client] remove ray.disconnect() and ray.connect()
#22125 closed
Jun 30, 2025 - [Bug] Detached actor exceptions are not logged.
#21810 closed
Jun 30, 2025 - [Bug] Sometimes the worker node logs in the ray dashboard are empty
#21785 closed
Jun 30, 2025 - [Core][Feature] Add checksum support for object store.
#21782 closed
Jun 30, 2025 - Setting VF_SHARE_LAYERS to False and NO_FINAL_LINEAR to true leads to a bug
#21756 closed
Jun 30, 2025 - [Feature] [runtime env] support using different python versions in Ray cluster
#21597 closed
Jun 30, 2025 - [Feature] [Serve] Request Redistribution Among Replicas
#21578 closed
Jun 30, 2025 - Put failed error occurred when shutdown and init again at client mode
#21573 closed
Jun 30, 2025 - [Core] [Bug] No timeout or deadlock on scheduling job in remote cluster
#21419 closed
Jun 30, 2025 - [Core] [Bug] Failed to register worker to Raylet for single node, multi-GPU
#21226 closed
Jun 30, 2025 - [Train] Port over `timm` example to Train
#21020 closed
Jun 30, 2025 - [Bug] [RLlib] Custom metrics are not reported to Tune
#20938 closed
Jun 30, 2025 - [Train] Deepspeed support
#20648 closed
Jun 30, 2025 - [Bug] Cannot start cluster if other user is already running one
#20634 closed
Jun 30, 2025 - [Bug] Excess memory usage when scheduling tasks in parallel?
#20618 closed
Jun 30, 2025 - [Bug] Ray auto init interacts badly with allow_multiple=True and kills python shell
#20355 closed
Jun 30, 2025 - [Bug] BasicVariantGenerator not compatible with Repeater
#19879 closed
Jun 30, 2025 - [Feature] Support Sigopt for Tune standard space definitions
#19018 closed
Jun 30, 2025 - [Bug] Re-enable Worker in Container Tests.
#18787 closed
Jun 30, 2025 - [datasets] `random_shuffle` overspills objects on random node
#17612 closed
Jun 30, 2025 - [Core] Ray Actor abnormal exit problem && Reproduction
#17198 closed
Jun 30, 2025 - Support resizing placement groups
#16403 closed
Jun 30, 2025 - [dashboard] Errors are not shown
#15238 closed
Jun 30, 2025 - changing the docker image in consecutive `ray up` calls fails.
#14990 closed
Jun 30, 2025 - [metrics] Add regression tests for Prometheus metrics
#14614 closed
Jun 30, 2025 - [dashboard] Show more nodes at a time instead of paging through
#14537 closed
Jun 30, 2025 - [autoscaler] Support Memory Aware Scheduling on a multi-node-type cluster.
#14104 closed
Jun 30, 2025 - [RFC] k8s-native worker pool
#14077 closed
Jun 30, 2025 - [dashboard] clicking on a column to sort makes the UI blank
#13525 closed
Jun 30, 2025 - Autoscaler does not respect --num-cpus argument to `ray start`
#13270 closed
Jun 30, 2025 - [core] Number of CPUs in ray.available_resources() does not match Dashboard's Machine View
#13100 closed
Jun 30, 2025 - atexit handlers don't run when actor is terminated from going out of scope
#12806 closed
Jun 30, 2025 - Task Cancellation is broken for queued tasks
#12080 closed
Jun 30, 2025 - [logging] Use 'warnings.warn' appropriately
#12060 closed
Jun 30, 2025 - [Dashboard] New dashboard port errors in a large cluster.
#11638 closed
Jun 30, 2025 - ES Trainer does not support evaluation workers
#10999 closed
Jun 30, 2025 - [Plasma] Improve plasma documentation on distributed storage
#10858 closed
Jun 30, 2025 - Unable to connect to ray head running on linux from ray worker node on windows
#10362 closed
Jun 30, 2025 - Ray log tracing
#9786 closed
Jun 30, 2025 - [Core] Logging policy should be clearly defined and needs unit test coverage
#9692 closed
Jun 30, 2025 - [dashboard] Error on Infinity values
#9103 closed
Jun 30, 2025 - "ray timeline" command fails when RAY_ADDRESS is set
#8951 closed
Jun 30, 2025 - [tune] [dashboard] Table formatting issues due to too many hparams
#8667 closed
Jun 30, 2025 - Allowing multiple users to access a single ray cluster
#6800 closed
Jun 30, 2025 - [Ray core & ray cluster] Add diagrams/architectures to explain how to run ray locally vs remotely
#25663 closed
Jun 30, 2025 - [Ray Clusters] Remove nightly and latest images and wheels from all example configs.
#25606 closed
Jun 30, 2025 - [air] Consider having a preprocessor for Feast integration
#25559 closed
Jun 30, 2025 - [Core] Open telemetry Context pass from ray client to actors
#25538 closed
Jun 30, 2025 - [dataset] Reduce tasks in push-based shuffle are not evenly distributed
#25468 closed
Jun 30, 2025 - [Core] [State Observability] List all actor logs when actors are restarted.
#25443 closed
Jun 30, 2025 - [air] Ordinal Encoder complains about None
#25442 closed
Jun 30, 2025 - [Autoscaler] google-cloud-storage seems cannot read GOOGLE_APPLICATION_CREDENTIALS
#25308 closed
Jun 30, 2025 - [Serve] Dynamically move models between CPUs and GPUs
#25295 closed
Jun 30, 2025 - [RLlib][Doc] Add documentation for `ModelCatalog.get_model_v2()`
#25186 closed
Jun 30, 2025 - [AIR] MLflow integration polish
#25156 closed
Jun 30, 2025 - [AIR] TensorFlow warns to use `distribute.MultiWorkerMirroredStrategy` when I'm already using it
#25140 closed
Jun 30, 2025 - [air] Have a default column for not frequent enough categories for OHE
#25096 closed
Jun 30, 2025 - [Core] Make NodeManager unit testable
#25095 closed
Jun 30, 2025 - [AIR] Improve logging for train
#25088 closed
Jun 30, 2025 - [RLlib] Hope RLlib can support DQfD & POfD
#25058 closed
Jun 30, 2025 - [AIR] Support postprocessing in Predictors
#24979 closed
Jun 30, 2025 - [AIR] Add a `TorchVision` preprocessor
#24976 closed
Jun 30, 2025 - [AIR/Train] Torch: Automatically unpack model when checkpointing state dicts
#24975 closed
Jun 30, 2025 - [AIR/Train] Automatically return the framework specific dataset in `train_loop_per_worker`
#24974 closed
Jun 30, 2025 - [<Ray component: RLlib>] ppo error when not using critic
#24907 closed
Jun 30, 2025 - [RLlib]: Add tabular models to ModelV2
#24882 closed
Jun 30, 2025 - [RLlib] Error when converting GYM Robotics env to Multi-agent Env with the make_multi_agent wrapper
#24881 closed
Jun 30, 2025 - [tune] SigOptSearch suggester is not serialisable
#24864 closed
Jun 30, 2025 - [core] Add basic metrics for lineage reconstruction
#24855 closed
Jun 30, 2025 - [Core] Enhance runtime env state when `ray list runtime-env` is used.
#24838 closed
Jun 30, 2025 - [Core] Refactor Ray memory codepath to follow same pattern as `ray list tasks`.
#24836 closed
Jun 30, 2025 - [Core] Reach parity of task status for `ray memory` and `ray list tasks`
#24835 closed
Jun 30, 2025 - [Tune] `MedianStoppingRule` mishandles `nan`s
#24809 closed
Jun 30, 2025 - [Serve] Simplify json_serde of deployment graph
#24620 closed
Jun 30, 2025 - [RLlib] Metrics not reported with Client/Server and env=None
#24601 closed
Jun 30, 2025 - [Serve] `Deployment.url` not updated after options changing name or prefix.
#24548 closed
Jun 30, 2025 - [Rllib] Lack validation for "num_workers" parameter in DDPGTrainer.
#24536 closed
Jun 30, 2025 - [doc] Update instructions for wheel installation
#24533 closed
Jun 30, 2025 - [RLlib] Simplex action space shape
#24529 closed
Jun 30, 2025 - [Tune] Make it easy to configure logger level
#24447 closed
Jun 30, 2025 - [tune] improve documentation around "resource exhausted error"
#24439 closed
Jun 30, 2025 - [Core] Unify RegisterClient and AnnounceWorkerPort
#24432 closed
Jun 30, 2025 - [core] Annotation and docstring for ray.remote wrapped functions
#24411 closed
Jun 30, 2025 - [AIR] `Result` object doesn't work with Ray Client
#24396 closed
Jun 30, 2025 - [RLlib] Current Implementation of Replay Buffer is not a True Circular Buffer
#24393 closed
Jun 30, 2025 - [RLlib] wrong env step counting when train multi-agent with shared default policy
#24340 closed
Jun 30, 2025 - [Core][observability] Enable observability features built in gRPC
#24327 closed
Jun 30, 2025 - [Autoscaler][Docs] Add up-to-date docs on how the autoscaler works.
#24323 closed
Jun 30, 2025 - Received message larger than max (105683136 vs. 104857600)
#24286 closed
Jun 30, 2025 - [Serve] [Doc] HTTP Adapters Cookbooks
#24245 closed
Jun 30, 2025 - [Serve] Default DAGDriver implementation cannot serve.run() or serve.build() twice
#24122 closed
Jun 30, 2025 - [AIR] Support functionality to stitch Preprocessor with Keras model
#24023 closed
Jun 30, 2025 - [Core] Log propagation between actor exit called and process terminated
#24020 closed
Jun 30, 2025 - [<Ray component: Serve] Improve access by index/key on intermediate result in Serve deployment graph
#23987 closed
Jun 30, 2025 - [Serve] [Docs] Improve architectural diagrams
#23956 closed
Jun 30, 2025 - [Runtime Env] Dependency Installation private git repositories via ssh
#23768 closed
Jun 30, 2025 - [ray client] ray.wait timeout is not respected when connection is interrupted
#23694 closed
Jun 30, 2025 - [Feature] [Tune] Trial-wise dependencies
#23654 closed
Jun 30, 2025 - [Bug] `policies_to_train` throws incorrect/confusing error message when passed an empty list.
#23646 closed
Jun 30, 2025 - [Feature] support of complicated action space in QMix algorithm in Rllib.
#23634 closed
Jun 30, 2025 - [runtime env] Deflake `test_runtime_env_working_dir_2`
#23569 closed
Jun 30, 2025 - [runtime env] [Feature] Make Internal KV operations async
#23567 closed
Jun 30, 2025 - [Feature] .bind() on function does not take pre-bind value from upstream DAGNode
#23511 closed
Jun 30, 2025 - [RLlib][Bug] RLLib Dreamer tuned example requesting unreasonable amount of GPU memory
#23479 closed
Jun 30, 2025 - [Core] Add a warning message if options / arguments differ for Actor.options(get_if_exists=True)
#23455 closed
Jun 30, 2025 - [RLlib][Feature] Feature Importance Plots
#23447 closed
Jun 30, 2025 - [air] Logging message is not relevant to user
#23430 closed
Jun 30, 2025 - [RLlib][docs] Adding more flow charts to RLlib components docs
#23393 closed
Jun 30, 2025 - [runtime env] Warn user if pip check fails
#23335 closed
Jun 30, 2025 - [runtime env] Refactor packaging code
#23257 closed
Jun 30, 2025 - [runtime env] Improve tracking of URI size
#23186 closed
Jun 30, 2025 - [updater][Bug] update fails on preempted node and autoscaler stops scheduling
#23182 closed
Jun 30, 2025 - Pipeline ingress requires trailing /
#23048 closed
Jun 30, 2025 - Shouldn't require `PipelineInputNode` to build a pipeline DAG
#23037 closed
Jun 30, 2025 - Pipeline DAG sanity check for model wrappers fields
#23019 closed
Jun 30, 2025 - Pipeline doesn't accept importable class as arguments
#23016 closed
Jun 30, 2025 - [Train] add logging to `finish_training` for existing `Callback`s
#22754 closed
Jun 30, 2025 - [Bug] [serve] Accessing shared objects within a deployment
#22751 closed
Jun 30, 2025 - [Feature] Client version check on commit
#22675 closed
Jun 30, 2025 - [Jobs] run all doc examples in CI
#22487 closed
Jun 30, 2025 - Some tests misusing assertTrue for comparisons
#22395 closed
Jun 30, 2025 - [Enhancement][client] Move synchronous GetObject calls to datapath
#22357 closed
Jun 30, 2025 - Enhance state notification pattern in Ray pubsub
#22340 closed
Jun 30, 2025 - [Core] Avoiding subscribing to all logs by each log subscriber
#22274 closed
Jun 30, 2025 - [Train] TPU support
#22251 closed
Jun 30, 2025 - [train] support per epoch shuffling with `prepare_dataloader`
#22108 closed
Jun 30, 2025 - [runtime_env] Remove `.lock` files after URI garbage collection
#22062 closed
Jun 30, 2025 - [runtime env] Use LRU cache for URIs instead of random eviction
#22060 closed
Jun 30, 2025 - [runtime env] Use single URI for `py_modules` field
#22059 closed
Jun 30, 2025 - [Train] Add callback preprocessor that smoothly tracks values
#21989 closed
Jun 30, 2025 - [Bug] Policy - ActionDistribution Type
#21973 closed
Jun 30, 2025 - [runtiime env] Use coroutine to create runtime envs in `runtime_env_agent`
#21950 closed
Jun 30, 2025 - [Train] Add support for Bagua
#21934 closed
Jun 30, 2025 - [Bug] "The kernel has died..." during Ray tune.run
#21917 closed
Jun 30, 2025 - [Jobs] Backwards compatibility tests for REST API
#21915 closed
Jun 30, 2025 - [Jobs] Make jobs work out-of-the-box with cluster YAML
#21911 closed
Jun 30, 2025 - [Train] Support for averaging results
#21849 closed
Jun 30, 2025 - AttributeError raised when using response_model in FastAPI route decorator
#21744 closed
Jun 30, 2025 - [Feature] [runtime env] [C++] support a strong-typed API in C++
#21733 closed
Jun 30, 2025 - [runtime env] Cross-language runtime env
#21731 closed
Jun 30, 2025 - [Testing] multi fake node set up doesn't work under non ray client mode
#21653 closed
Jun 30, 2025 - [Bug] "Sent message larger than max" error with dask
#21601 closed
Jun 30, 2025 - [runtime env] Can we avoid merging two runtime envs?
#21494 closed
Jun 30, 2025 - [runtime env] raise exception for unsupported runtime_env features on Windows
#21435 closed
Jun 30, 2025 - [train] fix scalability of `JsonLoggerCallback`
#21416 closed
Jun 30, 2025 - [Feature] [runtime env] [java] select jdk version
#21239 closed
Jun 30, 2025 - [Feature][Tune] Trial status based Stopper
#21222 closed
Jun 30, 2025 - [Train][Tune] Unify Train and Tune Callbacks
#21065 closed
Jun 30, 2025 - [Bug] rsync_filter isn't used in hash_runtime_conf
#20878 closed
Jun 30, 2025 - [autoscaler] Persistent problems encountered during autoscaling can lead to driver log spam
#20855 closed
Jun 30, 2025 - [Feature] Autoscaler should understand AWS availability and act accordingly
#20774 closed
Jun 30, 2025 - [Bug] Test placement group chaos testing
#20716 closed
Jun 30, 2025 - [GCP][autoscaler] Scale down is slow and Ray status doesn't show pending nodes
#20695 closed
Jun 30, 2025 - Support snappy compression for spilled objects
#20575 closed
Jun 30, 2025 - Sparse object reads - read part of an object, without downloading the entire object
#20500 closed
Jun 30, 2025 - [core] Scale shuffle to 200+ nodes
#20499 closed
Jun 30, 2025 - Memory-aware task scheduling to avoid OOMs under memory pressure
#20495 closed
Jun 30, 2025 - [Feature] [Placement Group] Add timeout mechanism when scheduling placement group
#20477 closed
Jun 30, 2025 - [job submission] Add RAY_ADDRESS or --address to suggested commands for logs/status
#20441 closed
Jun 30, 2025 - [Bug] [Ray Autoscaler] [Core] Ray Worker Node Relaunching during 'ray up'
#20402 closed
Jun 30, 2025 - [workflow] Fail to construct workflow within a workflow
#20381 closed
Jun 30, 2025 - [Feature] [Serve] Threading for Ray Serve
#20169 closed
Jun 30, 2025 - [Feature] [Serve] Support Sticky Sessions for Stateful Workflows Deployed via Ray Serve
#20107 closed
Jun 30, 2025 - [runtime env] Remove filelock dependency
#20083 closed
Jun 30, 2025 - [Bug] Potential deadlock in task scheduling algorithm for placement group resources.
#20051 closed
Jun 30, 2025 - [Feature] rllib + tune metric logging selection
#19816 closed
Jun 30, 2025 - [RLlib] [documentation] clarify postprocess_fn usage in our doc
#19648 closed
Jun 30, 2025 - [Feature] [runtime env] Clean up the command arguments in raylet args
#19448 closed
Jun 30, 2025 - [client] better error message when failing to connect with client
#19371 closed
Jun 30, 2025 - [SGD] Document best practices for Pipeline epochs
#19323 closed
Jun 30, 2025 - [workflow] scan_prefix with pages/as geneartor
#19234 closed
Jun 30, 2025 - [Core][usability] Improve Ray cluster start up time
#19215 closed
Jun 30, 2025 - [Serve] Don't use `ray.wait()` to drain tracking refs in handle
#19158 closed
Jun 30, 2025 - Unify internal configs & common datastructures
#19152 closed
Jun 30, 2025 - Clean up EndpointState
#19148 closed
Jun 30, 2025 - [Core][Feature] use clang-tidy/format to block usage of std::getenv
#18894 closed
Jun 30, 2025 - [Feature][workflow] Namespace for workflow
#18818 closed
Jun 30, 2025 - [Feature][workflow] Resource limit for workflow job
#18780 closed
Jun 30, 2025 - [Bug] Exception in task leads to truncated error message
#18699 closed
Jun 30, 2025 - [Bug] Logging config is not propagated to driver
#18660 closed
Jun 30, 2025 - Enable copy/paste to get correct command for connecting to Ray client
#18513 closed
Jun 30, 2025 - Ray client suppresses error messages
#18512 closed
Jun 30, 2025 - Add workflow.current_step_uuid() function
#18356 closed
Jun 30, 2025 - [Shuffle] non streaming shuffle 5000 partitions seem to reach the scalability limit
#18333 closed
Jun 30, 2025 - [tune] atari-impala-large.yaml does not finish gracefully
#18325 closed
Jun 30, 2025 - [runtime env] eagerly install for task/actor level
#18160 closed
Jun 30, 2025 - [C++ API] Support cross-lang API with Python/Java
#18149 closed
Jun 30, 2025 - [helm][kubernetes][test] Add formatting tests for Helm chart
#18125 closed
Jun 30, 2025 - [workflows] Better message when not init'ed
#18121 closed
Jun 30, 2025 - resource config is not respected in head_start_ray_commands in cluster.yaml
#18097 closed
Jun 30, 2025 - [Dask-on-Ray] Propagate Dask-on-Ray scheduler config to (rest of) cluster
#17943 closed
Jun 30, 2025 - [core] PlacementGroup should be no op for local_mode=True
#17937 closed
Jun 30, 2025 - Enhance document on Java API
#17820 closed
Jun 30, 2025 - [Object Spilling] Remove the spilled directory upon Sigterm for ray start
#17790 closed
Jun 30, 2025 - [C++ API] Support non-global named actor
#17734 closed
Jun 30, 2025 - Cleanup stats/metrics.h
#17679 closed
Jun 30, 2025 - workflow cli to manage all jobs
#17672 closed
Jun 30, 2025 - [docs] Tutorial on Pytorch Lightning needs rearranging
#17611 closed
Jun 30, 2025 - [Serve] Helper functions that are written below the actor class don't work
#17590 closed
Jun 30, 2025 - Fix circular dependence in workflow's code
#17445 closed
Jun 30, 2025 - [lineage] Support lineage reconstruction for borrowed ObjectRefs
#17380 closed
Jun 30, 2025 - Errors during scaling cluster
#17292 closed
Jun 30, 2025 - Trial is being repeated with the exact same results
#17257 closed
Jun 30, 2025 - [RFC][Placement groups] Allow tasks to acquire resources in addition to placement group bundle
#17229 closed
Jun 30, 2025 - runtime env in workflow
#16992 closed
Jun 30, 2025 - [autoscaler][core] Safe node termination
#16975 closed
Jun 30, 2025 - [Ray Client] [Usability] Help users spot bandwidth bounded workload
#16966 closed
Jun 30, 2025 - [cli] Support redis password for all ray commands
#16921 closed
Jun 30, 2025 - [Core] [runtime env] Use portable hash function for runtime_env_hash
#16821 closed
Jun 30, 2025 - [runtime env] Support rescheduling tasks when runtime env creation failed.
#16800 closed
Jun 30, 2025 - Priority scheduling of jobs
#16782 closed
Jun 30, 2025 - [C++ API] Completed object reference counting support
#16702 closed
Jun 30, 2025 - [Core] Programmatic way to access pending tasks for an actor?
#16641 closed
Jun 30, 2025 - [Core] Erroneous check for size_t underflow
#16626 closed
Jun 30, 2025 - [Core] Standardize Timestamps across codebase
#16510 closed
Jun 30, 2025 - [test][MLDataset] Fix test_from_modin
#16357 closed
Jun 30, 2025 - Example for tuning layer count, dropout probabilities with Transformers
#16340 closed
Jun 30, 2025 - Ray started in local mode doesn't restore environment variables after shutdown
#16132 closed
Jun 30, 2025 - [core] ray.remote hides the docstring of the decorated class
#15877 closed
Jun 30, 2025 - [autoscaler] support rsync option `--include`
#15859 closed
Jun 30, 2025 - [Placement Group] The bundle_reservation_check_func breaks load code from local
#15840 closed
Jun 30, 2025 - Contributor docs don't mention running tests via bazel
#15833 closed
Jun 30, 2025 - [docs] should actor methods always have num_returns value?
#15818 closed
Jun 30, 2025 - [rfc] Support `ray[aws,gcp,azure]` as an install target
#15725 closed
Jun 30, 2025 - [rllib] Error while using "count_steps_by": "agent_steps" and misleading documentation
#15708 closed
Jun 30, 2025 - Ray duplicate data from GPU to CPU when placing an actor on GPU
#15692 closed
Jun 30, 2025 - [kubernetes] ModuleNotFoundError when executing a task on a remote cluster
#15668 closed
Jun 30, 2025 - [cross_language] Support Python dictionaries
#15569 closed
Jun 30, 2025 - [core] detached actor logs are not streamed to successive clients
#15549 closed
Jun 30, 2025 - Serve Deployment with Reload Option
#15505 closed
Jun 30, 2025 - [client][core] Have Unified `register_serializer` interface
#15486 closed
Jun 30, 2025 - [Job submission] Monitor driver
#15480 closed
Jun 30, 2025 - [Job submission] Java support
#15479 closed
Jun 30, 2025 - [Job submission] Basic drop job feature
#15478 closed
Jun 30, 2025 - Ray memory size and object store size not correct on k8s
#15463 closed
Jun 30, 2025 - Ray status not report correctly after node crashed
#15459 closed
Jun 30, 2025 - Async actor method hang
#15437 closed
Jun 30, 2025 - [client] python packages version mismatch fail silently
#15407 closed
Jun 30, 2025 - [cluster] Make node_ip_address work throughout
#15239 closed
Jun 30, 2025 - [autoscaler][docs] Explain how the `ray_bootstrap_config` is generated
#15232 closed
Jun 30, 2025 - [autoscaler] Don't autofill `setup_commands` if head/worker `setup_commands` are used
#15231 closed
Jun 30, 2025 - [Core] Add gRPC streaming support.
#15219 closed
Jun 30, 2025 - Optimise for num_workers stucks in the infinite loop
#15168 closed
Jun 30, 2025 - Ray dies without a proper error message - "Killed", might have to do with pandas
#15165 closed
Jun 30, 2025 - [autoscaler] Simplify Custom ObjectStore Size
#15147 closed
Jun 30, 2025 - [Core] Periodical runner can cause heap-use-after-free
#15141 closed
Jun 30, 2025 - Metric tag keys type inference (Tuple To String)
#15130 closed
Jun 30, 2025 - Actor task hangs after actor crashes with max_task_retries=0
#15045 closed
Jun 30, 2025 - AlphaZero torch model doesn't support cuda, only cpu
#14970 closed
Jun 30, 2025 - [Autoscaler] AWS setup commands hardcodes pip
#14963 closed
Jun 30, 2025 - Support "dry runs" for deploy() operations
#14936 closed
Jun 30, 2025 - [Object Spilling] Failing objects that fail to restore many times.
#14921 closed
Jun 30, 2025 - num_cpus not handled correctly when function has a Queue argument
#14863 closed
Jun 30, 2025 - Make rolling update batch size configurable
#14853 closed
Jun 30, 2025 - Typed handle to deployments
#14810 closed
Jun 30, 2025 - [Core] Docs - run data processing examples in CI
#14769 closed
Jun 30, 2025 - [core] The remote function has been exported 100 times..
#14730 closed
Jun 30, 2025 - Support `ray status CLUSTER.YAML`
#14549 closed
Jun 30, 2025 - Support decoupling task/actor interfaces from implementation
#14529 closed
Jun 30, 2025 - Support specifying container images in runtime_env
#14528 closed
Jun 30, 2025 - [autoscaler] Error message not being cleared when autoscaler recovers
#14494 closed
Jun 30, 2025 - [Docs] [tune] WanDB + Ray Integration a bit unclear from the docs
#14478 closed
Jun 30, 2025 - [tune] TBXLoggerCallback not creating necessary directory
#14437 closed
Jun 30, 2025 - [autoscaler][interface] Per-node-type docker configs
#14418 closed
Jun 30, 2025 - [metrics] Add metrics for debugging Dask-on-Ray
#14372 closed
Jun 30, 2025 - [metrics] Report metrics to be used for debugging load balancing issues
#14369 closed
Jun 30, 2025 - [metrics] Remove unused or unnecessary metrics.
#14366 closed
Jun 30, 2025 - When the node is crashed, logs are not accessible.
#14307 closed
Jun 30, 2025 - [autoscaler] SSH command errors aren't written to monitor.out
#14298 closed
Jun 30, 2025 - [dashboard] Add resource usage/availability to the dashboard
#14292 closed
Jun 30, 2025 - [Core] Fix ray::Status <--> gRPC status interplay.
#14278 closed
Jun 30, 2025 - updating worker nodes show as healthy
#14232 closed
Jun 30, 2025 - [Object Spilling] Use subdirectories to avoid large top level inodes for file spilling
#14166 closed
Jun 30, 2025 - [tune] Stack Traces with Function API are really hard to parse
#14162 closed
Jun 30, 2025 - [Object Spilling] Plasma store probably doesn't respect the max shm size.
#14145 closed
Jun 30, 2025 - Latent bugs in command_runner.py
#14139 closed
Jun 30, 2025 - [rllib] undocumented behavior of timers/* in progress.csv
#14052 closed
Jun 30, 2025 - Graceful Placement Group Removal
#14045 closed
Jun 30, 2025 - Improve Docker manual setup document
#14030 closed
Jun 30, 2025 - [UX] Allow passing CPU and GPU to actor and task resources.
#13996 closed
Jun 30, 2025 - Remove cluster_synced_files and file_mounts_sync_continuously
#13967 closed
Jun 30, 2025 - [Object Spilling] Allow to specify max_disk_usage for file system spilling.
#13960 closed
Jun 30, 2025 - [Dashboard] add actor detail to experimental dashboard
#13875 closed
Jun 30, 2025 - ray.put() slows down over time.
#13612 closed
Jun 30, 2025 - [rllib]Action masking with tuple action space
#13592 closed
Jun 30, 2025 - [dask-on-ray] Remove internal Dask API dependencies from the Dask-on-Ray scheduler.
#13560 closed
Jun 30, 2025 - [core] GCS doesn't always cancel worker leases for killed actors
#13545 closed
Jun 30, 2025 - test_autoscaling_policy.py prints out huge pile of JsonErrors
#13433 closed
Jun 30, 2025 - Remove the RAY_CLIENT_MODE flag now that we don't need it
#13279 closed
Jun 30, 2025 - [Core] Make CoreWorker more unit-testable
#13268 closed
Jun 30, 2025 - Test S3 object spilling on multiple nodes with big data (streaming shuffle)
#13222 closed
Jun 30, 2025 - [core] RAY_HOME path is hardcoded
#13168 closed
Jun 30, 2025 - [Plasma Store]PlasmaClient::Get() return Status::OK() when timeout
#12995 closed
Jun 30, 2025 - Add dashboard to bazel target to avoid running manual build commands
#12956 closed
Jun 30, 2025 - Improve dashboard not found exception
#12955 closed
Jun 30, 2025 - Cannot save training episodes: "TypeError: Object of type ndarray is not JSON serializable"
#12951 closed
Jun 30, 2025 - [Object Spilling] Improve Read throughput
#12950 closed
Jun 30, 2025 - Startup log use autoscaler_log.out / err instead of monitor.log
#12884 closed
Jun 30, 2025 - [New scheduler] Don't assume 1-CPU tasks are feasible
#12870 closed
Jun 30, 2025 - Turn on Test_reference_counting
#12849 closed
Jun 30, 2025 - [Core] Locality-aware leasing: Milestone 3 - Spillback
#12815 closed
Jun 30, 2025 - [Autoscaler] Refactor bin packing routines in autoscaler for code clarity
#12723 closed
Jun 30, 2025 - [Core] Ray.get(timeout=0) doesn't work
#12680 closed
Jun 30, 2025 - [core] Is starvation possible for multi-driver on the same cluster?
#12667 closed
Jun 30, 2025 - GCS server ip error
#12639 closed
Jun 30, 2025 - [core] Support detached/GCS owned objects
#12635 closed
Jun 30, 2025 - [autoscaler] respect max_workers per node type when terminating nodes
#12634 closed
Jun 30, 2025 - [Cluster launcher] Command runner logs are improperly quoted when logged
#12631 closed
Jun 30, 2025 - permissions on rsync'd files are incorrect on worker nodes, results in inability to update workers
#12630 closed
Jun 30, 2025 - [tune] Full experiment checkpointing doesn't work with PBT
#12558 closed
Jun 30, 2025 - New workers are started slowly on a node if running workers >= `num_cpus`
#12525 closed
Jun 30, 2025 - [tune] get_checkpoint_paths fails due to glob command for .tune_metadata file
#12453 closed
Jun 30, 2025 - [New scheduler] Implement dynamic resources
#12433 closed
Jun 30, 2025 - [metrics] Investigate tracing visualization tools
#12314 closed
Jun 30, 2025 - [metrics] Utility to easily configure logging for a Ray job/actor/task
#12306 closed
Jun 30, 2025 - `ray dashboard` throws bad exception
#12246 closed
Jun 30, 2025 - [Object Spilling] Tune S3 performance + Add unit tests with moto3
#12232 closed
Jun 30, 2025 - Duplicated IDs are generated
#12197 closed
Jun 30, 2025 - [tune/logging] Warning for Tune
#12140 closed
Jun 30, 2025 - [tune] Restarted Trials Use Incorrect Command When Multiple Commands Run on Cluster/Runtime
#12048 closed
Jun 30, 2025 - [Object spilling] Move LocalObjectManager into the plasma store
#12042 closed
Jun 30, 2025 - [Object spilling] Improve OutOfMemory handling through better memory bookkeeping in plasma store
#12040 closed
Jun 30, 2025 - [Object Spilling] Use compression to reduce IO cost.
#11992 closed
Jun 30, 2025 - [Tune] Add more custom Error Types
#11871 closed
Jun 30, 2025 - Tune report histograms
#11797 closed
Jun 30, 2025 - Unable to create ActorHandle for already created inherited classes object list [java][ray]
#11715 closed
Jun 30, 2025 - Socket connections from GCS stuck in TIME_WAIT after actor death
#11713 closed
Jun 30, 2025 - [docs] tutorial for autoscaling (really basic version)
#11680 closed
Jun 30, 2025 - [flaky] test_multi_node/2 is flaky
#11663 closed
Jun 30, 2025 - [flaky] test_object_manager is flaky
#11661 closed
Jun 30, 2025 - [Core] Reduce the Redis connection per worker.
#11655 closed
Jun 30, 2025 - [flaky] gcs_server test is flaky
#11640 closed
Jun 30, 2025 - Use Pathlib instead of strings in Autoscaler
#11633 closed
Jun 30, 2025 - [tune] PopulationBasedTraining and Tensorboard HPARAMS
#11612 closed
Jun 30, 2025 - AWS Security group rule issue
#11601 closed
Jun 30, 2025 - [dask] Parquet write fails if directory does not exist in advance
#11566 closed
Jun 30, 2025 - [dask] Object store fills up too quickly in simple processing script
#11565 closed
Jun 30, 2025 - [dask/tune] Provide an example of using Dask on Ray with Tune
#11564 closed
Jun 30, 2025 - [tune] tutorial should indicate specific library version that we've tested against.
#11540 closed
Jun 30, 2025 - [Core] Raylet can schedule tasks from a dead driver.
#11520 closed
Jun 30, 2025 - `ray stop` should not kill all redis-server processes
#11513 closed
Jun 30, 2025 - [core] Track the number of connection and use shared pool whenever possible for grpc clients.
#11445 closed
Jun 30, 2025 - ray commandline tools raise exceptions if you forget the YAML config file
#11396 closed
Jun 30, 2025 - [Autoscaler] Placement group rescheduling over-allocates resources
#11372 closed
Jun 30, 2025 - how to add two-timescales Learning rate schedule in coustom policy?
#11328 closed
Jun 30, 2025 - [Autoscaler] Add additional gpu types to util.accelerators
#11160 closed
Jun 30, 2025 - [autoscaler] Worker node container is not removed after ray down?
#11098 closed
Jun 30, 2025 - `ray stop` should wait for processes to exit
#10955 closed
Jun 30, 2025 - [autoscaler] node type preferences
#10929 closed
Jun 30, 2025 - Private/onprem clusters always need explicit ssh_private_key in docker
#10838 closed
Jun 30, 2025 - [docs] Add examples for using custom resources
#10808 closed
Jun 30, 2025 - Autoscaler should set RAY_ADDRESS environment variable
#10752 closed
Jun 30, 2025 - Stop using `file_mounts` for ray_bootstrap_config & ray_bootstrap_key
#10743 closed
Jun 30, 2025 - [Java] Remove Java 9/10/11 warnings
#10673 closed
Jun 30, 2025 - [Documentation] need for default_resource_requests when using custom train function
#10572 closed
Jun 30, 2025 - [rllib] action from policy with Tuple action space has wrong shape
#10516 closed
Jun 30, 2025 - [tune] String summarization/representations for user objects
#10489 closed
Jun 30, 2025 - [tune] Add regression test for avoiding extraneous output
#10485 closed
Jun 30, 2025 - [GCS]Remove tightly coupled Redis code path from Python
#10359 closed
Jun 30, 2025 - [GCS]Support Sharding GCS server
#10358 closed
Jun 30, 2025 - [GCS]Support Multi-threaded GCS server.
#10357 closed
Jun 30, 2025 - [GCS]Support different backend for GCS instead of Redis
#10356 closed
Jun 30, 2025 - [metrics] Better way of grouping metric definitions
#10341 closed
Jun 30, 2025 - [Core] WorkerThreadContext semantics are incorrect for async Python actors.
#10324 closed
Jun 30, 2025 - [ray] Programatically expose the amount of memory available in the object store
#10278 closed
Jun 30, 2025 - [tune] Improve the serialization diagnoser by providing deeper introspection
#10263 closed
Jun 30, 2025 - [tune] Usability issues
#10248 closed
Jun 30, 2025 - [ray] Support mypy
#10244 closed
Jun 30, 2025 - Removed the following hyperparameter values when logging to tensorboard: ... [tune]
#10166 closed
Jun 30, 2025 - [dask-on-ray] ValueError on read-only memory
#10124 closed
Jun 30, 2025 - [cli/docs] Provide example commands in the CLI docstrings.
#10079 closed
Jun 30, 2025 - [Placement Group] Placement group dashboard
#9775 closed
Jun 30, 2025 - Ray issue with serializing pytorch objects only when running on 40+ cores
#9752 closed
Jun 30, 2025 - Ray typing IDE code completion support
#9623 closed
Jun 30, 2025 - [Cluster][Task Schedule] Remote function is not executing without any errors
#9598 closed
Jun 30, 2025 - [core] RayConfig does not get set properly after multiple `ray.init` calls
#9545 closed
Jun 30, 2025 - [New scheduler] Performance optimization
#9487 closed
Jun 30, 2025 - [New scheduler] Release testing
#9486 closed
Jun 30, 2025 - [Core] Core Worker Actor Handle GC.
#9342 closed
Jun 30, 2025 - Graph related applications
#9324 closed
Jun 30, 2025 - Options Support for Actor Methods
#9296 closed
Jun 30, 2025 - [ray] constant memory usage increase of actor using actor handle.
#9232 closed
Jun 30, 2025 - Invalid memory access in RedisAsioClient/RedisAsyncContext on shutdown
#9074 closed
Jun 30, 2025 - Performance issue with many large tasks on 10 node cluster.
#8950 closed
Jun 30, 2025 - Ray Dashboard Head-node CLI [autoscaler]
#8450 closed
Jun 30, 2025 - Support TPUs across all of Ray
#8260 closed
Jun 30, 2025 - [autoscaler] Api instead of CLI to interact with cluster.
#8036 closed
Jun 30, 2025 - Incorrect unreconstructable error message and raise different exception.
#7804 closed
Jun 30, 2025 - ray.wait hangs with no warning or error when local object store is too small to receive object
#7802 closed
Jun 30, 2025 - Segmentation Fault when using multiprocessing.Queue
#7793 closed
Jun 30, 2025 - ray.wait with local_mode=True blocks for a very long time
#7741 closed
Jun 30, 2025 - Awesome: algorithm selection helper & diagrams
#7722 closed
Jun 30, 2025 - Ray hangs when machine is disconnected from network
#7696 closed
Jun 30, 2025 - [docs] Clarify that in K8s the jobs need to be launched from the workers
#7188 closed
Jun 30, 2025 - [ray] tasks running in docker containers are not stopped on local cluster
#6898 closed
Jun 30, 2025 - [dist] Release notes for Java And other Languages
#6608 closed
Jun 30, 2025 - Ray.wait causes node to hang if there are too many object ids
#6403 closed
Jun 30, 2025 - Performance issues with defining remote functions and actor classes from within tasks.
#6240 closed
Jun 30, 2025 - TypeError: can't pickle CudnnModule objects
#5947 closed
Jun 30, 2025 - Profiling ray tasks includes ray initialization time
#5832 closed
Jun 30, 2025 - Make it possible to see resource deadlocks through web UI.
#5789 closed
Jun 30, 2025 - [autoscaler] Raise better error message if `ssh_user` is not correct
#5772 closed
Jun 30, 2025 - Code coverage tracker
#5473 closed
Jun 30, 2025 - [ray] ray misuse gpu in docker container
#5245 closed
Jun 30, 2025 - On a background thread, `ray.wait` doesn't timeout until another method on the actor is called
#4934 closed
Jun 30, 2025 - Ray is not propagating variable types correctly
#4463 closed
Jun 30, 2025 - [tune] Support nesting grid_search in lambdas
#3466 closed
Jun 30, 2025 - Retry policy when a worker crashes: a hook missing?
#2635 closed
Jun 30, 2025 - Task introspection
#2617 closed
Jun 30, 2025 - ray start does not restart failed processes
#2587 closed
Jun 30, 2025 - [rllib] flattening error in gym.spaces.Sequence
#45563 closed
Jun 30, 2025 - cannot import name 'EPISODE_RETURN_MEAN' from 'ray.rllib.utils.metrics'
#45453 closed
Jun 30, 2025 - error: No such option: --torch
#45452 closed
Jun 30, 2025 - [Core] Unable to run worker with virtual environment without installing dashboard
#45410 closed
Jun 30, 2025 - [RLlib] How to support gymnasium graph obs space?
#45290 closed
Jun 30, 2025 - Ray Cluster does not work across multiple docker containers
#45252 closed
Jun 30, 2025 - [Core] Worker crashes unexpectedly due to frequent triggering of OOM
#45244 closed
Jun 30, 2025 - Ray Cluster: Failed to create a ray cluster using running container
#45148 closed
Jun 30, 2025 - [Rllib] Rllib provides wrong state batch size during "bug check" batches on torch custom model
#45131 closed
Jun 30, 2025 - [RLlib] ValueError in initialization of ImpalaTF2Policy
#45050 closed
Jun 30, 2025 - [core] GcsSubscriber hangs in shutdown if the connection broke on MacOS
#45044 closed
Jun 30, 2025 - Workflow: Reading workflow status can lead to corrupted json reads.
#45027 closed
Jun 30, 2025 - [Core] `ray.wait` not actually wait until ready when the task is longer than 12 days
#44909 closed
Jun 30, 2025 - [Data] Add `delete_dir_contents` parameter to `FileDatasink`
#44794 closed
Jun 30, 2025 - [RLlib] PPO and framework=tf / issue with latest tensorflow 2.16.1
#44675 closed
Jun 30, 2025 - [RLlib] PPO reset_config() AttributeError: 'dict' object has no attribute '_enable_new_api_stack'
#44506 closed
Jun 30, 2025 - [RLlib] ReplayBuffer doesnt work with zero_init_states False when store rnn sequence
#44383 closed
Jun 30, 2025 - [Cluster, YARN with Skein] Ray cluster keeps crashing when running on YARN via Skein
#44112 closed
Jun 30, 2025 - [Ray Core] Ray nightly GPU docker image broken on NVIDIA V100 GPUs on AWS
#43565 closed
Jun 30, 2025 - Using RNN for RL
#43420 closed
Jun 30, 2025 - Core: ray.remote raises ValueError when used on torch IterableDataset
#42914 closed
Jun 30, 2025 - Core: Join zombie subprocesses after task completion
#42913 closed
Jun 30, 2025 - [Core] SIGSEGV when running Ray
#42868 closed
Jun 30, 2025 - [Core] Serialisation does not work with classes with `__init_subclass__`
#42823 closed
Jun 30, 2025 - Problem with YOLOv8 Hyperparameters tuning
#42770 closed
Jun 30, 2025 - [RLLIB] Passing configuration to Custom Environment in rllib is giving an error
#42753 closed
Jun 30, 2025 - [RLlib] Algorithms ES, A3C are deprecated and replacement does not exist in python package
#42579 closed
Jun 30, 2025 - [<Ray component: Core|RLlib|etc...>] Inite state of attention_net.py is empty
#42569 closed
Jun 30, 2025 - [<Ray component: Core|RLlib|etc...>] KeyError with RNN
#42501 closed
Jun 30, 2025 - [RLlib] gpu cannot enable
#42388 closed
Jun 30, 2025 - [<Ray component: Core|RLlib|etc...>] reslink in model
#42333 closed
Jun 30, 2025 - [RLlib] shape [] in Box action space not supported.
#42199 closed
Jun 30, 2025 - Building an executable using Ray and Cx_freeze
#42101 closed
Jun 30, 2025 - RichProgressBar in PyTorch Lightning only show progress at the very end
#42091 closed
Jun 30, 2025 - [<Ray component: Core|RLlib|etc...>] Channel errore
#42089 closed
Jun 30, 2025 - [Workflow] get_metadata() returns RUNNING instead of RESUMABLE status
#41980 closed
Jun 30, 2025 - Ray IDs vs endianness?
#41961 closed
Jun 30, 2025 - "RaySystemError: System error: Unknown error"
#41786 closed
Jun 30, 2025 - [RLlib] Value error while running DQN
#41559 closed
Jun 30, 2025 - Saving XGBoost model with json extension
#41374 closed
Jun 30, 2025 - [data] date32 and datetime64 handling should be the same
#41358 closed
Jun 30, 2025 - [RLlib] User guides are not ordered
#41340 closed
Jun 30, 2025 - error installing library
#41223 closed
Jun 30, 2025 - [core][state][dashboard] Better tasks info GC control at GCS
#41142 closed
Jun 30, 2025 - [RLLib] External simulator: mean episode reward is NaN due to done not set
#40954 closed
Jun 30, 2025 - [Core] - Cannot install in tiny core linux
#40832 closed
Jun 30, 2025 - [Tune|RLlib] Add error-tolerant version of PB2
#40787 closed
Jun 30, 2025 - [Ray Train] - Add Options to Save Last checkpoint in Ray Train Checkpointing Config
#40503 closed
Jun 30, 2025 - ray.init() can sometimes hang with a limited range specified for --worker-port-list
#40497 closed
Jun 30, 2025 - [Core] Dead session not closed
#40482 closed
Jun 30, 2025 - [RLlib][MBMPO] The algorithm does not learn as intended.
#40400 closed
Jun 30, 2025 - [Tune] Support for new algorithm: Cost-Aware Pareto Region Bayesian Search (CARBS).
#40356 closed
Jun 30, 2025 - [Workflow] Incorrectly set max_calls in options
#40252 closed
Jun 30, 2025 - [PPOConfig] Utilising new API/models without matching documentation
#40201 closed
Jun 30, 2025 - [Rllib] Tune locks up when attempting to create an rllib algorithm in a trainable
#40015 closed
Jun 30, 2025 - [Tune/Air] Memory Leak when using WandbLoggerCallback with Population Based Tuning
#40014 closed
Jun 30, 2025 - [RLlib] TD3/DDPG doesn't seem to respect action space bounds (at least initially)?
#40002 closed
Jun 30, 2025 - [RLLIB] Issue with AlphaZero algorithm Stateless CartPole
#39937 closed
Jun 30, 2025 - [RLLIB] Error in executing StatelessCartPole environment with AlphaZero
#39862 closed
Jun 30, 2025 - Allow train_loop_config to be a dataclass / pydantic model
#39824 closed
Jun 30, 2025 - [Core] ResolutionImpossible - Test requirements appear to not fit versions
#39782 closed
Jun 30, 2025 - Job history is lost when Ray cluster is restarted (via kuberay)
#39764 closed
Jun 30, 2025 - Ray::Tune::Logger::Tensorboardx
#39741 closed
Jun 30, 2025 - [Core] Upgrading grpc to 1.57.0 causes perf regressions
#39679 closed
Jun 30, 2025 - ray failed to register worker when I used vllm
#39618 closed
Jun 30, 2025 - [rllib] Action space MultiDiscrete([11 5 1 2]) is not supported for DQN
#39571 closed
Jun 30, 2025 - [RLlib] Support JAX-(numpy)-based envs.
#39528 closed
Jun 30, 2025 - [RLlib] Ray RLLib Dependencies Version Information
#39405 closed
Jun 30, 2025 - [RLlib] dreamerv3 causes debug code to be executed when running tune
#39302 closed
Jun 30, 2025 - [Core] CPP Interface crashes on Ray.Init()
#39252 closed
Jun 30, 2025 - ValueError: Must set agent_id on policy config
#39246 closed
Jun 30, 2025 - [Core] Actor retry count is consumed because the task is retried when actor is still alive.
#39110 closed
Jun 30, 2025 - [Core] Memory Leak
#38877 closed
Jun 30, 2025 - [Tune] Leaky core concepts in Ray Tune documentation
#38781 closed
Jun 30, 2025 - latest ray microbenchmark fails
#38758 closed
Jun 30, 2025 - Ray Memory Usage Keeps Increasing even after Manual Garbage Collection
#38730 closed
Jun 30, 2025 - [docs] Document Tune/Train placement group
#38706 closed
Jun 30, 2025 - ray/RLlib/offline/estimators
#38357 closed
Jun 30, 2025 - [Core] gcs_server Failed accept4: Too many open files
#38248 closed
Jun 30, 2025 - [RLlib] RL module and PPO implementation
#38012 closed
Jun 30, 2025 - [RLlib] ray 2.6 relies on tf.bool which does not exist in tensorflow 2.13
#37895 closed
Jun 30, 2025 - [RLlib] Sampler takes first step before next batch is requested
#37893 closed
Jun 30, 2025 - [Data] Ray 2.6 created a breaking change in the index of a Modin DataFrame
#37771 closed
Jun 30, 2025 - [Ray-Java client] Call actor report 'No module named' with py script
#37600 closed
Jun 30, 2025 - [Core] Ray cpp example, if not call ray::Shutdown when exit, will cause segment fault.
#37596 closed
Jun 30, 2025 - RLLib: Training Rllib-DDPG with custom environment leads error in Inference.
#37242 closed
Jun 30, 2025 - [<Ray component: autoscaler>] _load_kubernetes_defaults_config function is not yet made
#37033 closed
Jun 30, 2025 - [Core] No dependency on setuptools results in broken build
#36742 closed
Jun 30, 2025 - [data] Report actual task time and object sizes in Dataset.stats()
#36671 closed
Jun 30, 2025 - [CI][Docs] Example in Train FAQ is flakey
#36399 closed
Jun 30, 2025 - Be consistent on whether or not you include a dot at the end of a bullet list element.
#36308 closed
Jun 30, 2025 - [Core] ray.put and ray.get extremely slow with polars frames
#36068 closed
Jun 30, 2025 - [Core] DecodeError when `ray.put` a large (2GB) object
#35976 closed
Jun 30, 2025 - [Ray Core] There is a Exception error message bug which convert byte array to String.
#35880 closed
Jun 30, 2025 - System error: Ray has not been started yet. You can start Ray with 'ray.init()'
#35592 closed
Jun 30, 2025 - [Client] Dataset write_csv AttributeError: ‘Worker’ object has no attribute 'core_worker'
#35537 closed
Jun 30, 2025 - Ray: Data - Cannot read json its written to s3
#35501 closed
Jun 30, 2025 - [Core] `OwnerDiedError` if dataset owner actor handle get out of scope
#35262 closed
Jun 30, 2025 - [VM launcher] Automtically shut down the ec2 machine when I stop ray up in the middle
#35013 closed
Jun 30, 2025 - [Core] Incorrect detection of cpus
#34846 closed
Jun 30, 2025 - [Clusters] - Cannot switch off rsync during Cluster Launch with `ray up`
#34390 closed
Jun 30, 2025 - Azure autoscaler cannot create additional nodes
#34198 closed
Jun 30, 2025 - [Core] Error in external storage writing for object spilling
#33913 closed
Jun 30, 2025 - get_node_to_storage_syncer has an empty docstring
#33841 closed
Jun 30, 2025 - [ Core ] Correct usage of min/max-worker-port arguments
#33749 closed
Jun 30, 2025 - Core: nightly builds for macos only include an x86 _raylet.so even though they claim to be universal
#33720 closed
Jun 30, 2025 - [Core] The resources have minus values in ray status output
#33569 closed
Jun 30, 2025 - [tune] tqdm/Hyperopt-style TuneReporter for Databricks notebooks
#33519 closed
Jun 30, 2025 - [Core] Ray client doesn't support `should_capture_child_tasks_in_placement_group` API
#33513 closed
Jun 30, 2025 - [RLlib] DictFlatteningPreprocessor order is inconsistent leads to invalid mapping of OBS
#33327 closed
Jun 30, 2025 - [runtime env] Raise warning when using `runtime_env` with `local_mode=True`
#33260 closed
Jun 30, 2025 - [Train] Benchmark testing on Mosaic Composer with Ray
#32946 closed
Jun 30, 2025 - [Ray Core] Actor Handles not properly passed to Actors created by other Actors
#32848 closed
Jun 30, 2025 - [RLlib] A3C has problems with the horizon option removed
#32812 closed
Jun 30, 2025 - [Core][Object Store] Object Store to manage files in the cluster
#32694 closed
Jun 30, 2025 - [Clusters] [KubeRay] problem with pending actors' pods in Kubernetes
#32651 closed
Jun 30, 2025 - [core] Lock contention when submitting actor task on the client queue
#32595 closed
Jun 30, 2025 - [Core] Install via `pip` fails, install with `conda` crashes worker and exits
#32423 closed
Jun 30, 2025 - [Core] "ImportError: No module named ray" when using `ray submit`
#31924 closed
Jun 30, 2025 - [workflow] memory leakage
#31819 closed
Jun 30, 2025 - [Clusters] [RLlib] Trainer Object running on Worker node & RolloutWorker running on Head node
#31808 closed
Jun 30, 2025 - [runtime envs] Ray Client Server failed when starting
#31622 closed
Jun 30, 2025 - Huge numbers of "deleted" files with open processes left after Ray Tune run
#31556 closed
Jun 30, 2025 - [RLlib] Pytorch multiple optimizers
#31428 closed
Jun 30, 2025 - In the docker bridge mode, pulling the actor on a non head node fails.
#31308 closed
Jun 30, 2025 - [Dashboard] Head node exited unexceptly because of dashboard process exited
#31261 closed
Jun 30, 2025 - [CORE] Unable to run celery task containing ray tasks
#31157 closed
Jun 30, 2025 - [core] Segfaults when restarting Ray multiple times in unit tests with background threads running
#31145 closed
Jun 30, 2025 - [core] Error with Slurm: No available node types can fulfill resource request {'node:<ip>': 0.01}.
#31135 closed
Jun 30, 2025 - [RLlib] Not able to save evaluation recording videos
#30949 closed
Jun 30, 2025 - [Ray Job] SchedulingCancelled for JobSupervisor Actor
#30898 closed
Jun 30, 2025 - [Ray client] Ray Zombie Process Issue
#30894 closed
Jun 30, 2025 - [Devprod] Bazel reports an error when compiling as a non-root user
#30885 closed
Jun 30, 2025 - [core] Disk full error logging is verbose
#30833 closed
Jun 30, 2025 - [RLlib] Error when running RLlib
#30412 closed
Jun 30, 2025 - [Cluster Launcher] `ray dashboard` CLI command does not stop port-forwarding after Ctrl+C
#30385 closed
Jun 30, 2025 - [autoscaler] AWS Single Sign-On support
#30064 closed
Jun 30, 2025 - [Autoscaler][GCP] Autoscaler crashing on GCP with error 404.
#30050 closed
Jun 30, 2025 - Setting some system configs causes Ray to fail to start
#29841 closed
Jun 30, 2025 - [Ray Cluster] Assigning all host GPUs into head node without nvidia.com/gpu present
#29753 closed
Jun 30, 2025 - [gcp] "No such container" error after ray up
#29671 closed
Jun 30, 2025 - [Tune] Passing a handle to grid search cause trials to get stuck in running and pending mode
#29545 closed
Jun 30, 2025 - [Serve] `ServeHandles` fail if GCS crashes before first request
#29539 closed
Jun 30, 2025 - [Core] inspect_serializability bug - parent object serializable but bound method not
#29423 closed
Jun 30, 2025 - [Core] Ray doesn't shutdown properly on KeyboardInterrupt
#29384 closed
Jun 30, 2025 - [Serve] Unable to upload current working directory
#29354 closed
Jun 30, 2025 - [core][observability] Improving reliability of memory_summary API call
#29329 closed
Jun 30, 2025 - InvalidLocationConstraint Message: The specified location-constraint is not valid for storage option
#29309 closed
Jun 30, 2025 - [Core] Worker pool didn't prestart num_cpus workers
#29162 closed
Jun 30, 2025 - [core] use proto for oom error / node died error in the frontend
#28907 closed
Jun 30, 2025 - [ray client] surface ray client logs better
#28890 closed
Jun 30, 2025 - [Backlog][Collective] Facilitate NCCL test in ray cluster
#28860 closed
Jun 30, 2025 - [core/k8s/GKE] Ray schedules actors on pods/nodes that are shutting down
#28852 closed
Jun 30, 2025 - [AIR] [Tune] Don't add random hash to trial id for single trial
#28830 closed
Jun 30, 2025 - [P0] test_submit_cpp_job failed in osx
#28592 closed
Jun 30, 2025 - [Ray: Core] - Unable to enable TLS on the ray head node
#28534 closed
Jun 30, 2025 - Dashboard / Jobs RegexMatcher ignores "includes".
#28502 closed
Jun 30, 2025 - [Core, RLlib] RLlib uses Metal GPU even when told not to
#28385 closed
Jun 30, 2025 - [Core] Actor methods will be modified for tracing even if tracing is not enabled.
#28293 closed
Jun 30, 2025 - [Runtime] Improve runtime environment error message when virtualenv version is too old
#28232 closed
Jun 30, 2025 - [Core] Multi-Threaded Actors are Un-Killable
#28086 closed
Jun 30, 2025 - [Autoscaler] Assigning None to optional keys leads to failure
#28012 closed
Jun 30, 2025 - [Core] Can't pickle objects defined in top-level environment
#28000 closed
Jun 30, 2025 - [Doc] [Serve] Serve Loki monitoring tutorial screenshot has outdated API
#27453 closed
Jun 30, 2025 - [core] Very slow task scheduling during Dataset.sort on 100TB
#27410 closed
Jun 30, 2025 - Is Ray going to support Weighted Quantile Sketches or Quantile Sketches?
#27363 closed
Jun 30, 2025 - [Core] Raylet continually exiting on worker in docker
#26576 closed
Jun 30, 2025 - Tensorboard with Docker from Ray dashboard, tune tab cannot be accessed
#26325 closed
Jun 30, 2025 - [RLlib] Eval episode runs forever if Env doesn't terminate properly
#26241 closed
Jun 30, 2025 - [Ray Client] Using many concurrent client connections results in deadlock/hanging
#26144 closed
Jun 30, 2025 - [Core][HA] Actor entries are not deleted from the storage permanently if GCS is crashed.
#26114 closed
Jun 30, 2025 - Unclear error when using generator tasks
#25836 closed
Jun 30, 2025 - [Core] SIGSEGV when I run experimental shuffle command.
#25650 closed
Jun 30, 2025 - [Core][Metrics] Prometheus-client not working with the latest version.
#25523 closed
Jun 30, 2025 - [core] Scheduler stalls during shuffle reduce stage with 100k concurrent tasks or more
#25412 closed
Jun 30, 2025 - [AIR] Utilities to go from Predictor to `BatchPredictor` and `ModelWrapperDeployment`
#24977 closed
Jun 30, 2025 - [Train/AIR] Ray Train actors still use up resources after Notebook cell is stopped
#24947 closed
Jun 30, 2025 - [Core] Failed to delete named actor in client mode
#24906 closed
Jun 30, 2025 - [AIR] Add a `reconfigure` option to `ModelWrapperDeployment`
#24869 closed
Jun 30, 2025 - [core] Uninformative error for unserialisable objects
#24863 closed
Jun 30, 2025 - [Serve] Prototype C++ Worker in Serve
#24738 closed
Jun 30, 2025 - [Core] Spilling performance regression in large-scale shuffle
#24667 closed
Jun 30, 2025 - [Core] Restore objects directly from S3
#24581 closed
Jun 30, 2025 - [runtime env] `serialized_env` used as ID, but identical envs can produce different `serialized_env`
#24515 closed
Jun 30, 2025 - [Core] No overloads for "remote" match the provided arguments
#24371 closed
Jun 30, 2025 - Workflows: Type stubs are incorrect: argument missing for parameter status_filter
#24367 closed
Jun 30, 2025 - [Core] /api/cluster_status treats placement groups differently than ray status
#24309 closed
Jun 30, 2025 - [Core] Restore worker silently fails and the program is stuck
#24248 closed
Jun 30, 2025 - [RLlib][Bug] duplicate action unsquashing in DDPG / TD3 policy
#24213 closed
Jun 30, 2025 - [Tune] support for FIRE PBT
#24137 closed
Jun 30, 2025 - [Tune] Tune Job hangs out and can't finish the tune job
#23858 closed
Jun 30, 2025 - [Workflows] Can't use custom storage backends
#23831 closed
Jun 30, 2025 - [RLlib] Add Option for Custom Sample Preprocessing when Sampling from Replay Buffer
#23815 closed
Jun 30, 2025 - [Core][Bug] global-scoped actor handles/Ray objects prevents Ray workers from being destructed.
#23677 closed
Jun 30, 2025 - [runtime env] `zip_directory` `excludes` parameter doesn't work with absolute paths
#23473 closed
Jun 30, 2025 - [Train] [Feature] Print useful traceback on SIGINT
#23148 closed
Jun 30, 2025 - [Train] [Docs] Document how to change logging verbosity
#23147 closed
Jun 30, 2025 - [docs][Bug] Workflow docs have few typos and type issue
#23113 closed
Jun 30, 2025 - [tune][Feature] add tune.choices to select multiple values from a search space
#23001 closed
Jun 30, 2025 - Ray Train / Tune - W&B logger documentation
#22881 closed
Jun 30, 2025 - [Train] update `logdir` relative path
#22753 closed
Jun 30, 2025 - [Bug][placement groups] Actor scheduling does not respect placement_group=None
#22742 closed
Jun 30, 2025 - [Train] Add flags to disable creating log directories
#22261 closed
Jun 30, 2025 - [RLLib] Workers died at the initialization stage when the observation space is a 3D shape
#22033 closed
Jun 30, 2025 - [Train] Automatically choose number of workers
#21987 closed
Jun 30, 2025 - [Serve] The adjustment about Ray Serve Java Proxy and Java Replica
#21694 closed
Jun 30, 2025 - [C++] Cluster Mode Tests Should have 1 test per feature tested
#21454 closed
Jun 30, 2025 - [Tune] [Bug] lazily expand directories for client compatibility
#21408 closed
Jun 30, 2025 - [Tune] Issue on page /tune/tutorials/tune-pytorch-lightning.html
#21354 closed
Jun 30, 2025 - [Bug] Got stuck when running python script from a shell script
#21298 closed
Jun 30, 2025 - [Bug] [Tune] pbt run_experiments not stable, some trial will error.
#21259 closed
Jun 30, 2025 - [Train] Document Callbacks
#21066 closed
Jun 30, 2025 - [Feature] Single source of truth for Ray version in Java `pom.xml` and `pom_template.xml` files
#21059 closed
Jun 30, 2025 - [Test Bug] Matching `psutil.Process.name()` doesn't work on macOS
#20982 closed
Jun 30, 2025 - [Bug] Incorrect promise usage that causes infinite blocking calls
#20899 closed
Jun 30, 2025 - We encountered the cast exception after we got result from ray actor task
#20369 closed
Jun 30, 2025 - [Train] Refactor `TrainingIterator` result processing logic
#20330 closed
Jun 30, 2025 - [tsan] Add TSAN CI build that runs basic Python tests
#20080 closed
Jun 30, 2025 - [tsan] Race in census SetGlobalTags
#20079 closed
Jun 30, 2025 - [tsan] Race accessing global stats objects
#20078 closed
Jun 30, 2025 - [tsan] Several global config variables accessed unsafely
#20077 closed
Jun 30, 2025 - Support working_dir=None for skipping packaging upload/download
#19962 closed
Jun 30, 2025 - [Bug] Placement group removal refinement
#19937 closed
Jun 30, 2025 - [Feature] Able to access objects put in cross language
#19873 closed
Jun 30, 2025 - [Bug] Improve RuntimeEnvSetupError message
#19824 closed
Jun 30, 2025 - [Serve] Test KVStore early in constructor init.
#19714 closed
Jun 30, 2025 - [Bug] [Workflow] ray.wait on workflow result doesn't work as expected
#19295 closed
Jun 30, 2025 - [tune] MLFlowLogger doesn't save artifacts for remote mlflow tracking_uri
#19263 closed
Jun 30, 2025 - [Bug] [XLang] Segfault when Java returns void
#18837 closed
Jun 30, 2025 - [Bug] tensorboardX vs tensorboard?
#18727 closed
Jun 30, 2025 - Dashboard exposes redis PW on the command line
#18491 closed
Jun 30, 2025 - Race condition of grpc backpressure
#18439 closed
Jun 30, 2025 - [Core] Task spec including inlined objects can crash lease request RPCs.
#18194 closed
Jun 30, 2025 - [Runtime Env] Setup process doesn't have CPU limit
#18137 closed
Jun 30, 2025 - ray.init with address crashes process outside of cluster
#17769 closed
Jun 30, 2025 - new dashboard agent port conflict issues
#17498 closed
Jun 30, 2025 - [Core] Unable to get actor handle of global named actor created in java from python in Ray 1.4.0
#16436 closed
Jun 30, 2025 - [serve] java api
#16393 closed
Jun 30, 2025 - [serve] java serve handle
#16392 closed
Jun 30, 2025 - [serve] java http proxy
#16391 closed
Jun 30, 2025 - [Shuffle] non-streaming consumed bytes are too low compared to spilled / restored bytes.
#16149 closed
Jun 30, 2025 - [ray] Multiple concurrent requests to create a named actor crash GCS
#15941 closed
Jun 30, 2025 - Remove unused util functions for conda environments
#15912 closed
Jun 30, 2025 - [core] Zero-gpu node shouldn't be marked with accelerator_type resource.
#15878 closed
Jun 30, 2025 - Cannot use external model with cuda when using ray
#15869 closed
Jun 30, 2025 - [wheel][doc] Make it easier to access Ray wheels for specific commits
#15765 closed
Jun 30, 2025 - [rllib]Update the docs about Variable-length / Parametric Action Space
#15710 closed
Jun 30, 2025 - Odd task scheduling behavior on same node
#15602 closed
Jun 30, 2025 - Averaging learning curves over repetitions + plotting confidence intervals [Tune]
#15400 closed
Jun 30, 2025 - AssertionError when using pyinstaller with ray
#15396 closed
Jun 30, 2025 - [core] Memory leak when using local simulated cluster (long_running_tests/workloads/apex.py)
#15305 closed
Jun 30, 2025 - [Core] Bad traceback on failure to reconnect to GCS server.
#15235 closed
Jun 30, 2025 - [metrics] Custom sum metrics have type comment "gauge"
#15150 closed
Jun 30, 2025 - [core] Actor restart does not work when owner dies and constructor task has dependencies
#15076 closed
Jun 30, 2025 - [k8s] ray down command does not remove pods which are in evicted state
#14958 closed
Jun 30, 2025 - [Tune] [Ray Client] tune_cifar10_gluon example fails with Ray Client
#14946 closed
Jun 30, 2025 - [ray white paper] broken links
#14897 closed
Jun 30, 2025 - Fix Asyncio Event Metrics on Java
#14715 closed
Jun 30, 2025 - Add ray.__wheel__ with a link to the wheel to install the same version
#14623 closed
Jun 30, 2025 - Pre-push hooks allow code to be pushed that fails LINT
#14367 closed
Jun 30, 2025 - __del__ magic method can't access class properties
#14285 closed
Jun 30, 2025 - Failed to load actor due to dependencies not being pickled
#14284 closed
Jun 30, 2025 - optimization: Client blocks on releasing references due to detached actor race condition
#14137 closed
Jun 30, 2025 - [autoscaler] request resources doesn't work with multiple jobs
#13534 closed
Jun 30, 2025 - [Metrics] Custom metrics don't work after calling `ray.shutdown()` followed by `ray.init()`
#13532 closed
Jun 30, 2025 - Unify linting of clang-format and *.proto files
#13465 closed
Jun 30, 2025 - Hang or Deadlock when calling ray.get() inside pytorch Dataset when DataLoader with num_workers >0
#13407 closed
Jun 30, 2025 - [core] Unwanted pickling behaviour when starting remote actor with @property
#13365 closed
Jun 30, 2025 - Explore Protos as the Ray Client pickle transport (instead of namedtuples)
#13280 closed
Jun 30, 2025 - SIGKILL generates core dumps on some systems
#13221 closed
Jun 30, 2025 - Object store thrashing if it runs ray.get in a non-main thread.
#12906 closed
Jun 30, 2025 - Canonicalize the python lint options
#12801 closed
Jun 30, 2025 - [autoscaler] refactor duplicate code for handling request_resources().
#12699 closed
Jun 30, 2025 - [Dashboard]Profile Actor Button Not Working
#12668 closed
Jun 30, 2025 - [core] bytearray is parsed as bytes in remote function
#12648 closed
Jun 30, 2025 - Ray grinds to a halt if both PyTorch and TensorFlow are installed
#12467 closed
Jun 30, 2025 - Ray does not handle MIG devices
#12413 closed
Jun 30, 2025 - [tune] progress reporter should limit table to 80char
#12374 closed
Jun 30, 2025 - [serve] Distributed Tracing Support in Serve
#12320 closed
Jun 30, 2025 - [metrics] Replace ray timeline with distributed tracing
#12315 closed
Jun 30, 2025 - [metrics] Support filtering logs streamed to driver by actor/task
#12305 closed
Jun 30, 2025 - [serve] Support more expressive policies for choosing replicas
#12296 closed
Jun 30, 2025 - [Tune] [PBT] Automatic experiment restart for synch=True
#12122 closed
Jun 30, 2025 - [tune] [wandb] Experiment checkpointing fails with `WandbTrainableMixin`
#11917 closed
Jun 30, 2025 - [tune] quniform distribution
#11879 closed
Jun 30, 2025 - [docs] improve tune distributed tuning guide
#11681 closed
Jun 30, 2025 - [tune] doc should indicate print output
#11679 closed
Jun 30, 2025 - [cli] attach `--tmux` should show parallel command output
#11678 closed
Jun 30, 2025 - [tune] Client API improvements
#11676 closed
Jun 30, 2025 - [cloudpickle] Too much override for cloudpickle, breaks scikit-learn usage
#11547 closed
Jun 30, 2025 - [docs] search results don't link to correct tab
#11288 closed
Jun 30, 2025 - [Autoscaler] Prioritize infeasible bundles and placement group rescheduling
#11259 closed
Jun 30, 2025 - Remove the `remove_after_get` flag
#10977 closed
Jun 30, 2025 - [placement groups] Feasibility Check
#10913 closed
Jun 30, 2025 - [autoscaler] Add unit tests for sdk.py
#10903 closed
Jun 30, 2025 - Add testing to `commands.py`/`NodeUpdaterThread` level
#10846 closed
Jun 30, 2025 - Treat CPUs as abstract resources
#10818 closed
Jun 30, 2025 - Installing ray on powerpc
#10774 closed
Jun 30, 2025 - [core] [docs] use-cases for Ray's async support
#10688 closed
Jun 30, 2025 - Exceptions and ResourceWarnings on ray.init (Jupyter+offline)
#10279 closed
Jun 30, 2025 - Can CPU resource scheduling be scheduled through Cgroup?
#10037 closed
Jun 30, 2025 - Windows debugging on gdb does not work
#9827 closed
Jun 30, 2025 - [util.multiprocessing] Support generators
#9712 closed
Jun 30, 2025 - [Core] A ray.remote flag for nested object ID gathering in task arguments.
#9489 closed
Jun 30, 2025 - [docs] ray up <config.xml> --help does not show help
#9455 closed
Jun 30, 2025 - [docs] Document how to use conda environments with the autoscaler
#9199 closed
Jun 30, 2025 - [ray] Visualize Ray dashboard locally/offline
#9095 closed
Jun 30, 2025 - Confusing RedisError when many threads are used
#9083 closed
Jun 30, 2025 - [autoscaler] Check failed: _s.ok() Heartbeat failed: NotImplemented
#8883 closed
Jun 30, 2025 - Can't parallelize non-pickable function with initializer in Pool
#8876 closed
Jun 30, 2025 - DQN Minibatch Option
#8870 closed
Jun 30, 2025 - tune: module 'tensorflow' has no attribute __version__ in Ray Trainable since v0.7.7
#8729 closed
Jun 30, 2025 - Blank redis-password gives wrong message to add node
#8629 closed
Jun 30, 2025 - absl.logging inside remote tasks does not get printed
#8625 closed
Jun 30, 2025 - Invalid iterator dereference in TestReconstructionChain (fails in debug mode)
#8587 closed
Jun 30, 2025 - Can't pickle CudnnModule objects
#8569 closed
Jun 30, 2025 - Reducing unnecessary process overhead in practice
#8522 closed
Jun 30, 2025 - incompatible with 'msgpack_numpy.patch()' function
#8409 closed
Jun 30, 2025 - Error connecting to Redis server at 127.0.0.1:35709
#8389 closed
Jun 30, 2025 - Error while shutting down Ray
#8385 closed
Jun 30, 2025 - [ray] Pyarmor compatibility
#8365 closed
Jun 30, 2025 - [ray] Can RAY pause and continue tasks distributed to the cluster's nodes?
#8263 closed
Jun 30, 2025 - [tune] unify run() and run_experiments()
#8127 closed
Jun 30, 2025 - [ui] More metadata for the task timeline
#8050 closed
Jun 30, 2025 - [tune] Support for config to (optionally) be an argparse.Namespace?
#8006 closed
Jun 30, 2025 - [tune] Resource Allocation UX
#7968 closed
Jun 30, 2025 - `pandas has no attribute 'compat'` Deserialization bug when running tasks very rarely
#7879 closed
Jun 30, 2025 - "Lost reference to actor" when returning actor handle from actor
#7815 closed
Jun 30, 2025 - Ray has both ray.util and ray.utils, which is confusing.
#7787 closed
Jun 30, 2025 - Provide more scheduling algorithms for actors/tasks
#7723 closed
Jun 30, 2025 - [ray] Object store shared memory numpy leak in worker loop
#7653 closed
Jun 30, 2025 - Ray processes on slave node become defunct when the head node is restarted/stopped
#7651 closed
Jun 30, 2025 - Relax python version match requirement when joining a cluster
#7648 closed
Jun 30, 2025 - Does ray workers could share the same tf.sess?
#7646 closed
Jun 30, 2025 - About model configuration.
#7644 closed
Jun 30, 2025 - Probable race condition
#7617 closed
Jun 30, 2025 - Recursion with pickling in ray.init with py3.5
#7605 closed
Jun 30, 2025 - Is it possible to create process inside ray Actor?
#7578 closed
Jun 30, 2025 - Why seems getting from local object store not faster than getting from remote object store?
#7575 closed
Jun 30, 2025 - [util.multiprocessing] Unable to pass Queue to pool.apply_async
#7561 closed
Jun 30, 2025 - Keyword arguments should be keyword only arguments in the Ray API
#7548 closed
Jun 30, 2025 - [Pool] About using ray.util.multiprocessing import Pool
#7542 closed
Jun 30, 2025 - Reporting Reward Breakdowns
#7518 closed
Jun 30, 2025 - [config] Introduce a configuration library for unified configuration code
#7485 closed
Jun 30, 2025 - Proper way of calling a class method in another method
#7450 closed
Jun 30, 2025 - [autoscaler] Provide ability to provide elastic ip when launching cluster
#7446 closed
Jun 30, 2025 - What are system requirements for building on Mac OSX
#7430 closed
Jun 30, 2025 - Ray dashboard integration
#7383 closed
Jun 30, 2025 - Do not suggest calling __ray_terminate__ directly
#7382 closed
Jun 30, 2025 - ray.services.get_node_ip_address doesn't work well if there is a local proxy
#7316 closed
Jun 30, 2025 - Provide abstraction/interface to implement resource isolation for custom resources
#7204 closed
Jun 30, 2025 - [cross-language]Problem about cross language data layout
#7191 closed
Jun 30, 2025 - Documentation for connecting to ray cluster could be improved
#7186 closed
Jun 30, 2025 - ray.experimental.queue is very slow
#7172 closed
Jun 30, 2025 - Using asserts for argument checks is probably a bad idea
#7171 closed
Jun 30, 2025 - [core] Gets timeout on randomly generated ObjectIDs
#7074 closed
Jun 30, 2025 - Allow remote functions to require running on a fresh worker
#7059 closed
Jun 30, 2025 - How to use Ray with closures?
#7055 closed
Jun 30, 2025 - The project `setup.py` script doesn't install tools needed by `ci/travis/format.sh`
#6999 closed
Jun 30, 2025 - Don't run Java or sanitizer tests when only Python changes.
#6992 closed
Jun 30, 2025 - ray plasma object store connection refused after 24hrs
#6988 closed
Jun 30, 2025 - Sharing in memory
#6976 closed
Jun 30, 2025 - [ray] ray on slurm not respecting memory limits
#6968 closed
Jun 30, 2025 - Unable to override ray's default logging format
#6965 closed
Jun 30, 2025 - MADDPG used onto a MultiEnv does not show learning.
#6949 closed
Jun 30, 2025 - How to throttle process to avoid "UnreconstructableError"
#6892 closed
Jun 30, 2025 - pip install from source requires --editable/-e flag
#6845 closed
Jun 30, 2025 - [scheduling] Default actor lifetime resources (0 CPUs) cause cluster not to be saturated
#6814 closed
Jun 30, 2025 - How to Reduce Memory Usage for Creating Actor?
#6778 closed
Jun 30, 2025 - Reconstruction semantics around failing actor constructor.
#6768 closed
Jun 30, 2025 - [Deploy]Ray on Yarn Deployment
#6753 closed
Jun 30, 2025 - failed on virtual environment
#6735 closed
Jun 30, 2025 - Managing memory during long loops
#6717 closed
Jun 30, 2025 - Not able to reproduce speed performance improvements using ray on my machine
#6716 closed
Jun 30, 2025 - [tune] Logs don't sync up to workers on restore
#6702 closed
Jun 30, 2025 - The remote_function.options is not documented.
#6699 closed
Jun 30, 2025 - [tune] More robust checkpoint garbage collection
#6697 closed
Jun 30, 2025 - Fault tolerance to dead actors
#6670 closed
Jun 30, 2025 - ray.wait's num_returns should not fail if num_returns > len(results)
#6667 closed
Jun 30, 2025 - Parallel execution of multiple dataframes by dividing them into sub-frames
#6640 closed
Jun 30, 2025 - Batch Norm example failing under APEX
#6638 closed
Jun 30, 2025 - limiting tensorflow memory failed in actor or function
#6633 closed
Jun 30, 2025 - Remote function executed in python `exec` with empty local/global fails
#6620 closed
Jun 30, 2025 - [tune] Estimate timing
#6618 closed
Jun 30, 2025 - [streaming] Add micro batching feature
#6607 closed
Jun 30, 2025 - Package reference should include task & actor APIs
#6566 closed
Jun 30, 2025 - Serialization is 20% slower from 0.7.6 -> 0.7.7
#6551 closed
Jun 30, 2025 - [ray] How to write into numpy arrays in shared memory with Ray?
#6507 closed
Jun 30, 2025 - Support for mxnet.ndarray?
#6494 closed
Jun 30, 2025 - [ray] Handle memory pressure more gracefully
#6458 closed
Jun 30, 2025 - Reloading module changes in workers
#6449 closed
Jun 30, 2025 - [tune] [serve] Don't use daemon threads
#6421 closed
Jun 30, 2025 - Terminal freezes after setting @ray.remote(num_gpu=2)
#6418 closed
Jun 30, 2025 - Ray does not preserve requires_grad attribute
#6405 closed
Jun 30, 2025 - Ray over mpi for supercomputers
#6344 closed
Jun 30, 2025 - Support of Ray Decorator for Built in Functions
#6308 closed
Jun 30, 2025 - [docs] Issue on `tune-schedulers.rst`
#6063 closed
Jun 30, 2025 - Can I set priority for my tasks
#6057 closed
Jun 30, 2025 - Avoid putting the redis password in plain text in processlist
#5872 closed
Jun 30, 2025 - Handling `use_pickle=True` with pickle5 serializer and performance regression
#5856 closed
Jun 30, 2025 - Install ray with conda but not pip
#5511 closed
Jun 30, 2025 - [tune] saving mechanism and PBT
#5312 closed
Jun 30, 2025 - Feature request: An API to wait until there are X resources available
#5243 closed
Jun 30, 2025 - [Feature request] Also expose python function after decorating with ray.remote
#4981 closed
Jun 30, 2025 - Creative action space support: contains method, action interpolation.
#4837 closed
Jun 30, 2025 - __module__ can be None
#4758 closed
Jun 30, 2025 - [autoscaler] Autoscaler UX Issues
#4656 closed
Jun 30, 2025 - [autoscaler] Add tests that mock endpoints for AWS, GCE
#4303 closed
Jun 30, 2025 - Python Worker class should have proper constructor and destructor.
#3961 closed
Jun 30, 2025 - Should not ignore "AttributeError"
#3820 closed
Jun 30, 2025 - Backend timing statements should be made type safe.
#3341 closed
Jun 30, 2025 - Make it possible to limit memory usage of processes
#3055 closed
Jun 30, 2025 - Task submission from local scheduler client is blocking
#2940 closed
Jun 30, 2025 - Add test for numpy array alignment.
#2937 closed
Jun 30, 2025 - Allow ray.get and ray.wait to take in additional argument types
#2126 closed
Jun 30, 2025 - Remove the import thread from the workers and driver.
#951 closed
Jun 30, 2025 - Remote decorator fails on jitted function.
#593 closed
Jun 30, 2025 - Actors do not work properly with subclasses that call super.
#449 closed
Jun 30, 2025 - Methods on actors inherited from built-in classes are not visible
#278 closed
Jun 30, 2025 - Release test random_shuffle_fixed_size failed
#53806 closed
Jun 30, 2025 - Assessment of the difficulty in porting CPU architecture for Ray
#54162 closed
Jun 30, 2025 - CI test linux://python/ray/data:test_arrow_block_scaling is flaky
#54110 closed
Jun 30, 2025 - [Epic][Docs/KubeRay] Convert doctests back to normal markdown docs
#54072 closed
Jun 28, 2025 - [Docs][KubeRay] Convert configuring-autoscaling.ipynb back to markdown docs
#54077 closed
Jun 28, 2025 - [Docs][KubeRay] Convert rayservice-quick-start.ipynb back to markdown docs
#54076 closed
Jun 28, 2025 - [Docs][KubeRay] Convert raycluster-quick-start.ipynb back to markdown docs
#54074 closed
Jun 28, 2025 - CI test linux://rllib:examples/connectors/multi_agent_with_different_observation_spaces is flaky
#53473 closed
Jun 28, 2025 - [Data] When writing to BigQuery, Google's "TooManyRequests" exception is not retried
#53997 closed
Jun 28, 2025 - [RayLLM] RayLLM / vLLM production stack integration
#53331 closed
Jun 27, 2025 - [Core] Ray fails to fulfill request due to node being annotated by IP address
#54152 closed
Jun 27, 2025 - [Docs][KubeRay] Delete KubeRay doctests
#54073 closed
Jun 27, 2025 - CI test linux://:local_object_manager_test is flaky
#54131 closed
Jun 27, 2025 - [Data] `ArrowInvalid` during `ray.data.from_huggingface`: Parquet magic bytes not found in footer
#54101 closed
Jun 27, 2025 - CI test linux://rllib:examples/algorithms/vpg_custom_algorithm is flaky
#53925 closed
Jun 27, 2025 - CI test linux://rllib:examples/algorithms/appo_custom_algorithm_w_shared_data_actor is flaky
#53176 closed
Jun 27, 2025 - CI test linux://:node_manager_test is flaky
#54059 closed
Jun 27, 2025 - [Serve] UnboundLocalError: local variable 'stopped' in deployment state
#54169 closed
Jun 27, 2025 - [Core] Exiting because this node manager has mistakenly been marked as dead by the GCS
#54035 closed
Jun 27, 2025 - [bug][serve.llm] AssertionError: failed to get the hash of the compiled graph (VLM, batch, TP=2)
#53824 closed
Jun 27, 2025 - [Serve, LLM] missing botocore dependency!
#53052 closed
Jun 27, 2025 - Error Handling Large Pyarrow Chunk
#53536 closed
Jun 26, 2025 - CI test linux://python/ray/train/v2:test_controller is consistently_failing
#54147 closed
Jun 26, 2025 - [Serve][LLM] Qwen3 models “enable_thinking: False” still returns thinking process
#52979 closed
Jun 26, 2025 - [Core] Ray fails to fulfill request due to node being annotated by IP address
#54150 closed
Jun 26, 2025 - [Docs][KubeRay] Convert kuberay-gcs-ft.ipynb back to markdown docs
#54078 closed
Jun 26, 2025 - [Docs][KubeRay] Convert rayjob-quick-start.ipynb back to markdown docs
#54075 closed
Jun 26, 2025 - CI test darwin://python/ray/tests:test_basic_3_client_mode is consistently_failing
#54126 closed
Jun 26, 2025 - CI test windows://python/ray/tests:test_basic_3_client_mode is consistently_failing
#54132 closed
Jun 26, 2025 - [Core] Transient network failure on RPC `MarkJobFinished` causes node crash
#53645 closed
Jun 26, 2025 - CI test linux://python/ray/tests:test_basic_3_client_mode is consistently_failing
#54119 closed
Jun 26, 2025 - [Doc] The anchors of headers don't follow Vale rules.
#53516 closed
Jun 26, 2025 - CI test linux://:local_object_manager_test is flaky
#54130 closed
Jun 26, 2025 - [Core] Could not connect to socket
#54067 closed
Jun 26, 2025 - [core] TSAN failing on `node_manager_test`
#54096 closed
Jun 25, 2025 - CI test linux://python/ray/data:test_arrow_block is flaky
#48859 closed
Jun 25, 2025 - CI test linux://python/ray/data:test_huggingface is consistently_failing
#44516 closed
Jun 25, 2025 - CI test linux://python/ray/train:accelerate_torch_trainer_no_raydata is consistently_failing
#48939 closed
Jun 25, 2025 - CI test linux://python/ray/train:deepspeed_torch_trainer is consistently_failing
#44517 closed
Jun 25, 2025 - Release test training_ingest_benchmark-task=image_classification.full_training.jpeg failed
#53953 closed
Jun 25, 2025 - [Core] Autoscaler Node Recovery Ignores Node-Specific Docker Config
#53987 closed
Jun 25, 2025 - [Doc][KubeRay] Run doctest `user-guides/configuring-autoscaling.ipynb` in CI
#53989 closed
Jun 25, 2025 - CI test windows://python/ray/tests:test_basic is consistently_failing
#51497 closed
Jun 25, 2025 - [CI] Migrate from flake8 to ruff
#34889 closed
Jun 25, 2025 - [Docker] Upgrade the base image from ubuntu:focal to ubuntu:22.04LTS
#35514 closed
Jun 25, 2025 - CI test linux://python/ray/data:test_backpressure_e2e is flaky
#49963 closed
Jun 25, 2025 - CI test linux://python/ray/tests:test_runtime_env_complicated is consistently_failing
#49674 closed
Jun 25, 2025 - CI test linux://python/ray/data:test_execution_optimizer is consistently_failing
#44410 closed
Jun 25, 2025 - [Dashboard] Decorator that exposes attribute to dashboard for display in grid
#33188 closed
Jun 24, 2025 - [serve] AttributeError when attempting to use serve with cluster and FastAPI
#54008 closed
Jun 24, 2025 - [gcp] Node mistakenly marked dead: increase heartbeat timeout?
#16945 closed
Jun 24, 2025 - Docs on Cython extensions and install requirements
#7094 closed
Jun 24, 2025 - [core] Detached actor being killed when its parent actor crashes
#40864 closed
Jun 24, 2025 - CI test linux://doc:doctest[data] is consistently_failing
#54036 closed
Jun 24, 2025 - CI test linux://python/ray/data:doctest is consistently_failing
#44570 closed
Jun 24, 2025 - [data/preprocessors] Support flattening vector features in concatenator
#51757 closed
Jun 24, 2025 - [Docs][KubeRay] Don't sleep for a long time in `kuberay-gcs-ft.ipynb`
#54040 closed
Jun 24, 2025 - Release test many_nodes_actor_test_on_v2.aws failed
#53990 closed
Jun 24, 2025 - CI test linux://doc/source/train/examples/lightning:lightning_cola_advanced is consistently_failing
#44545 closed
Jun 24, 2025 - CI test linux://python/ray/train:accelerate_torch_trainer is consistently_failing
#44513 closed
Jun 24, 2025 - CI test linux://python/ray/train:deepspeed_torch_trainer_no_raydata is consistently_failing
#44932 closed
Jun 24, 2025 - CI test windows://python/ray/serve/tests:test_request_timeout is flaky
#48417 closed
Jun 24, 2025 - [old]
#54020 closed
Jun 23, 2025 - CI test windows://python/ray/serve/tests:test_batching is consistently_failing
#46016 closed
Jun 23, 2025 - CI test linux://python/ray/tests:test_runtime_env_container is consistently_failing
#45223 closed
Jun 23, 2025 - [CI] `linux://python/ray/tests:test_state_api` is failing/flaky on master.
#54001 closed
Jun 23, 2025 - Ability to select a disk for ray workers
#8607 closed
Jun 23, 2025 - CI test linux://python/ray/serve/tests:test_multiplex is flaky
#48378 closed
Jun 21, 2025 - [RLlib] MAML does not work with TF2 in Ray 2.3.1
#34620 closed
Jun 20, 2025 - [RayData|RayServe] Does RayData/RayServe support multi-node vllm inference
#53192 closed
Jun 20, 2025 - [Core] Core Worker crashing
#49088 closed
Jun 20, 2025 - [core][gpu-objects] Driver tries to get the data from in-actor store
#51272 closed
Jun 19, 2025 - [Core][ROCm] Setting CUDA_VISIBLE_DEVICES leads to an assertion
#52701 closed
Jun 19, 2025 - [Autoscaler][V2] Autoscaler fails to delete idle KubeRay Pod
#52264 closed
Jun 19, 2025 - CI test linux://python/ray/data:test_consumption is flaky
#48163 closed
Jun 19, 2025 - CI test windows://python/ray/tests:test_actor_state_metrics is consistently_failing
#46303 closed
Jun 19, 2025 - [data] ray.data.read_images is slower than reading images manually
#37499 closed
Jun 19, 2025 - [RFC] Q2 Ray Data Roadmap
#51808 closed
Jun 19, 2025 - [RFC] LLM APIs for Ray Data and Ray Serve
#50639 closed
Jun 19, 2025 - CI test windows://python/ray/serve/tests:test_standalone_3 is flaky
#44003 closed
Jun 19, 2025 - Release test compiled_graphs failed
#53716 closed
Jun 18, 2025 - CI test darwin://python/ray/tests:test_metrics_agent_open_telemetry is consistently_failing
#53828 closed
Jun 18, 2025 - [RLlib] ActionMaskingTorchRLModule can't set up `conv_filters`
#53325 closed
Jun 18, 2025 - [Core] `ray.init()` and `ray start` fails on Windows 11 in ray 2.45+
#52739 closed
Jun 18, 2025 - CI test windows://python/ray/tests:test_object_spilling_debug_mode is flaky
#43796 closed
Jun 18, 2025 - [core] support S3 path style access in runtime_env download_and_unpack_package()
#53893 closed
Jun 17, 2025 - How to transfer tensors stored in GPU in actor with NCCL?
#53816 closed
Jun 17, 2025 - [Data] PyArrow 20.0.0 Backward Incompatability (`unexpected keyword argument 'maps_as_pydicts'`)
#52685 closed
Jun 17, 2025 - CI test linux://python/ray/tests:test_gpu_objects_nccl is consistently_failing
#53871 closed
Jun 17, 2025 - [RLlib] Headnode without GPU triggers torch/CUDA de-serialization error
#53467 closed
Jun 17, 2025 - [Core] Ray Autoscaler does not restart a worker node on setup failure
#29127 closed
Jun 17, 2025 - Release test llm_batch_vllm failed
#53827 closed
Jun 17, 2025 - [Serve] Add timeout parameter for `deploy`
#25433 closed
Jun 17, 2025 - [Core] Read-only buffer error in some scikit-learn models
#52571 closed
Jun 17, 2025 - [core] ray stop --force doesn't kill processes on worker node
#28038 closed
Jun 17, 2025 - [core][gpu-objects] Support TensorDict
#51550 closed
Jun 17, 2025 - [core][gpu-objects] Allocate placeholder tensor on corresponding devices
#53622 closed
Jun 17, 2025 - [core][gpu-objects] Driver should order all collective calls to avoid deadlock
#51264 closed
Jun 17, 2025 - CI test windows://python/ray/tests:test_object_spilling_asan is consistently_failing
#45962 closed
Jun 17, 2025 - CI test windows://python/ray/tests:test_object_spilling is consistently_failing
#45961 closed
Jun 16, 2025 - [RLlib] Add syntax checking to configuration string literals or migrate to enums.
#39384 closed
Jun 16, 2025 - [Ray Core] Ray error causes the Python interpreter to terminate without failing
#28211 closed
Jun 16, 2025 - [CI] Test GPU training tutorial with Ray Release tests
#28902 closed
Jun 16, 2025 - [core][gpu-objects] intra-process communication
#51685 closed
Jun 16, 2025 - CI test windows://python/ray/tests:test_basic_client_mode is flaky
#52117 closed
Jun 13, 2025 - [Serve] check_health with custom exception does not enter failed state, infinite retries
#53742 closed
Jun 13, 2025 - [core][gpu-objects] Object contains multiple tensors and/or mix of CPU data and GPU tensors
#51274 closed
Jun 13, 2025 - CI test windows://python/ray/serve/tests:test_standalone_with_comp_sche is flaky
#48425 closed
Jun 13, 2025 - CI test linux://python/ray/tune:test_tuner is consistently_failing
#53786 closed
Jun 13, 2025 - Release test serve_autoscaling_load_test.aws failed
#53760 closed
Jun 13, 2025 - Release test llm_serve_llama_3dot1_8B_quantized_tp1_2p6d failed
#53769 closed
Jun 13, 2025 - Release test llm_serve_llama_3dot1_8B_quantized_tp1_1p1d failed
#53768 closed
Jun 13, 2025 - Release test llm_serve_llama_3dot1_8B_quantized_tp_1 failed
#53764 closed
Jun 13, 2025 - Release test llm_serve_llama_3dot1_8B_lora failed
#53766 closed
Jun 13, 2025 - Release test llm_serve_llama_3dot2_1B_s3 failed
#53767 closed
Jun 13, 2025 - Release test llm_serve_llama_3dot2_1B_no_accelerator failed
#53765 closed
Jun 13, 2025 - Release test llm_serve_llama_3dot1_8B_tp_2 failed
#53763 closed
Jun 13, 2025 - Release test serve_scale_replicas.aws failed
#53761 closed
Jun 13, 2025
134 Issues opened by 95 people
- [Tune] New Trial status
#54564 opened
Jul 12, 2025 - CI test linux://python/ray/tests:test_gpu_objects_gloo is consistently_failing
#54552 opened
Jul 12, 2025 - [Ray-llm on Google Cloud] Ray cannot detect GPU device in ray-llm latest version
#54551 opened
Jul 12, 2025 - [data] Possible bug in 3.0.0dev with autoscaling
#54548 opened
Jul 12, 2025 - [Ray Cluster: Azure provider] Enable automatic keypair creation
#54545 opened
Jul 11, 2025 - CI test linux://python/ray/tests:test_state_api is flaky
#54541 opened
Jul 11, 2025 - [data] Ray Autoscaling - Suboptimal Performance with Actors
#54540 opened
Jul 11, 2025 - [Ray Metric Infra] improvement backlogs
#54538 opened
Jul 11, 2025 - [RLlib] Add RLlibCallback on Checkpoint Creation
#54524 opened
Jul 11, 2025 - [data] introduce per-op config options
#54520 opened
Jul 10, 2025 - [Core][Draft] Followup Work on Task Events Buffer
#54515 opened
Jul 10, 2025 - [Serve] non-blocking reconfigure design
#54509 opened
Jul 10, 2025 - [Ray Server: Deployment] Failed to update the deployments ['LLMRouter'].
#54500 opened
Jul 10, 2025 - [serve.llm] Dimensions api of embedding req does not work for serve.llm
#54498 opened
Jul 10, 2025 - [Core] Tensor Transport GPU Path Not Triggered Due to Missing Cython Constants
#54463 opened
Jul 9, 2025 - [RLlib] Cannot export onnx from DefaultDQNTorchRLModule
#54461 opened
Jul 9, 2025 - Can't run python unit tests via compiled Ray wheel?
#54451 opened
Jul 8, 2025 - [data] Autoscaling ignores disk pressure
#54442 opened
Jul 8, 2025 - [Data] Allow disabling Task Fusion / Documenting how to avoid it
#54433 opened
Jul 8, 2025 - [Data] Downstream Stages Run Sequentially After Fanout in Ray Data
#54430 opened
Jul 8, 2025 - [Tune] TorchTrainer fails with Repeater using set_index=True
#54421 opened
Jul 8, 2025 - [data.llm] Fix AttributeError for the shallow copy of data batch transfer
#54420 opened
Jul 8, 2025 - [RLlib] Multi-Agent Environments Documentation is BAD
#54416 opened
Jul 8, 2025 - [Core] `DeleteObjects` fails silently on transient network failure
#54412 opened
Jul 8, 2025 - [RLlib] - `MultiDiscrete` action spaces with different category numbers do not work with `LSTM`.
#54409 opened
Jul 8, 2025 - [RLlib] - Restoring run with stateful connectors leads to discontinuities.
#54408 opened
Jul 8, 2025 - [Train] [Good First Issue] Bug in sample code in documentation
#54401 opened
Jul 8, 2025 - [Proxy] X-Request-ID not output to proxy log file
#54400 opened
Jul 8, 2025 - ray.exceptions.RayTaskError(CompilationError)
#54399 opened
Jul 8, 2025 - [Core] Ray log files in /tmp/ray/session_latest/logs don't use JSON encoding
#54388 opened
Jul 7, 2025 - [core] `test_actor_holding_serialized_reference` is flaky
#54384 opened
Jul 7, 2025 - [core] Implement graceful process termination on Windows
#54374 opened
Jul 7, 2025 - [data] Empty DatabricksUCDatasource provides unhelpful error
#54369 opened
Jul 7, 2025 - [data] OOM killer kicks in but vLLM gpu processes are not cleaned up
#54364 opened
Jul 7, 2025 - [core] max_retry configuration in the task does not take effect.
#54342 opened
Jul 4, 2025 - RuntimeError: Failed to unpickle serialized exception | how can i solve it?
#54341 opened
Jul 4, 2025 - SIGABRT when using the V2 interface
#54337 opened
Jul 3, 2025 - [core] Make Ray tolerant to transient network failures
#54332 opened
Jul 3, 2025 - [core] Fix GcsStatus vs. gRPC status semantics
#54327 opened
Jul 3, 2025 - [Core] Raylet heartbeat misses
#54321 opened
Jul 3, 2025 - [Rllib] module-to-env connector pipeline is initialized with environment action spaces
#54314 opened
Jul 3, 2025 - [Ray Tune + Train V2] Resources not released when Train trial stopped by Tune
#54305 opened
Jul 3, 2025 - [RLLIB] EnvRunnerGroup tag parameter passthru
#54294 opened
Jul 2, 2025 - [<Ray component: Core|RLlib|etc...>] Ray BigQuery Reader Query Response Limit
#54288 opened
Jul 2, 2025 - [RLlib] Unexpected KeyError while training SAC
#54284 opened
Jul 2, 2025 - [core][gpu-objects] tensor_transport doesn’t transfer correctly when the argument is not inlined
#54281 opened
Jul 2, 2025 - [Core] - providing `py_executable=uv run` causes failures with unloadable logs
#54275 opened
Jul 2, 2025 - [Tune] RuntimeError: unexpected pos error when saving model state_dict in Ray Tune training
#54274 opened
Jul 2, 2025 - [<Ray component: Core|RLlib|etc...>] uv + ray in example is not working
#54263 opened
Jul 2, 2025 - [core] race condition between task cancelling and retries
#54260 opened
Jul 1, 2025 - [RLlib,Tune,AIR] Checkpointing scoring per custom metric does not work
#54251 opened
Jul 1, 2025 - [core] Passing `_spill_on_unavailable=True` with `soft=False` crashes the raylet
#54246 opened
Jul 1, 2025 - Support for Hybrid Autoscaling with Multiple Node Providers (e.g., On-Prem + Cloud)
#54245 opened
Jul 1, 2025 - [Core] nested pydantic basemodel missing values
#54242 opened
Jul 1, 2025 - [Core] Tasks cannot be submitted after unexpected restart of the head node with Redis enabled
#54241 opened
Jul 1, 2025 - [core][gpu-objects] Support hierarchy GPU objects management
#54240 opened
Jul 1, 2025 - [Core] Zombie Processes Issue Caused by Unreclaimed Exit Status after Ray Job Execution
#54237 opened
Jul 1, 2025 - [core] replace incompatible container use between Plasma APIs (sets vs vectors)
#54215 opened
Jun 30, 2025 - [core] clean up node_manager unit tests and add additional coverage
#54213 opened
Jun 30, 2025 - [Core] `GcsPublish` retries incorrectly when encountering transient error
#54208 opened
Jun 30, 2025 - [Serve] Deadlock when awaiting DeploymentResponse
#54201 opened
Jun 29, 2025 - [data] Custom local shuffling batcher/sampler for Dataset.iter_*
#54197 opened
Jun 28, 2025 - [RLlib] Restored *custom* metrics after check-pointing is broken since 2.47
#54174 opened
Jun 27, 2025 - Ray&slurm&NPU
#54170 opened
Jun 27, 2025 - [core][gpu-objects] Hide the details of constructing process groups
#54168 opened
Jun 27, 2025 - [core][gpu-objects] Support streaming generator
#54167 opened
Jun 27, 2025 - [core][gpu-objects] Support DTensor
#54166 opened
Jun 27, 2025 - [core] ray.util.state.api.get_actor with timeout = 1s does not work
#54153 opened
Jun 26, 2025 - [core] Improving Ray Typing annotation
#54149 opened
Jun 26, 2025 - [Core] ray job submit may hang in some scenarios
#54120 opened
Jun 26, 2025 - [Core] ray._raylet.CoreWorker.put_file_like_object, parameter owner_address unused
#54100 opened
Jun 25, 2025 - [Data] Allow parameterized queries in `read_sql`
#54098 opened
Jun 25, 2025 - [RLlib] num_env_steps_sampled_lifetime is wrong after checkpoint loaded - bug changed in 2.47
#54089 opened
Jun 25, 2025 - [Core] When pinning object, transient error on RPC `PubsubLongPolling` causes job stuck
#54081 opened
Jun 25, 2025 - [serve.llm] vLLM engine became unhealthy under high incoming traffic
#54070 opened
Jun 25, 2025 - [data] support streaming writes for `write_lance`
#54069 opened
Jun 25, 2025 - [train] Can not start training on more than one node
#54065 opened
Jun 25, 2025 - [train] Add Azure Files support to persistent storage documentation
#54054 opened
Jun 24, 2025 - [Core] ray cannot start under macos + anaconda + python 3.13 + bash
#54047 opened
Jun 24, 2025 - [Core] Ray postmortem debugging does not work with python 3.12
#54044 opened
Jun 24, 2025 - [RFC] Improving Ray for Post-Training / RL for LLM Projects
#54021 opened
Jun 23, 2025 - [Core] Multi-threaded ray.get can hang in certain situations.
#54007 opened
Jun 23, 2025 - [CI] `linux://python/ray/tests:test_scheduling_debug_mode` is failing/flaky on master.
#54002 opened
Jun 23, 2025 - Ray worker resolves module to __init__.py instead of actual file for nested package class
#53998 opened
Jun 22, 2025 - [data] Slow fetching of metadata for large number of parquet files
#53995 opened
Jun 22, 2025 - [Rllib] Bug in TorchMultiDistribution logp prevents policy mapping from being used
#53994 opened
Jun 22, 2025 - [core][gpu-objects] Allow sending ObjectRefs to other processes
#53978 opened
Jun 20, 2025 - [core][gpu-objects] Support ray.put
#53977 opened
Jun 20, 2025 - [core][gpu-objects] RDMA support for data transfer
#53976 opened
Jun 20, 2025 - [Dashboard] Support for List Tasks Filter Pushdown
#53970 opened
Jun 20, 2025 - [Data] Add support to turn off strict block-size enforcement
#53954 opened
Jun 19, 2025 - [Core] `InternalKVPut` retries incorrectly when encountering transient error
#53946 opened
Jun 19, 2025 - PolicyServer and PolicyClient Demo Issue
#53926 opened
Jun 18, 2025 - Windows VS WSL2
#53924 opened
Jun 18, 2025 - [Docker][CI] Add Python 3.13 Ray Image to CI
#53923 opened
Jun 18, 2025 - [serve.llm] Ray LLM serving not respecting max_completion_tokens parameter
#53922 opened
Jun 18, 2025 - [Ray V2 Tune + Train] Tuner is not aware of resources and oversubscribes leading to deadlocks
#53921 opened
Jun 18, 2025 - [Data/Preprocessors]: Preprocessors do not work with nested records
#53920 opened
Jun 18, 2025 - [Core] Ray Does Not Detect GPU
#53919 opened
Jun 18, 2025 - Multiple CVEs in Ray's compiled dependencies
#53915 opened
Jun 18, 2025 - Using ray for LLM inference got errors
#53907 opened
Jun 18, 2025 - [CI] `linux://python/ray/data:test_consumption` is failing/flaky on master.
#53897 opened
Jun 17, 2025 - [Data]Pylint detection found some Python code defects in ray data
#53881 opened
Jun 17, 2025 - [dashboard] Support to overwrite the _client_max_size of http request entity
#53879 opened
Jun 17, 2025 - [RLlib] Significant drop in DQN training reward when resuming from checkpoint
#53878 opened
Jun 17, 2025 - [RLlib] Checkpoint metrics loading with Tune is broken in 2.47.0
#53877 opened
Jun 17, 2025 - Issue: Ray Dashboard Links to Grafana Return "Dashboard Not Found" (Windows)
#53876 opened
Jun 17, 2025 - [serve.llm] LLM serving seems not working with mistral tokenizer.
#53873 opened
Jun 17, 2025 - [Core] ray.ActorID.nil().job_id
#53872 opened
Jun 17, 2025 - [Core] Ray 2.47 regression: All tasks hang when using `uv`
#53848 opened
Jun 16, 2025 - [RLlib] Typo in error message on line 37 of ray/rllib/env/utils/__init__.py
#53841 opened
Jun 16, 2025 - [rllib] [bug] Official PPO Atari example fails with IndexError
#53836 opened
Jun 15, 2025 - [Tune|RLlib] PBT reward drop - not checkpointing or restoring properly
#53831 opened
Jun 14, 2025 - [Dashboard] Discrepancy between Worker Process Memory Display on Dashboard and RSS Statistics
#53829 opened
Jun 14, 2025 - [flaky] test_scheduling_2.py::test_demand_report_when_scale_up
#53811 opened
Jun 13, 2025 - [Data] Custom Partitioner in Ray Data and Related Implementation Considerations
#53800 opened
Jun 13, 2025 - [Core] Transient network failure on RPC `WaitForActorRefDeleted` causes actor registration fail
#53797 opened
Jun 13, 2025 - How to enable tool calling in serve llm?
#53795 opened
Jun 13, 2025 - [RLlib] Checkpointing fails with CUDA GPU learner using the new API stack
#53793 opened
Jun 13, 2025 - [<Ray component: Core|RLlib|etc...>] Issue of port allocation
#53790 opened
Jun 13, 2025
327 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [Core] Add Logic to Emit Task Events to Event Aggregator
#53402 commented on
Jul 10, 2025 • 49 new comments - Update V2 Autoscaler to support scheduling using Node labels and LabelSelector API
#53578 commented on
Jul 9, 2025 • 36 new comments - [RLlib; docs] Docs do-over (new API stack): `ConnectorV2` documentation (part I).
#53732 commented on
Jul 4, 2025 • 25 new comments - [serve.llm] Refactor/Consolidate LoRA downloading
#53714 commented on
Jul 11, 2025 • 19 new comments - Add progress bars to hash operators
#53175 commented on
Jul 12, 2025 • 17 new comments - [core][telemetry/11] record histogram metric e2e
#53740 commented on
Jul 12, 2025 • 14 new comments - [core] enable -Wshadow for all c++ targets
#53194 commented on
Jul 10, 2025 • 14 new comments - [Core] Add default Ray Node labels at Node init
#53360 commented on
Jun 30, 2025 • 13 new comments - [core] Add switch for the cache of runtime env
#53775 commented on
Jul 12, 2025 • 12 new comments - Relax check_version_info to check for bytecode compatibility
#41373 commented on
Jul 12, 2025 • 11 new comments - [core] Add as_completed and map_unordered APIs
#53461 commented on
Jul 9, 2025 • 7 new comments - [core] Support pip_install_options for pip
#53551 commented on
Jul 11, 2025 • 5 new comments - Kuberay as one implementation of the operator model
#53318 commented on
Jul 2, 2025 • 5 new comments - [core]: Correct podman output parsing for image uri in runtime env
#53653 commented on
Jul 11, 2025 • 4 new comments - [Dashboard] Add GPU component usage
#52102 commented on
Jul 10, 2025 • 4 new comments - [Core] Deserialization of PyArrow Extension Arrays by registration of deserializers
#51972 commented on
Jul 8, 2025 • 4 new comments - [data] Add GroupedData.random_sample() for group-wise sampling
#53313 commented on
Jul 8, 2025 • 4 new comments - (serve.llm) Make _LLMServerBase.__init__ synchronous
#53719 commented on
Jul 1, 2025 • 3 new comments - Add Apple silicon GPU(mps) support to ray
#38464 commented on
Jul 12, 2025 • 2 new comments - [Data,Train] Add helpful errors when running forbidden methods on sharded datasets
#52079 commented on
Jul 10, 2025 • 2 new comments - [core] Support broadcast and reduce collective for compiled graphs
#53625 commented on
Jul 12, 2025 • 2 new comments - [RLlib] Enable spliting and zero padding of Dict observation
#50589 commented on
Jul 8, 2025 • 2 new comments - [serve.llm] Add useful logging in prefill_decode_disagg.py
#53604 commented on
Jul 9, 2025 • 1 new comment - [core] Returning a useful message when trying to get logs for a job that has not started yet
#53174 commented on
Jul 1, 2025 • 1 new comment - [data] fix lance dataset schema
#53134 commented on
Jul 8, 2025 • 1 new comment - [doc] add jax example
#51040 commented on
Jul 2, 2025 • 1 new comment - [Core] Split stats_metric into smaller targets to improve build performance
#50595 commented on
Jul 7, 2025 • 0 new comments - [Build][Deps] Add new `ray[azure]` extra package
#48847 commented on
Jul 10, 2025 • 0 new comments - [core] Thread-safe gcs node manager
#50024 commented on
Jul 9, 2025 • 0 new comments - [DATA]Add custom resources in data autoscaling
#49756 commented on
Jul 11, 2025 • 0 new comments - [core] Don't get dashboard address after each dashboard connection failure
#49584 commented on
Jul 7, 2025 • 0 new comments - [core][cgraph] Use cv instead of busy wait for next version
#49542 commented on
Jul 7, 2025 • 0 new comments - [Fix][Core] Periodically check log message queue cleared before shutdown
#49337 commented on
Jul 9, 2025 • 0 new comments - [RLlib] Add NPU and HPU support to RLlib
#49535 commented on
Jul 1, 2025 • 0 new comments - [core][cgraph] Use threadpool and one io_context for mutable object provider
#49500 commented on
Jul 7, 2025 • 0 new comments - [serve.llm] Update ray-llm docker
#53532 commented on
Jul 1, 2025 • 0 new comments - [Fix][Core] Fail fast if the dashboard agent fails to launch the HTTP server
#51960 commented on
Jul 9, 2025 • 0 new comments - [core] add ray.util.concurrent.futures.RayExecutor
#51933 commented on
Jul 12, 2025 • 0 new comments - Add new autoscaling parameter `aggregation function`
#51905 commented on
Jul 10, 2025 • 0 new comments - [Data] Fix bug where pandas blocks don't use tensor extension
#51868 commented on
Jul 12, 2025 • 0 new comments - [core][wip] Trying bzlmod
#51834 commented on
Jul 7, 2025 • 0 new comments - [core] Remove client call tag
#51817 commented on
Jun 29, 2025 • 0 new comments - [core] Remove object store runner
#51766 commented on
Jul 7, 2025 • 0 new comments - [Core] Native CPU affinity support for accelerators
#51719 commented on
Jul 6, 2025 • 0 new comments - [core] Lazily subscribe to node changes from workers
#51718 commented on
Jun 29, 2025 • 0 new comments - windows dev setup
#51678 commented on
Jul 10, 2025 • 0 new comments - update to protbuf-28.2, absl-20240722, grpc-1.67 and patch for windows
#51673 commented on
Jul 3, 2025 • 0 new comments - [Docs][wip] Feature: adopt llms.txt convention
#51605 commented on
Jul 11, 2025 • 0 new comments - [Core] Cover cpplint for ray/src/ray/common
#51551 commented on
Jul 8, 2025 • 0 new comments - [Dashboard] Support reporting AMD GPU usage
#51345 commented on
Jul 6, 2025 • 0 new comments - [CI] Replace `black` with `ruff format`
#51332 commented on
Jul 10, 2025 • 0 new comments - Suppress type error
#50994 commented on
Jul 7, 2025 • 0 new comments - fix restore BUG "RuntimeError: Expected scalars to be on CPU, got cud…
#50983 commented on
Jul 5, 2025 • 0 new comments - [CI] Enable pretty-format-java pre-commit hook
#50957 commented on
Jul 9, 2025 • 0 new comments - [core] Cover cpplint for ray/src/ray/stats
#50678 commented on
Jul 7, 2025 • 0 new comments - CI test windows://python/ray/serve/tests:test_controller_recovery is consistently_failing
#46022 commented on
Jul 12, 2025 • 0 new comments - CI test linux://doc/source/llm/examples/batch:vllm-with-lora is consistently_failing
#50881 commented on
Jul 12, 2025 • 0 new comments - CI test linux://python/ray/llm/tests:batch/gpu/stages/test_vllm_engine_stage is consistently_failing
#52075 commented on
Jul 12, 2025 • 0 new comments - CI test linux://python/ray/llm/tests:batch/gpu/processor/test_vllm_engine_proc is consistently_failing
#52074 commented on
Jul 12, 2025 • 0 new comments - CI test linux://rllib:learning_tests_multi_agent_cartpole_ppo_multi_cpu is consistently_failing
#47465 commented on
Jul 12, 2025 • 0 new comments - CI test linux://rllib:learning_tests_multi_agent_pendulum_sac_multi_cpu is consistently_failing
#47264 commented on
Jul 12, 2025 • 0 new comments - [ Core] cannot serialize polars.LazyFrame
#46343 commented on
Jul 12, 2025 • 0 new comments - [Data] Stratification in train_test_split
#53297 commented on
Jul 11, 2025 • 0 new comments - [Core|Dataset] Ray job stuck with idle actors with no tasks
#45822 commented on
Jul 11, 2025 • 0 new comments - resource leak in ray/pthon/ray/node.py
#9546 commented on
Jul 11, 2025 • 0 new comments - [Serve] Different Downscale Delay for Scale to Zero
#52867 commented on
Jul 11, 2025 • 0 new comments - [Core] pip runtime env cache by filename instead of the actual file content
#41827 commented on
Jul 11, 2025 • 0 new comments - [Data] Support for SQL/DataFrame capability
#53693 commented on
Jul 11, 2025 • 0 new comments - [Runtime Environment] Remove cached python libs, working dir etc
#47488 commented on
Jul 11, 2025 • 0 new comments - [Core] runtime_env: can't update an application installed from gitlab
#44423 commented on
Jul 11, 2025 • 0 new comments - [Core] ray.init() hangs/fails after "Started a local Ray instance."
#31897 commented on
Jul 11, 2025 • 0 new comments - [Core] [Dashboard] Support a way to stream data from the dashboard service to persist externally
#53073 commented on
Jul 11, 2025 • 0 new comments - Ray Serve Replica Initialization Timeout: STDOUT "Failed to load", RequestCancelledError, Likely Due to Slow/Crashing RLModule.from_checkpoint()
#53079 commented on
Jul 11, 2025 • 0 new comments - [AIR] read_file_from_uri() print Segmentation Fault message while loading from S3 bucket
#32931 commented on
Jul 2, 2025 • 0 new comments - [Fix][GCS] Implement reconnection for RedisContext
#48781 commented on
Jul 9, 2025 • 0 new comments - Fix invalid type for progress_reporter parameter of RunConfig
#48439 commented on
Jul 3, 2025 • 0 new comments - [doc] fix: Typo and missing import in doc
#48311 commented on
Jul 3, 2025 • 0 new comments - [WIP][core] C++20 upgrade
#48044 commented on
Jul 4, 2025 • 0 new comments - [Data] Fix parallelism deriving heuristic to ensure parallelism stays w/in min/max bounds
#47695 commented on
Jul 12, 2025 • 0 new comments - [bazel] move python rules up
#47260 commented on
Jul 11, 2025 • 0 new comments - Fix malformed `temp_dir` path when connecting Windows workers to cluster with Linux head
#45930 commented on
Jul 8, 2025 • 0 new comments - Enable setting OS disk size in Azure
#45867 commented on
Jul 2, 2025 • 0 new comments - [RLlib] DreamerV3 on PyTorch.
#45463 commented on
Jul 2, 2025 • 0 new comments - blind try on ubuntu upgrade ..
#45427 commented on
Jun 30, 2025 • 0 new comments - [data] add better support for list-typed fields when using `write_bigquery`
#44564 commented on
Jul 7, 2025 • 0 new comments - Ray IPv6 support
#44252 commented on
Jul 12, 2025 • 0 new comments - verify windows wheels.
#43442 commented on
Jul 9, 2025 • 0 new comments - [dashboard] ignore reinit error when getting dashboard url
#40545 commented on
Jul 5, 2025 • 0 new comments - Update pettingzoo_env.py
#39431 commented on
Jul 1, 2025 • 0 new comments - [ci] remove is_automated_build in setup.py
#36547 commented on
Jun 30, 2025 • 0 new comments - [RLLib][Air] MLFlow parsing of RLLib evaluation and custom metrics
#26711 commented on
Jun 17, 2025 • 0 new comments - [Ray Data] Categorizer throws internal errors during doctest
#50285 commented on
Jul 12, 2025 • 0 new comments - CI test windows://python/ray/serve/tests:test_deploy_app is consistently_failing
#46448 commented on
Jul 12, 2025 • 0 new comments - [core] Turn executed task inserted into a RAY_CHECK
#53522 commented on
Jul 7, 2025 • 0 new comments - [RLlib] Wrapper which allows EnvRunners to operate on environments with Repeated observation spaces
#53519 commented on
Jul 8, 2025 • 0 new comments - [Data] Add a data compaction function
#53489 commented on
Jul 13, 2025 • 0 new comments - [WIP][Data] Batch query for block_ref_iter
#53485 commented on
Jul 13, 2025 • 0 new comments - [Data] Add dropna function
#53464 commented on
Jul 13, 2025 • 0 new comments - [core] Check if a task can be spilled before checking if args can be pinned
#53462 commented on
Jul 3, 2025 • 0 new comments - [Data] Added distinct function
#53460 commented on
Jul 9, 2025 • 0 new comments - [Data] Add fillna function
#53459 commented on
Jul 12, 2025 • 0 new comments - [WIP][Data] Add support for Arrow native fixed-shape tensor type
#53450 commented on
Jul 13, 2025 • 0 new comments - Bump torch from 2.0.1 to 2.7.0 in /doc/source/templates/testing/docker/03_serving_stable_diffusion
#53447 commented on
Jul 3, 2025 • 0 new comments - feat: Add QPS-based autoscaling policy for Ray Serve
#53445 commented on
Jul 8, 2025 • 0 new comments - [Data] add switch for optimizer rules
#53427 commented on
Jul 13, 2025 • 0 new comments - [Data] Add support for ray.dataset.map_sql
#53417 commented on
Jul 13, 2025 • 0 new comments - kuberay edits
#53411 commented on
Jul 2, 2025 • 0 new comments - [WIP] [core] Attempting a basic solution to streaming generator not adding errors to plasma
#53393 commented on
Jul 3, 2025 • 0 new comments - [serve.llm] DO NOT REVIEW, IN DRAFT
#53391 commented on
Jul 13, 2025 • 0 new comments - Filter out ANSI escape codes from logs when retrieving logs from the dashboard
#53370 commented on
Jul 4, 2025 • 0 new comments - [Core][Bug fix]Fix issue: the streaming generator will mark the inplasma object that is already ready as failed after the task fails.
#53773 commented on
Jul 2, 2025 • 0 new comments - [core] Control whether to construct a default concurrency group executor when max-concurrency=1 and there are other concurrency groups for an actor
#53770 commented on
Jul 1, 2025 • 0 new comments - Minor Documentation Fixes in Protobuf Files
#53731 commented on
Jul 3, 2025 • 0 new comments - [RLlib] Examples folder do-over (vol 53): Learning 2-agent cartpole with global observation, 1 policy outputting all agents' actions, and individual rewards.
#53697 commented on
Jul 1, 2025 • 0 new comments - Bump requests from 2.32.3 to 2.32.4 in /python
#53691 commented on
Jul 12, 2025 • 0 new comments - [data] allow max_calls to be a static but not dynamic option
#53687 commented on
Jul 13, 2025 • 0 new comments - [Air] Add Video FPS Support for `WandbLoggerCallback`
#53638 commented on
Jul 12, 2025 • 0 new comments - [core] Gcs actor manager cleanup
#53633 commented on
Jul 6, 2025 • 0 new comments - [rllib] IMPALA fix no attribute '_minibatch_size'
#53620 commented on
Jul 4, 2025 • 0 new comments - [core] Remove experimental `max_cpu_frac_per_node`
#53610 commented on
Jul 10, 2025 • 0 new comments - [core] Cleanup retryable grpc client
#53599 commented on
Jul 6, 2025 • 0 new comments - [CI] Re-enable isort for all remaining files
#53583 commented on
Jul 6, 2025 • 0 new comments - [Not for Merge] Event Aggregator Perf
#53576 commented on
Jul 10, 2025 • 0 new comments - [core] Cleanup gcs event listeners and gcs_storage env variable
#53566 commented on
Jul 3, 2025 • 0 new comments - Bump torch from 2.3.0 to 2.7.1 in /python
#53558 commented on
Jul 7, 2025 • 0 new comments - Script to generate test coverage for doc files
#53556 commented on
Jul 7, 2025 • 0 new comments - [RLlib] Upgrade RLlink protocol for external env/simulator training.
#53550 commented on
Jun 30, 2025 • 0 new comments - [data] add Lance-based ordered data conversion that keeps row_id content unchanged
#53542 commented on
Jul 13, 2025 • 0 new comments - [core] Synchronize locations with pinned_at_raylet_id
#52920 commented on
Jul 10, 2025 • 0 new comments - [Data] remove empty lance read tasks
#52831 commented on
Jul 3, 2025 • 0 new comments - [ci] try running cicd unit tests in forge env
#52792 commented on
Jul 11, 2025 • 0 new comments - [core] Remove small task output copy on task execution path
#52778 commented on
Jul 9, 2025 • 0 new comments - [core] Remove copy when receiving small object returns
#52777 commented on
Jul 9, 2025 • 0 new comments - [core] Minor pull manager cleanup
#52724 commented on
Jul 9, 2025 • 0 new comments - check if ray is installed when using conda env
#52677 commented on
Jul 10, 2025 • 0 new comments - [Dashboard] Allow getting dashboard URL via RuntimeContext
#52676 commented on
Jul 10, 2025 • 0 new comments - [core] [easy] readability improvements for IO Workers
#52590 commented on
Jul 10, 2025 • 0 new comments - [Dashboard] Add Worker ID column to Worker table in Node detail page
#52581 commented on
Jul 11, 2025 • 0 new comments - [core] Static Priority scheduling
#52489 commented on
Jun 29, 2025 • 0 new comments - [core] Minor task manager related improvements
#52294 commented on
Jul 4, 2025 • 0 new comments - [train] upgrade tensorflow-datasets
#52195 commented on
Jul 9, 2025 • 0 new comments - [WIP] Ray Data doc updates
#52062 commented on
Jul 12, 2025 • 0 new comments - [Data] Make `from_items` lineage serializable
#52026 commented on
Jul 12, 2025 • 0 new comments - [Chore][Dashboard] Move `TrainHead` to `python/ray/train` folder
#52014 commented on
Jul 9, 2025 • 0 new comments - [Chore][Dashboard] Move DataHead to python/ray/data/ folder
#52013 commented on
Jul 12, 2025 • 0 new comments - test for raycirun
#52012 commented on
Jul 9, 2025 • 0 new comments - fix: Type of AlgorithmConfig.training(learner_connector
#53369 commented on
Jul 3, 2025 • 0 new comments - [core] Cleanup plasma client and object manager
#53357 commented on
Jul 3, 2025 • 0 new comments - [Docs] Clarify Train-side docs on Ray Data
#53349 commented on
Jul 8, 2025 • 0 new comments - [core] Core worker get cv - notify after unlock
#53311 commented on
Jul 1, 2025 • 0 new comments - [core][autoscaler][v1] drop object_store_memory from ResourceDemandScheduler._update_node_resources_from_runtime
#53283 commented on
Jul 1, 2025 • 0 new comments - Bump tornado from 6.1 to 6.5.1 in /python
#53274 commented on
Jul 8, 2025 • 0 new comments - [data] add explain interface for dataset
#53235 commented on
Jul 13, 2025 • 0 new comments - [data] New landing page with better examples that show key workloads
#53228 commented on
Jul 13, 2025 • 0 new comments - [data] fix lance count_rows not support filter
#53162 commented on
Jul 9, 2025 • 0 new comments - [docs] updating broken links on rllib torch doc
#53161 commented on
Jul 10, 2025 • 0 new comments - macos wheel build debug
#53119 commented on
Jul 3, 2025 • 0 new comments - Bump flask-cors from 4.0.0 to 6.0.0 in /python
#53116 commented on
Jul 8, 2025 • 0 new comments - [core] Node manager related cpp cleanup
#52990 commented on
Jul 2, 2025 • 0 new comments - [Data] Fixing null-safety when converting to `TensorArray`
#52977 commented on
Jul 12, 2025 • 0 new comments - [core] Use GetResourceLoadRequest as a substitute liveness check
#52971 commented on
Jul 10, 2025 • 0 new comments - [RLlib; Offline RL] - Use `iter_torch_batches` in learner
#52968 commented on
Jul 5, 2025 • 0 new comments - [deps] upgrade pandas to always use 2+
#52961 commented on
Jul 10, 2025 • 0 new comments - [core] Add sync get node info to NodeInfoAccessor
#52928 commented on
Jul 10, 2025 • 0 new comments - [Core]Can’t connect to ray cluster when passing `runtime_env` to `ray.init`
#44757 commented on
Jun 28, 2025 • 0 new comments - [core] Implement runtime plugins for additional package managers (mamba, micromamba, pixi, etc.)
#45572 commented on
Jun 27, 2025 • 0 new comments - [core] Get IP Address of Actor
#7431 commented on
Jun 27, 2025 • 0 new comments - [Dashboard] Decoupling dashboard and dashboard lifetime from Ray Cluster
#46444 commented on
Jun 27, 2025 • 0 new comments - [core][compiled graphs] Slow NCCL init on H200 server
#53619 commented on
Jun 26, 2025 • 0 new comments - [core][ray client] fetch_local flag to ray.wait is not respected for ray client
#52401 commented on
Jun 26, 2025 • 0 new comments - Check failed: WarmupStore() when starting process
#53094 commented on
Jun 26, 2025 • 0 new comments - CI test linux://rllib:learning_tests_multi_agent_cartpole_ppo_multi_gpu is flaky
#46226 commented on
Jun 26, 2025 • 0 new comments - [core] Ray fails to reuse GPU to create new actor when CUDA_VISIBLE_DEVICES is set
#44821 commented on
Jun 25, 2025 • 0 new comments - [Core] [Observability] Add PID to structured logs
#52840 commented on
Jun 25, 2025 • 0 new comments - [Data] Aggregation is doing internal conversions that breaks on list-like AggType
#52257 commented on
Jun 24, 2025 • 0 new comments - StreamSplitDataIterator(epoch=-1, split=0) blocked waiting on other clients for more than 30s.
#42008 commented on
Jun 24, 2025 • 0 new comments - Clusters (AWS) - SSH Access to head node via AWS Session Manager
#38885 commented on
Jun 24, 2025 • 0 new comments - [Autoscaler][v1] Autoscaler launches extra nodes despite fulfilled resource demand
#52864 commented on
Jun 24, 2025 • 0 new comments - [RFC] GPU object store support in Ray Core
#51173 commented on
Jun 23, 2025 • 0 new comments - [Ray Core/Dashboard] - Installing Ray via UV breaks dashboard.
#53608 commented on
Jun 23, 2025 • 0 new comments - [Dashboard] A button to shut down the ray cluster from the dashboard UI
#29208 commented on
Jun 23, 2025 • 0 new comments - [Core] Support setting options to the pip install command
#52679 commented on
Jun 23, 2025 • 0 new comments - Ray kill actor API is a GET request
#18411 commented on
Jun 23, 2025 • 0 new comments - [Core] ux issues of ray state cli for tasks
#30805 commented on
Jun 23, 2025 • 0 new comments - [train] Importing `ray.train.torch` creates numerous spammy temp-files
#33207 commented on
Jul 2, 2025 • 0 new comments - Distributed XGBoostTrainer Improvement
#35273 commented on
Jul 2, 2025 • 0 new comments - [AIR] `trainer.pkl` and `tuner.pkl` files needed for restoration get replaced by new runs
#35812 commented on
Jul 2, 2025 • 0 new comments - [Core] MLFlowCallbacks with Ray Train with FailureConfig and restarts creates multi mlflow runs
#48664 commented on
Jul 2, 2025 • 0 new comments - [Train] TUNE_DISABLE_AUTO_CALLBACK_LOGGERS together with TorchTrainer leads to FileNotFoundError error
#48683 commented on
Jul 2, 2025 • 0 new comments - [air] MLFlowLoggerCallback Does not include Artifacts
#46569 commented on
Jul 2, 2025 • 0 new comments - [Data, Train] ray::SplitCoordinator is very slow at every epoch + takes up too much memory
#49190 commented on
Jul 2, 2025 • 0 new comments - Ray: Train Llama-2 with deepspeed by using ray's environment
#38778 commented on
Jul 2, 2025 • 0 new comments - [RLlib][Unity] unity3d_env_local.py 'NoneType' for action spaces
#53780 commented on
Jul 2, 2025 • 0 new comments - Support gymnasium > 1.0.0
#53776 commented on
Jul 2, 2025 • 0 new comments - [Dashboard/Core] Resource list in Cluster Dashboard tab should show only logical GPUs
#53641 commented on
Jul 1, 2025 • 0 new comments - [docs][data] Documentation code coverage
#52536 commented on
Jul 1, 2025 • 0 new comments - [Serve] `serve.run` can bind the incorrect Application if Deployments have the same name
#53295 commented on
Jul 1, 2025 • 0 new comments - [Conda] Ray should raise exception when ray is not installed in conda environment
#52672 commented on
Jul 1, 2025 • 0 new comments - [Ray serve] StopAsyncIteration error thrown by ray when the client cancels the request
#51598 commented on
Jul 1, 2025 • 0 new comments - [Serve] LongPollHost crash when DeploymentResponse.cancel is called too quickly
#52476 commented on
Jul 1, 2025 • 0 new comments - [Core] Ray hangs with vllm0.8.5 v1 api for tp8+pp4
#53758 commented on
Jul 1, 2025 • 0 new comments - [Announcement] Ray Summit 2025 Call for Proposals Extended until July 14th
#53729 commented on
Jun 30, 2025 • 0 new comments - [core][compiled-graphs] A MPMD Graph controller focus on N-M data transfer in complex task
#48556 commented on
Jun 30, 2025 • 0 new comments - [Core] ray._raylet.ObjectRef and ray.types.ObjectRef type compatibility
#53591 commented on
Jun 29, 2025 • 0 new comments - [VM launcher] Document how to set up the cluster when there is UFW firewall
#35254 commented on
Jun 23, 2025 • 0 new comments - [Core] The streaming generator will mark the inplasma object that is already ready as failed after the task fails.
#53772 commented on
Jun 17, 2025 • 0 new comments - [core] control whether to construct a default concurrency group executor when max-concurrency=1 and there are other concurrency group for an actor
#53771 commented on
Jun 17, 2025 • 0 new comments - [core] Race condition between raylet graceful shutdown and GCS health checks
#53739 commented on
Jun 17, 2025 • 0 new comments - [Core] Custom docker image not scaling out
#53696 commented on
Jun 17, 2025 • 0 new comments - [Serve] `fastapi_app` is still mutable in the deployment constructor after being passed to `@serve.ingress`
#52775 commented on
Jun 17, 2025 • 0 new comments - [tune] `URI has empty scheme` error when `storage_path` in `RunConfig` is relative
#42969 commented on
Jun 17, 2025 • 0 new comments - [Serve] Autoscaling not working correctly when `max_replica_per_node` is set in Ray Serve
#53582 commented on
Jun 17, 2025 • 0 new comments - [Serve] Allow --metrics-export-port argument in "serve run" CLI command
#44426 commented on
Jun 17, 2025 • 0 new comments - [data] verbose_progress=True doesn't work in client mode
#43200 commented on
Jun 17, 2025 • 0 new comments - [data] importing ray.data closes logging handlers, breaking custom logging
#48846 commented on
Jun 17, 2025 • 0 new comments - [RLlib] TorchDistributionWrapper Typing Information Should Be Changed
#33997 commented on
Jun 17, 2025 • 0 new comments - [core][gpu-objects] Allow tensor metadata to be specified ahead of time for improved performance
#51279 commented on
Jun 17, 2025 • 0 new comments - [<Ray component: Core|RLlib|etc...>] SAC config error about framework
#53694 commented on
Jun 17, 2025 • 0 new comments - [Ray Collective] Ray Collective AllGather is Completely Broken
#31259 commented on
Jun 17, 2025 • 0 new comments - CI test linux://rllib:learning_tests_cartpole_dqn_gpu is flaky
#46683 commented on
Jun 16, 2025 • 0 new comments - [Serve] RayServe Pods Stuck in Unready State Causing API Outages
#53323 commented on
Jun 16, 2025 • 0 new comments - [Serve] Support generics for DeploymentHandle type hints
#52654 commented on
Jun 16, 2025 • 0 new comments - [Ray Compiled Graph] NCCL Internal Error
#49827 commented on
Jun 16, 2025 • 0 new comments - [Data] Get Dataset size from DataIterator
#37634 commented on
Jun 15, 2025 • 0 new comments - Global Per-Epoch Shuffling with TorchTrainer
#47460 commented on
Jun 13, 2025 • 0 new comments - [serve][dashboard] Show last line instead of first line in Serve app status message
#35600 commented on
Jun 23, 2025 • 0 new comments - [core][gpu-objects] Support streaming to overlap computation / communication
#51643 commented on
Jun 23, 2025 • 0 new comments - [Core] BUG: Cluster crashes when using temp_dir "could not connect to socket" raylet.x [since 2.7+]
#44431 commented on
Jun 20, 2025 • 0 new comments - [data] Bad error message when function outputs cannot be pickled
#46642 commented on
Jun 19, 2025 • 0 new comments - [data] ObjectRefs passed to map UDF are not automatically deref'ed
#49207 commented on
Jun 19, 2025 • 0 new comments - [data] Optimize Dataset.unique()
#38764 commented on
Jun 19, 2025 • 0 new comments - [RayData] The write operator supports the use of an actor pool
#53552 commented on
Jun 19, 2025 • 0 new comments - [Autoscaler] Improve NodeProvider interface, make it easier to extend it to cluster managers (e.g. Fargate)
#25134 commented on
Jun 19, 2025 • 0 new comments - [RLlib] Observation space with 2 dimensions not working with the new API stack
#46631 commented on
Jun 19, 2025 • 0 new comments - [Ray Client] - Client server failed with runtime_env container
#29852 commented on
Jun 19, 2025 • 0 new comments - [Ray Train] XGBoostTrainer crashes with ActorDiedError when using num_workers > 1 and use_gpu=False
#53123 commented on
Jun 19, 2025 • 0 new comments - [Serve] Allow HTTPs Options in Ray Serve
#26814 commented on
Jun 19, 2025 • 0 new comments - [Core] Make Ray Core tasks/actors metrics counters (accumulators)
#47522 commented on
Jun 18, 2025 • 0 new comments - [RLlib]
#52683 commented on
Jun 18, 2025 • 0 new comments - [Dashboard] Support ncu
#53759 commented on
Jun 18, 2025 • 0 new comments - Incorrect default value of CUBLAS_WORKSPACE_CONFIG
#47690 commented on
Jun 18, 2025 • 0 new comments - [Serve] make various default values of `AutoscalingConfig.max_replicas` consistent and >1
#50222 commented on
Jun 18, 2025 • 0 new comments - CI test linux://rllib:learning_tests_stateless_cartpole_appo_gpu is flaky
#47295 commented on
Jun 18, 2025 • 0 new comments - [core][compiled graph] Support all-to-one collective ops (e.g. reduce)
#49324 commented on
Jun 18, 2025 • 0 new comments - [autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml
#14551 commented on
Jun 18, 2025 • 0 new comments - [Data] RayData driver process crashes when some worker(pod) been preempted
#52815 commented on
Jul 10, 2025 • 0 new comments - [Serve] change the metric tag for the proxy metrics to `route_prefix` for clarity
#52212 commented on
Jul 10, 2025 • 0 new comments - [ray.serve.llm] serve.llm with streaming has overhead compared to vllm-v0 for a single replica when concurrency > 32
#52746 commented on
Jul 10, 2025 • 0 new comments - refactor serve constants to have a utils
#51036 commented on
Jul 10, 2025 • 0 new comments - Ray serve + core streaming is slow at high concurrency
#52745 commented on
Jul 10, 2025 • 0 new comments - Running Multiple Applications in Different Containers stuck in status=DEPLOYING
#49540 commented on
Jul 10, 2025 • 0 new comments - [Dashboard] Display worker utilization & task list instead of process name.
#14175 commented on
Jul 10, 2025 • 0 new comments - Improve error messages for serializing/deserializing remote functions and actor classes.
#5618 commented on
Jul 10, 2025 • 0 new comments - Give a better error message when starting ray on a machine with little memory
#6172 commented on
Jul 10, 2025 • 0 new comments - [Core][State Observability] Grouping state APIs when `ray --help` is called.
#26376 commented on
Jul 10, 2025 • 0 new comments - [Dashboard] Explain Disk usage for KubeRay
#36362 commented on
Jul 10, 2025 • 0 new comments - Raise helpful error message when `ImportError: cannot import name '_psutil_osx`
#28903 commented on
Jul 10, 2025 • 0 new comments - [Umbrella][core] Add context information for all C++ logs
#52314 commented on
Jul 10, 2025 • 0 new comments - [core][dashboard] Use state API directly for actor name
#34479 commented on
Jul 10, 2025 • 0 new comments - [Dashboard] Make it easier to figure out the PID of running tasks.
#49988 commented on
Jul 10, 2025 • 0 new comments - [Dashboard] Hide GPU and GRAM columns from clusters and actors table if there are 0 rows with GPUs.
#49989 commented on
Jul 10, 2025 • 0 new comments - [Ray dashboard] Random character in ray log viewer
#52346 commented on
Jul 10, 2025 • 0 new comments - [Serve] Calls to a Serve Deployment's .remote(), hang after some amount of time / requests.
#47870 commented on
Jul 10, 2025 • 0 new comments - [Ray debugger] Unable to use debugger on slurm cluster
#51157 commented on
Jul 10, 2025 • 0 new comments - [Core] Ray Label Selector API Implementation Tracker
#51564 commented on
Jul 9, 2025 • 0 new comments - tune.uniform/quniform give the error TypeError: UniformFloatHyperparameter.__init__() got an unexpected keyword argument 'q'
#46995 commented on
Jul 9, 2025 • 0 new comments - [Serve] reason_content is null returned by llm serve
#53324 commented on
Jul 10, 2025 • 0 new comments - [ray|llm] ray lora DiskMultiplexConfig loss load from local path to disk_cache
#53315 commented on
Jul 10, 2025 • 0 new comments - [Serve][llm] Make Serve LLM endpoint 100% compatible with the engine's native server.
#53533 commented on
Jul 10, 2025 • 0 new comments - [Serve.llm] Clean up output logs and give option to opt out of different verbosity levels
#53492 commented on
Jul 10, 2025 • 0 new comments - [LLM] We need to create a more robust way of handling actor shutdown
#53179 commented on
Jul 10, 2025 • 0 new comments - Ray build_openai_app Vs Vllm Serve
#52934 commented on
Jul 10, 2025 • 0 new comments - [LLM/Data] lazy import for transformers
#52632 commented on
Jul 10, 2025 • 0 new comments - [LLM] In-place update for deployments when you have new models without having re-deploy the cluster
#51891 commented on
Jul 10, 2025 • 0 new comments - [serve.llm][Feature request] Adding new models to a multi-gpu multi-model service would require the duplication of all the resources
#51720 commented on
Jul 10, 2025 • 0 new comments - [llm] Roadmap for Data and Serve LLM APIs
#51313 commented on
Jul 10, 2025 • 0 new comments - CI test linux://rllib:learning_tests_cartpole_dqn_multi_cpu is flaky
#47214 commented on
Jul 10, 2025 • 0 new comments - [Serve.llm] vLLMDeployment throughput doesn't scale well with `n_replicas`.
#53356 commented on
Jul 10, 2025 • 0 new comments - [Data]Fuse operator
#49587 commented on
Jul 10, 2025 • 0 new comments - [Data]Extend Ray Data with read/write hive
#51094 commented on
Jul 10, 2025 • 0 new comments - [Data] Filter operation changes schema of dataset
#51217 commented on
Jul 10, 2025 • 0 new comments - [Core] Support general Arrow ExtensionTypes
#51959 commented on
Jul 10, 2025 • 0 new comments - [RFC] [Serve] Custom Scaling
#41135 commented on
Jul 10, 2025 • 0 new comments - [RayLLM] error helper for TypeError: _extractNVMLErrorsAsClasses..gen_new..new() takes 1 positional argument but 2 were given
#53407 commented on
Jul 10, 2025 • 0 new comments - [Serve] Ray Serve Autoscaling supports the configuration of custom-metrics and policy
#51632 commented on
Jul 10, 2025 • 0 new comments - [serve] Architecture docs mention round-robin and not pow-of-two scheduler
#49292 commented on
Jul 10, 2025 • 0 new comments - [Clusters][Azure] Custom ARM template for Azure Clusters
#50684 commented on
Jul 9, 2025 • 0 new comments - [Serve] Call serve.run with a config file path
#41359 commented on
Jul 7, 2025 • 0 new comments - [RLlib] `TorchMultiCategorical.to_deterministic()` cannot handle Multi-agent + LSTM case
#52177 commented on
Jul 7, 2025 • 0 new comments - TypeError: Descriptors cannot not be created directly.
#36417 commented on
Jul 7, 2025 • 0 new comments - [core] ray.init() not possible even while on same network as Ray Cluster.
#53520 commented on
Jul 7, 2025 • 0 new comments - [Core] Warning message output as error cannot be filtered/hidden; unexposed environmental variable
#43264 commented on
Jul 7, 2025 • 0 new comments - [RLlib] PPO algorithm can't be trained from checkpoint
#50136 commented on
Jul 4, 2025 • 0 new comments - [Core] ray.init() hangs using Python 3.10.15 on Linux
#48625 commented on
Jul 3, 2025 • 0 new comments - [Feature] Support GCS fault tolerance without external dependencies like Redis
#45824 commented on
Jul 3, 2025 • 0 new comments - Ray Clusters - updating cluster with new docker image should actually use new image
#51448 commented on
Jul 2, 2025 • 0 new comments - Upgrade `pytorch_lightning` to `lightning` in all CI/release tests/examples
#38200 commented on
Jul 2, 2025 • 0 new comments - [Dashboard] Support PyTorch memory usage visualizations
#39878 commented on
Jul 2, 2025 • 0 new comments - [Train] Support custom checkpoint file names
#20807 commented on
Jul 2, 2025 • 0 new comments - [Train] Disable autofilled metrics
#21988 commented on
Jul 2, 2025 • 0 new comments - Python: Pure Keras Callback for ray.air.integrations.keras
#47603 commented on
Jul 2, 2025 • 0 new comments - [<Ray component: Ray Train] Non blocking reporting of the checkpoint to maximize the GPU utilization
#48801 commented on
Jul 2, 2025 • 0 new comments - [Train] Allow customization of FPS for wandb logger; instead of slow 4 FPS
#50186 commented on
Jul 2, 2025 • 0 new comments - [Train] Add support for NeMo Megatron strategy with lightning
#51387 commented on
Jul 2, 2025 • 0 new comments - [Checkpoint: AIR] Saved checkpoints folders does not include correct training iteration number.
#29458 commented on
Jul 2, 2025 • 0 new comments - [train][python3.11] Ray train distributed example error
#31359 commented on
Jul 2, 2025 • 0 new comments - [Train] TorchTrainer does not free all GPUs on shutdown
#32725 commented on
Jul 2, 2025 • 0 new comments - [data] Zero-sized blocks crashes write_bigquery
#51892 commented on
Jul 9, 2025 • 0 new comments - [<Ray component: Data>] lack of check for empty table produce lots of error messages
#53605 commented on
Jul 9, 2025 • 0 new comments - [Core][Streaming generator] Support num_returns.
#46934 commented on
Jul 8, 2025 • 0 new comments - Core: Ray 2.45 causes Google's LIBTPU to be very spammy
#53756 commented on
Jul 8, 2025 • 0 new comments - [Serve] Make replica scheduler backoff configurable
#52871 commented on
Jul 8, 2025 • 0 new comments - [Core] cannot pass namespace package at runtime via py_modules
#50161 commented on
Jul 8, 2025 • 0 new comments - [core][experimental] Accelerated DAG should execute work on actor's main thread
#46336 commented on
Jul 8, 2025 • 0 new comments - [Core]Ray head crashed silently - improve observability for redis timeouts causing said crash
#47419 commented on
Jul 8, 2025 • 0 new comments - [Core][StreamingGenerator] `ray.get` will hang when the node on which the streaming task is running fails.
#47582 commented on
Jul 8, 2025 • 0 new comments - [Core] `ray job submit` doesn't always catch the last lines of the job logs
#48701 commented on
Jul 8, 2025 • 0 new comments - [core][release] Need to update azure-cli-core version to update paramiko
#48733 commented on
Jul 8, 2025 • 0 new comments - [Core] Make sure Actor's `__del__` method invoked on Actor's destruction
#53169 commented on
Jul 8, 2025 • 0 new comments - [Serve] Serve-native CPU profiling in Replicas is broken
#53677 commented on
Jul 8, 2025 • 0 new comments - [Core] Submitted containerized job is stuck in pending mode
#37293 commented on
Jul 8, 2025 • 0 new comments - [Ray Train] FileNotFoundError '/tmp/ray/sessio_xxxx/xxxx/.tmp_generator'
#51020 commented on
Jul 7, 2025 • 0 new comments - [Autoscaler, data] Ray starts `AutoscalingRequester` even when using `enableInTreeAutoscaling`
#51559 commented on
Jul 7, 2025 • 0 new comments - Release test ray-data-resnet50-ingest-out-of-memory-benchmark.aws failed
#52562 commented on
Jul 7, 2025 • 0 new comments - Test issue (please ignore) - more text then even more text
#49867 commented on
Jul 7, 2025 • 0 new comments - [core] ReferenceCountingAssertionError may be thrown if ObjectRef is passed through intermediate worker that dies
#18456 commented on
Jul 7, 2025 • 0 new comments - [Serve] DeploymentResponse._to_object_ref() blocks until final results from actor
#46893 commented on
Jul 7, 2025 • 0 new comments