
Add Cloud Logging example for Ray on GKE #50060

Open
weizhaowz wants to merge 15 commits into ray-project:master from weizhaowz:cloud-logging

Conversation

@weizhaowz commented Jan 24, 2025 (edited)

Why are these changes needed?

Add a Cloud Logging tutorial for Ray on GKE, including instructions for creating a Ray cluster and debugging information.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@weizhaowz marked this pull request as ready for review January 24, 2025 22:35
@jcotant1 added the docs (An issue or change related to documentation) label Jan 27, 2025
@@ -289,6 +289,71 @@ Finally, use a LogQL query to view logs for a specific RayCluster or RayJob, and
[ConfigLink]: https://raw.githubusercontent.com/ray-project/ray/releases/2.4.0/doc/source/cluster/kubernetes/configs/ray-cluster.log.yaml
[KubernetesDownwardAPI]: https://kubernetes.io/docs/concepts/workloads/pods/downward-api/

### Configure logging sidecar with Fluentbit on GKE
If you want to deploy your ray cluster on GKE and use Cloud Logging, you can read the following steps:\
When you create a cluster on GKE using these [instructions](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/README.md),
Contributor

I would remove this link and only refer to https://cloud.google.com/kubernetes-engine/docs/add-on/ray-on-gke/how-to/collect-view-logs-metrics, which already has steps for cluster creation.

Author

One concern is that the cluster-creation command in that doc is outdated. For example, --location is required but the command doesn't provide it, and the sample value (1.30.2-gke.1060005) for --cluster-version causes an internal error with no detailed message. Enabling Autopilot might provide a better cluster configuration for running a Ray cluster. So I use that doc as a reference for the logs query only. Please let me know if we should update the Google doc in parallel so that we can also use it as a reference for creating a cluster.

Contributor

Let's update the Google docs in parallel. In general, we should only reference `gcloud` for cluster creation examples here, not ai-on-gke (which uses Terraform).

Author

Yes, the Google docs update is ongoing.

If you don't see the logs in GCP Logs Explorer, below is some debugging information.

#### Verify Fluenbit sidecar and Daemonset
When the Ray cluster is created on GEK using the above instructions, a Fluentbit sidecar container should be ready in the Pod and collecting logs from the Ray container.
Contributor

GEK -> GKE

Author

fixed

Also, Daemonset Fluentbit pods are ready to forward the logs to Cloud Logging as well, you can use these commands to verify it.
* Get the name of the pod. You may need to modify the namespace if you've modified the terraform file in the instruction.
```shell
kubectl get pods -n ai-on-gke
Contributor

Instead of running `kubectl get pods`, can you show the section of the Pod manifest that contains the sidecar (as printed by `kubectl get pods -o yaml`)?

Author

changed
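
A rough sketch of one way to show only the sidecar's part of the Pod manifest, assuming the sidecar container is named `fluentbit` (the name passed to `-c fluentbit` elsewhere in this PR) and the Pod runs in the `default` namespace. `show_sidecar_spec` is a hypothetical helper and not part of the PR; it wraps `kubectl get pod` with a JSONPath filter so that only the matching entry of `.spec.containers` is printed:

```shell
# Hypothetical helper (not from the PR): print only the sidecar's entry
# from .spec.containers instead of the whole Pod manifest.
# Assumption: the sidecar container is named "fluentbit".
show_sidecar_spec() {
  pod_name="$1"
  kubectl get pod "$pod_name" \
    -o jsonpath='{.spec.containers[?(@.name=="fluentbit")]}'
}
```

Run it as `show_sidecar_spec <pod-name>`. Empty output would suggest that the Pod has no container with that name.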

@@ -289,6 +289,71 @@ Finally, use a LogQL query to view logs for a specific RayCluster or RayJob, and
[ConfigLink]: https://raw.githubusercontent.com/ray-project/ray/releases/2.4.0/doc/source/cluster/kubernetes/configs/ray-cluster.log.yaml
[KubernetesDownwardAPI]: https://kubernetes.io/docs/concepts/workloads/pods/downward-api/

### Configure logging sidecar with Fluentbit on GKE
If you want to deploy your Ray cluster on GKE and use Cloud Logging, you can read the following steps:
When you create a cluster on GKE using these [instructions](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/README.md),
Contributor

Suggest removing reference to ai-on-gke here for now

Author

removed

For example, if you submit a Ray job as described in the instructions, you can follow this [document](https://cloud.google.com/kubernetes-engine/docs/add-on/ray-on-gke/how-to/collect-view-logs-metrics#view_ray_logs) to read the job's logs.
If you don't see the logs in GCP Logs Explorer, below is some debugging information.

#### Verify the Fluenbit sidecar and Daemonset
Contributor

I think just "Verify the Fluenbit sidecar" is fine

Author

updated

DaemonSet Fluentbit pods should also be ready to forward the logs to Cloud Logging. You can use these commands to verify this.
* Get the name of the pod. You may need to modify the namespace if you've modified the terraform file in the instruction.
```shell
kubectl get pods -n ai-on-gke -o yaml
Contributor

Can you remove `-n ai-on-gke`? Assuming the `default` namespace in these guides is usually fine.

Author

updated

```
* Verify that a Fluentbit sidecar is present in the Pod.
```shell
kubectl get pod <pod-name> -n ai-on-gke -o go-template='{{range .spec.containers}}{{.name}}{{"\n"}}{{end}}'
Contributor

Same here, remove `-n ai-on-gke`.

Author

changed
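
For illustration, the sidecar check with the namespace flag dropped could be scripted roughly as follows. This is a sketch under assumptions: the Pod runs in the `default` namespace, the sidecar container is named `fluentbit`, and `list_containers` and `has_fluentbit_sidecar` are hypothetical helper names that don't appear in the PR. It uses a JSONPath template rather than the go-template from the diff; both print one container name per line.

```shell
# Hypothetical helpers (not from the PR). Assumptions: the sidecar
# container is named "fluentbit" and the Pod is in the default namespace.

# Print one container name per line for the given Pod.
list_containers() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
}

# Succeed (exit status 0) only if a container named exactly "fluentbit" exists.
has_fluentbit_sidecar() {
  list_containers "$1" | grep -qx 'fluentbit'
}
```

A possible use is `has_fluentbit_sidecar <pod-name> && echo "sidecar present"`.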


* Verify that the Fluentbit sidecar has collected logs from the Ray container.
```shell
kubectl logs pod <pod-name> -n ai-on-gke -c fluentbit
Contributor

Likewise, remove `-n ai-on-gke`.

Author

changed
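
One further note on this command: `kubectl logs` takes the Pod name directly (or the `pod/<pod-name>` form), so the separate `pod` word in `kubectl logs pod <pod-name>` is not treated as a resource type. A minimal sketch of the log check without the namespace flag, assuming the `default` namespace and a sidecar container named `fluentbit`; `sidecar_logs` is a hypothetical helper name, and the default of 20 lines is an arbitrary choice for illustration:

```shell
# Hypothetical helper (not from the PR): show recent sidecar logs.
# Assumption: the sidecar container is named "fluentbit".
sidecar_logs() {
  pod_name="$1"
  # kubectl logs takes the Pod name directly; --tail limits the output
  # to the last N lines (20 unless a second argument is given).
  kubectl logs "$pod_name" -c fluentbit --tail="${2:-20}"
}
```

For example, `sidecar_logs <pod-name> 50` would print the last 50 lines that the sidecar wrote.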

kubectl logs pod <pod-name> -n ai-on-gke -c fluentbit
```

* Verify that a Fluentbit DaemonSet is ready to forward logs to Cloud Logging.
Contributor

I would suggest removing the details about the Fluentbit DaemonSet. It's not important here, and it's already covered by verifying the GKE logging configuration below.

Author

removed

@kevin85421 self-assigned this Jan 29, 2025
Signed-off-by: Wei Zhao <weizhaowz@google.com>
Signed-off-by: Wei Zhao <weizhaowz@google.com>
Signed-off-by: Wei Zhao <weizhaowz@google.com>
Signed-off-by: Wei Zhao <weizhaowz@google.com>
Signed-off-by: Wei Zhao <weizhaowz@google.com>
@hainesmichaelc added the community-contribution (Contributed by the community) label Apr 4, 2025
Reviewers

@maxpumperla: awaiting requested review

@pcmoritz: awaiting requested review (code owner)

@kevin85421: awaiting requested review (code owner)

@andrewsykim: awaiting requested review

At least 1 approving review is required to merge this pull request.

Assignees

@kevin85421

Labels
community-contribution (Contributed by the community), docs (An issue or change related to documentation)
5 participants
@weizhaowz, @andrewsykim, @kevin85421, @hainesmichaelc, @jcotant1
