project-codeflare/multi-cluster-app-dispatcher (Public)


Add `go vet` to the Makefile and fix its errors #691


Open


ronensc wants to merge 2 commits into project-codeflare:main from ronensc:go-vet


Issue link

What changes have been made

- Added `go vet` to the Makefile
- Fixed its errors
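The PR description does not list which `go vet` findings were fixed. Purely as a hypothetical illustration of the kind of issue `go vet` reports, its printf check flags a format verb that does not match the type of its argument; the sketch below shows the corrected form. `describeCapacity` is an invented helper, not code from this repository.

```go
package main

import "fmt"

// describeCapacity is a hypothetical helper used only to illustrate a go vet fix.
// Writing it as fmt.Sprintf("cpu %s", milliCPU) would be reported by go vet's
// printf check, because the %s verb does not match an int64 argument.
func describeCapacity(milliCPU int64) string {
	// %d matches the int64 argument, so the printf check has nothing to report.
	return fmt.Sprintf("cpu %dm", milliCPU)
}

func main() {
	fmt.Println(describeCapacity(5950))
}
```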

Verification steps

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- Testing is not required for this change

ronensc added 2 commits on November 15, 2023 12:35

- Run "go vet" on build (baf2378)
- Fix go vet errors (7683cd4)

openshift-ci bot requested review from metalcycling and tardieu on November 15, 2023 10:43

openshift-ci bot commented Nov 15, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign asm582 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing/approve in a comment
Approvers can cancel approval by writing/approve cancel in a comment

Author

ronensc commented Nov 28, 2023 (edited)

@anishasthana I think MCAD-CI is flaky; I don't think its failure is related to the changes in this PR. Could you please try rerunning it?

Author

ronensc commented Nov 28, 2023

Failed again. Same failing test "MCAD CPU Preemption Test"... 😢
https://github.com/project-codeflare/multi-cluster-app-dispatcher/actions/runs/6876210215/job/19098434211#step:8:722

It seems it is failing in other PRs as well.
https://github.com/project-codeflare/multi-cluster-app-dispatcher/actions/runs/7018959159/job/19095405569#step:8:726

https://github.com/project-codeflare/multi-cluster-app-dispatcher/actions/runs/7003208146/job/19048461449#step:8:725

Contributor

anishasthana commented Nov 29, 2023

Hmm, @asm582, are you aware of any recent issues with CI?

dgrove-oss commented Nov 29, 2023

I observed, when porting the test suite to MCAD v2, that many of the tests only work as expected if the capacity of the cluster is calculated to be about 2 CPUs. If the cluster has more capacity, the tests that expect certain AppWrappers either not to be runnable or to be preempted (because higher-priority AppWrappers will have taken all the available capacity) fail. The failing test here is one of those. Looking at the logs, I see that the controller thinks the available capacity is about 6 CPUs:

 The available capacity to dispatch appwrapper is cpu 5950.00, memory 32085303296.00, GPU 16 and time took to calculate is 8.462585ms

In the v2 test suite, I am redoing all of these tests to be based on a % of total CPU capacity instead of using hardwired resource values to avoid this fragility.
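A minimal sketch of the percentage-based approach described above, assuming the test can read the cluster's CPU capacity in millicores before creating AppWrappers. The `requestPerPod` helper and its parameters are hypothetical and are not taken from the MCAD v1 or v2 test suites.

```go
package main

import "fmt"

// requestPerPod returns a per-pod CPU request, in millicores, sized as a
// percentage of the measured capacity and split evenly across the pods.
// Hypothetical helper for illustration; integer division rounds down.
func requestPerPod(capacityMilliCPU, percent, pods int64) int64 {
	return capacityMilliCPU * percent / 100 / pods
}

func main() {
	capacity := int64(5950) // e.g. the available capacity from the CI log

	// First AppWrapper: 2 pods sized to take about 60% of capacity.
	first := 2 * requestPerPod(capacity, 60, 2)
	// Second AppWrapper: 2 pods sized to take about 50% of capacity.
	second := 2 * requestPerPod(capacity, 50, 2)

	// 60% + 50% exceeds 100%, so the second should not fit once the first runs.
	fmt.Println(first, second, capacity-first >= second)
}
```

Because every request scales with the measured capacity, fractions that sum to more than 100% should leave the second AppWrapper unable to fit in aggregate regardless of cluster size, apart from a few millicores of rounding on very small clusters.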

Member

asm582 commented Nov 29, 2023

Thank you for the PR. I am curious: if we are doing %-based CPU calculation, are we diverging from the way the Scheduler does accounting? And would that be OK?

Member

asm582 commented Nov 29, 2023

> Hmm, @asm582, are you aware of any recent issues with CI?

I usually set 2 CPUs and 8 GB RAM as the resource settings in Podman or Docker Desktop, and the tests run to completion locally. I am not sure whether we changed the resource requirements in the CI system.

dgrove-oss commented Nov 29, 2023

To be clear, in my opinion tests like this one are the fragile ones: https://github.com/project-codeflare/multi-cluster-app-dispatcher/blob/main/test/e2e/queue.go#L97-L125, because they use hardwired CPU requests rather than CPU requests calculated dynamically by the test as a fraction of the available capacity on the actual cluster when it runs.

Here it creates an AppWrapper with an 1100m CPU request (2 pods at 550m each) and then assumes that a second AppWrapper with an 854m CPU request (2 pods at 427m each; the function name is misleading) won't fit. If Docker is limited to 2 CPUs, the worker nodes have 4000m of CPU capacity, Kubernetes plus the MCAD operator take around 2100m, and the math works as expected. With the slightest change in available resources (or in the resources used by the system pods), the test no longer works as expected.
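The arithmetic in this comment can be checked with a simplified aggregate fit test. This is a hypothetical sketch that treats free capacity as a single pool of millicores and ignores per-node packing, so it is not MCAD's dispatch logic; `fitsAfter` is an invented name. The free-capacity figures are the approximate 4000m minus 2100m from this comment and the 5950m value from the controller log quoted earlier.

```go
package main

import "fmt"

// fitsAfter reports whether a second request still fits once the first
// request has been dispatched, given the free capacity in millicores.
// Hypothetical, aggregate-only check: it ignores how pods pack onto nodes.
func fitsAfter(freeMilliCPU, firstMilliCPU, secondMilliCPU int64) bool {
	return freeMilliCPU-firstMilliCPU >= secondMilliCPU
}

func main() {
	const first = 2 * 550  // 1100m: the AppWrapper the test dispatches first
	const second = 2 * 427 // 854m: the AppWrapper the test expects not to fit

	// Docker limited to 2 CPUs: about 4000m capacity minus about 2100m of system pods.
	fmt.Println(fitsAfter(4000-2100, first, second)) // false: 800m left < 854m
	// Available capacity reported by the controller in CI.
	fmt.Println(fitsAfter(5950, first, second)) // true: 4850m left >= 854m
}
```

On the larger CI runner the second AppWrapper fits, so a test that expects it to be blocked or preempted fails even though the dispatcher is behaving correctly.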

Author

ronensc commented Dec 4, 2023

@dgrove-oss thanks for the detailed explanation of the root cause of the CI failure!
I noticed that these fragile tests were temporarily commented out in MCAD v2. Should we adopt a similar approach here? Can this PR be approved even though the CI is currently failing?

ronensc mentioned this pull request on Dec 7, 2023: Skipping MCAD CPU Preemption Test #696 (Open)

Labels

None yet


4 participants
