[issue-858] GPU implementation of 911 vertices #880
Draft
NicolasJPosey wants to merge 47 commits into PoseyDevelopment from issue-858-911vertices-gpu-implementation
Conversation
Adds support for functions that take two uint64_t arguments so that loadEpochInputs can be registered and called from the OperationManager class.
…-911vertices-gpu-implementation
Added a data member for storing the total number of events that are read into the InputManager. This allows us to define vector capacities based on the number of events being simulated.
AllVertices now has a non-virtual loadEpochInputs method. This calls two virtual methods, one for loading the epoch inputs and the other for copying the inputs to the GPU. The default behavior for both is to do nothing.
…od instead of a connections method
This method makes more sense as a behavior of vertices, since the behavior also needs to run on the GPU.
…-911vertices-gpu-implementation
We need a dynamically sized array, so we use a vector instead of an array. But we want the implementation to be easily mirrored on the GPU, so we interact with the vector as we would with an array.
…ent call
The push_back call was not easy to mirror on the GPU. We already have examples of using the EventBuffer and its insertEvent call on the GPU, so we changed to that implementation. This also makes the required buffer size explicit, again helping the mirrored GPU implementation.
This allows us to remove the resize calls which we don't want to do on the GPU. Also added a DoubleEventBuffer to use in place of RecordableVector<double>.
…-911vertices-gpu-implementation
The correct pattern is to first copy the device pointers to the CPU and then the values into the CPU data members. It happens that a uint64_t and a uint64_t pointer are the same size, so the bug went unnoticed. However, if this pattern is repeated for a type like float, an illegal memory error is thrown.
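A sketch of the two-step copy-back pattern, with hypothetical struct and member names (the real classes and fields differ):

```cuda
#include <cuda_runtime.h>

// Assumption: a device-resident struct holds device pointers to the data.
struct VertexDeviceProps {
   float *summationMap_;   // device pointer stored inside a device struct
};

void copyFromDevice(VertexDeviceProps *devProps, float *hostData, int numVertices)
{
   VertexDeviceProps props;
   // Step 1: copy the struct (which holds device pointers) to the CPU.
   cudaMemcpy(&props, devProps, sizeof(VertexDeviceProps),
              cudaMemcpyDeviceToHost);
   // Step 2: props.summationMap_ is now a usable device pointer; copy
   // the actual element values, sized by element count, not pointer size.
   cudaMemcpy(hostData, props.summationMap_, numVertices * sizeof(float),
              cudaMemcpyDeviceToHost);
   // Skipping step 1 only "worked" for uint64_t because
   // sizeof(uint64_t) == sizeof(void*) on 64-bit platforms; for float
   // the sizes differ and the bad copy faults.
}
```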
The GPU noise array only works when numVertices is at least 100 and a multiple of 100. Otherwise, an invalid kernel configuration error is thrown, which masks other possible errors.
Having asserts in kernels can cause them to fail silently. Using print statements and returning is a better way to fail inside kernels.
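A sketch of the print-and-return style, with a hypothetical kernel name and check:

```cuda
#include <cstdio>

// Instead of assert(), which can abort the launch with little context,
// print the failure and bail out of the offending thread so the message
// is visible on the host console.
__global__ void advanceVerticesKernel(const double *inputs, int numVertices)
{
   int idx = blockIdx.x * blockDim.x + threadIdx.x;
   if (idx >= numVertices)
      return;

   if (inputs == nullptr) {
      printf("advanceVerticesKernel: null inputs (thread %d)\n", idx);
      return;   // fail loudly for this thread; the rest of the launch continues
   }
   // ... per-vertex work ...
}
```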
Clean up of commented out code, unnecessary extra variables, and unused methods.
The update to vertex creation that gives each vertex the same-sized data member for the GPU meant that we would never get a dropped call due to large queue sizes. The logic was changed so that we interact with vertex queues in PSAPs and RESPs as if the size were equal to the number of trunks, which was the size in the original implementation.
RecordableVectors are cleared after each epoch if they are the dynamic type; the size is not reset. We need subscript-operator access for droppedCalls, so the type must be constant, which does not clear the vector after each epoch.
The current implementation for generating noise on a device makes assumptions that break for the 911 GPU model. To get noise support for 911, we implemented a way for vertices to specify how many noise elements they will need. A method was then added to GPUModel that rounds the input up to the nearest multiple of 100.
Because only caller regions simulate attempted redials, we add a vector that maps the caller region vertex IDs to the noise array on the device. This lets us use the existing noise algorithm with larger graphs, since noise can only be generated for up to 10000 vertices.
If the number of trunks and servers is equal and the queue is full, capacity minus busy servers is negative. Since dstQueueSize is of type uint64_t, it can't be negative, and the unsigned subtraction wraps around. The comparison then gives a false positive that the queue is not full. The fix is to cast the size to an int so that the right comparison is done.
The call metrics account for the vast majority of the physical memory used by the GPU. By resizing each to a smaller value, we can fit larger graphs on the GPU by using more epochs with smaller steps per epoch.
Firing rate should actually be equal to 1 since we can have at most 1 call per second.
The buffer size used for a CircularBuffer is 1 more than the capacity passed into the constructor. When we construct the buffer, we pass in the number of trunks but were effectively using 1 less during the simulation.
Metrics that used totalNumberOfEvents and totalTimeSteps were using more memory than needed. These were changed to maxEventsPerEpoch and stepsPerEpoch respectively. Also changed copyTo and copyFrom in All911Edges to use heap memory to prevent stack overflows with large graphs.
The buffer inside the CircularBuffer implementation is 1 larger than the capacity set at construction. VertexQueues are CircularBuffers so we add 1 where we use the buffer size.
Fixed allocation, copyTo, and copyFrom for VertexQueues. They are CircularBuffers which internally have a buffer that is 1 more than the capacity. The sizes used were updated to be 1 more than the stepsPerEpoch to match the construction capacity.
Memory is mostly dependent on epoch duration so we decrease that parameter and increase the number of epochs parameter by the same factor. This keeps the total time steps constant but reduces memory usage. We can only have 1 call per step so the max firing rate should be 1.
Closes #
Description
Checklist (Mandatory for new features)
Testing (Mandatory for all changes)
test-medium-connected.xml: Passed
test-large-long.xml: Passed