- Notifications
You must be signed in to change notification settings - Fork924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
opt_expr: limit effort to reduce runtime on anomalous designs#4782
base:main
Are you sure you want to change the base?
Conversation
passes/opt/opt_expr.cc Outdated
sort_fails++; | ||
if (sort_fails >= effort) | ||
log("Effort of %d exceeded, no longer attempting toposort on module %s.\n", | ||
effort, log_id(module)); | ||
} | ||
for (auto cell : cells.sorted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Isn't this empty if you don't runcells.sort()
? And if that's the case, then the true cause of the speed-up isn't not bothering to sort anymore but giving up entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hm, there's alog_assert(GetSize(sorted) == GetSize(nodes));
right before the return fromsort()
so I guess it isn't but I'll verify it later
EDIT: yeah if you don't even runsort()
then it's going to be empty, yikes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Fixed by iterating over a list of references to a snapshot ofmodule->cells()
. This increases the runtime to 83 seconds, which I still prefer to 326 seconds. I should say, all runtimes I'm mentioning in this PR are measured withENABLE_DEBUG=1
for no particular reason other having that around inMakefile.conf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We could cache the sorting result provided we remove cells we removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I can't really foresee the consequences of doing that, though - for how many cycles would we cache etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
For the lifetime of oneopt_expr
invocation. It should be ok as long as we zero out the position in the sorted array corresponding to any removed cells.
I am not sure if there's a case where opt_expr replaces a cell with two or more cells, that's a slight complication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm struggling to follow the control flow here; my comments are based on the understanding thatiterator
is a lambda function, which gets called once with another (anonymous) lambda function. But if that is the case then I'm not sure whyiterator
is a lambda at all if it only gets called once. In fact, naming the inner anonymous function (the contents of the previous for loop overcells.sorted
), moving lines 494-537 after it, dropping the outer lambda and calling the new lambda directly instead of the locally definedreplace_cell
seems to give the same results, takes very slightly less time, and is more readable. Which probably also suggests that the inner lambda function could even be elevated to a non-lambda?
passes/opt/opt_expr.cc Outdated
std::vector<Cell*> module_cells = module->cells(); | ||
auto iterator = [&](auto&& replace_cell) { | ||
if (sort_fails >= effort) { | ||
// log("Running on unsorted") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Remove commented out code
// log("Running on unsorted") |
passes/opt/opt_expr.cc Outdated
cells.edge(cells.node(outbit_to_cell.at(bit)), r_index); | ||
} | ||
if (sort_fails < effort && !cells.sort()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does this line need to checksort_fails
? It's already in the else section which checkssort_fails
, so the only way thesort_fails
value could have increased would be ifcells.node
orcells.edge
could increase it and I can't find anyway that would happen. And removing it doesn't appear to affecttests/arch/ice40/memories.ys
.
if (sort_fails < effort &&!cells.sort()) { | |
if (!cells.sort()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yeah, good catch, that's redundant, this must be left over from experimenting with the pass
passes/opt/opt_expr.cc Outdated
for (auto cell : cells.sorted) | ||
std::vector<Cell*> module_cells = module->cells(); | ||
auto iterator = [&](auto&& replace_cell) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Consider renamingreplace_cell
here to avoid confusion with thereplace_cell
function defined on line 120?
@KrystalDelusion I suppose I tried to be too clever by retaining the pre-existing code ordering to make it easier to review (for myself as well). It uncouples the design traversal without reordering the responsible code blocks, forcing a refactoring into separate functions or creating a worker class that holds all the silly state. When you're saying you named the anonymous function and moved the replacement code after it, do you mean that you nested a function within |
By named I mean assigned to a variable ( |
Uh, anyway, I spent a couple hours refactoring |
Ah I see, although I think just the minimal refactor to avoid the |
Also the CI failing is fixed in latest main, just needs to rebase |
The first case of the ROM section of
tests/arch/ice40/memories.ys
takes 326 seconds to run with majority of this runtime spent sorting cells inopt_expr
. This PR allows sorting to only fail a limited number of times on a given module before reaching fixpoint without trying to sort the cells. This cuts down the test case runtime to 14 seconds. The limit can be changed with the new-effort
argument.