Handle reductions in get_insn_access_map #1009
Open
majosm wants to merge 2 commits into
I can't request review on this repo, so @kaushikcfd @inducer this is probably ready for a glance.
I should mention: this is more or less my first time doing anything nontrivial with One thing I'm not clear on is whether I should be using |
Currently, `get_insn_access_map` only passes the inames from `within_inames` in the call to `get_access_map`. As a result, if reductions are present, `get_access_map` may fail (edit: it only happens if the reduction domain is separate from the element domain; otherwise `knl.get_inames_domain()` picks up the reduction).

This causes `_compute_isinfusible_via_access_map` to return `True` whenever reductions are present (code), which subsequently prevents `get_kennedy_unweighted_fusion_candidates` from fusing loops that it potentially could.

A real-world example of this can be seen in the generated code for a DG operator here. I've annotated with `# DESCRIPTION:` what DG operations are being done in each loop in the device code. The first 3 loop nests perform face-local work on the interior faces (the `iel` loop with 5641 iterations). Since the operations are face-local, the `iel` loops should be able to be fused, but the reductions over the `idof` axes are preventing that from happening due to this issue.

This PR changes `get_insn_access_map` to `get_insn_access_maps`. It now computes separate access maps for accesses outside of reductions and for each different reduction level present, by traversing the instruction and updating the domain for reductions accordingly (AFAIK, they cannot all be unioned together due to the different spaces involved). It additionally modifies `_compute_isinfusible_via_access_map` to do the unioning after projection, once it's safe to do so.

With this change, I'm now seeing fusion of the element loops as expected: code.
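For readers unfamiliar with the approach, here is a rough toy model of the idea in plain Python. It is not the code in this PR and does not use loopy or isl. The `Access` and `Reduction` classes and the `collect_accesses` and `project_and_union` helpers are hypothetical names made up for illustration, and "projection" is crudely modeled as dropping index names rather than projecting an isl map. The sketch groups accesses by the set of enclosing reduction inames (one group per "reduction level"), then unions them only after the reduction inames have been removed, at which point every group lives in the same space.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical stand-in for an array subscript such as u[iel, idof]."""
    array: str
    indices: tuple


@dataclass(frozen=True)
class Reduction:
    """Hypothetical stand-in for a reduction over `inames` of the nodes in `body`."""
    inames: tuple
    body: tuple


def collect_accesses(nodes, enclosing=frozenset(), out=None):
    """Group accesses by the set of reduction inames that enclose them."""
    if out is None:
        out = defaultdict(set)
    for node in nodes:
        if isinstance(node, Reduction):
            # Entering a reduction extends the set of inames in scope.
            collect_accesses(node.body, enclosing | frozenset(node.inames), out)
        else:
            out[enclosing].add(node)
    return out


def project_and_union(groups, keep_inames):
    """Drop every index not in `keep_inames`, then union across all groups."""
    result = set()
    for accesses in groups.values():
        for acc in accesses:
            kept = tuple(i for i in acc.indices if i in keep_inames)
            result.add((acc.array, kept))
    return result


# Toy instruction: out[iel] = sum(idof, u[iel, idof])
insn = (
    Access("out", ("iel",)),
    Reduction(("idof",), (Access("u", ("iel", "idof")),)),
)

groups = collect_accesses(insn)
merged = project_and_union(groups, keep_inames={"iel"})
```

In this toy, `groups` has one entry keyed by the empty set (the access outside any reduction) and one keyed by `{"idof"}`; the two are only merged once `idof` has been dropped, which mirrors the "union after projection" step described above.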