But the code works on AArch64, how can it fail on x86_64?!
I think the fix for the bug just happened to work on x86_64 and isn't guaranteed to work on every architecture. In particular, notice that if you had made the second call to getEnd sequentially consistent instead of setBase, the generated assembly would have been unchanged and the bug would have remained, even though the explanation makes it sound as if that would have fixed it too.

The problem is that making one operation sequentially consistent does not constrain operations on other variables that are not themselves sequentially consistent. The ordering you want only holds when both operations involved are sequentially consistent.

I think the best fix is simply to make both of those functions sequentially consistent. Then the C++ standard guarantees that they stay in order, even on compilers and architectures you haven't thought of, and it shouldn't make your code any slower, since the generated assembly will still be what it is now.
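To make the "both sequentially consistent" suggestion concrete, here is a minimal sketch. I don't know what your class actually looks like, so the Range struct, its two fields, and the setEnd/getBase helpers are all hypothetical; I'm only assuming that setBase is an atomic store and getEnd is an atomic load. The test is the classic store-then-load pattern: each thread writes one variable and then reads the other.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the class in the question; the real fields and
// signatures are not shown there, so everything here is an assumption.
struct Range {
    std::atomic<std::uintptr_t> base{0};
    std::atomic<std::uintptr_t> end{0};

    // Both the store and the load are seq_cst, so they take part in the
    // single total order the standard defines for seq_cst operations.
    void setBase(std::uintptr_t b) { base.store(b, std::memory_order_seq_cst); }
    std::uintptr_t getEnd() const { return end.load(std::memory_order_seq_cst); }

    // Mirror-image helpers, used only by the second thread of the test.
    void setEnd(std::uintptr_t e) { end.store(e, std::memory_order_seq_cst); }
    std::uintptr_t getBase() const { return base.load(std::memory_order_seq_cst); }
};

// Thread A stores base and then loads end; thread B stores end and then
// loads base. With all four operations seq_cst, at least one thread must
// observe the other's store.
bool atLeastOneSawTheOther() {
    Range r;
    std::uintptr_t seenEnd = 0;
    std::uintptr_t seenBase = 0;
    std::thread a([&] { r.setBase(1); seenEnd = r.getEnd(); });
    std::thread b([&] { r.setEnd(1); seenBase = r.getBase(); });
    a.join();
    b.join();
    return seenEnd == 1 || seenBase == 1;
}

// Repeats the test and counts runs in which neither thread saw the other.
int countForbiddenOutcomes(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (!atLeastOneSawTheOther()) ++forbidden;
    }
    return forbidden;
}
```

If you weaken any one of the four operations (say, make the loads memory_order_acquire), the standard no longer rules out the run where both threads read 0, even though a given compiler and CPU may never show it to you. That is the sense in which a fix can "happen to work" on one architecture.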