Reminds me of race conditions in programming.
The bug shows up 1 in 15 times and you can’t figure out why, until it turns out it only breaks when 2 asynchronous events happen to land within 10ms of each other.
Could be some super specific timing on one of the steps, where a tiny difference in timing doesn’t seem meaningful but actually is.
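For anyone curious what that looks like in code, here’s a minimal Python sketch of a classic "lost update" race. The two "events" are two threads doing an unlocked read-modify-write on a shared counter. The `threading.Barrier` is an artificial stand-in for the 10ms window: it forces both reads to happen before either write, so the bug shows up every time instead of 1 in 15. Names like `handle_event` and `run_two_events` are made up for illustration.

```python
import threading

counter = 0  # shared state touched by both "events"

def handle_event(barrier):
    """Unlocked read-modify-write: the classic race."""
    global counter
    seen = counter      # read
    barrier.wait()      # stand-in for the timing window: wait until the other event has also read
    counter = seen + 1  # write, possibly clobbering the other event's update

def run_two_events(overlap):
    global counter
    counter = 0
    if overlap:
        # Both events land inside the window: each reads before either writes.
        barrier = threading.Barrier(2)
        threads = [threading.Thread(target=handle_event, args=(barrier,)) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        # Events far apart: the second only starts after the first has finished.
        for _ in range(2):
            t = threading.Thread(target=handle_event, args=(threading.Barrier(1),))
            t.start()
            t.join()
    return counter

print(run_two_events(overlap=False))  # 2, what you'd expect
print(run_two_events(overlap=True))   # 1, one update silently lost
```

Same code, same inputs, different result depending purely on timing. In real code the usual fix is to make the read-modify-write atomic, e.g. by holding a `threading.Lock` across it (without the artificial barrier).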
Well, the good thing is they were able to reproduce it themselves, so they won’t drive themselves completely crazy trying to narrow it down (assuming it’s not a scam).
I wonder how much of that you could automate to remove as much of the human factor as possible and get down to exact timings if needed.
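On the automation idea: if you can script the steps and control the delay between them, a binary search over that delay can find the exact breaking point without a human fumbling the timing. Rough Python sketch below. `system_breaks` is a hypothetical stand-in for whatever is actually being tested, and the 10ms cutoff is just the made-up number from earlier. It also assumes the failure is monotonic in the gap (always breaks below some threshold, never above it), which a real system may not satisfy.

```python
def system_breaks(gap_ms: int) -> bool:
    """Hypothetical stand-in for the thing under test.
    Pretend it breaks when the two events land less than 10ms apart."""
    return gap_ms < 10

def breaks_in_any_trial(gap_ms: int, trials: int = 20) -> bool:
    # Real timing bugs are flaky, so try each gap several times.
    return any(system_breaks(gap_ms) for _ in range(trials))

def find_threshold(breaks, lo: int, hi: int) -> int:
    """Binary-search the smallest gap (in ms) that does NOT break.
    Assumes breaks(lo) is True, breaks(hi) is False, and behavior is monotonic."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if breaks(mid):
            lo = mid  # still broken: threshold is above mid
        else:
            hi = mid  # works: threshold is at or below mid
    return hi

print(find_threshold(breaks_in_any_trial, 0, 1000))  # 10
```

Binary search takes about 10 steps to cover 0 to 1000ms, versus 1000 steps for a plain sweep, which matters when every trial involves real-world setup.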