Worth noting that our traitstone percentage has trended down since the first three weeks (finally sitting under a 30% average), and the post-patch and pre-patch samples are within each other's margin of error at a 95% confidence interval: roughly 17.5~21.5% pre-patch and roughly 20~39.5% post-patch. We don't have a great sample size post-patch, but if the two rates really are "the same," then it is far more likely that we collectively have "bad luck" now than that we collectively had "good luck" when taking the earlier samples.
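For anyone who wants to check the arithmetic, here's a minimal sketch of how those intervals come out (normal-approximation 95% CI for a proportion). The counts below are placeholders I picked so the output lands near the intervals quoted above; they are not our actual tallies, so swap in the real numbers from the tracking data.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a binomial proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)

# Placeholder counts, chosen only so the output roughly reproduces the
# intervals quoted above -- NOT the real tallies.
pre_lo, pre_hi = proportion_ci(successes=294, n=1508)  # ~19.5% of ~1500 pulls
post_lo, post_hi = proportion_ci(successes=25, n=84)   # ~29.8% of ~84 pulls

print(f"pre-patch : {pre_lo:.1%} - {pre_hi:.1%}")
print(f"post-patch: {post_lo:.1%} - {post_hi:.1%}")
print("intervals overlap:", pre_lo <= post_hi and post_lo <= pre_hi)
```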
However, I'm a bit concerned by this comment:
We are dealing with supposedly random and supposedly independent events, right? Speaking, at least, of any given day of adventure board rolls, since repeats of the same task within a day have never been observed. In that case, the accuracy of the estimate is determined by the size of the sample itself, not by the size of the data that wasn't sampled. Sure, you'd get more accurate numbers by taking more samples, but if you sampled 100 adventure board pulls when only 100 pulls had ever been done, versus sampling 598 when 800,000 had been done, the latter would give you the more accurate estimate of the rates at which tasks are supposed to appear on the adventure board. Again, assuming independent and random.
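Since that's the crux of the statistics argument, here's the point in one formula: for independent draws, the standard error of a rate estimate contains only the sample size n; the total number of pulls ever rolled (100 or 800,000) never appears. A quick sketch, assuming a 20% underlying rate purely for illustration:

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent draws.
    Note that n (sample size) is the only count in the formula -- the size
    of the full population of pulls never enters it."""
    return math.sqrt(p * (1 - p) / n)

true_rate = 0.20  # assumed underlying traitstone rate, for illustration only
for n in (100, 598):
    moe = 1.96 * standard_error(true_rate, n)
    print(f"n={n:>4}: 95% margin of error = +/-{moe:.1%}")
```

This prints roughly ±7.8% for n=100 and ±3.2% for n=598: the 598-pull sample pins down the configured rates better, no matter how many total pulls it was drawn from.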
So, at the moment, our traitstone rates are still indistinguishable from "bad luck" with any reasonable degree of certainty… but just barely. It's also worth noting that while sampling will never be "definitive," it can still be used to say things about a much larger data set. And right now it is saying "maaaaybe something changed here."
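To make the "maaaaybe" concrete: a sharper check than eyeballing CI overlap is a standard two-proportion z-test on the pooled data. Same placeholder counts as above, so don't read this verdict as gospel; with the real tallies the statistic could land on either side of the cutoff, which is exactly the "just barely" situation.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: both samples share one underlying rate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Same placeholder counts as before -- substitute the real tallies.
z = two_proportion_z(294, 1508, 25, 84)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.3f}  (|z| > 1.96 => significant at 95%)")
```

(Two barely-overlapping CIs can still produce a z near or just past 1.96, so the direct test is the one worth tracking as the post-patch sample grows.)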
If you are just saying "trust us, it's correct, and it hasn't changed"… I'll be blunt: that has been said before when it wasn't true, and each time it happens it becomes a little harder to just accept when there is any evidence to the contrary. I know you guys are busy, and I know a lot of threads about something being "wrong with the RNG" are a combination of cognitive biases and misunderstandings with no hard data attached. But being presented with actual evidence that something might be off, and then having it summarily marked "not a bug" without even checking with the dev(s) who configured the numbers, really makes me wonder why I should bother reporting anything that takes this much effort, just to get blown off without so much as a "we will keep an eye on it."
So, work with us here. Maybe you don't want to bother the team with this yet (even though it's been almost a full patch cycle, and it will probably be two more before the margin of error on the post-patch samples is low enough to show whether they converge with or diverge from the pre-patch ones, and that's assuming the ABs aren't changed again in the meantime). What level does this have to rise to before it gets checked out by someone in the know?