I mean, yes, but the issue isn’t that “<10 of a given number out of 137 total rolls did not show up on any die in 200 trials” is being misrepresented as the chance of “8 rolls of a given number in 137 total rolls”. Your expectation for “8 of 137” is right: it is really unlikely, and you would not expect to see 9 or fewer rolls of a given number in a sample of 137 rolls even if you repeated the trial hundreds of times. Feeling that 8 in 137 would be too much of an outlier for a “fair die” is a perfectly reasonable position that can be backed up (more on this below), but using 8 in 137 in the first place was the mistake.
The problem is that the test was flawed from the beginning, because it counted individual deeds as samples, which does not give a distribution anywhere close to that of a dice roll. What we are looking at is a distribution of colors, which is the distribution of tasks, and we can’t simply add up the deeds from those tasks and count each one as an individual roll; it doesn’t work like that. Each deed was part of a larger task that it cannot be separated from. While in the very long term the number of deeds earned of a given color should also approach 1 in 6, to compare our experience of color distribution to a dice roll we have to look at the color distribution of tasks.
What we actually have going on here:
- Deed tasks have appeared 27 times (correct me if I’m wrong here; I’m not keeping track specifically, I just went back and did a manual count of the tracking in this thread), so our color distribution sample size is 27
- Of those 27 samples, some give an output of 4, some 6, and some 9. This is irrelevant to our color distribution calculation.
- If we look at color distribution alone, we get the table below (again, basing it on the task count in this thread)
If we were looking at just the distribution of deeds using the (fallacious, in this case) assumption that we can do a per-deed calculation (let’s say the circumstances were different and we only ever got one deed at a time, so each deed could be taken as a sample), and we got 8 purple out of 137 when the expectation was 1 in 6 (~22.8 purple), the cumulative probability of such an event would be ~0.01%. In that case we would have ground to stand on as “evidence” with this many samples (we would also be outside the 99.9% confidence level for this sample set being consistent with the stated rate). But that isn’t what is going on here; this is not what the data is saying. We aren’t likely to see the distribution of colored deeds approach that of a “fair dice roll” until we have seen mythic tasks roll at least a couple of times for every deed color, and legendaries multiple times, which could take years.
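For anyone who wants to check the numbers, here is a minimal Python sketch of both models. It computes the binomial tail for the flawed per-deed model (8 or fewer of 137 at 1 in 6), then compares the spread of purple deeds under that model with the spread when deeds arrive in task-sized clumps. The task mix (17 epic, 7 legendary, 3 mythic) is read off the table below; the 4/6/9 deeds per epic/legendary/mythic task is an assumption, chosen because it reproduces every column’s deed total, and the sketch also assumes a task’s color is independent of its rarity.

```python
from math import comb, sqrt

def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

P = 1 / 6        # stated chance of any one color
N_DEEDS = 137    # deeds counted in the thread

# Flawed model: every deed is its own independent 1-in-6 roll.
expected = N_DEEDS * P
tail = binom_cdf(8, N_DEEDS, P)
sd_independent = sqrt(N_DEEDS * P * (1 - P))

# Task model: each task is one roll, and all of its deeds share that color.
# Task mix is read off the table; 4/6/9 deeds per rarity is an assumption.
task_sizes = [4] * 17 + [6] * 7 + [9] * 3   # epic, legendary, mythic
assert sum(task_sizes) == N_DEEDS
# Var(sum of w_i * Bernoulli(P)) = P(1-P) * sum(w_i^2); the mean is unchanged.
sd_clumped = sqrt(P * (1 - P) * sum(w * w for w in task_sizes))

print(f"expected purple deeds:          {expected:.1f}")
print(f"P(8 or fewer of 137), per deed: {tail:.4%}")
print(f"std dev, per-deed model:        {sd_independent:.1f}")
print(f"std dev, per-task model:        {sd_clumped:.1f}")
```

Under these assumptions the per-deed model has a standard deviation of about 4.4 deeds, while the clumped per-task model’s is about 10.3, so 8 purple sits roughly 1.4 standard deviations below the mean of ~22.8 instead of about 3.4: a far less remarkable result.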
The table:
| | Blue | Green | Red | Yellow | Purple | Brown |
|---|---|---|---|---|---|---|
| Epic | 3 | 2 | 4 | 4 | 2 | 2 |
| Legendary | 1 | 1 | 1 | 2 | | 2 |
| Mythic | | 2 | 1 | | | |
| Total color tasks | 4 | 5 | 6 | 6 | 2 | 4 |
| Total deeds | 18 | 32 | 31 | 28 | 8 | 20 |
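A quick way to double-check the totals rows is to rebuild them from the per-rarity counts. This sketch reads blank cells as 0 (no legendary Purple, and mythics only in Green and Red) and assumes 4, 6 and 9 deeds per epic, legendary and mythic task; those per-rarity values are an assumption, but with them every column reproduces the table’s totals.

```python
COLORS = ["Blue", "Green", "Red", "Yellow", "Purple", "Brown"]

# Task counts per color from the table; blank cells read as 0.
tasks = {
    "Epic":      [3, 2, 4, 4, 2, 2],
    "Legendary": [1, 1, 1, 2, 0, 2],
    "Mythic":    [0, 2, 1, 0, 0, 0],
}
# Assumption: deeds awarded per task of each rarity.
DEEDS_PER_TASK = {"Epic": 4, "Legendary": 6, "Mythic": 9}

# Column sums: number of tasks of each color, any rarity.
task_totals = [sum(col) for col in zip(*tasks.values())]
# Weighted column sums: deeds of each color.
deed_totals = [
    sum(DEEDS_PER_TASK[rarity] * counts[i] for rarity, counts in tasks.items())
    for i in range(len(COLORS))
]

for color, t, d in zip(COLORS, task_totals, deed_totals):
    print(f"{color:<7} tasks={t} deeds={d}")
print("all tasks:", sum(task_totals), "all deeds:", sum(deed_totals))
```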
Assuming that it is completely random, as asserted (every deed color task of a given rarity has exactly the same chance to appear as any other, so each color is equally likely on every task and the count for any one color follows a binomial distribution), the cumulative probability of landing 2 or fewer out of 27 with an individual rate of 1 in 6 is approximately 14.9% (with a ~10% chance of exactly 2), for any one given color. If you go back up to the dice roll simulator, set the number of trials to 27 and run it several times, you’ll note that a majority of runs have some number landing 2 or fewer times, and some of those times it is the “5” that gets only 2 rolls, as expected.
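The 27-task figures can be reproduced exactly with Python’s `fractions` module. The second function is a sketch of what repeated runs of the dice roll simulator are showing: the exact probability that, in 27 rolls of a fair six-sided die, at least one face comes up 2 or fewer times. It counts roll sequences with an exponential generating function rather than simulating; the function names are hypothetical, and none of this is the simulator’s own code.

```python
from fractions import Fraction
from math import comb, factorial

N_TASKS = 27
P = Fraction(1, 6)

def binom_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

at_most_2 = sum(binom_pmf(k, N_TASKS, P) for k in range(3))
exactly_2 = binom_pmf(2, N_TASKS, P)

def prob_every_face_at_least(m: int, n: int, faces: int = 6) -> Fraction:
    """Exact P(every face shows up at least m times in n fair rolls).

    Favourable sequences = n! * [x^n] (sum_{k>=m} x^k / k!)^faces, the
    exponential generating function for per-face counts of at least m.
    """
    one_face = [Fraction(0)] * m + [Fraction(1, factorial(k)) for k in range(m, n + 1)]
    poly = [Fraction(1)] + [Fraction(0)] * n  # start from the polynomial 1
    for _ in range(faces):
        # Multiply by one_face, keeping only terms up to x^n.
        poly = [sum(poly[i] * one_face[d - i] for i in range(d + 1)) for d in range(n + 1)]
    return poly[n] * factorial(n) / Fraction(faces) ** n

some_face_low = 1 - prob_every_face_at_least(3, N_TASKS)

print(f"P(2 or fewer of 27, one color): {float(at_most_2):.1%}")
print(f"P(exactly 2 of 27, one color):  {float(exactly_2):.1%}")
print(f"P(some face 2 or fewer in 27):  {float(some_face_low):.1%}")
```

The last probability comes out well above one half, which is why some face (sometimes the “5”) being stuck at 2 or fewer is the normal case for a fair die over 27 rolls, not the exception.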
Note that this does not definitively prove that deeds are completely random, or really anything. The results are simply inconclusive. If we had access to more sample sets, it would only take a few sets of the size we have to see whether we are skewed away from purple. A die that rolled a “5” only twice out of 27 times isn’t all that uncommon, but 8 or fewer out of 108 is far more suspicious (and multiple sets would show very quickly whether there is a specific trend on colors). I’m not sure how confident we need to be for it to count as proof worth a simple check, but at the rate we are going, getting, say, 4x as many samples (108) would at least put a result like that outside the 95% confidence level. It would take over a year to get there at this point, though, by which time it would be too late to matter (and I’m not sure we’d even get any action from that).
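To put rough numbers on “a few sets of the size that we have”, this sketch assumes, hypothetically, that every further batch of 27 tasks repeats the same 2-in-27 purple rate, and computes the one-sided binomial tail for purple at 27, 54, 81 and 108 tasks.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

P = 1 / 6
tails = {}
# Hypothetical: every new batch of 27 tasks repeats the 2-purple-in-27 rate.
for batches in range(1, 5):
    n, k = 27 * batches, 2 * batches
    tails[n] = binom_cdf(k, n, P)
    verdict = "outside" if tails[n] < 0.05 else "inside"
    print(f"{k} or fewer purple of {n} tasks: {tails[n]:.2%} ({verdict} the 95% level)")
```

Under that optimistic assumption the tail already drops below 5% by 54 tasks and below 1% by 108. This is a one-sided test on purple alone, though; since purple was singled out after it came up short, a stricter cut-off would be fairer.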
That doesn’t mean there isn’t a problem here. The problem, as I see it, is that it is literally impossible for us to get any evidence that might qualify as actionable before it is too late to matter, especially with writs coming in. This is compounded by knowing that they don’t sample the output of any RNG or RNG-plus-calculation process until there are enough player complaints, with enough “evidence”, that it is impossible to ignore. This was the underlying cause of the recent trust issue (and many in the past), and it is why, when we get a response clearly stating “The deeds that show up are completely random. They are not skewed towards any color”, it gets read as “As far as I personally know, and I may or may not have checked with the people who actually implemented it, we intend for the deeds that show up to be completely random, but we never actually checked whether they are”. Because we know they don’t sample the output of any process that involves RNG, any confident assertion that things are working correctly rests only on the absence of overwhelming evidence otherwise. It is an assumption that everything works correctly, which is kind of an untenable position when things often don’t, and it is why the faintest whiff of something being out of whack is now easier to latch on to than people trying to explain why it isn’t significant evidence.
I don’t know the solution for the trust issue other than to earn it back over time, but sampling anything that uses RNG, with or without a calculation, should be part of the QA process for everything. Whether or not players are saying it is wrong, just, like, automatically sample the output as part of QA to verify that the input plus the process gives you the desired result. That’s all I’ve been trying to say. This probably won’t help with trust in the short term, but it might in the long term if it leads to fewer errors. Plus there’s the bonus of, you know, potentially catching errors before they become problems.
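As for what “sample the output as part of QA” could look like in practice, here is a minimal sketch using a chi-square goodness-of-fit test, one standard way to check that a categorical roll matches its intended weights. `roll_deed_color`, `skewed_roll` and `qa_check` are hypothetical stand-ins, not anything from the game’s code; a real check would call the actual roll logic. The threshold 20.515 is the textbook upper 0.1% critical value of chi-square with 5 degrees of freedom (six colors minus one).

```python
import random
from collections import Counter

COLORS = ["Blue", "Green", "Red", "Yellow", "Purple", "Brown"]
# Upper 0.1% critical value of chi-square with 5 degrees of freedom.
CHI2_CRIT_DF5_P001 = 20.515

def roll_deed_color(rng: random.Random) -> str:
    """Hypothetical stand-in for the real deed-task color roll (fair)."""
    return rng.choice(COLORS)

def skewed_roll(rng: random.Random) -> str:
    """Hypothetical broken roll that under-weights Purple by 20%."""
    return rng.choices(COLORS, weights=[1, 1, 1, 1, 0.8, 1])[0]

def chi_square_uniform(samples: list[str]) -> float:
    """Chi-square statistic of observed color counts against a uniform split."""
    counts = Counter(samples)
    expected = len(samples) / len(COLORS)
    return sum((counts[c] - expected) ** 2 / expected for c in COLORS)

def qa_check(roll, n: int = 60_000, seed: int = 12345) -> tuple[float, bool]:
    """Sample the roll n times; pass if the statistic is below the critical value."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    stat = chi_square_uniform([roll(rng) for _ in range(n)])
    return stat, stat < CHI2_CRIT_DF5_P001

for name, roll in [("fair", roll_deed_color), ("skewed", skewed_roll)]:
    stat, passed = qa_check(roll)
    print(f"{name}: chi2={stat:.1f} -> {'pass' if passed else 'FAIL'}")
```

With tens of thousands of samples, even the modest under-weighting in `skewed_roll` fails by a wide margin, while a fair roll passes about 99.9% of the time. That is a sample size players could never collect by hand, but an automated test can generate it in moments.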
As it stands, even if something is broken here, nobody will care anymore by the time we could potentially see it, because “things worked out in the end” with writs. Which means it can continue to be “not worth the time” to properly sample outputs, and to let some people be mad every once in a while “because they don’t understand probability”, because that costs less and will blow over, even, apparently, when it turns out things were totally wrong and needed to be fixed on more than one occasion. Even if there is basically no evidence of a problem here, this larger issue bothers me, because the processes need to change or we will run into another “x troop was not in chests” or “portal rates wrong” somewhere down the road. They should have already done the sampling by this point to make sure the system works as intended, not “because we only got 8 purple deeds so far”.
I’m going to stand by “this was a bad way to distribute deeds in the first place”, though, because it involves a lot of watching and waiting for RNG to do its thing and not a lot of proactive gameplay, and that is what is causing a lot of the frustration with the system now. Hitting a 2 of 27 somewhere and having “clumpy” break point upgrades was likely to happen just because of how the system was implemented.
tl;dr: We can’t tell whether deeds are “completely random” from this. There is basically no evidence here to suggest that they aren’t, and it would be basically impossible for us to collect enough evidence to “prove” they aren’t; even if we did, it would take so long that it wouldn’t matter. The devs say it is random but wouldn’t have explicit knowledge if it were bugged somehow, because they don’t sample. I wish they would sample all outputs regardless, so they would know if something was bugged and could fix it before it reaches us, and then we could eventually have more trust in confident assertions about how things work and that they work correctly.
Edit: As for only purple deeds showing up in flash offers so far, well, there are at least two highly plausible explanations (and at least one less plausible one) for this. Instead of me stating what those are, someone should ask this question during the next QA Stream and see what the official word is (“Why have all deed flash offers so far been for purple deeds?”), and let each individual decide from there whether to believe it.