Gems of War RNG is currently broken

ButtStallion · November 15, 2020, 8:23pm

When I have some time I’ll try to do some battles against myself with queen bee to see if this is true. I use beetrix in my pvp t3 and it loops just fine. The other thing to note about bee is that it doesn’t have to hit the 40% to get an extra turn, it can get the extra turn on a 4/5 gem match that it creates. So sometimes what you are seeing loop is the 4/5 match, not just the 40% extra turn.

I get the point being made, but testing and proof would be required. I’ve played a very long time and I’ve seen streaks go in both directions, and I rarely feel like the AI is getting luckier than I am. When I do lose, it just stands out more in my mind than all the times I’ve won with the same luck. It goes back to the point about losses sticking out in a player’s mind more than wins. I could win 20 matches in a row, but if I lose 2 in a row it can feel like I’m being cheated, when in actuality nothing has changed in terms of gameplay; it’s just my perception.

Eika · November 15, 2020, 8:27pm

You don’t really need to do any battles, as for the human player the 40% works correctly. It’s when the CPU uses the team that it so very often loops you to death, if it manages to set her up once.

awryan · November 16, 2020, 1:18am

200 (11)

Mithran · November 16, 2020, 3:28am

We can’t take that as evidence of anything. We don’t even know the chances independently and a streak of two (posted specifically because it was an outlier and notable) is actually purely anecdotal.

Having more than one person report this in a random data set would be suspicious (even without knowing the specific mythic treasure rate, we know they are rare), as would a bigger cluster.

True randomness is inherently “streaky” to a degree that most people are just bad at intuiting. But “streakiness” is a property that can be measured; as noted, we have mathematical tests for that. One such test is a runs test, but we need to know the order of the samples to use it.
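For anyone who wants to try this on their own recorded data, here is a minimal sketch of a Wald-Wolfowitz runs test using the usual normal approximation. The function name and the way outcomes are encoded (True = the effect triggered, False = it did not, in the order observed) are my own assumptions for illustration; nothing here comes from the game's code.

```python
import math


def runs_test(outcomes):
    """Wald-Wolfowitz runs test on an ordered sequence of True/False outcomes.

    Returns (runs, z, p_value), where p_value is two-sided and based on the
    normal approximation, so it is only reasonable for larger samples.
    """
    outcomes = [bool(x) for x in outcomes]
    n1 = sum(outcomes)            # number of True outcomes
    n2 = len(outcomes) - n1       # number of False outcomes
    if n1 == 0 or n2 == 0:
        raise ValueError("both outcomes must occur at least once")

    # A run ends wherever two neighbouring outcomes differ.
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)

    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if variance <= 0:
        raise ValueError("sample too small for the normal approximation")

    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return runs, z, p_value


# Hypothetical example: a strongly clumped sequence (too few runs).
clumped = [True] * 20 + [False] * 20
print(runs_test(clumped))
```

A strongly negative z means fewer runs than expected (clumping, i.e. "streaky"); a strongly positive z means the sequence alternates too regularly. Either one is a reason to look closer, not proof of anything on its own.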

FWIW, I think any “streakiness” in game RNG severe enough to fail runs tests for randomness at a high confidence level is less likely to be a problem with the pRNG itself and far more likely to come from the way the pRNG is used in whatever calculation happens behind the scenes to produce the result we see. This has been the case for every single (proven) snafu in the past having to do with “randomness”, and there have been more than a handful.

The thing with all this is that we are recording results through a sort of fuzzy filter: we don’t see the raw numbers generated, and have to intuit what effect those raw numbers had on the result we are seeing. Intuitively, for example, if you “randomly target” between two targets you’d expect a “50% chance of either”, which would be a simple rand function (returning a number between 0 and 1, not including 1) with a check that picks one target if the output is less than 0.5 and the other if it is greater than or equal to 0.5. In reality, we have no idea how many calculations sit between the rand function and the final comparison that produces the in-game result, or even what the rand function actually returned.
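To make the “good pRNG, bad usage” point concrete, here is a purely hypothetical sketch. It is not the game's code and I am not claiming this is what any troop does; the function names and the four-target setup are assumptions for illustration. Both pickers draw from the same uniform generator, but the second one rounds instead of truncating, which quietly halves the chance of hitting the first and last target.

```python
import random
from collections import Counter


def fair_target(rng, n_targets):
    """Uniform pick: scale [0, 1) to [0, n) and truncate."""
    return int(rng.random() * n_targets)


def rounded_target(rng, n_targets):
    """Plausible-looking mistake: scale to [0, n-1) and round.

    The two end targets each only collect half an interval,
    so they are picked about half as often as the middle ones.
    """
    return round(rng.random() * (n_targets - 1))


def tally(pick, n_targets, draws, seed=0):
    """Count how often each target index is chosen over many draws."""
    rng = random.Random(seed)
    counts = Counter(pick(rng, n_targets) for _ in range(draws))
    return [counts[i] for i in range(n_targets)]


print("fair   :", tally(fair_target, 4, 120_000))
print("rounded:", tally(rounded_target, 4, 120_000))
```

From the outside, all a player would see is that some slots get hit noticeably more than others, even though the underlying generator is perfectly fine.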

If you want an example of where this is very likely the case that still currently exists, try documenting how “random” TINA-9000 targets are. The anomaly I noted a while back still exists to this day, and might extend to other troops that target like her, but it is really obvious with TINA. I won’t say what it is so people can do their own tests and draw their own conclusions (or you could go find my old post, but I recommend just checking for yourselves).

Saltypatra · November 16, 2020, 3:53am

I’ve dug up some old tests we did on the AI luck when it comes to RNG, so please feel free to review these if you’re interested. As I said, these tests are a little older, but they do show the lengths we go to, and how we look into our AI and RNG.

Dear devs, please fix your cheating AI on console

So we actually ran some tests on the AI last week, because we’re hearing a few complaints here… we simulated input from the human player, and ran 816 matches overnight on one of our console test kits. The simulation uses the AI to determine the human player’s move, but that only provides input for which Gems to switch & spells to cast… in ALL other respects the actual Gem Board can’t tell whether that input comes from a Human Player or an AI player.

For the record:
a) This was done with all cheating in the human’s favor disabled
b) We define a “lucky” drop as one of the following events (relative weighting is shown in brackets):

A Skull drops in to form a 4+ Skull match, when that skull was not previously visible on the board (5 pts)

A Gem drops in to form a 4+ match, when that gem was not previously visible on the board (3 pts)

A Skull drops in to a space where the opponent then immediately gets a 4/5-of-a-kind Skulls (4 pts)

A Gem drops in to a space where the opponent then immediately gets a 4/5-of-a-kind (2 pts)

A Skull Gem drops in to set up a Skull match for the opponent (3 pts)

Here are the results from 816 Games

LUCK SCORES (based on weightings above)

Average Player Luck Score: 24.3

Average AI Luck Score: 25.0

Difference in Luck Scores - the AI was on average 2.9% luckier than the human player

LUCKY GAMES (number of games where player had a higher luck score)

Games with Equal Luck: 4

Games where the Human was Luckier: 414

Games where the AI was Luckier: 398

Difference in Lucky Games: The Human player was luckier in 4% more games

Now that’s not a conclusive result, sample size could certainly be bigger… and obviously we could change the weightings… but I think we can see a clear trend that both sides are receiving fairly similar “luck scores” in the games we ran.
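As a rough check on the “Lucky Games” counts above, here is a minimal sketch of an exact two-sided sign test. It assumes each non-tied game is an independent coin flip under the “no bias” hypothesis and drops the 4 tied games; the function name is mine, and this is only one of several reasonable ways to test those numbers.

```python
from math import comb


def two_sided_sign_test(wins_a, wins_b):
    """Exact two-sided sign test for wins_a vs wins_b (ties excluded).

    Under the null hypothesis each game is a fair 50/50 between the
    two sides, so the larger count follows Binomial(n, 0.5).
    """
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts from the 816-game run: human luckier in 414, AI luckier in 398.
p_value = two_sided_sign_test(414, 398)
print(f"p-value: {p_value:.3f}")
```

The p-value for 414 vs 398 comes out well above the usual 0.05 threshold, which is consistent with the reading that neither side was favoured in that run.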

Dear devs, please fix your cheating AI on console

Just to confirm that our code is working correctly, we have been performing further tests using the same test conditions as the previous test, but on Normal Difficulty in a mode which is not Ranked PvP. This shows the difference between playing the rest of the game on Normal Difficulty and playing Ranked PvP, where the Combo Breaker is no longer enabled.

Over the past 90 hours, we have produced and analysed 2870 unique battles using the same scoring system according to @Sirrian’s initial post to produce the following results:

LUCK SCORES (based on weightings in the initial post)

Average Player Luck Score: 40.107 (vs 23.153)
Average AI Luck Score: 19.989 (vs 22.982)
Difference in Luck Scores - the human player was on average over 100% luckier than the AI player

LUCKY GAMES (counting number of games and the associated luck score)

Games with Equal Luck: 1
Games where the Human was Luckier: 2870
Games where the AI was Luckier: 0
Difference in Lucky Games: The Human player was equal or luckier in every game

CASCADES (number of cascades which occurred in a match)

Average Player Cascades: 39.532 (vs 36.534)
Average AI Cascades: 30.655 (vs 36.894)
Difference in Cascades - the Human Player had on average 38.40% more games with more cascades than the AI player

4 OF A KIND MATCHES (number of 4 of a kind matches performed by each player)

Average Player 4 of a Kind Matches: 5.784 (vs 5.744)
Average AI 4 of a Kind Matches: 2.893 (vs 5.514)
Difference in 4 of a Kind Matches - the human player had on average 38.40% more games with more 4 of a Kind Matches than the AI player

5+ OF A KIND MATCHES (number of 5+ of a kind matches performed by each player)

Average Player 5+ of a Kind Matches: 2.252 (vs 2.217)
Average AI 5+ of a Kind Matches: 1.000 (vs 2.136)
Difference in 5+ of a Kind Matches - the human player had on average 35.61% more games with more 5+ of a Kind Matches than the AI player

BOARD MATCHES (number of board moves which each player has taken)

Average Player Board Moves: 24.580 (vs 26.49)
Average AI Board Moves: 24.332 (vs 26.734)
Difference in Board Moves - the human player had on average 8.48% more games with more board matches than the AI player

SPELLS CAST (number of spells cast by each player)

Average Player Spells Cast: 7.262 (vs 6.577)
Average AI Spells Cast: 3.737 (vs 5.627)
Difference in Spells Cast - the human player had on average 44.11% more games with more spells cast than the AI player

akots · November 16, 2020, 4:37am

I think it is unreasonable to say you are right or wrong with regard to this statement. I would however prefer to operate and rely on something more accurate. Now, is it possible? Certainly yes, and without any “cheating” or any other mitigating factor. The question is how specifically? Let me explain what I think actually might be possible.

When a human makes a move, it does not involve any conditional randomness; all randomness is direct. All the conditions that are considered are inside the player’s brain. However, when the AI makes a move, it does a fair bit of calculation that involves random “other things”, which may be conditionally tied to the actual output on the board. So, basically, the human player’s output is unconditional or minimally conditional and is fully expected to produce random and independent results. The AI opponent’s moves are heavily conditioned and may therefore involve considerable extra processing, which may result in streakiness, an inability to pass a runs test, or even completely nonrandom and heavily biased output. Essentially, the more conditions are put in, and the more complex and maybe better the AI is, the heavier the streaks. All in all, there is a fair chance that a human observer detects this and perceives it “objectively” as nonrandom. While some people may say that this perception is “subjective” and is simple recall bias, I would argue that this blunt simplification is unsubstantiated. It may very well happen and can be registered, and these types of events are possible not because of AI “cheating” but due to specific reasons.
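A toy model of what I mean, strictly as an illustration and not a claim about how GoW works: suppose the chance of a “lucky” outcome depends on a hidden condition (say, a favourable board) that tends to persist from one turn to the next. The parameter names and values below are made up. The long-run success rate matches an independent 40% coin, yet the conditioned sequence is clearly autocorrelated, i.e. streaky. Note the sketch relies on the condition persisting; if it were redrawn independently every turn, the outcomes would stay independent.

```python
import random


def lag1_autocorrelation(xs):
    """Sample correlation between each outcome and the next one."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:])) / (n - 1)
    return cov / var


def independent_outcomes(rng, n, p):
    """Plain independent trials with a fixed success chance p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def conditioned_outcomes(rng, n, p_good, p_bad, stay):
    """Trials whose success chance depends on a persistent hidden state.

    With probability `stay` the state carries over to the next trial,
    otherwise it flips between 'good' and 'bad'.
    """
    good = True
    out = []
    for _ in range(n):
        if rng.random() >= stay:
            good = not good
        p = p_good if good else p_bad
        out.append(1 if rng.random() < p else 0)
    return out


rng = random.Random(1)
plain = independent_outcomes(rng, 50_000, 0.4)
sticky = conditioned_outcomes(rng, 50_000, p_good=0.7, p_bad=0.1, stay=0.9)
print("mean plain / sticky:", sum(plain) / len(plain), sum(sticky) / len(sticky))
print("lag-1 plain / sticky:", lag1_autocorrelation(plain), lag1_autocorrelation(sticky))
```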

Basically, if the game has more or less advanced AI but no “adaptive” (adoptive) sampling for final output to remove conditional interference, it will be streaky to the point of being absurd.

So, no AI “cheating” or bias is required to explain what is going on, and no player recall bias should be automatically assumed either. It is simply a natural phenomenon, which is being studied and explored by mathematicians. The more “things” are added to the game, even in simple random output like new components for guild tasks, the more the streakiness increases. So, when there is only gold and glory in guild task rewards, it may appear or actually be more random to a human observer compared to a task reward that has gold, glory, souls, gems, stones of this or that type, cards of this or that rarity, etc. The more things are piled on as conditions for the output, the streakier the output is going to be.

Does it happen in GoW? It very well might. Is it possible to prove conclusively? Sure, with some creative approaches. Can it be fixed? More or less to an extent with some modern methods.

Eika · November 16, 2020, 12:26pm

@akots we know RNG or streakiness is different for the CPU than for the human, as the CPU operates with bigger chances (%) in general. This was actually one of the few methods they had to implement to make the CPU seem a little stronger, as otherwise it would suck big time. Humans have the advantage of starting every game, so naturally they found a way to give the CPU the opportunity to slam back at us, just to make sure that the 90% win rate for most of us does not become 100%. Another example is the 20% skull Agile trait that was intentionally made a lot stronger for the CPU. Which long-time players haven’t noticed this?

Everything works as intended, because it was intentionally coded this way. I personally don’t have a problem with this, as without the CPU having a chance to win it wouldn’t be much of a game at all.

Fourdottwoone · November 16, 2020, 12:54pm

We do? Every time this claim comes up there’s zero evidence supporting it. I mean, quite a few number crunchers around here would be happy to prove that the AI is cheating in some way, so far all we ever got was “I lost one out of ten battles, this game is rigged!”.

Eika · November 16, 2020, 12:58pm

You forgot this part. At least you must be one of those who noticed this?

Fourdottwoone · November 16, 2020, 1:11pm

I’m one of the ones who tried to verify it and found nothing out of order. Granted, my test data set is small; in strictly scientific terms it’s more of a first impression than a high-confidence result. It feels very much the same as celestial drop rates in the old explore mode though: I could have sworn I was pulling the short straw way too often until I actually tracked more than a thousand battles.

Eika · November 16, 2020, 1:20pm

I have chosen not to be sceptical about this anymore, and instead to trust my six years of experience with this game, combined with what my guildmates and other long-time players have said and reported about it. It’s a strong enough indicator for me. What others choose to believe is up to them.

Fourdottwoone · November 16, 2020, 1:35pm

Fair enough. However, if you ever feel like challenging your belief, I can describe a setup that should work pretty well. It involves several hours of painfully boring data recording, so most prefer not to travel down that road.

Eika · November 16, 2020, 1:36pm

haha, I was hoping you would spend more time with it.

Fourdottwoone · November 16, 2020, 1:46pm

I’m somewhat tempted. Especially since I would be hell of annoyed if I somehow got a result that fell outside a 99% confidence interval for the AI dodge chance being 20%. Still, I think I’ll pass. Probably.
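For whoever does go down that road, the analysis side is short. This is a minimal sketch, assuming each skull hit on an Agile troop is recorded as an independent dodge / no-dodge trial; the function names and the sample counts are hypothetical, not real data. It gives an exact two-sided binomial p-value against the nominal 20% and a 99% Wilson interval for the observed rate.

```python
from math import exp, lgamma, log, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def exact_binomial_p_value(successes, trials, p0):
    """Two-sided exact test: sum the probabilities of every outcome
    that is no more likely than the observed one under rate p0."""
    observed = binomial_pmf(successes, trials, p0)
    pmfs = [binomial_pmf(k, trials, p0) for k in range(trials + 1)]
    total = sum(q for q in pmfs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def wilson_interval(successes, trials, z=2.5758):
    """Wilson score interval; z = 2.5758 corresponds to 99% confidence."""
    phat = successes / trials
    denom = 1 + z * z / trials
    centre = (phat + z * z / (2 * trials)) / denom
    half = z * sqrt(phat * (1 - phat) / trials + z * z / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# Hypothetical tallies: 500 recorded skull hits, 130 dodged.
print(exact_binomial_p_value(130, 500, 0.20))
print(wilson_interval(130, 500))
```

If 0.20 sits inside the interval, the data are compatible with the advertised rate at that confidence level; a few hundred trials is about the minimum before the interval gets narrow enough to be interesting.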

xolid99 · November 16, 2020, 2:03pm

There’s more to this than just RNG probably, but I doubt it will ever be proven as the information is not visible and can be passed off.

Do higher level enemies have a higher agile rate? We don’t know. It feels like it.

Is there a value in delves that determines the success rate of spells cast by the AI which have a percentage chance to succeed?

As delves can’t be checked, because the maximum anyone can run is 3 per day (or waste an event), what’s to stop these values being altered by a parameter set? Firstly, we have no idea if that is occurring, and secondly, because of that, and because it is tied to monetisation, it could be done. This one clearly is subjective. I’ll leave it there.

As I’ve done every single faction delve without potions, I have a great deal of experience with percentage-chance effects. I can’t give anything other than anecdotal experience (no one can), but if I were pushed, here’s what I would say.

Gorbil 10% chance to Devour plus a minuscule gem conversion bonus. Not a chance this is accurate.

I have had it devour 4 troops, 1 by 1. I have had it Devour with 0 gems to convert on successive casts - flat 10%. It’s working more like 50%. As the end result is fatal, you remember it clearly, so it’s not about a biased view on some occasional, irrelevant game.

Mimic 1:6 chance to Devour a troop. I have had all my troops devoured in 4 casts, 100% rate from the top 2 Mimics, not once, but twice. I cannot count how many times I’ve had 2 and even 3 devoured, how about every run for a week 2 were devoured? I would say from my fatal experiences, the first cast devours most of the time.

Bladewing - 20% chance to lethal - more than 50% of all AI casts have killed first time. I went through a phase of a week plus where on every run it killed a troop with its first cast.

Deathmark from Drowned Sailor - it can get a 30% chance with 10 gems converted. I have had it work against me 90% of the time, even 13 casts in a row in various Sunken Fleet delves. I have also died first turn to the ensuing death mark well over 50% of the times it landed. To the point that it’s so not performing at the right numbers, I was casting my Drowned Sailor first go, regardless of gems on the board, which would restrict it to 20% maximum chance, and it was still death marking 9/10 times.

I could go on. We can’t really check it in Delve, where it hurts the most.
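For scale, here is a minimal sketch of how likely streaks like the ones above would be if the listed rates (10% Gorbil, 1-in-6 Mimic, 30% Deathmark) were exact and every cast were independent. Both of those are assumptions, and the cast counts in the example are made up. The function computes the chance of seeing at least one run of k successes somewhere in n casts, which matters because a long-time player sees a lot of casts.

```python
def prob_run_at_least(p, k, n):
    """Probability of at least one run of k consecutive successes
    in n independent trials with success chance p."""
    # state[j] = probability that no run of k has happened yet and the
    # current trailing streak of successes has length j (0 <= j < k).
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0
    for _ in range(n):
        nxt = [0.0] * k
        for j, mass in enumerate(state):
            if mass == 0.0:
                continue
            nxt[0] += mass * (1 - p)      # a failure resets the streak
            if j + 1 == k:
                hit += mass * p           # the streak reaches length k
            else:
                nxt[j + 1] += mass * p    # the streak grows by one
        state = nxt
    return hit


# Chance of the streak on one specific attempt (n == k):
print("Gorbil, 4 devours in 4 casts:    ", prob_run_at_least(0.10, 4, 4))
print("Mimic, 4 devours in 4 casts:     ", prob_run_at_least(1 / 6, 4, 4))
print("Deathmark, 13 in a row:          ", prob_run_at_least(0.30, 13, 13))
# Chance of seeing it at least once over a hypothetical 2,000 casts:
print("Mimic, 4 in a row in 2000 casts: ", prob_run_at_least(1 / 6, 4, 2000))
```

The single-attempt numbers are tiny, but the “at least once over thousands of casts” numbers are what should be compared against a memory of it happening once or twice.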

The best thing you can do is accept the anomalies, and if you don’t like it don’t spend so many gems or cash when events come round.

For the first time yesterday I completed a faction event, solely because there was less of the BS from devours/death mark and stuff in the runs and with the teams. If I was going to lose, it would be from my own errors I suppose.

Anything I don’t agree/like, I don’t partake. Arena, odd go, like 2 runs at the weekend for 1 pet in a reward. World Event when we could run Wrath and Obsidius, fab, great fun.

Yet look at this week’s World Event, Knight, was close to the worst class in the game… We got Sir Quentin, what a great troop, excellent troop, fab with yellow… half start for Knights, bumped up Knight class a little, definitely not a bottom feeder anymore. It’s not available in the Knight World Event…

Eika · November 16, 2020, 2:18pm

Pretty much this. Great advice for GoW.

akots · November 17, 2020, 11:40am

Well, apparently, not everyone agrees.

Tabu · November 17, 2020, 12:51pm

“Dynamic Difficulty Adjustment.“

This is exactly what IP2 is doing. They already admitted doing it for new players.

I believe they do it for veteran players as well.

They said it wouldn’t help them, but let’s be real, the system is designed to entice buying in-game items.

Soon enough all of these companies will pay the price for their unethical practices.

Fourdottwoone · November 17, 2020, 1:55pm

It’s not what IP2 is doing. IP2 is making fights easier while players are still in the tutorial stage. The EA lawsuit is about them secretly cranking up the difficulty every time players buy booster packs, voiding the benefit of the purchase.

Tabu · November 17, 2020, 2:18pm

Do you really believe that if they make it easier for new players they aren’t making it harder for others to sell in game items?