Testing gnome appearance randomness during vault weekend

The sole purpose of this test was to check whether the appearance of gnomes during vault weekend is random. There has been a lot of talk recently about the game's pseudo-RNG being not very accurate, or broken if you prefer the term. For example, check out the thread "Gems of Wars RNG is currently broken". Since nobody volunteered to collect the data, the thread went nowhere. Well, IMHO, it is nice to check at least something that can actually be checked. So, here is a semi-scientific take on the matter.

Three major approaches have been used to check the hypothesis. The tests are not completely trivial but not too complex either. I used Excel 2010 with the free Real Statistics add-in, which can be downloaded from https://www.real-statistics.com/. In general, there are approximately 25-30 reliable randomness tests. However, most of these tests require raw output, meaning the sequence of numbers coming directly from the pseudo-random generator. Obviously, in our case this sequence cannot be obtained, so I had to rely on binary data (0 for no gnome and 1 for any gnome). The theoretical frequency of 1 is approximately 0.1, i.e. 10%.

The actual data were collected by @MagiBelgr during the last vault weekend, farming essentially non-stop and interrupting only for regular activities such as checking tribute, doing daily tasks, and 3 arena runs. This is the original post: Vault Weekend data collection - #139 by MagiBelgr. I got the Excel file with the actual numbers thanks to MagiBelgr, who was very kind to share the data. All battles were in Ghulvania level 6 explore. Keep in mind that the tests have nothing to do with gnomes not appearing in some modes or with key drops. The only thing considered is whether a gnome shows up and whether this event is random. It is a fair event to test, more or less, since it should not be conditional on anything else. Moreover, it is most likely run on the server; however, this is not 100% certain.

Anyhow, the data had a total of 941 battles with 842 of 0 (no gnome) and 99 of 1 (yes, gnome), recorded in the exact sequence the battles were played. Thus, the data are as good as it gets in terms of collection accuracy and general reliability of the data series. From a practical standpoint, this is almost perfect. The size of the data series is sufficient to draw conclusions with relatively high power of the tests. Much larger data series make the tests sensitive to even trivial deviations (with more data, estimates converge ever more tightly on the true values; see Law of large numbers - Wikipedia), while smaller data series will not have sufficient power. That said, something around 300 data points would have been sufficient for reasonable power.
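As a quick sanity check on the counts above, the observed gnome frequency can be compared with the approximate 10% theoretical rate. This is only a sketch and was not part of the original analysis: the normal approximation to the binomial and the exact value 0.1 are assumptions made for illustration.

```python
import math

battles = 941
gnomes = 99
p_expected = 0.1  # approximate theoretical frequency quoted in the post (assumed exact here)

# Observed share of battles with a gnome
observed = gnomes / battles

# Standard error of a proportion if the true rate were p_expected
std_error = math.sqrt(p_expected * (1 - p_expected) / battles)

# How many standard errors the observed rate is from the expected rate
z = (observed - p_expected) / std_error

# Two-sided p-value from the normal approximation
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"observed = {observed:.4f}, z = {z:.2f}, p = {p_value:.2f}")
```

This only checks how often gnomes appear, not the order in which they appear. The order is what the three tests below look at.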

For the actual test, I generated a simple simulation of the data with the RAND function in Excel, which uses the Mersenne Twister pseudo-RNG algorithm. It is similar to the pRNG used by GoW and probably by the server as well, although I am not 100% sure. The algorithm is not ideal, but it passes most randomness tests and is considered reasonable (Mersenne Twister - Wikipedia).
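For anyone who wants to reproduce the simulated series outside Excel, here is a minimal Python sketch. Python's standard random module is also based on the Mersenne Twister, but the function name, the seed, and the exact 0.1 probability are illustrative assumptions, so the numbers will not match the Excel simulation used in this post.

```python
import random

def simulate_battles(n_battles=941, p_gnome=0.1, seed=12345):
    """Return a list of 0/1 outcomes, where 1 means a gnome appeared.

    n_battles, p_gnome and seed are illustrative defaults, not values
    taken from the game.
    """
    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister
    return [1 if rng.random() < p_gnome else 0 for _ in range(n_battles)]

simulated = simulate_battles()
print(len(simulated), "battles,", sum(simulated), "gnomes")
```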

The first test was a simple runs test (Wald–Wolfowitz runs test - Wikipedia). It is not very strict, but if something fails the test, it is almost certainly nonrandom. The main principle of the test is to count the runs (uninterrupted sequences of identical values, here zeros or ones) and compare their number with the value expected under the assumption of randomness. For details, see the wiki article and the National Institute of Standards and Technology website (Runs Test for Detecting Non-randomness). Both sources have additional references for anybody who might be interested. These are the results for actual and simulated data:

Actual: z-stat = 0.66 and p = 0.51

Simulated: z-stat = 0.04 and p = 0.96

What does that mean? In both cases, the absolute z-stat value is less than 1.96 and p is well above 0.05, so the test finds no evidence against randomness in either data series. If the z-stat were larger than 1.96 in absolute value (p less than 0.05), the hypothesis of randomness would be rejected at the 5% significance level. Obviously, this is not the case with actual or simulated data. It may seem that the simulated data are a bit more random (lower z-stat and higher p), but this does not actually mean anything: for random data the p-value itself varies from sample to sample, so it does not measure how random a series is. Both data sets pass the runs test.
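The runs test itself is short enough to sketch in Python. This is the textbook large-sample (normal approximation) version of the Wald-Wolfowitz test, not the Real Statistics implementation, so small differences from the numbers above are possible. The function name and the demo series are made up for illustration.

```python
import math
import random

def runs_test(bits):
    """Wald-Wolfowitz runs test for a 0/1 sequence.

    Returns (number of runs, z statistic, two-sided p-value).
    """
    n1 = sum(bits)
    n0 = len(bits) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("the sequence must contain both 0 and 1")
    n = n0 + n1
    # A new run starts at every position where the value changes
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    # Mean and variance of the number of runs under randomness
    mean = 2.0 * n0 * n1 / n + 1.0
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p

# Demo on a made-up series (NOT the real vault weekend data)
rng = random.Random(2024)
demo = [1 if rng.random() < 0.1 else 0 for _ in range(941)]
print(runs_test(demo))
```

A series that is too clustered has too few runs (large negative z), and a series that alternates too regularly has too many runs (large positive z). Either one fails the test.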

The second test was a correlation analysis based on so-called correlograms, together with the augmented Dickey-Fuller (ADF) test. Here are some references: Autocorrelation - Wikipedia, Augmented Dickey–Fuller test - Wikipedia. A correlogram is used in time series analysis to determine whether there is some periodic process within the data: the series is correlated with a copy of itself shifted by a given step size, which is known as lag. The ADF test is used in statistics and economics to check whether a series drifts or trends over time instead of fluctuating around a fixed level. The results are summarized in the figure below for actual and simulated data and clearly show that there are no significant correlations associated with lag. The coefficients jump up and down and do not have a consistent trend. There are some minor fluctuations within the expected range. This means that there is no periodic bias in the data sets, as it should be if the data are random. Nonrandom data would show distinct correlations with lag, which is not observed in our case. Thus, the second test also finds no sign of nonrandomness in the actual data.
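The numbers behind a correlogram are easy to compute by hand. Below is a minimal sketch of the standard sample autocorrelation in Python. The function names, the choice of 20 lags, and the rough 95% band of 1.96/sqrt(n) are assumptions for illustration, not necessarily what the Excel add-in does.

```python
import math

def autocorrelation(series, max_lag=20):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("the series is constant")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]

def noise_band(n):
    """Approximate 95% band for the coefficients of a random series."""
    return 1.96 / math.sqrt(n)

# A deliberately periodic made-up series: a "gnome" exactly every 5th battle
periodic = [1, 0, 0, 0, 0] * 40
acf = autocorrelation(periodic, max_lag=10)
band = noise_band(len(periodic))
for lag, r in enumerate(acf, start=1):
    flag = "outside band" if abs(r) > band else ""
    print(lag, round(r, 3), flag)
```

For the periodic series, the coefficients at lags 5 and 10 stand far outside the band. For random data, nearly all coefficients should stay inside it.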

ADF results are as follows:

Actual: tau-stat = -31.3, tau-crit = -3.4, and p < 0.01

Simulated: tau-stat = -12.0, tau-crit = -3.4, and p < 0.01

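Other test quality parameters are not listed, but they were very close and within the expected range, indicating that the test has enough power. In both cases the tau-stat is far more negative than tau-crit, so the ADF null hypothesis (a unit root, meaning a drifting, nonstationary series) is rejected with p below 0.01. The size of the tau-stat depends heavily on how many lags the test includes, so the more negative value for the actual data should not be read as the actual data being more random than the simulated data. Both series behave as stationary, which is what random data should do.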
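To give a feel for where a tau-stat comes from, here is a sketch of the simplest Dickey-Fuller regression (a constant, no extra lags) in plain Python. It is not the augmented version with lag selection that the add-in runs, and the function name and demo series are made up. For an independent 0/1 series of length n, this lag-free version gives a tau of roughly -sqrt(n). For n = 941 that is about -30.7, close to the value reported for the actual data, which may simply mean that few or no lags were selected there.

```python
import math
import random

def dickey_fuller_tau(series):
    """tau statistic from regressing y[t] - y[t-1] on a constant and y[t-1]."""
    x = series[:-1]                                   # y[t-1]
    d = [b - a for a, b in zip(series, series[1:])]   # y[t] - y[t-1]
    m = len(x)
    mean_x = sum(x) / m
    mean_d = sum(d) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    gamma = sxy / sxx                  # slope on y[t-1]
    alpha = mean_d - gamma * mean_x    # intercept
    ssr = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    std_err = math.sqrt(ssr / (m - 2) / sxx)
    return gamma / std_err

rng = random.Random(7)

# Made-up independent 0/1 series (NOT the real data)
iid = [1 if rng.random() < 0.1 else 0 for _ in range(941)]
tau_iid = dickey_fuller_tau(iid)

# Made-up random walk: a drifting series the test should not reject
walk = [0]
for _ in range(940):
    walk.append(walk[-1] + rng.choice((-1, 1)))
tau_walk = dickey_fuller_tau(walk)

print(round(tau_iid, 1), round(tau_walk, 1))
```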

The third and final test was a comparison of the distribution of the data (runs) in the actual and simulated data sets by the Kolmogorov-Smirnov test, which is a very powerful statistical tool (Kolmogorov–Smirnov test - Wikipedia). The comparison gave D-stat = 0.008, D-crit = 0.062, and p = 1. Since D-stat is far below D-crit, the two distributions are practically identical. Since the simulated data are known to be random, this agreement strongly indicates that the actual data set is also random.
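For reference, the two-sample Kolmogorov-Smirnov statistic can be sketched in a few lines of Python. The post does not spell out exactly which values were fed into the test, so comparing run lengths here is an assumption, as are the function names. The critical value uses the usual large-sample formula with c = 1.358 for the 5% level.

```python
import math
from itertools import groupby

def run_lengths(bits):
    """Lengths of consecutive runs of identical values, in order."""
    return [len(list(group)) for _, group in groupby(bits)]

def ks_two_sample(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        v = min(a[i], b[j])
        # Step both CDFs past every observation equal to v
        while i < na and a[i] <= v:
            i += 1
        while j < nb and b[j] <= v:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

def ks_critical(na, nb, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to 5%."""
    return c_alpha * math.sqrt((na + nb) / (na * nb))

print(run_lengths([0, 0, 1, 0, 0, 0]))
print(round(ks_critical(941, 941), 4))
```

With 941 values in each sample, the formula gives a critical value of about 0.063, in line with the D-crit quoted above.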

Overall TLDR conclusion: Gnome appearance is random. Does not mean that everything else is also random though. :wink:


Thank you, wonderful analysis! Data rules.