Never-ending Shuffled Sequences - When Random is too Random

usage example
100% Random Ruins It

The Problem

In games or educational programs you often don't want real randomness. Unsurprisingly random it's often a bit too random. You may get some items several times in a row and others too rarely. It's either too easy or too hard. Balancing the weights is hard as well, because each run is too different. And if that wouldn't be already bad enough; it's also difficult to verify the results.

Note: The continuous shuffled sequence algorithm which I'll explain in the second half of the article can be easily ported to other languages. All you need are arrays (and a copy function or resizable arrays) and a random number generator.

Before I'll dive into shuffled sequences I'll outline the other issues of the obvious approaches. Because that's apparently more fun to do.

Bad Solution #1 - the if/else if/else massacre

public class BadBalance1{
    public static void main(String[]args){
        double rand=Math.random(); // 0 <= rand < 1
        String item;
        if(rand<0.5)      //50%
            item="A";
        else if(rand<0.75)//25%
            item="B";
        else if(rand<0.95)//20%
            item="C";
        else              // 5%
            item="D";
        System.out.println(item);
    }
}

It doesn't look that bad at first. Once you pulled it off you might even think that you did something smart there. Well, back then I thought it was a good idea. Of course I was wrong, but it's a pretty common mistake to make.

There is one really nasty issue (apart from those mentioned in the first paragraph): changing the weight of one item is really painful. You have to change all of 'em. All the time. And you have to do the math. Over and over again.

If that doesn't sound like a big deal, picture it with 10, 20, or even more items. It quickly gets out of hand.

Bad Solution #2 - a weight array

import java.util.Random;
public class BadBalance2{
    public static void main(String[]args){
        Random gen=new Random();
        int[]weight={50,25,20,5};
        String[]items={"A","B","C","D"};
        int sum=0;
        for(int i=0;i<weight.length;sum+=weight[i++]);
        int rand=gen.nextInt(sum); // 0 <= rand < sum
        int pick,k;
        for(k=0,pick=0;k<=rand;k+=weight[pick++]);
        System.out.println(items[pick-1]);
    }
}

Empty body for-loops are Satan's offspring, but let's ignore this for now. Feel free to de-ugly-fy the code if you want to.

This one is like the previous one - just without that massive flaw. Weights can now be changed independently, you don't have to do any math, and the code is also more compact if there are many items. It's certainly a big improvement.

If you're fine with the quirkiness of full randomness you can pick this option. Just keep in mind to recalculate the sum if you change weights on the fly.

However, full randomness can ruin a game completely. Tetris is a good example. Naïve implementations often use straightforward randomness, which makes the game plain unfair (or not - well, it's random). If the source of randomness is any good, the game will deal an unsolvable sequence of pieces sooner or later. Game over. You've been brick-walled by the gods of randomness.

Shuffled Sequences

Good Tetris implementations (such as the Tetris brand games from 2001 and later) generate a shuffled sequence of all 7 pieces, use 'em up, and start anew (or a variation thereof). This allows the player to continue playing for all eternity. In theory, that is. At some point he/she won't be able to keep up with the increasing speed.

Many standard libraries cover this use case just fine. Unsurprisingly Java isn't an exception.

import java.util.*;
public class SimpleShuffle{
    public static void main(String[]args){
        ArrayList<Integer> al=new ArrayList<Integer>();
        for(int k=0;k<9;k++){//outer loop
            if(al.size()==0){//refill and shuffle if it's empty
                for(int i=0;i<3;i++)
                    al.add(i);
                Collections.shuffle(al);
            }
            System.out.println(al.remove(al.size()-1));
        }
    }
}

The outer loop here acts as a main loop placeholder. Whenever an item is drawn you first have to check if the list is empty. If it is fill it with new items. After that draw one by removing one item from the list.

If you are wondering why I removed the items from the end of the list, it's to avoid shifting. With ArrayLists all data which comes after the removed element is moved around. As you can imagine this is pretty slow with bigger lists. Especially if you remove elements at index 0 - don't do that. ;)

Why didn't I use a LinkedList then? Unlike ArrayList LinkedList doesn't implement the RandomAccess interface therefore Collections.shuffle dumps it over into an array, shuffles, and then dumps it back. This does of course introduce some unwanted friction. Using an ArrayList and removing elements from the end is clearly the better solution.

Use more of the same for weighting. E.g. for the 50/25/20/5 weighting from the examples above you could either add each one 50/25/20/5 times or 10/5/4/1 times. The former is more random and the latter is less random. With the first set of weights the first item could occur up to 100 times in a row and with the second set only up to 20 times.

That's the most extreme case tho. It only happens if all items of the first kind end up at the very end of the list in this run and at the very beginning of the list in the next run. Highly unlikely but it can happen.

However, we can be certain that the item with 5% likeliness occurs exactly 5 times every 100 drawn items (with the first weight set) or once every 20 drawn items (with the second weight set). This behavior is a lot more deterministic than those two pure random implementations. With pure randomness 5% can mean anything from almost never to 20 times in a row.

Drawbacks

While it solves the issues of the 2 previous examples we get some new ones:

  1. Refilling & shuffling happens at a silly place.
  2. It adds clutter.
  3. Drawing an item takes longer every once in a while (refill and shuffle) - especially with bigger lists.
  4. It does more than necessary.

The first two issues can be solved by shrink wrapping it. The last two, however, cannot. (You can get rid of refilling tho by using a cursor to keep track of the position.)

The ShuffleBag

By using one of the ShuffleBag classes the previous SimpleShuffle.java example is reduced down to this:

public class ShuffleBagUsage{
    public static void main(String[]args){
        IntShuffleBag sb=new IntShuffleBag();
        for(int i=0;i<3;i++) //fill it once
            sb.add(i);
        for(int k=0;k<9;k++){//outer loop
            System.out.println(sb.next());
        }
    }
}

As you can see it addresses the first two drawbacks just fine. Refilling can be done once during the initialization phase and there is zero clutter. The only thing that happens in our main loop placeholder is drawing of items.

Neat. But there is more. Drawing of items is always fast. Additionally, there is only one array which holds all items. Basically all work was reduced to a minimum and the work which left was equally spread between draws.

A port of the earlier weight examples also looks a lot nicer:

public class ShuffleBagUsage2{
    public static void main(String[]args){
        GShuffleBag<String> sb=new GShuffleBag<String>();
        sb.add("A",50);
        sb.add("B",25);
        sb.add("C",20);
        sb.add("D",5);
        System.out.println(sb.next());
    }
}

Cute, isn't it? :)

How Collections.shuffle works

In order to understand the algorithm I'll explain in a bit it might be helpful to know how Collections.shuffle works, since there is some overlap and the shuffling is more isolated there.

An excerpt from the documentation:
This implementation traverses the list backwards, from the last element up to the second, repeatedly swapping a randomly selected element into the "current position". Elements are randomly selected from the portion of the list that runs from the first element to the current position, inclusive.

Basically it's like taking everything out an array, throwing it into a bag, shaking it, taking 'em out one by one, and putting it back into that array in that order. If you picture it like this it should be easier to understand.

However, only one array/list is used and the cursor (=current position marker) divides it in "inside the bag" and "already shuffled". That's the reason why the picking happens from the first element to the cursor inclusive. It transverses the list backwards, because that makes the code simpler and the effect is the same.

How ShuffleBag Works Under the Hood

figure 1
Figure 1: Data Structure

Like an ArrayList ShuffleBag stores its elements inside an array. The length of the array equals its capacity and a variable keeps track of the actually used space.

IntShuffleBag.add():

[...]
public void add(int i){
    add(i,1);
}

public void add(int i, int quantity){
    if(quantity>0){
        int pos=size;
        size+=quantity;
        if(size>capacity){
            capacity=(capacity*3)/2 + 1;
            if(size>capacity)
                capacity=size;
            int[]oldData=data;
            data=new int[capacity];
            System.arraycopy(oldData, 0, data, 0, size-quantity);
        }
        for(int k=0;k<quantity;k++)
            data[pos+k]=i;
        cursor=size-1;
    }else{
        throw new IllegalArgumentException("quantity < 1");
    }
}
[...]

The underlaying array is enlarged if necessary in an ArrayList fashion. If that still isn't enough it's enlarged to the required size. This means there are fewer copy operations if smaller quantities are added and it won't grow too much if bigger quantities are used.

Also note that I used size-quantity (=the old size) as length parameter for System.arraycopy. This ensures that only actually used array fields are copied over. Using the length of the old array would copy a lot more than necessary in some cases. E.g. if the initial capacity is set to 1000, then 1 element is added and then 1000... then 1000 fields would be copied over instead of 1. Well, usually it will only save us a few copy operations, but it's the right thing to do.

IntShuffleBag.next():

[...]
public int next(){
    if(cursor<1){
        cursor=size-1;
        return data[0];
    }
    int grab=gen.nextInt(cursor+1);
    int temp=data[grab];
    data[grab]=data[cursor];
    data[cursor]=temp;
    cursor--;
    return temp;
}
[...]

For size-1 of the draws two items are swapped, and the very last draw of a sequence doesn't swap anything at all. This makes it fast and predictable. As you can see only one array is used and no refilling happens.

But let me illustrate this with a few diagrams.

figure 2
Figure 2: Draw #1

After adding an element the cursor (shown in blue) is at the last position. It is swapped with a randomly chosen element from index 0 to the position of the cursor inclusive. In this case 'D' (cursor) was swapped with 'B' (index 1). The cursor advances one step (new position shown in light blue) and 'B' is returned.

figure 3
Figure 3: Draw #2

In this case 'C' is swapped with itself. This can happen. The probability for that was 1 in 3 after all.

figure 4
Figure 4: Draw #3

'D' (index 1) is swapped with 'A' (index 0).

figure 5
Figure 5: Draw #4

The end of the sequence is reached. The cursor gets reset to the last index and 'D' (the element at index 0) is returned.

figure 6
Figure 6: Adding Items

Adding elements is always possible. After adding elements the cursor is always set to the very last index. This makes freshly added items as likely as any other item for the next draw. The price is a slightly skewed distribution if you were in the middle of a sequence. But that's usually a non-issue.

Closing Words

In case you're wondering: a remove method doesn't exist on purpose because it leads to messy code. With that kind of data structure it won't be all that fast anyways. Besides if you want to use different weight sets just use different weight sets (i.e. ShuffleBag instances). E.g. for a fun racer like Mario Kart you can use several ShuffleBags - one for each position or position range (bad items for the pole position, so-so items for the middle, good items for the last positions).

Maybe "draw" would have been a better name for next, but "next" felt pretty natural to me. Well, change it if you like.

I hope you enjoyed your ride and learned a thing or two. As usual the example implementations are available under a Zero-clause BSD-style license.

Source Files

ShuffleBag.java (<=1.4 Object version - doc)
GShuffleBag.java (1.5+ Generics version - doc)
IntShuffleBag.java (versatile int version - doc)

Update: Check out ShuffleBag Ports for Flash (AS3), haXe, JavaScript, PHP, Perl, and Python versions.

Comments

great read, here's the PLT-Scheme version

(define (make-sb l)
  (cons 0 (list->vector l)))

(define (sb-next! sb)
  (let* ((r (+ (random (- (vector-length (cdr sb)) (car sb))) (car sb)))
         (v (vector-ref (cdr sb) r)))
    (vector-set! (cdr sb) r (vector-ref (cdr sb) (car sb)))
    (vector-set! (cdr sb) (car sb) v)
    (set-car! sb (modulo (+ (car sb) 1) (vector-length (cdr sb))))
    v))

; and now ...
(define e (make-sb '(1 2 3 4 5)))
(sb-next! e)

Thanks :)

Unfortunately SHJS (the used code highlighter) can't highlight Scheme. I also don't know anything about Scheme myself... so, I've to take your word for it that it works as intended.

Thanks just what I needed

It took 4 days but I can finally implement your code in my game now, thank you. This line was invaluable "Basically it's like taking everything out an array, throwing it into a bag, shaking it, taking 'em out one by one, and putting it back into that array in that order. If you picture it like this it should be easier to understand." It took 4 days because I'm not a java programmer. - Oldschool

Re: Thanks just what I needed

With that line I tried to make it clear that the shuffle algorithm actually works the same way one would do it in real life. If you take a look at the source or the technical explanation this isn't immediately obvious, because only one array and a cursor is used.

It took me a while to figure out how/why Collections.shuffle works and why it does it the way it's outlined in the documentation. I actually had to draw a few sketches to figure it out. Sometimes pen and paper really is the best tool. ;)

Interesting

I've also tried to solve this problem. Your approach seems nice but it has some disadvantages. Most importantly the randomness of the numbers is uneven. Every 9th number is *completely* determined by the previous 8. Obviously you could never really use such a system in a competitive game.

The best I could come up with was dynamically changing the probabilities if the numbers based on how often they have come up in the immediate past. Something like (for a dice, in matlab):

probabilities = [1 1 1 1 1 1];
unrandomness = 1;
while true
    cumprob = cumsum(probabilities) ./ sum(probabilities);
    roll = find(cumprob >= rand, 1)
    probabilities = probabilities + unrandomness;
    probabilities(roll) = probabilities(roll) - 6*unrandomness;
    if min(probabilities) < 0
        probabilities = probabilities - min(probabilities);
    end
end

Sorry for lack of indentation [fixed that for you – jh]. The unrandomness parameter can be adjusted as desired. True random output (unrandomness=0):

2 3 1 1 4 1 3 3 4 5 1 5 5 2 6 1 1 1 6 1 1 3 5 6 6 1 4 2 4 6 3 6 5 1 1 6 2 5 6 4 3 5 2 4 6 5 5 5 4 4 3 4 1 2

unrandomness=1:

3 2 4 5 6 2 6 2 4 1 5 5 1 6 4 3 1 4 2 1 3 2 6 5 3 6 5 3 1 4 1 6 5 3 4 2 1 6 5 4 1 4 3 1 6 6 5 4 3 1 5 2 3 2

Looks better I think. Especially if you plot the gaps between numbers.

re: Interesting

>Obviously you could never really use such a system in a competitive game.

Works fine for Tetris though. It depends on the game if such a predictable scheme can (or should be) used.

Re: Interesting

What are talking about? Just "fill-up" the bag after calling next like this:

for(int i=0;i<3;i++) //fill it once
sb.add(i);
for(int k=0;k<9;k++){//outer loop
System.out.println(sb.next());
sb.add(k%3); // <----------------- fill-up
}

Re: Interesting

I've not thought it through but... fill a bag twice the size, so with 18 instead of 9 in it, then take out half, discard the rest and repeat. The ratio between the number you take out and the number you put in will control the amount of randomness - taking out just one ball each time would get you back to truly random.

Code translation for those w/o Matlab

This code was interesting, but I couldn't fully understand it without a working knowledge of Matlab. So I asked a question on StackOverflow to get an explanation, and rewrote the algorithm in Ruby. For those who are interested, but don't comprehend Matlab's functions, you can read the code in Ruby as a GitHub Gist.

(I'd post the code here, but the html sanitization is wrecking havoc with my code's formatting, and I don't know how to fix it.)

Randomness doesn't mean that there will ever be a bad deal

"If the source of randomness is any good, the game will deal an unsolvable sequence of pieces sooner or later. Game over. You've been brick-walled by the gods of randomness."

This isn't necessarily the case. By the very nature of randomness, it's entirely possible for it to never give an impossible set, however improbable it is.

Now that I'm done nitpicking, this is a very good article, and an idea, I never would have thought up..

re: Randomness doesn't mean that there will ever be a bad deal

>By the very nature of randomness, it's entirely possible for it to never give an impossible set, however improbable it is.

That was a somewhat philosophical statement. Given infinite time, everything which is possible will happen infinite times.

Say, you're flipping a coin for an eternity or two. Then you'll get heads 10 times in a row, 100 times in a row, and even 47 million times in a row. Sooner or later (or, well, much later) it will happen, because it can.

Admittedly, it's a sorta awkward way to put it. So, lets stick with Murphy's law: Everything which can go wrong, will go wrong.

I'm looking for a JS version

I'm looking for a JS version of this suitable for Unity. As a noob-coder, my many, many attempts at translating this to JS has failed so many times... Has anyone been able to do this at all? Cheers for help.

>I'm looking for a JS version

Check ShuffleBag Ports. There is also a JavaScript version over there.

Well, now that I actually do know JavaScript I would write it a "bit" differently though. But the version over there does work fine.

Edit: Just added a nicer JavaScript version. :)

Post new comment

  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options