Concatenation of Strings is very easy in Java - all you need is a '+'. It can't get any easier than that, right? Unfortunately there are a few pitfalls. One thing you should remember from your first Java lessons is a small albeit important detail: String objects are immutable. Once constructed they cannot be changed anymore.
Whenever you "change" the value of a String you create a new object and make that variable reference this new object. Appending a String to another existing one is the same kind of deal: a new String containing the stuff from both is created and the old one is dropped.
You might wonder why Strings are immutable in first place. There are two very compelling reasons for it:
Each time you append something via '+' (String.concat()) a new String is created, the old stuff is copied, the new stuff is appended, and the old String is thrown away. The bigger the String gets the longer it takes - there is more to copy and more garbage is produced.
Creating a String with a length of 65536 (character by character) already takes about 22 seconds on an AMD64 X2 4200+. The following diagram illustrates the exponentially growing amount of required time:
StringBuilder and StringBuffer are also shown, but at this scale they are right onto the x-axis. As you can see String.concat() is slow. Amazingly slow in fact. It's so bad that the guys over at FindBugs added a detector for String.concat inside loops to their static code analysis tool.
Using the '+' operator for concatenation isn't bad per se though. It's very readable and it doesn't necessarily affect performance. Let's take a look at the kind of situations where you should use '+'.
a) Multi-line Strings:
String text= "line 1\n"+ "line 2\n"+ "line 3";
Since Java doesn't feature a proper multi-line String construct like other languages, this kind of pattern is often used. If you really have to you can embed massive blocks of text this way and there are no downsides at all. The compiler creates a single String out of this mess and no concatenation happens at runtime.
b) Short messages and the like:
The compiler transforms this to:
System.out.println((new StringBuilder()).append("x:").append(x).append(" y:").append(y).toString());
Looks pretty silly, doesn't it? Well, it's great that you don't have to write that kind of code yourself. ;)
If you're interested in byte code generation: Accordingly to Arno Unkrig (the amazing dude behind Janino) the optimal strategy is to use String.concat() for 2 or 3 operands, and StringBuilder for 4 or more operands (if available - otherwise StringBuffer). Sun's compiler always uses StringBuilder/StringBuffer though. Well, the difference is pretty negligible.
This one is easy to remember: use 'em whenever you assembe a String in a loop. If it's a short piece of example code, a test program, or something completely unimportant you won't necessarily need that though. Just keep in mind that '+' isn't always a good idea.
As you can see the graphs are sort of straight with a few bumps here and there caused by re-allocation. Also StringBuilder is indeed quite a bit faster. Use that one if you can.
Both - StringBuilder and StringBuffer - allow you to specify the initial capacity in the constructor. Of course this was also a thing I had to experiment with. Creating a 0.5mb String 50 times with different initial capacities:
The step size was 8 and the default capacity is 16. So, the default is the third dot. 16 chars is pretty small and as you can see it's a very sensible default value.
If you take a closer look you can also see that there is some kind of rhythm: the best initial capacities (local optimum) are always a power of two. And the worst results are always just before the next power of two. The perfect results are of course achieved if the required size is used from the very beginning (shown as dashed lines in the diagram) and no resizing happens at all.
That "PoT beat" is of course specific to Sun's implementations of StringBuilder and StringBuffer. Other implementations may show a slightly different behavior. However, if these particular implementations are taken as target one can derive two golden rules from these results:
In order to get meaningful results I took care of a few things:
The messy code is also available.