Concatenation of Strings is very easy in Java - all you need is a '+'. It can't get any easier than that, right? Unfortunately there are a few pitfalls. One thing you should remember from your first Java lessons is a small albeit important detail: String objects are immutable. Once constructed they cannot be changed anymore.
Whenever you "change" the value of a String you create a new object and make that variable reference this new object. Appending a String to another existing one is the same kind of deal: a new String containing the stuff from both is created and the old one is dropped.
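To make that concrete, here is a minimal sketch (the class and method names are made up for illustration). It keeps a second reference to the original object and shows that "changing" the first variable leaves that object untouched:

```java
class ImmutableStringDemo {
    // Returns true if appending produced a new object and left the old one as it was.
    static boolean originalUnchanged() {
        String original = "foo";
        String alias = original;       // second reference to the same String object
        original = original + "bar";   // builds a NEW String; only the variable is re-pointed
        return alias.equals("foo")
                && original.equals("foobar")
                && alias != original;  // two distinct objects now
    }

    public static void main(String[] args) {
        System.out.println(originalUnchanged());
    }
}
```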
You might wonder why Strings are immutable in the first place. There are two very compelling reasons for it:
Each time you append something via '+' (String.concat()) a new String is created, the old stuff is copied, the new stuff is appended, and the old String is thrown away. The bigger the String gets the longer it takes - there is more to copy and more garbage is produced.
Creating a String with a length of 65536 (character by character) already takes about 22 seconds on an AMD64 X2 4200+. The following diagram illustrates how the required time grows much faster than linearly (roughly quadratically, since every append copies everything built so far):
StringBuilder and StringBuffer are also shown, but at this scale they sit right on the x-axis. As you can see String.concat() is slow. Amazingly slow in fact. It's so bad that the guys over at FindBugs added a detector for String.concat inside loops to their static code analysis tool.
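If you want to try this yourself, the following is a rough sketch and not the benchmark used for the diagrams. The class and method names are invented for this example, and `charsCopiedByPlus` is only a simple model that assumes every append copies the whole result so far. The timings printed by `main` will vary from machine to machine.

```java
class ConcatCostDemo {
    // Builds a String of n chars with '+=' in a loop (the slow pattern).
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += 'x'; // allocates a new String and copies all previous chars
        }
        return s;
    }

    // Builds the same String with a StringBuilder.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('x'); // writes into the existing buffer, growing it only occasionally
        }
        return sb.toString();
    }

    // Simple model: the i-th append writes i chars, so the total is 1 + 2 + ... + n.
    static long charsCopiedByPlus(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        int n = 20000;

        long start = System.nanoTime();
        withPlus(n);
        long plusNanos = System.nanoTime() - start;

        start = System.nanoTime();
        withBuilder(n);
        long builderNanos = System.nanoTime() - start;

        System.out.println("'+=' in a loop: " + plusNanos / 1000000 + " ms");
        System.out.println("StringBuilder:  " + builderNanos / 1000000 + " ms");
        System.out.println("chars copied by '+=' (model): " + charsCopiedByPlus(n));
    }
}
```

Under that model, building 65536 chars one by one means copying over two billion chars in total, which is why the curve takes off the way it does.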
Using the '+' operator for concatenation isn't bad per se though. It's very readable and it doesn't necessarily affect performance. Let's take a look at the kind of situations where you should use '+'.
a) Multi-line Strings:
String text=
"line 1\n"+
"line 2\n"+
"line 3";
Since Java doesn't feature a proper multi-line String construct like other languages, this kind of pattern is often used. If you really have to you can embed massive blocks of text this way and there are no downsides at all. The compiler creates a single String out of this mess and no concatenation happens at runtime.
b) Short messages and the like:
System.out.println("x:"+x+" y:"+y);
The compiler transforms this to:
System.out.println((new StringBuilder()).append("x:").append(x).append(" y:").append(y).toString());
Looks pretty silly, doesn't it? Well, it's great that you don't have to write that kind of code yourself. ;)
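Both cases can be checked with a few lines of code. This is only a sketch with made-up names. For a) it relies on the fact that constant expressions are folded by the compiler and then treated like literals, so the concatenated constant and the plain literal end up being the very same object. For b) it simply compares the '+' version with a hand-written StringBuilder chain. Newer compilers may emit something other than a StringBuilder chain for '+', but the resulting String is the same.

```java
class PlusOperatorDemo {
    // a) Every operand is a compile-time constant, so the compiler folds them into one literal.
    static final String TEXT =
        "line 1\n" +
        "line 2\n" +
        "line 3";

    // True if the concatenated constant is the same interned object as the plain literal.
    static boolean isFoldedAtCompileTime() {
        return TEXT == "line 1\nline 2\nline 3";
    }

    // b) The readable version.
    static String withPlus(int x, int y) {
        return "x:" + x + " y:" + y;
    }

    // b) The hand-written equivalent of the StringBuilder chain.
    static String withBuilder(int x, int y) {
        return new StringBuilder().append("x:").append(x).append(" y:").append(y).toString();
    }

    public static void main(String[] args) {
        System.out.println(isFoldedAtCompileTime());
        System.out.println(withPlus(1, 2).equals(withBuilder(1, 2)));
    }
}
```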
If you're interested in byte code generation: According to Arno Unkrig (the amazing dude behind Janino) the optimal strategy is to use String.concat() for 2 or 3 operands, and StringBuilder for 4 or more operands (if available - otherwise StringBuffer). Sun's compiler always uses StringBuilder/StringBuffer though. Well, the difference is pretty negligible.
This one is easy to remember: use StringBuilder or StringBuffer whenever you assemble a String in a loop. If it's a short piece of example code, a test program, or something completely unimportant you won't necessarily need that though. Just keep in mind that '+' isn't always a good idea.
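A typical case is gluing a list of items together with a separator. The helper below is a hypothetical example (the `join` method and its parameters are not from any particular library), just to show the shape of the loop:

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    // Joins the parts with the given separator using a single StringBuilder.
    static String join(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(separator); // separator only between elements
            }
            sb.append(parts.get(i));
        }
        return sb.toString();         // one final String, no intermediate garbage Strings
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"), ", "));
    }
}
```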
StringBuilder is rather new - it was introduced with 1.5. Unlike StringBuffer it isn't synchronized, which makes it a tad faster:
As you can see the graphs are sort of straight with a few bumps here and there caused by re-allocation. Also StringBuilder is indeed quite a bit faster. Use that one if you can.
Both StringBuilder and StringBuffer allow you to specify the initial capacity in the constructor. Of course this was also a thing I had to experiment with. Creating a 0.5 MB String 50 times with different initial capacities:
The step size was 8 and the default capacity is 16. So, the default is the third dot. 16 chars is pretty small and as you can see it's a very sensible default value.
If you take a closer look you can also see that there is some kind of rhythm: the best initial capacities (local optimum) are always a power of two. And the worst results are always just before the next power of two. The perfect results are of course achieved if the required size is used from the very beginning (shown as dashed lines in the diagram) and no resizing happens at all.
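The resizing can be observed directly via capacity(). The following is a small sketch with invented names. It counts how often the internal buffer grows while chars are appended one by one. How the buffer grows is implementation-specific, so the exact number of resizes for a small initial capacity may differ between JVMs. Only the "no resize if the required size is given up front" case is something you can rely on.

```java
class CapacityDemo {
    // Capacity of a StringBuilder created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Counts how often the internal buffer grows while appending n chars one by one.
    static int countResizes(int initialCapacity, int n) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int resizes = 0;
        int capacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != capacity) { // the buffer was re-allocated and copied
                resizes++;
                capacity = sb.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        int n = 512 * 1024;
        System.out.println("default capacity:      " + defaultCapacity());
        System.out.println("resizes from default:  " + countResizes(defaultCapacity(), n));
        System.out.println("resizes when pre-sized: " + countResizes(n, n));
    }
}
```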
That "PoT beat" is of course specific to Sun's implementations of StringBuilder and StringBuffer. Other implementations may show a slightly different behavior. However, if these particular implementations are taken as target one can derive two golden rules from these results:
In order to get meaningful results I took care of a few things:
The messy code is also available.
Comments
Kieron Wilkinson
Is this run under Java 5? I did some testing of String Buffer/Builder a while ago, and I found under Java 6 with its synchronisation escaping, the difference was pretty much zero in non-multi-threaded code.
Some details
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)
AMD64 X2 4200+, 2gb RAM (DDR2 800), WinXP
Update 5 is out though. Maybe that one would change things. Like getters (instead of direct array access) became virtually free in the past (with a minor update, that is - edit: it was with 1.5.0_07). However, if no synchronization is required and if the target is 1.5+ one should just use StringBuilder.
Kabutz article
http://java.sun.com/developer/technicalArticles/Interviews/community/kab...
This shows some contradictory findings.
Great article
But... it isn't contradicting at all. The fastest solution matches example B in the "When to use '+'" section.
edit: The even faster but really ugly one matches the generated code with required size calculation on top... and that's a bit I also covered (last diagram).
Escape Analysis is not released yet
Java 6 doesn't do lock elision yet!
Why not decompile to see what's really going on?
I wrote a blog post on this subject a few years ago:
http://willcode4beer.blogspot.com/2005/05/evil-string-arithmetic-revisit...
It's one thing to say what the compiler does, something else to run javap and demonstrate it
;-)
Heh
Well, I did decompile it. That's where the last two sentences in the "When to use '+'" section came from. ;)
Be careful of memory usage
Using StringBuffer is definitely faster, but one thing to be careful of is the memory usage when you use it in a loop. I have encountered out-of-memory errors when the loop count is big.
twit88.com
Worst case
In the worst case scenario a StringBuffer or StringBuilder uses about 3x the required memory. That's at the point where the buffer is enlarged. Right after the copying the usage drops to 2x. (Different implementations may use other growing schemes though.)
With String.concat() you don't have those kinds of spikes, but you get lots of garbage and far more copying. Well, it's the usual memory/speed trade-off.
Other solutions are throwing more memory at the problem or trying something entirely different. Usually you don't really need gigantic Strings, because the String on its own isn't used directly. If that's the case you can just use some buffered stream and spit it out in manageable blocks. That's faster anyways, because less copying occurs and less garbage is created.
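A rough sketch of that idea, with hypothetical names: instead of collecting everything in one huge String, the pieces are written straight to a Writer, and a BufferedWriter hands them on in blocks. The `linesAsString` helper only exists to make the example easy to try out with small inputs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;

class StreamingDemo {
    // Writes 'count' lines directly to the Writer; no giant String is ever assembled.
    static void writeLines(Writer out, int count) throws IOException {
        BufferedWriter buffered = new BufferedWriter(out);
        for (int i = 0; i < count; i++) {
            buffered.write("line ");
            buffered.write(Integer.toString(i));
            buffered.write('\n');
        }
        buffered.flush(); // push out what is still buffered; closing is left to the caller
    }

    // Convenience wrapper for small outputs only (it defeats the purpose for big ones).
    static String linesAsString(int count) {
        StringWriter sink = new StringWriter();
        try {
            writeLines(sink, count);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        writeLines(new OutputStreamWriter(System.out), 3);
    }
}
```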
What about Ropes?
If you are really concerned with "concatenation expenses", maybe use Ropes?
http://ahmadsoft.org/ropes/
http://www.ibm.com/developerworks/library/j-ropes/?ca=dgr-jw64javaropest...
Interesting indeed
The paper:
Ropes: an Alternative to Strings (PDF) by Hans-J. Boehm, Russ Atkinson and Michael Plass
The concept is pretty intriguing. Too bad that the implementation you linked to is GPLed. I really don't know why some people choose GPL for small enabling or low-level libraries. It only scares people off and there aren't any practical advantages. I actually even think that GPLing this kind of library does more harm than good to the open source movement, because stumbling over these libs over and over again is plain annoying if you cannot use GPLed code.
Less forceful licenses such as BSD or MIT work a lot better for this kind of thing.
Ah well... that's just one of my pet peeves. There is nothing wrong with GPL per se though. All those restrictions do make a lot of sense for big projects. I contribute to Inkscape for example, but I wouldn't ever touch a GPLed library with a 10 foot pole. Simply because it's a waste of time since everything I learn there is useless outside of open source work.
Not bad
I'll give it a try, but for now the performance of my string concatenation has improved drastically using the Text object of the Javolution library.
http://javolution.org/ is worth at least a look.
Javolution's Text object
Looks interesting indeed. Maybe a follow up article which compares StringBuilder, Rope, and Text would be a good idea.