StringBuilder vs StringBuffer vs String.concat - done right

How long is a piece of String?

Introduction

Concatenation of Strings is very easy in Java - all you need is a '+'. It can't get any easier than that, right? Unfortunately there are a few pitfalls. One thing you should remember from your first Java lessons is a small albeit important detail: String objects are immutable. Once constructed they cannot be changed anymore.

Whenever you "change" the value of a String you create a new object and make that variable reference this new object. Appending a String to another existing one is the same kind of deal: a new String containing the stuff from both is created and the old one is dropped.
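A minimal sketch of what that means in practice. The class and method names are made up for illustration; the point is only that the "changed" variable ends up referencing a different object while the original String stays as it was:

```java
class ImmutableStringDemo {
    // Returns true if "changing" a String left the old object untouched
    // and produced a new, different object.
    static boolean appendCreatesNewObject() {
        String original = "foo";
        String alias = original;      // second reference to the same object
        original = original + "bar";  // builds a NEW String; "foo" itself is unchanged
        return alias.equals("foo")
            && original.equals("foobar")
            && alias != original;     // two distinct objects now
    }

    public static void main(String[] args) {
        System.out.println(appendCreatesNewObject()); // prints "true"
    }
}
```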

You might wonder why Strings are immutable in the first place. There are two very compelling reasons for it:

  1. Immutable basic types make things easier. If you pass a String to a function you can be sure that its value won't change.
  2. Security. With mutable Strings one could bypass security checks by changing the value right after the check. (Same thing as the first point, really.)

The performance impact of String.concat()

Each time you append something via '+' (String.concat()) a new String is created, the old stuff is copied, the new stuff is appended, and the old String is thrown away. The bigger the String gets the longer it takes - there is more to copy and more garbage is produced.
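To put a rough number on that copying, here is a small sketch (not the benchmark code; all names are invented for this example). Both loops produce the same String, and oldCharsCopied only counts the characters that the '+=' loop has to copy over from the previous String on each pass:

```java
class ConcatCostDemo {
    // Appends one character at a time with '+='; every pass copies the whole String.
    static String buildWithPlus(int length) {
        String s = "";
        for (int i = 0; i < length; i++) {
            s += 'x';
        }
        return s;
    }

    // Same result, but the characters go into one growing buffer.
    static String buildWithBuilder(int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Characters copied from the old String by the '+=' loop:
    // 0 + 1 + ... + (length - 1) = length * (length - 1) / 2
    static long oldCharsCopied(int length) {
        return (long) length * (length - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(buildWithPlus(1000).equals(buildWithBuilder(1000))); // true
        System.out.println(oldCharsCopied(65536)); // 2147450880
    }
}
```

So for a String of 65536 characters the '+=' loop copies over two billion characters in total, on top of creating 65536 short-lived String objects.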

Creating a String with a length of 65536 (character by character) already takes about 22 seconds on an AMD64 X2 4200+. The following diagram illustrates the exponentially growing amount of required time:

Figure 1: StringBuilder vs StringBuffer vs String.concat

StringBuilder and StringBuffer are also shown, but at this scale they sit right on the x-axis. As you can see, String.concat() is slow. Amazingly slow, in fact. It's so bad that the guys over at FindBugs added a detector for String.concat inside loops to their static code analysis tool.

When to use '+'

Using the '+' operator for concatenation isn't bad per se though. It's very readable and it doesn't necessarily affect performance. Let's take a look at the kind of situations where you should use '+'.

a) Multi-line Strings:

String text=
    "line 1\n"+
    "line 2\n"+
    "line 3";

Since Java doesn't feature a proper multi-line String construct like other languages, this kind of pattern is often used. If you really have to you can embed massive blocks of text this way and there are no downsides at all. The compiler creates a single String out of this mess and no concatenation happens at runtime.
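One way to convince yourself that no concatenation happens at runtime is an identity check. Compile-time constant Strings are interned, so if the compiler really folded the pieces into one constant, the result is the very same object as the equivalent single literal. The class name is made up for this sketch:

```java
class ConstantFoldingDemo {
    // A constant expression: the compiler joins the three literals into one String.
    static final String TEXT =
        "line 1\n" +
        "line 2\n" +
        "line 3";

    // Identity comparison on purpose: both sides are the same interned constant.
    static boolean foldedAtCompileTime() {
        return TEXT == "line 1\nline 2\nline 3";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // prints "true"
    }
}
```

(Comparing Strings with == is of course a bad idea in normal code; it's only used here to show that there is a single object.)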

b) Short messages and the like:

System.out.println("x:"+x+" y:"+y);

The compiler transforms this to:

System.out.println((new StringBuilder()).append("x:").append(x).append(" y:").append(y).toString());

Looks pretty silly, doesn't it? Well, it's great that you don't have to write that kind of code yourself. ;)

If you're interested in byte code generation: According to Arno Unkrig (the amazing dude behind Janino) the optimal strategy is to use String.concat() for 2 or 3 operands, and StringBuilder for 4 or more operands (if available - otherwise StringBuffer). Sun's compiler always uses StringBuilder/StringBuffer though. Well, the difference is pretty negligible.

When to use StringBuilder and StringBuffer

This one is easy to remember: use 'em whenever you assemble a String in a loop. If it's a short piece of example code, a test program, or something completely unimportant you won't necessarily need that though. Just keep in mind that '+' isn't always a good idea.
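A typical case is joining a list of items with a separator. The following is just an illustrative sketch (the join helper and its class are invented here), but it shows the usual shape: one StringBuilder outside the loop, append inside, a single toString() at the end:

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    // Joins the parts with the given separator using a single StringBuilder.
    static String join(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(separator); // separator only between elements
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"), ", ")); // prints "a, b, c"
    }
}
```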

StringBuilder and StringBuffer compared

StringBuilder is rather new - it was introduced with 1.5. Unlike StringBuffer it isn't synchronized, which makes it a tad faster:

Figure 2: StringBuilder vs StringBuffer

As you can see the graphs are sort of straight with a few bumps here and there caused by re-allocation. Also StringBuilder is indeed quite a bit faster. Use that one if you can.

Initial capacity

Both StringBuilder and StringBuffer allow you to specify the initial capacity in the constructor. Of course this was also a thing I had to experiment with. Creating a 0.5 MB String 50 times with different initial capacities:

Figure 3: StringBuilder and StringBuffer with different initial capacities

The step size was 8 and the default capacity is 16. So, the default is the third dot. 16 chars is pretty small and as you can see it's a very sensible default value.

If you take a closer look you can also see that there is some kind of rhythm: the best initial capacities (local optima) are always a power of two. And the worst results are always just before the next power of two. The perfect results are of course achieved if the required size is used from the very beginning (shown as dashed lines in the diagram) and no resizing happens at all.

Some insight

That "PoT beat" is of course specific to Sun's implementations of StringBuilder and StringBuffer. Other implementations may show a slightly different behavior. However, if these particular implementations are taken as target one can derive two golden rules from these results:

  1. If you set the capacity use a power of two value.
  2. Do not use the String/CharSequence constructors ever. They set the capacity to the length of the given String/CharSequence + 16, which can be virtually anything.
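The capacities mentioned in these two rules can be checked with capacity(). The sketch below also includes a small helper for rounding a size estimate up to a power of two; nextPowerOfTwo is a hypothetical helper written for this example (it assumes 1 <= n <= 2^30), not something the JDK provides under that name:

```java
class CapacityDemo {
    // Smallest power of two >= n. Hypothetical helper; assumes 1 <= n <= 2^30.
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        // Default capacity.
        System.out.println(new StringBuilder().capacity());        // 16
        // String constructor: length of the argument + 16.
        System.out.println(new StringBuilder("hello").capacity()); // 21
        // Explicit power-of-two capacity derived from an estimate of 1000 chars.
        System.out.println(new StringBuilder(nextPowerOfTwo(1000)).capacity()); // 1024
    }
}
```

If you need the content of an existing String in the builder, you can still create the builder with an explicit capacity and append the String afterwards.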

Benchmarking method

In order to get meaningful results I took care of a few things:

  • VM warmup
  • separate runs for each test
  • each sample is the median of 5 runs
  • inner loops were inside of each bench unit

The messy code is also available.
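For readers who just want the general idea, the following is a stripped-down sketch of such a setup. It is not the code used for the graphs; the class, the method names, and the numbers in main are assumptions chosen for illustration. It warms up first, keeps the inner loop inside the measured unit, and reports the median of the timed runs:

```java
import java.util.Arrays;

class MedianBench {
    // Results are accumulated here so the measured work has a visible effect.
    static long sink;

    // Median of the samples (middle element; intended for an odd number of runs).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // One bench unit: the inner loop lives inside the unit.
    static void work(int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append('x');
        }
        sink += sb.toString().length();
    }

    // Warm up, then time the unit several times and return the median in nanoseconds.
    static long measure(int warmupRuns, int runs, int len) {
        for (int i = 0; i < warmupRuns; i++) {
            work(len);
        }
        long[] samples = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            work(len);
            samples[r] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        System.out.println(measure(2_000, 5, 65_536) + " ns (median of 5 runs)");
    }
}
```

To get separate runs for each test you would start the VM once per candidate (concat, StringBuffer, StringBuilder) instead of measuring them all in one process.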

Comments

Kieron Wilkinson

Is this run under Java 5? I did some testing of String Buffer/Builder a while ago, and I found under Java 6 with its synchronisation escaping, the difference was pretty much zero in non-multi-threaded code.

Some details

java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)

AMD64 X2 4200+, 2gb RAM (DDR2 800), WinXP

Update 5 is out though. Maybe that one changes things, just like getters (instead of direct array access) became virtually free in the past (with a minor update, that is - edit: it was with 1.5.0_07). However, if no synchronization is required and if the target is 1.5+ one should just use StringBuilder.

Kabutz article

Great article

But... it isn't contradicting at all. The fastest solution matches example B in the "When to use '+'" section.

edit: The even faster but really ugly one matches the generated code with required size calculation on top... and that's a bit I also covered (last diagram).

Escape Analysis is not released yet

Java 6 doesn't do lock elision yet!

Why not decompile to see what's really going on?

I wrote a blog post on this subject a few years ago:
http://willcode4beer.blogspot.com/2005/05/evil-string-arithmetic-revisit...

It's one thing to say what the compiler does, something else to run javap and demonstrate it
;-)

Heh

Well, I did decompile it. That's where the last two sentences in the "When to use '+'" section came from. ;)

Be careful of memory usage

Using StringBuffer is definitely faster, but one thing to be careful of is the memory usage when you use it in a loop. I have encountered an out-of-memory error when the loop count was big.

twit88.com

Worst case

In the worst case a StringBuffer or StringBuilder uses about 3 times the required memory. That's at the point where the buffer is enlarged: the old buffer and the new, roughly twice as large one exist side by side until the copying is done. Right after the copying the usage drops to 2x. (Different implementations may use other growing schemes though.)
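The growth step itself can be observed through ensureCapacity(), whose documentation states that the new capacity is the larger of the requested minimum and twice the old capacity plus 2. A small sketch (class and method names are made up; other implementations may grow differently, as noted above):

```java
class GrowthDemo {
    // Creates a builder with the given capacity, asks for room for one more
    // character than fits, and reports the capacity after the growth step.
    static int grownCapacity(int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.ensureCapacity(initialCapacity + 1);
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Documented rule: max(requested, 2 * old + 2) = max(17, 34)
        System.out.println(grownCapacity(16)); // 34
    }
}
```

While the content is copied, the old buffer (n chars) and the new one (2n + 2 chars) are both alive, which is where the roughly 3x figure comes from.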

With String.concat() you don't have that kind of spikes, but you get lots of garbage and far more copying. Well, it's the usual memory/speed trade.

Other solutions are throwing more memory at the problem or trying something entirely different. Usually you don't really need gigantic Strings, because the String on its own isn't used directly. If that's the case you can just use some buffered stream and spit it out in manageable blocks. That's faster anyways, because less copying occurs and less garbage is created.
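A rough sketch of that idea, assuming the output is meant for some stream anyway. The writeLines method and the "line N" content are invented for this example; the pieces are handed to a buffered stream as they are produced, so the complete text never has to exist as one String:

```java
import java.io.BufferedOutputStream;
import java.io.PrintStream;

class StreamingDemo {
    // Writes the pieces straight to the stream instead of assembling one giant String.
    static void writeLines(PrintStream out, int count) {
        for (int i = 0; i < count; i++) {
            out.print("line ");
            out.print(i);
            out.print('\n');
        }
    }

    public static void main(String[] args) {
        // The BufferedOutputStream collects the small writes into larger blocks.
        PrintStream out = new PrintStream(new BufferedOutputStream(System.out));
        writeLines(out, 3);
        out.flush(); // push out whatever is still sitting in the buffer
    }
}
```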

What about Ropes?

If you are really concerned with "concatenation expenses", maybe use Ropes?
http://ahmadsoft.org/ropes/
http://www.ibm.com/developerworks/library/j-ropes/?ca=dgr-jw64javaropest...

Interesting indeed

The paper:
Ropes: an Alternative to Strings (PDF) by Hans-J. Boehm, Russ Atkinson and Michael Plass

The concept is pretty intriguing. Too bad that the implementation you linked to is GPLed. I really don't know why some people choose the GPL for small enabling or low-level libraries. It only scares people off and there aren't any practical advantages. I actually even think that GPLing this kind of library does more harm than good to the open source movement, because stumbling over these libs over and over again is plain annoying if you cannot use GPLed code.

Less forceful licenses such as BSD or MIT work a lot better for this kind of thing.

Ah well... that's just one of my pet peeves. There is nothing wrong with GPL per se though. All those restrictions do make a lot of sense for big projects. I contribute to Inkscape for example, but I wouldn't ever touch a GPLed library with a 10 foot pole. Simply because it's a waste of time since everything I learn there is useless outside of open source work.

Not bad

I'll give it a try, but now the performance of my string concatenation has improved drastically using the Text object of the Javolution library.
http://javolution.org/ is worth at least a look.

Javolution's Text object

Looks interesting indeed. Maybe a follow up article which compares StringBuilder, Rope, and Text would be a good idea.

Difference concat +

What is the specific difference between the concat method and the + operator?
The performance...
Thank you in advance.

re: Difference concat +

The '+' operator is translated to different things depending on its context (read the "When to use '+'" section again). Whereas concat is always concat.

Generally speaking there is no reason to use concat ever. The potential speed gain is microscopic and the 2-3 operands case occurs very rarely. Cases like that are better left to the compiler (writers).
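Apart from what the compiler generates, there are also two small behavioral differences that can be checked directly: '+' accepts null and non-String operands and converts them to text, whereas concat() only takes a String and throws on null. A minimal sketch with made-up names:

```java
class ConcatVsPlusDemo {
    // '+' renders a null reference as the text "null".
    static String plusWithNull() {
        String missing = null;
        return "value: " + missing;
    }

    // concat() throws a NullPointerException for a null argument.
    static boolean concatRejectsNull() {
        String missing = null;
        try {
            "value: ".concat(missing);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(plusWithNull());      // prints "value: null"
        System.out.println(concatRejectsNull()); // prints "true"
        System.out.println("n=" + 42);           // prints "n=42"; concat(42) would not compile
    }
}
```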

A simple concat v. StringBuilder v. StringBuffer comparison

I just wrote a simple timer class to test these approaches. It is not as comprehensive as JH's blog post, but should be easier to understand and modify: http://softwareandresearch.com/blog/?p=62

The timer class can be used to test any code by adding this to your class's main method:
Timer timer=new Timer();
timer.start();
myClass.doSomething(); // the operation you want to time
timer.milestone("subtest 1 complete");
// ... more milestones
timer.end();

What about concat with the += operator?

What does the compiler do here?

String s = "This ";
s += "is another ";
s += "concatenated ";
s += "string";

re: What about concat with the += operator?

Check the first diagram... see that green line? That's what happens if you use += in a loop.

Theoretically a compiler could optimize away the nonsense you've posted, but why would anyone write something like that in the first place? (See my multi-line String example.)

In practice you'll get either this:

String s = "This ";
s = (new StringBuilder()).append(s).append("is another ").toString();
s = (new StringBuilder()).append(s).append("concatenated ").toString();
s = (new StringBuilder()).append(s).append("string").toString();

or that:

String s = "This ";
s = s.concat("is another ");
s = s.concat("concatenated ");
s = s.concat("string");

Yuck.

Can u post the code which u

Can u post the code which u used for preparing this graph ?
The following post shows the code with the execution time for string builder and string concatenation.

http://www.robinthomas.in/dotnet/stringbuilder-vs-string-concatenations/

re: Can u post the code

Here is some pasted email which should answer most of your questions:

I create most images/graphs/eye-catchers with Inkscape(.org). In this case I made the benchmarks spit out SVG path data directly, which I copy&pasted into the "d" attribute of a path. I grouped a few of those paths together (e.g. one for StringBuilder, one for StringBuffer, and one for concat) and then I transformed that group (flipped vertically and scaled to a usable size). After that I added the labels, axis, and whatever. I also added a viewBox attribute afterwards with a text editor in order to make it scale to fit and also to make everything show up (it's easier to handle if the document size matches the graph space... so, I had to move the offsets of the view box slightly around and adjust width/height accordingly).

The PNGs are exports from Inkscape with a drop shadow done in Photoshop (made a macro aka "action" for that) and at the end they were recompressed with pngout.

This approach is probably a bit odd and slightly long-winded, but I wanted SVG versions as well.

The bar graphs over at http://kaioa.com/node/78 (How to GZip Drupal 6.x's aggregated CSS and JS files) were done in a similar fashion. There I just used a 1px grid and 1kb=1px. With that in mind drawing the boxes was very easy.

Or this kind of bar graphs: http://kaioa.com/svg/compression_ratio.svgz

The first bar represents 1,146,483 bytes therefore I made it 1146.483px wide. The same goes for all other bars. At the end I grouped em and scaled em to a nice size. (The grouping ensures that everything is scaled by the same amount.)

A more straightforward solution is to output the data as CSV (comma separated values), import it into Excel or Calc, and let it generate some graphs out of that data.

If you're using Linux you can also take a look at gnuplot(.info).

re: re: What about concat with the += operator?

Why would anyone write something like that in the first place? I have a perfectly good use for it...I needed to pad a String representation of a decimal value with the specified number of leading zeros. Looks clean for such an application:

String prePaddedValue = fromASCII(asciiArray, startPos, length);
String padding = new String();
for (int i = prePaddedValue.length(); i < digits; i++) {
    padding += "0";
}
return padding + prePaddedValue;

re: re: re: What about concat with the += operator?

"Why would anyone write something like that in the first place?" referred to that piece of code which did use the += operator, but outside of a loop.

re: re: re: What about concat with the += operator?

String longestPadding = "0000000000000000"; //could be a static member.. but no need coz it's a string! go figure ;)
String prePaddedValue = fromASCII(asciiArray, startPos, length);
return longestPadding.substring(0, Math.min(longestPadding.length(), Math.max(0, digits - prePaddedValue.length()))) + prePaddedValue;

a single string concatenation is clearer, faster and prettier than LOOPING to CONCATENATE (or append) zeros! ;)
you can also strip the Math.min & Math.max checks if you are sure of your params' lengths, which leaves you with an elegant and efficient single line of code:

return "0000000000000000".substring(0, digits - prePaddedValue.length()) + prePaddedValue;
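For reference, here is the checked variant from above as a self-contained, runnable sketch. padLeft and the class name are invented for this example, and the fromASCII call is left out since its implementation isn't shown. Note that the stripped-down one-liner throws a StringIndexOutOfBoundsException if the value is already longer than digits or if more than 16 zeros are needed, which is exactly what the Math.min and Math.max calls guard against:

```java
class PadDemo {
    // Source of padding characters; supports up to 16 leading zeros.
    private static final String ZEROS = "0000000000000000";

    // Left-pads the value with zeros up to the given number of digits.
    // Values that are already long enough are returned unchanged.
    static String padLeft(String value, int digits) {
        int missing = Math.max(0, digits - value.length());
        return ZEROS.substring(0, Math.min(ZEROS.length(), missing)) + value;
    }

    public static void main(String[] args) {
        System.out.println(padLeft("42", 5));     // prints "00042"
        System.out.println(padLeft("123456", 3)); // prints "123456"
    }
}
```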

more than linear isn't always exponential

"The following diagram illustrates the exponentially growing amount of required time:"

String concatenation scales quadratically, not exponentially. Otherwise you wouldn't even get to a length of 50.

dead code

The compiler can effectively ignore any code in public void run(int iterations,int delta,int len), since it changes no global state and returns no results. Make it return a String and print the length of that String afterwards so that the code has an effect. Also, the benchmark doesn't warm up enough; 10k is the default for -server. The benchmark is also very heavily biased towards the buffered approach; try adding s+=s; and checking the resulting length in the concat() test.

re: dead code

Theoretically the compiler (and the runtime environment) can remove the code since it doesn't do anything useful (like most benchmark code). However, it doesn't do that. Otherwise the graphs would be flat.

The graphs in figure 1 aren't flat. Check the SVG path data. You get the raw values there.

The benchmark is of course "biased towards" buffer/builder since those are a lot faster.

The warmup was sufficiently large. If I remember correctly it was at least 10 times larger than the number of iterations which were necessary to get the fastest cycle. Also note that median values are used for every sampling point.
