Improving SWF Compression

excessive compression illustration
Hrrrrrngh!

Intro

With web games you want hits, lots of hits. A million per month or more if possible. Of course this means lots of traffic. While broadband has become faster and cheaper over the last couple of years, traffic is still somewhat expensive. To get a better cost/income ratio there are two obvious options: reduce the costs and/or increase the income.

As you might have guessed, I'm about to tackle the former with brute processing power. If you reduce the file size by 1%, every 100th served file is free. With 5% it's every 20th file and with 10% every 10th file. If the price for that is only a few minutes of processing power, it looks like a really good deal, doesn't it?

By the way, this is the first part of a small series of articles. There will be two or three in the end, depending on how things work out. I can't really tell at this point since the required research isn't finished yet. This one is the cornerstone. The follow-up article(s) will extend the basic strategy outlined here.

The SWF file format

The SWF file format uses ZLib streams, which are basically the same kind of Deflate streams we can see everywhere. PNG, ZIP, GZip, etc. all use Deflate for compression. If you're interested in more details on ZLib, Deflate, and GZip refer to RFC 1950, 1951, and 1952. Since the introduction of the SWF6 format almost the whole SWF file can be compressed as illustrated by the following diagram:

Figure 1: SWF file layout

The first 8 bytes ("SWF H") are the essential parts of the SWF header. If compression is enabled, a ZLib stream follows, which consists of a 2-byte ZLib stream header ("Z"), the deflated data ("Deflate Stream"), and a trailing Adler32 checksum ("A32").
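The layout described above can be read back with a few lines of Java. This is only a minimal sketch, not the code from the download: the class and method names are made up for illustration, and it assumes the header layout from the SWF specification (3 signature bytes "CWS" or "FWS", 1 version byte, then a 4-byte little-endian FileLength). It relies on java.util.zip.Inflater in its default mode, which expects exactly the ZLib wrapping shown in the figure.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper, not part of swf_recompress.zip.
final class SwfInflate {
    /** Returns the whole file in uncompressed form: 8-byte "FWS" header plus inflated body. */
    static byte[] toUncompressedSwf(byte[] swf) {
        if (swf.length < 8 || swf[1] != 'W' || swf[2] != 'S') {
            throw new IllegalArgumentException("not an SWF file");
        }
        if (swf[0] == 'F') {
            return swf.clone(); // already uncompressed
        }
        if (swf[0] != 'C') {
            throw new IllegalArgumentException("unsupported signature");
        }
        // FileLength is little-endian and counts the header plus the inflated body.
        int fileLength = ByteBuffer.wrap(swf, 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (fileLength < 8) {
            throw new IllegalArgumentException("implausible FileLength");
        }
        byte[] out = new byte[fileLength];
        System.arraycopy(swf, 0, out, 0, 8);
        out[0] = 'F'; // mark the result as uncompressed

        Inflater inflater = new Inflater(); // default mode: ZLib header + Deflate + Adler32
        inflater.setInput(swf, 8, swf.length - 8);
        int pos = 8;
        try {
            while (pos < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, pos, out.length - pos);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unusual stream
                }
                pos += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt ZLib stream", e);
        } finally {
            inflater.end();
        }
        if (pos != fileLength) {
            throw new IllegalArgumentException("FileLength does not match inflated size");
        }
        return out;
    }
}
```

The sketch stops as soon as FileLength bytes have been produced, so it does not guarantee that the trailing Adler32 checksum was verified.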

Excessive Deflate compression

Unfortunately there isn't a big selection of tools to improve the compression of Deflate streams directly. However, there are a lot of nice tools for ZIP files. Ken Silverman's excellent KZIP comes to mind. It's based on the code of his PNGOUT utility, which tries really hard to squish as many bytes as possible out of PNG files.

There is also Ben Jos Walbeehm's DeflOpt utility. It optimizes Deflate streams inside of ZIP, PNG, or GZip files. However, it cannot optimize Deflate streams inside of SWF files (yet?).

That's why I went with ZIP files. It's a well-known format and there are plenty of tools available.

The ZIP file format

The ZIP file format is fortunately rather simple and reasonably well documented. The following diagram roughly illustrates the ZIP file layout:

Figure 2: ZIP file layout

At the very beginning is the ZIP header ("ZIP H"). Then comes the header of the first ZIP entry ("E1H"), followed by its deflated data ("Deflate Stream"). If there is more than one file inside the ZIP, another ZIP entry header ("E2H") follows and so on, but that isn't really of interest in this case since we'll only deal with a single file.
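To make this concrete, here is a rough Java sketch of pulling the raw Deflate stream out of a single-entry ZIP. It is an illustration under assumptions, not the code from the download: the class name is made up, and it assumes the archive starts directly with the first entry's local file header (signature "PK\3\4", 30 fixed bytes, then file name and extra field) as laid out in PKWARE's APPNOTE, with the compressed size stored in that header rather than in a trailing data descriptor.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of swf_recompress.zip.
final class ZipRawDeflate {
    /** Returns the raw Deflate bytes of the first entry of a ZIP archive. */
    static byte[] firstEntryDeflateData(byte[] zip) {
        ByteBuffer b = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (zip.length < 30 || b.getInt(0) != 0x04034b50) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        int flags = b.getShort(6) & 0xFFFF;
        int method = b.getShort(8) & 0xFFFF;
        long compressedSize = b.getInt(18) & 0xFFFFFFFFL;
        int nameLength = b.getShort(26) & 0xFFFF;
        int extraLength = b.getShort(28) & 0xFFFF;

        if (method != 8) { // 8 = Deflate
            throw new IllegalArgumentException("entry is not deflated");
        }
        if ((flags & 0x08) != 0) {
            // Sizes live in a data descriptor after the data; this sketch does not handle that.
            throw new IllegalArgumentException("compressed size not stored in local header");
        }
        int start = 30 + nameLength + extraLength;
        if (start + compressedSize > zip.length) {
            throw new IllegalArgumentException("truncated archive");
        }
        return Arrays.copyOfRange(zip, start, (int) (start + compressedSize));
    }
}
```

A ZIP entry stores a bare Deflate stream without the 2-byte ZLib header and without the Adler32 trailer, which is why those have to be added again when the stream goes back into an SWF.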

Re-compression of an SWF's main Deflate stream

An SWF may contain other bits of deflated data inside its primary/main Deflate stream, but we'll ignore this for now. So, as far as we're concerned, there is only one Deflate stream. Re-compressing it as efficiently as possible requires a few steps:

  1. Extract the SWF's data and uncompress it.
  2. Put it into a ZIP file with a ZIP utility with really good compression.
  3. Optimize the Deflate stream.
  4. Extract the ZIP's Deflate stream and inject it into a new SWF file.
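The last step can be sketched in Java as follows. This is a minimal illustration under assumptions rather than the code from the download: the class and method names are hypothetical, the ZLib header bytes 0x78 0xDA are just one valid choice (Deflate, 32 KB window, "best compression" hint), and the Adler32 checksum is computed over the uncompressed body and written big-endian as RFC 1950 requires. FileLength is set to the size of the uncompressed file including its 8-byte header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.Adler32;

// Hypothetical helper, not part of swf_recompress.zip.
final class SwfRebuild {
    /**
     * @param uncompressedSwf the complete uncompressed file (8-byte header + body)
     * @param rawDeflate      the re-compressed body as a bare Deflate stream
     */
    static byte[] buildCompressedSwf(byte[] uncompressedSwf, byte[] rawDeflate) {
        int bodyLength = uncompressedSwf.length - 8;
        Adler32 adler = new Adler32();
        adler.update(uncompressedSwf, 8, bodyLength); // checksum of the inflated body

        ByteBuffer out = ByteBuffer.allocate(8 + 2 + rawDeflate.length + 4);
        // SWF header: "CWS", version, FileLength (little-endian, uncompressed size).
        out.put((byte) 'C').put((byte) 'W').put((byte) 'S').put(uncompressedSwf[3]);
        out.order(ByteOrder.LITTLE_ENDIAN).putInt(uncompressedSwf.length);
        // ZLib stream: 2-byte header, Deflate data, Adler32 (big-endian).
        out.put((byte) 0x78).put((byte) 0xDA);
        out.put(rawDeflate);
        out.order(ByteOrder.BIG_ENDIAN).putInt((int) adler.getValue());
        return out.array();
    }
}
```

Since the result is an ordinary ZLib stream behind an 8-byte header, any standard inflater should be able to check it before the file is deployed.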

Of course you can try harder by using different ZIP utilities and/or different switches, then optimizing all these Deflate streams, and using the best compressed one for the final step. (That's basically what excessive.bat does.)

Results

File               Old Size (bytes)   New Size (bytes)   Saved (bytes)   Saved (%)
YouTube Player               78,202             75,942           2,260        2.89
FlowPlayer (LP)             123,333            119,944           3,389        2.75
Some random game          3,544,814          3,519,395          25,419        0.72
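For anyone who wants to run the same arithmetic on their own files, here is a tiny Java sketch. The class and method names are made up for illustration; the sizes in main are the ones from the table above. The "one free file in N" figure is simply the old size divided by the bytes saved.

```java
import java.util.Locale;

// Hypothetical helper for reproducing the "Saved" columns.
final class Savings {
    static long savedBytes(long oldSize, long newSize) {
        return oldSize - newSize;
    }

    static double savedPercent(long oldSize, long newSize) {
        return 100.0 * (oldSize - newSize) / oldSize;
    }

    public static void main(String[] args) {
        String[] names = {"YouTube Player", "FlowPlayer (LP)", "Some random game"};
        long[][] sizes = {{78202, 75942}, {123333, 119944}, {3544814, 3519395}};
        for (int i = 0; i < names.length; i++) {
            long oldSize = sizes[i][0];
            long newSize = sizes[i][1];
            System.out.printf(Locale.ROOT, "%-18s saved %d bytes (%.2f%%), about one free file in %.0f%n",
                    names[i], savedBytes(oldSize, newSize), savedPercent(oldSize, newSize),
                    (double) oldSize / savedBytes(oldSize, newSize));
        }
    }
}
```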

Try it yourself (at your own risk)

Download

swf_recompress.zip (5kb — source included)

Usage

There are two batch files: compress and excessive. The first one will often already yield the optimal result. The second one tries harder, but will often be unable to deliver a better result, since KZIP with the default block size usually yields the best result. Well, trying doesn't hurt.

compress.bat file.swf

Outputs: file_r.swf
Requires: KZIP, DeflOpt, and Java

excessive.bat file.swf

Outputs: file_r.swf
Requires: KZIP, DeflOpt, Java, ZIPMIX, and 7-Zip

Required tools

Closing words

The savings aren't that big yet, but the next step, which involves inflating everything, will increase the percentage to about 4-7% by the looks of it. I'm actually already pretty pleased with the results so far. For example, those 2.89% peeled off the YouTube player could have saved Google millions of dollars. As unbelievable as this sounds, this isn't an exaggeration.

Comments

Caching

Surely Google's SWF file for YouTube is cached by the browser? I know they get millions of hits, but each visitor only needs to download the SWF file once. The videos themselves are streamed directly and they would surely take much more bandwidth than the player...

-spuz

re: Caching

Client-side caching actually helps a lot less than many people think. Nowadays the cache is usually quickly overwritten (often within a day or two). See "It's Time to Rethink the Default Cache Size of Web Browsers" for some details.

YouTube also serves that file from a zillion different locations. With that insane amount of hits you can't point everyone to the same location.

good topic

everyone posts on performance or graphics optimization in flash. nice angle.

if you have a chance, I'd really like to hear about filesize optimization of AS code. no one EVER seems to talk about that, aside from avoiding component frameworks.

for example - are there any tools to give you insight into the size of particular classes in a compiled swf? Are there some ways of writing AS syntax that compile smaller than others?

MT

byte code

Yes, it's possible to write code that results in smaller byte code... but... that's something you shouldn't ever consider doing. Never ever. Trust me.

However, tools which do this for you are something entirely different. There are a lot of obfuscators for Java which shrink the byte code as a side effect. Basically they call everything they can "a" (classes, methods, variables... and they reuse method names and variables in the next class etc). Then "b", "c"... "aa", "ab" and so on. Some of them also rearrange things for a slightly tighter fit.
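The naming scheme itself is easy to picture. The following Java sketch is only an illustration of the "a", "b", ... "aa", "ab" sequence; it is not taken from any particular obfuscator, and the class name is made up. Real tools would also have to skip reserved words and names they are not allowed to touch.

```java
// Hypothetical illustration of an obfuscator-style short name sequence.
final class ShortNames {
    /** Maps 0 to "a", 25 to "z", 26 to "aa", 27 to "ab", and so on. */
    static String name(int index) {
        StringBuilder sb = new StringBuilder();
        // Bijective base-26: emit the last letter first, then move to the next position.
        for (int n = index; n >= 0; n = n / 26 - 1) {
            sb.append((char) ('a' + n % 26));
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 30; i++) {
            System.out.print(name(i) + " ");
        }
        System.out.println();
    }
}
```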

These tools were very helpful for the Java 4k competitions where you had to write a game in <= 4096 bytes.

Unfortunately there don't seem to be any decent obfuscators/shrinkers for Flash. There are some commercial obfuscators which obfuscate the byte code quite a lot, but they make the code slower and also bigger.

Well, typically the byte code only accounts for a very small percentage of the SWF's file size. Images take lots of room and sounds are even bigger. A somewhat bigger Flash game will only have 50-100kb of byte code and megabytes of media. So, the code isn't really the low-hanging fruit (unless there are good free tools of course). Even if you magically manage to shrink it to 25% of its initial size, it won't help that much in the end.

The next iteration of the re-compression utility will save a few more percent. And there are a few other things I want to try. Apart from the non-compressible parts (JPG and MP3 are already compressed, and compressing compressed stuff doesn't work very well) there are a lot of other things which can be compressed a bit more.

Once I manage to inflate everything, it would also be possible (in a sensible fashion, that is) to utilize a different compression scheme than Deflate. LZMA for example is very interesting. It's far superior to Deflate and the decompression routine only takes about 7kb. Images compressed with LZMA are about 30% smaller, text only about 1% (Deflate is already pretty good at that), and byte code stuff seems to compress pretty well.

http://kaioa.com/svg/compression_ratio.svgz

"jar (max compressed" on the second line is Deflate compression. "jar.lzma" on the 4th line is only ~48% of that size. I guess the compression ratio for AS3 byte code will be pretty similar since AS3 byte code looks virtually identical (binary data with some plain text thrown in).

So, embedded text files won't shrink much, byte code will shrink by about 50%, lossless images by 25-30%, JPGs & MP3s won't change, some headers here and there will shrink, and some overhead will be removed. The price is some brand new overhead (those 7kb) and a decompression step, which should only take a second or so. This will most likely be more than offset by the shorter downloading time.

But I'm not sure if I'll really go that far. LZMA-ing everything is a bit excessive. JPG and MP3 will often take a big chunk of the size and compressing/decompressing those won't really help much. So, this pre-loader stuff feels like a rather brute hammer method. Well, I'll see how fast it is.

Your recompression Java code has a bug.

int newSize=1+1+1+1+4+1+1+uncompressedSize+4; // bug
out.writeInt(Integer.reverseBytes(newSize));

Your size header needs to be the exact uncompressed size without any padding (lose the 1+1+1+1+4+1+1+4) or the Flash Player will reject it as a malformed SWF.

re: Your recompression Java code has a bug.

swf_file_format_spec_v10.pdf page 25-26 (emphasis added):

UI32 Length of entire file in bytes
[...]
The FileLength field is the total length of the SWF file, including the header. If this is an
uncompressed SWF file (FWS signature), the FileLength field should exactly match the file
size. If this is a compressed SWF file (CWS signature), the FileLength field indicates the total
length of the file after decompression, and thus generally does not match the file size.

The length of the entire file matches the length of the header plus the length of the following Deflate stream.

Also, the produced files work just fine with Flash 9 and Flash 10.
