Experimental SVG Compression

Introduction

SVGs can be optimized for three different things: file size, rendering speed, and editing. For the most part you want to optimize for editing, especially with complex drawings. Towards the end you may want to lean toward the other two goals, though.

Optimizing for rendering speed is very interesting for devices with limited capabilities such as cellphones or PDAs. Things to keep an eye on are node counts, overdraw, clones, clipping, masking, gradients, and even simple stuff such as strokes. Nokia published a pretty decent guide (PDF) on this topic.

Optimizing for size shares a few aspects of the speed optimization. Keep the node count as low as possible, combine paths where possible... that kind of thing. However, you completely ignore overdraw, and you use clones/clipping/masking/stroke/gradients all over the place. It's certainly a lot more entertaining than speed optimization and it also won't affect editing speed that much. That's why I'll experiment with that.

The SVG

Let's start off with a pretty nice low node/complexity SVG. Paths are combined where possible, clones are used where possible, and the definitions were vacuumed (Inkscape: File->Vacuum Defs). All of this can be done in Inkscape directly.

viewBox

Since it's intended for browser viewing, the first step is adding a viewBox attribute and removing the width and height attributes (as seen in A New Can of Worms - SVG as Website Graphics).

For example:

<svg [...] width="640" height="480">

Becomes:

<svg [...] viewBox="0 0 640 480">
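
If you'd rather not do this by hand, the swap can be sketched with the same DOM and Transformer APIs that PathC uses further down. ViewBoxC and addViewBox are made-up names for illustration, not part of the tools in this article, and the sketch assumes that width and height are plain numbers (optionally with a "px" suffix) and that the origin is 0 0.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not one of the programs used for the results in this article.
class ViewBoxC {
    // Replaces width/height on the root element with an equivalent viewBox.
    // Assumption: both values are plain user-unit numbers, optionally ending in "px".
    static String addViewBox(String svg) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(svg)));
        Element root = doc.getDocumentElement();
        if (!root.hasAttribute("viewBox")
                && root.hasAttribute("width") && root.hasAttribute("height")) {
            String w = root.getAttribute("width").replace("px", "").trim();
            String h = root.getAttribute("height").replace("px", "").trim();
            Double.parseDouble(w); // fail early on units such as "mm" or "%"
            Double.parseDouble(h);
            root.setAttribute("viewBox", "0 0 " + w + " " + h);
            root.removeAttribute("width");
            root.removeAttribute("height");
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String in = "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"640\" height=\"480\"/>";
        System.out.println(addViewBox(in));
    }
}
```

A document that already has a viewBox is left alone, because in that case width and height may carry a different scale.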


Step 1: Compacting Paths

Step 1 uses PathC.java, which creates a more compact path data notation than Inkscape. Additionally it can reduce the accuracy by rounding to a given number of decimal places. For this image and typical desktop resolutions a single decimal place was already good enough: with 0 decimal places the changes were quite drastic, with 1 the changes were very subtle, and with 2 it was perfect. It also removes paths with an empty d attribute. While an empty d attribute is totally valid, it also disables the rendering of the element, so nothing is lost.

Be warned: The path data parsing is pretty fragile. It expects every command letter and number to be separated by whitespace or commas, so it can't even handle the kind of path data it generates itself. (But it's fine with Inkscape's path data.)

import java.io.*;
import java.text.*;
import java.util.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class PathC{
    static Document document;
    static int osum=0;
    static int nsum=0;
    static int esum=0;
    static DecimalFormat nf;
    static int decimalPlaces=1;

    public static void main(String[]args){
        if(args.length!=2&&args.length!=3){
            System.err.println("Usage: java PathC infile outfile <decimal places (default=1)>");
            System.exit(1);
        }
        if(args.length==3){
            decimalPlaces=Integer.parseInt(args[2]);
        }
        String formatPattern="#";
        if(decimalPlaces>0){
            formatPattern+=".";
            for(int i=0;i<decimalPlaces;i++)
                formatPattern+="#";
        }
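        //fixed symbols: path data needs '.' as the decimal separator, whatever the default locale is
        nf=new DecimalFormat(formatPattern,DecimalFormatSymbols.getInstance(Locale.ROOT));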

        DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try{
            DocumentBuilder builder=factory.newDocumentBuilder();
            document=builder.parse(new File(args[0]));

            NodeList paths=document.getElementsByTagName("path");
            System.out.println(paths.getLength()+" path elements found");
            for(int i=paths.getLength()-1;i>=0;--i){
                Node path=paths.item(i);
                if(path.hasAttributes()){
                    NamedNodeMap map=path.getAttributes();
                    for(int k=map.getLength()-1;k>=0;--k){
                        Node att=map.item(k);
                        String name=att.getNodeName();
                        if(name.equals("d")){
                            String val=att.getNodeValue();
                            if(val.trim().length()==0){
                                Node parent=path.getParentNode();
                                parent.removeChild(path);
                                esum++;
                                break;
                            }
                            att.setNodeValue(compact(val));
                        }
                    }
                }
            }
            System.out.println("path data in   : "+osum+" bytes");
            System.out.println("path data out  : "+nsum+" bytes");
            System.out.println("path data saved: "+(osum-nsum)+" bytes");
            System.out.println("path data ratio: "+String.format("%.2f",(100.0/(double)osum*(double)nsum))+"%");
            if(esum>0)
                System.out.println("removed "+esum+" paths with empty path data");

            TransformerFactory tFactory=TransformerFactory.newInstance();
            Transformer transformer=tFactory.newTransformer();

            DOMSource source=new DOMSource(document);
            //the Transformer doesn't close the stream, so do it here
            try(OutputStream os=new FileOutputStream(args[1])){
                transformer.transform(source,new StreamResult(os));
            }

        }catch(Exception e){
            e.printStackTrace();
        }
    }
    private static String compact(String s){
        osum+=s.length();
        char mode=' ';
        char lastMode=mode;
        //"M 100 200 L 200 100 L -100 -200"
        //"M 100 200 L 200 100 -100 -200"
        //"M 100 100 L 200 200"
        //"M100 100L200 200"
        //mzlhvcsqta
        StringBuilder sb=new StringBuilder(512);
        String[] result = s.split("\\s|,|\\p{Cntrl}");
        boolean lastAddNum=false;
        for(int x=0;x<result.length;x++){
            String s2=result[x];
            boolean isMode=false;
            if(s2.length()==1){
                char c=s2.charAt(0);
                if(c>='a'&&c<='z'||c>='A'&&c<='Z'){
                    isMode=true;
                    mode=c;
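                    //a repeated command letter can be dropped, except for moveto:
                    //coordinates following a moveto are read as implicit lineto
                    if(lastMode!=c||c=='M'||c=='m'){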
                        lastMode=mode;
                        sb.append(mode);
                        lastAddNum=false;
                    }
                }
            }
            if(!isMode){
                if(s2.length()!=0){
                    if(lastAddNum)
                        sb.append(' ');
                    double d=Double.parseDouble(s2);
                    sb.append(nf.format(d));
                    lastAddNum=true;
                }
            }
        }
        String ret=sb.toString();
        nsum+=ret.length();
        return ret;
    }
}
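
The fragility mentioned above comes from the split() call: a token like "M100" is neither a single letter nor a number. A regex tokenizer gets around that. The following is only a sketch of that idea and not the code which produced the numbers in this article; PathTokens is a made-up name. It keeps every command letter (so it compresses a bit less than PathC) and it assumes that arc flags are separated from their neighbours by whitespace or commas.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative to the split()-based parsing in PathC.
class PathTokens {
    // A token is either one command letter or one number
    // (optional sign, digits with optional fraction, optional exponent).
    private static final Pattern TOKEN = Pattern.compile(
            "[a-zA-Z]|[-+]?(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][-+]?\\d+)?");

    static String compact(String d, int decimalPlaces) {
        StringBuilder pattern = new StringBuilder("#");
        if (decimalPlaces > 0) {
            pattern.append('.');
            for (int i = 0; i < decimalPlaces; i++) pattern.append('#');
        }
        // Locale.ROOT keeps '.' as the decimal separator on every system.
        DecimalFormat nf = new DecimalFormat(pattern.toString(),
                DecimalFormatSymbols.getInstance(Locale.ROOT));
        StringBuilder sb = new StringBuilder();
        boolean lastWasNumber = false;
        Matcher m = TOKEN.matcher(d);
        while (m.find()) {
            String tok = m.group();
            if (Character.isLetter(tok.charAt(0))) {
                sb.append(tok); // command letters need no separator
                lastWasNumber = false;
            } else {
                if (lastWasNumber) sb.append(' '); // only numbers need one
                sb.append(nf.format(Double.parseDouble(tok)));
                lastWasNumber = true;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(compact("M 100.26 200.04 L 200,100", 1));
        System.out.println(compact("M100.3 200L200-100", 1));
    }
}
```

Because numbers and letters are matched individually, this version can read compact input such as "M100.3 200L200-100", which the split()-based parser cannot.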


Step 2: Compacting Styles

The next target is the styles. StyleC.java removes all default styles, all "fill-" styles if fill is none, all "stroke-" styles if stroke is none, and if the style attribute ends up completely empty it's removed altogether.

Be warned: The program is mostly untested and not verified at all. Additionally it can only deal with inline styles.

import java.io.*;
import java.util.*;
public class StyleC{
    static String []kill={
        "opacity:1",
        "color:#000000",
        "fill-opacity:1",
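        //caution: the SVG default is fill-rule:nonzero, so dropping evenodd
        //can change how combined paths with holes are filled
        "fill-rule:evenodd",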
        "stroke-width:1px",
        "stroke-width:1",
        "stroke-linecap:butt",
        "stroke-linejoin:miter",
        "marker:none",
        "marker-start:none",
        "marker-mid:none",
        "marker-end:none",
        "stroke-miterlimit:4",
        "stroke-dasharray:none",
        "stroke-dashoffset:0",
        "stroke-opacity:1",
        "visibility:visible",
        "display:inline",
        "overflow:visible",
        "enable-background:accumulate"
    };
    public static void main(String[]args){
        if(args.length!=2){
            System.err.println("Usage: java StyleC infile outfile");
            System.exit(1);
        }
        try{
            BufferedReader in=new BufferedReader(new FileReader(args[0]));
            BufferedWriter out=new BufferedWriter(new FileWriter(args[1]));
            String line="";
            while((line=in.readLine())!=null){
                int start=line.indexOf("style=\"");
                if(start>=0){
                    start+=7;
                    int end=line.indexOf('\"',start);
                    StringBuilder sb=new StringBuilder(64);
                    String[]styles=line.substring(start,end).split(";");
                    boolean killStroke=false;
                    boolean killFill=false;
                    for(int i=0;i<styles.length;i++){
                        if(styles[i].equals("stroke:none")){
                            killStroke=true;
                        }else if(styles[i].equals("fill:none")){
                            killFill=true;
                        }
                    }
                    for(int i=0;i<styles.length;i++){
                        //System.out.println("["+i+"]"+styles[i]);
                        boolean found=false;
                        for(int k=0;k<kill.length;k++){
                            if(styles[i].equals(kill[k])){
                                found=true;
                                break;
                            }
                            else if(killStroke&&styles[i].startsWith("stroke-")){
                                found=true;
                                break;
                            }
                            else if(killFill&&styles[i].startsWith("fill-")){
                                found=true;
                                break;
                            }
                        }
                        if(!found){
                            if(sb.length()>0)
                                sb.append(';');
                            sb.append(styles[i]);
                        }
                    }
                    if(sb.length()>0){
                        out.write(line.substring(0,start));
                        out.write(sb.toString());
                        out.write(line.substring(end));
                    }else{//kill style
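                        //start-8 also drops the space before style=; clamp it in case the line starts with style=
                        out.write(line.substring(0,Math.max(0,start-8)));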
                        out.write(line.substring(end+1));
                    }
                    out.newLine();
                }else{
                    out.write(line);
                    out.newLine();
                }
            }
            in.close();
            out.flush();
            out.close();
        }catch(Exception e){
            e.printStackTrace();
        }
    }
}
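
To make the three rules easier to follow, here they are applied to a single style string instead of a whole file. This is a simplified sketch and not a replacement for StyleC: StyleFilter is a made-up name and the list of defaults is shortened for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, shortened version of the StyleC rules for one style string.
class StyleFilter {
    // Only a handful of the defaults from StyleC's kill list.
    static final Set<String> DEFAULTS = new HashSet<>(Arrays.asList(
            "opacity:1", "fill-opacity:1", "stroke-width:1",
            "stroke-linecap:butt", "stroke-linejoin:miter", "stroke-opacity:1"));

    static String compact(String style) {
        List<String> decls = new ArrayList<>();
        for (String s : style.split(";")) {
            String t = s.trim();
            if (!t.isEmpty()) decls.add(t);
        }
        boolean killStroke = decls.contains("stroke:none");
        boolean killFill = decls.contains("fill:none");
        StringBuilder sb = new StringBuilder();
        for (String d : decls) {
            if (DEFAULTS.contains(d)) continue;                    // rule 1: default value
            if (killStroke && d.startsWith("stroke-")) continue;   // rule 2: no stroke at all
            if (killFill && d.startsWith("fill-")) continue;       // rule 3: no fill at all
            if (sb.length() > 0) sb.append(';');
            sb.append(d);
        }
        return sb.toString(); // empty string: the attribute can go entirely
    }

    public static void main(String[] args) {
        System.out.println(compact(
                "fill:#ff0000;fill-opacity:1;stroke:none;stroke-width:2.5;stroke-linecap:round"));
    }
}
```

Note that "stroke:none" itself has to stay, because it doesn't start with "stroke-". Only the properties which depend on it are dropped.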


Step 3: Removing Non-Functional Groups (and some other cruft)

This step isn't automated yet. I just went through the SVG with a text editor and removed all groups which weren't good for anything. I kept one group which had clipping and transform set. I also removed some Inkscape namespace stuff, which stayed there thanks to some regression.
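
A rough idea of how the group removal could be automated: treat a group as non-functional if it has no attributes at all, and move its children up one level. That definition is my simplification for this sketch (a group with only an id might still be referenced by a clone, so it's left alone), and GroupC is a made-up name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: unwraps <g> elements that carry no attributes.
class GroupC {
    static Document parse(String svg) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(svg)));
    }

    // Returns the number of groups that were removed.
    static int unwrapBareGroups(Document doc) {
        NodeList groups = doc.getElementsByTagName("g");
        int removed = 0;
        // Walk backwards: the NodeList is live and shrinks as groups disappear.
        for (int i = groups.getLength() - 1; i >= 0; i--) {
            Node g = groups.item(i);
            if (g.hasAttributes()) continue; // transform, clip-path, id, ... keep it
            Node parent = g.getParentNode();
            while (g.getFirstChild() != null) {
                parent.insertBefore(g.getFirstChild(), g); // keeps the drawing order
            }
            parent.removeChild(g);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<svg xmlns=\"http://www.w3.org/2000/svg\"><g>"
                + "<g id=\"keep\" transform=\"translate(1,2)\"><path d=\"M0 0\"/></g>"
                + "<g><rect/></g></g></svg>");
        System.out.println("removed " + unwrapBareGroups(doc) + " groups");
    }
}
```

The document can then be written out with a Transformer, the same way PathC does it.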

Step 4: Removing Indentation and New Lines

Again this one isn't automated. Unindent everything, regex search'n'replace new line chars and that's it.
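
For completeness, the search'n'replace can be done with a one-line regex in Java as well. This sketch (Unindent is a made-up name) only removes whitespace that sits between a closing ">" and the next "<". It assumes that every tag is on one line, as in the output of the earlier steps, and that the file has no text elements where whitespace between tags matters.

```java
// Hypothetical sketch: removes indentation and line breaks between tags.
class Unindent {
    static String strip(String xml) {
        // Whitespace inside tags and attribute values is left untouched.
        return xml.replaceAll(">\\s+<", "><").trim();
    }

    public static void main(String[] args) {
        String svg = "<svg>\n  <g>\n    <path d=\"M0 0\"/>\n  </g>\n</svg>\n";
        System.out.println(strip(svg));
    }
}
```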

Step 5: Compressing with 7-Zip

Since 7-Zip doesn't support the headless gzip mode and the file name is stored uncompressed, I renamed the file to a single character first. Then I gzipped it with the most extreme settings:

Setting               Value
Compression level     Ultra
Compression method    Deflate (fixed value)
Dictionary size       32 KB (fixed value)
Word size             258

Application    Size
Inkscape       7,684 Bytes
7-Zip          6,903 Bytes
Saved          781 Bytes (10.16%)

As you can see even simple re-compression with 7-Zip already yields about 10%, which is pretty impressive, given that it's so little work.
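
If you want to stay in Java for this step too, java.util.zip can write gzip streams, and its header has no file name field, so there is nothing to rename. The sketch below is not what I used for the numbers here (SvgzC is a made-up name), and I'd expect zlib's best level to come out somewhat larger than 7-Zip's Ultra setting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: gzip with the highest level java.util.zip offers.
class SvgzC {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // GZIPOutputStream has no level parameter; 'def' is the protected
        // Deflater inherited from DeflaterOutputStream.
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes) {
            { def.setLevel(Deflater.BEST_COMPRESSION); }
        }) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    // Only used to check that the compressed data is readable again.
    static byte[] gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) bytes.write(buf, 0, n);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder("<svg>");
        for (int i = 0; i < 100; i++) sb.append("<path d=\"M0 0L10 10\"/>");
        byte[] svg = sb.append("</svg>").toString().getBytes(StandardCharsets.UTF_8);
        byte[] svgz = gzip(svg);
        System.out.println(svg.length + " bytes in, " + svgz.length + " bytes out");
    }
}
```

Writing the returned bytes to a file ending in .svgz gives the same kind of file as the 7-Zip route.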

The Results:

Step                                     In              Out             Saved
1: Compacting Paths                      23,334 Bytes    17,868 Bytes    23.43%
2: Compacting Styles                     17,868 Bytes    12,661 Bytes    29.14%
3: Removing Non-Functional Groups        12,661 Bytes    12,190 Bytes    3.72%
4: Removing Indentation and New Lines    12,190 Bytes    11,848 Bytes    2.81%
5: Compressing with 7-Zip                11,848 Bytes    4,000 Bytes     66.24%

Original                      Optimized
amg6p7.svgz                   amg6p_p_s_c_i.svgz
7,684 Bytes                   4,000 Bytes
(7zip+viewBox=6,903 Bytes)    (52.06% of the initial file size)

Open both files in a new tab, flip back'n'forth, and try to spot the difference. It's barely visible. With one extra digit it would be completely invisible, but imo that wasn't worth it.

Down to about half the size and there is still a lot more which could be done. There are lots of unused ids and those ids which are actually used have rather long names. Generating the shortest possible names and using classes for the styles (again with shortest possible names) could easily peel another 500 bytes off.
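
Generating the shortest possible names is simple enough to sketch: count through a, b, ..., z, A, ..., Z, aa, ab, and so on. ShortIds is a made-up name and this only produces the names. Rewriting the ids and every reference to them (xlink:href="#..." and url(#...)) is left out here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: shortest possible id names, letters only.
class ShortIds {
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns the n-th name (n starts at 0): a, b, ..., Z, aa, ab, ...
    static String name(int n) {
        int base = ALPHABET.length();
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt(n % base));
            n = n / base - 1; // -1 because "aa" follows directly after "Z"
        } while (n >= 0);
        return sb.reverse().toString();
    }

    // Maps each id that is actually referenced to a short replacement,
    // in the order given. Putting the most frequently used ids first
    // gives them the shortest names.
    static Map<String, String> rename(List<String> usedIds) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String id : usedIds) {
            if (!map.containsKey(id)) map.put(id, name(map.size()));
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(rename(List.of("linearGradient3785", "clipPath4127")));
    }
}
```

Sticking to letters keeps the names valid as XML ids, which aren't allowed to start with a digit.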

Comments

Compressing svg with 7 zip

How exactly do you compress a svg file with 7-zip so that it is .svgz?

Well...

It's GZip. So, just choose GZip as archive format and change the "svg.gz" file extension to "svgz".

SVGZ is pretty similar to TGZ which is just a gzipped TAR (tar.gz).

Did not know you could do

Did not know you could do that with 7-zip
