Separation of Content and Presentation - One Step Ahead

Interchangeable presentation

You know the drill

If you haven't ignored the web for the past few years, you've probably heard it a million times by now: separate your content and presentation, use CSS for presentation, and say yes to semantic markup. There is no doubt that it's the right thing to do, given the sheer number of benefits.

In a nutshell: it makes your life easier by keeping the pieces together that belong together. It's somewhat akin to the proximity rule in usability. It also keeps the noise down; if you want to change parts of the presentation, there are no content bits in the way, and vice versa. There is also a lot less overhead since the style sheets can be cached on the client side. In extreme cases it can save as much as 200 KB of utterly pointless bloat. Additionally, proper CSS usage paves the way for the ultimate killer feature: interchangeable presentation.

Reality check

In theory you can rebrand or even redesign a whole website just by replacing the style sheets. Unfortunately it's not always that easy in practice. With static sites you usually have to hack'n'slay through the markup with regex search & replace until you get something usable. If you ever go down that route, run Tidy first.

If a CMS was used things will usually look a lot better. The outer theme markup (rough layout, navigation, etc.) is externalized and can be easily replaced. And the inner article markup (i.e. the actual content) resides elsewhere. So, if the new layout requires some (outer) markup changes this won't be much of an issue. You write a new theme and that's it.

With overly verbose markup as seen on CSS Zen Garden you can also get a high degree of flexibility. Having lots of unused ids and classes in your markup isn't really feasible though. I don't know of any real website which went down that route. Zen Garden is just a content free demonstration page after all.

Legacy by default

With a CMS everything should be fine, shouldn't it? Well, almost. Typically individual articles or blog posts will be HTML or XHTML fragments. And that's already the first big issue: the content is hard coded to fit a specific digital distribution flavor from the very beginning.

For example, if the site started with HTML 4.01, switching to XHTML 1.1 won't be easy. Of course you might be inclined to ask why anyone would want to do that. However, the more important question is why you can't.

State-of-the-art markup is always a moving target. HTML 5 is just around the corner and so is the next incarnation of XHTML. Things will look different in 5 years. Even more so in 10 or even 20 years. The internet isn't a gimmicky piece of tech anymore. It's fairly safe to say that the internet is here to stay, virtually forever. The human life span isn't all that long after all.

Disposable by default

Of course there is lots of disposable content. This blog, with its technical focus, is no exception. Unless one of the programming languages I talk about turns out to be the next COBOL, these articles won't be of interest to anyone a couple of years down the road.

But there is also truly timeless content. And sometimes it might be desirable to distribute it in completely different flavors. For example, a "book" created with Drupal (it's a core module, which allows you to create a structured set of pages) might also be distributed as PDF, as DocBook, or even as a real physical book. As you can imagine, (X)HTML fragments aren't really suited for that task.

Somewhat semantic (X)HTML

Pure semantics would address all those issues. But even semantic (X)HTML is a tad less semantic than it should be. That shouldn't be much of a surprise though. It's meant to be used to represent the structure of a typical web document in a generalized fashion and - to be fair - it does this job pretty well. But it can't be used to create the most accurate structure for any kind of document as illustrated by the following diagram:

Figure 1: Depressing intersection diagram

To illustrate this aspect a bit more take a look at the required markup for the eye-catcher in the upper right:

<dl style="float:right">
  <dt>
    <a href="http://kaioa.com/b/0808/content_presentation.svgz">
      <img src="http://kaioa.com/b/0808/content_presentation.png" width="192" height="192" alt="Illustration of interchangeable presentation" title="click for SVG"/>
    </a>
  </dt>
  <dd>Interchangeable presentation</dd>
</dl>

The inline style attribute at the very beginning isn't great. However, the eye-catcher should always float to the right, because floating it to the left would look really ugly, so why would I ever want to change that? I saw no benefit in creating a class just for that. Well, let's ignore that for now. The point is that we have a definition list there, which contains one term (which contains an anchor element, which contains an image) and one definition.

Basically I just used some random elements which happen to provide the required structure. It isn't a definition or a list. It's some image with some tag line.

There is even more nonsense. Why are the paths absolute? Well, to work around some issues of some aggregators. You also have to use absolute paths if you use anchor links. The width and height attributes are also pointless. They don't belong to the content. They are merely there to aid the rendering of browsers. It's a lot more pleasant to the eyes if no reflowing happens and it's also a tad quicker to render (since there is no need to recalculate the layout once the image headers are loaded).

While that stuff is somewhat mandatory it isn't related to the content itself. It shouldn't be part of handwritten markup. Even more so if it can be generated automatically.
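As a rough illustration of "generated automatically", the width and height can be read straight from the image file instead of being typed by hand. The following is only a minimal sketch in Python (not something this site actually runs); the function names `png_size` and `img_attributes` are made up for this example, and it only handles PNG files.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def img_attributes(src, data):
    """Build the src/width/height attributes for an <img> element (hypothetical helper)."""
    width, height = png_size(data)
    return f'src="{src}" width="{width}" height="{height}"'
```

Only the first 24 bytes of the file are needed, so a publishing step could call this for every image without much overhead.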

Even headings are a pain

Even something as simple as headings is somewhat tiresome with (X)HTML. Ideally they should start with H1 and go all the way down to H6 if necessary. Levels shouldn't be skipped and there should be only one H1 heading (one root - everything else is silly). What could possibly go wrong there?

With a CMS you're only writing an (X)HTML fragment, and the first headline comes from a separate title field. On the article's own page this generated title will be either a first level heading or a second level heading. On overview pages, which only display excerpts, it's usually a second level heading, but it might even be a third level heading if the articles are grouped by author, for example. The hard coded headings inside the fragment can't be correct in all of these contexts at once.

The XHTML 2.0 working draft addresses this issue with the introduction of H elements whose semantic weight is proportional to the section nesting level. The following example was directly taken from the working draft:

<body>
<h>This is a top level heading</h>
<p>....</p>
<section>
    <p>....</p>
    <h>This is a second-level heading</h>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
</section>
<section>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
    <section>
        <h>This is a third-level heading</h>
        <p>....</p>
    </section>
</section>
</body>

It would be great if that issue were solved by now, but you can't actually use XHTML 2.0 yet.
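Until then, the same idea can be applied on the publishing side: write generic headings and let a transformation pick the concrete level. Here is a minimal sketch in Python using only the standard library. The function name `number_headings` and the `base_level` parameter are assumptions for this example, and turning `<section>` into `<div>` is just one possible choice for an XHTML 1.x target.

```python
import xml.etree.ElementTree as ET


def number_headings(root, base_level=1):
    """Rename generic <h> elements to <h1>..<h6> based on <section> depth, in place."""
    def walk(element, level):
        for child in element:
            if child.tag == "h":
                # (X)HTML stops at h6, so deeper levels are clamped.
                child.tag = f"h{min(level, 6)}"
            elif child.tag == "section":
                walk(child, level + 1)
                child.tag = "div"  # assumption: the target has no <section> element
            else:
                walk(child, level)
    walk(root, base_level)
    return root


source = "<body><h>Top</h><section><h>Second</h><section><h>Third</h></section></section></body>"
doc = number_headings(ET.fromstring(source), base_level=2)
print(ET.tostring(doc, encoding="unicode"))
# <body><h2>Top</h2><div><h3>Second</h3><div><h4>Third</h4></div></div></body>
```

With `base_level=2` the fragment fits below a generated H1 title. An overview page could render the very same source with `base_level=3`.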

General (X)HTML markup issues

If a rich text editor is used, you may end up with some extraneous markup. If you're unlucky, it might even be invalid. Needless to say, humans make mistakes too. And validators sometimes let really horrible mistakes with devastating consequences slip through.

As you can imagine, this will lead to massive problems as soon as you try to transform it into a different format. Even if only 1% of the pages require manual intervention, it will make you cry if there are thousands of them. With that in mind you can probably understand why I'm a big fan of absolute strictness. In a perfect world all browsers would display nothing but an error message if the markup or styling is invalid.
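To get a feeling for how many pages would need manual attention, one could run every stored fragment through a strict XML parser. This is a minimal Python sketch under a few assumptions: the fragments are available as a dictionary of id to markup, and the name `find_malformed` is made up for this example. It only checks well-formedness, not validity against a DTD or schema, and named HTML entities such as `&nbsp;` will also be reported because the parser doesn't know them.

```python
import xml.etree.ElementTree as ET


def find_malformed(fragments):
    """Return {article_id: error message} for fragments that are not well-formed XML."""
    problems = {}
    for article_id, markup in fragments.items():
        try:
            # Wrap in a dummy root, since a fragment may hold several top-level elements.
            ET.fromstring(f"<root>{markup}</root>")
        except ET.ParseError as error:
            problems[article_id] = str(error)
    return problems


articles = {
    "good": "<p>fine</p><p>also fine</p>",
    "bad": "<p>unclosed <b>bold</p>",
}
print(sorted(find_malformed(articles)))  # ['bad']
```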

XSLT and alternative markup languages to the rescue

By using a different markup language than your current publishing target markup language you can avoid overly rigid coupling. It also ensures that your content will always be in a transformable state. If the need arises you can output any kind of new markup. E.g. you will be able to switch to HTML6.2 or XHTML3.2 in 2020 without having to touch the markup of any of your 5000 articles.

Just imagine the uproar. Not even 24 hours after IE12 became self-aware and deleted all copies of itself your retro geek page will be the first bigger website to make the switch to the very latest standards (which were already supported by all other browsers). That would be so cool.

All joking (and wishful thinking) aside, there are a lot of markup languages to choose from. The most popular ones are probably Markdown, Textile, Texy (I refuse to put the exclamation point there), and Wikitext. However, those options might be a tad too limited for your taste or they simply may not meet your requirements.

On the XML side there is an unlimited number of options since you can create your own schema. You can use as many elements and attributes as you need. And if you ever need some new kind of structure, you can just create it. You can also use standardized schemata such as DocBook.

In most content management systems the XSL transformation and the XML validation introduce an additional step in the pre-templating phase. In Drupal this can be done during the filtering step. Unsurprisingly, the XML Content module does just that. If server-side caching is enabled, this additional transformation step will be virtually free. With that extra step in place the pipeline looks like this:

Figure 2: XSLT can transform any kind of XML into any other kind of XML
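The "virtually free" part boils down to transforming each piece of source markup once and reusing the result. The following Python sketch shows that idea in general terms. It is not how Drupal's filter cache is implemented, and the class name `TransformCache` is made up for this example.

```python
import hashlib


class TransformCache:
    """Remember the output of an expensive transformation, keyed by the source markup."""

    def __init__(self, transform):
        self._transform = transform
        self._store = {}
        self.misses = 0  # how often the real transformation had to run

    def render(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._transform(source)
        return self._store[key]
```

Since the key is derived from the source itself, an edited article simply produces a new key and gets transformed again on the next request.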

With my own schema in place and some automation the markup for the eye-catcher (compare with the markup above) could look like this:

<eyecatcher src="content_presentation.svgz" alt="Illustration of interchangeable presentation" desc="Interchangeable presentation"/>

The generated markup, however, could look the same. But it could also look completely different if it's desired. Since the original graphic is an SVG, a clickable thumbnail can be created in any size. Right now I use a width of 180 pixels and a maximum height of 180 pixels and the drop shadows ramp these values up to 192 pixels.
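To make that a bit more concrete, here is a minimal Python sketch which expands the short element into the definition list shown earlier. It merely stands in for an XSLT template and isn't the stylesheet used on this site. The function name `render_eyecatcher`, the `base_url` and `size` parameters, and the rule that the thumbnail sits next to the SVG with a `.png` extension are assumptions for this example.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def render_eyecatcher(source, base_url, size=192):
    """Expand an <eyecatcher/> element into definition-list markup for XHTML output."""
    eyecatcher = ET.fromstring(source)
    # Absolute URLs are generated here, so the author can write relative ones.
    svg_url = urljoin(base_url, eyecatcher.get("src"))
    # Assumption: the bitmap thumbnail sits next to the SVG with a .png extension.
    png_url = posixpath.splitext(svg_url)[0] + ".png"

    dl = ET.Element("dl", {"style": "float:right"})
    dt = ET.SubElement(dl, "dt")
    link = ET.SubElement(dt, "a", {"href": svg_url})
    ET.SubElement(link, "img", {
        "src": png_url,
        "width": str(size),
        "height": str(size),
        "alt": eyecatcher.get("alt"),
        "title": "click for SVG",
    })
    ET.SubElement(dl, "dd").text = eyecatcher.get("desc")
    return ET.tostring(dl, encoding="unicode")


markup = render_eyecatcher(
    '<eyecatcher src="content_presentation.svgz" '
    'alt="Illustration of interchangeable presentation" '
    'desc="Interchangeable presentation"/>',
    base_url="http://kaioa.com/b/0808/",
)
print(markup)
```

The absolute paths and the width and height attributes still end up in the output, but they no longer clutter the handwritten source.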

Obviously it would be really handy if the rendering, additional effects, and post processing were fully automated. This would allow me, for example, to change the dimensions later on or to use different effects (e.g. a magnifying glass icon overlay). It would also allow me to get rid of that bitmap altogether as soon as it isn't required anymore.

Closing words

In retrospect it's somewhat funny that it took me that long to realize that virtually everyone (me included) creates non-portable content and that it's actually rather easy to circumvent this issue. A few years ago, when I first read about XSLT in the context of web applications, I didn't see the point. One could just use XHTML all the way, right?

Well, now I can see that the path from the most accurate representation to some specific representation (e.g. XHTML 1.1) is a one-way street. If you were restricted to a specific set of structures at the point of creation (one which didn't cover your requirements completely), you won't be able to utilize more meaningful structures once they are introduced.

Comments

interesting....

The process you've outlined using the XML Content Module is interesting, but aren't you just essentially splitting the "theming" process into two steps?

It seems that the main point is separating the content layer from the presentation layer. Using drupal already does that by having a separate theming layer.

The bottom line is that, at some point, you have to translate your "idealized" data, whether it's sitting in a database, or in XML, or whatever into a format that can "actually" be viewed in a browser.

I guess I don't see much of an advantage to using XSLT as opposed to handling everything in drupal's theming layer. It sounds to me like you would be creating more work by having to first generate XML, then transform that into (x)HTML. Maybe I'm wrong...I'm just trying to think of a situation where this approach would give you a real advantage.

-- Jeff

re: interesting....

>[...]but aren't you just essentially splitting the "theming" process into two steps[?]

The theming step doesn't touch the content itself in any way. Well, it could, but it's a really bad place to do that.

>It sounds to me like you would be creating more work by having to first generate XML,
>then transform that into (x)HTML.

More work for the machine, yes - but that's cached. And well, the content has to be created first either way and the amount of work for the author is about the same. It might be even easier to write (see my eye-catcher example).

>The bottom line is that, at some point, you have to translate your "idealized" data,
>whether it's sitting in a database, or in XML, or whatever into a format that can
>"actually" be viewed in a browser.

In this case it's XML sitting in a database. And yes, at the end of the day (actually... when hitting preview or submit) you have to transform it into something a browser (or whatever) can understand. The big idea is that you can transform a perfect representation into anything else (whereas the opposite obviously isn't true).

A nice example is probably a "book" organized with the Drupal book module. If the DocBook schema is used there you can basically print it the way it is. You can also make it available on the net by applying an XSL transform or by generating a PDF out of it. And you can also transform it into a new markup language (e.g. HTML5, HTML6, HTML7, etc - you get the idea) in the future... without having to change any of those hundreds of pages.

Agreements and Addendums

Having been playing with xml+xslt for QUITE some time now, I must voice an absolute agreement with the need to focus on a markup that can easily be transported from one application to another. (x)html (of any solution) is simply NOT the answer. We will be plagued by upgrade and issues for a long time to come no matter what the version of markup we're working with. XML+XSLT solves MANY of the issues involved here.

Another great point that wasn't made by the article is that many of the styling issues we face today (your example of a place where it's simply easier to use an inline style than create a new class for the purpose of simply floating something right) can be solved via nth-child support in css, and since no browsers have deigned to implement this particular subset of the css 3 standard, we're still limited to doing things the way we've always done them. Nth-child support is one of the most important/powerful features of the css 3 standard, and outside of jquery, we can't make use of it. Nth-child is still more useful in a css zen garden approach, which will completely eliminate the need for exceptionally verbose class/id attributes.

Anyway, great article overall, completely agree.

Eclipse

re: Agreements and Addendums

>We will be plagued by upgrade and issues for a long time to come no matter
>what the version of markup we're working with.

Yea, it will probably be really painful. Anyone who moved some old static website to a CMS should be able to see that coming. After all that horrid table/font markup actually used to be state-of-the-art HTML at some point (yes, really).

And even today there are a lot of new websites with invalid markup.

Very Helpful

Very Helpful Article! I'm going to try and apply it on my Visual Basic Source site. Thank you!

re: Very Helpful

You've got quite a lot of (good) content there. I'd say it's one of those sites where it would make a lot of sense to use future-proof markup. In 10-15 years there will probably still be a bunch of VBS applications around. A lot of smaller businesses seem to be very happy with their little custom applications. They will most likely stick around and the demand for this kind of information won't drop to zero anytime soon. With virtualization at hand you can basically keep them up and running forever.
