If you haven't ignored the web for the past few years, you've probably heard it a million times by now: separate your content from your presentation, use CSS for presentation, and say yes to semantic markup. There is no doubt that it's the right thing to do, given the sheer number of benefits.
In a nutshell: it makes your life easier by keeping together the pieces that belong together. It's somewhat akin to the proximity rule in usability. It also keeps the noise down; if you want to change parts of the presentation, there are no content bits in the way, and vice versa. There is also a lot less overhead, since style sheets can be cached on the client side. In extreme cases this can save as much as 200 KB of utterly pointless bloat. Additionally, proper CSS usage paves the way for the ultimate killer feature: interchangeable presentation.
In theory you can rebrand or even redesign a whole website just by replacing the style sheets. Unfortunately it's not always that easy in practice. With static sites you usually have to hack'n'slay through the markup with regex search & replace until you get usable markup. If you ever go down that route use Tidy first.
If a CMS was used things will usually look a lot better. The outer theme markup (rough layout, navigation, etc.) is externalized and can be easily replaced. And the inner article markup (i.e. the actual content) resides elsewhere. So, if the new layout requires some (outer) markup changes this won't be much of an issue. You write a new theme and that's it.
With overly verbose markup as seen on CSS Zen Garden you can also get a high degree of flexibility. Having lots of unused ids and classes in your markup isn't really feasible though. I don't know of any real website which went down that route. Zen Garden is just a content free demonstration page after all.
With a CMS everything should be fine, shouldn't it? Well, almost. Typically individual articles or blog posts will be HTML or XHTML fragments. And that's already the first big issue: the content is hard coded to fit a specific digital distribution flavor from the very beginning.
For example if the site started with HTML 4.01, switching to XHTML 1.1 won't be easy. Of course you might be inclined to ask for a reason for doing so. However, the more important question is why you can't.
State-of-the-art markup is always a moving target. HTML 5 is just around the corner and so is the next incarnation of XHTML. Things will look different in 5 years. Even more so in 10 or even 20 years. The internet isn't a gimmicky piece of tech anymore. It's fairly safe to say that the internet is here to stay, virtually forever. The human span of life isn't all that long after all.
Of course there is lots of disposable content. With its technical focus, this blog isn't an exception. Unless one of the programming languages I talk about turns out to be the next COBOL, these articles won't be of any interest to anyone a couple of years down the road.
But there is also truly timeless content. And sometimes it might be desirable to distribute it in completely different flavors. For example a "book" created with Drupal (it's a core module, which allows you to create a structured set of pages) might be also distributed as PDF, as DocBook, or even as a real physical book. As you can imagine (X)HTML fragments aren't really suited for that task.
Pure semantics would address all those issues. But even semantic (X)HTML is a tad less semantic than it should be. That shouldn't be much of a surprise though. It's meant to be used to represent the structure of a typical web document in a generalized fashion and - to be fair - it does this job pretty well. But it can't be used to create the most accurate structure for any kind of document as illustrated by the following diagram:
To illustrate this aspect a bit more take a look at the required markup for the eye-catcher in the upper right:
<dl style="float:right">
  <dt>
    <a href="http://kaioa.com/b/0808/content_presentation.svgz">
      <img src="http://kaioa.com/b/0808/content_presentation.png" width="192" height="192" alt="Illustration of interchangeable presentation" title="click for SVG"/>
    </a>
  </dt>
  <dd>Interchangeable presentation</dd>
</dl>
The inline style attribute at the very beginning isn't great. Why would I want to do that? The element should always float to the right, because floating to the left would look really ugly, and I saw no benefit in creating some class just for that. Well, let's ignore that for now. The point is that we have some definition list there, which contains one term (which contains an anchor element, which contains an image) and one definition.
Basically I just used some random elements which happen to provide the required structure. It isn't a definition or a list. It's some image with some tag line.
There is even more nonsense. Why are the paths absolute? Well, to work around some issues of some aggregators. You also have to use absolute paths if you use anchor links. The width and height attributes are also pointless. They don't belong to the content. They are merely there to aid the rendering of browsers. It's a lot more pleasant to the eyes if no reflowing happens and it's also a tad quicker to render (since there is no need to recalculate the layout once the image headers are loaded).
While that stuff is somewhat mandatory it isn't related to the content itself. It shouldn't be part of handwritten markup. Even more so if it can be generated automatically.
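To give an idea of how little it takes to generate that stuff, here is a minimal Python sketch. It reads the dimensions straight from the PNG header and resolves a relative path against a base URL. The function names and the idea of passing the raw image bytes around are my own assumptions for illustration; a real CMS would hook this in wherever it builds the output markup.

```python
import struct
from urllib.parse import urljoin

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG data."""
    # A PNG starts with an 8-byte signature, followed by the IHDR chunk:
    # 4 bytes length, 4 bytes chunk type, then width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def image_attributes(base_url: str, relative_src: str, data: bytes) -> dict[str, str]:
    """Build the attributes that should not have to be written by hand."""
    width, height = png_size(data)
    return {
        "src": urljoin(base_url, relative_src),  # absolute path for aggregators
        "width": str(width),
        "height": str(height),
    }
```

With something like this in the pipeline, the handwritten markup only needs the relative file name; everything else is derived from the file itself.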
Even something as simple as headings is somewhat tiresome with (X)HTML. Ideally they should start with H1 and go all the way down to H6 if necessary. Steps shouldn't be omitted and there should be only one H1 heading (one root - everything else is silly). What could possibly go wrong there?
With a CMS you're only writing an (X)HTML fragment, and the first headline comes from a separate title field. If you look at the article page, this generated title will be either a first-level or a second-level heading. And over at the overview pages, which only display excerpts, it's usually a second-level heading, but it might even be a third-level heading if the articles are grouped by author, for example.
The XHTML 2.0 working draft addresses this issue with the introduction of H elements whose semantic weight is proportional to the section nesting level. The following example was directly taken from the working draft:
<body>
  <h>This is a top level heading</h>
  <p>....</p>
  <section>
    <p>....</p>
    <h>This is a second-level heading</h>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
  </section>
  <section>
    <p>....</p>
    <h>This is another second-level heading</h>
    <p>....</p>
    <section>
      <h>This is a third-level heading</h>
      <p>....</p>
    </section>
  </section>
</body>
It would be great if that issue were solved by now, but you can't actually use XHTML 2.0 yet.
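If the content is stored in such a nesting-based form, the numbered headings can be derived at output time. The following Python sketch is only my own illustration of the idea (it isn't part of any draft or CMS): it walks a tree, turns each generic h element into h1 to h6 depending on the section depth, and replaces section with div, since XHTML 1.x has no section element. The starting level is a parameter, so the same fragment can begin at h2 on the article page and at h3 on an overview page.

```python
import xml.etree.ElementTree as ET


def resolve_headings(element: ET.Element, level: int = 1) -> None:
    """Rename generic <h> elements in place to h1..h6 by section depth."""
    for child in element:
        if child.tag == "h":
            # There is nothing below h6, so deeper levels are clamped.
            child.tag = f"h{min(level, 6)}"
        elif child.tag == "section":
            # Headings inside a section sit one level below the enclosing one.
            resolve_headings(child, level + 1)
            child.tag = "div"
        else:
            resolve_headings(child, level)


doc = ET.fromstring(
    "<body><h>Top</h><section><h>Second</h>"
    "<section><h>Third</h></section></section></body>"
)
resolve_headings(doc)
print(ET.tostring(doc, encoding="unicode"))
# <body><h1>Top</h1><div><h2>Second</h2><div><h3>Third</h3></div></div></body>
```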
If a rich text editor is used you may end up with some extraneous markup. If you're unlucky it might even be invalid. Needless to say, humans also make mistakes. And validators sometimes let really horrible mistakes with devastating consequences slip through.
As you can imagine this will lead to massive problems as soon as you try to transform it into a different format. Even if only 1% of the pages require manual interaction, it will make you cry if there are thousands. With that point of view in mind you can probably understand why I'm a big fan of absolute strictness. In a perfect world all browsers would only display an error message if the markup or styling is invalid.
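A cheap way to find out how bad things are before any migration is to push every stored fragment through a strict XML parser. This is a rough sketch under my own assumptions: the fragments are available as a mapping from some article id to its markup, and the function name is made up. Note that it only checks well-formedness, which is weaker than validating against a DTD or schema, and that fragments using HTML entities such as &nbsp; will be reported as broken, because a plain XML parser doesn't know them.

```python
import xml.etree.ElementTree as ET


def find_broken_fragments(fragments: dict[str, str]) -> dict[str, str]:
    """Map article id -> parser error for each fragment that isn't well-formed."""
    errors = {}
    for article_id, fragment in fragments.items():
        try:
            # Wrap the fragment so that several top-level elements are allowed.
            ET.fromstring(f"<root>{fragment}</root>")
        except ET.ParseError as exc:
            errors[article_id] = str(exc)
    return errors


report = find_broken_fragments({
    "node/1": "<p>fine</p><p>also fine</p>",
    "node/2": "<p><b>badly nested</p></b>",
})
print(sorted(report))  # ['node/2']
```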
By using a different markup language than your current publishing target markup language you can avoid overly rigid coupling. It also ensures that your content will always be in a transformable state. If the need arises you can output any kind of new markup. E.g. you will be able to switch to HTML6.2 or XHTML3.2 in 2020 without having to touch the markup of any of your 5000 articles.
Just imagine the uproar. Not even 24 hours after IE12 became self-aware and deleted all copies of itself your retro geek page will be the first bigger website to make the switch to the very latest standards (which were already supported by all other browsers). That would be so cool.
All joking (and wishful thinking) aside, there are a lot of markup languages to choose from. The most popular ones are probably Markdown, Textile, Texy (I refuse to put the exclamation point there), and Wikitext. However, those options might be a tad too limited for your taste or they simply may not meet your requirements.
On the XML side there is an infinite number of options since you can create your own schema there. You can use as many elements and attributes as you need. And if you ever need some new kind of structure you can just create it. You can also use standardized schemata such as DocBook.
In most content management systems the XSL transformation and XML validation introduce an additional step in the pre-templating phase. In Drupal this can be done during the filtering step. Unsurprisingly the XML Content module does just that. If server-side caching is enabled this additional transformation step will be virtually free. With that extra step in place the pipeline looks like this:
With my own schema in place and some automation the markup for the eye-catcher (compare with the markup above) could look like this:
<eyecatcher src="content_presentation.svgz" alt="Illustration of interchangeable presentation" desc="Interchangeable presentation"/>
The generated markup, however, could look the same. But it could also look completely different if it's desired. Since the original graphic is an SVG, a clickable thumbnail can be created in any size. Right now I use a width of 180 pixels and a maximum height of 180 pixels and the drop shadows ramp these values up to 192 pixels.
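To make that a bit more tangible, here is a small Python sketch of such an expansion step. Treat it as an illustration under stated assumptions rather than the actual transformation (which would more likely be an XSL stylesheet): the function names, the base URL parameter, the square thumbnail size, and the convention that the PNG thumbnail sits next to the SVG under the same name are all made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def expand_eyecatcher(node: ET.Element, base_url: str, size: int = 192) -> ET.Element:
    """Build the <dl> structure shown earlier from one <eyecatcher> element."""
    svg_url = urljoin(base_url, node.get("src"))
    # Assumption: the thumbnail has the same name with a .png extension.
    png_url = svg_url.rsplit(".", 1)[0] + ".png"

    dl = ET.Element("dl", {"style": "float:right"})
    dt = ET.SubElement(dl, "dt")
    a = ET.SubElement(dt, "a", {"href": svg_url})
    ET.SubElement(a, "img", {
        "src": png_url,
        "width": str(size),   # assumption: square thumbnail
        "height": str(size),
        "alt": node.get("alt"),
        "title": "click for SVG",
    })
    dd = ET.SubElement(dl, "dd")
    dd.text = node.get("desc")
    return dl


def render(tree: ET.Element, target: str) -> str:
    """Serialize for an "html" or "xml" (XHTML-style) publishing target."""
    return ET.tostring(tree, encoding="unicode", method=target)


source = ET.fromstring(
    '<eyecatcher src="content_presentation.svgz" '
    'alt="Illustration of interchangeable presentation" '
    'desc="Interchangeable presentation"/>'
)
dl = expand_eyecatcher(source, "http://kaioa.com/b/0808/")
print(render(dl, "html"))  # img is written as an unclosed <img ...> tag
print(render(dl, "xml"))   # img is written as a self-closing <img ... /> tag
```

The source element stays the same in both cases; only the last step knows which flavor of markup is being published.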
Obviously it would be really handy if the rendering, additional effects, and post-processing were fully automated. This would allow me, for example, to change the dimensions later on or to use different effects (e.g. a magnifying glass icon overlay). It would also allow me to get rid of that bitmap altogether as soon as it isn't required anymore.
In retrospect it's somewhat funny that it took me that long to realize that virtually everyone (me included) creates non-portable content and that it's actually rather easy to circumvent this issue. A few years ago when I first read about XSLT in the context of web applications I didn't see the point. One could just use XHTML all the way, right?
Well, now I can see that the path from the most accurate representation to some representation (e.g. XHTML 1.1) is a one-way route. If you were restricted to a specific set of elements at the point of creation (one which didn't cover your requirements completely), you won't be able to utilize more meaningful structures once they are introduced.