I really like standards and absolute strictness when it comes to things which are interpreted by zillions of different programs. After all, a scenario like this just asks for trouble. Validators do help there and as I wrote a few months ago they can really help you avoid many issues, potential issues, and also future issues.
However, I do like automation a bit more and herein lies the problem: 100% standards compliance isn't always an attainable goal. And if you simply can't get a perfect score, you cannot use those validators for your automated tests. A test which always fails isn't really helpful.
There are many things that will never be valid, and you can't do anything about it. Proprietary or legacy content management systems and components thereof are a good example. Another source of pain is those bloody rich text editors. Some of them produce amazingly awkward markup with zillions of font tags all over the place for good measure.
CSS is also troublesome. Many people prefer using hacks (aka "filters") instead of conditional comments for attaching the required training wheels for Internet Explorer. But even if you keep that stuff away, there are still vendor prefixes and sparkling new stuff you just have to use for something. It's new and exciting, after all.
HTML5 lifted some restrictions. Putting block elements into <a> elements is now perfectly fine for example. But the validator from the W3C will of course still complain about every small detail which isn't absolutely perfect.
Now, perfection is of course a great thing. But if you know that you can never reach it for this project (and all its sub projects), then you should look for an alternative goal. A browser doesn't really care about small details like missing alt attributes, font tags, or & characters in URLs which weren't escaped.
So, this second best thing we're looking for is an identical DOM tree in every relevant browser. Fortunately we can be pretty sure that it will be identical if all tags were closed and if they were closed in the right order.
Writing a parser for this wasn't too difficult. You only need to handle start tags, end tags, short tags, comments, DOCTYPE declarations, CDATA sections, and of course style and script blocks. Also, add a list of start tags which should be treated like short tags (meta, br, img, and the like). But that's about it. Push start tags on a stack, pop and compare if you get a close tag. At the very end check if the stack is empty. (I can't share the ugly source, sorry.)
Pretty straightforward, but it does catch virtually all of those annoying layout-breaking errors.
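Since the original source isn't available, here is a minimal Python sketch of the approach just described. The names `VOID_TAGS`, `TOKEN`, and `check_nesting` are made up for illustration, the list of void tags is an assumption, and the regex tokenizer assumes that attribute values never contain a literal `>`.

```python
import re

# Start tags that are treated like short tags (assumed list, extend as needed).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

# One alternative per construct. Order matters: comments, CDATA, DOCTYPE and
# whole script/style blocks are tried before the generic tag pattern.
TOKEN = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"|<!DOCTYPE[^>]*>"
    r"|<(?P<raw>script|style)\b[^>]*>.*?</(?P=raw)\s*>"
    r"|<(?P<close>/)?(?P<name>[a-zA-Z][a-zA-Z0-9]*)(?P<attrs>[^>]*)>",
    re.DOTALL | re.IGNORECASE,
)


def check_nesting(html):
    """Return a list of (line, message) problems; an empty list means balanced."""
    problems = []
    stack = []  # (tag name, line number) for every element that is still open
    for m in TOKEN.finditer(html):
        name = m.group("name")
        if name is None:
            continue  # comment, CDATA, DOCTYPE, or a complete script/style block
        name = name.lower()
        line = html.count("\n", 0, m.start()) + 1
        if m.group("close"):
            if not stack:
                problems.append((line, f"unexpected </{name}>"))
                continue
            opened, opened_line = stack.pop()  # pop and compare
            if opened != name:
                problems.append(
                    (line, f"</{name}> closes <{opened}> opened on line {opened_line}")
                )
        elif name in VOID_TAGS or m.group("attrs").rstrip().endswith("/"):
            continue  # short tag, nothing to push
        else:
            stack.append((name, line))
    # Whatever is left on the stack was never closed.
    problems.extend((line, f"<{name}> never closed") for name, line in stack)
    return problems
```

In an automated test you would simply assert that `check_nesting(page_source)` returns an empty list. After the first mismatch the remaining messages tend to cascade, so the first reported problem is usually the one worth looking at.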
CSS requires a bit more work and a completely different strategy. The validator basically does exactly what we want, but we'd like it to turn a blind eye to some of the things we're doing. So, instead of doing the lax validation all by ourselves, we do some pre processing and then let the validator do its work.
Typically there will be things like the star and the underscore "filters" to deal with the shortcomings of IE6 and IE7. They will look somewhat like this:
.some .selector{
width:100px; /* any good browser */
*width:90px; /* IE6 and IE7 (mnemonic: * is a wildcard) */
_width:80px; /* IE6 (mnemonic: _ is low) */
}
A good strategy for this stuff is to strip the underscore and star prefixes from all properties. The resulting nonsensical rules will look like this:
.some .selector{
width:100px; /* [...] */
width:90px; /* [...] */
width:80px; /* [...] */
}
Fortunately the validator doesn't care if it makes any sense and will take a close look at each of those rules – no matter how pointless they are.
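Stripping the star and underscore prefixes could be done with a single regular expression. This is only a sketch in Python; `HACK_PREFIX` and `strip_ie_hacks` are hypothetical names, and the pattern assumes the hack character sits directly in front of the property name at the start of a declaration.

```python
import re

# A "*" or "_" right before a property name, where the declaration starts at
# the beginning of a line or after "{", ";" or the end of a comment.
HACK_PREFIX = re.compile(
    r"(?P<lead>(?:^|[{;]|\*/)\s*)[*_](?=[a-zA-Z-]+\s*:)",
    re.MULTILINE,
)


def strip_ie_hacks(css):
    """Drop the star/underscore hack character, keep everything else as-is."""
    return HACK_PREFIX.sub(lambda m: m.group("lead"), css)
```

Only a single character per declaration goes away, so the line numbers reported by the validator still match the original style sheet.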
The next thing you have to take care of are vendor prefixes like "-moz-", "-o-", and "-webkit-". The easiest option is to discard these statements by replacing them with a comment. Deleting lines isn't a good idea, because the reported line numbers won't necessarily match the original style sheet anymore. So, use comments to keep things in place.
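Here is a rough Python sketch of that comment trick. `VENDOR_DECL` and `blank_vendor_declarations` are invented names, the prefix list only covers the three prefixes mentioned above, and the pattern assumes there are no comments inside the value of a declaration.

```python
import re

# A complete declaration whose property name starts with a vendor prefix,
# e.g. "-moz-border-radius: 4px;". Prefixed values such as
# "background: -moz-linear-gradient(...)" are deliberately left alone.
VENDOR_DECL = re.compile(r"(?<![\w:-])-(?:moz|o|webkit)-[a-zA-Z-]+\s*:[^;{}]*;?")


def blank_vendor_declarations(css):
    """Replace vendor-prefixed declarations with a comment, keeping line breaks."""
    def placeholder(match):
        # Re-emit the newlines of the removed text so line numbers stay put.
        return "/* removed */" + "\n" * match.group(0).count("\n")

    return VENDOR_DECL.sub(placeholder, css)
```

Declarations that span several lines leave the same number of line breaks behind, which is the whole point of using a placeholder instead of deleting them.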
And finally you have to take care of things which will never be in the specs like zoom or filter and things which aren't covered by the validator yet like overflow-x or box-shadow. You could try to validate those new things on your own, but parsing them is quite a lot of extra work and since it's interesting new stuff you've probably already ensured that it works as intended anyways.
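Skipping those properties instead of validating them could look roughly like this in Python. The `IGNORED` tuple only contains the four properties named above, `blank_ignored_properties` is a hypothetical helper, and as before the pattern assumes there are no comments inside a declaration's value.

```python
import re

# Properties the validator should never get to see (assumed, project-specific list).
IGNORED = ("zoom", "filter", "overflow-x", "box-shadow")


def blank_ignored_properties(css, names=IGNORED):
    """Replace declarations of the listed properties with a comment."""
    alternatives = "|".join(re.escape(name) for name in names)
    # The lookbehind keeps class/id selectors like ".filter:hover" and longer
    # property names like "-ms-filter" from matching by accident.
    pattern = re.compile(
        r"(?<![\w.#-])(?:" + alternatives + r")\s*:[^;{}]*;?",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: "/* skipped */" + "\n" * m.group(0).count("\n"),
        css,
    )
```

The list will shrink over time as the validator catches up, so it is probably best kept in one place next to the tests.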
Well, just keep in mind that it's all about catching common errors. You won't prevent all of them no matter what you do. Pick the low hanging fruit and move on.
There isn't validation for JavaScript per se, but there is JSLint (also available as an add-on for Komodo), which comes pretty close.
Fortunately JSLint offers some degree of customization on a per file basis. All you need is a comment in a specific format and everything is fine. Here is one from a recent project:
/*jslint white: true, onevar: true, undef: true, nomen: true, eqeqeq: true, bitwise: true, regexp: true, newcap: true, immed: true, browser: true, plusplus: false */
Note that there shall be no space between the start of the comment and "jslint".
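That rule is easy to get wrong, so a tiny pre-flight check might be worth having. This is just a sketch in Python; `SPACED_DIRECTIVE` and `find_spaced_jslint_directives` are hypothetical names and not part of JSLint.

```python
import re

# "/*" followed by blanks and then "jslint": written like this, the options
# comment is just an ordinary comment.
SPACED_DIRECTIVE = re.compile(r"/\*[ \t]+jslint\b")


def find_spaced_jslint_directives(source):
    """Return the 1-based line numbers of option comments with a stray space."""
    return [
        source.count("\n", 0, match.start()) + 1
        for match in SPACED_DIRECTIVE.finditer(source)
    ]
```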
So, if there is some old script no one wants to improve, you can just disable every feature of JSLint in order to make it pass. Another scenario is if your company's code conventions disagree with some of Crockford's recommendations. For example, we're fine with ++ and -- as long as it's the only thing going on in the statement, so the "plusplus" checks were disabled.
Happy unit testing. :)