The fact that there are better specs doesn’t help if a large part of the work is handling things that are outside the spec. You’d better render every “buggy” page similarly to how the major browsers render it, or the new browser will be considered defective. That’s the unfortunate reality of web tech (I wish every page with a JS error or incorrectly closed tag would show a big fat error message, but it doesn’t). And that’s still a lot of slow guesswork, I imagine.
> You better show every “buggy” page similar to how the major browsers show them or the new browser will be considered defective.
Used to be true, I doubt that it is anymore.
Too few sites (I could find exactly none, to be honest) are around these days that are unreadable when rendered strictly according to a newish (say, 2019) HTML/JavaScript spec.
The proliferation of front-end frameworks means that almost no site goes out of spec, and since any site that doesn't meet a large portion of the spec is invisible to search engines, a site that breaks under strict spec rendering is a non-issue.
In short:
1. With practically all large-traffic sites using a framework, a browser that strictly sticks to the specs and the specs alone is not at a disadvantage.
2. With the emphasis on SEO, a site that is unreadable under a recent spec is not going to be found by the large body of traffic anyway.
Conclusion: a browser that sticks to the spec and the spec alone has a fighting chance.
The problem is the complexity and edge cases of an exceptionally large, complicated, and self-contradictory spec, with thousands of edge cases arising when different parts of the spec are combined.
All this "quirks mode" stuff is part of the specification, no? It makes it all a bit more complex than it has to be, but I do believe it's specified.
I'm not really sure if "you need to be bug-compatible" is still true; it probably was 15 years ago, but Chrome, Firefox, and WebKit tend to be pretty decent these days.
Quirks mode is one thing, but most browsers have specific rules for specific websites, maintained through a manual process to update and handle those cases. Pretty sure Chrome and Safari have hundreds of these rules.
It's not clear to me if those are due to shortcomings in WebKit, the site, or if it's to be "bug-compatible" with anything else. Either way, 1,600 lines of code doesn't seem a lot to me.
If anything, websites have become way less clean, with more invalid HTML. I remember people, including myself, putting W3C validator icons on websites. Rarely do I see any these days, because of all the invalid HTML and dynamically created websites. Maybe all the tags are closed nowadays, so maybe at least that. But which elements are used inside which other elements, and whether they are semantically appropriate, is another matter.
One of the ideas behind HTML5 is that while there is some concept of validity and well-formedness, essentially any random stream of bytes describes exactly one DOM tree. In some cases the resulting tree is surprising, but even then it should be the same across all conformant parsers (modulo scripting support).
The end result is that validation is not that interesting anymore, because the idea was that a valid (X)HTML document should parse the same across all browsers (which it mostly did, but that didn't say much about how it was actually rendered).
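To illustrate the "any byte stream parses without error" point, here is a small sketch using Python's standard-library `html.parser`. It is not a full spec-compliant HTML5 tree builder (it reports events rather than constructing a corrected DOM), but it shows the same tolerance: unclosed `<li>` tags in tag soup are consumed without any error being raised.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Records start tags, end tags, and text as a flat event list."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("data", data))

p = TagLogger()
# Unclosed <li> elements: invalid-looking, but parses deterministically
# with no exception, just like a browser would accept it.
p.feed("<ul><li>one<li>two</ul>")
print(p.events)
```

A real HTML5 tree builder would additionally auto-close each `<li>` when the next one opens; the stdlib parser just reports the events as seen, which is enough to show that "tag soup" never produces a hard parse error.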
Like most people, I gave up on the whole semantic pedantry a long time ago. Correct header ordering and basic semantics like <nav>: sure, that's great. But "no <p> inside <dt> allowed!" just makes no real sense and is exceedingly pedantic.
The validator badges were kind of a backlash against the tag soup of the day; part of the reason for that was that everyone who knew how to program a VCR could get employed as a "webmaster" in those days, but also because the authoring tools for non-tech authors weren't as good. HN sees a lot of posts from non-tech people, often written on WordPress, Medium, or whatnot. 25 years ago it would more likely have been "tag-soup'd" by some non-tech person who just learned a bit of HTML.
Nowadays, HTML parsing is exhaustively defined in the form of a couple of state machines, so it’ll behave the same everywhere. It’s genuinely easy to implement perfectly (though it’ll still take a while because there is quite a bit of it).
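As a rough illustration of that state-machine style, here is a toy tokenizer with just two states, "data" and "tag" (an assumption for brevity; the actual WHATWG tokenizer has dozens of states covering attributes, comments, entities, and so on). The point is that every input byte has a defined transition, so any input yields a deterministic token stream.

```python
def tokenize(html):
    """Toy two-state HTML tokenizer: emits ('tag', name) and ('text', s) tokens.
    Simplified sketch only; not the real WHATWG state machine."""
    tokens, buf, state = [], "", "data"
    for ch in html:
        if state == "data":
            if ch == "<":
                if buf:
                    tokens.append(("text", buf))
                    buf = ""
                state = "tag"        # '<' switches to the tag state
            else:
                buf += ch
        else:  # "tag" state
            if ch == ">":
                tokens.append(("tag", buf))
                buf = ""
                state = "data"       # '>' switches back to the data state
            else:
                buf += ch
    if buf:
        tokens.append(("text", buf))  # flush trailing text
    return tokens

print(tokenize("<p>hi<p>there"))
# → [('tag', 'p'), ('text', 'hi'), ('tag', 'p'), ('text', 'there')]
```

In the real spec, a second state machine (tree construction) consumes these tokens and decides, for example, that the second `<p>` implicitly closes the first, which is why every conformant browser builds the same tree from the same bytes.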
> The fact there are better specs doesn’t help if a large part of the work is handling things that are outside the spec
The way that web specs are handled means that better specs actually bring a lot of those things into the spec. That is, browser implementers will define a new spec that clearly explains the quirk, and then align on the implementation. There is also a huge test suite which can be used to test conformance.
It's not perfect, but it's definitely a significantly better situation than we had.