503 Error when parsing some XHTML files
This is not the first time I've seen reports of [URL="http://forums.java.net/jive/thread.jspa?threadID=64828&tstart=0"]getting HTTP Error 503[/URL] when trying to access the W3C xhtml1-strict.dtd file. The W3C has explained [URL="http://www.w3.org/2005/06/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic"]why it was blocking[/URL] requests for the file: its servers were getting "up to 130 million requests per day".
There were comments calling for Sun, now Oracle, to fix the JAXP/JDK XML library. However, for a general-purpose library, I don't think we can treat this as a special case. When a parser encounters a document type declaration that references a DTD, it attempts to read and parse that DTD. Whether the DTD should be read at all is the application's decision.
That said, some libraries do fall short in letting users control the behavior of the underlying parser. [URL="http://forums.java.net/jive/thread.jspa?threadID=64828&tstart=0"]The forum question[/URL] mentioned above shows one such case. The XPath API has no option for telling the parser, in this case, to ignore the DTD. One way around the problem is to supply your own parser: instead of passing an InputSource, as in xpath.evaluate(expression, inputSource), pass a Document, as in xpath.evaluate(expression, document), and disable loading of the DTD before creating the document.
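A minimal sketch of that workaround follows. It is not necessarily the code from the original entry. It assumes the JDK's bundled Xerces-based parser, which recognizes the Xerces feature "http://apache.org/xml/features/nonvalidating/load-external-dtd"; other parser implementations may reject the feature with a ParserConfigurationException. The class name XhtmlXPathExample and its helper methods are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; the names are illustrative only.
public class XhtmlXPathExample {

    // Xerces feature (assumed to be supported by the parser in use)
    // that controls whether the external DTD is fetched when not validating.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Build the DOM ourselves so we can configure the parser first.
    static Document parseWithoutDtd(InputSource source) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        // Do not read the DTD named in the DOCTYPE declaration.
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(source);
    }

    // Evaluate the expression against a Document instead of an InputSource.
    static String evaluate(String expression, String xhtml) throws Exception {
        Document document = parseWithoutDtd(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
                "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
                + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Hello</title></head><body><p>Hi</p></body></html>";
        // The factory is not namespace-aware by default, so plain
        // element names are used in the expression.
        System.out.println(evaluate("/html/head/title", xhtml));
    }
}
```

Note that with the DTD skipped, entities it would have declared (such as the named character entities in XHTML) are not available to the parser, so pages that rely on them may need extra handling.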
As explained in the [URL="http://www.w3.org/2005/06/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic"]W3C document[/URL], the URI is used for identification; it is a way of saying that the document is HTML. If you are parsing incoming HTML pages, you do not really need the DTD to tell you that a page is HTML.
[url=http://blogs.sun.com/joew/entry/503_error_when_parsing_some]Read more about "503 Error when parsing some XHTML files"...[/url]