If I try to parse an HTML document using JAXP/DOM and specify, by whatever means, an HTML 4 DTD (i.e. http://www.w3.org/TR/html4/strict.dtd), the parser throws a fatal error. Catching that error, I can see that "The declaration for the entity "ContentType" must end with '>'" is reported for line 81 of the DTD.

Line 81 of the HTML 4 strict DTD:

<!ENTITY % ContentType "CDATA"
-- media type, as per [RFC2045]
-->
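A minimal sketch that reproduces the error offline. The class name Html4DtdRepro and the sample document are made up for illustration. Instead of downloading the real DTD from w3.org, a custom EntityResolver hands the parser only the three-line excerpt above, and catching SAXParseException exposes the same kind of line number and message:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class Html4DtdRepro {
    // Excerpt of the HTML 4.01 strict DTD: an SGML-style "-- ... --" comment inside a declaration.
    static final String DTD_EXCERPT =
        "<!ENTITY % ContentType \"CDATA\"\n"
        + "    -- media type, as per [RFC2045]\n"
        + "    -->\n";

    // Hypothetical test document that references the HTML 4 strict DTD.
    static final String HTML =
        "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\"\n"
        + "  \"http://www.w3.org/TR/html4/strict.dtd\">\n"
        + "<html><head><title>t</title></head><body></body></html>";

    /** Parses HTML and returns "line N: message" for the fatal error, or null if parsing succeeds. */
    static String parseAndReport() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Serve the DTD excerpt from memory instead of fetching it over the network.
        builder.setEntityResolver((publicId, systemId) ->
            new InputSource(new StringReader(DTD_EXCERPT)));
        try {
            builder.parse(new InputSource(new StringReader(HTML)));
            return null;
        } catch (SAXParseException e) {
            // The line number refers to the entity being read when the error occurred (the DTD here).
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAndReport());
    }
}
```

The parser reads the quoted literal "CDATA", then expects the closing '>' of the declaration and finds "--" instead, which is why the message is about the declaration not ending with '>'.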

The org.w3c.dom API documentation says that a Document can represent an HTML document, so why does the parser not accept the legacy SGML-style comments? Looking at the XML spec, they seem to have been ruled out (at least, that appears to be the issue).

Quoting the XML 1.0 specification (Third Edition):
"For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments. Parameter entity references MUST NOT be recognized within comments."
Does anyone know of a way this can be overcome?
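One possible workaround, sketched under the assumption that the documents themselves are already well-formed XML: register an EntityResolver that substitutes an empty DTD whenever the HTML 4 DTD is requested, so the parser never reads the SGML declarations. The class name SkipHtml4Dtd and the w3.org prefix check are illustrative choices, not part of any API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SkipHtml4Dtd {
    // Hypothetical sample: well-formed markup that references the HTML 4 strict DTD.
    static final String HTML =
        "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\"\n"
        + "  \"http://www.w3.org/TR/html4/strict.dtd\">\n"
        + "<html><head><title>Test</title></head><body><p>Hello</p></body></html>";

    /** Parses markup, replacing any referenced HTML 4 DTD with an empty one. */
    static Document parse(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.startsWith("http://www.w3.org/TR/html4/")) {
                // Give the parser an empty DTD instead of the SGML one it cannot read.
                return new InputSource(new StringReader(""));
            }
            return null; // null tells the parser to resolve the entity the normal way
        });
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(HTML);
        System.out.println(doc.getDocumentElement().getTagName()); // prints "html"
    }
}
```

The trade-off: with an empty DTD, named entities such as &nbsp; are no longer defined, so the parser cannot expand them, and ordinary HTML 4 that is not well-formed XML (unclosed <p> elements, unquoted attributes) cannot be parsed this way at all; that case needs an HTML-aware parser rather than JAXP's XML parser. With the JDK's bundled Xerces, setting the feature http://apache.org/xml/features/nonvalidating/load-external-dtd to false on the DocumentBuilderFactory should skip the external DTD in a similar way.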