Paul, thank you for creating this JIRA issue on my behalf; I have no problem with you doing that.
Some minor points relating to the title of this issue:
- "wrongly stored against" would not be my choice of words. I was very careful, in my comment on the "Feedback" page, not to say any such thing. This is as far as I went: "...are you sure you want to do this? 'Yes' is a perfectly acceptable answer. I don't expect to be privy to your design discussions."
- The truth of "This is invalid XHTML" depends on the version of XHTML. Certainly, it's true for XHTML 1.0. (See Chris's earlier comment.)
Some further thoughts on the use of a data-* attribute in this context...
From the W3C document "HTML5: A vocabulary and associated APIs for HTML and XHTML":
Custom data attributes are intended to store custom data private to the page or application
...
These attributes are not intended for use by software that is independent of the site that uses the attributes.
...
these attributes are intended for use by the site's own scripts, and are not a generic extension mechanism for publicly-usable metadata.
How "publicly usable" do you intend (or consider) the Confluence storage format to be?
Personally, I think that the "custom data" under discussion here does fall within the intended use described by the W3C; still, it's a point - verging on the philosophical - worth considering over a pint of beer.
If, as I do, you use a W3C XML Schema 1.0 document (XSD) or an XML DTD to validate document instances, then (to the best of my knowledge) you need to define each data-* attribute explicitly. In these particular schema languages, you cannot (again, to my knowledge) simply allow any data-* attribute with a wildcard: XSD 1.0 attribute wildcards (xs:anyAttribute) select by namespace, not by name prefix, and DTDs have no attribute wildcards at all. Not actually a problem, just an observation.
Finally, I agree with Chris's point: "...what I do think is a really bad idea is to encode structured data inside these attributes in JSON format."
Suppose that I am, for whatever reason, constrained to using something as archaic as an XSLT 1.0 stylesheet to process this data (or, in fact, anything that does not more or less natively parse JSON). I really would not look forward to picking that JSON string apart using only string functions. (As someone who does some JavaScript programming - note that I did not refer to myself as a JavaScript programmer! - I do understand the attraction of JSON in that specific programming context.)
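To make the XSLT 1.0 point concrete, here is a small Python sketch (standard library only) contrasting the two shapes of data. The `div` element, the `data-layout` attribute, the `layout`/`column` elements and the width values are all hypothetical, invented for illustration; they are not taken from the actual Confluence storage format. With JSON inside the attribute, a consumer needs two parsers; with plain XML, one XML parser is enough.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical markup: names and values are illustrative only,
# not taken from the real Confluence storage format.
JSON_IN_ATTRIBUTE = """<div data-layout='{"columns": [{"width": 30}, {"width": 70}]}'/>"""

PLAIN_XML = """<div>
  <layout>
    <column width="30"/>
    <column width="70"/>
  </layout>
</div>"""


def widths_from_json_attribute(markup):
    """Needs two parsers: XML for the element, then JSON for the attribute value."""
    element = ET.fromstring(markup)
    payload = json.loads(element.get("data-layout"))
    return [column["width"] for column in payload["columns"]]


def widths_from_xml(markup):
    """Needs one parser: the structure is visible to any XML tool."""
    element = ET.fromstring(markup)
    # Path relative to the root <div>, equivalent to the XPath layout/column
    return [int(column.get("width")) for column in element.iterfind("layout/column")]


print(widths_from_json_attribute(JSON_IN_ATTRIBUTE))  # [30, 70]
print(widths_from_xml(PLAIN_XML))  # [30, 70]
```

In an XSLT 1.0 stylesheet, the plain-XML version reduces to an XPath expression such as `layout/column/@width`; the JSON version would have to be taken apart with string functions such as `substring-before()` and `substring-after()`.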
I'm agnostic about Chris's two suggestions: either would be better than the current JSON encoding, and I can see pros and cons to both approaches. I'm tempted to go into unsolicited detail here, but I must start reminding myself that I have a day job. Although it's personally interesting to me, discussing the design of the Confluence storage format is rather peripheral to my employer's core business (developing IBM-brand z/OS software products) and to my primary role in that business (writing the user documentation for those products). We simply use Confluence for our internal wiki, not, say, as the authoring environment for our DITA-source user documentation.
Chris, I cannot say how much I appreciate your input here. Sincere thanks for adding your voice, and for the link to that blog post.
klortho: We will definitely make these things easier to manipulate with XML tools when we enhance page layouts, but we won't be fixing this in the current implementation.