Confluence Data Center / CONFSERVER-25410

Page Layouts information is wrongly stored against a div element with a data-* attribute. This is invalid XHTML.

      This is raised from a customer comment on CAC.

      I believe the analysis is spot on but, to summarise, we should strive to keep storage format as close to XHTML as possible. Additional things we need should be in our namespace. The data-* attribute is not valid XHTML.

      Likewise, these layouts render with the data attribute. I imagine this is done so we can recognise layouts when we copy and paste into the Editor. Confluence now renders output to HTML5, so this is fine.

      So, regarding storage format: could we put a custom namespaced attribute on the root div of the layout? Or, better still, could we create a custom element to represent the layout, perhaps something like <ac:layout>? Obviously this should transform to a <div> when we render to view.
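The custom-element idea above can be sketched with nothing more than a generic XML library. The following Python sketch is an illustration only: it assumes a hypothetical namespace URI for the ac prefix and reuses the keys of the current JSON value as attribute names, neither of which is Confluence's actual storage format. It shows the storage-to-view step, rewriting each <ac:layout> into a plain <div class="contentLayout"> and carrying its properties as data-* attributes, which are fine in HTML5 output.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "ac" prefix. A generic XML parser needs
# some declaration like this before it will accept ac:* elements.
AC_NS = "http://example.com/hypothetical/ac"

STORAGE = f"""<body xmlns:ac="{AC_NS}">
  <ac:layout name="pagelayout-three-sidebars" header="true" footer="true">
    <p>Layout content</p>
  </ac:layout>
</body>"""


def render_layouts(storage_xml: str) -> ET.Element:
    """Rewrite each hypothetical <ac:layout> element into a plain <div> for viewing."""
    root = ET.fromstring(storage_xml)
    layout_tag = f"{{{AC_NS}}}layout"  # ElementTree's {namespace}local-name form
    for elem in list(root.iter(layout_tag)):  # snapshot before mutating tags
        props = dict(elem.attrib)
        elem.tag = "div"
        elem.attrib.clear()
        elem.set("class", "contentLayout")
        # data-* attributes are allowed in HTML5 output, so the layout
        # properties can travel to the browser this way.
        for key, value in props.items():
            elem.set(f"data-atlassian-layout-{key}", value)
    return root


view = render_layouts(STORAGE)
div = view.find("div")
print(div.get("class"), div.get("data-atlassian-layout-name"))
# prints: contentLayout pagelayout-three-sidebars
```

The layout's children (here a single paragraph) are left in place; only the wrapper element changes.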

      Anyway, the point here isn't to propose a solution, but instead just to call out that the implementation detail here is deviating from one of the goals of Confluence storage format.

            Petch (Inactive) added a comment:

            klortho: we will definitely make these things easier to manipulate (with XML tools) when we enhance page layouts, but we won't be fixing this in the current implementation.

            Chris Maloney added a comment (edited):

            Please give some thought, when devising new formats in future, to making sure they are readable by general-purpose XML tools, like XSLT.
            Cheers!

            Petch (Inactive) added a comment:

            This won't be specifically fixed, though the data-* attribute is expected to be removed in a future version of Confluence with enhanced page layouts.

            Chris Maloney added a comment:

            Graham - two more cents from me: I don't think there's any need (at all) for self-loathing or apologies. You have done an enormous amount of work reverse engineering this format, and the problems and concerns you have are all valid. I, for one, appreciate it, and I hope the folks at Atlassian do, too.

            Too often, these formats are contrived willy-nilly, with the only thought being how to expedite the job of the developer who's writing the website, or in this case, the wiki platform. Interoperability is almost always an afterthought, and it shouldn't be.

            Graham Hannington added a comment:

            Re-reading my previous comment on this JIRA issue, together with my original comment on the Confluence page that resulted in this issue, now makes me cringe with embarrassment. Some admissions:

            • Prior to Chris's comment, I had not really inspected or thought about the value of the new data-atlassian-layout attribute. I had just looked at the attribute name and thought "not XHTML 1.0". It was only after Chris pointed out the JSON encoding that I looked at the value, and belatedly added my "me too" comment.
            • I had observed the use of data-* attributes in various applications, but the news that data-* attributes were part of the HTML5 spec frankly blindsided me. I had no idea until Chris pointed this out.

            Reading my original comment in Confluence, and my subsequent (first) comment here, you might have thought (because, re-reading what I wrote, I realize I gave this impression) that I knew all along that data-* attributes were allowed in HTML5, and that the JSON-encoded value was a bad idea. I didn't. I only knew because of Chris's comment, and (regarding data-* attributes) because I followed the link in that comment, and then various other links from there (such as to the W3C document that I cited). What an arrogant putz. Chris, Paul, my apologies to both of you.

            Self-loathing aside... I've just sent an email to the xml-dev mailing list with the subject line "Validating data-* attributes in XHTML5?".

            Graham Hannington added a comment (edited):

            Paul, thank you for creating this JIRA issue on my behalf; I have no problem with you doing that.

            Some minor points, relating to the title of this issue:

            • "wrongly stored against" would not be my choice of words. I was very careful in my comment on the "Feedback" page not to say any such thing. This is as far as I went:

              are you sure you want to do this? "Yes" is a perfectly acceptable answer. I don't expect to be privy to your design discussions.

            • The truth of "This is invalid XHTML" depends on the version of XHTML. Certainly, it's true for XHTML 1.0. (See Chris's earlier comment.)

            Some further thoughts on the use of a data-* attribute in this context...

            From the W3C document HTML5: A vocabulary and associated APIs for HTML and XHTML:

            Custom data attributes are intended to store custom data private to the page or application
            ...
            These attributes are not intended for use by software that is independent of the site that uses the attributes.
            ...
            these attributes are intended for use by the site's own scripts, and are not a generic extension mechanism for publicly-usable metadata.

            How "publicly usable" do you intend/consider the Confluence storage format to be?

            Personally, I think the "custom data" under discussion here does fall within the intended use described by the W3C, but still, I think that it's a point - verging on philosophical - worth considering over a pint of beer.

            If, as I do, you use a W3C XML Schema 1.0 document (XSD) or an XML DTD to validate document instances, then (to the best of my knowledge) you need to explicitly declare each data-* attribute. You cannot (again, to my knowledge) simply allow any data-* attribute with a wildcard in these particular schema languages. Not actually a problem, just an observation.
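This matches how XSD 1.0 wildcards work: xs:anyAttribute selects attributes by namespace (for example ##local for unqualified names), not by a name prefix such as data-. One possible workaround, sketched here in Python as an assumption rather than anything Confluence or the schema languages provide, is a supplementary check run after schema validation that accepts any attribute whose name starts with data-. The set of declared attribute names is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the ordinary attributes the schema already declares.
DECLARED_ATTRS = {"class", "id", "style"}
DATA_PREFIX = "data-"


def undeclared_attributes(xml_text: str) -> list[tuple[str, str]]:
    """Return (tag, attribute) pairs that are neither declared nor data-*."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        for name in elem.attrib:
            # A name-prefix test is exactly what an XSD 1.0 wildcard cannot express.
            is_data_attr = name.startswith(DATA_PREFIX) and len(name) > len(DATA_PREFIX)
            if name not in DECLARED_ATTRS and not is_data_attr:
                problems.append((elem.tag, name))
    return problems


print(undeclared_attributes(
    '<div class="contentLayout" data-atlassian-layout-name="x" onclick="y"/>'
))
# prints: [('div', 'onclick')]
```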

            Finally, I agree with Chris's point:

            what I do think is a really bad idea is to encode structured data inside these attributes in JSON format.

            Suppose that I am, for whatever reason, constrained to using something as archaic as an XSLT 1.0 stylesheet to process this data (or, in fact, anything that does not more-or-less natively parse JSON). I really would not look forward to picking apart that JSON string. (As someone who does some JavaScript programming - note that I did not refer to myself as a JavaScript programmer! - I do understand the attraction of JSON in that specific programming context.)

            I'm agnostic about Chris's two suggestions. Either would be better than the current JSON encoding. I can see positives and negatives to both approaches. I'm tempted to go into unsolicited detail here, but I must start reminding myself that I have a day job, and, although it's personally interesting to me, discussing the design of the Confluence source format is rather peripheral to my employer's core business (developing IBM-brand z/OS software products) and my primary role in that business (writing the user documentation for those products). We simply use Confluence for our internal wiki (not as, say, the authoring environment for our DITA-source user documentation).

            Chris, I cannot say how much I appreciate your input here. Sincere thanks for adding your voice, and for the link to that blog post.

            Chris Maloney added a comment:

            I don't see any serious problems with having data-* attributes in your XHTML. This is a convention borrowed from HTML5, which you can read about in any number of places, including this blog post by John Resig.

            However, what I do think is a really bad idea is to encode structured data inside these attributes in JSON format. That assumes that the only agents interested in reading and processing this data are browsers. The baseline format you have here is XML, so I think it would be much preferable to encode this data in new structured XML elements and attributes (yes, even ones that don't exist in XHTML) rather than shoehorn it into a single data-* attribute just to conform to XHTML.

            When you transform the underlying XML into (X)HTML(5) on the way out, that's another story.

            So, for example, now you have something like this:

            <div class="contentLayout" 
              data-atlassian-layout="
                { &quot;name&quot;:&quot;pagelayout-three-sidebars&quot;,
                  &quot;columns&quot;:[&quot;sidebars&quot;,&quot;large&quot;,&quot;sidebars&quot;],
                  &quot;header&quot;:true,
                  &quot;footer&quot;:true }">
              ...
            </div>

            The @data-atlassian-layout value translates into this:

            { "name":"pagelayout-three-sidebars",
              "columns":["sidebars","large","sidebars"],
              "header":true,
              "footer":true }

            I would suggest that this would be better:

            <div class="contentLayout">
              <atlassian-layout name='pagelayout-three-sidebars'
                                columns='sidebars large sidebars'
                                header='true'
                                footer='true'/>
              ...
            </div>
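For comparison, a similar hypothetical Python consumer reading the suggested <atlassian-layout> element needs only the XML parser. Splitting columns on whitespace is an assumption about how a list-valued attribute would be read; the layout content placeholder is omitted so the snippet parses.

```python
import xml.etree.ElementTree as ET

STORAGE = """<div class="contentLayout">
  <atlassian-layout name='pagelayout-three-sidebars'
                    columns='sidebars large sidebars'
                    header='true'
                    footer='true'/>
</div>"""

layout_elem = ET.fromstring(STORAGE).find("atlassian-layout")

# Every field is an ordinary XML attribute, reachable without a second parser.
layout = {
    "name": layout_elem.get("name"),
    # Whitespace-separated list, in the style of XML Schema list types.
    "columns": layout_elem.get("columns").split(),
    "header": layout_elem.get("header") == "true",
    "footer": layout_elem.get("footer") == "true",
}
print(layout["columns"])
# prints: ['sidebars', 'large', 'sidebars']
```

An XSLT 1.0 stylesheet could reach the same value with a plain XPath such as atlassian-layout/@name, which is the interoperability being asked for.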

            If you insist on using data-* attributes, then this would be much better than JSON:

            <div class="contentLayout" 
                 data-atlassian-layout-name='pagelayout-three-sidebars'
                 data-atlassian-layout-columns='sidebars large sidebars'
                 data-atlassian-layout-header='true'
                 data-atlassian-layout-footer='true'>
              ...
            </div> 
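The per-field data-* variant is similarly easy for XML tools. In this illustrative Python sketch (with the ... placeholder replaced by a paragraph so it parses), the fields are recovered by stripping the shared data-atlassian-layout- prefix; values stay strings, and converting "true" to a boolean is left to the caller.

```python
import xml.etree.ElementTree as ET

STORAGE = """<div class="contentLayout"
     data-atlassian-layout-name='pagelayout-three-sidebars'
     data-atlassian-layout-columns='sidebars large sidebars'
     data-atlassian-layout-header='true'
     data-atlassian-layout-footer='true'>
  <p>Layout content</p>
</div>"""

PREFIX = "data-atlassian-layout-"

div = ET.fromstring(STORAGE)

# Each field is still a plain XML attribute, so no JSON parsing is needed;
# stripping the shared prefix recovers the field names.
layout = {
    name[len(PREFIX):]: value
    for name, value in div.attrib.items()
    if name.startswith(PREFIX)
}
print(layout["name"])
# prints: pagelayout-three-sidebars
```

In a browser, the same attributes would surface through element.dataset as atlassianLayoutName, atlassianLayoutColumns and so on, so this form also serves the rendered view.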

            Paul Curren added a comment:

            The fly in the ointment is that to fix this bug we will also need a migration task on existing pages. We should do this ASAP, while customer use of layouts is still likely to be small.
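The migration task itself isn't specified. As a rough illustration only, here is a Python sketch of one possible shape, assuming the target is the child-element form <atlassian-layout> from Chris's suggestion (a proposal, not a shipped format) and that a storage body can be parsed as standalone XML. A real task would also have to cope with details this sketch ignores, such as HTML named entities and the ac:/ri: prefixes that a standalone XML parser does not know about. The function name is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

OLD_ATTR = "data-atlassian-layout"


def migrate_layouts(storage_xml: str) -> str:
    """Hypothetical migration: move JSON layout data into an <atlassian-layout> child."""
    root = ET.fromstring(storage_xml)
    for div in list(root.iter("div")):  # snapshot, since we add children below
        raw = div.attrib.pop(OLD_ATTR, None)
        if raw is None:
            continue  # not a layout div, or already migrated
        data = json.loads(raw)
        layout = ET.Element("atlassian-layout", {
            "name": data["name"],
            "columns": " ".join(data.get("columns", [])),
            "header": str(data.get("header", False)).lower(),
            "footer": str(data.get("footer", False)).lower(),
        })
        div.insert(0, layout)  # put the layout description first
    return ET.tostring(root, encoding="unicode")
```

Popping the attribute and skipping divs without it makes the sketch idempotent, so re-running it over already-migrated pages is harmless.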

            Assignee: Unassigned
            Reporter: Graham Hannington
            Affected customers: 1
            Watchers: 7