Is There Any Better Way? Publishing Process For CDISC Standards Documentation
1. The Pain
I read from Lex Jansen (@LexJansen) that CDISC SDTM v1.3 and SDTMIG v3.1.3 were newly released. That’s pretty nice, since CDISC SDTM is supposed to be released semiannually under the new publishing cycle. We can see the team put great effort into this new version, but frankly speaking, this delivery (the way it is displayed, not the content itself) is far from elegant.
The new SDTM Implementation Guide (IG) v3.1.3 is just a temporary workaround, as the embedded file “How to Use SDTMIG 3.1.3” indicates:
SDTMIG 3.1.3 is presented as an annotated version of SDTMIG 3.1.2. This approach was taken for SDTMIG in order for the document to be released quickly without an extensive rewrite. The content presented as annotations will be incorporated into a single version of documentation in a future release.
What does “annotated” mean? When you replace “should” with “must” in the file,
- strike through the word “should”
- insert the replacement “must”, and
- add a sticky note to indicate the change above
This is annoying. There are 143 sticky notes throughout the document, covering replacements, deletions, file attachments and so on, and the stated reason is to allow “the document to be released quickly without an extensive rewrite”. BUT 143 sticky notes in a PDF file! That is already a huge editing effort!
2. The Reason (or The Conjecture)
Almost everybody complains about Microsoft Office Excel and Word, but hooray(!), they are still dominant in our workplaces (especially so in the clinical world? I’m not sure). I don’t have any personal connection with the CDISC publishing team, but from the documentation released, I’m pretty confident that these files (SDTMIG v3.1.3 and others) were edited in Word and then published to PDF via Adobe products (a very common practice, isn’t it?).
Now you may understand why the CDISC publishing team delivered this “annotated” version, given limited time and human resources (although editing 143 sticky notes was also a big workload). The clue is Word! Word! Word!
Microsoft Word is extremely popular for its WYSIWYG (What You See Is What You Get) editing, but it can’t separate content from formatting, and maintaining a frequently updated Word file with multiple users is a disaster. In this CDISC SDTMIG case, there are about 143 content updates supplied by the CDISC community worldwide, and when applying such updates to the original Word file, it is always reasonable to worry that they will change something (yes, SOMETHING) unexpectedly! The biggest concern for CDISC standards files, I guess, again with confidence, is whether such updates destroy the in-text links or other cross references that offer the nice navigation throughout the documentation.
So this “annotated” version is at least safe (and SAFE is much more important than how it looks): no link proven workable in v3.1.2 will be broken by pushing out this new release, and things should get better in the future (from the same source, “How to Use SDTMIG 3.1.3”):
CDISC is currently discussing how future documentation will be published ensure documentation is easy to navigate and read and at the same time easy to maintain.
3. The Prospect
Yes, I will end with a (set of) suggestion(s). The bottom line is no more Word, and I promise no additional cost or pain compared to digging into Microsoft Word and Adobe Acrobat.
Take SDTM IG v3.1.3 as a demo project:
- Convert all the contents of SDTM IG v3.1.2 (from PDF, or the original Word file) to a text-based format. Personally I prefer Markdown and reStructuredText. Actually it doesn’t matter which one is chosen for test purposes, because such text-based formats can easily be converted to one another (much more easily than from Word/PDF). The benefits of these two formats are that they separate content from formatting and are very intuitive to learn (much easier than HTML; almost WYSIWYG). This task can be partly done by machine but also needs manual modification. All in all, it is not a big deal: it is only about 300 pages.
- Edit these text files according to the new SDTM IG v3.1.3 updates.
- Distribute these text files (and rendered output files in PDF/HTML formats) to a vendor-supported or self-hosted collaboration site, like GitHub.
- Call for CDISC team members and users to report any issues, and even encourage them to edit the files directly online (don’t worry, it won’t be a mess; we are in a version control system like GitHub).
- Then the next version will come out naturally (and peacefully).
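To make the benefit concrete, here is a minimal sketch of how the “should” to “must” change from section 1 would look once the guide lives in plain text. It uses Python’s standard difflib module; the sentence and the two file names are hypothetical placeholders of mine, not taken from SDTMIG.

```python
import difflib

# Hypothetical one-line excerpts standing in for two revisions of a
# Markdown source file (the wording is illustrative, not from SDTMIG).
old_text = "The variable should be populated for every record.\n"
new_text = "The variable must be populated for every record.\n"


def text_diff(old: str, new: str, old_name: str, new_name: str) -> str:
    """Return a unified diff between two revisions of a text file."""
    diff_lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff_lines)


# The file names are assumptions chosen only for this demo.
diff = text_diff(old_text, new_text, "sdtmig-3.1.2.md", "sdtmig-3.1.3.md")
print(diff)
```

The printed diff marks the old sentence with a leading “-” and the new one with “+”, which is roughly what a version control site shows for every edit. The change log then comes for free instead of being maintained by hand as sticky notes.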
Then I’m looking forward to hearing your ideas.
4. Additional Notes
The markup standards mentioned in my proposal above, Markdown and reStructuredText, are not a replacement for the CDISC metadata standard ODM and its XML derivative Define.xml. Instead, they are better formats for getting rid of Microsoft Word when the community collaborates on editing the “narrative” parts of the models (the PDF files we read from CDISC), for example the SDTMIG discussed above.

19 July 2012 at 09:00
Why introduce yet another markup standard (Markdown, reStructuredText) when the metadata could be published in an XML format that is already known to CDISC: ODM?
CDASH already has done that.
A stylesheet can make the metadata look like the familiar tables.
For narrative text, I think people like to use the tool they are used to: for most people Word.
19 July 2012 at 09:15
=====
To Lex
=====
Thanks Lex. I should have stated more clearly that such markups are not intended to replace XML, but to replace Microsoft Word for better collaboration on narrative text such as SDTMIG.
21 July 2012 at 13:43
I think this is an interesting case because it points at how important tools and reliability are.
Markdown (and relatives like Multimarkdown, Textile and Wikimarkup) is a brilliant solution to the problems with large Word documents. I have felt his pain myself many times, for example when reviewers make formatting changes that cause a day’s work to get back to the original. These formats are readable and so easy to create, and still regular enough to be transformable to pretty well any format, including XML, XHTML etc. (See the Multimarkdown pages and the pandoc converter.)
Point taken Lex – but where are the easy to use tools for editing general XML files?
Take a look at Marked working on a complicated document to see what I mean.
Personally I have moved to LaTeX (via LyX) but for simple notes and drafts I like markdown.
I think the point is that being stuck on a tool (for whatever reason) has costs which are often unnoticed by the victims, but the result is oddities like this.
3 August 2012 at 15:36
I had to think of your blog entry when I read this:
http://bbs.cdisc.org/bbs/forums/thread-view.asp?tid=3591&mid=7843