July 08, 2008

MAML Editor: Progress II

In a previous blog post I discussed my work on a WYSIWYG editor for authoring MAML topics, either inside or outside of DocProject.  This post is simply another quick update on my progress.

Preview Release Timeline

A few weeks ago I started a new job and they put me on a project with a tight deadline so I haven't had much time to do anything else.  Last weekend I was still too busy to do much with the editor (even on July 4th) so I missed my most recent target date: yesterday.

I'll probably have time to work on the editor each night this week though so I should be able to put a working preview out there soon.  Last night I was able to almost complete key input behavior processing, which I had actually planned to work on after the preview.  I was also able to complete a few controls; e.g., the alert control now has a drop down list in the header with icon-text pairs as list items (shown in the screenshot below).  It was so easy to implement too compared to having to owner draw items in WinForms apps - WPF is really cool.  The next control I'm going to write is for MAML linking, which should be fun :)

Progress

I still have to add the command buttons but I've been deferring that because it should be pretty easy to do with limited functionality for the initial release (e.g., the editor's not schema based yet so all buttons will always be enabled.)  Although, I'll probably only add a small subset of commands initially, which will save time.

One thing that I may do for the preview though is to add the ability to insert custom command buttons through a dialog and have them associated with custom MAML markup. Once the markup is inserted, it can be parsed as if it was being loaded from a MAML file using the existing factory/parser infrastructure - that should fill in the gap nicely for MAML elements that I won't be providing support for in the initial release.  (Note that this is entirely conceptual - I don't know if it'll actually be worth the effort for the preview release.)

The other major feature that the editor is currently lacking is the ability to parse the physical flow back into MAML for save operations (it currently saves as XAML, for testing purposes).  I don't see this being a difficult process though since every element in the document is currently tagged with a parser context object, which provides direct access to the particular parser that can transform the element into MAML, in much the same way that it parses MAML into elements when a topic is loaded.  (Load functionality works fine :)

Originally I was hoping to avoid having to parse the physical flow by simply saving the in-memory LINQ to XML document (this was working for a short while during development).  However, I discovered later that the only change event that gets raised is the RichTextBox's TextChanged event (IIRC), making it difficult to track the relationship between changes in the flow and the underlying XDocument structure.  I do see an opportunity to implement change tracking like this in the future thanks to my parser context and parser input behavior designs, but for the sake of time I'll probably defer it until a subsequent beta release.

Screen Capture

The following is an updated image of the same MAML topic being edited as in my last post.  It might not look like much has changed, but functionally speaking, it's now actually becoming a useful app (barring the missing save functionality :).

This time all of the styles that you see are loaded dynamically from XAML style sheets (currently embedded resources, but eventually loose XAML as well) and, although it's not obvious from the image, the editor actually behaves much more like you'd expect from a schema-based editor.  For example, pressing the Delete key at the end of the MAML Introduction will no longer cause the first section's header to move up into the introduction's paragraph.  This behavior is controlled by each individual MAML element - they define their own key press behavior using a custom preview & bubble approach, just like real WPF controls, but they also inherit some default behavior that restricts the flow a bit.

I've also started using WPF annotations for debugging (not visible in the image below) and I also plan to use them to display a delete button near the currently selected block element.  So to delete an entire section from the editor you would click anywhere inside the section and the delete button will appear, perhaps floating at the top-right corner of the element.  That's the way I think it should work for all block elements, such as the MAML introduction, section, procedure, alert table and code block shown in the image below.

(Note: The document shown below was actually loaded from a real MAML topic file.  The red squiggles are real-time spell checking errors provided by WPF and each element's styles were applied dynamically using XAML style sheets.)

image

June 04, 2008

Sandcastle Styles Project on CodePlex

The Sandcastle Styles project is now live on CodePlex.  From the project's home page:

The goal of this project is to improve Sandcastle by providing a rolled-up solution to various presentation style issues in a manner that is highly visible to the Sandcastle community and also involves community feedback.

This project was started by Paul Selormey, Eric Woodruff and myself.  Currently we're the only active contributors but we'll be happy to take any feedback you have to offer.  Let us know by starting a discussion.

Sandcastle features and issues should be submitted directly to the Sandcastle Issue Tracker so that the Sandcastle team will be notified.

What's Inside?

Eric has included his presentation style patches from the SHFB project - we're now deploying them as a part of Sandcastle Styles.  There are additional bug fixes and modifications that were made for the May 2008 release of Sandcastle as well.  He has also included a new MAML guide as a separate download and custom code providers for VB and C# that can be used to extract assemblies and XML documentation files from Visual Studio Web Site projects.

We've also added some new features to each presentation style, such as support for the MAML glossary document type, image placement support in MAML, enhanced <autoOutline> in MAML, and also an auto-generated bibliography for both conceptual and reference topics.  Tools such as DocProject and SHFB support these features with the patches applied.

I've started an Examples project that shows the typical usage of XML documentation comments (all tags supported by Sandcastle) and common MAML markup, such as linking, tokens, code snippets and media.  I've also included DocProject's MAML document templates for those that are using a different tool.  Lastly, there's an API that can be used to test Sandcastle filtering capabilities in our automation tools, which also provides some nice examples of how to add XML documentation to code and how to use the <include/> tag to share comments and keep code neat.

We expect to improve our library of examples and include additional languages and project types in the future, and I hope to make it available as a separate download as well.  Also, if people will donate language packs for Sandcastle we'll tidy them up and host them.

Hopefully the community will find this project to be a valuable Sandcastle resource.  Any feedback will be appreciated :)

June 03, 2008

MAML Editor: Progress

This blog post is just an update on my progress creating a WYSIWYG (what you see is what you get) MAML editor.  I've received a lot of feedback (relatively speaking) about the MAML editor and I know that people are waiting for one, so I'd like to share with you some information about the one that I'm working on and when it will be available.

For those of you who are unaware of what MAML is, you can read about it here.

I've created a very basic MAML editor using .NET 3.5, C# 3.0, LINQ to XML and WPF, but it's not currently my priority since I'm planning to release DocProject 1.11.0 RC later this week.  A preview of the editor will be released in the next couple of weeks, though, as a standalone WPF application, a VS 2008 package (with MAML item templates) and also a standalone class library so that other tool developers can use the editor in their own applications - although note the dependency on the .NET 3.5 runtime and the DocProject license, which is currently the GNU GPL.

The editor isn't schema based yet - I've hard-coded support for the Conceptual document type only, although since that type shares many common elements with other types, like How To and UI Reference, they're basically supported as well.  The only difference now is that there's no schema validation and the UI cannot create several elements that are optional in the schema for the other document types.  The current implementation has been designed to be flexible enough, though, that adding these things should be easy to do later on.

Since the editor is based on a WPF RichTextBox control, we get a free spell checker with squiggly underlines and an undo/redo buffer.  It's also really simple to implement custom control logic for various elements (the current implementation uses a factory pattern to generate parsers for the individual elements, by name), such as a link editor, tables, images, and other stuff.  So far I've added collapsible sections that actually function when you click the button, some custom logic that controls input behavior for various key presses, and also whitespace normalization for text (Run content in WPF).
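To make the factory idea a bit more concrete, here's a rough sketch of it in Python.  This is not the editor's code (that's C# and WPF); the registry, the parser functions and the dictionaries they return are all hypothetical, and they only illustrate looking up a parser by element name along with the kind of whitespace normalization mentioned above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical registry: MAML element name -> parser function.
PARSERS = {}


def parser_for(name):
    """Register a parser under an element name (illustrative decorator)."""
    def register(func):
        PARSERS[name] = func
        return func
    return register


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text or "").strip()


@parser_for("para")
def parse_para(element):
    text = "".join(element.itertext())
    return {"kind": "paragraph", "text": normalize_whitespace(text)}


@parser_for("code")
def parse_code(element):
    # Code blocks keep their whitespace exactly as written.
    return {"kind": "code", "text": element.text or ""}


def create_parser(name):
    """The factory: find a parser by element name, falling back to 'para'."""
    return PARSERS.get(name, parse_para)


def parse(element):
    return create_parser(element.tag)(element)


if __name__ == "__main__":
    para = ET.fromstring("<para>  Hello,\n     MAML   world.  </para>")
    print(parse(para))
```

Supporting a new element in this sketch is just a matter of registering one more function, which is the main appeal of the pattern.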

Loading and saving also work; however, synchronization between the data model in memory and the logical tree of the WPF content model still needs to be implemented - a major part of the editor - but with the event model and my current architecture I don't think it'll be too difficult; e.g., the data model currently uses XElement instances (from LINQ to XML) that contain annotations for their corresponding WPF content elements (Section, TableCell, Paragraph, Run, etc.) when the data is first parsed.  The idea is to update the document directly when changes in the UI are detected.  Saving the changes is then as simple as saving the XDocument.

Metadata is also editable, so I've decided to generate two separate files when saving a single topic.  The first file is the topic file itself, which contains the root topic node, its id and revisionNumber attributes, as well as the actual MAML content.  The second file is an XML companion file (to use the Sandcastle lingo) that contains metadata such as the topic's title, TOC title, MS Help attributes and index keywords.  I may have another XML-based file created in the future to save editor settings for individual topics, such as the visibility status of individual sections; i.e., a designer file.

MAML topics are saved with an .aml extension (Assistance Markup Language) and companion files use a .cmp extension with the same base file name as the .aml topic.  In the VS 2008 package, the .cmp file will be created with a dependency on the .aml so that it appears as a child in Solution Explorer.  The next version of DocProject will provide support for these extensions so that when I release a preview of the editor your existing DocProjects and DocSites will already be compatible.
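The naming rule is simple enough to sketch in a few lines of Python.  The function name and the sample path are made up for illustration; the only thing taken from the description above is that the companion file shares the topic's base file name and swaps .aml for .cmp.

```python
from pathlib import PurePosixPath


def companion_path(topic_path):
    """Return the .cmp companion path for an .aml topic path."""
    path = PurePosixPath(topic_path)
    if path.suffix.lower() != ".aml":
        raise ValueError("expected an .aml topic file: %s" % topic_path)
    # Same folder, same base name, different extension.
    return path.with_suffix(".cmp")


if __name__ == "__main__":
    # Hypothetical topic path, used only as an example.
    print(companion_path("Help/Topics/HowToBuild.aml"))
```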

There's still a lot more to do before the editor is actually usable for generating quality MAML topics, but it's on its way.  After a bit more functionality is implemented I'll blog about it in more depth and post some screenshots.  For now, check out a screenshot below of the editor as I edit a simple How To topic that was previously written using Visual Studio's XML editor.

(Note that this is just a temporary WPF app that hosts the control while I develop it.  The final app will have command buttons for semantic markup, such as database, ui, application, system, localUri, etc. and block-level elements such as section, procedure, list and table.  I haven't implemented them yet because commands seem pretty simple in WPF and I wanted to concentrate on the basic editor support first, as a WPF proof of concept.  I'm a first-time WPF developer :)

image

The image above shows a MAML topic with introduction text, a collapsible section, a procedure and a code example.  I also splashed some style in there, like the large font for the section title, just for the screenshot (the final editor will most likely get style information at runtime from WPF content resource themes so that its appearance will be completely customizable, without having to rebuild the source code).

Clicking the toggle button for the section will actually toggle the visibility of the section's content in the editor (note that the image is ugly only because it's on a button and I didn't remove the border yet).  I plan to add other controls as well for functions such as adding links, tokens and media.  The controls will actually be inside the document as part of the flow like the section toggle button - this is one of the really powerful features of WPF 3.5 flow documents.

There's even a context menu with Cut, Copy and Paste commands that I didn't have to write since it's provided by WPF.  Editing the text and everything else behaves exactly like you would expect from a WYSIWYG editor, which also means that I'm going to have to add custom logic to attach behavior to specific elements so that everything is a bit less fragile.  For example, one thing that I've done already is to add a behavior so that when the Enter key is pressed the cursor moves to a new line instead of an entirely new paragraph.  Pressing Enter a second time, while on the new blank line, will delete the blank line and start a new paragraph (it's more complicated than that, but you get the idea).  Having a new line within the same paragraph won't change the MAML output, it just provides some control over the appearance while editing.  Of course, if this is confusing or annoying I can always remove it later.
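Here's a simplified model of that Enter key rule, written in Python rather than the editor's actual C#.  Everything in it is an assumption made for illustration: a document is just a list of paragraphs, each paragraph is a list of lines, and the cursor always sits on the last line of the last paragraph.  The real behavior is more complicated than this.

```python
def press_enter(paragraphs):
    """Apply the simplified Enter rule to the last paragraph.

    paragraphs is a list of paragraphs; each paragraph is a list of lines.
    """
    current = paragraphs[-1]
    if len(current) > 1 and current[-1] == "":
        # Second Enter on the blank line: delete it and start a new paragraph.
        current.pop()
        paragraphs.append([""])
    else:
        # First Enter: a new line inside the same paragraph.
        current.append("")
    return paragraphs


def to_maml(paragraphs):
    """Line breaks inside a paragraph do not show up in the output."""
    return "".join(
        "<para>" + " ".join(line for line in lines if line) + "</para>"
        for lines in paragraphs
        if any(lines)
    )


if __name__ == "__main__":
    doc = [["First line"]]
    press_enter(doc)             # new line, same paragraph
    doc[-1][-1] = "second line"  # the user types on the new line
    press_enter(doc)             # another new line
    press_enter(doc)             # blank line removed, new paragraph started
    print(doc)
    print(to_maml(doc))
```

Note that to_maml joins the lines of a paragraph back together, which mirrors the point above: the extra line is only an editing aid.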

I'd like to get feedback on my ideas so that I can quickly provide an editor that people are satisfied with.  If you have any suggestions about how the editor can be improved please let me know.  Thanks!

May 24, 2008

MAML Migration: The Next Step in the Evolution of Help Authoring

Are you part of that herd of developers that is used to documenting applications by writing help topics in raw HTML?  The power of it is nice, being able to add a pinch of bold here, a splash of italics there, some CSS for different layouts, a floating image, several nested tables, an abundance of hyperlinks, embedded Flash and media players, and even some JavaScript to boot.  What more could we want?  Or maybe a better question is, what could possibly make us want to give any of that up?

XML Documentation Comments

Well, XML documentation comments that may be added to code modules (I'm assuming everyone's familiar with this stuff by now) were one thing that prompted .NET developers to start documenting their code without using HTML.  It's nice to be able to apply a bit of XML structure to our documentation, isn't it?

Commonly used semantics for describing an API may be expressed in a universal way with XML tags such as summary, remarks and example, and the compiler builds an XML documentation file that contains the code comments found in each module when we build our project.  If you take a look at the contents of this file you may see a repeating pattern - schema - that seems like it could be used by some other tools to do, well, other things with it...
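As a small taste of those "other things", here's a sketch in Python that pulls the summary text for each member out of an XML documentation file.  The sample content (the Example assembly and its Widget type) is invented for illustration; the doc/members/member layout and the T: and M: name prefixes follow what the compilers emit.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of a compiler-generated XML documentation file.
SAMPLE = """<doc>
  <assembly><name>Example</name></assembly>
  <members>
    <member name="T:Example.Widget">
      <summary>Represents a widget.</summary>
    </member>
    <member name="M:Example.Widget.Spin(System.Int32)">
      <summary>
        Spins the widget.
      </summary>
      <remarks>Spinning is not thread-safe.</remarks>
    </member>
  </members>
</doc>"""


def summaries(xml_text):
    """Map each member's ID string to its whitespace-normalized summary."""
    root = ET.fromstring(xml_text)
    result = {}
    for member in root.iter("member"):
        summary = member.find("summary")
        if summary is not None:
            text = "".join(summary.itertext())
            result[member.get("name")] = " ".join(text.split())
    return result


if __name__ == "__main__":
    for name, text in summaries(SAMPLE).items():
        print(name, "->", text)
```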

Compare XML documentation to your legacy HTML help topics and what do you notice?  The XML comments that you add to APIs do not typically contain much layout or formatting, whereas your HTML topics are chock-full of <b>'s, <i>'s, and <u>'s, and a whole mess of other HTML to describe the document's layout and formatting.  Ok, ok, if you've done it correctly then you've made judicious use of CSS - applying class names to all of those <h*>'s, <div>'s, <span>'s, <p>'s, <a>'s, <td>'s, <tr>'s, <ol>'s, <ul>'s, <dl>'s, and certainly many other HTML tags that only add to the confusion when authoring topics (as opposed to designing them).

Now you might be thinking, "Dave, it's not entirely true that XML documentation is without formatting.  What about the para, code and c elements?".  And to that my reply would be, "Ok, so then what exactly do they look like?".  If you look in the XML documentation files that are produced by your compilers, you'll see the markup exactly as it appears in your code modules.  In other words, no HTML and no CSS - nothing more than semantic usage: paragraph, code block and in-line code.  (If you were thinking something more like, "leading white-space, use pre formatting, code coloring and a fixed font", then you're getting ahead of yourself, so slow down!)

My point is that the semantics for the aforementioned XML documentation tags are clear (i.e., what the tags represent), but their appearance is not yet defined (i.e., their style and format).  Take a look at the other Recommended Tags for Documentation Comments and you won't find anything out of the ordinary.  Each tag has an obvious reason for its existence - to mark up regions of text that serve a particular purpose in the documentation.  But how do they look?  Nobody knows!  ;)

Sandcastle

Now's probably a good time to introduce Sandcastle.  For those of you that aren't familiar with it yet, Sandcastle is Microsoft's tool set for producing HTML help topics dynamically by inspecting managed assemblies and incorporating the markup from XML documentation.  From the assemblies that you provide to Sandcastle, it automatically infers a table of contents (TOC), various pseudo-topics such as Properties and Methods, and also generates many individual topics to cover the entire API.  The documentation you've written within XML documentation tags, such as summary, remarks and example, is automatically added to the generated topics in the appropriate places.

The result of running Sandcastle on your assemblies and XML documentation is a set of files that are web-ready HTML help topics for your project.  This is typically referred to as reference documentation, since it provides a reference for developers that use your API.  These topic files can be used as input to a tool such as HTML Help Workshop (Help 1.x) to produce a stand-alone compiled help file (.chm) that may be distributed with your application as an external help module.  The .NET Framework even gets in on the action by providing helpful APIs for integrating context-sensitive help and Help 1.x navigation into your managed applications.  (See the Help class for more information.)

Presentation Styles

Sandcastle provides three presentation styles that it can produce for your documentation out-of-the-box.  Each one consists of a set of XSL transformation files that convert XML documentation into XML-based HTML (not XHTML, however).  They also contain resources such as icons and, of course, CSS style sheets.

For an example of a Sandcastle presentation style, look no further than the documentation for Visual Studio and the .NET Framework on MSDN.  The appearance that MSDN uses is similar to the VS2005 presentation style in Sandcastle.  I believe that Microsoft actually uses a customized version to build their internal documentation, even for Visual Studio 2008.  The other, experimental styles that ship with Sandcastle are Prototype and Hana.

For more information about Sandcastle, see my Sandcastle Help article on CodePlex.

From XML Documentation Comments to Reference Documentation

So the process is actually quite simple.  As developers we can easily document our source code using XML documentation, which allows us to concentrate more on writing the content instead of having to worry about formatting it with HTML.  When we build our project, the compiler will produce an XML documentation file that can be passed to Sandcastle, which then inspects our assemblies and automatically generates reference documentation that includes the comments that we added to our source code, but in a pretty HTML/CSS-based style that looks very similar to MSDN.  Nice!

User Documentation

Sandcastle can automatically generate reference documentation that is useful to other developers, but what about user documentation?  I mean things like How To, Sample, Walk-through, Overview, etc. - stuff that an end-user would want to have.  Well don't expect Sandcastle to know what you're thinking - we still, unfortunately, have to get concepts out of our heads and into help topics manually.  (At least for the time being, until someone invents HAL ;)

Conceptual documentation (how Sandcastle refers to user documentation) is often much harder to write than XML documentation comments since it requires a more in-depth understanding of the application being documented.  It's easy to look at the source code, notice that an exception is being thrown and then add an exception element to the XML documentation comments for that API.  Or to notice that a particular algorithm is being used and to add a comment in the remarks element that mentions it.  But to understand and be able to express the purpose of different user interface (UI) elements, how to perform various UI-related tasks, and how the individual APIs and components fit into the designs of other high-level processes in an enterprise-level application, is certainly more difficult and typically requires an understanding of many different aspects of the application.  So the bigger the application the harder it is to write conceptual documentation, and not just because it's more time consuming but also because it's more complex.

So if writing conceptual documentation can be more time consuming and harder to accomplish than writing XML documentation comments, why do people still insist on writing conceptual documentation in HTML?  Maybe the advantages of XML documentation comments can be applied to conceptual documentation as well.

The Perils of Writing HTML Help

I started this post by pointing out one very common way of writing help: raw HTML.  We've all done it, and I know that each time I do I end up reinventing the wheel all over again.  A new HTML layout, CSS styles, some new and strange way of cross-referencing, JavaScript for collapsible sections, etc., must all be redeveloped.  (Yeah, some companies are too cheap to buy a tool that does this automatically - and so am I. :)

Creating a new help topic starts with copying an existing HTML file that is used as a template, of which there's usually only one kind containing a header, with style sheet links and scripts, a body that's empty, and a footer.  Writing a help topic requires having to look through the other topics quite often to find out which HTML tags and CSS class names I should be using for various styles.  This is especially annoying when I have a good idea that I simply want to put down quickly and be done with it.  Uh oh, that hyperlink to an HTML topic that I've been copying and pasting throughout my documentation is actually misspelled - time to do a search and replace.  Hmm, I'm not sure that I like the format that I've been using for laying out tables - oh well, it's not worth the effort to fix it now.

Is There a Better Way?

Technical writers, I can only assume, take help authoring more seriously than that.  They get paid to worry about things such as structure, readability and maintenance, so it shouldn't surprise us to know that there's a much better way to write help than simply using raw HTML.  As developers, we could probably learn a thing or two from them when writing our own documentation, whether it's for an API or conceptual topics.

Lucky for us, Microsoft has a huge library of documentation and employs technical writers to write their "official" help, which is then published to the web on MSDN.  (Sorry about that horrible reference for technical writers, but I couldn't find anything better.  I know that I've seen someone from Microsoft, probably Anand, state that they don't use developer code comments internally and instead have professional authors write it.)  This means that over the years they've had to come up with a solution that makes authoring help manageable, which is a huge task for such a large documentation set.  They also needed a way to manage file names and links for cross-referencing help topics (think, See Also section).  Since the look and feel of MSDN changes from time to time, the ability to write documentation that is absolutely independent of any one style or format was imperative as well.

So we have an invaluable example to which we can aspire.  A whole plethora of documentation written with clarity and precision using standardized techniques.  If you take a look at the documentation on MSDN, you should see a crisp and clean style that, when compared to your raw HTML help topics, probably looks far more professional.  This is nothing new to us though - we've been referencing it for quite a long time now as .NET developers.  Many people, predating .NET, have even watched MSDN documentation improve dramatically over the years, and most developers that need to write their own documentation seem to want to reproduce the same look and feel.  Many tools have even helped us to generate reasonable facsimiles in the past (such as NDoc).

But have we finally come to the point where we can write our own help topics without having to remember abstract HTML tags and CSS class names?  Can it be transformed automatically into documentation that looks the same as MSDN, or any other style for that matter?  Is there a way to simply specify a unique identifier for another topic and have hyperlinks generated automatically?  What about linking to reference topics?  Is there a way to ensure that topics of a similar nature will all share the same exact structure?

The answer to all of these questions, of course, is yes.  (But wouldn't it be funny if it was no?  I'd probably take a nap.)

Microsoft Assistance Markup Language (MAML)

Microsoft uses Sandcastle internally to generate help topics for the .NET Framework, so it's no wonder that Sandcastle also provides a way to apply structured authoring techniques to conceptual documentation, in much the same way that XML documentation comments are used by developers to write reference documentation.  In Sandcastle, conceptual topics are written in MAML.

MAML is an XML schema that defines various high-level document types, such as How To, Walkthrough, Sample, Glossary, Whitepaper, Troubleshooting and many others.  These document types provide the structure of a help topic, which doesn't change.  What can change though, is how Sandcastle presents this structure when it generates HTML topics.  This means that, for example, the markup in all of your How To topics will look similar, regardless of the presentation style that you choose.  As a matter of fact, the markup in your How To topics will be similar to mine, even if we choose to produce HTML help output in very different styles.

The schema also defines various XML elements that mark up text using a semantic approach.  For example, the ui tag is applied to text that corresponds to a user interface element, such as the text on a button.  Another example is alert, which also requires an attribute named class that indicates the type of alert, such as note, caution, tip, warning, and others.  Another is country, which, as you may have already guessed, describes a country!  You would surround text with an application element when it represents the name of an application.  I think you get the idea...  By my count there are well over 40 elements that you can choose from.  And with Visual Studio's XML editor you can actually have IntelliSense tell you what they all are and where it's appropriate, within the topic's structure, to use them.

The beauty of all this is that the Sandcastle presentation style that you choose controls the HTML layout of the MAML document type used by your topic.  It also defines how all of the MAML elements will appear in the HTML.  For example, alert is transformed into an HTML table layout, while ui and application are simply bolded.  Special formatting is not actually applied to text that is specified as being the name of a country, but you could update the transformation to change the HTML markup or possibly just add a CSS rule to apply the formatting that you want, without having to update the actual topic itself.
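To show the separation between a topic and its presentation, here's a toy model in Python.  Sandcastle really does this with XSL transformations, so none of the following is its actual code; the two style tables, the render function and the country CSS class are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape


def bold(element):
    return "<b>" + escape(element.text or "") + "</b>"


def plain(element):
    return escape(element.text or "")


def country_span(element):
    # Hypothetical CSS class; a real style would pick its own name.
    return '<span class="country">' + escape(element.text or "") + "</span>"


# Two hypothetical presentation styles: element name -> HTML renderer.
STYLE_A = {"ui": bold, "application": bold, "country": plain}
STYLE_B = dict(STYLE_A, country=country_span)


def render(para_xml, style):
    """Render one para element (namespace omitted for brevity) as HTML."""
    para = ET.fromstring(para_xml)
    parts = [escape(para.text or "")]
    for child in para:
        parts.append(style.get(child.tag, plain)(child))
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    topic = "<para>Click <ui>OK</ui> in <country>Canada</country>.</para>"
    print(render(topic, STYLE_A))
    print(render(topic, STYLE_B))
```

The topic string never changes between the two calls; only the style table does, which is the whole point.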

A MAML Example

Here's a small portion of the Glossary help topic that I've written for my Auto-Input Protection (AIP) project.

<?xml version="1.0" encoding="utf-8"?>
<topic id="14790228-f45b-42d5-9b3e-f6b4ab932b9e" revisionNumber="0">
  <developerGlossaryDocument xmlns="http://ddue.schemas.microsoft.com/authoring/2003/5" 
                             xmlns:xlink="http://www.w3.org/1999/xlink">
    <glossary>
      <title>Glossary</title>
      <glossaryEntry>
        <terms>
          <term>AIP</term>
        </terms>
        <definition>
          <para>
            An acronym that stands for Auto-Input Protection.
          </para>
        </definition>
      </glossaryEntry>
      <glossaryEntry>
        <terms>
          <term>Answer</term>
        </terms>
        <definition>
          <para>
            A user's or bot's response to a challenge.  In AIP, the correct answer is a 
            string of text that matches the text on the CAPTCHA image.  An incorrect 
            answer does not match.
          </para>
        </definition>
      </glossaryEntry>
      <glossaryEntry>
        <terms>
          <term>CAPTCHA</term>
        </terms>
        <definition>
          <para>
            An acronym that stands for Completely Automated Public Turing test to tell 
            Computers and Humans Apart, trademarked by Carnegie Mellon University according 
            to the following article: <externalLink>
              <linkText>CAPTCHA. (2008, March 26).</linkText>
              <linkUri>http://en.wikipedia.org/w/index.php?title=CAPTCHA&amp;oldid=201120981</linkUri>
            </externalLink> In Wikipedia, The Free Encyclopedia. Retrieved 09:01, March 27, 2008.
          </para>
        </definition>
      </glossaryEntry>
      <glossaryEntry>
        <terms>
          <term>Challenge</term>
          <term>Test</term>
        </terms>
        <definition>
          <para>
            A CAPTCHA image, being displayed on a web page, to which a user must respond 
            with an answer by entering the text that they see on the image.  The result 
            is pass or fail.
          </para>
        </definition>
      </glossaryEntry>
    </glossary>
  </developerGlossaryDocument>
</topic>

The following image shows the results of the glossary transformation into HTML, built by DocProject (a tool that I've written to automate Sandcastle inside Visual Studio).  The VS2005 presentation style was used for this example.

image

And now here's the same exact topic file after being transformed into HTML using the Hana presentation style.

image

There are a few things to point out about all of this.

First of all, notice that the topic that I've written only uses some very basic XML, yet the output obviously contains additional layout and style, which differs depending upon the presentation style that I've chosen.  In the Hana version I've even left in the default header that warns about pre-release documentation.

You may have also noticed the letter bar and the individual letter sub headers.  Where'd they come from?  These features are not actually part of Sandcastle, but Eric Woodruff and I have added them to the presentation styles by modifying the XSL transformations that convert the MAML Glossary document type into HTML.  The additional behavior automatically detects the glossary terms in the topic and creates the letter bar and headers dynamically.  All of the terms are sorted alphabetically as well (although it's not obvious in my example because they're already in alphabetical order in my topic file).
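The grouping logic behind a letter bar is easy to picture outside of XSL.  Here's a rough Python equivalent, which is only a sketch of the idea and not our actual transformation; in particular, listing every term of a multi-term entry under its own letter is an assumption on my part for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MAML_NS = "http://ddue.schemas.microsoft.com/authoring/2003/5"

# Trimmed-down glossary, deliberately out of alphabetical order.
SAMPLE = """<developerGlossaryDocument xmlns="http://ddue.schemas.microsoft.com/authoring/2003/5">
  <glossary>
    <glossaryEntry><terms><term>Challenge</term><term>Test</term></terms></glossaryEntry>
    <glossaryEntry><terms><term>CAPTCHA</term></terms></glossaryEntry>
    <glossaryEntry><terms><term>AIP</term></terms></glossaryEntry>
    <glossaryEntry><terms><term>Answer</term></terms></glossaryEntry>
  </glossary>
</developerGlossaryDocument>"""


def letter_groups(glossary_xml):
    """Group glossary terms under their first letter, sorted alphabetically."""
    root = ET.fromstring(glossary_xml)
    groups = defaultdict(list)
    for term in root.iter("{%s}term" % MAML_NS):
        text = (term.text or "").strip()
        if text:
            groups[text[0].upper()].append(text)
    return {
        letter: sorted(terms, key=str.lower)
        for letter, terms in sorted(groups.items())
    }


def letter_bar(groups):
    """Build the text of the letter bar from the group headers."""
    return " | ".join(groups)


if __name__ == "__main__":
    groups = letter_groups(SAMPLE)
    print(letter_bar(groups))
    for letter, terms in groups.items():
        print(letter, terms)
```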

Pretty cool, right?  You'll be able to get these Glossary updates from the new Sandcastle Styles project on CodePlex, which should go public within a few days after the next Sandcastle release.  This project was started by Paul Selormey, Eric Woodruff and me.  In the last week we've been diligently working on preparations for our first release, so please check it out when we go live and let us know what you think :)

Linking in MAML

The last thing that I want to point out about the previous example is that it contains a hyperlink to an external web site.  As you can see from my topic, MAML supports an externalLink element that accepts text in a linkText element and a URI in a linkUri element.  It also accepts alternate text in a linkAlternateText element, but that's optional.

In addition to linking to external URIs, you can also link to any of the other topics being documented.  To do that you would use a very simplified version of the XLink specification on a link element, as in the following example:

<link xlink:href="37852294-410f-4bb2-9008-c5fa9dfb4347">Part II</link>

Right, topics are identified by GUIDs.  Currently, Sandcastle also requires that all conceptual topic files be named with a GUID and an .xml extension.  A bit annoying at first, but if you use DocProject it provides a Topic Explorer tool window that makes it easy to find the topic that you're looking for without having to open all of them :)

Notice that in my example the value in the href does not have an .xml file extension specified.  That's because link doesn't reference files, it references topics.  This is important to realize because it's not the same as the way linking works in HTML - this is actually dynamic.  If Sandcastle cannot find a topic that is associated with the specified GUID, then it doesn't generate a hyperlink at all.
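Here's a minimal Python sketch of that "dynamic" behavior.  It is not Sandcastle's code: the function name, the known_topics lookup table and the .htm output file name are all assumptions of mine, and falling back to plain text when the topic is missing is just one plausible reading of "no hyperlink".

```python
from html import escape


def render_topic_link(topic_id, text, known_topics):
    """Return an HTML anchor if topic_id is a known topic; otherwise plain text."""
    target = known_topics.get(topic_id)
    if target is None:
        # Unknown GUID: no hyperlink is generated (plain-text fallback is an assumption).
        return escape(text)
    return '<a href="{}">{}</a>'.format(escape(target), escape(text))


# Hypothetical table of topics in the build, keyed by GUID.
known_topics = {
    "37852294-410f-4bb2-9008-c5fa9dfb4347": "37852294-410f-4bb2-9008-c5fa9dfb4347.htm",
}

found = render_topic_link("37852294-410f-4bb2-9008-c5fa9dfb4347", "Part II", known_topics)
missing = render_topic_link("00000000-0000-0000-0000-000000000000", "Part II", known_topics)
```

The point is that the link target is looked up by topic ID at build time, so the author never writes a file name into the topic.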

This is a bit different from what we're used to in HTML, which allows us to link to anything under the sun using only one tag: a.  So why such weirdness in MAML?  I think the answer to that question is actually quite simple, although for some reason it's easy to miss when first starting out with MAML.  The MAML schema defines elements that apply structure and semantics to text, instead of format and style, like HTML.  For this reason, you wouldn't see a tag named simply a in MAML because it's not descriptive at all.  Link, on the other hand, is very descriptive.  And since an HTML anchor is meant to provide the source point of a one-way link, its use is actually more limited than XLink.  The XLink specification actually provides a way to establish relationships between two or more resources (at least that's my interpretation of it), which would offer much more flexibility.  So MAML provides a mechanism to link to other topics, not just external URIs, and the XLink implementation provides an explicit way to describe links as being special - they must be processed by Sandcastle.  Currently, though, Sandcastle doesn't actually seem to use any of XLink's features aside from what the specification calls "simple" links, but maybe that'll change in the future.

But that's not all.  If you want to create a link to an API in your reference documentation, you would use the codeEntityReference element instead.  Yikes!  So now we've got yet another way to link.  But again, keep in mind that MAML is much more expressive than HTML, and that's why we've got different tags for linking to different things.  The benefit is that our intentions are clear when we write our topics, so different styles of linking can be handled differently.

The following XML snippet illustrates all three approaches to linking in MAML topics.  Each example is a child of the relatedTopics element, which, in the Sandcastle world, will eventually become your topic's See Also section.

<relatedTopics>
  <codeEntityReference>T:MyNamespace.MyClass</codeEntityReference>
  <codeEntityReference>P:MyNamespace.MyClass.MyProp</codeEntityReference>
  <codeEntityReference>M:System.IO.File.OpenText(System.String)</codeEntityReference>
  <externalLink>
    <linkText>DocProject</linkText>
    <linkUri>http://www.codeplex.com/DocProject</linkUri>
  </externalLink>
  <link xref="home">My Home Page</link>
  <link xref="Contact Us"/>
  <link vref="/related.aspx">Related web page</link>
  <link xlink:href="14790228-f45b-42d5-9b3e-f6b4ab932b9e">Part II</link>
</relatedTopics>

Notice that there are also two more link types in the example above that I didn't mention previously: link elements with xref and vref attributes.  These link types are used instead of externalLink so that only an ID must be specified instead of an entire URL.  The ID is part of an ID-to-URL mapping that is configured elsewhere.  This feature is not actually part of Sandcastle though; it's provided by a custom build component that I've written which, for the next release of DocProject, has been modified to support conceptual builds as well.  The component is called ResolveExternalLinksComponent and it's available as a separate download or as part of DocProject.  Without this build component, xref and vref do nothing.
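The ID-to-URL idea behind xref can be sketched in a few lines of Python.  This is only an illustration of the lookup, not the actual ResolveExternalLinksComponent: the URL_MAP contents, the example.com URLs and the resolve_xref function are hypothetical, and using the ID as the link text when the element is empty is my own assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical ID-to-URL mapping; the real mapping is configured for the build component.
URL_MAP = {
    "home": "http://example.com/",
    "Contact Us": "http://example.com/contact",
}


def resolve_xref(link_xml, url_map):
    """Look up a link element's xref ID; return (url, text) or None if unmapped."""
    element = ET.fromstring(link_xml)
    link_id = element.get("xref")
    url = url_map.get(link_id)
    if url is None:
        return None
    # Assumption: fall back to the ID itself when the element has no inner text.
    text = element.text or link_id
    return url, text


home = resolve_xref('<link xref="home">My Home Page</link>', URL_MAP)
contact = resolve_xref('<link xref="Contact Us"/>', URL_MAP)
```

Keeping the URLs in one mapping means a changed address only has to be fixed in one place instead of in every topic that links to it.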

Conclusion

HTML is out.  MAML is in.

Well, it's not actually as substantial a change as I'm implying - HTML is still being used extensively as the final output for compiling help; however, we no longer have to author help topics in HTML, which is a huge benefit.

So all this stuff might seem really wonderful in print, but I feel that I must warn you: it actually took me a few weeks before I finally started to get rid of that itch to lace my topics with bold and italic phrases where they didn't actually add any value.  When you first start writing MAML it can feel very restrictive, and compared to HTML it is, at least in terms of how quickly you can apply new styles: to do that you have to leave the actual topic and modify files in the Sandcastle presentation style.  But MAML is actually much more expressive in terms of describing information, and that's what we should be concentrating on when we write help topics - the information.

What I've learned from writing topics in MAML is that using elements such as ui, userInput, math, date, and many others, as well as externalLink, codeEntityReference and link for linking, ultimately accomplishes the same thing as HTML but in a much better way - no more CSS class names to remember or abstract HTML tags like b and i (or strong and em too).  Instead, I can specify exactly what a phrase represents and continue writing.  The format and style are already defined for me by the presentation style that I choose, even if I haven't chosen it yet!  However, if I've already chosen one that mostly fits my needs but I'm not happy with a particular style, I can apply some HTML and CSS to the different MAML elements without having to update anything in the topics themselves.  Because I reuse the same common tags throughout my documentation, it looks much more professional and it's easier to manage.  It's even portable since it's all XML, so if in the future I want to generate Open XML documents instead of HTML, I won't even have to change anything in my topics.
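As a toy illustration of that separation, here's a small Python sketch in which the same semantic element is rendered differently by two styles.  The style tables and the HTML they emit are entirely hypothetical; real presentation styles do this work in XSL and CSS.

```python
# Hypothetical presentation styles: each maps a semantic element name to an HTML wrapper.
STYLE_A = {"ui": "<b>{}</b>", "userInput": "<kbd>{}</kbd>"}
STYLE_B = {"ui": '<span class="ui">{}</span>', "userInput": "<code>{}</code>"}


def render(element_name, text, style):
    """Wrap text according to the style; unknown elements pass through unformatted."""
    return style.get(element_name, "{}").format(text)


# The topic only says "this phrase is a UI label"; the style decides how it looks.
label_a = render("ui", "OK", STYLE_A)
label_b = render("ui", "OK", STYLE_B)
```

Switching from one style table to the other changes the output without touching the topic, which is the whole appeal.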

Note that if you want to convert all of your existing HTML topics to MAML in a batch process, I've got a tool called DocToMaml.  It's currently in beta, but it does work.  Any feedback on it will be appreciated :)

For the next version of DocProject 2008 (Beta 3) I'm working on a MAML WYSIWYG editor that is integrated into Visual Studio, so keep your eyes open for that.

If you have any feedback about how MAML and Sandcastle's conceptual build process can be improved, please let the Sandcastle team know by submitting a request to the Sandcastle Issue Tracker on CodePlex.

April 20, 2008

DocToMaml 1.0 Beta is Now Available

DocToMaml 1.0 Beta is a tool that converts HTML and XHTML help files to Microsoft Assistance Markup Language (MAML) in a batch process.

MAML help files are used by DocProject and Sandcastle to build user documentation in various presentation styles.

DocToMaml provides a console mode and a project-based graphical user interface (GUI) for adding inputs and defining conversion rules to quickly and easily convert HTML files into MAML. Future releases will provide more flexibility for defining rules, but the beta can still be used to greatly decrease the amount of effort that would be required to convert large numbers of HTML files into MAML, compared to doing it manually.

DocToMaml-GUI

Features

Some of the features of DocToMaml are:
  • GUI and console mode.
  • Project-based for quickly saving and loading different configurations.
  • File and folder inputs are supported.
  • Global and input-specific rule sets may be defined.
  • Edit the source HTML of a file input using a WYSIWYG editor for full control over the conversion.
  • Convert a file input in memory so that you can see the result quickly.
  • MAML results for file inputs can be modified in a text box.
  • Batch conversion saves results to disk for all file and folder inputs and generates a conceptual artwork file that can be used by Sandcastle's ResolveArtLinksComponent build component.
  • Hyperlink references to local topics are updated automatically.
  • Preliminary user documentation.

Current Limitations

  • Only the Conceptual MAML document type is supported.
  • Subsections and sub containers are not supported. For example, nested tables and lists are added to the current in-line element, if one exists; otherwise, they are added as new top-level containers.
  • Images appear to be broken in source view, although this has no effect on the output. You don't have to update the paths to broken images since DocToMaml doesn't do anything with image files in the current release anyway.

For help with DocToMaml, see the compiled help file that is available for download on the release page.