September 13, 2007

DocSites 1.8.0 Preview

In the upcoming DocProject 1.8.0 release candidate I've made some changes and added several substantial features to the DocSite templates that I'd like to share with you in this blog post. Changes to the DocSite templates include:

DocSite in Solution Explorer
  • ASP.NET Theme presentation:
    • Ships with one default theme named "BasicBlue", which has the same appearance as the current DocSite templates.
    • Subtle improvements to appearance, mostly for Firefox and Opera.
    • Retained div-only + CSS layout with a sticky footer; i.e., no tables.
    • Includes some useful .SKIN files.
    • Full support for IE6 as well as continued support for IE7, Firefox and Opera.
  • Componentized Master page, with user controls for the header, footer, breadcrumbs, sidebar, TOC and index.
  • ASP.NET Localization with resource-based text in all pages and controls.
  • Customizable and highly extensible full-text search built from the ground up:
    • Designed for general-purpose indexing but customized specifically for indexing HTML help topics.
    • Has a default in-memory search index provider (I'll also build an SQL Server search provider for a subsequent release).
    • Complex query support with operators such as AND, OR, - (not) and grouping with parentheses.
    • Customizable, factor-based weight and rank calculations.
    • URL query string supports bookmarking searches.
  • Keyword browse page for live browsing of the search index (also supports a URL query string for bookmarking).
  • Administration page for configuring DocSite options, search factors and for viewing simple search statistics.
  • New Project Wizard – Create DocSite Credentials step allows a user to set the administrative DocSite credentials when a DocSite is first created.
  • The DocSite templates reference a new class library that encapsulates common DocSite functions.

Breakdown of Features

The rest of this article will provide more information on some of the features listed above. If you have any questions or comments please let me know!

New Project Wizard – Create DocSite Credentials

The first change that you'll notice to the DocSite templates is a new wizard step:

Figure 1: New Project Wizard - Create DocSite Credentials

This step allows you to enter an administrative user name and password for accessing the DocSite administration page (more on that later). The credentials are stored in the web.config file in the standard ASP.NET forms authentication section.

Authors of custom DocSite templates can easily enable this feature by adding a single user to the authentication section with "$admin$" as the name (the password will be ignored). The password format that is chosen for the authentication section (e.g., Clear, SHA1, MD5) is reflected in the dialog's help paragraph automatically, and will also be applied to the user's password before it's stored in the web.config file.
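To illustrate what the password format means for the stored value, here is a minimal Python sketch (it is not DocProject code). It assumes that the SHA1 and MD5 formats store an uppercase hexadecimal digest of the UTF-8 encoded password, which is how ASP.NET forms authentication is generally understood to hash passwords for web.config. The function name hash_for_config is hypothetical.

```python
import hashlib


def hash_for_config(password, password_format="SHA1"):
    """Return the value that would be stored for the given password format.

    Assumption: hashed formats are stored as uppercase hex digests of the
    UTF-8 bytes of the password.
    """
    fmt = password_format.upper()
    if fmt == "CLEAR":
        return password  # stored as plain text
    if fmt in ("SHA1", "MD5"):
        digest = hashlib.new(fmt.lower(), password.encode("utf-8"))
        return digest.hexdigest().upper()
    raise ValueError("Unknown password format: " + password_format)
```

With the Clear format the password is written as typed, so one of the hashed formats is usually the better choice for a published site.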

Themes and Localization

The DocSite templates now use a default theme named BasicBlue. Its appearance is almost identical to the appearance of the current DocSite templates, but now it has full support for IE6 as well as a sticky footer for all .aspx pages that do not use an IFRAME to display page content.

Supporting IE6 without a table layout was difficult, but I succeeded with the help of CSS expressions, which are specific to IE (but that means they're supported by IE7 too). The hard part was trying to support IE7's more standardized rendering engine without breaking IE6, and then supporting IE6 without IE7 falling back to things like CSS expressions. But after some flipping out and loss of hair, I finally got it working :)

Customizing the appearance of DocSites should be much easier now that it's fully themed. And with Sandcastle's high level of flexibility, anyone with some previous experience working with XSL and CSS should be able to come up with a custom Sandcastle presentation style and DocSite theme that work together for a completely customized appearance without affecting the DocSite's functionality, or even better, to improve on it. BTW, I'm certainly interested in hearing new ideas for themes if you have any.

Text that appears in all of the DocSite pages and user controls has been extracted into ASP.NET global and local resource files for easy localization.

Indexed Search and Browse

Finding help content in DocSites will be much easier now with the addition of the new search and browse features.

Note: The screenshots depict the Admin link in a different color than other hyperlinks, but that's simply because I clicked it. What you see is the "followed" hyperlink color. Before Admin is clicked it's the same color as the other links in the header: blue.

Search Page

Figure 2: DocSite Search Page

The abstract, locale, namespace and library information are all taken from each topic's MSHelp data found in an XML data island generated by Sandcastle for Help 2.x. However, due to a possible limitation in the next Sandcastle release, I might have to remove this feature temporarily until I can create a custom solution for a subsequent release.

DocSite search supports complex queries with operators that are defined by the search provider itself. By default, your DocSite will support the Boolean AND and OR operators, - (the negation operator) and parenthesized grouping:

Figure 3: DocSite Search Help

The search help page is accessible by simply leaving the search box empty and clicking the search button (the one with the large magnifying glass).
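To give a feel for how a query with AND, OR, - and parentheses can be evaluated, here is a minimal Python sketch of a recursive-descent evaluator over a keyword-to-documents map. It is not the DocSite search provider. The names tokenize and search, the uppercase-only operators, and the rule that adjacent keywords are ANDed together are all assumptions made for illustration, and malformed queries are not handled.

```python
import re


def tokenize(query):
    # Parentheses and the minus sign are separate tokens; the rest are words.
    return re.findall(r"\(|\)|-|[^\s()\-]+", query)


def search(query, index, all_docs):
    """Evaluate a Boolean query against index (keyword -> set of doc ids)."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_not()
        # Assumption: keywords written side by side are implicitly ANDed.
        while peek() not in (None, ")", "OR"):
            if peek() == "AND":
                take()
            result = result & parse_not()
        return result

    def parse_not():
        tok = take()
        if tok == "-":
            return all_docs - parse_not()  # negation: everything except the operand
        if tok == "(":
            result = parse_or()
            take()  # consume the closing ")"
            return result
        return set(index.get(tok.lower(), ()))

    return parse_or()
```

For example, with a hypothetical index in which "search" appears in topics 1 and 2 and "index" in topics 2 and 3, the query "search -index" keeps only topic 1.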

NOTE: In the future I plan on adding an advanced search page that contains HTML topic-specific search functions, such as searching by the culture in which a topic was written, the topic's title, etc.

Browse Index Page

The button with the tiny magnifying glass, next to the search button, is the browse button. Clicking this button will open the browse page using the keywords entered into the search box, although all operators and grouping will be ignored. Clicking the button with an empty search box simply opens the browse page without any selected keywords.

Figure 4: Browse Index Page

The browse page allows you to drill down through a query by removing or adding one keyword at a time until you find a result set that contains the topic in which you're interested. The browse page is also useful for administrative purposes as it displays some statistical information about individual keywords as well.

The letter bar at the top of the page provides a list of keywords that start with the selected letter. Clicking a letter will display the keyword list. Then, clicking a keyword will add it to the filter and the new filtered results will be shown. You can continue to browse the index by the first letter of keywords or view and modify the active filter at any time. The letter bar itself is encapsulated in a user control and is completely customizable using the administration page (discussed later).
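The drill-down described above can be sketched in a few lines of Python. This is only an illustration, not DocSite code: it assumes the index is a map from keyword to the set of topics that contain it, that the filter keeps topics containing every selected keyword, and that an empty filter shows no results. The function names are hypothetical.

```python
def keywords_for_letter(index, letter):
    """Keywords listed when a letter is clicked in the letter bar."""
    return sorted(k for k in index if k.startswith(letter.lower()))


def browse(index, selected_keywords):
    """Topics that contain every keyword in the active filter."""
    doc_sets = [index.get(k.lower(), set()) for k in selected_keywords]
    if not doc_sets:
        return set()  # assumption: no keywords selected means nothing to show
    return set.intersection(*doc_sets)
```

Adding a keyword can only shrink the result set and removing one can only grow it, which is what makes stepping through the index one keyword at a time predictable.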

Toggle between Search and Browse

The button next to the search query on the search results page and the button next to the keywords on the browse page allow you to easily toggle between these two distinct modes of locating help topics so that you can choose the one that is better suited for your particular query.

The Index

Both the search and browse features work with a full-text index of the HTML help topics generated by Sandcastle. The index is stored in memory and generated on the fly by the website itself, not DocProject, which means that you don't have to rebuild the project to rebuild the index. It also means that you can tweak factors for weight and rank calculations and then rebuild the index on demand through the web interface, allowing fine-tuning of the search results for common keywords that you might expect your end-users to search for. You can manage the search index and factors through the administration page, which I'll discuss later.

Your DocSite will automatically rebuild the in-memory index after the application has started upon the first request for the search page or a particular keyword on the browse page. In the future I plan on adding an SQL Server search provider that you can use instead of the default in-memory provider so that the index does not have to be rebuilt each time the ASP.NET worker process is recycled or when changes to the application are published that will cause it to restart (like certain modifications to the web.config file).
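The build-on-first-request behavior can be pictured with a small Python sketch. It is not the DocSiteMemorySearchProvider implementation; the class name, the load_documents callable and the whitespace-based word splitting are assumptions chosen to keep the example short.

```python
class MemorySearchIndex:
    """In-memory keyword index that is built lazily and can be rebuilt on demand."""

    def __init__(self, load_documents):
        # load_documents is a hypothetical callable returning {doc_id: text}.
        self._load_documents = load_documents
        self._index = None
        self.build_count = 0

    def rebuild(self):
        index = {}
        for doc_id, text in self._load_documents().items():
            for word in text.lower().split():
                index.setdefault(word, set()).add(doc_id)
        self._index = index
        self.build_count += 1

    def lookup(self, keyword):
        if self._index is None:
            self.rebuild()  # first request after startup pays the build cost
        return self._index.get(keyword.lower(), set())
```

Because the index lives only in the process, restarting the application discards it, and the next lookup triggers a fresh build.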

Site Administration Page

The DocSite administration page provides a set of configurable options with application scope.

Figure 5: DocSite Administration Page

To reach this page simply click the Admin link in the top-right corner of your DocSite. If you haven't logged in then you will be brought to the login page first. Enter the credentials that you configured when the project was first created and then click the Login button to continue to the DocSite Administration page.

Settings

Settings are stored in the DocSite.config file found at the root of the project. Changes made directly to this file will not take effect until the application is restarted; therefore, I highly recommend using the web interface to configure settings whenever possible so that you do not have to restart the application (and lose the entire in-memory search index).

There are two main categories in the administration page: General and Search. Changes to any options in either category are applied immediately.

The Create Index link allows you to regenerate the index on-demand. This is useful for ensuring that the index is generated immediately after the DocSite has been published and for updating the index after modifying keyword, weight or rank settings.

Client Settings

Letter bar

Configure the characters that will appear in the letter bar at the top of the browse index page. Add characters from your native language or remove existing characters without having to modify any code.

Sidebar size persisted

Controls whether a cookie will be used on the client to remember the last position of the sidebar's handle. This setting is enabled by default.

The Search category has a few subcategories for viewing statistics and managing the DocSite Search and Browse Index pages.

Index Statistics

This category provides read-only information:

Provider name

Name of the current search index provider. DocSiteMemorySearchProvider is the only implementation that will ship in DocProject 1.8.0; however, expect SqlSearchProvider in a subsequent release.

Last creation date

The last date and time that the index was created.

# keywords

The total number of distinct keywords found among all documents that were indexed.

# documents

The total number of HTML help topic files that were indexed.

Settings

Provides general index-related settings:

Root search path

Virtual path of the root directory in which the provider will begin the search, recursively. I highly recommend that you do not change this setting.

Public search enabled

Indicates whether unauthenticated users can use the DocSite search feature. Note that if you're logged in after disabling this setting you will still see the search box since it's always available to authenticated users. This setting is enabled by default.

Public browse enabled

Same as the option above but for the DocSite browse index page instead of the search page. This setting is enabled by default.

Keyword Settings

Settings related to indexed keywords and keywords in search queries.

Minimum keyword length

Defines the minimum number of characters that a word must contain in order to be included in the index. This setting is also used to ignore words in queries that have fewer characters than the specified value. The default value is 2.

Excluded keywords

Comma-separated list of keywords that must be excluded from the index and queries. You can add or remove keywords as you see fit.

Hot title keywords

Comma-separated list of keywords that have special semantics in titles of HTML help topics and are used to add additional value to matches in query results through the Title Rank Factors values (below).

You can customize this list if you want to improve search results with queries that contain full or partial matches to these particular words when found in titles.
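As a rough sketch of how the Minimum keyword length and Excluded keywords settings might be applied, the Python function below filters the words of a piece of text. The function name, the word-splitting pattern and the sample exclusion list are assumptions for illustration and are not taken from DocSite.

```python
import re


def extract_keywords(text, min_length=2, excluded=("the", "and", "of")):
    """Return the words of text that qualify as keywords.

    Words shorter than min_length or present in excluded are dropped.
    The same filter could be applied to documents and to queries.
    """
    excluded_set = {word.lower() for word in excluded}
    words = re.findall(r"[A-Za-z0-9_]+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in excluded_set]
```

Applying the same filter to queries as to documents keeps the two consistent, so a query never asks for a word that could not have been indexed.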

Weight Factors

Weight factors are integers that affect the weight of indexed keywords. Note that weight is calculated while the index is being created and rank is calculated when a keyword matches a search query. The values in this section do not affect rank directly; however, the weight of a keyword does add additional rank to matches.

Early position keyword

Factor used to calculate the weight of a keyword depending upon its location in the document being indexed. Typically, the earlier the position the higher the weight.

The formula used to calculate weight is:

keyword position / source length * -{value} + {value}

Where {value} is the value of this factor (10 by default).

Title keyword

Factor used to calculate additional weight for keywords found in the titles of indexed documents. The base weight of title keywords is the value of the Early factor (above), but calculated relative to the title and not to the entire document. The value of this factor is then multiplied by the result. The default value is 1.2, which gives 20% more weight to keywords found in titles.
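The two weight factors can be worked through with a short Python sketch. The early-position formula is the one given above; how position and length are measured (characters or words) is not stated here, so the units are an assumption, as are the function names.

```python
def early_position_weight(position, source_length, factor=10):
    """keyword position / source length * -factor + factor"""
    if source_length <= 0:
        raise ValueError("source_length must be positive")
    return position / source_length * -factor + factor


def title_keyword_weight(position_in_title, title_length,
                         early_factor=10, title_factor=1.2):
    """Early-position weight computed relative to the title, then scaled."""
    base = early_position_weight(position_in_title, title_length, early_factor)
    return base * title_factor
```

With the defaults, a keyword at position 0 of a source of length 200 gets a weight of 10, one at position 100 gets 5, and one at the very end gets 0. The same keyword at the start of a title gets about 12, which is the 20% increase described above.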

Title Rank Factors

Additional rank values for keyword matches that are found in titles.

Exact title hot keyword match

Additional rank for exact keyword matches found in titles and the Hot title keywords list (above). The default value is 250.

Partial title hot keyword match

Additional rank for partial keyword matches found in titles and exact matches found in the Hot title keywords list (above). The default value is 100.

Exact title keyword match

Additional rank for exact keyword matches found in titles. The default value is 50.

Partial title keyword match

Additional rank for partial keyword matches found in titles. The default value is 20.
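One plausible reading of the four title rank factors is sketched below in Python. It is an illustration under stated assumptions, not the DocSite ranking code: a partial match is taken to mean that the query keyword is a substring of a title word, the matched title word is what gets checked against the hot keyword list, and only the highest applicable bonus is returned. The sample hot keywords used in the note are hypothetical.

```python
# Default values from the settings above, keyed by (match kind, is hot keyword).
RANK_FACTORS = {
    ("exact", True): 250,
    ("partial", True): 100,
    ("exact", False): 50,
    ("partial", False): 20,
}


def title_rank_bonus(query_keyword, title, hot_keywords, factors=RANK_FACTORS):
    """Return the additional rank a title contributes for one query keyword."""
    query_keyword = query_keyword.lower()
    hot = {k.lower() for k in hot_keywords}
    best = 0
    for word in title.lower().split():
        if word == query_keyword:
            kind = "exact"
        elif query_keyword in word:
            kind = "partial"
        else:
            continue
        best = max(best, factors[(kind, word in hot)])
    return best
```

Under these assumptions, with "class" in the hot list, searching for "class" against the title "Widget Class" earns the exact hot bonus, while searching for "wid" earns only the partial title bonus.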

Conclusion

In this blog post I described new features to come for the DocSite templates in DocProject 1.8.0 RC. These new features will surely be appreciated by your end-users since they make it much easier than before to locate and browse help topics. As a site administrator you now have a much easier way to configure some of the DocSite features and will have an easier time modifying the appearance and localizing text to meet your particular needs and the needs of your clients.
