April 19, 2017

Monads, and Comonads, and LINQ! Oh, my!

Programming through the lens of mathematics offers new ways to solve old problems and old ways to solve new problems. Mathematics is full of surprising relations and dualities.

In this blog post, we’ll journey through an alternate reality in which an OOP language like C# becomes more abstract and expressive by adopting some functional programming concepts, rooted in a branch of mathematics called Category Theory.

Over The Rainbow

Our journey begins with a few familiar types in a probably unfamiliar categorization.

Sync

Async

Monad

IEnumerable<T>

IObservable<T>

Comonad

Lazy<T>

Task<T>

Don’t worry if you don’t understand these concepts yet, we’ll meet them again later...

IEnumerable<T> is the list monad.

IObservable<T> is the continuation monad.

Task<T> is the continuation comonad.

Lazy<T> is the singleton comonad (for lack of a better name).

These types appear to have different purposes, yet there is actually a pattern hiding in there somewhere. The pattern can be stated in structural terms as follows.

Monads have bind and unit functions.

Comonads have cobind and extract functions.

Bind

Unit

IEnumerable<T>

SelectMany; a.k.a., flatMap

Constructors of Array, List<T>, etc.

IObservable<T>

SelectMany; a.k.a., flatMap

Observable.Return; a.k.a., just


Cobind

Extract

Task<T>

ContinueWith; a.k.a., then

get_Result

(a property named Result)

Lazy<T>

?

get_Value

(a property named Value)

You may have noticed that Lazy<T> is incomplete. Can we define a cobind function for Lazy<T>? What would it look like? What would it do?

And what’s the relationship between monads and comonads? Is there a pattern?

Perhaps if we were to examine these monadic and comonadic types through a lens of abstraction, then we could identify the pattern and figure out how to implement cobind for Lazy<T>.

“Professor Marvel Never Guesses”

Let's start by examining the bind function for IEnumerable<T>. Here’s its signature:

public static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> selector)

So the bind function takes an instance of the monadic type for which it's defined, in this case IEnumerable<T>, and a function that projects the element type, T, into another instance of the monadic type, IEnumerable<R>.

Its purpose may not be clear yet, so we’ll just try and implement it by heeding Erik Meijer’s advice: “let the types guide you”.

Well, we have a selector function that accepts a T, but we don’t have an instance of T yet. Since the source is a sequence, it seems that we must invoke the selector function for each T that it contains. So let's start there. We’ll do foreach over source and apply selector to each value.

public static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> selector)
{
foreach (var value in source)
{
var results = selector(value);

// ...
}
}

Great, we're done with the Select (a.k.a., map) part. Now let’s check the return value.

The selector returns an IEnumerable<R> for a given T, and we need to return a single IEnumerable<R>. We can’t simply return the IEnumerable<R> for the first T, because we must return them all. But how do we convert many IEnumerable<R> into a single IEnumerable<R>?

How about we iterate each of the sequences and yield all of the results?

(An Iterator Block makes this easy in C# – it spares us from having to implement IEnumerable<T> and IEnumerator<T> ourselves.)

public static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> selector)
{
foreach (var value in source)
foreach (var innerValue in selector(value))
yield return innerValue;
}

And now we're done with the Many (a.k.a., flat) part.

We've just implemented the bind function in the list monad by iterating over the source sequence, mapping each element into a nested sequence, and flattening all of the nested sequences into the output sequence. The name flatMap makes perfect sense! The name SelectMany makes sense too, although it’s not a typical naming convention.1

The significance of this implementation may not be clear yet, but one thing is certain: it was easy to derive this implementation by following the types.

It’s A Twister! It’s A Twister!

Shouldn’t all monads have similar signatures for their SelectMany implementations?

Let's look at bind for IObservable<T>. And for comparison, IEnumerable<T> as well.

public static IObservable<R> SelectMany<T, R>(this IObservable<T> source, Func<T, IObservable<R>> selector)

public static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> selector)

Okay, so there's clearly a pattern here. It’s the same exact signature, just with different generic types.

How can we implement bind for IObservable<T>?

Since the source is a sequence, it seems that we must invoke the selector function for each T that it contains. And there's only one way to get T's out of an observable: Subscribe

public static IObservable<R> SelectMany<T, R>(this IObservable<T> source, Func<T, IObservable<R>> selector)
=> source.Subscribe(value =>
{
var results = selector(value);

// ...
});

We're done with the Select (a.k.a., map) part. Now let’s check the return value.

The selector returns an IObservable<R> for a given T, and we need to return a single IObservable<R>. Again, we’re in a situation similar to IEnumerable<T> where we have many instances of the monadic type yet the return type requires a single instance.

To convert many IObservable<T> into a single IObservable<T>, how about we Subscribe to all of the sequences and push all of their values to the observer? It’s a similar concept to the inner foreach that flattens the IEnumerable<T>.

public static IObservable<R> SelectMany<T, R>(this IObservable<T> source, Func<T, IObservable<R>> selector)
=> source.Subscribe(value => selector(value).Subscribe( ? ));

Uh oh, from where do we get an observer to pass to Subscribe? And how do we return an IObservable<R> from this expression? Subscribe returns IDisposable, so that's not helpful.

What we need is an implementation of IObservable<T> that we can return from SelectMany, so that when Subscribe is called with an IObserver<T>, that observer is available to our expression. Then we can pass the observer to the projected observables' subscriptions.

If this seems backwards, it’s because it is. The reactive IObservable<T> interface is dual to the interactive IEnumerable<T> interface. They are structural opposites, so we must think oppositely when solving structural problems like this one.

IEnumerable<T> is synchronous because the caller of IEnumerator<T>.MoveNext blocks while the next value is computed. IObservable<T> is asynchronous because after Subscribe is called with an IObserver<T>, the caller is not necessarily blocked and may continue executing. OnNext is called when values are computed in the future.

Using the same observer to subscribe to each of the projected sequences is similar to yielding from all of the projected enumerables in our IEnumerable<T> implementation above; however, we don't have a language integrated feature like Iterator Blocks, so we’ll have to use Rx's Observable.Create function instead. The Observable.Create function takes a function that has an IObserver<T> argument, which we’ll use to make the inner subscriptions.

public static IObservable<R> SelectMany<T, R>(this IObservable<T> source, Func<T, IObservable<R>> selector)
=> Observable.Create<R>(observer => source.Subscribe(value => selector(value).Subscribe(observer)));

And now we’re done with the Many (a.k.a., flat) part.

Note: I’ve avoided addressing the following issues in the code example above for the sake of simplicity – the focus here is on the essence of the monadic operations, not the implementation details of how to properly implement operators.

  1. This implementation violates one of Rx's contracts: Serialized notifications per observer (§4.2 in the Rx Design Guidelines). If any of the inner observables push notifications on different threads, then a downstream observer may receive overlapping notifications, which is very bad. I'm not including a proper implementation in this blog post, but we can imagine that synchronizing calls to OnNext within a lock would suffice.
  2. This implementation fails to properly support cancellation since we’re ignoring the IDisposable object that Subscribe returns. A proper implementation must hold the disposable for each inner observable, until it terminates. Furthermore, the inner disposables must be tied to the lifetime of the outer disposable, such that if the outer disposable is disposed, then all of the inner subscriptions are cancelled as well.
  3. This implementation fails to properly handle OnError and OnCompleted notifications from the source observable. Furthermore, it improperly handles OnCompleted notifications from the inner observables by completing the sequence as soon as one of the inner observables completes, rather than completing when all observables, including the source, have completed.

The implementation of bind for IObservable<T>, like bind for IEnumerable<T>, is a flattening and a mapping; however, due to the natural asynchrony of the IObservable<T> interface, bind provides an asynchronous merging rather than a synchronous flattening.

In other words, the SelectMany operator for IObservable<T> is actually Select + Merge, whereas SelectMany for IEnumerable<T> is actually Select + Concat.

The bind operator for IEnumerable<T> accepts a sequence as input. For each value, an inner sequence is projected and iterated. Due to the synchronous nature of IEnumerable<T>, only one inner sequence can be iterated at a time. The bind operator will not yield any values from subsequent projections until it has completely iterated over the current inner sequence.

Conversely, the bind operator for IObservable<T> merges all inner subscriptions together, such that when any inner subscription produces a value, it’s immediately pushed downstream.

In both implementations of bind, the inner sequences are being flattened down into a single output sequence.

We’re Not In Kansas Anymore

Language Integrated Query (LINQ) introduces the monad as a first-class citizen in C#. Query Comprehension Syntax is a syntax for monads.2

They are monads because, other than satisfying the monad laws, each monad is a generic type with a specific purpose, paired with a bind function that implements sequential composition over that type, while preserving its purpose.

We can understand how bind represents sequential composition by following this logic:

  1. The bind function SelectMany(M<T> source, Func<T, M<R>> selector)has two parameters.
  2. The source parameter is of the type M<T>. Think of M as the monadic type, which contains values of type T. Monadic types are generic types. (This is what is meant by “amplification” or “embellishment”.)
  3. The selector function projects a value of type T into M<R>, which is the monadic type M containing values of type R.
  4. M<T> represents a computation of values of type T. More precisely, M<T> represents an antecedent computation, because a value of T must be computed before it can be passed to selector.
  5. The selector function represents a subsequent computation, because it accepts a value of type T and returns M<R>. And just like M<T>, M<R> represents a computation of values of type R.
  6. Both M<T> and M<R> are of the monadic type M, although they may contain different types. M<R> can be used as the source parameter in a subsequent bind operation.
  7. The bind operator can be chained together across may projections over the monadic type M, such that M<T> goes to M<R> goes to M<Q>, and so on. Each projection through bind is a linear step in a larger computation.
  8. Therefore, the bind function sequentially composes antecedent computations with subsequent computations, while preserving the purpose of the monadic type.
  9. However, the cardinality of M may be greater than 1, depending on the actual type of M. Containing more than one T means having more than one M<R> because each T will be projected into an instance of M<R>.
  10. The bind operator flattens M<M<R>> into a single M<R> to ensure that its result is a single container of the monadic type M. Flattening preserves sequential composition regardless of the cardinality of M.

For example, IEnumerable<T> represents a lazily-computed, synchronous sequence of values of type T. That’s its purpose, which SelectMany must preserve. As the source for SelectMany, IEnumerable<T> represents an antecedent computation of values of type T, because each T must be computed before it can be projected. SelectMany iterates the source, invoking the selector function for each value of type T, and projecting it into IEnumerable<R>. IEnumerable<R> represents a subsequent computation, because it can be subsequently iterated to generate values of type R. To continue the computation sequentially, IEnumerable<R> can be used as the source for another call to SelectMany. Therefore, SelectMany implements sequential composition.

IObservable<T> represents a lazily-computed, asynchronous sequence of values of type T. It’s similar to IEnumerable<T>, except asynchronous. And how do we consume an asynchronous sequence? Not by foreach, but by passing in a continuation (IObserver<T>). The SelectMany operator for IObservable<T> subscribes to the source with its own IObserver<T>. Sequential composition is implemented by awaiting values pushed into the IObserver<T> and then pushing them as input into a subsequent computation, represented by IObservable<R>. As far as sequential composition goes, it doesn’t actually matter that IObservable<T> is asynchronous, because going from T to R remains sequential.

IObservable<T> is asynchronous because in order to get data out of it, you must pass in a function (a continuation) to be invoked later. It’s the so-called, “continuation passing style”, or CPS for short. Thus, IObservable<T> is the continuation monad.

The bind operator enables us to write sequential composition in a declarative style. SelectMany sequentially composes two parts of our query, while hiding the imperative- or CPS-style mechanisms that are required to read values from the monadic type. We can think of a monadic type as a container of values, and bind lets us compose them without having to explicitly specify how to extract their values. We don’t have to use foreach for enumerables or call Subscribe for observables, until we want to leave the monad. That’s why we use foreach and Subscribe only at the ends of our LINQ queries.

This pattern works with more than just sequences. There are many types of monads; e.g., the “Maybe” monad implements null propagation, as Wes Dyer describes here, while introducing monads. There’s a State monad that represents mutability in a pure functional language like Haskell, an I/O monad that represents input and output, also in Haskell, there’s the Reader monad for computing within an environment, similar to having global or static fields (you guessed it: Haskell), among others.

Monads provide a declarative programming model, but note the relation to imperative-style code. Two consecutive from clauses in a LINQ query is compiled into a single call to SelectMany. It’s like having a semicolon at the end of each line. The first from expression is evaluated, and then the second from expression is evaluated, with the value of the previous expression still in scope.

from x in Xs //;
from y in getYs(x) //;
select (x, y)

This query looks similar to our implementation of SelectMany for IEnumerable<T>, which consisted of nested foreach loops! And in fact it works exactly the same, because this query simply compiles into a call to SelectMany.

A query is a single expression pieced together by applying a sequence of operators, rather than a series of statements that are pieced together by a sequence of semicolons. It’s the composability of the monad that enables us to write a query as a single expression, and it’s the expression syntax that makes the declarative programming style appear similar to the imperative style.

This style of programming is declarative because we can use operators like where, join and group by without sacrificing the readability of the program with all of those foreach loops that we’d need to write if we weren’t using the list monad, or all of those continuations that we’d need to write if we weren’t using the continuation monad.

Essentially, monads enable us to declare what we want the program to do rather than how the program should do it. The “how” part is hidden by the monad, and it differs between monads.

from x in Xs
where x != null
from y in getYs(x)
join z in Zs on y.P equals z.P
select (x, y, z)

Is the above an interactive query over IEnumerable<T> or a reactive query over IObservable<T>?

Or maybe it’s a hybrid? Perhaps Xs is an IObservable<T> and getYs returns an IEnumerable<T>?

Each is a valid possibility. There’s no way of knowing by looking at the query alone!

The monadic pattern is a generalization on composition with embellishments, so it’s no surprise that queries over different types can look the same; however, some operators don’t make sense in some monads. For example, Rx defines the Sample(TimeSpan) operator for IObservable<T> to sample elements by time, which only makes sense because IObservable<T> is asynchronous. Sampling by time doesn’t make sense in IEnumerable<T> because it’s synchronous; however, enumerables do have the ability to sample elements by data; i.e., the Where operator. We can also sample elements by data in IObservable<T>, thus Rx provides a similar Where operator.

Essentially, a monad consists of an embellishing type and a function, bind, that preserves the embellishment through sequential composition.

A monad must also have a unit function, which enters the monad. For example, Rx provides Observable.Return (a.k.a., just).3

Technically, a monad must also satisfy some simple mathematical laws: left identity, right identity and associativity. I won’t go into them here.

Perhaps what makes monads seem mysterious and confusing is how such a simplistic functional trick can be so powerful as to warrant an entirely new syntax in languages like Haskell and C#. Are monads confusing because they mysteriously turn complex imperative-style code into a neat declarative-style code that looks eerily similar to imperative style? I’m confused even writing that question.

A comprehension syntax isn’t technically required, it just makes monads nicer to use! C# doesn’t have keywords for every useful operator, so most of the queries we write are in the form of fluent-method call syntax, whereby we chain operators together using dot notation and pass lambda expressions as function arguments.

For example, here’s the same query again written in fluent-method call syntax:

Xs.Where(x => x != null)
.SelectMany(x => getYs(x).Select(y => (x: x, y: y)))
.Join(Zs, xy => xy.y.P, z => z.P, (xy, z) => (x: xy.x, y: xy.y, z: z));

It’s not great, although it’s probably how most languages use monads. Regardless, both syntaxes (comprehensions and fluent-method call) form declarative expressions. The comprehension syntax is often nicer though because it hides all of those lambdas and automatically broadens the scope of query variables; e.g., in the query comprehension written earlier, x is still in scope after join. The C# compiler accomplishes this simply by changing the type of T to a compiler-generated type that carries all of the variables in scope. We must do the same trick ourselves when writing in the fluent-method call syntax if we want to keep earlier values in scope. We typically do that by projecting anonymous types or tuples throughout our query, as I’ve done in the example above.

One last interesting thing to note about monads…

Because of its declarative nature, applying bind forms a lazily-evaluated expression, just like monads in lazily-evaluated functional programming languages. SelectMany and other operators do not cause any side effects when they are applied. They simply return a new instance of the monad, so that we may continue applying operators to build our computation. When we are ready for side effects, then we leave the monad and the computation may begin. How it begins depends on the purpose of the monadic type.

IEnumerable<T> requires code to pull each value through the composition, thus we use foreach.

IObservable<T> pushes values through the composition, once we’ve called Subscribe.

And that’s the mysterious monad. You can get deeper into the concept of monads in Category Theory, but I think that the essence of why they’re useful to us as programmers is described above.

All good, but monads are only half the story!

If I Only Had a Codomain

According to my chart at the beginning of this post, Task<T> is a comonad. Why?

If you don’t believe me, then we’ll try and disprove that claim by implementing bind for Task<T>. If we can implement bind, then it’s probably a monad, right?

We already know how the signature for bind must look based on the pattern that we’ve identified above for enumerables and observables. They each have similar signatures. To go from the enumerable bind to the observable bind we must only do a monadic type replacement. So let’s grab the bind signature for IEnumerable<T> or IObservable<T> and simply replace the monadic type with Task<T>:

public static Task<R> SelectMany<T, R>(this Task<T> source, Func<T, Task<R>> selector)

Looks good, no problems so far.

Since the source is a Task<T>, it seems that we must invoke the selector function with the T that it contains.

public static Task<R> SelectMany<T, R>(this Task<T> source, Func<T, Task<R>> selector)
=> selector(source.Result);

Okay, that compiles, but it’s totally incorrect! Reading the Result property may block, and blocking isn’t preserving the purpose of Task<T>. The purpose of Task<T> is to represent the asynchronous computation of T. The SelectMany function must allow us to sequentially compose asynchronous computations, which means that we must be able to project a Task<T> into a Task<R> without blocking to read T. Essentially, SelectMany must return a lazily-evaluated expression, just like the implementations for IEnumerable<T> and IObservabe<T>.

Great, so how do we get the T out of the Task<T> asynchronously?

For the C# devs out there, our instinct is to define SelectMany as an async method and then await source. And that may seem appropriate, but clearly await must be doing something under the covers in order to get the T from the Task<T> without blocking. How does it do it? Whatever the C# compiler is doing for await, we must be able to do ourselves…

Well, the only way to get the result asynchronously is to pass a continuation to ContinueWith (a.k.a., then). But what is this strange method anyway? Let’s take a look at its signature (one overload in particular4).

public static Task<R> ContinueWith<T, R>(this Task<T> source, Func<Task<T>, R> selector)

Hmm, now that method looks familiar. Is it bind? Not quite. Here’s bind again:

public static Task<R> SelectMany<T, R>(this Task<T> source, Func<T, Task<R>> selector)

The type arguments in the selector function are reversed! Well, they are sort of reversed. The T and R type arguments remain in the same positions, but the generic Task type is in the opposite position. The selector function for ContinueWith has a Task parameter, rather than returning a Task.

So what is the selector in ContinueWith supposed to do?

Instead of being given a value and returning a new, embellished value, it’s given an embellished value and it has to return a new, unembellished value. In other words, it has to extract the value from the comonadic type inside the projection! This makes sense if we think of comonads (and monads) as containers in general.

And that’s how to safely (i.e., in a non-blocking manner) extract a value from Task<T>. We must read the Result property only from within the selector (continuation) supplied to ContinueWith, in order to preserve the purpose of Task<T>.

public static Task<R> SelectMany<T, R>(this Task<T> source, Func<T, Task<R>> selector)
=> source.ContinueWith(task => selector(task.Result)).Unwrap();

Unwrap is also required, because selector returns Task<R>, yet the selector in ContinueWith requires an R. So we make R a Task<R>, thus ContinueWith returns Task<Task<R>>. SelectMany requires Task<R>, so we must apply Unwrap one time to ditch the outer Task.

Recall that bind is a flattening projection (flat map; it’s like going from IEnumerable<IEnumerable<T>> to IEnumerable<T>), whereas here we’re using ContinueWith to compose over an extending projection (Task<R> becomes Task<Task<R>>). The concepts of flattening and extending are opposites. The bind and cobind functions are opposites; therefore, ContinueWith is cobind.

Here, Unwrap is merely the flattening of our bind implementation. It’s how we convert our cobind’s extending projection into bind’s flattening projection.

The Task<T>.Result property (technically, the get_Result method) is the extract function of the Task<T> comonad. The extract function is to comonad as the unit function is to monad. The unit function gets us into the monad, extract gets us out of the comonad. They are opposites!

And now we see why Task<T> is a comonad, and not a monad. Another way of looking at: we can only implement bind in terms of cobind; i.e., we can’t implement SelectMany for Task<T> without calling ContinueWith.

Other than the comonadic laws (analogous to the monad laws), comonads are a triple consisting of a generic type, a cobind function and an extract function. Comonads are the opposite of monads, which are a triple consisting of a generic type, a bind function and a unit function.

As you can see from the code above, it’s fairly easy to convert from cobind to bind. Therefore, we can convert comonads into monads. We can also convert monads into comonads, although I’m not showing it here, but you can imagine that one indicator of a type being monadic rather than comonadic is that when implementing cobind, you’d discover that you must implement it in terms of bind. Try it for yourself.

A major reason that we don’t use Task<T> as a monad is because its comonadic form lends itself naturally to a really useful compiler trick: coroutines. A coroutine is syntactic sugar that allows us to write imperative-style code for continuations, hiding the messy details of the underlying state machine that the compiler must generate. In C#, it’s known as async methods. The await keyword essentially compiles into a call to ContinueWith.

Although we can’t use this compiler trick with monads directly, we can convert a monad into a comonad, and that’s exactly what Rx’s ToTask extension method does. But even better than that, Rx offers an extension method that implements C#’s awaitable pattern for IObservable<T>, thus we can directly await an IObservable<T> to safely (i.e., in a non-blocking manner) extract the final value of the sequence. This works well for singleton observables, especially for aggregations like Sum and ToList.

Because C# has great support for async methods, it’s often recommended to use Task<T> rather than IObservable<T> when your computation only has a single result; however, it appears that for a future release of Rx, Bart De Smet has implemented the new awaitable pattern in C# 7. We’ll be able to define async methods that directly return an observable rather than a Task<T>. This is certainly not a replacement for Task<T> in all async methods, but it may come in handy. To return an observable from an async method, we’ll have to use a new interface, ITaskObservable<T>, which derives from IObservable<T>. It’s just a small grammatical difference, perhaps. One neat thing about ITaskObservable<T> is that it has singleton semantics; i.e., it only calls OnNext once, similar to the fact that Task<T> can only contain a single Result.

Along The Yellow Brick Road…

At the beginning of our journey we had asked the following questions:

“Can we create a cobind function for Lazy<T>? What would it look like? What would it do?”

Along the way we’ve met IEnumerable<T>, IObservable<T> and Task<T>. Now we know how to proceed.

We’ll take the signature for Task<T>’s cobind function and replace all Task<T> with Lazy<T>.

public static Lazy<R> ContinueWith<T, R>(this Lazy<T> source, Func<Lazy<T>, R> selector)

We have a Lazy<T> and a function that requires a Lazy<T>, so should we apply it? (#WWEMD5)

public static Lazy<R> ContinueWith<T, R>(this Lazy<T> source, Func<Lazy<T>, R> selector)
=> selector(source);

Sure, but what about the return value now? We have an R but we require a Lazy<R>.

That’s simple. Let’s make the whole expression lazy by wrapping it in a new Lazy<R>. That makes sense because we must preserve the purpose of Lazy<T>, which is lazy initialization, right?

public static Lazy<R> ContinueWith<T, R>(this Lazy<T> source, Func<Lazy<T>, R> selector)
=> new Lazy<R>(() => selector(source));

When reading Lazy<R>.Value, the selector will first extract the value of Lazy<T>, and with it compute R. In effect, Lazy<T> isn’t evaluated until we attempt to evaluate Lazy<R>, because Lazy<R> depends on Lazy<T>.

Thus, laziness is preserved through composition.

How might our code look when using this ContinueWith implementation for Lazy<T>?

Let’s first define a simple function named Create to take advantage of type inference. It’s just a bit more declarative than having to construct a Lazy<T> ourselves and specify T explicitly.

public static class Lazy
{
public static Lazy<T> Create<T>(Func<T> valueFactory)
=> new Lazy<T>(valueFactory); }

Here’s a simple program that composes two lazy computations:

var i = Lazy.Create(() =>
{
Console.WriteLine("Initializing ‘i’");
return 42;
});

var d = i.ContinueWith(lazy => lazy.Value * 2);

Notice that the construction of Lazy<T> assigned to d is hidden by the cobind function. This is the declarative style of programming, and it appears the same in comonads as in monads.

Using i and d as defined above, what does the following console program print?

Console.WriteLine(i.IsValueCreated);
Console.WriteLine(d.IsValueCreated); Console.WriteLine(d.Value);

Console.WriteLine(i.IsValueCreated);
Console.WriteLine(d.IsValueCreated);
Console.WriteLine(i.Value);

Scroll down to see the output…

.

.

.

.

.

.

False
False
Initializing ‘i’
84
True True
42

Notice that i.IsValueCreated returns False at first, even though d is defined in terms of i. Invoking the ContinueWith function doesn’t cause any side effects, just like how applying SelectMany doesn’t cause any side effects for monads. We can build up a computation with cobind and evaluate it later; i.e., lazy evaluation.

When the program reads d.Value, then i.Value is initialized. Afterwards, i.IsValueCreated returns True, even though the program hasn’t read i.Value directly yet. That’s because d reads i.Value in its continuation, so evaluating d requires i to be evaluated as well. Evaluating the outermost Lazy<T> propagates the act of evaluation up to the source.

Finally, when the program reads i.Value at the end, notice that it’s not reinitialized. Lazy<T> caches the result of its factory function.

We’re Off To See The Wizard

Where does laziness come from? Is it something in the water?

No, it’s something in the function.

A function provides an identifiable name for a sequence of statements in the imperative style. Imperative style programming languages are eager, in that when a program encounters a statement, it’s evaluated immediately. That’s what imperative means. Therefore, laziness is achieved simply by wrapping statements in a function. A function may be evaluated at any time by invoking it, by name or delegate.

A statement evaluates eagerly.

A statement in a function evaluates lazily.

A lambda expression represents an anonymous function with a body consisting of a single statement. We bind a lambda expression to a named variable or parameter, and that becomes its name for the purpose of invoking it. A lambda expression isn’t evaluated at the site at which it’s defined, but at the site at which it’s invoked. Therefore, lambda expressions are lazily-evaluated. They are functions.

In a pure functional language like Haskell, a function consists of expressions, and expressions are interchangeable with values.6 Replacing an expression with its value cannot be done safely in imperative-style languages because an expression may have side effects, and you’d expect those side effects to occur each time that the expression is evaluated. Evaluating a Haskell program is similar to reducing a mathematical expression into its simplest form, yet expressions in Haskell are, perhaps surprisingly, lazily evaluated. Evaluation is deferred until an expression’s value is actually needed by the program. And once an expression has been evaluated, its value may replace every instance of that expression in the program. Perhaps expressions in Haskell are similar to Lazy<T> in C#?

So how is Lazy<T> related to a function?

Well, the Lazy<T> constructor turns a parameterless function into a type

public Lazy(Func<T> valueFactory)

that has a property

public T Value { get; }

which is technically a parameterless function

public T get_Value()

that has the same exact signature as valueFactory, the Func<T> that was used to construct the Lazy<T> instance in the first place.

So can we shake off the Lazy<T> type and implement Func<T> as a comonad?

Let’s try it. We’ll start with the signature of ContinueWith by simply replacing all Lazy<T> with Func<T>.

public static Func<R> ContinueWith<T, R>(this Func<T> source, Func<Func<T>, R> selector)

That’s wonderfully weird and magical looking. And for our next trick, the types will guide us…

We have a Func<T>, and we have a function that takes a Func<T> as its argument.

Note: A function with a function parameter is known as a higher order function. In this case, we’ve got two higher order functions: ContinueWith and selector.

Functions are given data as arguments.

Higher order functions are given functions as data.

Let’s pass our source function to selector.

public static Func<R> ContinueWith<T, R>(this Func<T> source, Func<Func<T>, R> selector)
=> selector(source);

Just like ContinueWith for Lazy<T>, we notice that our return types don’t match. The selector function returns R, but we need a Func<R>.

We must wrap the entire expression in another function!

public static Func<R> ContinueWith<T, R>(this Func<T> source, Func<Func<T>, R> selector)
=> () => selector(source);

Well, that was easy. It looks the same as the implementation for Lazy<T>, but without a Lazy<R> wrapper. The outer function provides the laziness.

For comparison, here’s the Lazy<T> definition from above:

public static Lazy<R> ContinueWith<T, R>(this Lazy<T> source, Func<Lazy<T>, R> selector)
=> new Lazy<R>(() => selector(source));

Is the Func<T> comonad the same as the Lazy<T> comonad?

No, because Func<T> will always compute a value when you invoke it, whereas Lazy<T> only computes its value once. Furthermore, Func<T> may return a different value each time that it’s invoked (in a side-effecting language like C#, this is common.)

Therefore, Func<T> is lazy, but it may also cause side effects each time that you invoke it, whereas Lazy<T> may only cause side effects once.

Should we use the Func<T> comonad at all?

Probably not. It offers no real advantage compared to composing lambda expressions, although it does hide the closure (for better or worse). Perhaps Func<T> is the closure comonad? A blog post for another day.

Meet The Lazy Wizard

The laziness of Lazy<T> is related to the idea that applying co/bind forms a lazily-evaluated expression.

We can declare computations over IEnumerable<T> and IObservable<T> that are lazy; i.e., there are no computational side effects until we iterate or subscribe, respectively. And likewise for Task<T>, we can compose together many ContinueWith calls without causing any side effects; that is, until the Task<T> completes. There’s obviously a race condition here, but the point is that ContinueWith doesn’t block. ContinueWith forms a lazily-evaluated computation, although it’s possible that Task<T>.Result was already computed before ContinueWith was applied, which may result in the continuation executing synchronously.7

And so it goes for Lazy<T>. The purpose of Lazy<T> is simply lazy evaluation. That’s what ContinueWith needs to preserve across compositions; although, it’s possible that the lazy value may have already been computed by the time the program attempts to extract it. Just like Task<T>.Result, once it’s computed, we can keep reading Lazy<T>.Value over and over again without recomputing it. We can even build new computations on top. In the case of Lazy<T>, a new computation over a pre-computed value would simply reuse the pre-computed value as the source for the new computation, without having to reevaluate the original source; i.e., memoization of a parameterless function. This is illustrated in the code example above with the explicit call to i.Value at the end of the program: the side effect of writing to the console upon initializing i.Value had already occurred earlier in the program, and it’s not repeated again at the end.

All of these co/monadic types are lazy, but they also differ in behavior when it comes to laziness. The laziness of Lazy<T> and Task<T> isn’t permanent, it’s a one-time use. The laziness of IObservable<T> and IEnumerable<T> is potentially repeatable.

To put it another way: Lazy<T> and Task<T> cache their value to avoid duplicating side effects, whereas IObservable<T> and IEnumerable<T> may cause side effects each time that they are “invoked”. I smell a pattern.

Pulling Back The Curtain

Lurking in the shadows, as usual, are side effects. Being able to control side effects is a major benefit of having these co/monadic constructs in C#.

The point of Lazy<T> is to ensure that any side effects only occur once, when computing a single value of T. That’s why Lazy<T> is often used to represent a static singleton in C#. If computing the value causes side effects, then we don’t want to compute the value until it’s actually needed by the program. And once it’s been computed, we don’t want to repeat the computation. Of course, there are other reasons to have singletons, but using a static readonly field without Lazy<T> is generally good enough for most singletons. It’s only when we want to defer side effects that we use Lazy<T> instead.

A synchronous sequence of values is similar to Lazy<T> but with a larger cardinality: Lazy<T> only has a single Value, whereas IEnumerable<T> has many. But there’s another significant difference: IEnumerable<T> doesn’t necessarily cache its results. Each time that you iterate, you may get different values. Furthermore, an enumerable query is a computation that may have its own side effects regardless of whether the source IEnumerable<T> caches its results or not, but those side effects aren’t realized until you iterate it.

Task<T> represents an asynchronous function that eventually computes a Result. We use Task<T> to avoid duplicating the side effects of an asynchronous computation. Once it completes, subsequent reads of the Result property will always return the same value, thus side effects won’t occur again. If we actually want to repeat the side effects, then we must invoke the async function again, which will return a new instance of Task<T>.

An asynchronous sequence of values is similar to Task<T> but with a larger cardinality: Task<T> only has a single Result, whereas IObservable<T> has many. But there’s another significant difference: IObservable<T> doesn’t necessarily cache its results. Each time that you Subscribe, you may get different notifications. Furthermore, an observable query is a computation that may have its own side effects regardless of whether the source IObservable<T> caches its notifications or not, but those side effects aren’t realized until you Subscribe.

Side effects are related to temperature:

Cold means that side effects may occur.

Hot means that side effects do not occur.

So how does the laziness of co/monads relate to temperature?

Let’s consider the properties of the types of the co/monads separately from their co/bind implementations.

The Func<T> type may return a different value each time that it’s invoked, and that’s the cold programming model. Therefore, functions are cold.

The Func<T> comonad described above defers side effects until the outer-most function is invoked. That’s also the cold programming model, and essentially its purpose. The Func<T> comonad composes functions lazily. But it’s also pointless, given that Func<T> is already cold. The Func<T> comonad simply adds an unnecessary layer of cold indirection.

The Lazy<T> type is initially cold, because it defers side effects until its Value property is read, yet it becomes hot for subsequent reads of the Value property. In other words, the first time that Value is read it may cause side effects, but subsequent reads definitely will not. (There’s also a race condition if Value may be read by multiple threads simultaneously, but that's similar to Task<T>.Result, so let’s address the race condition in our analysis of Task<T> that follows.)

The Lazy<T> comonad is initially cold, because cobind defers side effects until the outer-most Value property is read, yet it becomes hot for subsequent reads of the outer-most Value property.

The Task<T> type is initially cold, because it can be constructed by passing in a function and it defers side effects until its Start method is called; however, it can only be executed once. Calling Start again throws an exception. Furthermore, the Task<T>.Result property is always hot; therefore, Task<T> transitions to hot when Start is called.

However, it’s rare to acquire a Task<T> that hasn’t already been started. Typically we use Task.Run, Task.Factory.StartNew, or invoke some BCL or user-defined function that returns a Task<T>. In these cases, Start has already been invoked, thus the computation may have already begun even before the caller acquires a reference to the Task<T>. The caller can’t invoke Start again, and reading the Result property never causes side effects, thus our typical usage of Task<T> is hot, yet the Task<T> type itself is technically cold. In other words, notice that we typically acquire an instance of Task<T> by invoking a function that returns it. Functions are cold, as stated previously. Therefore, each time that we invoke a function that performs an asynchronous computation, it may return a new instance of Task<T> to represent its Result, which is always hot. Invoking the function is essentially like invoking the Start method yourself. In that case, Task<T> is hot because it has already been started by the function that returned it, yet the Task<T> type itself is technically cold because it must be started at some point.

Alternatively, there’s another way to get an instance of Task<T> (a.k.a., “future”): use a TaskCompletionSource<T> (a.k.a., “promise”). TaskCompletionSource<T> replaces the need to call Start by essentially making the Result property assignable. The Task<T> that it returns is hot because it’s constructed without a function (via an internal constructor), thus it has no computation to be deferred. A consumer of the Task<T> is unable to call Start, thus it’s unable to cause side effects, and so this usage of Task<T> is hot. This doesn’t change the fact that the Task<T> type itself is cold, at least as far as its public API is concerned. The promise mechanism is useful because it decouples the two features of a future: It separates its ability to defer a computation from its ability to compute asynchronously; i.e., TaskCompletionSource<T> makes Task<T> hot without having to begin a computation by calling Start. Instead, the computation occurs externally and terminates when Result is assigned (or the task faults, or it’s canceled – but we’ll ignore these states for the purposes of this analysis).

The Task<T> comonad is initially cold, because cobind defers side effects until the Task<T> completes. It’s cold because the continuation is a function, and functions are cold. However, it becomes hot once the Task<T> completes. If you apply ContinueWith after the Task<T> has completed, then it may execute the continuation synchronously. There’s a race condition.

Therefore, the Lazy<T> and Task<T> types have the same temperature and similar behavior; i.e., they are both initially cold and then transition to hot by caching their value, which avoids duplicating side effects. Lazy<T>.Value blocks until its computation has completed; likewise, Task<T>.Result blocks until its computation has completed. The difference between them is that Lazy<T> computes its Value synchronously, and Task<T> computes its Result asynchronously. This means that to extract the Value from Lazy<T>, a program must block. To extract the Result from Task<T>, a program can do other things while Result is being computed. Since the only purpose of Lazy<T> is to defer its computation of Value, it doesn’t make sense to have a method like Run; it would simply block until Value has been computed, thus defeating the purpose of Lazy<T>. Since the purpose of Task<T> is asynchrony, in addition to deferring its computation of Result, it’s actually useful to hide its coldness behind a function that returns a hot Task<T>.

As comonads, both Lazy<T> and Task<T> are initially cold, yet they become hot once their value has been computed. In the case of Lazy<T>, cobind is synchronous (it blocks until Value is computed), whereas for Task<T>, cobind is asynchronous (it doesn’t block, although it may execute the continuation synchronously when Result has already been computed and the ExecuteSynchronously flag was set). In either case, the transition from cold to hot means that there’s a race condition.

Therefore, comonads and their comonadic types are cold, which is the definition of deferred execution. In addition, comonads that transition from cold to hot have a race condition.

The IEnumerable<T> type is either cold, as in C# Iterator Blocks, or it’s hot, as in pre-computed lists and dictionaries.

The IEnumerable<T> monad is cold, because bind defers side effects until the enumerable that it returns is iterated. This is true even if the IEnumerable<T> that is the source of the computation is hot. Although no side effects will occur when iterating a hot enumerable such as a List<T>, the computation itself (a.k.a., the query) may cause side effects when it’s iterated, and it can only cause side effects when it’s iterated.

The IObservable<T> type is either cold, as in Rx generators like Return, Range, Generate, Timer, Interval and Create, or it’s hot, as in Subject<T>, events and hot Task<T> conversions.

The IObservable<T> monad is cold, because bind defers side effects until an observer subscribes to the observable that it returns. This is true even if the IObservable<T> that is the source of the computation is hot. Although no side effects will occur when subscribing to a hot observable such as a Subject<T>, the computation itself (a.k.a., the query) may cause side effects when an observer subscribes, and it can only cause side effects when an observer subscribes. Furthermore, due to the asynchrony of IObservable<T>, some interesting and useful combinations of cold and hot temperatures are also possible. (See Hot and Cold Observables for details.)

Therefore, the IEnumerable<T> and IObservable<T> types can either be cold or hot, and they have similar behaviors; i.e., they both represent sequences, which may or may not cause side effects when computing their elements. The difference between them is asynchrony. IEnumerable<T> computes its elements synchronously. IObservable<T> computes its elements asynchronously. This means that to pull the next value from an IEnumerable<T>, a program must block. To be pushed values from an IObservable<T>, a program calls Subscribe and can immediately do other things. In either case, iterating an IEnumerable<T> or subscribing to an IObservable<T> doesn’t necessarily cause side effects. It depends on the implementation; e.g., List<T> does not cause side effects, so it’s hot. An Iterator Block may cause side effects, to it’s cold. Observable.FromEventPattern returns an observable that wraps an event, which is hot. Observable.Range generates notifications each time that you subscribe, so it’s cold.

As monads, both IEnumerable<T> and IObservable<T> are initially cold. Applying bind on an IEnumerable<T> or IObservable<T> won’t cause any side effects until it’s iterated or subscribed, respectively. However, depending upon the implementation, they may or may not transition to hot.

Therefore, monads are cold regardless of the temperature of their types, which is the definition of deferred execution. In addition, monads that transition from cold to hot have a race condition.

Monads and comonads have identical temperatures with regard to composition – they are both cold. Therefore, consecutively applying co/monadic operators is free of side effects. Co/bind implements deferred execution, such that side effects won’t occur until you leave the co/monad.

There’s No Place Like Home

We use monads and comonads to control side effects. It’s the purpose of the property of laziness in co/bind. Co/monads allow us to write queries declaratively, without causing side effects. We control when side effects may occur by reading/invoking/iterating/subscribing at a later time.

Lazy<T> is a comonad with the purpose of causing side effects only once, attached to a blocking computation. The Func<T> comonad is even closer to the metal in that it represents repeatable side effects; although, perhaps it’s not very useful as a comonad.

Task<T> is a comonad with the purpose of causing side effects only once, attached to an asynchronous computation. However, the laziness of Task<T> is subtle. Due to a race condition that is common in the Task<T> implementation of cobind, we must either live with side effects possibly occurring immediately (although asynchronously in general), or we can build our computation on a promise (TaskCompletionSource) and glue it to the future (Task<T>) when the program is ready for side effects. One really nice way of doing that is to define a coroutine; a.k.a., an async method in C#. An async method let’s you await a Task within the imperative style of programming, meaning that typical control flow constructs are allowed without having to resort to implementing a complex state machine over CPS (because the compiler does it for you). Upon the first await, if the Task didn’t complete synchronously, then a promise is returned to the caller of the async method and the remainder of the computation from that point is referenced in a continuation that is attached to the Task via cobind. Side effects of the downstream computation can be controlled easily by simply holding the Task rather than applying await. The program can apply await at any time in the future to continue the computation with its Result. The race condition, in this case, is irrelevant.

IEnumerable<T> is a monad with the purpose of causing repeatable side effects, attached to a blocking computation.

IObservable<T> is a monad with the purpose of causing repeatable side effects, attached to an asynchronous computation.

Even though a sequence may be hot or cold, applying the bind operator (or any operator) itself doesn’t cause any side effects, until the sequence is iterated/subscribed. Furthermore, an operator may cause side effects each time that its return value is iterated/subscribed. The monadic bind function is about repeatable side effects, regardless of the temperature of its monadic type.

In conclusion:

Comonads represent lazy computations of single values, are initially cold, and transition to hot when the computation completes.

Monads represent lazy computations over zero or more values, depending on the type. They are initially cold, and may transition from cold to hot, depending on the type.

And why is any of this important?

Because a language like C# is full of side effects. Being able to introduce side effects so easily is often what makes C# really useful, but it’s also what makes many C# programs really confusing and buggy.

Controlling side effects in a declarative manner makes programs much easier to reason about, and to prove correctness. Although, some parts of a program are just much easier to write using imperative side effects.

Therefore, the careful combination of declarative and imperative styles within the same program is often the best choice. Functions generally help with the former, as in the procedural programming style. Monads also help with the former, computing over sequences in a declarative style. Comonads (via coroutines) generally help with the latter, when single-valued asynchronous computations are required.

These techniques are becoming even more important as modern programs increasingly depend on concurrency.

References

Notes

  1. The name of SelectMany is similar to the name of the Select operator, which is the same as the SELECT operator in SQL. Likewise, the Where operator in LINQ is named the same as the WHERE operator in SQL, and so on. When LINQ was invented, apparently the designers had chosen SQL-like names for the operators to make developers more comfortable. Most developers were used to SQL, but didn't have a clue about monads. And this naming convention works of course because SQL is a monadic comprehension syntax!
  2. The C# compiler actually looks for a special signature of a method named SelectMany rather than the one shown above. It must have an intermediate selector parameter in addition to the result selector. It's just a performance optimization and so I won’t elaborate on it here.
  3. Rx actually provides several generators, including Range, Generate, Timer, Interval and Create. There are many ways to enter the continuation monad. Likewise, there are many ways to enter the list monad, IEnumerable<T>, because there are many types that implement IEnumerable<T>. And C# offers another option: Iterator Blocks.
  4. The FCL implementation of ContinueWith is defined as an instance method. Here, I’ve rewritten it as an extension method and renamed its parameters, for parity with the SelectMany method. Other than those insignificant changes, it remains identical to the definition in the FCL.
  5. What Would Erik Meijer Do?   https://twitter.com/headinthebox/status/543919077752729600
  6. https://wiki.haskell.org/Referential_transparency
  7. The TPL doesn’t invoke continuations synchronously unless you opt-in, and even then it’s not guaranteed. For the most part, the TPL is designed to introduce concurrency. This is in contrast to the design of Rx, which is free-threaded and rarely introduces concurrency. Operators introduce concurrency when you explicitly specify a concurrency-introducing scheduler, although operators that involve some kind of timing may default to a concurrency-introducing scheduler if you don’t pass in one; e.g., Throttle, Sample, Buffer, etc. This is because Rx is primarily about taming concurrency, not introducing it.

Tags: , ,

Async | C# | Patterns | Rx

March 10, 2008

ContractN on CodePlex

I've just released the first beta version of ContractN, which is a fully managed programming-by-contract framework that uses the CLR's built-in support for aspect-oriented programming (AOP), which provides a way to seamlessly and intuitively describe the required pre- and post-conditions of a type's contract.

I got the idea based on this blog post by Sasha Goldshtein; although, Sasha's implementation and that of ContractN have little in common other than their purpose.  I liked the way Sasha used dynamically-generated code, but other than that I wasn't satisfied having to create the extra stuff that was required to enforce conditions.

Advantages of ContractN

The real benefit of ContractN, IMO, is that there's no overhead in terms of setting up your classes or their consumers, and you don't have to worry about breaking encapsulation or having to create any supporting interfaces or classes.  You just need to derive an object from ContractN.ProgrammingByContract. (It's a shame that this is required though - I hope AOP is eventually introduced as a C# language feature; although, I suspect that the CLR would have to participate as well to alleviate the need for subclassing.)

ProgrammingByContract is an abstract base class that sets up the AOP infrastructure, which is used to intercept method invocations and define conditions as attributes.  After defining a class that inherits from ProgrammingByContract, simply add the appropriate attributes wherever you'd like to enforce certain conditions on the public contract of your type.  Here's a really simple C# 3.0 example:

public class Person : ContractN.ProgrammingByContract
{
  [InRequired, OutRequired]
  public string Name { get; set; }
}

The example illustrates that there are some built-in conditional attributes for basic argument checking, such as InRequiredAttribute, OutRequiredAttribute and ReturnRequiredAttribute, which throw an exception when an argument (or in the case of the latter two, output parameters and a return value) are null.  I plan to add more conditions in the future, and the framework was designed to be flexible enough so that you can define your own conditions as well.

Custom Conditions

To define a new condition, create a class that derives from either PreConditionBaseAttribute, PostConditionBaseAttribute or ConditionBaseAttribute and override the abstract members.  Usage should be intuitive depending upon the implementation that you need, but you can look at the source code for InRequiredAttribute (beta 1) as an example.

Dynamic Conditions

I decided to take a different approach than Sasha in terms of how dynamic conditions are implemented.  I wanted to take advantage of static type checking instead of hard-coding constraints as strings.  PreConditionAttribute and PostConditionAttribute allow you to specify a method name on the decorated type that will be invoked automatically whenever any public method is called.  That also includes constructors and accessors for properties and events.  (Note that these attributes are also valid on a property, constructor, method or event.)  The method signature can vary depending upon your needs; generally it will be a parameterless instance method, though static methods are allowed as well.  The types of arguments that are accepted encapsulate information about a particular invocation.  For example, you could define a method that accepts an argument typed as ContractN.PropertyCall and the method will be invoked automatically whenever a property get or set accessor is invoked, pre- or post-invocation depending upon the attribute used.  Within these methods you can write code that will actually be part of the class that it's meant to constrain, which maintains encapsulation, makes it easy to share code between constraints and allows access to state with compiler support.

Here's a more sophisticated example than above.  It makes use of pre- and post-conditions described above and also implements ad-hoc conditions by providing an implementation of ICondition:

[PreCondition("TestStatic")]
[PreCondition("TestInstance")]
class AdvancedFeatures : ProgrammingByContract, ICondition
{
    #region Public Properties
    public string TestProperty { get; set; }
    #endregion

    #region Methods
    private static void TestStatic(PropertyCall info)
    {
        Console.WriteLine("Pre Static Property: {0}", info);
    }

    private static void TestStatic(MethodCallBase info)
    {
        Console.WriteLine("Pre Static Default: {0}", info);
    }

    private void TestInstance(MethodCallBase info)
    {
        Console.WriteLine("Pre Instance Default: {0}", info);
    }

    public void NoOperation()
    {
    }
    #endregion

    #region IPreCondition Members
    void IPreCondition.Apply(ConstructorCall info)
    {
        Console.WriteLine("Pre: {0}", info);
    }

    void IPreCondition.Apply(MethodCall info)
    {
        Console.WriteLine("Pre: {0}", info);
    }

    void IPreCondition.Apply(PropertyCall info)
    {
        Console.WriteLine("Pre: {0}", info);
    }

    void IPreCondition.Apply(EventRegistrationCall info)
    {
        Console.WriteLine("Pre: {0}", info);
    }
    #endregion

    #region IPostCondition Members
    void IPostCondition.Apply(ConstructorCall info, MethodCallReturn returnInfo)
    {
        Console.WriteLine("Post: {0}", info);
    }

    void IPostCondition.Apply(MethodCall info, MethodCallReturn returnInfo)
    {
        Console.WriteLine("Post: {0}", info);
    }

    void IPostCondition.Apply(PropertyCall info, MethodCallReturn returnInfo)
    {
        Console.WriteLine("Post: {0}", info);
    }
    
    void IPostCondition.Apply(EventRegistrationCall info, MethodCallReturn returnInfo)
    {
        Console.WriteLine("Post: {0}", info);
    }
    #endregion
}

When the above class is used as follows:

AdvancedFeatures features = new AdvancedFeatures();
features.TestProperty = "A test value";
features.NoOperation();

The console output is:

Pre Static Default: ContractN.ConstructorCall
Post: ContractN.ConstructorCall
Pre: ContractN.PropertyCall
Pre Instance Default: ContractN.PropertyCall
Pre Static Property: ContractN.PropertyCall
Post: ContractN.PropertyCall
Pre: ContractN.MethodCall
Pre Instance Default: ContractN.MethodCall
Pre Static Default: ContractN.MethodCall
Post: ContractN.MethodCall

One of the really neat uses for the PreConditionAttribute that I've found is to have ObjectDisposedException automatically thrown instead of having to check for it everywhere.  Here's an example that builds off of the Person class in my first example:

[PreCondition("NotDisposed")]
public class Person : ContractN.ProgrammingByContract, IDisposable
{
    [InRequired, OutRequired]
    public string Name { get; set; }

    private bool disposed;

    private void NotDisposed()
    {
        if (disposed)
            throw new ObjectDisposedException(this.GetType().FullName);
    }

    public void Dispose()
    {
        disposed = true;
    }
}

After the Dispose method is called, public members are no longer accessible to external types.  All calls will result in ObjectDisposedException automatically!

Conclusion

ContractN is more of a hobby project for me now than anything else, but it has potential and I'd really like to see it blossom into something useful.  If you have any suggestions for improvements please let me know and I'll consider them for the next beta release.  (I'm going to establish a timeline on-the-fly as I get ideas for features.)  I have some interesting ideas for new conditions and I hope to blog about them in the future, so don't wander too far off if you're interested...

February 13, 2008

Sandcastle Build Component Templates 1.1 Released

The Sandcastle Build Component Templates 1.1 release provides two C# Item Templates that you can use to easily create custom Sandcastle Build Components in Visual Studio 2005 and Visual Studio 2008.

In this blog post I intend to provide some details about the templates, with more focus on hosted components and DocProject's APIInstructions for template installation and usage can be found here.

Basic Sandcastle Build Component Item Template

The Basic Sandcastle Build Component item template is a multi-file Visual Studio Item Template, written in C#, that provides a working example of a custom build component that can be used for command-line builds or with editor support in automation tools such as DocProject and the Sandcastle Help File Builder.  It comes with a built-in graphical editor and a dynamic sub property for the DocProject Properties window, which can be shown by expanding the component in a build component stack (for more information about the stack properties, see How To Use Third-Party Sandcastle Components in DocProject).

Hosted Sandcastle Build Component Item Template

The Hosted Sandcastle Build Component item template is the same as the basic template, but it also provides direct access and a working example of how to gather information from DocProject's API, thus DocProject is required in order to compile the build components that are created based on this template.  Although, DocProject isn't actually required to use the components if you provide a fall-back initialization path.

Initialization Paths

The template's Configuration class provides two initialization paths, one of which is a fall-back that's automatically invoked if the DocProject host API is not detected.

Note: The host API is never available during builds - it's only available when the component's editor is displayed.

Fall-back initialization works since the template depends upon just-in-time (JIT) compilation.  If you properly encapsulate the use of host interfaces, as the template does out-of-the-box, then you can control the initialization path of the component's Configuration class.

The template's Host class is used to encapsulate host interfaces and services, and it provides a way to safely check, at runtime, whether a particular host is reachable.  Support for DocProject's host API is built-in to the template.  All other hosts will use the fall-back initialization path automatically unless you add support for them to the Host and Configuration classes.  The only requirement is that the host invokes your component's editor by calling its EditValue method and passes in an IServiceProvider implementation that can be used to obtain a reference to one or more of the host's services.  (On a side note, the value argument should be set to a System.Xml.XPath.IXPathNavigable implementation that's positioned on the outer XML of the component's configuration, which includes the <component> element itself.)

Modifying Project Options

The template gathers read-only data as an example and does not actually modify any build settings or project options at runtime.  If you want your component implementation to write to the API, then make sure there's a clear relationship between your component and the changes that it's making.

For example, a proprietary AuthorBuildComponent changing the value of the Generate root API topic setting when edited will probably not be obvious to end-users.  But it might be appropriate for the same component editor to add external sources if, for example, it were to provide a way for an end-user to create a list of associations between sources and author names, much like the Version management dialog does with version information.   In this case, the component's editor could save the input-to-author mappings as its inner XML configuration data and add the external sources as a convenience for the user.

Hosted Components as Plug-Ins

Hosted build components are not meant to be a plug-in mechanism for DocProject, per se, but instead are a plug-in mechanism for Sandcastle.  DocProject's API is exposed merely to provide a more user-friendly and integrated experience when editing Sandcastle build components in DocProject.

If you want to create a true DocProject plug-in, then create a custom build engine provider or a build process component, which can provide control over all aspects of DocProject, including the entire build process, engine settings, project options, tool bars, tool windows, menus, custom wizard steps in the New Project Wizard, etc.

DocProject's API for Hosted Components

With the recent release of DocProject 1.10.0 RC, a new interface was added that would provide build components access to DocProject's API, for their graphical editors.  The API can be used to gather information about the environment in which the component's editor is being hosted.  The component can then provide automatic configuration and apply appropriate editor defaults, simplifying configuration for end-users while also adhering to the project paradigm that DocProjects and DocSites use to store their settings.

The purpose of the gateway interface, IDocProjectHost is to provide components with access to DocProject-specific interfaces and types for the context in which the component's editor is being hosted.  Here's the interface's definition:

public interface IDocProjectHost
{
    IDocProject Project { get; }
    BuildSettings Settings { get; }
    DocProjectOptions Options { get; }
    IBuildEngine Engine { get; }
    IEnvironmentHost EnvironmentHost { get; }
    bool RunningInVisualStudio { get; }
}

The Hosted template gets a reference to an IDocProjectHost implementation through the service provider that's passed to the editor's EditValue method.  If the host is available then the Configuration class's InitializeForDocProject() method is invoked instead of the fall-back Initialize() method (see above for information about initialization paths).

The InitializeForDocProject method uses the types and interfaces exposed through IDocProjectHost to gather information, which it then stores in private fields.  The fields back properties that are used by the template's EditorControl class.  They are typed as nullable primitives, such as System.String and bool?, so if the values are never set then they'll remain null, in which case the EditorControl will substitute them with "Unavailable" for its read-only display.

Developers are free to remove this code (actually, it's recommended) and replace it with appropriate code for gathering information that's required by the component being developed.  However, the same pattern can be used to safely encapsulate data that can be initialized with or without a host API.

Note: The configuration fields must not be typed as host-specific types or interfaces if you want to be able to use your components in other environments, without the host API installed.

Conclusion

With the release of DocProject 1.10.0 RC and the corresponding Hosted Sandcastle Build Component Item Template 1.1, developers can now have seamless integration in DocProject for their custom build components' editors, providing a more robust platform on which to configure and build documentation while potentially simplifying configuration for end-users through automation and inferred default settings.

January 17, 2008

DocProject Roadmap

As you may have already heard, Sandcastle [1] was officially released to the web (RTW) through CodePlex on January 15, 2008.  The license that was chosen is the Microsoft Public License (Ms-PL) [2], but keep an eye on it because it may change [3].  There are still a few unanswered questions that I have [3] about how Microsoft's work on Sandcastle will continue and how it will affect DocProject, but things are looking very positive.

In this blog post I'd like to express a few of my goals and ideas for DocProject and maybe even try to establish some preliminary timeline.  And of course, if you have any thoughts your feedback will be welcomed :)

The End of Phase 1

With the help of community feedback and my propensity for being anti-social in favor of sitting in a dark room, programming, DocProject's long initial development phase is nearing an end.  The next release, which you can read about in my blog (DocProject 1.10.0 RC Preview [4]), brings together a few long-awaited features that will hopefully start to make DocProject into a useful help authoring tool (HAT) [5].  I'm going to try for Monday, January 21st, to release 1.10.0 RC, and I'm hoping that the worst-case scenario will be the following Friday.

From there I expect to deploy only one more Release Candidate that contains various outstanding bug fixes and feature requests and then I'll freeze DocProject's feature set for Visual Studio 2005.  The following deployment, 1.12.0, should therefore be the first DocProject Production release.  Expect that in March, 2008.

Visual Studio 2005 vs. Visual Studio 2008

I expect to have my own licensed copy of VS 2008 Standard edition sometime before the end of February, but until then I've decided that I will not be concentrating on any new deployments for DocProject 2008 Beta, although I'm very excited about using VS 2008 and I'll certainly jump on that bandwagon as soon as I can.

So for now, you can expect development to continue like normal for VS 2005.  But once I start concentrating on VS 2008, I'm probably going to freeze the feature set for VS 2005, distribute the first Production release of DocProject 1.x, and then switch gears for .NET 3.5 and, possibly, VS Package development.  I intended to eventually release DocProject as a true package, hopefully with its own editors and actual project types [6] (i.e., DocProject and DocSite Templates [7] that are recognized by Visual Studio as distinct project types, with their own property pages, etc.).

After DocProject 1.x goes into stabilization (the end of phase 1) I'll certainly provide support for it and help people whenever I can if they have questions about how to modify the source code or how to add additional features themselves.  Although, the only official deployments that you'll see will probably be major releases that fix a large number of bugs.

Well that's my plan anyway because in the last couple of DocProject 2008 releases I've realized that maintaining two versions of the project is almost like double the work; however, since Visual Studio 2008 will allow me to target the .NET 2.0 Framework I may be able to work on both versions of DocProject simultaneously, without much additional effort.  If that's true then hopefully I won't feel the need to freeze the DocProject 1.x feature set.  Although, technically I can't really freeze DocProject's feature set anyway since it's open source ;)

DocProject Documentation

I plan to distribute compiled help for DocProject 1.x that consists of both API reference and conceptual documentation after the first Production release.  I'd say that a week or two should be enough time to author useful, preliminary documentation.  After that I'll probably just add content here and there and deploy whenever I reach certain milestones that I define on the fly.

A .chm will be downloadable from CodePlex and the installer will present you with the option to have DocProject's help content (.HxS) merged with Visual Studio's help.  I'm even considering hosting a DocSite as an online reference and as a great working example of DocProject's and Sandcastle's capabilities.

Phase 2

The next phase will pickup with the development of DocProject 2008, targeting Visual Studio 2008 and the .NET 3.5 Framework.  The timeline will probably start sometime in March, 2008.  From there I hope to deploy on a monthly basis as I've done with DocProject for the past year+ (since December, 2006 as a matter of fact).

Planning

Although DocProject, in the next release [4], will provide first-class support for building mixed reference and conceptual documentation, unbounded filtering capabilities, much improved performance and a mostly finalized API, I still see lots of room for improvement.  Namely, in the area of content-based features such as for authoring topics and localization.  Also, here's a few of the outstanding tasks [8] that I'm going to look into for DocProject 2008: 

  • Running DocProject in areas of heightened security (e.g., Vista with UAC enabled).
  • SQL Server full-text search provider for the DocSite templates.
  • Microsoft Word and Adobe PDF output.
  • The ability to annotate comments with developer notes. 
  • Automatic token replacement.
  • Some new DocSite skins, maybe including one that makes judicious use of SilverLight.  (Documentation might start looking better than the software it documents, if I achieve my goal, that is ;)
  • Change control and better support for team scenarios; e.g., the ability to have a server that continuously builds live documentation to keep it up-to-date.
  • The development of a simple community wiki for the DocSite templates, which I guess we can start calling the "DocSite Community Wiki" feature (sure, some might say it's a rip-off of the MSDN Community Content Wiki idea, and I guess they'd be correct ;).  This should be very useful in team scenarios for large, volatile, hosted documentation sets.
  • The ability to register multiple Build Process Components [9] per project, with a management UI as well.  I may even turn the Sandcastle/Deployment [10] engine into a BPC that can be added to any DocProject or DocSite.
  • Other various user-defined tasks (bug fixes and feature requests).

I have even more ideas that I've been tossing around for a while, so I hope to aggregate them into a blog post and eventually add them as work items in CodePlex.  And if you have any ideas for DocProject features please let me know!

Development

Development will target Visual Studio 2008 and the .NET 3.5 Framework, and as I mentioned previously, I'm definitely considering rewriting areas of DocProject as a true VS Package with its own custom editors and project types.  I'm also considering using WPF for the interfaces.  There will be a learning curve but it's one that I'm going to have to accept eventually (and I really do want to learn).

Collaboration

I expect to continue collaborating with community members to find bugs and DocProject's potential for new features.  It seems to me that the experience of DocProject users has been positive, so I'd like to continue working on DocProject in the future in the same way as I've been for the last year.  But if you feel differently please let me know.  Even though it's open source I'd still like to provide an acceptable level of service and support in my free time since the time that I put into helping end-users is also time that I'm putting into DocProject (even if it's just thought), which helps to make a better product even for me to use.

Extensibility

It might also be worth noting that I'm going to maintain the "openness" of DocProject by continuing to introduce new public APIs and areas for extensibility whenever possible.  With the limited feedback that I've received, I'd say that DocProject's high level of extensibility was a success.  A few people have mentioned to me that their businesses have created custom build engine providers [11] and build process components, and I've even used BPCs to help people debug issues in the past.  Actually, my favorite feature of DocProject might actually be the build process component, which, incidentally, doesn't really have anything in particular to do with documentation.

Deployment

It's hard to predict when the first Production release for DocProject 2008 will be deployed, but if I can use the past year as a metric then I'd say another year from now is certainly reasonable, and possibly even longer than might be required.  To be honest, I think the summer of 2008 is a reasonable goal as long as I keep my eyes on the prize.

Phase 3

At the start of phase 3 I hope to have already achieved my major goals for DocProject, which includes much better integration into Visual Studio and additional features that will make authoring documentation extremely simple and, in many cases, automated.  Thanks to Sandcastle I believe that this goal is attainable.

For the future I think I'm aiming for much better external support so that non-Visual Studio users will be able to benefit from DocProject's features as well.  The team-based features that I mentioned previously might get pushed into this phase, but we'll see how it goes.  And finally, I'd like to hear your ideas for where you'd like to see DocProject go.  After I'm able to meet most of my own goals I'd like to brainstorm with the community to come up with some new and innovative ideas to be a part of DocProject in the future.

References

[1] Sandcastle on CodePlex, http://www.codeplex.com/Sandcastle

[2] Microsoft Public License (Ms-PL), http://opensource.org/licenses/ms-pl.html

[3] CTP, License and Source Code, http://www.codeplex.com/Sandcastle/Thread/View.aspx?ThreadId=20557

[4] Dave Sexton's Blog: DocProject 1.10.0 RC Preview, http://davesexton.com/blog/blogs/blog/archive/2008/01/15/docproject-1-10-0-rc-preview.aspx

[5] Help authoring tool. (2008, January 9). In Wikipedia, The Free Encyclopedia. Retrieved 11:22, January 17, 2008, from http://en.wikipedia.org/w/index.php?title=Help_authoring_tool&oldid=183180723

[6] DocProject Work Items: VS 2008 Shell and Extensibility, http://www.codeplex.com/DocProject/WorkItem/View.aspx?WorkItemId=14029

[7] DocProject Components, http://www.codeplex.com/DocProject/Wiki/View.aspx?title=DocProject+Components

[8] DocProject Work Items, Advanced List, http://www.codeplex.com/DocProject/WorkItem/AdvancedList.aspx

[9] DocProject's Build Process, Build Process Components, http://www.codeplex.com/DocProject/Wiki/View.aspx?title=Build+Process#BuildProcessComponent

[10] DocProject Sandcastle/Deployment Plug-In, http://www.codeplex.com/DocProject/Wiki/View.aspx?title=Sandcastle+Deployment+Plugin

[11] DocProject tutorial: Creating a Build Engine Provider, http://www.codeplex.com/DocProject/Wiki/View.aspx?title=Creating+a+Build+Engine+Provider

October 21, 2007

C# Multi-line XML Documentation Comments

I was just checking out the documentation on MSDN about XML comments and I discovered that C# supports multi-line comments for XML documentation as well:

/**
 * <summary>Hello World!</summary>
 */
public void HelloWorld() { }

Interesting, but I wouldn't recommend using multi-line comments for a few reasons:

  • They do not support Smart Comment Editing, where by Visual Studio inserts multiple comment lines and an empty summary tag automatically after starting the comment, which in this case is done by typing /** instead of ///.
  • Intellisense is limited:
    • top-level tags are listed but context is ignored (e.g., the para tag is never listed, even if the cursor is within a summary element).
    • Entering a tag name and pressing enter will not produce a closing tag automatically.
  • The rules for processing markup may be problematic since the compiler will look for a prefix pattern consisting of white-space, an asterisk and more white-space; however, a slight change in formatting in the prefix of one or more lines (e.g., less leading white-space) may cause the compiler to include one or more asterisks as part of the comments.
  • They are certainly not as common as triple-slash comments and will probably be confusing to most people (I know if I were to have seen this format used, and I already may have, I would have doubted whether it was even valid.)

I'm curious to know if anyone has found this feature to be more useful than triple-slash comments, and why :)

October 08, 2007

The Help! Project on CodePlex

I've just published the Help! project on CodePlex, which is intended to provide a fully managed compiled help system to replace Help 1.x (.chm).  Please contribute to this project by using the discussions area to post comments and ideas or join the development team.  If you want to join the team then please see the Welcome! thread for information.

For now I'll be using my blog to post news and information about this project.

July 29, 2007

DocProject for Sandcastle 1.7.0 RC Preview

It's about that time again…

Just FYI, DocProject 1.6.2 RC had more downloads in the last month than any other version to date. Thanks to everyone who tried it and provided valuable feedback!

And BTW, I wrote a new article called How to Diagnose and Resolve Issues that provides guidance for gathering diagnostic information if you're having issues with DocProject and would like to try to solve the problem yourself or if you would like to ask for assistance.  You can find other How To... articles here.

In the next version, 1.7.0, there are a number of really cool features that I think people will like, as well as some bug fixes that should solve the problem with Team Build and provide a better experience using the HTML Editor. In this blog post I'll show a preview of some of the new features and describe my plans for them in the future.

Like usual, please let me know what you think so that with your support I can continue to make DocProject a useful community tool.

New Features

  • XML Documentation Editor
  • API Topic (Shared Content) Designer Improvements
  • Missing Dependencies List
  • External UI (DocProject.exe)

I'm still testing and finishing up some other tasks, features and bugs. I'm running a bit late since I was hoping for a mid-July release, but a few of the features that I will describe below somehow snuck into this release (that seems to happen a lot with me ;). I'm also quite busy with work but I think I might be able to manage a deployment next week, without any showstoppers popping up. So far, things are looking good though.

XML Documentation Editor

The API Topic Management Dialog now supports authoring XML documentation (summary, remarks and example) for all API members, not just the project and namespace summaries. See Figure 1 below.

I've created a new control named, ContentEditor and another control named, ContentListEditor, which is used for the new and improved content dialogs: API Topic Management (Figure 1) and the API Topic Designer (Figure 4). As you can see in both figures, there are now two tabs at the bottom for switching between design-mode and source-mode and a list of items that can be edited. Only one item can be edited at a time by selecting it in the list. After editing an item its name becomes bolded in the list until you save your changes.

Note: The content editor control actually provides another tab named, Preview, which raises an event that controls can handle to provide read-only HTML based on the changes made in the editor. Because this feature has not been implemented yet the Preview tab is currently hidden at runtime.

Figure 1: API Topic Management – Xml Documentation Editor

Xml Documentation

Remember that this is an xml documentation editor, which means that tags such as see, c and code may be used in Source mode even though they won't be formatted in Design mode. Sandcastle uses these tags when it builds your documentation to apply special formatting or to link to other topics.

One feature that I plan on adding in the future is the ability to drag nodes from the tree onto the designer and have it serialize <see cref="{id}">name</see> automatically. I'd also like to show see links as hyperlinks in the designer, but maybe as green or using double-underline to distinguish them from normal hyperlinks. Another possibility is to use the stylesheets from your DocProject's or DocSite's selected presentation style and then format all of the xml documentation tags in the designer as they will appear in the compiled documentation.

Another feature that I plan on adding is API type-specific sections in the item list. In other words, if you select a method node to edit then the editor should automatically list each of the method's parameters next to summary, remarks and example. Tags such as permissions and seealso should be listed as well, although I'd like to open a custom editor interface for these tags.

Currently, you can add custom tags to the list using DocProject's configuration file (DaveSexton.DocProject.dll.config, normally found in C:\Program Files\Dave Sexton\DocProject\bin), but they'll apply to every API element. I'd like to implement a feature that allows you to specify which sections apply to which API elements though (probably using DocProject's configuration file, partially shown in Figure 2 below).

Note: Do not confuse the DaveSexton.DocProject.dll.config file with the DocProject.exe.config file (I'll explain more on that below) located in the same directory.

Figure 2: DaveSexton.DocProject.dll.config xmlDocumentationTags attribute

HTML to XML Conversion

I've made a few improvements to my HTML Editor control and the way HTML is converted to XML (a few bug fixes).

The HTML Editor control that you see in Figure 1, which you may already be familiar with if you've used DocProject 1.6.0, is based on the WebBrowser control. The WebBrowser control is basically a managed wrapper around Internet Explorer. For the HTML Editor it's a wrapper around IE's design mode, which only produces HTML markup (in some cases, very poor HTML markup.) Sandcastle requires valid XML, so DocProject converts your HTML to XML automatically. The results of the conversion can be viewed in Source mode. The markup that you enter in Source mode is also converted automatically, so don't be surprised if when you forget to add a closing p tag, DocProject adds it for you when switching views or saving your changes :)

The HTML Editor is a work-in-progress, so I expect issues with HTML to XML conversions to spring up every now and then. Please let me know if you experience an error. If you can provide me with the exact steps to reproduce the problem and the markup that you used, I'll do my best to fix parser bugs and improve the editor's usefulness.

Persistence

Your comments are saved in individual xml files in your DocProject's or DocSite's Comments folder, at the root of the project. An {assembly name}.xml file is created for each assembly and one project.xml file is created to contain the project and namespace summaries. If you prefer you can edit these files using Visual Studio's XML editor. If you haven't edited any API elements for a particular assembly then its XML file will not be present in the Comments folder, but you can create it yourself. The name of an assembly's xml documentation file must be the display name of the assembly with .xml appended to the end or else DocProject will not be able to merge your documentation with imported documentation.

Merging

XML documentation that appears in your code is not shown in this dialog. By default, comments created in the dialog overwrite code comments, on a section by section basis; e.g., summary, remarks or example. But you can change this setting to give code comments precedence instead of being overwritten by selecting KeepSource for the Merge xml documentation setting found in Tools > Options > DocProject > Active Projects > Content.

Figure 3: Merge xml documentation options for documentation written in the API Topic Management dialog

As you can see in the figure above, the API Topic Management dialog can be accessed from the option immediately below the Merge xml documentation option.

Xml Documentation Clean-up

If you create documentation for a type that is later removed from your project then the documentation will still remain in the XML file for the assembly in which the API element is defined, however it will no longer be displayed in the API Topic Management dialog so you must remove it from the file manually. In the future I may add a feature that will allow you to clean up your xml documentation files at the click of a button.

API Topic (Shared Content) Designer Improvements

The API Topic Designer uses the new content list editor control and provides a list of content items that you can edit, such as the header, footer and various titles.   The items are loaded from the Presentation\Style\Content\shared_content.xml file in your DocProject or DocSite.  Changes to the items using the dialog are written directly to this file as well when they are saved.  Deleting an item in the file will remove it from the list and adding an item to the file will add it to the list the next time the dialog is opened.

Figure 4: API Topic Designer in 1.7.0

Just like in 1.6.0, the API Topic Designer appears as a step in the New Project Wizard with all of the new features.  The header section is the default item selected.

As of the June 2007 CTP release of Sandcastle, presentations have a hard-coded header and footer that will probably be inappropriate for your projects.  I've decided not to handle this in DocProject so you must delete the header and footer manually if you do not want them to appear in your documentation.  Although the New Project Wizard is a good place to do that, you can still update the header and footer at a later time using the API Topic Designer.

FYI, I've already requested in the forums that the Sandcastle team set these values to empty in the default presentations to save effort on behalf of end-users.

Text-only Content Items

Not all of the items are XML-editable using the API Topic Designer. Most are text-only since there will probably not be a need to use HTML formatting for them. Text-only items only show the Source tab and will escape any XML markup that you enter. To configure which items use the HTML editor on a per-presentation basis you must edit DocProject's configuration file (DaveSexton.DocProject.dll.config, normally found in C:\Program Files\Dave Sexton\DocProject\bin):

Figure 5: DaveSexton.DocProject.dll.config sharedContentXmlItems attribute

Missing Dependencies List

A while back, around version 1.3.0, I attempted to build documentation for DocProject using DocProject itself but I quickly found that a bug in Sandcastle's MRefBuilder program (actually, CCI) relating to binding redirection prevented me from doing so. In the last version, 1.6.2, I decided to try again but this time I found that DocProject's ReferenceResolver class wasn't able to resolve the Interop.MSHelpCompiler.dll reference (provides the Help 2.x compiler services) for the Sandcastle project (DaveSexton.DocProject.Sandcastle.dll). The problem is that DocProject's Sandcastle assembly is being resolved in the GAC, but the MSHelpCompiler interoperability assembly is not installed in the GAC, so it cannot be found. I do plan to fix this bug in a future release, but since I've expierenced this issue myself I assume that there may be other scenarios that cannot be handled automatically by the reference resolver.  This features was added so that users can manually provide the path to dependencies that cannot be resolved automatically.

FYI, nobody has actually mentioned any problems with DocProject's automatic discovery of source dependencies so I assume that this feature will be used quite rarely, if at all. But then again, it must be used by DocProject itself just to build its own documentation ;)

I suggest that you do not use this feature unless you get an error when building. The error will be produced by Sandcastle's MRefBuilder utility and will state something about one or more unresolved dependencies, mentioning the full assembly name as well. From the information in the build output window you should be able to locate and add the missing dependency yourself using the new dialog:

Figure 6: Missing Dependencies List

This dialog is similar to the External Sources dialog that was added to DocProject 1.6.2 RC. You can find both dialogs in Tools > Options > DocProject > Active Projects > Build.

To add an assembly (.dll or .exe that cannot be resolved automatically by DocProject) to the list you must enter the full path and name into the text box in the new row (the row with the *) or click the button with the ellipses on the right side to select a file. You can also edit existing rows or delete rows by clicking the row header and pressing the Delete key.

The button on the left side of each row will map an absolute path to a path that is relative from the project's root directory, like the example path in Figure 6. This is useful for team and build-server scenarios.

External UI

DocProject 1.6.0 provided preliminary support for Visual C# Express and Visual Basic Express editions of Visual Studio. Unfortunately, because add-ins are only supported by Visual Studio Standard and higher, Express users had to manually edit the project files of DocProjects and DocSites to configure them after using the New Project Wizard since all of DocProject's options and dialogs are hosted by the add-in.

DocProject 1.7.0 RC has an external UI that uses a property grid, just like DocProject's Active Projects tools options page, so Express users can configure their DocProjects and DocSites using the same tools as Visual Studio Standard+ users: 

Figure 7: DocProject.exe

For Standard+ users, though this feature may be handy at times, I still recommend using Visual Studio in most situations to configure your DocProjects (via the tools options pages) and to build them in Visual Studio since it will automatically build dependency projects for you. Also, the Include Project Output Dialog does not work in the external UI and if you add an image to the XML documentation editor or the API Topic Designer while using the external UI, the image will not be included as a project item in Visual Studio (although it will still be imported into the Art folder - you'll have to Show All Files and include it manually).

Automatically Open the Active DocProject

You can add a link to the program in Visual Studio for quick access by going to Tools > External Tools... and clicking Add (this can be done in Visual Studio Standard and higher as well if desired).

By specifying the arguments in Figure 8 below, when you run the External UI from Visual Studio's Tools menu the program will automatically open the active DocProject or DocSite in Solution Explorer (that is, the project that contains the current file that is opened or the current file/folder that is selected in Solution Explorer).

The DocProjectBuildPath environment variable used in Figure 8 is registered on your system when you install DocProject.  It points to DocProject's bin directory, commonly found at C:\Program Files\Dave Sexton\DocProject\bin.

Figure 8: DocProject.exe - Visual Studio Tools Registration

Building DocProjects and DocSites

While developing this program I realized that it would be quite easy to build a DocProject from this interface using DocProject's BuildController class, so I gave it a shot. It actually worked so well that I now build my test DocProjects and DocSites using the GUI for debugging purposes. Breakpoints in code being hosted by Visual Studio's process tend to time-out after a few seconds of inactivity, due to COM issues, but since DocProject.exe is fully-managed it makes it much easier to attach, debug and continue, without having to restart Visual Studio and load up an entire solution.

Figure 9: DocProject.exe build trace

DocProject.exe will not only build your DocProject's or DocSite's help, but will also build the project's assembly output using MSBuild first (remember, DocProjects and DocSites are MSBuild projects). During a build you'll see descriptive status information describing the current step and the progress bar, just like you see in Visual Studio. You can even cancel builds at any time.

The program will also remember some settings for when you restart, such as the last position, size and window state, with multi-monitor support, the position of the toolbars and the height of the build trace window (Figure 9).

Currently, this program does not support DocProjects and DocSites that have managed C++ projects as sources since they can only be built using a solution file, but I plan to look into implementing that feature in a future version of DocProject.

DocProject.exe Extensibility

The code that is used to configure DocProjects and DocSites in Visual Studio Standard+ is the same code that is executed by the external UI. This means that if you have created a custom build engine provider for your organization it will automatically work with the external UI. Not only does this include the options in your DocProjectOptions implementation, which will appear as new properties in the grid, but also any Visual Studio toolbars that are created by your provider will appear as well. For example, in Figure 7 above you can see the Sandcastle toolbar that appears in Visual Studio Standard and higher in the DocProject UI (the right-most toolbar).

Sandcastle and Sandcastle/Deployment are examples of build engine providers that are registered in DocProject's configuration file (DaveSexton.DocProject.dll.config, commonly found in C:\Program Files\Dave Sexton\DocProject\bin).  Several of the options that appear in the figures above, including the API Topic Designer and Management Dialogs are provided by the Sandcastle build engine provider plug-in.

The next major How To... article on my to-do list is guidance for creating custom providers that can be used across projects in an organization, but if you feel like testing the waters yourself check out the DaveSexton.DocProject.Sandcastle and DaveSexton.DocProject.DeploymentSandcastle projects in the source code and try creating your own plug-in.

Conclusion

With the 1.7.0 RC, Visual Studio Express users will be able to do much more with DocProject than they could before with the new VS-like external UI. As a documentation author, you'll be able to write long examples and remarks sections using an HTML Editor and store all of the comments in your DocProject's Comments folder, which is merged with your xml code comments automatically before each build. You'll also be able to use the HTML editor to design topics' shared content, such as the header and footer – like before – but with the addition of every other item that appears in Sandcastle's shared_content.xml file as well. And now you can configure your DocProjects and DocSites to reference dependencies that could not be resolved automatically, however rare that might be.

I hope you like the new features and I look forward to receiving your feedback.

June 26, 2007

DocProject for Sandcastle 1.6.2 Release Candidate

DocProject 1.6.2 was released and is now available for download on CodePlex.

I want to thank all of DocProject's users that took the time to report bugs and provide diagnostic information to me. You've been helpful.

Recently, Anand blogged about DocProject and some of the other community projects that can automate Sandcastle to generate help for managed APIs. After his post, DocProject's rate of download increased by three times and then kept increasing. And since then I've certainly received more feedback than usual. Thanks Anand!

Updates

The latest release addresses some issues that users were reporting relating to the refresh version of the June 2007 CTP of Sandcastle, which was released a few days after its original release. I would have updated DocProject sooner if I knew :)

There is also support now for adding external sources (assemblies and corresponding xml documentation). This new feature also provides support for Web Site Projects since you must use a Web Deployment Project to generate a managed assembly, which can be added as an external source for your DocProject or DocSite.

In Progress

Currently, I'm looking at providing the ability to create and manage custom topics. I noticed that in Sandcastle's June 2007 CTP refresh there are some transformations and files related to conceptual topics. That sounds interesting to me and may be the feature that people have been asking me to add. I'll definitely research this more.

April 05, 2007

2 in 1: DocProject for Sandcastle 1.4.0 and AIP Beta 1

I've just published 2 open source deployments in one month, one of which is a brand new project: Auto-Input Protection (AIP). Both deployments are available on CodePlex.

DocProject 1.4.0 Release Candidate Is Now Available

DocProject drives the Sandcastle help generation tools using the power of Visual Studio 2005. Choose from various project templates that build compiled help version 1.* or 2.* for all project references. DocProject facilitates the administration and development of project documentation with Sandcastle, allowing you to use the integrated tools of Visual Studio 2005 to customize Sandcastle's output.

Download DocProject from CodePlex, which comes with the complete source code (C#). Try it out and let me know what you think.

Here are some of the new features in 1.4.0:

  • DocProject's Sandcastle build engine now uses the Sandcastle March CTP transformations.
  • DocProjects and DocSites can now build Html Help 1.*, Help 2.*, neither or both via a single build.
  • The New Project wizard has been extended for the Sandcastle build engine, allowing you to choose the type of help that your project will build (see previous feature).
  • DocSites use AJAX to improve client-side performance.
  • DocSite's index can be filtered, client-side.
  • The tools options page, "Dave Sexton's Tools", has been renamed to "DocProject" and was split into separate sub categories.
  • A status notification and progress bar now appear during help builds.
  • Improved error handling, including logging to the Application event log.
  • Several bug fixes.

Use a DocSite template to build an AJAX-enabled ASP.NET Web Application for your compiled help, which includes an interactive TOC, filterable index, breadcrumbs, an auto-generated header and footer, and a link to download the compiled help file. The auto-generated website has been tested for compatibility with the IE7, Firefox and Opera web browsers only.

Example DocSite
Figure 1: Example DocSite shown in Internet Explorer 7 on Windows Vista; Sandcastle vs2005 presentation.

Auto-Input Protection Beta 1 for ASP.NET Is Now Available

AIP is an extensible ASP.NET web control that provides CAPTCHA protection for your blogs, forums, wikis and websites, greatly reducing the likelihood of unwanted form submission from automated spam and hacks.

Want to see it in action? Look no further than in the comments section of my own blog posts. :)

Download AIP from CodePlex, which comes with the complete source code (C#). Try it out and let me know what you think.

Add the AIP web control to a web page with full designer support and only a few modifications to your web.config file:

Default AIP Web Control

Modify the AIP web control's presentation with a custom template and modify the CAPTCHA image that it generates by changing the bitmap and filter providers' settings in your web.config file:

Customized AIP Web Control

Easily implement custom CAPTCHA algorithms that produce randomized patterns with a custom bitmap provider and filter providers:

Customized AIP Web Control - Cross Hatch 1 Customized AIP Web Control - Cross Hatch 2 Customized AIP Web Control - Cross Hatch 3 Customized AIP Web Control - Cross Hatch 4

December 15, 2006

MSDN Wiki

Have you noticed the new Community Content sections on msdn2 documentation?

These new sections are part of the MSDN Wiki project, live on MSDN2 documentation. [1]

Community Content

Community Content is not intended to be a forum or discussion area, nor is it meant to be a place to report bugs or issues. Its purpose is for developers to collaborate through the extension of MSDN documents. Anyone can add their own tips and tricks, or simply provide extra information about a given topic if it will add value, which will then be available to anyone reading the online documentation. [2]

Participate!

To participate, simply find any MSDN2 document that has a Community Content section at the bottom and click the Add new Community Content link. Read and agree to the terms and conditions including the Code of Conduct [2], enter your display name and you're all set. (Note: You need a Windows Live ID to sign up.)

A Contributor's Rights to Their Content

The following FAQs are excerpts from [3]:

Can I reuse the content I contribute to the wiki in other publications (for example, a book or a magazine article)?  Can I reuse the code I contribute in my commercial applications?

Yes, and yes!  As described above, we do not ask for ownership of contributions of content or code.  Instead, the Contribution Agreement gives us a non-exclusivelicense to contributions.  As a result, you are free to reuse your own content or code however you like.

Who owns the rights to content I add to MSDN Wiki?  What does the contribution agreement mean?  

Although many collaborative development projects ask for assignment of ownership by contributors, we decided that you should own your own contributions.  The Contribution Agreement gives us a non-exclusivelicense to your contributions.

My Contributions So Far...

I recently added Designing Thread-safe Events in C# to the Event Design guidelines documentation since it didn't mention thread-safety anywhere. I also added a chart (a poorly-formatted one since MSDN doesn't allow table elements) to the UriComponents Enumeration document to illustrate the behavior of each flag. You can review all of my posts and edits at my public user profile on MSDN [4].

Feedback

Submit feedback about MSDN Wiki or view the feedback of others, including bug reports and suggestions at Microsoft Connect [5].

--
References

[1] MSDN Wiki RC0 Details
https://connect.microsoft.com/content/content.aspx?ContentID=996&SiteID=112

[2] MSDN Wiki -- Code of Conduct
http://msdn2.microsoft.com/en-us/library/communitycodeofconduct.aspx

[3] MSDN Community Content FAQ
§ Legal Framework
http://msdn2.microsoft.com/en-us/library/communitymsdnwikifaq.aspx

[4] Public User Profile: Dave Sexton
http://msdn2.microsoft.com/en-us/library/user-dave%20sexton.aspx

[5] Feedback, MSDN Wiki
https://connect.microsoft.com/feedback/default.aspx?SiteID=112

As always, I'd love to hear from anyone that has something to say about this post.  Drop me a comment.

.NET | C# | msdn | wiki