August 05, 2013

Hot and Cold Observables

I've touched on the topic of observable temperature while answering questions on the Rx forum and in my previous post on subjects.  I tried explaining it more deeply in this particular discussion, but now I'm finally ready to provide a deep and comprehensive1 analysis in this blog post.

I'll answer the following questions: 

What really makes an observable hot or cold?
Why should I care?
How can I tell if an observable is hot or cold?
How can I change the temperature of an observable?

The analysis that follows is not intended for beginners, though you may find the following section and the conclusion to be informative.  You may also want to watch this first and look here for additional information.

Common Sense

We like to think of hot observables from two different perspectives as follows.

From the perspective of observers, there's the potential to miss notifications.  Hot observables are "live" sequences, meaning that they are happening whether we observe them or not.  The canonical example is a mouse, which of course is free to move regardless of whether you subscribe to an IObservable<MouseMoveEventArgs> or not.  In general, if I hand you an observable and tell you that it's hot, then as an observer you'd infer that you might have missed notifications that happened before you subscribed.  Pretty simple.

From the perspective of observables, hot observables broadcast notifications to all observers.  The canonical example is converting an event into an observable; e.g., FromEventPattern.  If three observers subscribe to IObservable<MouseMoveEventArgs>, then each observer will serially2 observe the same notifications, as opposed to observing different notifications or concurrent notifications.

We like to think of cold observables from the same perspectives as follows.

From the perspective of observers, there's the potential for each observer to get different notifications.  Cold observables are "generated" sequences, meaning that they can generate different notifications for every observer.  Furthermore, observers may receive notifications asynchronously, with respect to each other.  The canonical example is Create, which can asynchronously generate notifications whenever an observer subscribes.

From the perspective of observables, cold observables won't generate notifications until an observer subscribes, and they generate notifications each time that an observer subscribes.  The canonical example is Range, which generates a range of numbers whenever an observer subscribes.

In summary, common sense tells us that:

Hot observables are always running and they broadcast notifications to all observers.

Cold observables generate notifications for each observer.

I intend to show that these ideas are more like symptoms rather than definitions.  I'll also identify a pattern to reduce them into a primitive concept: side effects.  Ultimately, I'd like to ensure that hot and cold are well-defined terms based primarily on the concept of side effects.3

Enough skipping rocks over the topic.  Let's dive into the watery refraction and expose the peculiarities of our naïve understanding of these concepts.

Uncommon Sense

We know that an observable can be either hot or cold, but can they mix?  Can an observable be generated while broadcasting?  Can it be generated  and yet always running?  Can it be always running without broadcasting?  Can the temperature change dynamically?

That last question is particularly interesting.  Can an observable change its temperature after an observer subscribes?  E.g., Observer E subscribes first, then F subscribes; gets a hot observable while F gets a cold observable.

Can an observable change its temperature for the same observer after it subscribes?  E.g., Observer G subscribes to a cold observable that eventually transitions into a hot observable, for G, while it remains subscribed.

What about specialized subscription behaviors, such as one that causes an observable to behave like it's hot until all observers unsubscribe and then it becomes cold again for a subsequent subscription, which flips it back to being hot again -- does that behavior exist?

The answer to all of these questions is YES.  Well, it seems more like YES and SORT OF, based on our common sense definitions.  I'll try to explain.

Those behaviors might seem strange, but you've probably used them together before.  They're implemented by a couple of well-known Rx operators; e.g., Replay and RefCount.  Similarly, PublishLast and some particular overloads of Publish have strange behaviors with regard to temperature.

For example, keeping in mind our common sense definitions of hot and cold, let's assume that you have an observable O and you call O.Replay() to get an IConnectableObservable<T>, which you assign to C and then call C.Subscribe(J) and C.Subscribe(K).  I think we'd all consider C to be hot (or at least warm, whatever that means) because all of its notifications are broadcast to J and K, even though notifications won't arrive until Connect is called.  It's not entirely hot because it doesn't necessarily satisfy the always running factor, hence the introduction of the confusing term warm.

So far, Replay is identical to Publish.

Then you call C.Connect(), which may cause J and K to observe notifications from O.  If O is hot, then the always running factor was satisfied anyway, but if O is cold, then it's not satisfied; regardless, I think we'd all consider C to be hot at this point because of its broadcasting behavior alone.

Finally, you call C.Subscribe(L), and now there's a race condition on L with respect to our common sense understanding of observable temperature.  If O generated any notifications sometime between the calls to C.Connect() and C.Subscribe(L), then L will immediately observe replayed notifications, thus C was technically a cold observable when L subscribed because those particular notifications aren't rebroadcast to J and K; i.e., C generates specific notifications for L.  It means that while C is hot for J and K, it's cold for L; however, it's not permanently cold for L because after C replays all of the notifications that L missed, then L will be observing the same exact broadcast notifications as J and K, which means that C transitions from cold to hot for L.  Although, if O hadn't generated any notifications before you called C.Subscribe(L), then C would start out hot for L just like J and K.

Apparently, sometimes Replay is hot (though only if its source is hot, and only before Connect is called), sometimes it's warm (though only if its source is cold, and only before Connect is called), sometimes it transitions from warm to hot (though only for observers that subscribed before calling Connect) and sometimes it transitions from cold to hot, with a race condition (though only for observers that subscribe after Connect was called).

Part of the reason that this is confusing is because always running, broadcast and generate are actually relative concepts.  They implicitly refer to the time of subscription, though we tend to ignore this fact.  Always running refers to pre-subscription, generate refers to the time of subscription and broadcast is a post-subscription concept.  This explains how operators like Replay can offer strange combinations of these behaviors.

The common sense meanings of hot and cold are diluted by their differences in relation to the time of subscription and the connectability of operators like Replay, with a dependence on the temperature of the source and the presence of a race condition.

Technically it's possible for an observable to be always running without broadcasting; e.g., simply don't subscribe to a hot observable.  Our common sense definition tells us that it's hot even though it's not broadcasting.  Always running and generating are opposite behaviors, which makes broadcasting seem more like a symptom rather than a definition.

It's also possible for an observable to be generated while broadcasting; e.g., publishing a cold observable.  Again, common sense tells us that it's hot even though it's not always running, because we know that an underlying observable is actually responsible for generating notifications, which are broadcast to all observers.  This makes always running seem more like a symptom rather than a definition.

And finally, as shown by the Replay example above, an observable can be always running and yet generated at the same time.  They'll overlap until generation completes, like emptying a buffer, after which the observable seamlessly transitions into the always running sequence.  It's hard to say whether this kind of sequence is purely cold based on the common sense definition of generation alone.  We almost want to call it, "hot with some additional notifications".  This makes generation seem more like a symptom rather than a definition.

In general, our common sense definitions of hot and cold aren't precise enough to be useful to observers so we tend to ignore them and make assumptions based on what really matters: side effects.  All we really care about is answering one simple question:

If I subscribe to your observable, might it cause side effects?

It would be great if we could specify this behavior in a word.  The terms cold and hot should provide the answers yes and no, respectively.

Subscription Side Effects

Wikipedia defines side effect as:

Modifying some state or causing an observable interaction with calling functions of the outside world.

However, sometimes mutation is the primary effect; e.g., assignment.  Clearly the definition is relative.

So let's define side effect for our purposes as:

Any effect that is not the primary effect.

The primary effect of subscribing to an observable (i.e., calling Subscribe) is to register an observer for callbacks, which probably means that the observer will be added to the observable's internal list of observers and a disposable will be created and returned to represent unsubscription.

Likewise, the primary effect of .NET event registration is to add a handler to the list of delegates in the event's MulticastDelegate instance.  We can easily convert any event into an observable.  The primary effect of subscribing to an observable event is to convert an observer into an event handler, register it with the event, and return a new disposable that unregisters the observer from the event.

That pattern remains the same for other native asynchronous conversions as well, such as FromAsyncPattern and ToObservable (Task/Task<T>).

In addition, the compositional nature of Rx (more on that later in this post) allows operators to form subscription chains.  Subscribing to the outer-most operator's observable causes it to subscribe to the previous operator's observable, which subscribes to another observable, and so on.  The entire subscription chain is certainly part of the primary effect of subscribing.  Ultimately, the goal is to subscribe to the inner-most observable through as many intermediary subscriptions as necessary.

Subscription chains inherit side effects.  If any inner subscriptions cause side effects, then the outer-most subscription inadvertently causes them too; therefore, our definition of subscription side effects implicitly includes the sum of all inherited subscription side effects.

Based on the ideas above, a subscription side effect is:

Any side effect other than subscribing to another observable, adding an observer to an observable's "list of observers" and creating a new disposable for unsubscription.

where the meaning of "side effect" is defined by wikipedia and "list of observers" is defined by the necessary conversion to the underlying asynchronous model, if any.

Perhaps for the sake of simplicity I should reduce that definition.  Side-effect inheritance is implied, as is adding an observer to a "list of observers" and returning a disposable for unsubscription.  All of these behaviors are intuitively part of the subscription mechanism.

Therefore, a subscription side effect is:

Any effect beyond an observable's subscription mechanism.

Examples of subscription side effects include:  Calling Schedule, OnNext, OnError, OnCompletedGetEnumerator or MoveNext, mutating a field, creating an object in memory, running a CPU-intensive computation, sending a web request, reading a file, ending a process, formatting your C drive, or really anything else that you can think of that isn't merely the subscription mechanism and may cause an observable effect on the outside world, including notifications.

Covering All Bases

Duality

If you're interested in the duality between observables and enumerables, then copy and paste the Covering All Bases section into your favorite text editor and replace the following character sequences (not necessarily on word boundaries):

asynchronous synchronous
observable enumerable
observe enumerate
subscription enumeration
subscribing enumerating
subscribe MoveNext
notification element
always running pre-calculated
broadcast broadcast/share
subject
connection
connect
IBuffer
pushed
listening in to
calling OnNext, OnError or OnCompleted
yielding

It works!  Some of the grammar and spelling is a bit off though, and some of the ideas and operators may be entirely irrelevant due to the lack of asynchrony in enumerables, but the given definitions of hot and cold and the summarization are correct.

The concept of subscription side effects completely covers our common sense understanding of temperature as follows.

Recall our definition of subscription side effects from above:

Any effect beyond an observable's subscription mechanism.

Recall our common sense definitions of temperature from above:

Hot observables are always running and they broadcast notifications to all observers.

Cold observables generate notifications for each observer.

Understanding how cold behavior relates to subscription side effects is easy:

Generate means calling OnNext, OnError or OnCompleted when Subscribe is called.  This can be done synchronously or asynchronously with an IScheduler.  Either way these actions are, by definition, subscription side effects.  That's pretty simple.

Now let's jump to the common sense definition of hot and see how it compares:

Always running means that as soon as an observer subscribes it begins listening in to notifications.  It implies that notifications will not be generated as a side effect of subscribing, but that the mere act of subscribing enables an observer to receive notifications that would have occurred anyway.  In other words, subscribing to an observable that is always running does not necessarily cause subscription side effects.  The primary effect of subscribing is all that is needed to begin receiving notifications asynchronously.

Broadcast means that when a notification is ready to be observed it is pushed to all currently subscribed observers, which does not seem to imply anything about subscription side effects.  Theoretically, an observable could asynchronously generate notifications upon subscription, pushing them to the original observer that caused these subscription side effects while also broadcasting them to any observers that are fast enough to subscribe before the notifications are generated.  That's not the same behavior as Replay, for example, because Replay doesn't broadcast the notifications that it generates for an individual observer.  Apparently this behavior doesn't exist in any Rx primitives.  When an observer subscribes to a cold observable, without publishing, it observes notifications that are generated for it alone; i.e., cold observables don't broadcast.

Broadcast in Rx means that an observable is not responsible for generating notifications itself, but instead broadcasts notifications on behalf of some underlying observable.  That's the difference between the common sense definitions of cold and hot, respectively; therefore, broadcasting does not cause subscription side effects, meaning simply that when you subscribe to a broadcasting observable there will be no subscription side effects.

Broadcasting a cold observable has to cause subscription side effects once, though it's not caused by an observer subscribing to the broadcasting observable, but the broadcasting observable subscribing to the cold observable; a.k.a., "connecting".  Connection is exposed as a public operation, either by a subject's ability to act as an observer or by an operator returning IConnectableObservable<T>.  Theoretically, connection isn't necessary if the underlying observable is always running because observers that haven't subscribed yet are going to miss notifications whether the broadcasting observable is connected or not; however, if the underlying observable isn't always running (i.e., it's cold), then connection is required to ensure that observers don't miss notifications before they've subscribed to the broadcasting observable.  As a result, exposing the connection behavior of broadcasting observables defers subscription side effects of the underlying observable.  This implies again that broadcasting doesn't cause subscription side effects.  Subscribing to a broadcasting observable is all that is needed to begin receiving notifications asynchronously from the underlying observable, although notifications may be deferred until connection if the observable is connectable.

An emerging pattern here is that subscription side effects aren't expected at all in hot observables, including the case where a hot observable broadcasts notifications for a cold observable, because subscription side effects are deferred until connection.  We'll call them connection side effects instead.

Connectability may be hidden from observers.  Subjects are observables, thus the AsObservable operator can be used to hide their connectability.  The same applies to IConnectableObservable<T>.  In either case, hot behavior is retained even when observers know nothing about the underlying connectability of observables; therefore, simply knowing that an observable broadcasts implies that it doesn't cause subscription side effects.  It may have caused or will cause connection side effects, but that's beyond an observer's control when connectability has been hidden.

In addition, some stateful behaviors in Rx use connection side effects to separate notifications from other kinds of subscription side effects.  This is perhaps the most confusing case as far as the common sense definitions of hot and cold are concerned.

Recall from a previous section, when we reviewed the strange behaviors of Replay and RefCount, we determined that our common sense definitions are diluted because of their differences in relation to the time of subscription and the connectability of operators, with a dependence on the temperature of the source and the presence of a race condition.

Replay, PublishLast, and overloads of Publish that have an initialValue parameter, broadcast all subscription side effects, including notifications, to any observer that subscribes before they are connected to the underlying observable.  They also broadcast all subscription side effects, including notifications, to any observer that subscribes after connection but before the underlying observable generates any notifications.  They broadcast all subscription side effects, excluding missed notifications but including future notifications, to any observer that has missed notifications.  The missed notifications aren't broadcast, they are generated specifically for the particular observer that missed them.

These operators show that hot and cold are relative terms with regard to different kinds of subscription side effects, of which notifications are just one example.  They separate missed notifications from other kinds of subscription side effects to return a relatively cold observable or a relatively hot observable, respectively.  This is possible because broadcasting is a post-subscription concept and the ability to miss notifications is a pre-subscription concept that can be remedied at the time of subscription through generation, thus allowing an observable to be hot for some observers and cold for others, simultaneously.

Furthermore, the ability of these kinds of observables to transition from the common sense definitions of cold to hot is of no importance now, given that all we really care about is subscription side effects.  In other words, the newly defined hot and cold terms are relative to the time of subscription and to different kinds of subscription side effects, so the ability to transition is irrelevant.  An observable either causes subscription side effects or it does not.

Thus we no longer need any confusing terms such as warm to describe the behavior of connectables.  A connectable observable is always hot before it's connected and is generally hot afterwards, though some operators apply special behavior to missed notifications to ensure that they aren't actually missed.  These operators return an observable that is cold with regard to missed notifications, for a particular observer, and hot with regard to any other kind of subscription side effect, including future notifications; however, because our new definitions of hot and cold make transitioning irrelevant, we can say absolutely that an observer that has missed notifications will get a cold observable from these operators.  (The race condition's effect on our new definition of cold will be addressed later in this post.)

In summary, our new understanding of temperature with regard to subscription side effects tells us that:

Hot and cold are relative terms with regard to the time of subscription and to different kinds of subscription side effects.

Hot observables either don't cause subscription side effects or defer them until connection, where they become connection side effects.

Cold observables rely on subscription side effects.

Perhaps we can distinguish between these terms further by defining whether subscription side effects may or may not occur:

Hot observables do not cause subscription side effects.

Cold observables do cause subscription side effects.  (Though keep reading because later on I'll relax this definition a bit.)

Finally, we now have a specific and useful definition of observable temperature:

Temperature indicates the propensity of an observable to cause subscription side effects.

Furthermore, we can largely ignore the relativity to particular kinds of subscription side effects since it only matters when discussing the special behaviors of Replay, Publish and PublishLast.  And those operators only distinguish missed notifications from other kinds of subscription side effects, so they can easily be described as generally hot yet cold with respect to missed notifications.

In general, the terms hot and cold cover all kinds of subscription side effects, including notifications. 

Concerning Replay, PublishLast, and Publish overloads with an initialValue parameter, they return observables that are cold for particular observers that have missed notifications, yet they are hot with respect to any other kind of subscription side effect.

Now let's test the reverse: We'll see if our original common sense definitions can be inferred from our new definitions of hot and cold.

Given an observable that is known to not cause any subscription side effects, one must infer that it's either always running, which never causes subscription side effects, or it must broadcast, which defers subscription side effects into connection side effects; therefore, we must refer to the observable as hot.

Our new definition seems to be holding up so far.  Let's see how it compares to the common sense definition of cold.

Given an observable that is known to cause subscription side effects, one can infer that it might generate notifications when an observer subscribes, thus we may refer to it as cold.

It seems like we've got a match; however, notice that I wrote "might generate notifications".  That's because, as described previously, subscription side effects are far more general than just generating notifications.  A cold observable, by our new definition, can cause any subscription side effect.  For example, it can send a web request or asynchronously read from a file, without ever generating any notifications.  Of course, generating a notification would be useful so it's quite common for these examples in particular to generate notifications asynchronously once the operation has completed.

There are other kinds of subscription side effects that do not generate any notifications - not even asynchronously.  For example, imagine tracking the number of observers subscribed to your observable; each time Subscribe is called an integer field is incremented by 1.  That's similar to the behavior of the RefCount operator, described above.  Another example is subscription logging, for diagnostic purposes, which may cause a file to be updated on disc every time an observer subscribes.  The common sense definition of cold doesn't cover these cases, but our new definition does!

It seems to me that our new definition of temperature entirely covers the common sense definition, and does so elegantly as diametrically opposed concepts.  Furthermore, our new definition of hot nicely ties together the two orthogonal concepts that make up the common sense definition of hot, while our new definition of cold identifies the important factor and generalizes it.

But why are subscription side effects the important factor?

Composition

LINQ operators enable the composition of observables into "queries".  Applying any operator to one or more observables means that it will subscribe to those observables when you subscribe to the query.

When defining a query, it's not uncommon to reference an observable multiple times, which means passing the same observable to different operators or to the same operator as multiple arguments.  Since we know that operators subscribe to the observables that you pass to them, referencing the same observable multiple times means that it will potentially have multiple subscriptions.

Some operators have the semantics of multiple subscriptions without multiple references.  For example, Retry only accepts a single observable and potentially subscribes to it multiple times.  In this case it's desirable to duplicate subscription side effects.  For example, if we pass to Retry an observable that makes a web request as its subscription side effect, then we want Retry to execute that subscription side effect every time it re-subscribes.  In other words, we want to retry the web request.

That's why subscription side effects are the important factor.  From the point of view of the Retry operator, we don't necessarily want to duplicate the observable's notifications (as indicated by our common sense definition of cold), we simply want to duplicate the subscription side effects, whatever they might be.  In other words, it's not about generating the notifications, it's about executing the side effects, which are often responsible for generating the notifications.

Sometimes an observable is referenced multiple times without the semantics of allowing duplicate subscription side effects.  For example, you could apply the Where operator twice to the same observable to create two different observables, then apply the Zip operator to merge them back into a single observable in a pairwise fashion.  The query's output semantics aren't important here; what's important is that you don't want to create two subscriptions to the original observable.  Instead, you want broadcast behavior.

Broadcasting is essentially the same thing as sharing subscription side effects among multiple observers.  In the previous example, imagine that the observable to which you are applying the Where operator is cold; e.g., each time that you subscribe it sends a web request to open a reactive socket connection to a server and then streams the responses as notifications.  The Zip operator is going to subscribe twice, but we don't want to send two web requests.  Our desire to broadcast the notifications is the same as our desire to broadcast the web request, because it's the web request (i.e., the subscription side effect) that is responsible for generating notifications in the first place.

And again, that's why subscription side effects are the important factor.  In the former case we wanted to duplicate them, while in the latter case we wanted to broadcast them.

Composition in Custom Operators

The previous cases are special because we had prior knowledge of an observable's temperature.  But what if we didn't?

Composite operators are queries themselves; however, they don't typically define contracts for the temperatures of their observable parameters.  In other words, when we define a custom operator, we must make assumptions as to the temperatures of observable parameters.

An obvious question is: Why can't we define temperature contracts on operators?

Well, technically you could, but then you'd be forcing callers in some cases to explicitly change the temperature of observables before calling your operator, which changes the declarative nature of some operators by bringing temperature into the foreground when perhaps it's unnecessary.

Furthermore, if a caller is unaware of the temperature, then an assumption has to be made anyway.  Unless the contract is propagated to a caller that can control the temperature - and now we've got a viral effect, as is common with contracts.

Alternatively, you'd probably want to have two overloads for every operator where temperature is of concern: one overload for cold observables, one for hot observables.  But how can you overload operators by temperature without defining new types?

Do we really need this complexity?  What are the disadvantages of making assumptions at the operator level?

If you were to assume that all observable parameters are hot, then it's fine if your operator relies on broadcast behavior, but it's wrong if your operator must duplicate subscription side effects; e.g., Retry.  Alternatively, if you were to assume that they are cold, then the reverse is true: It's fine if duplicating subscription side effects is what you want, but it's wrong if you need broadcast behavior.

Therefore, we need to consider the possibilities of conversion before we can determine which assumption is best in general.

Temperature Conversion

First, a quick note about language:

IObservable<T> is an immutable type; therefore, there's no way to mutate its temperature.  When I refer to "changing" an observable to a particular temperature or "making" an observable a particular temperature, I'm using those terms in the general sense of unary operator application.  For example:

var x = 10;
var y = -x;

You might say that the second line "negates x", but technically it doesn't because after the negation operation, x hasn't changed.  It's applying the negation operator to x and assigning the result to y.  The former is just a natural way of describing the essence of the operation.  Although it's ambiguous in this situation, it's not when dealing with IObservable<T> because it's immutable.  Therefore, "changing" an observable's temperature or "making" it have a different temperature simply refers to the application of some operator and the assignment of the result into a new variable, which holds a different observable with the desired temperature.

We can change a cold observable into a hot observable by applying the Publish operator, which has overloads that project the observable into a selector function as a hot observable (though the operator itself returns a cold observable) and overloads that return a connectable observable.  The latter defers subscription side effects until connection, effectively broadcasting them across multiple observers as connection side effects.

Publish makes a cold observable hot.4

Defer makes a hot observable cold.

The Defer operator allows us to add subscription side effects to a hot observable, thus making it cold.

Defer also allows us to convert invocation side effects into subscription side effects; e.g., given a function that returns a hot observable like Start or a C# 5.0 async method that returns a Task<T>, which you then convert into an observable with ToObservable, you can wrap either function in Defer to change its invocation side effects into subscription side effects, thus creating a cold observable from a hot function.

Alternatively, you can change the temperature of a hot observable by combining it in various ways with a cold observable.  For example, if you Concat a hot observable to a cold observable, then the result is a cold observable, by our new definition, because the subscription side effects of the cold observable are prepended onto the hot observable.  If you were to apply the Merge operator instead, then the result would also be a cold observable because, again, the new observable retains the cold observable's subscription side effects.  Remember, it's the subscription side effects that are important, which is why combining a hot observable with a cold observable in such a way that the subscription side effects of the cold observable are preserved yields a cold observable.

Notice, however, that converting from cold to hot simply adds broadcasting behavior, while converting from hot to cold relies on the addition of subscription side effects.  In other words, going from cold to hot is as simple as calling one operator, but going in reverse requires side effects to be added.

In the previous section, we left wondering which assumption is best for an observable with an unknown temperature: cold or hot.  Perhaps we can answer that now by examining the consequences of making wrong assumptions with respect to temperature conversions.

When we need a hot observable, assuming that it's already hot and being wrong means that subscription side effects are duplicated, which breaks the semantics of our query; therefore, let's assume that every observable is cold and apply Publish.  When we're wrong, we'll end up applying broadcast behavior to a hot observable, which would simply have no effect.  That's 1 point for assuming that observables are cold, because when we're wrong, applying Publish does no real harm.5

When we need a cold observable, assuming that it's already cold and being wrong means that subscription side effects won't be duplicated, which breaks the semantics of our query; therefore, we must assume that every observable is hot and... do what?  Remember, the semantics of our query aren't to introduce new subscription side effects, we want to duplicate the subscription side effects of the observable that was handed to us.  We can't make a hot observable cold without introducing subscription side effects; therefore, all we can do is assume that the observable is already cold and hope for the best!  That's 2 points for cold, 0 points for hot.

In summary:

We must assume that any observable with an unknown temperature is cold.

In other words, it's safest to assume that any given IObservable<T> may cause subscription side effects because it's easy to broadcast subscription side effects when necessary, but once broadcast it's impossible to reverse it.  There's no way to force a broadcasting observable's Subscribe method to cause subscription side effects that were already broadcast.

Note that there is a way to reverse the broadcast behavior of an IConnectableObservable<T>: simply dispose of the object that was returned by the Connect method.  In fact, that's exactly how RefCount works.  But this can't be done with any IObservable<T>.  Even trying to cast it to IConnectableObservable<T> won't help because casting doesn't give you a reference to the disposable that was returned by Connect.

Given that it's safest to assume that any unknown IObservable<T> is cold, it follows that sometimes we're going to be wrong.  We should probably relax our previous definition of cold to account for that fact.

Here are our latest definitions: 

Hot observables do not cause subscription side effects.

Cold observables may cause subscription side effects.

That should cover all possibilities.

Classification of Subscription Side Effects

A subscription side effect can be classified as any of the following.6

  • Synchronous
    • Notification
      E.g., directly calling OnNext, OnError or OnCompleted.
      (Technically, notifications are always Asynchronous in Rx due to its implicit use of the CurrentThreadScheduler to schedule subscriptions, though we'll ignore that fact and define Synchronous in relation to the outer-most subscription only, where synchronously calling OnNext, OnError or OnCompleted actually blocks Subscribe from returning.)
    • Out-of-band
      E.g., directly assigning a field, creating a large object in memory, executing a CPU-intensive computation or (synchronously) deleting a file on disc.
  • Asynchronous
    • Notification
      E.g., using an async mechanism, such as Task or IScheduler, to schedule calls to OnNext, OnError or OnCompleted.
    • Out-of-band
      E.g., making a web request or reading a file on disc.  Often these kinds of operations are used for Asynchronous Notification as well.
  • Composition
    Subscribing to another cold observable that causes its own subscription side effects, either synchronously or asynchronously.
  • Computation
    Executing costly or side-effecting code for each notification; e.g., a costly projection via Select or imperative side effects via Do.

Perhaps these classifications could be useful when documenting operators.

For example, the documentation for the Range overload without an IScheduler parameter could state that it returns an observable that causes Synchronous Notification subscription side effects.  The overload with an IScheduler parameter may cause Synchronous or Asynchronous Notification subscription side effects, depending upon the specified IScheduler; this is similar for all operators that are overloaded with an IScheduler parameter.

The documentation for the Interval overload without an IScheduler parameter could state that it returns an observable that causes Asynchronous Notification subscription side effects.

The documentation for the Defer operator could state that it returns an observable that may cause Synchronous Out-of-band subscription side effects, depending upon the specified function.  It could also state that it may cause Composition subscription side effects, which basically means that it inherits the subscription side effects of the observable that is returned by the specified function.

The documentation for the Select operator could state that it returns an observable that may cause Composition subscription side effects, which basically means that it inherits the subscription side effects of the source.  It could also state that it may cause significant Computation side effects, depending upon the specified selector function.

Most of these terms should be self-explanatory by now; however, Computation is particularly interesting since I haven't mentioned it yet.  I consider a computation subscription side effect to be:

Any effect beyond an observable's subscription and notification mechanisms that occurs for each notification.

The notification mechanism is the minimally required effects for notifying; i.e., directly calling OnNext, OnError or OnCompleted are the primary effects of notification.  Anything else is a computational side effect.

Although this kind of side effect is part of an observable's computation, it won't occur until subscription.  Subscribe is essentially the beginning of the computation, thus subscription side effects are technically computation side effects.  But since subscription side effects are the focus of temperature, that's why I refer to computation side effects as a kind of subscription side effect, in the sense that computation side effects don't occur until an observer subscribes.

Computation side effects are part of the definition of cold observables, though it basically means that applying any computational or side-effecting operator to a hot observable converts it into a cold observable; e.g., applying Select to a hot observable means that all observers will receive different notifications, generated specifically for them, which is the common sense definition of cold.  That's not a very useful definition of cold, so I propose to separate the notion of computation side effects from subscription side effects and to exclude them from the new definition of cold, introducing new terms: active and passive.  This separation allows a hot observable to be referred to as hot even if it causes computation side effects.  And you can still apply Publish to broadcast the computation side effects similar to how you'd publish a cold observable to make it hot.

Passive observables do not cause any significant computation side effects.

Active observables do cause significant computation side effects.  Every notification may be computed with some relative cost.

Select and Do are common examples of operators that return active observables.

These terms are context-sensitive, as indicated by use of the word "significant".  An observable that is passive in one usage can be considered active in another.  For example, a simple projection over an observable of integers, such as adding 5 to each value, probably isn't costly enough in terms of memory or performance to care if its computation side effects are duplicated.  In contrast, a projection that calculates the factorial of each value might be costly enough in certain scenarios to consider publishing.

Conclusion

Temperature indicates the propensity of an observable to cause subscription side effects.

A subscription side effect is:

Any effect beyond an observable's subscription mechanism.

Subscription mechanisms generally consist of storing an observer in a list or subscribing to other observables, and returning a disposable for unsubscription.

Examples of subscription side effects include:  Calling Schedule, OnNext, OnError, OnCompletedGetEnumerator or MoveNext, mutating a field, creating an object in memory, running a CPU-intensive computation, sending a web request, reading a file, ending a process, formatting your C drive, or really anything else that you can think of that isn't merely the subscription mechanism.

Hot observables do not cause subscription side effects.

Cold observables do cause subscription side effects; however, we must assume that any observable with an unknown temperature is cold, and sometimes that assumption will be wrong; therefore, a more accurate definition is:

Cold observables may cause subscription side effects.

Again, we must assume that any observable with an unknown temperature is cold.

In the simplest case, an observable may be permanently hot or permanently cold, though some observables change their temperature over time so that different observers may get different temperatures; therefore:

The temperature of an observable is relative to the time of subscription.

Temperature is also relative to the kind of subscription side effect; however:

In general, the terms hot and cold cover all kinds of subscription side effects, including notifications.

Replay, PublishLast, and Publish overloads with an initialValue parameter return observables that, once connected, are cold for observers that have missed notifications yet hot with respect to any other kind of subscription side effect.

We must also consider an observable's computation side effects.  A computation side effect is:

Any effect beyond an observable's subscription and notification mechanisms that occurs for each notification.

Technically subscription side effects are initial computation side effects, although I've separated these concepts so that hot and cold only refer to subscription side effects, thus ensuring that hot is a more useful term.  As a result, the phrase computation side effects can be applied to either hot or cold observables.  I've assigned the following terms:

Passive observables do not cause any significant computation side effects.

Active observables do cause significant computation side effects.

The significance of computation side effects on performance is relative to the operation, query or application.  What is significant in one context may be insignificant in another.

Now I can finally answer the original questions from the top of the post with absolute precision and certainty.

What really makes an observable hot or cold?

Short answer:

An observable's temperature is its propensity to cause subscription side effects.

Long answer:

Hot observables do not cause subscription side effects.

Cold observables may cause subscription side effects.

In addition, there are two related concepts that may be applied to both cold and hot observables:

Passive observables do not cause any significant computation side effects.

Active observables do cause significant computation side effects.

Why should I care?

You need to be aware of whether subscribing to an observable multiple times is going to duplicate side effects; i.e., cold behavior.

When defining a query that references the same observable multiple times, you'll typically want to broadcast notifications among the operators to ensure that they observe the same exact notifications.  In other words, you want a hot observable.  A cold observable may generate different notifications for each observer.  You'll also want to broadcast other subscription side effects; e.g., given a cold observable that makes a web request each time an observer subscribes, you don't want two operators in the same query to make two separate web requests.

Alternatively, sometimes you want duplicate subscription side effects in your queries; e.g., given a cold observable that makes a web request each time an observer subscribes, applying the Retry operator means that you want to keep making web requests until one succeeds.  The very purpose of the Retry operator is to duplicate subscription side effects.  It's perhaps a fairly common mistake to pass a hot observable to Retry and then wonder why Fiddler isn't showing a second web request after the first attempt failed.

Furthermore, you must consider computation side effects to determine whether they may have a noticeable performance impact on your query.  If they do, then you'll probably want to broadcast them as well.

How can I tell if an observable is hot or cold?

There's no way to be sure from an observer's point of view.  Subscription side effects might be directly observable or they might not.

Furthermore, subscription side effects that generate notifications might be indistinguishable from the notifications that would be generated if you assumed that the observable is hot.

The only way to know for sure is to look at the documentation source code.  :)

When in doubt, it's safest to assume that an unknown observable is cold.

Temperature by Operator

(You may want to review Rx operators by category, although it's somewhat outdated as of Rx 2.0).

All of the Rx generator methods return observables that are of course cold by definition: Return, Range, Timer, IntervalGenerate and Create.

There are 4 additional generators that are kind of edge cases: Empty and Throw are cold; Start is cold too, but we'll get back to it later; Never is technically hot, but it doesn't really matter.

In general, it's safe to assume that Rx's built-in conversion operators for .NET events return hot observables; e.g., FromEvent and FromEventPattern.  In reality, there's no guarantee.  For example, you can define a custom .NET event and implement an add method that synchronously invokes each handler that it receives.  That event would convert into a cold observable!  Luckily, events generally don't do strange things like that, so it's safe to assume that they are hot.

You can be sure that Replay, Publish and PublishLast overloads that return IConnectableObservable<T> are hot until you call Connect, though after Connect it's more complicated.  Publish overloads without an initialValue parameter are always hotReplay, PublishLast, and Publish overloads with an initialValue parameter, become cold when Connect is called since they introduce a race condition.

Each of those operators also have overloads that accept a selector function.  The selector overloads return cold observables, though the observable passed to the selector function is similar to above.  When the selector is called, the observable passed in is hot, but when the selector returns, the observable passed in is automatically connected and starts to behave as described previously.

FromAsyncPattern and ToAsync return functions that return observables similar to PublishLast, but connected when the function is invoked.  It's safest to assume that the observable is cold, just like PublishLast.  You can think of the function as a parameterized connectable since it acts like the Connect method of IConnectableObservable<T>.

Start is similar to ToAsync and PublishLast, though it's not connectable, thus it's safest to assume that it returns a cold observable.

Task and Task<T> are themselves hot, although the ToObservable conversion returns an observable similar to PublishLast, thus it's safest to assume that ToObservable returns a cold observable.

Defer returns a cold observable, though actually it depends on whether it causes subscription side effects or not.  Typically the only reason that you wouldn't know whether it causes subscription side effects is when you use it to defer an unknown function that returns an observable.  The unknown function might not cause any invocation side effects and may return an observable that doesn't cause any subscription side effects, thus Defer may simply be passing through a hot observable.  That behavior indicates either an edge case or improper usage of the operator.

StartWith, Sample and Using return cold observables.  They all cause subscription side effects.

All of the remaining operators inherit temperatures from their source observables.  Well, maybe not all remaining operators – I may have missed a couple when I reviewed them for this post ;)

Select, Do, Aggregate and Scan may cause significant computation side effects.  (See above for a description of computation side effects.)

Technically, any operator with a delegate parameter that's invoked for some or all notifications may cause significant computation side effects.  There are many more operators like this than what I've listed above.

Remotable causes significant computation side effects, assuming that you consider transmitting notifications through the .NET remoting infrastructure to be significant in terms of its performance implications.

How can I change the temperature of an observable?

Subscription side effects:

Simple Conversions

Publish is useful for broadcasting subscription side effects.  It makes a cold observable hot.

Defer is useful for adding subscription side effects and for converting invocation side effects into subscription side effects.  It makes a hot observable or observable-returning function cold.

Operators such as Merge and Concat, among many other combining/zipping/joining operators, can add subscription side effects to hot observables by combining them with cold observables, thus returning cold observables.

Complex Conversions

Using a Subject as an observer makes a cold observable hot, though it's not a good reason to use subjects.  See my previous post on subjects for details.

Replay, PublishLast, and Publish overloads with an initialValue parameter, are useful for broadcasting subscription side effects while also replaying notifications to observers that missed them.  These operators make cold observables hot in terms of all subscription side effects except for missed notifications; therefore, they remain cold in regard to missed notifications.

Computation side effects:

Publish is useful for broadcasting computation side effects.  Whether or not you want to broadcast them depends upon their significance with respect to resource usage and/or performance and how they compare to your operation, query or application.

Operators such as Select and Do, among many others, are useful for adding computation side effects.

 


  1. Not in a mathematical or academically formal sense.  Posting the idea to my blog allows me to focus on rationalizing it and elicits proper feedback.  Though if anybody can explain it in a mathematically formal sense, then by all means please describe it in the comments.
  2. Serial behavior is described in §§4.2, 6.7 of the Rx Design Guidelines document.  Note that FromEventPattern does not enforce this contract, so it's either the raiser's responsibility to ensure that it does not raise the event concurrently or it's an observer's responsibility to apply the Synchronize operator.
  3. Obviously not in any kind of official capacity.  I just need a way to reference these new concepts now.  I'd be happy to adopt better or formal naming conventions if they already exist or will emerge in the future.
  4. Publish, PublishLast and Replay also have convenient overloads accepting a selector function that acts as a scoped query in which the original observable is hot, thus allowing callers to avoid having to deal with connection themselves.  The temperature of an observable returned by these overloads is cold because it publishes and connects as a subscription side effect.
  5. Perhaps it would be nice if Rx's implementation of Publish could somehow detect whether an observable is either Subject<T> or it's hidden by the AsObservable operator, in order to avoid introducing yet another subject into the query unnecessarily.  But that would be an internal optimization and it doesn't change the fact that publishing a hot observable probably has no noticeable performance impact in most programs.
  6. I've made all of these terms up myself.  I'll gladly replace them with formal terms if someone provides them to me in the comments.

Tags:

.NET | Rx

Comments (2) -

Nick Nelson United States
11/22/2013 9:33:26 AM #

Wow!  I just came across this post today and I just wanted to say this is the most comprehensive explanation of hot and cold observables I have seen.  Thank you very much for writing this.

Reply

Nick
10/19/2015 11:49:06 PM #

Side effects is the devilry that makes Rx so difficult in inherently imperative language. Great article! I spent 6 or 7 hours reading and re-reading this and the "Subject or not to Subject" and finally feel that my Rx knowledge started to crystallize.

Reply

Pingbacks and trackbacks (1)+

Add comment