May 05, 2012

TCP Qbservable Provider Security

UPDATE: The prototype TCP Qbservable Provider library that is described in this article has become the Qactive project on GitHub, including many bug fixes and improvements.

This is the second post in a series of posts on my TCP Qbservable Provider library, which is a proof-of-concept library that enables easy hosting of queryable observable services over TCP.  In this post, we dive into security.

There are two kinds of security that I believe must be fully supported by any IQbservable Provider to be a viable option for hosting public web services that can accept queries from anonymous clients:

  1. Semantic security
  2. Resource security

I’ve recently checked into source control new security features that address these issues.  More work needs to be done before the next release, but it’s a good start.  I’ve also included new examples in the example applications: SandboxedService, LimitedService and MaliciousClient.  These new examples are explained throughout this post.

Note that the new restricted security model is enabled by default for all hosted queries.  You must opt-out of the default security model by setting specific properties and by using a specific API to host your service.  The details are explained within this post.

Semantic Security

Semantic security is about limiting the kinds of queries that a service will execute to a minimal set that makes sense for the kind of service being offered.  It can be broken down further into two subcategories:

  1. Operators and Methods
  2. Types of Expressions

Operators and Methods

Semantic security limits the kinds of operators and methods that the service permits.  Clients can technically send any kind of operators that they want – you can’t prevent anonymous clients from sending strange (or even malicious) queries; however, the service must reject queries that don’t make sense based on the operators used within the queries to prevent the server from executing code that is unrelated to or just plain wrong for the service being offered.

For example, imagine that you’ve written a service that pushes news to clients: IQbservable<Article>.  Without semantic security, the service will happily accept any kind of query from clients.  Thus clients may execute operations such as windowing, buffering, aggregation, creating timers, creating ranges, zipping, joining, filtering, projecting, among others.  That’s a lot a functionality to offer for a news service, though perhaps it makes sense for your particular implementation; e.g., it might be nice to allow a client to query for articles of two different authors joined by coincidence over a particular duration.  However, does it make sense for a client to submit a query that uses the Range operator to generate a sequence of values from 1 to 10, for each article, and then project each value without the article?  Probably not.  A client could just as easily query for all articles unconditionally and then generate a range from 1 to 10 locally, without having to burden the server with such a trivial request.  Perhaps it would be better for this service to disallow use of the Range operator.

In fact, it may be better to limit queries to just simple filtering and projection, at first.  Clients could filter news articles and select specific properties in which they’re interested.  This may provide enough functionality to support common queries and yet still provides some useful optimizations.  It also keeps your service fair and responsive by doing less work per query, assuming that clients don’t generally need any other functionality.  After going live with your service you could accept feature requests for other kinds of query operators based on user feedback, consider whether the additional semantics make sense for your particular service, and then enable those operators.  You should also consider whether your server can handle the additional load of more complex queries – the next section, Resource Security, addresses that topic.

The first release of TCP Qbservable Provider (1.0 Alpha) already supports semantic security by allowing you to host queries from your own IQbservableProvider by calling the ServeTcp extension method; however, to limit query operators you must manually create a custom ExpressionVisitor, which can be somewhat difficult to do.  So to make things easier, a new feature (recently checked into source control) of the TCP Qbservable Provider library enables easy whitelisting of operators so that you don’t have to create your own IQbservableProvider or ExpressionVistor.

To use this feature, simply create an empty instance of ServiceEvaluationContext and call the AddKnownOperators method to whitelist operators that your service must support.  Then assign this context to a QbservableServiceOptions object that is passed to the ServeQbservableTcp extension method, which is responsible for hosting your observable over TCP.

var context = new ServiceEvaluationContext(
    Enumerable.Empty<MethodInfo>(),
    includeSafeOperators: false,
    includeConcurrencyOperators: false);

context.AddKnownOperators("Where");
context.AddKnownOperators("Select");

var service = source.ServeQbservableTcp(
    endPoint,
    new QbservableServiceOptions()
    {
        EvaluationContext = context
    });

With the above configuration, client queries using any methods other than the Where and Select operators are automatically rejected by the server.  To see rejection in action, take a look at the MaliciousClient and LimitedService examples checked into source control.

Types of Expressions

The ExpressionOptions property of the QbservableServiceOptions object provides lower-level control over the types of expressions that are permitted.  By default, only basic expressions are permitted.  Expressions such as assignments, blocks, loops, constructor calls and others are rejected by the service.  The default option is recommended for public services, though you can assign this property to any combination of flags to allow additional expression types when needed.

Opting Out of Semantic Security

To disable the semantic security model entirely, set the AllowExpressionsUnrestricted property of the QbservableServiceOptions object to true.  By doing so, the provider will ignore the ServiceEvaluationContext object and the ExpressionOptions property and simply allow any expression and operator to be used within a query.  Of course, this is only recommended if you fully trust all clients, which is not the case in public scenarios.

Resource Security

Resource security is about preventing unauthorized code from accessing system resources, such as memory, CPU, the file system and networking components.  It can be broken down further into two subcategories:

  1. API Security
  2. Algorithmic Security

API Security

The FCL offers many APIs that malicious clients could abuse, such as System.AppDomain, System.Environment, System.IO.*, System.Reflection.*, System.Security.*, System.Windows.*, etc.  In order to safely host a service that can execute arbitrary code from clients, the service would have to prevent unauthorized access to certain APIs.  The whitelisting approach used by the semantic security model described previously is one way to solve this problem, though the mechanism that is generally used to prevent unauthorized access to APIs in the .NET Framework is Code Access Security (CAS).  CAS offers a way to restrict APIs through permissions.

Whitelisting has an advantage over CAS.  CAS allows certain APIs to execute that don’t make sense in a server environment, simply because they don’t demand any permissions; e.g., System.Console.WriteLine.  In order to entirely prevent use of System.Console within queries, whitelisting must be used.

However, there are advantages to using CAS over whitelisting.  For one thing, the whitelisting approach I’ve implemented only checks methods and operators, while the CAS approach applies to any operator, method, constructor, event or property.  Furthermore, the CAS approach applies generally to the entire .NET Framework and third-party assemblies, whereas the process of manually whitelisting APIs may inadvertently expose clients to a security vulnerability hidden deep within some seemingly benign API.  In other words, CAS frees you from the burden of having to ensure that your chosen whitelisted operators and methods don’t inadvertently permit clients to format your C: drive.

Therefore, using sandboxing together with whitelisting provides the strongest security model for hosting services publicly by limiting the APIs that services will permit within anonymous client queries to the set of APIs that are resource-safe and semantically acceptable for a given service.

To incorporate CAS security into the TCP Qbservable Provider library, I’ve added new overloads for the QbservableTcpServer.CreateService method.  You may already be familiar with the CreateService method, since it’s also the API that enables you to host a service accepting an argument, as described in my first blog post in this series.  The new overloads begin with an AppDomainSetup parameter.  They host services within a sandboxed AppDomain that applies minimal CAS permissions by default, though it’s easily configurable by passing in a PermissionSet object to one of the overloads.  By default, sandboxed services will only permit queries to execute code that runs within a transparent security context, denying access to any API that demands additional permissions.  In other words, basic code can be executed within queries, but any APIs that demand additional permissions are rejected by throwing a SecurityException.

To host your service in a sandbox with minimal permissions, call the QbservableTcpServer.CreateService method and pass in an AppDomainSetup object as the first argument.  You must assign its ApplicationBase property to the base path of your sandboxed AppDomain, from which dependent assemblies will be loaded.  The .NET Framework guidelines recommend specifying a different path than the application’s actual base path, for additional security.  In the example below, I’ve assigned the sandbox’s base path to the application’s sandbox_bin subfolder, which would contain only the assemblies that the sandbox requires to execute the host service.

Unfortunately, that’s not all you must do.  The service factory argument cannot be a lambda because it’s invoked within the AppDomain and, therefore, must be serializable, though the compiler does not ensure that lambdas’ closures are serializable.  You may be able to workaround this if a closure isn’t created by referencing a method on a MarshalByRefObject, but that may actually defeat the purpose of the sandbox because ultimately client queries would be executed outside of the sandbox.  Instead, you should use a static factory method to create your service, as shown in the example below.

At the moment, due to type-inference issues, you must explicitly wrap the call to the factory method within an explicit delegate, though I plan on addressing that problem in the future, if possible.

var appBase = Path.GetDirectoryName(new Uri(Assembly.GetEntryAssembly().CodeBase).LocalPath);

var newAppBase = Path.Combine(appBase, "sandbox_bin");

var service = QbservableTcpServer.CreateService<object, int>(
    new AppDomainSetup() { ApplicationBase = newAppBase },
    endPoint,
    new Func<IObservable<object>, IObservable<int>>(CreateServiceObservable));
...
public static IObservable<int> CreateServiceObservable(IObservable<object> request)
{
    return Observable.Range(1, 3);
}

With the above configuration, client queries using APIs that require any permissions other than security-transparent execution are automatically rejected by the server.  To see rejection in action, take a look at the MaliciousClient and SandboxedService examples checked into source control.

Keep in mind that your service callbacks, including the CreateServiceObservable method shown above, are executed with the reduced permission set of the sandboxed AppDomain; however, you can assert additional permissions from within your callbacks if needed.  Additional permissions may be needed when creating the service observable or executing observation side-effects, as shown in the following example.

var service = QbservableTcpServer.CreateService<object, int>(
    new AppDomainSetup() { ApplicationBase = newAppBase },
    endPoint,
    new Func<IObservable<object>, IObservable<int>>(CreateServiceObservable));

service.Subscribe(terminatedClient =>
    {
        new PermissionSet(PermissionState.Unrestricted).Assert();

        try
        {
            Log.Trace("Client shutdown: " + terminatedClient.Reason);
        }
        finally
        {
            PermissionSet.RevertAssert();
        }
    });

In the example above, the hypothetical Log.Trace method demands full trust to execute.  Since the callback is executing under the permission set granted to the sandboxed service, the demand will fail unless the required permissions are asserted, even across an AppDomain boundary.

The reason why full trust can be asserted here but not by clients’ queries is because callbacks are executed by one of your own assemblies while queries are executed within security transparent dynamic assemblies, which cannot assert permissions beyond the set granted to the sandbox.

The entry point assembly is automatically granted full trust permissions by the CreateService method and, presumably in the above example, the assembly containing the callback code is the application assembly, so it’s able to assert full trust.  Though if it’s not the application assembly, then you can specify additional full-trust assemblies by calling a specific overload of the CreateService method that accepts an array of Assembly objects.

Note that to assert full trust, assemblies must be strong-named.

Algorithmic Security

Algorithmic security is about restricting the types of expressions that the service permits based on their effect on system resources.

For example, returning to our IQbservable<Article> news service, let’s assume that we’re executing our service in a sandbox and we’ve already applied limited semantic security to prevent most operators from being executed, with the exception of a few that we think will be useful to clients for querying news articles:  Where, Select, Aggregate, Scan, GroupBy, Join, Timer, Interval and Sample.  We’re also allowing primitive types, List<T> and arrays to be created within clients’ queries.

Is our news service safe?

No, it’s not safe.  We wouldn’t want the service executing a client’s query if it contained expressions that performed any of the following actions, listed in an approximately-progressively-worsening order:

  • Performs 100 operations in a single query.
  • Creates 500 joins.
  • Allocates a 10MB array for every article containing the individual words within that article.
  • Aggregates lists of 10K articles containing the entire contents of each article.
  • Samples at a very short duration causing frequent output.
  • Starts a high-rate timer causing frequent output.
  • Starts 100K low-rate timers.
  • Allocates a single 2GB array.
  • Creates a single 10GB string.
  • Uses the GroupBy operator in such a way that it never frees memory.

These kinds of queries negatively impact CPU and memory, so their use must be restricted somehow.  To do that, we must first identify how system resources can be effected by queries, then determine how each particular API can be used within queries to effect system resources, and finally design a model for constraining each particular API, or otherwise prevent them entirely.

Generally, the issues with the example queries above are as follows:

  1. Unrestricted quantities of permitted operators.
  2. Unrestricted combinatorial use of operators.
  3. Unrestricted operator arguments.
  4. Unrestricted allocation of objects, including primitive types, collections and arrays.
    1. Unrestricted magnitude.
    2. Unrestricted quantity.
  5. Unthrottled output.
  6. Unthrottled input.  (Applies to duplex queries only; examples are not provided above.)

It seems that some of these issues are much easier to detect than others, though all of them need to be addressed to have a truly secure service model for publicly hosting queryable observables.

In general, when any potentially unsafe expressions are found within a client’s query, they must be evaluated to determine whether their particular usage is permitted and under what constraints.  You may notice some overlap with the concept of semantic security described previously.  The difference is that semantic security is only about semantics; i.e., whether the intent of a query through the expressions it uses is compatible with the intent of the service, whereas algorithmic security is about constraining the effects particular expressions have on system resources.

In order to constrain algorithmically-unsafe expressions, we must go beyond simple whitelisting and provide additional options for configuring the limits of particular expressions on a per-operation and per-type basis.

Currently, the TCP Qbservable Provider library does not offer any solutions for algorithmic security, though it’s something that I’m considering for a future release.  I plan to add configurable options such as the following:

  1. Maximum quantities for individually whitelisted operators and methods, with reasonable defaults.
  2. Maximum quantities for categorized operators and methods, with reasonable defaults; e.g.,
    1. Permit only 1 usage of concurrency-introducing operators.
    2. Permit only 1 usage of timer operators.
    3. Permit only 1 usage of buffering operators.
  3. Constraints on particular operator arguments, with reasonable defaults; e.g.,
    1. Interval:  30 seconds <= period <= 12 hours
    2. Range:  0 <= count <= 10
    3. Buffer:  2 <= length <= 100
  4. Maximum quantity and capacity of new collections and arrays, with reasonable defaults; e.g., new object[N], where N <= 10
  5. Maximum quantity and size of strings.
  6. Maximum size of incoming serialized expression trees, in bytes.
  7. Maximum size of serialized query payloads, in bytes, with separate settings for output notifications and duplex input notifications.
  8. Input/output throttling, with a default duration of 30 seconds.

In the meantime, I’ve configured the default semantic security whitelist to exclude all potentially unsafe operators, to disallow array allocations and to disallow explicit calls to constructors.  Note that your service is still capable of creating unbounded objects, including arrays and collections, though queries cannot create non-primitive objects themselves.  Assignments and void-returning methods are also disallowed by default, so all objects and collections are immutable.

However, the current security model isn’t an exhaustive solution.  It doesn’t prevent clients from using clever algorithms to cause stack overflows and out of memory exceptions.  It doesn’t even, for example, prevent clients from easily creating a 2GB string.

Conclusion

There’s more work to be done on the security model to safely host queryable observable services publicly.  Even with whitelisting and sandboxing, the TCP Qbservable Provider is easily vulnerable to simple attacks that could bring down an entire process.

The current security model should, however, prevent clients from directly damaging your system.  Direct damage is prevented by using the sandboxing APIs to create your service, which rejects queries that demand unsafe API permissions.  You should also use semantically-restricted expressions, which are enabled by default, to reject queries attempting to use operators and methods that aren’t permitted by your service’s whitelist.

Tags: ,

IQbservable | Rx | Rxx

Add comment