Friday, December 5, 2008

More LINQ Challenges

One of the craziest things to manipulate in a custom LINQ provider is the expression tree.

As I discussed in a prior post, this sample LINQ statement:

var query = from x in lq
            where x == "4"
            select x.Clone();


becomes an expression that basically is this function:

lq.Where(x => x == "4").Select(x => x.Clone());

(Before you say it, yes, that query is a bit weird; it's for illustrative purposes only and probably doesn't make a lot of sense...)

The function itself isn't rocket science, but it's constructed using MethodCallExpression objects, which actually kind of read from right to left. That is, in the above example, the main expression you receive when you execute the query will start with the Select method. Its second parameter will be a LambdaExpression (wrapped in a quoting UnaryExpression) containing this:

x => x.Clone()

and the first parameter will be another MethodCallExpression containing this:

lq.Where(x => x == "4")

... and that MethodCallExpression will have the second parameter set to this:

x => x == "4"

and the first parameter set to a constant value representing this:

lq

... in other words, right-to-left.
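To see the right-to-left shape for yourself, here is a minimal sketch that walks the chain from the outermost call down to the root. It assumes lq is just an in-memory array wrapped with AsQueryable() (a stand-in for a real provider), and DescribeChain is a hypothetical helper name, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ChainDemo
{
    // Hypothetical helper: lists each method in the chain, outermost first,
    // then the node type of whatever sits at the root.
    public static List<string> DescribeChain(Expression expression)
    {
        var steps = new List<string>();
        Expression current = expression;

        // For Queryable extension methods, Arguments[0] is the source sequence.
        while (current is MethodCallExpression call && call.Arguments.Count > 0)
        {
            steps.Add(call.Method.Name);
            current = call.Arguments[0];
        }

        steps.Add(current.NodeType.ToString());
        return steps;
    }

    public static void Main()
    {
        // Assumption: an in-memory source stands in for the custom provider.
        IQueryable<string> lq = new[] { "3", "4", "5" }.AsQueryable();

        var query = from x in lq
                    where x == "4"
                    select x.Clone();

        Console.WriteLine(string.Join(" <- ", DescribeChain(query.Expression)));
    }
}
```

With the in-memory source this should list Select first, then Where, then the Constant node at the root, which is the same order described above.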

One of the first questions I had was how the compiler and the expression tree deal with embedded functions, such as the Clone() call above, and how those functions are included in the tree alongside the IQueryable functions like Select() and Where().
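One way to poke at that question is to dig the lambda out of the Select() call and look at its body. The sketch below assumes an in-memory source in place of a real provider; UnwrapLambda is a hypothetical helper that peels off the Quote node the Queryable methods put around their lambda arguments.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class EmbeddedCallDemo
{
    // Hypothetical helper: Queryable methods wrap lambda arguments in a Quote
    // node, so peel that off to reach the LambdaExpression itself.
    public static LambdaExpression UnwrapLambda(Expression argument)
    {
        while (argument.NodeType == ExpressionType.Quote)
        {
            argument = ((UnaryExpression)argument).Operand;
        }
        return (LambdaExpression)argument;
    }

    public static void Main()
    {
        // Assumption: an in-memory source stands in for the custom provider.
        IQueryable<string> lq = new[] { "3", "4", "5" }.AsQueryable();
        var query = lq.Where(x => x == "4").Select(x => x.Clone());

        var selectCall = (MethodCallExpression)query.Expression;
        LambdaExpression selector = UnwrapLambda(selectCall.Arguments[1]);

        // The body of x => x.Clone() is itself a MethodCallExpression.
        var cloneCall = (MethodCallExpression)selector.Body;
        Console.WriteLine(selectCall.Method.DeclaringType); // where Select() lives
        Console.WriteLine(cloneCall.Method.DeclaringType);  // where Clone() lives
        Console.WriteLine(cloneCall.Object.NodeType);       // what Clone() is called on
    }
}
```

So both Select() and Clone() show up as MethodCallExpression nodes. The difference is where they sit: Select() is a static method on Queryable at the top of the chain, while Clone() is an instance call on the lambda's parameter, inside the lambda body.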

If you also recall from last time, .NET calls CreateQuery() repeatedly, once for every line of the LINQ syntax query, to build the expression (at least in my example). It also expects that each expression returned will have a type of IQueryable<T>, where T is the appropriate element type (in the example above, string for the Where() step). If this behavior is a key part of normal LINQ behavior, you can then assume your query will be a chain of MethodCallExpressions, with all embedded functions in the query implemented as LambdaExpressions in the second parameter of each MethodCallExpression. You could therefore create an expression tree visitor that makes a few assumptions:
  1. The node you receive to execute will always be a MethodCallExpression (probably one that references Queryable.Select()).
  2. The first parameter will always chain through further MethodCallExpressions down to the root node, which (if you follow the advice of this and other postings on the topic) will be a ConstantExpression representing the original IQueryable object.
  3. Any other functions that need to be resolved will likely be in a LambdaExpression in the second parameter of a MethodCallExpression.
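The three assumptions above can be sketched as a small visitor. This is only a rough illustration, not a finished provider: it assumes the public ExpressionVisitor base class that ships with .NET 4 and later, uses an in-memory source in place of a real provider, and CallClassifier is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical visitor; assumes the public ExpressionVisitor from .NET 4 or later.
public sealed class CallClassifier : ExpressionVisitor
{
    public List<string> QueryOperators { get; } = new List<string>();
    public List<string> EmbeddedCalls { get; } = new List<string>();
    public IQueryable Root { get; private set; }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType == typeof(Queryable))
            QueryOperators.Add(node.Method.Name);       // Select(), Where(), ...
        else
            EmbeddedCalls.Add(node.Method.DeclaringType?.Name + "." + node.Method.Name);

        // Keep walking: the source chain and the lambdas are both arguments.
        return base.VisitMethodCall(node);
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // Assumption 2: the root is a constant holding the original IQueryable.
        if (Root == null && node.Value is IQueryable queryable)
            Root = queryable;
        return base.VisitConstant(node);
    }
}

public static class CallClassifierDemo
{
    public static void Main()
    {
        IQueryable<string> lq = new[] { "3", "4", "5" }.AsQueryable();
        var query = lq.Where(x => x == "4").Select(x => x.Clone());

        var classifier = new CallClassifier();
        classifier.Visit(query.Expression);

        Console.WriteLine("Operators: " + string.Join(", ", classifier.QueryOperators));
        Console.WriteLine("Embedded:  " + string.Join(", ", classifier.EmbeddedCalls));
        Console.WriteLine("Root found: " + ReferenceEquals(classifier.Root, lq));
    }
}
```

Sorting calls by whether they are declared on Queryable is just one simple way to tell the query operators from the embedded functions; a real provider would probably want something stricter.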
Now, you might get a tree that violates these assumptions, and navigating to the root node may be tricky. However, if you can find the root node and at least partially resolve the tree, you can take the unresolved nodes and hopefully execute them client-side, which should work in some cases. (MSDN has a weird how-to on this under "How to: Execute an Expression Tree". I use the offline MSDN, so if you want a link to that, search for the title.)
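As a rough idea of what executing a node client-side can look like, the sketch below wraps a sub-expression in a parameterless lambda, compiles it, and invokes it. Evaluate is a hypothetical helper, and it only works for nodes that don't depend on a lambda parameter such as x.

```csharp
using System;
using System.Linq.Expressions;

public static class ClientSideEval
{
    // Hypothetical helper: wraps a sub-expression in a parameterless lambda,
    // compiles it, and runs it locally. Fails if the node uses a lambda
    // parameter such as x, since nothing supplies a value for it here.
    public static object Evaluate(Expression node)
    {
        LambdaExpression lambda = Expression.Lambda(node);
        return lambda.Compile().DynamicInvoke();
    }

    public static void Main()
    {
        Expression<Func<string, bool>> predicate = x => x == (2 + 2).ToString();

        // The right-hand side is a method call that a SQL translator might not
        // understand, but it does not depend on x, so it can be run up front.
        Expression rightSide = ((BinaryExpression)predicate.Body).Right;
        Console.WriteLine(Evaluate(rightSide));
    }
}
```

The result could then be swapped back into the tree as a ConstantExpression before the rest of the query is translated.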

Lining up query parameters with input parameters and dealing with Select() transformations is another story I hope to build up to as well. As a more complex query builds and builds, the datatype can evolve (for example, a Select() that transforms one record type into another).
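To make the evolving datatype a little more concrete, here is a small sketch that reports the element type at each step of a chain. It assumes an in-memory source and a Select() that turns strings into ints; ElementTypes is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ElementTypeDemo
{
    // Hypothetical helper: reports the element type T of each IQueryable<T>
    // step in the chain, outermost call first.
    public static List<Type> ElementTypes(Expression expression)
    {
        var types = new List<Type>();
        while (expression is MethodCallExpression call && call.Arguments.Count > 0)
        {
            // Where()/Select() return IQueryable<T>, so T is the one generic argument.
            if (call.Type.IsGenericType)
                types.Add(call.Type.GetGenericArguments()[0]);
            expression = call.Arguments[0];
        }
        return types;
    }

    public static void Main()
    {
        // Assumption: an in-memory source stands in for the custom provider.
        IQueryable<string> lq = new[] { "3", "4", "5" }.AsQueryable();

        // Select() here turns each string into an int, so T changes mid-chain.
        var query = lq.Where(x => x == "4").Select(x => x.Length);

        foreach (Type t in ElementTypes(query.Expression))
            Console.WriteLine(t);
    }
}
```

Reading outermost first, the Select() step reports int while the Where() step underneath it still reports string, so a provider has to track the type per step rather than per query.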

I'm getting closer to being able to build a generic SQL LINQ layer that will be as simple as possible to use for other purposes, such as to front-end an old database engine that I worked on years ago under the super-secret code name "Juggernaut" (it's similar in nature to Microsoft Velocity and a few other products out there, although it's slightly different in focus).
