DigitallyCreated
Blog

Only showing posts tagged with "LINQ"

Performing Date & Time Arithmetic Queries using Entity Framework v1

If one is writing legacy code in .NET 3.5 SP1’s Entity Framework v1 (yes, your brand new code has now been put into the legacy category by .NET 4 :( ), one will find a severe lack of any date & time arithmetic ability. If you look at LINQ to Entities, you will see no date & time arithmetic methods are recognised. And if you look at the canonical functions that Entity SQL provides, you will notice a severe lack of any date & time arithmetic functions. Thankfully, this strangely missing functionality in canonical ESQL has been resolved as of .NET 4; however, that doesn’t help those of us who haven’t yet upgraded to Visual Studio 2010! Thankfully, all is not lost: there is a solution that only sucks a little bit.

Date & time arithmetic is a pretty necessary ability. Imagine the following simplified scenario, where one has a “Contract” entity that defines a business contract that starts at a certain date and lasts for a duration:

Contract Entity Diagram

Imagine that you need to perform a query that finds all Contracts that were active on a specific date. You may think of performing a LINQ to Entities query that looks like this:

IQueryable<Contract> contracts = 
    from contract in Contracts
    where activeOnDate >= contract.StartDate &&
          activeOnDate < contract.StartDate.AddMonths(contract.Duration)
    select contract

However, this is not supported by LINQ to Entities and you’ll get a NotSupportedException. With canonical ESQL missing any defined date & time arithmetic functions, it seems we are left without a recourse.

However, if one digs around in MSDN, one may stumble across the fact that the SQL Server Entity Framework provider defines some provider-specific ESQL functions with which one can use to do date & time arithmetic. So we can write an ESQL query to get the functionality we wish:

const string eSqlQuery = @"SELECT VALUE c FROM Contracts AS c
                           WHERE @activeOnDate >= c.StartDate
                           AND @activeOnDate < SqlServer.DATEADD('months', c.Duration, c.StartDate)";

ObjectQuery<Contract> contracts = 
    context.CreateQuery<Contract>(eSqlQuery, new ObjectParameter("activeOnDate", activeOnDate));

Note the use of the SqlServer.DATEADD ESQL function. The “SqlServer” bit is specifying the SQL Server provider’s specific namespace. Obviously this approach has some disadvantages: it’s not as nice as using LINQ to Entities, and it’s also not database provider agnostic (so if you’re using Oracle you’ll need to see whether your Oracle provider has something like this). However, the alternatives are either to write SQL manually (negating the usefulness of Entity Framework) or to download all the entities into local memory and use LINQ to Objects (almost never, ever an acceptable option!).

Notice that using the CreateQuery function to create the query from ESQL returned an ObjectQuery<T> (which implements IQueryable<T>)? This means that you can now use LINQ to Entities to change that query. For example, what if we wanted to perform some further filtering that can be customised by the user (the user’s filtering preferences are set in the ContractFilterSettings class in the below example):

private ObjectQuery<Contract> CreateFilterSettingsAsQueryable(ContractFilterSettings filterSettings, MyEntitiesContext context)
{
    IQueryable<Contract> query = context.Contracts;

    if (filterSettings.ActiveOnDate != null)
    {
        const string eSqlQuery = @"SELECT VALUE c FROM Contracts AS c
                                   WHERE @activeOnDate >= c.StartDate
                                   AND @activeOnDate < SqlServer.DATEADD('months', c.Duration, c.StartDate)";
        query = context.CreateQuery<Contract>(eSqlQuery, new ObjectParameter("activeOnDate", filterSettings.ActiveOnDate));
    }
    if (filterSettings.SiteId != null)
        query = query.Where(c => c.SiteID == filterSettings.SiteId);
    if (filterSettings.ContractNumber != null)
        query = query.Where(c => c.ContractNumber == filterSettings.ContractNumber);
    
    return (ObjectQuery<Contract>)query;
}

Being able to continue to use LINQ to Entities after creating the initial query in ESQL means that you can have the best of both worlds (forgetting the fact that we ought to be able to do date & time arithmetic in LINQ to Entities in the first place!). Note that although you can start a query in ESQL and then add to it with LINQ to Entities, you cannot start a query in LINQ to Entities and add to it with ESQL, hence why the ActiveOnDate filter setting was done first.

For those who use .NET 4, unfortunately you’re still stuck with being unable to do date & time arithmetic with LINQ to Entities by default (Alex James told me on Twitter a while back that this was purely because of time constraints, not because they are evil or something :) ). However, you get canonical ESQL functions that are provider independent (so instead of using SqlServer.DATEADD above, you’d use AddMonths). If you really want to use LINQ to Entities (which is not surprising at all), you can create ESQL snippets in your model (“Model Defined Functions”) and then create annotated method stubs which Entity Framework will recognise when you use them inside a LINQ to Entities query expression tree. For more information see this post on the Entity Framework Design blog, and this MSDN article.

EDIT: Diego Vega kindly pointed out to me on Twitter that, in fact, Entity Framework 4 comes with some methods you can mix into your LINQ to Entities queries to call on EF v4’s new canonical date & time ESQL functions. Those methods (as well as a bunch of others) can be found in the EntityFunctions class as static methods. Thanks Diego!

DigitallyCreated Utilities v1.0.0 Released

After a hell of a lot of work, I am happy to announce that the 1.0.0 version of DigitallyCreated Utilities has been released! DigitallyCreated Utilities is a collection of many neat reusable utilities for lots of different .NET technologies that I’ve developed over time and personally use on this website, as well as on others I have a hand in developing. It’s a fully open source project, licenced under the Ms-PL licence, which means you can pretty much use it wherever you want and do whatever you want to it. No viral licences here.

The main reason that it has taken me so long to release this version is because I’ve been working hard to get all the wiki documentation on CodePlex up to scratch. After all, two of the project values are:

  • To provide fully XML-documented source code
  • To back up the source code documentation with useful tutorial articles that help developers use this project

And truly, nothing is more frustrating than code with bad documentation. To me, bad documentation is the lack of a unifying tutorial that shows the functionality in action, and the lack of decent XML documentation on the code. Sorry, XMLdoc that’s autogenerated by tools like GhostDoc, and never added to by the author, just doesn’t cut it. If you can auto-generate the documentation from the method and parameter names, it’s obviously not providing any extra value above and beyond what was already there without it!

So what does DCU v1.0.0 bring to the table? A hell of a lot actually, though you may not need all of it for every project. Here’s the feature list grouped by broad technology area:

  • ASP.NET MVC and LINQ
    • Sorting and paging of data in a table made easy by HtmlHelpers and LINQ extensions (see tutorial)
  • ASP.NET MVC
    • HtmlHelpers
      • TempInfoBox - A temporary "action performed" box that displays to the user for 5 seconds then fades out (see tutorial)
      • CollapsibleFieldset - A fieldset that collapses and expands when you click the legend (see tutorial)
      • Gravatar - Renders an img tag for a Gravatar (see tutorial)
      • CheckboxStandard & BoolBinder - Renders a normal checkbox without MVC's normal hidden field (see tutorial)
      • EncodeAndInsertBrsAndLinks - Great for the display of user input, this will insert <br/>s for newlines and <a> tags for URLs and escape all HTML (see tutorial)
    • IncomingRequestRouteConstraint - Great for supporting old permalink URLs using ASP.NET routing (see tutorial)
    • Improved JsonResult - Replaces ASP.NET MVC's JsonResult with one that lets you specify JavaScriptConverters (see tutorial)
    • Permanently Redirect ActionResults - Redirect users with 301 (Moved Permanently) HTTP status codes (see tutorial)
    • Miscellaneous Route Helpers - For example, RouteToCurrentPage (see tutorial)
  • LINQ
    • MatchUp & Federator LINQ methods - Great for doing diffs on sequences (see tutorial)
  • Entity Framework
    • CompiledQueryReplicator - Manage your compiled queries such that you don't accidentally bake in the wrong MergeOption and create a difficult to discover bug (see tutorial)
    • Miscellaneous Entity Framework Utilities - For example, ClearNonScalarProperties and Setting Entity Properties to Modified State (see tutorial)
  • Error Reporting
    • Easily wire up some simple classes and have your application email you detailed exception and error object dumps (see tutorial)
  • Concurrent Programming
    • Semaphore/FifoSemaphore & Mutex/FifoMutex (see tutorial)
    • ReaderWriterLock (see tutorial)
    • ActiveObject - Easily inherit from ActiveObject to separately thread your class (see tutorial)
  • Unity & WCF
    • WCF Client Injection Extension - Easily use dependency injection to transparently inject WCF clients using Unity (see tutorial)
  • Miscellaneous Base Class Library Utilities
    • SafeUsingBlock and DisposableWrapper - Work with IDisposables in an easier fashion and avoid the bug where using blocks can silently swallow exceptions (see tutorial)
    • Time Utilities - For example, TimeSpan To Ago String, TzId -> Windows TimeZoneInfo (see tutorial)
    • Miscellaneous Utilities - Collection Add/RemoveAll, Base64StreamReader, AggregateException (see tutorial)

DCU is split across six different assemblies so that developers can pick and choose the stuff they want and not take unnecessary dependencies if they don’t have to. This means if you don’t use Unity in your application, you don’t need to take a dependency on Unity just to use the Error Reporting functionality.

I’m really pleased about this release as it’s the culmination of rather a lot of work on my part that I think will help other developers write their applications more easily. I’m already using it here on DigitallyCreated in many many places; for example the Error Reporting code tells me when this site crashes (and has been invaluable so far), the CompiledQueryReplicator helps me use compiled queries effectively on the back-end, and the ReaderWriterLock is used behind the scenes for the Twitter feed on the front page.

I hope you enjoy this release and find some use for it in your work or play activities. You can download it here.

ASP.NET MVC Model Binders + RegExs + LINQ == Awesome

I was working on some ASP.NET MVC code today and I created this neat little solution that uses a custom model binder to automatically read in a bunch of dynamically created form fields and project their data into a set of business entities which were returned by the model binder as a parameter to an action method. The model binder isn't particularly complex, but the way I used regular expressions and LINQ to identify and collate the fields I needed to create the list of entities from was really neat and cool.

I wrote a Javascript-powered form (using jQuery) that allowed the user to add and edit multiple items at the same time, and then submit the set of items in a batch to the server for a save. The entity these items represented looks like this (simplified):

public class CostRangeItem
{
    public short VolumeUpperBound { get; set; }
    public decimal Cost { get; set; }
}

The Javascript code, which I won't go into here as it's not particularly cool (just a bit of jQuery magic), creates form items that look like this:

<input id="VolumeUpperBound:0" name="VolumeUpperBound:0" type="text />
<input id="Cost:0" name="Cost:0" type="text" />

The multiple items are handled by adding more form fields and incrementing the number after the ":" in the id/name. So having fields for two items means you end up with this:

<input id="VolumeUpperBound:0" name="VolumeUpperBound:0" type="text />
<input id="Cost:0" name="Cost:0" type="text" />
<input id="VolumeUpperBound:1" name="VolumeUpperBound:1" type="text />
<input id="Cost:1" name="Cost:1" type="text" />

I chose to use the weird ":n" format instead of a more obvious "[n]" format (like arrays) since using "[" is not valid in an ID attribute and you end up with a page that doesn't validate.

I created a model binder class that looked like this:

private class CostRangeItemBinder : IModelBinder
{
    public object BindModel(ControllerContext controllerContext, ModelBindingContext bindingContext)
    {
        ...
    }
}

To use the model binder on an action method, you use the ModelBinderAttribute to specify the type of model binder you want ASP.NET MVC to bind the parameter with:

public ActionResult Create([ModelBinder(typeof(CostRangeItemBinder))] IList<CostRangeItem> costRangeItems)
{
    ...
}

The first thing I did was create two regular expressions so that I could identify which fields I needed to read in (the VolumeUpperBound ones and the Cost ones). I chose to use regular expressions because they're much simpler to write (once you learn them!) than some manual string hacking code, and I could use them to not only identify which fields I needed to read, but also to extract the number after the ":" so I could link up the appropriate VolumeUpperBound field with the appropriate Cost field to create a CostRangeItem object. These are the regular expressions I used:

private static readonly Regex _VolumeUpperBoundRegex = new Regex(@"^VolumeUpperBound:(\d+)$");
						
private static readonly Regex _CostRegex = new Regex(@"^Cost:(\d+)$");

Both regexs work like this: the ^ at the beginning and the $ at the end means that, in the matching string, there cannot be any characters before or after the characters specified in between those symbols. It effectively anchors the match to the beginning and the end of the string. Inside those symbols, each regex matches their field name and the ":" character. They then define a special "group" using the ()s. Using the group allows me to pull out the sub-part of the regex match defined by this group later on. The \d+ means match one or more digits (0-9), which matches the number after the ":".

Note that I've made the Regex objects static readonly members of my binder class. Keeping a single instance of the regex object, which is immutable, saves .NET from having to recompile a new Regex every time I bind, which I believe is a relatively expensive process.

Then, in the BindModel method, I first make sure that the model binder has been used on the correct type in the action method:

if (bindingContext.ModelType.IsAssignableFrom(typeof(List<CostRangeItem>)) == false)
    return null;

I then use LINQ to perform matches against all the fields in the current page submit (via the BindingContext's ValueProvider IDictionary). I do this using a Select() call, which performs the match and puts the results into an anonymous class. I then filter out any fields that didn't match the regex using a Where() call.

var vubMatches = bindingContext.ValueProvider
                    .Select(kvp => new 
                        {
                            Match = _VolumeUpperBoundRegex.Match(kvp.Key), 
                            Value = kvp.Value.AttemptedValue
                        })
                    .Where(o => o.Match.Success);
					
var costMatches = bindingContext.ValueProvider
                    .Select(kvp => new 
                        {
                            Match = _CostRegex.Match(kvp.Key), 
                            Value = kvp.Value.AttemptedValue
                        })
                    .Where(o => o.Match.Success);

Now I need to link up VolumeUpperBound fields with their associated Cost fields. To do this, I do an inner equijoin using LINQ. I join on the result of that regex group that I created that extracts the number from the field name. This means I join VolumeUpperBound:0 with Cost:0, and VolumeUpperBound:1 with Cost:1 and so on.

List<CostRangeItem> costRangeItems = 
        (from vubMatch in vubMatches
         join costMatch in costMatches 
           on vubMatch.Match.Groups[1].Value equals costMatch.Match.Groups[1].Value
         where ValuesAreValid(vubMatch.Value, costMatch.Value, controllerContext)
         select new CostRangeItem
                   {
                       Cost = Decimal.Parse(costMatch.Value),
                       VolumeUpperBound = Int16.Parse(vubMatch.Value)
                   }
        ).ToList();

return costRangeItems;

I then use a where clause to ensure the values submitted are actually of the correct type, since the Value member is actually a string at this point (I pass them into a ValuesAreValid method, which returns a bool). This method marks the ModelState with an error if it finds a problem. I then use Select to create a CostRangeItem per join and copy in the values from the form fields using an object initialiser. I can then finally return my List of CostRangeItem. The ASP.NET MVC framework ensures that this result is passed to the action method that declared that it wanted to use this model binder.

As you can see, the solution ended up being a neat declarative way of writing a model binder that can pull in and bind multiple objects from a dynamically created form. I didn't have to do any heavy lifting at all; regular expressions and LINQ handled all the munging of the data for me! Super cool stuff.

Dynamic Queries in Entity Framework using Expression Trees

Most of the queries you do in your application are probably static queries. The parameters you set on the query probably change, but the actual query itself doesn't. That's why compiled queries are so cool, because you can pre-compile and reuse a query over and over again and just vary the parameters (see my last blog for more information).

But sometimes you might need to construct a query at runtime. By this I mean not just changing the parameter values, but actually changing the query structure. A good example of this would be a filter, where, depending on what the user wants, you dynamically create a query that culls a set down to what the user is looking for. If you've only got a couple of filter options, you can probably get away with writing multiple compiled queries to cover the permutations, but it only takes a few filter options before you've got a lot of permutations and it becomes unmanageable.

A good example of this is file searching. You can filter a list of files by name, type, size, date modified, etc. The user may only want to filter by one of these filters, for example with "Awesome" as the filename. But the user may also want to filter by multiple filters, for example, "Awesome" as the filename, but modified after 2009/07/07 and more than 20MB in size. To create a static query for each permutation would result in 16 queries (4 squared)!

My first foray into creating dynamic queries is a bit less ambitious than the above example, however. I have a scenario where I need to pull out a number of Tag objects from the database by their IDs. However, the number of the Tag objects needed is determined by the user. They may select 3 Tags, or they may select 6 Tags, or they may select 4 tags; it's up to them.

The most boring approach is, of course, to get each Tag out of the database individually with its own query (the "get each Tag individually" approach):

IList<Tag> list = new List<Tag>();
foreach (int tagId in WantedTagIds)
{
    int localTagId = tagId;

    Tag theTag = (from tag in context.Tag
                  where
                    tag.Account.ID == AccountId &&
                    tag.ID == localTagId
                  select tag).First();

    list.Add(theTag);
}

You could compile that query to make it run faster, but it's still a slow operation. If the user wants to get 6 Tags, you need to query the database 6 times. Not very cool.

This is where dynamic queries can step in. If the user asks for 3 Tags, you can generate a where clause that gets all three Tags in one go; essentially: tag.ID == 10 || tag.ID == 12 || tag.ID == 14. That way you get all three Tags in one query to the database. So, I wrote some generic type-safe code to perform exactly that: generating a where clause expression from a list of IDs so that a Tag with any of those IDs is retrieved.

To understand how I did this, you need to understand how the where clause in an LINQ expression works. It is easiest to understand if you look at the method-chain form of LINQ rather than the special C# syntax. It looks like this:

IQueryable<Tag> tags = context.Tag.AsQueryable()
                                  .Where(tag => tag.ID == 10);

The Where method takes a parameter that looks like this: Expression<Func<Tag, bool>>. Notice how the Func delegate is wrapped in an Expression? This means that instead of creating an actual anonymous method for the Func delegate, the compiler will instead convert your lambda expression into an Expression Tree.

An Expression Tree is a representation of your expression in an object tree. Here is an object tree that shows the main objects in the expression tree generated by the compiler for the lambda expression in the above example's Where method:

Expression Tree

The LambdaExpression has a collection of ParameterExpressions, which are the parameters on the left side of the => symbol in the code. The actual Body of the lambda is made up of a BinaryExpression of type Equals, whose Right side is a ConstantExpression that contains the value of 10, and whose Left side is a MemberExpression. A MemberExpression represents the access of the ID property on the tag parameter.

So if we wanted to represent a more complex expression such as:

tag => tag.ID == 10 || tag.ID == 12 || tag.ID == 14

this is what the expression tree would look like. It looks a bit daunting, but computers are very good at trees, so writing code to generate such a tree is not too difficult with the help of a little recursion.

I defined a special utility method that allows you to create an expression tree like the one above that results in a Where expression that accepts a particular tag so long as its ID is in a certain set of IDs. The method is generic and reusable across anywhere where you need a Where filter that gets "this value, or this value, or this value... etc". The public method looks like this:

public static Expression<Func<TValue, bool>> BuildOrExpressionTree<TValue, TCompareAgainst>(
        IEnumerable<TCompareAgainst> wantedItems, 
        Expression<Func<TValue, TCompareAgainst>> convertBetweenTypes)
{
    ParameterExpression inputParam = convertBetweenTypes.Parameters[0];
    
    Expression binaryExpressionTree = BuildBinaryOrTree(wantedItems.GetEnumerator(), convertBetweenTypes.Body, null);
    
    return Expression.Lambda<Func<TValue, bool>>(binaryExpressionTree, new[] { inputParam });
}

It's stuffed full of generics which makes it look more complicated than it really is. Here's how you call it:

List<int> ids = new List<int> { 10, 12, 14 };
Expression<Func<Tag, bool>> whereClause = BuildOrExpressionTree<Tag, int>(wantedTagIds, tag => tag.ID);

As I explain how it works, I suggest you keep an eye on the last expression tree diagram. The method defines two generic types, one called TValue which represents the value you are comparing, in this case the Tag class. The other generic type is called TCompareAgainst and is the type of the value you are comparing against, in this case int (because the Tag.ID property is an int).

You pass the method an IEnumerable<TCompareAgainst>, which in our case is an IEnumerable<int>, because we have a list of IDs we are comparing against.

The second parameter ("convertBetweenTypes") can be a bit confusing; let me explain. The expression we are defining for the Where clause takes a Tag and returns a bool (hence the Func<Tag, bool> typed expression). Since the set of values we are comparing against are ints, we can't just do an == between the Tag and an int. To be able to do this comparison, we need to somehow "convert" the Tag we receive into an int for comparison. This is where the second parameter comes in. It defines an Expression that takes a Tag and returns an int (or in generic terms takes a TValue and returns a TCompareAgainst). When you write tag => tag.ID, the compiler generates an Expression Tree that contains a MemberExpression that accesses ID on the tag ParameterExpression. This means wherever we need to do a Tag == int, we instead do a Tag.ID == int by substituting the Tag.ID MemberExpression generated in the place of the Tag. Here's a diagram that explains what I'm ranting about.

The main purpose of this method is to create the final LambdaExpression that the method returns. It does this by attaching the expression tree built by the BuildBinaryOrTree method (we'll get into this in a second) and the ParameterExpression from the convertBetweenTypes to the final LambdaExpression object.

The BuildBinaryOrTree method looks like this:

private static Expression BuildBinaryOrTree<T>(
    IEnumerator<T> itemEnumerator, 
    Expression expressionToCompareTo, 
    Expression expression)
{
    if (itemEnumerator.MoveNext() == false)
        return expression;

    ConstantExpression constant = Expression.Constant(itemEnumerator.Current, typeof(T));
    BinaryExpression comparison = Expression.Equal(expressionToCompareTo, constant);

    BinaryExpression newExpression;

    if (expression == null)
        newExpression = comparison;
    else
        newExpression = Expression.OrElse(expression, comparison);

    return BuildBinaryOrTree(itemEnumerator, expressionToCompareTo, newExpression);
}

It takes an IEnumerator that enumerates over the wantedItems list (from the BuildOrExpressionTree method), an expression to compare each of these wanted items to (which is the compiler-generated MemberExpression from BuildOrExpressionTreeMethod), and an expression from a previous recursion (starts off as null).

The method creates an Equals BinaryExpression that compares the expressionToCompareTo and the current itemEnumerator value. It then joins this in an OrElse BinaryExpression comparison with the expression from previous recursions. It then takes this new expression and passes it down to the next recursive call. This process continues until itemEnumerator is exhausted at which point the final expression tree is returned.

Once this returned expression tree is placed in its LambdaExpression by the BuildOrExpressionTree method, you end up with a pretty expression tree like this one shown previously. We can then use this expression tree in the where clause of a LINQ method chain query.

Here's the final "generated where clause" query in action:

using (DHEntities context = new DHEntities())
{
    int[] wantedTagIds = new[] {12, 24, 1, 4, 32, 19};

    Expression<Func<Tag, bool>> whereClause = ExpressionTreeUtil.BuildOrExpressionTree<Tag, int>(wantedTagIds, tag => tag.ID);

    IQueryable<Tag> tags = context.Tag.Where(whereClause);

    IList<Tag> list = tags.ToList();
}

So how much better is this approach, which is decidedly more complex than the simple "get each Tag at a time" approach? Is it worth the effort? I performed some benchmarks similar to the ones I did in the last blog to find out.

In one benchmark run, I ran these queries, each a hundred times, each getting out the same 6 tags:

  • The "get each Tag individually" query (uncompiled)
  • The "get each Tag individually" query (compiled)
  • The "generated where clause" query. The where clause was regenerated each time.

I then ran the benchmark 100 times so that I could get more reliable averaged values. These are the results I got:

  Average Standard Deviation
"Get Each Tag Individually" Query Loop (Uncompiled) 3212.2ms 40.2ms
"Get Each Tag Individually" Query Loop (Compiled) 1349.3ms 24.2ms
"Generated Where Clause" Query Loop 197.8ms 5.3ms

As you can see, the Generated Where Clause approach is quite a lot faster than the individual queries. We can see compiling the Individual query helps, but not enough to beat the Generated Where Clause query, which is faster even though it is recompiled each time! (You can't precompile a dynamic query, obviously). The Generated Where Clause query is 6.8 times faster than the compiled Individual query and a whopping 16.2 times faster than the uncompiled Individual query.

Even though dynamic queries are lots harder than normal static queries, because you have to manually mess with Expression Trees, there are large payoffs to be had in doing so. When used in the appropriate place, dynamic queries are faster than static queries. They could also potentially make your code cleaner, especially in the case of the filter example I talked about at the beginning of this blog. So consider getting up to speed with Expression Trees. It's worth the effort.

Making Entity Framework as Quick as a Fox

Entity Framework is the new (as of .NET 3.5 SP1) ORM technology for the .NET Framework. ORM technologies are widely accepted as the "better" way of accessing relational databases, because they allow you to work with relational data as objects in the world of objects. However, ORM tech can be slower than writing manual SQL queries yourself. This can be seen in this blog that benchmarks Entity Framework versus LINQ to SQL and a manual SQLDataReader.

Hardware is cheap (compared to programmer labour, which is not) so getting a faster machine could be an effective strategy to counter performance issues with ORM. However, what if we could squeeze some extra performance out of Entity Framework with only a little effort?

This is where Compiled Queries come in. Compiled queries are good to use where you have one particular query that you use over and over again in the same application. A normal query (using LINQ) is passed to Entity Framework as an expression tree. Entity Framework translates it into a command tree that is then translated by a database-specific provider into a query against a database. It does this every time you execute the query. Obviously, if this query is in a loop (or is called often) this is suboptimal because the query is recompiled every time, even though all that's probably changed is the parameters in the query. Compiled queries ensure that the query is only compiled once, and the only thing that varies is the parameters.

I created a quick benchmark app to find out just how much faster compiled queries are against normal queries. I'll illustrate how the benchmark works and then present the results.

Basically, I had a particular non-compiled LINQ to Entities query which I ran 100 times in a loop and timed how long it took. I then created the same query, but as a compiled query instead. I ran it once, because the query is compiled the first time you run it, not when you construct it. I then ran it 100 times in a loop and timed how long it took. Also, before doing any of the above, I ran the non-compiled query once, because it seemed to take a long time for the very first operation using the Entity Framework to run, so I wanted that time excluded from my results.

The non-compiled query I ran looked like this:

IQueryable<Transaction> transactions = 
                from transaction in context.Transaction
                where
                  transaction.TransactionDate >= FromDate &&
                  transaction.TransactionDate <= ToDate
                select transaction;

List<Transaction> list = transactions.ToList();

As you can see, it's nothing fancy, just a simple query with a small where clause. This query returns 39 Transaction objects from my database (SQL Server 2005).

The compiled query was created like this:

Func<DHEntities, DateTime, DateTime, IQueryable<Transaction>> 
    query;

query = CompiledQuery.Compile(
            (DHEntities ctx, DateTime fromD, DateTime toD) =>
                from transaction in ctx.Transaction
                where 
                  transaction.TransactionDate >= fromD &&
                  transaction.TransactionDate <= toD
                select transaction);

As you can see, to create a compiled query you pass your LINQ query to CompiledQuery.Compile() via a lambda expression that defines the things that the query needs (ie the Object Context (in this case, DHEntities) and the parameters used (in this case two DateTimes). The Compile function will return a Func delegate that has the types you defined in your lambda, plus one extra: the return type of the query (in this case IQueryable<Transaction>).

The compiled query was executed like this:

IQueryable<Transaction> transactions = query.Invoke(context, FromDate, ToDate);

List<Transaction> list = transactions.ToList();

I ran the benchmark 100 times, collected all the data and then averaged the results:

  Average Standard Deviation
Non-compiled Query Loop 534.1ms 20.6ms
Compiled Query Loop 63.1ms 0.6ms

The results are impressive. In this case, compiled queries are 8.5 times faster than normal queries! I've showed the standard deviation so that you can see that the results didn't fluctuate much between each benchmark run.

The use case I have for using compiled queries is doing database access in a WCF service. I expose a service that will likely be beaten to death by constant queries from an ASP.NET MVC webserver. Sure, I could get larger hardware to make the WCF service go faster, or I could simply get a rather massive performance boost just by using compiled queries.

Sexy C# Filter Code for RealDWG Database Navigation

I've been having to use RealDWG for my part time programming work at Onset to programmatically read DWG files (AutoCAD drawing files). I've been finding RealDWG a pain to learn, as AutoDesk's documentation seems to assume you're an in-house AutoCAD programmer, so they blast you with all these low level file access details like BlockTables and SymbolTables. However, this isn't surprising, since I believe AutoDesk eat their own dog food and use the same API internally. That doesn't make it any easier to learn and use, though.

RealDWG (or ObjectARX, which is the underlying API that is wrapped with .NET wrapper classes) reads a DWG file into an internal "database". Everything is basically an ObjectId, which is a short stub object that you give to a Transaction object that will get you the actual real object. Objects are nested inside objects, which are nested inside more objects, and none of it typesafe, as Transaction returns objects as their top level DBObject class. So you're constantly casting to the actual concrete type you want. Casting all over the place == bad.

Anyway, I found that navigating around a RealDWG object graph was a pain. For example, to find BlockTableRecords (which are inside a BlockTable) that contain AttributeDefinitions (don't worry about what those are; it's not important) you need to do something like this:

BlockTable blockTable = (BlockTable)transaction.GetObject(db.BlockTableId, OpenMode.ForRead);
    
foreach (ObjectId objectId in blockTable)
{
    DBObject dbObject = transaction.GetObject(objectId, OpenMode.ForRead);

    BlockTableRecord record = (BlockTableRecord)dbObject;
    
    if (record.HasAttributeDefinitions == false)
        continue;

    //Do what you need to here...
}

That's really verbose and messy, with a lot of code just dedicated to opening objects and casting them, and filtering. It was annoying me, so I refactored it and wrote the following sexy method that uses generics with delegates to clean that right up:

private IEnumerable<T> Filter<T>(Predicate<T> predicate, IEnumerable realDwgEnumerable, Transaction transaction) 
    where T : class
{
    foreach (ObjectId obj in realDwgEnumerable)
    {
        DBObject dbObject = transaction.GetObject(obj, OpenMode.ForRead);

        T genericObj = dbObject as T;
        if (genericObj == null)
            continue;

        if (predicate != null && predicate(genericObj) == false)
            continue;

        yield return genericObj;
    }
}

The method is generic and takes an IEnumerable (which is the non-generic interface that all the RealDWG stuff that you can iterate over implements) and a Transaction (to open objects from ObjectId stubs with). Additionally, it takes a Predicate<T> that allows you to specify a condition on which concrete objects are included in the final set. It returns an IEnumerable of the generic type T.

What the method will do is iterate over the IEnumerable and pull out all the objects that match the generic type that you define when you call the method. Additionally, it will return only those objects that match your predicate. The method uses the yield return keywords to lazy return results as the returned IEnumerable is iterated over.

Here's the above messy example all sexed up by using this method:

BlockTable blockTable = (BlockTable)transaction.GetObject(db.BlockTableId, OpenMode.ForRead);

IEnumerable<BlockTableRecord> blockTableRecords = Filter<BlockTableRecord>(btr => btr.HasAttributeDefinitions, blockTable, transaction);

foreach (BlockTableRecord blockTableRecord in blockTableRecords)
{
    //Do what you need to here...
}

Just like using LINQ, the above snippet (which looks longer than it really is thanks to wrapping) simply declares that it wants all BlockTableRecords in the BlockTable that have attribute definitions (using a lambda expression). It's much neater, if only because it shifts the filtering code out of the way into the Filter method, so that it doesn't clutter up what I'm trying to do. It also makes the foreach type-safe, because now we're iterating over an IEnumerable<T> rather than a non-generic IEnumerable. Worst case: the IEnumerable<T> is empty. No InvalidCastExceptions here.

Another place where this method is awesome is when you've got lots of different typed objects getting returned as you iterate over the RealDWG IEnumerable object, which happens a lot. Using this method you can very simply get the type of object you're looking for by doing this:

IEnumerable<TypeIWant> objectsIWant = Filter<TypeIWant>(null, blockTableRecord, transaction);

Optimally you use the overload that doesn't include the Predicate<T> as a parameter, instead of passing null as the predicate (the overload passes null for you). I didn't show that here, but it's in my code.

Wrapping up, this again confirms how much I love C# 3.0 (and soon 4.0!). All the new language features let you do some simply awesome things that you just can't do in aging languages like Java (Java doesn't even have delegates, let alone lambda expressions!).