Reinventing Linq

What I cannot create, I do not understand.

Richard Feynman's reported last words

I reinvented the wheel last week. I sat down and deliberately coded something that I knew already existed, and had probably also been done by many many other people. In conventional programming terms, I wasted my time. But it was worthwhile, and what's more I would recommend almost any serious .NET programmer do precisely the same thing.

I recreated Linq.

Well, slight exaggeration there. More precisely, I recreated the basic Linq-to-objects functionality provided by the System.Linq.Enumerable class in the .NET 3.5 System.Core assembly.

Why on earth would I want to do that?

Well, a couple of reasons. The first is practical. Our product at work runs on .NET 2.0. Since we don't make use of WPF, WCF or WF, we never moved our platform requirements on to .NET 3.0, and since .NET 3.5 drags that heavyweight stack of useless (for our purposes) bumph along with it, we haven't moved on to 3.5 either. Nevertheless, we're using the .NET 3.5 generation tools, which include the C# 3.0 compiler. That gives us the ability to compile .NET 2.0 applications from code that makes use of all those exciting C# 3.0 features:

  • Auto-implemented Properties
  • Object initialisers
  • Collection initialisers
  • Anonymous types
  • Partial classes
  • Extension methods
  • Linq query syntax

Except that... you don't get the last two, after all. Write C# code that uses extension methods or Linq queries and compile for .NET 2.0, and you won't get very far. Because Linq requires that the objects you're querying support certain methods, which almost certainly won't be there (although there's actually nothing to stop you adding such methods directly to your classes and then writing Linq queries). What you really want to do (or what I really want to do, at any rate) is write Linq queries against IEnumerable<T>, and that means you need to have extension methods. And extension methods depend upon being able to mark methods with the System.Runtime.CompilerServices.ExtensionAttribute, which is in the .NET 3.5-only System.Core assembly.

So we get everything else, but not extension methods, and thus not Linq.

But.. it turns out you can fool the C# compiler. It isn't, it turns out, looking for System.Runtime.CompilerServices.ExtensionAttribute in System.Core, it's just looking for System.Runtime.CompilerServices.ExtensionAttribute anywhere it can find it. And if you include the following class definition in your compilation, it will happily build code that declares extension methods, and then you can also compile code that calls them:

namespace System.Runtime.CompilerServices
    
{
    public class ExtensionAttribute : Attribute
    {
    }
}

So, that gives us one more feature off the list. But we still can't use Linq against IEnumerables, because we don't have System.Linq.Enumerable and all its extension methods.

But there's nothing to stop us writing our own version...

Nothing, that is, except good programming sense. Normally, my well developed programmer spidey senses start tingling whenever I find myself thinking 'well, I could just build my own version....', and with good reason. It goes against all the instincts of "once and once only", "don't repeat yourself", and the aversion to reinventing the wheel, that a sensible programmer has ingrained in the soul.

And yet, I built my own version of the Linq to objects library. What was I thinking? Why not just step back, and say "Well, obviously we can make do without Linq?" We can, true. But a couple of things drove me on.

  1. It would be fun, and interesting. Most importantly, it would be a good way to get the hang of what Linq really does and how it works.
  2. I had access to a good test suite. For reimplementing an existing library of code, I needed a test suite, and one was already available in the form of the 101 Linq samples Microsoft produced as part of the VS2008 documentation. All I needed was a way to compare output of the samples running against my Linq library against the samples running against the System.Core implementation to get a relatively high degree of confidence that the implementation was sound. I really wouldn't have proceeded - nor got so far, frankly - without this.

Now, I'm probably not the first person to do this. And for a moment I considered googling for an existing implementation (I still haven't, actually. Anyone know if there's a good solid Linq-to-objects for .NET 2.0 out there already?). But I figured there might be licensing restrictions on any existing code, and besides, remembering point 1, where would be the fun? Equally, checking out the source code for System.Core's implementation, which is only a few clicks away in the VS2008 debugger these days, was a no-no, because that way lies EULA infringement hell.

But I'll admit, the main thinking was "Well, it'd be a good way to figure out how Linq ticks...", so that was reason number two.

So I set out by first of all modifying the Linq 101 samples to dump their output into a series of files, one per sample. I set those aside in a reference output directory.

Next, I recompiled against my (currently empty) Linq library. Obviously, I hit a lot of compile errors. These I fixed one batch at a time by stubbing out empty Linq operator extension methods. I took advantage of the great work done on Hooked On Linq to snag the method signatures and pseudocode explanations for each method, and got the app to a compilable state. Then, with a BeyondCompare window open showing the difference between my output files and the reference files, I set about making my output match the reference output (I could have had an NUnit harness compare the results directly, but the BeyondCompare output was a lot more revealing than a string mismatch assertion failure was going to be).

It didn't take that long, actually - about six hours' coding, I'd guess, including the test harness work. Linq against IEnumerable isn't that hard to write; although you do end up building a lot of methods to support every scenario in the 101 samples, none of them is that complicated - especially if you make the simpler overloads call into their more complex siblings and leverage your own growing codebase to build the more complex features, instead of building each operation in isolation. I did make use of some lightweight code generation to implement the 40 variant numerical aggregates (sum of a list of integers, sum of a list of doubles, average of a list of integers, etc.) - Visual Studio text templates are becoming my favourite toy of the moment; even so, so far I've not implemented the equivalent operations on lists of nullable numeric types. I had to draw the line somewhere.

How good are the results? Well, I'm outputting the same answers as the MS Linq library in general, with a few caveats. Occasionally I produce the results in a different order, and my Average implementation produces slightly different answers because I generate the average cumulatively, rather than accumulating the sum and dividing by the number of values (easily changed, but not just now). I'm sure that the 101 samples don't give that thorough a coverage of the code, either; they work as a good first pass test that the core functionality of filtering, grouping, joining and aggregating works, but they don't exercise corner cases (empty or infinite lists, for instance) or error scenarios, so a bit more robust unit testing around the framework might be worthwhile. (How does one unit test an infinite list, by the way?)

Still, all things considered it was a very worthwhile exercise for me personally. I now know what some of the funkier bits of the Linq query comprehension translation are asking for when they generate calls to GroupJoin and SelectMany. I discovered that writing the ThenBy method that is used after a call to OrderBy was probably the most complicated bit of the entire API (unless I'm missing something obvious). I found out about a few functions that exist in Linq that I didn't know about before, like Range, ToDictionary, and some of the variants of Aggregate.

So all in all, I think it makes for a pretty worthwhile exercise for anyone who has any intention of becoming familiar with Linq. Having built it up myself, I've now got a much greater understanding of what it's doing, and that's no bad thing.

Print | posted on Monday, March 17, 2008 6:00 PM

Feedback


Gravatar

# re: Reinventing Linq 5/1/2008 12:00 PM Lennie De Villiers

Hi,

Very nicely done! :-)

You can have a look at LINQBridge: http://www.albahari.com/nutshell/linqbridge.html I'm using it on one of my client's projects.

Kind Regards,

lennie De Villiers


# re: Reinventing Linq 5/6/2008 11:50 AM James

I have since found LINQbridge - comparing code, the approach they took to OrderBy/ThenBy is certainly a bit tidier than mine, though I'm gratified to see that it still has the same essential complexity, so I probably didn't miss anything obvious. I found it interesting that in a lot of cases, though both my approach and theirs involved reusing parts of your API to build others, in many cases we approached reuse from opposite ends. Also found it amusing that both their and my implementations broke the massive Enumerable class into a number of partial class files.

I have to say, though - LINQBridge doesn't ship with tests. How confident are you in it :)? I should run it through my 101-smaples test harness and see how it fares...


# Why Can't Microsoft Ship Open Source Software? 7/4/2008 12:01 AM Coding Horror

In Codeplex wastes six months reinventing wheels, Ryan Davis has a bone to pick with Microsoft: I saw an announcement [in March, 2007] that CodePlex, Microsoft's version of Sourceforge, has released a source control client. This infuriates me. This...

Title  
Name  
Email
Url
Comments   
Please add 5 and 5 and type the answer here: