The shadowy side of LINQ to SQL

Don’t get me wrong. I love LINQ and LINQ to SQL. But the more I dig into it, the more I discover things that, as a developer, I may want to make explicit choices about. That’s probably why my talk on LINQ to SQL with Web Apps last night at VTdotNET was pretty long. (I’m going to have to cut it down to fit a one-hour slot at DevConnections. Egad!)

I spent a lot of time making sure people were aware of what goes on in the background, when all they may see on the surface is how easily they can write a query and how easily the data appears the way they want it.

1) Take the default optimistic concurrency, for example. Not that optimistic concurrency is always a bad thing, but almost everyone groaned when I said it was on by default, for all columns. The UpdateCheck property of a LINQ to SQL column (oh, it’s so much easier to say and type “DLINQ”) defaults to Always, so the original value of every column is checked when a row is updated. When you use the designer, that’s what you get. The enum value I prefer is “WhenChanged”. The current (May CTP) designer doesn’t expose this property in its UI, which means that if you change it in code, you can’t use the designer again or your change will be overwritten. Of course, this is the May CTP, and all of this might change in future CTPs, Betas or the release. If you write any of your own update logic, it’s a lot of work to deal with all of those columns. I suppose any way you slice it, dealing with concurrency is a pain in the butt, and what LINQ to SQL has is a stake in the ground.
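To make the Always versus WhenChanged difference concrete, here is a minimal Python sketch of how a per-column UpdateCheck policy could shape an optimistic-concurrency UPDATE. It is a hypothetical model, not LINQ to SQL’s code: `build_update`, the sample row, and the third enum value `Never` are assumptions for illustration, and the SQL is only built, never run.

```python
from enum import Enum


class UpdateCheck(Enum):
    # Hypothetical mirror of the per-column setting discussed above.
    ALWAYS = "Always"             # compare this column's original value on every update
    WHEN_CHANGED = "WhenChanged"  # compare it only if this update modifies the column
    NEVER = "Never"               # never use it for conflict detection


def build_update(table, key, original, current, policies):
    """Build a parameterized optimistic-concurrency UPDATE.

    original/current map column -> value for the row as read and as edited;
    policies maps each non-key column to an UpdateCheck. The key column is
    always in the WHERE clause. NULLs are not handled (a NULL original would
    need IS NULL rather than = ?). Returns None if nothing changed.
    """
    changed = [c for c in current if c != key and current[c] != original[c]]
    if not changed:
        return None
    params = [current[c] for c in changed]  # values for the SET clause
    where = [f"[{key}] = ?"]
    params.append(original[key])
    for col, policy in policies.items():
        if policy is UpdateCheck.ALWAYS or (
            policy is UpdateCheck.WHEN_CHANGED and col in changed
        ):
            where.append(f"[{col}] = ?")
            params.append(original[col])  # compare against the value we read
    set_clause = ", ".join(f"[{c}] = ?" for c in changed)
    sql = f"UPDATE [{table}] SET {set_clause} WHERE {' AND '.join(where)}"
    return sql, params


# Made-up sample row: only City is edited.
row = {"SupplierID": 1, "CompanyName": "Acme", "City": "London", "Phone": "555-0100"}
edited = dict(row, City="Leeds")
cols = ["CompanyName", "City", "Phone"]

print(build_update("Suppliers", "SupplierID", row, edited,
                   {c: UpdateCheck.ALWAYS for c in cols})[0])
# UPDATE [Suppliers] SET [City] = ? WHERE [SupplierID] = ? AND [CompanyName] = ? AND [City] = ? AND [Phone] = ?
print(build_update("Suppliers", "SupplierID", row, edited,
                   {c: UpdateCheck.WHEN_CHANGED for c in cols})[0])
# UPDATE [Suppliers] SET [City] = ? WHERE [SupplierID] = ? AND [City] = ?
```

If such an UPDATE affects zero rows, someone else changed one of the checked columns, and that is the concurrency conflict. With Always, an edit to any column anywhere in the row by another user trips it; with WhenChanged, only conflicting edits to the columns you touched do.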

2) Another biggie is being aware of how LINQ queries are turned into T-SQL, which really comes down to knowing whether the hard work is done in T-SQL or in memory after the data is returned. I’ve been talking with Frans Bouma (creator of LLBLGen, who knows a lot about ORM) about this as I was trying to understand what I was seeing. There doesn’t seem to be a discernible pattern, though there must be one and I’m just not seeing it (maybe I will after exploring another 2,000 or 3,000 queries? :-)). For the most part, the queries I looked at in SQL Profiler looked as I would expect. And certainly much smarter people than me are creating the algorithms that do this work. But there was one query that stood out like a sore thumb.

The query gets data from one table whose rows have child objects. Because I’m displaying data from the child objects, I expect lazy loading to first grab the parent data and then grab a set of child objects for each parent. The query also filters based on the child table. For example, suppliers and their products:

from s in db.suppliers where s.Products.Count > 2 select s

So on the surface, I’m just asking for this SQL query:

SELECT * FROM [Suppliers]
WHERE (
    SELECT COUNT(*)
    FROM [Products]
    WHERE [Products].[SupplierID] = [Suppliers].[SupplierID]) > 2

This returns 16 of the 29 suppliers in Northwind.
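The “in T-SQL or in memory” question from point 2 can be seen in miniature with Python’s built-in sqlite3 module. This sketch runs the same filter both ways: once as a correlated subquery like the one above, and once by pulling every row back and counting on the client. The schema is a toy loosely modeled on Northwind’s Suppliers and Products, the rows are made up, SQLite stands in for SQL Server, and `make_db`, `filter_in_sql` and `filter_in_memory` are hypothetical names. Nothing here reflects how LINQ to SQL actually decides which way to go.

```python
import sqlite3
from collections import Counter

# Toy schema loosely modeled on Northwind; the rows are made up.
SETUP = """
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, SupplierID INTEGER, ProductName TEXT);
INSERT INTO Suppliers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Products (SupplierID, ProductName) VALUES
  (1, 'a1'), (1, 'a2'), (1, 'a3'), (2, 'b1'),
  (3, 'c1'), (3, 'c2'), (3, 'c3'), (3, 'c4');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def filter_in_sql(conn, min_count):
    """Let the database do the counting; only matching suppliers come back."""
    rows = conn.execute(
        """SELECT s.SupplierID, s.CompanyName FROM Suppliers s
           WHERE (SELECT COUNT(*) FROM Products p
                  WHERE p.SupplierID = s.SupplierID) > ?
           ORDER BY s.SupplierID""",
        (min_count,),
    ).fetchall()
    return rows, len(rows)  # (result, rows sent back to the client)


def filter_in_memory(conn, min_count):
    """Pull everything back and count on the client side."""
    suppliers = conn.execute(
        "SELECT SupplierID, CompanyName FROM Suppliers ORDER BY SupplierID"
    ).fetchall()
    products = conn.execute("SELECT SupplierID FROM Products").fetchall()
    counts = Counter(sid for (sid,) in products)
    result = [s for s in suppliers if counts[s[0]] > min_count]
    return result, len(suppliers) + len(products)


conn = make_db()
print(filter_in_sql(conn, 2))     # ([(1, 'Alpha'), (3, 'Gamma')], 2)
print(filter_in_memory(conn, 2))  # ([(1, 'Alpha'), (3, 'Gamma')], 11)
```

Same answer either way; the difference is that the in-memory version drags every product row across the wire just to count it.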

Then, as I populate my data control and ask for product info, I expect lazy loading to go out and get the products for each of the 16 suppliers: 16 more queries.

But what I get is two additional queries for each supplier.

First, there is a query that gets the count of products for the supplier. Second, there is a query that gets the products.

I can’t figure out why the first query is necessary.
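Here is a rough model of the round-trip arithmetic, again in Python with sqlite3 and the same kind of toy data. A `Supplier` object loads its products the first time the collection is touched, and a `CountingDB` wrapper logs every query sent. The `count_first` flag is a hypothetical switch that imitates the extra COUNT query seen in SQL Profiler; it is not a LINQ to SQL setting, and this sketch does not explain why LINQ to SQL issues that query.

```python
import sqlite3

SETUP = """
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, SupplierID INTEGER, ProductName TEXT);
INSERT INTO Suppliers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Products (SupplierID, ProductName) VALUES
  (1, 'a1'), (1, 'a2'), (1, 'a3'), (2, 'b1'),
  (3, 'c1'), (3, 'c2'), (3, 'c3'), (3, 'c4');
"""

FILTER_SQL = """SELECT s.SupplierID, s.CompanyName FROM Suppliers s
WHERE (SELECT COUNT(*) FROM Products p WHERE p.SupplierID = s.SupplierID) > ?
ORDER BY s.SupplierID"""


class CountingDB:
    """Wraps an in-memory sqlite3 database and logs every query it sends."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SETUP)
        self.log = []

    def query(self, sql, params=()):
        self.log.append(sql)
        return self.conn.execute(sql, params).fetchall()


class Supplier:
    """Parent entity whose products are loaded on first access."""

    def __init__(self, db, supplier_id, name, count_first):
        self.db, self.supplier_id, self.name = db, supplier_id, name
        # Hypothetical switch mimicking the extra per-supplier COUNT query.
        self.count_first = count_first
        self._products = None  # None means "not loaded yet"

    @property
    def products(self):
        if self._products is None:
            if self.count_first:
                self.db.query("SELECT COUNT(*) FROM Products WHERE SupplierID = ?",
                              (self.supplier_id,))
            rows = self.db.query(
                "SELECT ProductName FROM Products WHERE SupplierID = ? ORDER BY ProductID",
                (self.supplier_id,))
            self._products = [name for (name,) in rows]
        return self._products


def queries_for_display(count_first, min_count=2):
    """Run the parent query, touch every child collection, count round trips."""
    db = CountingDB()
    parents = [Supplier(db, sid, name, count_first)
               for sid, name in db.query(FILTER_SQL, (min_count,))]
    for s in parents:
        _ = s.products  # touching the collection is what triggers the lazy load
    return len(parents), len(db.log)


print(queries_for_display(count_first=False))  # (2, 3): 1 + N
print(queries_for_display(count_first=True))   # (2, 5): 1 + 2N
```

With the 16 Northwind suppliers above, the same arithmetic gives 1 + 16 = 17 round trips for plain lazy loading versus 1 + 2 × 16 = 33 with the extra count query.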

Of course (caveat, caveat, caveat), I can’t imagine being the person (probably more than one!) who writes the code that has to take any weird LINQ query I come up with and translate it into T-SQL. So my point is more that [some] developers might want to know that this is going on and decide whether to use the defaults, override them (note this from the Dec 12th chat with the LINQ folks about upcoming mods: “You can disable deferred loading by setting a property on the DataContext, and then you can describe pre-fetch behavior using the new DataShape class.”), or just write their own data access.
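For comparison, here is a sketch of one common prefetch strategy: fetch the filtered parents, then fetch all of their children in a single second query and group them in memory, so the round-trip count stays at two no matter how many suppliers match. This is not the DataShape API quoted above, whose details aren’t shown here; `prefetch_products` and the toy schema are assumptions, and SQLite stands in for SQL Server.

```python
import sqlite3

SETUP = """
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, SupplierID INTEGER, ProductName TEXT);
INSERT INTO Suppliers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Products (SupplierID, ProductName) VALUES
  (1, 'a1'), (1, 'a2'), (1, 'a3'), (2, 'b1'),
  (3, 'c1'), (3, 'c2'), (3, 'c3'), (3, 'c4');
"""

FILTER_SQL = """SELECT s.SupplierID FROM Suppliers s
WHERE (SELECT COUNT(*) FROM Products p WHERE p.SupplierID = s.SupplierID) > ?
ORDER BY s.SupplierID"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def prefetch_products(conn, min_count=2):
    """Return ({SupplierID: [ProductName, ...]}, number_of_queries)."""
    ids = [sid for (sid,) in conn.execute(FILTER_SQL, (min_count,)).fetchall()]
    queries = 1
    products = {sid: [] for sid in ids}
    if ids:
        # One query for every matching supplier's products. Very long id
        # lists would need batching (SQLite caps the number of ? parameters).
        placeholders = ", ".join("?" for _ in ids)
        rows = conn.execute(
            "SELECT SupplierID, ProductName FROM Products "
            f"WHERE SupplierID IN ({placeholders}) ORDER BY ProductID",
            ids,
        ).fetchall()
        queries += 1
        for sid, name in rows:
            products[sid].append(name)  # group children under their parent in memory
    return products, queries


conn = make_db()
print(prefetch_products(conn))  # ({1: ['a1', 'a2', 'a3'], 3: ['c1', 'c2', 'c3', 'c4']}, 2)
```

The trade-off is the usual one: you pay for all the children up front, even if the page never displays some of them.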

3) Which brings me to lazy loading. Again, there are pros and cons to any technology choice you make. Just know that it’s going on, and how it changes depending on how you write or enumerate your queries. The lazy loading samples usually show code that explicitly enumerates the query results. What people also need to think about is that it happens elsewhere too. For example, the way I defined the DataList that displays the Supplier/Product data above forces lazy loading to happen.
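To see how lazy loading can hide inside a data-bound control, here is a small Python sketch in which no code explicitly loops over products, yet products get loaded. `bind` is a hypothetical stand-in for a list control rendering an item template, and `FakeDB` simply counts simulated child queries; real DataList binding works differently. Merely mentioning `item.products` in the template is what triggers the loads.

```python
class FakeDB:
    """Stand-in for the database: counts how many child queries are issued."""

    def __init__(self, products_by_supplier):
        self._data = products_by_supplier
        self.queries = 0

    def products_for(self, name):
        self.queries += 1  # one simulated round trip per call
        return list(self._data.get(name, []))


class Supplier:
    def __init__(self, db, name):
        self._db, self.name = db, name
        self._products = None  # not loaded yet

    @property
    def products(self):
        if self._products is None:  # first access triggers the (simulated) query
            self._products = self._db.products_for(self.name)
        return self._products


def bind(items, template):
    """Very rough stand-in for a data-bound list: one template render per item."""
    return [template.format(item=item) for item in items]


db = FakeDB({"Alpha": ["a1", "a2", "a3"], "Gamma": ["c1", "c2", "c3", "c4"]})
suppliers = [Supplier(db, "Alpha"), Supplier(db, "Gamma")]

print(bind(suppliers, "{item.name}"), db.queries)  # ['Alpha', 'Gamma'] 0
print(bind(suppliers, "{item.name}: {item.products}")[0], db.queries)
# Alpha: ['a1', 'a2', 'a3'] 2
```

The first template never touches the child collection, so nothing loads; the second one costs a query per supplier even though the code that "uses" the products is just a template string.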

4) Caching and nullables. I’m putting these together because I haven’t dug into either of them yet. I think anyone interested in performance should know how LINQ to SQL caches data and what to expect from it (and what NOT to expect from it).

5) All those other table/column properties. I haven’t gotten to them yet. For example, DeleteCommand on a Table. I wonder if it simply lets me point to the stored proc I want run for a delete.

6) When you think you’ve found a way to do something, there are probably 18 other ways to do it too. Of course, that’s nothing new! 🙂

I have a lot more banging around in my head. But I gotta get back to work. Though I still need to work up a good response to the guy who summed up LINQ to SQL as a “glorified dataset”.

Though I was worried that I was a little fanatical when I presented this stuff last night (hey, it’s data access, whadya want?), I was happy to hear from attendees that they were really glad I went behind the magic to show what’s happening in the background that might be of interest to them.
