Checking for EF to TSQL Query Compilation Changes in VS2010 Beta1

Previous "What's New in EF 4 Posts"

ef4

 

 

 

I know that the very brainy Kati Iceava is working on improving the query compilation that SqlClient does when creating TSQL from EF queries, so I checked a few queries to see if the changes had reached the Beta yet. I looked at three different queries and saw that the changes I was looking for are not in this particular version so I won't bother worrying about them until we get the next rev.

QUERY: 

from c in context.Customers.Include("Orders")
orderby c.Person.LastName
select c;

COMMENTS: This is a query that has to do a bunch of left outer joins to handle eager loading the child data and also ordering by reference data.

TSQL:   I don't even know how it would be possible to improve this because there are very explicit requirements of the query. SO far, no change yet.

 

QUERY:

string param = "Smith";
IEnumerable<Person> FiveLetterPeople = from p in context.People
                                                                where p.LastName == param select p;
List<Person> pList = FiveLetterPeople.ToList();

param = "Thompson";
IEnumerable<Person> EightLetterPeople = from p in context.People where p.LastName == param select p;
pList = EightLetterPeople.ToList();

COMMENTS: This parameters in the two version of this query get sent along with their length (@p__linq__2 nvarchar(5)',@p__linq__1=N'Smith' for the first query and @p__linq__2 nvarchar(8)',@p__linq__2=N'Thompson' for the second). When SQL Server executes them, because the parameter length is there, SQL can't leverage the query cache. The two queries are cached separately.

TSQL: No change yet, but I know that this is getting fixed.

Update - This test was run targeting .NET 3.5, not .NET 4.0. Whoops. The fix is, indeed , in Beta 1. Please see my follow up post: Upgrading EF projects to EF4- Don't forget to target .NET 4.0 

 

QUERY:

from c in context.Customers.Include("Orders")
where c.Person.LastName.StartsWith("M")
select c;

COMMENTS: StartsWith doesn't get converted very well into TSQL. I'm not sure if this is a problem in how the LINQ Expression tree or EF Expression tree are built or if it's a problem with how SqlClient transposes StartsWith. THere is no equivalent in TSQL and the method is translated to:

WHERE (CAST(CHARINDEX(N'M', [Join3].[LastName]) AS int)) = 1

This does not take advantage of indexing and is a very poorly performing query. While ESQL has a LIKE operator that you can use to force the TSQL to use LIKE in the predicate (WHERE LastName LIKE 'M%'), StartsWith doesn't "think" to use LIKE.

TSQL: No change yet. I know this is on Kati's radar and have high hopes that we'll see this query compile to a SARGable query in a later rev of VS2010.

#1 Cleve Littlefield on 5.19.2009 at 7:51 PM

On the first query, why do they not just use the multiple result feature of ADO.Net and munge it client side?So this:from c in context.Customers.Include("Orders")orderby c.Person.LastNameselect c;would roughtly translate to:SELECT [fields]FROM Customer CJOIN Person P ON C.PersonId = P.PersonIdORDER BY P.LastName----SELECT [fields]FROM Orders O WHERE O.CustomerId in (SELECT CustomerId FROM Customer)Or the second part can be:SELECT [fields]FROM Orders OJOIN Customer C ON O.CustomerId = C.CustomerIdSure, if you have a complicated WHERE for your base query, it would be executed multiple times.It could actually create a temp table (in a table variable) if this was a concern.Possible there would be the need to control if this needs to be wrapped in a transaction to prevent dirty reads.But all in all it would support eager loading much better.

#2 Julie on 5.21.2009 at 5:34 PM

Hi CleveThe queries are broken down into query trees and then the provider (which could be System.Data.SqlClient or any other EF enabled ADO.NET data provider written by a 3rd party and hitting some other db besides SQL Server) translates the query trees into it's specific syntax. The process is extremely generalized, rahter than having a person be able to read a particular query and identify the obvious optimal query.julie

#3 Cleve Littlefield on 5.22.2009 at 3:29 PM

I understand how the SQL is generated, however, I think they can have special handling for based on the model for query trees that use the Include statement, to allow better eager loading.

#4 Julie on 5.22.2009 at 3:37 PM

Got it.I would highly recommend putting this comment somewhere that the team can see it. Two places would be in the MSDN forums (be clear that it's a suggestion/comment not a question) or on the EF Design blog.julie