Entity Framework June 2011 CTP: TPT Inheritance Query Improvements

I want to look at some of the vast array of great improvements coming to EF that are part of the June 2011 CTP that was released yesterday.

Everyone’s going on and on about the enums. Smile

There’s a lot more in there. Not sure how many I’ll cover but first up will be the TPT store query improvements.

I’ll use a ridiculously simple model with code first to demonstrate and I’ll share with you a surprise discovery I made (which began with some head scratching this morning).

Beginning with one base class and a single derived class.

 public class Base
 {

   public int Id { get; set; }
   public string Name { get; set; }

 }

  public class TPTA: Base
  {
    public string PropA { get; set; }
  }

By default, code first does TPH and would create a single database table with both types represented. So I use a fluent mapping to force this to TPT. (Note that you can’t do this configuration with Data Annotations).

  public class Context: DbContext
  {
    public DbSet<Base> Bases { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
      modelBuilder.Entity<Base>().Map<TPTA>(m => m.ToTable("TPTTableA"));
    }

  }

Here’s my query:

 from c in context.Bases select new {c.Id, c.Name};

Notice I’m projecting only fields from the base type.

And the store query that results:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name]
FROM [dbo].[Bases] AS [Extent1]

Not really anything wrong there. So what’s the big deal? (This is where I got confused… Smile )

Now I’ll add in another derived type and I’ll modify the configuration to accommodate that as well.

public class TPTB: Base
{
  public string PropB { get; set; }
}

modelBuilder.Entity<Base>().Map<TPTA>(m => m.ToTable("TPTTableA"))
.Map<TPTB>(m => m.ToTable("TPTTableB"));

Execute the query again and look at the store query now!

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name]
FROM  [dbo].[Bases] AS [Extent1]
LEFT OUTER JOIN  (SELECT 
    [Extent2].[Id] AS [Id]
    FROM [dbo].[TPTTableA] AS [Extent2]
UNION ALL
    SELECT 
    [Extent3].[Id] AS [Id]
    FROM [dbo].[TPTTableB] AS [Extent3]) AS [UnionAll1] ON [Extent1].[Id] = [UnionAll1].[Id]

Egad!

Now, after switching to the new bits (retargeting the project and removing EntityFramework.dll (4.1) and referencing System.Data.Entity.dll (4.2) instead)

The query against the new model (with two derived types) is trimmed back to all that’s truly necessary:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name]
FROM [dbo].[Bases] AS [Extent1]

Here’s some worse ugliness in EF 4.0. Forgetting the projection, I’m now querying for all Bases including the two derived types. I.e. “context.Bases”.

SELECT 
CASE WHEN (( NOT (([UnionAll1].[C2] = 1) AND ([UnionAll1].[C2] IS NOT NULL))) 
AND ( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)))) THEN '0X' WHEN (([UnionAll1].[C2] = 1)
AND ([UnionAll1].[C2] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1], [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], CASE WHEN (( NOT (([UnionAll1].[C2] = 1) AND ([UnionAll1].[C2] IS NOT NULL))) AND ( NOT (([UnionAll1].[C3] = 1)
AND ([UnionAll1].[C3] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C2] = 1)
AND ([UnionAll1].[C2] IS NOT NULL)) THEN [UnionAll1].[PropA] END AS [C2], CASE WHEN (( NOT (([UnionAll1].[C2] = 1) AND ([UnionAll1].[C2] IS NOT NULL))) AND ( NOT (([UnionAll1].[C3] = 1)
AND ([UnionAll1].[C3] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C2] = 1)
AND ([UnionAll1].[C2] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll1].[C1] END AS [C3] FROM [dbo].[Bases] AS [Extent1] LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id], [Extent2].[PropA] AS [PropA], CAST(NULL AS varchar(1)) AS [C1], cast(1 as bit) AS [C2], cast(0 as bit) AS [C3] FROM [dbo].[TPTTableA] AS [Extent2] UNION ALL SELECT [Extent3].[Id] AS [Id], CAST(NULL AS varchar(1)) AS [C1], [Extent3].[PropB] AS [PropB], cast(0 as bit) AS [C2], cast(1 as bit) AS [C3] FROM [dbo].[TPTTableB] AS [Extent3]) AS [UnionAll1] ON [Extent1].[Id] = [UnionAll1].[Id]

And now with the new CTP:

SELECT 
CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)))
AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN '0X' WHEN (([Project1].[C1] = 1)
AND ([Project1].[C1] IS NOT NULL)) THEN '0X0X' ELSE '0X1X' END AS [C1], [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1)
AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1)
AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[PropA] END AS [C2], CASE WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1)
AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([Project1].[C1] = 1)
AND ([Project1].[C1] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [Project2].[PropB] END AS [C3] FROM [dbo].[Bases] AS [Extent1] LEFT OUTER JOIN (SELECT [Extent2].[Id] AS [Id], [Extent2].[PropA] AS [PropA], cast(1 as bit) AS [C1] FROM [dbo].[TPTTableA] AS [Extent2] ) AS [Project1] ON [Extent1].[Id] = [Project1].[Id] LEFT OUTER JOIN (SELECT [Extent3].[Id] AS [Id], [Extent3].[PropB] AS [PropB], cast(1 as bit) AS [C1] FROM [dbo].[TPTTableB] AS [Extent3] ) AS [Project2] ON [Extent1].[Id] = [Project2].[Id]

At first glance you may think “but it’s just as long and just as ugly” but look more closely:

Notice that the first query uses a UNION for the 2nd derived type but the second uses another LEFT OUTER JOIN. Also the “Cast 0 as bit” is gone from the 2nd query. I am not a database performance guru but I’m hoping/guessing that all of the work involved to make this change was oriented towards better performance. Perhaps a DB guru can confirm. Google wasn’t able to. Winking smile

July 6th Update: I talked with Kati Iceva who is the EF query compiler (among other things) goddess on the EF team. She told me that the first query (projecting from the base) is where to look for benefits. The second one is not there yet. The fact that they’ve got the outer joins now rather than the Union and that they lost the extra cast is a setup for some future improvements to TPT queries that they’ll be able to implement. There is a TPT blog post forthcoming on the team blog so but if you’re interested enough in EF to read my blog, you probably read that one (and their EF Design blog) already. Winking smile

I did look at the query execution plans in SSMS. They are different but I’m not qualified to understand the impact.

Here’s the plan from the first query (from EF 4 with the unions)

planfromunions

and the one from the CTP generated query:

planfromctp

9 thoughts on “Entity Framework June 2011 CTP: TPT Inheritance Query Improvements

  1. Hi July,

    I understand that you can do TPT using the [Table("TPTTableA")] data annotation, right?

    Anyway, both queries are horrible, not just ugly, it looks like there is some work to do for the EF team!

  2. Hi Julie,

    The biggest problem with TPT in the current version is that it takes forever to even generate the query in the first place (let alone execute it). I think with 3 or 4 tables inheriting from a base class the code generation for the sql was taking about 1 minute (that was even before it hit the db!!). Hopefully, even if the code is ugly it will at least be able to run!

  3. Hi Julie,

    Great stuff this is! Thanks for sharing.

    Have you seen what happens when you make the Base class abstract and execute the "from c in context.Bases select new {c.Id, c.Name}" query?

  4. The resulting query using your Base (now marked as abstract), TPTA and TPTB classes is:

    SELECT

    [UnionAll1].[Id] AS [C1],

    [Extent3].[Name] AS [Name]

    FROM (SELECT

    [Extent1].[Id] AS [Id]

    FROM [dbo].[TPTTableB] AS [Extent1]

    UNION ALL

    SELECT

    [Extent2].[Id] AS [Id]

    FROM [dbo].[TPTTableA] AS [Extent2]) AS [UnionAll1]

    INNER JOIN [dbo].[Bases] AS [Extent3] ON [UnionAll1].[Id] = [Extent3].[Id]

    I was playing with the CTP and used a subset of an existing model. As you may have guessed the root class of that model happens to be abstract. So I didn’t see any improvement in the queries not knowing the abstract keyword was causing this at that time. After trying your example I found out about it.

  5. Forgot to mention that the difference between the abstract and the non-abstract variant is already present in EF 4.1. The resulting queries of the abstract variant of 4.1 and the CTP (4.2) are exactly the same. Only the non-abstract variant shows improvement in 4.2.

    If someone knows the reason behind this difference I would be glad to hear.

Leave a Reply

Your email address will not be published. Required fields are marked *


2 + = five

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>