Use Projections and a Repository to Fake a Filtered Eager Load

Entity Framework’s Include method which eager loads related data in a query does not allow sorting and filtering. People ask for this feature frequently.

There is a way around the problem,  I wrote about it in  the June Data Points column in MSDN Magazine (Loading Related Data: http://msdn.microsoft.com/en-us/magazine/hh205756.aspx) but since I just saw another comment on twitter about this, I thought I would stop working on Chapter 7 of the DbContext book that is already delayed and write a quick blog post about using a projection instead.

One caveat…you cannot do this if you have a long running context and you want the objects to remain attached to the context. If there are other related entities in the context, the context will always give them to you. More on this at the end of the post.

Okay, today’s model will be based on a roads and trees on that road. I need some type of guide because we’ve been socked in with fog for two days and I can barely see the trees across my road.

We’ll have two classes (and no I’m not trying to rub in the fact that we don’t have geo support yet, but it’s coming in .NET 4.5.)

public class Tree
  {
    public int Id { get; set; }
    public string Description { get; set; }
    public decimal Lat { get; set; }
    public decimal Long { get; set; }
    public int RoadId { get; set; }
  }
  public class Road
  {
    public Road()
    {
      Trees = new List<Tree>();
    }
    public int Id { get; set; }
    public string Name { get; set; }
    public string Town { get; set; }
    public List<Tree> Trees { get; set; }
  }

The way loading works in Entity Framework is an all or nothing scenario with one exception.

The DbContext lets you do a filter in an explicit load (i.e. after the root is already in memory). Here’s a passing test which loads only maple trees and checks to see that the count of non-maple trees is NOT greater than 0.

(MyInitializer seeds the database with one road that has three related Trees (a maple, a maple and a pine)).

    [TestMethod()]
    public void CanFilterOnExplicitLoad()
    {
      Database.SetInitializer(new MyInitializer());
      var context = new DataAccess();
      var road = context.Roads.FirstOrDefault();
      context.Entry(road).Collection(r => r.Trees)
        .Query().Where(t => t.Description.Contains("Maple"))
        .Load();
      Assert.IsFalse(context.Trees.Local
.Where(t=>!t.Description.Contains("Maple"))
.Count()>0);
    }

But that’s beside the point, since the question is about eager loading.

You can’t filter when eager loading with Include.

But you can get the results you want eagerly if you project.

  [TestMethod()]
    public void CanFilterOnProjection()
    {
      Database.SetInitializer(new MyInitializer());
      var context = new DataAccess();
      var road = context.Roads
        .Select(r=>new{
          r,
          r.Trees.Where(t=>t.Description.Contains("Maple")})
        .FirstOrDefault();
      Assert.IsFalse(context.Trees.Local
.Where(t => !t.Description.Contains("Maple"))
.Count() > 0); } }

That’s nice but whine whine whine, I returned an anonymous type.

If you use a repository, you can hide that.

  public class DumbRepositoryButGoodEnoughForThisDemo
  {
    DataAccess _context=new DataAccess();

    public List<Road> GetRoadsWithFilteredTrees(string treeFilter)
    {
      var roadAndTrees = _context.Roads
       .Select(r=>new{
          Road=r,
          Trees=r.Trees.Where(t=>t.Description.Contains("Maple"))})
        .ToList();
      return roadAndTrees.Select(rAt=>rAt.Road).ToList();
    }
  }

When I return the list of roads from the projection, the Trees will be attached thanks to the context recognizing the relationships.

Here’s another test that does pass:

   [TestMethod()]
    public void RepositoryFilteredRoadsReturnsRoadWithTrees()
    {
      Database.SetInitializer(new MyInitializer());
      var rep = new DumbRepositoryButGoodEnoughForThisDemo();
      var roads = rep.GetRoadsWithFilteredTrees("Maple");
      Assert.IsTrue(roads.FirstOrDefault().Trees.Any());
      Assert.IsFalse(roads.FirstOrDefault()
.Trees
.Where(t => !t.Description.Contains("Maple"))
.Count() > 0); }

And a screenshot as further proof:

projection 

Scenario When This May Not Work As Expected

As Brian points out in the comments, there is one BIG draw back that is also a problem with the filtered explicit load. If there are already related entities tracked by the context, they will automatically be attached to any related entity in the context. That means if the Pine tree was already being tracked then as long as the Road is attached to the context, it will see ALL of the related trees in the context, including the Pine that was already there.

If you are using a pattern where you have a short-lived context that is instantiated just to execute the query, then it’s not a problem. Most of my architectures are like this. But if you are writing a Windows Form or WPF form and have a context that hangs around, you could run into this problem.

You’ll never avoid the problem when the entities are attached to the context.

But here’s a twist on the repository method that will return disconnected objects from a context that is managing multiple object.

   public List<Road> GetDisconnectedRoadsWithFilteredTrees(string treeFilter)
    {
      var roadAndTrees = _context.Roads.AsNoTracking()
       .Select(r => new
       {
         Road = r,
         Trees = r.Trees.Where(t => t.Description.Contains("Maple"))
       })
        .ToList();

      var rt = new List<Road>();
      foreach (var r in roadAndTrees)
      {
        r.Road.Trees.AddRange(r.Trees);
      }
      return roadAndTrees.Select(rAt => rAt.Road).ToList();
    }

18 thoughts on “Use Projections and a Repository to Fake a Filtered Eager Load

  1. I haven’t tried this, but is the magic that lets it recognize the relationship and return the (now filtered) related entities in any way related to the ".Local" first level cache? I’m just wondering if you could accidentally get more than you bargained for if more related entites happened to be known to the first level cache) due to previous queries to the same Entity in that context instance.

  2. @Brian.Hmm interesting question. This is behavior that we’ve had since EFV1 so it’s not due to the DbSet.Local property.  Because the context is tracking the related entities, it automatically builds up it’s own comprehension of the graphs via the navigations. The team refers to this as "relationship span". If the road was already in the context and it had a 2 related maple trees also in the context that were not in the database, then those would be returned as well. When the db retrieves the road and other trees from the database, and sees that the road is already in the context, by default it will use the existing road (not the one from the db). But that affect of returning ALL of the Maple trees (those known by the context and those that just came from the database)…. I believe you would get the same affect from an Include query when the road and some maple trees unknown to the db were already in the context.

  3. Yeah, sorry…didn’t word that very well since I see the world through DbContext colored glasses :)

    Specifically I was wondering if you used your setup above to query for a Road with its Maple trees and then query for that same Road and, say, its Birch trees, then after the second call to the repository would the tree collection you got back also unintentionally include the Maple trees (because even though they wouldn’t have come from the database on the second query, the context already knew about them from the previous query)?

    I hadn’t even considered the addition of entities that didn’t come from the database but that’s an interesting wrinkle as well.

  4. Sorry I had to be the bearer of the monkey wrench… Love the column and the blog. Keep up the great work (and enjoy your trip next month to glorious Sandusky…simultaneously the city of my birth and most despised name in recent popular media).

  5. One trick here, you must have LazyLoading explicitly turned off, and changeTracking on, for this to work as described.

Leave a Reply

Your email address will not be published. Required fields are marked *


4 + = nine

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>