Hooray! Turns out it was a bug (an "unintended breaking change") and it’s getting fixed for RTM! 🙂 [see Connect submission and Microsoft’s response] Great catch, George!
This is an edge case that I had not noticed, but it was brought to my attention this morning in an email from Dave Russell, who pointed me to this forum thread: EF4 Include method returns different results than EF1 Include
I had to test it myself because that’s the way I am.
Here is a model with the same setup: a many-to-many relationship between Person and Brewery defined using a join entity (FavoriteBrewery), and Brewery has a one-to-many relationship with Beer.
Putting a Stake in the Ground with .NET 3.5
This is the .NET 3.5 version of the model with no foreign keys.
If I query for all people:
var list = context.People.ToList();
I get back 137 people.
If I query for all people who have any FavoriteBreweries:
list = context.People.Where(p => p.FavoriteBreweries.Any()).ToList();
I see there are only 3 people who have favorite breweries.
Now let’s start eager loading.
list = context.People.Include("FavoriteBreweries").ToList();
This gives me all 137 people, with those favorite breweries attached to 3 of the people. (I checked by querying the ObjectStateManager for entries that are FavoriteBrewery types.)
Next, I eager load the Brewery objects along with those FavoriteBrewery join entities.
list = context.People.Include("FavoriteBreweries.Brewery").ToList();
I still have 3 FavoriteBreweries join entities. (There are only two breweries in my sample database.)
Person 1 — FB — Brewery1
Person 2 — FB — Brewery2
Person 5 — FB — Brewery1
Now I ask for the beers that are made by the breweries:
list = context.People.Include("FavoriteBreweries.Brewery.Beers").ToList();
In the lame little sample database, Brewery1 has 2 beers and Brewery2 has 0 beers.
I still get the same number of Favorite Breweries:
Person 1 — FB — Brewery1 …Beer Count=2
Person 2 — FB — Brewery2 …Beer Count =0
Person 5 — FB — Brewery1 … Beer Count=2
Testing the same loading in .NET 4
In .NET 4, there has been a LOT of work done on the query generation. .NET 3.5 used a LOT of outer joins which was really messy. Much of this has been replaced with inner joins. But the outer join allowed the previous query to bring back the Favorite Breweries and Breweries even when the Breweries had no beers.
Using a new version of the model that now includes foreign keys:
And running the same tests in .NET 4.0, where I’m getting the benefit of the new query processing, the query results change.
The change shows up only when I eager load beers (where I know I have an empty collection for one of the breweries).
Here are the results:
Person 1 — FB — Brewery1 …Beer Count=2
Person 5 — FB — Brewery1 … Beer Count=2
The inner join filtered Brewery2 out of the results because it has no beers, so Person 2’s favorite brewery never came back.
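To see the effect in isolation, here is a minimal sketch using LINQ to Objects rather than Entity Framework. The class name, method names, and in-memory arrays are hypothetical stand-ins for the sample data above (three favorites, two beers that both belong to Brewery1); it does not reproduce the SQL that EF actually generates, only the difference between the two join types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the sample database (not EF code).
public static class JoinDemo
{
    // Each favorite is { personId, breweryId }.
    static readonly int[][] Favorites =
    {
        new[] { 1, 1 }, new[] { 2, 2 }, new[] { 5, 1 }
    };

    // Each beer is { beerId, breweryId }. Brewery2 has no beers.
    static readonly int[][] Beers =
    {
        new[] { 10, 1 }, new[] { 11, 1 }
    };

    // Inner join: a favorite survives only if at least one beer matches its brewery.
    public static List<int> PeopleViaInnerJoin()
    {
        return (from f in Favorites
                join b in Beers on f[1] equals b[1]
                select f[0]).Distinct().OrderBy(id => id).ToList();
    }

    // Left outer join (group join + DefaultIfEmpty): every favorite survives,
    // even when its brewery has no beers.
    public static List<int> PeopleViaOuterJoin()
    {
        return (from f in Favorites
                join b in Beers on f[1] equals b[1] into beerGroup
                from b in beerGroup.DefaultIfEmpty()
                select f[0]).Distinct().OrderBy(id => id).ToList();
    }

    public static void Main()
    {
        Console.WriteLine("Inner: " + string.Join(", ", PeopleViaInnerJoin()));
        Console.WriteLine("Outer: " + string.Join(", ", PeopleViaOuterJoin()));
    }
}
```

With the inner join, Person 2 disappears because Brewery2 has nothing to match on the Beers side; the outer join keeps all three people.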
I tested the same type of eager loading, but this time using a many-to-many relationship that does not include a join entity.
The results for an eager load that grabbed People.Include("StoreLocations.Registers") did not pick up the stores with no registers.
This is definitely different than what we have learned to expect from querying. I would only expect the stores to be filtered out if I had explicitly added logic into my query to only retrieve stores with registers.
It’s a breaking change that, because it’s an edge case, I don’t expect to be modified by RC so you need to be aware of it.
I’ll probably just point to this blog post from my book rather than walking through the whole thing again.
Thanks Dave for pointing out the forum and to George Fitch who started the forum thread: “good catch!”.
Thank you for writing about this Julie. When I use the Include method on a query, I expect it to do just that, Include everything that I tell it to. Not IncludeAlmostEverything, not IncludeWithExceptions, just plain INCLUDE. Like I said in my MSDN post, I’m not very happy with this change. As a matter of fact I’m very POed. Henrik Dahl’s workaround is OK, to a point. Increasing the query time by 50% is not my idea of progress. Other workarounds that I’ve tried, including projection techniques shown on your own site, have worked but are even slower when the query returns 1000 or so results (like my original post stated, I’m getting these results for a telephone directory). I’m submitting this to MSDN Connect, but like you said, nothing will probably change. Damn EF4 with its fancy Inner joins…
It must be a bug
For anyone interested, here is the link to the Microsoft Connect suggestion:
connect.microsoft.com/…/ef4-include-met
Microsoft left a comment today on the Connect suggestion page linked above (post #3) about enabling EF1’s Include method behavior. While not promising anything this late in the release cycle, it does sound encouraging.
Microsoft has now responded that this new behavior is NOT intentional, but actually a bug. They also intend to fix it in time for EF4 RTM.
connect.microsoft.com/…/ef4-include-met