Creating Models from Databases with Billions of Tables

By now you know I’m prone to exaggeration so that “billions” thing is a stretch, but you get the point hopefully.

Lots of legacy databases have hundreds or even more tables in them.

People building Entity Data Models by reverse engineering a database often bring every table and view from the database into a single model.

Here’s an untouched model built from all of the tables and view of the AdventureWorks database – totalling about 95 entities.


This is already unmanageable and causes design time performance issues, and it’s not even 100 entities.

I get questions all of the time from developers asking what to do  with large models.

Here is a recent question I got via email":

“I have a database with 150+ tables. Would you break the model up into multiple edmx files?”

My response (with some enhancements for the sake of this blog post):

Yes I would. I have lots of clients working with multiple models. One approach is to think of a model per transaction. E.g., if you have a data entry screen that only needs 6 entities, build a model for that. (That may be exaggerated).

Take a look at:  scroll down to his large model discussion & demo. Ward and I have come up with the same conclusion. Ward has written something like a dissertation as he goes through the pros & cons of various approaches to the problem.

Also, another approach to consider (with or without breaking up the model) is a different designer that doesn’t try to visualize all of the entities (which makes things so cumbersome with the EDMX designer). LLBLGen Pro is already a great ORM in its own right. But they’ve taken their knowledge and built a deisgner specifically for EF4 with the goal of dealing with larger models.

Finally, just as an FYI, I’m guest editing an upcoming issue of DevPro Connections magazine and specifically asked Patrik Lowendahl to write an article on this very topic.

  Sign up for my newsletter so you don't miss my conference & Pluralsight course announcements!  

2 thoughts on “Creating Models from Databases with Billions of Tables

  1. Interesting point. Out of interest, do you have any feel for the extent to which you would/should use multiple models if you’re doing code-first with lots of types?

  2. If you are using code first then I expect it should be easy to envision your domain boundaries. You’d define which classes belong in a domain by whatever you expose through the context. Does that makes sense? I don’t hnk you need to think about "multiple models", just focus on a context class and what entities it exposes.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.