The Great DataSet Debate

Speaking of Scott Hanselman, he started a fantastic discussion about datasets. His issue was with returning datasets from web services (public ones) that may have subscribers that are not .NET clients and don't grok datasets. A bunch of discussions spun off of this, as well as a pointer to a February discussion and an incredible debate in the comments on Barry Gervin's site.

Having been a database developer for a gazillion years, I love working with data in my apps, so I thought I'd add my 2 cents on some of the things that crossed my mind while reading through some of this stuff.

For the last few years, .NET has presented a choice between web services and remoting. Basically the prescription was: if you are using nothing but .NET in your solution (client and server), then go for remoting and use strongly defined objects (though there were still some interesting perf numbers when it came to datasets). If you are creating something public that could be subscribed to by non-.NET clients, then use a web service. That prescription also presumes that datasets might not be the thing to return from your web methods, since you would be dealing with clients that need to do a lot of work to read the dataset. (By my read, this is Scott's main point.)
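To make that concrete, here's a rough sketch in the old ASMX style. The service name, connection string, and the Northwind-ish Customers table and Customer class are all placeholders of mine, not anything from Scott's post; the point is just the contrast between what the two web methods put on the wire.

```csharp
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Web.Services;

public class Customer
{
    public string CustomerID;
    public string CompanyName;
}

public class CustomerService : WebService
{
    // Placeholder connection string.
    private const string ConnectionString =
        "Data Source=.;Initial Catalog=Northwind;Integrated Security=SSPI";

    // Trivial for a .NET caller, but the SOAP response is a diffgram plus an
    // inline schema that a Java or Perl client has to pick apart by hand.
    [WebMethod]
    public DataSet GetCustomersAsDataSet()
    {
        DataSet ds = new DataSet();
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName FROM Customers", ConnectionString);
        adapter.Fill(ds, "Customers");
        return ds;
    }

    // The WSDL here describes a plain array of Customer elements, which any
    // SOAP toolkit can consume without knowing what a DataSet is.
    [WebMethod]
    public Customer[] GetCustomers()
    {
        ArrayList results = new ArrayList();
        using (SqlConnection connection = new SqlConnection(ConnectionString))
        {
            SqlCommand command = new SqlCommand(
                "SELECT CustomerID, CompanyName FROM Customers", connection);
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Customer c = new Customer();
                    c.CustomerID = reader.GetString(0);
                    c.CompanyName = reader.GetString(1);
                    results.Add(c);
                }
            }
        }
        return (Customer[])results.ToArray(typeof(Customer));
    }
}
```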

I should mention here that I use web services for a lot of applications that are strictly .NET-based and I have never implemented remoting. This was not an explicit decision based on some deep investigation; rather, I learned web services first (since Microsoft marketing really forgot the remoting message for the first six months) and stuck with them.

With Indigo coming down the pipe, that prescription changes. Basically the message is that if you are creating something new, use web services. Period. Regardless of whether the client is unknown or strictly .NET. So many more strictly .NET solutions will be using web services, returning datasets will not pose that particular problem (a client that doesn't understand datasets), and the question will come down to perf. As Scott Swigart points out, "Hanselman lives in an arena where there's no such thing as fast enough, but few applications live under this constraint."

There is definitely a choice to be made, but I think the bigger problem is that many developers are not aware of the choice. The dataset has become their tofu: an all-purpose, malleable tool that gets used as every data container. I wonder how many people know that you can even fill a datatable from a dataadapter? Or how about just using a dataset/datatable/datareader as a translator, to get data out of the database and then push it into another type of object?
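For what it's worth, here's a rough sketch of both of those tricks, using the same placeholder connection string and Customers table as above (and a Hashtable standing in for "another type of object"):

```csharp
using System.Collections;
using System.Data;
using System.Data.SqlClient;

public class DataTableTricks
{
    // Placeholder connection string.
    private const string ConnectionString =
        "Data Source=.;Initial Catalog=Northwind;Integrated Security=SSPI";

    // Fill a standalone DataTable straight from the adapter -- no DataSet anywhere.
    public static DataTable GetCustomersTable()
    {
        DataTable table = new DataTable("Customers");
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName FROM Customers", ConnectionString);
        adapter.Fill(table);
        return table;
    }

    // Use the DataTable purely as a translator: pull the data out of the
    // database, push it into another object, and let the table go out of scope.
    public static Hashtable GetCustomerLookup()
    {
        Hashtable lookup = new Hashtable();
        DataTable table = GetCustomersTable();
        foreach (DataRow row in table.Rows)
        {
            lookup[(string)row["CustomerID"]] = (string)row["CompanyName"];
        }
        return lookup;
    }
}
```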

Many developers do not understand the baggage that comes along with a dataset. It’s good/useful/handy/helpful baggage if you need that functionality. But it can be dead weight if you are not using it. 
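Just to put a sketch behind that, here's a tiny made-up example (Northwind-style sample values) of the kind of baggage I mean: row-state tracking, original-versus-current values, constraint checking. All of it rides along on every DataTable whether you ever touch it or not.

```csharp
using System;
using System.Data;

public class DataSetBaggage
{
    public static void Main()
    {
        // Every DataTable carries change tracking, row versions, and constraint
        // enforcement -- handy when you need it, dead weight when you don't.
        DataTable table = new DataTable("Customers");
        table.Columns.Add("CustomerID", typeof(string));
        table.Columns.Add("CompanyName", typeof(string));
        table.PrimaryKey = new DataColumn[] { table.Columns["CustomerID"] };

        DataRow row = table.Rows.Add(new object[] { "ALFKI", "Alfreds Futterkiste" });
        table.AcceptChanges();                     // snapshot the "original" values

        row["CompanyName"] = "Alfreds Futterkiste GmbH";

        Console.WriteLine(row.RowState);                                  // Modified
        Console.WriteLine(row["CompanyName", DataRowVersion.Original]);   // old value is still kept
        DataTable changes = table.GetChanges(DataRowState.Modified);      // just the dirty rows
        Console.WriteLine(changes.Rows.Count);                            // 1
    }
}
```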

ADO.NET 2.0 is going to help more too, of course. The datatable becomes a first-class citizen (it gets its own ReadXml and WriteXml), a dataset can now load data in from a datareader (though you can't return a datatable or a datareader from a web service with the same simplicity as a dataset), there's lots of new merging functionality, and more.
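Based on what's been shown of the 2.0 bits so far, here's a rough sketch of what those two pieces look like (again with my placeholder connection string and table; the details could always shift before release):

```csharp
using System.Data;
using System.Data.SqlClient;

public class Ado2Sketch
{
    // Placeholder connection string.
    private const string ConnectionString =
        "Data Source=.;Initial Catalog=Northwind;Integrated Security=SSPI";

    public static void Demo()
    {
        // Load a standalone DataTable straight from a DataReader (new in 2.0).
        DataTable table = new DataTable("Customers");
        using (SqlConnection connection = new SqlConnection(ConnectionString))
        {
            SqlCommand command = new SqlCommand(
                "SELECT CustomerID, CompanyName FROM Customers", connection);
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                // DataSet.Load(reader, LoadOption.OverwriteChanges, "Customers")
                // does the same thing if you do want the full DataSet.
                table.Load(reader);
            }
        }

        // The DataTable can now round-trip XML on its own -- no wrapping DataSet.
        table.WriteXml("customers.xml", XmlWriteMode.WriteSchema);

        DataTable copy = new DataTable();
        copy.ReadXml("customers.xml");   // works because the inline schema was written above
    }
}
```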
