*Windows Azure SQL Database makes the blog post title just too darned long!
Followup post: Testing out EF6 Connection Resiliency
I’ve been messing around with the DbExecutionStrategy types in EF6 and trying to force the strategy to kick in.
I spent a horrid amount of time going down the wrong path so thought I would elaborate on the one I’ve been banging on: SqlAzureExecutionStrategy.
I thought that by disconnecting my network connection at the right moment I would be able to see this work. But it turned out that my results were no different in EF5 or with EF6 using it’s DefaulSqlExecutionStrategy (which does not retry).
I finally looked yet again at the description and at some older articles on connection retries for SQL Azure and finally honed in on a critical phrase:
A transient failure is one that will most likely correct itself pretty quickly.
Looking at the code for the SqlAzureExecutionStrategy, you can see the following comment:
/// This execution strategy will retry the operation on <see cref="T:System.TimeoutException"/> /// and <see cref="T:System.Data.SqlClient.SqlException"/> /// if the <see cref="P:System.Data.SqlClient.SqlException.Errors"/> contains any of the following error numbers: /// 40613, 40501, 40197, 10929, 10928, 10060, 10054, 10053, 233, 64 and 20
Focusing on these errors led me to a number of articles aimed at dealing with this same list of transient failure.
Here is a code sample from the Windows Server AppFabric Customer Advisory Team that contains brief descriptions of the errors:
// SQL Error Code: 40197 // The service has encountered an error processing your request. Please try again. case 40197: // SQL Error Code: 40501 // The service is currently busy. Retry the request after 10 seconds. case 40501: // SQL Error Code: 10053 // A transport-level error has occurred when receiving results from the server. // An established connection was aborted by the software in your host machine. case 10053: // SQL Error Code: 10054 // A transport-level error has occurred when sending the request to the server. // (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) case 10054: // SQL Error Code: 10060 // A network-related or instance-specific error occurred while establishing a connection to SQL Server. // The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server // is configured to allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed // because the connected party did not properly respond after a period of time, or established connection failed // because connected host has failed to respond.)"} case 10060: // SQL Error Code: 40613 // Database XXXX on server YYYY is not currently available. Please retry the connection later. If the problem persists, contact customer // support, and provide them the session tracing ID of ZZZZZ. case 40613: // SQL Error Code: 40143 // The service has encountered an error processing your request. Please try again. case 40143: // SQL Error Code: 233 // The client was unable to establish a connection because of an error during connection initialization process before login. // Possible causes include the following: the client tried to connect to an unsupported version of SQL Server; the server was too busy // to accept new connections; or there was a resource limitation (insufficient memory or maximum allowed connections) on the server. // (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) case 233: // SQL Error Code: 64 // A connection was successfully established with the server, but then an error occurred during the login process. // (provider: TCP Provider, error: 0 - The specified network name is no longer available.) case 64: // DBNETLIB Error Code: 20 // The instance of SQL Server you attempted to connect to does not support encryption. case (int)ProcessNetLibErrorCode.EncryptionNotSupported: return true;
A TechNet wiki article on handling transient failures from Windows Azure SQL Database, warns:
One thing that’s tricky about retry logic is actually inducing a transient error for testing.
Oh, now they tell me! (I am a pitbull and tried so many things already).
Their solution is to intentionally cause a deadlock.
So the SqlAzureExecutionStrategy builds in the same retry logic that until now we’ve had to hand code (ala the example in the TechNet article or the Customer Advisory Team code sample.
I have no question that it will work. I just really like to demonstrate before and after when I can rather than just regurgitate what I’ve read somewhere. 🙂
Followup post: Testing out EF6 Connection Resiliency
Sign up for my newsletter so you don't miss my conference & Pluralsight course announcements!
12 thoughts on “EF6 Connection Resiliency for SQL Azure* – When does it actually do it’s thing?”
We had similar problems testing if it was working locally. Once it was in production though, it was obvious from the start.
Prior to adding the SqlAzureExecutionStrategy, we got db connection related exceptions logged fairly regularly. After deploying a version that uses the EF6 connection resiliency, our logs are virtually free of this kind of exception. The difference is huge. Previously, we were very worried about the connection issues with Sql Azure (worried enough to look at alternative persistence strategies) so this new feature is very welcome.
Hi, this feature is complemented by the new ADO.Net idle connection resiliency in .NET Framework 4.5.1 blogs.msdn.com/…/net-framework-4
Thanks for that link Erik. I was pointed to that feature to see if that’s why I wasn’t seeing the connection fail but in my case I was using VS2012 & .NET 4.5, not 4.5.1. But yes, they’ve built the connection resiliency in a bunch of places so we don’t need to rely on finding the guidance/code samples to deal with it any more! Even if you’re using straight ADO.NET.
this is a great confirmation of the feature. I will be able to point people to your comment when they ask about possible dropped connections with EF & Azure SQL DB. Thanks!
Have you found a way to log when it happens?
Ben, yes. Glenn Condron from the team suggested a much easier path to see this happening and it would also allow you to log it. EF6 now has command interception. You can use the method which hits as a query is about to execute and log it then. I’m also going to see if simply setting logging (a simpler way to use this feature) will capture the retries. Blog post forthcoming when I get some other stuff out of the way and some time to try the second idea out.
Hi Julie, We have been playing with this too and have not been able to find a reliably way to invoke a transient error for testing so I am interested if you find a way. Also I am concerned that a transient error doesn’t always mean that the transaction failed and you have no way of knowing. See blogs.msdn.com/…/sql-database-co
Given that most devs are going to run into the same issues on WASD do you know of any plans to create an idempotent schema with EF6 that can test for transient exceptions that have failed successfully verses failed unsuccessfully. Or is it every one for themselves for this common problem.
Erik's comment has me confused. I read the link, and it sounds like we don't need to use SqlAzureExecutionStrategy with EF6 – as long as we're running on .NET 4.5.1.
Do we need to continue to add a DbConfiguration to set up the use of this strategy, or is this all handled for free now when it comes to retrying queries in an Azure environment?
I managed to find a way to reliably evoke transient errors in azure.
Also, one thing to keep in mind when dealing with transient exceptions that another commenter hinted at above is the idempotency issue.
Article on how to actually implement Connection Resiliency and Command Interception
with working sample code in ASP.NET