LINQ To Entities – Balance Between Cost Of Database Calls And Data Volume

I was performance tuning some of the rushed queries on one of the web apps that I’m currently work on. What I’ve realized was writing efficient LINQ queries requires a little more effort than I thought. In this post I’m going to show you some of the issues I’ve encountered and the lessons I’ve learned from them.

Many Database Round-trip

One problem that I’ve encountered a lot was unnecessary database round-trips. I’ll explain later what I mean by unnecessary (because sometimes many round-trips is better than many Includes). Take the above query for example, here when ToList() is called, it means that query goes to database to retrieve the data. But are we finished with our query? No! the next line tries to project the query into a new view model and because it wants to declare a variable, it turns the expression lambda to a statement lambda. On line ten when it’s navigates to retrieve the ProductGroupId  by this code productGroup.ProductGroupToSystems.ProductGroupId  we hit the remote database for every OrderDetail in orderDetails.

This can be a major performance problem for remote databases. Because database calls can be expensive and should be avoided if we can. This is an arbitrary example, but even if we really had to use statement lambda or use LINQ to object, maybe because we want to call a local method, we should be aware of the cost of database calls and measure our performance with profiler. We should always pay attention to the fact that ToList() is for times when we have the final values after we project it to something else. Otherwise if we use navigation to retrieve something, we incur performance costs by more database calls, this query can be easily be refactored to the second version.

Query Database In A Loop

Another similar problem that I saw was we have a method that retrieves some specific row of data and that method is called in some kind of loop. In these situations one should simply query the database for all the data that we want once, and set the data for each object in memory, you can see the both version of queries down below.

Only Return What You Need From A Query

Another obvious problem that I Saw in a lot of queries was that we retrieve things we don’t need (imagine it with Tyler Durden’s voice), for example if we only need CustomerID, CompanyName, ContactName, why we retrieve an entire row with every column for it?

Delicate Balance Between The Cost Of Database Calls And Data Volume

Some people might tell you that you should hit the remote database as little as you possibly can. That can be true in a lot of cases, but not always. Sometimes hitting database lots of times is preferable to using lots of includes and retrieving a humongous amount of data. It’s really about  finding the right balance between the two. Let me show you a query that I had and then I show you the performance with and without includes.

First the query with include:

The query without include:

When I use include, I have one database call, but that is not a good thing, because that one call is very expensive. I have three nested includes, and that retrieves a ridiculous amount of data. If you want to know what happens when you use include, read this answer from stackoverflow. But when I don’t use include I have many database calls, but they have a very short duration, here I had 300 calls, if let’s say on average every call took 400 (which it didn’t, it’s less), 300 times 400 = 120,000. But look at when I used include in the first image, its duration is 291,709, so as you can see hitting the database less doesn’t have any value in this case. Here’s some good points about it from a post on stackoverflow.

Use AsNoTracking In Read-only Scenarios

If you don’t need to change the entities and you only want to retrieve them use AsNoTracking(), this can have couple of benefits:

  • Memory Usage: Because the context doesn’t reference the entities after they’re returned, GC can collect them as soon as they’re not referenced
  • Performance: Because EF doesn’t have to cross-match the returned entities with the ones in identity map, your queries might execute slightly faster
  • Currency: Because the returned entities materialized directly from the database as opposed to local cache, the returned entities reflects the latest values in the database
  • Statelessness: Because the entities are not being tracked, you can continue to use the same context instance indefinitely for read-only queries, because the reasons that short-lived Contexts are recommended doesn’t apply

dbForge Event Profiler for SQL Server

I highly recommend this product since it’s free and it made my life a lot easier. The user interface is better than sql server profiler and it has a lot of features that sql server profiler doesn’t have.

Summary

In this post, I discussed how unnecessary database round-trips can adversely effect performance. I also discussed that it shouldn’t be the goal of our LINQ queries and it all depends on our situation. I also described how AsNoTracking() can help us slightly increase the performance of our queries when  the entities are used only for reads.

Share...
 

Hamid Mosalla

Hi, I'm Hamid Mosalla, I'm a software developer, indie cinema fan and a classical music aficionado. Here I write about my experiences mostly related to web development and .Net.

 

Leave a Reply

Your email address will not be published. Required fields are marked *