What is Entity Framework?

Entity Framework is an ORM (Object-Relational Mapping) tool. That label on its own doesn't mean a lot; what matters more is what it does.

Entity Framework (or EF) acts as a translator, mapping external concepts, such as the tables and rows in a database, into objects that can be used within your code. EF is most commonly used in conjunction with SQL Server (Microsoft's flagship relational database). In this setting, EF can scan your database and generate classes describing each and every concept stored in it.
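To make the idea of scanning a database and generating classes a little more concrete, here is a toy sketch in Python, using the standard library's sqlite3 module in place of EF and SQL Server. It reads each table's columns and emits a matching dataclass. The function name `scaffold` and the small type map are hypothetical and only illustrate the principle; EF Core's own tooling for this is `dotnet ef dbcontext scaffold`, which also deals with keys, relationships and a great deal more.

```python
import sqlite3
from dataclasses import dataclass  # needed when the generated source is executed

# Hypothetical mapping from SQLite declared column types to Python type names.
TYPE_MAP = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}


def scaffold(conn: sqlite3.Connection) -> str:
    """Generate dataclass source code describing every user table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    classes = []
    for table in tables:
        lines = ["@dataclass", f"class {table.title().replace('_', '')}:"]
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
        for _cid, column, declared, _notnull, _default, _pk in conn.execute(
                f"PRAGMA table_info({table})"):
            lines.append(f"    {column}: {TYPE_MAP.get(declared.upper(), 'object')}")
        classes.append("\n".join(lines))
    return "\n\n".join(classes)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
""")
source = scaffold(conn)
print(source)

# The generated source is valid Python, so executing it yields real classes.
namespace = {"dataclass": dataclass}
exec(source, namespace)
print(namespace["Products"](id=1, name="milk", price=1.2))
```

The point is the direction of travel: the database schema is the source of truth and the classes are derived from it, which is why a schema change means regenerating the classes rather than hand-editing two code bases.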

Entity Framework gets a lot of bad press because it trades control for ease of use, but on the flip side you only have to manage one code base and you rarely have to delve into the database. Historically you would have had to maintain code in both your application and the database, leading to duplication as well as interesting maintenance challenges.

Performance

However, one of the biggest hurdles that can come with implementing EF is performance. Performance issues usually stem from the implementation and can be described, much to my team's delight, using an elaborate set of analogies. These focus mainly on querying a database using Entity Framework.

Problem One - Round Trips.

When you do your weekly shopping, you wouldn't go to the store, buy one product, go home and then go back to buy the next item on your shopping list.

With Entity Framework you build a query, which is then executed at the point you enumerate it. Enumerating effectively turns your wish list of items into a physical shopping basket you can then use. If someone were to forget to actually check out and take those items home, leaving them in a pile at the supermarket, they would be left scrabbling to collect each one as required. The same happens with Entity Framework: if a query's results are never materialised (for example with ToList()), every fresh enumeration sends the query to the database again. What began as an efficient one-off trip to get the information your application needed rapidly turns into many trips, each adding a small delay. Across tens, hundreds or thousands of trips that really adds up, and it also puts strain on your supermarket, the database, which has to process more requests and more checkouts.
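As a rough illustration of the difference between a wish list and a basket, the sketch below models deferred execution in Python with the standard sqlite3 module. `DeferredQuery` and `CountingDb` are hypothetical stand-ins, not EF types: the query runs every time it is enumerated, so using it three times costs three trips, whereas materialising it once (the rough equivalent of calling `ToList()` in EF) costs one.

```python
import sqlite3


class CountingDb:
    """Wraps a connection and counts every query sent: one 'trip to the store' each."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.trips = 0

    def query(self, sql, params=()):
        self.trips += 1
        return self.conn.execute(sql, params).fetchall()


class DeferredQuery:
    """Toy stand-in for an EF query: stores the SQL but runs nothing until enumerated."""

    def __init__(self, db: CountingDb, sql: str):
        self.db, self.sql = db, sql

    def __iter__(self):
        # Every enumeration goes back to the database, like re-enumerating an unmaterialised query.
        return iter(self.db.query(self.sql))


def make_store() -> CountingDb:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)",
                     [("milk", 1.20), ("bread", 0.90), ("eggs", 2.50)])
    return CountingDb(conn)


def shop(materialise: bool) -> int:
    """Use the same query three times and return how many trips that took."""
    db = make_store()
    basket = DeferredQuery(db, "SELECT name, price FROM products")
    if materialise:
        basket = list(basket)  # check out once; later uses read from memory
    names = [name for name, _ in basket]             # a trip if not materialised
    total = sum(price for _, price in basket)        # another trip
    priciest = max(basket, key=lambda row: row[1])   # and another
    return db.trips


print(shop(materialise=False))  # 3
print(shop(materialise=True))   # 1
```

The trade-off is memory: a materialised list holds every row, so it is worth doing when the results will be used more than once, not as a blanket rule.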

Problem Two - The Empty Packet.

Imagine going to the store to buy a ready meal, only to find the box is empty and having to make a trip back to the supermarket for each of its components.

One of the powerful things Entity Framework can do is handle the relationships between your data and let you model them in code. In a supermarket setting that might be a single customer with an order that has many products. Entity Framework then lets us work with both of those concepts, together or independently. If you only wanted to know when a customer checked out, you probably don't need the full list of related items and could make do with just the order. However, if you wanted to print a till receipt you would need every item.

This becomes an issue in Entity Framework when the required data isn't fetched up front. With Entity Framework you can use Include to specify related data that you would like fetched alongside your main query, so you could bring back the order and include its items. Naturally you would only want to do this when you need the data, otherwise it is wasteful. However, when you do need the data but don't fetch it initially, it's easy to get trapped in a scenario where the products are lazy loaded individually each time you access them. As in the analogy, this results in a very chatty application that requests every single product one at a time. Lazy loading can be disabled entirely, but then the related data is simply missing (typically null or empty) at the point you need it, which can cause your application to fail unless that data has been loaded explicitly.
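The chatty pattern can be sketched the same way. In this hypothetical Python example (sqlite3 again standing in for EF and SQL Server), `receipts_lazy` mimics lazy loading by issuing one query for the orders and then one more per order for its items, while `receipts_included` mimics an Include by fetching everything in a single joined query. The table and function names are made up for illustration.

```python
import sqlite3


class CountingDb:
    """Wraps a connection and counts every query sent to the database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.trips = 0

    def query(self, sql, params=()):
        self.trips += 1
        return self.conn.execute(sql, params).fetchall()


def make_store(order_count: int = 3, items_per_order: int = 4) -> CountingDb:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT);
    """)
    for order_id in range(1, order_count + 1):
        conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                     (order_id, f"customer{order_id}"))
        conn.executemany("INSERT INTO items (order_id, product) VALUES (?, ?)",
                         [(order_id, f"product{order_id}-{i}") for i in range(items_per_order)])
    return CountingDb(conn)


def receipts_lazy(db: CountingDb) -> dict:
    """Lazy-loading style: one query for the orders, then one more per order (N + 1)."""
    receipts = {}
    for order_id, _customer in db.query("SELECT id, customer FROM orders ORDER BY id"):
        rows = db.query("SELECT product FROM items WHERE order_id = ? ORDER BY id", (order_id,))
        receipts[order_id] = [product for (product,) in rows]
    return receipts


def receipts_included(db: CountingDb) -> dict:
    """Include style: a single joined query returns every order with its items."""
    receipts = {}
    rows = db.query("""
        SELECT o.id, i.product
        FROM orders o LEFT JOIN items i ON i.order_id = o.id
        ORDER BY o.id, i.id""")
    for order_id, product in rows:
        receipts.setdefault(order_id, [])
        if product is not None:  # an order with no items still gets an empty receipt
            receipts[order_id].append(product)
    return receipts


lazy_db, included_db = make_store(), make_store()
same = receipts_lazy(lazy_db) == receipts_included(included_db)
print(same, lazy_db.trips, included_db.trips)  # True 4 1
```

Both approaches print the same receipts; the lazy version simply needs one extra trip per order, so with 1,000 orders it would make 1,001 trips instead of one.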

Problem Three - The Whole Store.

You wouldn't go to the supermarket and bring home the whole store to get that one item you needed to buy.

This one is more recent and ties into the EF Core releases (now addressed in EF Core 5.0, released alongside .NET 5). It relates back to including related data and making sure everything is loaded up front. Versions of EF Core before 3.0 would split such a query into one query per concept and return them efficiently. With EF Core 3.0, a question was raised over data consistency versus this splitting: if the data is fetched across multiple queries, any of those individual concepts could be updated between queries and the combined result could be incorrect. This led to the removal of the splitting (since reintroduced in EF Core 5.0 as an opt-in option). As a result, every included concept was brought together into a single query and, because of the way SQL joins work, the number of rows returned multiplied out: each row from one related collection is repeated for every row of a sibling collection, an effect often called a "cartesian explosion". In one instance we saw 26 rows become 43,500, which was a significant performance hit. This can be overcome by enabling query splitting, or by reducing the number of concepts brought back in a single query and splitting these manually.
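The multiplying rows are easy to reproduce. The hypothetical sketch below (Python with sqlite3, not EF itself) gives one order two sibling collections, items and vouchers, and compares a single query that joins both against one query per concept. The joined query returns items multiplied by vouchers rows, while the split approach returns 1 + items + vouchers rows across three trips. In EF Core 5.0 the split behaviour is requested with `AsSplitQuery()`.

```python
import sqlite3


def make_order(item_count: int, voucher_count: int) -> sqlite3.Connection:
    """One order with two independent child collections (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT);
        CREATE TABLE vouchers (id INTEGER PRIMARY KEY, order_id INTEGER, code TEXT);
        INSERT INTO orders (id) VALUES (1);
    """)
    conn.executemany("INSERT INTO items (order_id, product) VALUES (1, ?)",
                     [(f"product{i}",) for i in range(item_count)])
    conn.executemany("INSERT INTO vouchers (order_id, code) VALUES (1, ?)",
                     [(f"voucher{v}",) for v in range(voucher_count)])
    return conn


def single_query_rows(conn: sqlite3.Connection) -> int:
    """Join both collections in one query: each item row repeats once per voucher."""
    rows = conn.execute("""
        SELECT o.id, i.product, v.code
        FROM orders o
        LEFT JOIN items i ON i.order_id = o.id
        LEFT JOIN vouchers v ON v.order_id = o.id
        WHERE o.id = 1""").fetchall()
    return len(rows)


def split_query_rows(conn: sqlite3.Connection) -> int:
    """One query per concept; the pieces are stitched together in application code."""
    order = conn.execute("SELECT id FROM orders WHERE id = 1").fetchall()
    items = conn.execute("SELECT product FROM items WHERE order_id = 1").fetchall()
    vouchers = conn.execute("SELECT code FROM vouchers WHERE order_id = 1").fetchall()
    return len(order) + len(items) + len(vouchers)


conn = make_order(item_count=20, voucher_count=10)
print(single_query_rows(conn), split_query_rows(conn))  # 200 31
```

Adding a third sibling collection to the joined query multiplies the row count again, which is how a modest amount of data can balloon into tens of thousands of rows.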

While these problems can largely be prevented globally by disabling lazy loading and configuring the database context to behave as you want, the analogies still help explain why each of those options is so important when running an efficient application.

If you are experiencing a performance issue with your .NET application and it uses EF, STaC Solutions may be able to help. We've helped our customers optimise applications that receive millions of database queries a day, assisting them in reducing the cost of their underlying Azure cloud infrastructure. Don't hesitate to reach out if you are having issues.

You can email us, tweet us on Twitter, or message us on LinkedIn.