Service-orientation is inevitable for performant database applications

Databases cause two performance problems for applications. Firstly, they are difficult to scale out compared to other server roles, which means they can easily become the scalability bottleneck. Secondly, they are on a remote machine, which means that every query made to them incurs network latency. To reduce both processing overhead and latency, a common approach is to make as few queries as possible and to return only as much data as each query strictly needs, so that the database is likely to perform fewer joins and the network payload is smaller.

The object-oriented approach to reducing payload size is lazy loading, where only the most frequently used properties of an object are retrieved in the query, and properties thought to be used less often are loaded on demand when they are accessed. Let’s take a concrete example. Say you are modelling a movie with information such as its title, thumbnail, and actors. In the object-oriented world you might model this as a Movie class with Title, Thumbnail and Actors properties. If you find that most of the time you only need to display the title and thumbnail but not the actors, you might decide to lazy-load the Actors property, removing the database join to the actors table and reducing the network payload.

But what happens when you have a view of movies that needs to display the actors for each movie in the list? It’s not a commonly used view, so you want to keep the actors lazily loaded (always populating them would make the general case less efficient), but it’s not so infrequently used that poor performance can be ignored.

Without breaking encapsulation, the only possible solution is for each movie in the list to make an independent call to the database to retrieve its actors. This causes many database accesses, each of them relatively inefficient (databases are optimised for processing data in sets, so fetching a single row is typically not much quicker than fetching a small set), and it adds significant network latency too. This is the pattern commonly known as the N+1 query problem: one query for the list, plus one more per movie. To reduce the latency you could use a parallel loop, but even then you haven’t reduced any of the database overhead, which is the thing we’re most concerned about.

If you have collections of objects that have lazily loaded properties, you have to break encapsulation to improve performance.

We could break encapsulation only as far as the collection: movies get a specialised collection type and are aware of the collections they are contained in, so when a movie is asked to load its actors, it asks the collection to load the actors for all of its movies as a single bulk operation. This means that each movie has to keep track of which collections it is in, using callbacks from the collection’s add/insert/remove operations, and that you are restricted to specific collection types, which precludes the use of things like Linq-to-Objects (whose operators return ordinary sequences rather than your specialised collection). Breaking encapsulation in this way isn’t an attractive option.

The next level at which we could break encapsulation is a static method on the Movie class that accepts a sequence of movies and populates the actors on each of them. We are no longer restricted to specific collection types, and the actor properties can be populated in a single efficient batch operation, but this is no longer transparent to the user, who has to call a method to have the properties populated. Moreover, because the Actors property still exists on the class, this is somewhat unintuitive: you don’t expect to have to pass an object to a method to have its properties populated. It therefore probably makes more sense to remove the Actors property from the Movie class and have the bulk retrieval method return a dictionary of actors keyed by movie.
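Here is one way the static bulk method might look, sketched with Python’s built-in `sqlite3` module and an in-memory database. The `actors` schema, the sample rows, the module-level `DB` connection and the name `get_actors_by_movie` are all assumptions for illustration. The Movie class has no actors field; the method issues one `IN (...)` query and returns a dictionary of actor names keyed by movie id.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical schema and data. The method below is hard-wired to this
# connection, which is exactly what makes a static method awkward to test.
DB = sqlite3.connect(":memory:")
DB.executescript(
    """
    CREATE TABLE actors (movie_id INTEGER NOT NULL, name TEXT NOT NULL);
    INSERT INTO actors (movie_id, name) VALUES (1, 'Actor A'), (1, 'Actor B'), (2, 'Actor C');
    """
)


@dataclass
class Movie:
    # No actors field: the relationship is served by the bulk method below.
    id: int
    title: str
    thumbnail: str

    @staticmethod
    def get_actors_by_movie(movies: Sequence["Movie"]) -> Dict[int, List[str]]:
        """Returns actors keyed by movie id, fetched with a single query."""
        ids = [m.id for m in movies]
        result: Dict[int, List[str]] = {movie_id: [] for movie_id in ids}
        if not ids:
            return result  # nothing to look up, so skip the round trip
        # Only "?" placeholders are interpolated; the ids are bound as parameters.
        placeholders = ", ".join("?" for _ in ids)
        rows = DB.execute(
            f"SELECT movie_id, name FROM actors WHERE movie_id IN ({placeholders}) "
            "ORDER BY movie_id, name",
            ids,
        )
        for movie_id, name in rows:
            result[movie_id].append(name)
        return result
```

Movies with no actors still get an entry (an empty list), so callers can index the dictionary by any movie they passed in.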

Unfortunately, when it comes to testing this static method, we find that it isn’t very test-friendly: it can’t be stubbed or mocked, so there always has to be a database with suitable test data behind it. Instead of a static method, then, we’ll move it off the Movie class and make it an instance method on a MovieService class, which is obtained indirectly (for example, from a service container) so that it can be stubbed or mocked. We’ve now got a highly performant and testable solution, but to achieve this we had to remove any direct relationship between movies and actors.
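The service version can be sketched as follows. `MovieService` is expressed as a `typing.Protocol`, with a `SqlMovieService` implementation backed by `sqlite3` and a `StubMovieService` test double; `ServiceContainer` is a deliberately tiny, hypothetical container rather than any particular framework’s API, and `describe_movies` is an illustrative caller.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Sequence


@dataclass
class Movie:
    id: int
    title: str


class MovieService(Protocol):
    """The abstraction callers depend on (a hypothetical interface)."""

    def get_actors_by_movie(self, movie_ids: Sequence[int]) -> Dict[int, List[str]]: ...


class SqlMovieService:
    """Production implementation: one batched query per call."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_actors_by_movie(self, movie_ids: Sequence[int]) -> Dict[int, List[str]]:
        result: Dict[int, List[str]] = {mid: [] for mid in movie_ids}
        if not result:
            return result
        placeholders = ", ".join("?" for _ in result)
        rows = self._conn.execute(
            f"SELECT movie_id, name FROM actors WHERE movie_id IN ({placeholders}) "
            "ORDER BY movie_id, name",
            list(result),
        )
        for mid, name in rows:
            result[mid].append(name)
        return result


class StubMovieService:
    """Test double: canned data, no database, and a record of calls made."""

    def __init__(self, actors: Dict[int, List[str]]) -> None:
        self._actors = actors
        self.calls: List[List[int]] = []

    def get_actors_by_movie(self, movie_ids: Sequence[int]) -> Dict[int, List[str]]:
        self.calls.append(list(movie_ids))
        return {mid: list(self._actors.get(mid, [])) for mid in movie_ids}


class ServiceContainer:
    """A deliberately tiny, hypothetical service container."""

    def __init__(self) -> None:
        self._factories: Dict[type, Callable[[], object]] = {}

    def register(self, key: type, factory: Callable[[], object]) -> None:
        self._factories[key] = factory

    def resolve(self, key: type) -> object:
        return self._factories[key]()


def describe_movies(container: ServiceContainer, movies: List[Movie]) -> List[str]:
    # The caller never knows which implementation it gets.
    service = container.resolve(MovieService)
    actors = service.get_actors_by_movie([m.id for m in movies])
    return [f"{m.title}: {', '.join(actors[m.id])}" for m in movies]
```

In production the container would be registered with something like `lambda: SqlMovieService(conn)`; in a test it can hand out a `StubMovieService`, and the test can also assert that only one batched call was made.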

This scenario applies to any lazily loaded property on any type of object that can exist in collections, which leads us to the inescapable conclusion that lazy loading simply isn’t a reasonable option in most cases; objects must always be returned as a whole. It follows that objects must not have a direct relationship with any other object that does not form part of that whole. Any properties that would have been lazily loaded become separate objects, retrieved in bulk by services as and when needed. The domain model has been transformed into a set of state objects with no direct relationships, passed between services; it has been transformed into messages.
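To make the end state concrete, here is a small sketch of movies and actors as immutable messages, using frozen dataclasses. `MovieMessage`, `ActorCredit`, `group_credits` and `build_movie_view` are hypothetical names. Neither message holds a reference to the other; the relationship travels as a plain `movie_id`, and the two sets of messages (which would come from separate bulk service calls) are only combined at the point of use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class MovieMessage:
    """A whole, immutable state object with no references to other objects."""
    id: int
    title: str
    thumbnail: str


@dataclass(frozen=True)
class ActorCredit:
    """A separate message; the relationship is carried as an id, not a reference."""
    movie_id: int
    name: str


def group_credits(credits: Sequence[ActorCredit]) -> Dict[int, List[str]]:
    # Index actor names by movie id, preserving the order they arrived in.
    grouped: DefaultDict[int, List[str]] = defaultdict(list)
    for credit in credits:
        grouped[credit.movie_id].append(credit.name)
    return dict(grouped)


def build_movie_view(
    movies: Sequence[MovieMessage], credits: Sequence[ActorCredit]
) -> List[Tuple[str, List[str]]]:
    # Both inputs would come from separate bulk service calls;
    # they are joined only here, where the view needs them together.
    by_movie = group_credits(credits)
    return [(m.title, by_movie.get(m.id, [])) for m in movies]
```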

By optimising the performance of our object-oriented model, we arrived at a service-oriented message-passing architecture.

You may not like it. You may say that this isn’t the easiest model to program against. You may say it reduces the discoverability of related items and functionality. And I’m not going to argue with you on any of those points. But take consolation in the fact that it was inevitable from the start.
