Database concurrency in the real world
 

Home
Mission
News
Blog
Tools
FAQ
Publications
Downloads
Contact
Contents

Database concurrency conflicts in the real world.

- Maurice de Beijer -

 

A lot of articles have been written about database concurrency conflict detection and the various ways of handling them. Unfortunately most of these articles, and accompanying solutions, have one major flaw in focusing on the technical issues and database implementation instead of real world data and how this is used. In this article I will try to show the difference between focusing on the database implementation and on the real world data. I will show some possible approaches on how to solve these concurrency issues.

 

What is a database concurrency conflict

Let’s start with a quick recap of what database concurrency conflicts are and why they need to be solved in the first place.

 

Most database applications in this world are multi-user applications. This means that, at any given point in time, you can expect multiple persons and/or processes reading from and writing to a database. Given that multiple persons/processes are updating data in a database it is only a matter of time before two separate persons/processes are trying to update the same piece of data. Because a typical update cycle consists of read into memory, update in memory and write back to the database we will run into occasions where user 1 and 2 will both read the data into memory, user 1 will update the data and write those changes back to the database before user 2 does the same. Now we have a concurrency control conflict because user 1 read the data before user 2 wrote it back to the database. Why is this? Well simply because if user 1 writes his data back into the database he will overwrite the changes made by user 2 causing them to be lost.

 

This kind of database concurrency issue can occur both with humans or automated processes or a combination of the two. A concurrency issue is more likely to occur with human users as the read/update/write cycle is likely to take much longer. However that said the same can occur between automated processes and is harder to solve there because in the case of a human you can ask what the user wants and respond to that while a process needs to have all actions fully automated.

 

The current state

Let’s first take a look at what is generally said and done about solving database concurrency issues. Typically solving the problem is divided into two basic approaches:

  1. Pessimistic concurrency control.
  2. Optimistic concurrency control.

Below I will give a brief description of the two and the different options in handling them. This is just intended to clarify the problem so is in no way a complete coverage of concurrency handling and all of its aspects.

 

Pessimistic concurrency control

In the case of pessimistic concurrency control a collision is prevented because the user/process must take some action before being allowed to change the data. This action can be a number of things but typically involves locking the data in the database thereby preventing another user from holding the same lock.

 

Advantages:

  • Simple to implement.
    Because database servers support and enforce locking mechanisms it is very easy to implement. Because a user needs to place the lock before making any changes he or she is informed before a change is made that there is a conflict.
  • Very secure.
    Because the locking is implemented by the database server it is very reliable and you as a developer can be assured that nothing will be able to ignore it and change the data anyway.

Disadvantages:

  • Not very scalable.
    Locking data in a database require an open connection to a database. This means that every user must have at least one open connection to the database which means more resources and licenses. When using older database servers locking might also prevent other users from reading the data.
  • Prone to deadlocks.
    Suppose two users both want to change two pieces of data A and B. User 1 first locks A and user 2 first locks B. Now both want to lock the second part of the data but cannot because it is already locked by another user. Now we have a problem if both decide to wait until the data is available.
  • Locks can last a long time.
    If a user starts changing some data, he or she has a lock on the data until it is saved. If the user is distracted somehow or goes to a meeting without saving his changes the data remains locked and no one else can make any changes until the first changes have been committed.

 

Pessimistic locking can also be done using soft locks instead of real database locks. This means that a field is updated to indicate that a user is busy with the data and has it “locked”. This prevents the scalability problem as the connection doesn’t need to be kept open. However this approach has several drawbacks because it isn’t enforced by the database and another piece of code can decide to ignore the lock. Additionally the lock must be manually removed and if this isn’t done the data remains locked for ever.

 

Even though pessimistic locking has it’s place, it has a number of drawbacks and doesn’t combine very well with the disconnected nature of data in a .NET application.

 

Optimistic concurrency control

When using optimistic concurrency control the user doesn’t lock the data to prevent concurrency issues but detects and resolves them when writing to the database. Typically several different approaches are used with optimistic concurrency control:

  • No checking and last one in wins.
    Not really a concurrency control as the problem is ignored and the last user to update the data overwrites the changes by made by the first user. The SQL update command has only the primary key as the filter in the where clause. This kind of concurrency control is only suitable for single user applications.
  • Comparing the fields the user changed.
    In this case the update is done with a check if any of the data the user has changed is still the same as it was when the user read the data. The SQL update command has the primary key and changed fields with their old values and the where clause.
  • Comparing all fields.
    In this case the update is done with a check on all fields instead of just the ones being changed. Even though this seems like excessive it is actually better that just checking the changed fields because of the nature of the standard ADO.NET classes. The problem with the ADO.NET data classes is that they will execute an update command with all fields, not just the ones that were changed. This means that even though no one else has updated the fields you are changing someone might have changed another column. In this case the SQL update command has a where clause containing all fields in the table.
  • Comparing the row version.
    In this case the data has an extra row version, also known as a timestamp field which is changed very time the data is updated. In this case filtering is much simpler and faster than when comparing all fields as we only need to filter on the primary key and the row version in the SQL update. However care should be taken because these two do not result in the same behavior. If the first user sends an update that updates all data with its old version the data’s row version will still be updated. This means that another user, who has made a change, will have a conflict when checking the row version while he or she will not have a conflict when comparing all fields.

 

Back to the real world

So all of the above has been described in much more detail in numerous articles but these articles have one common flaw. This flaw is that they view and discuss the whole problem from a database perspective instead of a user’s perspective. What do I mean with that you might be wondering? Well every solution always focuses on the concurrency conflict and detection of a single whole row in a table. Often an update will span multiple rows, tables and sometimes even databases are involved. However when it comes to the way users see the data it often doesn’t map to the rows being updated.

 

The following examples are taken from an accounting package but the same applies to most, if not all, larger applications out there.

 

How does a user see data:

  1. Simple reference data.
    This kind of data usually maps to a single row. An example would be a list of VAT codes with the appropriate data.
  2. Complex data.
    This typically consists of data spanning more than one row or table. Examples might be an invoice or a purchase order.
  3. Regular application data.
    This kind of data is often stored in a single row in the database but the user actually considers it as a set of related but semi distinct sets of data with different owners. An example of this would be article data. While all data about an article might be stored in a single row the user might divide the data based on who “owns” that part of the data.
    1. Basic article data might consist of an SKU and a description used by everyone.
    2. Sales information. The row might include a sales price, VAT information used when invoicing and the number of units sold but not shipped. All this data is owned and updated by the sales department and read-only for others.
    3. Purchasing information. The row might include a preferred supplier, the maximum purchase price and the number on order but not yet received. All this data is owned and maintained by the purchasing department.
    4. Inventory. The row might include the number of items on stock. This is owned by the warehousing department who will also update the number of items to be shipped of received when an order is shipped or received.

 

Simple reference data

The standard way of handling data concurrency maps very well to the case of the simple reference data because a set of data the user sees maps exactly to a row in the database. This means that this base is well covered and needs very little extra attention. Unfortunately this is also the kind of data that doesn’t change very often so the chance of a concurrency issue occurring is very low.

 

Complex data

The case of the complex data is not handled very well at all. Consider the case where two users both open the same order, each change a different order line and save the order. Technically speaking there is no conflict as each change a different row in the database. However a user thinks about a single order and not about different rows in a database, so as far as he or she is concerned this is a multi user conflict with two people changing the same piece of data. To some extend this problem can be solved procedurally in an organization and some people might justly argue that the chance of the above scenario happening is very small. While that may or may not be true sooner or later the order will be transferred from the sales department to another department, maybe for approval or shipping. At this point two very different parts of the organization are involved, one updating the detail lines and another taking ownership of the order by updating the header record. Again technically speaking no conflict as two different tables are updated by different users but in the real world this is a very real multi user conflict.

 

To solve this we need to consider the separate rows in the database as one whole instead of individual items. One possible way of doing this is changing the way we store an order in the database and doing so in a single row. It is possible to do this using an XML field in SQL Server 2005 and storing all details as XML elements. While possible, and solving the concurrency issue, this approach also has a number of drawbacks, most notably the lack of relational integrity checks between the order details in the XML and the rest of the database which would need to be done in code and cannot be done by the database itself.

 

Another approach for solving the complex data concurrency issues would be to check a row version of the order header row when updating the detail rows. This would basically work as follows:

  1. Start a database transaction.
  2. Check the row version of the order header row.
  3. Update the order header row. This must be done, even if there is no actual change to the order header, to update the row version.
  4. Update, delete or insert all order detail rows.
  5. Commit the transaction.

 

The problem here is that the database cannot really help us to enforce this is done and there isn’t another process that just updates an order detail row without updating the order header row version as well. To solve this problem the order header row version can be updated using a trigger defined on the order detail table. This way you can always be sure that the header row version is updated but it does mean that the in memory order header row must be refreshed if any of the detail rows have been updated. Again this is not an ideal situation but at the very least very consistent and testable. You can test this by changing the same order twice, the second time directly after the first change was saved. If the second change can be saved than the row version in memory is the same as the row version in the database.

 

Regular application data

This case isn’t handled very well either. The main problem is that both the database and the .NET class for working with data are based around a complete row. Both the tools inside Visual Studio and classes like the SqlCommandBuilder are based around a SQL select command and will generate an SQL update command for every field in the SQL select. This is fine for simple reference data as a user typically updates the whole row but in this case that doesn’t make much sense.

 

Let’s take a look at a simple example to see the shortcomings. We store debtor information in a database. This information contains the address, the current balance and the credit limit for a debtor. The balance is automatically updated every time a debtor pays a bill or places a new order and cannot be updated directly. One person in the customer relation department corrects the address of the client while another, in the finance department updates the debtor’s credit limit. Technically this is a concurrency conflict as both users are trying to update the same row at the same time. However the customer relation department is owner of the address data while the finance department is owner of the credit limit so procedurally there is no conflict. Trying to solve the issue by giving each user only a subset of the data isn’t going to work as the finance department needs to know where a debtor lives in order to make an assessment of their credit worthiness. They are not going to update the address so read only access is good enough. The same is true the other way round, the customer relation department needs to be aware of the debtor’s credit limit but is never going to update it.

 

Even if both changes where made by people that have permission to update all debtor information it can hardly be called a concurrency conflict. Clearly a SQL update statement that simply includes all fields from a SQL select isn’t going to cut it here. One way of solving the problem is by updating each field as a separate entity. While this would avoid and solve the problem described above it is an oversimplification of the problem as certain fields are logically grouped together like the complete address. What we need instead is the concept of read only fields and to group the remaining fields into separate SQL updates. To do this we need a CommandBuilder like class that takes a data definition as input and produces a collection of SQL update command objects to update the data. In the update command collection there would be one command for the address and a second for the credit info, presuming the current user has the rights to update both. The code then goes through the collection and populates each command with parameters containing old and new values for each field being updated. If all the old and new values for a specific command match there is clearly no update in that part of the data and the command doesn’t need to execute. Once all update commands have been loaded they can be executed. In this case we do not want to wrap these as part of a single transaction. If all update commands execute successfully there was no conflict and the user can go on with their job. If a conflict occurs it is limited to a specific subset of the data and the other parts of the data can be updated without any delay. Because of the group wise nature of the updates using a SQL row version is impossible. Instead we need to include the complete list of fields being updated together with the primary key in the where clause of each command.

 

An example

The following code is a simple example of how an update could work. In this example I use the column ExtendedProperties to store the field grouping information. Columns with the same group name are bundled together into a single update command.

 

''' <summary>

''' Test application to check database updates.

''' </summary>

''' <remarks></remarks>

Sub Main()

Dim dta As New pubsDataSetTableAdapters.titlesTableAdapter

Dim table As pubsDataSet.titlesDataTable

Dim cb As New CommandBuilder

' Load the data

table = dta.GetData()

 

' Configure the table

cb.ConfigureDataTable(table)

 

' Make a change to the data

table.Item(0).price *= 1.1

'table.Item(0).title = "New title"

 

' Update the database

cb.UpdateTable(table)

 

Console.WriteLine( _

"Press any key to terminate the application.")

Console.ReadKey()

End Sub

Listing 1: The Main() function.

 

The main program uses a typed table adapter to load the titles table from the Pubs database.

 

''' <summary>

''' Configure the columns into update groups.

''' </summary>

''' <param name="table">The table with columns.</param>

''' <remarks></remarks>

Public Sub ConfigureDataTable( _

ByVal table As pubsDataSet.titlesDataTable)

' Basic data about the book

table.title_idColumn.ExtendedProperties("UpdateGroup") = "Book"

table.titleColumn.ExtendedProperties("UpdateGroup") = "Book"

table.typeColumn.ExtendedProperties("UpdateGroup") = "Book"

table.notesColumn.ExtendedProperties("UpdateGroup") = "Book"

table.pubdateColumn.ExtendedProperties("UpdateGroup") = "Book"

 

' Finacial data about the book

table.pub_idColumn.ExtendedProperties("UpdateGroup") _

= "Financial"

table.priceColumn.ExtendedProperties("UpdateGroup") _

= "Financial"

table.advanceColumn.ExtendedProperties("UpdateGroup") _

= "Financial"

table.royaltyColumn.ExtendedProperties("UpdateGroup") _

= "Financial"

 

' Sales information about the book

table.ytd_salesColumn.ExtendedProperties("UpdateGroup") _

= "Sales"

End Sub

Listing 2: The ConfigureDataTable() function.

 

In the ConfigureDataTable() function each column receives an UpdateGroup property. This is used during the update to determine which fields are grouped together. This information isn’t determined at update time because the same table might be used for multiple purposes in different business object with different field groupings. Additionally some fields might be read-only, something not taken into account in this sample code.

 

''' <summary>

''' Sends all updates to the database

''' </summary>

''' <param name="table">The table with changes,</param>

''' <returns></returns>

''' <remarks>Just demo code.

''' Cannot execute as there is no connection and Insert/Delete

''' is not implemented.

''' </remarks>

Public Function UpdateTable(ByVal table As DataTable) As Boolean

Dim updateCommands As List(Of SqlCommand)

' Get a list of update commands to execute

updateCommands = GetUpdateCommands(table)

 

For Each row As DataRow In _

table.GetChanges(DataRowState.Modified).Rows

Select Case row.RowState

Case DataRowState.Added

' New row, do an database insert

Case DataRowState.Deleted

' Deleted row, do a database delete

Case DataRowState.Modified

' Changed row, do the required database updates

For Each cmd As SqlCommand In updateCommands

Dim hasChanges As Boolean = False

 

For Each param As SqlParameter In _

cmd.Parameters

' Populate all parameters

Dim fieldName As String

fieldName = _

param.ParameterName.Substring(3)

 

If param.ParameterName.StartsWith("old") _

Then

param.Value = row(fieldName, _

DataRowVersion.Original)

Else

param.Value = row(fieldName, _

DataRowVersion.Current)

End If

 

' Check if this field is changed

hasChanges = hasChanges _

OrElse Not row(fieldName,

DataRowVersion.Original).Equals( _

row(fieldName, DataRowVersion.Current))

Next

 

If hasChanges Then

Console.ForegroundColor = _

ConsoleColor.Yellow

Console.WriteLine("Executing command:")

'cmd.ExecuteScalar()

Else

Console.ForegroundColor = ConsoleColor.Red

Console.WriteLine("Skiping command:")

End If

 

Console.WriteLine(cmd.CommandText)

Console.WriteLine()

Console.ResetColor()

Next

End Select

Next

End Function

Listing 3: The UpdateTable() function.

 

The UpdateTable() function first retrieves a collection of update commands. Note that for this example I have skipped the insert and delete commands as these are handled in the same way as normal.

 

 

''' <summary>

''' Build a collection of update commands for the table.

''' </summary>

''' <param name="table">The table that needs to be updated.</param>

''' <returns>A collection of SQLCommands for the update.</returns>

''' <remarks></remarks>

Private Function GetUpdateCommands(ByVal table As DataTable) _

As List(Of SqlClient.SqlCommand)

Dim groups As IDictionary(Of String, List(Of DataColumn))

Dim cmds As List(Of SqlClient.SqlCommand)

cmds = New List(Of SqlClient.SqlCommand)

Console.WriteLine("Building update commands.")

Console.WriteLine()

 

' Split all columns into groups based upon the

' UpdateGroup extended property.

groups = SplitColumnIntoGroups(table)

 

For Each group As List(Of DataColumn) In groups.Values

Dim cmd As SqlCommand

cmd = CreateUpdateCommand(table, group)

cmds.Add(cmd)

 

Console.WriteLine("Update command {0}:", cmds.Count)

Console.WriteLine(cmd.CommandText)

Console.WriteLine()

Next

 

Return cmds

End Function

Listing 4: The GetUpdateCommands() function.

 

The GetUpdateCommands() function loops through all the field groups and created a different SQL update command for each one. All separate commands are bundled together into a collection and returned.

 

''' <summary>

''' Split all columns into groups based upon the UpdateGroup

''' extended property.

''' </summary>

''' <param name="table">The table with columns to split.</param>

''' <returns>A dictionary with the groups of columns.</returns>

''' <remarks></remarks>

Private Function SplitColumnIntoGroups(ByVal table As DataTable) _

As Dictionary(Of String, List(Of DataColumn))

Dim groups As New Dictionary(Of String, List(Of DataColumn))

 

For Each col As Data.DataColumn In table.Columns

Dim updateGroup As String

If col.ExtendedProperties.Contains("UpdateGroup") Then

updateGroup = _

col.ExtendedProperties("UpdateGroup").ToString()

Else

updateGroup = ""

End If

 

If Not groups.ContainsKey(updateGroup) Then

groups.Add(updateGroup, New List(Of DataColumn))

End If

groups(updateGroup).Add(col)

Next

 

Return groups

End Function

Listing 5: The SplitColumnIntoGroups() function.

 

The SplitColumnIntoGroups() function takes all the columns in a table and splits them into separate update groups. This would be a good point to exclude read-only columns and possibly the primary key columns as these are typically not updatable.

 

''' <summary>

''' Create a SqlCommand to update the field group.

''' </summary>

''' <param name="table">The table being updated.</param>

''' <param name="group">The field group.</param>

''' <returns>The SqlCommand to update the table.</returns>

''' <remarks></remarks>

Private Function CreateUpdateCommand(ByVal table As DataTable, _

ByVal group As IEnumerable(Of DataColumn)) As SqlCommand

' Build an update command for the group of columns

Dim cmd As New Data.SqlClient.SqlCommand

Dim sqlSet As New System.Text.StringBuilder()

Dim sqlWhere As New System.Text.StringBuilder()

 

For Each col As DataColumn In table.PrimaryKey

If sqlWhere.Length > 0 Then

sqlWhere.Append(" and ")

End If

sqlWhere.Append("([")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append("] = @org")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append(" or [")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append("] = @new")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append(")")

 

cmd.Parameters.AddWithValue("old" + _

col.ColumnName, col.DataType)

cmd.Parameters.AddWithValue("new" + _

col.ColumnName, col.DataType)

Next

 

For Each col As DataColumn In group

If sqlSet.Length > 0 Then

sqlSet.Append(", ")

End If

sqlSet.Append("[")

sqlSet.Append(col.ColumnName)

sqlSet.Append("] = @new")

sqlSet.Append(col.ColumnName)

 

If sqlWhere.Length > 0 Then

sqlWhere.Append(" and ")

End If

 

sqlWhere.Append("([")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append("] = @org")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append(" or [")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append("] = @new")

sqlWhere.Append(col.ColumnName)

sqlWhere.Append(")")

 

If Not cmd.Parameters.Contains("old" + col.ColumnName) Then

cmd.Parameters.AddWithValue("old" + col.ColumnName, _

col.DataType)

End If

 

If Not cmd.Parameters.Contains("new" + col.ColumnName) Then

cmd.Parameters.AddWithValue("new" + col.ColumnName, _

col.DataType)

End If

Dim commandText As String

commandText = "Update [{0}] Set {1} Where ({2})"

cmd.CommandText = String.Format(commandText, _

table.TableName, sqlSet.ToString(), sqlWhere.ToString())

Next

 

Return cmd

End Function

Listing 6: The CreateUpdateCommand() function.

 

The CreateUpdateCommand() function creates a single SQL update command for a group of fields. The SQL Where clause consists of the primary key of the table as well as the fields being updated. Each field is compared both against the old value as well as its new value. This is done in order to avoid seeing the same change by two users as a conflict where it really doesn’t matter that a field is overwritten with the same value.

 

Conclusion

While the technique described above isn’t a complete solution for all update concurrency issues I believe it is a step in the right direction. It is a work in progress and I feel that the best way to handle the separate cases is still to be discovered. In the mean time the solution above is a great help in creating applications that are friendlier and less technology oriented in their nature.

 

 

Maurice de Beijer is an independent software developer, beta tester and a recipient of the MVP award. He specializes in .NET, object orientation, Visual FoxPro and solving technically challenging problems. Maurice is The Problem Solver and can be reached at mauricedb@computer.org or at www.TheProblemSolver.nl.

 

 

 

Send mail to webmaster@TheProblemSolver.nl with questions or comments about this web site.
Copyright © 1995 - 2007 ABL - The Problem Solver
Last modified: 10/21/06
WF
RSS 2.0