Andre's Blog

Personal blog of Andre Perusse

Deleting Large Quantities of Data Using an Exponential Batch Reduction Algorithm

Recently, I was involved with a project that collected a vast quantity of data on a daily basis and stored it in a SQL Server database. That data was maintained in the database for an extended period of time after which, on a regular schedule, it was purged as being old data and no longer relevant.

The problem here is that SQL DELETE operations are rather expensive tasks and the amount of data being deleted was so large that the purging process took far too long to complete, resulting in database command timeouts and unacceptably long table locks. To fix the problem, I borrowed a technique described to me by a colleague whereby the maximum number of rows to delete in any single transaction could be specified, and if timeouts still occurred that number could be automatically reduced, or throttled, until the transaction succeeded.

The following code blocks demonstrate a C# method calling a stored procedure to perform this task. The stored procedure accepts a parameter which is the maximum number of rows to delete and returns a value indicating the number of rows that were actually deleted. This way, the C# function knows if more rows still need to be deleted and the proc is called iteratively (with a sleep interval to allow the database to perform other tasks) until no more rows need to be deleted.

Also note that the C# method checks for a SQL Timeout error on each command execution. If it detects a timeout, it automatically reduces the maximum number of rows to delete by half and tries again. It keeps doing this until it reaches a pre-defined lower threshold at which point it throws up its hands in failure.

Here is the SQL procedure that gets called:

CREATE PROCEDURE dbo.MyTable_Delete
    @BatchSize int,
    @RowsAffected int output
AS
DELETE TOP (@BatchSize)
FROM dbo.MyTable
SET @RowsAffected = @@ROWCOUNT

And here is the C# method that calls the proc. You have to be careful that each call to the proc executes in its own transaction. If all the iterative delete operations are wrapped in the same transaction you'll have gained nothing and perhaps even made the problem worse.

public void Delete_OldData(int batchSize, int minimumBatchSize, int transactionDelay)
{
    // If timeouts occur, we'll reduce the batch size.
    int workingBatchSize = batchSize;
    using (DbConnection dbConnection = CreateConnection())
    {
        dbConnection.Open();
        using (DbCommand dbCommand = dbConnection.CreateCommand())
        {
            try
            {
                int rowsAffected = 1;
                // We use a batch size to limit the number of deleted rows to prevent command
                // time-outs and avoid excessive table locking.
                while (rowsAffected > 0)
                {
                        dbCommand.CommandText = "[dbo].[MyTable_Delete]";
                        dbCommand.CommandType = CommandType.StoredProcedure;
                        dbCommand.Parameters.Clear();
                        dbCommand.Parameters.Add(new SqlParameter("@BatchSize", workingBatchSize));
                        SqlParameter rowsAffectedParam = new SqlParameter("@RowsAffected", SqlDbType.Int);
                        rowsAffectedParam.Direction = ParameterDirection.Output;
                        dbCommand.Parameters.Add(rowsAffectedParam);
                        try
                        {
                            dbCommand.ExecuteNonQuery();
                            rowsAffected = (int)rowsAffectedParam.Value;
                            // Restore batch size
                            workingBatchSize = batchSize;
                        }
                        catch (SqlException ex)
                        {
                            if (ex.Number == -2)    // -2 means Client Timeout
                            {
                                if (workingBatchSize <= minimumBatchSize)
                                {
                                        throw new Exception(string.Format("Proc MyTable_Delete experienced a timeout exception. Batch size is already at minimum {0}. Unable to throttle further.", workingBatchSize));
                                }
                                // Reduce batch size and try again.
                                workingBatchSize = Math.Max((int)(workingBatchSize / 2), minimumBatchSize);
                                rowsAffected = 1;
                            }
                        }
                        // Allow database to perform other work.
                        Thread.Sleep(transactionDelay);
                    }
                }
                catch (Exception ex)
                {
                        // Log error and throw.
                }
            }
        }
    }
}

Loading