Sharding Pattern

 

Comments

 

Divide a data store into a set of horizontal partitions or shards. This pattern can improve scalability when storing and accessing large volumes of data.

 

Code

 

The following example uses a set of SQL Server databases acting as shards. Each database holds a subset of the data used by an application. The application retrieves data that is distributed across the shards by using its own sharding logic (this is an example of a fan-out query).

 

The details of the data that is located in each shard is returned by a method called GetShards. This method returns an enumerable list of ShardInformation objects, where the ShardInformation type contains an identifier for each shard and the SQL Server connection string that an application should use to connect to the shard (the connection strings are not shown in the code example).

 

private IEnumerable GetShards()
{
  // This retrieves the connection information from a shard store 
  // (commonly a root database).
  return new[]
  {
    new ShardInformation
    {
      Id = 1,
      ConnectionString = ...
    },
    new ShardInformation
    {
      Id = 2,
      ConnectionString = ...
    }
  };
}

The code below shows how the application uses the list of ShardInformation objects to perform a query that fetches data from each shard in parallel.

 

The details of the query are not shown, but in this example the data that is retrieved comprises a string which could hold information such as the name of a customer if the shards contain the details of customers.

 

The results are aggregated into a ConcurrentBag collection for processing by the application.

 

// Retrieve the shards as a ShardInformation[] instance. 
var shards = GetShards();

var results = new ConcurrentBag();

// Execute the query against each shard in the shard list.
// This list would typically be retrieved from configuration 
// or from a root/master shard store.
Parallel.ForEach(shards, shard =>
{
  // NOTE: Transient fault handling is not included, 
  // but should be incorporated when used in a real world application.
  using (var con = new SqlConnection(shard.ConnectionString))
  {
    con.Open();
    var cmd = new SqlCommand("SELECT ... FROM ...", con);

    Trace.TraceInformation("Executing command against shard: {0}", shard.Id);
                    
    var reader = cmd.ExecuteReader();
    // Read the results in to a thread-safe data structure.
    while (reader.Read())
    {
      results.Add(reader.GetString(0));
    }
  }
});

Trace.TraceInformation("Fanout query complete - Record Count: {0}", 
                        results.Count);