MapReduce using Dapper


Posted by nick on February 04, 2020

Whilst working at Patient.Info, I was tasked with upgrading the forums to be more performant; The Forums hose thousands of page, hundreds of categories, and help people from all over the globe to overcome medical issues - and some just like to chat. It's a real community.

Anyways; from a performance perspective, the forums pages were loading on an average of 4-5seconds, SQL was sat at around 30% utilization, going to 60% with some load, and then occasionally it would fall over... not a good place to be.

To tackle this; I took a look at how Orchard Core was using YesSql. YesSql turns data in to "Json Documents" and stores everything in one column, it then turns the queryable data into a highly optimized index table. SQL is very fast when the data is indexed, and that index is not defragmented. Each table within YesSql is a clustered index.

This was great - let's do that!

But just using YesSQL was not possible due to time constraints; I needed a way to simulate this.. to do this, I created a map-reduce pattern in C# using Views.

To start with, you need a query to get you what you want... in this scenario, a discussion. I use a TVP to make the query slightly faster when I have multiple records;

var itemDataTable = new DataTable();
itemDataTable.Columns.Add("Id", typeof(long));

foreach (var queryId in queryIds) {

var results = _connection.Query<DiscussionSummary,

    FROM Discussions d
        INNER JOIN @discussionIdsTVP dtvp ON dtvp.Id = d.Id;",
    (d) => {
        d.SluggedTitle = d.Title.Slug();

        return d;
    param: new { discussionIdsTVP = itemDataTable.AsTableValuedParameter("[dbo].[IdTVP]") });

return results
    .ToDictionary(k => queryKeyValues.First(x => x.Value == k.Id).Key, v => v);

So... I can now get a discussion by an Id; the next step is knowing what I want!? This is where the filter comes in, the reduce.

I want to get all Discussions archived since a specific date;

public IEnumerable<long> GetAllArchivedSince(DateTime lastUpdated) {
    return _connection.Query<long>(@"
            FROM Discussions d
            WHERE d.Archived = 1
              AND d.LastUpdated >= @lastUpdated
        new { lastUpdated });

Here you would place an index on the Archived and LastUpdated records.

So... you have your MapReduce pattern using Dapper; you reduce your records and get your map. Next, add caching :)