Skip navigation

Reduce In Haskell

To implement ‘reduce’ (as discussed by Google) in a purely functional way really challenges my instincts. After the map step, we have a list of (Key, Val) 2-tuples that we would like to reduce to a list of (Key, [Val]) 2-tuples — we want all the values associated with a single key to be in one tuple. In Ruby, using for, it’s pretty easy to do this:

  map_o_answers = get_it_somehow
  hash ={|| }
  for key, val in map_o_answers
    # there are probably smarter ways to do this
    hash[key] = hash[key].push(val)

We rely on the fact that hashes are indexed by keys to ensure that all the values for each key get placed in the same bucket. However, in a purely functional language, trying to set a component of a data structure is a big no-no. We could leverage monads to get something like the above example — but that’s contrary to the spirit of the problem. If we sort on the tags, it’s not so difficult:

  viaSort :: (Ord k) => [(k, v)] -> [(k, [v])]
  viaSort stuff = soFar (sortBy cmp stuff) []
    soFar [] result = reverse result  --  reverse when it's shorter
    soFar ((key, val):rest) [] = soFar rest [(key, [val])]
    soFar ((key, val):rest) result@((otherKey, list):prior)
     |  key == otherKey   =  soFar rest ( (key, val : list) : prior )
     |  otherwise          =  soFar rest ( (key, [val]) : result )

  cmp :: (Ord k) => (k, v) -> (k, v) -> Ordering
  cmp (a, x) (b, y) = compare a b

In a nod to parallel computation, we can write a function to merge the results of two reduces — essentially we treat the results like arguments to viaSort, sorting them and then merging.

  --  the type signature makes all the difference
  mergeResults :: (Ord k) => [(k, [v])] -> [(k, [v])]
  mergeResults list = soFar (sortBy cmp list) []
    soFar [] result = reverse result --  reverse when it's shorter
    soFar ((key, list):rest) [] = soFar rest [(key, list)] --  kickstart
    soFar ((key, list):rest) result@((resKey, resList):resRest)
     |  key == resKey      =  soFar rest ( (key, list ++ resList) : resRest )
     |  otherwise          =  soFar rest ( (key, list) : result )

It follows that we can do the reduce entirely with our merge function, by repacking the input before we pass it through.

  --  repacks the tuple so it matches the signature of mergeResults
  repack list = [ (key, [val]) | (key, val)  [(k, v)] -> [(k, [v])]
  viaMerge stuff = mergeResults $ repack stuff

I’ve got this all in my public repo as literate Haskell. I would not be surprised to find out that there are more efficient ways to approach this problem.


Post a Comment

You must be logged in to post a comment.
%d bloggers like this: