To implement ‘reduce’ (as discussed by Google) in a purely functional way really challenges my instincts. After the map step, we have a list of `(Key, Val)`

2-tuples that we would like to reduce to a list of `(Key, [Val])`

2-tuples — we want all the values associated with a single key to be in one tuple. In Ruby, using `for`

, it’s pretty easy to do this:

map_o_answers = get_it_somehow
hash = Hash.new{|| Array.new }
for key, val in map_o_answers
# there are probably smarter ways to do this
hash[key] = hash[key].push(val)
end

We rely on the fact that hashes are indexed by keys to ensure that all the values for each key get placed in the same bucket. However, in a purely functional language, trying to set a component of a data structure is a big no-no. We could leverage monads to get something like the above example — but that’s contrary to the spirit of the problem. If we sort on the tags, it’s not so difficult:

viaSort :: (Ord k) => [(k, v)] -> [(k, [v])]
viaSort stuff = soFar (sortBy cmp stuff) []
where
soFar [] result = reverse result -- reverse when it's shorter
soFar ((key, val):rest) [] = soFar rest [(key, [val])]
soFar ((key, val):rest) result@((otherKey, list):prior)
| key == otherKey = soFar rest ( (key, val : list) : prior )
| otherwise = soFar rest ( (key, [val]) : result )
cmp :: (Ord k) => (k, v) -> (k, v) -> Ordering
cmp (a, x) (b, y) = compare a b

In a nod to parallel computation, we can write a function to merge the results of two reduces — essentially we treat the results like arguments to `viaSort`

, sorting them and then merging.

-- the type signature makes all the difference
mergeResults :: (Ord k) => [(k, [v])] -> [(k, [v])]
mergeResults list = soFar (sortBy cmp list) []
where
soFar [] result = reverse result -- reverse when it's shorter
soFar ((key, list):rest) [] = soFar rest [(key, list)] -- kickstart
soFar ((key, list):rest) result@((resKey, resList):resRest)
| key == resKey = soFar rest ( (key, list ++ resList) : resRest )
| otherwise = soFar rest ( (key, list) : result )

It follows that we can do the reduce entirely with our merge function, by repacking the input before we pass it through.

-- repacks the tuple so it matches the signature of mergeResults
repack list = [ (key, [val]) | (key, val) [(k, v)] -> [(k, [v])]
viaMerge stuff = mergeResults $ repack stuff

I’ve got this all in my public repo as literate Haskell. I would not be surprised to find out that there are more efficient ways to approach this problem.

### Like this:

Like Loading...

*Related*