How map/reduce works in CouchDB
I had a lot of trouble understanding how CouchDB's system of views actually works.
By experimenting and reading the source I came up with this description in pseudo-Python:
def mapstep(alldata):
    # the map function is applied to every document
    # and the results are collected in two lists of rows
    k_rows = []
    v_rows = []
    for _id, doc in alldata:
        k, v = mapfun(doc)  # actually mapfun calls emit() instead of returning
        k_rows.append([k, _id])
        v_rows.append(v)
    return k_rows, v_rows
def reducestep(keys, values):
    # now several reduce steps follow. For this example
    # we arbitrarily chose two
    # all elements at even positions
    tmp1 = reducefun(keys[::2], values[::2], False)
    # all elements at odd positions
    tmp2 = reducefun(keys[1::2], values[1::2], False)
    # finally several rereduce steps follow.
    # For this example we use only one.
    return reducefun(None, [tmp1, tmp2], True)
# mapstep returns two lists, so unpack them into keys and values
result = reducestep(*mapstep(alldocs()))
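To make the description above concrete, here is a minimal runnable sketch of the same flow. The sample documents and the bodies of mapfun and reducefun are made up for illustration, and the split into exactly two reduce chunks only mirrors the pseudo-Python; how the server really partitions the rows is not modelled here.

```python
# Hypothetical sample documents as (_id, doc) pairs; not from a real database.
SAMPLE_DOCS = [
    ("a", {"type": "fruit", "price": 3}),
    ("b", {"type": "fruit", "price": 5}),
    ("c", {"type": "veg", "price": 2}),
    ("d", {"type": "veg", "price": 4}),
    ("e", {"type": "fruit", "price": 1}),
]

def mapfun(doc):
    # Stand-in for a view's map function: each yield plays the role
    # of one emit(key, value) call, so a document may produce several rows.
    yield doc["type"], doc["price"]

def reducefun(keys, values, rereduce):
    # Summing works the same way on raw values and on partial sums,
    # so the rereduce flag can be ignored here.
    return sum(values)

def mapstep(alldata):
    k_rows, v_rows = [], []
    for _id, doc in alldata:
        for k, v in mapfun(doc):
            k_rows.append([k, _id])
            v_rows.append(v)
    return k_rows, v_rows

def reducestep(keys, values):
    # Two arbitrary reduce chunks, followed by one rereduce.
    tmp1 = reducefun(keys[::2], values[::2], False)
    tmp2 = reducefun(keys[1::2], values[1::2], False)
    return reducefun(None, [tmp1, tmp2], True)

k_rows, v_rows = mapstep(SAMPLE_DOCS)
total = reducestep(k_rows, v_rows)
print(total)  # 15
```

The two partial sums are 6 and 9; the rereduce call only ever sees those two numbers, never the original keys.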
If you call the view with group=true, the map step stays the same, but the server groups the rows by key and calls the reduce step once for each group. It looks like this:
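def reduce_with_grouping(keys, values):
    gdict = {}
    # create a dictionary mapping each emitted key to its rows
    # (k must be hashable here; keys arrive as [k, _id] pairs)
    for (k, _id), v in zip(keys, values):
        gdict.setdefault(k, []).append(([k, _id], v))
    ret = []
    for k, rows in gdict.items():
        group_keys = [key for key, _ in rows]
        group_values = [value for _, value in rows]
        ret.append([k, reducestep(group_keys, group_values)])
    return ret
result = reduce_with_grouping(*mapstep(alldocs()))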
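The grouping step can be tried out the same way. The following is only a sketch under assumptions: the rows are hypothetical, and reducefun is a made-up row counter chosen because it has to treat the rereduce flag differently (count on the first pass, add up partial counts afterwards).

```python
def reducefun(keys, values, rereduce):
    # Count rows: the first pass counts values,
    # the rereduce pass adds up the partial counts.
    return sum(values) if rereduce else len(values)

def reducestep(keys, values):
    # Two arbitrary reduce chunks, followed by one rereduce.
    tmp1 = reducefun(keys[::2], values[::2], False)
    tmp2 = reducefun(keys[1::2], values[1::2], False)
    return reducefun(None, [tmp1, tmp2], True)

def reduce_with_grouping(keys, values):
    groups = {}
    for (k, _id), v in zip(keys, values):
        group_keys, group_values = groups.setdefault(k, ([], []))
        group_keys.append([k, _id])
        group_values.append(v)
    # One reduce result per distinct key, returned in key order.
    return [[k, reducestep(gk, gv)] for k, (gk, gv) in sorted(groups.items())]

# Hypothetical rows in the shape the map step produces.
k_rows = [["fruit", "a"], ["fruit", "b"], ["veg", "c"], ["veg", "d"], ["fruit", "e"]]
v_rows = [3, 5, 2, 4, 1]

grouped = reduce_with_grouping(k_rows, v_rows)
print(grouped)  # [['fruit', 3], ['veg', 2]]
```

A counter that returned len(values) unconditionally would report 2 for every group here, because the rereduce call always receives two partial results. That is the kind of mistake the third argument exists to prevent.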
If you experiment with views, keep in mind that the Futon web client silently adds group=true to your views and that group=true is ignored if you don’t provide a reduce function.
Have a look at http://svn.apache.org/viewvc/couchdb/trunk/share/server/main.js?view=markup. It implements the receiving end of the map and reduce functions and might give you some more insight.