(Ab)using the MongoDB Aggregation Framework for Symbolic Computation

| No Comments | No TrackBacks

About once a quarter, MongoDB conducts a two-day Skunkworks session. Half-hackathon, half-incubator, the idea is to take a break from our day-to-day tasks and work feverishly for 48 hours on any project we want. Some projects are undertaken to scratch a particular itch and make our core products better. Others are for use inside the company to make our lives a little easier or more amusing. Still others are just for the hell of it. This quarter, my teammate Andrew Emil and I decided to work on something in the last category.

The aggregation framework is my favorite MongoDB feature, and a few months ago I had read with elation John Page's blog post about calculating square roots within the aggregation framework. I was surprised the learn that the MongoDB agg pipeline does not have a $sqrt operator, despite having a complete collection of basic arithmetic primitives. John's solution was to use the arithmetic primitives to calculate square roots by Newton's method. Not the most efficient way to get a square root (and no sane person would use this technique in a real application) but the solution made me grin.

I wondered how far MongoDB's arithmetic operators could take us. Could you do symbolic computation, like equation-solving or integration, using just the aggregation pipeline?

The first experiment I focused on was complex numbers. The arithmetic functions in MongoDB really only work on (real) floats, and MongoDB has no notion of the imaginary unit i. But you don't really need i just to do complex arithmetic; a complex number is just a pair of reals with extended algorithms for arithmetic. So I started to think about documents that look like this:

> db.complex.findOne()
{
	"_id" : ObjectId("53bf040a66f35625b75d3eb3"),
	"a" : {
		"Complex" : {
			"re" : 2,
			"im" : 3
		}
	},
	"b" : {
		"Complex" : {
			"re" : 4,
			"im" : 5
		}
	}
}

There's a simple formula for adding two complex numbers:

(a + bi) + (c + di) = (a + c) + (b + d)i

In other words, just add the real parts and add the imaginary parts. Given a collection full of documents with pairs of complex numbers, we can add all of them with the aggregation framework easily.

First, I set up a pair of helper functions for generating the pipeline code to extract the real and imaginary parts from the document representation of a complex number.

function Re( c ) {
    return "$" + c + ".Complex.re";
}

function Im( c ) {
    return "$" + c + ".Complex.im";
}

Then it's just a matter of doing the arithmetic on each part.

function Add() {
    return [
        { "$project": {
            _id: 0,
            a : 1,
            b : 1,
            sum: {
                Complex: {
                    re: {
                        "$add": [
                            Re( "a" ),
                            Re( "b" )
                        ]
                    },
                    im: {
                        "$add": [
                            Im( "a" ),
                            Im( "b" )
                        ]
                    }
                }
            }
        } }
    ];
}

> db.complex.aggregate( Add() )
{
	"result" : [
		{
			"a" : {
				"Complex" : {
					"re" : 2,
					"im" : 3
				}
			},
			"b" : {
				"Complex" : {
					"re" : 4,
					"im" : 5
				}
			},
			"sum" : {
				"Complex" : {
					"re" : 6,
					"im" : 8
				}
			}
		}
	],
	"ok" : 1

The Add() function just returns an array of pipeline operators. The $project stage passes the original numbers a and b through, and adds a new embedded document, called sum, containing the sum of the two parts. Complex subtraction is the same idea. Multiplication and division are a little more work, but follow from the same principals. Here's the Multiply function:

function Multiply() {
    return [
        { "$project": {
            _id: 0,
            a : 1,
            b : 1,
            product: {
                Complex: {
                    re: {
                        "$subtract": [
                            {
                                "$multiply": [
                                    Re( "a" ),
                                    Re( "b" )
                                ]
                            },
                            {
                                "$multiply": [
                                    Im( "a" ),
                                    Im( "b" ),
                                ]
                            }
                        ]
                    },
                    im: {
                        "$add": [
                            {
                                "$multiply": [
                                    Re( "a" ),
                                    Im( "b" )
                                ]
                            },
                            {
                                "$multiply": [
                                    Im( "a" ),
                                    Re( "b" )
                                ]
                            }
                        ]
                    }
                }
            }
        } }
    ];
}

The Multiply function again just returns an array of arithmetic pipeline operators. We're doing complex math purely in the aggregation framework!

Equations

Complex arithmetic is fun and all, but what about real symbolic math, like equation solving? Let's start with the simplest example, a linear equation of one variable. All such equations can be written in the standard form:

ax + b = 0

The variable itself doesn't really matter. All we care about are the coefficients a and b. The general solution for this equation is:

x = -b / a

(You could call this "the linear formula.") Implementing it with aggregation operators is actually pretty easy. Suppose we had a bunch of documents that look like this:

> db.linear.findOne()
{
	"_id" : ObjectId("53c011ca66f35625b75d3eb5"),
	"linear" : {
		"a" : 2,
		"b" : -6
	}
}

To find the solution to any linear equation in this form, we just apply the linear formula with our arithmetic operators.

function Linear() {
    return [
        { "$project": {
            _id: 0,
            a: "$linear.a",
            b: "$linear.b",
            min_b: {
                "$subtract": [ 0, "$linear.b" ]
            }
        } },
        { "$project": {
            x: {
                "$divide" : [ "$min_b", "$a" ]
            }
        } }
    ];
}

The first $project stage calculates -b by subtracting it from zero. The second stage divides it by the value of a, and saves the result in a field called x, and that's our answer. The result:

> db.linear.aggregate( Linear() )
{ "result" : [ { "x" : 3 } ], "ok" : 1 }

I've broken the steps up into two pipeline stages for clarity, but you could easily do all the arithmetic in a single stage.

Quadratic equations in the form

ax2 + bx + c = 0

work the same way. We only need to do arithmetic on the coefficients. We can represent quadratic equations as documents like this:

> db.quadratic.findOne()
{
	"_id" : ObjectId("53c0188a66f35625b75d3eb6"),
	"quadratic" : {
		"a" : 3,
		"b" : 2,
		"c" : -33
	}
}

To solve them, we need only apply the trusty quadratic formula. But that requires calculating square roots. I wrote a function that modifies John Page's Newton's Method implementation slightly so I could stick it in the middle of a quadratic equation aggregation pipeline.

function Newton() {
    var init = { $project : { n: 1, min_b: 1, two_a: 1, r : { $literal : 1}}};
    var refine = { $project : { n: 1, min_b: 1, two_a: 1, r : { $divide : [ {$add : [{$divide : [ "$n","$r"]} , "$r" ]},2]}}};
    return [
        init, refine, refine, refine, refine, refine, refine, refine, refine
    ]
}

The function returns an array of nine pipeline operators. The first is the init stage, which sets up an initial guess of one and propagates a few values from previous stages. Then the refine stage is simply repeated eight times. (That seems to be enough.)

The function to assemble the complete quadratic formula pipeline looks like this:

function Quadratic() {
    var pipe = [
        { $project : {
            _id: 0,
            a: "$quadratic.a",
            b: "$quadratic.b",
            c: "$quadratic.c",
            min_b: { "$subtract" : [ 0, "$quadratic.b" ] },
            two_a: { "$multiply" : [ 2, "$quadratic.a" ] },
            four_a_c: { "$multiply" : [ 4, "$quadratic.a", "$quadratic.c" ] },
            b_sq: { "$multiply" : [ "$quadratic.b", "$quadratic.b" ] }
        } },
        { $project: {
            min_b: 1,
            two_a: 1,
            n: { "$subtract": [ "$b_sq", "$four_a_c" ] }
        } },
    ];

    pipe = pipe.concat(Newton());

    rest = [
        { $project: {
            x1: { "$divide": [ { "$add": [ "$min_b", "$r" ] }, "$two_a" ] },
            x2: { "$divide": [ { "$subtract": [ "$min_b", "$r" ] }, "$two_a" ] }
        } }
    ];

    pipe = pipe.concat( rest );

    return pipe;
}

And here is the result for our sample document:

> db.quadratic.aggregate( Quadratic() )
{
	"result" : [
		{
			"x1" : 3.000000000049738,
			"x2" : -3.6666666667164045
		}
	],
	"ok" : 1
}

x1 and x2 are the answers, with a little floating-point fuzz thanks to Newton.

I had a lot of fun figuring out how to execute these algorithms in an environment that isn't really designed for it. Most of the work came down to discovering a convenient document-based representation for the mathematical object in question and then applying the right arithmetic operators in the right order. I'm happy to report that Andrew and I won the trophy for Best (Ab)use of MongoDB.

We also worked on some pipelines for doing differentiation and integration of simple polynomials, but I'll let Andrew write about that since he did most of that work.

Of course, the equation-solving algorithms above only work for real coefficients. Can they be combined with the complex arithmetic operators to solve the full universe of linear and quadratic equations? I'll leave that as an exercise for you.

No TrackBacks

TrackBack URL: http://friedo.com/cgi-bin/mt/mt-tb.cgi/26

Leave a comment

About this Entry

This page contains a single entry by Mike Friedman published on July 19, 2014 11:51 PM.

Automating Stratopan Releases with Perl's Dist::Zilla was the previous entry in this blog.

Exploring Perl 6: Up and Running is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.

Categories

  • Exploring Perl 6

Pages

Powered by Movable Type 5.14-en