Structured Data and the Road to Obsolescence

Starting with MongoDB.pm 0.46.3, MongoDB is no longer supported on Perl 5.8 and below. (If you must use 5.8, you can still use old versions of the driver, but you won't get any cool new features.) The first release of Perl 5.8 turned ten years old in July, so the fact that more and more of CPAN lists Perl 5.10 (or even 5.12) as a dependency should come as no shock. The current release of Perl is 5.16.

One of my many rules of software engineering, born of more than a decade seeing things done the Wrong Way, is that serialization must occur only at the extreme edges of your program. At all other points you should, if possible, deal only with structured data. The lack of it in one crucial area of the Perl MongoDB driver is what made support for Perl 5.8 no longer possible.

The term structured data is somewhat vague. I use it here to refer to a wide variety of potential structures: objects, hashes, arrays, lists, syntax trees, nested structures, etc. Essentially, anything other than a raw string of bytes. Even Unicode character strings have some structure to them (and some interesting introspective capabilities) but that's a topic for another post.

If you're familiar with the Model-View-Controller design pattern formalized for Smalltalk and now used in many web application frameworks, then you understand the vital importance of separation of concerns. The code that turns your structured data (from a database, perhaps) into something that the user can consume is separate from your application logic, so that additional or replacement frontends may be implemented without ripping the guts out of the whole application. Your Controller code works upon data with structure. Your View code spits out bytes.

MongoDB supports storing regular expressions as a native type within BSON documents. In Perl, these values get inflated into Perl's internal Regexp objects. What about the other direction?

In order to serialize Perl regexes for storage in MongoDB, we need two pieces of information: the regular expression pattern itself, and any modifier flags which may be present. The C code in the MongoDB driver originally did it like this:

static void serialize_regex_flags(buffer *buf, SV *sv) {
  char flags[] = {0,0,0,0,0,0};
  unsigned int i = 0, f = 0;
  STRLEN string_length;
  char *string = SvPV(sv, string_length);
  ...
}

The important bit is SvPV, which returns a pointer to the string value of the scalar passed as its first argument and stores that string's length in its second. If the scalar is not already a string, it is stringified first, much as if it had been interpolated into a double-quoted string. The stringified regex is then parsed byte by byte to separate the pattern from the list of flags. You can see the same kind of stringification by running this:

# perl -E '$re = qr/foo/i; say $re'

On Perl 5.14 and later, that prints a stringified regex object that looks like (?^i:foo). The problem: Perl changed the manner in which regexes are stringified. Before Perl 5.14, the same pattern stringifies as (?i-xsm:foo). Even though it's the same pattern with the same options (its structure is identical), its string representation is not. As a result, MongoDB.pm's regex serialization logic has been subtly broken since Perl 5.14 came around.
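
To make the fragility concrete, here is a rough sketch of the kind of byte scanning involved. It is not the driver's actual code: naive_flags is a hypothetical helper I made up for illustration, written as if (?i-xsm:foo) were the only format that would ever exist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT taken from the MongoDB driver.
 * Copies the "enabled" flags out of a stringified Perl regex by scanning
 * from just after "(?" until it reaches '-' or ':'. It assumes the
 * pre-5.14 layout, (?flags-disabled:pattern). Returns the number of
 * flag characters copied into out. */
size_t naive_flags(const char *s, char *out, size_t out_size) {
    size_t n = 0;

    if (out_size == 0) {
        return 0;
    }
    if (strncmp(s, "(?", 2) != 0) {
        out[0] = '\0';               /* not a stringified regex at all */
        return 0;
    }
    for (s += 2; *s != '\0' && *s != '-' && *s != ':' && n + 1 < out_size; s++) {
        out[n++] = *s;               /* every byte here is taken to be a flag */
    }
    out[n] = '\0';
    return n;
}
```

Fed the old form, (?i-xsm:foo), this yields the flag string i. Fed the new form, (?^i:foo), it yields ^i, because the caret sits exactly where the scanner expects a flag. Nothing about the regex changed, only its string form, and a parser like this has no way to tell the difference.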

To fix this (and other problems) I wanted to find a way to extract regex data in a more structured way. The solution was to use the built-in re::regexp_pattern API, which in list context returns the actual regular expression pattern and its flags as separate values. For the example above, regexp_pattern returns the raw pattern, foo, and the flag i, regardless of how stringification semantics differ between Perl versions.

Working with structured data greatly simplified the driver code for regex serialization, and significantly eased error detection. Sadly, the regexp_pattern API is not available in production Perl releases prior to 5.10.
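
As a sketch of why this gets simpler (again, not the driver's real code), suppose the pattern and flags have already arrived as two separate C strings. BSON stores a regex value as two NUL-terminated strings, the pattern followed by the options, with the option characters in alphabetical order. The function name write_bson_regex and the choice to forward only the i, m, s and x flags are my assumptions for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, NOT the driver's implementation.
 * Writes the payload of a BSON regex value: the pattern as a C string,
 * then the options as a C string in alphabetical order.
 * Assumption: only the i, m, s and x flags are forwarded; any other
 * flag character is silently dropped.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t write_bson_regex(unsigned char *out, size_t out_size,
                        const char *pattern, const char *flags) {
    static const char supported[] = "imsx";   /* already alphabetical */
    char opts[sizeof supported];
    size_t n = 0;

    /* Walking the supported list in order sorts the options for free. */
    for (const char *c = supported; *c != '\0'; c++) {
        if (strchr(flags, *c) != NULL) {
            opts[n++] = *c;
        }
    }
    opts[n] = '\0';

    size_t plen = strlen(pattern) + 1;         /* include the NUL */
    size_t need = plen + n + 1;
    if (need > out_size) {
        return 0;
    }
    memcpy(out, pattern, plen);
    memcpy(out + plen, opts, n + 1);
    return need;
}
```

There is no scanning for delimiters and no guessing about which bytes are flags. The only failure mode left in this sketch is running out of buffer space, which is easy to detect and report.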

I've already received a couple of inquiries asking why I dropped support for Perl 5.8, and I hope that the above lengthy missive makes the reasoning clear. It's not just about the availability of an API, but about eliminating significant technical debt that would otherwise hinder the future development of the MongoDB Perl driver.

My thanks to ikegami for helping me figure out how to do this the right way.

1 Comment

I think more CPAN authors should be willing to dump old perls. A lot of stuff has been delayed due to dependence on 5.8 and even 5.6!

About this Entry

This page contains a single entry by Mike Friedman published on October 10, 2012 4:44 PM.
