A new release for “PyMongo”:http://api.mongodb.org/python has been long overdue – the last release (1.6) was made on May 11th, over two months ago! It’s been a busy couple of months with travel and “writing”:http://oreilly.com/catalog/9781449389536/ (and life), so a release hasn’t been on the top of the list. I finally made the effort to get a release out today, though, and the result is PyMongo 1.7. For a full list of changes, check the “changelog”:http://api.mongodb.org/python/1.7/changelog.html; in this post I’ll talk about a few of them in detail.

Using a dict to Specify Fields to Return

One of the smallest changes in 1.7 is that the fields argument to find/find\_one can now take a dict in addition to a list. So, the following two find\_ones are equivalent:

>>> db.test.save({"x": 1, "y": 2, "z": [1, 2, 3, 4]})
ObjectId('...')
>>> db.test.find_one(fields=["y"])
{u'y': 2, u'_id': ObjectId('...')}
>>> db.test.find_one(fields={"y": 1})
{u'y': 2, u'_id': ObjectId('...')}

The second call should look familiar to those who have used the MongoDB shell extensively, while the first is a little less verbose (and has always been the PyMongo way of doing things). Since PyMongo was first implemented, however, there have been some new features added to the server that are best exposed through the dict interface, like specifying keys that we don’t want returned:

>>> db.test.find_one(fields={"y": 0})
{u'x': 1, u'_id': ObjectId('...'), u'z': [1, 2, 3, 4]}
>>> db.test.find_one(fields={"z": 0})
{u'y': 2, u'x': 1, u'_id': ObjectId('...')}

Or using the new $slice operator to only return portions of an array:

>>> db.test.find_one(fields={"z": {"$slice": 2}})
{u'y': 2, u'x': 1, u'_id': ObjectId('...'), u'z': [1, 2]}
>>> db.test.find_one(fields={"z": {"$slice": -2}})
{u'y': 2, u'x': 1, u'_id': ObjectId('...'), u'z': [3, 4]}
>>> db.test.find_one(fields={"z": {"$slice": [1, 2]}})
{u'y': 2, u'x': 1, u'_id': ObjectId('...'), u'z': [2, 3]}

datetime Handling

PyMongo 1.7 also improves support for working with datetime instances. The first change is that timezone aware datetimes are now properly encoded, by converting them to UTC before saving them. Naive datetimes will still be assumed to be in UTC.

Since BSON is not currently capable of storing datetimes with timezone information, all datetimes will still decode as UTC. Currently the datetime instances returned will be naive, but that will probably change in the future as well.

A final change in 1.7 is that the PyMongo C extension now uses the y2038 project’s implementation of time.h. This allows the C extension to properly support dates beyond January 19, 2038. There might still be issues for users on 32-bit platforms using the pure-Python encoder, at least for older versions of Python.

max_scan

Version 1.7 adds support for the server’s new max\_scan functionality. max_scan allows a developer to set the maximum number of documents to scan for each individual query. This functionality can be useful when a developer wants guaranteed performance, even if it means a partial result set.

max\_scan requires a MongoDB server version = 1.5.1.

max_scan can be passed as an argument to find/find\_one, or can be added to a Cursor using chaining; the following two queries are equivalent:

db.test.find({"a": 1}, max_scan=50)

db.test.find({"a": 1}).max_scan(50)

Custom Classes for Returned Documents

The BSON decoder in PyMongo normally decodes all documents as dicts. Version 1.7 adds the ability to specify a custom class to decode to. All the class needs is to have a *setitem* method, which will be called for each key/value pair in the BSON being decoded.

This ability is exposed through the as_class argument to find/find\_one, and a new default class can be set using a Connection’s document_class attribute. The two queries below both decode to SON, so that the resultant documents maintain key order:

from pymongo.son import SON

db.test.find_one(as_class=SON)

connection.document_class=SON
db.test.find_one()

There are a couple of cool things that fall out of this functionality. First, third party libraries like MongoKit should have an easier time re-hydrating their models, and will probably be a little more efficient. Second, the class used doesn’t even need to represent a document: it can just treat *setitem* like a hook, allowing custom code to run on each key/value pair. Here, we create a custom class that just prints key/value pairs to stdout:

>>> class DocPrinter(object):
...     def __setitem__(self, key, value):
...         print "(%r, %r)" % (key, value)
... 
>>> db.test.save({"x": 1})
ObjectId('4c1a4ee9e6fb1b5adb000001')
>>> db.test.find_one(as_class=DocPrinter)
(u'_id', ObjectId('4c1a4ee9e6fb1b5adb000001'))
(u'x', 1)
<__main__.DocPrinter object at 0x1012d7bd0>

This example is a bit pointless, but I’m sure readers can think of much cooler things to do with it!

There’s a whole bunch more that’s new in PyMongo 1.7: check the changelog for the full list, and go upgrade!