
Best way to (re)load test data in MongoDB


I'm toying around with Pyramid and am using Mongo via MongoEngine for
storage. I'm new to both Pyramid and MongoEngine. For every test case in
the part of my suite that tests the data access layer I want to reload
the database from scratch, but it feels like there should be a better
and faster way than what I'm doing now.

I have two distinct issues:
1. What's the fastest way of resetting the database to a clean state?
2. How do I load data so that Mongo's internal _id values are preserved?

For issue #1:
First of all, to keep the number of dependencies minimal, I'd very much
prefer to avoid external client programs such as mongoimport. If
there's a good way to do it through MongoEngine or PyMongo, that'd be
preferable.

My first shot at populating the database was simply to load data from a
JSON file, use this to create my model objects (subclasses of
mongoengine.Document) and save them to the DB. With a single-digit
number of test cases and very limited data, this approach already takes
close to a second, so I'm thinking there should be a faster way. It's
Mongo, after all, not Oracle.

My second version uses PyMongo's Collection.insert_many() method to add
all the documents for each collection in one go, but for this small
amount of data it doesn't seem any faster.
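
To make the bulk approach concrete, here is a minimal sketch of a
per-test reset, not the code from my actual suite. It assumes `db`
behaves like a pymongo.database.Database and that the fixtures are a
dict mapping collection names to lists of documents; the function name
and the fixture layout are hypothetical.

```python
def reload_fixtures(db, fixtures):
    """Empty each collection named in `fixtures`, then bulk-insert its docs.

    Assumptions (not from the original suite):
      - `db[name]` yields a collection with delete_many() and insert_many(),
        as a pymongo Database does.
      - `fixtures` maps collection name -> list of dicts.
    """
    for name, docs in fixtures.items():
        collection = db[name]
        # delete_many({}) removes every document but keeps the collection
        # and its indexes in place.
        collection.delete_many({})
        if docs:  # PyMongo's insert_many() rejects an empty list
            # Copy each dict: insert_many() adds an _id to the dicts it is
            # given, which would otherwise leak into the shared fixtures.
            collection.insert_many([dict(d) for d in docs])


# Possible use in a test's setUp (assumes PyMongo and a local mongod):
#   from pymongo import MongoClient
#   db = MongoClient()["test_db"]
#   reload_fixtures(db, {"users": [{"name": "alice"}]})
```

Emptying collections rather than dropping the whole database means
indexes don't have to be rebuilt before every test, which may matter
more than the insert strategy for data sets this small.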

Which brings us to issue #2:
For both of these strategies I'm unable to insert an _id of Mongo's
ObjectId type. I haven't made _id properties part of my models, because
they seem a bit... alien. I'd rather not include them solely to be able
to load my test data properly. How can I populate _id as an ObjectId,
not just as a string? (I'm assuming there's a difference, but it's never
come up until now.)
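
On the string-versus-ObjectId point: the two are distinct BSON types,
so a document stored with a string _id won't be matched by a query for
the corresponding ObjectId. One way around it, sketched below under
assumptions, is to convert the _id strings after loading the JSON and
before inserting. The helper name is hypothetical; in real use the
converter would be bson.ObjectId (the bson package ships with PyMongo),
which accepts a 24-character hex string.

```python
def with_object_ids(docs, make_id=None):
    """Return copies of `docs` with any string `_id` converted by `make_id`.

    `make_id` defaults to bson.ObjectId; it is a parameter only so the
    helper can be exercised without PyMongo installed.
    """
    if make_id is None:
        from bson import ObjectId  # provided by the PyMongo distribution
        make_id = ObjectId
    converted = []
    for doc in docs:
        doc = dict(doc)  # shallow copy so the loaded fixtures stay unchanged
        if isinstance(doc.get("_id"), str):
            doc["_id"] = make_id(doc["_id"])
        converted.append(doc)
    return converted


# Possible use before a bulk insert (collection and file are hypothetical):
#   docs = with_object_ids(json.load(open("users.json")))
#   db["users"].insert_many(docs)
```

An alternative that avoids the helper is to write the fixtures as
MongoDB Extended JSON, e.g. {"_id": {"$oid": "..."}}, and parse them
with bson.json_util.loads(), which returns ObjectId instances directly.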

Am I being too difficult? I haven't been able to find much written about
this topic: discussions about mocking drown out everything else the
moment you mention 'mongo' and 'test' in the same search.