I left a comment on a post over at 10gen’s mongo blog today. To their credit, they are probably the most transparent company backing a nosql product out there. So often I find them saying, well, “we aren’t trying to solve that problem” or “our product is not good for this situation” (paraphrased, obviously). I praise them for such honesty, and frankly I love it when engineers keep it real. A warning to anyone trying to solve the “webscale” problem: you can fool the business types, but the engineers are going to get to the bottom of your BS! Anyway, here are the contents of my comment: (link)

“10gen has always done a great job of trumpeting the benefits of mongodb while being completely transparent about its drawbacks. I think that is very helpful. I wish I could say the same for all of the non-10gen staff who have drunk the mongo kool-aid and trumpet it as THE solution for “webscale” (I hate that term). I can say for sure that mongodb’s document storage and rich query language go a long way for simple schemas. It is so important to get through to everyone that this is not a solution for complex schemas, and by complex I mean any data model that cannot have every single field living inside the same document. It would almost be better to have blog posts about the sort of problems mongo is NOT intended to solve. Imagine trying to maintain a consistent view of a product library where each product has hundreds or thousands of individual fields. Then imagine each of the products in the library having a quantity field. Then imagine having a total quantity for all products within the library. Then imagine users checking products in and out, and having to keep not only the exact quantity of each product consistent but also the total of all products currently checked in. $inc and similar atomic operators within mongo will work for the individual product totals, because each one lives in a single document, but not for the total over all products, because that update spans more than one document. Thus the need for ACID in certain problem sets. As with all things, you can get 2 but not 3: fast, large (a dataset bigger than memory), and consistent. The persistence daemon you choose will dictate which two you can have.”
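To make the checkout example in that comment concrete, here is a minimal sketch in Python. It does not talk to mongo or use any driver. `ToyStore`, `check_out`, and `invariant_holds` are names I made up for illustration, and the store is only an assumption-laden stand-in for "each single-document update is atomic, but nothing ties two documents together". The `observe` hook marks the spot where another client could read between the two increments.

```python
import threading


class ToyStore:
    """Hypothetical in-memory stand-in for a store with per-document atomicity only."""

    def __init__(self):
        self._docs = {}
        self._locks = {}

    def insert(self, doc_id, doc):
        self._docs[doc_id] = dict(doc)
        self._locks[doc_id] = threading.Lock()

    def inc(self, doc_id, field, amount):
        # Atomic for ONE document, loosely in the spirit of $inc.
        with self._locks[doc_id]:
            self._docs[doc_id][field] += amount

    def get(self, doc_id, field):
        with self._locks[doc_id]:
            return self._docs[doc_id][field]


def invariant_holds(store, product_ids):
    """True when the per-product quantities add up to the library total."""
    per_product = sum(store.get(pid, "quantity") for pid in product_ids)
    return per_product == store.get("library", "total_quantity")


def check_out(store, product_id, observe=None):
    """Check out one unit: two separate single-document updates."""
    store.inc(product_id, "quantity", -1)
    if observe is not None:
        observe()  # a concurrent reader could run right here
    store.inc("library", "total_quantity", -1)


store = ToyStore()
store.insert("widget", {"quantity": 5})
store.insert("gadget", {"quantity": 3})
store.insert("library", {"total_quantity": 8})
products = ["widget", "gadget"]

seen_mid_checkout = []
check_out(
    store,
    "widget",
    observe=lambda: seen_mid_checkout.append(invariant_holds(store, products)),
)

print(seen_mid_checkout)                 # [False]
print(invariant_holds(store, products))  # True
```

Each increment on its own is safe, yet a reader that lands between the two sees 7 units across the products and a library total of 8. Closing that window takes something that spans both documents, which is exactly the job a transaction does.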

 

Oh, and quickly, a little rant about “webscale” and why I hate that yet-another-unoriginal-sales-inspired-technical-buzzword. As with most of these “buzzwords”, it means nothing and is not new at all. Since the dawn of computing there have been physical limitations, with limits on physical memory being one of the more prevalent and permanent. “Webscale”, I think, intends to describe the ever-growing dataset of the web and the difficulty of working with it. This “web” dataset has seen a huge spike recently, and as such the solutions for processing the information have changed… SLIGHTLY. I say slightly because processing data has been going on since the dawn of man, and the solutions for processing this data have been reworked over and over again. Sure, the software was different then, but the ideas behind the software are age old. Yes, there are breakthroughs in how to look at a problem from time to time (ahem… MapReduce), but for the most part the solution has always been to keep the data as close to the processor as possible (in-memory dbs) and to keep the volume of changes to that data low or suffer consistency issues (nosql). Webscale has always been around, only with a different number of INsignificant digits following the numbers. However, the relationship of dataset size to the physical memory available to a single machine is still the same old problem it has always been.


Published

11 July 2011