The Relational Model, NoSQL, and your problems!

Web development seems to be experiencing a NoSQL hype. In the last couple of years, a lot of effort has gone into more open source projects than I’ve been able to keep up with. There are options in pretty much any language you might feel comfortable with: Cassandra for the Java fans, CouchDB, which runs on the Erlang VM, and MongoDB, written in C++.

The Relational Model dates back to 1969, when E.F. Codd first proposed it. The uses and volume of data in organizations have changed significantly ever since. We have bigger problems now, such as the number of clients that need to access that data: back in the 70’s an interactive system might only need to be accessible to a couple of thousand people, while today a simple mobile application could peak at half a billion users. Likewise, we have a few commodities we didn’t have back then: cheap hardware, cheap RAM and affordable environments for distributed computing. Together they make an attractive ecosystem for near-limitless scalability, and that’s just fascinating.

I have been dealing with quite a bit of data lately. Most of it is disorganized data that used to have a lot of value, until it became virtually impossible to query it and get results in a reasonable amount of time. For the last 2 or 3 months I have been trying to organize it and make it as accessible as possible… not an easy task.

I either needed really expensive hardware to scale up, or I needed a change in our database paradigm… and as a startup, I’m lucky I have hardware at all! So I have been playing with NoSQL databases for a couple of months now, and the most important lesson I have learned is what not to use them for. You are facing a major change in your data model: you’re now pretty much tied to flat documents of data, and if you need related data to make sense of a flat document, you’re out of luck. Those relationships have to be implemented in the application layer, which can add to both the complexity and the price tag of your application. NoSQL is not for everyone, and, more importantly, NoSQL databases are not a replacement for SQL databases.
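To make “implemented in the application layer” concrete, here is a minimal sketch in Python of what a SQL JOIN turns into when the database won’t do it for you. The collections, the field names (`customer_id`, `name`, `total`) and the function `join_orders_with_customers` are all hypothetical, and plain lists of dicts stand in for documents fetched from a store such as MongoDB; it is not tied to any driver API.

```python
# Hypothetical "collections": each document is a flat dict, roughly as a
# document store might hand it back. All field names are illustrative only.
customers = [
    {"_id": 1, "name": "Alice", "country": "US"},
    {"_id": 2, "name": "Bob", "country": "DE"},
]
orders = [
    {"_id": 101, "customer_id": 1, "total": 30.0},
    {"_id": 102, "customer_id": 2, "total": 12.5},
    {"_id": 103, "customer_id": 1, "total": 7.5},
]


def join_orders_with_customers(orders, customers):
    """Emulate an inner JOIN in application code (a simple hash join)."""
    # Index the "one" side by primary key so each lookup is O(1).
    by_id = {c["_id"]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is None:
            # No foreign-key constraint protects us here: dangling
            # references must be handled (in this sketch, skipped) by the app.
            continue
        joined.append({**order, "customer_name": customer["name"]})
    return joined


for row in join_orders_with_customers(orders, customers):
    print(row)
```

The application now owns work the RDBMS used to do for free: picking a join strategy, and deciding what to do about references that point nowhere, since nothing enforces a foreign key.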

Neither the Relational Model nor NoSQL will solve all your data problems (that would be too easy); all you’re doing is trading some problems for others that might affect you less. Essentially, you’re setting aside much of your accumulated knowledge about databases, such as ACID, SQL, schemas, normalization and relationships, but you’re gaining fault tolerance, geo-availability and massive scalability (on the order of multiple TBs of data), which is hard and expensive to set up in a traditional RDBMS.

It comes down to the issues you’re facing. For example, your data might not need to be strongly consistent, and you could live with eventual consistency; but you do have users distributed across the US and Europe, and every time your application goes down you are losing $5/minute in ad revenue. In this case it will pay off to move to a NoSQL solution, or to complement an RDBMS with one.
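A quick back-of-the-envelope calculation shows how fast $5/minute adds up. This Python sketch assumes downtime is spread over a 365-day year and uses hypothetical availability levels; only the $5/minute figure comes from the example above, and `yearly_downtime_cost` is an illustrative helper, not anything from a library.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def yearly_downtime_cost(availability, cost_per_minute=5.0):
    """Expected revenue lost per year at a given availability (0..1)."""
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    return downtime_minutes * cost_per_minute


# Hypothetical availability targets, compared side by side.
for availability in (0.99, 0.999, 0.9999):
    cost = yearly_downtime_cost(availability)
    print(f"{availability:.2%} available -> ${cost:,.2f} lost per year")
```

At 99% availability that is roughly 5,256 minutes of downtime, or about $26,280 a year; each extra “nine” cuts the loss by a factor of ten, which is the kind of number that justifies paying for a more fault-tolerant setup.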

While experimenting I chose MongoDB, and while it didn’t solve all my problems, it has certainly contributed to my final goal of making data accessible while maintaining reasonable response times. One particular example is a grouping query over several million rows, which I was able to reduce from hours to just over a couple of minutes. This was not free, though: I had to implement some application logic to build the NoSQL data collection, but in this case it was worth it.
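The application logic itself isn’t described here, so the following is only one common approach, sketched under assumptions: pre-aggregating rows into a summary collection as they are written, so the expensive grouping query becomes a lookup. The field names (`country`, `amount`) and the helpers `build_summary` and `apply_insert` are hypothetical, and a Python dict stands in for what would be a collection in a document store.

```python
from collections import defaultdict


def build_summary(rows, key, value):
    """One-off backfill: group rows into {group: {"count": n, "sum": s}}."""
    summary = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        bucket = summary[row[key]]
        bucket["count"] += 1
        bucket["sum"] += row[value]
    return dict(summary)  # plain dict, like a stored summary document set


def apply_insert(summary, row, key, value):
    """Keep the summary current on each write instead of re-grouping everything."""
    bucket = summary.setdefault(row[key], {"count": 0, "sum": 0})
    bucket["count"] += 1
    bucket["sum"] += row[value]


# Hypothetical usage: the "grouping query" is now a dictionary lookup.
sales = [
    {"country": "US", "amount": 10},
    {"country": "US", "amount": 30},
    {"country": "DE", "amount": 5},
]
summary = build_summary(sales, "country", "amount")
apply_insert(summary, {"country": "DE", "amount": 7}, "country", "amount")
print(summary["US"], summary["DE"])
```

This is the trade-off described above in miniature: reads get cheap, but every write path must remember to update the summary, and keeping the two in sync is now the application’s job.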

You probably wouldn’t use an airplane to commute to work: even though an airplane is much nicer and more complex, using it for a 20-mile daily commute would not be worth the hassle; it would just create more problems for your commute. Likewise, NoSQL technologies have a lot of hype around them, but they won’t necessarily make your application faster or easier to manage.

-EOF-

guntanis posted this