Well, where to start… This was a busy few days!
To start with, before Amazon gets sad, I should apologize for the previous blog post bashing DynamoDB. It turns out my teammate and I were completely wrong about its abilities. We jumped to the conclusion that it could not support queries without specifying the primary key. It turns out it can, through indexes that you manually specify (which you can do after the table has been created too). These indexes aren’t quite like indexes in relational databases though. They’re hashes. And given DynamoDB’s price competitiveness, we’re pretty happy looking at it as an option. Tl;dr: we can use DynamoDB basically just as feasibly as any other NoSQL database. For the non-tl;dr, see my teammate’s blog post where he gets into the gritty details about DynamoDB and hash indexes.
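To make the index idea a bit more concrete, here is a rough sketch in plain Python. This is not the DynamoDB API: the `Table` class, its `add_index` and `query` methods, and the `event_id`/`user`/`page` fields are all made up for illustration. It only shows the general idea of a hash-based secondary index that can be added after the table already has data and then answers equality lookups on a non-key attribute.

```python
from collections import defaultdict


class Table:
    """Toy key-value table with optional hash-based secondary indexes.

    Hypothetical illustration only; this does not mirror DynamoDB's real API.
    """

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self.items = {}    # primary key value -> item
        self.indexes = {}  # attribute -> {attribute value -> set of primary keys}

    def add_index(self, attribute):
        # An index can be added after the table holds data:
        # backfill it from the items that already exist.
        index = defaultdict(set)
        for pk, item in self.items.items():
            if attribute in item:
                index[item[attribute]].add(pk)
        self.indexes[attribute] = index

    def put(self, item):
        pk = item[self.primary_key]
        old = self.items.get(pk)
        if old is not None:
            # Drop stale index entries before overwriting the item.
            for attribute, index in self.indexes.items():
                if attribute in old:
                    index[old[attribute]].discard(pk)
        self.items[pk] = item
        for attribute, index in self.indexes.items():
            if attribute in item:
                index[item[attribute]].add(pk)

    def query(self, attribute, value):
        # Equality lookup through the hash index; no primary key required.
        if attribute not in self.indexes:
            raise KeyError(f"no index on {attribute!r}")
        pks = self.indexes[attribute].get(value, set())
        return [self.items[pk] for pk in sorted(pks)]


events = Table(primary_key="event_id")
events.put({"event_id": 1, "user": "alice", "page": "/home"})
events.put({"event_id": 2, "user": "bob", "page": "/pricing"})
events.put({"event_id": 3, "user": "alice", "page": "/pricing"})

events.add_index("user")  # added after the data was written
alice_events = events.query("user", "alice")
```

Because the index is a hash, the toy version only handles "equals" lookups on the indexed attribute. Asking it for an attribute with no index raises an error instead of quietly scanning everything.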
Now on to the rest! We’re still not ready to start building because we’re still stuck at the stage where we decide on technology. Technologically, DynamoDB is good enough for us. However, I don’t work at the Seneca College Centre for Development of Proprietary Technology. We breathe open source. If we can find an open source alternative to DynamoDB that Engineering.com is comfortable with, we can avoid coupling them too tightly to a proprietary technology. Freedom is nice.
So open source is good, NoSQL is necessary. All aboard the hype train, next stop MongoDB? I’ve known about MongoDB for years, and to my knowledge, it’s past its hyped-up, evangelized stage, and if it’s still around, it must be good. I researched it. Turns out it’s massively scalable, provided you have the metal for it. Or, if you want to go on the cloud, the database’s creator even offers MongoDB Atlas, a Database-as-a-Service (DBaaS). Atlas sounds like what we need, but it had better be scalable. That’s the problem we keep running into with these cloud services. They’re convenient but limited. The main reason we need NoSQL at this point is that all of the relational database services on Amazon Web Services have a 6 TB limit, and we know we’ll need much more space than that if Engineering.com wants to run our creation for years to come. From what I gathered reading Atlas’s whitepaper and website pricing info, we should be able to get at least 10 TB from their cloud service. Better… But I sent them an email requesting more information and guidance before we take it too seriously.
Now here’s where the drama starts again. I researched how accurate MongoDB was. That is, is it just as strong as a traditional relational database? Is it ACID-compliant? Will it explode?
Our database for the project, maybe.
Yes. MongoDB’s criticism after its initial hype was warranted after all. It is absolutely garbage for relational data, *if* you need that data to be accurate. It isn’t ACID-compliant on the transactional level, so if you have a power outage, say goodbye to the usefulness of your data. Now for our project, this may be okay, because we don’t need the data to be super accurate. If things get disjointed after being denormalized (as anything must be to fit into a NoSQL database), it just slightly reduces the amount of useful analytics data we would have to work with. And having 99.99% of the analytics data we mined available to us is still completely fine. However, I would definitely not use MongoDB in the real world for anything involving important information, especially e-commerce and the like. If this paragraph went over your head or bored you, you’d probably enjoy reading the use case blog post I read to discover this. In it, American developer Sarah Mei describes how using MongoDB during the launch of Diaspora almost destroyed the project, and why they ultimately had to retreat back to relational databases.
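Here is a small sketch of what "disjointed after being denormalized" can look like, again in plain Python rather than actual MongoDB code. Everything in it is hypothetical: the `rename_user` helper, the `SimulatedCrash` exception standing in for a power outage, and the `users`/`posts` data. The assumption is simply that each write is independent and nothing wraps them in a transaction.

```python
class SimulatedCrash(Exception):
    """Stands in for a power outage partway through a series of writes."""


def rename_user(users, posts, user_id, new_name, crash_after=None):
    """Update a user's name everywhere it was denormalized.

    Each assignment is treated as an independent write. With no transaction
    around them, a failure partway through leaves the earlier writes applied
    and the later ones missing.
    """
    writes_done = 0
    users[user_id]["name"] = new_name
    writes_done += 1
    for post in posts:
        if post["author_id"] == user_id:
            if crash_after is not None and writes_done >= crash_after:
                raise SimulatedCrash()
            post["author_name"] = new_name  # denormalized copy of the name
            writes_done += 1


users = {1: {"name": "Sara"}}
posts = [
    {"id": 10, "author_id": 1, "author_name": "Sara"},
    {"id": 11, "author_id": 1, "author_name": "Sara"},
]

try:
    # The "outage" hits after two of the three writes have gone through.
    rename_user(users, posts, 1, "Sarah", crash_after=2)
except SimulatedCrash:
    pass

# Posts whose copied author name no longer matches the user record.
stale = [p["id"] for p in posts if p["author_name"] != users[1]["name"]]
```

After the simulated crash, one post still carries the old name while the user record and the other post have the new one. For analytics data, a few stale copies like that are tolerable; for orders or payments, they would not be.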
So what does this mean? We’re closer, but we need to triple check the safety of using MongoDB for this project (and all NoSQL databases like DynamoDB for that matter!) before finally getting started building.