I have been tasked with picking and setting up a database with the following characteristics:
- Ultra-high availability: the real requirement is uptime. Our whole platform becomes inaccessible if it cannot read from the database, because we need a read to authenticate every user. The database nodes will never be spread across multiple networks.
- Reasonably quick access speeds
- Very low data volume: for 10 million users, we would store around 8 GB in total.
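To put the storage figure in perspective, here is the back-of-the-envelope arithmetic per user. It assumes 1 GB means 10**9 bytes; the variable names are just for illustration.

```python
# Rough storage per user, using the figures from the list above.
# Assumption: 1 GB is taken as 10**9 bytes (not 2**30).
TOTAL_BYTES = 8 * 10**9
USERS = 10_000_000

bytes_per_user = TOTAL_BYTES // USERS
print(bytes_per_user)  # 800
```

So each user record is under a kilobyte, and the whole dataset would fit comfortably on a single machine.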
Having done a bit of research on Cassandra, I think the optimal approach for my use case would be to replicate the data to every node, but require only a consistency level of ONE for reads. That way, if a node goes down, we can still read from and write to the other nodes. It is not very important that a read be unanimously agreed upon: as long as Cassandra becomes consistent within around 1 s, there shouldn’t be an issue.
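To spell out the trade-off I think I am accepting: as far as I understand it, a read is only guaranteed to see the latest acknowledged write when the replicas contacted for the read and for the write must overlap, i.e. R + W > RF. The sketch below is just that rule written down. The function name is mine, not a Cassandra API.

```python
def reads_see_latest_write(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """Overlap rule: the read set and write set are forced to share a replica
    only when R + W > RF. Otherwise a read may briefly return stale data."""
    return read_replicas + write_replicas > rf


RF = 3  # assumed replication factor for the example

print(reads_see_latest_write(1, 1, RF))  # False: read ONE, write ONE
print(reads_see_latest_write(2, 2, RF))  # True: read QUORUM, write QUORUM
print(reads_see_latest_write(1, 3, RF))  # True: read ONE, write ALL
```

Reading and writing at ONE is the first case, so I would be relying on replication catching up quickly rather than on any guarantee. The third case keeps reads cheap, but writes would then need every replica to be up.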
When I go to set up the database, though, I am required to set the replication factor to a number (1, 2, 3, etc.), so I can’t just say “ALL” and have it replicate to all nodes. Right now, I have a 2-node cluster with a replication factor of 3. Will having an RF greater than the number of nodes cause any issues? Or is there a way to just have it copy to all nodes? Also, is there any way that I can tune Cassandra to be more read-optimized?
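Here is a minimal sketch of how I currently picture the RF > #nodes situation. It assumes that Cassandra derives the number of replicas a request needs from the configured replication factor rather than from the node count, and that with RF at or above the node count every node holds a copy. The helper names are hypothetical and this is a model of my understanding, not driver code.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def can_serve(level: str, rf: int, live_nodes: int) -> bool:
    """True if enough replicas are alive to satisfy the consistency level.

    Assumption: rf >= total node count, so every live node holds a replica
    and the replicas able to respond number min(rf, live_nodes).
    """
    live_replicas = min(rf, live_nodes)
    return live_replicas >= required_replicas(level, rf)


RF = 3  # my current setting, on a 2-node cluster
for level in ("ONE", "QUORUM", "ALL"):
    # Columns: both nodes up, then one node down.
    print(level, can_serve(level, RF, 2), can_serve(level, RF, 1))
# ONE True True
# QUORUM True False
# ALL False False
```

If this model is right, ONE keeps working with a single surviving node, QUORUM needs both nodes up, and ALL can never be met while RF exceeds the node count. Setting RF equal to the number of nodes looks like the closest thing to "copy to all nodes".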
Finally, I have some misgivings about how well Cassandra fits my use case. If anyone has an opinion on whether or not it is a good fit, and why, I would really appreciate your input! If Cassandra is overkill and this could be done with a simple SQL database, please let me know.
Thanks for your input!