I will try to answer those questions, relying on the people around to correct me if I am wrong.
It says to align partition keys to your TWCS windows. Is it generally the case that calendar/date based partitions would align nicely with TWCS windows such that we would end up with one SSTable per partition after the major compaction runs?
This advice is probably meant to avoid having data spread across many buckets (possibly all of them), as that makes tombstone eviction harder. Depending on your queries, reads might also take much longer since they could hit many SSTables, for example if you use a LIMIT clause without filtering on the clustering key (Alex, who wrote the post you mentioned, is currently writing about this kind of read using limits). That said, keep in mind that in many cases it is much better to have partitions change over time.
About how nicely the partitions would align with TWCS buckets, I guess it is just a matter of adding a time period as part of the partition key:
Using a one-hour window? Then add YYYYmmddHH as part of the partition key; the key would look like ((item1, 2017020814), date), with (item1, 2017020814) being the partition key and date a clustering key. This is just a rough example to give you an idea of how to control partition size and the time range each partition covers. On the flip side, to select a full day, you would then need to query 24 partitions (one per hour).
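As a sketch, such a schema could look like the following in CQL (the table and column names are made up for the example):

```sql
-- Hypothetical table: an hourly time bucket is part of the partition key,
-- so partitions line up with 1-hour TWCS windows.
CREATE TABLE sensor_data (
    item   text,
    bucket int,        -- YYYYmmddHH computed client-side, e.g. 2017020814
    date   timestamp,  -- clustering key: event time within the hour
    value  double,
    PRIMARY KEY ((item, bucket), date)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1'
};
```

The client then computes the bucket from the event time before writing, and queries 24 such partitions to read a full day.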
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html says to aim for < 50 buckets per table based on TTL.
This, like many numbers shared around about Cassandra, is probably there to give people a rough idea of the order of magnitude. You could probably be fine with 20 or 100 SSTables. Sharing experiences with Cassandra is not easy, as how efficient each setting is depends on the hardware, the workload and many other things, and so differs from one cluster to the next. Plus, if I remember correctly, Jeff said to use ~30 buckets.
Are there any recommendations on a range to stay within for number of buckets?
Well, this is actually quite straightforward. The premise is a known and fixed TTL, and we consider the final state of each bucket, meaning 1 bucket = 1 SSTable. Then choosing the appropriate window (bucket) size is just a matter of dividing the TTL by the desired number of SSTables (~30). For a 90-day TTL, use 90 / 30 = 3-day windows.
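Following that rule of thumb, a 90-day TTL with 3-day windows could be configured like this (a sketch only; sensor_data is a made-up table name):

```sql
-- ~30 buckets: a 90-day TTL divided into 3-day compaction windows.
ALTER TABLE sensor_data WITH
    compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '3'
    }
    AND default_time_to_live = 7776000;  -- 90 days, in seconds
```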
What are some of the tradeoffs of a smaller vs. larger number of buckets? For example, I know that a smaller number of buckets means more SSTables to compact during the major compaction that runs when we get past a given window.
Alex wrote about this just before recommending the 50-SSTable maximum. But to answer quickly: the bigger the buckets are, the heavier the compactions will be, indeed. On the other hand, more SSTables means that in some cases Cassandra will have to read many SSTables to find a piece of data, even if the relevant data is held in just one of them. Each SSTable read is a disk read, known to be far slower than most other operations in computing; even though SSDs improve this, it is still a major thing to keep in mind.
Are tombstone compactions disabled by default?
No, they are not. The default options, as far as I remember, are to trigger a single-SSTable compaction on any SSTable whose estimated droppable tombstone ratio is over 0.2 (tombstone_threshold), if no tombstone compaction has run on it in the last day (tombstone_compaction_interval) and if Cassandra estimates that the SSTable does not overlap too much with other SSTables. To make tombstone compactions more aggressive (removing that last check), set unchecked_tombstone_compaction to true.
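In CQL these are compaction sub-properties; a sketch making tombstone compactions more aggressive on a hypothetical table could look like this (the first two values shown are the defaults; unchecked_tombstone_compaction defaults to false):

```sql
ALTER TABLE sensor_data WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'tombstone_threshold': '0.2',             -- droppable tombstone ratio
    'tombstone_compaction_interval': '86400', -- seconds, i.e. 1 day
    'unchecked_tombstone_compaction': 'true'  -- skip the overlap check
};
```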
Can you ever wind up in a situation where the major compaction that is supposed to run at the end of a window does not run? Not sure if this is realistic but consider this scenario
This is possible if compactions are not keeping up, for example because the allocated resources are not big enough. Also, when an out-of-order SSTable lands in an old bucket, that bucket falls behind on compactions, as a new compaction is then needed.
Suppose compaction falls behind such that there are 5 windows for which the major compactions have not run. Will TWCS run the major compactions for those windows serially, oldest to newest?
I believe it is in the reverse order: newest first, then moving on to older buckets. That's what would make sense to me; I honestly did not check the code. Maybe Jeff, Alex or someone else will be able to confirm that.
If I am using a window size of one day, it is currently 02:00 AM Tuesday, and I receive a write for 11:45 PM Monday, should I consider that out of order?
Yes, that's what being out of order means in our case, I guess. The thing to keep in mind is that this data will be flushed along with other data written around 2 AM, meaning the maximum timestamp of the SSTable in the bucket will be around 2 AM, and so the data from 11:45 PM Monday will end up in the wrong bucket.
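To illustrate (the table and values are made up, reusing the hourly-bucket sketch from earlier), a client can produce exactly this kind of out-of-order write by providing an old write timestamp explicitly, in microseconds since the epoch:

```sql
-- Received around 02:00 Tuesday, but carrying 23:45 Monday
-- (2017-02-06 23:45 UTC = 1486424700000000 us) as its write time,
-- so it will be flushed together with Tuesday's data.
INSERT INTO sensor_data (item, bucket, date, value)
VALUES ('item1', 2017020623, '2017-02-06 23:45:00+0000', 42)
USING TIMESTAMP 1486424700000000;
```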
I must say that I haven't played that much with TWCS, nor read the code. I am sharing my understanding here, which can be imprecise or wrong, again hoping people will correct my wrong statements.
The Last Pickle - Apache Cassandra Consulting