[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Paying off tech debt and correctly naming things

Having been through a bunch of refactors in other projects, I would suggest doing small incremental patches isolated in related parts of the code. It would also be nice if you could give us a heads up on changes you're making.
Best of luck!

    On Tuesday, March 20, 2018, 3:04:43 PM PDT, Jon Haddad <jon@xxxxxxxxxxxxx> wrote:  
 Whenever I hop around in the codebase, one thing that always manages to slow me down is needing to understand the context of the variable names that I’m looking at.  We’ve now removed thrift the transport, but the variables, classes and comments still remain.  Personally, I’d like to go in and pay off as much technical debt as possible by refactoring the code to be as close to CQL as possible.  Rows should be rows, not partitions, I’d love to see the term column family removed forever in favor of always using tables.  That said, it’s a big task.  I did a quick refactor in a branch, simply changing the ColumnFamilyStore class to TableStore, and pushed it up to GitHub. [1]

Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K LOC changed across 275 files.  I’ll note that my branch doesn’t change any of the almost 1000 search results of “columnfamilystore” found in the codebase and hundreds of tests failed on my branch in CircleCI, so that 2K LOC change would probably be quite a bit bigger.  There is, of course, a lot more than just renaming this one class, there’s thousands of variable names using any manner of “cf”, “cfs”, “columnfamily”, names plus comments and who knows what else.  There’s lots of references in probably every file that would have to get updated.

What are people’s thoughts on this?  We should be honest with ourselves and know this isn’t going to get any easier over time.  It’s only going to get more confusing for new people to the project, and having to figure out “what kind of row am i even looking at” is a waste of time.  There’s obviously a much bigger impact than just renaming a bunch of files, there’s any number of patches and branches that would become outdated, plus anyone pulling in Cassandra as a dependency would be affected.  I don’t really have a solution for the disruption other than “leave it in place”, but in my mind that’s not a great (or even good) solution.

Anyways, enough out of me.  My concern for ergonomics and naming might be significantly higher than the rest of the folks working in the code, and I wanted to put a feeler out there before I decided to dig into this in a more serious manner. 


[1] https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1 <https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1>