Re: Paying off tech debt and correctly naming things
As requested, I'm alerting the ML that I've got the first patch of several
to come. This one only addresses the ColumnFamilyStore class - and only
changes the name. There's follow up tickets to change method and variable
names. As we agreed, I'm doing this incrementally, so please let's keep
that in mind, I have no interest in producing a 10K patch that changes all
Since the patch only touches lines explicitly naming ColumnFamilyStore this
patch should be reasonably non-intrusive.
On Thu, Mar 22, 2018 at 8:09 AM Jon Haddad <jon@xxxxxxxxxxxxx> wrote:
> Cool. I think there’s general agreement that doing this in as small bites
> as possible is going to be the best approach. I have no interest in mega
> > The combined approach takes a
> > change that's already non-trivially dealing with complex subsystem
> > changes and injects a bunch of trivial renaming noise across unrelated
> > subsystems into the signal of an actual logic refactor.
> I agree. This is why I like the idea of proactively working to improve
> the readability of the codebase as a specific goal, rather than being
> wrapped into some other unrelated patch. Keeping the scope in check is the
> challenge. Simple class and method renames, as several have pointed out,
> is easy enough with IDEA.
> I’ll start with class renames, as individual patches for each of them.
> I’ll be sure to call it out on the ML. First one will be ColumnFamilyStore
> -> TableStore.
> > On Mar 22, 2018, at 7:13 AM, Jason Brown <jasedbrown@xxxxxxxxx> wrote:
> > Jon,
> > Thanks for bringing up this topic. I'll admit that I've been around this
> > code base for long enough, and have enough accumulated history, that I
> > probably can't fully appreciate the impact for a newcomer wrt naming.
> > However, as Josh points out, this situation probably happens to "every
> > non-trivially aged code-base ever".
> > One thing I'd like to add is that with these types of large refactoring
> > changes, the review effort is non-trivial. This is because the review
> > has to ensure that correctness is preserved and it's easy to overlook a
> > seemingly innocuous change.
> > That being said, I am supportive of this effort. However, I believe it's
> > going to be best, for contributor and reviewer, to break it up into
> > smaller, more digestible pieces. I'd also like to request that we not go
> > whole hog and try to do everything in a compressed time frame; reviewer
> > availability is already stretched thin and I'm afraid of deepening the
> > review queue, especially mine :)
> > Thanks,
> > -Jason
> > On Thu, Mar 22, 2018 at 6:41 AM, Josh McKenzie <jmckenzie@xxxxxxxxxx>
> >>> Some of us have big patches in flight, things that actually
> >>> pay off some technical debt, and dealing with such renames is rebase
> >> hell :\
> >> For sure, but with a code-base this old / organically grown, I expect
> >> this will always be the case. If we're talking something as simple as
> >> an intellij rename refactor, while menial, couldn't someone with a
> >> giant patch just do the same thing on their side and spend half an
> >> hour of their life clicking next? ;)
> >>> That said, there is good time for such renames - it’s during
> >>> those major refactors and rewrites. When you are
> >>> changing a subsystem, might as well do the appropriate renames.
> >> Does that hold true for a code-base with as much static state and
> >> abstraction leaking / bad factoring as we have? (i.e. every
> >> non-trivially aged code-base ever) The combined approach takes a
> >> change that's already non-trivially dealing with complex subsystem
> >> changes and injects a bunch of trivial renaming noise across unrelated
> >> subsystems into the signal of an actual logic refactor.
> >> On Thu, Mar 22, 2018 at 9:31 AM, Aleksey Yeshchenko <aleksey@xxxxxxxxx>
> >> wrote:
> >>> Poor and out-of-date naming of things is probably the least serious
> >> of our technical debt. Bad factoring, and straight-up
> >>> poorly written components is where it’s really at.
> >>> Doing a big rename for rename sake alone does more harm than it is
> >> sometimes. Some of us have big patches
> >>> in flight, things that actually pay off some technical debt, and
> >> with such renames is rebase hell :\
> >>> That said, there is good time for such renames - it’s during those
> >> refactors and rewrites. When you are
> >>> changing a subsystem, might as well do the appropriate renames.
> >>> —
> >>> AY
> >>> On 20 March 2018 at 22:04:48, Jon Haddad (jon@xxxxxxxxxxxxx) wrote:
> >>> Whenever I hop around in the codebase, one thing that always manages to
> >> slow me down is needing to understand the context of the variable names
> >> that I’m looking at. We’ve now removed thrift the transport, but the
> >> variables, classes and comments still remain. Personally, I’d like to
> go in
> >> and pay off as much technical debt as possible by refactoring the code
> >> be as close to CQL as possible. Rows should be rows, not partitions, I’d
> >> love to see the term column family removed forever in favor of always
> >> tables. That said, it’s a big task. I did a quick refactor in a branch,
> >> simply changing the ColumnFamilyStore class to TableStore, and pushed
> it up
> >> to GitHub. 
> >>> Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 2K
> >> LOC changed across 275 files. I’ll note that my branch doesn’t change
> >> of the almost 1000 search results of “columnfamilystore” found in the
> >> codebase and hundreds of tests failed on my branch in CircleCI, so that
> >> LOC change would probably be quite a bit bigger. There is, of course, a
> >> more than just renaming this one class, there’s thousands of variable
> >> using any manner of “cf”, “cfs”, “columnfamily”, names plus comments and
> >> who knows what else. There’s lots of references in probably every file
> >> would have to get updated.
> >>> What are people’s thoughts on this? We should be honest with ourselves
> >> and know this isn’t going to get any easier over time. It’s only going
> >> get more confusing for new people to the project, and having to figure
> >> “what kind of row am i even looking at” is a waste of time. There’s
> >> obviously a much bigger impact than just renaming a bunch of files,
> >> any number of patches and branches that would become outdated, plus
> >> pulling in Cassandra as a dependency would be affected. I don’t really
> >> a solution for the disruption other than “leave it in place”, but in my
> >> mind that’s not a great (or even good) solution.
> >>> Anyways, enough out of me. My concern for ergonomics and naming might
> >> significantly higher than the rest of the folks working in the code,
> and I
> >> wanted to put a feeler out there before I decided to dig into this in a
> >> more serious manner.
> >>> Jon
> >>>  https://github.com/apache/cassandra/compare/trunk...
> >> rustyrazorblade:refactor_column_family_store?expand=1 <
> >> https://github.com/apache/cassandra/compare/trunk...
> >> rustyrazorblade:refactor_column_family_store?expand=1>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx