On Sat August 21 2004 06:27 am, David Roundy wrote:
> It seems that there are roughly three reasons people would like to do this,
> and I think that these three reasons can be best addressed separately.
While this is a nice summary of the reasons that have been discussed so far,
it seems to me that a larger point, which I haven't expressed very well if at
all, is being missed. Let me take another shot at it.
In general, projects have multiple lines of development, and multiple
configurations within those lines. Much development is fairly linear in
nature, but some isn't: branches are sometimes made, and changes are
sometimes copied between branches.
The darcs model supports a single line of development per repo, with a tagging
feature available to record specific configurations along the line.
Thus there are several kinds of information that are of interest:
() What configuration a given repo is in. It is suggested that the repo name
feature suffices to track this.
() What patches form the difference between two configurations. Darcs might
be able to compute this for two configurations in the same line, but can't do
it between branches, since it has no representation for multiple branches.
() What configurations contain a given patch. Again, darcs can't compute this
across branches.
() What public configurations (including branches) have been created (by
anyone) for a given project. The suggestion is that if all users keep their
public configurations in a single directory, an external tool could
synchronize the sets of configurations. As I think someone pointed out,
there are space and download time issues here. If one user creates a new
repo for a public branch on their own machine, they can get the benefit of
the hardlinks. But if one user has a repo A, which another user gets to make
A', and then the first user does a `darcs get' on A to create B, and the
second user then does a get on B to make B', there won't be any sharing
between A' and B'. This is one reason I think this functionality should be
integrated into darcs rather than outboard. Still, an external solution is
arguably workable as long as the numbers of public branches remain small.
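
To make the second and third kinds of information concrete, here is a minimal sketch in Python. It is not darcs code and does not reflect how darcs stores anything: it assumes, purely for illustration, that a configuration is a named set of patch identifiers, and it ignores patch ordering and commutation. The names `patch_difference`, `configs_containing`, and the sample patch ids are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical model: configuration name -> set of patch identifiers.
# Real darcs patches are ordered and may commute; this sketch ignores that.
Configs = Dict[str, FrozenSet[str]]


def patch_difference(configs: Configs, a: str, b: str) -> Tuple[Set[str], Set[str]]:
    """Return (patches only in a, patches only in b)."""
    return set(configs[a] - configs[b]), set(configs[b] - configs[a])


def configs_containing(configs: Configs, patch: str) -> Set[str]:
    """Return the names of all known configurations that contain the patch."""
    return {name for name, patches in configs.items() if patch in patches}


# Made-up example: a release branch carrying a bug fix that the
# development branch has not yet received.
configs: Configs = {
    "release-1.0": frozenset({"p1", "p2", "fix-17"}),
    "devel": frozenset({"p1", "p2", "p3"}),
}

only_release, only_devel = patch_difference(configs, "release-1.0", "devel")
holders = configs_containing(configs, "fix-17")
```

Both queries are trivial once every branch is visible in one place; the difficulty today is that a single repo has no record of the other branches.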
We've mostly been discussing the third point, but I think the first two are at
least as important. It's common, for example, when a product is about to be
released, to make a branch for the release. Subsequent minor releases are
created by fixing bugs in the release branch; the development branch
typically undergoes more disruptive revisions in preparation for a future
release. The release branch needs to be isolated from those, but the bug
fixes need to be copied (usually) from the release branch to the development
branch. I gather it's easier in darcs to do this than in most SCM systems --
just push the patches from the release repo to the development repo. But it
still would be nice to be able to ask darcs whether this had been done for a
given patch.
> Reason 1: disk space
This is certainly one of the reasons. As you point out, bandwidth can be an
issue too.
> Reason 2: easily synchronizing or transferring multiple projects/branches
>
> As I said, I think this is best done with an external tool. Such a tool
> could be bundled with darcs, but I like the fact that darcs itself deals
> with just a single branch, which simplifies both its interface and its
> code.
I'm not sure it simplifies the interface. See below.
> Reason 3: keeping track of branches in a sane manner
>
> I'm not clear as to how things would be different in this respect if there
> was some link between various branches (a metarepository, or whatever). It
> seems like all that would change would be the namespace that is being used,
> and the only major effect would be that you'd lose flexibility. Whatever
> you gained in ability to deal with two branches on your computer (in a
> single metarepository) you wouldn't gain when dealing with one branch on
> your computer and one branch on someone else's computer--i.e. you'd be
> losing the distributed nature of darcs.
I don't agree. All you would have to do is to synchronize the two repos, and
then each user would be able to perform all operations that either could
perform.
In fact I think it enhances the distributed nature of darcs, because it lets
me know about all the public configurations you have created (including
experimental ones), and after I synchronize with your repo, I can then enter
any of those configurations efficiently without reconnecting.
Let me elaborate on this a bit. In the system I envision, the most common
operation between two repos would probably be complete synchronization, in
which all public configuration definitions and the patches they contain are
transferred between the two repos. ("Public" means "those whose developer
has decided they're ready to be published" -- the distinction simply provides
a way to keep work private while it's in progress.) This operation per se
doesn't alter the working copy of either repo. After it's done, the user of
each repo can browse the new configurations and decide which, if any, they
wish to move their working copy to -- or they can create some new
configuration, for instance, one which combines some of the patches they just
received with one of their private configurations.
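
The synchronization just described could be sketched roughly as follows, in Python. This is only an illustration of the idea, not a proposal for an implementation: the `Repo` class, its fields, and `synchronize` are hypothetical names, patches are modelled as opaque strings keyed by id, and a name clash between two public configurations is resolved (arbitrarily, for the sketch) by keeping the local definition.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class Repo:
    # patch id -> patch contents (opaque here)
    patches: Dict[str, str] = field(default_factory=dict)
    # configuration name -> ids of the patches it contains
    public: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    private: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    # name of the configuration the working copy is currently in
    working: Optional[str] = None


def synchronize(a: Repo, b: Repo) -> int:
    """Exchange public configurations and their patches in both directions.

    Private configurations and the working copies are left untouched.
    Returns the number of patches actually copied.
    """
    copied = 0
    for src, dst in ((a, b), (b, a)):
        for name, ids in list(src.public.items()):
            # Assumption: on a name clash the destination keeps its own definition.
            dst.public.setdefault(name, ids)
            for pid in ids:
                if pid not in dst.patches:  # never transfer a patch twice
                    dst.patches[pid] = src.patches[pid]
                    copied += 1
    return copied


# Made-up example repos.
mine = Repo(
    patches={"p1": "...", "p2": "..."},
    public={"devel": frozenset({"p1", "p2"})},
    private={"wip": frozenset({"p1"})},
    working="devel",
)
yours = Repo(
    patches={"p1": "...", "p3": "..."},
    public={"experiment": frozenset({"p1", "p3"})},
    working="experiment",
)

first = synchronize(mine, yours)
second = synchronize(mine, yours)
```

After the first call each side knows both public configurations and holds all three patches; the second call copies nothing, and neither working copy has moved.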
By default, as a matter of convenience, one would probably want darcs to put
the local repo into the latest configuration in some line of development;
this reflects the way darcs works now and the most common usage scenario.
But this isn't necessary, and one can easily do something different.
To me this actually simplifies the darcs user interface, because no decisions
have to be made at repo synchronization time. I don't have to fiddle with
patches whose names match a particular regexp, or anything like that -- I
just get everything. I don't have to worry much about the bandwidth or disk
space involved, because no information is transmitted more than once or
copied more than once locally (I can choose to have multiple local copies for
my own reasons, of course, but darcs won't impose this on me). I don't even
have to decide in advance which configurations I might be interested in --
the configuration definitions and associated patch sets are likely to be
small enough (since programmers can create patches only at a certain rate)
that I don't have to worry about them. And once I have everything, I can
(but don't have to) look at all of it in as much detail as I wish before
deciding what to use.
Is this picture getting any clearer?
-- Scott