version-control.revml

Subject: Re: RevML DTD

[CCed to the revml list with permission]

On Thu, Jan 06, 2005 at 02:54:07PM +1100, Peter Miller wrote:
> Hi Guys,
>
> I'm thinking of adding revml support to Aegis, and I have a few
> questions.

Excellent, we'll help you as best we can.

> There is plenty of VCP documentation on the web site
> (http://public.perforce.com/public/revml/index.html), but no RevML
> documentation, and no RevML DTD,

See this link for the latest:

http://public.perforce.com/public/revml/revml.dtd

NOTE: CPAN should no longer carry any of our code apart from what is
archived on BackPAN; trying to distribute production code that way
turned out to be problematic due to the number of prerequisites and the
difficulty many people have in installing all of them. It also increases
the number and complexity of configurations we need to support.

> Examples of valid RevML files would be nice, too.

I can send you some. VCP generates only valid XML (it checks elements
against the DTD).

> How come you introduced <char code="0xNN"> instead of using the existing
> &#xNN; mechanism? Maybe some words of explanation in the DTD would
> help.

It's an XML thing: in XML 1.0, no matter which method you use (a literal
character or a &#xNN; reference), you are not allowed to encode any code
point below a space (0x20) except tab, line feed, and carriage return.
Even in XML 1.1 you can't encode a NUL (0x00). So we need a non-builtin
way to carry the occasional illegal character through XML.
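A minimal Python sketch of that round trip, assuming each illegal character is written inline as an empty `<char code="0xNN"/>` element inside otherwise escaped text (the exact element form and placement are assumptions here; revml.dtd is the authority). `encode_content` and `decode_content` are hypothetical names, not VCP routines.

```python
import re
from xml.sax.saxutils import escape, unescape

# C0 control characters that XML 1.0 cannot carry at all, literally or as
# &#xNN; references: everything below 0x20 except tab, LF and CR.
_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# Matches the (assumed) empty-element form produced by encode_content().
_CHAR_ELEM = re.compile(r'<char code="0x([0-9A-Fa-f]{2})"/>')


def encode_content(text: str) -> str:
    """Escape markup, then replace each illegal control character with a
    <char code="0xNN"/> element so the result is well-formed XML 1.0 text."""
    escaped = escape(text)  # handles &, < and >
    return _ILLEGAL.sub(lambda m: '<char code="0x%02X"/>' % ord(m.group()), escaped)


def decode_content(xml_text: str) -> str:
    """Invert encode_content(): restore the control characters, then unescape.
    Any literal '<' left in the encoded text came from a <char> element,
    because escape() turned every original '<' into '&lt;'."""
    restored = _CHAR_ELEM.sub(lambda m: chr(int(m.group(1), 16)), xml_text)
    return unescape(restored)
```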

> The rest of my questions revolve around the addition of repository types
> that the DTD has never heard of. With a new VC/SCM project starting
> about every 3 months at the moment, it seems that an extensible DTD,
> with no assumptions about the set of all VC/SCM systems, would be a good
> idea.

Most of RevML has few assumptions other than a series of revisions
linked in some way.

> You have the ability to add <COMMENT>s to a change set, but what if the
> originating system has *numerous* attributes for each change set, and the
> description is only one of them? What if a system supports arbitrary
> user defined change set attributes?

Systems like Subversion and, I presume, Aegis, would need their own
element; the DTD above defines <cvs_info>, <p4_info>, etc. in each rev,
currently as PCDATA blobs, but we can define structured information in
them at some point as well.

> What if a system supports file attributes beyond the ones in the DTD?

We'd open the DTD up to allow them. Let's define them :).

> What if a system supports arbitrary user defined change set attributes?

We'd add a "named attribute" element like what you show. And I agree
with your choice of elements and not attributes.

> For example:
> <attribute><name>X-Aegis-cause</name>
> <value>internal_bug</value> </attribute>
> <attribute><name>UUID</name>
> <value>8112422d-bddf-496a-bf3c-23a4ac283fc8</value> </attribute>
> could be used to allow system specific attributes, for systems which
> have yet to be invented, or which the DTD authors will never use,
> without DTD changes.

I want to capture standard stuff in the DTD to prevent accidental or
overly creative misuse. By standardizing the commonly available pieces
in the DTD, including element ordering where convenient, we narrow the
range of variation and limit accidental dependence on unspecified
ordering, for instance.
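For comparison, the generic name/value form from Peter's example can be produced and read back with Python's `xml.etree.ElementTree`. This is only a sketch: `<attribute>` is not in the current DTD, and `add_attribute`/`read_attributes` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def add_attribute(parent: ET.Element, name: str, value: str) -> ET.Element:
    """Append <attribute><name>NAME</name><value>VALUE</value></attribute>."""
    attr = ET.SubElement(parent, "attribute")
    ET.SubElement(attr, "name").text = name
    ET.SubElement(attr, "value").text = value
    return attr


def read_attributes(parent: ET.Element) -> dict:
    """Collect the generic name/value pairs directly under parent."""
    return {a.findtext("name"): a.findtext("value")
            for a in parent.findall("attribute")}


# Hypothetical <rev> carrying the two attributes from the example above.
rev = ET.Element("rev")
add_attribute(rev, "X-Aegis-cause", "internal_bug")
add_attribute(rev, "UUID", "8112422d-bddf-496a-bf3c-23a4ac283fc8")
print(ET.tostring(rev, encoding="unicode"))
```

The spelling problem shows up directly in `read_attributes`: a tool that writes "uuid" and one that writes "UUID" produce two unrelated keys, which is the kind of drift a DTD-defined element avoids.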

> I don't understand the use intended for the BRANCH_MAP_ID and
> BRANCH_MAP_ID forms.

Those are not present any more; they were intended to declare a set of
<branch>es early in the RevML file and then use <branch_map> and related
elements to specify how to do the mapping when copying. The <rev>s could
then refer to them throughout the file, but that turned out to add no
value to the current implementation.

Today, we merely insist that the sender (VCP::Source::* in practice)
establish its own unique list of branch IDs and mention them in the
<rev> tags; there is no forward declaration of <branch>es today.

We can bring them back if ever we have a need, but for now, KISS
applies.

> Given the presence of the <REP_TYPE>, why is the rep_type redundantly
> present in the names of all the <*_BRANCH_ID> forms?

You see a false start :).

> Given the presence of the <REP_TYPE>, why is the rep_type redundantly
> present in the names of all the <*_INFO> forms?

It is not now; not sure why it ever was.

> Would it not be
> possible to have a simple <INFO> form, with some kind of extensible
> content?

VCP support for PVCS never got off the ground; that was (and is)
provisional engineering. <pvcs_info> should be struck from the DTD
until such time as it is needed.

> What if a change set moves a file *and* changes it? (This is common for
> include files and their #ifdef insulation for idempotency.) Shouldn't
> the last line of <!element rev> say (delete | (move,delta?) |
> ((content|(base_name?,base_rev_id,delta)),digest)) instead?

That has not been considered. <delete>, <move> and friends have been
replaced with the more generic <action> element and all tools are under
scout's honor to only use certain strings ("edit", "add", "branch",
"delete", and a few others to deal with branching/merging semantics
to date). I've not implemented any backends that support a discrete
"move", but "move" would be my choice there. I should document these
strings in the current DTD. Any unrecognized string is treated as an
"edit" by VCP::Dest::*.
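The fallback rule above can be sketched in a few lines of Python. The action set below contains only the strings named in this message (the branching/merging ones are left out because they aren't listed), and `effective_action` is a hypothetical helper, not the VCP::Dest::* code.

```python
# Action strings named in this message; the "few others" used for
# branching/merging are not listed, so this set is deliberately incomplete.
DOCUMENTED_ACTIONS = frozenset({"add", "edit", "delete", "branch"})


def effective_action(action: str, supported: frozenset = DOCUMENTED_ACTIONS) -> str:
    """Return the action a destination should perform.

    Follows the rule described above: any string the destination does not
    recognize is treated as "edit".
    """
    return action if action in supported else "edit"


# A destination with no notion of "move" degrades it to an ordinary edit;
# one that adds "move" to its supported set keeps it.
print(effective_action("move"))
print(effective_action("move", DOCUMENTED_ACTIONS | {"move"}))
```

Treating unknown strings as "edit" lets a RevML file that uses a newer action string still load into an older destination, at the cost of degrading that action to an ordinary edit.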

> What if a system supports file attributes beyond the ones in the DTD?
> (Only comment is provided - is that the change set comment?)

Yes, although two of the four systems (CVS, VSS) have no changeset
concept and so the DTD does not assume changesets. I'd like to see some
explicit support for declaring changeset-wide information and then
referring to it in individual revs, but that would mean a whole lot more
logic to handle indirection and save little or no disk space when RevML
is compressed.

> You have a comment that <REPOSITORY_*> specific tags as needed, but that
> is not future friendly. By using an <attribute> <name>blah</name>
> <value>blah</value> </attribute> style, all you need is a convention for
> the names, with no change to the DTD.

Agreed, but I want to limit the ad-hoc use of a generic form to truly
generic attributes; common attributes should be embodied in the DTD to
encourage standardization. Once a common attribute escapes into the
wild encapsulated in a generic form, it can never be recaptured in a
standard form without having every tool support both forms (ugh).

> The <TYPE> form is too limited.

It is sufficient for the systems we've used RevML with; as other systems
are added we will look at other techniques. I like MIME types, but we'd
also have to maintain two mappings, one "to" and one "from", between
MIME types and the simpler types used by more widespread systems like
CVS, Perforce, VSS, etc.
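A sketch of the two mappings, assuming the simple side is a CVS-style "text"/"binary" distinction (an assumption; the DTD's actual <type> values should be checked). `mime_to_simple` and `simple_to_mime` are hypothetical names.

```python
# Assumed simple types: a CVS-style "text" vs. "binary" distinction.
_SIMPLE_TO_MIME = {"text": "text/plain", "binary": "application/octet-stream"}


def mime_to_simple(mime_type: str) -> str:
    """'From' direction: collapse any MIME type to "text" or "binary",
    ignoring parameters such as charset."""
    media = mime_type.split(";", 1)[0].strip().lower()
    return "text" if media.startswith("text/") else "binary"


def simple_to_mime(simple_type: str) -> str:
    """'To' direction: pick a generic MIME type for a simple type."""
    return _SIMPLE_TO_MIME[simple_type]
```

The round trip is lossy: "text/x-csrc" maps to "text" and comes back as "text/plain", which is why both directions have to be maintained explicitly rather than derived from one another.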

> Other well known attributes could include branch-id, Comment, Executable
> (true or false), label, lock, mod-time, rev-id, user-id, UUID, ...etc.

> This has the advantage that if a system chooses to ignore an attribute,
> they don't have to support the grammar for the ones they are ignoring.

Ignoring portions of an XML grammar is easy :). Coping with multiple
authors who do not happen to choose the same spelling for a <name> is
difficult, I think. But we're flexible; we want to encourage well-known
names by ensconcing them as element names, not forbid ad-hoc extensions.

> Plus, they can all have X-system-blah-blah extensions. The ones that
> support arbitrary user defined attributes could have User-blah-blah
> attributes, too.

Nice approach, actually. I like the idea of a <user_attribute> and
<site_attribute> if an SCM makes some true semantic difference between
them.

> When the <TYPE> also gives the charset, it becomes possible to
> map the XML encoding into the correct file encoding.

Agreed. I presume the XML would be encoded in one of the UTF-* encodings
rather than a legacy character set; code pages are so 1970s :). But we
don't enforce that and haven't had to deal with it yet.
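If <type> did carry a MIME-style charset parameter, as Peter suggests, mapping the (UTF-*) XML text back to the file's own encoding could look like this Python sketch. The MIME-style type string and the helper names `charset_of` and `to_file_bytes` are hypothetical; today's <type> carries no charset.

```python
from email.message import Message


def charset_of(type_value: str, default: str = "utf-8") -> str:
    """Pull the charset parameter out of a MIME-style type string,
    falling back to `default` when none is given (returned lowercased)."""
    msg = Message()
    msg["Content-Type"] = type_value
    return msg.get_content_charset(default)


def to_file_bytes(content: str, type_value: str) -> bytes:
    """Re-encode content decoded from the XML in the file's own charset."""
    return content.encode(charset_of(type_value))
```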

> Note that some systems give each file a unique ID (at least two that I
> know of use the standard GUID/UUID format) which is immutable; they
> model filenames as an editable attribute of a file, thus a file rename
> is a simple change of the filename attribute.

The <rev id="..."> should contain the GUID/UUID while the <name> should
be its current public identity.

We do look to extend the RevML DTD as new needs come along; let's
discuss how.

Thanks,

Barrie

