Subject: Re: Re: monotone & CVS import - msg#00045
List: version-control.monotone.devel
On Wed, Nov 12, 2003 at 10:28:40AM -0500, graydon hoare wrote:
> space character is an unfortunate special case because some data in
> monotone is whitespace delimited (manifest entries and historical
> rename certs). so .. hmm. I think manifests would survive introducing
> space as an allowed character since they have exactly 40 characters of
> hex as their first component, and then 2 spaces, then "rest of line" as
> the path name. can't add '\n', but I don't think many files have that.

sha1sum appears to have interestingly undocumented behaviour here.

  $ touch "`echo -e 'file_with_newline\nin'`"
  $ touch 'file_with_backlash_n\nin'
  $ touch 'normal_file'
  $ sha1sum *
  \da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_backlash_n\\nin
  \da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_newline\nin
  da39a3ee5e6b4b0d3255bfef95601890afd80709  normal_file

If I'm reading this right, it means that a checksum whose first
character is "\" enables backslash processing of the filename. This
appears only to be for a few characters, though:

  $ touch "`echo -e 'file_with_non_printing_char\02in'`"
  $ touch "`echo -e 'file_with_tab\tin'`"
  $ sha1sum *
  da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_wierd_charin
  da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_tab in

Investigation shows that the \02 and tab are embedded literally.
Spaces are handled the same way. So this backslash processing appears
to be quoting of backslashes and newlines only? I'm too lazy to check
the source right now.

(Are there any weird systems that allow nuls in filenames? I wouldn't
put it past some version of Windows. That would be fun...)

> historical rename certs will break if we add ' ', but I don't know if
> anyone's using them aside from me so far, and I can repair my own just
> by re-issuing the cert. is anyone else using them yet? would '\n' be a
> good separator for structured data inside such certs? or '\0', just to
> be safe? how about a netstring, len:<len bytes...> ? it'd be nice to
> keep it possible to print the cert value to stdout.

How about <len1>:<len bytes>\n<len2>:<len bytes>? That should be easy
to parse for both computers and humans, while getting that nice sexpy
goodness going...

-- Nathaniel

--
"If you can explain how you do something, then you're very very bad at it."
  -- John Hopfield
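
The <len1>:<len bytes>\n<len2>:<len bytes> layout proposed above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not monotone's actual cert format: the names encode_fields and decode_fields are hypothetical, and writing the lengths as decimal ASCII digits is an assumption borrowed from netstrings.

```python
def encode_fields(fields):
    """Encode byte strings as <len>:<bytes> records joined by newlines.

    Hypothetical helper; lengths are decimal ASCII (an assumption).
    """
    return b"\n".join(str(len(f)).encode("ascii") + b":" + f for f in fields)


def decode_fields(data):
    """Decode the output of encode_fields back into a list of byte strings."""
    fields = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)      # raises ValueError if no colon
        digits = data[pos:colon]
        if not digits.isdigit():
            raise ValueError("bad length prefix")
        start = colon + 1
        end = start + int(digits)
        if end > len(data):
            raise ValueError("truncated field")
        # The payload is taken by length, so it may contain spaces,
        # newlines, colons or any other byte without escaping.
        fields.append(data[start:end])
        pos = end
        if pos < len(data):
            if data[pos:pos + 1] != b"\n":
                raise ValueError("expected newline between fields")
            pos += 1
    return fields
```

For example, encode_fields([b"old name", b"new\nname"]) gives b"8:old name\n8:new\nname", and decode_fields returns the original two values. Because each value is read by its length, a space or newline inside a name cannot be mistaken for a separator, and the result still prints reasonably on stdout for ordinary names.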
Thread at a glance:

Previous Message by Date: Re: Re: monotone & CVS import

On Wed, 2003-11-12 at 07:28, graydon hoare wrote:
> historical rename certs will break if we add ' ', but I don't know if
> anyone's using them aside from me so far, and I can repair my own just
> by re-issuing the cert. is anyone else using them yet? would '\n' be a
> good separator for structured data inside such certs? or '\0', just to
> be safe? how about a netstring, len:<len bytes...> ? it'd be nice to
> keep it possible to print the cert value to stdout.

It would be good to support all possible names. Many projects have one
or two files which have strange names, possibly because of the
requirements of some other tool. If the source control system can't
handle some files, it is effectively useless.

What about using escapes (either URL-style %XX or C-style \000) in your
string encoding to cope with tricky characters? Of course you then need
to be careful to define what the hash is of: the quoted version or the
unquoted version? If it's of the quoted version, you need to be sure
that you have a canonical quoted form which is always the result of
quoting. If it's of the unquoted version, then it would be possible to
cause collisions - for example, if you named a file:

  foo.c\n ab87345ba98234b12692ab87345ba98234b12692 interloper.c

you could cause the manifest hash to look as if interloper.c actually
existed. If you make the record delimiter in the manifest \0, then this
isn't a problem.

On an unrelated subject, have you looked at storing file
permissions/type as well? It would be useful if scripts checked into
monotone came out with the x bits set. How about another field in the
manifest?

J

Next Message by Date: Re: Building monotone with Debian (Sarge)

On Wed, Nov 12, 2003 at 07:49:19AM -0800, Kevin Smith wrote:
> I think what messed me up was having build-related instructions spread
> across three different files: README, INSTALL, and the "building" link
> on the monotone website. Perhaps there should be a BUILDING file, which
> is mentioned in the README and INSTALL. The web page would then have a
> copy of this file, rather than a separate document.

I'm all for reducing duplication generally. What is the difference
between INSTALL and BUILDING, though? I think traditionally what you
call BUILDING just goes in INSTALL. However, I don't know if anyone
actually _reads_ INSTALL these days, since most programs now seem to
come with the useless generic version dropped in by autotools. Or maybe
it's only the people who don't need to read INSTALL that find that
generic version useless...

-- Nathaniel

--
"Lull'd in the countless chambers of the brain,
Our thoughts are link'd by many a hidden chain:
Awake but one, and lo! what myriads rise!
Each stamps its image as the other flies"
  -- Ann Ward Radcliffe, The Mysteries of Udolpho

Previous Message by Thread: Re: Re: monotone & CVS import

On Wed, 2003-11-12 at 19:40, graydon hoare wrote:
> that was more my point; as it turns out, I think linux (~2.4) is one
> of the better behaved unices and can handle a large swath of UTF-8
> characters in pathnames. other systems -- or older linuxes -- I don't
> know. I'll have to test. at some point we'll make a portability
> vs. feature tradeoff (VMS? Windows95?) but not before collecting some
> data about which systems can do what.

Pretty much all Unices back to day one allow filenames to contain any
character except '/' and '\0'. They're just treated as binary, so it
doesn't matter to the kernel what the encoding is. Other operating
systems have much more constrained filename rules though. And VMS's
file naming rules are very complex and irregular.

J

Next Message by Thread: Re: Re: monotone & CVS import

Nathaniel Smith <njs@xxxxxxxxx> writes:

> On Wed, Nov 12, 2003 at 10:28:40AM -0500, graydon hoare wrote:
> > space character is an unfortunate special case because some data in
> > monotone is whitespace delimited (manifest entries and historical
> > rename certs). so .. hmm. I think manifests would survive introducing
> > space as an allowed character since they have exactly 40 characters of
> > hex as their first component, and then 2 spaces, then "rest of line" as
> > the path name. can't add '\n', but I don't think many files have that.
>
> sha1sum appears to have interestingly undocumented behaviour here.

wow, that's .. awful.

spaces I will fiddle around and probably support. newlines, tabs,
non-printing characters, probably not. not unless there's a persuasive
i18n reason and a strong, code-by-code analysis of the security effects
on ascii-based filesystems.

> How about <len1>:<len bytes>\n<len2>:<len bytes>? That should be easy
> to parse for both computers and humans, while getting that nice sexpy
> goodness going...

yeah. something simple like that, at least for the rename certs. the
xdelta format I made up works pretty much like that. it helps to keep
it readable, for debugging.

-graydon
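
The manifest layout described in this thread (40 hex characters, two spaces, then the rest of the line as the path) can be sketched in Python to show why spaces in a path are harmless while a newline is not. This is a rough illustration, not monotone's code: parse_manifest, format_manifest and ENTRY are hypothetical names, and the second hash is simply the made-up value from the collision example quoted above.

```python
import re

# One manifest entry: 40 hex characters, two spaces, rest of line = path.
ENTRY = re.compile(r"([0-9a-f]{40})  (.+)")

EMPTY_SHA1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"  # sha1 of empty file
FAKE_SHA1 = "ab87345ba98234b12692ab87345ba98234b12692"   # made-up example


def format_manifest(entries):
    """Write (hash, path) pairs one per line, with no quoting at all."""
    return "".join(f"{sha}  {path}\n" for sha, path in entries)


def parse_manifest(text):
    """Split on newlines and read each line as hash, two spaces, path."""
    entries = []
    for line in text.split("\n"):
        if not line:
            continue
        match = ENTRY.fullmatch(line)
        if match is None:
            raise ValueError(f"malformed manifest line: {line!r}")
        entries.append((match.group(1), match.group(2)))
    return entries


# A path containing spaces survives, because only the first 42
# characters of the line are treated as structure.
spaced = [(EMPTY_SHA1, "my file with spaces.txt")]
assert parse_manifest(format_manifest(spaced)) == spaced

# A path containing a newline does not: the unquoted manifest now
# parses as two entries, the second one entirely attacker-chosen.
evil_path = "foo.c\n" + FAKE_SHA1 + "  interloper.c"
forged = parse_manifest(format_manifest([(EMPTY_SHA1, evil_path)]))
assert forged == [(EMPTY_SHA1, "foo.c"), (FAKE_SHA1, "interloper.c")]
```

The second assertion is the collision scenario raised earlier in the thread: without quoting, length prefixes, or a delimiter that cannot occur in a filename, one file with a crafted name is indistinguishable from two ordinary entries.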