Re: [Boston.pm] Max hash key length: msg#00002

Subject: Re: [Boston.pm] Max hash key length
   From: Aaron Sherman <ajs-Xy/8OPi/Zy8@xxxxxxxxxxxxxxxx>
   Date: Mon, 03 Jan 2005 10:21:32 -0500

   . . .

   And yes, while I usually just trust to the law of probability (which is
   a very strange feature of our universe, if you stop to think about it),

You really think so?  It seems to me that "well-behaved" randomness is
built into the very fabric of our universe, so to speak, where
"well-behaved" means "can be characterized by mathematical analysis."
Can you imagine quantum mechanics without it?  Or evolution?

   when I'm doing something that requires certainty I do not rely on
   rsync's block-for-block checksum strategy. Instead, I do one of two
   things:

        1. I force a full copy of files based on timestamp (costly, of
           course)

This will work (of course).

        2. I add one byte to the end of all files, rsync normally, remove
           the byte and rsync again. This results in odds of correct
           identification of changes that is an order of magnitude better,
           but still not perfect. Again, this is a matter of risk assessment.

Are you sure this does what you think?  It will certainly force the ends
of the files to be retransmitted in detail, but I can't see that it does
anything to avoid hash collisions.  In particular, it doesn't affect the
hash values that will be used for blocks that do not contain the last
byte.  And since you must be adding bytes on the transmitting
side, it doesn't at all affect the hashes computed by the receiver and
sent back to the transmitter (until after the first rsync, of course).
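
To make the point about unaffected blocks concrete, here is a minimal
sketch, not rsync's actual implementation.  It assumes fixed-size blocks
and uses MD5 as a stand-in for rsync's strong per-block checksum; the
block size and the function names are hypothetical, chosen only for
illustration.  It ignores the rolling checksum entirely.

```python
import hashlib

BLOCK_SIZE = 700  # hypothetical block size, for illustration only


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one digest per fixed-size block (MD5 is a stand-in here)."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def differing_blocks(before, after):
    """Indices of blocks whose digests differ or exist in only one file."""
    a = block_digests(before)
    b = block_digests(after)
    return [
        i
        for i in range(max(len(a), len(b)))
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    ]


original = bytes(range(256)) * 10   # 2560 bytes: three full blocks plus a tail
padded = original + b"\x00"         # the "add one byte to the end" trick

print(differing_blocks(original, padded))  # only the final block is listed
```

Every block before the last hashes exactly as it did before the byte was
appended, so a collision involving one of those blocks is just as likely
as it was without the extra byte.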

   Adding or changing one byte per file might possibly reduce the
odds of a hash collision, but only if rsync were to compare checksums
for the entire file before "committing" changes, but I see no mention of
that in the report (http://samba.anu.edu.au/rsync/tech_report/, for
reference).  And if it did that, it seems to me, adding or changing one
byte per file wouldn't improve the matter.

   Or am I misunderstanding something?

                                        -- Bob Rogers
                                           http://rgrjr.dyndns.org/

