checksum problem

On Wed, Jan 31, 2018 at 6:21 AM, Peter Pearson
<pkpearson at nowhere.invalid> wrote:
> On Tue, 30 Jan 2018 11:24:07 +0100, jak <please at nospam.tnx> wrote:
>>      with open(fname, "rb") as fh:
>>          for data in fh.read(m.block_size * blocks):
>>              m.update(data)
>>      return m.hexdigest()
> I believe your "for data in fh.read" loop just reads the first block of
> the file and loops over the bytes in that block (calling m.update once
> for each byte, probably the least efficient approach imaginable),
> omitting the remainder of the file.  That's why you start getting the
> right answer when the first block is big enough to encompass the whole
> file.

Correct analysis.
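
To see the effect in isolation, here is a tiny sketch. It uses io.BytesIO as a stand-in for a real file, and the 10-byte payload and block size of 4 are made up for illustration:

```python
import io

fh = io.BytesIO(b"abcdefghij")  # stand-in for a 10-byte file (illustrative data)
block = fh.read(4)              # a single read() call returns one block: b"abcd"
items = list(block)             # iterating that block walks it one byte at a time
rest = fh.read()                # everything after the first block was never visited
```

Here the loop body would run 4 times, once per byte of the first block, and b"efghij" would never reach the hash. Note that on Python 3 the items are ints, so m.update(data) would raise a TypeError instead of silently producing a wrong digest; on Python 2 they are one-character strings, which matches the behaviour described above.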

Generally, if you want to read a file in chunks, the easiest way is this:

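while "moar data":  # any non-empty string is true, so this is "while True"
    data = fh.read(block_size)
    if not data: break  # read() returns an empty result at end of file
    m.update(data)  # feed the whole chunk to the hash in one call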

That should get you the correct result regardless of your block size,
and then you can tweak the block size to toy with performance.
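
Put together with the hashing from the quoted function, the loop might look something like the sketch below. The function name file_checksum, the algorithm parameter, and the 64 KiB default block size are my own placeholders; the original post doesn't show which hash was being used, so MD5 here is just an assumption.

```python
import hashlib

def file_checksum(fname, algorithm="md5", block_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    The algorithm and block_size defaults are assumptions for illustration.
    """
    m = hashlib.new(algorithm)
    with open(fname, "rb") as fh:
        while True:
            data = fh.read(block_size)
            if not data:    # empty bytes object means end of file
                break
            m.update(data)  # one update() per chunk, not per byte
    return m.hexdigest()
```

The digest should come out the same for any positive block_size, so that number only affects speed and memory use.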