[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Better Regex and exception handling for this small code

On 11Jul2017 22:01, Ganesh Pal <ganesh1pal at gmail.com> wrote:
>I am trying to open a file and check if there is a pattern  has  changed
>after the task got completed?
>file data:
>#tail -f /file.txt
>Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = b6b20a65,
>journal_crc = d2097b00
>Note: Task completed successfully.
>Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = d976d35e,
>journal_crc = a176af10
> I  have the below piece of code  but would like to make this better more
>pythonic , I found regex pattern and exception handling poor here , any
>quick suggestion in your spare time is welcome.
>#open the existing file if the flag is set and check if there is a match

Use "True" instead of "1". A flag is a Boolean thing, and should use a Boolean 
value. This lets you literally speak "true" and 'false" rather than imoplicitly 
saying that "0 means false and nonzero means true".

>data = None

There is no need to initialise data here because you immediately overwrite it 

>with open(log_file, 'r') as f:
>     data = f.readlines()
>if flag_is_on:

Oh yes. Just name this variable "flag". "_is_on" is kind of implicit.

>    logdata = '\n'.join(data)

Do other parts of your programme deal with the file data as lines? If not, 
there is little point to reading the file and breaking it up into lines above, 
then joining them together against here. Just go:

  with open(log_file) as f:
      log_data = f.read()

>    reg = "initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc"

Normally we write regular expressions as "raw" python strings, thus:

    reg = r'initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc'

because backslashes etc are punctuation inside normal strings. Within a "raw" 
string started with r' nothing is special until the closing ' character. This 
makes writing regular expressions more reliable.

Also, why the character range "[\s\S]"? That says whitespace or nonwhitespace 
i.e. any character. If you want any character, just say ".".

>    crc = re.findall(re.compile(reg), logdata)

It is better to compile a regexp just the once, getting a Regexp object, and 
then you just use the compiled object.

>    if not crc:
>        raise Exception("Pattern not found in  logfile")

ValueError would be a more appropriate exception here; plain old "Exception" is 
pretty vague.

>    checksumbefore = crc[0].strip()
>    checksumafter = crc[1].strip()

Your regexp cannot start or end with whitespace. Those .strip calls are not 
doing anything for you.

This reads like you expect there to be exactly 2 matches in the file. What if 
there are more or fewer?

>    logging.info("checksumbefore :%s and  checksumafter:%s"
>                      % (checksumbefore, checksumafter))
>    if checksumbefore == checksumafter:
>       raise Exception("checksum not macthing")

Don't you mean != here?

I wouldn't be raising exceptions in this code. Personally I would make this a 
function that returns True or False. Exceptions are a poor way of returning 
"status" or other values. They're really for "things that should not have 
happened", hence their name.

It looks like you're scanning a log file for multiple lines and wanting to know 
if successive ones change. Why not write a function like this (untested):

  RE_CRC_LINE = re.compile(r'initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc')

  def check_for_crc_changes(logfile):
      old_crc_text = ''
      with open(logfile) as f:
          for line in f:
              m = RE_CRC_LINE.match(line)
              if not m:
                  # uninteresting line
              crc_text = m.group(0)
              if crc_text != old_crc_text:
                  # found a change
                  return True
      if old_crc_text == '':
          # if this is really an error, you might raise this exception
          # but maybe no such lines is just normal but boring
          raise ValueError("no CRC lines seen in logfile %r" % (logfile,))
      # found no changes
      return False

See that there is very little sanity checking. In an exception supporting 
language like Python you can often write code as if it will always succeed by 
using things which will raise exceptions if things go wrong. Then _outside_ the 
function you can catch any exceptions that occur (such as being unable to open 
the log file).

Cameron Simpson <cs at zip.com.au>