Better Regex and exception handling for this small code
On 11Jul2017 22:01, Ganesh Pal <ganesh1pal at gmail.com> wrote:
>I am trying to open a file and check if there is a pattern has changed
>after the task got completed?
>#tail -f /file.txt
>Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = b6b20a65,
>journal_crc = d2097b00
>Note: Task completed successfully.
>Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = d976d35e,
>journal_crc = a176af10
> I have the below piece of code but would like to make this better more
>pythonic , I found regex pattern and exception handling poor here , any
>quick suggestion in your spare time is welcome.
>#open the existing file if the flag is set and check if there is a match
Use "True" instead of "1". A flag is a Boolean thing, and should use a Boolean
value. This lets you literally speak "true" and 'false" rather than imoplicitly
saying that "0 means false and nonzero means true".
>data = None
There is no need to initialise data here because you immediately overwrite it
>with open(log_file, 'r') as f:
> data = f.readlines()
Oh yes. Just name this variable "flag". "_is_on" is kind of implicit.
> logdata = '\n'.join(data)
Do other parts of your programme deal with the file data as lines? If not,
there is little point to reading the file and breaking it up into lines above,
then joining them together against here. Just go:
with open(log_file) as f:
log_data = f.read()
> reg = "initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc"
Normally we write regular expressions as "raw" python strings, thus:
reg = r'initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc'
because backslashes etc are punctuation inside normal strings. Within a "raw"
string started with r' nothing is special until the closing ' character. This
makes writing regular expressions more reliable.
Also, why the character range "[\s\S]"? That says whitespace or nonwhitespace
i.e. any character. If you want any character, just say ".".
> crc = re.findall(re.compile(reg), logdata)
It is better to compile a regexp just the once, getting a Regexp object, and
then you just use the compiled object.
> if not crc:
> raise Exception("Pattern not found in logfile")
ValueError would be a more appropriate exception here; plain old "Exception" is
> checksumbefore = crc.strip()
> checksumafter = crc.strip()
Your regexp cannot start or end with whitespace. Those .strip calls are not
doing anything for you.
This reads like you expect there to be exactly 2 matches in the file. What if
there are more or fewer?
> logging.info("checksumbefore :%s and checksumafter:%s"
> % (checksumbefore, checksumafter))
> if checksumbefore == checksumafter:
> raise Exception("checksum not macthing")
Don't you mean != here?
I wouldn't be raising exceptions in this code. Personally I would make this a
function that returns True or False. Exceptions are a poor way of returning
"status" or other values. They're really for "things that should not have
happened", hence their name.
It looks like you're scanning a log file for multiple lines and wanting to know
if successive ones change. Why not write a function like this (untested):
RE_CRC_LINE = re.compile(r'initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc')
old_crc_text = ''
with open(logfile) as f:
for line in f:
m = RE_CRC_LINE.match(line)
if not m:
# uninteresting line
crc_text = m.group(0)
if crc_text != old_crc_text:
# found a change
if old_crc_text == '':
# if this is really an error, you might raise this exception
# but maybe no such lines is just normal but boring
raise ValueError("no CRC lines seen in logfile %r" % (logfile,))
# found no changes
See that there is very little sanity checking. In an exception supporting
language like Python you can often write code as if it will always succeed by
using things which will raise exceptions if things go wrong. Then _outside_ the
function you can catch any exceptions that occur (such as being unable to open
the log file).
Cameron Simpson <cs at zip.com.au>