


Extract all words between two keywords in .txt file (Python)


On Thursday, 12 December 2019 02:28:09 UTC+8, Ben Bacarisse  wrote:
> A S <aishan0403 at gmail.com> writes:
> 
> > I would like to extract all words within specific keywords in a .txt
> > file. For the keywords, there is a starting keyword of "PROC SQL;" (I
> > need this to be case insensitive) and the ending keyword could be
> > either "RUN;", "quit;" or "QUIT;". This is my sample .txt file.
> >
> > Thus far, this is my code:
> >
> > with open('lan sample text file1.txt') as file:
> >     text = file.read()
> >     regex = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
> >     k = regex.findall(text)
> >     print(k)
> 
> Try
> 
>   re.compile(r'(?si)(PROC SQL;.*(?:QUIT|RUN);)')
> 
> Read up on what (?si) means and what (?:...) means.  You can do the
> same by passing flags to the compile method.
> 
> > Output:
> >
> > [('quit;', ''), ('quit;', ''), ('PROC SQL;', '')]
> 
> Your main issue is that | binds weakly.  Your whole pattern tries to
> match any one of just four short sub-patterns:
> 
> PROC SQL;
> proc sql;(.*?)RUN;
> quit;
> QUIT;
> 
> -- 
> Ben.
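
To make the two points above concrete, here is a minimal sketch. The sample text is made up for illustration and is not the poster's actual file. It shows how the original pattern breaks into independent alternatives, and how the inline (?si) can be written as flags passed to re.compile instead.

```python
import re

# Hypothetical sample text, not the poster's real file.
text = "proc sql;\nselect * from a;\nquit;\n"

# Original pattern: | has the lowest precedence, so the outer group is
# really four independent alternatives, and "quit;" matches on its own.
loose = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
loose_matches = loose.findall(text)

# Suggested pattern, with re.DOTALL | re.IGNORECASE passed as flags
# in place of the inline (?si).
grouped = re.compile(r'(PROC SQL;.*(?:QUIT|RUN);)', re.DOTALL | re.IGNORECASE)
grouped_matches = grouped.findall(text)

print(loose_matches)    # [('quit;', '')]
print(grouped_matches)  # ['proc sql;\nselect * from a;\nquit;']
```

(?:...) groups the QUIT|RUN alternatives without creating a capture group, which is why findall returns plain strings in the second case rather than tuples.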

Hey Ben, this works for my sample .txt file! Thanks :) Strangely enough, though, it won't work on my other text files, which are similar but have some variations.
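
The thread does not say what the variations are, so the following is only a sketch under two guesses: that a file may contain several PROC SQL blocks (where a greedy .* would run from the first start keyword to the last end keyword), and that the spacing around the keywords may differ. The sample text and the name `blocks` are illustrative assumptions.

```python
import re

# Hypothetical text with two blocks and irregular spacing; the real
# files are not shown in the thread.
text = (
    "PROC  SQL;\n  create table t as select * from a;\nQUIT;\n"
    "other statements;\n"
    "proc sql ;\n  select * from b;\nrun;\n"
)

# Non-greedy (.*?) stops at the first end keyword after each start keyword.
# \s+ and \s* tolerate extra whitespace; \b keeps RUN/QUIT from matching
# as the tail of a longer word.
pattern = re.compile(
    r'PROC\s+SQL\s*;(.*?)\b(?:QUIT|RUN)\s*;',
    re.DOTALL | re.IGNORECASE,
)

# Capture only the text between the keywords, one entry per block.
blocks = [m.group(1).strip() for m in pattern.finditer(text)]

print(blocks)
# ['create table t as select * from a;', 'select * from b;']
```

If the keywords themselves should be kept in the result, m.group(0) gives the whole match instead of just the captured middle.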