Parsing Nested List
On Sun, 04 Feb 2018 14:26:10 -0800, Stanley Denman wrote:
> I am trying to parse a Python nested list that is the result of the
> getOutlines() function of module PyPDF2 using pyparsing module.
pyparsing parses strings, not lists.
I fear that you have completely misunderstood what pyparsing does: it
isn't a general-purpose parser of arbitrary Python objects like lists.
Like most parsers (in fact, all the parsers that I know of) it takes text
as input and produces some sort of machine representation.
So your code is not working because you are calling parseString() with a
list, not a string.
The name of the function, parseString(), should have been a hint that it
requires a *string* as argument.
You have generated an outline:
    List = pdfReader.getOutlines()
but do you know what the format of that list is? I'm going to assume that
it looks something like this:
    ['ABCD 01 of 99', 'EFGH 02 of 99', 'IJKL 03 of 99', ...]
since that matches the template you gave to pyparsing. Notice that:
- words are separated by spaces;
- the first word is any arbitrary word, made up of just letters;
- followed by EXACTLY two digits;
- followed by the word "of";
- followed by EXACTLY two digits.
Furthermore, I'm assuming it is a simple, non-nested list. If that is not
the case, you will need to explain precisely what the format of the
outline actually is.
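If the outline does turn out to be nested, one option is to flatten it
before looking at the individual items. The following is only a minimal
sketch: it assumes the nesting is made of ordinary Python lists inside
lists, the helper name flatten() is made up for illustration (it is not
part of PyPDF2 or pyparsing), and the sample data is invented. The real
leaves may not be plain strings, so check what getOutlines() actually
returns first.

```python
def flatten(outline):
    """Yield the leaves of an arbitrarily nested list, depth first."""
    for entry in outline:
        if isinstance(entry, list):
            # A sub-list is descended into recursively.
            yield from flatten(entry)
        else:
            yield entry

# Invented sample data; a real outline may hold non-string objects.
nested = ['ABCD 01 of 99', ['EFGH 02 of 99', ['IJKL 03 of 99']], 'MNOP 04 of 99']
flat = list(flatten(nested))
print(flat)
```

Flattening throws away the information about which entry is a child of
which, so it only makes sense if the hierarchy itself does not matter.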
Parsing this list is simple, and pyparsing is not required:
    for item in List:
        words = item.split()
        if len(words) != 4:
            raise ValueError('bad input data: %r' % item)
        first, number, x, total = words
        number = int(number)
        assert x == 'of'
        total = int(total)
        print(first, number, total)
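The loop above accepts any four whitespace-separated words whose second
and fourth convert to integers; it does not check that the first word is
letters only or that the numbers have exactly two digits. If that
stricter format matters, a regular expression from the standard library
can enforce it. This is a sketch under the same assumption about the
item format; the names ITEM_RE and parse_item() are hypothetical.

```python
import re

# Letters, exactly two digits, the literal word "of", exactly two digits,
# each separated by whitespace.
ITEM_RE = re.compile(r'([A-Za-z]+)\s+([0-9]{2})\s+of\s+([0-9]{2})')

def parse_item(item):
    """Return (word, number, total), or raise ValueError on bad input."""
    match = ITEM_RE.fullmatch(item)
    if match is None:
        raise ValueError('bad input data: %r' % item)
    word, number, total = match.groups()
    return word, int(number), int(total)

print(parse_item('ABCD 01 of 99'))
```

Unlike split(), fullmatch() also rejects leading or trailing whitespace,
so call item.strip() first if the data may contain it.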
Hope this helps.
(Please keep any replies on the list.)