"Data blocks" syntax specification draft


> -----Original Message-----
> 
> I think it would be appropriate to propose an alternative to TQS for this
> specific purpose, namely making it easier to implement parsers and
> embedded syntaxes.
> 
> So what do I have now with triple quoted strings - a simple example:
> 
> if 1:
>     s = """\
>     print ("\n") \\
>         foo = 5
>     """
> 
> So there is a _possibility_, in the sense that it is possible to do. Let's say I have a
> lib with a parser, etc. Even so, a developer and a user will face quite real
> issues:
> 
> - TQS itself already has its specific purpose in many contexts,
>   which may mean, for example, hard-coded syntax highlighting
> - there are a lot of things happening here: e.g. in the above example
>   I use "\n", which I assume is part of the string, or \\ - but both are interpreted.
>   Maybe some other things regarding escaping, too. This particular
>   issue may be a blocker for making use of TQS in some data cases,
>   say, if the target source text needs these very characters.
> 

Yup, I can see this. I do use """ in a number of ways, often to comment out large chunks of code. (OK, I probably should not, but I do.)
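
To make the escaping point concrete, here is a small check of what the quoted example actually stores. The raw-string variant is my own addition for comparison and is not part of the proposal.

```python
# Regular triple-quoted string: escape sequences are processed.
s = """\
    print ("\n") \\
        foo = 5
    """

# Raw triple-quoted string (added for comparison): backslashes stay as typed.
raw = r"""
    print ("\n") \\
        foo = 5
    """

# In s, backslash + n became one real newline and the double
# backslash collapsed to a single backslash. In raw, both survive.
print(repr(s))
print(repr(raw))
```

A raw string sidesteps the escaping, but it does nothing about the indentation or the closing quotes.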

> - indentation is part of the TQS. That is of course by design
>   and it's quite logical, though it is hard-coded behaviour and thus
>   does not make the presentation a natural part of the block containing
>   this string.
> - appearance: imagine you have some small chunks of embedded
>   code parts and you will still have the closing """ everywhere -
>   that would be really hairy.
> 
> 

And yup, that does cause some challenges sometimes.

> 
> Explanation:
> [Here I'll use the same symbol /// for the data entry point, but of course it can be
> changed if a better idea comes later. Also, for now, just for simplicity, the rule
> is that the contents of a block always start on a new line.]
> 
> So, e.g. this:
> 
> data = /// s4
>     first line
>     last line
> the rest python code
> 
> - will parse the block and knock out leading 4 spaces.
> i.e. if the first line has 5 leading spaces then 1 space will be left in the string.
> Block parsing terminates when the next line does not satisfy the indent
> sequence (4 spaces in this case).
> Another obvious type: tabs:

OK, I CAN see this as a potentially useful suggestion. There are a number of times where I would like to define a large chunk of text, but using a TQS and having it suddenly move to the left is painful visually. Right now, I tend to either a) do it anyway, b) do it in a separate module and import the variables, or c) do it and parse the string to remove the extra spaces.
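
For reference, option c) can be done with the standard library's textwrap.dedent. A minimal sketch; the function name build_message and the sample text are made up for illustration.

```python
import textwrap

def build_message():
    # The text is indented to line up with the surrounding code ...
    text = """\
        first line
          second line, two extra spaces
        last line
        """
    # ... and dedent() removes the leading whitespace common to all lines,
    # so relative indentation inside the text is preserved.
    return textwrap.dedent(text)

print(build_message())
```

This keeps the source readable, at the cost of a function call and the backslash after the opening quotes.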

Personally though, I would not hard code it to knock out 4 leading spaces. I would have it handle spaces the same way that the existing parser does: if there are 4 spaces indenting the next line, then it removes 4 spaces, if there are 6 spaces, it removes 6 spaces, etc., ignoring additional spaces within the data-string object. Once it hits a line that has the same number of indenting spaces as the initial token, the data-string object is finished.
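
A rough sketch of that rule, working on a list of source lines. The names leading_spaces and read_inferred_block are hypothetical, and the handling of blank lines and of lines indented less than the first block line are my own assumptions; the rule as described does not settle them.

```python
def leading_spaces(line):
    """Count the space characters at the start of a line."""
    return len(line) - len(line.lstrip(" "))

def read_inferred_block(lines, start):
    """Collect the data block that follows lines[start].

    Assumptions: only spaces are used for indentation, blank lines
    inside the block are kept, and the block ends at the first
    non-blank line indented no deeper than lines[start].
    """
    base = leading_spaces(lines[start])
    strip = None          # indent of the first block line, once seen
    body = []
    for line in lines[start + 1:]:
        if line.strip() == "":
            body.append("")
            continue
        indent = leading_spaces(line)
        if indent <= base:
            break         # back at the level of the initial token
        if strip is None:
            strip = indent
        # Remove the inferred indent; extra spaces stay in the data.
        body.append(line[min(strip, indent):])
    while body and body[-1] == "":
        body.pop()        # trailing blank lines are not part of the block
    return "\n".join(body)

source = [
    "if 1:",
    "    data = ///",
    "        first line",
    "          indented more",
    "        last line",
    "    print(data)",
]
print(read_inferred_block(source, 1))
```

Here the block's first line has 8 leading spaces, so 8 are removed from every line, and the block ends at the print line, which is back at the indentation of the line holding the token.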

> 
> data = /// t1
>     first line
>     last line
> the rest python code
> 
> Will do the same but with one tabstop character.
> 

Tabs / spaces should be handled as normal up to where the data-string object starts; after that, it pulls off the first x tabs or spaces and leaves anything else.
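
The quoted "/// s4" and "/// t1" forms could be sketched roughly as follows. The names indent_prefix and read_spec_block are hypothetical, and I am assuming the prefix is counted from column 0 and that a line ends the block as soon as it does not start with the full prefix.

```python
import re

def indent_prefix(spec):
    """Turn a spec such as 's4' or 't1' into the literal prefix to strip."""
    match = re.fullmatch(r"([st])([1-9]\d*)", spec)
    if match is None:
        raise ValueError(f"unsupported indent spec: {spec!r}")
    char = " " if match.group(1) == "s" else "\t"
    return char * int(match.group(2))

def read_spec_block(lines, start, spec):
    """Collect the lines after lines[start] that begin with the prefix."""
    prefix = indent_prefix(spec)
    body = []
    for line in lines[start + 1:]:
        if not line.startswith(prefix):
            break  # first line that does not satisfy the indent sequence
        body.append(line[len(prefix):])
    return "\n".join(body)

spaces = [
    "data = /// s4",
    "    first line",
    "     last line",   # 5 spaces: one is left in the string
    "the rest python code",
]
tabs = [
    "data = /// t1",
    "\tfirst line",
    "\tlast line",
    "the rest python code",
]
print(repr(read_spec_block(spaces, 0, "s4")))
print(repr(read_spec_block(tabs, 0, "t1")))
```

Because the prefix is matched character for character, a tab never satisfies a space spec and vice versa.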

> Actually that's it!
> Some further ideas:
> 
> data = /// ts
> - "any whitespace" (mimic current Python behaviour)
> 
> data = /// s        # or
> data = /// t
> - simply count the number of spaces (tabs) on the first
>   line and proceed, otherwise terminate.
> 
> data = /// "???"
> ??? abc foo bar
> ???
> 
> - defines indent character by string: crazy idea but why not.
> 

Nope, don't like this one... It's far enough from normal Python that it seems unlikely to get through, and (personally at least) I struggle to see the benefit.

> Language  parameter, e.g.:
> data = /// t1."yaml"
> 
> -this can be reserved for future usage by code analysis tools or dynamic
> syntax highlighting.
> 

I can see where this might be interesting, but again, I just don't see the need: if the spec returns a string, you can use that string in any parser you want. If you want to customize how it's handled, then you can always create a custom object for it.
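
As an illustration of "use that string in any parser you want", here is a sketch that hands an indented block of text to the standard json module. JSON is only an example I picked, and textwrap.dedent stands in for whatever the new syntax would do.

```python
import json
import textwrap

# The block is an ordinary string; nothing about it is tied to JSON.
block = textwrap.dedent("""\
    {
        "name": "example",
        "values": [1, 2, 3]
    }
    """)

# Which parser to apply is decided by the code that uses the string.
config = json.loads(block)
print(config["values"])
```

The language tag would tell a highlighter what is inside the block, but the program itself does not need it.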

> That's just a rough specification.
> 
> What should it give as result:
> 

To me, this seems like simply an additional specification for a TQS; the only enhancement is that it's basically an indented TQS, so the return is a string.

> 1. No clash with current TQS rules - less worries
>   about reserved characters.
> 
> 2. Built-in indentation parsing parameter makes it more or
>   less natural continuation of Python blocks and is char-precise,
>   which is very important here.
> 
> 3. Independent of the indent of containing block!
> 
> 4. Parameter descriptor can be developed in such manner
>    that it allows more customisation and additions in the future.
> 

I would not argue about this being in the spec, but it seems like unneeded complexity.

> 
> Does seem to be more generalized problem-solving here.
>