[RFC] Loosening heredoc/nowdoc scanner

10 years ago by Stas Malyshev

Hi!

> I would like to propose a few changes to our heredoc / nowdoc scanner to make it less awkward to use inside other constructs.
>
> https://wiki.php.net/rfc/heredoc-scanner-loosening
>
> Let me know your thoughts :)

With this proposal, you will not be able to use the delimiter inside the
text at the beginning of the line, which is a BC break and may be a
problem for some code. I'm not sure saving one variable assignment, at
the expense of making scripts less readable and breaking BC, is really
worth it. How often do you need an array of heredocs or to concatenate two
heredocs? How often is it a good idea, readability-wise?
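
For readers who want to see the trade-off concretely, here is a minimal sketch of one common workaround under the current scanner rules, where the closing identifier may only be followed by an optional semicolon and a newline. The variable names `$header`, `$footer`, `$parts`, and `$page` are made up for illustration and do not come from the RFC.

```php
<?php
// Each heredoc is assigned to its own variable, because the closing
// identifier may only be followed by an optional semicolon and a newline.
$header = <<<EOT
Header text
EOT;

$footer = <<<EOT
Footer text
EOT;

// The variables are then combined in a separate statement.
$parts = [$header, $footer];
$page  = $header . "\n" . $footer;
```

The extra assignments are the cost under discussion; whether removing them is worth a scanner change is exactly the question raised here.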

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Tjerk Meesters

Hi Stas,

> Hi!
>
>> I would like to propose a few changes to our heredoc / nowdoc scanner to make it less awkward to use inside other constructs.
>>
>> https://wiki.php.net/rfc/heredoc-scanner-loosening
>>
>> Let me know your thoughts :)
>
> With this proposal, you will not be able to use the delimiter inside the
> text at the beginning of the line, which is a BC break and may be a
> problem for some code.

Yes, that’s also mentioned in the RFC; although it would happen more easily when restrictions are completely removed, there’s a slim chance it would happen with the limited set of valid terminators. That said, I would like to quote from the RFC itself:

[…] it should be noted that the developer is in complete control of choosing the name for their enclosures; it's important to choose an enclosure that doesn't occur naturally inside the quotation.

> I'm not sure saving one variable assignment, at
> the expense of making scripts less readable and breaking BC, is really
> worth it. How often do you need an array of heredocs or to concatenate two
> heredocs?

There’s no real objective measure with which I can answer such questions :)

The closest I could come to a rebuttal is if there’s no real need to make the syntax so restrictive, why not make it less restrictive?

> How often is it a good idea, readability-wise?
>
> --
> Stanislav Malyshev, Software Architect
> SugarCRM: http://www.sugarcrm.com/

10 years ago by johannes@schlueters.de

> The closest I could come to a rebuttal is if there’s no real need to
> make the syntax so restrictive, why not make it less restrictive?

This is a good criterion when adding new syntax. A change to existing
syntax features needs further arguments. Mind that such changes not only
impact BC for existing scripts, which in this case can be detected
easily, but also have to be reflected in tools like IDEs, which have to
highlight the code correctly based on the PHP version. Developers also
have to learn this: long heredocs need to be read with more attention,
as the terminator might be harder to spot, and developers have to be
aware that code using the new form won't work anymore on older
platforms. This is clearer with distinct new features.

Overall I'm +0.25 on this change - the current limitation annoyed me
often enough, but the cost of such a change (see above) is too high
compared to the tiny win.

johannes

10 years ago by Stas Malyshev

Hi!

> There’s no real objective measure with which I can answer such
> questions :)
>
> The closest I could come to a rebuttal is if there’s no real need to
> make the syntax so restrictive, why not make it less restrictive?

"Why not" is usually not a very good reason for a change in the language
syntax. There is, however, a reason why it is restrictive - so that
there would be less chance for the end tag to collide with the actual
text being heredoc'ed, and so that the end of the text would be clearly
demarcated (since the text itself, being taken verbatim, can not be
properly indented/delimited within the text).

My belief is that a change should have a positive value of "changing
something for the better minus changing something for the worse", and so
far I'm not convinced that this change has it, especially given the BC
break potential.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Tjerk Meesters

> Hi!
>
>> There’s no real objective measure with which I can answer such
>> questions :)
>>
>> The closest I could come to a rebuttal is if there’s no real need to
>> make the syntax so restrictive, why not make it less restrictive?
>
> "Why not" is usually not a very good reason for a change in the language
> syntax. There is, however, a reason why it is restrictive - so that
> there would be less chance for the end tag to collide with the actual
> text being heredoc'ed, and so that the end of the text would be clearly
> demarcated (since the text itself, being taken verbatim, can not be
> properly indented/delimited within the text).

I agree, but I can see how this argument is going in circles; my point (and that of Nikita) is that if the enclosure naturally occurs within the quotation, it’s a bad enclosure and a better one should be picked. The rule of requiring a newline directly following the closing delimiter is more of a hindrance than it is helpful imo.

> My belief is that a change should have a positive value of "changing
> something for the better minus changing something for the worse", and so
> far I'm not convinced that this change has it, especially given the BC
> break potential.
>
> Stanislav Malyshev, Software Architect
> SugarCRM: http://www.sugarcrm.com/

10 years ago by Nikita Popov

On Sat, Aug 30, 2014 at 2:33 AM, Tjerk Meesters <tjerk.meesters@gmail.com> wrote:

> Hi internals,
>
> I would like to propose a few changes to our heredoc / nowdoc scanner to
> make it less awkward to use inside other constructs.
>
> https://wiki.php.net/rfc/heredoc-scanner-loosening
>
> Let me know your thoughts :)

+1 on removing heredoc/nowdoc restrictions. Having to add a newline after
the terminator is very clumsy. We acknowledge this fact by special casing
the semicolon, but don't follow through by just removing the requirement.

If we allow people to choose their own terminator, they can damn well
choose one that causes no conflicts. I mean, what else is the point of this
feature?

On a technical note, if you go for not removing the restriction entirely,
then all characters that are not valid heredoc labels should be allowed
afterwards. E.g. your current list misses ) for use in a function call and
likely any other number of characters.
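
The rule suggested in the technical note above can be sketched roughly as follows. This is only an illustration in userland PHP, not the scanner's actual implementation; `closesHeredoc` is a hypothetical helper name, and the character class is taken from the manual's definition of a label (letters, digits, underscore, and bytes 0x80-0xff).

```php
<?php
// Hypothetical helper: does $line end a heredoc/nowdoc opened with $label,
// assuming any byte that cannot be part of a label may follow the identifier?
function closesHeredoc($line, $label)
{
    // The line must start with the identifier itself (no indentation).
    if (strncmp($line, $label, strlen($label)) !== 0) {
        return false;
    }
    // The cast covers older PHP versions, where substr() can return false.
    $rest = (string) substr($line, strlen($label));
    // End of input right after the identifier also terminates the string.
    if ($rest === '') {
        return true;
    }
    // A following label byte means this is a longer word, not the terminator.
    return preg_match('/^[a-zA-Z0-9_\x80-\xff]/', $rest) !== 1;
}
```

Under this sketch a line such as `EOT)` or `EOT, 'x']` ends the string, while `EOTX` or `EOT_1` does not, because the identifier continues as a longer label.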

Nikita

10 years ago by Crypto Compress

Hi,

Manual: "Also, the identifier must follow the same naming rules as any
other label in PHP: it must contain only alphanumeric characters and
underscores, and must start with a non-digit character or underscore."

RFC: "Ends a quotation when the closing identifier is followed by a
newline or any of these characters: space, tab, period (concat), comma,
semicolon, closing parentheses, closing square bracket (arrays) and null
byte (end of file)."

Is the "Loosened restrictions" list equal to !(alphanumeric characters and
underscores), i.e. the complement of what the manual describes for a "label"?
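
One way to check this is to enumerate the ASCII range and compare the two sets. The sketch below is limited to ASCII and reads "newline" in the RFC text as `"\n"` only, which is an assumption; the names `$rfcList` and `$notInList` are made up for the example.

```php
<?php
// Characters the RFC text quoted above lists as valid after the closing
// identifier: newline, space, tab, period, comma, semicolon, ")", "]", NUL.
$rfcList = ["\n", ' ', "\t", '.', ',', ';', ')', ']', "\0"];

// Collect ASCII bytes that cannot appear in a label (per the manual's
// wording) but are also missing from the RFC list.
$notInList = [];
for ($byte = 0; $byte < 128; $byte++) {
    $char = chr($byte);
    $isLabelChar = preg_match('/[a-zA-Z0-9_]/', $char) === 1;
    if (!$isLabelChar && !in_array($char, $rfcList, true)) {
        $notInList[] = $char;
    }
}
```

Under these assumptions `$notInList` is not empty (it contains characters such as `}`, `:` and `+`), so the quoted list is a subset of the non-label characters rather than the full complement.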

cryptocompress

10 years ago by Robert Williams

If the syntax of heredocs/nowdocs is to be loosened, the biggest aspect I’d like to see addressed is indentation. I can certainly see how it would be nice to loosen the restrictions around the post-closing-token newline to allow easier use in places like array definitions, but, I’ve never run into that problem myself. I have, however, been annoyed by the indentation limitations, so much so that I could probably count with my fingers the number of times that I’ve used the construct in the couple million lines of PHP I’ve written — even though I want to use it about once a week. I just find the side-effect on code formatting, when used in a container structure (class, function, method, loop, whatever), to be more than I can handle. Look at this simple example from the docs to see what I mean:

http://php.net/manual/en/language.types.string.php#example-89

One of the key benefits of consistent indentation is that it allows very rapid visual navigation of code by nearly eliminating the need for reading until you’ve zoned in on the right section of code. To illustrate this very well, just try to visually identify the code structure in this old-school BASIC code:

http://www.atariarchives.org/basicgames/showpage.php?page=3

That this restriction warrants italicized text inside a pink warning box in the docs suggests both that lots of people bump into it and that it runs counter to the expectations of the language.

Now, perhaps I’ve not felt constrained by the newline restriction simply because I so rarely get to use the construct at all because of the indentation restriction. That’s actually probably the case. So that idea is important to address, but I think it’s pointless to address without also addressing indentation.

What if we could do something like this:

function foo() {
    $string = <<<
        THEEND
        This is the document text. Any
        whitespace appearing in a column
        that’s before the starting token
        is automatically ignored for all
        lines.
        THEEND;
}

A few particulars:

If the starting token appears immediately adjacent to the <<< sequence, then parsing is done according to existing rules (perhaps with the changes in the RFC). This both maintains BC and avoids issues where more or less whitespace is ignored when, for example, the variable is renamed.
Extending the original proposal, the closing token can be indented without concern. Align it with the starting token, with the assignment line, whatever.
The closing token could not appear within the text.
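
The whitespace handling described in these particulars can be approximated in userland today. The following is a rough sketch only: `stripLeadingColumns` is a hypothetical helper, and treating a tab as a single column is an assumption the proposal does not spell out.

```php
<?php
// Hypothetical helper: remove up to $column leading spaces/tabs from each
// line, where $column is the column at which the starting token appeared.
function stripLeadingColumns($body, $column)
{
    $lines = explode("\n", $body);
    foreach ($lines as $i => $line) {
        // Count leading whitespace, capped at $column so that deeper
        // indentation inside the text itself is preserved.
        $strip = min(strspn($line, " \t"), $column);
        $lines[$i] = (string) substr($line, $strip);
    }
    return implode("\n", $lines);
}

// Body lines as they would appear when indented to match the code.
$raw = "        This is the document text.\n"
     . "            An indented line.\n"
     . "        Last line.";
$text = stripLeadingColumns($raw, 8);
```

Here the first eight columns are dropped from every line, so the second line keeps its extra four spaces of indentation relative to the others.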

I’ve not given this solution deep thought, so I’m sure there are problems I’m missing. But if there’s a good solution to the indentation restrictions, I think it would be a huge win. And with the AST-based parser, there may be solutions that are possible now that were previously impossible, which makes this a good time to reconsider the problem.

--
Bob Williams
SVP, Software Development
Newtek Business Services, Inc.
“The Small Business Authority”
http://www.thesba.com/

