Subject: Re: [PHP-DEV] Request for opinions: bug vs feature - change in tokenization of yield from
From: Juliette Reinders Folmer
Date: Sat, 20 Jul 2024 17:42:56 +0200

On 20-7-2024 16:51, Tim Düsterhus wrote:

> On 7/19/24 07:22, Juliette Reinders Folmer wrote:
>> More than anything, I find it concerning that this change sets a
>> precedent for tokens to include comments.
>>
>> Just as an example: what does this mean for the PHP 8.0 nullsafe object
>> operator? Should we now suddenly allow that to be written as
>> `? /*comment*/ ->`?
>> Or what about a cast token? Should that be allowed to be
>> `(string /*for reasons*/)`?
>
> The difference between `yield from` and `?->` is that the former looks and feels like it would be two separate keywords, because of the *required* whitespace between the `yield` and the `from`. The fact that a `yield` keyword actually exists also contributes to that. `?->` on the other hand looks and feels like a single operator, just like `++`, `!==`, `<=>` and others.
>
> Except for `yield from` the rule where comments may be placed as far as I can tell is "comments may appear where whitespace may appear", which is easy enough to explain and understand. So it makes sense to allow for comments between `yield` and `from`, but I agree that ideally those would be emitted as separate tokens.


Tim, "comments may appear where whitespace may appear"?

You'd think so, except it isn't true.

I already mentioned cast tokens before. Whitespace is perfectly acceptable within the parentheses of these. Comments are not.

Now you may argue that cast tokens "feel like" a single operator, but that's subjective. There's even a sniff to enforce no spacing within cast parentheses, as apparently people do pad them with spaces, and doing so is allowed in PHP. *
_* Qualifier: spaces and tabs are allowed inside cast parentheses, but new lines are not..._
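
A minimal sketch of the cast-token behaviour described above, using PHP's built-in `token_get_all()` and `token_name()`. The helper name `castTokenNames()` is made up for this illustration and is not part of any library.

```php
<?php
// Illustrative helper (hypothetical name): token names for a snippet of PHP code.
function castTokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        // Single-character tokens such as "(" come back as plain strings.
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

// Spaces (or tabs) inside the parentheses: still one cast token.
echo implode(' ', castTokenNames('( string ) $a;')), PHP_EOL;

// A comment inside the parentheses: no cast token, just separate tokens.
echo implode(' ', castTokenNames('(string /*for reasons*/) $a;')), PHP_EOL;

// A new line inside the parentheses: no cast token either.
echo implode(' ', castTokenNames("(\nstring) \$a;")), PHP_EOL;
```

Only the first snippet should contain `T_STRING_CAST`; in the other two, the opening parenthesis, the word `string` and the closing parenthesis are reported as separate tokens.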

Along the same lines (and I'm beginning to repeat myself): the PHP 8.0 RFC which changed the tokenization of namespaced names explicitly disallowed whitespace and comments _inside_ namespaced names tokenized as a single token, whereas in the previous, multi-token situation, whitespace and comments were allowed in namespaced names.
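
To illustrate the namespaced-name point, here is a small sketch which assumes PHP 8.0 or higher (older versions do not have the single-token names). The helper name `nameTokenNames()` is hypothetical. Note that this only tokenizes the snippets; it says nothing about whether they would compile.

```php
<?php
// Illustrative helper (hypothetical name): token names, ignoring the open tag.
function nameTokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return array_slice($names, 1); // drop T_OPEN_TAG
}

// No whitespace: a single token on PHP 8.0+.
echo implode(' ', nameTokenNames('Foo\Bar')), PHP_EOL;

// Whitespace around the separator: separate tokens again.
echo implode(' ', nameTokenNames('Foo \ Bar')), PHP_EOL;

// A comment inside the name: separate tokens, including a comment token.
echo implode(' ', nameTokenNames('Foo/*comment*/\Bar')), PHP_EOL;
```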

So to get back to my original point: as of PHP 8.3, `yield from` is the **only** token which allows for a comment to be tokenized as part of the token. There is no other token which allows that.
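
For anyone who wants to check this on their own PHP version, a rough sketch follows. The helper name `commentedYieldTokens()` and the generator names in the snippet are invented for the example. The output is deliberately not listed here, as the tokenization of this snippet depends on the PHP version it is run on.

```php
<?php
// Illustrative helper (hypothetical name): [token name, token text] pairs.
function commentedYieldTokens(string $code): array
{
    $pairs = [];
    foreach (token_get_all($code) as $token) {
        $pairs[] = is_array($token)
            ? [token_name($token[0]), $token[1]]
            : [$token, $token];
    }
    return $pairs;
}

$code = '<?php function g() { yield /* comment */ from inner(); }';

// Print each token. Look at whether the comment shows up as its own
// T_COMMENT token or inside the text of a T_YIELD_FROM token.
foreach (commentedYieldTokens($code) as [$name, $text]) {
    printf("%-20s %s\n", $name, var_export($text, true));
}
```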

Whitespace is one thing, comments are a different matter. And even the whitespace is an interesting one, as I've seen bug reports in PHPCS about a sniff breaking on `yield from` with a new line and indentation between the keywords. PHP allows this; the sniff in question does not handle it correctly.
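
A short sketch of the new line case, which is the kind of input a sniff needs to be prepared for. The helper name `yieldFromTokenTexts()` and the generator names are made up for the example.

```php
<?php
// Illustrative helper (hypothetical name): the text of every T_YIELD_FROM token.
function yieldFromTokenTexts(string $code): array
{
    $texts = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        if (is_array($token) && $token[0] === T_YIELD_FROM) {
            $texts[] = $token[1];
        }
    }
    return $texts;
}

// New line plus indentation between the two keywords.
$code = "function g() {\n    yield\n        from inner();\n}";

// The new line and the indentation are part of the token text itself.
var_dump(yieldFromTokenTexts($code));
```

Code which compares the token content against the literal string `yield from` will miss this variant, so normalizing the whitespace first is the safer choice.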

So, what "feels" natural (whitespace-wise) to one person may not be the same for the next, but comments _within_ tokens are a different thing and should, in my opinion, not be allowed.

Smile,
Juliette

