Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:124505
X-Original-To: internals@lists.php.net
Delivered-To: internals@lists.php.net
Received: from php-smtp4.php.net (php-smtp4.php.net [45.112.84.5])
	by qa.php.net (Postfix) with ESMTPS id A88D21A00B7
	for <internals@lists.php.net>; Fri, 19 Jul 2024 05:22:51 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=php.net; s=mail;
	t=1721366662; bh=PMFwzwU64fvicVx6N6IbhpeKNKnUmrjsk8Xh/eKjpY4=;
	h=Subject:To:References:From:Date:In-Reply-To:From;
	b=ZI3GlhlKR9opI0TBvn9o7beI2W5ibRgMJSrGtZisfQwi5+A9RjalpXbdSrQYTzDJR
	 XjxB5t+CAV3fix02dt3y/Ux9MSsh9XyCz53iKgGRiSyB69DHxHOBZD/OGhiOq+KQJF
	 Hcp2GcvA4UbP7l2h4SmiFF/X/AWCxVIDjQciHprWq1VhMLzx59GRaVBDzePIFZsUE+
	 FgCnz1QRVM7ad3AsTUR711hjYHZJdOl24HHRbPZ+0XTxWD8n7rXmchIaFIfZ765GAP
	 6vQMMO/IabzmtrxJiGCYU9ltziOkHptdYQZb5VXG9VHxkBcIsAbcnWugdGOZBdfWX5
	 6RP9SZLZDB/uw==
Received: from php-smtp4.php.net (localhost [127.0.0.1])
	by php-smtp4.php.net (Postfix) with ESMTP id 22ACB180039
	for <internals@lists.php.net>; Fri, 19 Jul 2024 05:24:21 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on php-smtp4.php.net
X-Spam-Level: **
X-Spam-Status: No, score=2.4 required=5.0 tests=ARC_SIGNED,ARC_VALID,BAYES_50,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,DMARC_MISSING,
	HTML_MESSAGE,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,
	SPF_HELO_NONE,SPF_SOFTFAIL autolearn=no autolearn_force=no
	version=4.0.0
X-Spam-Virus: No
X-Envelope-From: <php-internals_nospam@adviesenzo.nl>
Received: from aye.elm.relay.mailchannels.net (aye.elm.relay.mailchannels.net [23.83.212.6])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by php-smtp4.php.net (Postfix) with ESMTPS
	for <internals@lists.php.net>; Fri, 19 Jul 2024 05:24:20 +0000 (UTC)
X-Sender-Id: a2hosting|x-authuser|juliette@adviesenzo.nl
Received: from relay.mailchannels.net (localhost [127.0.0.1])
	by relay.mailchannels.net (Postfix) with ESMTP id 48FB490593D
	for <internals@lists.php.net>; Fri, 19 Jul 2024 05:22:48 +0000 (UTC)
Received: from nl1-ss105.a2hosting.com (unknown [127.0.0.6])
	(Authenticated sender: a2hosting)
	by relay.mailchannels.net (Postfix) with ESMTPA id 5EC10901FBB
	for <internals@lists.php.net>; Fri, 19 Jul 2024 05:22:47 +0000 (UTC)
ARC-Seal: i=1; s=arc-2022; d=mailchannels.net; t=1721366567; a=rsa-sha256;
	cv=none;
	b=jAas0SVmo1pi7id3EzIhZW/r+YUTqTwmFTIx2fR2JlhXamPIADzbsOki7MnqMf/nI+dfL7
	71VQG1hpdiFqa0P9zN2eXon9U1138iL5yGnzfl9cAVmT35Etl7ntYm+GklAb02HnSpd4Yp
	kGQYP/e15vnmKucPRBCvIVdfwObaQus316g65uO7vaXwUhhkCxe47p3vrawRgt1vPyE8B5
	lFDHNM1DNjFUvYj0kk+d1foVgXoaoYEGvIpLSl0esDfixSpdhSkdls5ejESd/7U7X2akRn
	/iIwepZ5LxHe2zokxP3mmNoW9KkP1ZvhY63koqgTEjiKx4uytoZyCUi7pdZVNw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
 d=mailchannels.net;
	s=arc-2022; t=1721366567;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=UC5+1RZ0kD+8iq/HmTXd5H6yeeChBJMNfCXEB26aou0=;
	b=kcHAL8UBT0rgABNLobo7O9dLSSAKPTOnRjbSZTYT3qiW3/U0ZA/+Rim68Y3jbuZECb7ayC
	6s3dlgMTnuWEq81X56ZsQQOCI/iAbcxQQfJjD+zbF+vxgG0aJ1UchJEBhe+P+3X+/QnT8O
	lrbp5qexPkks+QPJIc0DDgHO5vGa6oK964iySOQj0/t+2W8SWA4fVj+JjtoC82oJT0Wuyc
	wkrmln49qmlBjnowSyRA5WKh5qQJiXLigu+oKa/25QtNjt31vXmQW/tko9wM1ExhlDggmm
	KaydsT/bxTjkO8Vb0WWY7SMiEURden62VMs1DSWNesVEPrz16Y122fAr9rKzsQ==
ARC-Authentication-Results: i=1;
	rspamd-5d9c874f6d-pnjhc;
	auth=pass smtp.auth=a2hosting
 smtp.mailfrom=php-internals_nospam@adviesenzo.nl
X-Sender-Id: a2hosting|x-authuser|juliette@adviesenzo.nl
X-MC-Relay: Neutral
X-MailChannels-SenderId: a2hosting|x-authuser|juliette@adviesenzo.nl
X-MailChannels-Auth-Id: a2hosting
X-Squirrel-Eyes: 470e881d7a96c532_1721366567922_2626616023
X-MC-Loop-Signature: 1721366567921:2044219366
X-MC-Ingress-Time: 1721366567921
Received: from nl1-ss105.a2hosting.com (nl1-ss105.a2hosting.com
 [85.187.142.69])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384)
	by 100.123.133.76 (trex/7.0.2);
	Fri, 19 Jul 2024 05:22:47 +0000
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=adviesenzo.nl; s=default; h=Content-Type:In-Reply-To:MIME-Version:Date:
	Message-ID:From:References:To:Subject:Sender:Reply-To:Cc:
	Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:
	Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:
	List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
	bh=UC5+1RZ0kD+8iq/HmTXd5H6yeeChBJMNfCXEB26aou0=; b=SAQBotLoClHoqBACSBdUv5Wg2w
	rxKPo9BJWXOgjfDOUynXjUe5uJy1YoKaSHTFdtYwTHuDW+f9z1mtkKx1ZuBu7Hzlmflkd9yQlNsTF
	X8XQXDMfIwkYRXOREbTe44sGH+iv7zGeIMnCkr7YDxjNWBeKjGaRrVnEAklMD79bSZ5M=;
Received: from mailnull by nl1-ss105.a2hosting.com with spam-scanner (Exim 4.97.1)
	(envelope-from <php-internals_nospam@adviesenzo.nl>)
	id 1sUg4r-00000005OHT-2dUY
	for internals@lists.php.net;
	Fri, 19 Jul 2024 07:22:45 +0200
X-ImunifyEmail-Filter-Info: UkNWRF9WSUFfU01UUF9BVVRIIFJDVkRfVExTX0FMTCBWRVJJ
	TE9DS19
		DQiBSQ1ZEX0NPVU5UX09ORSBCQVlFU19IQU0gTUlNRV9VTktOT1dOIE
		FSQ19OQSBNSURfUkhTX01BVENIX0ZST00gSUVfVkxfUEJMX0FDQ09VT
		lRfMDUgTUlNRV9UUkFDRSBGUk9NX0VRX0VOVkZST00gRlJPTV9IQVNf
		RE4gVE9fRE5fTk9ORSBSQ1BUX0NPVU5UX09ORSBJRV9WTF9QQkxfQUN
		DT1VOVF8wMSBUT19NQVRDSF9FTlZSQ1BUX0FMTCBfRFJVR1NfTU1fRE
		lTQ09VTlQgQVNO
X-ImunifyEmail-Filter-Action: no action
X-ImunifyEmail-Filter-Score: 0.87
X-ImunifyEmail-Filter-Version: 3.5.16/202407190044
Received: from [31.201.40.213] (port=65210 helo=[192.168.1.16])
	by nl1-ss105.a2hosting.com with esmtpsa  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	(Exim 4.97.1)
	(envelope-from <php-internals_nospam@adviesenzo.nl>)
	id 1sUg4u-00000005OGy-3rnR
	for internals@lists.php.net;
	Fri, 19 Jul 2024 07:22:45 +0200
Subject: Re: [PHP-DEV] Request for opinions: bug vs feature - change in
 tokenization of yield from
To: internals@lists.php.net
References: <66984FD0.5090805@adviesenzo.nl>
 <AM8P250MB0170FFCA0014FB9EC272DB4FE2AC2@AM8P250MB0170.EURP250.PROD.OUTLOOK.COM>
 <c9736385-94b7-4064-911f-e5fc1df7e2bd@gmx.de>
 <AM8P250MB0170B4C59B4313A565CD029AE2AC2@AM8P250MB0170.EURP250.PROD.OUTLOOK.COM>
Message-ID: <6699F817.8070806@adviesenzo.nl>
Date: Fri, 19 Jul 2024 07:22:31 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.7.0
Precedence: bulk
list-help: <mailto:internals+help@lists.php.net>
list-unsubscribe: <mailto:internals+unsubscribe@lists.php.net>
list-post: <mailto:internals@lists.php.net>
List-Id: internals.lists.php.net
x-ms-reactions: disallow
MIME-Version: 1.0
In-Reply-To: <AM8P250MB0170B4C59B4313A565CD029AE2AC2@AM8P250MB0170.EURP250.PROD.OUTLOOK.COM>
Content-Type: multipart/alternative;
 boundary="------------060003050400090207020206"
X-AuthUser: juliette@adviesenzo.nl
From: php-internals_nospam@adviesenzo.nl (Juliette Reinders Folmer)

This is a multi-part message in MIME format.
--------------060003050400090207020206
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

On 19-7-2024 1:09, Bob Weinand wrote:
> Hey Christoph,
>
>> Am 19.07.2024 um 00:51 schrieb Christoph M. Becker <cmbecker69@gmx.de>:
>>
>> Hi Bob!
>>
>> On 18.07.2024 at 15:41, Bob Weinand wrote:
>>
>>> Moreover, it can - at least - be worked around in tooling by special 
>>> casing the T_YIELD_FROM token and extracting the comment from the 
>>> raw parsed string:
>>>
>>> var_dump(token_get_all('<?php yield /* comment */ from $foo;'));
>>>
>>> will contain:
>>>
>>> [1]=> array(3) { [0]=> int(270) [1]=> string(24)
>>> "yield /* comment */ from" [2]=> int(1) }
>>>
>>> It's not optimal, but probably the least bad solution to leave it 
>>> unchanged in PHP 8.3, have tooling special case it and properly fix 
>>> it in PHP 8.4.
>>
>> And what about "code" like <https://3v4l.org/4CLhM>?  Is Codesniffer
>> supposed to scan the result of <https://3v4l.org/dKDcs> for possible CS
>> violations?
>>
>> Cheers,
>> Christoph
>
> I suppose you mean https://3v4l.org/IMi8Y, (you missed the <?php tag).
> If you want to scan that, it's quite easy to strip the leading yield 
> and trailing from, and tokenize that again to extract all comments.
>
> Sure, it's a hack, but it'll work: https://3v4l.org/8eAiV.
>
> Bob
>

Hi Bob,

Of course, everything can be hacked around, but that still leaves open 
the question of what the "proper tokenization" should be. Having this 
change in PHP 8.3 and then - as you suggest - yet another one in PHP 8.4 
makes it mighty hard to have a consistent token stream in tooling, 
especially as it is unclear what the "proper tokenization" should/would be.
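
For reference, the kind of special-casing being discussed could look 
roughly like the sketch below. It is only an illustration: the function 
name `commentsInsideYieldFrom()` is made up, and it assumes a PHP version 
which tokenizes `yield /* comment */ from` as one T_YIELD_FROM token, as 
in Bob's example. On versions which tokenize it differently, it simply 
finds nothing.

```php
<?php
/**
 * Hypothetical helper: collect comments embedded in T_YIELD_FROM tokens.
 *
 * @param string $code PHP source code, including the open tag.
 *
 * @return string[] The text of each comment found inside a T_YIELD_FROM token.
 */
function commentsInsideYieldFrom(string $code): array
{
    $comments = [];
    foreach (token_get_all($code) as $token) {
        if (!is_array($token) || $token[0] !== T_YIELD_FROM) {
            continue;
        }

        // Strip the leading "yield" (5 chars) and the trailing "from" (4 chars).
        $middle = substr($token[1], 5, -4);

        // Re-tokenize whatever sits between the two keywords.
        foreach (token_get_all('<?php ' . $middle) as $inner) {
            if (is_array($inner)
                && in_array($inner[0], [T_COMMENT, T_DOC_COMMENT], true)
            ) {
                $comments[] = $inner[1];
            }
        }
    }

    return $comments;
}

var_dump(commentsInsideYieldFrom('<?php yield /* comment */ from $foo;'));
```

Note that the line numbers of the re-tokenized part are relative to the 
extracted snippet, not to the original file, so a real implementation 
would also have to correct for that.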

More than anything, I find it concerning that this change sets a 
precedent for tokens to include comments.

Just as an example: what does this mean for the PHP 8.0 nullsafe object 
operator? Should we now suddenly allow that to be written as 
`? /*comment*/ ->`?
Or what about a cast token? Should that be allowed to be 
`(string /*for reasons*/)`?
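
As a point of comparison, this is how such code can be inspected with 
the tokenizer today. A minimal sketch: `tokenNames()` is a hypothetical 
helper, and T_NULLSAFE_OBJECT_OPERATOR only exists on PHP 8.0 and later.

```php
<?php
/**
 * Hypothetical helper: return the token names for a code snippet.
 *
 * @param string $code PHP code without the open tag.
 *
 * @return string[] Token names, or the literal character for simple tokens.
 */
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all('<?php ' . $code) as $token) {
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }

    return $names;
}

$snippets = [
    '$a?->b;',
    '$a? /*comment*/ ->b;',
    '(string) $a;',
    '(string /*for reasons*/) $a;',
];

foreach ($snippets as $snippet) {
    echo $snippet, ' => ', implode(' ', tokenNames($snippet)), PHP_EOL;
}
```

With the comment in place, neither the operator nor the cast is 
recognized as a single token: the comment comes back as a separate 
T_COMMENT token in between the other tokens.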

Allowing this change to stay in, without having the discussion about 
what the "proper tokenization" should be, feels off and random to me and 
opens the door for more random changes.

As for the impact on tooling: a change in the tokenization of any token 
has an impact not only on tooling like PHPCS itself, but also on every 
single external standard built on top of it, and is a breaking change.
To give you some perspective - for PHPCS we even went as far as to 
"undo" the PHP 8.0 tokenization of namespaced names for the time being 
(in the PHPCS 3.x releases) and we'll only change the PHPCS tokenizer to 
use the PHP 8.0 tokenization in the PHPCS 4.0 release as it would 
otherwise break too many existing sniffs. [1]
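
To give an idea of what "undoing" a tokenization change involves, here 
is a simplified sketch. This is not the actual PHPCS implementation: 
`splitNamespacedNames()` is a hypothetical helper which splits the PHP 
8.0 name tokens back into T_STRING / T_NS_SEPARATOR sequences, and it 
ignores details such as reserved keywords used within names.

```php
<?php
/**
 * Hypothetical helper: split PHP 8.0 name tokens back into the separate
 * tokens which PHP < 8.0 would have produced.
 *
 * @param array $tokens Token stream as returned by token_get_all().
 *
 * @return array Token stream without T_NAME_* tokens.
 */
function splitNamespacedNames(array $tokens): array
{
    $nameTokens = ['T_NAME_QUALIFIED', 'T_NAME_FULLY_QUALIFIED', 'T_NAME_RELATIVE'];

    $out = [];
    foreach ($tokens as $token) {
        // token_name() is used so this also runs on PHP < 8.0, where the
        // T_NAME_* constants do not exist.
        if (!is_array($token)
            || !in_array(token_name($token[0]), $nameTokens, true)
        ) {
            $out[] = $token;
            continue;
        }

        $relative = token_name($token[0]) === 'T_NAME_RELATIVE';
        foreach (explode('\\', $token[1]) as $i => $part) {
            if ($i > 0) {
                $out[] = [T_NS_SEPARATOR, '\\', $token[2]];
            }

            if ($part === '') {
                continue; // Leading backslash of a fully qualified name.
            }

            // A relative name starts with the `namespace` keyword.
            $id    = ($relative && $i === 0) ? T_NAMESPACE : T_STRING;
            $out[] = [$id, $part, $token[2]];
        }
    }

    return $out;
}

$tokens = splitNamespacedNames(token_get_all('<?php new \Foo\Bar;'));
foreach ($tokens as $token) {
    echo is_array($token) ? token_name($token[0]) : $token, PHP_EOL;
}
```

Even in this reduced form, every token id and every position in the 
stream which a sniff may rely on has to be accounted for.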

Smile,
Juliette

1: https://github.com/squizlabs/PHP_CodeSniffer/issues/3041

--------------060003050400090207020206--