From: mike@newclarity.net (Mike Schinkel)
Subject: Re: [PHP-DEV] [RFC] Deprecations for PHP 8.4
Date: Mon, 8 Jul 2024 23:50:41 -0400
To: Alexandru Pătrănescu, Claude Pache
Cc: "Gina P. Banyard", PHP internals

> On Jul 8, 2024, at 12:03 PM, Alexandru Pătrănescu <drealecs@gmail.com> wrote:
> Managed to simplify it like this and I find it reasonable enough:
>
> function strtok2(string $string, ?string $token = null): string|false {
>     static $tokenGenerator = null;
>     if ($token) {
>         $tokenGenerator = (function(string $characters) use ($string): \Generator {
>             $pos = 0;
>             while (true) {
>                 $pos += \strspn($string, $characters, $pos);
>                 $len = \strcspn($string, $characters, $pos);
>                 if ($len === 0)
>                     return;
>                 $token = \substr($string, $pos, $len);
>                 $characters = yield $token;
>                 $pos += $len;
>             }
>         })($token);
>         return $tokenGenerator->current() ?? false;
>     }
>     return $tokenGenerator?->send($string) ?? false;
> }

Hi Alexandru,

Great attempt.

Unfortunately, however, it seems to be around 4.5 times slower than strtok():

https://3v4l.org/7lXlM#v8.3.9

> On Jul 8, 2024, at 2:23 PM, Claude Pache <claude.pache@gmail.com> wrote:
>> On 6 Jul 2024, at 03:22, Mike Schinkel <mike@newclarity.net> wrote:
>>> On Jul 5, 2024, at 1:11 PM, Claude Pache <claude.pache@gmail.com> wrote:
>>> * About strtok(): An exact replacement of `strtok()` that is reasonably performant may be constructed with a sequence of strspn(...) and strcspn(...) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC
>>
>> Well your modern_strtok() function is not an _exact_ replacement as it requires using a generator and thus forces the restructure of the code that calls strtok().
>
> Yes, of course, I meant: it has the exact same semantics. You cannot have the same API without keeping global state somewhere. If you use strtok() for what it was meant for, you must restructure your code if you want to eliminate hidden global state.

Hi Claude,

Agreed that semantics would have to change. Somewhat.
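
For anyone who has not followed the links, here is a rough sketch of the strspn()/strcspn() technique under discussion. It is my own minimal illustration, not the code behind either 3v4l link. The function name `tokens()` is made up for the example, and unlike strtok() it assumes the delimiter set stays the same for the whole string.

```php
<?php

/**
 * Yield the tokens of $string, splitting on any character in $delimiters.
 * Hypothetical helper for illustration only.
 */
function tokens(string $string, string $delimiters): \Generator
{
    $pos = 0;
    $length = \strlen($string);
    while ($pos < $length) {
        // Skip any run of delimiter characters.
        $pos += \strspn($string, $delimiters, $pos);
        // Measure the run of non-delimiter characters that follows.
        $len = \strcspn($string, $delimiters, $pos);
        if ($len === 0) {
            return; // only delimiters were left
        }
        yield \substr($string, $pos, $len);
        $pos += $len;
    }
}

$result = \iterator_to_array(tokens(',,a,b,,c,', ','), false);
// $result is ['a', 'b', 'c']; empty tokens are skipped, as with strtok()
```

The state lives in the generator rather than in a global, which is exactly why the calling code has to be restructured around a foreach or similar.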

The reason I made the comment was that when I saw you state it was an "exact replacement", I was concerned that people not paying close attention to the thread might see it and think: "Oh, okay, there is an exact, drop-in replacement so I will vote to deprecate" when that same person might not vote to deprecate if they did not think there was an exact drop-in replacement. But I did my best to soften my words so they would not come off as accusatory and instead just matter-of-fact. If I failed at that, I apologize.

Anyway, your comments about needing to change the semantics got me thinking that addressing the concern when remediating code with strtok() could be much closer to a drop-in replacement than a generator, assuming there is a will to actually tackle this. And this is small enough in scope that I might even be able to learn enough C-for-PHP to create a pull request, if the idea were blessed.

Consider this simple code for using `strtok()`:

    $token = strtok($content, ',');
    while ($token !== false) {
        $token = strtok(',');
    }

Now compare to this potential enhancement:

    $handle = strtok($content, ',', STRTOK_INIT);
    do {
        $token = strtok($handle);
    } while ($token !== false);
    strtok($handle, STRTOK_RELEASE);

This would be much closer to a drop-in replacement, would allow PHP to keep the fast strtok() function, AND would address the reason for deprecation.
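
To make the intended semantics concrete, here is a userland sketch of what a handle could look like. Everything in it is hypothetical: the `StrtokHandle` class and its `next()` method are names invented for illustration, STRTOK_INIT and STRTOK_RELEASE do not exist today, and a real implementation would presumably live in C inside strtok() itself. The only point is that the scan position moves out of hidden global state and into a value the caller holds.

```php
<?php

/**
 * Userland stand-in for the proposed handle (hypothetical names throughout).
 * The scan position is kept per object instead of per process.
 */
final class StrtokHandle
{
    private int $pos = 0;

    public function __construct(private string $string)
    {
    }

    /** Return the next token, or false once the string is exhausted. */
    public function next(string $delimiters): string|false
    {
        // Skip leading delimiters, then measure the token that follows.
        $this->pos += \strspn($this->string, $delimiters, $this->pos);
        $len = \strcspn($this->string, $delimiters, $this->pos);
        if ($len === 0) {
            return false;
        }
        $token = \substr($this->string, $this->pos, $len);
        // Step past the token and the single delimiter that ended it,
        // without running past the end of the string.
        $this->pos = \min($this->pos + $len + 1, \strlen($this->string));
        return $token;
    }
}

// Two independent handles can be interleaved without clobbering each other.
$words = [];
$lines = new StrtokHandle("a b\nc d");
while (($line = $lines->next("\n")) !== false) {
    $inner = new StrtokHandle($line);
    while (($word = $inner->next(' ')) !== false) {
        $words[] = $word;
    }
}
// $words is ['a', 'b', 'c', 'd']
```

With today's strtok(), the inner loop would reset the shared position when it starts tokenizing $line, so the outer loop could not pick up where it left off. In this sketch, releasing a handle is simply letting the object go out of scope.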

See any reason this approach would not be viable?

-Mike
