Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:123932
X-Original-To: internals@lists.php.net
Delivered-To: internals@lists.php.net
Received: from php-smtp4.php.net (php-smtp4.php.net [45.112.84.5])
	by qa.php.net (Postfix) with ESMTPS id A29671A009C
	for <internals@lists.php.net>; Thu, 27 Jun 2024 08:22:46 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=php.net; s=mail;
	t=1719476644; bh=GU4ws3hiCd/FyFW4UTiJVk1C7USjpkuaxc6Sw5rsC+s=;
	h=From:Subject:Date:In-Reply-To:Cc:To:References:From;
	b=fF6kClzGgC5ThtkafxZuwPdDO6cqzRwTB9HIiEuEAq3wycAygKaQqTxRBVN2WjH2k
	 8qEAknhNODjMMmq6253xYAl9vhaaHlwaz5UHAwnAbRcrBENG2T5eazMV9ly6AP9g27
	 Eq4Ie4y+qyZkZqF5cz5xRY+FKlr4UbM/kKIMhWKu81mMKvw8B6JWEEzDsN4TwGZlbd
	 WkWWudvEUd8DonoOwo0PUHawfxtd+BBoZ29v53xa0n/KLJimsG8wU3M7QvP2VdErpV
	 QqbzTskER+/vSW00eLz4PjWKp5YgvRyVkWNJ/Gs740CJLhoze1BnlrTiXh0kgNPZme
	 w3lFhWXsFmkgw==
Received: from php-smtp4.php.net (localhost [127.0.0.1])
	by php-smtp4.php.net (Postfix) with ESMTP id 9F72D180339
	for <internals@lists.php.net>; Thu, 27 Jun 2024 08:24:03 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on php-smtp4.php.net
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,DMARC_MISSING,
	HTML_MESSAGE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no
	autolearn_force=no version=4.0.0
X-Spam-Virus: Error (Cannot connect to unix socket
	'/var/run/clamav/clamd.ctl': connect: Connection refused)
X-Envelope-From: <php-lists@koalephant.com>
Received: from mail1.25mail.st (mail1.25mail.st [206.123.115.54])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by php-smtp4.php.net (Postfix) with ESMTPS
	for <internals@lists.php.net>; Thu, 27 Jun 2024 08:24:03 +0000 (UTC)
Received: from smtpclient.apple (unknown [49.48.245.197])
	by mail1.25mail.st (Postfix) with ESMTPSA id 6FB9760401;
	Thu, 27 Jun 2024 08:22:37 +0000 (UTC)
Message-ID: <2A60B0A8-3105-4DE5-8C8A-8BDAB330C4BC@koalephant.com>
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_B9D390A6-CC5A-45CB-8CD8-B61B24F6B01B"
Precedence: bulk
list-help: <mailto:internals+help@lists.php.net>
list-unsubscribe: <mailto:internals+unsubscribe@lists.php.net>
list-post: <mailto:internals@lists.php.net>
List-Id: internals.lists.php.net
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.600.62\))
Subject: Re: [PHP-DEV] [RFC] Deprecations for PHP 8.4
Date: Thu, 27 Jun 2024 15:22:24 +0700
In-Reply-To: <0BBF41AF-2516-44D4-A102-73580C5ED373@newclarity.net>
Cc: "Gina P. Banyard" <internals@gpb.moe>,
 PHP internals <internals@lists.php.net>
To: Mike Schinkel <mike@newclarity.net>
References: <bw20I5b7ly3lSbI-2Bv3kfrfTVJbDo5RhwBiQa1PEwuLjprDJWptPajLiaialj1RLVKu7z1j0MofJUhhRVtzT_5i2E11oKeQx_VMUxnKhUE=@gpb.moe>
 <E146A171-CFA6-4E3F-91AA-2ACE7710A6D9@newclarity.net>
 <dbGe34EpQtjyP7ja7aUHnZYwmtupxeLd7EoOv3JjQMSh_UqoMrbqo5PkxrlIiaXJePC1-TfLyyblz5QDM13OkitgBqKPuSvh28WiJFh7qJI=@gpb.moe>
 <B958318C-B61D-4618-BA7D-3BF204C5B3CD@newclarity.net>
 <uE8fty0oxYjGeo11h28dcXlCSaLwUpPfXT2msoy5eDAQRqPVFSWpG6pn1rwPHaRh9l2WWAOSJClb60LKjlur8qnsogEP1IH3za6wsEBgZFc=@gpb.moe>
 <0BBF41AF-2516-44D4-A102-73580C5ED373@newclarity.net>
X-Mailer: Apple Mail (2.3774.600.62)
From: php-lists@koalephant.com (Stephen Reay)


--Apple-Mail=_B9D390A6-CC5A-45CB-8CD8-B61B24F6B01B
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8



> On 27 Jun 2024, at 12:31, Mike Schinkel <mike@newclarity.net> wrote:
>
>> On Jun 26, 2024, at 8:14 AM, Gina P. Banyard <internals@gpb.moe> wrote:
>>
>> On Wednesday, 26 June 2024 at 06:18, Mike Schinkel <mike@newclarity.net> wrote:
>>> https://3v4l.org/RDYFs#v8.3.8
>>>
>>> Note those seven use-cases are found in around the first 25 results
>>> when searching GitHub for "strtok(". I could probably find more if I
>>> kept looking:
>>>
>>> https://github.com/search?q=strtok%28+language%3APHP+&type=code
>>>
>>> Regarding explode($delimiter, $str)[0]: unless it is special-cased
>>> during compilation, it is a really inefficient way to find the
>>> substring up to the first delimiter, especially for large strings
>>> and/or in a tight loop where the explode() sits inside a called
>>> function.
>>
>> Then use a regex: https://3v4l.org/SGWL5
>
> Using `preg_match()` instead of `strtok()` to process the ~4k file of
> commas takes, on average, about the same time as using explode()[0],
> i.e. roughly 10x as long as using `strtok()` (at times it got as low
> as 4.4x, but that was rare):
>
> https://onlinephp.io/c/e1fad
>
> Size of file:          3972
> Number of commas:      359
> Time taken for strtok: 0.003 seconds
> Time taken for regex:  0.0307 seconds
> Times strtok() faster: 10.25
>
>> Or a combination of strpos and substr.
>
> Using `strpos()` + `substr()` instead of `strtok()` to process the ~4k
> file of commas took, on average, ~3x as long as using `strtok()`. I
> implemented a class for this and tried to optimize it by using only
> string positions and not copying the string repeatedly. It also took
> about half an hour to get the code working vs. about 15 seconds with
> strtok(); which will most programmers prefer?
>
> https://onlinephp.io/c/2a09f
>
> Size of file:           3972
> Number of commas:       359
> Time for strtok:        0.0027 seconds
> Time for strpos/substr: 0.0089 seconds
> Times strtok() faster:  3.31
>
>> There are *plenty* of solutions to the specific problem you pose
>> here, and thus many different solutions more or less appropriate.
>
> Yes, and in all cases the existing solutions are significantly slower,
> except one.
>
> And that one solution that is not significantly slower is to not
> deprecate `strtok()`. Not to mention that not deprecating it would
> avoid causing lots of BC breakage.
>
> -Mike

Hi All,

I do appreciate that strtok has a kind of bizarre signature/use pattern
and potential for confusion due to how subsequent calls work, but to me
it sounds like a better outcome, for uses that need the repeated-call
functionality, would be to introduce a builtin `StringTokenizer` class
that wraps the underlying strtok_r C call and uses internal state to
keep track of the string being tokenized.
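
A rough userland sketch of that idea, for illustration only. It cannot
wrap strtok_r from PHP code, so it swaps in strspn()/strcspn() with an
internal offset; the class and method names are hypothetical, not an
existing API. It assumes PHP 8.0+ and mimics strtok's habit of skipping
empty tokens.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: neither this class nor its method exists in PHP.
final class StringTokenizer
{
    // Byte offset of the next unread character in $subject.
    private int $offset = 0;

    public function __construct(private string $subject)
    {
    }

    /** Returns the next non-empty token, or null once the input is exhausted. */
    public function next(string $delimiters): ?string
    {
        $end = strlen($this->subject);
        if ($this->offset >= $end) {
            return null;
        }

        // Skip any run of delimiter characters, as strtok() does.
        $this->offset += strspn($this->subject, $delimiters, $this->offset);
        if ($this->offset >= $end) {
            return null;
        }

        // The token is everything up to the next delimiter character.
        $length = strcspn($this->subject, $delimiters, $this->offset);
        $token = substr($this->subject, $this->offset, $length);

        // Step past the token and the delimiter that ended it, never past the end.
        $this->offset = min($this->offset + $length + 1, $end);

        return $token;
    }
}

$tokenizer = new StringTokenizer('a,b;;c');
while (($token = $tokenizer->next(',;')) !== null) {
    echo $token, PHP_EOL;
}
```

Each instance carries its own position, so two tokenizers can be
interleaved without the hidden shared state that makes repeated
strtok() calls confusing.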


As a "works the same" solution for grabbing the first segment of a
string up to any of the delimiter chars, could the `strpbrk` function
be expanded with a `$before_needle` arg like `strstr` has? (strstr
matches on an exact substring, not on any of a list of characters.)
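
To make the intended behaviour concrete, here is a small userland
approximation, assuming PHP 8.0+. The function name strpbrk_before() is
made up for illustration; the actual suggestion is an extra argument on
strpbrk, which does not exist today. It mirrors strstr($h, $n, true) by
returning false when none of the characters are present.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: approximates a strpbrk() with a $before_needle flag.
function strpbrk_before(string $string, string $characters): string|false
{
    // Length of the leading segment containing none of $characters.
    $length = strcspn($string, $characters);

    // No listed character found: return false, as strstr(..., true) does.
    if ($length === strlen($string)) {
        return false;
    }

    return substr($string, 0, $length);
}

var_dump(strpbrk_before('key=value;rest', '=;'));
var_dump(strpbrk_before('no delimiters here', ',;'));
```

Note this is not identical to strtok(): strtok skips leading delimiters
and returns the whole string when no delimiter is found, so the return
value for the not-found case would need deciding.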




Cheers

Stephen

--Apple-Mail=_B9D390A6-CC5A-45CB-8CD8-B61B24F6B01B--