Newsgroups: php.internals
Xref: news.php.net php.internals:123932
Message-ID: <2A60B0A8-3105-4DE5-8C8A-8BDAB330C4BC@koalephant.com>
From: php-lists@koalephant.com (Stephen Reay)
To: Mike Schinkel <mike@newclarity.net>
Cc: "Gina P. Banyard" <internals@gpb.moe>, PHP internals <internals@lists.php.net>
Subject: Re: [PHP-DEV] [RFC] Deprecations for PHP 8.4
Date: Thu, 27 Jun 2024 15:22:24 +0700
In-Reply-To: <0BBF41AF-2516-44D4-A102-73580C5ED373@newclarity.net>
References: <bw20I5b7ly3lSbI-2Bv3kfrfTVJbDo5RhwBiQa1PEwuLjprDJWptPajLiaialj1RLVKu7z1j0MofJUhhRVtzT_5i2E11oKeQx_VMUxnKhUE=@gpb.moe>
 <E146A171-CFA6-4E3F-91AA-2ACE7710A6D9@newclarity.net>
 <dbGe34EpQtjyP7ja7aUHnZYwmtupxeLd7EoOv3JjQMSh_UqoMrbqo5PkxrlIiaXJePC1-TfLyyblz5QDM13OkitgBqKPuSvh28WiJFh7qJI=@gpb.moe>
 <B958318C-B61D-4618-BA7D-3BF204C5B3CD@newclarity.net>
 <uE8fty0oxYjGeo11h28dcXlCSaLwUpPfXT2msoy5eDAQRqPVFSWpG6pn1rwPHaRh9l2WWAOSJClb60LKjlur8qnsogEP1IH3za6wsEBgZFc=@gpb.moe>
 <0BBF41AF-2516-44D4-A102-73580C5ED373@newclarity.net>
List-Id: internals.lists.php.net
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.600.62\))
Content-Type: text/plain; charset=utf-8

> On 27 Jun 2024, at 12:31, Mike Schinkel <mike@newclarity.net> wrote:
>
>> On Jun 26, 2024, at 8:14 AM, Gina P. Banyard <internals@gpb.moe> wrote:
>>
>> On Wednesday, 26 June 2024 at 06:18, Mike Schinkel <mike@newclarity.net> wrote:
>>> https://3v4l.org/RDYFs#v8.3.8
>>>
>>> Note those seven use-cases are found in around the first 25 results
>>> when searching GitHub for "strtok(". I could probably find more if I
>>> kept looking:
>>>
>>> https://github.com/search?q=strtok%28+language%3APHP+&type=code
>>>
>>> Regarding explode($delimiter, $str)[0]: unless it is to be
>>> special-cased during compilation, it is a really inefficient way to
>>> find the substring up to the first character, especially for large
>>> strings and/or when in a tight loop where the explode is contained in
>>> a called function.
>>
>> Then use a regex: https://3v4l.org/SGWL5
>
> Using `preg_match()` instead of `strtok()` to process the ~4k file of
> commas takes, on average, the same time as using explode()[0], or 10x
> as long as using `strtok()` (at times it got as low as 4.4x, but that
> was rare):
>
> https://onlinephp.io/c/e1fad
>
>     Size of file:           3972
>     Number of commas:       359
>     Time taken for strtok:  0.003 seconds
>     Time taken for regex:   0.0307 seconds
>     Times strtok() faster:  10.25
>
>> Or a combination of strpos and substr.
>
> Using `strpos()` + `substr()` instead of `strtok()` to process the ~4k
> file of commas took, on average, ~3x as long as using `strtok()`. I
> implemented a class for this and tried to optimize it by using only
> string positions and not copying the string repeatedly. It also took
> about 1/2 hour to get the code working vs. about 15 seconds to get the
> code working with strtok(); which will most programmers prefer?
>
> https://onlinephp.io/c/2a09f
>
>     Size of file:            3972
>     Number of commas:        359
>     Time for strtok:         0.0027 seconds
>     Time for strpos/substr:  0.0089 seconds
>     Times strtok() faster:   3.31
>
>> There are *plenty* of solutions to the specific problem you pose
>> here, and thus many different solutions more or less appropriate.
>
> Yes, and in all cases the existing solutions are significantly slower,
> except one.
>
> And that one solution that is not significantly slower is to not
> deprecate `strtok()`. Not to mention that not deprecating it would avoid
> causing lots of BC breakage.
>
> -Mike

Hi All,

I do appreciate that strtok has a rather bizarre signature and usage
pattern, and potential for confusion due to how subsequent calls work.
To me, though, that suggests a better result for uses that need the
repeated-call functionality would be to introduce a built-in
`StringTokenizer` class that wraps the underlying strtok_r C call and
uses internal state to keep track of the string being tokenized.

As a "works the same" solution for grabbing the first segment of a
string up to any of the delimiter chars, could the `strpbrk` function
be expanded with a `$before_needle` arg like `strstr` has? (strstr
matches on an exact substring, not on any of a list of characters.)

Cheers

Stephen
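
To make the two suggestions above concrete, here is a minimal userland sketch, assuming PHP 8.1+. Both `str_before_any()` and `StringTokenizer` (with its `next()` method) are hypothetical names used only for illustration; neither exists in PHP today. The sketch is built on `strspn()`, `strcspn()` and `substr()` rather than wrapping `strtok_r`, so it only approximates what a built-in could look like and says nothing about its performance.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for strpbrk() with a $before_needle flag:
// returns the part of $haystack before the first character found in
// $chars, or false when none of the characters occur.
function str_before_any(string $haystack, string $chars): string|false
{
    $len = strcspn($haystack, $chars);
    if ($len === strlen($haystack)) {
        return false; // no delimiter present
    }
    return substr($haystack, 0, $len);
}

// Hypothetical tokenizer that keeps its own state instead of relying on
// strtok()'s hidden per-request state.
final class StringTokenizer
{
    private int $offset = 0;
    private readonly int $length;

    public function __construct(private readonly string $subject)
    {
        $this->length = strlen($subject);
    }

    // Returns the next token, or null once the subject is exhausted.
    // Like strtok(), runs of delimiters are skipped, so empty tokens
    // are never returned.
    public function next(string $delimiters): ?string
    {
        $this->offset += strspn($this->subject, $delimiters, $this->offset);
        if ($this->offset >= $this->length) {
            return null;
        }
        $len = strcspn($this->subject, $delimiters, $this->offset);
        $token = substr($this->subject, $this->offset, $len);
        // Step past the token and the delimiter that ended it, without
        // running beyond the end of the subject.
        $this->offset = min($this->offset + $len + 1, $this->length);
        return $token;
    }
}
```

For example, `str_before_any('key=value; rest', '=;')` gives `'key'`, and calling `next(',;')` repeatedly on `new StringTokenizer('a,b;;c')` gives `'a'`, `'b'`, `'c'`, then `null`. One difference from `strtok()` is worth noting: `str_before_any(',a', ',')` returns an empty string, whereas `strtok(',a', ',')` skips the leading delimiter and returns `'a'`.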