Newsgroups: php.internals
Message-ID: <2A60B0A8-3105-4DE5-8C8A-8BDAB330C4BC@koalephant.com>
List-Id: internals.lists.php.net
From: php-lists@koalephant.com (Stephen Reay)
Subject: Re: [PHP-DEV] [RFC] Deprecations for PHP 8.4
Date: Thu, 27 Jun 2024 15:22:24 +0700
To: Mike Schinkel
Cc: "Gina P. Banyard", PHP internals
In-Reply-To: <0BBF41AF-2516-44D4-A102-73580C5ED373@newclarity.net>
References: <0BBF41AF-2516-44D4-A102-73580C5ED373@newclarity.net>

> On 27 Jun 2024, at 12:31, Mike Schinkel wrote:
>
>> On Jun 26, 2024, at 8:14 AM, Gina P. Banyard wrote:
>>
>> On Wednesday, 26 June 2024 at 06:18, Mike Schinkel wrote:
>>> https://3v4l.org/RDYFs#v8.3.8
>>>
>>> Note those seven use-cases are found in around the first 25 results when searching GitHub for "strtok(". I could probably find more if I kept looking:
>>>
>>> https://github.com/search?q=strtok%28+language%3APHP+&type=code
>>>
>>> Regarding explode($delimiter, $str)[0], unless it is to be special-cased during compilation, it is a really inefficient way to find the substring up to the first character, especially for large strings and/or when in a tight loop where the explode is contained in a called function
>>
>> Then use a regex: https://3v4l.org/SGWL5
>
> Using `preg_match()` instead of `strtok()` to process the ~4k file of commas takes, on average, the same time as using explode()[0], or 10x as long as using `strtok()` (at times it got as low as 4.4x, but that was rare):
>
> https://onlinephp.io/c/e1fad
>
> Size of file: 3972
> Number of commas: 359
> Time taken for strtok: 0.003 seconds
> Time taken for regex: 0.0307 seconds
> Times strtok() faster: 10.25
>
>> Or a combination of strpos and substr.
>
> Using `strpos()` + `substr()` instead of `strtok()` to process the ~4k file of commas took, on average, ~3x as long as using `strtok()`. I implemented a class for this and tried to optimize it by using only string positions and not copying the string repeatedly. It also took about half an hour to get the code working vs. about 15 seconds to get the code working with strtok(); which will most programmers prefer?
>
> https://onlinephp.io/c/2a09f
>
> Size of file: 3972
> Number of commas: 359
> Time for strtok: 0.0027 seconds
> Time for strpos/substr: 0.0089 seconds
> Times strtok() faster: 3.31
>
>> There are *plenty* of solutions to the specific problem you pose here, and thus many different solutions more or less appropriate.
>
> Yes, and in all cases the existing solutions are significantly slower, except one.
>
> And that one solution that is not significantly slower is to not deprecate `strtok()`. Not to mention not deprecating would keep from causing lots of BC breakage.
>
> -Mike

Hi All,

I do appreciate that strtok has a kind of bizarre signature/use pattern and potential for confusion due to how subsequent calls work, but to me it sounds like a better result, for uses that need the repeated-call functionality, would be to introduce a builtin `StringTokenizer` class that wraps the underlying strtok_r C call and uses internal state to keep track of the string being tokenized.
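
A rough userland sketch of what such a class might look like is below. Only the class name comes from the suggestion above; the constructor, the `next()` method and the `null` return on exhaustion are assumptions made for illustration. The sketch is built on `strspn()`/`strcspn()` rather than on strtok_r, so it says nothing about how a builtin would actually be implemented.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: the class name is from the proposal, everything else
// (constructor shape, next(), null on exhaustion) is assumed for illustration.
final class StringTokenizer
{
    // Byte offset of the next unread position in $subject.
    private int $offset = 0;

    public function __construct(
        private string $subject,
        private string $delimiters,
    ) {
    }

    // Returns the next token, or null once the subject is exhausted
    // (strtok() itself returns false in that case).
    // Passing $delimiters switches the delimiter set for this and later calls.
    public function next(?string $delimiters = null): ?string
    {
        if ($delimiters !== null) {
            $this->delimiters = $delimiters;
        }

        $end = strlen($this->subject);
        if ($this->offset >= $end) {
            return null;
        }

        // Skip any leading delimiter bytes, as strtok() does.
        $this->offset += strspn($this->subject, $this->delimiters, $this->offset);
        if ($this->offset >= $end) {
            return null;
        }

        // The token is the run of bytes up to the next delimiter (or the end).
        $length = strcspn($this->subject, $this->delimiters, $this->offset);
        $token = substr($this->subject, $this->offset, $length);

        // Step past the token and the single delimiter that ended it.
        $this->offset += $length + 1;

        return $token;
    }
}
```

Because each instance carries its own offset, two tokenizers can be interleaved without clobbering each other, which the single shared state behind `strtok()` does not allow.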


As a "works the same" solution for grabbing the first segment of a string up to any of the delimiter chars, could the `strpbrk` function be expanded with a `$before_needle` arg like `strstr` has? (strstr matches on an exact substring, not on any of a list of characters.)
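
A minimal userland sketch of the behaviour being asked for, built on `strcspn()`. The function name `strpbrk_before()` is hypothetical, and the semantics of the flag are assumed to mirror `strstr()`: the part before the first match when the flag is true, and `false` when nothing matches.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: emulates strpbrk() with a strstr()-style $before_needle flag.
// Unlike the real strpbrk(), an empty $characters list returns false here
// instead of throwing a ValueError.
function strpbrk_before(string $string, string $characters, bool $before_needle = false): string|false
{
    // Length of the leading run that contains none of $characters.
    $prefixLength = strcspn($string, $characters);

    if ($prefixLength === strlen($string)) {
        // No character from the list occurs anywhere in $string.
        return false;
    }

    return $before_needle
        ? substr($string, 0, $prefixLength)  // everything before the first match
        : substr($string, $prefixLength);    // same result as plain strpbrk()
}
```

This is not a drop-in match for a first `strtok()` call: `strtok()` skips leading delimiters and returns the whole string when no delimiter is present, whereas this sketch returns an empty string for a leading delimiter and `false` when there is no match.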




Cheers

Stephen 