Newsgroups: php.internals
List-Id: internals.lists.php.net
From: mike@newclarity.net (Mike Schinkel)
To: "Gina P. Banyard"
Cc: PHP internals
Subject: Re: [PHP-DEV] [RFC] Deprecations for PHP 8.4
Date: Wed, 26 Jun 2024 01:18:16 -0400

> On Jun 25, 2024, at 4:51 PM, Gina P. Banyard <internals@gpb.moe> wrote:
>
> On Tuesday, 25 June 2024 at 19:06, Mike Schinkel <mike@newclarity.net> wrote:
>>
>> strtok()
>> =====
>> strtok() is found 35k times in GitHub:
>>
>> https://github.com/search?q=strtok%28+language%3APHP+&type=code
>>
>> It is commonly used as a "left part of string up to a character" function, in addition to its intended use for tokenizing.
>>
>> I would prefer it not be deprecated because of BC breakage, but IF it is deprecated I would suggest adding a one-for-one replacement function for the "left part of string up to a character" use-case; maybe `str_left("abc.txt",".")` returning `"abc"`.
>
> For this exact case of extracting a file name without an extension, you should really just use:
> pathinfo($filepath, PATHINFO_FILENAME);
> But for something more generic, you can just do:
> explode($delimiter, $str)[0];
>
> So I really don't see why we would need an "str_left()" function.

Ah, the danger of providing a specific example of a broader use-case is that someone will invariably discredit the specific example instead of focusing on its applicability to the broader use-case. 🤦‍♂️

To wit, here are seven (7) use-cases for which `pathinfo()` is not a viable alternative:

https://3v4l.org/RDYFs#v8.3.8

Note that those seven use-cases are found within roughly the first 25 results when searching GitHub for "strtok(". I could probably find more if I kept looking:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

Regarding explode($delimiter, $str)[0]: unless it is to be special-cased during compilation, it is a really inefficient way to find the substring up to the first occurrence of a character, especially for large strings and/or in a tight loop where the explode() is contained in a called function.

Here is a benchmark (https://onlinephp.io/c/87341) showing that, averaged over the runs I performed, fully processing a 3972-byte file with 359 commas took right at 90 times longer using explode($delimiter, $str)[0] than using strtok($str, $delimiter). Imagine if the file were 39,720 bytes, or larger, instead.

Size of file:              3972
Number of commas:          359
Time taken for strtok:     0.0034 seconds
Time taken for explode:    0.3036 seconds
Times strtok() faster:     89.1

Yes, the above processes the entire file using explode()[0] each time rather than first calling explode(",") once. That is deliberate: it models the equivalent of the N+1 problem[1], where the explode() is buried in a function. This illustrates why strtok() is so good for its primary use-case of parsing text files. strtok() is fast and does not use heaps of memory on every token.

This leads me to think `strtok()` should not be deprecated given how inefficient string handling in PHP can otherwise be, at least not without a much more efficient object for string parsing.
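
For illustration only, here is a minimal PHP sketch of the two approaches compared above. It is not the code behind the linked benchmark. The function names and the sample input are hypothetical, and it assumes a single-character delimiter with no empty fields, so that both walks yield the same tokens.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: "left part up to a character" via explode().
// explode() builds an array of every field, and all but the first are discarded.
function first_field_explode(string $s, string $delim): string
{
    return explode($delim, $s)[0];
}

// Walk every field with strtok(); it keeps its position internally between calls.
function fields_strtok(string $s, string $delim): array
{
    $out = [];
    for ($tok = strtok($s, $delim); $tok !== false; $tok = strtok($delim)) {
        $out[] = $tok;
    }
    return $out;
}

// Walk every field by calling the explode()-based helper once per field and
// cutting the consumed part off the front; the rest is re-split on every call.
function fields_explode(string $s, string $delim): array
{
    $out = [];
    while ($s !== '') {
        $field = first_field_explode($s, $delim);
        $out[] = $field;
        $s = (string) substr($s, strlen($field) + strlen($delim));
    }
    return $out;
}

// Hypothetical sample input, not the file used in the benchmark above.
$data = str_repeat('field,', 300);

$t0 = hrtime(true);
$a  = fields_strtok($data, ',');
$t1 = hrtime(true);
$b  = fields_explode($data, ',');
$t2 = hrtime(true);

printf("same tokens: %s\n", $a === $b ? 'yes' : 'no');
printf("strtok:  %d ns\nexplode: %d ns\n", $t1 - $t0, $t2 - $t1);
```

Timings will vary by machine and PHP version, so treat any ratio this prints as indicative at best. The point of the sketch is the shape of the work: the explode() walk allocates a fresh array of all remaining fields for every token it returns.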

-Mike

[1] https://www.baeldung.com/cs/orm-n-plus-one-select-problem