Newsgroups: php.internals
From: drealecs@gmail.com (Alexandru Pătrănescu)
Date: Mon, 8 Jul 2024 18:43:56 +0300
Subject: Re: [PHP-DEV] [RFC] Deprecations for PHP 8.4
To: Mike Schinkel
Cc: Claude Pache, "Gina P. Banyard", PHP internals

On Sat, Jul 6, 2024 at 4:25 AM Mike Schinkel wrote:

> On Jul 5, 2024, at 1:11 PM, Claude Pache wrote:
>
>> On 25 June 2024 at 16:36, Gina P. Banyard wrote:
>>
>>> https://wiki.php.net/rfc/deprecations_php_8_4
>>
>> * About strtok(): An exact replacement of `strtok()` that is reasonably
>> performant may be constructed with a sequence of strspn(...) and
>> strcspn(...) calls; here is an implementation using a generator in order
>> to keep the state: https://3v4l.org/926tC
>
> Well your modern_strtok() function is not an _exact_ replacement as it
> requires using a generator and thus forces the restructure of the code
> that calls strtok().
>
> So not a drop-in (search-and-replace) replacement for strtok(). But it
> is a reasonable replacement for those who are motivated to do the
> restructure.
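For readers who have not opened the link, the strspn()/strcspn() idea from the quoted message can be sketched roughly as below. This is an illustration only, not the code behind the 3v4l link: the function name tokens() is hypothetical, and the sketch assumes a single, fixed delimiter set for the whole string.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (an assumption, not the linked implementation):
 * yields the tokens of $string, treating every byte in $characters
 * as a delimiter.
 */
function tokens(string $string, string $characters): \Generator
{
    $pos = 0;
    $length = \strlen($string);
    while ($pos < $length) {
        // Skip the run of delimiter bytes in front of the next token.
        $pos += \strspn($string, $characters, $pos);
        // Measure the run of non-delimiter bytes: that run is the token.
        $len = \strcspn($string, $characters, $pos);
        if ($len === 0) {
            return; // only delimiters were left
        }
        yield \substr($string, $pos, $len);
        $pos += $len;
    }
}

// Prints one word per line: This, is, an, example, string.
foreach (tokens("This is\tan example\nstring", " \n\t") as $word) {
    echo $word, "\n";
}
```

As in Mike's objection, callers have to move from repeated function calls to a foreach loop, so this shape is a restructure rather than a search-and-replace.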
I looked a bit into this and, taking the idea further, let's also consider
defining a StringTokenizer class:

    class StringTokenizer {
        private \Generator $tokenGenerator;

        public function __construct(public readonly string $string) {
        }

        public function nextToken(string $characters): string|null {
            if (!isset($this->tokenGenerator)) {
                // First call: start the generator and take its first token.
                $this->tokenGenerator = $this->generator($characters);
                return $this->tokenGenerator->current();
            }
            // Later calls: hand the (possibly new) delimiters to the generator.
            return $this->tokenGenerator->send($characters);
        }

        private function generator(string $characters): \Generator {
            $pos = 0;
            while (true) {
                // Skip leading delimiters, then measure the token.
                $pos += \strspn($this->string, $characters, $pos);
                $len = \strcspn($this->string, $characters, $pos);
                if (!$len) {
                    return;
                }
                $token = \substr($this->string, $pos, $len);
                $characters = yield $token;
                $pos += $len;
            }
        }
    }

and if we define a wrapper function:

    function strtok2(string $string, ?string $token = null): string|false {
        static $tokenizer = null;
        // Compare against null: a plain truthiness check would treat the
        // delimiter string "0" as "no delimiter given".
        if ($token !== null) {
            $tokenizer = new StringTokenizer($string);
            return $tokenizer->nextToken($token) ?? false;
        }
        if (!isset($tokenizer)) {
            return false;
        }
        // One-argument form: $string holds the delimiters.
        return $tokenizer->nextToken($string) ?? false;
    }

I think that this might be a perfect replacement.

If we want, we could implement StringTokenizer in core, so that it would
be the recommended replacement.
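One way to check whether a wrapper such as strtok2() really is a drop-in is to factor the usual calling pattern into a helper that takes the tokenizer as a callable, then compare the results. A minimal sketch, assuming the hypothetical helper name collect_tokens(); the built-in strtok() is used as the baseline so the snippet runs on its own:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs the classic strtok() loop with any callable that
 * follows strtok()'s two call forms and collects the tokens it returns.
 */
function collect_tokens(callable $tokenize, string $input, string $delimiters): array
{
    $tokens = [];
    // First call: pass the input string and the delimiter set.
    $token = $tokenize($input, $delimiters);
    while ($token !== false) {
        $tokens[] = $token;
        // Later calls: pass only the delimiter set; the callee keeps the state.
        $token = $tokenize($delimiters);
    }
    return $tokens;
}

// Baseline with the built-in; 'strtok2' could be passed instead once defined.
var_dump(collect_tokens('strtok', "/usr//local/bin", "/"));
```

Running the helper once with 'strtok' and once with 'strtok2' over the same inputs, and comparing the two arrays with ===, gives a quick equivalence check. Inputs where the delimiter set changes between calls would need a slightly different loop.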
If we don't want to do this at this stage, we can avoid the named class
for now by using an anonymous class:

    function strtok2(string $string, ?string $token = null): string|false {
        static $tokenizer = null;
        // Compare against null so that the delimiter string "0" still works.
        if ($token !== null) {
            $tokenizer = new class($string) {
                private \Generator $tokenGenerator;

                public function __construct(public readonly string $string) {
                }

                public function nextToken(string $characters): string|null {
                    if (!isset($this->tokenGenerator)) {
                        $this->tokenGenerator = $this->generator($characters);
                        return $this->tokenGenerator->current();
                    }
                    return $this->tokenGenerator->send($characters);
                }

                private function generator(string $characters): \Generator {
                    $pos = 0;
                    while (true) {
                        $pos += \strspn($this->string, $characters, $pos);
                        $len = \strcspn($this->string, $characters, $pos);
                        if (!$len) {
                            return;
                        }
                        $token = \substr($this->string, $pos, $len);
                        $characters = yield $token;
                        $pos += $len;
                    }
                }
            };
            return $tokenizer->nextToken($token) ?? false;
        }
        if (!isset($tokenizer)) {
            return false;
        }
        // One-argument form: $string holds the delimiters.
        return $tokenizer->nextToken($string) ?? false;
    }

What do you think?
Mike, would you mind benchmarking this as well, to make sure it is about as
fast as the initial suggestion from Claude?

I'm hoping this can be simplified further, but to get to the point: I also
think we should have a userland replacement suggestion in the RFC.
And, ideally, we should have a class that can replace it in PHP 9.0,
similar to the StringTokenizer above.

Regards,
Alex
