Newsgroups: php.internals
Xref: news.php.net php.internals:123582
Date: Tue, 11 Jun 2024 17:38:08 +0300
From: udaltsov.valentin@gmail.com (Valentin Udaltsov)
To: Levi Morrison
Cc: Ben Ramsey, php internals
Subject: Re: [PHP-DEV] Revisiting case-sensitivity in PHP

On Tue, 11 June 2024 at 17:13, Levi Morrison <levi.morrison@datadoghq.com> wrote:
> On Mon, Jun 10, 2024 at 9:40 PM Ben Ramsey <ramsey@php.net> wrote:
> >
> > > On Jun 10, 2024, at 20:35, Valentin Udaltsov <udaltsov.valentin@gmail.com> wrote:
> > >
> > > Hi, internals!
> > >
> > > Nine years have passed since the last discussions of case-sensitive PHP: https://externals.io/message/79824 and https://externals.io/message/83640.
> > > Here I would like to revisit this topic.
> > >
> > > What is case-sensitive in PHP 8.3:
> > > - variables
> > > - constants (all of them since https://wiki.php.net/rfc/case_insensitive_constant_deprecation)
> > > - class constants
> > > - properties
> > >
> > > What is case-insensitive in PHP 8.3:
> > > - namespaces
> > > - functions
> > > - classes (including the self, parent and static relative class types)
> > > - methods (including the magic ones)
> > >
> > > Pros:
> > > 1. no need to convert strings to lowercase inside the engine for name lookups (a small performance and memory gain)
> > > 2. a better fit for the case-sensitive platforms that PHP code mostly runs on (Linux)
> > > 3. uniform handling of ASCII and non-ASCII symbols (currently, non-ASCII symbols in names are case-sensitive: https://3v4l.org/PWkvG)
> > > 4. PSR-4 compatibility (https://www.php-fig.org/psr/psr-4/#:~:text=All%20class%20names%20MUST%20be%20referenced%20in%20a%20case%2Dsensitive%20fashion)
> > >
> > > Cons:
> > > 1. pain for users, obviously
> > > 2. a backward compatibility layer might be difficult to implement and/or have a performance penalty
> > >
> > > On con 1. I think today PHP users are much more prepared for the change:
> > > - more and more projects have adopted namespaces and PSR-4 autoloading via Composer, which never supported case-insensitivity (https://github.com/composer/composer/issues/1803, https://github.com/composer/composer/issues/8906) and so forced developers to mind casing
> > > - static analyzers have become more popular, and they do complain about wrong casing (see https://psalm.dev/r/fbdeee2f38 and https://phpstan.org/r/1789a32d-d928-4311-b02e-155dd98afbd4)
> > > - Rector appeared (it can be used to automatically prepare a codebase for the next PHP version)
> > >
> > > On con 2. While considering the different transition options proposed in prior discussions (compilation flag, ini option, deprecation notice), I stumbled upon Nikita's comment (https://externals.io/message/79824#79939):
> > > May I recommend to only target class and class-like names for an initial RFC? Those have the strongest argument in favor of case-sensitivity given how current autoloader implementations work - essentially the case-insensitivity doesn't properly work anyway in modern code.... I'd also appreciate having a voting option for removing case-insensitivity right away, as opposed to throwing E_STRICT/E_DEPRECATED. If we want to change this, I personally would rather drop it right away than start throwing E_STRICT warnings that would make the case-insensitive usage impossible anyway.
> > > It makes a lot of sense to me: a fairly simple change in the core and no performance penalty. At the same time, a gradual approach will reduce the stress.
> > >
> > > So the plan for 8.4 might be to just drop case-insensitivity for class names, and that's it... Let's discuss that!
> >
> > I’m not saying I agree with or support this, but I think your proposal has a better chance of being accepted if you target PHP 9.0 instead of 8.4.
> >
> > Cheers,
> > Ben
> >
>
> In fact, it's definitely a BC break I would not personally vote for in
> 8.4. This isn't some minor thing squirreled away in a library--this is
> the core language, with wide impact. For this reason, I believe it
> should target 9.0.
>
> I will happily vote for this feature, as long as the patch is reasonable.
>
> The most obvious implementation is not very good, though. The engine
> uses lowercase names for case insensitivity. Namespaces are embedded
> into the type names. To lowercase the namespace but not the type name,
> one could do a reverse scan for a namespace separator on the type
> name, and then lowercase from the start to the index of the namespace
> separator. For example, "Psr\Log\LoggerInterface" needs to become
> "psr\log\LoggerInterface". The problem with this is that it's not
> really going to save CPU or memory because it still has to lowercase
> the namespace.
>
> We could refactor the engine to store the namespace separately from
> the type name. This is a lot more work and will increase the size of
> some types, which might be difficult at a technical level.
>
> I can't think of other implementations right now. If nobody can come
> up with a better implementation, I think we should consider going with
> split-sensitivity on namespaces where it matches the sensitivity of
> the thing it is attached to. A namespaced class would have a
> case-sensitive namespace, but a namespaced function would still have a
> case-insensitive one.

Hi, Ben and Levi! Thank you for your interest!

Could you please elaborate on why you propose to target 9.0? That would make perfect sense if PHP strictly followed semver, but minor releases always contain some BC breaks (https://www.php.net/manual/en/migration82.incompatible.php, https://www.php.net/manual/en/migration83.incompatible.php). So is there a real difference between 8.4 and 9.0 in this case? Or do you mean that this BC break is too big for 8.4?

Levi, if we bundle namespaces, classes and functions into a single change, will that be easier to implement? Basically, remove the lowercasing and put the original names in the lookup tables?

--
Best regards,
Valentin Udaltsov