Newsgroups: php.internals
Date: Sat, 24 Aug 2024 22:36:43 +0200
From: rob@bottled.codes ("Rob Landers")
To: "Stephen Reay", "Ilija Tovilo"
Cc: "PHP internals" <internals@lists.php.net>
Subject: Re: [PHP-DEV] [Concept] Flip relative function lookup order (global, then local)

On Sat, Aug 24, 2024, at 20:16, Stephen Reay wrote:
>
> > On 25 Aug 2024, at 00:01, Ilija Tovilo <tovilo.ilija@gmail.com> wrote:
> >
> > Hi Stephen
> >
> > On Sat, Aug 24, 2024 at 1:54 PM Stephen Reay <php-lists@koalephant.com> wrote:
> >>
> >> Thanks for clarifying. Out of curiosity, how much optimisation do you imagine would be possible if the lookups were done the same way as classes (i.e. no fallback, names must be local, qualified or imported with `use`)?
> >
> > I haven't measured this case specifically, but if unqualified calls to
> > local functions are indeed rare (which the last analysis seems to
> > indicate), then it should make barely any difference. Of course, if
> > your code makes lots of use of them, then the story might be
> > different. That said, the penalty of an ambiguous internal call is
> > much higher than that of a user, local call, given that internal calls
> > sometimes have special optimizations or can even be entirely executed
> > at compile time. For local calls, it will simply lead to a double
> > lookup on first execution.
> >
> >> I am aware this is a BC break. But if it's kosher to discuss introducing a never-ending BC break I don't see why this isn't a valid discussion either. It would give *everyone* that elusive 2-4% performance boost, would resolve any ambiguity about which function a person intended to call (the claimed security issue) and would bring consistency with the way classes/etc are referenced.
> >
> > From my analysis, there were 2 967 unqualified calls to local
> > functions in the top 1 000 repositories. (Disclaimer: There might be a
> > "use function" at the top for some of these, the analysis isn't that
> > sophisticated.)
> >
> > I also ran the script to check for unqualified calls to global
> > functions (or at least functions that weren't statically visible in
> > that scope in any of the repositories' files), and there were ~139 000
> > of them. It seems like this is quite a different beast. To summarize:
> >
> > 1. Flipping lookup order: ~a few dozen changes
> > 2. Global only: ~3 000 changes
> > 3. Local only: ~139 000 changes
> >
> > While much of this can be automated, huge diffs still require
> > reviewing time, and can lead to many merge conflicts which also take
> > time to resolve. I would definitely prefer to go with 1. or
> > potentially 2.
> >
> > Ilija
> >
>
> Hi Ilija,
>
> I understand that a change like (3) is a huge BC break, and as I said earlier, I wasn't actually suggesting that is the action to take, because I don't think there is sufficient reason to take *any* action. But given that some people in this thread seem convinced that *a* change to functionality is apparently required, I do think every potential change, and its pros and cons, should be discussed.
>
> As I've said numerous times, and been either outright dismissed or ignored: there has been a consistent push from a non-trivial number of internals members that userland developers should make better use of regular functions, rather than using classes as fancy namespaces. There was a recent RFC vote that implicitly endorsed this opinion.
>
> Right now, the lookup rules make namespaced regular functions a consistent experience for developers, but the lack of autoloading makes them unpopular, and the lack of visibility for such symbols can be problematic.
>
> With the change you're proposing, there will be *another* hurdle that makes the use of regular namespaced functions harder/less intuitive, or potentially (with option 1) unpredictable over PHP versions, due to the constant threat of BC breaks from new builtin functions - right when we have not one but two RFCs for function autoloading (arguably the biggest barrier to their increased usage in userland).
>
> So the whole reason I asked about (3) is because it would
> - (a) bring consistency with class/interface/trait symbols;
> - (b) inherently bring the much-desired 2% performance boost for function calls, because people would be forced to qualify the names;
> - (c) have zero risk of a future WTF BC break from a new global function interrupting local function lookups;
> - (d) have no need for a new "simpler" qualifying syntax (you can't get shorter than the 1-character `\` prefix);
> - (e) presumably simplify function autoloading, because there's no longer any "fallback" step to worry about before triggering an autoloader;
> - (f) even solve the "security" concerns John raised, because the developer would be forced to qualify their usage if they wanted to use the builtin function - their intent is always explicit, never guessed.
>
> Yes, it is a huge BC break in terms of the amount of code that's affected. But it's almost certainly one of the simplest BC breaks to "fix" in the history of PHP BC breaks.
>
> How much code was affected when register globals was removed? Or when magic quotes was removed? Or when short codes were removed?
>
> Surely any change being proposed here would mean a deprecation notice in the next release after 8.4, and then whatever actual change is proposed, in the next major version after that. So possibly 8.5 and then 9.0, but potentially 9.0 and then 10.0.
>
> If either (1) or (2) is chosen, and the "acceptability" of such a choice depends on something less verbose than "namespace\" to qualify a local function, projects literally can't future-proof (or no-deprecation-notice-proof, if you prefer) their code against the eventual flip until that change is implemented. In a scenario where a deprecation (and new local qualifier) goes out in 2025 as part of 8.5, and a flip happens in 2026 as part of 9.0, that would cut in half the time projects have to adapt, and it means any code that's updated for it can't make use of the new "less verbose" local qualifier if it also needs to support versions prior to it being available.
>
> If 9.0 and 10.0 happened to be the deprecation and "change" versions, obviously people would have longer to make the required change - but that argument cuts both ways. If you have 5 years to change every `strlen` to `\strlen` it's hardly going to cause a huge and sudden swath of noise in revision history. I would imagine most projects would just adopt a new code style, and prefixing with `\` would occur automatically whenever a file is otherwise modified by a developer.
>
> There's also an impact on internals development/RFCs with either (1) or (2): *any* proposed new global function in the standard library now has a BC barrier to pass if it *might* conflict with one defined by anyone in userland, in any namespace. JS is a living embodiment of this problem: see String#includes, Array#includes, and Array#flat - and that's with people doing the "wrong thing" (extending builtin JS prototypes is arguably the same as using the `\php` namespace).
>
> Multiple people have lamented the way function fallbacks were originally implemented. If you're going to insist on making a change, let's at least aim for a change that brings *MORE* consistency, and fixes a previous mistake, rather than adding a brand new inconsistency and who knows how many years of unexpected BC breaks for unsuspecting userland developers - who apparently *already* struggle to understand the way symbol lookup happens - into the future, and adding yet *another* reason for people to not use namespaced functions.
>
> Cheers
>
> Stephen
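
The options in Ilija's summary differ only in which candidate names an unqualified call tries, and in what order. The sketch below models that in plain PHP; `resolve_unqualified_call()` and the rule labels are hypothetical names for illustration, not engine code, and the model ignores `use function` imports and the runtime cache behind the "double lookup on first execution". The usage assumes a project that already defines `App\array_find()` while the global `array_find()` (added in PHP 8.4) also exists.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model, not engine code: which function an unqualified call
 * `$name(...)` written inside namespace `$ns` binds to under each lookup rule
 * discussed in this thread. `use function` imports and the engine's runtime
 * cache are deliberately left out.
 *
 * @param string[] $defined Fully qualified names of the functions that exist.
 * @return string|null The function that would be called, or null if the call
 *                     would fail with "Call to undefined function".
 */
function resolve_unqualified_call(string $ns, string $name, array $defined, string $rule): ?string
{
    $local  = $ns === '' ? $name : $ns . '\\' . $name;
    $global = $name;

    $candidates = match ($rule) {
        'current'      => [$local, $global], // today: namespace first, then global fallback
        'global-first' => [$global, $local], // option 1: flipped order
        'global-only'  => [$global],         // option 2: local calls need namespace\ (or an import)
        'local-only'   => [$local],          // option 3: global calls need \ (or an import), like classes
        default        => throw new InvalidArgumentException("Unknown rule: $rule"),
    };

    // Function and namespace names are case-insensitive in PHP.
    $known = array_map('strtolower', $defined);
    foreach ($candidates as $candidate) {
        if (in_array(strtolower($candidate), $known, true)) {
            return $candidate;
        }
    }
    return null;
}

// Assumed scenario: a project that already ships App\array_find(), running on a
// PHP version that also has the global array_find().
$defined = ['App\\array_find', 'array_find'];

foreach (['current', 'global-first', 'global-only', 'local-only'] as $rule) {
    $target = resolve_unqualified_call('App', 'array_find', $defined, $rule) ?? '(undefined)';
    printf("%-12s => %s\n", $rule, $target);
}
```

With this input, the current rule and option 3 keep calling `App\array_find()`, while options 1 and 2 reach the global builtin. Under option 1 that switch happens without any change to the calling code, which is the kind of BC break point (c) refers to.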

It may be worth waiting for function autoloading, to be honest. One of the nice things about it is that your autoloader gets called when code uses non-qualified globals. This makes it very easy for an autoloader to start forcing qualified globals and emitting warnings/exceptions. I have a feeling that, eventually, if function autoloading is accepted into PHP and gets more use, we will see people using more and more qualified globals.

Ergo, I suspect option (3) will become the default, eventually. Unless (2) is chosen, of course.
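
The point about qualified globals rests on how the three call forms behave today. A minimal sketch, assuming a hypothetical `App` namespace that defines its own `strlen()`; it shows only current behaviour (namespace first, then global), not any of the proposed options.

```php
<?php
declare(strict_types=1);

namespace App;

// Hypothetical user function that deliberately reuses the short name of a
// global builtin, so the result shows which function each call form reached.
function strlen(string $s): int
{
    return -1;
}

/** @return array<string, int> the result of each call form under today's rules */
function call_forms(): array
{
    return [
        'strlen'            => strlen('abc'),           // unqualified: App\strlen() first, global only as fallback
        '\\strlen'          => \strlen('abc'),          // fully qualified: always the global builtin
        'namespace\\strlen' => namespace\strlen('abc'), // namespace-relative: always App\strlen()
    ];
}

var_dump(call_forms());
```

The unqualified call returns -1 because `App\strlen()` is found before the builtin; only the `\`-prefixed call reaches the global function, and `namespace\` is today's explicit way to pin a local call.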

— Rob