Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:106365
MIME-Version: 1.0
References: <CAF+90c_Ctjja=8fR-uBWL1wDxZge771yMujMhaR_R2seKXciYw@mail.gmail.com>
 <CAF+90c8JhV6-aQrsvW8u_xGB-wHmyObEU1Chd64J+x3tkKogQQ@mail.gmail.com>
 <9ADC8994-9D3C-4810-A2DB-6FB81D513098@gmail.com> <CABdc3WqxvucnCk7rd-Y2YuNn_dCvh5atNagwCqo8gp9NjMt_aQ@mail.gmail.com>
 <CANS-=pcp=-t4-cEHK8vbuNAaMBLSjmoL94h-TXKtHm0v8Ptkqg@mail.gmail.com>
 <CALKiJKorD5xayU1QYtd_XMEz7Zkx+etWs3k3DGZS3O9zPnk7mQ@mail.gmail.com>
 <CABdc3WrgSwFtoxzOpUa=_09N6vPt-CU_qFhCmZM9GVJMMQ7Gow@mail.gmail.com> <CAF+90c82zYeUkn3QRiPDFzf4B3cdF188TFGOMMXjwMAJ1UW6DQ@mail.gmail.com>
In-Reply-To: <CAF+90c82zYeUkn3QRiPDFzf4B3cdF188TFGOMMXjwMAJ1UW6DQ@mail.gmail.com>
Date: Wed, 31 Jul 2019 11:17:59 +0200
Message-ID: <CAF+90c-aTEqeT6dwfTqHseduqy-N43fxB1u67WurO6BAEgn==A@mail.gmail.com>
To: =?UTF-8?Q?Micha=C5=82_Brzuchalski?= <michal@brzuchalski.com>
Cc: Rowan Collins <rowan.collins@gmail.com>, PHP internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary="000000000000e74cf6058ef69a14"
Subject: Re: [PHP-DEV] Re: [RFC] Namespace-scoped declares, again
From: nikita.ppv@gmail.com (Nikita Popov)

--000000000000e74cf6058ef69a14
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Tue, Jul 30, 2019 at 4:08 PM Nikita Popov <nikita.ppv@gmail.com> wrote:

> Thanks everyone for your responses!
>
> I think the discussion revolves around two primary concerns, so let me
> address them in turn.
>
> The first is the general approach of using declares as a
> language-evolution mechanism. The concern here is that each additional
> declare fragments the language and increases the number of combinations of
> different options there are.
>
> What I ultimately want to achieve here is a way to evolve the language and
> fix long-standing issues without breaking backwards compatibility or
> causing ecosystem fragmentation. The only way we currently have to address
> (nowadays) undesirable behavior is through deprecation and subsequent
> removal. As people like to regularly remind me, this has a high cost on the
> ecosystem, because millions of codebases that were running without a glitch
> need to be updated, which not only takes a lot of effort, but also delays
> adoption of new PHP versions for everyone. As such, the "deprecation and
> removal" approach has to work over long time-frames and is only really
> applicable to rather "minor" issues in the first place.
>
> If we want to evolve the language without breaking backwards
> compatibility, we need to provide a way for gradual migration of the
> ecosystem: A library should be able to opt-in to breaking changes, while
> remaining usable by downstream consumers. Conversely, an application should
> be able to opt-in to breaking changes, while still being able to use an
> older library.
>
> To achieve this, I believe it is unavoidable to have *some* kind of
> mechanism to affect language-behavior on a per-library/project level. Of
> course, the devil tends to be in the details...
>
> What this RFC originally proposed is a fine-grained approach, where
> individual changes are controlled by separate declare directives. However,
> this is not the only possibility: As has been recently popularized with
> Rust editions (https://doc.rust-lang.org/edition-guide/editions/index.html),
> a coarse-grained approach where multiple changes have to be enabled
> together as part of an "edition" is also possible.
>
> The advantage of the coarse-grained "edition" approach is that it avoids a
> combinatorial explosion of options: It's all or nothing, and it is easier
> to keep in mind that a project uses "PHP 2020" rather than some specific
> combination of declares.
>
> The advantage of a fine-grained approach is that it also allows a
> fine-grained migration. As a statically-typed language, Rust can provide
> fairly reliable tooling to perform edition migrations. While such tooling
> also exists for PHP (e.g. Rector), it does not have the same level of
> reliability, especially for codebases that do not make pervasive use of
> type annotations. Fine-grained declares allow a code-base to be updated one
> step at a time.
>
> It is possible to combine both approaches by providing both fine-grained
> control and an overall "edition" that enables a larger set of language
> declares. The end goal should be to move to the next edition, but
> individual declares may be used during the migration, or to opt-out a
> section of code. This is probably my preferred approach.
>
> I should probably also highlight that this is somewhat different from the
> existing strict_types directive: strict_types was only in part a mechanism
> to control BC breakage (with regard to internal functions), but to a large
> part exists because we couldn't agree on which semantics are preferable.
>
> This is not what I'm going for here. I don't want declares to become a
> way to resolve disagreements by just providing both options. Instead a
> declare represents a change that we *want* to make and that codebases
> *should* make eventually, but that is opt-in to maintain backwards
> compatibility and library interoperability.
>
> ---
>
> The second concern is around the technical details of opting-in to
> BC-breaking language changes on the library level. Here is an overview of
> some proposals that have been made:
>
> 1. Keep declares per-file. This is clearly incompatible with any
> fine-grained (or optionally fine-grained) approach, because declares have
> to be replicated across hundreds of files. I think this is a bad choice
> also for a coarse-grained approach (or even for the existing strict_types
> directive), because in all cases I've seen people want to enable the option
> for the whole library, not individual files.
>
> Replicating declares per-file is error prone (I regularly forget to add
> strict_types declarations to newly created files) and complicates the
> mental model of the programmer. While ostensibly per-file declares make
> things explicit, I think the reality is that nobody actually double-checks
> declares in each file they open and will instead assume that the project
> default holds.
>
> 2. Support per-namespace declares. This is what I originally proposed.
> This is based on the premise that a library will usually correspond to a
> namespace. This approach has been extensively discussed in this thread -- I
> think the main issue is that the premise just doesn't reliably hold up in
> practice, e.g. because multiple packages publish under the same namespace.
>
> 3. Support per-directory declares, which is the direction I was planning
> to explore next. This is based on the premise that all library files are
> part of some top-level directory, which I think is a fairly safe premise
> (note that the "directory" could also be a phar file).
>
> The actual intended use (similar to the namespace-based variant) is that
> people will specify their declares in the composer.json file, and composer
> then includes a call to declare_directory() or similar as part of the
> autoloader. Projects not using composer have the choice of issuing an
> explicit call.
>

After looking into the implementation side a bit, I think I remember why I
didn't go down this route originally: Path canonicalization is tricky.

Namespaces have the big advantage that they are fully controlled by PHP,
with well-defined semantics. Plain filesystem paths have a well-defined
realpath canonicalization, but things are less simple for general streams.

The engine already needs to solve this problem for the purpose of
include_once and require_once and provides the zend_resolve_path hook for
this. However, it doesn't really support stream wrappers (phar has some
support here, but the last time I looked into it I didn't get the
impression that it's reliable). There's the additional question of caching:
In an opcache'd scenario we wouldn't want to have any stat traffic. Opcache
does perform caching of resolved paths, but I think this currently only
extends to actually cached files, while we'd also need directories in this
case.

These problems can probably be overcome, possibly by making path resolution
a first-class stream wrapper function, but it does make things a good bit
more complicated relative to the namespace-based approach, and possibly
also more fragile.
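
To make the canonicalization issue a bit more concrete, here is a rough
userland sketch (PHP 7.4+) of what a directory-based lookup would have to do.
DirectoryDeclares, declareDirectory() and declaresFor() are hypothetical names
used only for illustration; this is neither the engine implementation nor a
proposed API. It assumes plain filesystem paths and relies on realpath(),
which is exactly where stream wrappers fall through.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland model of per-directory declares.
 * None of these names exist in PHP; they are assumptions for this sketch.
 */
final class DirectoryDeclares
{
    /** @var array<string, array<string, mixed>> canonical directory => declares */
    private array $byDirectory = [];

    /** Register declares for a directory, keyed by its canonical path. */
    public function declareDirectory(string $directory, array $declares): void
    {
        $canonical = $this->canonicalize($directory);
        if ($canonical === null) {
            throw new InvalidArgumentException("Cannot canonicalize: $directory");
        }
        $this->byDirectory[$canonical] = $declares;
    }

    /** Return the declares of the closest registered ancestor directory. */
    public function declaresFor(string $file): array
    {
        $path = $this->canonicalize($file);
        if ($path === null) {
            return [];
        }
        // Walk upwards so that the longest registered prefix wins.
        for ($dir = dirname($path); ; $dir = dirname($dir)) {
            if (isset($this->byDirectory[$dir])) {
                return $this->byDirectory[$dir];
            }
            if ($dir === dirname($dir)) {
                return []; // reached the filesystem root
            }
        }
    }

    /**
     * realpath() resolves symlinks and ".." for plain files, but it costs
     * stat calls and returns false for stream wrapper URLs such as phar://.
     */
    private function canonicalize(string $path): ?string
    {
        $real = realpath($path);
        return $real === false ? null : $real;
    }
}

// Usage: build a throwaway directory tree and register declares for it.
$base = sys_get_temp_dir() . '/declares_demo_' . uniqid();
mkdir("$base/vendor/acme/lib/src", 0777, true);
touch("$base/vendor/acme/lib/src/Foo.php");

$registry = new DirectoryDeclares();
$registry->declareDirectory("$base/vendor/acme/lib", ['strict_types' => 1]);

// A non-canonical spelling of the same file still finds the declares.
$found = $registry->declaresFor("$base/vendor/acme/lib/src/../src/Foo.php");

// A stream wrapper URL cannot be canonicalized this way, so nothing matches.
$fromPhar = $registry->declaresFor('phar://app.phar/src/Foo.php');
```

Note that every lookup here goes through realpath(), i.e. it causes the kind
of stat traffic we would want to avoid under opcache, and the phar:// case
silently gets no declares at all.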

I'm somewhat stumped at this point. The most reliable approach (and in a
certain way also technically simplest) would be to have a package
declaration in every file, as suggested by Michal ... but of course this
does require adding the package declaration everywhere and complicates
migration.

Nikita

> 4. Specify declares in a special file, similar to how INI directives are
> declared. The suggestion here has been that PHP could scan the path of an
> included file upwards to find a declares.json (or similar).
>
> The main advantage I see here (over a declare_directory() function) is
> that there are no loading order issues. declare_directory() needs to be
> called before any files from that directory have been included (which is
> part of why an integration into the composer autoloader is useful), while
> for a separate and implicitly processed file this falls out naturally.
>
> Apart from that, I'm not a big fan of this proposal, mostly because of the
> implicit loading it entails. I also don't think that having one more
> configuration file for this buys us something over declaring things in
> composer.json.
>
> 5. Introduce a first-class module/package concept and support per-package
> declares. This is arguably the closest fit for what is needed, but also the
> most complex solution. This is a fairly big problem space and I personally
> do not want to pursue this outside a certain narrow scope.
>
> In particular I have serious doubts about retrofitting (at this point in
> time) an invasive module system that involves explicit export and import
> management, along the lines of what Michal is describing. (Though I will be
> happily surprised if someone comes forward with a proposal to do this in a
> non-invasive way.)
>
> What I think might be worth pursuing though, is a much weaker package
> notion that essentially grants some language-integration to the existing
> notion of composer packages. Instead of having a declare_directory() we'd
> have declare_package(), which is bound to a certain path and can be used to
> specify declares, but also used for other purposes, such as package-private
> visibility.
>
> If I may make another Rust analogy, this would be more like Rust crates
> than Rust modules. The analogy being that this is a more coarse grained
> level, and is fairly tightly integrated with the package manager (but of
> course still usable without it).
>
> Regards,
> Nikita
>
>
> On Tue, Jul 30, 2019 at 12:14 PM Michał Brzuchalski <
> michal@brzuchalski.com> wrote:
>
>> Hi Rowan,
>>
>> On Tue, 30 Jul 2019 at 10:48, Rowan Collins <rowan.collins@gmail.com>
>> wrote:
>>
>> > I think there's some confusion here, because establishing the concept
>> > of a package as separate from a namespace is exactly what I proposed.
>> >
>> > Here's a previous message (technically in the same thread, but from 18
>> > months ago) where I also mentioned class visibility:
>> > https://externals.io/message/101323#101390
>> >
>>
>> I was thinking about something similar: a package with its own identity
>> and a way to declare autoloading and other settings.
>> I was even thinking it could use a double colon, which I proposed way back
>> in the same thread. With a delimiter in the name, all related symbols
>> could be stored in per-package symbol tables, so they would not collide
>> with namespaced and global ones, and it would be easier to detect an
>> attempt to use an internal symbol in another context, such as another
>> package or global code.
>> It could introduce a few more keywords, like:
>> * "package" - for declaring the package name and declares,
>> * "export" - for explicitly declaring publicly available symbols, which
>> could then be detected, e.g. for visibility features,
>> * "expect" - for explicitly declaring required dependencies
>>
>> The last two are for future features, and the first one could be enough
>> to shape how it could look.
>> For example, some of my thoughts:
>> https://gist.github.com/brzuchal/c45010f0dd20642b470eeee8b9c56c5f
>>
>> I know it's off the main topic, but IMO we should start another thread,
>> and I'm pretty sure I've mentioned that earlier.
>> If we want to shape packages for PHP, then a separate discussion could be
>> a good idea.
>>
>> --
>> regards / pozdrawiam,
>> --
>> Michał Brzuchalski
>> about.me/brzuchal
>> brzuchalski.com
>>
>

--000000000000e74cf6058ef69a14--