Newsgroups: php.internals
From: jordan.ledoux@gmail.com (Jordan LeDoux)
Date: Tue, 20 Aug 2024 22:09:55 -0700
Subject: Re: [PHP-DEV] State of Generics and Collections
To: Arnaud Le Blanc
Cc: Bob Weinand, Derick Rethans, PHP Developers Mailing List
References: <1b59392a-68cb-36eb-0fef-977ac7113520@php.net>

On Tue, Aug 20, 2024 at 6:02 AM Arnaud Le Blanc <arnaud.lb@gmail.com> wrote:

> Hi Bob,
>
> On Tue, Aug 20, 2024 at 12:18 AM Bob Weinand <bobwei9@hotmail.com> wrote:
> > The fluid Arrays section says "A PoC has been implemented, but the
> > performance impact is still uncertain". Where may I find that PoC for my
> > curiosity? I'm imagining the implementation of the array types as a
> > counted collection of types of the entries. But without the PoC I may
> > only guess.
>
> I may publish the PoC at some point, but in the meantime here is a
> short description of how it's implemented:
>
> - The zend_array has a zend_type member representing the type of its
> elements
> - Every time we add or update a member, we union its type with the
> array type. For simple types it's just a |= operation. For arrays with
> a single class it's also simple. For complex types it's currently more
> expensive, but it may be possible to cache transitions to make this
> cheaper.
> - Updating the array type on deletes requires either maintaining a
> counter for every type or re-computing the type entirely every time.
> Both are probably too expensive. Instead, we don't update the type on
> deletes, but we re-compute the type entirely when a type check fails.
> This is based on two hypotheses: 1. A delete rarely changes an array's
> type in practice, and 2. Type checks rarely fail
> - References are treated as mixed, so adding a reference to an array
> or taking a reference to an element changes its type to mixed. Passing
> an array<mixed> to a more specific array<something> will cause a
> re-compute, which also de-refs every reference.
> - Updating a nested element requires updating the type of every parent
>
> > It also says "Another issue is that [...] typed properties may not be
> > possible.". Why would that be the case? Essentially a typed property
> > would just be a static array, which you describe in the section right
> > below.
>
> It becomes complicated when arrays contain references or nested
> arrays. Type constraints must be propagated to nested arrays, but also
> removed when an array is not reachable via a typed property anymore.
>
> E.g.
>
> class C {
>     public array<array<int>> $prop;
> }
>
> $a = &$c->prop[0];
> $a[] = 'string'; // must be an error
> unset($c->prop[0]);
> $a[] = 'string'; // must be accepted
>
> $b = &$c->prop[1];
> $b[] = 'string'; // must be an error
> $c->prop = [];
> $a[] = 'string'; // must be accepted
>
> I don't remember all the possible cases, but I didn't find a way to
> support this that didn't involve recursively scanning an array at some
> point. IIRC, without references it's less of an issue, so a possible
> way forward would be to forbid references to members of typed
> properties. Unfortunately this breaks pass-by-reference, e.g.
> `sort($c->prop)`. out/inout parameters may be part of a solution, but
> with more array separations than pass-by-ref.
>
> Best Regards,
> Arnaud
>

Another issue that I don't see mentioned, and that follows naturally from a
conversation I had with you a few weeks ago, is operators on arrays: namely,
the behavior of the `+` operator when used with arrays. How this would
interact with generics, and with the different approaches to generics and
arrays, is probably something that will require attention. Operators in
general present some challenges (complicated ones, though not unsolvable) to
languages that try to combine generics with loose types, because operators
generally give the programmer no way to help the engine with typing during
evaluation.
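
For anyone following along without the PoC, the bookkeeping in Arnaud's list (union the element type on every write, leave the array type alone on deletes, re-compute only when a type check fails) can be sketched roughly as below. This is a toy model under loud assumptions, not the PoC and not the real zend_array or zend_type layout: `toy_array`, the `TOY_T_*` bits, and every function name are made up for illustration, and references and nested arrays are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type bits; NOT the real zend_type encoding. */
#define TOY_T_INT    ((uint32_t)1 << 0)
#define TOY_T_STRING ((uint32_t)1 << 1)
#define TOY_T_FLOAT  ((uint32_t)1 << 2)

#define TOY_CAP 16

/* Toy stand-in for an array that tracks the type of its elements. */
typedef struct {
    uint32_t elem_type[TOY_CAP]; /* type bit of each slot */
    bool     used[TOY_CAP];      /* whether the slot holds a live element */
    uint32_t type_mask;          /* union of element types; may be too wide after deletes */
    unsigned recomputes;         /* number of full rescans, kept only for illustration */
} toy_array;

/* Add or update: union the element's type into the array type (the "|=" case). */
void toy_set(toy_array *a, size_t idx, uint32_t type)
{
    assert(idx < TOY_CAP);
    a->elem_type[idx] = type;
    a->used[idx] = true;
    a->type_mask |= type;
}

/* Delete: deliberately leave type_mask untouched, so it can become stale. */
void toy_unset(toy_array *a, size_t idx)
{
    assert(idx < TOY_CAP);
    a->used[idx] = false;
}

/* Full rescan: rebuild the exact mask from the live elements. */
void toy_recompute(toy_array *a)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < TOY_CAP; i++) {
        if (a->used[i]) {
            mask |= a->elem_type[i];
        }
    }
    a->type_mask = mask;
    a->recomputes++;
}

/* Type check: true if every live element's type is within `allowed`. */
bool toy_check(toy_array *a, uint32_t allowed)
{
    if ((a->type_mask & ~allowed) == 0) {
        return true;              /* fast path: tracked mask already fits */
    }
    toy_recompute(a);             /* slow path: the mask may be stale after deletes */
    return (a->type_mask & ~allowed) == 0;
}
```

In this sketch, storing an int and a string and then unsetting the string leaves the tracked mask at int|string. The first check against int misses the fast path, rescans once, and then passes; later checks take the fast path again. That only pays off if both of the hypotheses in the quoted list hold.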

Jordan