From: arnaud.lb@gmail.com (Arnaud Le Blanc)
Date: Sat, 22 Jun 2024 20:58:48 +0200
Subject: Re: [PHP-DEV] [Early Feedback] Pattern matching
To: Robert Landers
Cc: Niels Dossche, internals@lists.php.net
Newsgroups: php.internals

On Fri, Jun 21, 2024 at 7:20 PM Robert Landers <landers.robert@gmail.com> wrote:
> > > I'm always surprised why arrays can't keep track of their internal
> > > types. Every time an item is added to the map, just chuck in the type
> > > and a count, then if it is removed, decrement the counter, and if
> > > zero, remove the type. Thus checking if an array is `array<int>`
> > > should be a near O(1) operation. Memory usage might be an issue (a
> > > couple bytes per type in the array), but not terrible.... but then
> > > again, I've been digging into the type system quite a bit over the
> > > last few months.
> >
> > And every time a modification happens, directly or indirectly, you'll
> > have to modify the counts too. Given how much arrays / hash tables are
> > used within the PHP codebase, this will eventually add up to a lot of
> > overhead. A lot of internal functions that work with arrays will need
> > to be audited and updated too. Lots of potential for introducing bugs.
> > It's (unfortunately) not a matter of "just" adding some counts.
>
> Well, of course, nothing in software is "just" anything.

Counters are not cheap: we need one slot for each type in the array, so we need a dynamic buffer and an indirection, and absolutely every mutation needs to update a counter, including writes to references. It is possible to drop the counters and instead maintain an optimistic upper bound of the type (computing the type more precisely when type checking fails), but I feel this would not work well with pattern matching.
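
To make the two options above concrete, here is a minimal sketch in Python. It is a toy model, not php-src code: the class names `CountedArray` and `BoundedArray`, their methods, and the use of Python type names are all assumptions made for illustration.

```python
from collections import Counter


class CountedArray:
    """Toy model (hypothetical): an array keeping one counter per element type."""

    def __init__(self):
        self.items = {}
        self.type_counts = Counter()  # stands in for the dynamic buffer of slots

    def set(self, key, value):
        # An overwrite pays for two counter updates: old type down, new type up.
        if key in self.items:
            self._forget(self.items[key])
        self.items[key] = value
        self.type_counts[type(value).__name__] += 1

    def unset(self, key):
        if key in self.items:
            self._forget(self.items.pop(key))

    def _forget(self, value):
        name = type(value).__name__
        self.type_counts[name] -= 1
        if self.type_counts[name] == 0:
            del self.type_counts[name]  # drop the slot once no element uses it

    def is_array_of(self, type_name):
        # Looks only at the type slots, never at the elements themselves.
        return set(self.type_counts) <= {type_name}


class BoundedArray:
    """Toy model (hypothetical): only an optimistic upper bound of the types."""

    def __init__(self):
        self.items = {}
        self.bound = set()  # superset of the types actually present

    def set(self, key, value):
        self.items[key] = value
        self.bound.add(type(value).__name__)  # writes only ever widen the bound

    def unset(self, key):
        self.items.pop(key, None)  # the bound is left as-is and may be stale

    def is_array_of(self, type_name):
        if self.bound <= {type_name}:
            return True  # fast path: the bound already proves the type
        # The check failed against the bound, so compute the precise type
        # by scanning every element, then retry.
        self.bound = {type(v).__name__ for v in self.items.values()}
        return self.bound <= {type_name}
```

In this model the counter variant does bookkeeping on every `set` and `unset`, while the upper-bound variant keeps writes cheap and falls back to a full scan whenever a check fails against the cached bound.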

Also, a few things complicate this:
- Nested writes like $a[0][0][0]=1 need to backtrack to update the type of all parent arrays after the element is added/updated
- Supporting references to properties whose type is a typed array, or dimensions of these properties, is very hard
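
The nested-write point can be sketched as follows. This is a toy Python model, not engine code: `TypedArray`, `describe`, and `nested_write` are hypothetical names, type names are Python's (`int`, `str`), and intermediate elements are assumed to be either missing or already arrays.

```python
class TypedArray:
    """Toy model (hypothetical): an array that caches its element type."""

    def __init__(self):
        self.items = {}
        self.elem_type = "never"  # "never" marks an empty array in this model

    def refresh(self):
        # Recompute the cached element type from the current elements.
        types = sorted({describe(v) for v in self.items.values()})
        self.elem_type = "|".join(types) if types else "never"


def describe(value):
    """Return a structural type name, recursing through cached array types."""
    if isinstance(value, TypedArray):
        return f"array<{value.elem_type}>"
    return type(value).__name__


def nested_write(root, path, value):
    """Model of $a[k1][k2]...[kn] = value, creating missing intermediate arrays."""
    chain = [root]
    for key in path[:-1]:
        # Assumption: an intermediate element is either missing or an array.
        child = chain[-1].items.setdefault(key, TypedArray())
        chain.append(child)
    chain[-1].items[path[-1]] = value
    # Backtrack: the leaf write changed the innermost type, which in turn
    # changes the type of every enclosing array, so refresh inside-out.
    for arr in reversed(chain):
        arr.refresh()
    return len(chain)  # number of arrays whose cached type was revisited


a = TypedArray()
touched = nested_write(a, [0, 0, 0], 1)  # models $a[0][0][0] = 1
```

A single leaf assignment here revisits three arrays, one per nesting level, which is the backtracking cost the first bullet describes.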

Fixed-type arrays may be easier to support but there are important drawbacks in usability IMHO. This does not play well with CoW semantics.
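
The message does not spell out how the interaction with CoW would surface. One possible reading, sketched below as a toy Python model, assumes that a fixed-type array carries its element type along when it is copied. `FixedArray` and its methods are hypothetical names, not an existing or proposed PHP API.

```python
class FixedArray:
    """Toy model (hypothetical): fixed element type plus copy-on-write storage."""

    def __init__(self, elem_type, storage=None):
        self.elem_type = elem_type
        # The storage is shared between copies until one of them writes.
        self._storage = storage if storage is not None else {"items": {}, "refs": 1}

    def copy(self):
        # Models `$b = $a`: share the storage, only bump the share count.
        self._storage["refs"] += 1
        return FixedArray(self.elem_type, self._storage)

    def set(self, key, value):
        if not isinstance(value, self.elem_type):
            raise TypeError(f"expected {self.elem_type.__name__}")
        if self._storage["refs"] > 1:
            # Separate before writing, as copy-on-write requires.
            self._storage["refs"] -= 1
            self._storage = {"items": dict(self._storage["items"]), "refs": 1}
        self._storage["items"][key] = value

    def items(self):
        return dict(self._storage["items"])


a = FixedArray(int)
a.set(0, 1)
b = a.copy()   # shares storage with a
b.set(1, 2)    # separates b from a; a is unchanged
```

Under this assumption `b` becomes an independent value after its first write, yet `b.set(2, "x")` would still raise, because the fixed type travelled with the copy. Code that received the array by value and wanted to store another type would have to build a new array first.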

Best Regards,
Arnaud