Newsgroups: php.internals
Xref: news.php.net php.internals:119789
Date: Thu, 30 Mar 2023 14:48:58 +0100
From: george.banyard@gmail.com ("G. P. B.")
To: Mark Baker
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] Define proper semantics for range() function

On Thu, 30 Mar 2023 at 03:50, Mark Baker wrote:

> On 28/03/2023 00:36, G. P. B. wrote:
> > Hello internals,
> >
> > While working on analysing the impact of the changes proposed by amending
> > the behaviour of the increment and decrement operators
> > (https://wiki.php.net/rfc/saner-inc-dec-operators) I discovered that the
> > range() function has some rather lax behaviour that is very unintuitive.
> >
> > I therefore propose the "Define proper semantics for range() function" RFC
> > to address the unintuitive behaviour that sees no usage and/or hides bugs:
> > https://wiki.php.net/rfc/proper-range-semantics
> >
> > The change proposes to throw TypeErrors and ValueErrors for cases where I
> > couldn't find occurrences in the wild and that hide bugs, and emit some
> > E_WARNINGs for cases that are hard to detect via static analysis.
>
> Unlike your changes to the increment operator, I'd love to see this
> rationalisation put in place, though like many here I don't see problems
> with using a negative step with decreasing ranges, but would consider it
> strange for increasing ranges.

I still find it somewhat odd, but this is not a hill I'm going to die on.
I've changed the behaviour to throw a ValueError if a negative step is
provided with an increasing range, and to accept negative steps for
decreasing ranges. Furthermore, I've also made passing an empty string emit
an E_WARNING while keeping the current behaviour of casting it to 0.

See the new version: https://wiki.php.net/rfc/proper-range-semantics

> And I do want to see some
> case-consistency when working with string ranges.
>
> I'd love to see it taken a stage (or two) further; returning an iterable
> rather than an array (although that would be a bc break); and working
> with strings (ASCII only) in the same way that the increment operator
> does, so that range('A', 'IV') would be valid, and return `Z` then `AA`,
> `AZ` then `BA`, etc.

Frankly, I was also surprised that the behaviour with strings was to do an
ASCII code point increment. I agree that range("Y", "AC") returning
["Y", "Z", "AA", "AB", "AC"] would have been more intuitive than silently
discarding everything past the first byte.

However, I don't think there is much point in breaking BC to return a
possible generator or fix the unfortunate string behaviour.
I would rather that PHP create dedicated syntax to create ranges (e.g.
$s..$e seems to be what most other programming languages settle on,
although it might be slightly confused with concatenation) à la Ruby, which
allows objects that implement certain methods to also be used to generate
ranges.
This is IMHO way more powerful, as it would allow the creation of Date
ranges or other custom ranges. Part of this proposal could be to support
the aforementioned alphabetical string ranges natively without needing to
break BC on range(), and let this function just fade away into obscurity.

There is also this C++ talk from over a decade ago that argues that ranges
are better than iterators, so this might be additional motivation for why
we would want this:
https://accu.org/conf-docs/PDFs_2009/AndreiAlexandrescu_iterators-must-go.pdf

> I am slightly surprised that you make no mention of the odd behaviour of
> mixed alphameric strings, e.g. var_dump(range('A1', 'C5')) which returns
> a purely alpha array 'A' to 'C'; or var_dump(range('3c', '5e')) which
> returns numeric (3, 4, 5); or var_dump(range('1', '1e2')) which treats
> `1e2` as scientific and returns 1..100.

Because I didn't think of this, and it was just, well, the usual numeric
string behaviour or the non-numeric string behaviour that truncates the
string.
But the fact that range('3c', '5e') is the only way to get an array of
digits as strings makes me want to shout into the abyss.

I'm not sure it is really worth mentioning those cases, but I can add
examples of this to the RFC after crying about the even more insane
behaviour range() currently has.

Best regards,

George P. Banyard