Newsgroups: php.internals
Date: Sat, 13 Apr 2019 19:34:35 +0200
From: claude.pache@gmail.com (Claude Pache)
To: Nikita Popov
Cc: Andrea Faulds, PHP internals
Subject: Re: [PHP-DEV] [RFC] Permit trailing whitespace in numeric strings
Message-ID: <8081ADE5-DB6E-42B5-9BE4-CB8ED98A25FF@gmail.com>
References: <40683e93-f8e9-5a8c-9646-31c73c99396f@fischer.name> <5ca53eb4.1c69fb81.e223b.922eSMTPIN_ADDED_MISSING@mx.google.com>

> On 9 Apr 2019, at 12:47, Nikita Popov wrote:
>
> On Thu, Apr 4, 2019 at 1:16 AM Andrea Faulds wrote:
>
>> Nikita Popov wrote:
>>> I'm always a fan of making things stricter, but think that in this
>>> particular case there are some additional considerations we should
>>> keep in mind.
>>>
>>> 1. What is more important to me here than strictness is consistency.
>>> Either both " 123" and "123 " are numeric, or neither are. Making
>>> "123 " numeric is a change we can easily do, because it makes the
>>> numeric string definition more permissive and is thus mostly backwards
>>> compatible. Doing the reverse change is certainly not compatible and
>>> will be a much harder sell.
>>>
>>> 2. I believe that a large part of the motivation here is that by
>>> making the numeric string definition slightly more lax (in a
>>> consistent manner), we can make *other* things more strict, because
>>> this essentially eliminates the only "somewhat reasonable" case of
>>> trailing characters.
>>> The RFC already mentions two of them:
>>>
>>> a) We can hard reject "123foo" inputs to "int" arguments (and some
>>> other places). Currently this is allowed with a notice. I think if we
>>> resolve the trailing whitespace question, then there cannot be any
>>> reasonable opposition to this change.
>>> b) My own RFC on number to string comparisons would benefit from this.
>>> From initial testing it has surprisingly little impact, but one of the
>>> few cases that turned up was this comparison with a string that had
>>> trailing whitespace.
>>>
>>> Personally I think both of those changes are a lot more valuable than
>>> a stricter numeric string definition without leading/trailing
>>> whitespace.
>>
>> I'm kinda unsure how to go forward because of these points. I would like
>> to see improved comparisons, and I would like to see the end of the
>> “non-well-formed” numeric string, and I think this whitespace RFC could
>> be helpful to both. But I can't see the future, I don't know whether
>> people will vote for removing leading or permitting trailing whitespace
>> and whether or not they will be influenced by or this will influence
>> opinion on the further improvements. ¯\_(ツ)_/¯
>>
>> I'm torn between:
>>
>> * Vote on allowing trailing whitespace
>> * Vote on disallowing leading whitespace
>> * Vote on which of those two approaches to go for
>> * Trying to bundle everything together and voting on it as a package.
>>
>> I'm probably thinking too strategically.
>>
>
> Given the response on the mailing list (and also other places like Reddit),
> it seems like people feel pretty strongly that it's better to drop support
> for leading whitespace than add support for trailing whitespace.
> If we do this, I think we should couple this change with the removal of
> "non well-formed numeric strings", because they are so closely related (one
> change would forbid leading whitespace and the other trailing characters).
>
> One possible course of action would be:
>
> a) In PHP 7.4 throw a deprecation warning in is_numeric_string if there is
> leading whitespace (always).
> b) In PHP 7.4 throw a deprecation warning in is_numeric_string if there are
> trailing characters in mode 1 (mode -1 already throws a notice and 0
> already treats as non-numeric).
> c) In PHP 8.0 treat leading whitespace as non-numeric (always).
> d) In PHP 8.0 treat trailing characters as non-numeric (always), and
> remove the non well-formed distinction (mode -1).
>
> Notably this also affects (int) behavior in that (int) " 42" will be 0
> and (int) "42xyz" will be 0.
>
> A less aggressive alternative would be:
>
> a) In PHP 7.4 throw a deprecation warning in is_numeric_string if there is
> leading whitespace (unless mode is 1).
> b) In PHP 8.0 treat leading whitespace as non-numeric (unless mode is 1).
> c) In PHP 8.0 treat trailing characters as non-numeric (unless mode is 1).
> Remove non well-formed distinction (mode -1).
>
> This would keep the behavior of (int) as-is and only affect implicit
> numeric string checks.
>
> This discussion has mostly been around the implicit cases, what do people
> think about the desired behavior of (int)?
>
> Regards,
> Nikita

I think that, whatever change we make around “well-formed numeric string” (those treated as numeric by the comparison operator, etc.), the casting operation itself (implicit as in `$a + $b` or explicit as in `(float) $a`) must keep the same semantics, except maybe with additional notices and/or warnings for the implicit case.
People depend on those semantics, and we must not break their code, except maybe we could force them to be explicit (using (float)) if we have sufficiently good reason for that. I’ll give use cases further below.

To keep things simple, I suggest classifying strings into three categories:

1. Well-formed numeric strings; those, and those only, are considered as valid numeric values for the sake of comparison, of typed parameters, of is_numeric(), etc. Let’s be strict here, and let’s not permit whitespace.

2. Loose numeric strings. Explicit casting must continue to work as today, since people expect those semantics (see first use case below); ditto for implicit casting but with an additional notice or warning. Those strings are of the form:
   * optional whitespace, followed by
   * well-formed numeric string, followed by
   * optional garbage.

3. Non-numeric strings. Explicit casting must continue to work as today, as people may depend on that (see second use case below). A notice may be acceptable, as long as it does not push people to use the evil @ operator in more places than today. Ditto for implicit casting, but with an additional warning, unless we decide to harshly throw a Throwable if we have a good reason for that.

------

Here is a real-world use case, where I want the casting from “loose numeric string” to float to continue to work as today:

I have a CSS stylesheet. Because I find it more practical, I prefer to encode the colours in the hwb (hue-whiteness-blackness) format:

```css
:root {
    --red-hue: 350;
}
.alert {
    color: hwb(var(--red-hue), 6% , 12%);
}
```

However browsers do not (yet) understand hwb; so I must serve them as hsl (hue-saturation-lightness):

```css
.alert {
    color: hsl(var(--red-hue), 87.2%, 47%);
}
```

The transformation between hwb() and hsl() is quite simple, and I use PHP as a preprocessor (on my dev machine) for that purpose.
The algorithm does not need to include a true CSS parser, as I know in advance that the `hwb()` expressions I’ve written myself contain only either literal numbers or literal percentages for the “whiteness” and “blackness” components, and no comma in the “hue” component:

* find an occurrence of "hwb(";
* read characters up to the next ",": this is the “hue” component;
* read characters up to the next ",": this is the “whiteness” component;
* read next characters up to the next ")": this is the “blackness” component;
* if the “whiteness” (respectively the “blackness”) component contains the character "%", it is a percentage: divide it by 100;
* calculate “saturation” and “lightness” from “whiteness” and “blackness”;
* format and assemble the “hue”, “saturation” and “lightness” components into a "hsl(...)" expression.

If you have followed the algorithm, you should have noticed that at one point I have the values:

    $whiteness === " 6% ";
    $darkness === " 12%";

which are exactly of the format “loose numeric string” (whitespace, followed by well-formed numeric string, followed by garbage) described above; and that I straightforwardly treat as numeric (knowing that PHP permits that), with or without explicitly casting depending on how much I am bothered by the “A non well formed numeric value encountered” notice.

----

Here is a use case where I want a string-to-float casting that never fails (and does not change semantics either):

I have config parameters stored in a database, with values in the form of strings.
Some of those parameters are supposed to be numeric; for example, two of them may represent the offset (in millimetres) of the company logo, on printed material, relative to a predefined position. A missing value (either the empty string or null) is treated as zero. An invalid value (such as "foo") is also happily treated as zero, as I don’t bother to report the error at this point: it is straightforwardly seen whether the logo is correctly placed or not. For that purpose, I’m relying on the fact that the explicit string-to-numeric casting operation is infallible.

—Claude
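
Appendix: a minimal sketch of the three categories proposed above, for readers who want something concrete to test against. The function name `classifyNumericString()` and the `WELL_FORMED_NUMBER` grammar are hypothetical; in particular the regular expression is only an assumption about what "well-formed" would mean (optional sign, decimal digits, optional fraction and exponent, no whitespace), not the engine's actual is_numeric_string() logic. The explicit casts at the end show the behaviour that both use cases rely on.

```php
<?php
// Assumption: illustrative grammar for a "well-formed" numeric string.
// It is not taken from the engine's is_numeric_string() implementation.
const WELL_FORMED_NUMBER = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

// Hypothetical helper: sorts a string into one of the three categories.
function classifyNumericString(string $s): string
{
    // Category 1: the whole string is a number, no whitespace allowed.
    if (preg_match('/\A' . WELL_FORMED_NUMBER . '\z/', $s) === 1) {
        return 'well-formed';
    }
    // Category 2: optional whitespace, a well-formed number, optional garbage.
    if (preg_match('/\A[ \t\n\r\x0B\f]*' . WELL_FORMED_NUMBER . '/', $s) === 1) {
        return 'loose';
    }
    // Category 3: everything else.
    return 'non-numeric';
}

// Explicit casts never fail: loose strings yield their numeric prefix,
// non-numeric strings (and the empty string) yield zero.
foreach (['12.5', ' 6% ', ' 12%', '42xyz', 'foo', ''] as $s) {
    printf(
        "%-8s %-12s (float) => %s\n",
        var_export($s, true),
        classifyNumericString($s),
        var_export((float) $s, true)
    );
}
```

Under this sketch, only the first category would be accepted by comparisons, typed parameters and is_numeric(), while `(float)` and `(int)` would keep accepting all three.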
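
Appendix: the hwb-to-hsl steps listed in the first use case can be sketched as follows. This is not the preprocessor described in the message, only an illustration under stated assumptions: the function names are hypothetical, and the saturation/lightness formulas are the usual HWB to HSL conversion via the HSV value (v = 1 - blackness), which the message does not spell out. The point of interest is `componentToFraction()`, where the explicit `(float)` cast is applied to a loose numeric string such as " 6% ".

```php
<?php
// Hypothetical helper: turns a raw component such as " 6% " or "0.12" into a
// fraction. The explicit (float) cast skips leading whitespace and ignores
// trailing garbage, which is the "loose numeric string" behaviour relied on.
function componentToFraction(string $raw): float
{
    $value = (float) $raw;
    return strpos($raw, '%') !== false ? $value / 100 : $value;
}

// Assumed conversion (standard HWB -> HSV -> HSL), returning
// [saturation, lightness] as fractions between 0 and 1.
function hwbToSaturationLightness(float $w, float $b): array
{
    if ($w + $b >= 1.0) {
        // Achromatic case: the colour is a grey, saturation is zero.
        return [0.0, $w / ($w + $b)];
    }
    $v = 1.0 - $b;
    $l = ($v + $w) / 2.0;               // here 0 < $l < 1, so min() below is > 0
    $s = ($v - $l) / min($l, 1.0 - $l);
    return [$s, $l];
}

// Rewrites every "hwb(h, w, b)" as "hsl(h, s%, l%)" without a real CSS parser.
function convertHwbToHsl(string $css): string
{
    $out = '';
    $pos = 0;
    while (($start = strpos($css, 'hwb(', $pos)) !== false) {
        $hueEnd   = strpos($css, ',', $start);
        $whiteEnd = $hueEnd === false ? false : strpos($css, ',', $hueEnd + 1);
        $blackEnd = $whiteEnd === false ? false : strpos($css, ')', $whiteEnd + 1);
        if ($blackEnd === false) {
            break; // malformed expression: leave the rest untouched
        }
        $hue       = substr($css, $start + 4, $hueEnd - $start - 4);
        $whiteness = substr($css, $hueEnd + 1, $whiteEnd - $hueEnd - 1);   // e.g. " 6% "
        $blackness = substr($css, $whiteEnd + 1, $blackEnd - $whiteEnd - 1); // e.g. " 12%"

        [$s, $l] = hwbToSaturationLightness(
            componentToFraction($whiteness),
            componentToFraction($blackness)
        );
        $out .= substr($css, $pos, $start - $pos)
            . 'hsl(' . trim($hue) . ', ' . round($s * 100, 1) . '%, ' . round($l * 100, 1) . '%)';
        $pos = $blackEnd + 1;
    }
    return $out . substr($css, $pos);
}

echo convertHwbToHsl('.alert { color: hwb(var(--red-hue), 6% , 12%); }'), "\n";
```

With the `.alert` rule from the message as input, this reproduces the `hsl(var(--red-hue), 87.2%, 47%)` value quoted there. If `(float) " 6% "` stopped returning 6.0, `componentToFraction()` would need a real tokenizer instead of a cast.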