Newsgroups: php.internals
From: george.banyard@gmail.com ("G. P. B.")
To: PHP internals
Date: Mon, 21 Dec 2020 19:36:54 +0100
Subject: [PHP-DEV] Follow up on Octal literal RFC

Hello internals,

The implementation for the RFC still hasn't landed due to some helpful remarks made by Tyson Andre. The issues lie not with the core functionality itself but with how to amend extensions which have a notion of numeric string literals.

I have added test cases for the extensions, but some of the behaviour is rather surprising. I'll try to detail it below; all of it is hopefully covered by a test case in my PR. [1]

GMP:

According to the GMP extension the following strings are valid numbers:

    var_dump(gmp_init('0x'));
    var_dump(gmp_init('0X'));
    var_dump(gmp_init('0b'));
    var_dump(gmp_init('0B'));

All of these evaluate to 0, but

    var_dump(gmp_init(''));

is not a valid number and will throw a TypeError.
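The GMP cases above can be reproduced with a small script along these lines. This is only a sketch: `describe_call()` is a hypothetical helper, not something from the PR, and the GMP loop only runs if ext/gmp is loaded. It reports the class of whatever is thrown rather than assuming a particular exception type.

```php
<?php
/**
 * Hypothetical helper (not part of the PR): run a callable and describe
 * either its return value or the class of the exception/error it raised.
 */
function describe_call(callable $fn): string
{
    try {
        $result = $fn();
    } catch (\Throwable $e) {
        // Report the class only; the message text may differ between versions.
        return 'throws ' . get_class($e);
    }

    // GMP objects are rendered through gmp_strval(); everything else
    // goes through var_export().
    if ($result instanceof \GMP) {
        return 'GMP(' . gmp_strval($result) . ')';
    }

    return var_export($result, true);
}

// Only exercise gmp_init() when the extension is available.
if (extension_loaded('gmp')) {
    foreach (['0x', '0X', '0b', '0B', ''] as $input) {
        $outcome = describe_call(fn () => gmp_init($input));
        echo var_export($input, true), ' => ', $outcome, PHP_EOL;
    }
}
```

The same helper can wrap any of the other calls discussed in this mail, which makes it easier to compare the extensions side by side.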
Filter:

According to the filter extension the following strings are valid numbers:

    var_dump(filter_var('0x', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX));
    var_dump(filter_var('0X', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX));

Both evaluate to 0, but the following octals

    var_dump(filter_var('O', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_OCTAL));
    var_dump(filter_var('', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_OCTAL));

are invalid and evaluate to false. Whether the case '0' should be considered a valid integer is debatable, but the following case is also invalid according to the filter extension:

    var_dump(filter_var("010", FILTER_VALIDATE_INT));

as it is interpreted as an octal number and not as a decimal one.

Base conversion functions in standard math lib:

We'll be looking at base_convert(), as it exhibits the same behaviour as bindec(), octdec(), and hexdec() (except for one case which is covered later).

    // Binary to decimal:
    var_dump(base_convert('0b', 2, 10));
    var_dump(base_convert('0B', 2, 10));
    var_dump(base_convert('', 2, 10));
    // Octal to decimal:
    var_dump(base_convert('0o', 8, 10));
    var_dump(base_convert('0O', 8, 10));
    var_dump(base_convert('', 8, 10));
    // Hexadecimal to decimal:
    var_dump(base_convert('0x', 16, 10));
    var_dump(base_convert('0X', 16, 10));
    var_dump(base_convert('', 16, 10));

These all evaluate to 0 (base_convert() returns a string and thus "0"; the explicit functions return an integer).

Now, onto the weird special case, which looks like a bug in the implementation of base_convert():

    var_dump(base_convert('O', 8, 10));

This emits the following deprecation notice (but only if the starting base is 8):

    Deprecated: Invalid characters passed for attempted conversion, these have been ignored in %s on line %d
    string(1) "0"

octdec(), however, does not emit it.

As you can see, the behaviour is rather suboptimal and inconsistent. I can see a couple of ways to handle this:

1. Make the octal prefix behave according to the respective extension: no BC break, but more surprising results.
2. Make the octal prefix behaviour of the GMP and filter extensions sane (as base_convert() and co. already support it) and leave the current behaviour for hex (and binary for GMP) as it is.
3. Same as 2, but warn for these edge cases so that we can error out in PHP 9.
4. Make a BC break in PHP 8.1 and remove all the edge cases for GMP and the filter extension; what to do with base_convert() and co. would still be up for debate.
5. Something else?

On top of this, I wonder whether we should add the following flags to the filter extension: FILTER_FLAG_ALLOW_EXPLICIT_OCTAL and FILTER_FLAG_ALLOW_IMPLICIT_OCTAL. This would remove the ambiguity and let us later deprecate the usage of FILTER_FLAG_ALLOW_OCTAL without at least one of these two flags. (One could also add a flag to allow a leading 0 for decimal numbers.)

Hope to hear your thoughts about this.

Best regards,

George P. Banyard

[1] https://github.com/php/php-src/pull/6360