Newsgroups: php.internals
Date: Wed, 02 Jan 2019 14:50:48 +0100
From: salsi@icosaedro.it (Umberto Salsi)
Subject: Re: Inconsistent float to string vs. string to float casting

cmbecker69@gmx.de ("Christoph M. Becker") wrote:

> [...] I tend to prefer the non-locale aware behavior, i.e. float to
> string conversion should always produce a decimal *point*. Users still
> can explicitly use number_format() or NumberFormatter if they wish.

We all agree that the basic features of the language should NOT be locale-aware, so that error reporting and logging, data file writing and parsing, session management, and library portability are easier. But I would like to restate this goal more clearly:

FLOAT TO STRING CAST CONVERSION REPLACEMENT

Given a floating-point value, retrieve its canonical PHP source-code string representation. By "canonical" I mean something that the PHP interpreter parses as a floating-point number, not as an int or anything else. So, for example, 123.0 must be rendered as "123.0", not as "123", because the latter looks like an int; the non-finite values NAN and INF must also be rendered as "NAN" and "INF".

The "(string)" cast and the variable embedded in a string, "$f", are locale-aware, and so are all the printf() & Co. functions, including var_dump() (the latter was a big surprise to me: who would send a data structure dump to an end user?).
The simplest way I found to get such a canonical representation is

	$s = var_export($f, TRUE);

which returns exactly what I expect, does not depend on the current locale or on exotic libraries, and is very short and simple. It depends only on the current serialize_precision php.ini parameter, which should already be set correctly (or you are going to have problems elsewhere).

STRING TO FLOAT CAST CONVERSION REPLACEMENT

Given a string carrying the canonical representation of a floating-point number, retrieve the floating-point number. Syntax errors must be detectable. The result must be a "float", not an int or anything else. I am unsure how strict the parser should be in these edge cases:

	"+1.2"  (redundant plus sign)
	"123"   (looks like an int, not a float)
	"0123"  (looks like an octal int)

Getting all this right is a bit more tricky. The "(float)" cast does not work, because it does not support the non-finite values NAN and INF and gives no way to detect errors. The simplest way I found is to use the unserialize() function:

	/**
	 * Parses the PHP canonical representation of a floating point number. This
	 * function parses any valid PHP source code representation of a "float",
	 * including NAN, INF, -INF and -0 (IEEE 754 negative zero). Not locale-aware.
	 * @param string $s String to parse. No spaces allowed, apply trim() if needed.
	 * @return float Parsed floating-point number.
	 * @throws InvalidArgumentException Invalid syntax.
	 */
	function parseFloat($s)
	{
		// Security: untrusted strings must be checked against a basic syntax before
		// being blindly submitted to unserialize():
		if( preg_match("/^[-+]?(NAN|INF|[-+.0-9eE]++)\$/sD", $s) !== 1 )
			throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
		// unserialize() raises an E_NOTICE on parse error and then returns FALSE.
		$m = @unserialize("d:$s;");
		if( is_int($m) )
			return (float) $m; // always return what we promised
		if( is_float($m) )
			return $m;
		throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
	}

Here again only core functions are involved and there is no dependency on the locale; it is not as short, but it is the best I have found so far. Things like NumberFormatter require the 'intl' extension to be enabled, and often it is not.

With these two functions every possible "float" value survives the round trip back and forth, including NAN, INF, -INF and -0 (negative zero, for what it's worth), at the highest accuracy the IEEE 754 representation allows.

Regards,
 ___
/_|_\  Umberto Salsi
\/_\/  www.icosaedro.it
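A quick way to check the round-trip claim is a small harness over the edge values mentioned above. This is a sketch, assuming PHP 7.2 or later (for PHP_FLOAT_MAX, PHP_FLOAT_EPSILON and pack('E')) with serialize_precision = -1. formatFloat(), unformatFloat() and sameFloat() are hypothetical helper names; unformatFloat() is a condensed version of parseFloat() above. Values are compared bit-for-bit with pack('E', ...) so that -0.0 is not mistaken for 0.0, and NAN is checked with is_nan() because NAN never compares equal to itself.

```php
<?php
// Hypothetical helpers for this sketch; assumes PHP >= 7.2.

function formatFloat(float $f): string
{
    // Locale-independent canonical form, e.g. "123.0", "NAN", "-INF", "-0.0".
    return var_export($f, true);
}

function unformatFloat(string $s): float
{
    // Same basic syntax check as parseFloat() before trusting unserialize().
    if (preg_match('/^[-+]?(NAN|INF|[-+.0-9eE]++)$/D', $s) !== 1) {
        throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
    }
    $m = @unserialize("d:$s;");
    // Anything other than a float (FALSE on a parse error) is rejected.
    if (!is_float($m)) {
        throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
    }
    return $m;
}

function sameFloat(float $a, float $b): bool
{
    // NAN never equals itself, so handle it separately; pack('E') compares the
    // raw IEEE 754 bits, which also distinguishes 0.0 from -0.0.
    if (is_nan($a) || is_nan($b)) {
        return is_nan($a) && is_nan($b);
    }
    return pack('E', $a) === pack('E', $b);
}

// Shortest representation that still round-trips exactly (default since PHP 7.1).
ini_set('serialize_precision', '-1');

$samples = [0.0, -0.0, 123.0, 0.1, -2.5e-300, 1e100,
    PHP_FLOAT_MAX, PHP_FLOAT_EPSILON, INF, -INF, NAN];

foreach ($samples as $f) {
    $s = formatFloat($f);
    printf("%-25s %s\n", $s, sameFloat($f, unformatFloat($s)) ? 'ok' : 'MISMATCH');
}
```

Comparing bits rather than using == is the design choice that lets the harness actually detect a lost sign on zero.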