Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:103665
To: internals@lists.php.net
Date: Wed, 02 Jan 2019 14:50:48 +0100
In-Reply-To: <a7761c2f-5716-279b-d701-5f588b856277@gmx.de>
References: <a7761c2f-5716-279b-d701-5f588b856277@gmx.de>
User-Agent: tt v. 1.0.5; www.icosaedro.it/tt
Subject: Re: Inconsistent float to string vs. string to float casting
From: salsi@icosaedro.it (Umberto Salsi)
Message-ID: <php.internals-103665@news.php.net>

cmbecker69@gmx.de ("Christoph M. Becker") wrote:

> [...] I tend to prefer the non-locale aware behavior, i.e. float to
> string conversion should always produce a decimal *point*.  Users still
> can explicitly use number_format() or NumberFormatter if they wish.

We all agree that the basic features of the language should NOT be
locale-aware, because this makes error reporting and logging, data file
writing and parsing, session management, and library portability easier.
But I would like to restate this goal more clearly:

FLOAT TO STRING CAST CONVERSION REPLACEMENT

Given a floating-point value, retrieve its canonical PHP source-code
string representation. By "canonical" I mean something that the PHP
interpreter parses as a floating-point number, not as an int or anything
else. For example, 123.0 must be rendered as "123.0", not as "123",
because the latter looks like an int; the non-finite values NAN and INF
must be rendered as "NAN" and "INF". The "(string)" cast and a variable
embedded in a string ("$f") are locale-aware, and so are printf() and
related functions, including var_dump() (the latter was a big surprise;
is anyone willing to send a data structure dump to the end user?).
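
A minimal sketch to observe this on a given installation. The helper
name and the locale names are only assumptions for illustration; the
locales may not be installed, in which case setlocale() returns FALSE
and nothing changes. Results also depend on the PHP version: from PHP
8.0 on, the cast itself no longer follows LC_NUMERIC, while "%f" in
sprintf() still does and "%F" never does.

```php
<?php
/**
 * Hypothetical helper: converts $f in several ways while LC_NUMERIC is
 * switched to the first available locale listed in $locales.
 */
function conversionsUnderLocale(float $f, array $locales): array
{
    $saved  = setlocale(LC_NUMERIC, '0');       // "0" only queries the current setting
    $active = setlocale(LC_NUMERIC, $locales);  // FALSE if none of them is installed
    $result = [
        'locale'     => $active,
        '(string)'   => (string) $f,            // locale-aware before PHP 8.0
        'embedded'   => "$f",                   // same conversion as the cast
        'sprintf %f' => sprintf('%.1f', $f),    // locale-aware
        'sprintf %F' => sprintf('%.1F', $f),    // non-locale-aware variant
        'var_export' => var_export($f, true),   // PHP source-code syntax
    ];
    setlocale(LC_NUMERIC, $saved);              // restore the previous setting
    return $result;
}

// The locale names are guesses; adjust them to what the system provides.
print_r(conversionsUnderLocale(1.5, ['de_DE.UTF-8', 'de_DE', 'German']));
```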

The simplest way I found to get such canonical representation is

	$s = var_export($f, TRUE);

which returns exactly what I expect, does not depend on the current
locale, does not depend on exotic libraries, and it is very short and
simple.  It depends only on the current serialize_precision php.ini
parameter, which should already be set right (or you are going to have
problems elsewhere).
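
As a small illustration of the above, wrapped in a function whose name
(floatToString) is just my choice here. The digits printed for values
like 0.1 depend on serialize_precision, so I do not show them.

```php
<?php
/**
 * Sketch: canonical PHP source-code representation of a float.
 * Relies only on var_export(); the function name is hypothetical.
 */
function floatToString(float $f): string
{
    return var_export($f, true);
}

foreach ([123.0, 0.5, 0.1, -0.0, NAN, INF, -INF] as $f) {
    echo floatToString($f), "\n";
}
// 123.0 is rendered as "123.0" (not "123"), and the non-finite values
// as "NAN", "INF" and "-INF".
```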

STRING TO FLOAT CAST CONVERSION REPLACEMENT

Given a string carrying the canonical representation of a floating-point
number, retrieve the floating-point number. Syntax errors must be
detectable. The result must be "float", not int or anything else.
I am unsure how strict the parser should be in these edge cases:

"+1.2" (redundant plus sign)
"123" (looks like int, not a float)
"0123" (looks like int octal base)

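For comparison, here is a sketch of what the plain "(float)" cast makes
of these edge cases and of a few other inputs. The helper name is
hypothetical; the point is only that the cast accepts everything and
never reports an error.

```php
<?php
/**
 * Hypothetical helper: applies the plain (float) cast to each string and
 * returns the results keyed by the input.
 */
function castReport(array $inputs): array
{
    $report = [];
    foreach ($inputs as $s) {
        $report[$s] = (float) $s;   // never fails, never warns
    }
    return $report;
}

$report = castReport(['+1.2', '0123', 'NAN', 'INF', 'abc', '1.5abc']);
foreach ($report as $s => $f) {
    echo var_export($s, true), ' => ', var_export($f, true), "\n";
}
// '0123' is read as decimal 123.0; 'NAN', 'INF' and 'abc' all give 0.0;
// '1.5abc' gives 1.5, silently dropping the trailing garbage.
```
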
Getting all this is a bit more tricky.  The "(float)" cast does not work
because it does not support the non-finite values NAN and INF and does
not allow errors to be detected.  The simplest way I found is by using
the unserialize() function:

/**
 * Parses the PHP canonical representation of a floating point number. This
 * function parses any valid PHP source code representation of a "float",
 * including NAN, INF, -INF and -0 (IEEE 754 zero negative). Not locale aware.
 * @param string $s String to parse. No spaces allowed, apply trim() if needed.
 * @return float Parsed floating-point number.
 * @throws InvalidArgumentException Invalid syntax.
 */
function parseFloat($s)
{
	// Security: untrusted strings must be checked against a basic syntax before
	// being blindly submitted to unserialize():
	if( preg_match("/^[-+]?(NAN|INF|[-+.0-9eE]++)\$/sD", $s) !== 1 )
		throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
	// unserialize() raises an E_NOTICE on parse error and then returns FALSE.
	$m = @unserialize("d:$s;");
	if( is_int($m) )
		return (float) $m; // always return what we promised
	if( is_float($m) )
		return $m;
	throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
}

Here again, only core libraries are involved and there is no dependency
on the locale; it is not so short, but it is the best I have found up to
now. Things like NumberFormatter require the 'intl' extension to be
enabled, and often it isn't.

By using these functions, all the possible "float" values survive the
round trip back and forth, including NAN, INF, -INF and -0 (negative
zero, for what it's worth), at the highest accuracy possible for the
IEEE 754 representation.
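
A self-contained sketch of such a round-trip check. It calls
var_export() and unserialize() directly on strings it has generated
itself, so no syntax check is needed here. The helper names are
hypothetical, and forcing serialize_precision to -1 is my assumption
(it requires PHP 7.1 or later; 17 should serve the same purpose on
older versions).

```php
<?php
// Assumption: -1 selects the shortest representation that round-trips.
ini_set('serialize_precision', '-1');

/** Bitwise comparison, so that -0.0 differs from 0.0 and NAN equals NAN. */
function sameFloat(float $a, float $b): bool
{
    if (is_nan($a) || is_nan($b)) {
        return is_nan($a) && is_nan($b);
    }
    return pack('d', $a) === pack('d', $b);
}

/** Float -> canonical string -> float. */
function roundTrip(float $f): float
{
    $s = var_export($f, true);
    $g = unserialize("d:$s;");
    if (!is_float($g)) {
        throw new RuntimeException("round trip failed for '$s'");
    }
    return $g;
}

foreach ([0.1, 1 / 3, 123.0, -0.0, 1.0e100, NAN, INF, -INF] as $f) {
    $ok = sameFloat($f, roundTrip($f));
    echo var_export($f, true), ': ', ($ok ? 'ok' : 'MISMATCH'), "\n";
}
```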


Regards,
 ___ 
/_|_\  Umberto Salsi
\/_\/  www.icosaedro.it