Hi Internals,
There’s an awkward hitch with removing integer support. Fair warning, we’re
about to get into some under-the-hood stuff:
During the compilation process (before any user data is involved), PHP
makes tiny optimisations to code, e.g. "a" . 1 becomes "a1". In the
parts where this happens, the developer-defined values (strings, integers,
floats and booleans) are - technically accurately, but erroneously since we
aren’t doing full ‘developer-defined’ support - treated as literal values,
and when concatenated the string gets (correctly, but erroneously) flagged
as a literal.
So, for example, a developer may wonder why $b is seen as a non-literal,
whereas $c is flagged as a literal:
$a = 1;
$b = "Hello " . $a; // Non-Literal
$c = "Hello " . 1; // Flagged Literal
This is because $b involves concatenation at run-time, and because $a is an
integer, it’s seen as a non-literal. Whereas $c has its value optimised by
the compiler into a single literal string, so it’s marked as literal.
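For anyone who wants to try this, here is a minimal sketch of the comparison. It assumes a build with the proposed is_literal() function; on a stock PHP build the function does not exist, so the hypothetical helper describe() guards the call and reports "unknown" instead.

```php
<?php
// Sketch only: is_literal() is the proposed function under discussion and
// is not part of a stock PHP build. describe() is a hypothetical helper.
function describe(string $value): string
{
    if (!function_exists('is_literal')) {
        return 'unknown (is_literal() not available)';
    }
    return is_literal($value) ? 'literal' : 'non-literal';
}

$a = 1;
$b = "Hello " . $a; // concatenated at run-time
$c = "Hello " . 1;  // folded by the compiler into a single string

var_dump($b === $c);        // the two values are identical strings
echo describe($b), PHP_EOL; // expected to differ from $c on a patched build
echo describe($c), PHP_EOL;
```

The point is that $b and $c hold the same value, so any difference in the flag comes only from how the string was built.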
Or for a second example, where the compiler can "coerce" an integer into a
string:
$a = "Hello ";
$b = $a . 2;
The compiler cannot do an optimisation based on the contents of $a, but it
can see that $a will be concatenated with the integer 2. To optimise this,
the compiler will coerce that integer into the string “2”, to make the
concatenation faster at runtime, so $b will be seen as a literal, which the
developer may find odd, due to the presence of the developer-defined
integer.
Now these aren’t security issues, and it doesn’t work the other way round:
is_literal() doesn't incorrectly report any user (non-literal) data as a
literal. But nonetheless it is still technically inconsistent - as it looks
like it accepts ‘literal integers’ at some points and not others.
(Which is why this was fine when we were including integer support -
it simply accepted integers too, so it was all consistent. As it
is, the list feedback was to not include them, as it’s not possible to
include a flag on integers to say whether they are developer-defined, in
the same way we can with strings.)
OPcache adds its own similar twist if it’s enabled, but with the added fun
that unlike PHP’s own optimisation processes, OPcache is by its nature
inconsistent when it runs, changing what it optimises and when based on a
number of factors (e.g. available memory) and so isn’t guaranteed to
optimise the code every time.
As to the specific issue, if you have:
$a = implode("-", [1, 2, 3]);
The variable $a will be set to "1-2-3", as a non-literal (because of the
use of integers).
However, if the OPcache is enabled, it can make its own optimisation,
performing the implode() call early, and storing the literal string "1-2-3"
in the OPcache, so the implode() function does not get used at runtime. In
effect the compiled script becomes:
$a = "1-2-3"; // Flagged as literal
(Which, as before, may be ‘literally’ correct but since it’s not supposed to
support integers…)
Or the OPcache, while enabled, doesn’t optimise it this time (because it’s
not guaranteed to), leaves the implode() call in place as in the previous
example, and ding: non-literal sighted.
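One way to see which of the two outcomes you got on a given run is to record the OPcache state next to the flag. This is only a sketch: is_literal() is the proposed function and is guarded because a stock build lacks it, while opcache_get_status() is the standard OPcache call and may return false when OPcache is off or its API is restricted.

```php
<?php
// Sketch only: is_literal() is the proposed function and is guarded here.
$a = implode("-", [1, 2, 3]);

// opcache_get_status() only exists when the OPcache extension is loaded.
$status = function_exists('opcache_get_status')
    ? @opcache_get_status(false)
    : false;
$opcacheEnabled = is_array($status) && !empty($status['opcache_enabled']);

$literalness = function_exists('is_literal')
    ? (is_literal($a) ? 'literal' : 'non-literal')
    : 'unknown';

printf(
    "value=%s opcache=%s literal=%s\n",
    $a,
    $opcacheEnabled ? 'on' : 'off',
    $literalness
);
```

This shows whether OPcache was enabled, not whether it actually optimised this particular call, so treat it as a hint when comparing environments.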
Now I imagine most people checking is_literal() would, on seeing an
‘error’ like that, simply go to their code, realise there is an integer in
it, and change it without any further interest. But it might still
cause them confusion if they wondered why they hadn’t seen the error every
time before.
Any thoughts?
Craig
> There’s an awkward hitch with removing integer support.
Correct me if I'm wrong, but all those inconsistencies would happen
even if all integers were considered literal, e.g.
https://3v4l.org/C9YpE/vld#output clearly performed compile-time
concatenation with a float.
> Now these aren’t security issues, and it doesn’t work the other way round:
> is_literal() doesn't incorrectly report any user (non-literal) data as a
> literal.
I'd say it's fine that way.
> OPcache adds its own similar twist if it’s enabled, but with the added fun
> that unlike PHP’s own optimisation processes, OPcache is by its nature
> inconsistent when it runs, changing what it optimises and when based on a
> number of factors (e.g. available memory) and so isn’t guaranteed to
> optimise the code every time.
Now that's a problem. If the same code produces different results for
expression literalness depending on external factors like available
memory, it may pass in the test environment but fail in production.
--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com