Hi all,
Per Seifeddine's suggestion to keep this out of the karma-request
thread, I'm opening a pre-RFC discussion for scalar object methods --
calling a small, curated set of methods directly on scalar values, e.g.
$str->trim(), (3)->pow(2). There's a complete, tested implementation and
a full write-up (links below); I'd like to surface the strongest
objections before I write the formal RFC.
Disclosure: I built this with an AI assistant (Claude) as a tool. The
design and the decisions are mine, and I've independently verified the
engine behaviour, performance, JIT correctness, leak-freedom and the BC
scan. Flagging it up front for transparency.
I know "methods on primitives" was proposed and declined before
(Nikita's 2014 "Methods on primitive types in PHP"). The reason it
stalled was loose typing: $x->trim() would need a runtime type check and
would behave differently depending on what $x held. This proposal
sidesteps that entirely, by generalizing the resolution Nikita himself
suggested in that thread -- requiring an explicit cast where the type
isn't already clear.
The idea: dispatch only on receivers the compiler already knows are
scalar. The method call is rewritten at compile time to an ordinary call
into an internal backing class -- no runtime type dispatch, no new
opcode, the object method-call path is untouched. A receiver qualifies
only if its type is guaranteed syntactically: a literal, a
(string)/(int) cast, a concatenation/interpolation, a non-nullable
scalar-typed property, or a call with a declared non-nullable scalar
return type. An untyped $x->trim() is left exactly as today (Error).
Crucially, dispatch never depends on optimizer-inferred types, so
behaviour is identical with and without opcache.
echo " Hello World "->trim()->upper(); // "HELLO WORLD"
echo (3)->pow(2); // 9
echo "hello"->length()->pow(2); // 25 -- length():int chains into the int methods
So the cast Nikita proposed, ((string) $num)->chunk(), is only needed
where the type isn't already guaranteed; everywhere else the dispatch is
sound by construction, with no runtime check.
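A rough PHP model of the receiver rule described above may help make the "guaranteed syntactically" test concrete. The node shapes (plain arrays with an invented 'kind' key), the function names and the Str/Int/Float labels in the output are hypothetical. This is not how php-src represents or compiles the AST. bool maps to no backing class, matching the proposed sets further down.

```php
<?php
declare(strict_types=1);

// Toy model of the receiver rule (hypothetical, not php-src's implementation).
// Returns the scalar type a receiver node is syntactically guaranteed to have,
// or null if no guarantee exists.
function guaranteedScalarType(array $node): ?string
{
    switch ($node['kind']) {
        case 'literal':
            // get_debug_type() yields "string", "int", "float", "bool", ...
            return get_debug_type($node['value']);
        case 'cast':           // (string) $x, (int) $x
            return $node['to'];
        case 'concat':         // $a . $b
        case 'interpolation':  // "Hello $name"
            return 'string';
        case 'typed_property': // declared property type
        case 'call':           // declared return type of the callee
            $type = $node['declared'] ?? null;
            // Nullable (?string) or missing declarations do not qualify.
            return in_array($type, ['string', 'int', 'float', 'bool'], true) ? $type : null;
        default:               // untyped variables, method-call results, ...
            return null;
    }
}

// Rewrites $recv->method() into a static call on a backing class, or leaves
// it untouched (today's behaviour: an Error at runtime).
function desugar(array $recv, string $method): string
{
    $backing = ['string' => 'Str', 'int' => 'Int', 'float' => 'Float'];
    $type = guaranteedScalarType($recv);
    if ($type === null || !isset($backing[$type])) {
        return 'unchanged: $recv->' . $method . '()';
    }
    return $backing[$type] . '::' . $method . '($recv)';
}

echo desugar(['kind' => 'literal', 'value' => ' Hello '], 'trim'), "\n"; // Str::trim($recv)
echo desugar(['kind' => 'variable', 'name' => 'x'], 'trim'), "\n";       // unchanged: $recv->trim()
echo desugar(['kind' => 'call', 'declared' => '?string'], 'trim'), "\n"; // unchanged: $recv->trim()
```

The decision depends only on the shape of the receiver expression. No runtime value is inspected, which is the property the proposal relies on for identical behaviour with and without opcache.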
It's intended as one proposal with two independent votes:
- Scalar methods on guaranteed free receivers (the above). A pure
  capability -- it adds a way to call scalar operations and changes
  nothing about untyped code. Proposed initial sets: a small curated Str
  (trim/upper/lower/length + contains/startsWith/endsWith), Int
  (abs/pow/clamp), and Float (round/ceil/floor/abs); bool deliberately
  gets none (its operations are operators, not methods). The sets are
  governed by explicit criteria and are the easiest thing to tune in
  discussion.
- Scalar-typed local variables (int $x = ...;, scalar types only),
  which additionally make a typed local a guaranteed receiver (string $s =
  ...; $s->trim()). This is the more contested half -- it also carries the
  "local type discipline" argument -- so it's a separate vote: a "no" here
  ships the capability without typed locals.
What I'm deliberately NOT doing, up front so it's not a surprise:
- No method-call-result receivers ($this->getName()->trim()) -- that
  would rest on return-type covariance under inheritance; not worth the
  surface.
- Int::abs/pow return int|float (they can overflow, as the global
  functions do), so they're honest terminals -- they don't chain.
  (Int::clamp is the one initial int method provably :int for all inputs,
  so it does chain.)
- No int|false typed locals -- that's a sentinel state, not a committed
  type; ?T is supported, sentinel-unions are not.
- The backing classes are internal-only (NUL-prefixed name, like
  anonymous classes): class_exists('Str') is false, no Reflection, and
  userland "class Str {}" can't collide.
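The int|float point can be checked against today's global functions. In the sketch below, abs() and pow() are real PHP functions. my_clamp() is a hypothetical stand-in for Int::clamp, built from min()/max(). It illustrates why a clamp can keep an :int return type for every input while abs/pow cannot.

```php
<?php
// Today's global functions already return int|float for int inputs.
var_dump(abs(-5));             // int(5)
var_dump(abs(PHP_INT_MIN));    // float: |PHP_INT_MIN| does not fit in an int
var_dump(pow(2, 10));          // int(1024)
var_dump(pow(PHP_INT_MAX, 2)); // float: the result overflows int
var_dump(pow(2, -1));          // float(0.5): negative exponent

// Hypothetical stand-in for Int::clamp. It only ever returns one of its int
// arguments, so the :int return type holds for every input.
function my_clamp(int $value, int $min, int $max): int
{
    return max($min, min($max, $value));
}

var_dump(my_clamp(PHP_INT_MAX, 0, 10)); // int(10)
var_dump(my_clamp(-3, 0, 10));          // int(0)
```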
Implementation status -- this is built and tested, not a sketch:
- Scalar methods add zero new opcodes -- the desugar emits an ordinary
  static call, and the object method-call path is byte-for-byte unchanged.
  (Typed locals add dedicated *_TYPED assignment opcodes, but the untyped
  hot path stays byte-identical.)
- Performance (deterministic callgrind, release build): the untyped hot
  path is byte-identical; the standard bench.php suite is +0.145%
  instructions, entirely from predicted-not-taken branches in reference
  opcodes only, with zero added cache misses or branch mispredictions. A
  typed-local write benchmarks at ~0.79x the cost of a typed-property
  write -- a check the language already runs on every typed-property write
  since 7.4.
- References (the objection that sank prior typed-locals attempts) are
  enforced through every path -- =&, by-ref params,
  array/object/static-prop refs, yield, closure capture, $$name, extract,
  $GLOBALS, global -- via the existing typed-property reference machinery.
  Leak-checked under stress.
- Correct under JIT in all three modes (interpreter, function, tracing
  -- differential byte-identical output). opcache SHM + file_cache
  round-trip verified.
- BC impact, measured: an AST scan of the 1,000 most-downloaded
  Packagist packages (173k+ files) found zero method-call sites with a
  guaranteed-scalar receiver -- i.e. zero call sites that change behaviour
  (every such site is a fatal error today). Userland Str classes (incl.
  Laravel's Illuminate\Support\Str) coexist with the backing class, verified.
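The reference guarantee leans on behaviour PHP has had since 7.4 for typed properties. The sketch below uses a hypothetical Box class to show it in today's PHP. Once a reference is bound to a typed property, every write through that reference is checked against the property's type, with normal coercion in non-strict mode. That the typed-locals branch reuses exactly this is the claim above; the snippet only demonstrates the existing property behaviour.

```php
<?php
// Hypothetical example class with a typed property.
final class Box
{
    public int $n = 0;
}

$box = new Box();
$ref = &$box->n;       // the reference now carries Box::$n's int type

$ref = "42";           // coercive mode: the numeric string is converted to int
var_dump($box->n);     // int(42)

$caught = false;
try {
    $ref = "not a number"; // checked against int, even via the reference
} catch (TypeError $e) {
    $caught = true;
    echo $e->getMessage(), "\n";
}
var_dump($caught);     // bool(true)
var_dump($box->n);     // int(42): the rejected write left the value unchanged
```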
Full write-up (RFC draft, plus the method-set, performance, and
BC-impact analyses):
https://github.com/kralmichal/php-src/tree/rfc/docs/scalar-object-methods-rfc
Implementation branches (PHP 8.6-dev base):
- Primary (scalar methods):
  https://github.com/kralmichal/php-src/tree/rfc/scalar-methods
- Secondary (typed locals, stacked):
  https://github.com/kralmichal/php-src/tree/rfc/typed-locals
What I'd value discussing before I write the formal RFC:
- Does the "compile-time-guaranteed receivers only" framing actually
  resolve the loose-typing objection, or is there a hole I'm not seeing?
- The method-set and naming is the most open part -- is a small
  curated, clean-slate set (distinct from the procedural names) the right
  direction, or a non-starter? How should it relate to the existing
  userland efforts in this space (e.g. Psl)?
- Anything that would sink this before I invest in the full RFC.
Thanks,
Michal Kral
Hi Michal,
Thanks for the detailed write-up. I'll be upfront: I'm against this
feature. I have two concrete objections to the approach and one
broader objection to the idea itself.
> The idea: dispatch only on receivers the compiler already knows are
> scalar. The method call is rewritten at compile time to an ordinary call
> into an internal backing class -- no runtime type dispatch, no new
> opcode, the object method-call path is untouched. A receiver qualifies
> only if its type is guaranteed syntactically: a literal, a
> (string)/(int) cast, a concatenation/interpolation, a non-nullable
> scalar-typed property, or a call with a declared non-nullable scalar
> return type.
This is my main objection, and I think it's a fatal one.
Restricting dispatch to receivers "the compiler already knows are
scalar" sounds safe, but in practice it covers almost no real code.
PHP is compiled one file at a time, with no view into other files. The
compiler does not perform whole-program analysis; when compiling a
given file, it generally has no knowledge of declarations in other
files that haven't been autoloaded yet.
So take your own qualifying rule, "a call with a declared non-nullable
scalar return type":
class Example {
public static function getStr(): string { return "x"; }
}
$x = Example::getStr();
$x->length();
A human reads this and knows $x is a string. But the PHP compiler,
when compiling the file that contains $x = Example::getStr(), only
knows getStr() returns string if Example happens to be declared
in the same file. Move Example into its own autoloaded file (which
is how essentially all real code is organised) and the compiler has no
idea what getStr() returns at compile time. So $x->length() would
not dispatch, even though the type is completely determined.
The result is that the same expression works or fails depending on
whether a class is in the same file or autoloaded from another one.
That's not a predictable rule a developer can hold in their head.
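The same-file versus autoloaded distinction can be reduced to a toy model: a per-file compiler that only knows the return types of classes declared in the file it is compiling. The function names below are hypothetical, and this is not how php-src is structured. It only shows why the same call site qualifies in one layout and not in the other.

```php
<?php
// Toy model: return types visible while compiling one file are exactly those
// of classes declared in that file (hypothetical helper names).
function compileTimeReturnType(array $declaredInThisFile, string $class, string $method): ?string
{
    return $declaredInThisFile[$class][$method] ?? null;
}

function receiverQualifies(array $declaredInThisFile, string $class, string $method): bool
{
    $type = compileTimeReturnType($declaredInThisFile, $class, $method);
    return in_array($type, ['string', 'int', 'float'], true);
}

// Example declared in the same file as the call site:
$sameFile = ['Example' => ['getStr' => 'string']];
// Example lives in its own file and is autoloaded at runtime:
$otherFile = [];

var_dump(receiverQualifies($sameFile, 'Example', 'getStr'));  // bool(true)
var_dump(receiverQualifies($otherFile, 'Example', 'getStr')); // bool(false)
```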
Real-world values emerge from call chains, conditionals, and
cross-file boundaries: the cases the compiler can't prove
syntactically. The feature ends up usable on literals and casts
and almost nothing else.
This also forces special handling in static-analysis tools because the
compiler and the analyser will disagree about the same line. Given
$x->length(), PHPStan/Psalm/Mago will happily infer $x is a string
and accept it, while the compiler rejects it.
> - The backing classes are internal-only (NUL-prefixed name, like
> anonymous classes): class_exists('Str') is false, no Reflection,
> userland "class Str {}" can't collide.
My second objection. If $s->length() is sugar for Str::length($s),
then Str (or whatever backs it) must be visible to userland in
some form. Static analysers (PHPStan, Psalm, Mago, Phan, PhpStorm,
...) need a definition describing which methods exist, to type-check
calls, support "go to definition," report wrong arity, and the
PHP-based ones need it to be reflectable.
The collision concern you're solving with NUL-prefixing isn't worth
that cost. Userland already has thousands of Str classes; the clean
fix is to namespace the backing classes (or simply pick non-colliding
names), not to hide them from the entire ecosystem. Solve the naming
problem with naming, not by blinding the tooling.
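Here is a minimal sketch of the alternative suggested above, assuming a hypothetical namespace Hypothetical\Scalar. The backing class is an ordinary declared class, so class_exists() and Reflection see it and tools can index it, while the namespace keeps it clear of userland Str classes. The method bodies are assumptions, including a byte-length length(); they are not the proposal's semantics.

```php
<?php
namespace Hypothetical\Scalar; // hypothetical namespace, not from the proposal

// An ordinary class: visible to class_exists(), Reflection and IDE indexing,
// and it cannot collide with Str classes declared in other namespaces.
final class Str
{
    public static function trim(string $s): string
    {
        return \trim($s);
    }

    public static function length(string $s): int
    {
        return \strlen($s); // assumption: byte length
    }
}

var_dump(class_exists(Str::class));                    // bool(true)
$method = new \ReflectionMethod(Str::class, 'length');
echo $method->getReturnType(), "\n";                   // int
echo $method->getNumberOfRequiredParameters(), "\n";   // 1
```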
(Related, and unaddressed in the write-up: what happens with methods
that take arguments, e.g. $s->indexOf($y)? Does an arity mismatch
fail at compile time, or at runtime with ArgumentCountError like a
normal method call?)
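One way to frame the arity question is against what today's runtime already does. The sketch below uses a hypothetical class name and an assumed indexOf() signature. In current PHP these mismatches surface at runtime, not at compile time. Too few arguments throws ArgumentCountError when the call executes. Surplus arguments are ignored by userland functions but rejected by internal ones. Which of these behaviours $s->indexOf($y) would follow is exactly what the question asks.

```php
<?php
// Hypothetical userland stand-in for a backing class; the name and the
// indexOf() signature are assumptions for illustration.
final class StrBacking
{
    public static function indexOf(string $haystack, string $needle): int|false
    {
        return strpos($haystack, $needle);
    }
}

$errors = [];

// Too few arguments: not a compile-time error; ArgumentCountError at runtime.
try {
    StrBacking::indexOf("abc");
} catch (ArgumentCountError $e) {
    $errors[] = 'too few (userland)';
}

// Too many arguments: silently ignored by a userland method...
var_dump(StrBacking::indexOf("abc", "b", "extra")); // int(1)

// ...but rejected with ArgumentCountError by an internal function.
try {
    str_contains("abc", "b", "extra");
} catch (ArgumentCountError $e) {
    $errors[] = 'too many (internal)';
}

print_r($errors);
```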
Finally, the broader point. Even if both of the above were fully
resolved, I'd still be against this. PHP has an established way of
doing these operations and scalar methods don't remove that.
trim($s), mb_trim($s), and $s->trim() would all coexist, with
the method form available only sometimes. To me that's too disruptive
to the norm for what it buys: it doesn't replace anything, it doesn't
compose cleanly, and it introduces a method-call syntax on values that
carry no object identity. I don't think the language is better for it.
Cheers,
Seifeddine.