On 28.05.2026 at 17:19, Michael Morris <tendoaki@gmail.com> wrote:
Hi internals,
I intend to submit an RFC introducing a new file extension for pure-code PHP source files (no leading <?php required) and would like to gather feedback before drafting.
Proposal in brief:
Files ending in .phpc would be parsed starting in ST_IN_SCRIPTING state. No <?php or ?> tags permitted inside such files. Existing .php files and their semantics are completely unchanged. This is purely additive and BC-clean.
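A minimal C model of the dispatch rule, for illustration only; it is not Zend engine code. `lex_state`, `LEX_INITIAL`, `LEX_IN_SCRIPTING`, `has_phpc_suffix()` and `initial_state_for()` are hypothetical names standing in for the scanner's real ST_INITIAL / ST_IN_SCRIPTING states, and the case-sensitive suffix match is an assumption the RFC would need to settle.

```c
#include <string.h>

/* Hypothetical stand-ins for the scanner states ST_INITIAL and ST_IN_SCRIPTING. */
typedef enum { LEX_INITIAL, LEX_IN_SCRIPTING } lex_state;

/* Nonzero if filename ends in ".phpc". Case-sensitive matching is an assumption. */
static int has_phpc_suffix(const char *filename) {
    static const char suffix[] = ".phpc";
    size_t n = strlen(filename);
    size_t s = sizeof suffix - 1;
    return n >= s && strcmp(filename + n - s, suffix) == 0;
}

/* .phpc files begin in scripting state; every other file keeps today's behavior. */
static lex_state initial_state_for(const char *filename) {
    return has_phpc_suffix(filename) ? LEX_IN_SCRIPTING : LEX_INITIAL;
}
```

Only the final suffix counts, so a file named `template.phpc.php` stays a mixed-mode file and existing `.php` semantics are untouched.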
Motivation:
PHP's mixed-mode default reflects its 1995 templating origins. Since PHP 7, the language has evolved into a credible general-purpose tool: strict types, enums, readonly classes, property hooks, JIT compilation. I personally maintain PHPolygon, a CPU-bound 3D engine written in PHP – a use case where the templating heritage is pure ceremony. Other modern uses (CLI tooling, queue workers, code generators) share this pattern. A dedicated pure-code file format would be a small but meaningful acknowledgment that PHP-as-language is now a first-class use case alongside PHP-as-template.
Prior art and what's different:
I have read both rfc/source_files_without_opening_tag (Boutell, 2012, abandoned by author) and rfc/nophptags (Ohgaki, 2014, inactive). My proposal deliberately avoids what I believe were the choices that killed them:
- No new include syntax (Boutell's AS keyword). Extension-based detection only.
- No php.ini-based mode switch (Ohgaki's template_mode). No global config side effects.
- No security framing. The mode-switch overhead is parse-time only and OPcache eliminates it in practice; this proposal is about conceptual clarity and tooling, not performance or LFI mitigation.
Implementation:
I will write and maintain the implementation patch. Initial scope: extension registration in zend_compile_file, lexer state initialization, OPcache awareness, CLI support, and rejection of <?php/?> tokens inside .phpc files. I will also coordinate with Composer maintainers ahead of RFC submission to confirm autoload support.
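The last item in that scope, rejecting tags, reduces to a simple rule over token kinds. The C sketch below is a toy model, not a claim about where the real scanner or compiler would perform the check: `token_kind` and its `TOK_*` values are hypothetical, loosely mirroring PHP's T_OPEN_TAG, T_OPEN_TAG_WITH_ECHO, T_CLOSE_TAG and T_INLINE_HTML tokens, and it assumes `<?=` and inline HTML are rejected along with `<?php` and `?>`.

```c
#include <stdbool.h>

/* Hypothetical token kinds, loosely mirroring PHP's T_* token names. */
typedef enum {
    TOK_OPEN_TAG,           /* like T_OPEN_TAG: "<?php"              */
    TOK_OPEN_TAG_WITH_ECHO, /* like T_OPEN_TAG_WITH_ECHO: "<?="      */
    TOK_CLOSE_TAG,          /* like T_CLOSE_TAG: "?>"                */
    TOK_INLINE_HTML,        /* like T_INLINE_HTML: text outside tags */
    TOK_OTHER               /* any ordinary scripting token          */
} token_kind;

/* Tokens that only make sense in templating (mixed) mode. */
static bool is_template_token(token_kind k) {
    switch (k) {
    case TOK_OPEN_TAG:
    case TOK_OPEN_TAG_WITH_ECHO:
    case TOK_CLOSE_TAG:
    case TOK_INLINE_HTML:
        return true;
    default:
        return false;
    }
}

/* Index of the first token a pure-code file must reject, or -1 if none.
 * Mixed-mode (.php) files are never rejected by this rule. */
static int first_rejected_token(const token_kind *toks, int count, bool pure_file) {
    if (!pure_file)
        return -1;
    for (int i = 0; i < count; i++)
        if (is_template_token(toks[i]))
            return i;
    return -1;
}
```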
Open questions for the list:
- Is the .phpc extension acceptable as the disambiguator, or is there appetite for something else (e.g. shebang line, declare directive – both of which I think are worse, but I'd hear the case)?
- Should #! shebang lines and UTF-8 BOM be permitted before the implicit scripting state begins? My intent is yes for both.
- Should __halt_compiler() retain its current behavior in .phpc files? My intent is yes.
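Assuming yes for both BOM and shebang, the prefix handling could be sketched in C as below. `scripting_start_offset()` is a hypothetical helper, not an existing engine function; it assumes the BOM, if present, comes first and the shebang line, if present, comes immediately after it.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset at which implicit scripting mode would begin in a .phpc file:
 * after an optional UTF-8 BOM, then after an optional "#!" line
 * (including its terminating newline). */
static size_t scripting_start_offset(const char *src, size_t len) {
    size_t pos = 0;

    /* Optional UTF-8 byte order mark: EF BB BF. */
    if (len >= 3 && (unsigned char)src[0] == 0xEF &&
        (unsigned char)src[1] == 0xBB && (unsigned char)src[2] == 0xBF)
        pos = 3;

    /* Optional shebang line, recognized only right at the start of content. */
    if (len - pos >= 2 && src[pos] == '#' && src[pos + 1] == '!') {
        const char *nl = memchr(src + pos, '\n', len - pos);
        pos = nl ? (size_t)(nl - src) + 1 : len;
    }
    return pos;
}
```

One design wrinkle: the kernel only honors `#!` when it is the very first two bytes of the file, so a file with a BOM in front of its shebang could be accepted by the parser but would not be directly executable.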
I welcome substantive critique. If the concept itself is unwanted, I would rather know now than discover it during a vote.
Thanks.
Hendrik Mennen
Maintainer, PHPolygon
I've proposed things very similar to this before, so these are my thoughts and points raised earlier.
Extension based parsing isn't a popular idea. The first few responses to this have echoed this, but there's a certain power in allowing the parser to not care about the extension.
IDEs will have to be updated for these files, as they detect PHP by the presence of the <?php tag.
My suggestions have been to use a special require statement to pull these files in: require_module, require_library. This gets around the problem of the extension mattering. The rule could be that these files are always included only once. However, this introduces a new problem: Composer will need to somehow know which require technique to use if there's more than one to choose from.
PHP has several braceless control structure syntaxes. Would these stay around, or should they go away too, since they aren't really needed in a code-first file?
The ability to avoid an accidental echo that messes up headers is pretty valuable, but what else is going to be picked up? Can the token parser iterate over the file faster if it doesn't have to look for <?php ?> tags?
I would caution against implementing any code-only include without taking the opportunity to fix certain bugs that have been allowed to stick around because of the enormous BC breaks their fixes would introduce. There are no existing files in this format, therefore no existing code to break. This is a rare opportunity to clean up some things that should not be passed on. Please, please look into this as part of this proposal.
Hi Michael,
Thanks, this is the densest critique so far and worth working through point by point. I am also aware that this topic overlaps significantly with the Modules direction you have been iterating on over the past couple of years, so some of these answers will inevitably touch on that prior work.
- Extension based parsing isn't a popular idea. The first few
responses to this have echoed this, but there's a certain power in
allowing the parser to not care about the extension.
Fair, and I have heard this from both Ben and Alex earlier in the thread. My pushback there has been that the engine already has the filename available at zend_compile_file and at include/require resolution, so teaching it to check the extension is additive rather than architecturally violating. But I take the point that this is a cultural preference on the list, not just a technical one. If there is a workable alternative that does not have the caller-vs-author inversion problem I raised in my reply to Alex, I am open to it.
- IDEs will have to be updated for these files, as they detect PHP
by the presence of the <?php tag.
True, and the same is true for static analyzers (PHPStan, Psalm), the nikic/php-parser library that most of those depend on, and Composer's autoloader. Talking to the Composer maintainers ahead of the formal RFC is already on my list, and I would extend that to PhpStorm and the analyzer maintainers. The change for an IDE is small once the underlying parser library handles the new mode (treat the file as starting in scripting state). PhpStorm in particular already handles .phps and other variants, so the lift should be modest.
- My suggestions have been to use a special require statement to pull
these files in: require_module, require_library. [...] However,
this introduces a new problem: Composer will need to somehow know
which require technique to use if there's more than one to choose
from.
This lines up with your Modules iterations, and is in the same family as Alex's include_pure and Tom Boutell's 2012 require AS INCLUDE_PURE_CODE. I have two reservations about it as the primary mechanism, and you have actually surfaced the second yourself:
- The parsing context lives with the caller, not the author. A library author cannot guarantee their files will be loaded the right way, because consumers might use the wrong require.
- The Composer autoload problem you describe. If there are multiple require variants, the autoloader has to pick one. The only way around that is per-file metadata that tells the autoloader which to use, which is exactly the kind of out-of-band signaling that extension dispatch avoids.
That said, as a complementary mechanism for explicit override or for stdin/eval cases where no filename exists, a require_pure or require_module-style call is genuinely useful. I floated this in my reply to Alex as part of a synthesis: extension as primary signal, CLI -p flag for stdin (Ben's suggestion), require_pure-family for explicit override (your and Alex's suggestion). The autoloader question gets simpler in that synthesis because the autoloader can dispatch on extension by default.
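One way to read that synthesis is as a fixed precedence order. The C sketch below assumes an explicit require_pure-style override beats the extension, and that the proposed -p flag only matters when there is no filename (stdin, eval). `load_request`, `parse_mode`, `choose_mode()` and `ends_with_phpc()` are hypothetical names, not engine structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MODE_MIXED, MODE_PURE } parse_mode;

/* Hypothetical description of how code reaches the compiler. */
typedef struct {
    const char *filename; /* NULL for stdin or eval'd code                    */
    bool explicit_pure;   /* loaded via a require_pure-style construct        */
    bool cli_pure_flag;   /* CLI started with the proposed -p flag            */
} load_request;

static bool ends_with_phpc(const char *name) {
    size_t n = strlen(name);
    return n >= 5 && strcmp(name + n - 5, ".phpc") == 0;
}

/* Assumed precedence: explicit override, then extension, then CLI flag
 * for input that has no filename at all. */
static parse_mode choose_mode(const load_request *req) {
    if (req->explicit_pure)
        return MODE_PURE;
    if (req->filename != NULL)
        return ends_with_phpc(req->filename) ? MODE_PURE : MODE_MIXED;
    return req->cli_pure_flag ? MODE_PURE : MODE_MIXED;
}
```

An autoloader in this model never sets `explicit_pure`: it always passes a filename, so the extension alone decides, which is why the autoloader question gets simpler.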
- PHP has several braceless control structure syntaxes. Would these
stay around, or should they go away too, since they aren't really
needed in a code-first file?
Good question, and I think this deserves explicit discussion in the RFC. The alternative syntax (if (...): ... endif;, foreach (...): ... endforeach;, etc.) exists primarily because it reads better in templating contexts where you have <?php if ($x): ?>...HTML...<?php endif; ?>. In pure code, you would always use braces.
Two reasonable positions:
(a) Allow it for consistency with .php files. Linters and code style tools can flag it as inappropriate in pure-code files without the engine treating it as an error.
(b) Disallow it. The new file format has the freedom to be stricter, and the alternative syntax serves no purpose without templating.
My current preference is (a) for minimum surprise, but I would not fight hard against (b) if there is appetite on the list. This is exactly the kind of question I want to surface in the RFC text rather than decide unilaterally.
- The ability to avoid an accidental echo that messes up headers is
pretty valuable, but what else is going to be picked up? Can the
token parser iterate over the file faster if it doesn't have to look
for <?php ?> tags?
On parse speed: marginally yes, but in practice the difference is invisible. OPcache caches the opcodes after the first parse, so for any request that is not a cold start the lexer mode switch is not in the hot path. JIT does not change this picture either, since the ZEND_ECHO opcodes that inline HTML in mixed-mode files compiles to are not what JIT optimizes anyway.
The accidental-echo / headers-already-sent issue is the real practical win. That bug class disappears entirely in pure-code files. I would list this as the second motivation in the RFC after the conceptual one.
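A toy C model of why that bug class exists in mixed-mode files: any bytes before the opening tag (a BOM or blank line, say), or after the closing tag beyond the single newline the closing tag swallows, are sent as output. `stray_output_bytes()` is a hypothetical helper; it assumes a single `<?php ... ?>` block and does not distinguish a `?>` inside a string literal, so it illustrates the rule rather than reproducing the scanner. A pure-code file has no tags at all, so the equivalent count is always zero.

```c
#include <stddef.h>
#include <string.h>

/* Count bytes a mixed-mode file would emit as inline HTML around its
 * single PHP block. Toy model: assumes one "<?php" ... "?>" region. */
static size_t stray_output_bytes(const char *src) {
    const char *open = strstr(src, "<?php");
    if (open == NULL)
        return strlen(src);                 /* whole file is inline HTML */
    size_t leading = (size_t)(open - src);  /* bytes before the open tag */

    /* Find the last "?>" after the open tag. */
    const char *close = NULL, *p = open;
    while ((p = strstr(p, "?>")) != NULL) {
        close = p;
        p += 2;
    }
    if (close == NULL)
        return leading;                     /* no closing tag: nothing trails */

    const char *after = close + 2;
    /* The closing tag swallows one directly following newline. */
    if (after[0] == '\r' && after[1] == '\n')
        after += 2;
    else if (after[0] == '\n' || after[0] == '\r')
        after += 1;
    return leading + strlen(after);
}
```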
"What else is going to be picked up" - this connects directly to your point 6, which I think deserves its own treatment.
- I would caution against implementing any code-only include
without taking the opportunity to fix certain bugs that have been
allowed to stick around because of the enormous BC breaks their fixes
would introduce. There are no existing files in this format, therefore
no existing code to break. This is a rare opportunity to
clean up some things that should not be passed on. Please, please look
into this as part of this proposal.
This is the most important point you raised and I want to be careful with it. You are right that introducing a new file format is a once-in-a-decade opportunity to ship semantics that would otherwise be blocked by BC. Your Modules proposals have been built around exactly this argument, Hack took the same route, and the temptation to bundle improvements is real and well-founded.
My reservation is purely tactical: every additional semantic change is an additional reason a voter might decline. An RFC scoped to "new file format that starts in scripting state" is already non-trivial to defend. An RFC scoped to "new file format with stricter type juggling, no eval, no variable variables, automatic strict_types, no @ suppression" is a substantially harder sell, and the list has a history of preferring focused RFCs over Christmas trees. I want to avoid the derailment pattern.
I think the right approach is to surface this in the RFC text as an explicit Open Question: "Should pure-code files differ from .php files in semantics beyond the entry parsing state, and if so, in which specific ways?" Rather than committing to a list now, I would let the list propose specific cleanups and only include those that have clear consensus.
Given your years of iteration on this concept, you almost certainly have a more grounded list of candidate cleanups than I do. Would you be willing to share which specific bugs or semantic issues you had in mind? Some that come to my own mind: automatic declare(strict_types=1), banning variable variables, banning @ error suppression, banning the global keyword. But I would value your concrete examples more than my abstract candidates.
Thanks again for the substantive engagement.
Hendrik Mennen
Maintainer, PHPolygon