[RFC] Namespace and Parse tags on Include and Require

13 years ago by Michael Morris

I have made a wiki account with user name MichaelMorris - I don't
think I have permissions to submit an RFC as of yet. I'll post this
here for now. I've brought this up before, but can now simplify the
original proposal since the decision to always have <?= available
addresses much of the original problem.

These changes apply to include, require, include_once and
require_once. Two new arguments for these functions are proposed for
introduction.

The first is boolean: whether to look for PHP tags of any sort in the
file. The default needs to be to expect them, since this is the
backwards-compatible behavior, and I'm thinking this should be 0; 1 means
"no tags in file". This means the file to be included can be written
without any PHP tags at all. This might allow the parser to speed up,
but that would be a side benefit to two goals.

One: Framework designers can more strongly enforce code separation.
For example, if a coder attempts to jump to HTML output in what the
framework expects to be a database interface class (which shouldn't
need to output HTML), the result would be a parse error. Naturally the
obstinate coder could work around this with echo and print.

Two: About once a week someone on SitePoint writes a panic post about
header() not working because of whitespace sometimes accidentally
inserted before or after the tags. By eliminating tags from class
definition files this problem can be mitigated: the odds of accidental
output to the browser are lessened.
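The proposed flag has no equivalent in current PHP, but a rough userland approximation makes the idea concrete. This sketch assumes PHP 8 and a hypothetical helper `include_tagless()` (not part of PHP or of the proposal); it relies on the documented behavior of `eval()`, which parses its argument as PHP code with no opening tag. One caveat: `eval()` still honors `?>`, so this mimics the looser reading of "tagless", not a strict one.

```php
<?php
// Hypothetical userland approximation of the proposed include($path, 1).
// include_tagless() is an illustrative name, not an existing PHP function.
function include_tagless(string $path): mixed
{
    $code = file_get_contents($path);
    if ($code === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // eval() starts in PHP mode, so the file needs no "<?php" tag; a leading
    // "<?php" is a syntax error here and surfaces as a ParseError.
    // Like include, eval() hands back the value of a top-level return.
    return eval($code);
}
```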

Problems
The largest problem is with IDEs. There is no current convention to
warn them that the file is pure PHP. However, I think this can be
mitigated by adopting an extension for PHP include files; *.pif and
*.iphp are two possibilities. The language itself doesn't need to
care about this directly.

The second parameter is a string: which namespace the code exists
in. Currently a file is always imported into the root namespace. If a
namespace is specified, then the file is imported into that namespace.
If the file has a namespace definition, then that namespace becomes a
sub-namespace when imported. There are two reasons for this change:

It feels weird to declare a namespace at the start of every template
file if you use PHP directly as a template engine instead of Smarty.

Dynamic namespace resolution becomes possible. This is a powerful but
potentially huge can of worms for PHP programs, since it allows an
autoloader to decide for itself which namespace to load a requested
class into. Consider the following:

$db = new DB();

The autoloader for the framework would look in the extensions, and if
there isn't a DB class there, it would load the core DB into the root
namespace. If it sees an extension, it would load that into the root
namespace instead. That file's class declaration, however, could read:

class DB extends Core\DB

So the autoloader would then load the Core DB class into the Core
namespace, since it could do so with:

require ('path/to/core/DB.php', 'Core');
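The two-argument `require` above is proposed syntax and will not run on any released PHP. To illustrate the intended flow, here is a rough sketch (PHP 8) that emulates it with `eval()`. All names are hypothetical: the `$sources` array stands in for files on disk (without `<?php` tags), `eval_into_namespace()` is an illustrative helper that only recognizes a single leading `namespace X;` statement, and the autoloader mirrors the core/extension lookup described above.

```php
<?php
// Hypothetical emulation of the proposed require($path, $namespace).
// Only a single leading "namespace X;" statement is recognized.
function eval_into_namespace(string $code, string $ns): void
{
    $ns = trim($ns, '\\');
    if ($ns === '') {
        eval($code); // root namespace: load as-is
        return;
    }
    if (preg_match('/^\s*namespace\s+([A-Za-z_][A-Za-z0-9_\\\\]*)\s*;/', $code, $m)) {
        // A file that declares its own namespace becomes a sub-namespace.
        $code = 'namespace ' . $ns . '\\' . $m[1] . ';' . substr($code, strlen($m[0]));
    } else {
        // A file without a namespace declaration lands directly in $ns.
        $code = 'namespace ' . $ns . ";\n" . $code;
    }
    eval($code);
}

// Hypothetical sources standing in for core/DB.php and extensions/DB.php.
$sources = [
    'core/DB' => 'class DB { public function driver(): string { return "core"; } }',
    'extensions/DB' => 'class DB extends Core\DB {
        public function driver(): string { return "ext over " . parent::driver(); }
    }',
];

spl_autoload_register(function (string $class) use ($sources): void {
    if (str_starts_with($class, 'Core\\')) {
        // Core\DB: load the core class into the Core namespace, i.e.
        // require('path/to/core/DB.php', 'Core') in the proposal's terms.
        $name = substr($class, strlen('Core\\'));
        if (isset($sources["core/$name"])) {
            eval_into_namespace($sources["core/$name"], 'Core');
        }
        return;
    }
    if (!str_contains($class, '\\')) {
        // Unqualified request: prefer an extension, else fall back to core,
        // loading either one into the root namespace.
        $src = $sources["extensions/$class"] ?? $sources["core/$class"] ?? null;
        if ($src !== null) {
            eval_into_namespace($src, '');
        }
    }
});

$db = new DB();
echo $db->driver(), "\n"; // ext over core
```

Because the extension's `extends Core\DB` is only resolved when `DB` is declared, the autoloader is re-entered for `Core\DB`, and that re-entry is where the namespace argument does its work. It is also where a misrouted load would be hard to spot.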

This provides a powerful layer of flexibility. It also allows a tyro
to whack their foot off. Thoughts?

Again, as soon as I have the ability to submit an RFC I'll get this
all up on the wiki.

13 years ago by Hannes Magnusson

> I have made a wiki account with user name MichaelMorris - I don't
> think I have permissions to submit an RFC as of yet. I'll post this

You do now.
Things go much smoother, though, when you actually read the registration page.
We don't give everyone write karma just for registering.

-Hannes

13 years ago by Michael Morris

OK, with Hannes's help I have the RFC up now.

https://wiki.php.net/rfc/changes_to_include_and_require

13 years ago by John Crenshaw

> OK, with Hannes's help I have the RFC up now.
>
> https://wiki.php.net/rfc/changes_to_include_and_require

WRT tagless files, in addition to the issues already raised by others:

- What about script execution? You'll still need a shebang (Linux) or to register the extension (Windows). Having a special shebang but still making the <?php optional seems silly. Conversely, not being able to start in script mode from the place most likely to contain only script also makes no sense.
- Apache will also not like these files very much. This is possible to work around if you use a different extension, but it will present a huge problem for legacy server configurations (which will still be ubiquitous for years after any such change).
- Autoloaders. Even if an extension convention is adopted, this will be horribly painful. Consider the work and the additional (twice as many!!!) disk accesses required to make a simple autoloader for PEAR-convention libraries work with both formats.
- General code interoperability is a serious problem here too. If such a change were adopted, developers would have to open and examine any file prior to inclusion to ensure that they know which format the file is in. Using the wrong type of include would be a huge problem.
- Closing a tag that was never opened, in the middle of such a file, would be incredibly strange.
- I expect that this would create problems with the variety of opcode caches out there (since caches would now need to become aware of which mode of file they have cached vs. which mode the file is being included in).
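To put a number on the autoloader point, here is a sketch (PHP 8) of a PEAR-convention autoloader, where `Foo_Bar` maps to `Foo/Bar.<ext>`, that accepts both formats. It assumes the `.iphp` extension floated earlier in the thread and uses `eval()` as a stand-in for the proposed tagless include mode; `DualFormatLoader` is a hypothetical name. Every class found only in the second format, and every miss, costs an extra `is_file()` check.

```php
<?php
// Hypothetical PEAR-convention autoloader that accepts both the usual tagged
// format (.php) and the proposed tagless format (.iphp).
final class DualFormatLoader
{
    /** @var list<string> every path probed, to make the extra checks visible */
    public array $probed = [];

    public function __construct(private string $baseDir)
    {
    }

    public function load(string $class): void
    {
        // PEAR convention: Foo_Bar_Baz -> <baseDir>/Foo/Bar/Baz.<ext>
        $stem = $this->baseDir . '/' . str_replace('_', '/', $class);

        $this->probed[] = "$stem.php";
        if (is_file("$stem.php")) {
            require "$stem.php";
            return;
        }

        // This second probe exists only because two formats can coexist.
        $this->probed[] = "$stem.iphp";
        if (is_file("$stem.iphp")) {
            // eval() stands in for the proposed tagless include mode.
            eval(file_get_contents("$stem.iphp"));
        }
    }
}
```

A loader that supports a single format needs one check per lookup; supporting both doubles the checks on the miss path.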

To me the autoloader and interoperability issues are the most critical because they directly impact usability and performance basically everywhere.

John Crenshaw
Priacta, Inc.

13 years ago by Ferenc Kovacs

> The first is boolean - whether to look for php tags of any sort in the
> file. The default needs to be to expect them, this is the backwards
> compatible behavior, and I'm thinking this should be 0. 1 means "no
> tags in file". This means the file to be included can be written
> without any php tags at all. This might allow the parser to speed up,
> but that would be a side benefit to two goals.

I don't like this. I mean, someone can end up printing out his source code if:

- those files are publicly available through the document root, or
- he somewhere used a wrong argument and included a tagless file with
  the (default) with-tags option.

On the other hand, an HTML page with PHP code examples (but without PHP
tags) could end up executing those examples if the tagless option
is used.

I would support the namespace option though.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu

13 years ago by Michael Morris

> I don't like this, I mean one can end up printing out his source code if
> those files are publicly available through the document root

This can also occur if the server is misconfigured. That said, there
are ways to deal with this. One would be to allow the server itself to
start PHP with the file being loaded as tagless. In either event,
though, we're dealing with a config change, and users don't always
remember to make those when they switch versions.

Would current software break, though? I don't think so. Anyone doing
this would (should?) be aware of the ramifications. I'm personally
not a fan of putting all the PHP files in the web document root, but I
can understand why it's done, and I know it's the most common current
practice.

> or
> if he somewhere used a wrong argument, and includes a tagless file with
> the (default) with-tags option.
> on the other hand, an HTML page with PHP code examples (but without PHP
> tags) could be turned into executing those examples if the tagless option is
> used.

That actually depends on how this is implemented. If it is
implemented such that the parser just prepends "<?php" to the start of
the included file (the laziest way to implement this feature), then
any code block in there could possibly execute.

To your first point, this would mean that if the file was mis-imported,
its first block of code would be ignored and echoed out until the
parser finds a closing tag and then a new opening tag. Then code
would begin executing, and an attacker might pull something off with
this. More likely, though, the code is going to hit a parse error,
having come into execution at an unexpected moment.

If it is implemented such that 'tagless' truly means tagless, and using
PHP tags would result in a parse error in that mode, the attack becomes
much harder to pull off.
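A strict mode along these lines could be checked with PHP's tokenizer. The sketch below is only an illustration, and the name `is_strictly_tagless()` is hypothetical: it tokenizes the file as if it started with `<?php` and rejects it if any closing tag or inline HTML appears. Tag-like text inside a string literal, such as a `?>` hidden in the quotes of an echo, stays a string token and is not flagged.

```php
<?php
// Hypothetical check for a strict "truly tagless" include mode.
function is_strictly_tagless(string $code): bool
{
    // Tokenize as though the file began with "<?php"; the first token is
    // that synthetic T_OPEN_TAG, so it is skipped below.
    $tokens = token_get_all("<?php\n" . $code);
    $forbidden = [T_CLOSE_TAG, T_INLINE_HTML, T_OPEN_TAG, T_OPEN_TAG_WITH_ECHO];
    foreach (array_slice($tokens, 1) as $token) {
        // Single-character tokens such as ';' or '{' are plain strings.
        if (is_array($token) && in_array($token[0], $forbidden, true)) {
            return false;
        }
    }
    // Note: a stray "<?php" in the body is not an open-tag token in
    // scripting mode; it is left for the parser to reject as a syntax error.
    return true;
}
```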

So, to your first point: if they import a tagless file with the
wrong flag, everything gets echoed out as text, because there would be
no tags to pick up on, unless the attacker is very sneaky and puts one
inside the quotes of an echo statement.

As to the HTML, trying to import an HTML file with PHP code examples as
a tagless file is going to fail, and fail hard, at the very first
HTML tag or at the doctype declaration <!DOCTYPE html>.

I feel neither example is really a new vulnerability though. If
anything, it makes code injection a bit trickier. But the basic mode
of exploit isn't going to change.

And the programmer working with this should be able to correct his
mistake and move on.

> I would support the namespace option though.

Of the two, this one is actually the one that scares the heck out of me
in terms of what can go wrong if it is misused.

13 years ago by keisial@gmail.com

> I have made a wiki account with user name MichaelMorris - I don't
> think I have permissions to submit an RFC as of yet. I'll post this
> here for now. I've brought this up before, but can now simplify the
> original proposal since the decision to always have <?= available
> addresses much of the original problem.
>
> These changes apply to include, require, include_once and
> require_once. Two new arguments for these functions are proposed for
> introduction.
>
> (...)
>
> Problems
> The largest problem is with IDEs. There is no current convention to
> warn them that the file is pure PHP. However, I think this can be
> mitigated by adopting an extension for PHP include files -- *.pif,
> *.iphp are two possibilities. The language itself doesn't need to
> give a care about this directly.

Having tagless files interpreted as PHP is the wrong way to go.
I think you should instead propose it as:

A file included in that mode MUST begin with <?php.
?> is forbidden in such mode unless followed by EOF.
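This alternative rule can also be expressed with the tokenizer. In the sketch below (PHP 8; the function name is hypothetical), a file passes only if its first token is a `<?php` open tag, it contains no inline HTML or further open tags, and a closing tag appears only as the very last token. One detail relevant to trailing newlines: PHP's tokenizer folds a single newline directly after `?>` into the `T_CLOSE_TAG` token, so `?>` followed by one newline at EOF still passes, while a second newline becomes inline output and fails.

```php
<?php
// Hypothetical validator for the rule: the file MUST begin with "<?php",
// and "?>" is allowed only as the final token.
function follows_open_tag_rule(string $code): bool
{
    $tokens = token_get_all($code);
    $first = $tokens[0] ?? null;
    // Reject anything before the tag (even whitespace, which tokenizes as
    // T_INLINE_HTML) and the short "<?" form.
    if (!is_array($first) || $first[0] !== T_OPEN_TAG
        || !str_starts_with(strtolower($first[1]), '<?php')) {
        return false;
    }
    $lastIndex = count($tokens) - 1;
    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ';'
        }
        [$id] = $token;
        if ($id === T_INLINE_HTML || $id === T_OPEN_TAG_WITH_ECHO) {
            return false;
        }
        if ($id === T_OPEN_TAG && $i !== 0) {
            return false;
        }
        if ($id === T_CLOSE_TAG && $i !== $lastIndex) {
            return false;
        }
    }
    return true;
}
```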

13 years ago by Michael Morris

2012/3/6 Ángel González keisial@gmail.com:

> Having tagless files interpreted as PHP is the wrong way to go.
> I think you should instead propose it as:
>
> A file included in that mode MUST begin with <?php.
>
> ?> is forbidden in such mode unless followed by EOF.

Ever work with older versions of Subversion or vi? (Many editors will
put a carriage return after the last character of the file.) So no,
you're wrong. Requiring ?> at EOF would just be stupid.