Hi all,
If this is going to be implemented as a class, what is the advantage
of instantiation for this? Unless I'm missing it, I would propose that
the functions are made static.
In other words, I would prefer this:
echo Escaper::escapeHtml('<b>test</b>');
over this:
$e = new Escaper;
echo $e->escapeHtml('<b>test</b>');
Regards,
Tomas
Hi all,
I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.
Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.
That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).
https://wiki.php.net/rfc/escaper
Best regards,
Paddy
2012/9/19 Tomas Creemers tomas.creemers@gmail.com
Hi all,
If this is going to be implemented as a class, what is the advantage
of instantiation for this? Unless I'm missing it, I would propose that
the functions are made static.
In other words, I would prefer this:
echo Escaper::escapeHtml('<b>test</b>');
over this:
$e = new Escaper;
echo $e->escapeHtml('<b>test</b>');
Regards,
Tomas
Hi,
I guess the reason is the same as the reason you should avoid static
methods in general. Just one example: try to extend the class and
then always use the extended one ;)
Regards,
Sebastian
Isn't that what late static binding is for? It enables the use of the
extending class (if any) from the base class.
I really don't see what class instantiation would add to this design
(if it's going to be a class at all). It doesn't have
instance-specific state.
Regards,
Tomas
You did notice the character encoding parameter to the constructor? The point of the class is to share that little piece of state and omit it as a required method parameter thus removing one OOP layer for those practicing OOP like all the major frameworks.
The RFC notes already that character encoding parameters are NOT optional. They MUST be set on each call outside of the class to enforce explicitness and prevent the currently popular option of imposing a non-configurable default in libs and frameworks. Character encoding is important in escaping and assuming that they are interchangeable doesn't always fit the reality of browser behaviour and bugs.
This would apply to static calls as much as plain functions.
Paddy
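To make the point about shared state concrete, here is a minimal sketch. It assumes a userland stand-in named SketchEscaper and a plain function named sketch_escape_html (both hypothetical, not the RFC's implementation), and it uses htmlspecialchars() in place of the real escaping logic:

```php
<?php
// Hypothetical userland stand-in for the proposed class; not the RFC's code.
final class SketchEscaper
{
    private $encoding;

    // The encoding is required: there is deliberately no default value.
    public function __construct($encoding)
    {
        $this->encoding = $encoding;
    }

    public function escapeHtml($string)
    {
        // htmlspecialchars() stands in for the real escaping logic.
        return htmlspecialchars($string, ENT_QUOTES, $this->encoding);
    }
}

// Plain-function equivalent: the encoding has to be passed on every call.
function sketch_escape_html($string, $encoding)
{
    return htmlspecialchars($string, ENT_QUOTES, $encoding);
}

// The object carries the encoding once; the function repeats it each time.
$escaper = new SketchEscaper('UTF-8');
echo $escaper->escapeHtml('<b>test</b>'), "\n";
echo sketch_escape_html('<b>test</b>', 'UTF-8'), "\n";
```

Both calls produce the same escaped string; the only difference is where the encoding is stated.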
I missed the encoding parameter. While it's still possible to add that
to a static-only class, that would be more cumbersome and less correct
than instantiation (since the encoding is state, technically). My
apologies. Carry on ;-)
Tomas
No need to apologise ;). Just wanted to clarify that the character
encoding drives the choice of class since it can be easy to miss its
importance - amended the RFC a little to highlight it.
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
It's probably already been covered, but I don't like the fact that
it's a class at all.
There's nothing wrong with an ini value to start with (defaulting to X
if it is unrecognised), then ini_set() to change the value at runtime
if required, and finally implementing everything as normal functions
that accept an override encoding as an optional parameter for those
one-off cases.
It feels like this is just using classes for the sake of using
classes, adding an unnecessary layer of complexity (and discussion)
for no real reason except that it is the RFC author's preference.
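As a rough sketch of the ini-plus-override idea described in this message: the function name sketch_escape_html_ini is hypothetical, and the existing default_charset setting is used only as a stand-in for whatever dedicated ini value would be introduced.

```php
<?php
// Hypothetical function. default_charset is an existing ini setting, used
// here only as a stand-in for the dedicated setting the message proposes.
function sketch_escape_html_ini($string, $encoding = null)
{
    if ($encoding === null) {
        // Fall back to the ini value, then to UTF-8 if the ini value is empty.
        $encoding = ini_get('default_charset') ?: 'UTF-8';
    }
    return htmlspecialchars($string, ENT_QUOTES, $encoding);
}

// Runtime change of the global default.
ini_set('default_charset', 'UTF-8');
echo sketch_escape_html_ini('<b>test</b>'), "\n";

// One-off override for data in a different encoding.
echo sketch_escape_html_ini("caf\xE9 <b>", 'ISO-8859-1'), "\n";
```

The trade-off is that a call without the second argument silently depends on global configuration.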
Hi Leigh,
First off, bear in mind that the class is a preferred implementation
that does not preclude the implementation of similar functions. As a
class it just eases usage for those who are OOP obsessed ;). I work on
Zend Framework and we already pass around a similar object.
Secondly, the problem with a php.ini option or a default function
parameter value is that it ignores the prevailing best practice of
always specifying a character encoding explicitly. htmlspecialchars()
is a perfect example here. Many callers do not specify an explicit
encoding, so we find applications with UTF-8 or other encodings for
their output performing escaping as ISO-8859-1 in PHP 5.3. Please,
search GitHub for htmlspecialchars - I'm not exaggerating. Now search
for the same thing in any large framework and you'll never see this
because those either followed best practice or already attracted a
security report about it.
Explicitly requiring the parameter just follows recommended practice.
Now, let's consider adding it to php.ini. This follows the exact same
reasoning as programmers will develop to their platform in ignorance
of any others. It's just another assumed default that needs to be
tweaked anyway - and probably will be only rarely.
The point here is to eliminate opportunities to do this wrong and
simplify doing it correctly. Flexibility for flexibility's sake
doesn't actually solve an existing poor practice - it simply gives it
a new life extension. Programmers need to know that character encoding
selection is intrinsic to escaping. How many programmers will be
stranded by PHP's switch to UTF-8 because they actually did use
ISO-8859-1 output encoding? How many are stranded with UTF-8 escaping
from PHP 5.4 while not using that encoding? Being explicit may be
perceived as a PITA per function call but it's necessary to eliminate
unpredictability in the backend.
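A small illustration of why the declared encoding matters, using only the existing htmlspecialchars() function. The sample string is an assumption chosen for the demonstration:

```php
<?php
// "é" encoded as ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8.
$latin1 = "caf\xE9 <b>";

// Escaping with the encoding the data is actually in works as expected.
$ok = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

// Declaring the wrong encoding makes htmlspecialchars() reject the input:
// without ENT_SUBSTITUTE or ENT_IGNORE it returns an empty string.
$wrong = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

var_dump($ok === "caf\xE9 &lt;b&gt;");
var_dump($wrong === '');
```

Code that relies on an implicit default never sees this mismatch until the default changes or the data does.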
It feels like this is just using classes for the sake of using
classes, adding an unnecessary layer of complexity (and discussion)
for no real reason except that is the RFC authors preference.
Anything any class can do, could be done procedurally. That doesn't
mean it is. Many programmers use objects, dependency injection and all
that other stuff. Adding the class simplifies usage in an OOP setting
and ideally helps remove the barrier of having to rewrap functions
into a class for those who do practice OOP regularly. So, yes,
obviously it's a preference but not an unnecessary layer of complexity
since it actually simplifies overall usage in the OO setting.
Paddy
It feels like this is just using classes for the sake of using
classes, adding an unnecessary layer of complexity (and discussion)
for no real reason except that is the RFC authors preference.
+1
First the discussion was filtering vs escaping. Now it is about how to implement it as a class etc.
I don't understand why people have any issue with it being a core procedural function. You can call that from your OO code just fine. Just like any other core procedural function.
There is no reason it has to be OO. OO has to call procedural functions already (string functions, array functions, curl, json, etc etc); but us procedural purists don't have to call OO methods and classes if we don't need to (except now some of the DateTime IIRC which bugs me :p)
Anyway how hard is it to use something akin to the filter functions? That's what constants/flags are for.
Call it str_escape(string, flags optional, encoding optional) and be done with it. Since it won't be useful to have escape_var or escape_input type of differentiation and this seems like it could just fit under the string family of functions (I am a fan of namespaced functions by prefix)
After that it seems like the discussion would be:
- do we even need encoding or is UTF8 just fine
- what are the flags to be defined for different escaping methods
$.02
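For discussion, a single flag-driven function along these lines might look like the sketch below. The names sketch_str_escape, SKETCH_ESCAPE_HTML and SKETCH_ESCAPE_URL are hypothetical, and the two modes are only examples of what the flags could select:

```php
<?php
// Hypothetical constants and function, modelled on the suggestion above.
const SKETCH_ESCAPE_HTML = 1;
const SKETCH_ESCAPE_URL  = 2;

function sketch_str_escape($string, $flags = SKETCH_ESCAPE_HTML, $encoding = 'UTF-8')
{
    switch ($flags) {
        case SKETCH_ESCAPE_HTML:
            return htmlspecialchars($string, ENT_QUOTES, $encoding);
        case SKETCH_ESCAPE_URL:
            // rawurlencode() works on bytes, so the encoding is not used here.
            return rawurlencode($string);
        default:
            throw new InvalidArgumentException('Unknown escaping mode');
    }
}

echo sketch_str_escape('<b>test</b>'), "\n";
echo sketch_str_escape('a b&c', SKETCH_ESCAPE_URL), "\n";
```

Note that this sketch defaults the encoding to UTF-8, which is exactly the kind of implicit default the RFC author argues against elsewhere in the thread.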
I don't understand why people have any issue with it being a core procedural function. You can call that from your OO code just fine. Just like any other core procedural function.
Yes, but typing the encoding every time is cumbersome. Or, if you don't
want to set it every time, you'd have to set it globally. Then, you
forgot to change it back somewhere when you're dealing with multiple
encodings, and it all goes wrong.
There is no reason it has to be OO. OO has to call procedural functions already (string functions, array functions, curl, json, etc etc); but us procedural purists don't have to call OO methods and classes if we don't need to (except now some of the DateTime IIRC which bugs me :p)
Anyway how hard is it to use something akin to the filter functions? That's what constants/flags are for.
Call it str_escape(string, flags optional, encoding optional) and be done with it. Since it won't be useful to have escape_var or escape_input type of differentiation and this seems like it could just fit under the string family of functions (I am a fan of namespaced functions by prefix)
After that it seems like the discussion would be:
- do we even need encoding or is UTF8 just fine
UTF8-only is certainly not just fine.
- what are the flags to be defined for different escaping methods
$.02
--
Andrew Faulds
http://ajf.me/
Yes, but typing the encoding every time is cumbersome. Or, if you don't want
to set it every time, you'd have to set it globally. Then, you forgot to
change it back somewhere when you're dealing with multiple encodings, and it
all goes wrong.
Then write a reusable function. It's the same amount of code as an OO
method, since everyone likes to make OO classes for every little thing.
After that it seems like the discussion would be:
- do we even need encoding or is UTF8 just fine
UTF8-only is certainly not just fine.
That's fine :) Just a suggestion for discussion. Keep encoding in
then! (I was thinking about htmlspecialchars and such)
Personally, I would like to see it operate similar to MySQLi, where you
have the convenience of OOP, but can still call a function directly in a
procedural manner.
And I definitely feel like we need encoding. We can default it to UTF-8 or
to the zend.script_encoding if set, but I think it needs the flexibility to
handle different encoding types as well.
--
"My command is this: Love each other as I
have loved you." John 15:12
Hi,
On 19.09.2012 at 18:11, Michael Stowe me@mikestowe.com wrote:
Personally, I would like to see it operate similar to MySQLi, where you
have the convenience of OOP, but can still call a function directly in a
procedural manner.
There seems to be a need for a procedural API. As there is one, let’s do it similar to how MySQLi etc. does it and use a context resource:
$ctx = escape_context_create('UTF-8');
$str = escape_html_attr($ctx, $str);
And so on.
cu,
Lars
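A userland sketch of how the two functions named above could behave. The bodies are assumptions: the "context" is a plain object rather than a real resource, and htmlspecialchars() is a simplification of proper attribute escaping.

```php
<?php
// Hypothetical userland versions of the two functions named above.
// The "context" here is a plain object, not a real PHP resource.
function escape_context_create($encoding)
{
    $ctx = new stdClass();
    $ctx->encoding = $encoding;
    return $ctx;
}

function escape_html_attr($ctx, $string)
{
    // Simplification: a dedicated attribute escaper would encode far more
    // characters than htmlspecialchars() does.
    return htmlspecialchars($string, ENT_QUOTES, $ctx->encoding);
}

$ctx = escape_context_create('UTF-8');
echo '<input value="', escape_html_attr($ctx, 'say "hi"'), '">', "\n";
```

The context plays the same role as the object in the class-based design: it carries the encoding so each call does not have to.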
There seems to be a need for a procedural API. As there is one, let’s do it similar to how MySQLi etc. does it and use a context resource:
$ctx = escape_context_create('UTF-8');
$str = escape_html_attr($ctx, $str);
why bother with that? it's called function parameters. (and even
better, named parameters if PHP ever implemented those... :))
Call it str_escape(string, flags optional, encoding optional) and be done with it.
Keeping it simple definitely preferred
- do we even need encoding or is UTF8 just fine
Definitely need encoding.
mbstring supports quite a lot
http://php.net/manual/en/mbstring.supported-encodings.php
I think you'd need to at least approximately match those encodings,
perhaps there is code already there that can be depended upon?
Hi Michael,
I don't understand why people have any issue with it being a core procedural function. You can call that from your OO code just fine. Just like any other core procedural function.
I have never once expressed a problem with this being a set of
procedural functions. Not once. The RFC offers some suggested function
signatures. So nobody has expressed any issues and nobody has insisted
that you be required to use a class or object.
There is no reason it has to be OO. OO has to call procedural functions already (string functions, array functions, curl, json, etc etc); but us procedural purists don't have to call OO methods and classes if we don't need to (except now some of the DateTime IIRC which bugs me :p)
Then please - call the functions as defined in the RFC.
Anyway how hard is it to use something akin to the filter functions? That's what constants/flags are for.
The RFC addresses escaping, not input filtering. Yes, there is a fine
line between them but the filter method requires constants, options,
and we would then need to layer in character encoding. The resulting
mutation would be a step backwards in my opinion in guiding users
towards the secure use of escaping in applications.
Call it str_escape(string, flags optional, encoding optional) and be done with it. Since it won't be useful to have escape_var or escape_input type of differentiation and this seems like it could just fit under the string family of functions (I am a fan of namespaced functions by prefix)
After that it seems like the discussion would be:
- do we even need encoding or is UTF8 just fine
- what are the flags to be defined for different escaping methods
Correct encoding is essential. The entire planet does not use UTF-8,
and UTF-8 is not the same as other encodings once you get over the
theoretical perfection that should exist and meet the rebels:
browsers. Please bear in mind that using the correct encoding has been
preached for many many years as a minimum requirement in secure
escaping for PHP.
Paddy
I have never once expressed a problem with this being a set of
procedural functions. Not once. The RFC offers some suggested function
signatures. So nobody has expressed any issues and nobody has insisted
that you be required to use a class or object.
Sorry; it just felt like the conversation was getting so heavy on syntax that people would come to see an OO interface as the only way to implement the functionality. Look at all of the examples people have posted so far :)
The RFC addresses escaping, not input filtering. Yes, there is a fine
line between them but the filter method requires constants, options,
and we would then need to layer in character encoding. The resulting
mutation would be a step backwards in my opinion in guiding users
towards the secure use of escaping in applications.
I brought up the filter functions as an example of semantics/syntax. Not functionality. I understand and agree with them being separate.
Correct encoding is essential. The entire planet does not use UTF-8,
and UTF-8 is not the same as other encodings once you get over the
theoretical perfection that should exist and meet the rebels:
browsers. Please bear in mind that using the correct encoding has been
preached for many many years as a minimum requirement in secure
escaping for PHP.
As mentioned before I removed my discussion point about encoding. It was just an idea. I think it should be UTF8 by default but accept encoding as a parameter.
I think we are probably on the same page here. I have not looked at the RFC but was just seeing floods of example code coming through the mailing list and started being vocal based on my concerns from that. An RFC doesn't mean it will happen exactly as the RFC defines it :)
2012/9/19 Tomas Creemers tomas.creemers@gmail.com
On Wed, Sep 19, 2012 at 8:34 AM, Sebastian Krebs krebs.seb@gmail.com
wrote:
2012/9/19 Tomas Creemers tomas.creemers@gmail.com
Hi all,
If this is going to be implemented as a class, what is the advantage
of instantiation for this? Unless I'm missing it, I would propose that
the functions are made static.
[snip]
Regards,
Tomas
Hi,
I guess the reason is the same like the one, why you just should avoid
static methods at all. But only one example: Try to extend the class and
then always use the extended one ;)
Regards,
Sebastian
Isn't that what late static binding is for? It enables the use of the
extending class (if any) from the base class.
Late static binding is for runtime resolution of the class within the
class. If you spread the (external) call "FooEscaper::escapeJs()" all over
your code, you'll have much fun changing every occurrence of "FooEscaper"
once you extend it.
Regards,
Sebastian
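The distinction can be sketched as follows. All class and function names are hypothetical, and json_encode() is only a placeholder for real JavaScript escaping:

```php
<?php
// Hypothetical classes for illustration; json_encode() is only a placeholder.
class BaseEscaper
{
    public static function escapeJs($string)
    {
        // static:: is late static binding: it resolves to the class the
        // call was made on, so a subclass can override wrap().
        return static::wrap(json_encode($string));
    }

    protected static function wrap($encoded)
    {
        return $encoded;
    }
}

class FooEscaper extends BaseEscaper
{
    protected static function wrap($encoded)
    {
        return '/* foo */ ' . $encoded;
    }
}

// Late static binding works inside the hierarchy...
echo FooEscaper::escapeJs('a'), "\n";
// ...but every external call site still names a concrete class.
echo BaseEscaper::escapeJs('a'), "\n";

// With instances, the call site depends only on the object it is given.
class InstanceEscaper
{
    public function escapeJs($string)
    {
        return json_encode($string);
    }
}

class FooInstanceEscaper extends InstanceEscaper
{
    public function escapeJs($string)
    {
        return '/* foo */ ' . parent::escapeJs($string);
    }
}

function render(InstanceEscaper $escaper, $value)
{
    return $escaper->escapeJs($value);
}

echo render(new FooInstanceEscaper(), 'a'), "\n";
```

Swapping in the extended class means editing every static call site, whereas render() needs no change when it is handed a different instance.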
Oh and just to throw that in: If the additional variable (or the extra
line) is a "problem"
echo (new Escaper)->escapeHtml('<b>test</b>');
// vs.
echo Escaper::escapeHtml('<b>test</b>');