[PHP.next] Error-handling using "Error Events"

11 years ago by Rowan Collins

Hi All,

One of the things that I'd love to see on the roadmap, or at least the
brainstorm, for PHP.next is some kind of review of error handling. I've
been thinking about this for a while, but hesitated to post until I had
a positive suggestion, not just a list of whinges.

By "error handling", I guess I actually mean "message handling", or
something - everything from E_ERROR down to E_STRICT is currently just a
string of text, with a few ini settings and the ability to register a
single global handler. A lot of the time, this is fine, because these
messages should simply be displayed on the developer's screen, or
appended to the production server's log; but sometimes, it's useful to
know that certain non-fatal things happened - as evidenced by the
existence of ad hoc facilities like libxml_get_errors().

I think the core of PHP could do more to help with this, standardising
things and making them more flexible for the user.

The basic gist I'm suggesting below is that we review and classify all
the existing messages, and promote them from fragile strings into more
useful "message events", with a filtered listener system providing
everything we currently have and more.

Sorry it's got a bit long; consider it a draft RFC ;)

Exceptions for fatals?

I know there was a big discussion about this a while back, and I didn't
read all of it, so I'm not going to go into it here. However, if we were
to review the classification of errors, a proper hierarchy of Exception
classes might be somewhere to put what are currently fatal errors. I
mention this first just to point out that most of what I'm about to
discuss doesn't apply so well to fatal errors, since they can't be
handled in the same ways.

Review of severity

There are currently a lot of errors, warnings, notices, etc in PHP, but
which ones have which severity sometimes feels a bit arbitrary and
inconsistent. I think it would be good to have clear guidelines of what
those severities should mean, and which messages should therefore fall
under which. Severities can also change over time as circumstances
alter. For instance, it bugs me that referencing an undefined class
constant is a fatal error, but referencing an undefined global (or
namespace) constant is only a Notice; code relying on unquoted string
literals has been considered badly written for longer than I've known
PHP, so perhaps it's time to either remove the fallback completely, or
at least raise the message to Warning level.

Classification of messages

The assumption which underpins a lot of what follows is that errors can
and should be classified by type as well as severity. At the moment, the
messages have no identity, they are just strings; this makes handling
them convoluted and fragile - unless you are just logging or displaying
everything that happened, you have to perform a string match, often
masking out variable parts of the message with a regex or prefix-only
match. Ideally, it should be possible to improve the wording for human
consumption without breaking machine handling of that type of event.

My suggestion is that each existing message could be assigned a
"namespace" (the extension name, or section of core), a "type" within
that namespace (analogous to an Exception sub-class), and an ID (like
the numeric code of an exception). The human readable message could then
be tweaked, translated, etc, without appearing to be a completely new
message to any code trying to handle it. Note that applying this to
existing code is mostly trivial as far as assigning a namespace and
message ID to each string; the only hard decision would be assigning
"types" to group similar but non-identical messages in larger extensions.

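As a rough illustration of the namespace/type/ID idea, here is a minimal userland sketch. The Message class, its fields, and the sample values are all hypothetical; nothing like this exists in PHP today, and a real design would live in the engine rather than in PHP code.

```php
<?php
// Hypothetical sketch: none of these names exist in PHP itself.
final class Message
{
    public int $severity;      // e.g. E_WARNING
    public string $namespace;  // extension name or section of core
    public string $type;       // group of related messages in that namespace
    public int $id;            // stable identifier for this exact message
    public string $text;       // human-readable wording, free to change

    public function __construct(int $severity, string $namespace, string $type, int $id, string $text)
    {
        $this->severity = $severity;
        $this->namespace = $namespace;
        $this->type = $type;
        $this->id = $id;
        $this->text = $text;
    }

    // Identity used for machine handling; it deliberately ignores $text.
    public function identity(): string
    {
        return sprintf('%s/%s/%d', $this->namespace, $this->type, $this->id);
    }
}

// Made-up values: the same message before and after its wording was improved.
$old = new Message(E_WARNING, 'libxml', 'parse', 1, 'Tag mismatch');
$new = new Message(E_WARNING, 'libxml', 'parse', 1, 'Opening and closing tags do not match');
$sameMessage = ($old->identity() === $new->identity());
```

Code that matches on identity() rather than on the text keeps working when the wording is tweaked or translated, which is the property being asked for here.
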
Treating messages as events

Given the above structured representation of messages, it ought to be
possible to replace the current one-at-a-time set_error_handler() with
something more like a set of registered event listeners. Every time a
message was raised, the object representing it would be passed, in turn,
to all interested listener callbacks. It might be a good idea to let
listeners define the order they are called via a relative priority.

The object passed could be mutable, like an Event in JavaScript, so a
listener could, say, lower the severity of a particular message; it
could also have methods to stop other listeners from being called at
all. Also, since this was brought up a lot as an advantage of
exceptions, it would presumably be possible to include the stack trace
of each message - perhaps only collecting it if a listener expressed an
interest in such when it was registered.

Various existing functionality could be implemented in core, but
expressed as "pseudo-listeners" - not callbacks per se, but registerable
with the same system - e.g. "display_plain", "display_html",
"write_to_log", etc. A "collect" pseudo-listener could implement the
same kind of behaviour as libxml_use_internal_errors(true), pushing each
message into some kind of collection object for later access.
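
The listener idea could look roughly like the following in userland. This is only a sketch under assumptions: MessageEvent, MessageDispatcher, and their methods are hypothetical names, and the $collected array merely stands in for a "collect" pseudo-listener.

```php
<?php
// Hypothetical userland sketch; PHP has no such classes built in.
final class MessageEvent
{
    public int $severity;   // mutable, so a listener can lower it
    public string $text;
    private bool $stopped = false;

    public function __construct(int $severity, string $text)
    {
        $this->severity = $severity;
        $this->text = $text;
    }

    public function stopPropagation(): void { $this->stopped = true; }
    public function isStopped(): bool { return $this->stopped; }
}

final class MessageDispatcher
{
    private array $listeners = [];

    public function listen(callable $callback, int $priority = 0): void
    {
        $this->listeners[] = ['priority' => $priority, 'callback' => $callback];
        // Keep the list ordered so that higher priorities are called first.
        usort($this->listeners, fn(array $a, array $b): int => $b['priority'] <=> $a['priority']);
    }

    public function raise(MessageEvent $event): void
    {
        foreach ($this->listeners as $listener) {
            ($listener['callback'])($event);
            if ($event->isStopped()) {
                break;   // a listener asked that nobody else be called
            }
        }
    }
}

$collected = [];
$dispatcher = new MessageDispatcher();
// Low priority: collect whatever reaches the end of the chain.
$dispatcher->listen(function (MessageEvent $e) use (&$collected): void { $collected[] = $e; }, -10);
// Medium priority: lower the severity of warnings.
$dispatcher->listen(function (MessageEvent $e): void {
    if ($e->severity === E_WARNING) { $e->severity = E_NOTICE; }
}, 10);
// High priority: discard deprecation messages entirely.
$dispatcher->listen(function (MessageEvent $e): void {
    if ($e->severity === E_DEPRECATED) { $e->stopPropagation(); }
}, 20);

$dispatcher->raise(new MessageEvent(E_WARNING, 'something odd happened'));
$dispatcher->raise(new MessageEvent(E_DEPRECATED, 'old API used'));
```

Only the first message reaches the collector, and it arrives already downgraded to a notice.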

Selectivity of handling

The power of the above scheme would come if you could register a
"message listener" not just for a set of severities, but for a
particular namespace, type, or even single message ID. As each listener
was registered, the options selected could be saved as a value+mask pair:
if you want everything in the libxml namespace with severity warning,
the mask would be blank for type and ID. Despatching an event would
involve calculating the "fingerprint" of the current message, iterating
through the registered list of handlers, and calling any of them that
matched.

The advantage to this is two-fold: first, it means less boilerplate code
in the listener functions, since the input is pre-filtered; and second,
it's much more efficient to not fire a callback from the engine than it
is to fire a callback which performs some boilerplate logic and decides
to do nothing.

This code is called frequently, so needs to be very efficient; however,
some logic of this sort presumably already happens to check the various
user settings and the severity mask provided to set_error_handler(). I
imagine some pre-optimisation could also happen when listeners are
registered and unregistered - special cases for zero listeners, or one
unfiltered listener, for instance.
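
A sketch of the value+mask matching described in this section, written in PHP for readability although the real check would sit in the engine. The bit layout, the NS_* numbers, and the function names are all assumptions made for illustration, and 64-bit integers are assumed.

```php
<?php
// Hypothetical layout: 8 bits namespace | 8 bits type | 16 bits message ID.
const NS_CORE   = 1;   // made-up namespace numbers
const NS_LIBXML = 3;

const MASK_NAMESPACE = 0xFF << 24;
const MASK_TYPE      = 0xFF << 16;
const MASK_ID        = 0xFFFF;

function fingerprint(int $namespace, int $type, int $id): int
{
    return ($namespace << 24) | ($type << 16) | $id;
}

// A listener matches if the severity is in its set and the masked
// fingerprint equals the value saved when it was registered.
function matches(array $listener, int $severity, int $fingerprint): bool
{
    return ($listener['severities'] & $severity) !== 0
        && ($fingerprint & $listener['mask']) === $listener['value'];
}

// "Everything in the libxml namespace with severity warning":
// type and ID are simply left out of the mask.
$listener = [
    'severities' => E_WARNING,
    'mask'       => MASK_NAMESPACE,
    'value'      => fingerprint(NS_LIBXML, 0, 0),
];

$libxmlWarning = matches($listener, E_WARNING, fingerprint(NS_LIBXML, 5, 1234));
$libxmlNotice  = matches($listener, E_NOTICE, fingerprint(NS_LIBXML, 5, 1234));
$coreWarning   = matches($listener, E_WARNING, fingerprint(NS_CORE, 5, 1234));
```

Each check is two bitwise ANDs and two comparisons per listener, which is the kind of cost that seems plausible for code called this often.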

Sidenote: selectivity of catch() blocks?

While thinking about the above, it occurred to me it would be nice to
have a syntax for catching exceptions by their code as well as their
class. Basically, a sugar for this boilerplate:

catch ( FooException $e ) {
    if ( $e->getCode() != FooException::EX_NO_FOOS ) {
        throw $e;
    }
    /* handle lack of foos ... */
}

Any thoughts?

Lexical scope

Most error handling is dynamically scoped - "from now until I tell you
otherwise, treat these messages like this" - but occasionally it would
be nice to have it lexically scoped, as in "for any message raised
directly in this file, or set of lines, do this". The use case I have in
mind is legacy code, such as an old PEAR module which you plan to
replace wholesale, but are unlikely to patch - I want to be able to set
a flag on include saying "this file is poorly written third party code,
please don't display warnings about it".

Since messages already pass through the file path and line number they
occurred on, this could in principle be implemented as part of the
pre-filtering discussed above. This would make the "fingerprint" to be
matched a lot longer, or the check more complex; perhaps a hash of the
file path would be more efficient; also, if no registered listener had
such a filter in its mask, all filename-related logic could be optimised
away.

Alternatively, it could be stored in a completely separate list, and
swapped into the main list when code in the relevant file was executing.

An interesting thought on lexical scope is that with the right syntax, a
listener could be registered at compile-time, rather than run-time:
rather than registering a listener when line 50 is executed, and
unregistering it when line 55 is executed, a block spanning lines 50-55
could register a permanent listener, masked for all except those lines.
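
For comparison, something close to file-based filtering can be approximated today with the existing set_error_handler(), at the cost of firing a callback for every message. The sketch below assumes forward-slash paths; the raised_in() helper and the directory name are hypothetical.

```php
<?php
// Hypothetical helper: true if $errfile lies inside $legacyDir.
function raised_in(string $errfile, string $legacyDir): bool
{
    $prefix = rtrim($legacyDir, '/') . '/';
    return strncmp($errfile, $prefix, strlen($prefix)) === 0;
}

// Assumption: the poorly written third-party code lives under this directory.
$legacyDir = '/app/vendor/old-pear-module';

set_error_handler(function (int $errno, string $errstr, string $errfile, int $errline) use ($legacyDir): bool {
    // Returning true marks the message as handled, so PHP does not display it;
    // returning false falls through to PHP's normal handling.
    return raised_in($errfile, $legacyDir);
});
```

This is exactly the "callback which performs some boilerplate logic and decides to do nothing" that engine-level pre-filtering would avoid.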

The infamous "shut up" operator (@)

No discussion of error-handling would be complete without mentioning
this little oddity, although I admit to a slight ignorance of exactly
how it works, and why it causes the compiler to skip optimisations.
(e.g. https://gist.github.com/nikic/6699370)

One thought I had was that you could have a special syntax that could
register a high-priority "discard" pseudo-listener for a few lines of
lexical scope (hopefully that makes sense if you've read this far);
something like this:

suppress_messages {
    $fh = fopen('foo', 'r');
}

That wouldn't be quite the same as the current @ operator - if you
replaced fopen() with a user-defined wrapper, you'd need dynamic scope
again - but it would replace some use cases. I'm not sure if it would
actually make sense or not, but it gives you an idea of where my mind is
going with the whole "listener"/"pseudo-listener" concept.
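
A userland approximation of the "discard" pseudo-listener, for the sake of discussion. The suppress_messages() function here is a hypothetical helper built on the existing set_error_handler() and restore_error_handler(); unlike the block syntax above it is dynamically scoped, so it also silences anything called from inside the closure. The file path is made up.

```php
<?php
// Hypothetical helper; plain userland code, not new syntax.
function suppress_messages(callable $block)
{
    // Returning true tells PHP the message was handled, so it is not displayed.
    set_error_handler(function (): bool { return true; });
    try {
        return $block();
    } finally {
        // Always put the previous handler back, even if $block throws.
        restore_error_handler();
    }
}

$fh = suppress_messages(function () {
    return fopen('/no/such/dir/foo', 'r');
});
```

The caller still gets false from fopen() and can act on it; only the warning is discarded.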

So, thank you for reading this far (assuming you actually did).
Thoughts? Feedback? Brickbats?

Regards,

--
Rowan Collins
[IMSoP]

11 years ago by Marco Schuster

Hi,

The infamous "shut up" operator (@)

No discussion of error-handling would be complete without mentioning this
little oddity, although I admit to a slight ignorance of exactly how it
works, and why it causes the compiler to skip optimisations. (e.g.
https://gist.github.com/nikic/6699370)

One thought I had was that you could have a special syntax that could
register a high-priority "discard" pseudo-listener for a few lines of
lexical scope (hopefully that makes sense if you've read this far);
something like this:

suppress_messages {
    $fh = fopen('foo', 'r');
}

That wouldn't be quite the same as the current @ operator - if you replaced
fopen() with a user-defined wrapper, you'd need dynamic scope again - but it
would replace some use cases. I'm not sure if it would actually make sense
or not, but it gives you an idea of where my mind is going with the whole
"listener"/"pseudo-listener" concept.

The fopen shut-up could actually be resolved much, much more elegantly:
make it throw real, usable exceptions, instead of returning FALSE
and spitting out a warning.

Something like DomainNotFoundException, ConnectionDeniedException,
FileNotFoundException or whatnot. It's a nightmare if you actually
want to know what happened.

Of course one could also just use the cURL library for http/ftp
requests, but it doesn't have exceptions either... and it's massive
boilerplate once you check every freaking curl_setopt() call for
a FALSE return value.

Marco

11 years ago by Rowan Collins

Marco Schuster wrote (on 09/04/2014):

The fopen shut-up actually could be resolved much, much more elegant:
make it throw real, usable exceptions, instead of returning a FALSE
and spitting out a warning.

Good point. Perhaps the real question is: what are the use cases - real
or perceived - for the @ operator, and how, if we were re-organising
error and message handling in general, can they best be replaced.

The only problem is that changing the behaviour of such a basic function
is quite a major compatibility break, and there may need to be some way
of switching to the old behaviour in order to use code which pre-dates
the change, or write code which works in both versions.

Something like DomainNotFoundException, ConnectionDeniedException,
FileNotFoundException or whatnot. It's a nightmare if you actually
want to know what happened.

The problem with such specific exceptions in this case is that different
stream wrappers would want to throw completely different exceptions; I
guess they could all extend a generic FileAccessException.
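
A minimal sketch of that shape, with a generic base class and more specific subclasses. All class names and the open_or_throw() helper are hypothetical; no stream wrapper throws these today, and the path is made up.

```php
<?php
// Hypothetical hierarchy; PHP defines none of these classes.
class FileAccessException extends RuntimeException {}
class FileNotFoundException extends FileAccessException {}
class ConnectionDeniedException extends FileAccessException {}

// Stand-in for a stream wrapper that reports failure by throwing.
function open_or_throw(string $path)
{
    if (!file_exists($path)) {
        throw new FileNotFoundException("No such file: $path");
    }
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new FileAccessException("Could not open: $path");
    }
    return $fh;
}

try {
    $fh = open_or_throw('/no/such/dir/foo');
    $outcome = 'opened';
} catch (FileNotFoundException $e) {
    $outcome = 'missing';         // the caller cares about this specific case
} catch (FileAccessException $e) {
    $outcome = 'other failure';   // anything else a wrapper might throw
}
```

Callers that only care whether the open worked can catch the base class and ignore the wrapper-specific subclasses.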

Regards,

Rowan Collins
[IMSoP]

11 years ago by Stas Malyshev

Hi!

Good point. Perhaps the real question is: what are the use cases - real
or perceived - for the @ operator, and how, if we were re-organising
error and message handling in general, can they best be replaced.

There are these major use cases for using @ as far as I can see:

1. Shutting up a noisy function because I don't care about warnings, I
just want it to do what's possible and return a failure code if it's not.
E.g. - I want to json_decode some data which may or may not be JSON; if
it doesn't decode it's ok, I'll just try something else. I don't need it
to tell me what's wrong with it because I don't care, it either works or
it doesn't. I.e. the situation where if it fails, I don't care why it
failed, I just want a failure code.

2. Similar to the previous, but with a slightly different twist: I want to
handle the error myself, in some different way than outputting a string
to the screen.

@fopen is the frequent example for both 1 and 2. Sometimes you want
to just ignore the file if it doesn't open. Sometimes you want to handle
it, but the file pointer being false is enough to catch it. Of course, it'd
be even better if you could extract the reason why it failed, but you
often don't want it where the warning reporting system puts it. Same goes
for @unlink, @include, etc.

3. @$foo['bar'] - i.e. get me $foo['bar'] if it's there, null if it's
not. Because writing isset($foo['bar'])?$foo['bar']:null each time is so
damn annoying. ?: helps in some cases but not all.

Something like DomainNotFoundException, ConnectionDeniedException,
FileNotFoundException or whatnot. It's a nightmare if you actually
want to know what happened.

The problem with such specific exceptions in this case is that different
stream wrappers would want to throw completely different exceptions; I
guess they could all extend a generic FileAccessException.

Knowing what happened is very useful, however I don't think exception
hierarchy is a good way to keep this info. You very rarely would want to
do different things depending on why opening your file failed - was it a
disk error? permission? network problem? In any case, you probably would
want to log the error somewhere (here's where what happened is useful)
and move on (or bail out if the file was critically needed). Class
hierarchy is useless here, one class with good toString and maybe a
couple of other API methods would be much more useful.

In general, my experience is that converting all common errors to
exceptions leads to a lot of code like:

$success = true;
try { ... }
catch(Exception $e) {
    $success = false;
}
if($success) { ... }

which is plain ugly. Also, exceptions are expensive, so that would also
lead to a lot of boilerplate checking for conditions that otherwise
would be ignored - i.e. each time we try to open the file we'd have to
check if it exists and if it's readable and so on, to avoid an expensive
exception.

@ actually works better in this case, but how it does it under the hood
is ugly, clunky and quite expensive too. If we could keep the agility of
@ while fixing the underlying ugliness and making the error still
accessible and useable when needed, that would be a great thing.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Julien Pauli

That's the thing to do for PHP-Next.
PHP-Next will be an opportunity to rethink and rewrite some technical
parts of the engine.
Error and exception management is one part that can benefit from a
rewrite. We could change things in a way that makes them less error
prone, while the change at user level is nearly invisible (e.g. rethink
the '@' internal handling).

Turning all errors into exceptions is a mistake IMHO.
However, easing such a process for those who'd need it could be a real improvement.

I'm adding the task to our wiki.

Julien.Pauli

11 years ago by Kris Craig

This is just a thought, but how about giving devs the option to have all
errors throw exceptions at the script level? It wouldn't be enabled by
default, so we wouldn't wind up with people trying to mimic the old
behavior using try/catch blocks as Stas mentioned. However, people who
actually want this sort of behavior so they can route everything through
their own custom handlers (might be particularly useful for framework
developers) could switch it on. Something like ini_set(
"throw_all_errors", 1 ); at the top of the script stack would do the trick.

I'm not necessarily advocating this. It just occurred to me while reading
this so I thought I'd see what y'all think.

--Kris

11 years ago by Stas Malyshev

Hi!

behavior using try/catch blocks as Stas mentioned. However, people who
actually want this sort of behavior so they can route everything through
their own custom handlers (might be particularly useful for framework
developers) could switch it on. Something like ini_set(

Why not just set an error handler which throws an exception?
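
That pattern is commonly written along the following lines, using the built-in set_error_handler() and ErrorException. It is a sketch of the usual idiom rather than anything proposed in this thread, and the file path is made up.

```php
<?php
error_reporting(E_ALL);

set_error_handler(function (int $errno, string $errstr, string $errfile, int $errline): bool {
    if (!(error_reporting() & $errno)) {
        // The message is masked by error_reporting (or by @): leave it alone.
        return false;
    }
    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});

try {
    $fh = fopen('/no/such/dir/foo', 'r');
    $outcome = 'opened';
} catch (ErrorException $e) {
    $outcome = 'failed';
    $severity = $e->getSeverity();   // E_WARNING for a failed fopen()
}

restore_error_handler();
```

The caught exception identifies the problem only by its severity and message text, so telling one warning from another still means inspecting the string.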

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Marco Schuster

Hi!

behavior using try/catch blocks as Stas mentioned. However, people who
actually want this sort of behavior so they can route everything through
their own custom handlers (might be particularly useful for framework
developers) could switch it on. Something like ini_set(

Why just not set error handler which would throw an exception?

You manually have to reconstruct the stack trace from the exception
handler; also, with this approach you lose the ability to fine-control
where you want to react to which exceptions in which way - unless you
do a save/restore every time...

Also, you need to parse the error string produced by PHP (e.g. in one code
block you do an fopen followed by a mysqli_connect, both might throw
you an E_WARNING) in order to find out what's going on.

Marco

11 years ago by Stas Malyshev

Hi!

You manually have to reconstruct the stack trace from the exception

Not sure what you mean here - the exception would have the whole stack
trace, so why would anything need to be reconstructed?

handler; also, with this approach you lose the ability to fine-control
where you want to react on which exceptions in which way - unless you
do a save/restore every time...

Not clear here either - save/restore what?

Also, you need to parse the error string by PHP (e.g. in one code
block you do a fopen followed by a mysqli_connect, both might throw
you a E_WARNING) in order to find out what's going on.

This is true, but I'm not sure how autoconverting errors to exceptions with
ini_set would change that - unless you change every piece of code to use
an individual exception class - which is a huge chunk of work and goes way
beyond an ini setting - you still have no idea what happened without
checking the message, exception or not.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Kris Craig

unless you change every piece of code to use
individual exception class - which is a huge chunk of work and goes way
beyond ini setting

That's just it. I know a lot of PHP developers who believe that's the
ideal way to go about it; that everything should be funneled through a
massive custom exception class(es). Those people would build their code
around that from the ground-up, anyway. As I mentioned earlier, a likely
example of this would be someone developing a new framework and they want
to control all error handling through there.

I'm not sure if I would take that approach on a given project, myself, but
I know a lot of people who would and I can see something like this being
very useful to them for their purposes.

--Kris

11 years ago by Rowan Collins

Julien Pauli wrote (on 11/04/2014):

That's the thing to do for PHP-Next.
PHP-Next will be a gap in rethinking and rewriting some technical
parts of the engine.
Error and exception management is one part that can benefit from a
rewrite.

Well, yes, that was kind of the point of this e-mail thread ;)

We could change things in way that they are not too error
prone, and the change at user level is nearly invisible (e.g : rethink
the '@' internal handling).

That would indeed be an option; however, it feels to me like saying
"suppress all error messages" is only ever a workaround for some other
requirement, so it's interesting to think if there are better solutions
to the underlying problem, rather than/as well as better implementations
of the current feature.

Regards,

Rowan Collins
[IMSoP]

11 years ago by Nils Andre

"Sort-of-nice" to see that the debate is mainly streamlined to the
@-operator part of the message, which from my point of view consists of
much more, with "consistency" as one major thought in the back of the mind
when it was written (as I understand it). But anyway:

The "Fuck-the-Shut-Uperator" (my internal name, sorry for the locker
room language) should be turned around wholly, throwing a
YouTriedToIgnoreThisExceptionException (taking the real Exception as a
child exception) when actually suppressing some error behind it which
should be an exception. Think of Java, guys. If you want to ignore and
proceed (Stas, this is your use case, "It may not be important"),
catch a throwable and throw it in the trash (one could be so kind as to
at least provide something nice as Windows often did, like "Error
occurred, try again, you fool"). Those who want to know the reason
can look at the original exception (see Java) or its properly designed
hierarchy.

Some of us do care about proper error handling and consistency, so thank
you very much for all the good work, Rowan.
Just my two cents :-)

Cheers
Nils

11 years ago by Ferenc Kovacs

On Tue, Apr 8, 2014 at 11:53 PM, Rowan Collins rowan.collins@gmail.com wrote:

Hi All,

One of the things that I'd love to see on the roadmap, or at least the
brainstorm, for PHP.next is some kind of review of error handling. I've
been thinking about this for a while, but hesitated to post until I had a
positive suggestion, not just a list of whinges.

By "error handling", I guess I actually mean "message handling", or
something - everything from E_ERROR down to E_STRICT is currently just a
string of text, with a few ini settings and the ability to register a
single global handler. A lot of the time, this is fine, because these
messages should simply be displayed on the developer's screen, or appended
to the production server's log; but sometimes, it's useful to know that
certain non-fatal things happened - as evidenced by the existence of ad hoc
facilities like libxml_get_errors().

I think the core of PHP could do more to help with this, standardising
things and making them more flexible for the user.

The basic gist I'm suggesting below is that we review and classify all the
existing messages, and promote them from fragile strings into more useful
"message events", with a filtered listener system providing everything we
currently have and more.

Sorry it's got a bit long; consider it a draft RFC ;)

Exceptions for fatals?

I know there was a big discussion about this a while back, and I didn't
read all of it, so I'm not going to go into it here. However, if we were to
review the classification of errors, a proper hierarchy of Exception
classes might be somewhere to put what are currently fatal errors. I
mention this first just to point out that most of what I'm about to discuss
doesn't apply so well to fatal errors, since they can't be handled in the
same ways.

Review of severity

There are currently a lot of errors, warnings, notices, etc in PHP, but
which ones have which severity sometimes feels a bit arbitrary and
inconsistent. I think it would be good to have clear guidelines of what
those severities should mean, and which messages should therefore fall
under which. Severities can also change over time as circumstances alter.
For instance, it bugs me that referencing an undefined class constant is a
fatal error, but referencing an undefined global (or namespace) constant is
only a Notice; code relying on unquoted string literals has been considered
badly written for longer than I've known PHP, so perhaps it's time to
either remove the fallback completely, or at least raise the message to
Warning level.

Classification of messages

The assumption which underpins a lot of what follows is that errors can
and should be classified by type as well as severity. At the moment, the
messages have no identity, they are just strings; this makes handling them
convoluted and fragile - unless you are just logging or displaying
everything that happened, you have to perform a string match, often masking
out variable parts of the message with a regex or prefix-only match.
Ideally, it should be possible to improve the wording for human consumption
without breaking machine handling of that type of event.

My suggestion is that each existing message could be assigned a
"namespace" (the extension name, or section of core), a "type" within that
namespace (analogous to an Exception sub-class), and an ID (like the
numeric code of an exception). The human readable message could then be
tweaked, translated, etc, without appearing to be a completely new message
to any code trying to handle it. Note that applying this to existing code
is mostly trivial as far as assigning a namespace and message ID to each
string; the only hard decision would be assigning "types" to group similar
but non-identical messages in larger extensions.
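
To make the classification idea concrete, here is a rough userland sketch. The Message class, the 'streams' namespace, the 'open_failed' type and the numeric IDs are all invented for illustration; nothing like this exists in PHP itself. The point is only that a handler can match on identity while the wording is free to change:

```php
<?php
// Hypothetical structured message; all names here are assumptions.
final class Message
{
    public $ns;        // extension name or section of core
    public $type;      // group of similar messages within the namespace
    public $id;        // stable numeric ID of this exact message
    public $severity;  // one of the existing E_* constants
    public $text;      // human-readable wording, free to change

    public function __construct(string $ns, string $type, int $id, int $severity, string $text)
    {
        $this->ns = $ns;
        $this->type = $type;
        $this->id = $id;
        $this->severity = $severity;
        $this->text = $text;
    }
}

// Matches on identity only, so no regex over the message text is needed.
function isOpenFailure(Message $m): bool
{
    return $m->ns === 'streams' && $m->type === 'open_failed' && $m->id === 2;
}

// Same message ID with two different wordings, plus an unrelated ID.
$old   = new Message('streams', 'open_failed', 2, E_WARNING, 'fopen(foo): failed to open stream');
$new   = new Message('streams', 'open_failed', 2, E_WARNING, 'Could not open "foo" for reading');
$other = new Message('streams', 'open_failed', 3, E_WARNING, 'fopen(foo): permission denied');
```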

Treating messages as events

Given the above structured representation of messages, it ought to be
possible to replace the current one-at-a-time set_error_handler() with
something more like a set of registered event listeners. Every time a
message was raised, the object representing it would be passed, in turn, to
all interested listener callbacks. It might be a good idea to let listeners
define the order they are called via a relative priority.

The object passed could be mutable, like an Event in JavaScript, so a
listener could, say, lower the severity of a particular message; it could
also have methods to stop other listeners from being called at all. Also,
since this was brought up a lot as an advantage of exceptions, it would
presumably be possible to include the stack trace of each message - perhaps
only collecting it if a listener expressed an interest in such when it was
registered.

Various existing functionality could be implemented in core, but expressed
as "pseudo-listeners" - not callbacks per se, but registerable with the
same system - e.g. "display_plain", "display_html", "write_to_log", etc. A
"collect" pseudo-listener could implement the same kind of behaviour as
libxml_use_internal_errors(true), pushing each message into some kind of
collection object for later access.
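
As a rough illustration of the listener idea, the following userland sketch models prioritised listeners, a mutable event, stopping propagation, and a "collect" pseudo-listener. MessageEvent and MessageDispatcher are hypothetical names, not anything the engine provides, and a real implementation would live in core rather than in PHP code:

```php
<?php
// Hypothetical event object; listeners may change its public properties.
final class MessageEvent
{
    public $severity;
    public $text;
    private $stopped = false;

    public function __construct(int $severity, string $text)
    {
        $this->severity = $severity;
        $this->text = $text;
    }

    public function stopPropagation(): void { $this->stopped = true; }
    public function isStopped(): bool { return $this->stopped; }
}

// Hypothetical dispatcher standing in for the engine's listener list.
final class MessageDispatcher
{
    private $listeners = []; // list of [priority, callable]

    public function listen(callable $listener, int $priority = 0): void
    {
        $this->listeners[] = [$priority, $listener];
        // Higher priority runs first.
        usort($this->listeners, function (array $a, array $b): int {
            return $b[0] <=> $a[0];
        });
    }

    public function raise(MessageEvent $event): void
    {
        foreach ($this->listeners as [$priority, $listener]) {
            $listener($event);
            if ($event->isStopped()) {
                break; // later listeners never see this message
            }
        }
    }
}

$collected = [];
$dispatcher = new MessageDispatcher();

// "collect" pseudo-listener: keep every message that reaches it.
$dispatcher->listen(function (MessageEvent $e) use (&$collected) {
    $collected[] = $e;
}, 0);

// Lower the severity of warnings before they are collected.
$dispatcher->listen(function (MessageEvent $e) {
    if ($e->severity === E_WARNING) {
        $e->severity = E_NOTICE;
    }
}, 10);

// "discard" listener: swallow anything mentioning legacy code.
$dispatcher->listen(function (MessageEvent $e) {
    if (strpos($e->text, 'legacy') !== false) {
        $e->stopPropagation();
    }
}, 20);

$dispatcher->raise(new MessageEvent(E_WARNING, 'something odd happened'));
$dispatcher->raise(new MessageEvent(E_WARNING, 'legacy code complained'));
```

Only the first message reaches the collector, and it arrives already downgraded to a notice.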

Selectivity of handling

The power of the above scheme would come if you could register a "message
listener" not just for a set of severities, but for a particular namespace,
type, or even single message ID. As each listener was registered, the
options selected could be saved as a value+mask pair - if you want
everything in the libxml namespace with severity warning, the mask would be
blank for type and ID; despatching an event would involve calculating the
"fingerprint" of the current message, iterating through the registered list
of handlers, and calling any of them that matched.

The advantage to this is two-fold: first, it means less boilerplate code
in the listener functions, since the input is pre-filtered; and second,
it's much more efficient to not fire a callback from the engine than it is
to fire a callback which performs some boilerplate logic and decides to do
nothing.

This code is called frequently, so needs to be very efficient; however,
some logic of this sort presumably already happens to check the various
user settings and the severity mask provided to set_error_handler(). I
imagine some pre-optimisation could also happen when listeners are
registered and unregistered - special cases for zero listeners, or one
unfiltered listener, for instance.
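
The value+mask matching could look something like the sketch below. The bit layout, the constant names and the numeric namespace ID for libxml are all assumptions made up for the example, and it relies on 64-bit integers. A filter field left as null means "blank in the mask", i.e. match anything:

```php
<?php
// Assumed layout: | namespace (8 bits) | type (8 bits) | id (16 bits) | severity (16 bits) |
const NS_SHIFT   = 40;
const TYPE_SHIFT = 32;
const ID_SHIFT   = 16;
const NS_MASK    = 0xFF << NS_SHIFT;
const TYPE_MASK  = 0xFF << TYPE_SHIFT;
const ID_MASK    = 0xFFFF << ID_SHIFT;
const SEV_MASK   = 0xFFFF;

const NS_LIBXML = 3; // hypothetical numeric ID for the libxml namespace

// Pack the identity of a raised message into a single integer.
function fingerprint(int $ns, int $type, int $id, int $severity): int
{
    return ($ns << NS_SHIFT) | ($type << TYPE_SHIFT) | ($id << ID_SHIFT) | $severity;
}

// Build a [value, mask] pair; null fields are left out of the mask.
function makeFilter(?int $ns, ?int $type, ?int $id, ?int $severity): array
{
    $value = 0;
    $mask = 0;
    if ($ns !== null)       { $value |= $ns << NS_SHIFT;     $mask |= NS_MASK; }
    if ($type !== null)     { $value |= $type << TYPE_SHIFT; $mask |= TYPE_MASK; }
    if ($id !== null)       { $value |= $id << ID_SHIFT;     $mask |= ID_MASK; }
    if ($severity !== null) { $value |= $severity;           $mask |= SEV_MASK; }
    return [$value, $mask];
}

// One AND and one comparison per registered listener.
function matches(array $filter, int $fingerprint): bool
{
    [$value, $mask] = $filter;
    return ($fingerprint & $mask) === $value;
}

// "Everything in the libxml namespace with severity warning".
$filter = makeFilter(NS_LIBXML, null, null, E_WARNING);

$hit          = matches($filter, fingerprint(NS_LIBXML, 7, 1234, E_WARNING));
$wrongLevel   = matches($filter, fingerprint(NS_LIBXML, 7, 1234, E_NOTICE));
$wrongSection = matches($filter, fingerprint(4, 7, 1234, E_WARNING));
```

This exact-match form cannot express a set of severities such as E_WARNING | E_NOTICE in one filter; that would need either several filters or a separate severity check.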

Sidenote: selectivity of catch() blocks?

While thinking about the above, it occurred to me it would be nice to have
a syntax for catching exceptions by their code as well as their class.
Basically, a sugar for this boilerplate:

catch ( FooException $e ) {
if ( $e->getCode() != FooException::EX_NO_FOOS ) {
throw $e;
}
/* handle lack of foos ... */
}

Any thoughts?

Lexical scope

Most error handling is dynamically scoped - "from now until I tell you
otherwise, treat these messages like this" - but occasionally it would be
nice to have it lexically scoped, as in "for any message raised directly in
this file, or set of lines, do this". The use case I have in mind is legacy
code, such as an old PEAR module which you plan to replace wholesale, but
are unlikely to patch - I want to be able to set a flag on include saying
"this file is poorly written third party code, please don't display
warnings about it".

Since messages already pass through the file path and line number they
occurred on, this could in principle be implemented as part of the
pre-filtering discussed above. This would make the "fingerprint" to be
matched a lot longer, or the check more complex; perhaps a hash of the file
path would be more efficient; also, if no registered listener had such a
filter in its mask, all filename-related logic could be optimised away.

Alternatively, it could be stored in a completely separate list, and
swapped into the main list when code in the relevant file was executing.

An interesting thought on lexical scope is that with the right syntax, a
listener could be registered at compile-time, rather than run-time: rather
than registering a listener when line 50 is executed, and unregistering it
when line 55 is executed, a block spanning lines 50-55 could register a
permanent listener, masked for all except those lines.

The infamous "shut up" operator (@)

No discussion of error-handling would be complete without mentioning this
little oddity, although I admit to a slight ignorance of exactly how it
works, and why it causes the compiler to skip optimisations. (e.g.
https://gist.github.com/nikic/6699370)

One thought I had was that you could have a special syntax that could
register a high-priority "discard" pseudo-listener for a few lines of
lexical scope (hopefully that makes sense if you've read this far);
something like this:

suppress_messages {
$fh = fopen('foo');
}

That wouldn't be quite the same as the current @ operator - if you
replaced fopen() with a user-defined wrapper, you'd need dynamic scope
again - but it would replace some use cases. I'm not sure if it would
actually make sense or not, but it gives you an idea of where my mind is
going with the whole "listener"/"pseudo-listener" concept.

So, thank you for reading this far (assuming you actually did). Thoughts?
Feedback? Brickbats?

Regards,

--
Rowan Collins
[IMSoP]

--

To be able to come up with a better alternative, we should also consider
what is and isn't possible with the current approach.

- currently it is possible to trigger multiple errors from a single
  operation/method call
- currently it is possible to trigger one or more errors in an operation
  while still being able to have a return value
- currently it is possible to define a single error handler which will be
  called for some errors (based on the $error_types that was set)
- currently, based on the return value of the custom error handler, the
  default error handler may or may not be called
- currently, based on the return value of the custom error handler, the
  current execution may or may not be halted (recoverable fatal)
- currently some errors won't trigger the custom error handler (fatals)
  but still execute the shutdown handler (and the output buffer callback),
  so people use that for handling fatal errors a bit more gracefully
- currently some errors (E_COMPILE_*) can cause weird side effects in the
  custom error handler ( https://bugs.php.net/bug.php?id=60724
  https://bugs.php.net/bug.php?id=65322 )
- currently the @ operator will disable and restore the error_reporting
  level for the given operation, while also causing some weird side
  effects ( https://gist.github.com/nikic/6699370 )
- currently we still do a bunch of work for errors which in the end will
  be discarded anyway (I guess it could be improved if we consider
  removing stuff like error_get_last() and track_errors)

I hope I haven't missed anything important.
The current error handling infrastructure favors the procedural paradigm
over the object-oriented one, but it feels less alien to OOP than using
exceptions for error handling would feel in a procedural project.
Using exceptions instead of the current system would be a rough change,
because:

- PHP is a dynamic language, so it is moderately hard to write your code
  in a way that you can be sure that you caught every possible type of
  exception the underlying operation could throw.
  - This would promote pokemon exception handling (
    http://www.dodgycoder.net/2011/11/yoda-conditions-pokemon-exception.html)
    which is the same as using @ everywhere but even slower.
- Exceptions can't be used for notices or warnings, as the execution
  stops when the throw occurs.
- When you do want to ignore the error, the suppression would look much
  more clunky.

But it would also have some pros: it would allow local error resolution
(compared to the current way, where you have to handle the error in a
separate global function which can't see anything of the error but the
type, the error code, the error message and where the error comes from).

I have two ideas which, if they could be implemented, would satisfy most of
the current complaints:

- Provide a way to resolve the errors in the local scope: it would be nice
  if we could wrap operations in a block (similar to try) and, when an
  error occurs, the "catch" block would be executed there.
- Extend the current error handler mechanism so it can have a stack, like
  the spl_autoload infrastructure does; this would allow stuff like
  separate libraries using their own error logging without interfering
  with each other. If this were implemented, the feature in the previous
  point would simply be putting that "catch" block on top of the error
  handling stack for the execution of that block.
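
Both ideas can be roughly emulated in userland on top of set_error_handler(). The sketch below is only an approximation under stated assumptions: ErrorHandlerStack and its methods are hypothetical names, and it takes over the single global handler slot, which is exactly the limitation a core implementation would remove:

```php
<?php
// Hypothetical userland emulation; ErrorHandlerStack is not a PHP API.
final class ErrorHandlerStack
{
    private static $handlers = [];
    private static $installed = false;

    public static function push(callable $handler): void
    {
        if (!self::$installed) {
            // Occupy the one global slot with our dispatcher.
            set_error_handler([self::class, 'dispatch']);
            self::$installed = true;
        }
        self::$handlers[] = $handler;
    }

    public static function pop(): void
    {
        array_pop(self::$handlers);
    }

    public static function dispatch(int $errno, string $errstr, string $errfile = '', int $errline = 0): bool
    {
        // Newest handler first; a handler returns true once it has dealt with the error.
        for ($i = count(self::$handlers) - 1; $i >= 0; $i--) {
            if ((self::$handlers[$i])($errno, $errstr, $errfile, $errline) === true) {
                return true;
            }
        }
        return false; // nobody handled it: fall through to PHP's default handling
    }

    // The "catch block for errors": the handler is active only while $block runs.
    public static function scoped(callable $handler, callable $block)
    {
        self::push($handler);
        try {
            return $block();
        } finally {
            self::pop();
        }
    }
}

$log = [];
$result = ErrorHandlerStack::scoped(
    function (int $errno, string $errstr) use (&$log): bool {
        $log[] = $errstr; // local resolution: we can see $log and other local state
        return true;
    },
    function () {
        trigger_error('could not reach cache', E_USER_WARNING);
        return 'fallback value'; // execution continues after the warning
    }
);
```

Unlike an exception, the warning does not abort the block, so the operation can still return a value.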

What do you think?

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu