Newsgroups: php.internals
Message-ID: <53446FC5.7000001@gmail.com>
Date: Tue, 08 Apr 2014 22:53:09 +0100
To: PHP Internals <internals@lists.php.net>
Subject: [PHP.next] Error-handling using "Error Events"
From: rowan.collins@gmail.com (Rowan Collins)

Hi All,

One of the things that I'd love to see on the roadmap, or at least the 
brainstorm, for PHP.next is some kind of review of error handling. I've 
been thinking about this for a while, but hesitated to post until I had 
a positive suggestion, not just a list of whinges.

By "error handling", I guess I actually mean "message handling", or 
something - everything from E_ERROR down to E_STRICT is currently just a 
string of text, with a few ini settings and the ability to register a 
single global handler. A lot of the time, this is fine, because these 
messages should simply be displayed on the developer's screen, or 
appended to the production server's log; but sometimes, it's useful to 
know that certain non-fatal things happened - as evidenced by the 
existence of ad hoc facilities like libxml_get_errors().

I think the core of PHP could do more to help with this, standardising 
things and making them more flexible for the user.

The basic gist I'm suggesting below is that we review and classify all 
the existing messages, and promote them from fragile strings into more 
useful "message events", with a filtered listener system providing 
everything we currently have and more.

Sorry it's got a bit long; consider it a draft RFC ;)


Exceptions for fatals?
----------------------

I know there was a big discussion about this a while back, and I didn't 
read all of it, so I'm not going to go into it here. However, if we were 
to review the classification of errors, a proper hierarchy of Exception 
classes might be somewhere to put what are currently fatal errors. I 
mention this first just to point out that most of what I'm about to 
discuss doesn't apply so well to fatal errors, since they can't be 
handled in the same ways.


Review of severity
------------------

There are currently a lot of errors, warnings, notices, etc in PHP, but 
which ones have which severity sometimes feels a bit arbitrary and 
inconsistent. I think it would be good to have clear guidelines of what 
those severities should mean, and which messages should therefore fall 
under which. Severities can also change over time as circumstances 
alter. For instance, it bugs me that referencing an undefined class 
constant is a fatal error, but referencing an undefined global (or 
namespace) constant is only a Notice, with the bare name then treated 
as a string. Code relying on unquoted string literals has been 
considered badly written for longer than I've known PHP, so perhaps 
it's time to either remove the fallback completely, or at least raise 
the message to Warning level.


Classification of messages
--------------------------

The assumption which underpins a lot of what follows is that errors can 
and should be classified by type as well as severity. At the moment, the 
messages have no identity; they are just strings. This makes handling 
them convoluted and fragile - unless you are just logging or displaying 
everything that happened, you have to perform a string match, often 
masking out variable parts of the message with a regex or prefix-only 
match. Ideally, it should be possible to improve the wording for human 
consumption without breaking machine handling of that type of event.
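
To make the fragility concrete, here is a minimal sketch of what 
handling one specific message looks like with the existing 
set_error_handler(). The variable names and the regex are only 
illustrative; the point is that the wording of the message is the only 
key available to match on.

```php
<?php
// Collect the names of undefined variables by matching the message text.
$undefined = [];

set_error_handler(function ($severity, $message, $file, $line) use (&$undefined) {
    // There is no stable identifier, so the wording itself is the key.
    // PHP 7 says "Undefined variable: foo" (Notice); PHP 8 says
    // "Undefined variable $foo" (Warning). A pattern written for only
    // one wording silently stops matching on the other.
    if (preg_match('/^Undefined variable:? \$?(\w+)/', $message, $m)) {
        $undefined[] = $m[1];
        return true;  // handled: skip PHP's internal handler
    }
    return false;     // anything else falls through to the internal handler
});

function demo()
{
    return $missing;  // deliberately undefined
}

demo();
restore_error_handler();

// $undefined is now ['missing']
```

The pattern has to tolerate two wordings and two severities for what is 
conceptually a single event, which is exactly the kind of coupling a 
stable message identity would remove.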

My suggestion is that each existing message could be assigned a 
"namespace" (the extension name, or section of core), a "type" within 
that namespace (analogous to an Exception sub-class), and an ID (like 
the numeric code of an exception). The human readable message could then 
be tweaked, translated, etc, without appearing to be a completely new 
message to any code trying to handle it. Note that applying this to 
existing code is mostly trivial as far as assigning a namespace and 
message ID to each string; the only hard decision would be assigning 
"types" to group similar but non-identical messages in larger extensions.
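
As a rough sketch of the structure described above, a message could be 
represented in userland something like this. Everything here is an 
assumption made for illustration: the class name EngineMessage, its 
properties, the is() method, and the 'parse' type and 1001 ID are all 
invented, not an existing or proposed API.

```php
<?php
// Hypothetical value object: none of these names exist in PHP itself.
final class EngineMessage
{
    public int $severity;      // e.g. E_WARNING
    public string $namespace;  // extension name, or section of core
    public string $type;       // group of related messages
    public int $id;            // stable within the namespace
    public string $text;       // human-readable wording, free to change

    public function __construct(
        int $severity,
        string $namespace,
        string $type,
        int $id,
        string $text
    ) {
        $this->severity = $severity;
        $this->namespace = $namespace;
        $this->type = $type;
        $this->id = $id;
        $this->text = $text;
    }

    // Identity ignores $text, so rewording or translating is safe.
    public function is(string $namespace, ?string $type = null, ?int $id = null): bool
    {
        return $this->namespace === $namespace
            && ($type === null || $this->type === $type)
            && ($id === null || $this->id === $id);
    }
}

// The namespace, type and ID values below are invented for the example.
$old = new EngineMessage(E_WARNING, 'libxml', 'parse', 1001, 'Tag mismatch');
$new = new EngineMessage(E_WARNING, 'libxml', 'parse', 1001,
    'Opening and closing tags do not match');
```

Both objects answer true to is('libxml', 'parse', 1001) even though 
their text differs, so code written against the first wording keeps 
working after the message is improved.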


Treating messages as events
---------------------------

Given the above structured representation of messages, it ought to be 
possible to replace the current one-at-a-time set_error_handler() with 
something more like a set of registered event listeners. Every time a 
message was raised, the object representing it would be passed, in turn, 
to all interested listener callbacks. It might be a good idea to let 
listeners define the order they are called via a relative priority.

The object passed could be mutable, like an Event in JavaScript, so a 
listener could, say, lower the severity of a particular message; it 
could also have methods to stop other listeners from being called at 
all. Also, since this was brought up a lot as an advantage of 
exceptions, it would presumably be possible to include the stack trace 
of each message - perhaps only collecting it if a listener expressed an 
interest in such when it was registered.
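
A userland sketch of the listener idea might look like the following. 
The names MessageEvent, MessageDispatcher, listen(), raise() and 
stopPropagation() are all hypothetical, and a real implementation would 
live in the engine rather than in PHP code; this only shows the 
priority ordering, the mutable event, and the ability to stop further 
listeners.

```php
<?php
// All class and method names here are hypothetical.
final class MessageEvent
{
    public int $severity;
    public string $text;
    private bool $stopped = false;

    public function __construct(int $severity, string $text)
    {
        $this->severity = $severity;
        $this->text = $text;
    }

    public function stopPropagation(): void
    {
        $this->stopped = true;
    }

    public function isStopped(): bool
    {
        return $this->stopped;
    }
}

final class MessageDispatcher
{
    /** @var array<int, callable[]> listeners grouped by priority */
    private array $listeners = [];

    public function listen(callable $listener, int $priority = 0): void
    {
        $this->listeners[$priority][] = $listener;
        krsort($this->listeners);  // highest priority first
    }

    public function raise(MessageEvent $event): void
    {
        foreach ($this->listeners as $group) {
            foreach ($group as $listener) {
                $listener($event);
                if ($event->isStopped()) {
                    return;  // a listener asked for no further delivery
                }
            }
        }
    }
}

$log = [];
$dispatcher = new MessageDispatcher();

// Low priority: record whatever severity the event has by then.
$dispatcher->listen(function (MessageEvent $e) use (&$log) {
    $log[] = $e->severity . ': ' . $e->text;
}, 0);

// High priority: runs first and downgrades warnings to notices.
$dispatcher->listen(function (MessageEvent $e) {
    if ($e->severity === E_WARNING) {
        $e->severity = E_NOTICE;
    }
}, 10);

$dispatcher->raise(new MessageEvent(E_WARNING, 'something odd happened'));
```

Because the higher-priority listener mutates the event before the 
logger sees it, the logged entry carries E_NOTICE rather than E_WARNING.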

Various existing functionality could be implemented in core, but 
expressed as "pseudo-listeners" - not callbacks per se, but registerable 
with the same system - e.g. "display_plain", "display_html", 
"write_to_log", etc. A "collect" pseudo-listener could implement the 
same kind of behaviour as libxml_use_internal_errors(true), pushing each 
message into some kind of collection object for later access.
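
The "collect" behaviour can be approximated today with the existing 
handler functions, which gives an idea of what such a pseudo-listener 
would do. The helper name collect_messages() is an invention for this 
sketch; set_error_handler(), restore_error_handler() and 
trigger_error() are the real functions.

```php
<?php
// Hypothetical helper: runs $fn and returns every message it raised.
function collect_messages(callable $fn): array
{
    $collected = [];
    set_error_handler(function ($severity, $text, $file, $line) use (&$collected) {
        $collected[] = compact('severity', 'text', 'file', 'line');
        return true;  // keep the message away from display and log
    });
    try {
        $fn();
    } finally {
        restore_error_handler();  // always put the previous handler back
    }
    return $collected;
}

$messages = collect_messages(function () {
    trigger_error('first problem', E_USER_WARNING);
    trigger_error('second problem', E_USER_NOTICE);
});
```

Each entry in $messages is an array with severity, text, file and line 
keys, in the order the messages were raised.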


Selectivity of handling
-----------------------

The power of the above scheme would come if you could register a 
"message listener" not just for a set of severities, but for a 
particular namespace, type, or even single message ID. As each listener 
was registered, the options selected could be saved as a value+mask pair 
- if you want everything in the libxml namespace with severity warning, 
the mask would be blank for type and ID; despatching an event would 
involve calculating the "fingerprint" of the current message, iterating 
through the registered list of handlers, and calling any of them that 
matched.
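
The value+mask matching could be sketched as below. The bit layout, the 
constant names and the numbering of severities and namespaces are all 
assumptions invented for the example; the engine would presumably do 
this in C with whatever layout suited it.

```php
<?php
// Hypothetical layout: [ severity:8 | namespace:8 | type:8 | id:8 ]
const SEVERITY_SHIFT = 24;
const NAMESPACE_SHIFT = 16;
const TYPE_SHIFT = 8;
const FIELD = 0xFF;  // all bits of one field set

// Invented numbering, purely for the example.
const SEV_NOTICE = 1;
const SEV_WARNING = 2;
const NS_LIBXML = 2;

function fingerprint(int $severity, int $namespace, int $type, int $id): int
{
    return ($severity << SEVERITY_SHIFT)
        | ($namespace << NAMESPACE_SHIFT)
        | ($type << TYPE_SHIFT)
        | $id;
}

$listeners = [
    [
        // "Everything in libxml with severity warning":
        // the mask is blank (zero) for type and ID.
        'name'  => 'libxml warnings',
        'mask'  => fingerprint(FIELD, FIELD, 0, 0),
        'value' => fingerprint(SEV_WARNING, NS_LIBXML, 0, 0),
    ],
    [
        // A fully blank mask matches every message.
        'name'  => 'everything',
        'mask'  => 0,
        'value' => 0,
    ],
];

// Returns the names of the listeners that would be called.
function matching(array $listeners, int $fingerprint): array
{
    $names = [];
    foreach ($listeners as $listener) {
        if (($fingerprint & $listener['mask']) === $listener['value']) {
            $names[] = $listener['name'];
        }
    }
    return $names;
}

$forWarning = matching($listeners, fingerprint(SEV_WARNING, NS_LIBXML, 3, 42));
$forNotice  = matching($listeners, fingerprint(SEV_NOTICE, NS_LIBXML, 3, 42));

// $forWarning is ['libxml warnings', 'everything']
// $forNotice  is ['everything']
```

The check per listener is one AND and one comparison, with no callback 
invoked unless it matches.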

The advantage to this is two-fold: first, it means less boilerplate code 
in the listener functions, since the input is pre-filtered; and second, 
it's much more efficient to not fire a callback from the engine than it 
is to fire a callback which performs some boilerplate logic and decides 
to do nothing.

This dispatch logic would run for every message raised, so it needs to 
be very efficient; however, 
some logic of this sort presumably already happens to check the various 
user settings and the severity mask provided to set_error_handler(). I 
imagine some pre-optimisation could also happen when listeners are 
registered and unregistered - special cases for zero listeners, or one 
unfiltered listener, for instance.


Sidenote: selectivity of catch() blocks?
----------------------------------------

While thinking about the above, it occurred to me it would be nice to 
have a syntax for catching exceptions by their code as well as their 
class. Basically, sugar for this boilerplate:

try {
     /* code that may throw FooException ... */
}
catch ( FooException $e ) {
     if ( $e->getCode() != FooException::EX_NO_FOOS ) {
         throw $e;
     }
     /* handle lack of foos ... */
}

Any thoughts?


Lexical scope
-------------

Most error handling is dynamically scoped - "from now until I tell you 
otherwise, treat these messages like this" - but occasionally it would 
be nice to have it lexically scoped, as in "for any message raised 
directly in this file, or set of lines, do this". The use case I have in 
mind is legacy code, such as an old PEAR module which you plan to 
replace wholesale, but are unlikely to patch - I want to be able to set 
a flag on include saying "this file is poorly written third party code, 
please don't display warnings about it".

Since messages already carry the file path and line number they 
occurred on, this could in principle be implemented as part of the 
pre-filtering discussed above. This would make the "fingerprint" to be 
matched a lot longer, or the check more complex; perhaps a hash of the 
file path would be more efficient. Also, if no registered listener had 
such a filter in its mask, all filename-related logic could be optimised 
away.
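
Filtering by file can already be approximated in userland, at the cost 
of running a callback for every message. In this sketch the helper name 
filter_by_file() and the temporary "legacy" file are inventions for the 
example; the handler signature is the real one used by 
set_error_handler().

```php
<?php
// Hypothetical helper: wraps $next so that messages raised directly
// in one of $ignoredPaths are discarded before $next sees them.
function filter_by_file(array $ignoredPaths, callable $next): callable
{
    // Flip into a set so each lookup is a hash probe, not a scan.
    $ignored = array_flip($ignoredPaths);

    return function ($severity, $text, $file, $line) use ($ignored, $next) {
        if (isset($ignored[$file])) {
            return true;  // raised in an ignored file: discard
        }
        return $next($severity, $text, $file, $line);
    };
}

// Stand-in for a poorly written third-party file.
$legacy = realpath(tempnam(sys_get_temp_dir(), 'legacy'));
file_put_contents(
    $legacy,
    '<?php trigger_error("sloppy legacy code", E_USER_NOTICE);'
);

$seen = [];
set_error_handler(filter_by_file([$legacy], function ($severity, $text) use (&$seen) {
    $seen[] = $text;
    return true;
}));

include $legacy;                                      // filtered out
trigger_error('raised in this file', E_USER_NOTICE);  // passes through

restore_error_handler();
unlink($legacy);

// $seen is now ['raised in this file']
```

Note that the match is on the file where the message was raised, so a 
message triggered inside a function that the legacy file merely calls 
would still get through.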

Alternatively, it could be stored in a completely separate list, and 
swapped into the main list when code in the relevant file was executing.

An interesting thought on lexical scope is that with the right syntax, a 
listener could be registered at compile-time, rather than run-time: 
rather than registering a listener when line 50 is executed, and 
unregistering it when line 55 is executed, a block spanning lines 50-55 
could register a permanent listener, masked for all except those lines.


The infamous "shut up" operator (@)
-----------------------------------

No discussion of error-handling would be complete without mentioning 
this little oddity, although I admit to a slight ignorance of exactly 
how it works, and why it causes the compiler to skip optimisations. 
(e.g. https://gist.github.com/nikic/6699370)

One thought I had was that you could have a special syntax that could 
register a high-priority "discard" pseudo-listener for a few lines of 
lexical scope (hopefully that makes sense if you've read this far); 
something like this:

suppress_messages {
     $fh = fopen('foo', 'r');
}

That wouldn't be quite the same as the current @ operator - if you 
replaced fopen() with a user-defined wrapper, you'd need dynamic scope 
again - but it would replace some use cases. I'm not sure if it would 
actually make sense or not, but it gives you an idea of where my mind is 
going with the whole "listener"/"pseudo-listener" concept.
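
For comparison, the closest userland equivalent available today is 
dynamically scoped, much like @ itself. The function below borrows the 
name suppress_messages purely for illustration; it is an ordinary 
function, not the block syntax imagined above, and it silences 
everything raised while the callback runs, including messages from any 
functions it calls.

```php
<?php
// Hypothetical helper: suppress_messages is not a PHP function or keyword.
function suppress_messages(callable $block)
{
    // "Discard" listener: report every message as handled.
    set_error_handler(function () {
        return true;
    });
    try {
        return $block();
    } finally {
        restore_error_handler();  // put the previous handler back
    }
}

// The path is deliberately one that should not exist.
$fh = suppress_messages(function () {
    return fopen('/no/such/dir/foo', 'r');
});

// $fh is false, and no warning was displayed or logged
```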


So, thank you for reading this far (assuming you actually did). 
Thoughts? Feedback? Brickbats?

Regards,

-- 
Rowan Collins
[IMSoP]