Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:73686
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.216.47 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <53446FC5.7000001@gmail.com>
References: <53446FC5.7000001@gmail.com>
Date: Mon, 14 Apr 2014 13:55:20 +0200
Message-ID: <CAH-PCH7i9cEAv7GxysdY26+TMDbaR=Lm9qvRhr86eX-fFbySqw@mail.gmail.com>
To: Rowan Collins <rowan.collins@gmail.com>
Cc: PHP Internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=047d7bdca39c90909c04f6ff5b6d
Subject: Re: [PHP-DEV] [PHP.next] Error-handling using "Error Events"
From: tyra3l@gmail.com (Ferenc Kovacs)

--047d7bdca39c90909c04f6ff5b6d
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Tue, Apr 8, 2014 at 11:53 PM, Rowan Collins <rowan.collins@gmail.com> wrote:

> Hi All,
>
> One of the things that I'd love to see on the roadmap, or at least the
> brainstorm, for PHP.next is some kind of review of error handling. I've
> been thinking about this for a while, but hesitated to post until I had a
> positive suggestion, not just a list of whinges.
>
> By "error handling", I guess I actually mean "message handling", or
> something - everything from E_ERROR down to E_STRICT is currently just a
> string of text, with a few ini settings and the ability to register a
> single global handler. A lot of the time, this is fine, because these
> messages should simply be displayed on the developer's screen, or appended
> to the production server's log; but sometimes, it's useful to know that
> certain non-fatal things happened - as evidenced by the existence of ad hoc
> facilities like libxml_get_errors().
>
> I think the core of PHP could do more to help with this, standardising
> things and making them more flexible for the user.
>
> The basic gist I'm suggesting below is that we review and classify all the
> existing messages, and promote them from fragile strings into more useful
> "message events", with a filtered listener system providing everything we
> currently have and more.
>
> Sorry it's got a bit long; consider it a draft RFC ;)
>
>
> Exceptions for fatals?
> ----------------------
>
> I know there was a big discussion about this a while back, and I didn't
> read all of it, so I'm not going to go into it here. However, if we were to
> review the classification of errors, a proper hierarchy of Exception
> classes might be somewhere to put what are currently fatal errors. I
> mention this first just to point out that most of what I'm about to discuss
> doesn't apply so well to fatal errors, since they can't be handled in the
> same ways.
>
>
> Review of severity
> ------------------
>
> There are currently a lot of errors, warnings, notices, etc in PHP, but
> which ones have which severity sometimes feels a bit arbitrary and
> inconsistent. I think it would be good to have clear guidelines of what
> those severities should mean, and which messages should therefore fall
> under which. Severities can also change over time as circumstances alter.
> For instance, it bugs me that referencing an undefined class constant is a
> fatal error, but referencing an undefined global (or namespace) constant is
> only a Notice; code relying on unquoted string literals has been considered
> badly written for longer than I've known PHP, so perhaps it's time to
> either remove the fallback completely, or at least raise the message to
> Warning level.
>
>
> Classification of messages
> --------------------------
>
> The assumption which underpins a lot of what follows is that errors can
> and should be classified by type as well as severity. At the moment, the
> messages have no identity, they are just strings; this makes handling them
> convoluted and fragile - unless you are just logging or displaying
> everything that happened, you have to perform a string match, often masking
> out variable parts of the message with a regex or prefix-only match.
> Ideally, it should be possible to improve the wording for human consumption
> without breaking machine handling of that type of event.
>
> My suggestion is that each existing message could be assigned a
> "namespace" (the extension name, or section of core), a "type" within that
> namespace (analogous to an Exception sub-class), and an ID (like the
> numeric code of an exception). The human readable message could then be
> tweaked, translated, etc, without appearing to be a completely new message
> to any code trying to handle it. Note that applying this to existing code
> is mostly trivial as far as assigning a namespace and message ID to each
> string; the only hard decision would be assigning "types" to group similar
> but non-identical messages in larger extensions.
>
>
> Treating messages as events
> ---------------------------
>
> Given the above structured representation of messages, it ought to be
> possible to replace the current one-at-a-time set_error_handler() with
> something more like a set of registered event listeners. Every time a
> message was raised, the object representing it would be passed, in turn, to
> all interested listener callbacks. It might be a good idea to let listeners
> define the order they are called via a relative priority.
>
> The object passed could be mutable, like an Event in JavaScript, so a
> listener could, say, lower the severity of a particular message; it could
> also have methods to stop other listeners from being called at all. Also,
> since this was brought up a lot as an advantage of exceptions, it would
> presumably be possible to include the stack trace of each message - perhaps
> only collecting it if a listener expressed an interest in such when it was
> registered.
>
> Various existing functionality could be implemented in core, but expressed
> as "pseudo-listeners" - not callbacks per se, but registerable with the
> same system - e.g. "display_plain", "display_html", "write_to_log", etc. A
> "collect" pseudo-listener could implement the same kind of behaviour as
> libxml_use_internal_errors(true), pushing each message into some kind of
> collection object for later access.
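
To make the listener idea concrete, here is a rough userland sketch of
prioritised listeners receiving a mutable message object. Everything in it
is an assumption for illustration: MessageEvent and MessageDispatcher are
made-up names, not existing or proposed PHP APIs, and it assumes PHP 8.0+
for constructor promotion and a stable usort().

```php
<?php
// Hypothetical sketch only: neither class exists in PHP.
final class MessageEvent
{
    public bool $stopped = false;

    public function __construct(
        public int $severity,
        public string $namespace,
        public string $text
    ) {}

    // Lets a listener prevent lower-priority listeners from running.
    public function stopPropagation(): void
    {
        $this->stopped = true;
    }
}

final class MessageDispatcher
{
    /** @var array<int, array{0: int, 1: callable}> */
    private array $listeners = [];

    public function listen(callable $listener, int $priority = 0): void
    {
        $this->listeners[] = [$priority, $listener];
        // Highest priority first; equal priorities keep registration order.
        usort($this->listeners, fn(array $a, array $b): int => $b[0] <=> $a[0]);
    }

    public function raise(MessageEvent $event): MessageEvent
    {
        foreach ($this->listeners as [, $listener]) {
            $listener($event);
            if ($event->stopped) {
                break;
            }
        }
        return $event;
    }
}

$dispatcher = new MessageDispatcher();
$log = [];

// Low priority: a "write_to_log"-style listener.
$dispatcher->listen(function (MessageEvent $e) use (&$log): void {
    $log[] = "log:{$e->severity}";
}, 0);

// High priority: downgrade anything coming from a "legacy" namespace.
$dispatcher->listen(function (MessageEvent $e): void {
    if ($e->namespace === 'legacy') {
        $e->severity = E_NOTICE;
    }
}, 10);

$event = $dispatcher->raise(new MessageEvent(E_WARNING, 'legacy', 'something odd'));
```

Because the high-priority listener runs first, the logging listener only
ever sees the already-downgraded severity.
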
>
>
> Selectivity of handling
> -----------------------
>
> The power of the above scheme would come if you could register a "message
> listener" not just for a set of severities, but for a particular namespace,
> type, or even single message ID. As each listener was registered, the
> options selected could be saved as a value+mask pair - if you want
> everything in the libxml namespace with severity warning, the mask would be
> blank for type and ID; despatching an event would involve calculating the
> "fingerprint" of the current message, iterating through the registered list
> of handlers, and calling any of them that matched.
>
> The advantage to this is two-fold: first, it means less boilerplate code
> in the listener functions, since the input is pre-filtered; and second,
> it's much more efficient to not fire a callback from the engine than it is
> to fire a callback which performs some boilerplate logic and decides to do
> nothing.
>
> This code is called frequently, so needs to be very efficient; however,
> some logic of this sort presumably already happens to check the various
> user settings and the severity mask provided to set_error_handler(). I
> imagine some pre-optimisation could also happen when listeners are
> registered and unregistered - special cases for zero listeners, or one
> unfiltered listener, for instance.
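
The value+mask matching described above could look roughly like this. It
is only a sketch under my own assumptions: the bit layout (8 bits of
namespace, 8 bits of type, 16 bits of message ID), the NS_LIBXML number and
all function names are invented for illustration, and it assumes 64-bit
integers.

```php
<?php
// Hypothetical fingerprint layout: [namespace:8][type:8][message id:16].
const NS_SHIFT   = 24;
const TYPE_SHIFT = 16;
const NS_LIBXML  = 3; // made-up namespace number

function fingerprint(int $ns, int $type, int $id): int
{
    return ($ns << NS_SHIFT) | ($type << TYPE_SHIFT) | $id;
}

// Build a value+mask pair; a null field leaves its mask bits blank,
// meaning "match anything" for that field.
function make_filter(?int $ns = null, ?int $type = null, ?int $id = null): array
{
    $value = 0;
    $mask  = 0;
    if ($ns !== null) {
        $value |= $ns << NS_SHIFT;
        $mask  |= 0xFF << NS_SHIFT;
    }
    if ($type !== null) {
        $value |= $type << TYPE_SHIFT;
        $mask  |= 0xFF << TYPE_SHIFT;
    }
    if ($id !== null) {
        $value |= $id;
        $mask  |= 0xFFFF;
    }
    return ['value' => $value, 'mask' => $mask];
}

// A listener fires only if the severity bit is selected and the masked
// fingerprint equals the stored value.
function matches(array $filter, int $severities, int $fingerprint, int $severity): bool
{
    return ($severity & $severities) !== 0
        && ($fingerprint & $filter['mask']) === $filter['value'];
}

// "Everything in the libxml namespace with severity warning":
$libxml = make_filter(NS_LIBXML);
$hit    = matches($libxml, E_WARNING, fingerprint(NS_LIBXML, 2, 77), E_WARNING);
$miss   = matches($libxml, E_WARNING, fingerprint(4, 2, 77), E_WARNING);
```

The per-listener check is two bitwise ANDs and two comparisons, which is
the kind of cost that seems plausible for a hot path.
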
>
>
> Sidenote: selectivity of catch() blocks?
> ----------------------------------------
>
> While thinking about the above, it occurred to me it would be nice to have
> a syntax for catching exceptions by their code as well as their class.
> Basically, a sugar for this boilerplate:
>
> catch ( FooException $e ) {
>     if ( $e->getCode() != FooException::EX_NO_FOOS ) {
>         throw $e;
>     }
>     /* handle lack of foos ... */
> }
>
> Any thoughts?
>
>
> Lexical scope
> -------------
>
> Most error handling is dynamically scoped - "from now until I tell you
> otherwise, treat these messages like this" - but occasionally it would be
> nice to have it lexically scoped, as in "for any message raised directly in
> this file, or set of lines, do this". The use case I have in mind is legacy
> code, such as an old PEAR module which you plan to replace wholesale, but
> are unlikely to patch - I want to be able to set a flag on include saying
> "this file is poorly written third party code, please don't display
> warnings about it".
>
> Since messages already pass through the file path and line number they
> occurred on, this could in principle be implemented as part of the
> pre-filtering discussed above. This would make the "fingerprint" to be
> matched a lot longer, or the check more complex; perhaps a hash of the file
> path would be more efficient; also, if no registered listener had such a
> filter in its mask, all filename-related logic could be optimised away.
>
> Alternatively, it could be stored in a completely separate list, and
> swapped into the main list when code in the relevant file was executing.
>
> An interesting thought on lexical scope is that with the right syntax, a
> listener could be registered at compile-time, rather than run-time: rather
> than registering a listener when line 50 is executed, and unregistering it
> when line 55 is executed, a block spanning lines 50-55 could register a
> permanent listener, masked for all except those lines.
>
>
> The infamous "shut up" operator (@)
> -----------------------------------
>
> No discussion of error-handling would be complete without mentioning this
> little oddity, although I admit to a slight ignorance of exactly how it
> works, and why it causes the compiler to skip optimisations. (e.g.
> https://gist.github.com/nikic/6699370)
>
> One thought I had was that you could have a special syntax that could
> register a high-priority "discard" pseudo-listener for a few lines of
> lexical scope (hopefully that makes sense if you've read this far);
> something like this:
>
> suppress_messages {
>     $fh = fopen('foo');
> }
>
> That wouldn't be quite the same as the current @ operator - if you
> replaced fopen() with a user-defined wrapper, you'd need dynamic scope
> again - but it would replace some use cases. I'm not sure if it would
> actually make sense or not, but it gives you an idea of where my mind is
> going with the whole "listener"/"pseudo-listener" concept.
>
>
> So, thank you for reading this far (assuming you actually did). Thoughts?
> Feedback? Brickbats?
>
> Regards,
>
> --
> Rowan Collins
> [IMSoP]
>
To be able to come up with a better alternative, we should also consider
what is and isn't possible with the current approach.

   - currently it is possible to trigger multiple errors from a single
   operation/method call
   - currently it is possible to trigger one or more errors in an
   operation while still being able to return a value
   - currently it is possible to define a single error handler, which will
   be called only for the error levels selected via $error_types
   - currently, depending on the return value of the custom error handler,
   the default error handler may or may not be called
   - currently, depending on the return value of the custom error handler,
   the current execution may or may not be halted (recoverable fatal)
   - currently some errors (fatals) won't trigger the custom error handler
   but still execute the shutdown handler (and the output buffer callback),
   so people use that to handle fatal errors a bit more gracefully
   - currently some errors (E_COMPILE_*) can cause weird side effects in
   the custom error handler ( https://bugs.php.net/bug.php?id=60724
   https://bugs.php.net/bug.php?id=65322 )
   - currently the @ operator disables and then restores the error_reporting
   level for the given operation, and also causes some weird side effects
   ( https://gist.github.com/nikic/6699370 )
   - currently we still do a bunch of work for errors which in the end will
   be discarded anyway (I guess this could be improved if we consider
   removing stuff like error_get_last() and track_errors)
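
A few of the points above, shown with the existing API. This is a small
illustration rather than a recommended pattern; parse_loosely() and the
messages it raises are invented for the example.

```php
<?php
$seen = [];

// A single handler, called only for the levels selected in the mask.
set_error_handler(
    function (int $errno, string $errstr, string $file, int $line) use (&$seen): bool {
        $seen[] = $errno;
        // true: the error counts as handled and the default handler is skipped.
        // false would let PHP's default handler run as well.
        return true;
    },
    E_WARNING | E_NOTICE | E_USER_WARNING | E_USER_NOTICE
);

// One call raises several errors and still returns a value.
function parse_loosely(string $input): int
{
    trigger_error('leading whitespace ignored', E_USER_NOTICE);
    trigger_error('trailing garbage ignored', E_USER_WARNING);
    return (int) trim($input);
}

$value = parse_loosely(' 42abc');

// Fatal errors never reach the handler above, but the shutdown function
// still runs and can inspect the last error.
register_shutdown_function(function (): void {
    $last = error_get_last();
    $fatal = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;
    if ($last !== null && ($last['type'] & $fatal) !== 0) {
        error_log("fatal: {$last['message']} in {$last['file']}:{$last['line']}");
    }
});
```
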

I hope I haven't missed anything important.
The current error handling infrastructure favors the procedural paradigm
over the object-oriented one, but it feels less alien in OOP code than
exceptions would feel as the error handling of a procedural project.
Using exceptions instead of the current system would be a rough change,
because:

   - PHP is a dynamic language, so it is moderately hard to write your
   code in a way that lets you be sure you have caught every possible type
   of exception the underlying operation could throw.
      - This would promote Pokemon exception handling (
      http://www.dodgycoder.net/2011/11/yoda-conditions-pokemon-exception.html )
      which is the same as using @ everywhere, but even slower.
   - Exceptions can't be used for notices or warnings, as the execution
   stops when the throw occurs.
   - When you do want to ignore the error, the suppression would look much
   more clunky.
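
The second point can be seen with the common ErrorException conversion
pattern. This is a minimal sketch; risky() and its message are made up for
the example.

```php
<?php
// Convert every reported error into an exception.
set_error_handler(function (int $errno, string $errstr, string $file, int $line): bool {
    throw new ErrorException($errstr, 0, $errno, $file, $line);
});

function risky(array &$steps): string
{
    $steps[] = 'before';
    trigger_error('minor problem', E_USER_NOTICE); // now behaves like a throw
    $steps[] = 'after';                            // never reached
    return 'result';                               // never returned
}

$steps = [];
try {
    $result = risky($steps);
} catch (ErrorException $e) {
    // The notice aborted the whole operation, so there is no return value.
    $result = null;
    $steps[] = 'caught: ' . $e->getMessage();
} finally {
    restore_error_handler();
}
```

With the plain error mechanism the same notice would have been reported
and risky() would still have returned 'result'.
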

But it would also have an upside: it would allow local error resolution
(compared to the current way, where you have to handle the error in a
separate global function which can't see anything about the error but the
type, the error code, the error message and where the error comes from).

I have two ideas which, if they could be implemented, would address most of
the current complaints:

   1. Provide a way to resolve errors in the local scope: it would be nice
   if we could wrap operations in a block (similar to try), and when an
   error occurs the "catch" block would be executed there.
   2. Extend the current error handler mechanism so it can have a stack of
   handlers, like the spl_autoload infrastructure does; this would allow
   stuff like separate libraries using their own error logging without
   interfering with each other. If this were implemented, the feature in
   the previous point would simply mean putting that "catch" block on top
   of the error handling stack for the execution of that block.
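
Both ideas can be approximated in userland today, which may help when
judging them. The sketch below is only that: ErrorHandlerStack and its
methods are hypothetical names, not an existing or proposed API, and it
assumes PHP 8.0+ (for the mixed return type). Handlers are tried from
newest to oldest until one returns true, similar in spirit to how the
spl_autoload queue keeps trying loaders.

```php
<?php
// Hypothetical userland sketch; not a PHP API.
final class ErrorHandlerStack
{
    /** @var callable[] newest handler first */
    private static array $handlers = [];
    private static bool $installed = false;

    public static function push(callable $handler): void
    {
        if (!self::$installed) {
            set_error_handler([self::class, 'dispatch']);
            self::$installed = true;
        }
        array_unshift(self::$handlers, $handler);
    }

    public static function pop(): void
    {
        array_shift(self::$handlers);
    }

    public static function dispatch(int $errno, string $errstr, string $file, int $line): bool
    {
        foreach (self::$handlers as $handler) {
            if ($handler($errno, $errstr, $file, $line) === true) {
                return true; // claimed by this handler, stop here
            }
        }
        return false; // nobody claimed it: PHP's default handler runs
    }

    // Idea 1 expressed through idea 2: the "catch" block is a handler that
    // sits on top of the stack only while $body runs.
    public static function locally(callable $body, callable $handler): mixed
    {
        self::push($handler);
        try {
            return $body();
        } finally {
            self::pop();
        }
    }
}

// A library-level handler that logs whatever reaches it.
$libraryLog = [];
ErrorHandlerStack::push(function (int $errno, string $errstr) use (&$libraryLog): bool {
    $libraryLog[] = $errstr;
    return true;
});

// Local resolution: the warning from fopen() is handled right here.
$local = [];
$fh = ErrorHandlerStack::locally(
    fn() => fopen(__DIR__ . '/no-such-dir/no-such-file', 'r'),
    function (int $errno, string $errstr) use (&$local): bool {
        $local[] = $errno;
        return true;
    }
);

// Outside the block the library handler is on top again.
trigger_error('outside the block', E_USER_NOTICE);
```

The local handler is a closure, so unlike a global handler it can see the
variables of the scope that registered it.
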

What do you think?

-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu

--047d7bdca39c90909c04f6ff5b6d--