Newsgroups: php.internals
From: dossche.niels@gmail.com (Niels Dossche)
To: Robert Landers
Cc: internals@lists.php.net
Date: Sat, 30 Dec 2023 12:29:35 +0100
Subject: Re: [PHP-DEV] Pre-RFC: Fixing spec bugs in the DOM extension

Hi Robert

On 30/12/2023 10:25, Robert Landers wrote:
> Hi Niels,
>
>> They are indeed going to be very similar, but at least having better return types would be good to give one particular example.
>> e.g. we currently have a lot of methods that can return an object or false. The current living DOM spec always throws exceptions instead of returning false on error which is a much cleaner API.
>> Furthermore, we have the DOMNameSpaceNode that can be returned by some methods and has been a point of confusion for static analysis tools (I did a PR on psalm to fix one of those issues).
>> That node type won't be special cased in the new classes API so the (inconsistent use of the) union of DOMAttr|DOMNameSpaceNode will go away.
>
> Actually, I'm not sure it is supposed to be throwing exceptions (if we
> look at https://html.spec.whatwg.org/multipage/parsing.html#parse-errors);
> in fact, I'd argue there are three different ways to handle errors
> (from some experience in writing a parser from scratch):

I'm not talking about handling parser errors. Parser errors indeed should not be handled via exceptions; they emit a warning and continue with error recovery as described in the spec.
This was part of my HTML 5 RFC: https://wiki.php.net/rfc/domdocument_html5_parser

I'm talking about methods like createElement, setAttributeNode, ... that can fail due to errors.
In DOM 3 (and therefore in PHP too), there was a "strictErrorChecking" boolean option. When it is enabled, such methods throw an exception when their constraints are not met. When it is disabled, no exception is thrown; instead, a warning is emitted and false is returned.
The DOM living spec no longer has that option and always uses exceptions. In the new classes I would also only use exceptions and not include the strictErrorChecking option, as the spec demands. This cleans up the return types.
For example, $doc->createElement("") should throw. Or $element->setAttributeNode($attr) should throw when $attr is already used by another element. Etc.

>
> 1. Acting as a user-agent: in this case, errors should be handled as
> described in the spec for a user-agent, e.g., switching to Text-Mode
> in some cases and gobbling up the rest of the document.

The HTML 5 RFC follows the spec's error recovery rules for user agents.

>
> 2. Acting as a conformance checker: in this case, a list of errors
> should be available to the programmer instead of bailing when parsing
> (e.g., not switching to Text-Mode, but trying to continue parsing the
> document, as described in the parser spec for conformance checking).
>
> 3. Acting as a document builder: Putting the document into an invalid
> state should emit at least a warning. However, it's likely better to
> let the user-agent handle the invalid DOM (as this is probably more
> forward-thinking for new HTML that currently doesn't exist). This is
> actually one of the biggest draw-backs to the current implementation
> as it requires a number of "hacks" to build valid HTML.

Kind regards
Niels
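A minimal sketch of the two error modes of today's DOMDocument, assuming PHP 8.x with ext/dom enabled. With strictErrorChecking at its default of true, $doc->createElement("") throws a DOMException (code DOM_INVALID_CHARACTER_ERR). After setting it to false, the same call emits a warning and returns false instead. The variable names ($code, $result) are illustrative only, and this shows the existing classes, not the proposed new ones.

```php
<?php
// Sketch only: legacy DOMDocument behaviour, assuming PHP 8.x with ext/dom.

$doc = new DOMDocument();

// strictErrorChecking defaults to true, so an invalid element name throws.
$code = null;
try {
    $doc->createElement("");
} catch (DOMException $e) {
    // The exception code identifies the DOM error type.
    $code = $e->getCode();
}

// With strictErrorChecking disabled, the same call emits an E_WARNING
// (silenced here with @) and returns false instead of throwing.
$doc->strictErrorChecking = false;
$result = @$doc->createElement("");

var_dump($code === DOM_INVALID_CHARACTER_ERR, $result);
```

Because the second mode exists, the method's return type has to include false; an exceptions-only API can drop that from the signature.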