Newsgroups: php.internals
Date: Fri, 29 Sep 2023 20:12:55 +0200
From: dossche.niels@gmail.com (Niels Dossche)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Re: [RFC] [Discussion] DOM HTML5 parsing and serialization support

Hi Larry

On 29/09/2023 18:58, Larry Garfield wrote:
> On Fri, Sep 29, 2023, at 7:07 AM, Niels Dossche wrote:
>> On 02/09/2023 21:41, Niels Dossche wrote:
>>> Hello internals
>>>
>>> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
>>> https://wiki.php.net/rfc/domdocument_html5_parser
>>>
>>> Kind regards
>>> Niels
>>
>> Hi internals
>>
>> Discussion seems to have died down.
>> Today, it's been 14 days since the last major change was done to the
>> RFC (i.e. the class hierarchy update).
>> And it's also been close to 4 weeks since I first announced the RFC
>> on the mailing list.
>> I'd like to start the vote on Monday (20:00 GMT+2) and I intend to
>> let it run for 2 weeks.
>> Any final complaints should be raised now.
>>
>> Kind regards
>> Niels
>
> From the RFC:
>
>> \DOMDocument will also use DOM\Document as a base class to make it interchangeable with the new classes. We're only adding XMLDocument for completeness and API parity. It's a drop-in replacement for \DOMDocument, and behaves the exact same.
>> The difference is that the API is on par with HTMLDocument, and the construction is designed to be more misuse-resistant. \DOMDocument will NOT change, and remains for the foreseeable future.
>
> Would it make sense then for one of \DOMDocument and DOM\XMLDocument to extend the other, then? So that, eg, we can type against DOM\XMLDocument and then support both old and new classes? Or are the construction et al differences enough that is not viable?
>

I agree with Tim's answer here :) (Thanks Tim!)

>> Similarly, the constants would lose their DOM_ prefix in the namespace version, e.g. DOM\INDEX_SIZE_ERR will be an alias for DOM_INDEX_SIZE_ERR. For constants that begin with XML_ I propose to keep the prefix.
>
> Unclear to me: Would the XML constants also be aliased into the namespace verbatim, or left globally?
>

I'll clarify this. The intention is to alias them verbatim.

> Did you consider making the new classes throw exceptions rather than forcing people to remember to call another "was there an error" global function like it's still 1996? :-)

I did think about it, but using exceptions for the parser is not viable.
Parse errors in HTML aren't actually hard errors: they are recoverable, i.e. the parser spec tells us how to proceed when an error occurs. In that sense, they're closer to warnings. Throwing an exception would abort parsing.
As a side note, a good number of web pages out there violate at least one parsing rule, but browsers know, per the spec, how to proceed in that case (which is probably also why those pages often don't get fixed).

I thought about other options as well, e.g. providing a getParseErrors() method, or letting the factory methods optionally return the parse errors too. However, I don't think they're significantly better than what we have now.
Furthermore, I think overhauling how parse errors are handled in ext/dom (and maybe, by extension, ext/simplexml for consistency) would be a bit of feature creep.
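
To make the "closer to warnings" point concrete, here is a minimal sketch of today's pattern with \DOMDocument::loadHTML(). Note that this uses libxml2's existing (non-HTML5) HTML parser, not the parser proposed in the RFC. It assumes a stock PHP build with ext/dom and ext/libxml; the markup is a made-up example, and the exact error messages vary with the libxml2 version, so none are shown. Parsing does not abort on the broken input; the errors are collected on the side and read afterwards through the global libxml_get_errors() function.

```php
<?php
// Sketch: today's \DOMDocument error handling (libxml2 HTML parser, not HTML5).
// Buffer libxml errors instead of emitting them as PHP warnings,
// and remember the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Made-up malformed markup: mismatched </p>, stray </span>, unknown <foo>.
$ok = $doc->loadHTML('<p>Hello <b>world</p></span><foo>bar</foo>');

// Parsing is not aborted: a tree is still built from the broken input.
var_dump($ok);
echo $doc->getElementsByTagName('p')->length, PHP_EOL;

// The separate "was there an error" step: read the global libxml error buffer.
foreach (libxml_get_errors() as $error) {
    // LibXMLError::$message already ends with a newline.
    printf("line %d: %s", $error->line, $error->message);
}

// Reset the buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

The errors live in process-wide libxml state, which is the kind of global "was there an error" call Larry is referring to.
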
See also the motivation in the RFC text. Therefore, I would keep the error handling as it is currently described in the RFC.
If accepted, this RFC would land early in the 8.4 development cycle, so we can gather feedback very early on. If we do notice a major problem in how these things are handled, it can be changed by a hypothetical future RFC in the same development cycle. That would also require thinking about the other XML extensions, though.

>
> Otherwise looks good to me.

Thanks!

>
> --Larry Garfield
>

Cheers
Niels