Newsgroups: php.internals
Message-ID: <0c2d7e79-2c74-4263-81ec-ac8832ca50bb@gmail.com>
Date: Fri, 29 Sep 2023 17:45:26 +0200
To: internals@lists.php.net
References: <48c7bb29-a52c-416e-b855-be2746dc7a84@gmail.com> <39900ce4-56b1-2397-ee9c-c9b7086b33cb@mabe.berlin>
In-Reply-To: <39900ce4-56b1-2397-ee9c-c9b7086b33cb@mabe.berlin>
Subject: Re: [PHP-DEV] Re: [RFC] [Discussion] DOM HTML5 parsing and serialization support
From: dossche.niels@gmail.com (Niels Dossche)

Hi Marc

On 29/09/2023 09:39, Marc Bennewitz wrote:
> Hi Niels,
>
> On 29.09.23 09:07, Niels Dossche wrote:
>> Hi internals
>>
>> Discussion seems to have died down.
>> Today, it's been 14 days since the last major change was done to the RFC (i.e. the class hierarchy update).
>> And it's also been close to 4 weeks since I first announced the RFC on the mailing list.
>> I'd like to start the vote on Monday (20:00 GMT+2) and I intend to let it run for 2 weeks.
>> Any final complaints should be raised now.
>
> Not much to complain about, but a question - not sure if it was discussed before.
>
> Naming: `XMLDocument::fromEmpty` vs. `HTMLDocument::createEmpty` in the PHP code section.

Oops. Well spotted!
This should be createEmpty everywhere. I just checked, and I accidentally used fromEmpty only in that class definition.
I have now fixed this in the RFC text. It happened when I updated the method names; the emails from back then do refer to the right name, though.

>
> For both, `XMLDocument::fromEmpty` and `HTMLDocument::createEmpty` there is an argument available to define the encoding but none of the other `createFrom*` methods have this argument.
>
> As far as I understand, in these other cases the encoding gets detected from the content of the passed source, but what happens if the source does not contain any information about the encoding? E.g. you load an XML/HTML document over HTTP, the encoding is defined via HTTP header but the content itself doesn't contain it.
>

Right, we follow the HTML spec in this regard.
Roughly speaking, we determine the charset in the following order of priority.
If one option fails, it falls through to the next one.
  1. The Content-Type HTTP header of the response from which the document was loaded.
  2. BOM sniffing in the content. I.e. UTF-8 with BOM and UTF-16 LE/BE prepend a byte order mark to the content, which is used to detect the encoding.
  3. A meta tag in the content.
If the charset cannot be determined at all, UTF-8 is assumed, as it's the default in HTML.

>> Kind regards
>> Niels
>>
> Best,
> Marc

Kind regards
Niels
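
For illustration only, the fall-through order listed above could be sketched roughly as follows. This is not the ext/dom implementation: it is written in Python purely as a language-neutral sketch, `guess_encoding` and its parameters are hypothetical names, and the meta tag step uses a simplified regular expression where the HTML spec describes a byte-level prescan.

```python
import re
from typing import Optional

# Byte order marks checked in step 2 (UTF-8 with BOM, UTF-16 BE/LE).
_BOMS = (
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)

# Simplified assumption: a regex instead of the spec's meta prescan.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)
_HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([^\s;\"]+)", re.IGNORECASE)


def guess_encoding(content: bytes, content_type: Optional[str] = None) -> str:
    """Hypothetical helper: return an encoding label for `content`."""
    # 1. charset parameter of the Content-Type HTTP header, if present.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).upper()
    # 2. BOM sniffing at the very start of the content.
    for bom, label in _BOMS:
        if content.startswith(bom):
            return label
    # 3. Meta tag near the start of the content.
    match = _META_CHARSET.search(content[:1024])
    if match:
        return match.group(1).decode("ascii").upper()
    # 4. Nothing found: assume UTF-8.
    return "UTF-8"
```

With this sketch, a document served as `text/html; charset=ISO-8859-1` is labelled from the header alone, while content with no header, no BOM and no meta tag falls all the way through to UTF-8.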