Newsgroups: php.internals
From: dennis.snell@automattic.com (Dennis Snell)
To: Mel Dafert
Cc: Internals
Subject: Re: [PHP-DEV] Decoding HTML and the Ambiguous Ampersand
Date: Sat, 17 Aug 2024 13:37:45 -0700
References: <1FD3A9B0-D46F-4589-A803-3CC2347EC7DF@automattic.com>

On Aug 17, 2024, at 11:20 AM, Mel Dafert <mel@dafert.at> wrote:
>
> On August 16, 2024 2:59:11 AM GMT+02:00, Dennis Snell <dennis.snell@automattic.com> wrote:
>>
>> All,
>>
>> I have submitted an RFC draft for including the proposed feature from this issue. Thanks to everyone who helped me in this process. It’s my first RFC, so I apologize in advance for any mistakes I’ve made in the process.
>>
>> https://wiki.php.net/rfc/decode_html
>>
>> This is proposed for a future PHP version after 8.4.
>>
>> Warmly,
>> Dennis Snell
>
> Hello,
>
> I have just one nit: I think it would be better to use an enum for the `$context`
> parameter rather than a constant.

This sounds like a fine idea. I was following some examples in the source code from other functions.

> It also feels like it ought to be nice to find some reasonable default... I'm sure many
> programmers will see this parameter and be
> unsure which value to use. There should, at the very least, be clear documentation with
> real-world examples in which cases which one should be

Clear documentation seems like a good avenue to both communicate what the expectations are and also to educate on what the choices must be.

I’m very concerned about setting a default though, largely due to the way that it hides the choice we’re making when we ask to decode. This concern comes from seeing the mistakes play out frequently, particularly because we reach for this same function when reading attribute values and text alike, even though that can lead to different decodes.

If we default to one context, then I have a hard time imagining that anyone would realize they are making the choice that they are.

> All in all, this is definitely a welcome improvement!

Thanks! It’s already proven to be quite useful within the WordPress context where it came from. I would love to see it grow into a form where PHP can have a reliable decoder that doesn’t depend on manual intervention to work right.

> Regards,
> Mel

Have a nice weekend,
Dennis Snell
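
To illustrate the point about attribute values and text decoding differently, here is a minimal sketch, not the RFC's implementation. The names `HtmlContextSketch` and `decode_html_sketch` are hypothetical, the enum follows Mel's suggestion, and the toy decoder knows only three legacy named references. It applies the HTML rule that a reference written without a semicolon inside an attribute is left alone when the next character is `=` or an ASCII alphanumeric. The context parameter deliberately has no default, so the caller has to make the choice visibly.

```php
<?php
declare(strict_types=1);

// Hypothetical enum for illustration; the RFC's actual names may differ.
enum HtmlContextSketch
{
    case Attribute;
    case Text;
}

/**
 * Toy decoder: handles only &amp, &copy and &not (with or without ";").
 * A real decoder must cover the full named and numeric reference tables.
 */
function decode_html_sketch(HtmlContextSketch $context, string $html): string
{
    // Three of the legacy references that HTML accepts without a semicolon.
    $legacy = ['amp' => '&', 'copy' => "\u{A9}", 'not' => "\u{AC}"];

    $decoded = preg_replace_callback(
        // Group 1: name, group 2: optional ";", group 3: optional "=" or alphanumeric follower.
        '/&(amp|copy|not)(;?)([=A-Za-z0-9]?)/',
        function (array $m) use ($context, $legacy): string {
            $semicolon = $m[2] ?? '';
            $follower  = $m[3] ?? '';

            // Ambiguous ampersand: in attributes, a reference with no semicolon
            // that is followed by "=" or an alphanumeric is kept as literal text.
            if (
                $context === HtmlContextSketch::Attribute
                && $semicolon === ''
                && $follower !== ''
            ) {
                return $m[0];
            }

            return $legacy[$m[1]] . $follower;
        },
        $html
    );

    // preg_replace_callback returns null on a regex engine error.
    return $decoded ?? $html;
}

// The same input decodes differently depending on where it was found.
echo decode_html_sketch(HtmlContextSketch::Attribute, '?a=1&copy=2'), "\n"; // ?a=1&copy=2
echo decode_html_sketch(HtmlContextSketch::Text, '?a=1&copy=2'), "\n";      // ?a=1©=2
```

Because `$context` is a required enum argument, a call such as `decode_html_sketch('?a=1&copy=2')` fails instead of silently picking one interpretation, which is the concern raised above about a default.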