Newsgroups: php.internals
From: dennis.snell@automattic.com (Dennis Snell)
Date: Sun, 25 Aug 2024 16:56:06 -0500
Subject: Re: [PHP-DEV] [RFC] Decoding HTML and the Ambiguous Ampersand
To: Máté Kocsis
Cc: "Christoph M. Becker", Niels Dossche, Internals

> On Aug 25, 2024, at 4:17 PM, Máté Kocsis wrote:
>
> Hi Christoph, Dennis,
>
>> Well, I don't think it would be a big deal to move the bundled lexbor to
>> somewhere where it is always available. I mean, so far it's only used
>> by ext/dom so it's bundled there, but if other parts of the php-src code
>> base would use it, we could put it elsewhere.
>
> Exactly. You might be aware that I'm working on an "uri" extension (https://externals.io/message/123997)

Yes, I only briefly saw that before, but I’m excited, because I’ve wanted very much to be able to properly parse URLs within PHP. Myself, I was also interested in seeing if we could get Ada into the language.

As with HTML parsing, I see much value in having additional interfaces that aren’t a DOM interface but which are designed for specific software purposes.

> and it also needs some parts of lexbor. My implementation currently depends on ext/dom
> for simplicity's sake, however if the vote once passes, this temporary solution has to be changed.
> Therefore we previously agreed with Niels that we would make lexbor an "internal extension" (similar to mysqlnd), or
> at least we would somehow find a way for it to be always available, just like how Christoph said.

With all the improvements going around PHP these days, I find it extremely important to finally be able to reliably and safely understand some of the most basic content that we produce and parse: HTML and URLs.

Although the user-space libraries are of varying completeness and quality, all of them suffer from the fact that it’s so challenging to efficiently parse most content using PHP. Getting these things baked into the language of the web will bring a potent uplift to the entire ecosystem, both because there will be less corruption and because performance won’t suffer in getting there.

>
> Regards,
> Máté
>