Newsgroups: php.internals
List-Id: internals.lists.php.net
Subject: Re: [PHP-DEV][Discussion] Should All String Functions Become Multi-Byte Safe?
From: mike@newclarity.net (Mike Schinkel)
To: Nick Lockheart
Cc: internals@lists.php.net
Date: Fri, 16 Aug 2024 19:32:42 -0400
Message-ID: <01BF288F-E0C5-4F67-A829-411FD7C1315D@newclarity.net>
In-Reply-To: <8360937cc7ca31bf3bd0f8e3050c53cb32663428.camel@ageofdream.com>
References: <1AFE8300-D363-43D8-A989-15D001B9879C@newclarity.net> <270D6057-626D-4720-B44A-3CB7A7B9320B@newclarity.net> <8360937cc7ca31bf3bd0f8e3050c53cb32663428.camel@ageofdream.com>

Hi Nick,

> On Aug 16, 2024, at 6:11 PM, Nick Lockheart wrote:
>
> I wanted to reply generally to this and not to any person in
> particular, as I'm the one who started the thread.
>
> I used the rather broad title "Should All String Functions Become
> Multi-Byte Safe" because there are many smaller related topics, but my
> intention was to discuss multi-byte in general, and see if there was
> some consensus on action items that could have a more limited scope/RFC
> for that task.
>
> My overall intent and goal was to make PHP safer against multi-byte
> attacks by providing developers with tools that could become best
> practices for dealing with user input strings, the same way we had
> mysql_real_escape_string, and then PDO prepared statements for SQL.
>
> There are a lot of potential pitfalls in dealing with Unicode input, and
> there are some best practices per the Unicode Consortium that I'm not
> sure how to implement in PHP, and it seems that since everyone needs
> them, they might be better as a shared library in core.
>
> For example, there should be a function that removes unassigned code
> points.
>
> There should also be a function that removes "scripts" (as defined by
> Unicode).
>
> We should have an easy way to remove private use code points (unless
> you're running a Star Trek fan site and really do need Klingon).
>
> And the default replacement character for `mb_scrub` shouldn't be `?`.
>
> Each of these and other ideas could be part of an RFC, or we could
> brainstorm a Unicode built-in class that handles lots of the common use
> cases.
>
> Having a team-built and audited Unicode class would benefit almost
> everyone using PHP.

My suggestion (take it or leave it) is to create a GitHub repo for your own RFCs and start writing your RFC there "in the open." Add the code for your implementation to the repo, add a discussion forum so that really interested parties can participate, and send an invite on this list to those who are really interested, so they can discuss, comment on the RFC, and even offer PRs.

Then, when everyone participating at your repo thinks the RFC is fully baked, bring it back to the list here to discuss.

Unlike just discussing on the list, doing it that way gives comments made in the forum a place to be captured and converted into RFC text and implementation that everyone can see, and really motivated people can even submit PRs to your RFC to spread the load of writing a good RFC.

#jmtcw #fwiw

-Mike