Newsgroups: php.internals
Date: Fri, 14 Jun 2024 01:24:48 +0300
Subject: Re: [PHP-DEV] Revisiting case-sensitivity in PHP
From: udaltsov.valentin@gmail.com (Valentin Udaltsov)
To: Timo Tijhof
Cc: php internals

On Friday, 14 June 2024 at 00:04, Timo Tijhof wrote:

> Would this affect unserialize()?
>
> I ask because MediaWiki's main "text" database table is an
> immutable/append-only store where we store the text of each page revision
> since ~2004. It is stored as serialised blobs of a value class. There have
> been a number of different implementations over the past twenty years of
> Wikipedia's existence (plain text, gzip-compressed, diff-compressed, etc.).
>
> When we adopted modern autoloading in MediaWiki, we quickly found that
> blobs originally serialized by PHP 4 actually encoded the class in
> lowercase, regardless of the casing in source code.
>
> From https://3v4l.org/jl0et:
>
>> class ConcatenatedGzipHistoryBlob {…}
>> print serialize($blob);
>> # PHP 4.x: O:27:"concatenatedgziphistoryblob":…
>> # PHP 5/7/8: O:27:"ConcatenatedGzipHistoryBlob":…
>
> It is of course the application's responsibility to load these classes,
> but it is arguably PHP's responsibility to be able to construct what it
> serialized. I suppose anything is possible when announced as a breaking
> change for PHP 9.0. I wanted to share this as something to take into
> consideration as part of the impact. Potentially worthy of additional
> communication, or perhaps worth supporting separately.
>
> --
> Timo Tijhof,
> Principal Engineer,
> Wikimedia Foundation.
> https://timotijhof.net/

Hi, Timo!

Thank you very much for bringing up this important case.

Here's how I see this. If PHP gets class case-sensitivity, unserialization of classes with lowercase names will fail.
This is because the engine will start putting the `MyClass` class entry into the loaded classes table under the key `MyClass` (not `myclass`), so unserialization will not be able to find it as `myclass`. Even if some deprecation layer is introduced (one that puts both the `myclass` and `MyClass` keys into the table), you will first get a ton of notices and then eventually end up with the same problem once the transition to case sensitivity is complete. Hence I propose no deprecation layer: it does not really help.

However, you will be able to use `class_alias()` to solve your issue. If classes are case-sensitive, `class_alias(MyClass::class, 'myclass');` should work, since MyClass != myclass anymore. And serialization works perfectly with class aliases, see https://3v4l.org/1n1as .
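
A minimal sketch of the two behaviours discussed above, runnable on current PHP 8. The empty class body, the `emptyObjectPayload()` helper and the alias name `legacy_history_blob` are assumptions made for this illustration, not MediaWiki code. The alias deliberately differs from the class name by more than case: today's case-insensitive lookup treats `concatenatedgziphistoryblob` as the same name as the class itself, so the lowercase alias proposed above would only become registrable once classes are case-sensitive.

```php
<?php
declare(strict_types=1);

// Stand-in for the value class from the thread (the real class carries state).
class ConcatenatedGzipHistoryBlob
{
}

// Hypothetical helper: builds the serialized form of a property-less object
// for the given class name, e.g. O:27:"concatenatedgziphistoryblob":0:{}
function emptyObjectPayload(string $className): string
{
    return sprintf('O:%d:"%s":0:{}', strlen($className), $className);
}

// 1. Current behaviour: class lookup ignores case, so a PHP 4-style
//    lowercase class name still resolves to the loaded class.
$fromLowercase = unserialize(emptyObjectPayload('concatenatedgziphistoryblob'));
var_dump($fromLowercase instanceof ConcatenatedGzipHistoryBlob); // bool(true)

// 2. Aliases: unserialize() resolves an alias to the real class.
//    'legacy_history_blob' is a made-up name so that this runs today.
class_alias(ConcatenatedGzipHistoryBlob::class, 'legacy_history_blob');
$fromAlias = unserialize(emptyObjectPayload('legacy_history_blob'));
var_dump(get_class($fromAlias)); // string(27) "ConcatenatedGzipHistoryBlob"
```

Under case-sensitive class names, step 1 is the part that would stop working, and an alias registered for the lowercase name would presumably take over its role in the same way as step 2.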

--
Valentin Udaltsov
