Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:121834
MIME-Version: 1.0
References: <CAJmy8YFy5RmW1yWuA6akPU31xRz_i1io5uWgt1GWo0n5rzY9EQ@mail.gmail.com>
 <1BA05C1A-AFAE-4E86-BAA2-420B22549519@gmail.com> <0D8856BC-DDEE-47F8-8C59-7F4DC7A64237@woofle.net>
 <CAGBsUrd=5WhKERySavP=9yQM-RSkVCwhTJ2oQ4tAQOKTNFoCew@mail.gmail.com>
 <A00495F3-EEB5-4D49-9433-84E94C6920D0@gmail.com> <CAJmy8YG91ki8M-n7id=DUDUsG+1XBvtpSNdf6sJx3Y1B1Rytsg@mail.gmail.com>
 <CAJmy8YF=cpXJAWsUKh-eYBembDAWu+X4_q24qV_xfTJLzAztiw@mail.gmail.com>
In-Reply-To: <CAJmy8YF=cpXJAWsUKh-eYBembDAWu+X4_q24qV_xfTJLzAztiw@mail.gmail.com>
Date: Wed, 29 Nov 2023 07:58:18 +0900
Message-ID: <CAEPPVa2xgCh-Y2ovTL3=gX5vtTyiSAiNDnKLWPpohYCJ5Tr-cQ@mail.gmail.com>
To: internals@lists.php.net
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [PHP-DEV] Deprecate declare(encoding='...') + zend.multibyte +
 zend.script_encoding + zend.detect_unicode ?
From: youkidearitai@gmail.com (youkidearitai)

On Wed, 29 Nov 2023 at 7:41, Hans Henrik Bergan <divinity76@gmail.com> wrote:
>
> btw if we come to some consensus to my php2utf8.php script is actually
> worthwhile to expand on, i can volunteer to add more encodings (SJIS,
> BIG5, anything supported by mbstring),
> but it wouldn't surprise me if a better approach exist and the script
> should be rewritten entirely~
>
> >add that what's special about UTF-8 isn't that it's "fixed-endian".
>
> should've added this to the last post, but the "zend.detect_unicode"
> ini-option is specifically to scan for BOMs, and BOMs are
> significantly less useful in fixed-endian encodings (like UTF8) than
> bi-endian encodings (like UTF16/UTF32) ^^
>
> On Tue, 28 Nov 2023 at 21:47, Hans Henrik Bergan <divinity76@gmail.com> wrote:
> >
> > > What is the migration path for legacy code that use those directives?
> >
> > The migration path is to convert the legacy-encoding PHP files to UTF-8.
> > Luckily this can be largely automated, here is my attempt:
> > https://github.com/divinity76/php2utf8/blob/main/src/php2utf8.php
> > but that code definitely needs some proof-reading and additions - idk
> > if the approach used is even a good approach, it was just the first i
> > could think of, feel free to write one from scratch
> >
> >
> > >Can you share a little more details about how this works?
> >
> > I hope someone else can do that, but it allows PHP to parse and
> > execute scripts not written in UTF-8 and scripts utilizing
> > BOMs (byte order marks).
> >
> > >add that what's special about UTF-8 isn't that it's "fixed-endian".
> >
> > one of multiple good things about UTF-8 is that it's fixed-endian, and
> > UTF-8 doesn't need a BOM to specify endianness (unlike UTF-16 and UTF-32,
> > which are bi-endian, and where a BOM helps identify the endianness used~)
> >
> > >If the solution is as easy as just converting the encoding of the
> > source file, then why did we even need to have this setting at all?
> > Why did PHP parser support encodings that demanded the introduction of
> >
> > I've read your question but don't have an answer to it, hopefully
> > someone else knows.
> >
> >
> > On Tue, 28 Nov 2023 at 21:09, Claude Pache <claude.pache@gmail.com> wrote:
> > >
> > >
> > >
> > > > On 28 Nov 2023 at 20:56, Kamil Tekiela <tekiela246@gmail.com> wrote:
> > > >
> > > >> Convert your PHP source files to UTF-8.
> > > >
> > > > If the solution is as easy as just converting the encoding of the
> > > > source file, then why did we even need to have this setting at all?
> > > > Why did PHP parser support encodings that demanded the introduction of
> > > > this declare?
> > >
> > > It is not necessarily that simple: your code base may contain literal strings, and changing the encoding of the source file will effectively change the contents of the strings.
> > >
> > > -- Claude
> > >
>

Hi Hans,

Does this script convert PHP code from any encoding to UTF-8?
If so, I think that will be very difficult, because PHP code is
written in many different character encodings and is not
necessarily written in UTF-8.

There are many character encodings in use around the world,
so it will be difficult to unify all PHP code on a single one.
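
As a rough illustration of the difficulty (a sketch in Python, not taken
from php2utf8.php; the sample literal and the helper name convert_source
are only assumptions for the example), the same string literal is stored
as different bytes in each encoding, and a wrong guess about the original
encoding can go unnoticed:

```python
# Hypothetical sample: a string literal as it might appear in a legacy PHP file.
literal = "日本語"

# The same text is stored as different bytes depending on the file encoding.
sjis_bytes = literal.encode("shift_jis")
utf8_bytes = literal.encode("utf-8")


def convert_source(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw source bytes to UTF-8, assuming source_encoding is correct."""
    return raw.decode(source_encoding).encode("utf-8")


# With the right source encoding, the text survives but the bytes change
# (6 bytes become 9), so code that depends on byte lengths or byte offsets
# of the literal sees a different value after conversion.
converted = convert_source(sjis_bytes, "shift_jis")
print(len(sjis_bytes), len(converted))

# With a wrong guess (latin-1 accepts any byte sequence), the conversion
# "succeeds" silently and produces mojibake instead of the original text.
garbled = convert_source(sjis_bytes, "latin-1")
print(garbled.decode("utf-8") == literal)
```

So a converter has to know the real encoding of every file, and even a
correct conversion changes the bytes inside string literals.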

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------