Newsgroups: php.internals
Xref: news.php.net php.internals:121835
Date: Wed, 29 Nov 2023 00:06:29 +0100
To: youkidearitai
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Deprecate declare(encoding='...') + zend.multibyte + zend.script_encoding + zend.detect_unicode ?
From: divinity76@gmail.com (Hans Henrik Bergan)

@youkidearitai right now the code specifically deals with:
- UTF8: removing the UTF8 BOM and removing `declare(encoding='UTF-8');`
- UTF16LE/UTF16BE/UTF32LE/UTF32BE: converting to UTF8, removing the BOM, and removing `declare(encoding='...');`
- ISO-8859-1: converting to UTF-8 and removing `declare(encoding='ISO-8859-1');`. i couldn't find any information on an ISO-8859-1 BOM, so to the best of my knowledge it does not exist

it does not deal with any other encodings as of writing, but more can be added if needed.

On Tue, 28 Nov 2023 at 23:58, youkidearitai wrote:
>
> On Wed, 29 Nov 2023 at 7:41, Hans Henrik Bergan wrote:
> >
> > btw if we come to some consensus that my php2utf8.php script is actually
> > worthwhile to expand on, i can volunteer to add more encodings (SJIS,
> > BIG5, anything supported by mbstring),
> > but it wouldn't surprise me if a better approach exists and the script
> > should be rewritten entirely~
> >
> > >add that what's special about UTF-8 isn't that it's "fixed-endian".
> >
> > should've added this to the last post, but the "zend.detect_unicode"
> > ini-option is specifically to scan for BOMs, and BOMs are
> > significantly less useful in fixed-endian encodings (like UTF8) than
> > bi-endian encodings (like UTF16/UTF32) ^^
> >
> > On Tue, 28 Nov 2023 at 21:47, Hans Henrik Bergan wrote:
> > >
> > > > What is the migration path for legacy code that use those directives?
> > >
> > > The migration path is to convert the legacy-encoding PHP files to UTF-8.
> > > Luckily this can be largely automated, here is my attempt:
> > > https://github.com/divinity76/php2utf8/blob/main/src/php2utf8.php
> > > but that code definitely needs some proof-reading and additions - idk
> > > if the approach used is even a good approach, it was just the first i
> > > could think of, feel free to write one from scratch
> > >
> > >
> > > >Can you share a little more details about how this works?
> > >
> > > I hope someone else can do that, but it allows PHP to parse and
> > > execute scripts not written in UTF-8 and scripts utilizing
> > > BOMs (byte order marks).
> > >
> > > >add that what's special about UTF-8 isn't that it's "fixed-endian".
> > >
> > > one of multiple good things about UTF-8 is that it's fixed-endian, and
> > > UTF8 doesn't need a BOM to specify endianness (unlike UTF16 and UTF32
> > > which are bi-endian, and a BOM helps identify the endianness used~)
> > >
> > > >If the solution is as easy as just converting the encoding of the
> > > source file, then why did we even need to have this setting at all?
> > > Why did PHP parser support encodings that demanded the introduction of
> > >
> > > I've read your question but don't have an answer to it, hopefully
> > > someone else knows.
> > >
> > >
> > > On Tue, 28 Nov 2023 at 21:09, Claude Pache wrote:
> > > >
> > > >
> > > >
> > > > > On 28 Nov 2023, at 20:56, Kamil Tekiela wrote:
> > > > >
> > > > >> Convert your PHP source files to UTF-8.
> > > > >
> > > > > If the solution is as easy as just converting the encoding of the
> > > > > source file, then why did we even need to have this setting at all?
> > > > > Why did PHP parser support encodings that demanded the introduction of
> > > > > this declare?
> > > >
> > > > It is not necessarily that simple: your code base may contain literal strings, and changing the encoding of the source file will effectively change the contents of those strings.
> > > >
> > > > - Claude
> > > >
> >
> > --
> > PHP Internals - PHP Runtime Development Mailing List
> > To unsubscribe, visit: https://www.php.net/unsub.php
> >
>
> Hi, Hans
>
> Does this convert PHP code from any encoding to UTF-8?
> If so, that is very difficult, because PHP code is written in
> many different character encodings and is not necessarily in UTF-8.
>
> There are many character encodings in the world,
> so PHP code will be difficult to unify.
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
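
For readers who want to see the byte-level steps discussed in this thread (detect a BOM, strip it, re-encode as UTF-8), here is a minimal sketch. It is written in Python purely for illustration and is not the code in php2utf8.php. The function name `to_utf8` and its `declared_encoding` parameter are hypothetical, and the sketch assumes the caller already knows the encoding named in `declare(encoding='...')`. It does not remove the `declare` statement itself.

```python
import codecs

# BOM table. UTF-32 is checked before UTF-16 because the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf8(raw: bytes, declared_encoding: str) -> bytes:
    """Return `raw` re-encoded as UTF-8 without a BOM (hypothetical helper).

    If a known BOM is present it decides the encoding and is stripped;
    otherwise `declared_encoding` (e.g. "iso-8859-1") is trusted.
    """
    for bom, name in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name).encode("utf-8")
    return raw.decode(declared_encoding).encode("utf-8")


# The point about literal strings: the same source text has a different
# byte length after conversion, because 'é' is one byte (E9) in
# ISO-8859-1 and two bytes (C3 A9) in UTF-8.
latin1_source = "<?php echo strlen('é');".encode("iso-8859-1")
utf8_source = to_utf8(latin1_source, "iso-8859-1")
print(len(latin1_source), len(utf8_source))
```

The converted file is one byte longer than the original, and that extra byte sits inside the string literal. Code that depends on the byte length or raw bytes of such literals can therefore behave differently after conversion, which is why an automated conversion still needs review.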