Newsgroups: php.internals
Date: Tue, 28 Nov 2023 23:40:40 +0100
To: Claude Pache
Cc: Kamil Tekiela, Dusk, PHP internals
Subject: Re: [PHP-DEV] Deprecate declare(encoding='...') + zend.multibyte + zend.script_encoding + zend.detect_unicode ?
From: divinity76@gmail.com (Hans Henrik Bergan)

Btw, if we reach some consensus that my php2utf8.php script is actually
worth expanding on, I can volunteer to add more encodings (SJIS, BIG5,
anything supported by mbstring), but it wouldn't surprise me if a better
approach exists and the script should be rewritten entirely.

> add that what's special about UTF-8 isn't that it's "fixed-endian".

I should have added this to my last post: the "zend.detect_unicode" ini
option exists specifically to scan for BOMs, and BOMs are significantly
less useful in fixed-endian encodings (like UTF-8) than in bi-endian
encodings (like UTF-16/UTF-32).

On Tue, 28 Nov 2023 at 21:47, Hans Henrik Bergan wrote:
>
> > What is the migration path for legacy code that uses those directives?
>
> The migration path is to convert the legacy-encoding PHP files to UTF-8.
> Luckily this can be largely automated; here is my attempt:
> https://github.com/divinity76/php2utf8/blob/main/src/php2utf8.php
> That code definitely needs some proofreading and additions, though. I
> don't know if the approach used is even a good one, it was just the
> first I could think of. Feel free to write one from scratch.
>
> > Can you share a little more details about how this works?
>
> I hope someone else can do that, but it allows PHP to parse and
> execute scripts not written in UTF-8 and scripts that start with a
> BOM (byte order mark).
>
> > add that what's special about UTF-8 isn't that it's "fixed-endian".
>
> One of multiple good things about UTF-8 is that it's fixed-endian, so
> UTF-8 doesn't need a BOM to specify endianness (unlike UTF-16 and UTF-32,
> which are bi-endian, where a BOM helps identify the endianness used).
>
> > If the solution is as easy as just converting the encoding of the
> > source file, then why did we even need to have this setting at all?
> > Why did PHP parser support encodings that demanded the introduction of
>
> I've read your question but don't have an answer to it; hopefully
> someone else knows.
>
>
> On Tue, 28 Nov 2023 at 21:09, Claude Pache wrote:
> >
> >
> >
> > > On 28 Nov 2023, at 20:56, Kamil Tekiela wrote:
> > >
> > >> Convert your PHP source files to UTF-8.
> > >
> > > If the solution is as easy as just converting the encoding of the
> > > source file, then why did we even need to have this setting at all?
> > > Why did PHP parser support encodings that demanded the introduction of
> > > this declare?
> >
> > It is not necessarily as simple: because your code base may contain
> > literal strings, and changing the encoding of the source file will
> > effectively change the contents of the strings.
> >
> > --Claude
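
To make two of the points above concrete, here is a rough sketch (in Python rather than PHP, purely for illustration). It shows Claude's point that re-encoding a source file changes the bytes inside a string literal, and why a BOM matters for bi-endian encodings. The sample source line, the ISO-8859-1 starting encoding, and the helper names `to_utf8`, `literal_bytes` and `detect_bom` are all assumptions made up for this example. None of this is taken from php2utf8.php or from the engine.

```python
import codecs
from typing import Optional

# Hypothetical legacy source line; ISO-8859-1 on disk is an assumption.
LEGACY_SOURCE = "$s = 'é';".encode("iso-8859-1")


def to_utf8(source: bytes, from_encoding: str) -> bytes:
    """Re-encode a source file's bytes from a legacy encoding to UTF-8."""
    return source.decode(from_encoding).encode("utf-8")


def literal_bytes(source: bytes) -> bytes:
    """Return the raw bytes between the first pair of single quotes."""
    start = source.index(b"'") + 1
    end = source.index(b"'", start)
    return source[start:end]


def detect_bom(data: bytes) -> Optional[str]:
    """Name the encoding announced by a leading BOM, or None if absent."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM begins
    # with the same two bytes as the UTF-16 LE BOM.
    candidates = [
        ("utf-32-le", codecs.BOM_UTF32_LE),
        ("utf-32-be", codecs.BOM_UTF32_BE),
        ("utf-8", codecs.BOM_UTF8),
        ("utf-16-le", codecs.BOM_UTF16_LE),
        ("utf-16-be", codecs.BOM_UTF16_BE),
    ]
    for name, bom in candidates:
        if data.startswith(bom):
            return name
    return None


utf8_source = to_utf8(LEGACY_SOURCE, "iso-8859-1")

# Same text, different bytes: the literal grows from 1 byte to 2.
print(literal_bytes(LEGACY_SOURCE))  # b'\xe9'
print(literal_bytes(utf8_source))    # b'\xc3\xa9'

# Without a BOM, nothing in UTF-16 data announces its byte order.
print(detect_bom("hi".encode("utf-16-be")))                        # None
print(detect_bom(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))  # utf-16-be
```

So any code that depends on the byte length or exact byte values of such a literal (a strlen() check, a hash, a comparison against data still stored in the old encoding) may behave differently after conversion, which is why an automated converter still needs a human review pass.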