Newsgroups: php.internals
Message-ID: <2FF6C60C-300D-4900-84B8-0DF4311143A7@gmail.com>
Date: Wed, 29 Nov 2023 21:14:13 +0100
Subject: Re: [PHP-DEV] Deprecate declare(encoding='...') + zend.multibyte + zend.script_encoding + zend.detect_unicode ?
To: Hans Henrik Bergan
Cc: Kamil Tekiela, Dusk, PHP internals
From: claude.pache@gmail.com (Claude Pache)

> On 28 Nov 2023, at 21:47, Hans Henrik Bergan wrote:
>
>> What is the migration path for legacy code that use those directives?
>
> The migration path is to convert the legacy-encoding PHP files to UTF-8.
> Luckily this can be largely automated, here is my attempt:
> https://github.com/divinity76/php2utf8/blob/main/src/php2utf8.php
> but that code definitely needs some proof-reading and additions - idk
> if the approach used is even a good approach, it was just the first i
> could think of, feel free to write one from scratch

Hi,

Converting the character encoding of PHP files is by no means sufficient, except in the simplest cases.

Strings of text are to be found in various places, such as:

1. in the PHP files, as literals;
2. in memory, at runtime;
3. in non-PHP data files stored on the server;
4. in the database;
5. as presented to the user (e.g. an HTML document) and as received from them (e.g. a form submission);
6. etc.

If you change the character encoding in (1), you necessarily change the encoding in (2), unless you wrap your literals with some function that performs the conversion in the other direction at runtime. And if you change the encoding in (2), you should be very careful when your text flows to and from (3), (4), (5) and (6): you should either change the encoding at those places, or make sure that proper conversion is done at the boundaries of those domains.

Also, mechanical conversion is not the whole story. For example, if you change the encoding in (5), you should not forget to adapt the <meta> tag and/or the Content-Type HTTP header.
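
To illustrate what "conversion at the boundaries" could look like, here is a minimal sketch. It assumes the mbstring extension is loaded and that the legacy encoding is Windows-1252. The helper names legacy_to_utf8() and utf8_to_legacy() are made up for this example and are not part of PHP or of the script linked above.

```php
<?php
// Assumption: the legacy side of the application uses Windows-1252.
const LEGACY_ENCODING = 'Windows-1252';

// Hypothetical helper: text entering from a legacy domain (file, database, ...).
function legacy_to_utf8(string $text): string
{
    return mb_convert_encoding($text, 'UTF-8', LEGACY_ENCODING);
}

// Hypothetical helper: text leaving towards a domain that still expects the legacy encoding.
function utf8_to_legacy(string $text): string
{
    return mb_convert_encoding($text, LEGACY_ENCODING, 'UTF-8');
}

// (3) A value read from a legacy data file: the byte 0xE9 is "é" in Windows-1252.
$fromFile = "caf\xE9";

// (2) In memory the application now works with UTF-8 only.
$inMemory = legacy_to_utf8($fromFile);

// (5) Declare the new encoding both in the HTTP header and in the document.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo '<meta charset="utf-8">', "\n";
echo htmlspecialchars($inMemory, ENT_QUOTES, 'UTF-8'), "\n";
```

The point of the sketch is only that each domain keeps one known encoding and every crossing is explicit. Whether to convert at the boundary or to migrate the data file or database itself remains a case-by-case decision.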

Also, not all strings are text, and only a human can decide whether the literal “\xe9” in a random location is meant to encode the raw byte 0xE9 or the character “é” in latin-1.

Of course, because we live in an interesting world, there will be situations where the encoding is unknown or ambiguous. Yuya mentioned the case of Shift-JIS, which has various incompatible variants, and I am happy not to have encountered such ambiguities (only unknownnesses) when I converted our code base from windows-1252 (often loosely called latin-1) to utf-8 a few years ago.

-- Claude
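
P.S. To make the “\xe9” ambiguity above concrete, here is a small sketch, assuming the mbstring extension is available. An escape sequence such as "\xE9" is plain ASCII in the source file, so re-encoding the file leaves it untouched. The two readings then need different handling, and the code cannot tell which one was intended.

```php
<?php
// The same one-byte literal, as it might appear anywhere in a code base.
$literal = "\xE9";

// On its own, the byte 0xE9 is not valid UTF-8.
var_dump(mb_check_encoding($literal, 'UTF-8')); // bool(false)

// Reading 1: the literal was meant as the character "é" in latin-1,
// so it has to be converted along with the rest of the text.
$asText = mb_convert_encoding($literal, 'UTF-8', 'ISO-8859-1');
var_dump(bin2hex($asText)); // string(4) "c3a9"

// Reading 2: the literal was meant as the raw byte 0xE9 (binary data),
// so it must be left exactly as it is.
$asBytes = $literal;
var_dump(bin2hex($asBytes)); // string(2) "e9"
```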