From: rasmus@lerdorf.com (Rasmus Lerdorf)
Newsgroups: php.internals
Date: Mon, 12 Mar 2012 12:44:25 -0700
To: Stas Malyshev
CC: PHP internals
Subject: Re: [PHP-DEV] default charset confusion
Message-ID: <4F5E5219.7080501@lerdorf.com>
In-Reply-To: <4F5E5148.4030106@sugarcrm.com>

On 03/12/2012 12:40 PM, Stas Malyshev wrote:
> Hi!
>
>> And yes, it may very well be dangerous to use the wrong charset and now
>> that we have better support for GB2312 and other Asian charsets in the
>> entities functions in 5.4 it is even more prudent to choose the right
>> one so we should provide some way to help people get it right short of
>> changing every call.
>
> I'm not sure "changing every call" is such a big problem - it's one grep
> and one replace, can be done in one line of sed/awk/perl/php probably.
> But a bigger issue here is that people insist on using wrong charsets
> and expect the language to have some magical external defaults that work
> for exactly their use case, instead of doing what they should be doing
> all along - putting charset right there in the argument.
> We need to get people off this mindset fast, since it is not a good one.
> Having tons of hidden defaults that modify behavior of functions called
> with the same arguments in hundreds of different ways is a coding and
> maintenance nightmare.
> Now if I write htmlspecialchars() I can never be
> sure if it works right and uses UTF-8 - what if somebody messed with the
> INI setting because of some other broken library that required that to
> work?

But you can't necessarily hardcode the encoding if you are writing
portable code. That's a bit like hardcoding a timezone. In order to
write portable code you need to give people the ability to localize it.

-Rasmus
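The "one grep and one replace" approach for making the charset explicit can be sketched as a small rewrite script. It is written in Python here rather than sed/awk/perl/php. This is a minimal illustration with two assumptions. First, it targets PHP 5.4, where htmlspecialchars() defaults to the flags ENT_COMPAT | ENT_HTML401 and the encoding 'UTF-8'. Second, it only rewrites single-argument calls whose argument contains no commas or parentheses.

Because the charset is the third parameter, the flags have to be spelled out as well. Calls that already pass flags, or that wrap a nested call, are left untouched. The function name add_explicit_charset is hypothetical.

```python
import re

# Matches htmlspecialchars(<one argument>) where the argument has no commas
# or parentheses, e.g. htmlspecialchars($name) or htmlspecialchars($row['t']).
# The lookbehind skips method/static calls ($o->..., X::...) and identifiers
# that merely end in "htmlspecialchars".
SINGLE_ARG_CALL = re.compile(r"(?<![\w$>:])htmlspecialchars\(\s*([^(),]+?)\s*\)")

# Assumed PHP 5.4 defaults for the flags; repeating them keeps behavior the
# same while making the charset (third argument) explicit at the call site.
REPLACEMENT = r"htmlspecialchars(\1, ENT_COMPAT | ENT_HTML401, 'UTF-8')"


def add_explicit_charset(php_source: str) -> str:
    """Hypothetical helper: rewrite simple one-argument calls only."""
    return SINGLE_ARG_CALL.sub(REPLACEMENT, php_source)
```

For example, `echo htmlspecialchars($name);` becomes `echo htmlspecialchars($name, ENT_COMPAT | ENT_HTML401, 'UTF-8');`. The calls it skips would still need to be handled by hand or with a real PHP parser.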
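Here is one possible shape for "the ability to localize" from a library author's side, sketched in Python for illustration. The Escaper class and its charset parameter are hypothetical. The deploying application picks the charset once, and every escape call then uses it explicitly, instead of relying on a hardcoded value or a hidden process-wide setting.

Python's html.escape works on decoded text, so the sketch first decodes the raw bytes in the configured charset. Rejecting invalid input with UnicodeDecodeError is a choice of this sketch and does not reproduce PHP's exact error handling.

```python
import codecs
import html


class Escaper:
    """Hypothetical HTML escaper whose charset is chosen by the deployer.

    The charset is stored on the instance and used by every call, so it is
    neither hardcoded in library code nor read from a global setting.
    """

    def __init__(self, charset: str = "utf-8") -> None:
        # Fail early if the charset name is unknown (raises LookupError).
        codecs.lookup(charset)
        self.charset = charset

    def escape(self, data: bytes) -> bytes:
        # Interpret the raw bytes in the configured charset; bytes that are
        # invalid in that charset raise UnicodeDecodeError.
        text = data.decode(self.charset)
        # Replace &, <, >, " and ' with entities, then re-encode.
        return html.escape(text, quote=True).encode(self.charset)
```

A GB2312 site would construct `Escaper("gb2312")` while a UTF-8 site uses the default, and the call sites stay the same in both deployments.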