Newsgroups: php.internals
Date: Thu, 27 Sep 2007 21:32:09 +0200
From: drysler@gmx.de (drysler)
To: internals@lists.php.net, troelskn@gmail.com
Subject: Re: [PHP-DEV] how php knows the charset of my code?
Message-ID: <46FC0539.5030107@gmx.de>
In-Reply-To: <98b8086f0709271137w32a88103sa3c3b810970e150e@mail.gmail.com>

> On 9/27/07, drysler wrote:
>> Hello,
>>
>> i am practising with charsets at the moment and so i thought:
>>
>> -> How does PHP know the charset i use in my source-code?
>> -> Are php-sources limited to specific charsets?
>> -> In which areas you have to be aware of the source-code-charset?
>>
>>
>> Perhaps somebody here on the list can tell something about these issues?
>> Thanks!
>>
> Unless I'm mistaken, PHP expects the source files to be in the
> internal charset, which is ISO-8859-1. If you use the mbstring
> extension, you can use different internal encodings.
> See:
> http://www.php.net/mbstring
>
> Another good read on charset vs. PHP is:
> http://www.phpwact.org/php/i18n/charsets?s=utf
>
> --
> troels

I think the problem can be divided into two areas:

1) handling the charset of data (e.g. regex or string functions)
------------------------------------------------------------------------
No unsolvable problem here. You have to know (and/or validate) the
charset of the data you process, no matter whether it was typed into
the source code or loaded from another data source. There are "tools
and workarounds" available to do things right.

2) paying attention to the charset of the source code
------------------------------------------------------------------------
This is the main issue I wanted to address with my posting.

I asked myself whether there could be characters in my source code
that PHP might not recognize because of the charset the source file
was saved in.

Or perhaps PHP only "allows" characters that are represented the same
way in all supported charsets, in which case there might be a list of
charsets you can safely use when writing PHP scripts.

I mean, is there a difference (in bytes?) between writing the following
in ISO-8859-1 and in UTF-8?

    public function foo($bar = true) {
        return self::SOME_CONSTANT;
    }

And if there is a difference, how does PHP know what I typed?

So many questions .... :)

--
Greetings,
drysler
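Regarding the byte question in point 2 (and the validation mentioned in point 1), here is a minimal sketch. It uses Python 3's built-in "iso-8859-1" and "utf-8" codecs as a stand-in for the bytes an editor writes when saving a file in either charset; it says nothing about how PHP itself parses those bytes. The helper names same_bytes and is_valid_utf8 are hypothetical, chosen for this illustration only.

```python
# Sketch: compare how the same text is stored as ISO-8859-1 vs. UTF-8.
# Assumption: Python's codecs approximate what an editor writes to disk
# when a PHP file is saved in either charset.

SNIPPET = "public function foo($bar = true) {\n    return self::SOME_CONSTANT;\n}\n"


def same_bytes(text: str) -> bool:
    """Return True if text encodes to identical bytes in ISO-8859-1 and UTF-8.

    The text must be representable in ISO-8859-1; characters outside it
    (e.g. the euro sign) raise UnicodeEncodeError.
    """
    return text.encode("iso-8859-1") == text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (one way to validate input)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(same_bytes(SNIPPET))       # ASCII-only text: identical bytes
    print("ä".encode("iso-8859-1"))  # one byte in ISO-8859-1
    print("ä".encode("utf-8"))       # two bytes in UTF-8
    # An ISO-8859-1 "ä" on its own is not a valid UTF-8 sequence.
    print(is_valid_utf8("ä".encode("iso-8859-1")))
```

Because ISO-8859-1 and UTF-8 both encode ASCII characters as the same single bytes, a snippet made only of ASCII (like the method above) is byte-for-byte identical in either charset. The two encodings only diverge for non-ASCII characters such as "ä" (0xE4 vs. 0xC3 0xA4), which is where knowing or validating the charset starts to matter.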