Newsgroups: php.internals
From: andrei@gravitonic.com (Andrei Zmievski)
To: Tomas Kuliavas
Cc: internals@lists.php.net
Date: Mon, 21 May 2007 12:46:55 -0700
Subject: Re: [PHP-DEV] PHP Unicode extension in PHP6
In-Reply-To: <48916.88.118.163.159.1179771974.squirrel@avilys.eik.lt>

> They are not documented and I am testing configurations that might break
> scripts. If I test things and want to make code portable, configuration is
> not supposed to be rational. I can set option with ini_set(), if I
> understand what option does and it fixes the issue.
>
> http://www.php.net/unicode
>
> Do you have updated documentation version which explains encoding settings
> and lists available configuration values? Or am I testing PHP6 too early
> and you are still months or years away from 6.0.0 betas and rcs? Could you
> implement pseudo encoding similar to 'pass' encoding used in mbstring?
> Current implementation does not give controls needed by script writers.

Have you looked at any of the talks I've given on this topic?

http://www.gravitonic.com/talks

That's the closest thing to documentation you'll find right now.
Unfortunately, documentation always lags behind the actual development.

> SquirrelMail scripts are not written in unicode. They are in ascii. If
> some 8bit value is used, it is always written in octal or hex notation.
> These hex values are not written in one character set. In some cases
> scripts use byte values. For example, locating first utf-8 byte or looking
> for 0x80-0xFF bytes in string. In other cases they are written in source
> or target character set. For example, iso-8859-2 decoding function
> contains array with iso-8859-2 hex values mapped to html codes. Code can't
> use raw 8bit strings, because they might be corrupted in misconfigured
> editor used by developer and it is very hard to track such corruption.
> 8bit data can come only from user input (composed emails and preferences,
> html forms, one common charset) and imap server (received emails, lots of
> different charsets and encodings).

Maybe you don't need to set unicode.semantics=on if you are working only
with 8-bit strings.

-Andrei
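
For readers following the thread, here is a rough sketch of the byte-level handling the quoted message describes: scanning a raw 8-bit string for bytes in the 0x80-0xFF range, and decoding ISO-8859-2 through a table keyed by hex byte values. It is written in Python purely for illustration. The function names and the four-entry table are hypothetical and are not taken from SquirrelMail.

```python
# Illustrative sketch only; names below are hypothetical, not SquirrelMail code.

def first_high_byte(data: bytes) -> int:
    """Return the offset of the first byte in 0x80-0xFF, or -1 if all bytes are ASCII."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset
    return -1


# Partial ISO-8859-2 table. Keys are written as hex byte values so the
# source file itself stays pure ASCII; values are HTML numeric entities.
ISO_8859_2_TO_HTML = {
    0xA1: "&#260;",  # U+0104 LATIN CAPITAL LETTER A WITH OGONEK
    0xA3: "&#321;",  # U+0141 LATIN CAPITAL LETTER L WITH STROKE
    0xB1: "&#261;",  # U+0105 LATIN SMALL LETTER A WITH OGONEK
    0xB3: "&#322;",  # U+0142 LATIN SMALL LETTER L WITH STROKE
}


def iso_8859_2_to_html(data: bytes) -> str:
    """Pass ASCII bytes through unchanged; map high bytes via the table.

    Bytes missing from this partial table are replaced with '?'.
    """
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append(ISO_8859_2_TO_HTML.get(byte, "?"))
    return "".join(out)
```

Because every 8-bit value appears only as a hex escape or integer literal, an editor with the wrong encoding setting cannot corrupt the source, which is the concern raised in the quoted text.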