Newsgroups: php.internals
Message-ID: <47923.78.61.224.253.1182351744.squirrel@avilys.eik.lt>
Date: Wed, 20 Jun 2007 18:02:24 +0300 (EEST)
To: LAUPRETRE François
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] RE : [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?
From: tokul@users.sourceforge.net ("Tomas Kuliavas")

>> We are working on different code. You have code with some specific
>> character set and you can control all strings.
>
> Tomas, stop arguing on this. As a library maintainer, I agree with you and
> I don't understand where the 'killer feature' is (I heard that Yahoo China
> asked for it, or is it because Zend is established in Israel, I don't
> know...), but, now, if people don't switch to PHP 6 (and I am sure they
> won't), it will be your fault, because of your supposed FUD ;)

----
/**
 * @param string $string utf8 string
 * @return string html encoded string
 */
function test_convert_utf8ToHtml($string) {
    // removed 0xE0-0xFD decoding

    // decode two byte utf8 characters
    $string = preg_replace("/([\300-\337])([\200-\277])/e",
        "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
        $string);

    // remove broken utf8
    $string = preg_replace("/[\200-\237]|\240|[\241-\377]/",'?',$string);

    return $string;
}

// \u0105\u30A1
$string = 'ąァ';
// expected result '&#261;???' or '&#261;&#12449;'
echo test_convert_utf8ToHtml($string);
----

Please show how to do this in PHP 6 with unicode.semantics=on, without mbstring, recode, or other character set conversion extensions and without the htmlentities() function: only core functions and the pcre extension. Then make the updated function compatible with PHP 5.2.0.

test_convert_utf8ToHtml() is based on code from a modular library.
I know that I can split it into PHP 5 and PHP 6 code, but I can find functions that are not modularized and can't be replaced with unicode_encode(). For example, MIME Q encoding or 8-bit string detection.

--
Tomas
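
For readers following the byte-level operations the message refers to, here is a minimal sketch of two of them: the two-byte UTF-8 arithmetic used in test_convert_utf8ToHtml() and a simple 8-bit string check. It assumes plain byte strings (PHP 5 style semantics). The function names sketch_two_byte_codepoint() and sketch_is_8bit() are hypothetical and do not come from the original message or from any library. It is an illustration only, not an answer to the PHP 6 question posed above.

```php
<?php
// Hypothetical helpers for illustration; names are assumptions, not an existing API.

/**
 * Decode one two-byte UTF-8 sequence (110xxxxx 10yyyyyy) to its code point,
 * using the same arithmetic as test_convert_utf8ToHtml().
 *
 * @param string $lead  lead byte, 0xC0-0xDF
 * @param string $trail continuation byte, 0x80-0xBF
 * @return int code point
 */
function sketch_two_byte_codepoint($lead, $trail) {
    // Strip the 110 prefix (subtract 192), shift left by 6 bits (multiply by 64),
    // then add the low 6 bits of the continuation byte (subtract 128).
    return (ord($lead) - 192) * 64 + (ord($trail) - 128);
}

/**
 * Report whether a byte string contains any byte with the high bit set.
 * Assumes $string is treated as raw bytes (no /u modifier on the pattern).
 *
 * @param string $string byte string
 * @return bool
 */
function sketch_is_8bit($string) {
    return preg_match('/[\x80-\xFF]/', $string) === 1;
}

// U+0105 is encoded in UTF-8 as the bytes 0xC4 0x85.
echo sketch_two_byte_codepoint("\xC4", "\x85"), "\n"; // 261
var_dump(sketch_is_8bit("plain ascii"));              // bool(false)
var_dump(sketch_is_8bit("\xC4\x85"));                 // bool(true)
```

Both helpers depend on strings being sequences of bytes, which is the behaviour the message is concerned about keeping when unicode.semantics is on.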