Newsgroups: php.internals
From: andrei@gravitonic.com (Andrei Zmievski)
To: Antony Dovgal
Cc: php-dev, Dmitry Stogov
Date: Fri, 9 Sep 2005 06:44:38 -0700
Subject: Re: [PHP-DEV] unserialize() & unicode issues

Yes, serialization is a problem. I would actually advocate putting a
marker in the serialized file that indicates what the value of the
unicode_semantics switch was during serialization, and if the value
is different during deserialization, refuse to load it or start a new
session. One really should not be changing that switch on a whim
between sessions.

-Andrei

On Sep 9, 2005, at 2:49 AM, Antony Dovgal wrote:

> Hello all.
>
> I'm currently working on unicode support in serialize()/unserialize()
> and am stuck on some issues.
> Here they are:
>
> 1) What should happen when unserializing serialized unicode strings
> while unicode_semantics is Off?
> I presume it's safe to create & return IS_UNICODE in this case?
>
> 2) Class names are serialized without a U: or s: prefix, but I can
> detect a unicode string by its leading "\".
> It looks kind of tricky, but on the other hand a backslash can't
> appear there if the name is not unicode.
> Or should I change it to use U:/s: prefixes? (I didn't try it yet, so
> I can't say how difficult it would be.)
>
> The other problem here is that we can't use unicode class names
> when unicode_semantics is Off, because in this case class_table
> stores them as IS_STRING and we won't be able to find the class entry
> by its unicode name (thanks to Val for noticing this).
>
> 3) Currently serialize() produces valid \u0000 sequences, which can
> be parsed/restored perfectly fine when reading them from a file or
> when they are returned from serialize().
> But specifying them as a constant string won't work, as these
> sequences get parsed at compile time.
>
> Short example:
> <?php
> var_dump(unserialize('U:2:"\u0061\u0061";')); // won't work
> var_dump(unserialize(serialize("aa"))); // works
> var_dump('U:2:"\u0061\u0061";'); // produces unicode(9) "U:2:"aa";"
> ?>
>
> IMO the best way here is to change serialize() output to produce
> something else (for example \pu0000 instead of \u0000) - in this
> case it works just fine.
>
> Comments?
>
> --
> Wbr, Antony Dovgal
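
For illustration, the marker idea from the reply at the top of this message could look roughly like the sketch below. It is not taken from any patch: the "US:<0|1>;" header and both function names are made up here, and the switch value is passed in as a plain boolean, because the sketch assumes a PHP build without a unicode_semantics setting to read.

```php
<?php
// Hypothetical sketch: the "US:<0|1>;" header and both function names are
// assumptions for illustration only; they are not part of PHP.

function serialize_with_marker($value, bool $unicodeSemantics): string
{
    // Prepend a marker recording the switch value in effect at write time.
    return 'US:' . ($unicodeSemantics ? '1' : '0') . ';' . serialize($value);
}

function unserialize_with_marker(string $data, bool $unicodeSemantics)
{
    // Expect exactly "US:0;" or "US:1;" at the start of the payload.
    if (!preg_match('/\AUS:([01]);/', $data, $m)) {
        throw new UnexpectedValueException('missing unicode_semantics marker');
    }
    // Refuse to load data that was written under a different switch value.
    if (($m[1] === '1') !== $unicodeSemantics) {
        throw new UnexpectedValueException('unicode_semantics mismatch');
    }
    // The marker is always 5 bytes long, so the real payload starts at offset 5.
    return unserialize(substr($data, 5));
}

$blob = serialize_with_marker(['name' => 'aa'], true);
var_dump(unserialize_with_marker($blob, true));
```

A session handler built on something like this could catch the exception and start a fresh session instead of failing outright, which is the second option mentioned in the reply.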