Newsgroups: php.internals
Message-ID: <6.2.3.4.2.20050912175136.04449320@localhost>
Date: Mon, 12 Sep 2005 17:54:28 -0700
From: andi@zend.com (Andi Gutmans)
To: Andrei Zmievski, Antony Dovgal
Cc: php-dev, Dmitry Stogov
Subject: Re: [PHP-DEV] unserialize() & unicode issues
In-Reply-To: <9CF57DC5-A18B-4264-B20B-8552B0BB66F1@gravitonic.com>
References: <43215A91.8050409@zend.com> <9CF57DC5-A18B-4264-B20B-8552B0BB66F1@gravitonic.com>

Not coming with a solution, but I believe this would be a bad idea. I do think some people will be using IS_UNICODE strings when unicode_semantics=off, mainly for existing applications. They may want to serialize Unicode strings even though their classes are IS_STRING. It might make sense to raise an error though if a "class" is used, but if it's just a value or a hash key, then those are valid in unicode_semantics=off.

Andi

At 06:44 AM 9/9/2005, Andrei Zmievski wrote:
>Yes, serialization is a problem. I would actually advocate putting a
>marker in the serialized file that indicates what the value of the
>unicode_semantics switch was during the serialization, and if the
>value is different during deserialization, refuse to load it or start
>a new session. One really should not be changing that switch on a
>whim in-between sessions.
>
>-Andrei
>
>
>On Sep 9, 2005, at 2:49 AM, Antony Dovgal wrote:
>
>>Hello all.
>>
>>I'm currently working on unicode support in serialize()/unserialize()
>>and am stuck on some issues.
>>Here they are:
>>
>>1) What to do with unserializing serialized unicode strings when
>>unicode_semantics is Off?
>>I presume it's safe to create & return IS_UNICODE in this case?
>>
>>2) Classnames are serialized without U: or s: prefix, but I can
>>detect a unicode string by its leading "\".
>>It looks kinda tricky, but on the other hand a backslash can't
>>appear there if it's not unicode.
>>Or should I change it to use U:/s: prefixes? (Didn't try it yet, so
>>I can't say how difficult it would be).
>>
>>The other problem here is that we can't use unicode class names
>>when unicode_semantics is Off because in this case class_table
>>stores them as IS_STRING and we won't be able to find the class entry
>>by its unicode name (thanks to Val for noticing this).
>>
>>3) Currently serialize() produces valid \u0000 sequences, which can
>>be parsed/restored perfectly fine when reading them from a file or
>>returning from serialize().
>>But specifying them as a const string won't work as these sequences
>>get parsed at compile time.
>>
>>Short example:
>><?php
>>var_dump(unserialize('U:2:"\u0061\u0061";')); // won't work
>>var_dump(unserialize(serialize("aa"))); // works
>>var_dump('U:2:"\u0061\u0061";'); // produces unicode(9) "U:2:"aa";"
>>?>
>>IMO the best way here is to change serialize() output to produce
>>something else (for example \pu0000 instead of \u0000) - in this
>>case it works just fine.
>>
>>Comments?
>>
>>--
>>Wbr, Antony Dovgal
>>
>>--
>>PHP Internals - PHP Runtime Development Mailing List
>>To unsubscribe, visit: http://www.php.net/unsub.php
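
For illustration only, here is a minimal sketch of the marker idea Andrei describes in the quoted text, written against released PHP. The unicode_semantics switch is not available in released PHP versions, so the mode is passed in as a plain boolean. The helper names serialize_with_mode() and unserialize_with_mode() and the one-character marker format are assumptions made up for this example, not anything proposed in the thread.

```php
<?php
// Hypothetical helpers, not part of PHP. The "mode" argument stands in for
// the unicode_semantics switch discussed in the thread.

function serialize_with_mode($value, bool $unicodeSemantics): string
{
    // Prefix the payload with a one-character mode marker and a newline.
    return ($unicodeSemantics ? '1' : '0') . "\n" . serialize($value);
}

function unserialize_with_mode(string $data, bool $unicodeSemantics)
{
    // The marker must be exactly one character followed by a newline.
    if (strpos($data, "\n") !== 1 || ($data[0] !== '0' && $data[0] !== '1')) {
        throw new InvalidArgumentException('missing or malformed mode marker');
    }

    // Refuse to load data written under a different mode.
    if (($data[0] === '1') !== $unicodeSemantics) {
        throw new RuntimeException('serialized under a different mode');
    }

    // Strip the marker line and hand the rest to the normal unserializer.
    return unserialize(substr($data, 2));
}

// Usage: round-trip under the same mode.
$blob = serialize_with_mode(['name' => 'aa'], false);
var_dump(unserialize_with_mode($blob, false));
// Calling unserialize_with_mode($blob, true) would throw RuntimeException.
?>
```

This sketch rejects every mismatched payload, which is the strict behaviour Andi objects to for plain values and hash keys. A variant closer to his suggestion would let the payload through and raise an error only when a class name is involved.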