Newsgroups: php.internals
From: cshmoove@bellsouth.net
To: internals@lists.php.net
Date: Sun, 14 Aug 2005 14:00:05 -0700
Subject: Re: [PHP-DEV] PHP 6.0 Wishlist

Andrei,

It was controlled by an ini setting. Certain APIs take or return offsets, so
translation was done in those instances depending on the setting. Here's an
example (it's not currently implemented this way, though). Since my concern
was only the extension, I didn't touch the engine itself. Pardon the
formatting.

/* {{{ proto long BreakIterator::next([long offset]) */
static
ZEND_BEGIN_ARG_INFO_EX(arginfo_breakiterator_next, 0, 0, 0)
    ZEND_ARG_INFO(0, offset)
ZEND_END_ARG_INFO();

BREAKITERATOR_METHOD(next)
{
    php_breakiterator_object *obj = (php_breakiterator_object *)zend_object_store_get_object(getThis() TSRMLS_CC);
    BreakIterator *iter = (BreakIterator *)obj->ptr;
    UnicodeString *text = obj->text;
    long offset;

    if (0 == ZEND_NUM_ARGS()) {
        offset = (long)iter->next();
    } else {
        long start = 0;

        if (FAILURE == zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "l", &start)) {
            return;
        }
        if (ICUG(codepoint_semantics)) {
            /* caller passed a code-point index: translate to a code-unit offset for ICU */
            FROM_CODEPOINT_INDEX(text->getBuffer(), text->length(), start, offset);
            offset = (long)iter->next(offset);
        } else {
            offset = (long)iter->next(start);
        }
    }

    if (ICUG(codepoint_semantics)) {
        /* ICU returned a code-unit offset: translate back to a code-point index */
        long result;

        TO_CODEPOINT_INDEX(text->getBuffer(), text->length(), offset, result);
        RETURN_LONG(result);
    } else {
        RETURN_LONG(offset);
    }
}
/* }}} */

clayton

"Andrei Zmievski" wrote in message
news:F269F06B-C34A-4BE0-A486-9C0AAC9CA2DF@gravitonic.com...
> And this was controlled how and from where?
>
> -Andrei
>
>
> On Aug 14, 2005, at 12:29 PM, wrote:
>
>> Back in the early days of the extension, i had a request global
>> ICUG(codepoint_semantics) which controlled this. Setting this to false
>> would revert to code-unit indexing (which ICU does internally).
>>
>> clayton
>>
>> "Andrei Zmievski" wrote in message
>> news:133A42C8-184B-49F3-8EEA-3B595B4BD062@gravitonic.com...
>>
>>> Then why don't we put our collective brains together and think of a
>>> solution for this that does not involve hacks?
>>>
>>> -Andrei
>>>
>>> On Aug 14, 2005, at 3:51 AM, Derick Rethans wrote:
>>>
>>>> In quite some cases for me i'm sure there are no surrogates in the
>>>> text I'm parsing. Having to deal with rescanning the string for every
>>>> access to a character is not really wanted.
>>>>
>>>> Derick
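
The FROM_CODEPOINT_INDEX and TO_CODEPOINT_INDEX macros used in the example are not shown in this message. What follows is only a minimal sketch of the kind of translation they appear to perform, assuming the text is UTF-16 and that a surrogate pair counts as one code point. The function names, signatures, and clamping behavior are hypothetical and are not taken from the extension; ICU's own utf16.h macros (for example U16_FWD_N) cover the same kind of walk.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers: names and signatures are assumptions made for
// illustration, not the extension's actual macros.

static bool is_lead_surrogate(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static bool is_trail_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Number of UTF-16 code units occupied by the code point starting at `unit`.
static std::size_t units_at(const std::u16string &text, std::size_t unit)
{
    const bool pair = is_lead_surrogate(text[unit]) &&
                      unit + 1 < text.size() &&
                      is_trail_surrogate(text[unit + 1]);
    return pair ? 2 : 1;
}

// Code-point index -> UTF-16 code-unit offset.
// Indexes past the end are clamped to text.size().
std::size_t from_codepoint_index(const std::u16string &text, std::size_t cp_index)
{
    std::size_t unit = 0;
    while (cp_index > 0 && unit < text.size()) {
        unit += units_at(text, unit);
        --cp_index;
    }
    return unit;
}

// UTF-16 code-unit offset -> code-point index.
// An offset that lands inside a surrogate pair counts the whole pair.
std::size_t to_codepoint_index(const std::u16string &text, std::size_t unit_offset)
{
    if (unit_offset > text.size()) {
        unit_offset = text.size();
    }
    std::size_t count = 0;
    for (std::size_t unit = 0; unit < unit_offset; unit += units_at(text, unit)) {
        ++count;
    }
    return count;
}
```

Both helpers walk the string from the start, so each translation is linear in the offset. That is the rescanning cost Derick objects to above, and it is what the code-unit branch of the example avoids when codepoint_semantics is off.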