Newsgroups: php.internals
Message-ID: <200705231541.l4NFf2dL011054@icosrv.icosaedro.it>
Date: Wed, 23 May 2007 17:40:57 CEST
From: salsi@icosaedro.it (Umberto Salsi)
Subject: mbstring: missing support for hex numeric entities &#xHHHH;

mbstring does not support hexadecimal numeric entities in HTML code. For example:

    echo urlencode( mb_convert_encoding("&#x0415;", "UTF-8", "HTML-ENTITIES") );

displays %F2%AF%B8%9F rather than the expected %D0%95.
This bug was detected by Nick Wedd and reported in the newsgroup
comp.lang.php, Message-ID: .

I found the bug in the file ext/mbstring/libmbfl/filters/mbfilter_htmlent.c
and added these features:

- decode hex entities &#xHHHH;
- detect invalid digits
- detect missing digits
- detect values out of the range 0-0xffff

Invalid values are returned verbatim.

Apparently the right place for this patch should be
http://cvs.sourceforge.jp/cgi-bin/viewcvs.cgi/php-i18n/
but currently the project is no longer hosted there.

The patch for ext/mbstring/libmbfl/filters/mbfilter_htmlent.c follows:

173a174,217
> static int mbfl_decode_numeric_entity(char *s, int s_len)
> /*
>     s = numeric entity "ddd" or "xhhhh"
>     return: numeric value or -1 if not inside [0,0xffff] or invalid digits
> */
> {
>     int ent, pos, c, d;
>
>     ent = 0;
>
>     if (*s == 'x' || *s == 'X') {
>         /* hexadecimal base */
>         if ( s_len < 2 )
>             return -1; /* no digits found */
>         for (pos=1; pos<s_len; pos++) {
>             c = s[pos];
>             if (isdigit(c))
>                 d = c - '0';
>             else if (isxdigit(c))
>                 d = tolower(c) - 'a' + 10;
>             else
>                 return -1; /* invalid hex digit */
>             ent = (ent << 4) + d;
>             if (ent > 0xffff)
>                 return -1; /* too big */
>         }
>
>     } else {
>         /* decimal base */
>         if ( s_len < 1 )
>             return -1; /* no digits found */
>         for (pos=0; pos<s_len; pos++) {
>             c = s[pos];
>             if (! isdigit(c) )
>                 return -1; /* invalid dec char */
>             ent = ent*10 + (c - '0');
>             if (ent > 0xffff)
>                 return -1; /* too big */
>         }
>     }
>
>     return ent;
> }
>
192,193c236,246
<                 for (pos=2; pos<filter->status; pos++) {
<                     ent = ent*10 + (buffer[pos] - '0');
---
>                 ent = mbfl_decode_numeric_entity(&buffer[2], filter->status - 2);
>                 if( ent >= 0 ){
>                     CK((*filter->output_function)(ent, filter->data));
>                     filter->status = 0;
>                     /*php_error_docref("ref.mbstring" TSRMLS_CC, E_NOTICE, "mbstring decoded '%s'=%d", buffer, ent);*/
>                 } else {
>                     /* failure */
>                     buffer[filter->status++] = ';';
>                     buffer[filter->status] = 0;
>                     /* php_error_docref("ref.mbstring" TSRMLS_CC, E_WARNING, "mbstring cannot decode '%s'", buffer); */
>                     mbfl_filt_conv_html_dec_flush(filter);
195,197d247
<                 CK((*filter->output_function)(ent, filter->data));
<                 filter->status = 0;
<                 /*php_error_docref("ref.mbstring" TSRMLS_CC, E_NOTICE, "mbstring decoded '%s'=%d", buffer, ent);*/

Best regards,
 ___
/_|_\  Umberto Salsi
\/_\/  www.icosaedro.it