Newsgroups: php.internals
From: daverandom@php.net (Chris Wright)
To: Aleksey Tulinov
Cc: Chris Wright, PHP Internals
Date: Tue, 14 Oct 2014 16:27:36 +0100
Subject: Re: [PHP-DEV] Unicode support
In-Reply-To: <543D3CA7.3060704@gmail.com>

On 14 October 2014 16:09, Aleksey Tulinov wrote:
> On 14/10/14 14:00, Chris Wright wrote:
>
> Chris,
>
>>> Latter is referring to difficulties like "excess memory usage" and
>>> "rewrite the language". I'm developing an open-source Unicode
>>> implementation library (nunicode), and it doesn't consume any heap at
>>> all, it also works on native binary strings, as PHP does. Hence I think
>>> that maybe it could help with at least these two problems.
>>
>> On the face of it, this implies a rather large performance hit and a
>> tendency to overflow the stack much more readily, do you have any
>> details on these elements?
>
> I can't really tell if the hit is going to be large before understanding
> what the final result would be, at least approximately.
>
> I can tell that internal complexity of nunicode is O(1) everywhere. I'm
> comparing performance to ICU and nunicode mostly outperforms it. I've
> compiled some numbers here:
> https://bitbucket.org/alekseyt/nunicode#markdown-header-performance-considerations

Great, thanks for this

> Regarding stack, I'm not sure if I get the point. As far as I'm concerned,
> the library does not have recursive calls, it does not have an internal
> representation and does not allocate on stack aggressively. Everything
> works on immutable binary strings, stack will be used mostly for function
> calls.
>
> But honestly, I feel like I'm not answering your question at all. Could you
> possibly clarify it?

My apologies, this was a case of typing before thinking properly.
I was envisaging very large stack frames due to large char arrays being
allocated on the stack, but when I actually apply my brain to what you are
doing I realise that this isn't going to be the case. Carry on.

>>> I would appreciate if someone would point me to a good read or explain
>>> collective opinion on this topic. I'm basically interested in the
>>> following questions:
>>
>> The only additional thing I can find quickly is something Pierre put
>> together earlier this year, when PHP6 (now 7) discussions were
>> started:
>> https://wiki.php.net/ideas/php6/unicode
>
> Thank you, this is exactly what I was looking for.
>
> I would appreciate if someone would comment on the following:
>
>> Some of the key points we need to take care of are:
>>
>> 1) UTF-8 storage
>> 2) UTF-8 support for almost all (if not all) existing string APIs
>> 3) Performance
>>
>> As of today, I did not find any library covering at least two of these key
>> points.
>
> I think I could claim that nunicode is covering at least two key points,
> maybe all of them, but I'm not sure about point 2). The API does include
> operations on strings, but it simply follows the standard string
> functions (UTF equivalents of strcoll(), strchr(), strstr(), etc). Does that
> sound good or not?
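
For anyone following along, here is a rough sketch of what a "UTF equivalent of strchr()" working directly on an immutable byte string could look like. This is not nunicode's API: the names utf8_next(), utf8_strchr() and utf8_strlen() are made up for illustration. It assumes NUL-terminated UTF-8 input, maps malformed bytes to U+FFFD, and does not reject overlong encodings or surrogates. The point is only that decoding in place needs no heap and just a few words of stack per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the code point starting at s into *cp and
 * return a pointer to the next one. Malformed input yields U+FFFD.
 * Never reads past the terminating NUL. */
const char *utf8_next(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    int extra; /* number of continuation bytes expected */

    if (p[0] < 0x80)                { *cp = p[0];        extra = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { *cp = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { *cp = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { *cp = p[0] & 0x07; extra = 3; }
    else { *cp = 0xFFFD; return s + 1; } /* stray continuation or bad lead */

    for (int i = 1; i <= extra; i++) {
        if ((p[i] & 0xC0) != 0x80) { /* truncated sequence (or NUL) */
            *cp = 0xFFFD;
            return s + i;
        }
        *cp = (*cp << 6) | (uint32_t)(p[i] & 0x3F);
    }
    return s + extra + 1;
}

/* Hypothetical strchr() analogue: find the first occurrence of code point
 * `needle` in s, or NULL. The input is only read, never copied. */
const char *utf8_strchr(const char *s, uint32_t needle)
{
    assert(s != NULL);
    while (*s != '\0') {
        uint32_t cp;
        const char *next = utf8_next(s, &cp);
        if (cp == needle)
            return s;
        s = next;
    }
    return NULL;
}

/* Hypothetical strlen() analogue: length in code points, not bytes. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != '\0') {
        uint32_t cp;
        s = utf8_next(s, &cp);
        n++;
    }
    return n;
}
```

The stack use here is a couple of pointers and one uint32_t per call, with no recursion and no buffers, which matches the description above of working on immutable binary strings. Whether a real library can keep that property for collation or case mapping is a separate question that this sketch does not address.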