Newsgroups: php.internals
Message-ID: <543D3CA7.3060704@gmail.com>
Date: Tue, 14 Oct 2014 18:09:27 +0300
From: aleksey.tulinov@gmail.com (Aleksey Tulinov)
To: Chris Wright
CC: PHP Internals
Subject: Re: [PHP-DEV] Unicode support

Chris,

On 14/10/14 14:00, Chris Wright wrote:
>> The latter refers to difficulties like "excess memory usage" and "rewrite
>> the language". I'm developing an open-source Unicode implementation library
>> (nunicode), and it doesn't consume any heap at all; it also works on native
>> binary strings, as PHP does. Hence I think it could help with at
>> least these two problems.
>
> On the face of it, this implies a rather large performance hit and a
> tendency to overflow the stack much more readily, do you have any
> details on these elements?

I can't really tell whether the hit is going to be large before understanding what the final result would be, at least approximately. What I can tell is that the internal complexity of nunicode is O(1) everywhere. In my comparisons with ICU, nunicode mostly outperforms it. I've compiled some numbers here:
https://bitbucket.org/alekseyt/nunicode#markdown-header-performance-considerations

Regarding the stack, I'm not sure I get the point. As far as I'm aware, the library has no recursive calls, does not use an internal representation, and does not allocate aggressively on the stack. Everything works on immutable binary strings, so the stack is used mostly for function calls.

But honestly, I feel like I'm not answering your question at all. Could you clarify it?

>> I would appreciate it if someone would point me to a good read or explain
>> the collective opinion on this topic.
>> I'm basically interested in the following questions:
>
> The only additional thing I can find quickly is something Pierre put
> together earlier this year, when PHP6 (now 7) discussions were
> started:
> https://wiki.php.net/ideas/php6/unicode

Thank you, this is exactly what I was looking for.

I would appreciate it if someone would comment on the following:

> Some of the key points we need to take care of are:
>
> 1) UTF-8 storage
> 2) UTF-8 support for almost (if not all) existing string APIs
> 3) Performance
>
> As of today, I did not find any library covering at least two of these key points.

I think I could claim that nunicode covers at least two of these key points, maybe all of them, but I'm not sure about point 2). The API does include operations on strings, but it simply follows the standard string functions (UTF equivalents of strcoll(), strchr(), strstr(), etc.). Does that sound good or not?
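A minimal sketch of what a "UTF equivalent of strchr()" working directly on a binary string could look like, to make point 2) concrete. This is not nunicode's API: utf8_decode_one() and utf8_strchr() are hypothetical names, and the decoder deliberately skips overlong-form and surrogate checks. It only illustrates the two properties mentioned in the thread: the string is passed as pointer plus length (binary-safe, like PHP strings), and nothing is copied or allocated on the heap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of nunicode: decode one UTF-8 sequence
 * starting at p (the caller guarantees p < end). Stores the code point in
 * *cp and returns a pointer to the next sequence, or NULL if the bytes are
 * malformed or truncated. Reads the caller's buffer in place: no heap, no
 * copies. For brevity, overlong forms and surrogates are NOT rejected. */
const char *utf8_decode_one(const char *p, const char *end, uint32_t *cp)
{
    const unsigned char *s = (const unsigned char *)p;
    size_t len;
    uint32_t c;

    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return NULL;                 /* continuation or invalid lead byte */

    if ((size_t)(end - p) < len)
        return NULL;                  /* sequence runs past the end */

    for (size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return NULL;              /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return p + len;
}

/* Hypothetical UTF-8 counterpart of strchr(): return a pointer to the first
 * occurrence of code point `needle` in the len-byte string s, or NULL if it
 * is absent or malformed input is reached first. The length is explicit, so
 * embedded NUL bytes are treated like any other character (binary-safe). */
const char *utf8_strchr(const char *s, size_t len, uint32_t needle)
{
    const char *end = s + len;

    for (const char *p = s; p < end; ) {
        uint32_t cp;
        const char *next = utf8_decode_one(p, end, &cp);
        if (next == NULL)
            return NULL;
        if (cp == needle)
            return p;                 /* pointer into the original buffer */
        p = next;
    }
    return NULL;
}
```

For example, with the 12-byte UTF-8 string "naïve café", searching for U+00E9 (é) returns a pointer to byte offset 10 of the unchanged buffer.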