Newsgroups: php.internals
Xref: news.php.net php.internals:104331
Date: Sun, 10 Feb 2019 13:29:20 +0100
To: internals@lists.php.net
Subject: [PHP-DEV] reasonability of change the mbfl library
From: legale.legale@gmail.com (Legale Legage)

Hello, internals!

While working on a new function, mb_str_split (https://wiki.php.net/rfc/mb_str_split), for the mbstring extension, I noticed an opportunity to significantly improve the performance of the mbfl library for the UTF-16 encoding.

Currently, variable-length encodings are in general processed byte by byte:

    for (int i = 0; i < string_length; ++i) {
        .......
    }

UTF-8 strings, however, are processed with a precomputed character-length table:

    while (i < string_length) {
        int m = mbtab[*p];
        i += m;
        .....
    }

The same approach can be used for UTF-16, but the table would be 65536 bytes, compared with 256 bytes for the UTF-8 table. Moreover, two tables would be needed: one for UTF-16 big-endian and one for UTF-16 little-endian.

In my tests, this gives a speedup of more than 2x.

The implementation of the proposed concept is here:
https://github.com/php/php-src/pull/3715/commits/d868059626290b7ba773b957045e08c3efb1d603#diff-22d593ced03b2cb94450d9f9990865c8R38

To do, or not to do: that is the question. What do you think?

Regards,
Ruslan
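
For illustration, the table-driven idea described above could look roughly like the following. This is a minimal sketch, not the code from the pull request: the names utf16be_len, utf16le_len, init_utf16_tables and utf16_count_chars are hypothetical, and it assumes each table is indexed by the first two raw bytes of a character read as (byte0 << 8) | byte1, which is why big-endian and little-endian input need separate tables. It does not validate the input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tables (65536 bytes each), indexed by the first two raw
 * bytes of a character read as (byte0 << 8) | byte1. Each entry is the
 * character's length in bytes: 4 if the code unit is a high (lead)
 * surrogate, 2 otherwise. */
static uint8_t utf16be_len[65536];
static uint8_t utf16le_len[65536];

static void init_utf16_tables(void)
{
    for (uint32_t unit = 0; unit < 65536; ++unit) {
        uint8_t len = (unit >= 0xD800 && unit <= 0xDBFF) ? 4 : 2;
        /* Big-endian: the raw byte pair equals the code unit. */
        utf16be_len[unit] = len;
        /* Little-endian: the raw byte pair is the code unit byte-swapped. */
        utf16le_len[((unit & 0xFF) << 8) | (unit >> 8)] = len;
    }
}

/* Counts characters in a UTF-16 buffer of `size` bytes by jumping from
 * character to character with one table lookup each. No validation is
 * done: a truncated surrogate pair at the end is counted as one
 * character, and a trailing odd byte is ignored. */
static size_t utf16_count_chars(const unsigned char *p, size_t size,
                                const uint8_t *table)
{
    size_t i = 0, count = 0;
    while (i + 1 < size) {
        i += table[(p[i] << 8) | p[i + 1]];
        ++count;
    }
    return count;
}
```

After a single call to init_utf16_tables(), a caller would pass utf16be_len or utf16le_len to utf16_count_chars() depending on the byte order of the input.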