Newsgroups: php.internals
Xref: news.php.net php.internals:104333
To: internals@lists.php.net
Date: Sun, 10 Feb 2019 20:02:28 +0000
Subject: Re: [PHP-DEV] reasonability of change the mbfl library
From: rowan.collins@gmail.com (Rowan Collins)

On 10/02/2019 12:29, Legale Legage wrote:
> This conception can be used for the utf-16 encoding, but table size
> would be 65536 bytes against 256 byte for the utf-8 table.

Rather than two 65 kilobyte lookup tables with most entries identical, would it be reasonable to use a bit mask to check for the range we care about?

I may have this slightly wrong, but something like:

    #define UTF16_LE_CODE_UNIT_IS_HIGH_SURROGATE(code_unit) (((code_unit) & 0xFC00) == 0xD800)
    #define UTF16_BE_CODE_UNIT_IS_HIGH_SURROGATE(code_unit) (((code_unit) & 0x00FC) == 0x00D8)

    m = UTF16_LE_CODE_UNIT_IS_HIGH_SURROGATE(*(uint16_t *)p) ? 4 : 2;

Regards,

--
Rowan Collins
[IMSoP]
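For illustration, here is a minimal sketch of the same bit-mask idea that avoids casting the byte pointer to uint16_t, so it should not depend on host byte order or pointer alignment. The helper names (utf16le_unit, utf16be_unit, is_high_surrogate, utf16le_char_len, utf16be_char_len) are hypothetical and not part of mbfl. It assumes at least two readable bytes at p and does not check that a low surrogate actually follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: build one 16-bit code unit from two bytes. */
static uint16_t utf16le_unit(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));   /* low byte first */
}

static uint16_t utf16be_unit(const unsigned char *p) {
    return (uint16_t)((p[0] << 8) | p[1]);   /* high byte first */
}

/* High (lead) surrogates are 0xD800..0xDBFF: the top six bits are 110110. */
static int is_high_surrogate(uint16_t unit) {
    return (unit & 0xFC00) == 0xD800;
}

/* Byte length of the character starting at p: 4 if it begins with a
 * high surrogate, otherwise 2. No validation of the trailing unit. */
static size_t utf16le_char_len(const unsigned char *p) {
    assert(p != NULL);
    return is_high_surrogate(utf16le_unit(p)) ? 4 : 2;
}

static size_t utf16be_char_len(const unsigned char *p) {
    assert(p != NULL);
    return is_high_surrogate(utf16be_unit(p)) ? 4 : 2;
}
```

With this shape, a single mask and compare serves both byte orders, since only the way the code unit is assembled differs. Whether the extra shift costs more than a table lookup would need measuring.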