Newsgroups: php.internals
Xref: news.php.net php.internals:19627
Message-ID: <20051016131222.28963.qmail@web31807.mail.mud.yahoo.com>
Date: Sun, 16 Oct 2005 06:12:22 -0700 (PDT)
From: rollandsantimano@yahoo.com (Rolland Santimano)
To: Andrei Zmievski, Tex Texin
Cc: internals
Subject: Next batch of strings funcns ...

... to be converted to handle Unicode - comments ?

TIA,
Rolland

--

[1] int strrpos(string haystack, string needle [, int offset])
Finds position of last occurrence of 'needle' within 'haystack' - 'offset' determines how much of the string is searched.
To be implemented in terms of u_strFindLast(), which matches codepoints.

[2] int stripos(string haystack, string needle [, int offset])
Case-insensitive version of strpos(), finds position of first occurrence of 'needle' within 'haystack'.
To be implemented in terms of zend_u_memnstr().

[3] int strspn(string str, string mask [, start [, len]])
Returns length of initial segment of 'str' consisting entirely of characters found in 'mask'.

[4] int strcspn(string str, string mask [, start [, len]])
Returns length of initial segment of 'str' consisting entirely of characters NOT found in 'mask'.
Both are currently implemented by stepping through character arrays; Unicode handling will process codepoints. u_strspn() & u_strcspn() are available, but these require input to be NULL-terminated.

[5] string stristr(string haystack, string needle[, bool part])
Case-insensitive version of strstr(), finds first occurrence of a string within another - 'part' determines portion of 'haystack' to be returned.
To be implemented in terms of zend_u_memnstr().

[6] int strripos(string haystack, string needle [, int offset])
Case-insensitive version of strrpos()[1].

[7] mixed str_replace(mixed search, mixed replace, mixed subject [, int &replace_count])
Replaces 'search' with 'replace' in 'subject' - these may be arrays.

[8] mixed str_ireplace(mixed search, mixed replace, mixed subject [, int &replace_count])
Case-insensitive version of str_replace()[7].
Both are currently implemented in terms of zend_memnstr() to locate 'search' within 'subject'; Unicode handling will use zend_u_memnstr() to match at codepoint boundaries.

[9] string strrchr(string haystack, string needle)
Finds the last occurrence of 'needle' character within 'haystack'.
Currently implemented in terms of libc strrchr(); Unicode handling can be implemented in terms of zend_u_memnstr() or simply the U16_PREV() macro. u_strrchr32() is available, but assumes input is NULL-terminated.

[10] string strpbrk(string haystack, string char_list)
Search 'haystack' for any character in 'char_list'.
Currently implemented in terms of libc strpbrk(); Unicode handling to be implemented in terms of U16_FWD() macro. u_strpbrk() is available, but assumes input is NULL-terminated.

[11] string chunk_split(string str [, int chunklen [, string ending]])
"Wraps" 'str' by inserting 'ending' every 'chunklen' number of characters.
For Unicode, 'chunklen' will be treated as number of codepoints.
Q: Handling of base+combining sequences ?

[12] array str_split(string str [, int split_length])
Convert 'str' to an array by splitting it into 'split_length'-sized chunks.
For Unicode, splits will be made at codepoint boundaries ?
Q: Handling of base+combining sequences ?

[13] string strtr(string str, string from, string to)
Translates/replaces characters in 'str'; characters in 'from' are replaced by 'to'.

[14] string quotemeta(string str)
Escapes meta characters ".\+*?[]^$()" with backslash.

[15] string nl2br(string str)
Converts newlines (\n\r) to HTML line breaks (<br />).

[16] array pathinfo(string path)
Breaks 'path' into dir, file & extension, like [17] & [18] below.

[17] string basename(string path [, string suffix])
Returns the filename component of 'path', after removing 'suffix'.

[18] string dirname(string path)
Returns the directory name component of 'path'.

[19] int strnatcmp(string s1, string s2)
Returns the result of string comparison using "natural" algorithm to collate numeric portions.

[20] int strnatcasecmp(string s1, string s2)
Case-insensitive version of strnatcmp()[19].
For handling Unicode, adapt existing code to handle codepoints.

[21] int substr_compare(string main_str, string str, int offset [, int length [, bool case_sensitivity]])
Compares 'main_str' with 'str' from 'offset' up to 'length' characters.
For handling Unicode, adapt existing code to handle codepoints.

[22] string wordwrap(string str [, int width [, string break [, boolean cut]]])
Like chunk_split()[11], "wraps" 'str' by inserting 'break' every 'width' characters.
Q: Handling of base+combining sequences ?

[23] void str_shuffle(string str)
Shuffles 'str', one random permutation of all possible is created.
For Unicode, handle codepoints & base+combining sequences.

[24] mixed str_word_count(string str, [int format [, string charlist]])
Count the number of words in 'str'. Words consist of alphabetic chars, chars in 'charlist', single-quote & hyphen; everything else is a delimiter. 'format' specifies how output is returned.
Existing algorithm can be adapted for Unicode. ICU provides a break iteration API, but current docs do not specify facilities to add custom rules - to be investigated further.
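
For [3] and [4], the idea of stepping through codepoints on input that is not NULL-terminated could be sketched roughly as below. This is only an illustration under assumptions: every name prefixed with sketch_ is made up for this mail (none of it is ICU or Zend API), the input is assumed to be UTF-16 with an explicit length in code units, and the result is assumed to be counted in codepoints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one codepoint from a length-bounded UTF-16
 * buffer and advance *i past it. An unpaired surrogate is returned as-is. */
static uint32_t sketch_u16_next(const uint16_t *s, size_t len, size_t *i)
{
    assert(*i < len);
    uint32_t c = s[(*i)++];
    if (c >= 0xD800 && c <= 0xDBFF && *i < len) {
        uint32_t c2 = s[*i];
        if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
            (*i)++;
            c = 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
        }
    }
    return c;
}

/* Hypothetical helper: 1 if codepoint cp occurs in mask, else 0. */
static int sketch_contains(const uint16_t *mask, size_t mask_len, uint32_t cp)
{
    size_t j = 0;
    while (j < mask_len) {
        if (sketch_u16_next(mask, mask_len, &j) == cp) {
            return 1;
        }
    }
    return 0;
}

/* Count leading codepoints of str whose membership in mask equals `want`
 * (want = 1 gives strspn behaviour, want = 0 gives strcspn behaviour). */
static size_t sketch_span(const uint16_t *str, size_t len,
                          const uint16_t *mask, size_t mask_len, int want)
{
    size_t i = 0, count = 0;
    while (i < len) {
        uint32_t cp = sketch_u16_next(str, len, &i);
        if (sketch_contains(mask, mask_len, cp) != want) {
            break;
        }
        count++;
    }
    return count;
}

size_t sketch_u_strspn(const uint16_t *str, size_t len,
                       const uint16_t *mask, size_t mask_len)
{
    return sketch_span(str, len, mask, mask_len, 1);
}

size_t sketch_u_strcspn(const uint16_t *str, size_t len,
                        const uint16_t *mask, size_t mask_len)
{
    return sketch_span(str, len, mask, mask_len, 0);
}
```

The mask is rescanned for every codepoint of 'str', which is fine for short masks; a real implementation would probably want a set structure for long ones.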
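
On the base+combining question raised under [11], [12] and [22], one possible policy is to count each base character together with its trailing combining marks as a single unit, so a chunk never ends between a base and its mark. The sketch below contrasts that with plain codepoint counting. It is an assumption-laden illustration, not a proposal for the final API: the sketch_ names are hypothetical, input is assumed to be length-bounded UTF-16, and the combining test is deliberately simplified to the U+0300..U+036F block (a real version would consult the Unicode character properties instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a real combining-mark test: only the
 * Combining Diacritical Marks block U+0300..U+036F is recognised. */
static int sketch_is_combining(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Hypothetical helper: read the codepoint at s[i] without advancing and
 * report its width in UTF-16 code units (1 or 2) through *width. */
static uint32_t sketch_u16_peek(const uint16_t *s, size_t len, size_t i,
                                size_t *width)
{
    assert(i < len);
    uint32_t c = s[i];
    *width = 1;
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
        s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
        c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(s[i + 1] - 0xDC00);
        *width = 2;
    }
    return c;
}

/* Return the code-unit offset where a chunk beginning at `start` ends.
 * keep_sequences = 0: chunklen counts codepoints.
 * keep_sequences = 1: chunklen counts base+combining sequences, so
 * combining marks following a counted codepoint are absorbed for free. */
size_t sketch_chunk_end(const uint16_t *s, size_t len, size_t start,
                        size_t chunklen, int keep_sequences)
{
    size_t i = start, taken = 0, w;
    while (i < len && taken < chunklen) {
        (void)sketch_u16_peek(s, len, i, &w);
        i += w;
        taken++;
        if (keep_sequences) {
            while (i < len &&
                   sketch_is_combining(sketch_u16_peek(s, len, i, &w))) {
                i += w;
            }
        }
    }
    return i;
}
```

With 'e' followed by U+0301 and 'x', a chunk length of 1 ends after the 'e' when counting codepoints, and after the accent when sequences are kept together. Callers such as chunk_split() or str_split() would loop, passing the previous return value as the next 'start'.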