Newsgroups: php.internals
Message-ID: <500592AE.4060305@gmail.com>
References: <5004775D.601@gmail.com>
Date: Tue, 17 Jul 2012 18:28:30 +0200
From: Ángel González <keisial@gmail.com>
To: Alex Aulbach
CC: Anthony Ferrara, Andrew Faulds, Nikita Popov, PHP internals
Subject: Re: [PHP-DEV] Random string generation (á la password_make_salt)

On 17/07/12 13:34, Alex Aulbach wrote:
>> That's more or less what I have thought.
>> If it's a string surrounded by square brackets, it's a character class,
>> else treat as a literal list of characters.
>> ] and - can be provided with the old tricks: "put ] as the first
>> character" and "make - the first or last one".
> Right thought. But introducing a new scheme of character-class
> identifiers or a new kind of describing character-classes is
> confusing. As PHP developer I think "Oh no, not again new magic
> charsets".
Not really new. Those tricks are how you had to write them in the
character classes of traditional regular expressions.
But I agree it can be confusing. What about a flag parameter, then?

> I suggest again to use PCRE for that. The difference to your proposal
> is not so big. Examples:
>
> "/[[:alnum:]]/" will return "abc...XYZ0123456789". We can do this also
> with "/[a-zA-Z0-9]/". Or "/[a-z0-9]/i". Or "/[[:alpha:][:digit:]]/"
>
> You see: You can do things in much more different ways with PCRE. And
> you continue to use this "standard".
>
> [And PCRE supports UTF8. Currently not important. But who knows?]
>
> And maybe we can think about removing the beginning "/[" and the
> ending "]/", but a "/" at the end should be optionally possible to add
> some regex-parameters (like "/i").
Those could be in the flag. The slashes are not really needed; they are
delimiter syntax that PHP adds on top of the regex (and the delimiter can
be a different character, although / is the usual pick).

>> Having to detect character limits makes it uglier.
> Exactly. That's why I think we need not so much magic to the second
> parameter. The character-list is just a list of characters. No magic.
> We can extend this with a third parameter to tell the function from
> which charset it is. And maybe a fourth to tell the random-algorithm,
> but I think it's eventually better to have a function for each
> algorithm, because that's the way how random currently works.
>
> If I should write it with php this looks like that:
>
> pseudofunction str_random($len, $characters, $encoding = 'ASCII', $algo)
> {
>     $result = '';
>     $chlen = mb_strlen($characters,$encoding);
>     for ($i = 0; $i < $len; $i++) {
>         $result .= mb_substr($characters, myrandom(0, $chlen, $algo),1);
>     }
>     return $result;
> }
>
> Without testing anything. It's just an idea.
>
> This is a working php-function, but $encoding doesn't work (some
> stupid error?) and not using $algo:
>
> function str_random($len, $characters, $encoding = 'ASCII', $algo = null)
> {
>     $result = '';
>     $chlen = mb_strlen($characters,$encoding);
>     for ($i = 0; $i < $len; $i++) {
>         $result .= mb_substr($characters, rand(0, $chlen),1);
>     }
>     return $result;
> }
>
>
>> About supporting POSIX classes, that could be cool. But you then need a way
>> to enumerate them. Note that isalpha() will be provided by the C
>> library, so you can't count on having its data. It's possible that PCRE,
>> which we bundle, contains the needed unicode tables.
> It works without thinking as above written in PHP code, but I dunno if
> this could be done in C equally.
The above code doesn't support POSIX character classes; it just picks
characters out of a string (which I agree is simple).

>>> 3. Because generating a string from character-classes is very handy in
>>> general for some other things (many string functions have it), I
>>> suggest that it is not part of random_string(). Make a new function
>>> str_from_character_class(), or if you use pcre like above
>>> pcre_str_from_character_class()?
>> How would you use such function? If you want to make a string out of them,
> Oh, there are many cases to use it.
>
> For example (I renamed the function to "str_charset()", because it is
> just a string of a charset):
>
> // Search spacer strings
> strpbrk ("Hello World", str_charset('/[\s]/'));
So you're expanding all spacing characters, then iterating over them with
strpbrk(). A preg_match() would have been more efficient.

> // remove invisible chars at begin or end (not very much sense,
> because a regex in this case is maybe faster)
> trim("\rblaa\n", str_charset('/[^[:print:]]/'));
>
> // remove invisible chars: when doing this with very big strings it
> could be much faster than with regex.
> str_replace(str_split(str_charset('/[^[:print:]]/')), "\rblaa\n");
I don't see why expanding to a string, then converting it to an array to
finally call str_replace() would be faster :S
Also, that str_split() over all non-printable characters (even assuming
you don't hit the memory limit with the many Unicode characters you would
get) will fail for codepoints > 127, because str_split() works on bytes.

> There are many other more or less useful things you can do with a
> charset-string. :)
I'm not really convinced it's the right way to do them :)
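
P.S. To make the points above concrete, here is a rough sketch, not a
proposal for the final API. It takes the quoted str_random() and changes
two things that look like the cause of the "$encoding doesn't work"
remark: the random offset is limited to 0 .. $chlen - 1 (rand() includes
its upper bound), and $encoding is also passed to mb_substr(). I dropped
the unused $algo parameter. Assumptions on my side: the mbstring extension
is loaded, and mt_rand() is only a stand-in for whatever random source
gets chosen (it is not good enough for salts). The last lines show the
byte-versus-character problem of str_split() and the regex alternative.

```php
<?php
// Sketch only: the quoted str_random() with two fixes.
// mt_rand() is a placeholder random source, NOT suitable for salts.
function str_random($len, $characters, $encoding = 'ASCII')
{
    $chlen = mb_strlen($characters, $encoding);
    if ($chlen < 1) {
        return '';            // nothing to pick from
    }
    $result = '';
    for ($i = 0; $i < $len; $i++) {
        // Valid offsets are 0 .. $chlen - 1; rand(0, $chlen) could also
        // return $chlen, which selects nothing and shortens the result.
        $offset = mt_rand(0, $chlen - 1);
        // Pass $encoding here too; otherwise mb_substr() falls back to
        // the internal encoding and ignores the parameter.
        $result .= mb_substr($characters, $offset, 1, $encoding);
    }
    return $result;
}

// str_split() cuts bytes, so one UTF-8 character above 127 becomes two
// array elements; preg_split() with the /u modifier keeps it whole.
$bytes = str_split("\xC3\xA1");                                  // "á"
$chars = preg_split('//u', "\xC3\xA1", -1, PREG_SPLIT_NO_EMPTY);

// Removing non-printable characters directly with a regex, without
// expanding the class to a string first.
$clean = preg_replace('/[^[:print:]]/', '', "\rblaa\n");

echo str_random(8, 'abcdef0123456789'), "\n";
echo count($bytes), ' ', count($chars), ' ', $clean, "\n"; // prints "2 1 blaa"
```

With a multibyte list you would call it as
str_random(5, "áéíóú", 'UTF-8'); with the default 'ASCII' the list is
effectively treated byte by byte.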