Hey:
there comes a FR: https://bugs.php.net/bug.php?id=64730
the main idea is, in 5.5 we remove support of 'e' modifier.
then comes a problem, the old codes(a real use case see
https://github.com/php/php-src/blob/PHP-5.4/Zend/zend_vm_gen.php#L390):
preg_replace(array(
"/pattern1(.*)/",
"/pattern2(.*)/"
),
array(
"/replace1/e",
"/replace2/e"
)
..),
can not be easier convert to the "callback" style.
then I have to change it to something very ugly like(a real use case
see: https://github.com/php/php-src/blob/PHP-5.5/Zend/zend_vm_gen.php#L390):
function callback($subject) {
if (!strncmp($subject, "pattern1", 8)) {
//function for pattern 1
} else if(!strncmp($subject, "pattern2", 8)) {
//function for pattern 2
} else .....
}
so I propose to add a second argument to callback(aim to php-5.5.1),
which is the matched regex string self.
then I can simplify the previous codes to:
function callback ($subject, $regex) {
$replace_funcs = array(
"/pattern1(.)" => function ($subect) { //function for
parttern 1; },
"/pattern2(.)" => function($sbuect) { //function for pattern
2; }
);
$replace_funcs[$regex]($subject);
}
what do you think(of course, the second argument can also easily change
to be the regex idx in the regex array)?
patch is here:
https://bugs.php.net/patch-display.php?bug_id=64730&patch=sencode_argument.patch&revision=latest
--
Laruence Xinchen Hui
http://www.laruence.com/
Le 29/04/2013 18:46, Laruence a écrit :
so I propose to add a second argument to callback(aim to php-5.5.1),
Seems a useful fix to this problem (also encounter elsewhere)
what do you think(of course, the second argument can also easily change
to be the regex idx in the regex array)?
I will prefer the idx (regex can be really long and complex string, and
a minor change to a regex need to be also applied in the callback)
Esp if you can use a more simpler / meaningful value
Ex:
$code = preg_replace_callback(
array(
"foo" => "/some very complex regex/",
"bar" => "/another one/",
...
),
function($matches, $idx) {
switch ($idx) {
case 'foo'
...
case 'bar':
...
}
},
$code);
My 0,02€
Remi.
Hey:
there comes a FR: https://bugs.php.net/bug.php?id=64730the main idea is, in 5.5 we remove support of 'e' modifier.
then comes a problem, the old codes(a real use case see
https://github.com/php/php-src/blob/PHP-5.4/Zend/zend_vm_gen.php#L390):preg_replace(array(
"/pattern1(.*)/", "/pattern2(.*)/"
),
array(
"/replace1/e",
"/replace2/e"
)
..),can not be easier convert to the "callback" style.
then I have to change it to something very ugly like(a real use case
see: https://github.com/php/php-src/blob/PHP-5.5/Zend/zend_vm_gen.php#L390
):function callback($subject) {
if (!strncmp($subject, "pattern1", 8)) {
//function for pattern 1
} else if(!strncmp($subject, "pattern2", 8)) {
//function for pattern 2
} else .....}
so I propose to add a second argument to callback(aim to php-5.5.1),
which is the matched regex string self.then I can simplify the previous codes to:
function callback ($subject, $regex) {
$replace_funcs = array(
"/pattern1(.)" => function ($subect) { //function for
parttern 1; },
"/pattern2(.)" => function($sbuect) { //function for pattern
2; }
);$replace_funcs[$regex]($subject);
}
what do you think(of course, the second argument can also easily change
to be the regex idx in the regex array)?patch is here:
https://bugs.php.net/patch-display.php?bug_id=64730&patch=sencode_argument.patch&revision=latest
What's wrong with this?
$replacements = ['regex' => 'callback'];
foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
}
Nikita
Hey:
there comes a FR: https://bugs.php.net/bug.php?id=64730the main idea is, in 5.5 we remove support of 'e' modifier.
then comes a problem, the old codes(a real use case see
https://github.com/php/php-src/blob/PHP-5.4/Zend/zend_vm_gen.php#L390):preg_replace(array(
"/pattern1(.*)/", "/pattern2(.*)/"
),
array(
"/replace1/e",
"/replace2/e"
)
..),can not be easier convert to the "callback" style.
then I have to change it to something very ugly like(a real use case
see:
https://github.com/php/php-src/blob/PHP-5.5/Zend/zend_vm_gen.php#L390):function callback($subject) {
if (!strncmp($subject, "pattern1", 8)) {
//function for pattern 1
} else if(!strncmp($subject, "pattern2", 8)) {
//function for pattern 2
} else .....}
so I propose to add a second argument to callback(aim to php-5.5.1),
which is the matched regex string self.then I can simplify the previous codes to:
function callback ($subject, $regex) {
$replace_funcs = array(
"/pattern1(.)" => function ($subect) { //function for
parttern 1; },
"/pattern2(.)" => function($sbuect) { //function for
pattern
2; }
);$replace_funcs[$regex]($subject);
}
what do you think(of course, the second argument can also easily change
to be the regex idx in the regex array)?patch is here:
https://bugs.php.net/patch-display.php?bug_id=64730&patch=sencode_argument.patch&revision=latest
What's wrong with this?
$replacements = ['regex' => 'callback'];
foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
hey:
efficiency & simplicity
do you use foreach even there is array_map?
thanks
}
Nikita
--
Laruence Xinchen Hui
http://www.laruence.com/
what about patch preg_replace to accept callbacks?
example:
preg_replace(array(
"/pattern1(.)/",
"/pattern2(.)/"
),
array(
function ($pat) { ... } ,
"replace"
)
, ... );
what about patch preg_replace to accept callbacks?
example:
preg_replace(array(
"/pattern1(.)/",
"/pattern2(.)/"
),
array(
function ($pat) { ... } ,
"replace"
)
, ... );
Hey:
then I think perg_replace_callback become useless
thanks
--
--
Laruence Xinchen Hui
http://www.laruence.com/
ALeX,
what about patch preg_replace to accept callbacks?
example:
preg_replace(array(
"/pattern1(.)/",
"/pattern2(.)/"
),
array(
function ($pat) { ... } ,
"replace"
)
, ... );
There are some weird questions that come up with that. For example, if an
array item is an object that implements both __invoke() and __toString(),
which should be fired? Is it a string? Or a callback? What if I pass a
literal string "strlen", is that a callback, or a replacement?
The point is not that it's a bad idea and that we can't make a distinction,
but more that there are pretty severe edge-cases that we'd have to work
around, and if we don't get it right security could suffer significantly
(especially when user-input is allowed to be a replacement)...
Anthony
There are some weird questions that come up with that. For example, if an
array item is an object that implements both __invoke() and __toString(),
which should be fired? Is it a string? Or a callback? What if I pass a
literal string "strlen", is that a callback, or a replacement?
You're right that was not fully thought trough - I shouldn't post in
the middle of the night ;-)
What about a new modifier (example "c") which then uses the
replacement as a callback.
Hey:
Sorry for the delay, the new patch, which make the second argument the
regex array keys is attached.
https://bugs.php.net/patch-display.php?bug_id=64730&patch=second_arg_rege_key.patch&revision=latest
with this patch, preg_replace_callback will call user callback with two
arguments, the first one is the same, the second is the regex key if the
regex is an array or NULL
if the regex is a string.
then we can do something like:
$code = preg_replace_callback(
array(
"foo" => "/some very complex regex/",
"bar" => "/another one/",
...
),
function($matches, $idx) {
switch ($idx) {
case 'foo'
...
case 'bar':
...
}
},
$code);
if no objections, I will commit this patch after 5.5. 0 final release..
thanks
Hey:
there comes a FR: https://bugs.php.net/bug.php?id=64730the main idea is, in 5.5 we remove support of 'e' modifier.
then comes a problem, the old codes(a real use case see
https://github.com/php/php-src/blob/PHP-5.4/Zend/zend_vm_gen.php#L390):preg_replace(array(
"/pattern1(.*)/", "/pattern2(.*)/"
),
array(
"/replace1/e",
"/replace2/e"
)
..),can not be easier convert to the "callback" style.
then I have to change it to something very ugly like(a real use case
see: https://github.com/php/php-src/blob/PHP-5.5/Zend/zend_vm_gen.php#L390
):function callback($subject) {
if (!strncmp($subject, "pattern1", 8)) {
//function for pattern 1
} else if(!strncmp($subject, "pattern2", 8)) {
//function for pattern 2
} else .....}
so I propose to add a second argument to callback(aim to php-5.5.1),
which is the matched regex string self.then I can simplify the previous codes to:
function callback ($subject, $regex) {
$replace_funcs = array(
"/pattern1(.)" => function ($subect) { //function for
parttern 1; },
"/pattern2(.)" => function($sbuect) { //function for
pattern 2; }
);$replace_funcs[$regex]($subject);
}
what do you think(of course, the second argument can also easily change
to be the regex idx in the regex array)?patch is here:
https://bugs.php.net/patch-display.php?bug_id=64730&patch=sencode_argument.patch&revision=latest--
Laruence Xinchen Hui
http://www.laruence.com/
--
Laruence Xinchen Hui
http://www.laruence.com/
Hey:
Sorry for the delay, the new patch, which make the second argument the
regex array keys is attached.https://bugs.php.net/patch-display.php?bug_id=64730&patch=second_arg_rege_key.patch&revision=latest
with this patch, preg_replace_callback will call user callback with two
arguments, the first one is the same, the second is the regex key if the
regex is an array orNULL
if the regex is a string.then we can do something like:
$code = preg_replace_callback(
array(
"foo" => "/some very complex regex/",
"bar" => "/another one/",
...
),
function($matches, $idx) {
switch ($idx) {
case 'foo'
...
case 'bar':
...
}
},
$code);if no objections, I will commit this patch after 5.5. 0 final release..
thanks
I object. If this were using the preg_replace_callback(Array, Array,
String) syntax [which is not practically possible due to ambiguity], I
would agree that this is a nice feature which makes the function consistent
with preg_replace and str_replace. But this implementation with some weird
additional identifier that you need to switch over makes absolutely no
sense to me and only complicated the API. Better to just use a loop for
this.
One thing that might be nicer API wise but without ambiguity is an
preg_replace_callback(['regex' => callback], $string) overload. Though
personally I'm not really a fan of doing overloads with completely
different argument types.
Nikita
Hey:
Sorry for the delay, the new patch, which make the second argument the
regex array keys is attached.https://bugs.php.net/patch-display.php?bug_id=64730&patch=second_arg_rege_key.patch&revision=latest
with this patch, preg_replace_callback will call user callback with
two
arguments, the first one is the same, the second is the regex key if the
regex is an array orNULL
if the regex is a string.then we can do something like:
$code = preg_replace_callback(
array(
"foo" => "/some very complex regex/",
"bar" => "/another one/",
...
),
function($matches, $idx) {
switch ($idx) {
case 'foo'
...
case 'bar':
...
}
},
$code);if no objections, I will commit this patch after 5.5. 0 final release..
thanks
I object. If this were using the preg_replace_callback(Array, Array,
String) syntax [which is not practically possible due to ambiguity], I
would agree that this is a nice feature which makes the function consistent
with preg_replace and str_replace. But this implementation with some weird
additional identifier that you need to switch over makes absolutely no
sense to me and only complicated the API. Better to just use a loop for
this.
if you got an better solution, please propose it.
only object won't help much here, since the problem is there, we need to
solve it.
thanks
One thing that might be nicer API wise but without ambiguity is an
preg_replace_callback(['regex' => callback], $string) overload. Though
personally I'm not really a fan of doing overloads with completely
different argument types.Nikita
--
Laruence Xinchen Hui
http://www.laruence.com/
Hey:
Sorry for the delay, the new patch, which make the second argument
the
regex array keys is attached.https://bugs.php.net/patch-display.php?bug_id=64730&patch=second_arg_rege_key.patch&revision=latest
with this patch, preg_replace_callback will call user callback with
two
arguments, the first one is the same, the second is the regex key if the
regex is an array orNULL
if the regex is a string.then we can do something like:
$code = preg_replace_callback(
array(
"foo" => "/some very complex regex/",
"bar" => "/another one/",
...
),
function($matches, $idx) {
switch ($idx) {
case 'foo'
...
case 'bar':
...
}
},
$code);if no objections, I will commit this patch after 5.5. 0 final
release..thanks
I object. If this were using the preg_replace_callback(Array, Array,
String) syntax [which is not practically possible due to ambiguity], I
would agree that this is a nice feature which makes the function consistent
with preg_replace and str_replace. But this implementation with some weird
additional identifier that you need to switch over makes absolutely no
sense to me and only complicated the API. Better to just use a loop for
this.if you got an better solution, please propose it.
only object won't help much here, since the problem is there, we need to
solve it.thanks
Sorry, but what problem is there again? As I already said earlier, you can
just use a loop:
foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
}
Which is both clearer and requires less code.
Nikita
On Sat, May 4, 2013 at 8:49 PM, Nikita Popov nikita.ppv@gmail.comwrote:
Hey:
Sorry for the delay, the new patch, which make the second argument
the
regex array keys is attached.https://bugs.php.net/patch-display.php?bug_id=64730&patch=second_arg_rege_key.patch&revision=latest
with this patch, preg_replace_callback will call user callback with
two
arguments, the first one is the same, the second is the regex key if
the
regex is an array orNULL
if the regex is a string.then we can do something like:
$code = preg_replace_callback(
array(
"foo" => "/some very complex regex/",
"bar" => "/another one/",
...
),
function($matches, $idx) {
switch ($idx) {
case 'foo'
...
case 'bar':
...
}
},
$code);if no objections, I will commit this patch after 5.5. 0 final
release..thanks
I object. If this were using the preg_replace_callback(Array, Array,
String) syntax [which is not practically possible due to ambiguity], I
would agree that this is a nice feature which makes the function consistent
with preg_replace and str_replace. But this implementation with some weird
additional identifier that you need to switch over makes absolutely no
sense to me and only complicated the API. Better to just use a loop for
this.if you got an better solution, please propose it.
only object won't help much here, since the problem is there, we need to
solve it.thanks
Sorry, but what problem is there again? As I already said earlier, you can
just use a loop:foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
}
So if there are 10 regex in the array, you will got 10 DO_FCALL. 10 times
arguments accept, 10 times etc...
and AS I already said: IT is inefficient. I personally can not accept it.
thanks
Which is both clearer and requires less code.
Nikita
--
Laruence Xinchen Hui
http://www.laruence.com/
Laruence,
foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
}
So if there are 10 regex in the array, you will got 10 DO_FCALL. 10 times
arguments accept, 10 times etc...and AS I already said: IT is inefficient. I personally can not accept it.
thanks
My initial reaction was that the difference in runtime is going to be so
insignificant that it's the very definition of a micro-optimization between
the two. And unless you're doing it millions of times in a hot-loop, it's
never even going to be realized. I was going to reply to that effect.
Then I decided to write a quick test.
$count = 100;
$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace_callback('/a/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/a/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtolower($a[0]); }, $str);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";
$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace(array(
'/a/e',
'/b/e',
'/c/e',
'/a/e',
'/b/e',
'/c/e',
),
array(
'strtoupper('$1')',
'strtoupper('$1')',
'strtoupper('$1')',
'strtolower('$1')',
'strtolower('$1')',
'strtolower('$1')',
),
$str
);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";
So basically, it's comparing the old behavior (one call to preg_replace)
that you want to emulate, with what Nikita is suggesting.
I ran it on 3v4l (http://3v4l.org/dRjtU) and the results were quite
surprising. I was expecting the first to be slightly slower than the
second. But the reality surprised me:
Completed in 0.00076389312744141 Seconds Completed in 0.0012428760528564
Seconds
As it turns out, calling preg_replace_callback()
multiple times is almost
50% faster than using a single call to preg_replace()
.
It's likely due to the precompiled nature of closures, vs the compilation
happening multiple times at invocation in the preg_replace /e case.
But it's still worth noting that switching from the /e style to a more
traditional preg_replace_callback implementation will get you a significant
boost in performance of that.
Now, keep in mind, we're talking 0.000005 seconds saved per "execution"
(group of 6 replacements). So it's not likely to matter much or be worth
worrying about...
My $0.02 at least...
Anthony
Laruence,
foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
}
So if there are 10 regex in the array, you will got 10 DO_FCALL. 10
times
arguments accept, 10 times etc...and AS I already said: IT is inefficient. I personally can not accept
it.thanks
My initial reaction was that the difference in runtime is going to be so
insignificant that it's the very definition of a micro-optimization between
the two. And unless you're doing it millions of times in a hot-loop, it's
never even going to be realized. I was going to reply to that effect.Then I decided to write a quick test.
$count = 100;
$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace_callback('/a/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/a/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtolower($a[0]); }, $str);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace(array(
'/a/e',
'/b/e',
'/c/e',
'/a/e',
'/b/e',
'/c/e',
),
array(
'strtoupper('$1')',
'strtoupper('$1')',
'strtoupper('$1')',
'strtolower('$1')',
'strtolower('$1')',
'strtolower('$1')',
),
$str
);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";So basically, it's comparing the old behavior (one call to preg_replace)
that you want to emulate, with what Nikita is suggesting.I ran it on 3v4l (http://3v4l.org/dRjtU) and the results were quite
surprising. I was expecting the first to be slightly slower than the
second. But the reality surprised me:Completed in 0.00076389312744141 Seconds Completed in 0.0012428760528564
SecondsAs it turns out, calling
preg_replace_callback()
multiple times is almost
50% faster than using a single call topreg_replace()
.It's likely due to the precompiled nature of closures, vs the compilation
happening multiple times at invocation in the preg_replace /e case.But it's still worth noting that switching from the /e style to a more
traditional preg_replace_callback implementation will get you a significant
boost in performance of that.Now, keep in mind, we're talking 0.000005 seconds saved per "execution"
(group of 6 replacements). So it's not likely to matter much or be worth
worrying about...
Hey:
thanks for the benchmark, but please don't think it's useless just
because it's monior
PHP is a web program language, let's think about a high trafiic site,
which get 100, 000, 000 pv perday.
100, 000, 000 * 0.0000005 = 500s perday
and inefficent not always means performance only, in this case , you
need to define various functions for different regexs if you use loop style.
and do you prefer array_walk or foreach when you try to iteraterly
process an array?
thanks
My $0.02 at least...
Anthony
--
Laruence Xinchen Hui
http://www.laruence.com/
I made a RFC for this: https://wiki.php.net/rfc/second_arg_to_preg_callback
thanks
On Sat, May 4, 2013 at 9:46 PM, Anthony Ferrara ircmaxell@gmail.comwrote:
Laruence,
foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
}
So if there are 10 regex in the array, you will got 10 DO_FCALL. 10
times
arguments accept, 10 times etc...and AS I already said: IT is inefficient. I personally can not accept
it.thanks
My initial reaction was that the difference in runtime is going to be so
insignificant that it's the very definition of a micro-optimization
between
the two. And unless you're doing it millions of times in a hot-loop, it's
never even going to be realized. I was going to reply to that effect.Then I decided to write a quick test.
$count = 100;
$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace_callback('/a/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/a/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtolower($a[0]); }, $str);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace(array(
'/a/e',
'/b/e',
'/c/e',
'/a/e',
'/b/e',
'/c/e',
),
array(
'strtoupper('$1')',
'strtoupper('$1')',
'strtoupper('$1')',
'strtolower('$1')',
'strtolower('$1')',
'strtolower('$1')',
),
$str
);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";So basically, it's comparing the old behavior (one call to preg_replace)
that you want to emulate, with what Nikita is suggesting.I ran it on 3v4l (http://3v4l.org/dRjtU) and the results were quite
surprising. I was expecting the first to be slightly slower than the
second. But the reality surprised me:Completed in 0.00076389312744141 Seconds Completed in 0.0012428760528564
SecondsAs it turns out, calling
preg_replace_callback()
multiple times is almost
50% faster than using a single call topreg_replace()
.It's likely due to the precompiled nature of closures, vs the compilation
happening multiple times at invocation in the preg_replace /e case.But it's still worth noting that switching from the /e style to a more
traditional preg_replace_callback implementation will get you a
significant
boost in performance of that.Now, keep in mind, we're talking 0.000005 seconds saved per "execution"
(group of 6 replacements). So it's not likely to matter much or be worth
worrying about...Hey:
thanks for the benchmark, but please don't think it's useless just
because it's moniorPHP is a web program language, let's think about a high trafiic site,
which get 100, 000, 000 pv perday.100, 000, 000 * 0.0000005 = 500s perday
and inefficent not always means performance only, in this case , you
need to define various functions for different regexs if you use loop style.and do you prefer array_walk or foreach when you try to iteraterly
process an array?thanks
My $0.02 at least...
Anthony
--
Laruence Xinchen Hui
http://www.laruence.com/
--
Laruence Xinchen Hui
http://www.laruence.com/
I made a RFC for this:
https://wiki.php.net/rfc/second_arg_to_preg_callback
Note that giving meaning to an extra argument in existing callbacks
constitutes a small BC break in at least two cases:
function myhandyfunction($v, $arg = 42) { .. }
preg_replace_callback([..., ...], "myhandyfunction", ..)
here $arg will no longer be 42 after this change.
Most internal functions do not accept extra arguments, and thus will
suddenly raise a notice if used as callbacks.
It might be worth mentionning this in the RFC.
thanks
On Sat, May 4, 2013 at 9:46 PM, Anthony Ferrara <ircmaxell@gmail.com
wrote:Laruence,
foreach ($replacements as $regex => $callback) {
$str = preg_replace_callback($regex, $callback, $str);
}
So if there are 10 regex in the array, you will got 10 DO_FCALL. 10
times
arguments accept, 10 times etc...and AS I already said: IT is inefficient. I personally can not accept
it.thanks
My initial reaction was that the difference in runtime is going to be so
insignificant that it's the very definition of a micro-optimization
between
the two. And unless you're doing it millions of times in a hot-loop,
it's
never even going to be realized. I was going to reply to that effect.Then I decided to write a quick test.
$count = 100;
$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace_callback('/a/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtoupper($a[0]); }, $str);
$str = preg_replace_callback('/a/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/b/', function($a) { return
strtolower($a[0]); }, $str);
$str = preg_replace_callback('/c/', function($a) { return
strtolower($a[0]); }, $str);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";$start = microtime(true);
for ($i = 0; $i < $count; $i++) {
$str = "abc";
$str = preg_replace(array(
'/a/e',
'/b/e',
'/c/e',
'/a/e',
'/b/e',
'/c/e',
),
array(
'strtoupper('$1')',
'strtoupper('$1')',
'strtoupper('$1')',
'strtolower('$1')',
'strtolower('$1')',
'strtolower('$1')',
),
$str
);
}
echo "Completed in " . (microtime(true) - $start) . " Seconds\n";So basically, it's comparing the old behavior (one call to preg_replace)
that you want to emulate, with what Nikita is suggesting.I ran it on 3v4l (http://3v4l.org/dRjtU) and the results were quite
surprising. I was expecting the first to be slightly slower than the
second. But the reality surprised me:Completed in 0.00076389312744141 Seconds Completed in 0.0012428760528564
SecondsAs it turns out, calling
preg_replace_callback()
multiple times is
almost
50% faster than using a single call topreg_replace()
.It's likely due to the precompiled nature of closures, vs the
compilation
happening multiple times at invocation in the preg_replace /e case.But it's still worth noting that switching from the /e style to a more
traditional preg_replace_callback implementation will get you a
significant
boost in performance of that.Now, keep in mind, we're talking 0.000005 seconds saved per "execution"
(group of 6 replacements). So it's not likely to matter much or be worth
worrying about...Hey:
thanks for the benchmark, but please don't think it's useless just
because it's moniorPHP is a web program language, let's think about a high trafiic site,
which get 100, 000, 000 pv perday.100, 000, 000 * 0.0000005 = 500s perday
and inefficent not always means performance only, in this case , you
need to define various functions for different regexs if you use loop
style.and do you prefer array_walk or foreach when you try to iteraterly
process an array?thanks
My $0.02 at least...
Anthony
--
Laruence Xinchen Hui
http://www.laruence.com/--
Laruence Xinchen Hui
http://www.laruence.com/
--
Etienne Kneuss
http://www.colder.ch
and inefficent not always means performance only, in this case , you
need to define various functions for different regexs if you use loop style.and do you prefer array_walk or foreach when you try to iteraterly
process an array?
I certainly prefer foreach over array_walk/array_map because I think the
resulting code is easier to read. It also happens to be faster to use
foreach in every case I have ever tested.
-Rasmus
Hey:
Sorry for the delay, the new patch, which make the second argument the
regex array keys is attached.https://bugs.php.net/patch-display.php?bug_id=64730&patch=second_arg_rege_key.patch&revision=latest
with this patch, preg_replace_callback will call user callback with
two
arguments, the first one is the same, the second is the regex key if the
regex is an array orNULL
if the regex is a string.then we can do something like:
$code = preg_replace_callback(
array(
"foo" => "/some very complex regex/",
"bar" => "/another one/",
...
),
function($matches, $idx) {
switch ($idx) {
case 'foo'
...
case 'bar':
...
}
},
$code);if no objections, I will commit this patch after 5.5. 0 final release..
thanks
I object. If this were using the preg_replace_callback(Array, Array,
String) syntax [which is not practically possible due to ambiguity], I
would agree that this is a nice feature which makes the function consistent
with preg_replace and str_replace. But this implementation with some weird
additional identifier that you need to switch over makes absolutely no
sense to me and only complicated the API. Better to just use a loop for
this.
And, I DO NOT think it's weird, it's few choice we got now..
because:
As I said, a loop is inefficient here.
One thing that might be nicer API wise but without ambiguity is an
preg_replace_callback(['regex' => callback], $string) overload. Though
personally I'm not really a fan of doing overloads with completely
different argument types.
I don't this this is possible..
thinking of :
$regex = array(0 => "xxx");
if by accident there is a function call "xxx".
thanks
Nikita
--
Laruence Xinchen Hui
http://www.laruence.com/
Le 04/05/2013 14:39, Laruence a écrit :
with this patch, preg_replace_callback will call user callback with two
arguments, the first one is the same, the second is the regex key if the
regex is an array orNULL
if the regex is a string.
Here is another example from a real case (Horde_Date)
return preg_replace(
array('/%b/e',
'/%B/e',
'/%C/e',
'/%([-#]?)d/e',
'/%D/e',
'/%e/e',
'/%([-#]?)H/e',
'/%([-#]?)I/e',
'/%([-#]?)m/e',
'/%([-#]?)M/e',
'/%n/',
'/%p/e',
'/%R/e',
'/%([-#]?)S/e',
'/%t/',
'/%T/e',
'/%x/e',
'/%X/e',
'/%y/e',
'/%Y/',
'/%%/'),
array('$this->strftime(Horde_Nls::getLangInfo(constant('ABMON_' .
(int)$this->month)))',
'$this->strftime(Horde_Nls::getLangInfo(constant('MON' .
(int)$this->_month)))',
'(int)($this->_year / 100)',
'sprintf('%' . ('$1' ? '' : '02') . 'd', $this->_mday)',
'$this->strftime('%m/%d/%y')',
'sprintf('%2d', $this->_mday)',
'sprintf('%' . ('$1' ? '' : '02') . 'd', $this->_hour)',
'sprintf('%' . ('$1' ? '' : '02') . 'd', $this->_hour == 0 ?
12 : ($this->_hour > 12 ? $this->_hour - 12 : $this->_hour))',
'sprintf('%' . ('$1' ? '' : '02') . 'd', $this->_month)',
'sprintf('%' . ('$1' ? '' : '02') . 'd', $this->_min)',
"\n",
'$this->strftime(Horde_Nls::getLangInfo($this->_hour < 12 ? AM_STR
:
PM_STR))',
'$this->strftime('%H:%M')',
'sprintf('%' . ('$1' ? '' : '02') . 'd', $this->_sec)',
"\t",
'$this->strftime('%H:%M:%S')',
'$this->strftime(Horde_Nls::getLangInfo(D_FMT))',
'$this->strftime(Horde_Nls::getLangInfo(T_FMT))',
'substr(sprintf('%04d', $this->_year), -2)',
(int)$this->_year,
'%'),
$format);
}
Note, while I agree for this feature, must application / framework will
have to be fixed, but also kept compatible with 5.3, 5.4, 5.5.
Remi.
Hi!
Here is another example from a real case (Horde_Date)
Looks like the case of doing it wrong. In most cases it doesn't even
need regexp, in others, generic regexp with post-parsing would probably
be more efficient as this seems to do 20+ passes through the string.
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227