Hi internals.
I've been coding in PHP for 15 years now, and spend most days using it to
transform content into meaningful data.
Most of this is pulling out prices and attributes of certain products from
HTML, for example, grabbing the price from content such as "Total price
including delivery: £15.00".
Grabbing the "15.00" from the string can take quite a few lines of PHP and
can be pretty cumbersome.
I've built some helper functions to help: substring, and its case-insensitive
variant subistring.
The functions take in the string plus a from and a to string, and return the
trimmed string found between the two.
So substring("Total price including delivery: £15.00", "£", "")
would return "15.00".
The from and to strings are optional; leaving one out returns from the
beginning of the string or to the end, respectively.
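For illustration only, the behaviour described above could be sketched in userland PHP roughly as follows. This is not the C implementation from the pull request: the function name substring_sketch, the $ignoreCase flag, and the choice to return null when a delimiter is not found are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland sketch of the helper described above.
 * Name, signature and not-found behaviour are assumptions, not the PR's API.
 */
function substring_sketch(
    string $haystack,
    string $from = "",
    string $to = "",
    bool $ignoreCase = false
): ?string {
    // An empty $from means "start at the beginning of the string".
    $start = 0;
    if ($from !== "") {
        $pos = $ignoreCase ? stripos($haystack, $from) : strpos($haystack, $from);
        if ($pos === false) {
            return null; // assumption: signal "not found" with null
        }
        $start = $pos + strlen($from);
    }

    // An empty $to means "run to the end of the string".
    $end = strlen($haystack);
    if ($to !== "") {
        // Only look for $to after the end of $from.
        $pos = $ignoreCase
            ? stripos($haystack, $to, $start)
            : strpos($haystack, $to, $start);
        if ($pos === false) {
            return null; // assumption: signal "not found" with null
        }
        $end = $pos;
    }

    // Trim the result, as the helpers described in this mail do.
    return trim(substr($haystack, $start, $end - $start));
}

var_dump(substring_sketch("Total price including delivery: £15.00", "£", ""));
```

The sketch works on bytes throughout (strpos, strlen, substr), which is consistent for UTF-8 input such as "£" as long as the delimiters are matched exactly.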
In the past I hadn't thought about adding this to PHP core, but with the
introduction of str_starts/ends_with functions in PHP 8.0 I thought it may
be useful to include the sub(i)string building blocks too.
Implementation and tests can be found @
https://github.com/php/php-src/pull/6602
I'm sure the C implementation can be made a lot better, but it seems to work
OK at present.
This is my first e-mail to internals, so please excuse my naivety with
things, but I hope this is useful.
Thanks,
Adam
Hi Adam,
A few thoughts:
- The name of the function is not clear. It's not obvious what the
difference between substr() and substring() is. To make it worse,
JavaScript has both substr() and substring(), but the meaning of
substring() there is a different one from what you propose. I think a
better name would be something like str_between().
- Why does this perform an implicit trim() call? I understand that this
may be useful in some cases, but it will also limit applicability of the
function. It's easy to write trim(substring(...)), but if the trim() is
part of the call, there's no way to avoid it.
- More generally, I feel that this API is a bit too specific for inclusion
in the standard library. It can certainly be useful, but I don't think it's
anywhere near as ubiquitous as operations like str_ends_with(). For complex
string matching tasks, I would probably pick preg_match() over a
combination of str* functions anyway.
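As a rough illustration of the preg_match() route for the price example from the original mail, something like the following would work. The function name extract_price_sketch and the exact pattern are assumptions made for this sketch; real-world price formats may need a looser pattern.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pull the amount that follows a "£" sign.
 * Returns null when no price is found.
 */
function extract_price_sketch(string $content): ?string
{
    // /u treats pattern and subject as UTF-8, so "£" is a single character.
    // Group 1 captures the digits, with an optional two-digit decimal part.
    if (preg_match('/£\s*(\d+(?:\.\d{2})?)/u', $content, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

var_dump(extract_price_sketch("Total price including delivery: £15.00"));
```

Because the capture group excludes surrounding whitespace, no separate trim() call is needed here.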
Regards,
Nikita
From: Nikita Popov nikita.ppv@gmail.com
Sent: 18 January 2021 10:32
To: Adam Cable adam@adsar.co.uk
Cc: PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] Addition of substring and subistring functions.
Hi Nikita,
Thanks for the reply, appreciated.
Answers below:
1 - str_between() makes much more sense, as it aligns with similar PHP function names.
2 - We have never had a case where the excess whitespace was wanted, but it doesn't need to trim.
3 - That's fine. I just thought I'd propose it, as I use this nearly every day and it helps to keep things simple when a regex might be overkill.
Happy to leave this for now - but again, appreciate the reply.
Best,
Adam