Hey all,
I think I have a fix for the problems in get_browser()
as per the bug
report. (#27291: get_browser matches browscap.ini patterns incorrectly)
Gary Keith, who handles the browscap.ini file we suggest in the
get_browser()
docs, uses IIS/ASP's browscap.dll for his benchmarking and I
believe I gotten get_browser()
to produce the same results as the DLL.
The original bug reporter, Ryan Schmidt, pointed me in the direction of this
documentation from MSFT:
http://www.microsoft.com/windows2000/en/server/iis/htm/asp/comp1g11.htm
So I basically just implemented what they say there. (The relevant section
starts with "Note The BrowserType object first attempts to match...")
The basic fix is basically...
- use ^ and $ to anchor the regexes
- do a case-insensitive search (lowercasing both the patterns and the user
agent being checked, using REG_ICASE is painfully slow) - use an exact match where possible
- if more than one match is found, use the match which replaces the fewest
number of characters versus the original browser name patterns. (Ignoring
any ?/* wildcard characters in the pattern for character counts.)
Based on the list of 22954 user agents I got from Gary, this seems to take
care of them all and should produce the same results as browscap.dll.
The patch against HEAD can be found at
http://216.94.11.234/browscap.c.diff
It's kind of long so I'd like to get some feedback. It can probably be
optimized a bit, as my C is getting a little rusty, but it seems to work.
There is a bit more overhead versus the previous get_browser()
implementation, but based on the little benchmarking I did I don't think
it's overly serious.
This doesn't address the ini parser issues in #27372; don't know what to do
about that yet.
I'd like to commit this to HEAD and also to PHP_4_3 if possible. (The patch
for 4_3 is identical except for that call to zend_is_autoglobal().) I don't
have access to a Windows box at the moment and have only tested on linux,
but I think everything should check out.
So, any yays or nays on the patch?
J
Hey Jay, sorry, didn't notice this message until after sending out my
previous request. How do you feel about taking over maintaining the
browscap code and splitting it off into its own pecl extension so you are
not restricted by PHP's release schedule?
-Rasmus
Hey all,
I think I have a fix for the problems in
get_browser()
as per the bug
report. (#27291: get_browser matches browscap.ini patterns incorrectly)Gary Keith, who handles the browscap.ini file we suggest in the
get_browser()
docs, uses IIS/ASP's browscap.dll for his benchmarking and I
believe I gottenget_browser()
to produce the same results as the DLL.The original bug reporter, Ryan Schmidt, pointed me in the direction of this
documentation from MSFT:http://www.microsoft.com/windows2000/en/server/iis/htm/asp/comp1g11.htm
So I basically just implemented what they say there. (The relevant section
starts with "Note The BrowserType object first attempts to match...")The basic fix is basically...
- use ^ and $ to anchor the regexes
- do a case-insensitive search (lowercasing both the patterns and the user
agent being checked, using REG_ICASE is painfully slow)- use an exact match where possible
- if more than one match is found, use the match which replaces the fewest
number of characters versus the original browser name patterns. (Ignoring
any ?/* wildcard characters in the pattern for character counts.)Based on the list of 22954 user agents I got from Gary, this seems to take
care of them all and should produce the same results as browscap.dll.The patch against HEAD can be found at
http://216.94.11.234/browscap.c.diff
It's kind of long so I'd like to get some feedback. It can probably be
optimized a bit, as my C is getting a little rusty, but it seems to work.
There is a bit more overhead versus the previousget_browser()
implementation, but based on the little benchmarking I did I don't think
it's overly serious.This doesn't address the ini parser issues in #27372; don't know what to do
about that yet.I'd like to commit this to HEAD and also to PHP_4_3 if possible. (The patch
for 4_3 is identical except for that call to zend_is_autoglobal().) I don't
have access to a Windows box at the moment and have only tested on linux,
but I think everything should check out.So, any yays or nays on the patch?
J
Rasmus Lerdorf wrote:
Hey Jay, sorry, didn't notice this message until after sending out my
previous request. How do you feel about taking over maintaining the
browscap code and splitting it off into its own pecl extension so you are
not restricted by PHP's release schedule?-Rasmus
Splitting it off into PECL would probably be the best thing to do,
especially with PHP 5 looming on the horizon. Gary is somewhat concerned
that after 5 comes out, the problems that exist in the 4_3 branch will
continue to plague him and 4.3.x browscap.ini users, so splitting it off
from the PHP release sounds pretty logical.
I'll volunteer to continue working on it, although hopefully most or all of
the issues concerning the pattern matching should be fixed by my patch. The
parser issues are another matter, but splitting it off into PECL will let
us fix that up without having to worry about a complete 4.3.x release.
So I'm +1 for this. The question is, do we split it off before the impending
4.3.5 release or afterwards? I would imagine that it should be done before.
If 4.3.5 goes out the door, browscap will be compiled in statically, so
that makes having it in PECL harder.
J
Rasmus Lerdorf wrote:
Hey Jay, sorry, didn't notice this message until after sending out my
previous request. How do you feel about taking over maintaining the
browscap code and splitting it off into its own pecl extension so you are
not restricted by PHP's release schedule?-Rasmus
Splitting it off into PECL would probably be the best thing to do,
especially with PHP 5 looming on the horizon. Gary is somewhat concerned
that after 5 comes out, the problems that exist in the 4_3 branch will
continue to plague him and 4.3.x browscap.ini users, so splitting it off
from the PHP release sounds pretty logical.I'll volunteer to continue working on it, although hopefully most or all of
the issues concerning the pattern matching should be fixed by my patch. The
parser issues are another matter, but splitting it off into PECL will let
us fix that up without having to worry about a complete 4.3.x release.So I'm +1 for this. The question is, do we split it off before the impending
4.3.5 release or afterwards? I would imagine that it should be done before.
If 4.3.5 goes out the door, browscap will be compiled in statically, so
that makes having it in PECL harder.
Given the fact that the current implementation in 4.3.4 doesn't actually
work, we might as well simply remove it for 4.3.5 and stick a note in the
documentation for people to do a "pear install browscap" for 4.3.5. In
PHP5 we will have the ability to automatically bundle the latest stable
version of a PECL extension when we do a release which gives us the best
of both worlds. Separate release cycles plus the ability for users to
update their browscap extension without upgrading PHP.
-Rasmus