Hi!
I've tried to fix https://bugs.php.net/72793, and it looks like
there's a general GC issue wrt. to resources referencing objects and
vice versa. Aren't resource ZVALS put in the root buffer?
See https://3v4l.org/JYIQs, which demonstrates the issue better than
the test script in the bug report. A steady increase of allocated
memory can be seen, even though gc_collect_cycles()
is called. When
uncommenting unset($this->parser);
, everything is fine (the GC
wouldn't be involved at all in this case).
Wrt. to the PHP 5.6 behavior: this appears fine, but actually it's in
error, because of https://github.com/php/php-src/commit/72ec2e8f. Not
increasing the refcount of parser->object
might theoretically lead to
use-after-free scenarios.
--
Christoph M. Becker
On Wed, Aug 10, 2016 at 12:52 PM, Christoph M. Becker cmbecker69@gmx.de
wrote:
Hi!
I've tried to fix https://bugs.php.net/72793, and it looks like
there's a general GC issue wrt. to resources referencing objects and
vice versa. Aren't resource ZVALS put in the root buffer?See https://3v4l.org/JYIQs, which demonstrates the issue better than
the test script in the bug report. A steady increase of allocated
memory can be seen, even thoughgc_collect_cycles()
is called. When
uncommentingunset($this->parser);
, everything is fine (the GC
wouldn't be involved at all in this case).Wrt. to the PHP 5.6 behavior: this appears fine, but actually it's in
error, because of https://github.com/php/php-src/commit/72ec2e8f. Not
increasing the refcount ofparser->object
might theoretically lead to
use-after-free scenarios.
As you correctly deduced, we currently do not support GC for resources.
This would require introducing something akin to the get_gc handler for
resources.
The simplest way to fix ext/xml in particular is probably to migrate it to
use an object instead of a resource.
Nikita
2016-08-10 12:59 GMT+02:00 Nikita Popov nikita.ppv@gmail.com:
On Wed, Aug 10, 2016 at 12:52 PM, Christoph M. Becker cmbecker69@gmx.de
wrote:
The simplest way to fix ext/xml in particular is probably to migrate it to
use an object instead of a resource.Nikita
If my memory strikes me right, then we did do this for ext/gmp too a
while back for the operator overloading (I even think it was you that
did that?). Maybe it would be time to evaluate the situation in 7.2,
for extensions that still use resources[1] that could be affected by
similar issues and adapt them hereafter.
Looking at the code we also got this "special" resource, which is not
exposed to the userland, called an "index pointer"[2], which only a
few extensions are using (interbase, oci8, odbc & pgsql), is this a
hack?
[1] https://gist.github.com/KalleZ/df08b99efc7c7166cfa36aab2dbe8160
[2] https://gist.github.com/KalleZ/b5b8d47f3d523c9e806188c85aa10c68
--
regards,
Kalle Sommer Nielsen
kalle@php.net
2016-08-10 12:59 GMT+02:00 Nikita Popov nikita.ppv@gmail.com:
On Wed, Aug 10, 2016 at 12:52 PM, Christoph M. Becker cmbecker69@gmx.de
wrote:
The simplest way to fix ext/xml in particular is probably to migrate it to
use an object instead of a resource.If my memory strikes me right, then we did do this for ext/gmp too a
while back for the operator overloading (I even think it was you that
did that?). Maybe it would be time to evaluate the situation in 7.2,
for extensions that still use resources[1] that could be affected by
similar issues and adapt them hereafter.
I would welcome that, but it appears to be a lot of work.
BTW: ext/mcrypt is deprecated as of PHP 7.1 so I don't think a rework is
apt. Furthermore, we may consider to deprecate some other extensions
(stas has a draft RFC for that). IIRC, ext/odbc/birdstep doesn't even
work/is not used.
--
Christoph M. Becker
On Wed, Aug 10, 2016 at 12:52 PM, Christoph M. Becker cmbecker69@gmx.de
wrote:I've tried to fix https://bugs.php.net/72793, and it looks like
there's a general GC issue wrt. to resources referencing objects and
vice versa. Aren't resource ZVALS put in the root buffer?See https://3v4l.org/JYIQs, which demonstrates the issue better than
the test script in the bug report. A steady increase of allocated
memory can be seen, even thoughgc_collect_cycles()
is called. When
uncommentingunset($this->parser);
, everything is fine (the GC
wouldn't be involved at all in this case).Wrt. to the PHP 5.6 behavior: this appears fine, but actually it's in
error, because of https://github.com/php/php-src/commit/72ec2e8f. Not
increasing the refcount ofparser->object
might theoretically lead to
use-after-free scenarios.As you correctly deduced, we currently do not support GC for resources.
This would require introducing something akin to the get_gc handler for
resources.
Thanks for the clarification!
The simplest way to fix ext/xml in particular is probably to migrate it to
use an object instead of a resource.
I guess that is the best action in the long run (i.e. 7.2+), but that
would of course cause a BC break, so probably it's best to document that
the user has to break the cycle manually, if xml_set_object()
is used
(7.0, 7.1).
Wrt. to PHP 5.6 it appears there are no issues at all. While the code
out-commented by Thies is still there (which should be removed to avoid
further confusion), in the following a (shallow) copy of the object is
made, and apparently due to the additional refcount in the object store
of PHP 5, everything is okay.
Another alternative for PHP 7.2 might be to drop (after deprecation)
xml_set_object()
; actually, it seems to me this function is a relict of
pre PHP 5. Cf. example #1[1]. Instead of doing:
xml_set_object($this->parser, $this);
xml_set_element_handler($this->parser, "tag_open", "tag_close");
one could simply do:
xml_set_element_handler($this->parser,
[$this, "tag_open"], [$this, "tag_close]);
[1]
http://php.net/manual/en/function.xml-set-object.php#refsect1-function.xml-set-object-examples
--
Christoph M. Becker
The simplest way to fix ext/xml in particular is probably to migrate it to
use an object instead of a resource.I guess that is the best action in the long run (i.e. 7.2+), but that
would of course cause a BC break, so probably it's best to document that
the user has to break the cycle manually, ifxml_set_object()
is used
(7.0, 7.1).Wrt. to PHP 5.6 it appears there are no issues at all. While the code
out-commented by Thies is still there (which should be removed to avoid
further confusion), in the following a (shallow) copy of the object is
made, and apparently due to the additional refcount in the object store
of PHP 5, everything is okay.Another alternative for PHP 7.2 might be to drop (after deprecation)
xml_set_object()
; actually, it seems to me this function is a relict of
pre PHP 5. Cf. example #1[1]. Instead of doing:xml_set_object($this->parser, $this);
xml_set_element_handler($this->parser, "tag_open", "tag_close");one could simply do:
xml_set_element_handler($this->parser,
[$this, "tag_open"], [$this, "tag_close]);[1]
http://php.net/manual/en/function.xml-set-object.php#refsect1-function.xml-set-object-examples
It does not have to be a BC if we simply introduce a new API for this
extension and keep the old one as is. The whole set of functions just
cries out loud for a class that encapsulates the state of the parser and
both the utf8_* functions simply do not belong in this extension.
This new API can be introduced in any PHP minor along with a deprecation
and removal (move to PECL and abandon) of the resource based things in
PHP 8.
--
Richard "Fleshgrinder" Fussenegger
The simplest way to fix ext/xml in particular is probably to migrate it to
use an object instead of a resource.I guess that is the best action in the long run (i.e. 7.2+), but that
would of course cause a BC break, so probably it's best to document that
the user has to break the cycle manually, ifxml_set_object()
is used
(7.0, 7.1).Wrt. to PHP 5.6 it appears there are no issues at all. While the code
out-commented by Thies is still there (which should be removed to avoid
further confusion), in the following a (shallow) copy of the object is
made, and apparently due to the additional refcount in the object store
of PHP 5, everything is okay.Another alternative for PHP 7.2 might be to drop (after deprecation)
xml_set_object()
; actually, it seems to me this function is a relict of
pre PHP 5. Cf. example #1[1]. Instead of doing:xml_set_object($this->parser, $this);
xml_set_element_handler($this->parser, "tag_open", "tag_close");one could simply do:
xml_set_element_handler($this->parser,
[$this, "tag_open"], [$this, "tag_close]);[1]
http://php.net/manual/en/function.xml-set-object.php#refsect1-function.xml-set-object-examplesIt does not have to be a BC if we simply introduce a new API for this
extension and keep the old one as is. The whole set of functions just
cries out loud for a class that encapsulates the state of the parser and
both the utf8_* functions simply do not belong in this extension.
That might be a good idea, even if we already have the XMLReader. And
yes, the utf8_* functions don't really belong into ext/xml; frankly, I
think they don't belong anywhere – we already have iconv, mbstring and
recode which offer a subset of utf8_decode()
/utf8_encode. OTOH,
utf8_decode()
/_encode() appear to be often used (IIRC there are a lot of
user notes suggesting to use these functions).
--
Christoph M. Becker
Maybe it's a good idea to move them to another ext, and then move xmlparser
back to PECL ?
2016-08-10 20:23 GMT+02:00 Christoph M. Becker cmbecker69@gmx.de:
The simplest way to fix ext/xml in particular is probably to migrate
it to
use an object instead of a resource.I guess that is the best action in the long run (i.e. 7.2+), but that
would of course cause a BC break, so probably it's best to document that
the user has to break the cycle manually, ifxml_set_object()
is used
(7.0, 7.1).Wrt. to PHP 5.6 it appears there are no issues at all. While the code
out-commented by Thies is still there (which should be removed to avoid
further confusion), in the following a (shallow) copy of the object is
made, and apparently due to the additional refcount in the object store
of PHP 5, everything is okay.Another alternative for PHP 7.2 might be to drop (after deprecation)
xml_set_object()
; actually, it seems to me this function is a relict of
pre PHP 5. Cf. example #1[1]. Instead of doing:xml_set_object($this->parser, $this);
xml_set_element_handler($this->parser, "tag_open", "tag_close");one could simply do:
xml_set_element_handler($this->parser,
[$this, "tag_open"], [$this, "tag_close]);[1]
<http://php.net/manual/en/function.xml-set-object.php#
refsect1-function.xml-set-object-examples>It does not have to be a BC if we simply introduce a new API for this
extension and keep the old one as is. The whole set of functions just
cries out loud for a class that encapsulates the state of the parser and
both the utf8_* functions simply do not belong in this extension.That might be a good idea, even if we already have the XMLReader. And
yes, the utf8_* functions don't really belong into ext/xml; frankly, I
think they don't belong anywhere – we already have iconv, mbstring and
recode which offer a subset ofutf8_decode()
/utf8_encode. OTOH,
utf8_decode()
/_encode() appear to be often used (IIRC there are a lot of
user notes suggesting to use these functions).--
Christoph M. Becker--
--
pozdrawiam
Michał Brzuchalski
That might be a good idea, even if we already have the XMLReader. And
yes, the utf8_* functions don't really belong into ext/xml; frankly, I
think they don't belong anywhere – we already have iconv, mbstring and
recode which offer a subset ofutf8_decode()
/utf8_encode. OTOH,
utf8_decode()
/_encode() appear to be often used (IIRC there are a lot of
user notes suggesting to use these functions).
utf8_decode()
/utf8_encode are, at best, extremely misleading names. Many
uses of them in my experience go something like this: "I have an
encoding problem, it's something to do with UTF-8, I'll try utf8_encode;
hm, that didn't work, I'll try utf8_decode instead".
I hadn't even realised they're in the same extension as the equally
misunderstood xml_parse_into_struct()
which exposes a somewhat baffling
intermediate parsing structure which is incredibly hard to work with. It
mostly seems to be used to reinvent event-based parsing, which is
already the main facility offered by that extension. (I just looked at
the example on the manual page; it's bizarre.)
I'm not sure if we need yet another XML parsing extension, or just
sign-post that we already have better ones for most tasks. Maybe
XMLReader could be expanded to attach callbacks for passed nodes, and
have a "readAll" method, so that it acted as an event-based parser?
Regards,
Rowan Collins
[IMSoP]
That might be a good idea, even if we already have the XMLReader. And
yes, the utf8_* functions don't really belong into ext/xml; frankly, I
think they don't belong anywhere – we already have iconv, mbstring and
recode which offer a subset ofutf8_decode()
/utf8_encode. OTOH,
utf8_decode()
/_encode() appear to be often used (IIRC there are a lot of
user notes suggesting to use these functions).
utf8_decode()
/utf8_encode are, at best, extremely misleading names. Many
uses of them in my experience go something like this: "I have an
encoding problem, it's something to do with UTF-8, I'll try utf8_encode;
hm, that didn't work, I'll try utf8_decode instead".
You're dead on, AFAICT.
I hadn't even realised they're in the same extension as the equally
misunderstoodxml_parse_into_struct()
which exposes a somewhat baffling
intermediate parsing structure which is incredibly hard to work with. It
mostly seems to be used to reinvent event-based parsing, which is
already the main facility offered by that extension. (I just looked at
the example on the manual page; it's bizarre.)
xml_parse_into_struct()
is, in my opinion, also a legacy to better get
rid of; we have DOM and SimpleXML for this.
I'm not sure if we need yet another XML parsing extension, or just
sign-post that we already have better ones for most tasks. Maybe
XMLReader could be expanded to attach callbacks for passed nodes, and
have a "readAll" method, so that it acted as an event-based parser?
I wouldn't add that to XMLReader, but rather introduce a new abstract
class, say SAXParser. I don't mind in which extension this class would
live.
--
Christoph M. Becker
That might be a good idea, even if we already have the XMLReader. And
yes, the utf8_* functions don't really belong into ext/xml; frankly, I
think they don't belong anywhere – we already have iconv, mbstring and
recode which offer a subset ofutf8_decode()
/utf8_encode. OTOH,
utf8_decode()
/_encode() appear to be often used (IIRC there are a lot of
user notes suggesting to use these functions).
utf8_decode()
/utf8_encode are, at best, extremely misleading names. Many
uses of them in my experience go something like this: "I have an
encoding problem, it's something to do with UTF-8, I'll try utf8_encode;
hm, that didn't work, I'll try utf8_decode instead".
You're dead on, AFAICT.
I hadn't even realised they're in the same extension as the equally
misunderstoodxml_parse_into_struct()
which exposes a somewhat baffling
intermediate parsing structure which is incredibly hard to work with. It
mostly seems to be used to reinvent event-based parsing, which is
already the main facility offered by that extension. (I just looked at
the example on the manual page; it's bizarre.)
xml_parse_into_struct()
is, in my opinion, also a legacy to better get
rid of; we have DOM and SimpleXML for this.
I'm not sure if we need yet another XML parsing extension, or just
sign-post that we already have better ones for most tasks. Maybe
XMLReader could be expanded to attach callbacks for passed nodes, and
have a "readAll" method, so that it acted as an event-based parser?
I wouldn't add that to XMLReader, but rather introduce a new abstract
class, say SAXParser. I don't mind in which extension this class would
exist.
--
Christoph M. Becker