I was going through failing tests today and noticed some which failed
because of memory leaks using code like the following:
RETURN_RT_STRING(s, 0);
}
For non-unicode mode this is functionally identical to:
RETURN_STRING(s, 0);
}
However for unicode mode, this binary value is translated to unicode using
(effectively) this:
{
UChar *u;
int u_len;
UErrorCode status = U_ZERO_ERROR;
zend_convert_to_string(ZEND_U_CONVERTER(UG(runtime_encoding_conv)),
&u, &u_len, s, strlen(s), &status);
RETURN_UNICODE(u, 0);
}
}
The trouble with this, is that the original s value winds up getting leaked.
I noticed a few other spots in the source where this is handled as such:
RETVAL_RT_STRING(s, 0);
if (UG(unicode)) {
efree(s);
}
return;
}
Which is, of course, perfectly valid, but it feels a bit cludgy to me as
it's a little inconsistent with the normal RETURN_STRING()/RETURN_UNICODE()
semantics when it comes to "giving away" the variable. It's also a problem
for the RETURN_RT_STRING(s, 0); usage in general as there's no point after
the macro to free the original var.
I see two solutions:
(1) Modify the RETURN_RT_STRING(L)() macros to the following:
#define RETURN_RT_STRING(t, duplicate)
{ RETVAL_RT_STRING(t, duplicate); if (duplicate && UG(unicode)) efree(t);
return; }
(2) Modify ZVAL_U_STRINGL() to:
#define ZVAL_U_STRINGL(conv, z, s, l, duplicate, auto_free)
if (UG(unicode)) {
UErrorCode status = U_ZERO_ERROR;
UChar *u_str;
int u_len;
zend_convert_to_unicode(conv, &u_str, &u_len, s, l, &status);
if (auto_free && !duplicate) {
efree(s);
}
ZVAL_UNICODEL(z, u_str, u_len, 0);
} else {
char *__s=(s); int __l=l;
Z_STRLEN_P(z) = __l;
Z_STRVAL_P(z) = (duplicate?estrndup(__s, __l):__s);
Z_TYPE_P(z) = IS_STRING;
}
Along with changes to the (ZVAL|RETVAL)RT_STRING() macros: to support the
additional auto_free option (with RETURN* assuming auto_free=1).
The first option solves the one real problem with the current implementation
by making the uncatchable RETURN_RT_STRING() macros free the original char*
as needed, whereas the latter takes that work away from the calling scope at
the cost of adding to the proto and requiring more work to go back and clean
up existing uses.
Thoughts?
-Sara
Hi Sara,
Your first solution will not work. String passed to ZVAL_RETURN_RT_STRING()
may be not allocated by emalloc().
The second solution will work.
ZVAL_RETURN_RT_STRINGL(str, len, duplicate) -> ZVAL_RETURN_RT_STRINGL(str,
len, duplicate, auto_free)
- It is possible to reuse "duplicate" argument
0 - don't duplicate
1 - duplicate
2 - duplicate and free
Thanks. Dmitry.
-----Original Message-----
From: Sara Golemon [mailto:pollita@php.net]
Sent: Monday, April 03, 2006 4:07 AM
To: internals@lists.php.net
Subject: [PHP-DEV] RETURN_RT_STRING() and family leakageI was going through failing tests today and noticed some which failed
because of memory leaks using code like the following:RETURN_RT_STRING(s, 0);
}For non-unicode mode this is functionally identical to:
RETURN_STRING(s, 0);
}However for unicode mode, this binary value is translated to
unicode using
(effectively) this:{
UChar *u;
int u_len;
UErrorCode status = U_ZERO_ERROR;zend_convert_to_string(ZEND_U_CONVERTER(UG(runtime_encoding_conv)),
&u, &u_len, s, strlen(s), &status);
RETURN_UNICODE(u, 0);
}
}The trouble with this, is that the original s value winds up
getting leaked.
I noticed a few other spots in the source where this is
handled as such:RETVAL_RT_STRING(s, 0);
if (UG(unicode)) {
efree(s);
}
return;
}Which is, of course, perfectly valid, but it feels a bit
cludgy to me as
it's a little inconsistent with the normal
RETURN_STRING()/RETURN_UNICODE()
semantics when it comes to "giving away" the variable. It's
also a problem
for the RETURN_RT_STRING(s, 0); usage in general as there's
no point after
the macro to free the original var.I see two solutions:
(1) Modify the RETURN_RT_STRING(L)() macros to the following:
#define RETURN_RT_STRING(t, duplicate)
{ RETVAL_RT_STRING(t, duplicate); if (duplicate &&
UG(unicode)) efree(t);
return; }(2) Modify ZVAL_U_STRINGL() to:
#define ZVAL_U_STRINGL(conv, z, s, l, duplicate, auto_free)
if (UG(unicode)) {
UErrorCode status = U_ZERO_ERROR;
UChar *u_str;
int u_len;
zend_convert_to_unicode(conv, &u_str, &u_len, s, l,
&status);
if (auto_free && !duplicate) {
efree(s);
}
ZVAL_UNICODEL(z, u_str, u_len, 0);
} else {
char *__s=(s); int __l=l;
Z_STRLEN_P(z) = __l;
Z_STRVAL_P(z) = (duplicate?estrndup(__s, __l):__s);
Z_TYPE_P(z) = IS_STRING;
}Along with changes to the (ZVAL|RETVAL)RT_STRING() macros:
to support the
additional auto_free option (with RETURN* assuming auto_free=1).The first option solves the one real problem with the current
implementation
by making the uncatchable RETURN_RT_STRING() macros free the
original char*
as needed, whereas the latter takes that work away from the
calling scope at
the cost of adding to the proto and requiring more work to go
back and clean
up existing uses.Thoughts?
-Sara
Your first solution will not work. String passed to
ZVAL_RETURN_RT_STRING()
may be not allocated by emalloc().
duplicate should only ever be set to 0 on this (or any of the macros) when
the string is allocated with emalloc. Otherwise the enegine would get in
trouble freeing it later on. I did just notice that I paste the wrong
version into my post though.... it should have been: if (!duplicate &&
UG(unicode) ...
#define RETURN_RT_STRING(t, duplicate)
{ RETVAL_RT_STRING(t, duplicate); if (!duplicate && UG(unicode)) efree(t);
return; }
The second solution will work.
ZVAL_RETURN_RT_STRINGL(str, len, duplicate) ->
ZVAL_RETURN_RT_STRINGL(str, len, duplicate, auto_free)
There's one other we came up with:
Leave existing protos as is, having them assume auto-free when duplicate==0
(There is no issue when duplicate==1).
Create an ad=ditional set of macros: (ZVAL|RETVAL)_RT_STRINGL_NOFREE(str,
len) to be used when duplication (for the sake of owning the buffer) is not
needed (because it's emalloc'd), but where (str) should not be freeded even
in the eventuality that it's converted into a new buffer as unicode
contents.
This gives that edge 10% the ability to reuse (str) after populating it into
the zval. A RETURN variant would be silly here as
RETURN_RT_STRING_NOFREE(str) would be guaranteed to leak in unicode mode.
(It converts into a new buffer then abandons the old one).
- It is possible to reuse "duplicate" argument
0 - don't duplicate
1 - duplicate
2 - duplicate and free
Andrei and I tossed this around last night (and actually it's "don't
duplicate and free" since the logic leading to the need for an auto_free
assumes that the original string should not have been copied but the unicode
conversion demanded that it was). The trouble with this approach is that
it's terribly inconsistent with other ZVAL/RETVAL/RETURN macros in use
everywhere else. e.g. duplicate has always been a binary value, not a
trinary one.
-Sara