Hi internals@,
I maintain an extension and I suspect there are some issues in the code. As such, I’ve been trying various tools to try to make it easier to catch the issues. (For the curious: I’ve tried *San, which I feel doesn’t work very well unless you /totally control/ the entire stack, which I didn’t have the luxury of. I also tried Valgrind, but I need ro revisit this to deal with possible false positives in the library.) This time, I decided to try static analysis through LLVM.
Luckily, clang-analyzer is pretty simple. Just prepending “scan-build” to my make invocation. Easy, right? Unfortunately, I noticed that due to an inconsistency in the codebase (a use of realloc instead of erealloc), that it doesn’t seem to account for i.e emalloc vs. malloc. Possible leaks “went away” from the output when I converted them to the PHP memory management functions.
Has anyone ever used clang-analyzer with PHP before? I noticed there was some tooling for a previous PHP transition 1, but I don’t know if anyone’s tackled the low-hanging fruit of memory functions. I suppose I could just redefine emalloc and friends, but I feel that would probably be inaccurate with things like zend_string.
Regards,
Calvin
Hi internals@,
I maintain an extension and I suspect there are some issues in the code. As such, I’ve been trying various tools to try to make it easier to catch the issues. (For the curious: I’ve tried *San, which I feel doesn’t work very well unless you /totally control/ the entire stack, which I didn’t have the luxury of. I also tried Valgrind, but I need ro revisit this to deal with possible false positives in the library.) This time, I decided to try static analysis through LLVM.
Luckily, clang-analyzer is pretty simple. Just prepending “scan-build” to my make invocation. Easy, right? Unfortunately, I noticed that due to an inconsistency in the codebase (a use of realloc instead of erealloc), that it doesn’t seem to account for i.e emalloc vs. malloc. Possible leaks “went away” from the output when I converted them to the PHP memory management functions.
Has anyone ever used clang-analyzer with PHP before? I noticed there was some tooling for a previous PHP transition 1, but I don’t know if anyone’s tackled the low-hanging fruit of memory functions. I suppose I could just redefine emalloc and friends, but I feel that would probably be inaccurate with things like zend_string.
Regards,
Calvin--
To unsubscribe, visit: https://www.php.net/unsub.php
Just to check: are you setting the environment variable USE_ZEND_ALLOC
to 0? This causes the engine to use malloc:
https://heap.space/xref/PHP-7.4/Zend/zend_alloc.c?r=600402d9#2738.
For what it's worth, I was recently annoyed again by valgrind being
so noisy because zend_string_equal_val intentionally reads past the
end of a zend_string. The allocator ensures that memory was allocated,
but it isn't guaranteed to be initialized. We should find some way to
initialize this memory for future releases -- maybe add a function
which null terminates a zend string by adding not 1 null byte but as
many as necessary to reach the end of the allocation. This should be
trivial enough in cost to do, compared to some other solutions like
always zero'ing out the whole memory block or initializing the
trailing bytes at zend_string_alloc time.
Also, I'm not sure this read-past-the-end technique is actually safe,
such as when USE_ZEND_ALLOC is set to zero and we use malloc directly,
which does not make the same guarantees about alignment and padding on
the string...
Nikita pushed up this change only today, but it would theoretically
help with valgrind being used a runtime but not compiled with valgrind
support: https://github.com/php/php-src/commit/a0c44fbaf19841164c7984a6c21b364d391f3750.
I say theoretically only because I haven't tested it yet.
Just to check: are you setting the environment variable USE_ZEND_ALLOC
to 0? This causes the engine to use malloc:
https://heap.space/xref/PHP-7.4/Zend/zend_alloc.c?r=600402d9#2738.For what it's worth, I was recently annoyed again by valgrind being
so noisy because zend_string_equal_val intentionally reads past the
end of a zend_string. The allocator ensures that memory was allocated,
but it isn't guaranteed to be initialized. We should find some way to
initialize this memory for future releases -- maybe add a function
which null terminates a zend string by adding not 1 null byte but as
many as necessary to reach the end of the allocation. This should be
trivial enough in cost to do, compared to some other solutions like
always zero'ing out the whole memory block or initializing the
trailing bytes at zend_string_alloc time.Also, I'm not sure this read-past-the-end technique is actually safe,
such as when USE_ZEND_ALLOC is set to zero and we use malloc directly,
which does not make the same guarantees about alignment and padding on
the string...Nikita pushed up this change only today, but it would theoretically
help with valgrind being used a runtime but not compiled with valgrind
support: https://github.com/php/php-src/commit/a0c44fbaf19841164c7984a6c21b364d391f3750.
I say theoretically only because I haven't tested it yet.
WRT Valgrind: Yes, I do run it with that environment variable, because run-tests sets it for you when using Valgrind. Unfortunately there’s some more false positives, but it’s coming from the library the extension is wrapping. I’d need to take some time to figure out if that’s a false positive, a leak caused by me, or worse, a leak in the library itself. The additional improvements to Valgrind support are welcome though.
I did some additional research and seemed to have decent luck with slamming a construct like this after the includes:
#define emalloc malloc
#define erealloc realloc
#define ecalloc calloc
#define estrdup strdup
#define estrndup strndup
#define efree free
One could wrap it around __clang_analyzer__
or such. Unfortunately, this is kinda ugly; it emits compiler warnings, but I suspect this probably belongs in PHP itself or perhaps in some kind of clang-analyzer addon that handles the e* allocation functions as well as things like mismatching them.
Hi internals@,
I maintain an extension and I suspect there are some issues in the code. As such, I’ve been trying various tools to try to make it easier to catch the issues. (For the curious: I’ve tried *San, which I feel doesn’t work very well unless you /totally control/ the entire stack, which I didn’t have the luxury of. I also tried Valgrind, but I need ro revisit this to deal with possible false positives in the library.) This time, I decided to try static analysis through LLVM.
Luckily, clang-analyzer is pretty simple. Just prepending “scan-build” to my make invocation. Easy, right? Unfortunately, I noticed that due to an inconsistency in the codebase (a use of realloc instead of erealloc), that it doesn’t seem to account for i.e emalloc vs. malloc. Possible leaks “went away” from the output when I converted them to the PHP memory management functions.
Has anyone ever used clang-analyzer with PHP before? I noticed there was some tooling for a previous PHP transition 1, but I don’t know if anyone’s tackled the low-hanging fruit of memory functions. I suppose I could just redefine emalloc and friends, but I feel that would probably be inaccurate with things like zend_string.
Regards,
Calvin--
To unsubscribe, visit: https://www.php.net/unsub.php
I did some additional research and seemed to have decent luck with
slamming a construct like this after the includes:#define emalloc malloc
#define erealloc realloc
#define ecalloc calloc
#define estrdup strdup
#define estrndup strndup
#define efree freeOne could wrap it around
__clang_analyzer__
or such. Unfortunately, this
is kinda ugly; it emits compiler warnings, but I suspect this probably
belongs in PHP itself or perhaps in some kind of clang-analyzer addon that
handles the e* allocation functions as well as things like mismatching them.
Isn't this handled by the GCC 11?
The existing malloc
https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Common-Function-Attributes.html#index-malloc-function-attribute
attribute has been extended so that it can be used to identify
allocator/deallocator API pairs. A pair of new -Wmismatched-dealloc
https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html#index-Wmismatched-dealloc
and -Wmismatched-new-delete
https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/C_002b_002b-Dialect-Options.html#index-Wmismatched-new-delete
warnings will complain about mismatched calls, and -Wfree-nonheap-object
https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html#index-Wfree-nonheap-object
about deallocation calls with pointers not obtained from allocation
functions. Additionally, the static analyzer will use these attributes when
checking for leaks, double-frees, use-after-frees, and similar issues.
From: https://gcc.gnu.org/gcc-11/changes.html
Best regards,
George P. Banyard
Good point, but I'm still on GCC 10 in Fedora. I haven't tried its
static analyzer yet either. That said, I don't know how GCC's analyzer
handles PHP's ewhateveralloc functions - that is, if it'd do better
than LLVM.
Isn't this handled by the GCC 11?