What is the point of the [allowable_tags] parameter in strip_tags()?
According to the docs, "You can use the optional second parameter to specify
tags which should not be stripped".
OK. Suppose I have a PHP guestbook and use strip_tags() to filter out all
tags except <b>, <i> and <u> in users' messages. Then a "cool hacker"
enters the following string in my guestbook:
<b style="position:absolute;top:0px;left:0px;font-size:10em"
onmouseover="alert("you have been fu*ked!")">
THE MATRIX HAS YOU :)
</b>
I see the following solutions to the problem:
- strip ALL tags by hand. The current version of strip_tags() cannot be
used for this operation. See the explanation below.
- use "pseudotags" like BBCode in phpBB
- do not strip any tags, but call htmlspecialchars() before output
- write a new strip_tags(), which must strip all tags and cut ANY chars
after the names of allowable tags. In the example above it must leave:
<b>
THE MATRIX HAS YOU :)
</b>
Please propose any other way if you know one.
Which way is better? The last one, in my opinion.
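To make the last option concrete, here is a minimal sketch in C. It is not the patch posted later in this thread: the names strip_tags_no_attrs and tag_allowed are hypothetical, the allow list is assumed to be lowercase and written as "<b><i><u>", every '<' is treated as the start of a tag, and comments, PHP blocks and <script> contents get no special handling.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the lowercase tag name of length n
   appears as "<name>" in the allow list, e.g. "<b><i><u>". */
static int tag_allowed(const char *allow, const char *name, size_t n)
{
    const char *p = allow;
    while ((p = strchr(p, '<')) != NULL) {
        p++;
        if (strncmp(p, name, n) == 0 && p[n] == '>') return 1;
    }
    return 0;
}

/* Hypothetical sketch: strips tags in place. An allowed tag is re-emitted
   as a bare <name> or </name>, so all of its attributes are dropped.
   Returns the new length of buf. */
size_t strip_tags_no_attrs(char *buf, const char *allow)
{
    char *src = buf, *dst = buf;

    while (*src != '\0') {
        char name[32];
        size_t n = 0;
        int closing = 0, closed = 0, too_long = 0;
        char quote = '\0';

        if (*src != '<') {          /* ordinary text is copied as is */
            *dst++ = *src++;
            continue;
        }
        src++;                      /* skip '<' */
        if (*src == '/') { closing = 1; src++; }
        while (isalnum((unsigned char) *src)) {
            if (n < sizeof(name) - 1) name[n++] = (char) tolower((unsigned char) *src);
            else too_long = 1;
            src++;
        }
        /* skip attributes; a '>' inside a quoted value does not end the tag */
        while (*src != '\0' && !closed) {
            if (quote != '\0') { if (*src == quote) quote = '\0'; }
            else if (*src == '"' || *src == '\'') quote = *src;
            else if (*src == '>') closed = 1;
            src++;
        }
        /* unterminated, unnamed, overlong and disallowed tags are dropped */
        if (closed && n > 0 && !too_long && tag_allowed(allow, name, n)) {
            *dst++ = '<';
            if (closing) *dst++ = '/';
            memcpy(dst, name, n);
            dst += n;
            *dst++ = '>';
        }
    }
    *dst = '\0';
    return (size_t) (dst - buf);
}
```

Called on a writable buffer holding the guestbook string with the allow list "<b><i><u>", this sketch keeps only the bare <b> and </b> around the text. The output is never longer than the input, which is why the rewrite can be done in place.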
And now I'll show some examples that demonstrate the wrong behavior
of the current version of strip_tags():
- <b onclick="if (1 > 2) alert('WOW!')">the bold string</b>
- <b onclick="if (1 < 2) alert('YES!')">the hidden string</b>
- <!-- <<< the cool comment <<< -->any HTML after the HTML-comment will be stripped.
- <?='?>'?>test
The list could be continued...
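The first two examples hinge on '<' and '>' appearing inside quoted attribute values. The following sketch in C contrasts a scanner that stops at the first '>' with one that tracks quotes. Both function names are hypothetical, and nothing here is a claim about how the current strip_tags() is implemented internally.

```c
#include <stddef.h>
#include <string.h>

/* Naive approach: the first '>' after '<' is taken as the end of the tag. */
const char *naive_tag_end(const char *tag)
{
    return strchr(tag, '>');
}

/* Quote-aware approach: a '>' inside a quoted attribute value is skipped.
   Returns NULL if the tag is never closed. */
const char *quoted_tag_end(const char *tag)
{
    char quote = '\0';
    const char *p;

    for (p = tag; *p != '\0'; p++) {
        if (quote != '\0') {
            if (*p == quote) quote = '\0';   /* closing quote found */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;                      /* opening quote */
        } else if (*p == '>') {
            return p;                        /* real end of the tag */
        }
    }
    return NULL;
}
```

For the first example string, naive_tag_end() stops at the '>' of "1 > 2", so the rest of the onclick attribute would be treated as text, while quoted_tag_end() stops at the '>' that actually closes the <b> tag.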
--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
Alexander Valyalkin wrote:
What is the point of the [allowable_tags] parameter in strip_tags()?
According to the docs, "You can use the optional second parameter to specify
tags which should not be stripped".
strip_tags alone is indeed not enough to make sure the input is safe to
display inside your web page. But even if you'd remove all attributes
from the tags you still have the problem that you're not checking if the
input is valid html.
I once wrote a userland function which a) validated the input to ensure
xml conformance and b) stripped all but certain tags/attributes
combinations. Requires input to be xhtml but makes the tests much easier.
But to be honest I think most of the time it is much better to simply
disallow HTML and use htmlspecialchars() on user input.
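As an illustration of the escaping approach, here is a minimal sketch in C of what such a function does. The name html_escape is hypothetical, and unlike htmlspecialchars() with its default flags this sketch always escapes single quotes as well.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src in which
   & < > " ' are replaced by HTML entities, or NULL if allocation fails.
   The caller frees the result. */
char *html_escape(const char *src)
{
    size_t len = strlen(src);
    /* worst case: every byte becomes a 6-byte entity such as &quot; */
    char *out = malloc(len * 6 + 1);
    char *dst = out;

    if (out == NULL) return NULL;
    for (; *src != '\0'; src++) {
        const char *rep = NULL;
        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#039;"; break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(dst, rep, n);
            dst += n;
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
    return out;
}
```

After this step no user-supplied character can open a tag or close an attribute value, so the question of which tags and attributes are safe does not arise at all.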
I'd leave strip_tags as it is, it's still useful in some cases where you
want to strip certain information for internal processing but you don't
include its output in a webpage.
- Chris
On Thu, 10 Jun 2004 16:33:13 +0300, Alexander Valyalkin valyala@tut.by
wrote:
Today I wrote a new version of strip_tags().
It is not ideal, but it is much better than the current version.
Below is my complete version of strip_tags() with test cases. You can add
or change any test cases and compare the speed and results of the current
strip_tags() with mine.
Sorry, but I stripped out the majority of the comments, because they were
in Russian :)
====================cut====================
/**************************************************/
/* test strings */
/**************************************************/
char *s[] = {
    "", /* empty string */
    "a", /* one character */
    "<", /* single < char */
    "ab", /* two chars */
    "test<b", /* incomplete tag */
    "test<b title='asdf ", /* incomplete single quotes */
    "test<b title=\"add", /* incomplete double quotes */
    "test<!-- sdf ", /* incomplete comment */
    "test<? echo 'hello' ", /* incomplete php-tag */
    "test<% echo 'hello' ", /* incomplete asp-tag */
    "test<% $a = 'ss ", /* incomplete php-string in single quotes */
    "test<?php $a = \"12\\\"3", /* incomplete php-string in double quotes */
    "test<? // comment", /* incomplete single-line comment */
    "test<? # comment ", /* incomplete single-line comment */
    "test<? /* comment\n** 23", /* incomplete multi-line comment */
    "test<? $a = `ls -l", /* incomplete quotes */
    "test<? $a=<<<FOO\nssdf", /* incomplete HEREDOC */
    "test<script>if (1<b) alert('<b>ee</b>');", /* incomplete <script> tag */
    "test<StYle>div {font-weight:bold; }", /* incomplete <style> tag */
    "a< b", /* not a tag */
    "t<b>es</b>t", /* simple test */
    "te<b title='1 > 2' />st",
    "<b title=\"1 > 2\">test",
    "t<b title='1 < 2'/>est",
    "tes<b title = qwe'rt>t",
    "t<!-- <<< comment <<< -->est",
    "<!-- >>> <b>comment</b> <<< -->test",
    "t<? echo '?>' ?>est",
    "<?='\"a\\'b' ?>test",
    "te<% $a = \"?>'%>\"; // comment1\n // comment2 %>st",
    "t<?php \n # here is comment ?>est",
    "te<?=\"dd\\\"d'?>d%>d\" ?>st",
    "<?php $a = <<<END\n`t's\"q?>t\nEND;\n ?>test",
    "tes<? /* co'm\\m\\\"ne\\\"t \n multi line \n */ test ?>t",
    "<? print `sd\a'd\\\"d`; ?>test",
    "t<scrIpT type =\n 'text/javascript'>if (a <b ) alert('hello')</ScRipt>est",
    "te<sTyLe>#a { color: red; }</style>st",
    "t<?xml version=\"1.0\"?>est",
    "<?xml version=\"1.'0\\\" encoding='UT\"F-8\\' ?>test",
    NULL
};
char *allow = "<a><b><c>"; /* allowable tags */
/**************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define PHPAPI
#define PHP_MAX_HEREDOC_LEN 32
#define PHP_MAX_TAG_LEN 32
void php_tag_find(char *allow, size_t allow_len, char *tag_name_begin,
    char *tag_name_ptr, char *src_ptr, char **dst_ptr) {
    size_t tag_len;
    char *tmp_ptr;
    size_t is_end_tag = 0;
    tag_len = tag_name_ptr - tag_name_begin;
    if (*tag_name_begin == '/' && tag_len > 1) is_end_tag = 1;
    *tag_name_ptr = '\0';
    if (allow_len < 3 || src_ptr - *dst_ptr <= tag_len || tag_len < 1)
        return;
    tmp_ptr = strstr(allow + 1, is_end_tag ? tag_name_begin + 1 :
        tag_name_begin);
    if (tmp_ptr != NULL
        && tmp_ptr + tag_len - is_end_tag < allow
        + allow_len &&
        *(tmp_ptr + tag_len - is_end_tag) == '>' && *(tmp_ptr - 1) == '<')
    {
        *(*dst_ptr)++ = '<';
        memcpy(*dst_ptr, tag_name_begin, tag_len);
        *dst_ptr += tag_len;
        if (*(src_ptr - 2) == ' ') *(*dst_ptr)++ = ' ';
        if (*(src_ptr - 1) == '/') *(*dst_ptr)++ = '/';
        *(*dst_ptr)++ = '>';
    }
}
PHPAPI size_t php_strip_all_tags(char *rbuf, int len, int *stateptr, char
    *allow, int allow_len)
{
    char *src_begin = rbuf,
        *src_ptr = rbuf,
        *src_end = rbuf + (size_t) len;
    char *dst_ptr = rbuf;
    int state;
    if (stateptr != NULL) state = *stateptr;
    else state = 0;
    static char tag_name_begin[PHP_MAX_TAG_LEN + 1],
        *tag_name_ptr = NULL,
        *tag_name_end = NULL;
    size_t tag_len = 0;
    if (tag_name_ptr == NULL) tag_name_ptr = tag_name_begin;
    if (tag_name_end == NULL) tag_name_end = tag_name_begin
        + PHP_MAX_TAG_LEN;
    static char heredoc_name_begin[PHP_MAX_HEREDOC_LEN + 1],
        *heredoc_name_ptr = NULL,
        *heredoc_name_end = NULL;
    if (heredoc_name_ptr == NULL) heredoc_name_ptr = heredoc_name_begin;
    if (heredoc_name_end == NULL) heredoc_name_end = heredoc_name_begin
        + PHP_MAX_HEREDOC_LEN;
    char ch;
    while (src_ptr < src_end) {
        ch = *src_ptr;
        switch (ch) {
        case '#' :
            switch (state) {
            case 4 : state = 18; break;
            }
            break;
        case '-' :
            switch (state) {
            case 10 :
                if ((src_ptr - src_begin) > 2 && *(src_ptr - 1) == '-' &&
                    *(src_ptr - 2) == '!' &&
                    *(src_ptr - 3) == '<') state = 9;
                break;
            }
            break;
        case '\r' :
        case '\n' :
            switch (state) {
            case 8 :
            case 18 : state = 4; break;
            case 10 : state = 1; break;
            }
            break;
        case ' ' :
        case '\t' :
        case '\v' :
        case '\f' :
            switch (state) {
            case 10 : state = 1; break;
            }
            break;
        case '*' :
            switch (state) {
            case 4 : if (*(src_ptr - 1) == '/') state = 7; break;
            }
            break;
        case '/' :
            switch (state) {
            case 4 : if (*(src_ptr - 1) == '/') state = 8; break;
            case 10 : if (*(src_ptr - 1) != '<') state = 1; break;
            }
            break;
        case '\\' :
            switch (state) {
            case 5 :
            case 6 :
            case 17 :
                if (src_ptr < src_end) src_ptr++; break;
            }
            break;
        case '%' :
        case '?' :
            switch (state) {
            case 10 :
                if (*(src_ptr - 1) == '<' && src_ptr + 1 < src_end &&
                    tolower(*(src_ptr + 1)) != 'x') {
                    if (tag_name_ptr < tag_name_end) *tag_name_ptr++ = ch;
                    state = 4;
                }
                break;
            }
            break;
        case '<' :
            switch (state) {
            case 0 : if (src_end - src_ptr > 1 && !isspace(*(src_ptr
                + 1))) state = 13; break;
            case 4 :
                if ((src_ptr + 2 < src_end) && *(src_ptr + 1) == '<' &&
                    *(src_ptr + 2) == '<') {
                    state = 15;
                    src_ptr += 2;
                }
                break;
            }
            break;
        case '>' :
            switch (state) {
            case 4 :
            case 8 :
            case 18 : if (*(src_ptr - 1) == *tag_name_begin) state = 14;
                break;
            case 1 :
            case 10 :
                if (tag_name_ptr - tag_name_begin == 6 &&
                    !memcmp(tag_name_begin, "script", 6)) state = 11;
                else if (tag_name_ptr - tag_name_begin == 5 &&
                    !memcmp(tag_name_begin, "style", 5)) state = 12;
                else {
                    php_tag_find(allow, (size_t) allow_len,
                        tag_name_begin, tag_name_ptr, src_ptr, &dst_ptr);
                    state = 14;
                }
                break;
            }
            break;
        case '"' :
            switch (state) {
            case 1 : if (*(src_ptr - 1) == '=' || isspace(*(src_ptr - 1)))
                state = 2; break;
            case 4 : state = 5; break;
            case 5 : state = 4; break;
            }
            break;
        case '\'' :
            switch (state) {
            case 1 : if (*(src_ptr - 1) == '=' || isspace(*(src_ptr - 1)))
                state = 3; break;
            case 4 : state = 6; break;
            case 6 : state = 4; break;
            }
            break;
        case '`' :
            switch (state) {
            case 4 : state = 17; break;
            case 17 : state = 4; break;
            }
            break;
        }
        switch (state) {
        case 0 : *dst_ptr++ = ch; break;
        case 2 :
            src_ptr++;
            src_ptr = memchr(src_ptr, '"', src_end - src_ptr);
            if (src_ptr == NULL) src_ptr = src_end;
            else state = 1;
            break;
        case 3 :
            src_ptr++;
            src_ptr = memchr(src_ptr, '\'', src_end - src_ptr);
            if (src_ptr == NULL) src_ptr = src_end;
            else state = 1;
            break;
        case 7 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '*', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 2) src_ptr = src_end;
                else {
                    src_ptr++;
                    if (*src_ptr == '/') break;
                }
            }
            if (src_ptr < src_end) state = 4;
            break;
        case 9 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '-', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 3) src_ptr = src_end;
                else {
                    src_ptr++;
                    if (*src_ptr == '-' && *(src_ptr + 1) == '>') break;
                }
            }
            if (src_ptr < src_end) {
                src_ptr++;
                state = 0;
            }
            break;
        case 10 : if (tag_name_ptr < tag_name_end) *tag_name_ptr++ = tolower(ch); break;
        case 11 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '<', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 8) src_ptr = src_end;
                else {
                    src_ptr++;
                    if (src_ptr[0] == '/' && tolower(src_ptr[1]) == 's' &&
                        tolower(src_ptr[2]) == 'c' &&
                        tolower(src_ptr[3]) == 'r' && tolower(src_ptr[4]) == 'i' &&
                        tolower(src_ptr[5]) == 'p' && tolower(src_ptr[6]) == 't') break;
                }
            }
            if (src_ptr < src_end) {
                src_ptr += 6;
                tag_name_ptr = tag_name_end;
                state = 1;
            }
            break;
        case 12 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '<', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 7) src_ptr =
                    src_end;
                else {
                    src_ptr++;
                    if (src_ptr[0] == '/' && tolower(src_ptr[1]) == 's' &&
                        tolower(src_ptr[2]) == 't' &&
                        tolower(src_ptr[3]) == 'y' && tolower(src_ptr[4])
                        == 'l' && tolower(src_ptr[5]) == 'e') break;
                }
            }
            if (src_ptr < src_end) {
                src_ptr += 5;
                tag_name_ptr = tag_name_end;
                state = 1;
            }
            break;
        case 13 :
            tag_name_ptr = tag_name_begin;
            state = 10;
            break;
        case 14 : state = 0; break;
        case 15 :
            src_ptr++;
            heredoc_name_ptr = heredoc_name_begin;
            while (src_ptr < src_end && (*src_ptr == ' ' || *src_ptr ==
                '\t')) src_ptr++;
            if (src_ptr < src_end) {
                while (src_ptr < src_end && heredoc_name_ptr <
                    heredoc_name_end &&
                    isalnum(*src_ptr)) *heredoc_name_ptr++ = *src_ptr++;
                if (src_ptr < src_end && isalpha(*heredoc_name_begin)) {
                    *heredoc_name_ptr++ = '\0';
                    src_ptr = strstr(src_ptr, heredoc_name_begin);
                    if (src_ptr == NULL) {
                        src_ptr = src_end;
                        state = 16;
                    } else {
                        src_ptr += heredoc_name_ptr - heredoc_name_begin;
                        state = 4;
                    }
                } else state = 4;
            }
            break;
        case 16 :
            src_ptr = strstr(src_ptr, heredoc_name_begin);
            if (src_ptr == NULL) src_ptr = src_end;
            else {
                src_ptr += heredoc_name_ptr - heredoc_name_begin;
                state = 4;
            }
            break;
        }
        src_ptr++;
    }
    *dst_ptr = '\0';
    if (stateptr != NULL) *stateptr = state;
    return (size_t) (dst_ptr - src_begin);
}
/***************************************************/
int main(int argc,char *argv[])
{
    int i = 0;
    char *s1;
    size_t len_old, len_new, allow_len;
    int state;
    allow_len = strlen(allow);
    s1 = (char *) malloc(1);
    len_old = 0;
    *s1 = '\0';
    while (s[i] != NULL) {
        printf("str_num=%d, ", i);
        state = 0; /* set state to 0 */
        len_new = strlen(s[i]);
        if (len_new > len_old) s1 = (char *) realloc(s1, len_new + 1);
        strcpy(s1, s[i]);
        // printf("src=[%s], ", s1);
        len_old = php_strip_all_tags(s1, len_new, &state, allow,
            allow_len);
        /* size_t values are cast to int to match the %d conversions */
        printf("dst=[%s], src_len=%d, dst_len=%d, state=%d\n", s1,
            (int) len_new, (int) len_old, state);
        len_old = len_new;
        i++;
    }
    free(s1);
    return 0;
}
====================cut====================
Alexander Valyalkin wrote:
Today I wrote the new version of strip_tags().
It looks like you're very eager to contribute to PHP and write code very
quickly. On the other hand I think the focus of people on this list
right now is a bit different from what you're trying to do: Fix bugs in
PHP 4 and get PHP 5 out of the door.
All: Maybe one of the PHP maintainers could direct Alexander to some
pressing matters because it would be a shame to waste his energy just
because the list ignores his efforts.
- Chris
Alexander Valyalkin wrote:
Today I wrote the new version of strip_tags().
It looks like you're very eager to contribute to PHP and write code
very quickly. On the other hand I think the focus of people on this
list right now is a bit different from what you're trying to do: Fix
bugs in PHP 4 and get PHP 5 out of the door.
All: Maybe one of the PHP maintainers could direct Alexander to some
pressing matters because it would be a shame to waste his energy just
because the list ignores his efforts.
There are 1030 of them listed on bugs.php.net.
George
----- Original Message -----
From: "George Schlossnagle" george@omniti.com
very quickly. On the other hand I think the focus of people on this
list right now is a bit different from what you're trying to do: Fix
bugs in PHP 4 and get PHP 5 out of the door.
On that note, I'd like to submit a couple patches for review.
I'm tempted to also ask for a CVS account, as I occasionally find/fix bugs in the course of working on a new SAPI I'm developing. Additionally, a later version of my SAPI would benefit from zend_llist's with additional support for sorted lists (e.g. insert into sorted list, find from a sorted list, etc.), which I plan to add as a private patch, unless adopted by the PHP team =). Pragmatically, my project does not depend on adoption, so I might find it simpler to just submit the patches to this list and let the winds carry them ..
Problem
MYSQL_UNIX_ADDR is defined in both php_config.h and MySQL's external mysql_version.h. If --with-mysql-sock is used with PHP's configure script to specify a location different from the location used when compiling MySQL, then these defines are inconsistent.
The mysql and mysqli extensions do not default to the socket specified with --with-mysql-sock when mysql.default_socket is not defined in the php.ini file. Instead, the mysql extension defaults to using whatever was compiled into the MySQL library. Patches are attached for the mysql extension of php-4.3.7, and both the mysql and mysqli extensions in php-5.0.0RC3.
What is accomplished by the patches?
- fixes the compilation warning: "MYSQL_UNIX_ADDR" redefined
- renames PHP's conflicting use of 'MYSQL_UNIX_ADDR' to 'PHP_MYSQL_UNIX_ADDR'
- corrects the initialization of the global default_socket in the mysql/mysqli extensions to properly support the default supplied via --with-mysql-sock
- provides a backward-compatible default in the event that neither --with-mysql-sock is used when compiling, nor a value is supplied for mysql.default_socket in the php.ini file
- selects the default socket in the following priority:
a) uses the value supplied to --with-mysql-sock=
b) first socket found in the sequence of paths below:
/var/run/mysqld/mysqld.sock
/var/tmp/mysql.sock
/var/run/mysql/mysql.sock
/var/lib/mysql/mysql.sock
/var/mysql/mysql.sock
/usr/local/mysql/var/mysql.sock
/Private/tmp/mysql.sock
/private/tmp/mysql.sock
/tmp/mysql.sock
c) the default provided when MySQL was compiled (from mysql_version.h)
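The attached patches are not reproduced in this thread, so the following is only a sketch in C of the priority order described above. The function pick_socket, the array mysql_socket_candidates and the use of access() to test whether a path exists are all assumptions made for illustration. Whether the real patches perform this check at configure time or at run time is not stated.

```c
#include <stddef.h>
#include <unistd.h>

/* The candidate paths from step b), in the order listed above. */
const char *const mysql_socket_candidates[] = {
    "/var/run/mysqld/mysqld.sock",
    "/var/tmp/mysql.sock",
    "/var/run/mysql/mysql.sock",
    "/var/lib/mysql/mysql.sock",
    "/var/mysql/mysql.sock",
    "/usr/local/mysql/var/mysql.sock",
    "/Private/tmp/mysql.sock",
    "/private/tmp/mysql.sock",
    "/tmp/mysql.sock",
    NULL
};

/* Hypothetical sketch: 'configured' stands for the --with-mysql-sock value
   (NULL or "" if not given), 'candidates' is a NULL-terminated path list,
   'compiled_default' stands for the value from mysql_version.h. */
const char *pick_socket(const char *configured,
                        const char *const *candidates,
                        const char *compiled_default)
{
    size_t i;

    /* a) an explicit --with-mysql-sock value wins */
    if (configured != NULL && configured[0] != '\0') return configured;
    /* b) otherwise take the first candidate path that exists */
    for (i = 0; candidates[i] != NULL; i++) {
        if (access(candidates[i], F_OK) == 0) return candidates[i];
    }
    /* c) fall back to the default compiled into the MySQL library */
    return compiled_default;
}
```

Passing the candidate list as a parameter keeps the lookup order visible at the call site, for example pick_socket(NULL, mysql_socket_candidates, "/tmp/mysql.sock").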
Cheers,
Gavin,
nic@vess.com
(do not use my trap address above)
Some bugs have been fixed in this version:
- All [#include] directives moved to the top. Just copy'n'compile the
sources to test it :)
- Renamed php_strip_all_tags() to php_strip_tags() with the same
interface as in the current version.
- Fixed php_tag_find(). Allowable tags are case- and order-insensitive now.
- Added new test strings.
Any comments and wishes are welcome.
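The case- and order-insensitive lookup can be sketched in isolation as follows. This mirrors the idea of lowercasing the allow list and searching it for "<name>". The function name tag_in_allow_list and the MAX_TAG_LEN constant are hypothetical, and the allow list is assumed to be a writable buffer because it is lowercased in place.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAG_LEN 32  /* hypothetical limit for this sketch */

/* Returns 1 if "<name>" occurs anywhere in the allow list, ignoring case.
   The allow list is lowercased in place, so it must be writable. */
int tag_in_allow_list(char *allow, const char *name)
{
    char needle[MAX_TAG_LEN + 3];
    size_t i, n = strlen(name);
    char *p;

    if (n == 0 || n > MAX_TAG_LEN) return 0;
    for (p = allow; *p != '\0'; p++)
        *p = (char) tolower((unsigned char) *p);
    /* build the lowercase needle "<name>" */
    needle[0] = '<';
    for (i = 0; i < n; i++)
        needle[i + 1] = (char) tolower((unsigned char) name[i]);
    needle[n + 1] = '>';
    needle[n + 2] = '\0';
    /* strstr() finds the tag wherever it sits in the list */
    return strstr(allow, needle) != NULL;
}
```

With the allow list "<bR>,trash<B><DiV>" used in the test program, the names "br", "b" and "DIV" are accepted while "trash" and "i" are not, because only complete "<name>" sequences count.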
==================cut===================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
/**************************************************/
/* test strings */
/**************************************************/
char *s[] = {
    "", /* empty string */
    "a", /* one character */
    "<", /* single < char */
    "<i>", /* single tag */
    "ab", /* two chars */
    "test<b", /* incomplete tag */
    "test<b title='asdf ", /* incomplete single quotes */
    "test<b title=\"add", /* incomplete double quotes */
    "test<!-- sdf ", /* incomplete comment */
    "test<? echo 'hello' ", /* incomplete php-tag */
    "test<% echo 'hello' ", /* incomplete asp-tag */
    "test<% $a = 'ss ", /* incomplete php-string in single quotes */
    "test<?php $a = \"12\\\"3", /* incomplete php-string in double quotes */
    "test<? // comment", /* incomplete single-line comment */
    "test<? # comment ", /* incomplete single-line comment */
    "test<? /* comment\n** 23", /* incomplete multi-line comment */
    "test<? $a = `ls -l", /* incomplete quotes */
    "test<? $a=<<<FOO\nssdf", /* incomplete HEREDOC */
    "test<script>if (1<b) alert('<b>ee</b>');", /* incomplete <script> tag */
    "test<StYle>div {font-weight:bold; }", /* incomplete <style> tag */
    "a< b", /* not a tag */
    "t<b>es</b>t", /* simple test */
    "te<b title='1 > 2' />st",
    "<b title=\"1 > 2\">test",
    "t<b title='1 < 2'/>est",
    "tes<b title = qwe'rt>t",
    "t<!-- <<< comment <<< -->est",
    "<!-- >>> <b>comment</b> <<< -->test",
    "t<? echo '?>' ?>est",
    "<?='\"a\\'b' ?>test",
    "te<% $a = \"?>'%>\"; // comment1\n // comment2 %>st",
    "t<?php \n # here is comment ?>est",
    "te<?=\"dd\\\"d'?>d%>d\" ?>st",
    "<?php $a = <<<END\n`t's\"q?>t\nEND;\n ?>test",
    "tes<? /* co'm\\m\\\"ne\\\"t \n multi line \n */ test ?>t",
    "<? print `sd\a'd\\\"d`; ?>test",
    "t<scrIpT type =\n 'text/javascript'>if (a <b ) alert('hello')</ScRipt>est",
    "te<sTyLe>#a { color: red; }</style>st",
    "t<?xml version=\"1.0\"?>est",
    "<?xml version=\"1.'0\\\" encoding='UT\"F-8\\' ?>test",
    "test<br style=\"height:100px\" /><br>a<br/><br title='a\\\"'/>b",
    NULL
};
char *allow = "<bR>,trash<B><DiV>"; /* allowable tags */
/**************************************************/
#define PHPAPI
#define PHP_MAX_HEREDOC_LEN 1
#define PHP_MAX_TAG_LEN 32
PHPAPI char *php_strtolower(char *s, size_t len)
{
    unsigned char *c, *e;
    c = (unsigned char *) s; /* explicit cast: char * to unsigned char * */
    e = c+len;
    while (c < e) {
        *c = tolower(*c);
        c++;
    }
    return s;
}
/* {{{ php_tag_find
   Copies tag [tag_name_begin] with length [tag_name_ptr -
   tag_name_begin] to [dst_ptr]
   if it is in a set of allowable tags, pointed by [allow] with length
   [allow_len]
*/
void php_tag_find(char *allow, size_t allow_len, char *tag_name_begin,
    char *tag_name_ptr, char *src_ptr, char **dst_ptr)
{
    size_t tag_len, pure_tag_len;
    char *tmp_ptr;
    int is_end_tag = 0;
    tag_len = tag_name_ptr - tag_name_begin;
    if (allow_len < 3 || src_ptr - *dst_ptr <= tag_len || tag_len < 1)
        return;
    if (*tag_name_begin == '/' && tag_len > 1) is_end_tag = 1;
    pure_tag_len = is_end_tag ? (tag_len - 1) : tag_len;
    static char tag_name[PHP_MAX_TAG_LEN + 3];
    tag_name[0] = '<';
    memcpy(tag_name + 1, is_end_tag ? (tag_name_begin + 1) :
        tag_name_begin, pure_tag_len);
    pure_tag_len++;
    tag_name[pure_tag_len++] = '>';
    tag_name[pure_tag_len] = '\0';
    tmp_ptr = strstr(allow, tag_name);
    if (tmp_ptr != NULL) {
        *(*dst_ptr)++ = '<';
        memcpy(*dst_ptr, tag_name_begin, tag_len);
        *dst_ptr += tag_len;
        if (*(src_ptr - 2) == ' ') *(*dst_ptr)++ = ' ';
        if (*(src_ptr - 1) == '/') *(*dst_ptr)++ = '/';
        *(*dst_ptr)++ = '>';
    }
}
/* }}} */
PHPAPI size_t php_strip_tags(char *rbuf, int len, int *stateptr, char
    *allow, int allow_len)
{
    char *src_begin = rbuf,
        *src_ptr = rbuf,
        *src_end = rbuf + (size_t) len;
    char *dst_ptr = rbuf;
    int state;
    if (stateptr != NULL) state = *stateptr;
    else state = 0;
    static char tag_name_begin[PHP_MAX_TAG_LEN + 1],
        *tag_name_ptr = NULL,
        *tag_name_end = NULL;
    size_t tag_len = 0;
    if (tag_name_ptr == NULL) tag_name_ptr = tag_name_begin;
    if (tag_name_end == NULL) tag_name_end = tag_name_begin
        + PHP_MAX_TAG_LEN;
    static char heredoc_name_begin[PHP_MAX_HEREDOC_LEN + 1],
        *heredoc_name_ptr = NULL,
        *heredoc_name_end = NULL;
    if (heredoc_name_ptr == NULL) heredoc_name_ptr = heredoc_name_begin;
    if (heredoc_name_end == NULL) heredoc_name_end = heredoc_name_begin
        + PHP_MAX_HEREDOC_LEN;
    char ch;
    php_strtolower(allow, allow_len);
    while (src_ptr < src_end) {
        ch = *src_ptr;
        switch (ch) {
        case '#' :
            switch (state) {
            case 4 : state = 18; break;
            }
            break;
        case '-' :
            switch (state) {
            case 10 :
                if ((src_ptr - src_begin) > 2 && *(src_ptr - 1) == '-' &&
                    *(src_ptr - 2) == '!' &&
                    *(src_ptr - 3) == '<') state = 9;
                break;
            }
            break;
        case '\r' :
        case '\n' :
            switch (state) {
            case 8 :
            case 18 : state = 4; break;
            case 10 : state = 1; break;
            }
            break;
        case ' ' :
        case '\t' :
        case '\v' :
        case '\f' :
            switch (state) {
            case 10 : state = 1; break;
            }
            break;
        case '*' :
            switch (state) {
            case 4 : if (*(src_ptr - 1) == '/') state = 7; break;
            }
            break;
        case '/' :
            switch (state) {
            case 4 : if (*(src_ptr - 1) == '/') state = 8; break;
            case 10 : if (*(src_ptr - 1) != '<') state = 1; break;
            }
            break;
        case '\\' :
            switch (state) {
            case 5 :
            case 6 :
            case 17 :
                if (src_ptr < src_end) src_ptr++; break;
            }
            break;
        case '%' :
        case '?' :
            switch (state) {
            case 10 :
                if (*(src_ptr - 1) == '<' && src_ptr + 1 < src_end &&
                    tolower(*(src_ptr + 1)) != 'x') {
                    if (tag_name_ptr < tag_name_end) *tag_name_ptr++ = ch;
                    state = 4;
                }
                break;
            }
            break;
        case '<' :
            switch (state) {
            case 0 : if (src_end - src_ptr > 1 && !isspace(*(src_ptr
                + 1))) state = 13; break;
            case 4 :
                if ((src_ptr + 2 < src_end) && *(src_ptr + 1) == '<' &&
                    *(src_ptr + 2) == '<') {
                    state = 15;
                    src_ptr += 2;
                }
                break;
            }
            break;
        case '>' :
            switch (state) {
            case 4 :
            case 8 :
            case 18 : if (*(src_ptr - 1) == *tag_name_begin) state = 14;
                break;
            case 1 :
            case 10 :
                if (tag_name_ptr - tag_name_begin == 6 &&
                    !memcmp(tag_name_begin, "script", 6)) state = 11;
                else if (tag_name_ptr - tag_name_begin == 5 &&
                    !memcmp(tag_name_begin, "style", 5)) state = 12;
                else {
                    php_tag_find(allow, (size_t) allow_len,
                        tag_name_begin, tag_name_ptr, src_ptr, &dst_ptr);
                    state = 14;
                }
                break;
            }
            break;
        case '"' :
            switch (state) {
            case 1 : if (*(src_ptr - 1) == '=' || isspace(*(src_ptr - 1)))
                state = 2; break;
            case 4 : state = 5; break;
            case 5 : state = 4; break;
            }
            break;
        case '\'' :
            switch (state) {
            case 1 : if (*(src_ptr - 1) == '=' || isspace(*(src_ptr - 1)))
                state = 3; break;
            case 4 : state = 6; break;
            case 6 : state = 4; break;
            }
            break;
        case '`' :
            switch (state) {
            case 4 : state = 17; break;
            case 17 : state = 4; break;
            }
            break;
        }
        switch (state) {
        case 0 : *dst_ptr++ = ch; break;
        case 2 :
            src_ptr++;
            src_ptr = memchr(src_ptr, '"', src_end - src_ptr);
            if (src_ptr == NULL) src_ptr = src_end;
            else state = 1;
            break;
        case 3 :
            src_ptr++;
            src_ptr = memchr(src_ptr, '\'', src_end - src_ptr);
            if (src_ptr == NULL) src_ptr = src_end;
            else state = 1;
            break;
        case 7 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '*', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 2) src_ptr = src_end;
                else {
                    src_ptr++;
                    if (*src_ptr == '/') break;
                }
            }
            if (src_ptr < src_end) state = 4;
            break;
        case 9 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '-', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 3) src_ptr = src_end;
                else {
                    src_ptr++;
                    if (*src_ptr == '-' && *(src_ptr + 1) == '>') break;
                }
            }
            if (src_ptr < src_end) {
                src_ptr++;
                state = 0;
            }
            break;
        case 10 : if (tag_name_ptr < tag_name_end) *tag_name_ptr++ = tolower(ch); break;
        case 11 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '<', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 8) src_ptr = src_end;
                else {
                    src_ptr++;
                    if (src_ptr[0] == '/' && tolower(src_ptr[1]) == 's' &&
                        tolower(src_ptr[2]) == 'c' &&
                        tolower(src_ptr[3]) == 'r' && tolower(src_ptr[4]) == 'i' &&
                        tolower(src_ptr[5]) == 'p' && tolower(src_ptr[6]) == 't') break;
                }
            }
            if (src_ptr < src_end) {
                src_ptr += 6;
                tag_name_ptr = tag_name_begin;
                state = 1;
            }
            break;
        case 12 :
            src_ptr++;
            while (src_ptr < src_end) {
                src_ptr = memchr(src_ptr, '<', src_end - src_ptr);
                if (src_ptr == NULL || src_end - src_ptr < 7) src_ptr =
                    src_end;
                else {
                    src_ptr++;
                    if (src_ptr[0] == '/' && tolower(src_ptr[1]) == 's' &&
                        tolower(src_ptr[2]) == 't' &&
                        tolower(src_ptr[3]) == 'y' && tolower(src_ptr[4])
                        == 'l' && tolower(src_ptr[5]) == 'e') break;
                }
            }
            if (src_ptr < src_end) {
                src_ptr += 5;
                tag_name_ptr = tag_name_begin;
                state = 1;
            }
            break;
        case 13 :
            tag_name_ptr = tag_name_begin;
            state = 10;
            break;
        case 14 : state = 0; break;
        case 15 :
            src_ptr++;
            heredoc_name_ptr = heredoc_name_begin;
            while (src_ptr < src_end && (*src_ptr == ' ' || *src_ptr ==
                '\t')) src_ptr++;
            if (src_ptr < src_end) {
                while (src_ptr < src_end && heredoc_name_ptr <
                    heredoc_name_end &&
                    isalnum(*src_ptr)) *heredoc_name_ptr++ = *src_ptr++;
                if (src_ptr < src_end && isalpha(*heredoc_name_begin)) {
                    *heredoc_name_ptr++ = '\0';
                    src_ptr = strstr(src_ptr, heredoc_name_begin);
                    if (src_ptr == NULL) {
                        src_ptr = src_end;
                        state = 16;
                    } else {
                        src_ptr += heredoc_name_ptr - heredoc_name_begin;
                        state = 4;
                    }
                } else state = 4;
            }
            break;
        case 16 :
            src_ptr = strstr(src_ptr, heredoc_name_begin);
            if (src_ptr == NULL) src_ptr = src_end;
            else {
                src_ptr += heredoc_name_ptr - heredoc_name_begin;
                state = 4;
            }
            break;
        }
        src_ptr++;
    }
    *dst_ptr = '\0';
    if (stateptr != NULL) *stateptr = state;
    return (size_t) (dst_ptr - src_begin);
}
/***************************************************/
int main(int argc,char *argv[])
{
    int i = 0;
    char *s1, *allow1;
    size_t len_old, len_new, allow_len;
    int state;
    allow_len = strlen(allow);
    /* writable copy: php_strip_tags() lowercases the allow list in place */
    allow1 = (char *) malloc(allow_len + 1);
    memcpy(allow1, allow, allow_len + 1);
    s1 = (char *) malloc(1);
    len_old = 0;
    *s1 = '\0';
    while (s[i] != NULL) {
        printf("str_num=%d, ", i);
        state = 0; /* set state to 0 */
        len_new = strlen(s[i]);
        if (len_new > len_old) s1 = (char *) realloc(s1, len_new + 1);
        strcpy(s1, s[i]);
        // printf("src=[%s], ", s1);
        len_old = php_strip_tags(s1, len_new, &state, allow1, allow_len);
        /* size_t values are cast to int to match the %d conversions */
        printf("dst=[%s], src_len=%d, dst_len=%d, state=%d\n", s1,
            (int) len_new, (int) len_old, state);
        len_old = len_new;
        i++;
    }
    free(s1);
    free(allow1);
    return 0;
}
==================cut===================
On Fri, 11 Jun 2004 15:37:32 +0300
"Alexander Valyalkin" valyala@tut.by wrote:
Some bugs have been fixed in this version:
- All [#include] directives moved to the top. Just copy'n'compile
sources to test it :)
- Renamed php_strip_all_tags() to php_strip_tags() with the same
interface as in the current version.
- Fixed php_tag_find(). Allowable tags is case and order insensitive now
- Added new test strings.
Maybe it would be better if you joined the Russian docs team instead of
patching everything that works OK?
We still need help, though.
Btw, there are a lot of REAL bugs awaiting fixes.
Just take a look at http://bugs.php.net.
WBR,
Antony Dovgal aka tony2001
tony2001@phpclub.net || antony@dovgal.com