Newsgroups: php.internals
From: denis@mobilejoomla.com (Denis Ryabov)
To: internals@lists.php.net
Date: Tue, 24 Aug 2021 00:12:06 +0300
Subject: Proposal to a few fixes/improvements in the ini parser
Message-ID: <3cd341d8-0572-fabf-4ec7-687195547e87@mobilejoomla.com>

Hello internals,

I'd like to discuss some issues related to the escaping of characters in the ini parser (the lexer, to be precise).

1.
Currently double-quoted strings are processed twice: first in the [^] lexer rule (to get the string length), and then in the zend_ini_escape_string function (to build the string by processing all escape sequences). The problem is that the two passes treat the string differently: the lexer rule uses a look-behind approach to check whether a double quote character is escaped, while zend_ini_escape_string skips escaped characters in the usual way (the skip-next-char approach, as in PHP's own string parser). As a result, there is the following issue.

In some cases there is no way to escape a final backslash in a string, e.g. when the string is followed by anything other than a line break:

    KEY1 = "prefix\\" ; Warning: syntax error, unexpected end of file, expecting TC_DOLLAR_CURLY or TC_QUOTED_STRING or '"'
    KEY2 = "prefix\\" ACONST

I'd switch to the PHP way and require each of the special characters (", $, \) to be escaped in the usual (skip-next-char) way, without the look-behind approach. This may break backward compatibility for code that uses a sequence like \\" (instead of \\\") to get a backslash followed by a double quote, but I'm not sure that is widely used in the wild (moreover, this behavior is not explained in the PHP docs, so no one should be relying on it).

2. In the [^] lexer rule, the token is processed starting from the YYCURSOR position instead of yytext, so the first character is not taken into account. This in turn leaves no way to escape a leading dollar character followed by an opening curly brace:

    KEY = "\${" ; Warning: syntax error, unexpected end of file, expecting TC_VARNAME

3. I'd also like to note that the ini parser currently doesn't support the standard escape sequences (\n, \t, etc.), though from the official PHP docs (https://www.php.net/manual/en/function.parse-ini-file.php) one might expect them to be supported:

    ; \ is used to escape a value.
    newline_is = "\\n" ; results in the string "\n", not a newline character.
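To illustrate the mismatch described in item 1, here is a minimal standalone C sketch. It is not the actual Zend code: the function names end_lookbehind and end_skipnext are hypothetical, and the real lexer rule has extra conditions (e.g. for line breaks and ${) that are left out here. Both functions take the text that follows the opening quote and return the offset of the double quote they consider to be the closing one, or the length of the text if they find none.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look-behind model (simplified, hypothetical): a '"' counts as escaped
 * whenever the byte right before it is a backslash, no matter whether
 * that backslash was itself escaped. */
static size_t end_lookbehind(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '"' && (i == 0 || s[i - 1] != '\\')) {
            return i; /* unescaped quote: end of string */
        }
    }
    return n; /* no closing quote found */
}

/* Skip-next-char model (simplified, hypothetical): a backslash consumes
 * itself and the following byte, so "\\\\" is one escaped backslash and
 * a quote after it terminates the string. */
static size_t end_skipnext(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\\' && i + 1 < n) {
            i++; /* skip the escaped byte */
            continue;
        }
        if (s[i] == '"') {
            return i; /* unescaped quote: end of string */
        }
    }
    return n; /* no closing quote found */
}
```

For the text prefix\\" ACONST (everything after the opening quote of KEY2 above), end_skipnext stops at the quote right after the two backslashes, while end_lookbehind treats that quote as escaped and runs to the end of the input. That disagreement between the two passes is what item 1 is about.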

The things mentioned above seem easy to fix/implement (I'll send a PR if there are no objections). So, how would you rate this idea on the following scale (1-5)?

1) It's not necessary at all, let's keep the current ini lexer as is.
2) Let's require escaping of the special characters (", \, $) only in a uniform (skip-next-char) way.
3) The above, plus support for the \t, \n, \v, \f, \r, \e sequences.
4) The above, plus support for \123 (octal) and \xAB (hex) character codes.
5) The above, plus support for \u{12AB} (Unicode hex codepoints); actually I'd rather not implement this, because I don't know how to deal with partial contents like KEY = "\u{" (PHP stops with "Parse error: Invalid UTF-8 codepoint escape sequence", but I'm not sure the ini parser should follow this rule).

Any comments are welcome.

Best regards,
Denis Ryabov