base64_decode is buggy, what to fix?

9 years ago by lauri.kentta@gmail.com — view source

unread

Hello, Internals!

I was fixing #72152 when it became apparent that the base64_decode
function is very buggy.

Null byte ends processing.
"V" produces empty result, while "V=" fails. Not very logical.
Too short padding is allowed, e.g. "VV=" works like "VV==".
Extra padding is allowed (like "V=====").
Invalid padding is allowed ("=VVV=", "VV=V=", "VVV==") except on the
second place of a 24-bit run ("V=VV=" fails).
In strict mode, space between padding fails:
"V V==" and "VV ==" and "VV== " are allowed,
"VV= =" fails.
In strict mode, after a padding, one character is skipped, so "VVV=V"
decodes to "UU" (should be "UUU"), and "VVVV=*" decodes to "UUU" instead
of failing.

For each of the above, what would be the preferred behaviour in default
mode and strict mode?

Affected existing tests:

ext/openssl/tests/bug61124.phpt uses "kzo w2RMExUTYQXW2Xzxmg==" as an
invalid base64 string, based on the invalid padding.
ext/standard/tests/file/stream_rfc2397_006.phpt tests
"#Zm9vYmFyIGZvb2Jhcg==" and excepts this to be valid, while "#" is
clearly not valid base64. This also raises a question whether fragments
should be skipped in data uri handling.

Suggestions?

I've created a bug-for-bug compatible rewrite of base64_decode [1], with
all the bugs neatly and specifically implemented and missing features
commented out, so it's now very simple to fix them one by one.

I've also attached a test script that tests "all" possible combinations
of data, padding, NUL and other invalid characters, and my first patch
indeed provides identical results to the old implementation.

Currently interesting lines in the test results:
'base64' 'default' 'strict'
'V' '' ''
'V=' (false) (false)
'VV=' 'U' 'U'
'VV==' 'U' 'U'
'V=====' (false) (false)
'=VVV=' 'UU' (false)
'VV=V=' 'UU' (false)
'VVV==' 'UU' 'UU'
'V=VV=' (false) (false)
'V V==' 'U' 'U'
'VV ==' 'U' 'U'
'VV== ' 'U' 'U'
'VV= =' 'U' (false)
'VVV=V' 'UUU' 'UU'
'VVVV=' 'UUU' 'UUU'
'VVVVVV=V' 'UUUUU' 'UUUU'
'VVVVVV=' 'UUUU' 'UUUU'
'VVVV===' 'UUU' 'UUU'
'VVV====V' 'UUU' 'UU'
'VVV====' 'UU' 'UU'
'VV=====V' 'UU' 'U'
'VV=====' 'U' 'U'
'=======' '' ''

--
Lauri Kenttä

9 years ago by Scott Arciszewski — view source

unread

Hello, Internals!

I was fixing #72152 when it became apparent that the base64_decode function
is very buggy.

Null byte ends processing.

"V" produces empty result, while "V=" fails. Not very logical.

Too short padding is allowed, e.g. "VV=" works like "VV==".

Extra padding is allowed (like "V=====").

Invalid padding is allowed ("=VVV=", "VV=V=", "VVV==") except on the
second place of a 24-bit run ("V=VV=" fails).

In strict mode, space between padding fails:
"V V==" and "VV ==" and "VV== " are allowed,
"VV= =" fails.

In strict mode, after a padding, one character is skipped, so "VVV=V"
decodes to "UU" (should be "UUU"), and "VVVV=*" decodes to "UUU" instead of
failing.

For each of the above, what would be the preferred behaviour in default mode
and strict mode?

Affected existing tests:

ext/openssl/tests/bug61124.phpt uses "kzo w2RMExUTYQXW2Xzxmg==" as an
invalid base64 string, based on the invalid padding.

ext/standard/tests/file/stream_rfc2397_006.phpt tests
"#Zm9vYmFyIGZvb2Jhcg==" and excepts this to be valid, while "#" is clearly
not valid base64. This also raises a question whether fragments should be
skipped in data uri handling.

Suggestions?

I've created a bug-for-bug compatible rewrite of base64_decode [1], with all
the bugs neatly and specifically implemented and missing features commented
out, so it's now very simple to fix them one by one.

I've also attached a test script that tests "all" possible combinations of
data, padding, NUL and other invalid characters, and my first patch indeed
provides identical results to the old implementation.

Currently interesting lines in the test results:
'base64' 'default' 'strict'
'V' '' ''
'V=' (false) (false)
'VV=' 'U' 'U'
'VV==' 'U' 'U'
'V=====' (false) (false)
'=VVV=' 'UU' (false)
'VV=V=' 'UU' (false)
'VVV==' 'UU' 'UU'
'V=VV=' (false) (false)
'V V==' 'U' 'U'
'VV ==' 'U' 'U'
'VV== ' 'U' 'U'
'VV= =' 'U' (false)
'VVV=V' 'UUU' 'UU'
'VVVV=' 'UUU' 'UUU'
'VVVVVV=V' 'UUUUU' 'UUUU'
'VVVVVV=' 'UUUU' 'UUUU'
'VVVV===' 'UUU' 'UUU'
'VVV====V' 'UUU' 'UU'
'VVV====' 'UU' 'UU'
'VV=====V' 'UU' 'U'
'VV=====' 'U' 'U'
'=======' '' ''

--
Lauri Kenttä

I'm surprised to see that these functions weren't heavily tested.

I've included some of the RFC 4648 test vectors in my
cache-timing-safe userland implementation at
https://github.com/paragonie/constant_time_encoding -- maybe this
would be useful for base64_decode() as well?

Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises <https://paragonie.com

9 years ago by Yasuo Ohgaki — view source

unread

Hi Lauri,

Suggestions?

IMHO

Return FALSE for any invalid input when strict mode is on.
Enable strict mode by default.
Keep current behavior when strict mode is off.

would be the best.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

9 years ago by Joe Watkins — view source

unread

Morning,

Would it be possible to open a PR that fixes the programming errors,

such as oob read, and the strange flow of the subsequent block ?

Assuming those fixes don't effect normal usage, I see no reason why

that can't be merged immediately.

When it comes to changing the behaviour, that needs a little more

discussion I think, and I wait to see what others think about those changes.

Cheers
Joe

Hi Lauri,

On Sun, May 22, 2016 at 7:56 PM, Lauri Kenttä lauri.kentta@gmail.com
wrote:

Suggestions?

IMHO

Return FALSE for any invalid input when strict mode is on.
Enable strict mode by default.
Keep current behavior when strict mode is off.

would be the best.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

9 years ago by Niklas Keller — view source

unread

Completely missing padding shouldn't fail, it's often removed to save space
or when base64 is converted to base64url.

Joe Watkins pthreads@pthreads.org schrieb am Mo., 23. Mai 2016 06:59:

Morning,
Would it be possible to open a PR that fixes the programming errors,
such as oob read, and the strange flow of the subsequent block ?
Assuming those fixes don't effect normal usage, I see no reason why
that can't be merged immediately.
When it comes to changing the behaviour, that needs a little more
discussion I think, and I wait to see what others think about those
changes.

Cheers
Joe

Hi Lauri,

On Sun, May 22, 2016 at 7:56 PM, Lauri Kenttä lauri.kentta@gmail.com
wrote:

Suggestions?

IMHO

Return FALSE for any invalid input when strict mode is on.
Enable strict mode by default.
Keep current behavior when strict mode is off.

would be the best.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

9 years ago by lauri.kentta@gmail.com — view source

unread

Thanks for all the comments.

Update:

Null byte ends processing.

In strict mode, space between padding fails

In strict mode, after a padding, one character is skipped

https://github.com/php/php-src/pull/1923

Too short padding is allowed, e.g. "VV=" works like "VV==".

Extra padding is allowed (like "V=====").

WONTFIX / NOTABUG. This doesn't really cause any problems.

"V" produces empty result, while "V=" fails. Not very logical.

Any comments?
I'd prefer failing in both cases. 6 bits out of 24 is invalid
and might mean that the data is truncated by accident.

Invalid padding is allowed ("=VVV=", "VV=V=")

Any comments? Strict mode at least gets this one right.
It's really sad if someone relies on this "feature".

--
Lauri Kenttä

base64_decode is buggy, what to fix?

-- Lauri Kenttä

--
Lauri Kenttä