Simple variable handling.

8 years ago by Lester Caine

People keep complaining that I do not contribute any proposals to
improve PHP, which to some extent is correct. Except the one thing that I
keep trying to get a handle on is tidying up validation of the basic
variables that are the heart of PHP.

validate_var_array() is a case in point, since ALL it should do is
handle an array of simple variables for which we can define REAL
validation rules rather than just a very restricted 'type' rule.
Massaging the way the content of a variable is presented is another part
of the basic functions of handling a variable, and simply providing an
escape option which can be set as part of the variable rules set
eliminates the need for 'New operator (short tag) for context-dependent
escaping' and similar tangential matters. If we have a set of rules
wrapping a variable then everything else just follows on, and the SQL
domain model allows a group of variables to take an identical set of rules.

These are the sorts of things any decent userland library can and does
provide, but if the clock was rolled back prior to all the trouble
created by 'strict typing' and we started again with a better defined
simple variable I'm sure that much of the conflict could have been
resolved by allowing full validation checks to control an error or
exception depending on the 'style' of PHP a programmer prefers.

If a function is going to return a variable and that variable has under
the hood a set of validation rules, then one can return an error if the
result is faulty. Or even allow a NULL return if a valid answer is not
available ... if that is the style of programming one prefers.
Exceptions handle unmanaged errors, while proper program flow handles
managed ones!

Wrap these intelligent variables inside a class and one can create more
powerful objects but ones which still use all the basic functionality.
Similarly an array of them can be asked to provide a simple 'yes/no' if
all of the variables pass their validation check, or an array of
elements which need processing.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

8 years ago by Bishop Bettini

Do you mean attaching a functional validator to a variable, something like
this hypothetical code? (Note the 3rd argument to settype):

// $_POST['age'] = 27;
// $_POST['name'] = 'Sugah Pop';

try {
    settype($_POST['age'], 'int', 'is_int');
    settype($_POST['name'], 'string', function ($name) {
        return strlen($name) < 255;
    });
} catch (\TypeError $er) {
    die($er->getMessage());
}

// ... Later:

$_POST['name'] = str_repeat('a', 1024);
// Throws \TypeError: "Invalid value set into variable in demo.php on line 14"

Or same idea, but used as "smarter" formal argument validators:

function do_something($age, $name) {
    settype($age, 'int', 'is_int');
    settype($name, 'string', function ($name) {
        return strlen($name) < 255;
    });
    // $age and $name now have persistent validator rules attached to them
    // write operations onto the variable will assert the truth of the
    // validator before assigning. Obviously, there is a run-time cost.
}
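
For illustration only: settype() in PHP accepts two arguments, so the calls above cannot run as written. A rough userland approximation of the same "validate on every write" idea might look like the sketch below. The class name GuardedValue and its methods are invented for this example and are not part of PHP.

```php
<?php
// Hypothetical userland stand-in for the three-argument settype() idea.
// GuardedValue is an invented name; nothing like it ships with PHP.
final class GuardedValue
{
    private $value;
    private $validator;

    public function __construct($value, callable $validator)
    {
        $this->validator = $validator;
        $this->set($value);
    }

    // Every write re-runs the validator and throws on failure.
    public function set($value)
    {
        if (!call_user_func($this->validator, $value)) {
            throw new TypeError('Invalid value set into variable');
        }
        $this->value = $value;
    }

    public function get()
    {
        return $this->value;
    }
}

$name = new GuardedValue('Sugah Pop', function ($name) {
    return is_string($name) && strlen($name) < 255;
});

try {
    $name->set(str_repeat('a', 1024));   // too long, so the validator rejects it
} catch (TypeError $er) {
    echo $er->getMessage(), "\n";
}
```

In this sketch the rejected write leaves the previous value in place, which is one possible design choice rather than something the hypothetical settype() call specifies.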

8 years ago by michal@brzuchalski.com

You want to attach such validation to a variable at runtime and throw
\TypeError whenever a constraint is broken - wouldn't that cause many more
unexpected exceptions at runtime?
Imagine passing a variable with such a constraint into some object that
operates on it: that object would have to expect a \TypeError at any time,
because you never know what sort of constraint and optional validation
callback is attached to the variable!

I think annotation-style constraints in OO code would be much more
readable, just like bean validation constraints in Java
https://docs.oracle.com/cd/E19798-01/821-1841/gircz/index.html but that's
not the subject of this discussion.

But that's only my humble opinion.

--
Regards,

Michał Brzuchalski

8 years ago by Lester Caine

You want to attach such validation to a variable at runtime and throw
\TypeError whenever a constraint is broken - wouldn't that cause many more
unexpected exceptions at runtime?
Imagine passing a variable with such a constraint into some object that
operates on it: that object would have to expect a \TypeError at any time,
because you never know what sort of constraint and optional validation
callback is attached to the variable!

Now this is where the fundamental difference in styles comes in.
PERSONALLY I would not be looking to throw exceptions at all. The whole
point of validation is to handle any validation error ... and it is an
error not an exception.

We have had the discussions on annotations and currently even historic
legacy systems still use docblock annotation to provide data that is not
handled in the core. The new 'styles' of adding this don't really
address making this area integral within the core, just as adding fancy
eyecandy to flag int or string does very little to the core validation
problems.

setannot( $age, 'number', range(1,120) ); // Fractional Years
or
$age->setannot( 'number', range(1,120) );

Both make perfect sense in my style of programming PHP, along with such
things as setaccess( 'public_readonly', 'no_write' );

This also comes in line with 'is_set' or 'is_valid'.
if ( !is_valid( $age ) ) { message( $age, 'no_valid' ); }
Rather than trying to capture some exception that was created out of line.

In my book if you want to produce compact optimised code then you should
be using C directly and the compiler optimizations. PHP needs a well
structured and compact runtime engine that handles simple objects such
as '$age' and every other variable using the ONE set of resident
optimised code, but it seems that everybody expects the runtime process
to duplicate copies of that code for every variable and then optimize
each to trim elements that are not needed for that particular use of the
variable? Why is it not simply a case that the table of variables simply
passes a pointer to the variable code to process an operation like
$age->setannot( 'number', range(1,120) ); at which point setannot( $age,
'number', range(1,120) ); makes perfect sense since $age is simply
passed to the global library of code.

--
Lester Caine - G8HFL

8 years ago by Niklas Keller

Now this is where the fundamental difference in styles comes in.
PERSONALLY I would not be looking to throw exceptions at all. The whole
point of validation is to handle any validation error ... and it is an
error not an exception.

If not by using exceptions, how would you handle them if you assign such
checks to variables and assign a wrong value?

Regards, Niklas

8 years ago by Lester Caine

If not by using exceptions, how would you handle them if you assign such
checks to variables and assign a wrong value?

This is where this 'strict' over 'weak' discussion starts. I have no
doubt that for some people a 'strict' switch would enable throwing
exceptions every time something that failed was actioned. While I'm
still in the camp where
if ( !$age->set( $result_set['age'] ) ) { /* handle error */ }
else { /* next step */ }
is a lot more readable than chaining everything together and building
some complex exception handler to process something which is much tidier
to handle in line ...
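
As a sketch of that in-line style, assuming a hypothetical CheckedNumber class (the class and its method names are invented here, not an existing PHP API), a set() that reports success through its return value could look like this:

```php
<?php
// Hypothetical sketch of the "check the return value" style.
// CheckedNumber and its method names are invented for illustration.
final class CheckedNumber
{
    private $min;
    private $max;
    private $value = null;

    public function __construct(float $min, float $max)
    {
        $this->min = $min;
        $this->max = $max;
    }

    // Stores the value and returns true if it is numeric and in range;
    // otherwise returns false and keeps whatever was stored before.
    public function set($candidate): bool
    {
        if (!is_numeric($candidate) || $candidate < $this->min || $candidate > $this->max) {
            return false;
        }
        $this->value = $candidate + 0;   // normalise numeric strings to int/float
        return true;
    }

    public function get()
    {
        return $this->value;
    }
}

$age = new CheckedNumber(1, 120);
$result_set = ['age' => 'not a number'];

if (!$age->set($result_set['age'])) {
    echo "age rejected\n";    // handle the error in line
} else {
    echo "age accepted\n";    // next step
}
```

Whether a failed set() should keep the old value, store null, or store the bad value anyway is a design decision this sketch makes arbitrarily.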

Much of the proposed style sugar only works for a completely different
style of working. I have yet to be convinced that re-factoring years of
code which naturally follows an in-line coding style would gain anything
from being reworked to the 'exception' style, but exceptions are being
added to the engine which may well be triggered by legacy styles and which
therefore have to be handled to keep code moving forward. Even something
as simple as reading each line of a CSV file and processing it until an
error is detected - such as end-of-file - requires a different mindset
if the errors throw you out of the data handling loop because of a
validation error.

Claiming that "You do not have to use the new style" misses the point
that often one has no control on where that new style impinges on legacy
code. Not least where third party code is an integral part of the
infrastructure. Even where PDO chaining has been used in third party
libraries, one invariably ends up pulling that apart to allow in-line
responses rather than remote exception handling. PDO is an area where
the correct handling of data validation would benefit from a clean built
in validation process. Rather than throwing some array handling
validator with an isolated set of rules at the problem post reading the
data.

--
Lester Caine - G8HFL

8 years ago by Rowan Collins

If not by using exceptions, how would you handle them if you assign such
checks to variables and assign a wrong value?

if ( !$age->set( $result_set['age'] ) ) { /* handle error */ }
else { /* next step */ }

So, cutting out the commentary about styles of coding, the answer to
Niklas's question is "I would replace the assignment with a function or
method that returned success based on validation result"?

To combine some of your examples:

// Set up the constraints:
$age->setannot( 'number', range(1,120) );

// Later:
$valid = $age->set( 'not a number' );
// or:
$age = $result_set['age'];
$valid = $age->is_valid();

This would result in $valid being false, but $age presumably still being
the string 'not a number'? Or perhaps still being null, as though the
assignment hadn't happened?

It seems like it would be more useful if the user didn't have to
remember to call is_valid() on each variable before using it. That means
having either the write operation, or subsequent read operations "fail"
in some way.

$age_at_next_birthday = $age + 1; // reads from $age; is $age valid?

Throwing exceptions is a way of making operations like this "fail",
rather than propagating the bad data to other parts of the program.

Does this look like the kind of thing you were imagining?

Regards,

Rowan Collins
[IMSoP]

8 years ago by Lester Caine

For an 'exception' model of PHP then I have no problem with actions
throwing exceptions, but from a 'work flow' model then replacing the
exceptions with checks simply works in my model. The question is not how
you flag an error, but rather when do you check for one. If the 'load'
function from a database record or the populate from a web form results
in $age not being valid one handles that situation based on the data
model. If you are propagating that data after validating has failed then
the program flow is wrong and adding some exception when you use the
duff data later does nothing to help?

--
Lester Caine - G8HFL

8 years ago by Rowan Collins

The question is not how
you flag an error, but rather when do you check for one. If the 'load'
function from a database record or the populate from a web form results
in $age not being valid one handles that situation based on the data
model. If you are propagating that data after validating has failed then
the program flow is wrong and adding some exception when you use the
duff data later does nothing to help?

Fair enough, so in summary, what you're looking for is a more
comprehensive and/or user-friendly version of the filter_* functions?

That is, methods to:

- associate validation rules to a variable/array key/object property
- manually check whether the validation rules for one or several
  variables currently pass

But you are not particularly interested in language-level enforcement or
tracking of whether and when these validation rules have been checked,
because you want to insert the validation at specific points in the
workflow.

Is that an accurate summary?
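
For reference, this is roughly what the existing filter_* functions offer today: a rule is passed at the point of checking rather than attached to the variable. The sample input values below are made up for the example.

```php
<?php
// Existing filter extension: validate one value, then a whole array.
$ageRule = [
    'filter'  => FILTER_VALIDATE_INT,
    'options' => ['min_range' => 1, 'max_range' => 120],
];

// Single value: returns the int on success, false on failure.
$age = filter_var('27', FILTER_VALIDATE_INT, ['options' => $ageRule['options']]);

// Several values at once: one rule per key.
$input = ['age' => '130', 'email' => 'someone@example.com'];
$rules = ['age' => $ageRule, 'email' => FILTER_VALIDATE_EMAIL];
$checked = filter_var_array($input, $rules);

// Keys that failed validation come back as false.
$failed = array_keys($checked, false, true);
```

Here '130' is outside the configured range, so the 'age' key is reported as failed while the email passes.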

Regards,

Rowan Collins
[IMSoP]

8 years ago by michal.brzuchalski@gmail.com

What about static analysis and IDE support? They probably can handle all
those sugarcandies because they are stuck to the variable, but not any
dynamic rules set in a procedural style. Am I right?

8 years ago by Rowan Collins

What about static analysis and IDE support? They probably can handle all
those sugarcandies because they are stuck to the variable, but not any
dynamic rules set in a procedural style. Am I right?

Yep, that's a good point, add "machine-friendly" alongside
"user-friendly" in my previous mail. :)

That is, if there is a comprehensive list of standard filters, IDEs and
other tools can build them in and give useful hints. I guess it might
also be possible to have an inspection of "validation rule assigned to
variable appears to never be checked" in some cases.

In case it's not clear, I would welcome a smarter syntax to replace
filter_* functions, as I think they're pretty unreadable as is.

Regards,

Rowan Collins
[IMSoP]

8 years ago by Lester Caine

Is that an accurate summary?

No ... add more than just validation rules. Include facilities such as
escape rules, display rules and variable specific material such as error
messages.

Just how the rules are enforced needs to be flexible and such things as
'required' will vary how the results are interpreted. I don't see that a
dynamic scripting language is improved by 'language-level enforcement'
when the bulk of the data IS flexible. It is the flexibility to include
or exclude checks as required and while a sub-set of code may work
better with a fixed set of rules defined at 'design time', just as much
code will consist of elements of the data set that will be optional or
variable. 'is_valid' on an object with a large set of variables is an
equally valid check and if each variable has the autonomy to return its
state depending on other variables, then only the currently 'required'
variables need to be set and validated.

I AM interested in tracking the state of each variable but the validity
of its state may depend on the specific state of other elements and a
failure to set a variable depends on the particular workflow of the
whole data set. So creating a workflow that allows different paths based
on the results of validation is a lot more flexible than simply calling
'validate' on a static array of variables?

But to keep the 'exception' camp happy, there is no reason that the
'strict' mode can't return exceptions while the 'weak' mode allows the
error to simply select an alternate path through the code.
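
A minimal sketch of that dual behaviour, assuming an invented ModalValue class with a per-instance 'strict' flag (this is not PHP's declare(strict_types=1), just an illustration of the two failure styles side by side):

```php
<?php
// Hypothetical sketch: one validation rule, two failure styles.
// ModalValue and its 'strict' flag are invented for this example.
final class ModalValue
{
    private $rule;
    private $strict;
    private $value = null;

    public function __construct(callable $rule, bool $strict = false)
    {
        $this->rule = $rule;
        $this->strict = $strict;
    }

    public function set($candidate): bool
    {
        if (call_user_func($this->rule, $candidate)) {
            $this->value = $candidate;
            return true;
        }
        if ($this->strict) {
            // 'strict' style: failure interrupts the flow.
            throw new UnexpectedValueException('validation failed');
        }
        // 'weak' style: the caller picks an alternate path.
        return false;
    }
}

$isAge = function ($v) {
    return is_int($v) && $v >= 1 && $v <= 120;
};

$weak = new ModalValue($isAge);
$weakResult = $weak->set(500);           // false, no exception

$strict = new ModalValue($isAge, true);
try {
    $strict->set(500);
    $strictThrew = false;
} catch (UnexpectedValueException $e) {
    $strictThrew = true;
}
```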

--
Lester Caine - G8HFL

8 years ago by Fleshgrinder

What you are describing are classes. You can achieve all of that with them.

You are completely right btw. that exceptions should not be used during
validation and for flow control.

--
Richard "Fleshgrinder" Fussenegger

8 years ago by Lester Caine

What you are describing are classes. You can achieve all of that with them.

You are completely right btw. that exceptions should not be used during
validation and for flow control.

We have said all along that all of this can be done in classes, and yes
a variable is simply a small class. So why not create a generic variable
which has all the current hard coded functions as well as the rules to
carry out proper type management, secure escaping and machine readable
attributes to allow IDEs and other external code to standardise
handling them.

I'm more than happy with docblock attributes and others have proposed
alternate coding styles, but this IS simple if every variable is a well
defined object? Even a simple 'int' variable is more than just a C word.
It needs a much larger data packet with its name and other settings,
and adding range checking expands that packet.

--
Lester Caine - G8HFL

8 years ago by Rowan Collins

No ... add more than just validation rules. Include facilities such as
escape rules, display rules and variable specific material such as error
messages.

Just how the rules are enforced needs to be flexible and such things as
'required' will vary how the results are interpreted.

OK, so, again, you've mentioned plenty of the why, but what do you
picture to be the how?

When would these various rules be applied? What might the code look like
that relied on them?

You've mentioned a lot about flexibility, and that the feature could be
used in multiple styles, but some concrete examples of how you would
use it might help define what the feature needs to do (and not do).

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by Lester Caine

You've mentioned a lot about flexibility, and that the feature could be
used in multiple styles, but some concrete examples of how you would
use it might help define what the feature needs to do (and not do).

Currently my code has lots of checks for constraints and that is hard
coded. Docblock helps to provide documentation and help in the IDE and
there is no fundamental reason to change anything ... except. Small
elements of the constraint process are being introduced into the
process. You can now complain if the variable is not an integer but it
does not remove the need to still check if the integer is valid. There
have been various discussions on how the rules for that extra step could
be added to PHP and in my book that has been there for years in the
docblocks, but other layers are being proposed to add them, and Yasuo is
now hiding them in his validate functions, so why not SIMPLY add a set
of functions to variables to allow those rules to be freely available
and managed on a variable by variable basis. The validate array function
would then simply iterate over a cleanly defined set of variables? Or
each variable can be managed in its own right.

I'm thinking
$var->setConstraint()
$var->setEscape()
$var->setReadOnly()

Rather than having to build 'reflection' classes to pull out the data,
a simple $var->is_valid check would do, and echo $var would output a
correctly escaped piece of text.
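
In userland this could be approximated along the following lines. RuledVar is an invented class; only the three setter names come from the message above, and the rest is an assumption about how they might behave.

```php
<?php
// Hypothetical "rule-carrying variable". The setter names mirror the ones
// floated above; none of this exists in PHP itself.
final class RuledVar
{
    private $value;
    private $constraint = null;
    private $escape = null;
    private $readOnly = false;

    public function __construct($value) { $this->value = $value; }

    public function setConstraint(callable $c) { $this->constraint = $c; }
    public function setEscape(callable $e)     { $this->escape = $e; }
    public function setReadOnly()              { $this->readOnly = true; }

    // Writes are refused once the variable is marked read-only.
    public function set($value): bool
    {
        if ($this->readOnly) {
            return false;
        }
        $this->value = $value;
        return true;
    }

    // No constraint set means the value is treated as valid.
    public function is_valid(): bool
    {
        return $this->constraint === null
            || (bool) call_user_func($this->constraint, $this->value);
    }

    // echo $var goes through the escape rule, if one is set.
    public function __toString()
    {
        $text = (string) $this->value;
        return $this->escape === null
            ? $text
            : (string) call_user_func($this->escape, $text);
    }
}

$var = new RuledVar('<b>Sugah Pop</b>');
$var->setConstraint(function ($v) { return strlen($v) < 255; });
$var->setEscape(function ($v) { return htmlspecialchars($v, ENT_QUOTES, 'UTF-8'); });
$var->setReadOnly();

echo $var, "\n";   // prints the HTML-escaped text
```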

--
Lester Caine - G8HFL

8 years ago by michal@brzuchalski.com

Because escaping can differ depending on the context of use (HTML, CSS, JS,
XML, JSON, etc.)?
IMHO pinning escaping to a variable is just a bad idea.

In the case of $var->setReadOnly() you change the variable's behaviour at
runtime, so its behaviour differs before and after that method runs. It's
about consistency!

About ReadOnly: I'm currently working on an immutable feature proposal that
would make it possible to mark classes and properties as immutable (ReadOnly
after initialisation), so it could solve the problem with ReadOnly
ValueObjects and immutable properties which are simple primitives.

--
Regards,

Michał Brzuchalski

8 years ago by Lester Caine — view source

About ReadOnly: I'm currently working on an immutable feature proposal that
would make it possible to mark classes and properties as immutable (ReadOnly
after initialisation), so it could solve the problem with ReadOnly
ValueObjects and immutable properties which are simple primitives.

This all comes back to the basic variables ;)

I can see the attraction of making an object 'immutable', which may be a
set of readonly variables. Initializing that obviously needs to bypass
the block, and MAY require that some variables are initialised in order
for others to then be populated, which just highlights again the value of
having a set of simple variables, each of which can be managed on its own,
which is then wrapped in a workflow to ensure the resulting 'readonly'
object is initialized correctly with the variables supplied.

I am thinking particularly here about timestamps and the current 'mess'
of built-in code, which I still have to wrap to make it work with the
timestamps generated in my database code. The immutable version of that is
pointless much of the time, and I simply switch to a readonly BIGINT
value so it will work on 32-bit platforms. The raw view of the data can
vary depending on whether you are using seconds or days as a base, but both
bases eventually produce a variety of displays with managed time offsets.

--
Lester Caine - G8HFL

8 years ago by Rowan Collins — view source

I'm thinking
$var->setConstraint()
$var->setEscape()
$var->setReadOnly()

Rather than having to build 'reflections' classes to pull out data that
a simple $var->is_valid or echo $var will output a correctly escaped
piece of text.

Note that as discussed on the recent RFC thread, escape mechanisms are
very much specific to the destination of the data not the source. So
having "echo $foo" call "echo $foo->escape()" isn't going to work.

I guess you could have "echo $foo->forHTML()" with some kind of fallback
that converts that to "echo htmlspecialchars($foo)" if nothing further
is defined. Or "echo_html( $foo )" calling "$foo->forHTML()" internally
if it's been defined against the variable. I haven't thought this
through fully, just throwing ideas out there.
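
A minimal userland sketch of the destination-specific idea above. The `Text` class is hypothetical; `forHTML()` and `echo_html()` are the names floated above, `forJS()` is an extra assumption added for contrast, and none of this exists in PHP itself:

```php
<?php
// Hypothetical wrapper: the value knows nothing about where it will be sent;
// each for*() method escapes for one specific destination.
final class Text
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // Escape for an HTML text node or a quoted attribute value.
    public function forHTML(): string
    {
        return htmlspecialchars($this->value, ENT_QUOTES, 'UTF-8');
    }

    // Encode as a JavaScript string literal, hex-escaping <, >, &, ' and ".
    public function forJS(): string
    {
        return json_encode(
            $this->value,
            JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
        );
    }
}

// Fallback described above: use forHTML() if the value defines it,
// otherwise fall back to plain htmlspecialchars().
function echo_html($foo): void
{
    echo (is_object($foo) && method_exists($foo, 'forHTML'))
        ? $foo->forHTML()
        : htmlspecialchars((string) $foo, ENT_QUOTES, 'UTF-8');
}
```

The same value produces different output per destination, which is why a single escape rule attached to the source variable does not fit.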

Regards,

Rowan Collins
[IMSoP]

8 years ago by Lester Caine — view source

I struggled to see what problem adding more 'shortcut' tags was going to
solve. I have problems with third party sites returning data that may
or may not need 'un-escaping' and it's a little like trying to ensure I
get a UK rather than an American date. From the 'source' end it's almost
a 'this needs escaping' and you are right that where it is going forms
part of the 'escape' process, so the 'generic code' that handles echo
may benefit from expansion to echo_xxx to augment the process. Or just
like 'strict' mode we have 'js' mode and all output uses the appropriate
js rules rather than html?

I think the idea I am throwing out for discussion is a switch from
global_library($var, ... ) to $var->global_library( ... ) where $var is
now always an object without having every framework creating its own
version of the wrapper?

--
Lester Caine - G8HFL

8 years ago by Rowan Collins — view source

I think the idea I am throwing out for discussion is a switch from
global_library($var, ... ) to $var->global_library( ... ) where $var is
now always an object without having every framework creating its own
version of the wrapper?

So kind of an extension to the "scalar objects" idea (e.g.
https://github.com/nikic/scalar_objects) such that variables can have
arbitrary properties / state rather than just the syntax sugar of
->methodCall() ?

One problem I can see is that like with references and objects, the
meaning of assignment has to change somehow:

$foo = 'hello world';
$foo->setReadonly();
var_dump($foo->isReadonly()); // TRUE

$bar = $foo;
var_dump($bar->isReadonly()); // ? same value, different variable

$foo = 'goodbye world';
var_dump($foo->isReadonly()); // ? same variable, different value

Of course, the same applies whenever you pass the variable into a
function, or return it from one...
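
To illustrate the ambiguity with today's semantics, here is a sketch using a hypothetical userland `Flagged` wrapper object (an assumption for illustration, not the proposed engine feature). State attached to an object follows the object handle, not the variable name:

```php
<?php
// Hypothetical wrapper standing in for a "variable with state".
final class Flagged
{
    public $value;
    private $readonly = false;

    public function __construct($value) { $this->value = $value; }
    public function setReadonly(): void { $this->readonly = true; }
    public function isReadonly(): bool { return $this->readonly; }
}

$foo = new Flagged('hello world');
$foo->setReadonly();

$bar = $foo;                          // copies the handle, not the object
var_dump($bar->isReadonly());         // bool(true): the flag travelled with the value

$foo = new Flagged('goodbye world');  // rebinding the name discards the old state
var_dump($foo->isReadonly());         // bool(false)
var_dump($bar->isReadonly());         // bool(true): $bar still holds the old object
```

With objects the flag follows the value; whether a native feature should instead tie it to the variable name is exactly the open question.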

Regards,

Rowan Collins
[IMSoP]

8 years ago by Christoph M. Becker — view source

I'm thinking
$var->setConstraint()
$var->setEscape()
$var->setReadOnly()

Rather than having to build 'reflections' classes to pull out data that
a simple $var->is_valid or echo $var will output a correctly escaped
piece of text.

Actually, this is already possible; just use objects. E.g.

class Container {
    function __construct() {}
    function setConstraint() {}
    function setEscape() {}
    function setReadOnly() {}
    function isValid() {}
    function __toString() {}
}
$var = new Container($some_value);
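
One way those stubs might be filled in, as a sketch only. The constraint and escaper are assumed to be plain callables, and the `set()` method and the exception thrown on writing to a read-only container are illustrative choices, not part of any proposal:

```php
<?php
final class Container
{
    private $value;
    private $constraint;        // callable(mixed): bool, or null
    private $escape;            // callable(mixed): string, or null
    private $readOnly = false;

    public function __construct($value) { $this->value = $value; }

    public function setConstraint(callable $rule): void { $this->constraint = $rule; }
    public function setEscape(callable $escaper): void { $this->escape = $escaper; }
    public function setReadOnly(): void { $this->readOnly = true; }

    // Hypothetical setter, added so that read-only has something to guard.
    public function set($value): void
    {
        if ($this->readOnly) {
            throw new LogicException('Container is read-only');
        }
        $this->value = $value;
    }

    // With no constraint set there is nothing to fail.
    public function isValid(): bool
    {
        return $this->constraint === null
            || (bool) ($this->constraint)($this->value);
    }

    public function __toString(): string
    {
        return $this->escape === null
            ? (string) $this->value
            : (string) ($this->escape)($this->value);
    }
}

$var = new Container('<b>42</b>');
$var->setConstraint('is_string');
$var->setEscape('htmlspecialchars');
$var->setReadOnly();
```

Note that a single escaper per value is the very thing the earlier replies object to, since the right escaping depends on the destination.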

--
Christoph M. Becker

8 years ago by Lester Caine — view source

This has been the problem all along. There is no overriding reason to
change what we already have, but ALL of the improvements currently being
discussed MAY be better handled with a return to the basic model of a
variable. If a variable is 'readonly' there is no need to worry about
each variable.

--
Lester Caine - G8HFL

8 years ago by Peter Lind — view source

Thanks for the ideas on this feature.

A few thoughts.

The RFC for this isn't a change - it's an addition. If it gets accepted
and implemented, you still would not have to change your code if you didn't
want to.

There are differing ways of using the language. Yours is not better -
merely different. So I would think a relevant question is: can the RFC in
point support your style of coding along with that of others. A critical
point is throwing exceptions on invalid data, which might be hard to handle.

Your assumption of secure intra-nets is questionable. Defense in depth
is what one should strive for.

I think your suggestions might conflate validation and sanitation -
these are not the same and cannot be handled as one.

That said, I generally think that built-in methods that accept Callables
are a great way to go. It encourages reuse through modular composition -
and could likely be a neater way around the throw exception/return error
code issue. It's obviously doable from userland, but could probably be
improved if implemented in the language.

Regards
Peter

--
CV: careers.stackoverflow.com/peterlind
LinkedIn: plind
Twitter: kafe15

8 years ago by Lester Caine — view source

The RFC for this isn't a change - it's an addition. If it gets accepted
and implemented, you still would not have to change your code if you didn't
want to.
I think that is the whole point ...

There are differing ways of using the language. Yours is not better -
merely different. So I would think a relevant question is: can the RFC in
point support your style of coding along with that of others. A critical
point is throwing exceptions on invalid data, which might be hard to handle.
Exceptions get generated when something unhandled happens. Simple example
'divide by zero' only happens if the divisor is zero. If the variable
used has a constraint of 'not zero' and has been validated then the
exception will not be raised. My style would validate the divisor and
only call the divide if it was going to work, others would allow the
divide to fail and MAY actually handle the exception ;)
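
The two styles described above might look like this in practice. This is a sketch assuming PHP 7.1 or later, where `intdiv()` throws `DivisionByZeroError` for a zero divisor; the helper names are made up for illustration:

```php
<?php
// Style 1: validate the divisor first and keep failure in normal program flow.
function divide_checked(int $a, int $b): ?int
{
    if ($b === 0) {
        return null;          // managed error: the caller decides what to do
    }
    return intdiv($a, $b);
}

// Style 2: attempt the operation and handle the error if it is thrown.
function divide_caught(int $a, int $b): ?int
{
    try {
        return intdiv($a, $b);
    } catch (DivisionByZeroError $e) {
        return null;
    }
}
```

Both return the same results here; the difference is only in where the zero check lives.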

Your assumption of secure intra-nets is questionable. Defense in depth
is what one should strive for.
ACTUALLY that is why I've been looking for proper validation of data IN
PHP for the last 15 years. If the rules were easy to handle then
processing in the browser and cross checking the input array would be
consistent. It's just a shame that the browser end of things is an even
worse mess! html5 validation is somewhat incomplete as well?

I think your suggestions might conflate validation and sanitation -
these are not the same and cannot be handled as one
That people try to inject malicious input via variables is a different
problem. Firebird database has always preferred parameter data so SQL
injections just don't work, but other stuff does need clearing when
input rather than simply relying on escaping unprocessed output. Again
it's programming style?

That said, I generally think that built-in methods that accept Callables
are a great way to go. It encourages reuse through modular composition -
and could likely be a neater way around the throw exception/return error
code issue. It's obviously doable from userland, but could probably be
improved if implemented in the language.

It was the fact that Yasuo was adding these rules into his array
validation stuff that just grates so badly with what is actually needed ...

--
Lester Caine - G8HFL

8 years ago by Peter Lind — view source

Thanks for the ideas on this feature.

A few thoughts.

The RFC for this isn't a change - it's an addition. If it gets accepted
and implemented, you still would not have to change your code if you didn't
want to.
I think that is the whole point ...

You're making different points, so maybe you should then be clearer.

There are differing ways of using the language. Yours is not better -
merely different. So I would think a relevant question is: can the RFC in
point support your style of coding along with that of others. A critical
point is throwing exceptions on invalid data, which might be hard to
handle.
Exceptions get generated when something unhandled happens. Simple example
'divide by zero' only happens if the divisor is zero. If the variable
used has a constraint of 'not zero' and has been validated then the
exception will not be raised. My style would validate the divisor and
only call the divide if it was going to work, others would allow the
divide to fail and MAY actually handle the exception ;)

That doesn't address the point: if the feature should throw exceptions or
not, or allow for both styles.

I think your suggestions might conflate validation and sanitation -
these are not the same and cannot be handled as one
That people try to inject malicious input via variables is a different
problem. Firebird database has always preferred parameter data so SQL
injections just don't work, but other stuff does need clearing when
input rather than simply relying on escaping unprocessed output. Again
it's programming style?

No, it's not programming style. Conflating validation and
sanitation/escaping is wrong, because the contexts are different.

That said, I generally think that built-in methods that accept Callables
are a great way to go. It encourages reuse through modular composition -
and could likely be a neater way around the throw exception/return error
code issue. It's obviously doable from userland, but could probably be
improved if implemented in the language.

It was the fact that Yasuo was adding these rules into his array
validation stuff that just grates so badly with what is actually needed ...

Yasuo was presumably scratching an itch that he (and possibly others) feel.
As such, it is in fact "what is actually needed". It just doesn't happen to
be what you actually need, but that doesn't mean it won't fit perfectly
for many others.

Regards
Peter

--
CV: careers.stackoverflow.com/peterlind
LinkedIn: plind
Twitter: kafe15

8 years ago by Lester Caine — view source

Thanks for the ideas on this feature.

A few thoughts.

The RFC for this isn't a change - it's an addition. If it gets accepted
and implemented, you still would not have to change your code if you didn't
want to.
I think that is the whole point ...

You're making different points, so maybe you should then be clearer.

If you don't set a constraint or other 'intelligent' action then nothing
changes?

There are differing ways of using the language. Yours is not better -
merely different. So I would think a relevant question is: can the RFC in
point support your style of coding along with that of others. A critical
point is throwing exceptions on invalid data, which might be hard to
handle.
Exceptions get generated when something unhandled happens. Simple example
'divide by zero' only happens if the divisor is zero. If the variable
used has a constraint of 'not zero' and has been validated then the
exception will not be raised. My style would validate the divisor and
only call the divide if it was going to work, others would allow the
divide to fail and MAY actually handle the exception ;)

That doesn't address the point: if the feature should throw exceptions or
not, or allow for both styles.

We already have a 'strict' mode which enables exceptions on some actions
or leaves them as errors. I'm simply happy for both to coexist to keep
others happy.

I think your suggestions might conflate validation and sanitation -
these are not the same and cannot be handled as one
That people try to inject malicious input via variables is a different
problem. Firebird database has always preferred parameter data so SQL
injections just don't work, but other stuff does need clearing when
input rather than simply relying on escaping unprocessed output. Again
it's programming style?

No, it's not programming style. Conflating validation and
sanitation/escaping is wrong, because the contexts are different.

The programming style 'point' is that many sites simply slap
htmlspecialchars() around all output and do not worry that internally
suspect data is floating around. On a couple of projects I've been using,
this was the sticking plaster, while I prefer to ensure that the
variables being passed are clean in the first place. In some situations
sanitising the input may not be practical because of the workflow used,
but THAT is a matter of programming style?

That said, I generally think that built-in methods that accept Callables
are a great way to go. It encourages reuse through modular composition -
and could likely be a neater way around the throw exception/return error
code issue. It's obviously doable from userland, but could probably be
improved if implemented in the language.

It was the fact that Yasuo was adding these rules into his array
validation stuff that just grates so badly with what is actually needed ...

Yasuo was presumably scratching an itch that he (and possibly others) feel.
As such, it is in fact "what is actually needed". It just doesn't happen to
be what you actually need, but that doesn't mean it won't fit perfectly
for many others.
See other post ... to Yasuo

--
Lester Caine - G8HFL

8 years ago by Peter Lind — view source

Thanks for the ideas on this feature.

A few thoughts.

The RFC for this isn't a change - it's an addition. If it gets accepted
and implemented, you still would not have to change your code if you didn't
want to.
I think that is the whole point ...

You're making different points, so maybe you should then be clearer.

If you don't set a constraint or other 'intelligent' action then nothing
changes?

This is missing the point entirely. The RFC was for a new feature, that you
don't have to use. Not for changing existing ones.

There are differing ways of using the language. Yours is not better -
merely different. So I would think a relevant question is: can the RFC in
point support your style of coding along with that of others. A critical
point is throwing exceptions on invalid data, which might be hard to handle.
Exceptions get generated when something unhandled happens. Simple example
'divide by zero' only happens if the divisor is zero. If the variable
used has a constraint of 'not zero' and has been validated then the
exception will not be raised. My style would validate the divisor and
only call the divide if it was going to work, others would allow the
divide to fail and MAY actually handle the exception ;)

That doesn't address the point: if the feature should throw exceptions or
not, or allow for both styles.

We already have a 'strict' mode which enables exceptions on some actions
or leaves them as errors. I'm simply happy for both to co exist to keep
others happy.

Good point.

I think your suggestions might conflate validation and sanitation -
these are not the same and cannot be handled as one
That people try to inject malicious input via variables is a different
problem. Firebird database has always preferred parameter data so SQL
injections just don't work, but other stuff does need clearing when
input rather than simply relying on escaping unprocessed output. Again
it's programming style?

No, it's not programming style. Conflating validation and
sanitation/escaping is wrong, because the contexts are different.

The programming style 'point' is that many sites do simply slap
htmlspecialchars() around all output and not worry that internally
suspect data is floating around. On a couple of projects I've been using
this was the sticking plaster while I prefer to ensure that the
variables being passed were clean in the first place. In some situations
sanitising the input may not be practical because of the workflow used
but THAT is a matter of programming style?

But other programmers entirely forgoing validation is beside the point.
The point was that conflating validation with escaping/sanitizing is wrong,
and that still applies, even if your chosen method of validation is "I
don't".

--
CV: careers.stackoverflow.com/peterlind
LinkedIn: plind
Twitter: kafe15

8 years ago by Tony Marston — view source

"Peter Lind" wrote in message
news:CAEU6NAdvBKMqhjU0aSsR5f+uBo9vvRyjYVJmr0ihxb8bDmG3Gg@mail.gmail.com...

Yasuo was presumably scratching an itch that he (and possibly others) feel.
As such, it is in fact "what is actually needed". It just doesn't happen to
be what you actually need, but that doesn't mean it won't fit perfectly
for many others.

Define "many". Is it 1% of the total number of PHP programmers, or is it 51%?

I do not see that complicating the language by making it do what SHOULD be
done in userland code is a good idea. Helping 1% of users while hindering
the remaining 99% is not an idea which should EVER be contemplated.

--
Tony Marston

8 years ago by Lester Caine — view source

I do not see that complicating the language by making it do what SHOULD
be done in userland code is a good idea. Helping 1% of users while
hindering the remaining 99% is not an idea which should EVER be
contemplated.

I'm currently looking to 'modernise' forms that are still using hard
coded mechanisms for validation in the browser and cross checking in the
PHP end. What I can not find is a tidy 'modern' method of handling the
building of forms without having to hand write javascript and other
elements of form building. I'm based on Bootstrap3 for the layout and
css side of things and have just 'invested' in a copy of FormValidation
as it seems to do a good job of producing all the javascript side while
still working html in the smarty templates. I'd rather be using open
source code I can extend, but nothing I've found seems to work.

The next step is to produce slices of forms as templates each of which
can be populated from the rules attached to a variable. In the absence
of any standard for handling that in PHP directly I will probably resort
to using the SQL schema to provide the rules since essentially the
captured data will end up in a database anyway. I can extend ADOdb to
provide a more comprehensive field description for each variable to
populate the smarty templates. It would be nice though to have that
information more centrally accessible in PHP so the various third party
systems can standardise on using them? The schema based material is not
visible via phpdocumentor which sort of pushes expanding the docblock
alternatives instead.

Of course, to add to the fun: having got a new style form working, I've
dropped the UK NI Number validation regex into FormValidator ... and
it's not working :( The regex has been used for years in hand coded
javascript and is clean in the checker program, just not producing a
match in FV :(

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki — view source

Hi Lester,

That said, I generally think that built-in methods that accept Callables
are a great way to go. It encourages reuse through modular composition -
and could likely be a neater way around the throw exception/return error
code issue. It's obviously doable from userland, but could probably be
improved if implemented in the language.

It was the fact that Yasuo was adding these rules into his array
validation stuff that just grates so badly with what is actually needed ...

I think you've mentioned this RFC

https://wiki.php.net/rfc/add_validate_functions_to_filter

In secure coding, input data validation has a clear task. What input
validation should do varies, i.e. it depends on what the sender should
send. The new validation feature in the filter module will do the job it
should.

Anyway, an input validation spec is a simple array. You can do

$my_date_spec = array(
    // New filter module allows multiple filters and options as follows.
    // Array elements are evaluated in order. Non array spec is evaluated last.
    // Older implementation ignores this kind of spec silently.
    array( // This is evaluated first.
        'filter'  => FILTER_VALIDATE_STRING,
        'options' => array('min_bytes' => 10, 'max_bytes' => 10,
                           'encoding' => FILTER_STRING_ENCODING_PASS)
    ),
    array(
        'filter'  => FILTER_VALIDATE_REGEXP,
        'options' => array('regexp' => '/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/')
    ),
    array(
        'filter'  => FILTER_VALIDATE_CALLBACK,
        // Callback is given by name, without parentheses.
        'options' => array('callback' => 'check_date_and_raise_exception_for_invalid'),
    ),
    // Evaluated last. Does nothing. It's here for an example.
    'filter' => FILTER_UNSAFE_RAW,
);

$get_def_for_an_api = array(
    'date'    => $my_date_spec
);

filter_require_var_array($_GET, $get_def_for_an_api);

The input validation definition is manageable. Since it uses a simple
array, it is much more efficient than an object-based API, i.e. setting a
spec via method calls is a lot slower than simple assignment. There is
also a filter_check_definition() function for validating the spec itself.
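
For comparison, a similar check can be approximated with the filter functions that already ship with PHP. This sketch uses only the existing `filter_var_array()`, `FILTER_VALIDATE_REGEXP` and `FILTER_CALLBACK`; it does not use the proposed `filter_require_var_array()` or chained specs, and `check_calendar_date()` and the sample input are hypothetical:

```php
<?php
// Hypothetical callback: returns the value unchanged for a real calendar
// date in YYYY-MM-DD form, false otherwise.
function check_calendar_date($value)
{
    if (preg_match('/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/', $value, $m) !== 1) {
        return false;
    }
    // checkdate() takes month, day, year in that order.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]) ? $value : false;
}

$definition = array(
    'date' => array(
        'filter'  => FILTER_CALLBACK,
        'options' => 'check_calendar_date',
    ),
    'isbn' => array(
        'filter'  => FILTER_VALIDATE_REGEXP,
        'options' => array('regexp' => '/^[0-9]{13}$/'),
    ),
);

$input  = array('date' => '2016-02-30', 'isbn' => '9780131103627');
$result = filter_var_array($input, $definition);
// $result['date'] is false (30 February does not exist);
// $result['isbn'] is the original string because it matched.
```

Unlike the proposed "require" variant, the existing function reports failure as `false` in the result array and leaves the reaction to the caller.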

What makes you feel something is missing or badly designed?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine — view source

What makes you feel missing some or designed badly?

I may be missing something, but I thought the original code had rules
for each element of the array? I would certainly expect to see the
capability of setting different validating rules for each element, and
the rules you are defining are the same rules that would be needed on a
variable by variable basis?

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki — view source

Hi Lester,

What makes you feel missing some or designed badly?

I may be missing something, but I thought the original code had rules
for each element of the array? I would certainly expect to see the
capability of setting different validating rules for each element, and
the rules you are defining are the same rules that would be needed on a
variable by variable basis?

You can do

$get_def_for_an_api = array(
'date' => $my_date_spec,
'bookname' => $my_bookname_spec,
'isbn' => $my_isbn_spec,
);

filter_require_var_array($_GET, $get_def_for_an_api);

One missing validation filter is an "optional" filter. I may add this
later, or now before starting the vote.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki — view source

Hi Lester,

"optional" filter can be defined by "callback" filter without
dedicated filter for optional inputs, BTW.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine — view source

I may be missing something, but I thought the original code had rules

for each element of the array? I would certainly expect to see the
capability of setting different validating rules for each element, and
the rules you are defining are the same rules that would be needed on a
variable by variable basis?
You can do

$get_def_for_an_api = array(
'date' => $my_date_spec,
'bookname' => $my_bookname_spec,
'isbn' => $my_isbn_spec,
);

filter_require_var_array($_GET, $get_def_for_an_api);

One missing validation filter is "optional" filter. I may add this
later or now before starting vote.

But my point is you have code to process the rule set $my_date_spec
against the variable $date, and ditto for bookname and isbn. All three of
them are elements I have userland code to validate, and I don't see
what is so special about making that validation available inside an
array compared with simply having $date->rules($my_date_spec) and return
$date->valid; ...
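
A minimal userland sketch of that idea follows, assuming PHP 8.0 or later. The class name RuledVar, its set() and get() methods and the date rule are hypothetical; PHP itself has no rules() method or valid property on variables.

```php
<?php
// Hypothetical userland wrapper; nothing here is a built-in PHP feature.
final class RuledVar
{
    /** @var callable|null Rule that returns true when the value is acceptable. */
    private $rule = null;
    public bool $valid = true;

    public function __construct(private mixed $value) {}

    // Attach a rule and re-check the current value against it.
    public function rules(callable $rule): self
    {
        $this->rule  = $rule;
        $this->valid = (bool) $rule($this->value);
        return $this;
    }

    // Assigning a new value re-runs the rule instead of throwing.
    public function set(mixed $value): void
    {
        $this->value = $value;
        $this->valid = $this->rule === null || (bool) ($this->rule)($value);
    }

    public function get(): mixed
    {
        return $this->value;
    }
}

// Assumed rule: a Y-m-d string that is also a real calendar date.
$my_date_spec = static function ($v): bool {
    return is_string($v)
        && preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $v, $m) === 1
        && checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
};

$date = new RuledVar('2016-08-11');
$date->rules($my_date_spec);
var_dump($date->valid); // bool(true)

$date->set('2016-02-31');
var_dump($date->valid); // bool(false)
```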

--
Lester Caine - G8HFL

8 years ago by Yasuo Ohgaki

Hi Lester,

I'm thinking
$var->setConstraint()
$var->setEscape()
$var->setReadOnly()

DbC cannot cover everything, but some of these can be covered during
development.

https://wiki.php.net/rfc/introduce_design_by_contract
https://wiki.php.net/rfc/dbc
https://wiki.php.net/rfc/dbc2

Runtime constraints are good, but they could be too slow. Proper DbC would
strike a nice balance between production and development checks.
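
One existing mechanism with a similar development/production split is assert(): its checks run when zend.assertions=1 and are skipped when it is set to 0 or -1. The sketch below only illustrates that trade-off; the function discounted_price() is hypothetical and this is not the syntax proposed in the DbC RFCs.

```php
<?php
// Hypothetical function; the contract is expressed with plain assert() calls.
function discounted_price(float $price, float $percent): float
{
    // Preconditions: checked when zend.assertions=1, skipped otherwise.
    assert($price >= 0.0, 'price must not be negative');
    assert($percent >= 0.0 && $percent <= 100.0, 'percent must be within 0..100');

    $result = $price * (1.0 - $percent / 100.0);

    // Postcondition: a discount never increases the price.
    assert($result <= $price, 'result must not exceed the original price');

    return $result;
}

echo discounted_price(200.0, 25.0), PHP_EOL; // 150
```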

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

> Hi Lester,
>
> I'm thinking
> $var->setConstraint()
> $var->setEscape()
> $var->setReadOnly()
>
> DbC cannot cover everything, but some of these can be covered during
> development.
>
> https://wiki.php.net/rfc/introduce_design_by_contract
> https://wiki.php.net/rfc/dbc
> https://wiki.php.net/rfc/dbc2
>
> Runtime constraints are good, but they could be too slow. Proper DbC would
> strike a nice balance between production and development checks.

I think my argument here would be that this is an alternate 'IDE'
approach? Rather than simply extending the 'simple' way PHP currently
works? As I say ... it's that your rfc on array validation has all the
code I think is needed to upgrade the handling of variables individually?

--
Lester Caine - G8HFL

8 years ago by Tony Marston

"Niklas Keller" wrote in message
news:CANUQDCjaXeJsCAjyE+q-UhXCFjexwRYWLKzMfwRQVDaEwVbLog@mail.gmail.com...

> 2016-08-11 14:42 GMT+02:00 Lester Caine lester@lsces.co.uk:
>
>>> You want to attach such validation to a variable at runtime, throwing
>>> \TypeError whenever a constraint is broken - wouldn't that cause many
>>> more unexpected exceptions to be thrown at runtime?
>>> Imagine you pass such a constrained variable into some object that
>>> operates on it: it would have to expect \TypeError at any time, because
>>> you never know what sort of constraint and optional validation callback
>>> is attached to the variable!
>>
>> Now this is where the fundamental difference in styles comes in.
>> PERSONALLY I would not be looking to throw exceptions at all. The whole
>> point of validation is to handle any validation error ... and it is an
>> error not an exception.
>
> If not by using exceptions, how would you handle them if you assign such
> checks to variables and assign a wrong value?
>
> Regards, Niklas

I agree 100% with Lester. Data validation errors are NOT exceptions, they
are simple errors. Exceptions are supposed to be for exceptional or
unexpected events, usually pointing to a bug in the code which needs fixing,
whereas data validation errors are totally expected and regular occurrences.

I solved the data validation problem over a decade ago by writing a simple
validation function that has two input variables - an associative array of
field names and their values (such as the $_POST array), and a multi-level
array of field specifications which has a separate array of specifications
(type, size, etc) for each field in that table. The output is an array of
errors which can contain zero or more entries. I DO NOT use exceptions
because (a) they are errors and not exceptions, and (b) if I throw an
exception on the first error then further validation is stopped, and
everyone knows that there may be multiple errors in a single array of
values.
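
To make point (b) concrete, here is a small sketch contrasting the two styles. The rule closures, field names and function names are invented for the example and do not come from any framework mentioned in this thread.

```php
<?php
// Hypothetical rules: each returns an error message, or null when the value passes.
$rules = [
    'email' => static fn ($v) => filter_var($v, FILTER_VALIDATE_EMAIL) === false
        ? 'email is not valid' : null,
    'age'   => static fn ($v) => filter_var($v, FILTER_VALIDATE_INT) === false
        ? 'age must be an integer' : null,
];

// Style 1: collect every error and return them as an array.
function collect_errors(array $data, array $rules): array
{
    $errors = [];
    foreach ($rules as $name => $rule) {
        $message = $rule($data[$name] ?? '');
        if ($message !== null) {
            $errors[$name] = $message;
        }
    }
    return $errors;
}

// Style 2: throw on the first failure; later fields are never checked.
function throw_on_first_error(array $data, array $rules): void
{
    foreach ($rules as $name => $rule) {
        $message = $rule($data[$name] ?? '');
        if ($message !== null) {
            throw new InvalidArgumentException($message);
        }
    }
}

$post = ['email' => 'not-an-address', 'age' => 'old'];
print_r(collect_errors($post, $rules)); // both problems are reported

try {
    throw_on_first_error($post, $rules);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // only the email problem is reported
}
```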

If you are still writing code to perform such data validation then you are
behind the times. If you expect the language to perform the validation for
you then your expectations are unreasonable.

--
Tony Marston

8 years ago by Marco Pivetta

So much confusion...

There are 3 (or more) types of validation in pretty much every web-app, so
let's please stop calling it simply "validation".

1. frontend validation (unsafe/unreliable), prevents invalid data
   submission, makes things friendlier for the user
2. form validation, in the application layer (your mvc framework or your
   entry point php script): returns error messages to the client (in HTML or
   json or whatever you prefer). It is still not the source of truth
3. domain validation, in the core of your domain logic, throws exceptions
   and generally stops execution whenever invalid data is provided: no nice
   error messages required, except for logging/debugging purposes

Other than that, the process of validation is a simple function, and not a
magic box:

validate :: value -> (bool, [string])

It receives a value (nested, if necessary) and tells us if it is valid or
not, plus why (as a set of strings, usually).
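
In PHP terms that signature could be sketched as a function returning a two-element array. The rules below (a 'name' and an 'age' key) are invented purely for illustration.

```php
<?php
// Sketch of validate :: value -> (bool, [string]) as a PHP function.
function validate(array $value): array
{
    $reasons = [];

    $name = $value['name'] ?? null;
    if (!is_string($name) || trim($name) === '') {
        $reasons[] = 'name must be a non-empty string';
    }

    $age = $value['age'] ?? null;
    if (!is_int($age) || $age < 0) {
        $reasons[] = 'age must be a non-negative integer';
    }

    // First element: validity flag; second element: the reasons, if any.
    return [$reasons === [], $reasons];
}

[$ok, $why] = validate(['name' => '', 'age' => 42]);
var_dump($ok); // bool(false)
print_r($why); // one entry, about the name
```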

This is validation. Please do use separate terminology if you mean:

"frontend (client-side) validation"
"frontend (server-side) validation"
"domain validation"

Even more specific if you are going with something different, please.

8 years ago by Lester Caine

> It receives a value (nested, if necessary) and tells us if it is valid or
> not, plus why (as a set of strings, usually).
>
> This is validation. Please do use separate terminology if you mean:
>
> "frontend (client-side) validation"
> "frontend (server-side) validation"
> "domain validation"
>
> Even more specific if you are going with something different, please.

Not quite sure what you are referring to by 'domain validation'. All I
am talking about is the process of obtaining a valid set of data to
store in a database. Nothing more. That is a set of variables with
slightly more complex rules than the simple 'int' or 'string' type
checking that has caused so much trouble. There are many useful
strings, such as email address, National Insurance number, bank
sort code and so on, which have well defined rules against which to
validate the data supplied. An NI number is a fixed, well defined string
that does not need escaping when displayed, but may in some cases need
masking if the client user does not have full access.
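
As a sketch of a rule plus a presentation step for such a string, the functions below check a simplified NI number format and mask it for display. Both function names are hypothetical, and the pattern is deliberately simplified: the real allocation rules also exclude certain prefix letters.

```php
<?php
// Simplified format check: two letters, six digits, suffix A-D.
function is_ni_number(string $value): bool
{
    return preg_match('/^[A-Z]{2}\d{6}[A-D]$/', $value) === 1;
}

// Mask all but the last three characters for users without full access.
function mask_ni_number(string $value): string
{
    $hidden = max(0, strlen($value) - 3);
    return str_repeat('*', $hidden) . substr($value, -3);
}

$ni = 'AB123456C';
if (is_ni_number($ni)) {
    echo mask_ni_number($ni), PHP_EOL; // ******56C
}
```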

Most systems will actually be used by real people who expect the client
side to be fast, so passing those rules client side is part of a process
to handle variables. The same rules are used server side to validate the
dataset and when there are no morons around the server side should have
to do little more than rubber stamp the various variables in the post
data. Additional checks are normally added server side, such as whether an NI
number already exists, or whether a valid postcode has been supplied,
something which could also be actioned via an AJAX check. Nowadays even
inside PHP the gap between client and server is somewhat woolly, and this
is where access on a variable by variable basis to the rules is essential.

But we do have morons around who take pleasure in trying to make life
difficult for everybody else. They will capitalise on any known weakness
to try and mess sites up. That the validation process has to be robust
enough to cope with this sort of activity IS a different problem, but
with a robust variable based validation, injections should be difficult
to push through and apart from the previous discussions on being able to
store examples of malicious code while avoiding it also being able to be
activated, my preferred workflow ensures that validation includes
elimination of any potentially malicious code.

--
Lester Caine - G8HFL

8 years ago by Tony Marston

"Marco Pivetta" wrote in message
news:CADyq6sKZRBvYFtqyKYVYM4iUEx+2OuUJvHeP1jznM56k3+hznA@mail.gmail.com...

> So much confusion...
>
> There are 3 (or more) types of validation in pretty much every web-app, so
> let's please stop calling it simply "validation".
>
> 1. frontend validation (unsafe/unreliable), prevents invalid data
>    submission, makes things friendlier for the user
> 2. form validation, in the application layer (your mvc framework or your
>    entry point php script): returns error messages to the client (in HTML or
>    json or whatever you prefer). It is still not the source of truth
> 3. domain validation, in the core of your domain logic, throws exceptions
>    and generally stops execution whenever invalid data is provided: no nice
>    error messages required, except for logging/debugging purposes

I do not perform any javascript validation in the frontend.

I do not distinguish between the application layer and the domain layer. My
framework is based on a combination of the 3 Tier Architecture with its
separate Presentation, Business and Data Access layers and MVC where the
Model is the same as the Business layer and the Controller and View exist in
the Presentation layer. All data validation is performed in the
Model/Business layer objects.

All primary validation - that which is required to verify that the data for
each column matches the specifications for that column - is carried out by a
standard routine within the framework and does not require ANY code to be
written by the developer. The standard routine uses a list of field
specifications which originate from the table's database structure. The only
validation which requires developer intervention is what I call secondary
validation, such as checking that date1 is greater than date2.

I NEVER throw exceptions for validation errors as they are NOT exceptions.
They are common occurrences, and my validation routine can produce multiple
errors whereas it could only throw a single exception.

If you are still writing code to perform primary validation on each field
then your coding style is way behind the times.

If you want the language to change to perform this validation for you I
would strongly suggest that you first learn how to write a standard
validation function in userland code - which I did over 10 years ago -
instead of trying to make the language more complicated just to cover up
your own shortcomings.

--
Tony Marston

8 years ago by Lester Caine

> If you are still writing code to perform primary validation on each
> field then your coding style is way behind the times.

At some point you need to know the rules that wrap each field you are
working with. The sort of forms I work with have individual rules for
probably 90% of them. I'd dropped browser side checks from some site
upgrades simply to speed getting the code over and the old js/flash
solutions are no longer PC, so I'm looking to a new standard to restore
what the sites did 15 years ago! Clients like it when the browser says
'not valid' rather than having to press submit and wait for the new page ...

> If you want the language to change to perform this validation for you I
> would strongly suggest that you first learn how to write a standard
> validation function in userland code - which I did over 10 years ago -
> instead of trying to make the language more complicated just to cover up
> your own shortcomings.

My 'complaint' if anything is that EVERYBODY does that now and there is
no problem with that, EXCEPT every solution has its own style of
working and coding so there is no 'standard' way of passing the key
annotation that all of these systems use. Everything is built around a
set of variables? There are so many different database abstraction
layers even built on top of PDO that people simply hack them to their
own purpose rather than more effort being put into producing a base that
we can all put effort towards improving. And where PHP does not have a
style people like they go off and produce a new language.

I'm not against all of the different styles of working, just looking to
improving the common standards of the base set of variables they are
built on top of. So I have some reason to change the remaining legacy
sites which are just working fine as long as you babysit them.

--
Lester Caine - G8HFL

8 years ago by Tony Marston

"Lester Caine" wrote in message
news:fe222876-6875-3f07-e25b-aea2cbbedfc7@lsces.co.uk...

>> If you are still writing code to perform primary validation on each
>> field then your coding style is way behind the times.
>
> At some point you need to know the rules that wrap each field you are
> working with.

My automatic validation routine has the following method signatures:
a) $fieldarray = $validationObj->validateInsert($fieldarray, $fieldspecs);
b) $fieldarray = $validationObj->validateUpdate($fieldarray, $fieldspecs);

$fieldarray is an associative array of 'name=value' pairs, such as supplied
in the $_POST array.
$fieldspecs is a multi-level array in the form 'name=specs' where 'specs' is
an array of specifications for that particular field/column. The specs
provide values for type (string, number, date, time, datetime), size, and
nullable. Numbers may also have values for minvalue, maxvalue, precision and
scale.

The operation of each of these two methods is very simple - it iterates
through the two arrays and checks that each value matches its
specifications. Any error messages are added to the $this->errors array, not
thrown as exceptions.
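
A rough sketch of a routine with that shape is shown below. It is not Radicore code: the class name Validator, the error messages and the sample specs are assumptions, and only the spec keys named above (type, size, nullable, minvalue, maxvalue) are used.

```php
<?php
// Hypothetical class shaped after the description above.
class Validator
{
    /** @var array<string, string> Error messages keyed by field name. */
    public array $errors = [];

    public function validateInsert(array $fieldarray, array $fieldspecs): array
    {
        $this->errors = [];
        foreach ($fieldspecs as $name => $specs) {
            $value = $fieldarray[$name] ?? null;
            if ($value === null || $value === '') {
                if (empty($specs['nullable'])) {
                    $this->errors[$name] = 'This field is required';
                }
                continue;
            }
            if (!is_scalar($value)) {
                $this->errors[$name] = 'A single value is expected';
            } elseif ($specs['type'] === 'number') {
                if (!is_numeric($value)) {
                    $this->errors[$name] = 'Not a number';
                } elseif (isset($specs['minvalue']) && $value < $specs['minvalue']) {
                    $this->errors[$name] = 'Below minimum value';
                } elseif (isset($specs['maxvalue']) && $value > $specs['maxvalue']) {
                    $this->errors[$name] = 'Above maximum value';
                }
            } elseif (isset($specs['size']) && strlen((string) $value) > $specs['size']) {
                $this->errors[$name] = 'Too long';
            }
        }
        // Errors are accumulated on the object; nothing is thrown.
        return $fieldarray;
    }
}

$fieldspecs = [
    'name'     => ['type' => 'string', 'size' => 40],
    'quantity' => ['type' => 'number', 'minvalue' => 1, 'maxvalue' => 99, 'nullable' => true],
];

$validator  = new Validator();
$fieldarray = $validator->validateInsert(['name' => '', 'quantity' => '250'], $fieldspecs);
print_r($validator->errors); // one entry for name, one for quantity
```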

The origins of $fieldarray should be obvious. The $fieldspecs array is
constructed from data which is originally extracted from the database
schema, imported into my Data Dictionary, then exported into a PHP script
which is then included when the table's class is instantiated.
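
The export and include step could look roughly like the sketch below, which uses var_export() to write an array as a PHP script and require to load it back. The column metadata is hard-coded here as a stand-in for a real schema import, and the file layout is an assumption, not the format the Data Dictionary produces.

```php
<?php
// Hypothetical column metadata, standing in for what a schema import might yield.
$columns = [
    ['name' => 'title', 'type' => 'string', 'size' => 80, 'nullable' => false],
    ['name' => 'price', 'type' => 'number', 'precision' => 8, 'scale' => 2, 'nullable' => true],
];

// Re-key by column name to get the 'name=specs' shape.
$fieldspecs = [];
foreach ($columns as $column) {
    $name = $column['name'];
    unset($column['name']);
    $fieldspecs[$name] = $column;
}

// Export the array as a PHP script that returns it...
$file = tempnam(sys_get_temp_dir(), 'spec');
file_put_contents($file, '<?php return ' . var_export($fieldspecs, true) . ';');

// ...and load it back, as a table class might do when it is instantiated.
$loaded = require $file;
unlink($file);

var_dump($loaded === $fieldspecs); // bool(true)
```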

This means that I never have to write code to perform such validation as my
validation routine does everything that is necessary. I can even change the
structure of a table without having to change any code - after changing the
structure I simply re-import the table into my Data Dictionary and re-export
the PHP script. I have been doing it this way for over a decade, so when I
see people still writing reams of code for such a common function I just
have to question their ability as programmers. Even worse are those who
don't have the ability to write a standard validation routine so they want
to change the language so that it performs the validation for them.

> The sort of forms I work with have individual rules for
> probably 90% of them. I'd dropped browser side checks from some site
> upgrades simply to speed getting the code over and the old js/flash
> solutions are no longer PC, so I'm looking to a new standard to restore
> what the sites did 15 years ago! Clients like it when the browser says
> 'not valid' rather than having to press submit and wait for the new page ...
>
>> If you want the language to change to perform this validation for you I
>> would strongly suggest that you first learn how to write a standard
>> validation function in userland code - which I did over 10 years ago -
>> instead of trying to make the language more complicated just to cover up
>> your own shortcomings.
>
> My 'complaint' if anything is that EVERYBODY does that now and there is
> no problem with that, EXCEPT every solution has its own style of
> working and coding

As validation code has to be written by the developer then of course there
will be differences in the code produced by different developers. That is a
fact of life. The efficacy of the solution is down to the skill of the
individual developer, and it is another fact of life that some developers
have more skills than others.

> so there is no 'standard' way of passing the key
> annotation that all of these systems use. Everything is built around a
> set of variables? There are so many different database abstraction
> layers even built on top of PDO that people simply hack them to their
> own purpose rather than more effort being put into producing a base that
> we can all put effort towards improving. And where PHP does not have a
> style people like they go off and produce a new language.

Data validation should never be performed in the data access layer. It
should be performed in the business layer so that only valid data is passed
to the data access layer.

> I'm not against all of the different styles of working, just looking to
> improving the common standards of the base set of variables they are
> built on top of. So I have some reason to change the remaining legacy
> sites which are just working fine as long as you babysit them.

Data validation has to be performed in userland code, so there will never be
a "standard" method. Some methods are better than others, and IMHO the
method which requires the least amount of effort on the part of the
developer is the best method.

--
Tony Marston

8 years ago by Lester Caine

> The origins of $fieldarray should be obvious. The $fieldspecs array is
> constructed from data which is originally extracted from the database
> schema, imported into my Data Dictionary, then exported into a PHP
> script which is then included when the table's class is instantiated.

What third party system are YOU using for that?

I've got exactly the same via ADOdb, but just about every project USING
it produces their own set of code for 'extracting' since the constraint
rules are not part of the core package. PDO has made things even worse
rather than better :(

This is simply about a standard method of adding constraint and other
data to the $fieldarray. Even Yasuo's filter extensions need a third
party tool to build HIS set of rules from everybody else’s version of
storing them ...

--
Lester Caine - G8HFL

8 years ago by Tony Marston

"Lester Caine" wrote in message
news:4b381281-204d-5e57-0bba-127ab8d2e926@lsces.co.uk...

>> The origins of $fieldarray should be obvious. The $fieldspecs array is
>> constructed from data which is originally extracted from the database
>> schema, imported into my Data Dictionary, then exported into a PHP
>> script which is then included when the table's class is instantiated.
>
> What third party system are YOU using for that?

The Data Dictionary is one that I wrote, and is available in my open source
Radicore framework.

> I've got exactly the same via ADOdb, but just about every project USING
> it produces their own set of code for 'extracting' since the constraint
> rules are not part of the core package. PDO has made things even worse
> rather than better :(

I don't use PDO as it was not created until years AFTER I had built my own
solution. Besides, PDO does not do data validation.

> This is simply about a standard method of adding constraint and other
> data to the $fieldarray. Even Yasuo's filter extensions need a third
> party tool to build HIS set of rules from everybody else’s version of
> storing them ...

I have no intention of using Yasuo's filter extension as it does not provide
anything that I need. I have already written code that does the job, and I
see no reason to change it.

--
Tony Marston

8 years ago by Lester Caine

> The Data Dictionary is one that I wrote, and is available in my open
> source Radicore framework.
>
> I don't use PDO as it was not created until years AFTER I had built my
> own solution. Besides, PDO does not do data validation.
>
> I have no intention of using Yasuo's filter extension as it does not
> provide anything that I need. I have already written code that does the
> job, and I see no reason to change it.

SNAP ...
Only problem is none of mine plays well with 'composer/PSR' style of
module management :(

--
Lester Caine - G8HFL

8 years ago by Marco Pivetta

Hey Tony,

On Sun, Aug 14, 2016 at 10:50 AM, Tony Marston TonyMarston@hotmail.com
wrote:

> "Marco Pivetta" wrote in message
> news:CADyq6sKZRBvYFtqyKYVYM4iUEx+2OuUJvHeP1jznM56k3+hznA@mail.gmail.com...
>
>> So much confusion...
>>
>> There are 3 (or more) types of validation in pretty much every web-app, so
>> let's please stop calling it simply "validation".
>>
>> 1. frontend validation (unsafe/unreliable), prevents invalid data
>>    submission, makes things friendlier for the user
>> 2. form validation, in the application layer (your mvc framework or your
>>    entry point php script): returns error messages to the client (in HTML or
>>    json or whatever you prefer). It is still not the source of truth
>> 3. domain validation, in the core of your domain logic, throws exceptions
>>    and generally stops execution whenever invalid data is provided: no nice
>>    error messages required, except for logging/debugging purposes
>
> I do not perform any javascript validation in the frontend.
>
> I do not distinguish between the application layer and the domain layer.
> My framework is based on a combination of the 3 Tier Architecture with its
> separate Presentation, Business and Data Access layers and MVC where the
> Model is the same as the Business layer and the Controller and View exist
> in the Presentation layer. All data validation is performed in the
> Model/Business layer objects.

User-side validation, together with simplistic form validation, are both
unreliable, and are just added as a layer to "make errors nice and
comprehensible to the user".

Domain-layer validation does cause exceptions to be thrown.

You are supposed to build them in this order regardless:

1. domain validation (fails hard, makes your app crash on purpose,
   non-recoverable by design)
2. frontend/form validation (fails with error messages to be returned to
   the user-agent)
3. client-side validation (just added UX, nothing else)

The fact that you don't distinguish between application and domain layer is
mostly your problem: means that you will have an incomprehensible mix of 1
and 2 at some point (from what I saw in your answer, you seem to have the
typical CRUD/anemic domain).

> All primary validation - that which is required to verify that the data
> for each column matches the specifications for that column - is carried out
> by a standard routine within the framework and does not require ANY code to
> be written by the developer. The standard routine uses a list of field
> specifications which originate from the table's database structure. The
> only validation which requires developer intervention is what I call
> secondary validation, such as checking that date1 is greater than date2.

These rules should be made explicit.
It's fine to have them inferred, having a DSL for them, a validation
framework, etc., but they need to be clear, as they are part of your API
specification.

> I NEVER throw exceptions for validation errors as they are NOT exceptions.
> They are common occurrences, and my validation routine can produce multiple
> errors whereas it could only throw a single exception.

An error or an exception are the same thing, where I come from, Error being
a sub-type of Exception.
User input is neither an Error nor an Exception, it's just a set of data
that you label as valid/invalid, plus you tell "why".
That's a function. You then define if continuation in your program's
execution requires validity, or if an invalid data handler should produce a
different response.
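
A small sketch of that separation: one validation function, with the caller deciding what invalid data means. All function names, the quantity rule and the response arrays are hypothetical.

```php
<?php
// Hypothetical validator returning [isValid, reasons].
function validate_quantity($value): array
{
    $ok = filter_var($value, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]) !== false;
    return [$ok, $ok ? [] : ['quantity must be a positive integer']];
}

// Form layer: invalid data produces a different response, not a failure.
function handle_form(array $post): array
{
    [$ok, $reasons] = validate_quantity($post['quantity'] ?? '');
    return $ok ? ['status' => 200] : ['status' => 422, 'errors' => $reasons];
}

// Domain layer: continuing requires validity, so invalid data stops execution.
function place_order($quantity): int
{
    [$ok, $reasons] = validate_quantity($quantity);
    if (!$ok) {
        throw new DomainException(implode('; ', $reasons));
    }
    return (int) $quantity;
}

print_r(handle_form(['quantity' => '0'])); // status 422 with one reason
echo place_order('3'), PHP_EOL;            // 3
```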

> If you are still writing code to perform primary validation on each field
> then your coding style is way behind the times.
>
> If you want the language to change to perform this validation for you I
> would strongly suggest that you first learn how to write a standard
> validation function in userland code - which I did over 10 years ago -
> instead of trying to make the language more complicated just to cover up
> your own shortcomings.

There are dozens and dozens of userland stable/maintained/secure libraries
that handle these sort of problems:

And a load more (just peek into https://packagist.org/search/?q=validation).
The most relevant ones are usually maintained by knowledgeable people with
an acute sense for software security.
Dragging more of these features into the language would just complicate
things for php-src (security-wise), and reduce the bugfix/patching speed,
since sysadmins don't upgrade PHP versions that often (sigh).

If the entire suggestion is about what Lester said (standardizing currently
existing approaches), then I suggest that you bring this up with the
authors/maintainers of those libraries first, asking for what they actually
need, or what their users' needs are.

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/

8 years ago by Tony Marston

"Marco Pivetta" wrote in message
news:CADyq6s+LjfL5g2mLNz=XSk7BWT=P3vFTGsiCGJGCK-tz0ce5KA@mail.gmail.com...

> Hey Tony,
>
> On Sun, Aug 14, 2016 at 10:50 AM, Tony Marston TonyMarston@hotmail.com
> wrote:
>
>> "Marco Pivetta" wrote in message
>> news:CADyq6sKZRBvYFtqyKYVYM4iUEx+2OuUJvHeP1jznM56k3+hznA@mail.gmail.com...
>>
>>> So much confusion...
>>>
>>> There are 3 (or more) types of validation in pretty much every web-app,
>>> so let's please stop calling it simply "validation".
>>>
>>> 1. frontend validation (unsafe/unreliable), prevents invalid data
>>>    submission, makes things friendlier for the user
>>> 2. form validation, in the application layer (your mvc framework or your
>>>    entry point php script): returns error messages to the client (in HTML
>>>    or json or whatever you prefer). It is still not the source of truth
>>> 3. domain validation, in the core of your domain logic, throws exceptions
>>>    and generally stops execution whenever invalid data is provided: no nice
>>>    error messages required, except for logging/debugging purposes
>>
>> I do not perform any javascript validation in the frontend.
>>
>> I do not distinguish between the application layer and the domain layer.
>> My framework is based on a combination of the 3 Tier Architecture with its
>> separate Presentation, Business and Data Access layers and MVC where the
>> Model is the same as the Business layer and the Controller and View exist
>> in the Presentation layer. All data validation is performed in the
>> Model/Business layer objects.
>
> User-side validation, together with simplistic form validation, are both
> unreliable, and are just added as a layer to "make errors nice and
> comprehensible to the user".
>
> Domain-layer validation does cause exceptions to be thrown.

I disagree. Common data validation errors are NOT exceptions. Exceptions
should only be thrown in EXCEPTIONAL circumstances and normally indicate a
problem in the code that needs fixing.

> You are supposed to build them in this order regardless:
>
> 1. domain validation (fails hard, makes your app crash on purpose,
>    non-recoverable by design)
> 2. frontend/form validation (fails with error messages to be returned to
>    the user-agent)
> 3. client-side validation (just added UX, nothing else)

Again I disagree. Data validation is performed in only ONE place, and that
is the business layer (the Model in MVC).

> The fact that you don't distinguish between application and domain layer is
> mostly your problem:

It is not a problem at all. My code works perfectly well, and as it does not
cause any problems it does not need fixing.

> means that you will have an incomprehensible mix of 1 and 2 at some point

It is not incomprehensible at all. Data validation is performed by a single
routine which is automatically called by the framework code and does not
require any intervention by the developer.

> (from what I saw in your answer, you seem to have the
> typical CRUD/anemic domain).

Wrong on both counts. While I do have transaction patterns which can perform
the basic Create/Read/Update/Delete operations on any of my 300 database
tables that accounts for only 4 patterns while my library actually contains
over 40 patterns. That immediately says that my framework is able to provide
much, much more than the 4 basic CRUD operations.

Some people look at my code and say that my classes are anemic while others
say the opposite and accuse me of creating a "GOD" class that tries to do
everything. Those people are all wrong. I have a large abstract table class
which contains all the standard functions that can be performed on any
database table, while each concrete table class contains only that code
which is specific to a particular database table. All the standard code is
inherited from the abstract class.

All primary validation - that which is required to verify that the data
for each column matches the specifications for that column - is carried
out by a standard routine within the framework and does not require ANY
code to be written by the developer. The standard routine uses a list of
field specifications which originate from the table's database structure.
The only validation which requires developer intervention is what I call
secondary validation, such as checking that date1 is greater than date2.
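
The split between primary and secondary validation described above can be sketched in userland PHP roughly as follows. This is an illustration only, not the framework's actual code: the function names validate_fields() and is_valid_date(), the layout of the $specs array and the error messages are all assumptions made for the example. The point it shows is that the routine collects every failure into an array instead of stopping at the first one.

```php
<?php
/**
 * Primary validation: check each field against its specification and
 * collect every failure rather than stopping at the first.
 * The spec layout (type/required/size) is hypothetical.
 *
 * @return array error messages keyed by field name (empty when all pass)
 */
function validate_fields(array $data, array $specs): array
{
    $errors = [];
    foreach ($specs as $field => $spec) {
        $value = $data[$field] ?? null;
        if ($value === null || $value === '') {
            if (!empty($spec['required'])) {
                $errors[$field] = 'A value is required';
            }
            continue;
        }
        if ($spec['type'] === 'integer'
            && filter_var($value, FILTER_VALIDATE_INT) === false) {
            $errors[$field] = 'Must be a whole number';
        } elseif ($spec['type'] === 'date' && !is_valid_date((string) $value)) {
            $errors[$field] = 'Must be a date in YYYY-MM-DD format';
        } elseif ($spec['type'] === 'string' && isset($spec['size'])
            && strlen((string) $value) > $spec['size']) {
            $errors[$field] = "Must be at most {$spec['size']} characters";
        }
    }
    return $errors;
}

/** True only for a real calendar date written exactly as YYYY-MM-DD. */
function is_valid_date(string $value): bool
{
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $value);
    return $date !== false && $date->format('Y-m-d') === $value;
}

// Hypothetical specifications, standing in for ones derived from a schema.
$specs = [
    'quantity'   => ['type' => 'integer', 'required' => true],
    'start_date' => ['type' => 'date', 'required' => true],
    'end_date'   => ['type' => 'date', 'required' => true],
    'comment'    => ['type' => 'string', 'size' => 10],
];
$input = [
    'quantity'   => 'twelve',
    'start_date' => '2016-08-12',
    'end_date'   => '2016-08-01',
    'comment'    => 'This comment is too long',
];

$errors = validate_fields($input, $specs);

// Secondary validation: a cross-field rule written by hand, applied only
// when both dates passed their primary checks.
if (!isset($errors['start_date']) && !isset($errors['end_date'])
    && strcmp($input['end_date'], $input['start_date']) < 0) {
    $errors['end_date'] = 'End date must not be earlier than start date';
}

foreach ($errors as $field => $message) {
    echo "$field: $message", PHP_EOL;
}
```

With the sample input above, three separate problems are reported in one pass, and no exception is involved; the caller decides what to do with the list.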

These rules should be made explicit.
It's fine to have them inferred, having a DSL for them, a validation
framework, etc., but they need to be clear, as they are part of your API
specification.

Validation rules are NOT part of any API specification. An API contains
nothing more than a method name and its arguments. Data validation rules can
be extracted from the table's structure in the database schema.

I NEVER throw exceptions for validation errors as they are NOT exceptions.

They are common occurrences, and my validation routine can produce multiple
errors whereas it could only throw a single exception.

An error or an exception are the same thing, where I come from, Error being
a sub-type of Exception.

I disagree. You may have been taught that, but it just points to bad
teaching. I was writing database applications with data validation rules for
25 years before I switched to OOP with PHP, and none of those previous
languages had exceptions. When I first heard about exceptions (which did not
exist in PHP4) I read that they were only to be used in exceptional
circumstances, and there was no indication that data validation errors fell
into that category. I have NEVER used exceptions for simple data validation
errors, and I see no reason to change the habits of a lifetime.

User input is neither an Error nor an Exception, it's just a set of data
that you label as valid/invalid, plus you tell "why".
That's a function. You then define if continuation in your program's
execution requires validity, or if an invalid data handler should produce a
different response.

If you are still writing code to perform primary validation on each field
then your coding style is way behind the times.

If you want the language to change to perform this validation for you I
would strongly suggest that you first learn how to write a standard
validation function in userland code - which I did over 10 years ago -
instead of trying to make the language more complicated just to cover up
your own shortcomings.

There are dozens and dozens of userland stable/maintained/secure libraries
that handle this sort of problem:

https://zendframework.github.io/zend-validator/

http://symfony.com/components/Validator

https://github.com/Respect/Validation

I don't use any of those as they did not exist when I developed my own
framework with its own method of performing data validation.

And a load more (just peek into
https://packagist.org/search/?q=validation).
The most relevant ones are usually maintained by knowledgeable people with
an acute sense for software security.
Dragging more of these features into the language would just complicate
things for php-src (security-wise), and reduce the bugfix/patching speed,
since sysadmins don't upgrade PHP versions that often (sigh).

If the entire suggestion is about what Lester said (standardizing currently
existing approaches), then I suggest that you bring this up with the
authors/maintainers of those libraries first, asking for what they actually
need, or what their users' needs are.

I don't care about other people's validation libraries as they did not exist
when I started using PHP in 2002. I built my own framework with its own way
of performing data validation, and this has worked flawlessly for over
years. While I have added to the code over the years to provide enhancements
and the ability to deal with new circumstances, I have never seen the need
to replace it with somebody else's inferior offering.

--
Tony Marston

8 years ago by Yasuo Ohgaki

Hi Lester,

People keep complaining that I do not contribute any proposals to
improve PHP, which to some extent is correct. Except the one thing that I
keep trying to get a handle on is tidying validating of the basic
variables that are the heart of PHP.

validate_var_array() is a case in point, since ALL it should do is
handle an array of simple variables for which we can define REAL
validation rules rather than just a very restricted 'type' rule.
Massaging the way the content of a variable is presented is another part
of the basic functions of handling a variable, and simply providing an
escape option which can be set as part of the variable rules set
eliminates the need for 'New operator (short tag) for context-dependent
escaping' and similar tangential matters. If we have a set of rules
wrapping a variable then everything else just follows on, and the SQL
domain model allows a group of variables to take an identical set of rules.

These are the sorts of things any decent user world library can and does
provide, but if the clock was rolled back prior to all the trouble
created by 'strict typing' and we started again with a more well defined
simple variable I'm sure that much of the conflict could have been
resolved by allowing full validation checks to control an error or
exception depending on the 'style' of PHP a programmer prefers.

If a function is going to return a variable and that variable has under
the hood a set of validation rules, then one can return an error if the
result is faulty. Or even allow a NULL return if a valid answer is not
available ... if that is the style of programming one prefers.
Exceptions handle unmanaged errors, while proper program flow handles
managed ones!

Wrap these intelligent variables inside a class and one can create more
powerful objects but ones which still use all the basic functionality.
Similarly an array of them can be asked to provide a simple 'yes/no' if
all of the variables pass their validation check, or an array of
elements which need processing.

It sounds like you are looking for autoboxing (or at least something similar)

https://wiki.php.net/rfc/autoboxing

I like this proposal, BTW. I'm not sure about the performance impact, though.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Lester Caine

It sounds like you are looking for autoboxing (or at least something similar)

https://wiki.php.net/rfc/autoboxing

That is interesting, and is probably something I would expect to come
out in the wash with making a more intelligent variable. Except with
PHP's loose casting style I would expect 'array_sum' to simply take a
loose cast numeric version of every element. The tidy I think I am
looking for is that 'is_num' rules on each variable would control the
result. If any is 'null' the result is 'null' in normal SQL practice, or
switch strict mode on and the first 'is_num' that fails throws an exception.
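
One way to read the behaviour described above is sketched below in userland PHP. The function name checked_sum() and its $strict flag are hypothetical, and is_numeric() stands in for the 'is_num' rule mentioned in the text; returning null for any failing element in non-strict mode is an interpretation of the "normal SQL practice" remark, not an existing PHP feature.

```php
<?php
/**
 * Sum the elements, letting a per-element numeric check control the result.
 *
 * @param array $values elements to add up
 * @param bool  $strict when true, the first failing element throws
 * @return int|float|null null when an element fails the check in loose mode
 */
function checked_sum(array $values, bool $strict = false)
{
    $total = 0;
    foreach ($values as $key => $value) {
        if (!is_numeric($value)) {
            if ($strict) {
                throw new UnexpectedValueException("Element '$key' is not numeric");
            }
            // SQL-style: one unknown value makes the whole result unknown.
            return null;
        }
        $total += $value;
    }
    return $total;
}

var_dump(checked_sum([1, '2', 3.5]));   // float(6.5)
var_dump(checked_sum([1, null, 3]));    // NULL

try {
    checked_sum([1, 'abc', 3], true);
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), PHP_EOL;     // Element '1' is not numeric
}
```

Whether a failure becomes a null or an exception is chosen by the caller, which is the "style of PHP a programmer prefers" point made earlier in the thread.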

I like this proposal, BTW. I'm not sure about the performance impact, though.

What I am still missing is an understanding of just how the global
library of functions which act on a variable works internally with the
'list' of declared variables. People keep saying 'you just create a new
object' but in my book that object is still a fixed set of code - the
code library - and a variable set of data - the variable. Yes if the
variable now has a flag which says 'constrained' then there will be an
additional set of data with the constraints and as Rowan says, one has
to decide where that is processed and what you do with the result, but
the global code will check the 'constraint' element and see 'null' if it
has not been processed, valid, or some failure message such as 'over
limit'.

CURRENTLY the constraint element is handled in user code working with a
data set provided by docblock or other external storage means, SQL
schema for example. From a performance point of view I still prefer that
a lot of this is done in the IDE and that IS managing a lot of what we
are talking about and has been since the 2004 date of that rfc. But
almost every form I code on every website has a set of rules to
constrain each input and that data needs to be used in the code to
validate the variables being created, so isn't now the time to simply
add global functions that provide a single built in standard for
handling this problem?

From a practical point of view, of course, the validation of inputs may
well be done in the browser so that the constraints get passed TO some
html5 check, or javascript function. So having uploaded the form one
COULD simply tag a variable as valid? Or run the PHP validation as a
safety check. All of this is workflow and that workflow could include a
simple array function on the input array, but that still requires that
there is a set of constraint rules for each element of the array ...
applied to each variable ... so why can't we simply improve the variable?

--
Lester Caine - G8HFL

8 years ago by michal@brzuchalski.com

2016-08-12 9:51 GMT+02:00 Lester Caine lester@lsces.co.uk:

It sounds like you are looking for autoboxing (or at least something similar)

https://wiki.php.net/rfc/autoboxing

That is interesting, and is probably something I would expect to come
out in the wash with making a more intelligent variable. Except with
PHP's loose casting style I would expect 'array_sum' to simply take a
loose cast numeric version of every element. The tidy I think I am
looking for is that 'is_num' rules on each variable would control the
result. If any is 'null' the result is 'null' in normal SQL practice, or
switch strict mode on and the first 'is_num' that fails throws an
exception.

I like this proposal, BTW. I'm not sure about the performance impact, though.

What I am still missing is an understanding of just how the global
library of functions which act on a variable works internally with the
'list' of declared variables. People keep saying 'you just create a new
object' but in my book that object is still a fixed set of code - the
code library - and a variable set of data - the variable. Yes if the
variable now has a flag which says 'constrained' then there will be an
additional set of data with the constraints and as Rowan says, one has
to decide where that is processed and what you do with the result, but
the global code will check the 'constraint' element and see 'null' if it
has not been processed, valid, or some failure message such as 'over
limit'.

CURRENTLY the constraint element is handled in user code working with a
data set provided by docblock or other external storage means, SQL
schema for example. From a performance point of view I still prefer that
a lot of this is done in the IDE and that IS managing a lot of what we
are talking about and has been since the 2004 date of that rfc. But
almost every form I code on every website has a set of rules to
constrain each input and that data needs to be used in the code to
validate the variables being created, so isn't now the time to simply
add global functions that provide a single built in standard for
handling this problem?

From a practical point of view, of course, the validation of inputs may
well be done in the browser so that the constraints get passed TO some
html5 check, or javascript function. So having uploaded the form one
COULD simply tag a variable as valid? Or run the PHP validation as a
safety check. All of this is workflow and that workflow could include a
simple array function on the input array, but that still requires that
there is a set of constraint rules for each element of the array ...
applied to each variable ... so why can't we simply improve the variable?

How does it differ from userland components/libraries like:

Symfony Form https://github.com/symfony/form
ZendFramework Form https://github.com/zendframework/zend-form
Phalcon Form https://docs.phalconphp.com/en/latest/reference/forms.html

and so on? They have additional functionality for building forms, but the
validation looks the same as what you want to achieve.
If the most important thing you want to achieve is easy-to-use user
input validation, there are plenty of libraries which do that.
There you can use built-in constraints and your own, validate an array and
retrieve the valid data.
What is the reason such a feature should be built into the PHP language?

--
Regards,

Michał Brzuchalski

8 years ago by Rowan Collins

From a practical point of view, of course, the validation of inputs may
well be done in the browser so that the constraints get passed TO some
html5 check, or javascript function. So having uploaded the form one
COULD simply tag a variable as valid?

Just a reminder to you and anyone else reading: NEVER TRUST USER INPUT.
You can add all the JS in the world to your forms, but a user can always
ignore that and craft their own input with whatever data they like in it.

8 years ago by Lester Caine

From a practical point of view, of course, the validation of inputs may
well be done in the browser so that the constraints get passed TO some
html5 check, or javascript function. So having uploaded the form one
COULD simply tag a variable as valid?

Just a reminder to you and anyone else reading: NEVER TRUST USER INPUT.
You can add all the JS in the world to your forms, but a user can always
ignore that and craft their own input with whatever data they like in it.

Many of my systems run on secure intra-nets and much of the 'safety
concerns' that have been brought up recently as 'essential' simply don't
apply. YES, for web services that anybody has access to, 'NEVER TRUST
USER INPUT' is the rule, but for a simple local-network-only system
one can trust that the browser is doing the right thing. It's one of the
reasons I've not been able to convert a number of sites since they don't
have a problem :(

--
Lester Caine - G8HFL

8 years ago by Rowan Collins

Many of my systems run on secure intra-nets and much of the 'safety
concerns' that have been brought up recently as 'essential' simply don't
apply.

There's always rogue employees / students / visitors with temporary
access... But yes, IF you trust your users 100% to be non-malicious,
non-curious, and uninfected, THEN you can trust your user input. :)

8 years ago by Peter Lind

Many of my systems run on secure intra-nets and much of the 'safety
concerns' that have been brought up recently as 'essential' simply don't
apply.

There's always rogue employees / students / visitors with temporary
access... But yes, IF you trust your users 100% to be non-malicious,
non-curious, and uninfected, THEN you can trust your user input. :)

You forgot non-clumsy. Typos also happen and can have problematic results.

You cannot trust user input. End of discussion.

--
CV: careers.stackoverflow.com/peterlind
LinkedIn: plind
Twitter: kafe15

8 years ago by Lester Caine

Many of my systems run on secure intra-nets and much of the 'safety
concerns' that have been brought up recently as 'essential' simply don't
apply.

There's always rogue employees / students / visitors with temporary
access... But yes, IF you trust your users 100% to be non-malicious,
non-curious, and uninfected, THEN you can trust your user input. :)

You forgot non-clumsy. Typos also happen and can have problematic results.

You cannot trust user input. End of discussion.

That someone puts in Joens rather than Jones is a fact of life, and will
result in records that can't be matched. But a UK formatted date
validated in the browser makes checking it's in a valid range easier in
the PHP end. It's simply a matter of just what you can test and where,
and if needs be the system keeps track of who is making mistakes in data
entry and their supervisor deals with them. THAT is a report my CMS
systems have had from day one :) But if they have stolen someone else’s
access card then all bets are off. But there is no 'delete' function on
the data so all changes are recorded.
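
The range check "in the PHP end" mentioned above might look something like the sketch below. It assumes the browser submits the date as DD/MM/YYYY; the function name parse_uk_date() and the example range are hypothetical. Note that the sketch re-parses the string on the server instead of taking the browser's word that the format is right.

```php
<?php
/**
 * Parse a UK-formatted date (DD/MM/YYYY) and check it lies within a range.
 * Returns the parsed date, or null when the input is malformed or out of range.
 */
function parse_uk_date(
    string $input,
    DateTimeImmutable $min,
    DateTimeImmutable $max
): ?DateTimeImmutable {
    // The leading "!" resets unparsed fields, so the time is 00:00:00.
    $date = DateTimeImmutable::createFromFormat('!d/m/Y', $input);

    // Reject unparseable input and silent roll-overs such as 31/02/2016.
    if ($date === false || $date->format('d/m/Y') !== $input) {
        return null;
    }
    if ($date < $min || $date > $max) {
        return null;
    }
    return $date;
}

$min = new DateTimeImmutable('2016-01-01');
$max = new DateTimeImmutable('2016-12-31');

var_dump(parse_uk_date('12/08/2016', $min, $max) !== null); // bool(true)
var_dump(parse_uk_date('31/02/2016', $min, $max) !== null); // bool(false)
var_dump(parse_uk_date('12/08/2017', $min, $max) !== null); // bool(false)
```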

--
Lester Caine - G8HFL

8 years ago by Peter Lind

On 12 August 2016 at 11:54, Rowan Collins rowan.collins@gmail.com wrote:

Many of my systems run on secure intra-nets and much of the 'safety
concerns' that have been brought up recently as 'essential' simply don't
apply.

There's always rogue employees / students / visitors with temporary
access... But yes, IF you trust your users 100% to be non-malicious,
non-curious, and uninfected, THEN you can trust your user input. :)

You forgot non-clumsy. Typos also happen and can have problematic results.

You cannot trust user input. End of discussion.

That someone puts in Joens rather than Jones is a fact of life, and will
result in records that can't be matched. But a UK formatted date
validated in the browser makes checking it's in a valid range easier in
the PHP end. It's simply a matter of just what you can test and where,
and if needs be the system keeps track of who is making mistakes in data
entry and their supervisor deals with them. THAT is a report my CMS
systems have had from day one :)

And if all typos were switching 'e' and 'n', what a wonderful world it
would be. That is not the case though - it's possible to accidentally enter
" and > too.

But if they have stolen someone else’s
access card then all bets are off. But there is no 'delete' function on
the data so all changes are recorded.

No, all bets are not off. That's the whole point of defense in depth.

--
CV: careers.stackoverflow.com/peterlind
LinkedIn: plind
Twitter: kafe15

8 years ago by Lester Caine

And if all typos were switching 'e' and 'n', what a wonderful world it
would be. That is not the case though - it's possible to accidentally enter
" and > too.

And the browser validation strips them and handles the ' when used in
text fields.

--
Lester Caine - G8HFL

8 years ago by Peter Lind

And if all typos were switching 'e' and 'n', what a wonderful world it
would be. That is not the case though - it's possible to accidentally enter
" and > too.

And the browser validation strips them and handles the ' when used in
text fields.

No, it doesn't. It likely handles it on most browsers, but that still
doesn't mean you're safe - you don't know when someone suddenly decides to
try a new browser that doesn't behave the way you think it will.
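
As an illustration of handling those characters on the server regardless of what the browser did, here is a minimal sketch using PHP's htmlspecialchars(). The helper name e() and the sample surname are made up for the example, and it covers only HTML output, not other contexts such as SQL.

```php
<?php
/**
 * Escape a value for use in HTML text or a quoted attribute.
 * e() is a hypothetical helper name; htmlspecialchars() does the work.
 */
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// A surname field with stray characters typed by accident.
$surname = 'O\'Brien "> Ltd';

echo '<input name="surname" value="' . e($surname) . '">', PHP_EOL;
// <input name="surname" value="O&#039;Brien &quot;&gt; Ltd">
```

Because the escaping happens when the value is written out, it does not matter which browser submitted the data or whether its client-side checks ran at all.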

--
CV: careers.stackoverflow.com/peterlind
LinkedIn: plind
Twitter: kafe15

8 years ago by Lester Caine

And if all typos were switching 'e' and 'n', what a wonderful world it
would be. That is not the case though - it's possible to accidentally enter
" and > too.

And the browser validation strips them and handles the ' when used in
text fields.

No, it doesn't. It likely handles it on most browsers, but that still
doesn't mean you're safe - you don't know when someone suddenly decides to
try a new browser that doesn't behave the way you think it will.

Don't get me started on 'browsers' ... that is yet another hot potato
when it comes to public websites, and keeping sites working when you have
no idea if the client has a suitable browser is another problem.
Fortunately my council sites keep tight control on the browser side so
that NONE of their web based systems break. Heck some are STILL on IE6!
That is why the likes of the Metropolitan Police are still using XP.

So far when a browser update trashes the javascript the pages don't work
at all so nothing can be submitted, but it is yet another mess that one
has to cope with :(

--
Lester Caine - G8HFL