Newsgroups: php.internals
List-Id: internals.lists.php.net
From: the.liquid.metal@gmail.com (Hendra Gunawan)
Date: Mon, 24 Jun 2024 11:19:00 +0700
Subject: Re: [PHP-DEV] [Early Feedback] Pattern matching
To: php internals
In-Reply-To: <2a6b92eb-d5e9-4a1a-9548-a068ac42ebd2@app.fastmail.com>

> Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.

[THIS REPLY IS FOR THE CONVENIENCE OF https://externals.io READERS ONLY. SORRY FOR THE UNPLEASANT DISPLAY]

Hello Larry.

Thanks for this juicy proposal. I agree that it should be put forward as a whole. If we break it down into smaller parts, there is a chance some features will not pass. I will show you that the range pattern and the regex pattern have greater value than you think, and that they belong alongside the literal pattern (in the Core patterns section, not in the Possible patterns section). I don't want these patterns to fail in the poll.

I don't know exactly where "high level" ends, so I hope that everything I discuss here still counts as "high level".

1. Is there any chance of ```is not``` or ```isnot```? I am tired of looking at this ugly code: ```!($foo instanceof Bar)```. If so, it should accept only a restricted form of pattern: no variable binding. It would not make sense for a binding to happen when the pattern fails.

2. ```..=``` and ```..<``` force people to remember what the left side means: is it greater-than or greater-than-or-equal? What if those operators were replaced with ```=..=``` and ```<..=```? We don't have to align with other languages if they have weaknesses.

3. The "between" and ```>``` operators are shown explicitly in the proposal (in the Range pattern section). I assume that ```>=```, ```<```, and ```<=``` are implicitly included. What about the ```%``` operator, is it also included? ```%``` does not directly produce a boolean, but it can simplify a pattern.
Note that we must add a rule: if the resulting number is ```0``` (zero), it is converted to ```true```, otherwise to ```false```.
```
$foo = 2024; // leap year
$foo is %4; // true

$bar = 2025; // not a leap year
$bar is %4; // false

// furthermore:
$baz is array<%4>; // true if every member is divisible by 4
```

4. Numeric range patterns are shown explicitly in the proposal. Hopefully there are also string range patterns.
```
$birthdate is "2000-01-01 00:00:00" <..< "2019-12-31 23:59:59";
$dateOfDeath is <"1980-01-01";
$name is "larry " =..< "larry!"; // true if first name is "larry"
```
I have no idea how string range patterns would work for arrays, since that causes an error. If there is currently no solution, hopefully we won't abandon string range patterns entirely. The same issue also arises with numeric range patterns for arrays.

5. It is great that native regular expression syntax is now a first-class citizen in PHP (at least for pattern matching). I see two drawbacks here: loss of flexibility and repeated identical patterns. Neither occurs with string-based regular expressions.
```
// native syntax
$foo is /^https:\/\/(?[^\/]*)/;

// string based: we can use any valid char as delimiter,
// as long as it is not used in the pattern.
$pattern = "/^https:\/\/(?[^\/]*)/";
$pattern = "|^https://(?[^/]*)/|"; // valid!
$pattern = "@^https://(?[^/]*)/@"; // valid!

// ----------------------

// native syntax
$foo is /^https:\/\/(?[^\/]*)/ | StringBox{value: /^https:\/\/(?[^\/]*)/};

// string based: an RE stored in a variable/constant can be reused as often as we need
$pattern = "|^https://(?[^/]*)/|";
$foo is @RE($pattern) | StringBox{value: @RE($pattern)};
// 1st pattern is a scalar string, 2nd is a string encapsulated in a class

// furthermore
use GenericPattern as GP;

class Person {
    public string $firstName is @RE(GP::NAME_PTRN);
    public string $lastName is @RE(GP::NAME_PTRN);
    // ...
}
```
Hopefully, string-based regular expressions are also supported. Honestly, if I had to choose between supporting native and string-based syntax, I would choose string-based, as long as native syntax is not yet fully supported in general. ```@RE()``` is just an illustration of how to use a variable as a regular expression in a pattern. The proposal states that ```@()``` will be used for arbitrary expressions. Correct me if I'm wrong, but a regexp is one of those things in programming that cannot be manipulated or combined with other values, not even with other regexps. A regexp must be used on its own. That is why we need dedicated syntax for regexps.

6. The proposal shows that the type pattern, literal pattern, class constant pattern, and expression pattern can be used to form a compound pattern. Hopefully, the range pattern and the regex pattern enjoy the same luxury. Furthermore, hopefully they can be mixed.
```
$foo is 2000 =..= 2100 & %4; // multiple of 4 between 2000 and 2100 (a leap year, except 2100)

class Person {
    public string $firstName is /LENGTH_PTRN/ & /FORBIDDEN_CHARS_PTRN/ & /FORBIDDEN_WORDS_PTRN/;
    public string $lastName is @RE(GP::LENGTH_PTRN) & @RE(GP::FORBIDDEN_CHARS_PTRN) & @RE(GP::FORBIDDEN_WORDS_PTRN);

    // these are hard to maintain
    public string $firstName is /COMPLEX_NAME_PTRN/;
    public string $lastName is @RE(GP::COMPLEX_NAME_PTRN);
}
```

7. I noticed that ```as``` is tightly coupled with an exception. Can we suppress this exception with ```??```?
```
// it is weird to see this statement (as shown in the proposal)
$value = $foo as Foo {$username, $password};

// these statements make more sense
$foo as Foo {$username, $password};
$foo is Foo {$username, $password} ?: throw new Exception();

// if it can be suppressed, "as" is more valuable than people think.
$newRect = $rect as Rectangle{width: <=10, height: <=5} ?? new Rectangle(width: 10, height: 5);
```

8. Is there any type checking for object property patterns?
```
class Circle {
    public int $radius;
}

// this statement will always fail unless there is type checking
$circle is Circle{radius: "10"};
```

~~~~~

Four of the eight points I discussed above relate to the range pattern and the regex pattern. Both are used daily. From my point of view, the literal pattern is no more special than the range pattern and the regex pattern. IMO, both should be placed in the Core patterns section, not in the Possible patterns section.

Regards,
Hendra Gunawan.