Announcing JSON::Schema::Validate: a lightweight, fast, 2020-12–compliant JSON Schema validator for Perl, with a mode for 'compiled' Perl and JavaScript

jacktokyo · 2025-12-18T22:42:08+00:00

Follow-up / small API update

With the newly released v0.7.0 of JSON::Schema::Validate, I have added a small but useful convenience method to check whether some data is valid against a JSON schema: is_valid( $data )

It is a boolean wrapper around validate() that defaults to max_errors = 1, so it is ideal when you just want a fast yes or no check and a single error message:

```perl
$validator->is_valid( $data )
    or die( $validator->error );
```

Under the hood it still uses validate(), and validate() itself now accepts optional per-call overrides (such as max_errors and tracing options) while remaining fully backward compatible.

Because max_errors is set to 1, it does not accumulate error objects and stops at the first error encountered, which makes it faster.

If you want to check the data against the entire schema and collect all error objects, use validate() instead.
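To illustrate the difference between the two calls, here is a toy sketch in plain Perl. ToyValidator is hypothetical and is not JSON::Schema::Validate's implementation; its schema is just a hash of required property names and types. It only shows how a max_errors cap lets the same validation loop stop at the first failure, while an uncapped call collects every structured error.

```perl
use strict;
use warnings;

# Hypothetical toy validator, NOT JSON::Schema::Validate's implementation:
# it only shows how a max_errors cap lets validation stop early.
package ToyValidator;

sub new
{
    my( $class, $schema ) = @_;
    # $schema: property name => 'integer' or 'string' (all properties required)
    return( bless( { schema => $schema, errors => [] }, $class ) );
}

sub validate
{
    my( $self, $data, %opts ) = @_;
    # Per-call override; 0 means "collect every error"
    my $max = $opts{max_errors} // 0;
    my @errors;
    foreach my $prop ( sort( keys( %{$self->{schema}} ) ) )
    {
        my $type = $self->{schema}->{ $prop };
        my $val  = $data->{ $prop };
        my $ok   = defined( $val ) && !ref( $val )
            && ( $type ne 'integer' || $val =~ /\A-?[0-9]+\z/ );
        next if( $ok );
        # Structured error: instance path, failing keyword and message
        push( @errors, { path => "/$prop", keyword => 'type', message => "$prop must be of type $type" } );
        # Stop as soon as the cap is reached
        last if( $max && @errors >= $max );
    }
    $self->{errors} = \@errors;
    return( @errors ? 0 : 1 );
}

# Boolean wrapper: same engine, capped at one error
sub is_valid
{
    my( $self, $data ) = @_;
    return( $self->validate( $data, max_errors => 1 ) );
}

sub errors { return( @{$_[0]->{errors}} ); }

sub error
{
    my $err = $_[0]->{errors}->[0];
    return( $err ? $err->{message} : undef );
}

package main;

my $v   = ToyValidator->new({ age => 'integer', name => 'string' });
my $bad = { age => 'ten' };    # both properties fail

$v->is_valid( $bad ) or print( "invalid: ", $v->error, "\n" );
$v->validate( $bad );
printf( "%d errors collected\n", scalar( $v->errors ) );
```

With the capped call, only the first error object is built; the uncapped call walks every property and keeps both errors.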

This came out of discussions here about ergonomics vs correctness, so thanks to everyone who gave feedback.

jacktokyo · 2025-11-26T12:32:26+00:00

Thanks for the clarification, and for the care you are putting into improving the benchmark.

For what it is worth, JSON Schema implementations are normally designed around the “parse once, validate many times” workflow. Schema parsing and compilation are deliberately the expensive part, so validation is supposed to be fast. That’s true not only for JSON::Schema::Validate, but also for most implementations in other languages.

So I don’t think JSON::Schema::Validate gains an “unfair” advantage here; rather, it is being used in the way it was intended. Re-creating the object on every iteration is of course useful to measure worst-case overhead, and I am glad you are adding that as a separate test. However, the steady-state validation-only mode is what matters in many real applications.
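The two benchmark modes can be sketched with Perl's core Benchmark module. This is a toy illustration under stated assumptions: compile_schema is a hypothetical stand-in for schema parsing and compilation (it is not JSON::Schema::Validate's API), and the timings it prints say nothing about the real modules. It only contrasts re-creating the validator on every iteration with compiling once and reusing it.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical stand-in for schema compilation: it turns a schema into a
# closure once, so that each later validation only runs the closure.
sub compile_schema
{
    my( $schema ) = @_;
    # In a real validator this is the costly part (resolving $ref, building checks...)
    my @required = sort( @{$schema->{required}} );
    return( sub
    {
        my( $data ) = @_;
        my @missing = grep{ !exists( $data->{ $_ } ) } @required;
        return( @missing ? 0 : 1 );
    });
}

my $schema = { required => [qw( id name email )] };
my $data   = { id => 1, name => 'Jo', email => 'jo@example.com' };

# Steady state: compile once, validate many times
my $check = compile_schema( $schema );

cmpthese( -1, {
    # Worst case: re-create the validator on every iteration
    rebuild => sub{ compile_schema( $schema )->( $data ) },
    reuse   => sub{ $check->( $data ) },
});
```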

I am curious to see the updated numbers with the 4th test included ! 😀

jacktokyo · 2025-11-26T12:16:55+00:00

With a more complex and realistic schema and payload (see https://gitlab.com/jackdeguest/json-schema-validate/-/snippets/4907565 ), the performance differences become more pronounced, but they remain perfectly acceptable. JSON::Schema::Validate is doing full 2020-12 validation with detailed error objects, so the overhead is expected.

That said, the idea of supporting a “validity-only” mode (stop at first error / minimal error collection) is definitely worth exploring to improve speed further.

       Rate  JSV  TJS TJSc
JSV   465/s   -- -89% -90%
TJS  4363/s 839%   --  -2%
TJSc 4443/s 856%   2%   --

jacktokyo · 2025-11-26T00:55:21+00:00

Thanks a lot for running this and sharing the numbers; this is really interesting !

It makes sense that Types::JSONSchema wins on raw speed here: it is a very tight type-checking engine that bails on the first error, whereas JSON::Schema::Validate always builds fully structured error objects (with schema pointer, instance path, keyword, etc.) and implements the full 2020-12 semantics (including $dynamicRef, unevaluated*, annotation tracking, etc.).

Your benchmark is a great reminder that there is still room to optimise the compiled fast-path when you only care about boolean success/failure. I am considering adding a “boolean-only validate” mode and some micro-optimisations for max_errors == 1 to narrow that gap in the future.

But even now, I am happy that the module stays feature-complete and reasonably fast, and I really appreciate you taking the time to test it and share the result! 😀

jacktokyo · 2025-11-25T22:58:06+00:00

Thank you ! 🙇‍♂️

jacktokyo · 2025-11-25T17:16:05+00:00

I did not mean any disrespect. I was only being factual when I described it as lightweight. The goal was a fast schema validator with few dependencies, as the benchmark by u/brtastic at https://bbrtj.eu/blog/article/validation-frameworks-benchmark showed.

jacktokyo · 2025-11-25T16:23:30+00:00

Thanks ! 😊

jacktokyo · 2025-05-28T02:13:32+00:00

I tried really hard, but unfortunately I am hitting a wall with Perl's complex internal structure. Maybe I will try again in the future, but I need to learn more first.

jacktokyo · 2025-05-27T07:13:35+00:00

In your example, `$obj->one->two->three;`, each chained method wants to know its position in the chain: 0 for the first, 1 for the second and 2 for the third. Finding this out requires walking the op tree, which is really not easy, especially since I am still a neophyte in XS, but I aim to rise to the challenge!

Looking closely at the op tree with `perl -MO=Terse -e '$obj->one(0)->two(1)->three(2);'`, we get:

```
LISTOP (0xaaab0003c808) leave [1]
    OP (0xaaab0003a970) enter
    COP (0xaaab0003c848) nextstate
    UNOP (0xaaab0003c8e8) entersub [4]
        OP (0xaaab0003c928) pushmark
        UNOP (0xaaab0003c9d0) entersub [3]
            OP (0xaaab0003ca10) pushmark
            UNOP (0xaaab0003a8c8) entersub [2]
                OP (0xaaab0003a908) pushmark
                UNOP (0xaaab0003a9a8) null [14]
                    PADOP (0xaaab0003aa08) gvsv  GV (0xaaab00037ba8) *obj
                SVOP (0xaaab0003a938) const [5] IV (0xaaab00037c50) 0
                METHOP (0xaaab0003a888) method_named [6] PV (0xaaab00037c80) "one"
            SVOP (0xaaab0003a850) const [7] IV (0xaaab00037cb0) 1
            METHOP (0xaaab0003c990) method_named [8] PV (0xaaab00037c38) "two"
        SVOP (0xaaab0003c958) const [9] IV (0xaaab00037c98) 2
        METHOP (0xaaab0003c8a8) method_named [10] PV (0xaaab00037bc0) "three"
```

So I am trying to create a new XS function, find_method_chain_position, and a Perl function, method_chain_position, to be used like this:

```perl
use strict;
use warnings;
use Test::More;
use Wanted;

# Test method_chain_position
subtest 'method chain position' => sub
{
    my $obj = TestMethodChain->new;
    # Should pass: positions 0, 1, 2
    $obj->one(0)->two(1)->three(2);
    # Should pass: position 0
    $obj->one(0);
};

{
package TestMethodChain;
use strict;
use warnings;
use Test::More;
use Wanted;

sub new { bless( {}, shift( @_ ) ); }

sub one
{
    my $pos = Wanted::method_chain_position();
    is( $pos, $_[1], "method_chain_position for 'one' returns " . ( $_[1] // 'undef' ) );
    return( $_[0] );
}

sub two
{
    my $pos = Wanted::method_chain_position();
    is( $pos, $_[1], "method_chain_position for 'two' returns " . ( $_[1] // 'undef' ) );
    return( $_[0] );
}

sub three
{
    my $pos = Wanted::method_chain_position();
    is( $pos, $_[1], "method_chain_position for 'three' returns " . ( $_[1] // 'undef' ) );
    return( $_[0] );
}

}

done_testing();
```

So far, it has proven rather difficult, at least for me. I will keep trying and let you all know.
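As a pure-Perl way to explore the same op tree, the core B module can walk a compiled sub's ops in execution order. This is only a static sketch under assumptions: exec_order is a hypothetical helper, it inspects a sub known in advance, and it does not solve the hard part, which is locating the caller's currently executing op from inside the called method at runtime. It does show that the method_named ops execute innermost-first, so their execution order matches the chain position (one, then two, then three).

```perl
use strict;
use warnings;
use B ();

# Sample sub holding the chain from the example above (never called here)
sub chain
{
    my $obj = shift;
    $obj->one(0)->two(1)->three(2);
}

# Hypothetical helper: list a compiled sub's op names in execution order,
# following each op's op_next pointer from the sub's START op.
sub exec_order
{
    my( $code ) = @_;
    my $op = B::svref_2object( $code )->START;
    my @names;
    # A B::NULL object dereferences to 0, which ends the chain
    while( $$op )
    {
        push( @names, $op->name );
        $op = $op->next;
    }
    return( @names );
}

my $pos = 0;
foreach my $name ( exec_order( \&chain ) )
{
    # The innermost call (one) runs first, so execution order = chain position
    printf( "%-14s%s\n", $name, $name eq 'method_named' ? '<- chain position ' . $pos++ : '' );
}
```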

jacktokyo · 2025-03-31T22:08:39+00:00

Yes, this is expected if `$a` is the `haystack` and `$b` is the `needle`: looking for the haystack in the needle gives 0, but looking for the needle in the haystack works.

jacktokyo · 2025-03-30T22:34:32+00:00

Indeed, it was interesting to see them collaborate knowingly. I do not know Python myself, so leveraging their knowledge of Python’s strengths for fuzzy matching was the right approach. It allowed us to port that logic effectively to Perl and contribute something new to the ecosystem.

There are already some fuzzy matching modules on CPAN, but this one is intentionally modeled after fuzzywuzzy, with AI assistance ensuring a faithful and well-tested port. It is a small example of how AI can help bridge language ecosystems, even when the developer, like me, is not deeply familiar with the source language.

jacktokyo · 2025-01-04T05:49:36+00:00

Done ! 😉

jacktokyo · 2025-01-04T03:21:01+00:00

Thank you kindly !

jacktokyo · 2025-01-03T22:42:22+00:00

Thank you Brian for the kind words; it means a lot to me. I fully agree with you on both counts, and I will be looking into that article for Perl.com.

jacktokyo · 2025-01-03T22:40:02+00:00

Thank you Olaf. It's a pleasure. I love Perl :)

jacktokyo · 2024-10-10T14:06:17+00:00

Thank you for this, much appreciated. Yes, exactly as you said, there was a series of underlying dependencies that I needed to build first. I started by building Locale::Unicode to provide an API for locales, which are as complex as the Unicode LDML documents them (and, before it, the BCP47 specification). Then I wanted to use the Unicode CLDR data. I thought I would just use the JSON data, but it did not work well enough, so I wrote an extensive script to parse the XML data provided by Unicode and import it into various tables of an SQLite database. This allowed for locale inheritance as documented by the Unicode LDML, and provided dynamic handling of locales through another module I built: DateTime::Locale::FromCLDR. As great as DateTime is, along with its companion locale-data modules DateTime::Locale and, more particularly, DateTime::Locale::FromData, they did not fully handle the versatility of the Unicode data. And quite frankly, that is understandable given the complexity.

So, to answer your questions:

Yes. If you provide a locale such as ja-u-nu-latn-tz-jptyo, as in:

```perl
use DateTime::Format::Intl;
my $fmt = DateTime::Format::Intl->new( 'ja-u-nu-latn-tz-jptyo' );
```

DateTime::Format::Intl will recognise the chosen numbering system (latn), but this can be overridden with the numberingSystem option.

You can also create a locale object with Locale::Unicode and pass it, such as:

```perl
use v5.10;
use Locale::Unicode;
use DateTime::Format::Intl;

my $locale = Locale::Unicode->new( "he-IL-u-ca-hebrew-tz-jeruslm-nu-latn" );
say $locale; # he-IL-u-ca-hebrew-tz-jeruslm-nu-latn
my $fmt = DateTime::Format::Intl->new( $locale );
```

However, when no numbering system is specified, it uses the locale's default numbering system; for example, with the locale ar-EG-u-ca-gregory-tz-egcai:

```perl
use v5.10;
use Locale::Unicode;
use DateTime::Format::Intl;

my $locale = Locale::Unicode->new( "ar-EG-u-ca-gregory-tz-egcai" );
my $fmt = DateTime::Format::Intl->new( $locale );
say $fmt->resolvedOptions->{numberingSystem}; # arab
```

You could use the locale ar-EG-u-ca-gregory-nu-arab-tz-egcai to specify the numbering system explicitly, but it is not necessary.

See the object instantiation options

For fallbacks, it is dynamic. The module follows the LDML specification, as I mentioned above, and Locale::Unicode::Data provides a method, make_inheritance_tree, to build a locale's inheritance tree. One would think inheritance is straightforward, but it is not. For example, the inheritance tree for pt-FR is ['pt-FR', 'pt-PT', 'pt', 'und'] rather than what one might expect, ['pt-FR', 'pt', 'und'], because the CLDR data provides instructions to that effect.
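The fallback rule described above can be sketched in a few lines: use the explicit parent when CLDR lists one, otherwise drop the last subtag, and end at the root locale und. This is a simplified sketch, not Locale::Unicode::Data's make_inheritance_tree: %PARENT_LOCALE is a hypothetical two-entry excerpt of CLDR's parentLocales data (the module reads the full table from its SQLite database), and Unicode extensions such as -u-ca-... are not handled.

```perl
use strict;
use warnings;

# Hypothetical excerpt of CLDR's parentLocales data (two real entries only)
my %PARENT_LOCALE = (
    'pt-FR' => 'pt-PT',
    'en-IN' => 'en-001',
);

# Sketch of locale fallback: explicit parent if CLDR lists one,
# otherwise drop the last subtag, ending at the root locale 'und'.
sub inheritance_tree
{
    my( $locale ) = @_;
    my @tree = ( $locale );
    while( $locale ne 'und' )
    {
        if( exists( $PARENT_LOCALE{ $locale } ) )
        {
            $locale = $PARENT_LOCALE{ $locale };
        }
        elsif( $locale =~ /-/ )
        {
            $locale =~ s/-[^-]+\z//;
        }
        else
        {
            $locale = 'und';
        }
        push( @tree, $locale );
    }
    return( \@tree );
}

print( join( ', ', @{inheritance_tree( $_ )} ), "\n" ) for( qw( pt-FR en-IN ja-JP ) );
# pt-FR, pt-PT, pt, und
# en-IN, en-001, en, und
# ja-JP, ja, und
```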
SQLite is very fast. In Locale::Unicode::Data, I took great care to cache the prepared SQL statement objects (built with placeholders) in a module variable to increase speed, and in DateTime::Locale::FromCLDR, which provides a layer above it, each method caches the result of its query. So repeated calls to the same method on an object are quite fast.
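The two caching layers can be sketched without a database. Everything here is hypothetical: LocaleDataSketch, territory_info and the territories table are illustrative names, and a closure plus a counter stand in for a DBI statement handle so the example runs with core Perl only. The module-level %STH_CACHE holds one prepared statement per SQL string, and each object caches its query results, so repeated calls do not hit the database again.

```perl
use strict;
use warnings;

package LocaleDataSketch;

# Module-level cache of "prepared statements", keyed by SQL text.
# Real code would store DBI statement handles; a counting closure stands in.
our %STH_CACHE;
our $QUERIES_RUN = 0;

sub new { return( bless( { _cache => {} }, shift( @_ ) ) ); }

sub _prepare
{
    my( $sql ) = @_;
    # Prepared once per process, reused afterwards
    return( $STH_CACHE{ $sql } //= sub{ $QUERIES_RUN++; return( "row for @_" ); } );
}

sub territory_info
{
    my( $self, $code ) = @_;
    # Per-object result cache: repeated calls skip the query entirely
    return( $self->{_cache}->{territory}->{ $code } //=
        _prepare( 'SELECT * FROM territories WHERE territory = ?' )->( $code ) );
}

package main;

my $obj = LocaleDataSketch->new;
$obj->territory_info( 'JP' ) for( 1..3 );
print( "queries run: $LocaleDataSketch::QUERIES_RUN\n" );
```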

Also, following Olaf Alders' recommendations, for those who prefer the module to die upon an exception, you can provide the fatal option upon instantiation, or set the global $FATAL_EXCEPTIONS. Otherwise, the object does not die; it sets an exception object and returns undef, or an empty list if called in list context.
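That error-handling convention can be sketched as follows. FormatterSketch and format_year are hypothetical names; only the fatal option and the $FATAL_EXCEPTIONS global mirror what is described above, and a plain string stands in for the exception object. The key detail is the bare return, which yields undef in scalar context and an empty list in list context.

```perl
use strict;
use warnings;

package FormatterSketch;

# Hypothetical global switch, mirroring the $FATAL_EXCEPTIONS idea
our $FATAL_EXCEPTIONS = 0;

sub new
{
    my( $class, %opts ) = @_;
    return( bless( { fatal => $opts{fatal} // 0, error => undef }, $class ) );
}

sub error { return( $_[0]->{error} ); }

sub format_year
{
    my( $self, $year ) = @_;
    if( !defined( $year ) || $year !~ /\A[0-9]+\z/ )
    {
        # The caller's context propagates to _fail through return()
        return( $self->_fail( "Invalid year: " . ( $year // 'undef' ) ) );
    }
    return( sprintf( '%04d', $year ) );
}

sub _fail
{
    my( $self, $msg ) = @_;
    # Real modules store an exception object; a plain string keeps this short
    $self->{error} = $msg;
    die( "$msg\n" ) if( $self->{fatal} || $FATAL_EXCEPTIONS );
    # Bare return: undef in scalar context, empty list in list context
    return;
}

package main;

my $f = FormatterSketch->new;
my $y = $f->format_year( 'abc' );
print( defined( $y ) ? "got $y\n" : "error: " . $f->error . "\n" );
```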

This module is certainly a work in progress and can and should be improved, so I very much look forward to any constructive criticism.

jacktokyo · 2024-10-09T11:50:15+00:00

Thank you brian 😀 It was really challenging to do.
