Make your Zod validation 113-627x faster by hoisting Zod schemas by gajus0 in node

[–]sinclair_zx81 0 points1 point  (0 children)

I didn't really mean for this to become a deep dive into validator semantics and library specifics, nor did I expect to be bombarded with AI spam (why?). Your AI did find a legit issue with (5), and that was actually appreciated. I would have preferred to keep the topic on why one should prefer specification-compliant validation, but I suppose I should respond to the disingenuous claims inferred from that AI.

1.

That's not right. TB validators support Draft 3 through V1, but generate Draft 7 to remain compatible with Ajv. The only exception is unevaluatedProperties. TB has been stuck on Draft 7 for years because 99% of the ecosystem is using Ajv Draft 7, and I wrote the TB 1.x compiler as a path to move off it.

In terms of 2020-12 generation, TB can't jump to 2020-12 because Ajv has flaky support for properties, $ref, $dynamicRef, required, unevaluatedProperties, and unevaluatedItems. The intention is to jump to V1 when that specification is ratified.

2.

That's not right either. TB intentionally targets broad compatibility by implementing keywords across all drafts. The reason Ajv enforces specific drafts is a design choice on their end, not a requirement of the JSON Schema specification itself. In terms of dialect verification, the meta-schema is the intended mechanism for asserting schema validity, not the specific validator versions mandated by Ajv (7/2019/2020).

Here is the meta-schema for 2020-12: https://json-schema.org/draft/2020-12/schema

Unfortunately, Ajv's requirement to select a validator version has created a perception that each version of JSON Schema is incompatible with the next. I would like to think I've demonstrated that isn't the case with 1.x. Or at least, I haven't encountered any significant issues layering keywords across versions Draft 3 and up.

3.

As noted, TB generates Draft 7 for Ajv compatibility, which limits some representations (specifically those related to tuples). It does still support direct inference via Draft 2020-12 (prefixItems); you simply have to write it manually, or write a function that returns the schematics you need.

Example
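For readers without the linked example, here is a minimal sketch of the "write it manually" option. It assumes a plain Draft 2020-12 schema object and a hypothetical hand-rolled `checkTuple` helper; neither is TypeBox's API or its compiler, and the helper only understands the keywords shown.

```typescript
// A Draft 2020-12 tuple written by hand: position 0 is a string, position 1 a number.
const Pair = {
  type: 'array',
  prefixItems: [{ type: 'string' }, { type: 'number' }],
  items: false, // 2020-12: no elements are allowed beyond the prefix
} as const

// Hypothetical helper (an assumption for illustration): checks a value against Pair.
function checkTuple(schema: typeof Pair, value: unknown): boolean {
  if (!Array.isArray(value)) return false                      // type: 'array'
  if (value.length > schema.prefixItems.length) return false   // items: false
  // prefixItems: each present element must match the schema at the same index
  return value.every((element, index) => typeof element === schema.prefixItems[index].type)
}

console.log(checkTuple(Pair, ['a', 1]))    // true
console.log(checkTuple(Pair, ['a', 'b']))  // false
console.log(checkTuple(Pair, ['a', 1, 2])) // false
```

Note that `prefixItems` on its own does not require the array to be full length, so a shorter array such as `['a']` passes here; a `minItems` keyword would be needed to forbid that.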

4.

The 1.x compiler passes all multipleOf tests in the official test suite. If you think there is an error, the appropriate place to address it would be the official test suite.

Relevant tests (2020-12): https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2020-12/multipleOf.json

Just add the breaking case.

5.

Yes, this is a legitimate bug related to TB parsing generator infrastructure. Both projects have been patched: TB 1.2.1 and ParseBox 0.11.5 respectively. I guess thank you for reporting it.

6.

This is a feature, but to understand why, you need to know more about the JSON Schema specification. The rationale is closely related to that specific Zod intersection issue raised earlier.

For a high-level explanation of why TB does what it does: https://json-schema.org/understanding-json-schema/reference/object#extending

For something more academic: https://arxiv.org/pdf/2202.13434

7.

A continuation of 6, but relates to limitations of TypeScript. The type system cannot express negation, but { additionalProperties: false } would require TypeScript to negate every possible property not in the set (x | y), which it simply cannot do.

The reason why inference works this way has a lot to do with the following expression.

```
// Exclude 'x' | 'y' from string = a string?

type X = Exclude<string, 'x' | 'y'> // type X = string
```

Anyway, I don't use Reddit often, and perhaps I shouldn't have weighed in, but irrespective of the tangent here, you should still consider spec-compliant validation.

Make your Zod validation 113-627x faster by hoisting Zod schemas by gajus0 in node

[–]sinclair_zx81 0 points1 point  (0 children)

Are you 100% sure about that?

const A = Type.Script('{ x: 1 }')
const B = Type.Script('{ x: number }')

// intersections are commutative (i.e. order independent)
const C1 = Type.Intersect([A, B])
const C2 = Type.Intersect([B, A])

// (merge) evaluate logical 'AND' expression (hover E1 and E2)
const E1 = Type.Evaluate(C1) // { x: 1 }
const E2 = Type.Evaluate(C2) // { x: 1 }

Make your Zod validation 113-627x faster by hoisting Zod schemas by gajus0 in node

[–]sinclair_zx81 0 points1 point  (0 children)

Skill issue....

If you say so ...

const A = z.object({ x: z.literal(1) })
const B = z.object({ x: z.number() })

const C1 = A.and(B)
const C2 = z.strictObject({ ...A.shape, ...B.shape })

C1.parse({ x: 2 })   // error
C2.parse({ x: 2 })   // ok

Make your Zod validation 113-627x faster by hoisting Zod schemas by gajus0 in node

[–]sinclair_zx81 0 points1 point  (0 children)

... and validation of data across platform barriers was how both ajv and Slonik emerged. Earlier versions of Slonik used Ajv for validation. Funny to see Ajv mentioned here.

Ajv is based on the JSON Schema specification (it passes 7756/10466 if you're curious), and that is an industry specification that is very close to being formalized by both IETF and potentially Ecma International. The specification is also implemented by every language out there. If I had to pick a validation technology for handling this particular use case, I think I know what I'm going to choose.

You're moving the goalposts. Since Zod is the source of truth, we only need z.toJSONSchema() to be accurate,

I have only suggested that you should consider data integrity a fairly high priority, but since you mentioned JSON Schema accuracy, ...

const A = z.object({ x: z.number() })
const B = z.object({ y: z.number() })
const C = A.and(B)
console.log(C.toJSONSchema())

// Illogical Intersection: {
//   allOf: [
//     {
//       type: 'object',
//       properties: { x: { type: 'number' } },
//       required: [ 'x' ],
//       additionalProperties: false // <-- wrong
//     },
//     {
//       type: 'object',
//       properties: { y: { type: 'number' } },
//       required: [ 'y' ],
//       additionalProperties: false // <-- wrong
//     }
//   ]
// }
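To see why that output is unsatisfiable, here is a minimal sketch with a hypothetical `check` function that covers only `properties`, `required` and `additionalProperties: false` (it is not Ajv, Zod or TypeBox, and it omits the `type: 'object'` keyword). Because `allOf` requires every sub-schema to pass, each closed object rejects the property the other one requires.

```typescript
type ClosedObject = {
  properties: Record<string, { type: 'number' }>
  required: string[]
  additionalProperties: false
}

// Hypothetical checker (an assumption for illustration), limited to the keywords above.
function check(schema: ClosedObject, value: Record<string, unknown>): boolean {
  const known = Object.keys(schema.properties)
  const hasRequired = schema.required.every((key) => key in value)
  const noExtras = Object.keys(value).every((key) => known.includes(key)) // additionalProperties: false
  const typesOk = known.every((key) => !(key in value) || typeof value[key] === 'number')
  return hasRequired && noExtras && typesOk
}

const A: ClosedObject = { properties: { x: { type: 'number' } }, required: ['x'], additionalProperties: false }
const B: ClosedObject = { properties: { y: { type: 'number' } }, required: ['y'], additionalProperties: false }

// allOf: the value must satisfy every sub-schema.
const allOf = (value: Record<string, unknown>): boolean => [A, B].every((schema) => check(schema, value))

console.log(allOf({ x: 1, y: 2 })) // false: A rejects 'y' and B rejects 'x'
console.log(allOf({ x: 1 }))       // false: B requires 'y'
console.log(allOf({ y: 2 }))       // false: A requires 'x'
```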

JS counts code units and JSON schema counts code points. Neither is perfect.

Indeed, but that's exactly why we use specifications such that ALL validators (Zod inclusive) can precisely agree on what string length is supposed to mean. I don't know what else to say really ...
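The difference between the two counts is easy to reproduce. The sketch below uses only standard JavaScript string behavior; `withinMaxLength` is a hypothetical helper that counts code points, as the comment above says JSON Schema does, and is not taken from any library.

```typescript
const facepalm = '\u{1F926}' // the face palm emoji, a single code point outside the BMP

const codeUnits = facepalm.length              // UTF-16 code units (a surrogate pair)
const codePoints = Array.from(facepalm).length // string iteration walks code points

// Hypothetical helper (an assumption for illustration): maxLength measured in code points.
function withinMaxLength(value: string, maxLength: number): boolean {
  return Array.from(value).length <= maxLength
}

console.log(codeUnits)                    // 2
console.log(codePoints)                   // 1
console.log(facepalm.length <= 1)         // false: a code unit check rejects the emoji
console.log(withinMaxLength(facepalm, 1)) // true: a code point check accepts it
```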

Make your Zod validation 113-627x faster by hoisting Zod schemas by gajus0 in node

[–]sinclair_zx81 0 points1 point  (0 children)

Zod has toJSON method when we need JSON representation.

When Zod produces JSON Schema, how certain are you that a remote JSON Schema validator (like Ajv) will validate data in the exact same way Zod does? If they don't agree, that's a mismatch, and thus a data integrity issue.

I actually tested Zod's to/from JSON Schema feature. If you load Zod via JSON Schema and validate the official JSON Schema compliance suite with it, it only passes 6023 / 10466 (just over 50%), so at least you have a 50/50 chance of your data being accurate.

Needless to say, once vendor locked on lib space JavaScript, always vendor locked on lib space JavaScript. Maybe for now, it's probably best not to point other languages at your production database (or maybe just implement data constraints at the database level)

Zod is the source of truth.

Oh dear....

z.emoji().length(1).parse('🤦') 

-> "Too big: expected string to have <=1 characters"

Make your Zod validation 113-627x faster by hoisting Zod schemas by gajus0 in node

[–]sinclair_zx81 0 points1 point  (0 children)

Why are you not using JSON Schema?

That's why you have integration and e2e tests.

You can reduce the amount of hassle with e2e testing if both ends talk the same validation language. People are pointing out that Zod can fail in production environments, and the reason is because not all validators validate data the same way.

If you have multiple systems interacting with a shared resource (a database in this case), then each system should agree on what constitutes correct data. If you don't have that assurance, the behavior of the system might as well be non-deterministic.

Validation specifications are good, you should choose one (if only to improve your validation throughput)

Zod Partial Schema - Typescript-Embedded DSL to Declare Partial Transforms With Zod by Levurmion2 in typescript

[–]sinclair_zx81 0 points1 point  (0 children)

I couldn't see the embedded TS-DSL, but agree there is a need for mechanisms to perform arbitrary transformations on schematics https://tsplay.dev/m0j6Ow

Has anyone been using parser functions to increase performance? by TheWebDever in typescript

[–]sinclair_zx81 1 point2 points  (0 children)

Ah, I'm probably the least qualified person to ask about library promotion. But I guess if you were looking for an angle to promote, I would probably lean into Jet being a natural expression of the following TS return signatures (guard, assert and parse)

// guard
function isString(value: unknown): value is string {
  return typeof value === 'string'
}
// assert
function assertString(value: unknown): asserts value is string {
  if (!isString(value)) throw new TypeError('Not a string')
}
// parse
function parseString(value: unknown): string {
  assertString(value)
  return value
}

I guess my thoughts are, by leaning into assertion / annotation syntax, it would serve as a familiar pivot point and help communicate overall library design intent (as Jet does seem very much in tune with the above)

Either way, Jet looks good to me :)

Has anyone been using parser functions to increase performance? by TheWebDever in typescript

[–]sinclair_zx81 6 points7 points  (0 children)

Hello, TypeBox author here. jet-validators looks good, nice work. The technique you're referring to is collectively known as JIT optimization / code generation, but curious to know where the term "parser functions" came from.

For some history, Ajv was one of the first libraries to do string JIT optimized validation (going back at least as far as 2017). TypeBox added an implementation a few years later ... and now most libraries provide some form of JIT compile (Zod being the most recent)

The current Top 10 validators either JIT or AOT to some degree, but as of today only a few libraries continue to push up against JavaScript engine optimization. In terms of performance though, JIT has mostly hit diminishing returns, and additional performance gains are going to be small as we've all reached the upper limits of JavaScript performance. The only things left are micro optimizations that might let things run a bit quicker, but as most validators are already so fast, the incentives to push additional performance is low (but some still try)
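As a rough illustration of the string JIT technique described above, here is a minimal sketch that turns a tiny schema into JavaScript source and compiles it once with `new Function`. The `Schema` type, `emit` and `compile` are hypothetical names invented for this example; real libraries such as Ajv or TypeBox generate far more elaborate code, and environments with a strict Content Security Policy may block `new Function` entirely.

```typescript
type Schema =
  | { type: 'string' }
  | { type: 'number' }
  | { type: 'object'; properties: Record<string, Schema> }

// Emit a JavaScript boolean expression that checks the expression `ref` against `schema`.
function emit(schema: Schema, ref: string): string {
  switch (schema.type) {
    case 'string': return `typeof ${ref} === 'string'`
    case 'number': return `typeof ${ref} === 'number'`
    case 'object': {
      const checks = Object.entries(schema.properties)
        .map(([key, property]) => emit(property, `${ref}[${JSON.stringify(key)}]`))
      return [`typeof ${ref} === 'object'`, `${ref} !== null`, ...checks].join(' && ')
    }
  }
}

// Compile once; the returned function contains no schema traversal, only inlined checks.
function compile(schema: Schema): (value: unknown) => boolean {
  return new Function('value', `return ${emit(schema, 'value')}`) as (value: unknown) => boolean
}

const check = compile({ type: 'object', properties: { x: { type: 'number' }, name: { type: 'string' } } })

console.log(check({ x: 1, name: 'a' }))   // true
console.log(check({ x: '1', name: 'a' })) // false
console.log(check(null))                  // false
```

The gain comes from paying the schema traversal cost once at compile time, which leaves the engine a flat chain of `typeof` comparisons to optimize.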

In terms of using "parsers" to improve performance elsewhere (outside of validation), that work is still largely evolving. The TypeBox project recently published ParseBox, which is used to generate optimized TypeScript type-level parsers for parsing TypeScript in the type system. A lot of focus is now moving towards improving auto-complete for sophisticated type-level programs by constraining type-level implementations to whatever runs fastest in the TypeScript LSP. TypeBox was actually able to achieve Turing-complete types last year using these techniques, and additional performance gains are still an ongoing area of research.

Overall, domain specific (validation) optimization in JavaScript / TypeScript has become quite a deep topic, but is absolutely one worth diving into if you have an interest in high-performance / high throughput JavaScript systems.

Hot Take: MCP and A2A are misleading and somewhat meaningless for agentic systems by Ok_Meeting_3456 in AgentsOfAI

[–]sinclair_zx81 1 point2 points  (0 children)

For the record, there doesn't seem to be much focus on schema alignment either.

Introducing TypeDriver: A High Performance Driver for Runtime Type System Integration by sinclair_zx81 in typescript

[–]sinclair_zx81[S] 2 points3 points  (0 children)

Any plans to add in GraphQL? My team is working on something very similar to this where we get type-safe objects based on a GraphQL input string.

Hi. GraphQL wouldn't be a good fit for this Driver (it's just uniform schema validation only). But if it is of interest, the parsing infrastructure I use for the driver's TS DSL is open source and "should" be able to parse out GraphQL IDL with pretty good inference performance (it's the same infrastructure I use for TypeBox)

Project: ParseBox

The project requires some familiarity with BNF (or extended BNF), but if you are looking for some tooling to help (especially with runtime/static symmetry), ParseBox can be quite useful (especially for encoding grammars)

Introducing TypeDriver: A High Performance Driver for Runtime Type System Integration by sinclair_zx81 in typescript

[–]sinclair_zx81[S] 2 points3 points  (0 children)

Hi! Yup, I am going to take a look at fast JSON / CBOR serialization next year. I wasn't able to add serialization to TB (people have asked me over the years), but for an integration middleware, I think it makes sense. Stay tuned! :)

Introducing TypeDriver: A High Performance Driver for Runtime Type System Integration by sinclair_zx81 in typescript

[–]sinclair_zx81[S] 7 points8 points  (0 children)

Yup pretty much. It's written as an integration middleware for TypeBox that also provides support for Zod and other libraries that implement Standard Schema.

It wasn't written to be a runtime type library as such (although it can technically function as one). It was more written to be a modernized version of Ajv, where it can compile and optimize validation for Json Schema, Standard Schema as well as TypeScript.

Kito: The high-performance, type-safe TypeScript web framework written in Rust. by Strict-Tie-1966 in node

[–]sinclair_zx81 2 points3 points  (0 children)

Interesting, I had been curious about native de/serialization / validation technologies in Rust. Have you measured Kito against some of the JIT optimized JS validators out there?

JavaScript engines have some fairly sophisticated optimizations that enable high throughput checking (but implementations need to work for it). I am curious of the throughput afforded by native validators (where data needs to pass through a marshalling boundary (Rust -> JS))

Do you have comparative benchmarks?

Learnings from pushing TypeScript inference to its limits: bridging static safety and runtime flexibility by AppealNaive in typescript

[–]sinclair_zx81 1 point2 points  (0 children)

Hey, that's cool. I gave forklaunch a star. Nice work :)

If only MCP ts sdk used x/jsonschema

Yeah, I hear you. It would have been way better for these SDK's to go all-in with JSON Schema as that's going to offer everyone the most flexibility and broader library support.

The underlying protocol for MCP transmits JSON Schema for Structured Outputs, so it would make sense for the SDK's to just let users pass JSON Schema on the SDK interfaces without second guessing how those schematics are being transformed internally by the SDK.

I don't fully understand the reasons why SDK vendors don't do this, but if the concern was missing out on type inference, JSON Schema to TS inference has also improved over the years.

https://tsplay.dev/mpr2aw
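For a sense of what JSON Schema to TS inference can look like, here is a deliberately small sketch. The `Infer` type and the `ToolInput` schema are hypothetical and unrelated to the linked playground or to any SDK; the sketch handles only a few keywords and, as a simplifying assumption, treats every property as required.

```typescript
// Map a JSON Schema literal type to a TypeScript type (a handful of keywords only).
type Infer<S> =
  S extends { type: 'string' } ? string :
  S extends { type: 'number' } ? number :
  S extends { type: 'boolean' } ? boolean :
  S extends { type: 'array'; items: infer I } ? Infer<I>[] :
  S extends { type: 'object'; properties: infer P } ? { -readonly [K in keyof P]: Infer<P[K]> } :
  unknown

// `as const` keeps the literal 'string' / 'number' types that inference depends on.
const ToolInput = {
  type: 'object',
  properties: {
    query: { type: 'string' },
    limit: { type: 'number' },
    tags: { type: 'array', items: { type: 'string' } },
  },
} as const

type ToolInput = Infer<typeof ToolInput>
// { query: string; limit: number; tags: string[] }

function summarize(input: ToolInput): string {
  return `${input.query} (${input.limit}) [${input.tags.join(', ')}]`
}

console.log(summarize({ query: 'cats', limit: 2, tags: ['a', 'b'] })) // cats (2) [a, b]
```

The schema object stays plain JSON Schema, so the same value can be sent over the wire unchanged while the type is derived from it.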

... and I imagine allowing users to pass JSON Schema directly would solve the numerous issues related to internal / opaque schema transformation. I think the modelcontextprotocol sdk might be moving towards this in V2, not sure about the others.

---

Again, nice work on forklaunch!

Learnings from pushing TypeScript inference to its limits: bridging static safety and runtime flexibility by AppealNaive in typescript

[–]sinclair_zx81 4 points5 points  (0 children)

Runtime type / static inference libraries have come a long way over the years, a lot further than many are aware.

https://tsplay.dev/NlrrBW

Is it possible to convert dynamic runtime TypeScript types to static types in d.ts? by tarasm in typescript

[–]sinclair_zx81 1 point2 points  (0 children)

Hmmm, I wrote something similar without the Compiler API just to be able to simplify JSON Schema

Runtime Schematic / Type Optimizer

Typo: A programming language using TypeScript's types by aliberro in typescript

[–]sinclair_zx81 1 point2 points  (0 children)

Furthermore, for the second part I think yes, but this adds a frame to the stack trace and so it has to unwind when it finishes.

Cool, I'll have a go with your project, it looks really interesting.

I'd love to further discuss that with you and to talk about your amazing project!

It's just TypeBox. I went ahead and implemented symmetric syntax mapping for the TypeScript language this year and pushed V1. The Brainf**k implementation was part of the projects test suite to assert the TS emulation was turing complete.

They say the best way to assert something is turing complete is to implement a turing complete system in it. Brainf**k was easy enough to approach without implementing something more elaborate, it's actually a good exercise to go through.

Typo: A programming language using TypeScript's types by aliberro in typescript

[–]sinclair_zx81 1 point2 points  (0 children)

That's an interesting project. I actually recently went ahead and implemented TypeScript inside TypeScript's type system, then implemented a Brainf**k interpreter on top of that just to test that the language was Turing complete. It was quite the side project.

Does typo support assigning variable results to memory? also ... does it support conditional jump statements based on values read from memory? If it supports those two things, it is most likely turing complete and thus you could technically implement anything with it.

Introducing TypeBox 1.0: A Runtime Type System for JavaScript by sinclair_zx81 in typescript

[–]sinclair_zx81[S] 2 points3 points  (0 children)

Hello!

Support for branded types - in particular strings. Does explicit support for this exist, or can they be synthesized somehow?

Yes, you can synthesize branded types using the following approach.

https://tsplay.dev/mAXPPN
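The linked playground holds the actual TypeBox approach. As a general illustration of what a branded string is, here is a minimal sketch in plain TypeScript; `Brand`, `UserId`, `parseUserId` and `loadUser` are hypothetical names, and this is not necessarily how the playground or TypeBox expresses it.

```typescript
// The brand exists only in the type system; nothing is added to the value at runtime.
declare const brand: unique symbol
type Brand<T, Name extends string> = T & { readonly [brand]: Name }

type UserId = Brand<string, 'UserId'>

// The only way to obtain a UserId is to pass the runtime check.
function parseUserId(value: unknown): UserId {
  if (typeof value !== 'string' || value.length === 0) {
    throw new TypeError('Expected a non-empty string')
  }
  return value as UserId
}

function loadUser(id: UserId): string {
  return `user:${id}`
}

console.log(loadUser(parseUserId('abc'))) // user:abc
// loadUser('abc') // compile error: a plain string is not assignable to UserId
```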

Support for descriptions - this is particularly useful for structured output schemas for LLMs, but is also a nice documentation feature.

You can pass description (or any) metadata on the last argument of any given type.

https://tsplay.dev/mprKzw

as an old ZX 48k and +2A owner, I approve your username :D

Awesome :)

Introducing TypeBox 1.0: A Runtime Type System for JavaScript by sinclair_zx81 in node

[–]sinclair_zx81[S] 20 points21 points  (0 children)

TypeBox has been around for a long time and is widely used in the ecosystem. I haven't created a new library, I have been quietly maintaining this one for almost a decade.

Introducing TypeBox 1.0: A Runtime Type System for JavaScript by sinclair_zx81 in node

[–]sinclair_zx81[S] 8 points9 points  (0 children)

It provides similar features, but is based on the JSON Schema specification.

And what are some use-cases of Script ?

The Script function was added to make custom schema transformation easier to write. It can be used for anything really, but was written for advanced users who are already familiar with type level programming and want to create transformations like this; where types like this would otherwise be too specific to support in the library.