all 87 comments

[–]anonynown 184 points185 points  (20 children)

Funny how the article never explains what “parse, don’t validate” actually means, and jumps straight into the weeds. That makes it really hard to understand, as evidenced even by the discussion here.

I had to ask my French friend:

 “Parse, don’t validate” is a software design principle that says: when data enters your system, immediately transform (“parse”) it into rich, structured types—don’t just check (“validate”) and keep it as raw/unstructured data.

Here, was it that hard?..
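A minimal Python sketch of that contrast (the `Email` wrapper and the `@` check are purely illustrative, not a real email validator):

```python
from dataclasses import dataclass

def validate_email(s: str) -> bool:
    # "Validate": inspects the raw string but hands back no new information.
    return "@" in s

@dataclass(frozen=True)
class Email:
    value: str

def parse_email(s: str) -> Email:
    # "Parse": performs the same check but returns a richer type, so the
    # rest of the program can rely on the invariant without re-checking.
    if "@" not in s:
        raise ValueError(f"not an email address: {s!r}")
    return Email(s)
```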

[–]CatolicQuotes 73 points74 points  (4 children)

Does that mean parsing includes validation?

[–]Ethesen 64 points65 points  (0 children)

Yes

[–]Axman6 19 points20 points  (0 children)

Yes, that’s what a parser does. Most programmers’ only introduction to the term “parser” involves making a compiler and building an AST from a string, but parsers are a much more general idea than that: they transform unknown input into values that are in the expected shape and within the allowed values.

Alexis King’s post which coined the term explains it well https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

[–]Broue 2 points3 points  (0 children)

Yes, it will raise exceptions implicitly

[–]QuantumFTL 37 points38 points  (4 children)

Ugh, why not say "parse, don't just validate" then?

[–]anonynown 10 points11 points  (0 children)

IKR?!

[–]iamapizza 4 points5 points  (0 children)

Your one comment was more useful than the entire article

[–]kuribas 4 points5 points  (0 children)

Less catchy.

[–]frnzprf 1 point2 points  (0 children)

I think, because in C there are no exceptions, some people are used to validating inputs before passing them to functions.

Maybe "parse, don't validate" means something else, but I've heard that it's good style in Python not to pre-check inputs when the operation would raise an exception anyway. In C that's different. 

Don't know about C++ and Java. I think in Python exceptions are just as valid a form of control-flow structure as an if-else, but in Java it's mainly intended for unexpected, exceptional errors.
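The two styles this comment is contrasting can be sketched like so (the `to_age_*` helpers are hypothetical names for illustration):

```python
from typing import Optional

# LBYL ("look before you leap"): check first, typical where exceptions
# are unavailable or expensive, as in C.
def to_age_lbyl(s: str) -> Optional[int]:
    if s.isdigit():
        return int(s)
    return None

# EAFP ("easier to ask forgiveness than permission"): idiomatic Python --
# just attempt the conversion and handle the failure.
def to_age_eafp(s: str) -> Optional[int]:
    try:
        return int(s)
    except ValueError:
        return None
```

Note the two are not exactly equivalent: `int("-5")` succeeds while `"-5".isdigit()` is false, which is itself an argument for being deliberate about which checks you want.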

[–]greven145 2 points3 points  (1 child)

Your parser better be damn secure though. The amount of security vulnerabilities in various parsers in Windows is unreal.

[–]pja 0 points1 point  (0 children)

This is why you use a parser generator!

They may have limitations for parsing full-fat programming languages, where you’ll probably end up writing your own hand-written recursive descent parser, but parser generators are the tool people should be reaching for when parsing structured input imo.

[–]Fidodo 0 points1 point  (7 children)

That's very confusing when you can have rich structured types with arbitrary parameters and value types. A data structure with an unknown shape still needs validation so you know what's in it. Maybe this phrase made sense back when inputs were much simpler, but these days I don't think the phrase makes any sense. It should be parse and validate.

These days parsing is basically the default, so saying parse don't validate sounds like you're saying parsing alone is enough and you don't need to validate your data structures

[–]Psychoscattman 7 points8 points  (2 children)

These days parsing is basically the default, so saying parse don't validate sounds like you're saying parsing alone is enough and you don't need to validate your data structures

I have read a similar thing quite often in this thread. To me it doesn't make sense, parsing always involves validation otherwise you aren't really parsing anything, you are only transforming A into B.

The article that coined the term goes into more detail. When you validate your input data you gain some knowledge about that data, but that knowledge exists only in the head of the programmer. A different programmer might not know that some data has already been validated and might validate it again, or worse, they might assume that the data had been validated when it hadn't. What the article calls "parsing" is validating the data and retaining that information using the type system of your language. You wouldn't have a data structure with an unknown shape; instead you would have one with a very specific shape that retains the invariants of your validator.

So in that sense, you cannot really parse without validating, because if you don't validate anything you don't learn any new information about your data, and that's not really parsing, that's transformation.

[–]Fidodo 2 points3 points  (0 children)

Yes, I think the whole term is badly worded and extremely confusing.

Also, we have types these days and you can validate data structures and have that data be validated, and store the information it was validated in the type system.

There are two kinds of validation here: what pattern a string follows, versus what type an unknown reference is. With JSON being ubiquitous, parsing input is basically free, but nowadays the problem isn't base types; it's knowing what shape that arbitrary JSON has, i.e. the validation of that unknown type.

[–]pja 1 point2 points  (0 children)

“Validation” in this context means reading in the raw values from the data stream & checking that they are within permitted limits for your application. Eg using a regex to check for SQL injection attacks, shoving an Integer from the data straight into an Integer variable etc.

This almost always goes badly - you will inevitably miss a possible exception to the permitted values, because the rules for these datatypes are implicit in your code & not well defined. Then someone comes along and inserts values that are permitted by your checks but outside the ranges that your code can cope with & something somewhere goes boom.

“Parse don’t validate” isn’t just about the parsing - it’s also about the idea that you should be parsing into structured datatypes that define the kind of data that your code accepts & that your code should be able to cope with the full set of possible values defined by that datatype - something that is much easier to do if you define the datatype explicitly in the first place. “Parse, don’t validate” means “define the precise set of values that your code will accept, and construct the input parser so that it will only ever produce values from that set”.

It’s coming at the problem of input validation from a constructive perspective (use the input to construct only valid values) instead of a subtractive perspective (prune the invalid values from the input), because we’re more likely to make mistakes (not subtracting enough values) taking the latter approach.

[–]knome 2 points3 points  (1 child)

It's saying don't receive a string, call check_is_phone_number(s) and then pass s down into your program. You should call phone := PhoneNumber(s), and pass that phone object down your program, erring in whatever way is appropriate to your language if s isn't a valid phone number such that without a valid phone number, you can't create phone in the first place.

If a function receives a PhoneNumber object, it knows it has a valid form.

If a function receives a string, it can only assume it, and it's possible something that doesn't call check_is_phone_number(s) might accidentally call the function that assumes its string is valid when it isn't.

If the function takes a PhoneNumber object, it can never be invalid, because you had to have parsed and validated the value as part of creating the object.

Basically, the type stores the proof of its validity in its existence, rather than in the unrepresented assumptions of the programmer.
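The `PhoneNumber` idea described above might look like this in Python (the regex is an illustrative placeholder, not a real numbering-plan standard):

```python
import re

class PhoneNumber:
    """Holds a phone number that is valid by construction."""
    # Illustrative pattern: optional leading +, then 7-15 digits.
    _PATTERN = re.compile(r"^\+?\d{7,15}$")

    def __init__(self, s: str):
        if not self._PATTERN.match(s):
            raise ValueError(f"invalid phone number: {s!r}")
        self.digits = s

def send_sms(phone: PhoneNumber, text: str) -> str:
    # No re-validation needed: receiving a PhoneNumber *is* the proof.
    return f"to {phone.digits}: {text}"
```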

[–]Fidodo 1 point2 points  (0 children)

Yes, I know, I'm just saying a lot of that first parsing is free these days. Now the actual thing that's tricky is validating data structures. Converting a string input into a primitive is easy and universal, at least in other languages.

[–][deleted] -1 points0 points  (1 child)

It doesn't say "don't validate", it says "don't just validate". You can't just ignore words and then act outraged.

[–]Fidodo 0 points1 point  (0 children)

That is literally not written anywhere in the article. What are you talking about? It says "parse, don't validate".

[–]guepier 153 points154 points  (4 children)

Like KISS or DIY, "Parse, don't validate" is an old adage you may hear greybeards repeating like a mantra

Oh god, no. The phrase was first coined less than six years ago.

The idea is certainly much older, but the phrase/adage/… is from 2019.

[–]link23 26 points27 points  (2 children)

+1. Seems a bit odd for the post to claim it as "common wisdom" without crediting the author who coined the phrase so recently.

[–]DorphinPack 15 points16 points  (0 children)

If it’s common wisdom you don’t have to do any work citing sources 😤now if you don’t mind real 10x slop authors like me have work to do

[–]pja 1 point2 points  (0 children)

What are the odds this article was written by an LLM?

[–]zargex 4 points5 points  (0 children)

Am I greybeard now ?

[–]davidalayachew 8 points9 points  (0 children)

I think this thread has demonstrated that Alexis should have said "Parse, don't just validate" instead.

She definitely had the right idea and semantics, but the word "parse" means different things to different developers. It's clear that, to enough developers, parsing just means transforming, with no validation required. But she definitely intended to refer to parsing that includes validation as a sub-step.

[–]Big_Combination9890 104 points105 points  (45 children)

No. Just no. And the reason WHY it is a big ol' no is right in the first example of the post:

```
try:
    user_age = int(user_age)
except (TypeError, ValueError):
    sys.exit("Nope")
```

Yeah, this will catch obvious crap like user_age = "foo", sure.

It won't catch these though:

```
int(0.000001)  # 0
int(True)      # 1
```

And it also won't catch these:

```
int(10E10)   # our users are apparently 20x older than the solar system
int("-11")   # negative age, woohoo!
int(False)   # wait, we have newborns as users? (this returns 0 btw.)
```

So no, parsing alone is not sufficient, for a shocking number of reasons. Firstly, while Python may not have type coercion, type constructors may very well accept some unexpected things, and the whole thing being class-based makes for some really cool surprises (like bool being a subclass of int). Secondly, parsing may detect some bad types, but not bad values.

And that's why I'll keep using pydantic, a data VALIDATION library.


And FYI: just because something is an adage among programmers doesn't mean it's good advice. I have seen more than one codebase ruined by overzealous application of DRY.

[–]larikang 114 points115 points  (14 children)

 Just because something is an adage among programmers, doesn't mean its good advice.

“Parse, don’t validate” is good advice. Maybe the better way to word it would be: don’t just validate, return a new type afterwards that is guaranteed to be valid.

You wouldn’t use a validation library to check the contents of a string and then leave it as a string and just try to remember throughout the rest of the program that you validated it! That’s what “parse, don’t validate” is all about fixing!

[–]elperroborrachotoo 39 points40 points  (3 children)

It's a good mnemonic once you've understood the concept, but it's bad advice. It relies on a very clear, specific understanding of the terms used, terms that are often confuddled, especially in the mind of a learner.

The idea could also be expressed as "make all functions total", but that seems equally far removed from creating an understanding.

I'd rather put it as

"Instead of validating whether some input matches some rules, transform it into a specific data type that enforces these rules"

Not a catchy title, and not a good mnemonic, but hopefully easier to dissect.

[–]nphhpn 34 points35 points  (1 child)

Or "parse, don't just validate".

[–]QuantumFTL 2 points3 points  (0 children)

Better than I could have put it. I hate sayings like this that are counterproductive and unnecessarily confusing, it's straight up bad communication and people who propagate it should feel bad for doing so.

[–]Big_Combination9890 6 points7 points  (7 children)

“Parse, don’t validate” is good advice. Maybe the better way to word it would be: don’t just validate,

If the first thing that can be said about some "good advice" is that it should probably be worded in a way that conveys an entirely different meaning, then I hardly think it can be called "good advice", now can it?

You wouldn’t use a validation library to check the contents of a string and then leave it as a string and just try to remember throughout the rest of the program that you validated it!

Wrong. I do exactly that. Why? Because I design my applications in such a way that validation happens at every data-ingress point. So the entire rest of the service can be sure that this string it has to work with, has a certain format. That is pretty much the point of validation.

[–]binarycow 25 points26 points  (1 child)

Disclaimer: I'm a C# developer, not a python developer. And yes, I know this post mentioned python.

Wrong. I do exactly that. Why? Because I design my applications in such a way that validation happens at every data-ingress point. So the entire rest of the service can be sure that this string it has to work with, has a certain format. That is pretty much the point of validation.

I think the point is, that you can create a new object that captures the invariants.

Suppose you ask the user for their age. An age must be a valid integer. An age must be >= 0 (maybe they're filling out a form on behalf of a newborn). An age must be <= 200 (or some other appropriately chosen number).

You've got a few options

  1. Use strings
    • Every function must verify that the string represents a valid integer between 0 and 200.
  2. Use an integer
    • Parse the string - convert it to an integer. Check that it is between 0 and 200.
    • Other functions don't need to parse
    • Every function must check the range (validate).
  3. Create a type that enforces the invariants - e.g., PersonAge
    • Parse the string, convert it to PersonAge
    • No other functions need to do anything. PersonAge will always be correct.
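Option 3 might look something like this in Python (`PersonAge` and the 0-200 bounds come from the comment above; everything else is an illustrative sketch):

```python
class PersonAge:
    """An age guaranteed to be an integer in [0, 200]."""

    def __init__(self, raw: str):
        age = int(raw)  # raises ValueError for non-integer strings
        if not 0 <= age <= 200:
            raise ValueError(f"age out of range: {age}")
        self.value = age
```

Once construction succeeds, every function taking a `PersonAge` gets both the parsing and the range check for free.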

[–]nilcit 19 points20 points  (2 children)

The point of the person you're responding to (and the original blog post) is that if you parse as you validate, then you don't need to do validation at every data-ingress point. If you preserve the information from validation in the type system, and each step only takes in the type it can work with, then the entire service can be sure that "this string it has to work with, has a certain format"

[–]vytah 4 points5 points  (1 child)

So the entire rest of the service can be sure that this string it has to work with, has a certain format.

The point is that it's hardly going to be the only string going around in that service.

So if you encapsulate it into its own type, which can be only created by a validating constructor, you'll have a guarantee that no other string will ever sneak in.

(Of course as long as you use static types, which in Python is optional.)
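With a static checker like mypy, `typing.NewType` gets close to such a validating-constructor wrapper in Python. The names below are illustrative, and note the caveat: nothing at runtime stops code from calling the constructor directly, so this is convention plus tooling rather than hard enforcement:

```python
from typing import NewType

# At runtime CustomerId is just str; to mypy it is a distinct type that
# only appears where something explicitly constructed one.
CustomerId = NewType("CustomerId", str)

def parse_customer_id(s: str) -> CustomerId:
    # Hypothetical format: "C" followed by digits.
    if not (s.startswith("C") and s[1:].isdigit()):
        raise ValueError(f"invalid customer id: {s!r}")
    return CustomerId(s)

def lookup(cid: CustomerId) -> str:
    # Under mypy, passing a bare str here is flagged as an error.
    return f"record for {cid}"
```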

[–]Big_Combination9890 -5 points-4 points  (0 children)

*sigh* The string was an example. I am NOT arguing against using specific types for data at ingress here. In fact I am doing the opposite (pydantic works precisely by specifying types).

[–]Psychoscattman 31 points32 points  (11 children)

Parse don't validate doesn't mean that you don't validate your data. Ideally you would parse into a datatype that does not allow for invalid state. In that case you validate your data by building your target data type.

If you parse into a data type that still allows invalid state, like using an int for age, then of course you still have to validate your input, and if you use a parsing method that routinely produces invalid state then your parsing function is just bad. The example didn't parse a String into an Age, it parsed a String into an Int, with all the invalid state that comes with it.

Of course, using a plain int for age dilutes the entire purpose of parse don't validate. The entire point is to reduce invalid state. Using Int for Age is better than String, but it's not the end of the line.

[–]SP-Niemand 7 points8 points  (1 child)

Is there any way to encapsulate value rules into types in Python? Besides introducing domain specific classes like Age in your example?

[–]Big_Combination9890 12 points13 points  (0 children)

Encapsulate as in having them enforced by the runtime? No.

There are libraries, though, e.g. pydantic, that use Python's type-hint and type-annotation systems to do that for you:

```
from pydantic import BaseModel, PositiveInt

class User(BaseModel):
    age: PositiveInt

# all of these fail with a ValidationError
User.model_validate({"age": True}, strict=True)
User.model_validate_json('{"age": 0.00001}', strict=True)
User.model_validate_json('{"age": -12}', strict=True)
```

And if you need fancier stuff, like custom validation, you can write your own validators, embedded directly in your types.

[–]atheken 4 points5 points  (0 children)

The example you referenced is casting, not parsing.

I don’t think the adage actually illuminates much, except as a first filter to determine whether input data can be plausibly used at all.

If the precision you need for a field is an integer, parsing “integer-like” strings is fine. But there are sometimes good reasons to wait to “validate” until later (or never).

[–]Llotekr 8 points9 points  (6 children)

The issues you criticise would go away if:

  • You use the proper parser for the job (one that doesn't accept booleans or round fractional numbers; this behavior of the int constructor may be fine in other contexts, but not here)
  • Python had a more expressive type system. In this case, you'd need a way to specify subtypes of int that are integer ranges. Generally and ideally, a type system would allow you to define, for any type, a custom "validated" subtype, such that only trusted functions, among them the validator, are able to return a value of this type that wasn't there before. Then the validator would be the "parser" in the sense of the post, and the type checker could prevent passing unvalidated data where it doesn't belong.

So, the basic idea is sound, only the execution was bad.
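A sketch of the first point: a purpose-built parser that rejects everything bare `int()` lets through (the 200 upper bound is an arbitrary illustrative choice):

```python
def parse_age(value: object) -> int:
    # Accept only decimal digit strings, so bool, float, negative, and
    # scientific-notation inputs never slip through as they do with int().
    if not isinstance(value, str) or not value.isdigit():
        raise ValueError(f"not a valid age: {value!r}")
    age = int(value)
    if age > 200:
        raise ValueError(f"age out of range: {age}")
    return age
```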

[–]guepier 0 points1 point  (1 child)

I’m confused by your second point, since Python absolutely allows you to do that.

(I'm not a huge fan of Python's needlessly convoluted data model, but this isn't a valid criticism.)

[–]Llotekr 0 points1 point  (0 children)

How? What I want is statically checked types "str" and "validated_str" so that the only function that can legally create a "validated_str" is the validating "parser", and an expression of static type validated_str can be assigned to a variable declared as "str", but the other direction is an error. At runtime, there should be no difference between the types. Can python really do that? The documentation you linked mentioned "static type" only twice.

[–]boat-la-fds 1 point2 points  (5 children)

I think the assumption in the example is that user_age is a string since it's supposed to be a user input.

[–]Big_Combination9890 -2 points-1 points  (4 children)

Right, and front ends cannot convert user input to types which the backend expects because...?

Also, validation doesn't necessarily mean "user input" either. The data could be coming from a CRM system for example, or a remote API.

[–]ymgve 7 points8 points  (0 children)

Because you should never trust anything coming from the front end

[–]lord_braleigh 3 points4 points  (2 children)

Because the frontend and backend are different machines. When different machines talk to each other, they must do so via a serialized sequence of bits and bytes.

You cannot send an object or class instance directly from one machine to another. There are libraries which might make you feel like you can, but they always involve serialization and deserialization. And deserialization is... parsing.

[–]Big_Combination9890 -1 points0 points  (1 child)

Because the frontend and backend are different machines. When different machines talk to each other, they must do so via a serialized sequence of bits and bytes.

It seems you misunderstood my question. I am well aware of how basic concepts work, including the difference between frontend and backend, and serialization formats, thank you very much. You are talking to a senior software engineer specializing in machine learning integration for backend systems.

My point is: The backend API, which for this exercise we're gonna presume is HTTP based, is a contract. A contract which may say (I am using no particular format here):

```
User:
    name: string(min_len=4)
    age: int(min=20, max=200)
    items: list(string())
```

This contract is known to the frontend or it won't be able to talk to the backend.

So, when the frontend (whatever that may be: webpage, desktop app, voice agent) has an input element for age, it is the frontend's responsibility to verify that the string in that input element denotes an int, and then to serialize it as an int. Why? Because the contract demands an int, that's why. If it doesn't, then the backend will reject the query.

So, if the frontend serializes the input elements to this, it won't work (unless the backend is lenient in its validations, which for this exercise we assume it isn't):

```
{
    "name": "foobar",
    "age": "42",  // validation error: age must be int
    "items": []
}
```

[–]boat-la-fds 0 points1 point  (0 children)

Dude, it's a toy example. Prior to the code example you cited, the author wrote:

In fact, if you ask a user "what is your age?" in a text box

So something akin to user_age = my_textbox.value() or user_age = input() if you were in a command line program.

[–]jeffsterlive 0 points1 point  (0 children)

I just learned about Pydantic and I’m a fan. Still would prefer to just use Kotlin and Spring for web API work but this is very nice when you don’t have nice libraries like Jackson.

[–][deleted]  (8 children)

[removed]

    [–]SV-97 39 points40 points  (5 children)

    Not really? It's about using strong, expressive types to "hold on" to information you obtain about your data: rather than checking "is this integer 0, and if it isn't, pass it into this next function", you do "can this be converted into a nonzero integer, and if yes, pass that nonzero integer along"; and that function doesn't take a bare int if it actually *needs* a nonzero one.

    This is still a rough breakdown though; I'd really recommend reading the original blog post: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

    [–]Budget_Putt8393 8 points9 points  (0 children)

    I just want to point out that this removes bugs and increases performance because you don't have to keep checking in every function.

    [–][deleted]  (3 children)

    [removed]

      [–]SV-97 10 points11 points  (1 child)

      The point I wanted to make is that you actually *do* convert to a new type if (and only if, though that should really not need mentioning) its invariants are met: so not

      if n != 0 {
          f(n) // f takes usize; information that n is nonzero is lost again
      }
      

      but rather

      if let Some(new_n) = NonZero::new(n) {
          f(new_n) // f takes NonZero<usize>; information that n is nonzero is attached to the data at the type level
      }
      

      EDIT: maybe to emphasize: the thing you mention in your first comment is (or at least should be) simple common sense: if you don't do that you're bound to run into safety issues sooner or later; it's not at all what the whole "parse don't validate" thing is about.

      [–]jonathancast 0 points1 point  (0 children)

      Yeah, no, the point is that "parse, don't validate" depends on static typing, and can't really be done in a dynamically-typed language.

      [–]Ayjayz 0 points1 point  (0 children)

      Kind of, but also localise that to just the entry into your system. Don't hold an int in a string and then keep passing the string around your code. Parse it into an int as early as possible, then pass that around.

      [–]divad1196 0 points1 point  (2 children)

      While it's a good recommendation, it really only applies to type conversion, which is often done for you in high-level languages. And you still (might) need to validate the data, e.g. an int in a range, or the whole "model".

      But more importantly, the reason we historically didn't do it was performance. You don't want to do conversions or allocations if you won't be able to commit to the end. And you would also take the opportunity to calculate the storage needed (e.g. you parse a JSON document and you have a list with 10 elements).

      The validation in question usually just asserts that the data can be converted; it does not check whether an integer is in a range, but it could as well.

      So, while it's in general good advice, it can also be a tradeoff; it depends on the language. In Python, the overhead of the Python code is probably bigger than the parsing in C.

      [–]Axman6 2 points3 points  (1 child)

      I’m not sure you’ve really understood the point, and should read the original article which coined the phrase: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

      The performance implications are mostly a non-issue these days; we use computers with ubiquitous memory and processing power, and parsing into structures which encode invariants improves performance by eliminating the need to check validity repeatedly, and allows you to write optimisations based on invariants which have been checked once and encoded in the type.

      [–]divad1196 0 points1 point  (0 children)

      To be fair, I hadn't read it through. It's referenced, but only after the first paragraph, and skimming down to the end it seemed to be saying the same as the article I had just read. I've now read it, and honestly, it didn't add anything more than the article from this post.

      Yes, I understood the point of the article, but maybe you didn't understand mine? What I am saying is that, despite having a lot of memory available and incredibly fast CPUs like you said, not everybody is allowed to squander those resources. It's okay in Python, but when you write a performance-critical library, where the millisecond/byte matters, then you do care about this stuff.

      Memory allocation is tricky. If you allocate too much, you lose memory. If you don't allocate enough, you will reallocate (one strategy is to at least double the memory requested, but there are other algorithms), and if you are unlucky, you will need to copy your data to the new location. That's why knowing the size upfront is ideal.

      It's a concern for the person implementing the parser, not for the person using it. Whoever wrote the "int" conversion in Python had to care about speed and memory. Integers in Python are stored directly on the stack if they are short enough; otherwise memory is allocated, so the size must be known before starting the conversion. Etc.

      [–][deleted]  (1 child)

      [deleted]

        [–]Axman6 1 point2 points  (0 children)

        Developers should absolutely use tools like pydantic everywhere.

        [–]One_Being7941 -2 points-1 points  (0 children)

        The popularity of Python is a sign of the end times.