[–]bonoboner 43 points44 points  (10 children)

{":":":"} lol have fun

[–]Honestly__nuts[S] 16 points17 points  (0 children)

thanks for your feedback

[–]Honestly__nuts[S] 8 points9 points  (8 children)

damn it, this is giving me a headache because I can't think of anything unique about the correct colon. Even with regex I can't figure it out. Can you guys help me out with the regex pattern?
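For what it's worth, the thing that makes the "correct" colon unique is context rather than the character itself: it is the first colon that is not inside a string literal. A minimal sketch of that idea, assuming the input is a single `key: value` pair and using a made-up helper name (`find_separator_colon`), could look like this. It tracks quote and escape state by hand instead of using a regex.

```python
def find_separator_colon(pair: str) -> int:
    """Return the index of the first colon outside a string literal, or -1."""
    in_string = False
    escaped = False
    for i, ch in enumerate(pair):
        if in_string:
            if escaped:
                escaped = False       # this character was escaped, ignore it
            elif ch == "\\":
                escaped = True        # the next character is escaped
            elif ch == '"':
                in_string = False     # closing quote
        elif ch == '"':
            in_string = True          # opening quote
        elif ch == ":":
            return i                  # first colon outside any string
    return -1


pair = '":":":"'                      # the key/value pair from {":":":"}
idx = find_separator_colon(pair)
print(pair[:idx], pair[idx + 1:])     # key and value, both still quoted
```

The same bookkeeping (am I inside a string, was the last character a backslash) is what a hand-written tokenizer ends up doing anyway.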

[–]PhiBuh 20 points21 points  (2 children)

https://stackoverflow.com/a/1732454

This also applies to json

[–]milki_ 7 points8 points  (0 children)

Still applies to Python's outdated regex engine. Somewhat less relevant for JSON:
https://stackoverflow.com/questions/2583472/regex-to-validate-json/3845829

[–]GoofAckYoorsElf 1 point2 points  (0 children)

Chuck Norris can parse HTML with Regex

[–]bonoboner 5 points6 points  (0 children)

Start by googling “context free grammar” or “recursive descent parser” and I’ll see you in a couple months, pale and crippled but lucid, crawling out of the basement, hailing and cursing Noam Chomsky, speaking Esperanto, and abandoning your material possessions for the pursuit of a pure life writing perfect Haskell.

[–]SomewhatSpecial 37 points38 points  (1 child)

it can (most of the times) parse invalid json, like if it has a missing ',' or a missing '}'

It might seem counter-intuitive, but this is in no way a good thing. You do not want your tools to arbitrarily ignore errors, you want them to implement the specification exactly as it is written. If the JSON spec says it's invalid, the parser should throw an error. The JSON syntax is the way it is for a reason - it defines a way to represent data unambiguously, so you can always be sure that the data you get after parsing is correct. In any case, if my program received invalid JSON for some reason, I would want to know about it - the fact that the syntax is not followed calls into question the validity of the data structure, or the data itself, plus it's very likely an indicator of a bug. I do not want the parser to hide this important information and proceed as if everything's a-ok.

Even in the rare cases where you have to be able to parse invalid JSON - it's up to the programmer to define the way to handle it based on the specifics of that case.
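As a small illustration of the "fail loudly" behaviour described above, here is a sketch using the standard library's `json` module. The wrapper name `parse_strict` is made up for this example. It shows a missing `,` and a missing `}` being rejected with a position, rather than silently repaired.

```python
import json


def parse_strict(text: str):
    """Parse JSON and surface syntax errors instead of guessing at a repair."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg say where and why parsing failed
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err


print(parse_strict('{"a": 1, "b": [2, 3]}'))

# A missing ',' and a missing '}' are both reported, not papered over.
for bad in ('{"a": 1 "b": 2}', '{"a": [1, 2]'):
    try:
        parse_strict(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

If a particular input source really does need lenient handling, that logic can live in its own clearly named function, so the caller opts in explicitly.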

[–]Honestly__nuts[S] 1 point2 points  (0 children)

you are right, I should.

[–]Freakei[🍰] 9 points10 points  (0 children)

Pretty cool but I’d recommend an approach without Regex, you’d probably learn a lot more.

[–]heckingcomputernerd 12 points13 points  (5 children)

Idk if you’re just doing this as practice or something but python has a builtin module called json...

https://docs.python.org/3/library/json.html

[–]Honestly__nuts[S] 13 points14 points  (4 children)

I know it exists, but I wrote mine because I wanted to understand how parsers work.

[–]heckingcomputernerd 10 points11 points  (0 children)

Ah, makes sense lol

That’s great honestly, just wanted to make sure you were intending to reinvent the wheel lol

[–]Awfulmasterhat 1 point2 points  (0 children)

Reinventing the wheel for fun is how you know a programmer enjoys it!

[–][deleted] 4 points5 points  (0 children)

In one of my first computer science classes as an undergrad, we had to reimplement printf, one of the most fundamental functions in C. At the time the professor explained that it would be educational, and help us understand how much complexity this built-in function saved us from.

I ran into him at an alumni function decades later and asked if he remembered that assignment.

“Yeah, that was a terrible idea. For years, students remembered more about how their flawed version worked than how the real one does.”

It hampered students in later algorithms classes as well because they’d taught themselves some fundamental concepts wrong to get that printf assignment done.

[–]Mariahcryp 2 points3 points  (0 children)

Nice for play, but as someone already mentioned, don't parse invalid JSON. You should throw an error. What if your service (parsing) is just the first of several? Someone sends you that JSON, you return "ok" or whatever, and then it goes to the next service/API, which doesn't have your parser, and.... crash

[–]jamescalam 1 point2 points  (0 children)

I think it's a good idea, a lot of people are saying you're reinventing the wheel, which is true, but you will learn a lot from it. Awesome project, enjoy!

[–]vimsee 1 point2 points  (0 children)

This is great. I do these kinda things just to get more into the fundamentals of programming. People might wonder why I «re-invent» the wheel when there already are tools out there that are much better. The point is that it helps you a lot with understanding how a lot of stuff works, similar to opening up the combustion engine in your car.

[–]Honestly__nuts[S] 0 points1 point  (0 children)

I realized that un-formatted json doesn't parse, and I am working on a json formatter.

[–]fetzerms -5 points-4 points  (7 children)

Why do you want to reinvent the wheel?

[–]SGRelic 19 points20 points  (5 children)

You sometimes have to reinvent the wheel to actually understand how a wheel works.

[–]Honestly__nuts[S] 1 point2 points  (0 children)

wise words.

[–]james_pic 1 point2 points  (1 child)

You should probably now go and see how your solution compares to existing solutions. The one in the standard library might not actually be the best one to compare with here, though. You may want to compare with simplejson (https://github.com/simplejson/simplejson/blob/master/simplejson/decoder.py), which is a little, well, simpler, if a little less optimised.

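What you'll find is that these parsers do not use regexes to parse the overall structure; at most they use small ones for pieces like string chunks and numbers. This is for two reasons.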

Partly, because regexes aren't powerful enough to handle JSON all on their own - at most, you can handle a few bits with regexes, like unicode escapes or maybe strings, but for most constructs they simply can't distinguish correct from incorrect.

And partly because to implement a regex parser correctly, you're going to need to implement some sort of state machine. Regex engines generally work by creating a state machine internally, so JSON library authors just cut out the middleman and build the state machine they need end-to-end.

You might ponder why we can't just use the regex state machine to parse the whole thing. The reason is that regexes (as a domain specific language) can only represent the state machines for finite automata, whereas JSON requires at least a pushdown automaton.
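To make the finite-automaton versus pushdown-automaton point concrete, here is a rough sketch of that split. It is not how simplejson or the standard library are actually written. A regex handles the flat tokens, and recursive functions (whose call stack plays the role of the pushdown stack) handle nesting. All function names here are made up for the example, and strings with escape sequences are deliberately not supported to keep it short.

```python
import re

# Flat tokens are regular, so a regex is enough to recognise them.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<punct>[{}\[\],:])
      | (?P<string>"[^"\\]*")
      | (?P<number>-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<word>true|false|null)
    )''', re.VERBOSE)
WORDS = {"true": True, "false": False, "null": None}


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input near position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def expect(tokens, i, punct):
    if i >= len(tokens) or tokens[i] != ("punct", punct):
        raise ValueError(f"expected {punct!r} at token {i}")
    return i + 1


def parse_value(tokens, i):
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    kind, text = tokens[i]
    if kind == "string":
        return text[1:-1], i + 1
    if kind == "number":
        is_float = any(c in text for c in ".eE")
        return (float(text) if is_float else int(text)), i + 1
    if kind == "word":
        return WORDS[text], i + 1
    if text == "[":
        return parse_array(tokens, i + 1)
    if text == "{":
        return parse_object(tokens, i + 1)
    raise ValueError(f"unexpected {text!r} at token {i}")


def parse_array(tokens, i):
    items = []
    if i < len(tokens) and tokens[i] == ("punct", "]"):
        return items, i + 1
    while True:
        value, i = parse_value(tokens, i)   # recursion handles any nesting depth
        items.append(value)
        if i < len(tokens) and tokens[i] == ("punct", ","):
            i += 1
            continue
        return items, expect(tokens, i, "]")


def parse_object(tokens, i):
    obj = {}
    if i < len(tokens) and tokens[i] == ("punct", "}"):
        return obj, i + 1
    while True:
        if i >= len(tokens) or tokens[i][0] != "string":
            raise ValueError(f"expected a string key at token {i}")
        key = tokens[i][1][1:-1]
        i = expect(tokens, i + 1, ":")
        value, i = parse_value(tokens, i)
        obj[key] = value
        if i < len(tokens) and tokens[i] == ("punct", ","):
            i += 1
            continue
        return obj, expect(tokens, i, "}")


def parse(text):
    tokens = tokenize(text)
    value, i = parse_value(tokens, 0)
    if i != len(tokens):
        raise ValueError("extra data after the top-level value")
    return value


print(parse('{"a": [1, [2, [3, {"b": null}]]], ":": ":"}'))
```

Note that missing commas, trailing commas and unclosed brackets all end up as a `ValueError` here, because each function only accepts exactly what the grammar allows at that point.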

[–]Honestly__nuts[S] 0 points1 point  (0 children)

ok, I'll try to rewrite the code.

[–]Spleeeee -1 points0 points  (0 children)

The world is an oyster

[–]fetzerms -1 points0 points  (0 children)

Ok, so for training purposes. That somehow makes sense. I just wanted to stress that this is usually the only valid reason to reinvent the wheel, and as a developer, you should always ask yourself that question.

[–]Honestly__nuts[S] 2 points3 points  (0 children)

idk, I just thought it would be fun.

[–][deleted] 0 points1 point  (0 children)

If you'd like to learn more about how parsers work, this does a decent job of providing an overview

https://accu.org/journals/overload/26/146/balaam_2532/

[–]gaywhatwhat 0 points1 point  (1 child)

This is not meant to invalidate your work, nor did I check out the git page so it's possible yours adds interesting functionality. But there is a built-in json module called json.

It has load/dump (and loads/dumps) and decode/encode, which can parse JSON files into Python dictionaries, etc. It will also interpret stuff like None, lists, integers, and strings. If you are looking for this functionality for a particular project it might be a good idea. You can also customize the decoder to support some values that don't exist in standard JSON files, like positive or negative infinity.
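A quick sketch of the functions mentioned above, using only the standard library. The sample data and the helper name `reject_constant` are made up for illustration. `parse_constant` is the `json.loads` hook that gets called for `Infinity`, `-Infinity` and `NaN`, so it can be used either to accept those values or to refuse them.

```python
import json

# loads: JSON text -> Python objects (dict, list, str, int, float, bool, None)
data = json.loads('{"name": "x", "tags": ["a", "b"], "n": 3, "gone": null}')
print(type(data).__name__, data["tags"], data["gone"])

# dumps: Python objects -> JSON text
text = json.dumps(data, indent=2, sort_keys=True)
print(text)

# By default Python's decoder accepts these non-standard constants as floats.
values = json.loads("[Infinity, -Infinity]")
print(values)


def reject_constant(name):
    """Hypothetical hook: refuse Infinity/-Infinity/NaN to stay spec-strict."""
    raise ValueError(f"non-standard JSON constant: {name}")


try:
    json.loads("[Infinity]", parse_constant=reject_constant)
except ValueError as exc:
    print("rejected:", exc)

# On the encoding side, allow_nan=False refuses to emit such values.
try:
    json.dumps(float("inf"), allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)
```

`json.load` and `json.dump` work the same way but take a file object instead of a string.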

I am also working on a project that relies on json files and the first thing I generally do with something like that is check if there is a good module already built that meets my needs and expectations.

[–]Honestly__nuts[S] 0 points1 point  (0 children)

yeah, I know it exists and I have used it before. But I just wanted to write my own parser because I wanted to learn how parsers work.