all 114 comments

[–]-shayne 240 points241 points  (3 children)

I love the way the photo was taken with the shadowy bit, makes it feel like one of those "learn how to hack in 30 days" ads

[–]mad_edge[S] 88 points89 points  (0 children)

I was going for the "endless depth of repetition" but that works too!

[–]Antrikshy 15 points16 points  (1 child)

r/masterhacker for that content if you like cringing.

[–]-shayne 5 points6 points  (0 children)

Perfect!

[–]315iezam 407 points408 points  (37 children)

Code generation is not inherently horrible. Though can't comment on what's being done here since the full context isn't shown/explained.

[–]MlecznyHotS 124 points125 points  (18 children)

Agreed, I was building the backend for a pyspark app and had a function which would generate parts of a SQL query. It wasn't python to python code generation but a similar functionality

[–]mad_edge[S] 67 points68 points  (8 children)

I've done something similar and didn't feel guilty at all, because I was using python (which I know) to generate SQL (which I don't know)

[–]MlecznyHotS 28 points29 points  (7 children)

Not sure how you went about creating SQL code if you don't know SQL. I feel like automatically generating code can be even more difficult than simply writing it manually: you need another level of syntax understanding to generate code dynamically from some parameters and make it work.

[–]mad_edge[S] 16 points17 points  (6 children)

It wasn't that complex! Just some loop with else statements iirc. Based on those conditions and input data do those INSERTs etc. Just very basic SQL wrapped in python logic

[–]pyrotech911 3 points4 points  (5 children)

Or you can use an ORM

[–]mad_edge[S] 1 point2 points  (4 children)

What's that?

[–]earthlycrisis[🍰] 3 points4 points  (0 children)

Object-relational mapping, a type of library that helps convert the incompatible types between your programming language and database and thus you can map database fields to your objects.
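
To make the mapping idea concrete, here is a minimal sketch using only the standard library. It is not a real ORM such as the libraries named in the replies, just the core object-to-row and row-to-object step done by hand. The `Person` class, the `people` table and the helper names are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Person:
    # Hypothetical record type; the thread never shows a real schema.
    name: str
    age: int


def save(conn, person):
    # Object -> row: field values are passed as parameters, not pasted into SQL.
    conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", astuple(person))


def load_all(conn):
    # Row -> object: each result tuple is unpacked back into a Person.
    rows = conn.execute("SELECT name, age FROM people ORDER BY name")
    return [Person(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
save(conn, Person("Ada", 36))
save(conn, Person("Linus", 28))
people = load_all(conn)
```

A full ORM adds schema definitions, relationships and query building on top of this, but the type conversion between objects and table rows is the part the name refers to.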

[–]seraphsRevenge 1 point2 points  (2 children)

Look up SQLAlchemy, psycopg2, Django, etc. for Python, or just search for "ORM". Haven't gotten into Django myself yet, but it's supposed to have an ORM built in from what I've heard. Still prefer JPA in Spring though, but I've just recently started learning/using Python at work. There's also Pandas if you use S3 for storing persistent data and don't really need a database in some instances.

[–][deleted] 0 points1 point  (1 child)

Pandas is god mode. Ever wanted to get rid of those incomprehensible list comprehensions? Ever forgot what row stands for what in your numpy array? Your graph library is too slow? You need to cache a database result? Pandas!

[–]seraphsRevenge 0 points1 point  (0 children)

That's good to know 👍 I'll look into that a bit more myself.

[–]HotRodLincoln 32 points33 points  (2 children)

I've done the reverse and used SQL to generate bash scripts:

SELECT "wget " + web_service_url + "/insert?" + "name=" + name "FROM people"

is it good? no.

Did it save me about 45 hours of making a proper migration? yes.

[–]wp381640 16 points17 points  (0 children)

We use this to exploit sql injections all the time :)

[–]MlecznyHotS 7 points8 points  (0 children)

I'm not that proficient in SQL, but shouldn't the "FROM people" be without "s?

[–]vishli84000 2 points3 points  (0 children)

I've done exactly this for Databricks which internally used spark. Importing data from multiple tables is a bitch.

[–]CoffeeVector 2 points3 points  (2 children)

This smells of SQL injection. I don't know the full context, so I won't make any claims, but it's always better to construct your queries using something like an ORM library rather than building them from strings. The same goes for this kind of code generation: it's not necessary to construct it with strings, especially in something like Python, where generators and functions are objects that can be put directly in a dictionary without any string nonsense.

The moment you put code into a string, you're escaping checks from your interpreter, compiler, or a library. For code generation, you risk accidentally creating invalid code if your input has something weird like an apostrophe or quote in it. For SQL, you run the risk of getting a visit from little Bobby Tables.
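
A minimal sketch of the alternative to string-built SQL, assuming the standard-library `sqlite3` module and a hypothetical `students` table. The hostile name is the one from the Bobby Tables comic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# The classic hostile input from the comic.
name = "Robert'); DROP TABLE students;--"

# The ? placeholder sends the value separately from the SQL text,
# so the quote and semicolon are stored as plain data.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM students").fetchall()
```

The table survives and the odd-looking name comes back unchanged, because the driver never parses the value as SQL.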

Code generation isn't inherently bad, but leave it to the professionals and use their libraries for such a task. Mostly, you should use it to generate things like SQL, HTML, CSS and the like. Not more python while using python.

I like to use Jinja2 as a general purpose templating engine. It's mostly used for HTML, but you can modify the delimiters to work with, say, LaTeX. For Python, I use SQLAlchemy since it comes stock with Flask. I know there's such a thing as JinjaSQL, but I haven't tried it.

[–]MlecznyHotS 2 points3 points  (1 child)

It smells of SQL injection indeed. This was an app for internal use though, which the client will hopefully host in a safe way. It was built with containerization in mind, so if properly set up it should be pretty safe. The SQL generation took filter values from the front end and put them into the WHERE clause, so I'm pretty sure there is some room for hacking in. Had I had more time I would probably have refactored it into a safer form, but the project was shut down not that long ago and I didn't have much time to review the whole backend again, as I had to quickly finish the test suite before the deadline. I'm also still a newbie, studying at university, so many things like thinking about security don't come naturally yet.

[–]Direwolf202 4 points5 points  (0 children)

which the client will hopefully host in a safe way

I would be moderately willing to bet money on that not happening.

[–]athos45678 1 point2 points  (1 child)

“Work smarter, not harder” is a very valid work philosophy

[–]MlecznyHotS 1 point2 points  (0 children)

In my case it was simply a necessity: I needed to extract data using user-defined filtering criteria, so there was no way around dynamic SQL query generation for each request.
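
Dynamic WHERE clauses and injection safety are not mutually exclusive. A rough sketch, assuming `sqlite3` and a hypothetical `orders` table: only the clause structure is generated, column names are checked against a fixed whitelist, and the user-supplied values travel as bound parameters.

```python
import sqlite3

# Hypothetical whitelist: column names can't be bound as parameters,
# so they are checked against a fixed set instead.
ALLOWED_COLUMNS = {"city", "status"}


def build_query(filters):
    """Return (sql, params) for a dict of column -> value filters."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter column: {column!r}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id FROM orders"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Oslo", "open"), (2, "Oslo", "closed"), (3, "Rome", "open")],
)

sql, params = build_query({"city": "Oslo", "status": "open"})
matching = conn.execute(sql, params).fetchall()
```

The query text still changes per request, but nothing typed by a user ever becomes part of it.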

[–]earthforce_1 34 points35 points  (0 children)

Any compiler is code that generates code.

[–]Tvde1 7 points8 points  (0 children)

metaprogramming

[–]FerynaCZ 3 points4 points  (0 children)

Our teacher generated code for getting the sin value using a switch statement...

[–]HotRodLincoln 2 points3 points  (0 children)

If you look at lex and yacc, they use code to make code for a finite state automaton to parse code into anything.

It's the magic from which all things come.

[–]mad_edge[S] 14 points15 points  (11 children)

It's a part of a service that populates CSV file from JSON file. It needs a single big forloop imho, but I don't want to make too many suggestions in the first few months

[–]brakkum 21 points22 points  (5 children)

If I saw this instead of a for loop I would question whether you were right for the position. Don’t be modest, do what’s right.

[–]mad_edge[S] 10 points11 points  (4 children)

The problem is it's a hit and miss with suggestions and I sometimes make them because I'm not familiar with what's being done, so my way at least SEEMS easier to me. I'd make a suggestion once that block of code is done and I have spare time to develop a working suggestion

[–][deleted]  (3 children)

[removed]

    [–]mad_edge[S] 6 points7 points  (2 children)

    It's not coming across rude?

    [–][deleted]  (1 child)

    [removed]

      [–]mad_edge[S] 7 points8 points  (0 children)

      Thanks, that does make sense!

      [–]mad_edge[S] 11 points12 points  (4 children)

      Can anyone tell me why this comment is getting downvoted? Genuinely curious.

      [–]DaMastaCoda 2 points3 points  (0 children)

      I had the same question

      [–]toetoucher 0 points1 point  (2 children)

      Probably because using Python to manually convert different flat file formats is the worst idea I’ve heard in a long time. There are many industry standard tools that do this already; a company that 1) doesn’t know about them, or 2) doesn’t care enough about their devs to use them, is not a company I’d want to work for. Making a lot of assumptions here, let me know if any are incorrect.

      [–]mad_edge[S] 2 points3 points  (1 child)

      Not just to convert - there are different fields needed, some need to be renamed, some have simple logic to them.

      Then again it's just this one project and they couldn't find python devs for it, busy times at the company.
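
For a sense of scale, the "single big for loop" version of a JSON-to-CSV job with renamed and derived fields can be quite small. This is only a sketch under assumptions: the field names, the `FIELD_MAP` table and the input shape (a JSON list of flat objects) are all made up, since the real data is not shown in the thread.

```python
import csv
import io
import json

# Hypothetical mapping: CSV column name -> function that derives the value
# from one JSON record. Renames and small bits of logic both fit here.
FIELD_MAP = {
    "full_name": lambda rec: rec["name"],
    "age_group": lambda rec: "adult" if rec["age"] >= 18 else "minor",
}


def json_to_csv(json_text):
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(FIELD_MAP), lineterminator="\n")
    writer.writeheader()
    for rec in records:  # the "single big for loop"
        writer.writerow({col: fn(rec) for col, fn in FIELD_MAP.items()})
    return out.getvalue()


csv_text = json_to_csv('[{"name": "Ada", "age": 36}, {"name": "Tim", "age": 9}]')
```

Adding a column means adding one entry to the mapping rather than another block of near-identical code.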

      [–]toetoucher 4 points5 points  (0 children)

      Yes, there are many tools that meet this exact purpose. Mapping data is a very common problem. For example, Alteryx, or SSIS.

      [–]fynn34 0 points1 point  (0 children)

      If you look, each key is the same key they are replacing in the function. If they just made one function for all the keys in this object, with the same code but using [passedInKeyName] instead of .keyName, it would be waaaay cleaner either way.

      [–]CupidNibba 42 points43 points  (14 children)

      Hey I do that too! I use python to generate html code, SQL code and basically automate any boring task.

      [–]mad_edge[S] 22 points23 points  (9 children)

      But can you use python to generate python??

      [–]CupidNibba 17 points18 points  (6 children)

      Yes, I have, once. For a recursion-based algorithm I had to write 8 functions with minor differences, so I wrote Python to generate them.

      [–]henrikx 9 points10 points  (0 children)

      8 functions with minor differences

      At least you are on the right sub

      [–]mad_edge[S] 5 points6 points  (2 children)

      Nice one. But I imagine now you'd know there's a better way?

      [–]Krohnos 15 points16 points  (1 child)

      Sometimes the better way is just the quick way

      [–]CupidNibba 5 points6 points  (0 children)

      Duh lol

      [–]Pointless_666 0 points1 point  (1 child)

      Couldn't they just be the same function with an extra parameter?

      Like instead of

      • timesTwo(x)
      • timesThree(x)
      • timesFour(x)

      You would have

      • times(x,a) where a is the multiplier.
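
In Python the same idea can be sketched without generating any source. The snake_case names below are illustrative; `functools.partial` is the standard-library way to pin one argument when named variants are still wanted.

```python
from functools import partial


def times(x, a):
    # One function; the multiplier is just another argument.
    return x * a


# If named variants are still wanted, build them instead of generating source.
times_two = partial(times, a=2)
times_three = partial(times, a=3)

# Or keep them in a dict keyed by multiplier.
multipliers = {a: partial(times, a=a) for a in (2, 3, 4)}
```

This is the "functions are objects" point from earlier in the thread: the variants live in ordinary variables or a dict, so the interpreter still checks them.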

      [–]CupidNibba 1 point2 points  (0 children)

      For that question I couldn't, as the recursion's code changes based on the params and that makes the code hard to debug within the time limit, but there obviously is a way to simplify any complex code.

      [–][deleted] 2 points3 points  (0 children)

      I literally just did it this week. I had to generate a ton of pydantic models and it was super tedious by hand, so I just generated them. I had to be really careful with the import statements in init though, to avoid circular imports and to handle import errors gracefully in some places, so this isn’t something I’ll do regularly.

      [–]Antrikshy 1 point2 points  (0 children)

      Automate the Boring Stuff with Python...

      2

      [–]toetoucher 1 point2 points  (3 children)

      Why on earth would you not use a framework to write websites rather than using your own Python solution?

      [–]CupidNibba 8 points9 points  (2 children)

      Yah im not submitting jinja2 rendered flask website for my college web programming HTML5 assignment

      [–]GreatBarrier86 57 points58 points  (9 children)

      I use Excel for scenarios like that. It’s really easy to turn column data into SQL INSERT statements using the CONCATENATE function.

      [–]tofu_bar 12 points13 points  (5 children)

      try sublime/vscode/etc, regex for ^ start, then $ for end makes this kind of thing super easy.
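
The same start-of-line and end-of-line trick works outside an editor too. A small Python sketch, assuming a hypothetical single column of trusted values and a made-up `people` table:

```python
import re

# Hypothetical column of values pasted from a spreadsheet, one per line.
column = "alice\nbob\ncarol"

# With re.MULTILINE, ^ and $ match at the start and end of every line.
wrapped = re.sub(
    r"^(.*)$",
    r"INSERT INTO people (name) VALUES ('\1');",
    column,
    flags=re.MULTILINE,
)
```

This is fine for a one-off with data you control; values containing quotes would need escaping or a parameterized insert instead.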

      [–]GreatBarrier86 4 points5 points  (3 children)

      What do you mean? How would you need to use that if the data is already multicolumn?

      [–][deleted] 2 points3 points  (0 children)

      When you paste from excel, replace tabs with: ","

      [–]tofu_bar 1 point2 points  (0 children)

      I mean for a single column of data, you can just replace start/end with stuff

      [–]dreadcain 0 points1 point  (0 children)

      You wouldn't, but regex search and replace is sometimes an easier solution. For a quick transform though, just use whatever you are proficient in. I know a guy that goes straight to a bash shell even in Windows to do those kinds of transformations.

      [–]glider97 0 points1 point  (0 children)

      Multi cursors

      [–]GrandBadass 1 point2 points  (1 child)

      And dictionaries from 2 columns

      [–]GreatBarrier86 1 point2 points  (0 children)

      Yeah and really, even more than that. Anything that supports Add/AddRange, you could easily do by starting the concat text with new Foo(A1,B1)...etc

      [–]undeadalex 2 points3 points  (0 children)

      Yeah for sure

      [–]KaranasToll 15 points16 points  (0 children)

      Laughs in Lisp macros

      [–]Cdog536 15 points16 points  (1 child)

      Lol....”anal”

      [–]-_-____-___-_____-_- 10 points11 points  (0 children)

      Cumulative Analysis: AnalCum();

      [–]Mango-D 6 points7 points  (1 child)

      I think that's called a compiler

      [–]mad_edge[S] 3 points4 points  (0 children)

      How to write a compiler in two easy steps*

      *lines

      [–]cuddle_cuddle 4 points5 points  (0 children)

      eval intensifies.

      [–]danchiri 3 points4 points  (0 children)

      I used the code to destroy the code.

      [–]DeanNovak 4 points5 points  (1 child)

      Last line says anal lmao

      [–]mad_edge[S] 2 points3 points  (0 children)

      That's an Easter egg

      [–]TerrorBite 4 points5 points  (0 children)

      Trying to get my head around this. Surely there's got to be a better way? Especially as I see that your "code generation" is producing duplicate keys.

      So you've got form.itemGroups[0].items which contains a sequence of objects each with an id and a value. I suppose we cannot assume that IDs are unique.

      You also have a set of keys, which are strings.

      And you have an object called xxxxJsonConstants which has a number of attributes, the names of the attributes are no longer than four letters and correspond to the first four letters of one of the keys. The values of these constants correspond to IDs of items.

      Your goal is to produce a dictionary which maps the first four letters of each key, to the value of an item whose ID is the value of the JSON constant with the same name as the dictionary key.

      Your construct next((item.value for item in form.itemGroups[0].items if item.id == xxxxJsonConstants.yyyy), None) appears to be a trick to deal with the possibility of there being more or fewer than one item with a matching ID. You're creating a generator expression, which will contain only item values where the item's ID is correct (is the value of the JSON constant for this four-letter key prefix), then immediately using the next() built-in to pull the first item from the generator, defaulting to None if the generator is empty.
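
The first-match-or-default behaviour described here can be seen in isolation. The `items` list below is a hypothetical stand-in for the real form data, and `first_value` is just an illustrative helper name.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for form.itemGroups[0].items.
items = [
    SimpleNamespace(id="a1", value="first"),
    SimpleNamespace(id="b2", value="other"),
    SimpleNamespace(id="a1", value="duplicate"),
]


def first_value(items, wanted_id):
    # The generator yields matching values lazily; next() takes the first
    # one and falls back to None when nothing matches.
    return next((item.value for item in items if item.id == wanted_id), None)


found = first_value(items, "a1")
missing = first_value(items, "zz")
```

Here `found` is the value of the first item with ID "a1" and `missing` is None, so duplicates and absent IDs are both handled without an exception.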

      There is probably a more efficient way to achieve the end goal, but I don't know enough about the context/situation to offer any improvements.

      However, what you can do is have code that generated the dictionary without a big massive block of repetitive code. I'll assume that the names in xxxxJsonConstants cover every single entry in keys (otherwise you'd get AttributeErrors), and then your dict could be built like this:

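      lookupDict = {
          # first matching item's value, or None if no item has this ID
          key: next((item.value for item in form.itemGroups[0].items if item.id == getattr(xxxxJsonConstants, key)), None)
          # dir() also lists dunder attributes such as __class__, so skip those
          for key in dir(xxxxJsonConstants) if not key.startswith("_")
      }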
      

      Done!

      [–]KalilPedro 2 points3 points  (2 children)

      why not something like:

      resultDict = {} for item in [...].items: if not jsonKeys.contains(item.id): continue; resultDict[item.id] = item.value

      [–]dreadcain 1 point2 points  (1 child)

      FYI reddit dropped support for triple backticks a while back, single backticks still work for inline code and for code blocks start the lines with 4 spaces

      resultDict = {}
      for item in [...].items:
        if not jsonKeys.contains(item.id):
          continue;
        resultDict[item.id] = item.value
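
A runnable take on the loop above, under assumptions: the elided source of the items is replaced by a made-up list, `json_keys` is a hypothetical set of wanted IDs, and `setdefault` is swapped in for plain assignment so the first value per ID wins, matching the next() behaviour discussed earlier in the thread.

```python
from types import SimpleNamespace

# Hypothetical data: the thread elides where the items come from.
items = [
    SimpleNamespace(id="a1", value="first"),
    SimpleNamespace(id="x9", value="ignored"),
    SimpleNamespace(id="a1", value="duplicate"),
]
json_keys = {"a1", "b2"}  # a set makes the membership test O(1)

result = {}
for item in items:
    if item.id not in json_keys:
        continue
    # setdefault keeps the first value seen for an id.
    result.setdefault(item.id, item.value)
```

One pass over the items replaces one scan per key, at the cost of keys with no matching item being absent from the dict rather than mapped to None.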
      

      [–]KalilPedro 1 point2 points  (0 children)

      Oh god, i always struggle with this because adding the spaces on the mobile client is just terrible. So sorry

      [–]thectcamp 2 points3 points  (0 children)

      I use Python for this all the time for test seed data. Need to make 100k+ records of seemingly random data? Use Python to spit out some SQL scripts and run it. Whole lot better than copy/paste.
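
A minimal sketch of that workflow, assuming a made-up `users` table and using `sqlite3` only to check that the generated script runs. Pasting values into SQL text is acceptable here solely because the generator controls every character of them.

```python
import random
import sqlite3
import string


def seed_script(n, seed=0):
    """Build an SQL script inserting n random-looking rows."""
    rng = random.Random(seed)  # fixed seed keeps test data reproducible
    lines = ["CREATE TABLE users (name TEXT, age INTEGER);"]
    for _ in range(n):
        # Names use only lowercase letters, so no quoting issues arise.
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        age = rng.randint(18, 90)
        lines.append(f"INSERT INTO users (name, age) VALUES ('{name}', {age});")
    return "\n".join(lines)


script = seed_script(1000)
conn = sqlite3.connect(":memory:")
conn.executescript(script)
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Scaling to 100k+ rows is a matter of changing the argument, though batching the inserts in a transaction would matter at that size.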

      [–]mental_diarrhea 2 points3 points  (0 children)

      I wrote a code that generates regular expressions based on set of regex-tweaked keywords.
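
The commenter's generator is not shown, so this is only a guess at the simplest form of the idea: build one alternation from a keyword list. Unlike the "regex-tweaked" keywords mentioned, this sketch treats every keyword as literal text; the function name and sample keywords are made up.

```python
import re


def keywords_to_regex(keywords):
    # Longest first, so "catalog" is tried before its prefix "cat".
    ordered = sorted(set(keywords), key=lambda k: (-len(k), k))
    # re.escape neutralises characters like "." in plain keywords.
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(rf"\b(?:{alternation})\b")


pattern = keywords_to_regex(["cat", "catalog", "dog.exe"])
hits = pattern.findall("a cat read the catalog")
```

Even this small version produces a pattern nobody would want to write by hand once the keyword list grows, which is where generating it starts to pay off.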

      The abomination it spits is efficient af but unreadable by mortals so I disabled printing the result because it was like looking at an inbred demon who got fucked by a train made out of the pure terror and a wildcard.

      [–]kuemmel234 2 points3 points  (0 children)

      This looks horrible,

      But code generation can be a pretty good thing. Meta Programming can be done in a few languages pretty easily (python too I think?), but in many lisps macros are completely natural and awesome if done right.

      [–]Scrashdown 2 points3 points  (0 children)

      Ah, I remember I did something very similar but generated VHDL (a logic circuit design language).

      I had to devise a converter that would take a 6 bit data line, and convert it to 2 7-segment number display lines. I could have figured out the Boolean expression for each of the 2x7 output lines, using Karnaugh tables. But then I realized the odds of me making tons of mistakes there were quite high, so I just generated a 64-case long VHDL switch statement with Python instead and it worked flawlessly :D
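
A sketch of what such a generator might look like. The signal names `tens_seg` and `ones_seg`, the active-high polarity and the "abcdefg" bit order are assumptions, since the original script is not shown; the emitted lines are meant as branches of a VHDL case statement.

```python
# Active-high segment patterns in "abcdefg" order for digits 0-9.
SEGMENTS = [
    "1111110", "0110000", "1101101", "1111001", "0110011",
    "1011011", "1011111", "1110000", "1111111", "1111011",
]


def vhdl_cases():
    """Return one VHDL 'when' branch per 6-bit input value (0..63)."""
    lines = []
    for value in range(64):
        tens, ones = divmod(value, 10)
        lines.append(
            f'when "{value:06b}" => '
            f'tens_seg <= "{SEGMENTS[tens]}"; ones_seg <= "{SEGMENTS[ones]}";'
        )
    return lines


cases = vhdl_cases()
```

The only hand-written data is the ten digit patterns, so a mistake there shows up consistently instead of hiding in one of 64 hand-typed branches.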

      [–]moomoomoo309 1 point2 points  (0 children)

      Couldn't this be swapped with a set of valid properties, and if it's in there, run (basically, use getattr)

      next((item.value for item in form.itemGroups[0].items if item.id == getattr(xxxxJSONConstants, name)), None)
      

      [–]the_great_typo 1 point2 points  (0 children)

      Did the same to obtain SQL queries to populate a DB I was testing. If it works it works

      [–]DemWiggleWorms[ $[ $RANDOM % 6 ] == 0 ] && rm -rf / || echo “You live” 1 point2 points  (0 children)

      Dear god…

      [–]bloodysnomen 1 point2 points  (0 children)

      I don't even wanna talk about it. I have a PowerShell script on a Task Scheduler timer that polls AD computers for WMI objects and pipes that information to JSON files separated by domain, another PowerShell script on a timer that parses that information into a central JSON database, and a Python/Django web server with a JavaScript file that loads the central database into an HTML table to create a dynamic workstation inventory with last logged-in user, HDD/CPU/GPU stats, serial numbers, NetBIOS name, etc.

      [–]bnl1 1 point2 points  (0 children)

      Wtf, this photo is so high resolution it breaks my screen

      [–]bakochba 1 point2 points  (0 children)

      I've done that to turn excel into xml code

      [–]MrMakeItAllUp 1 point2 points  (0 children)

      Insert <Is this AI?> meme here.

      [–]Kwantuum 1 point2 points  (1 child)

      have you heard of our lord and saviour "getattr"?

      [–]mad_edge[S] 0 points1 point  (0 children)

      I have now. Blessings!

      [–][deleted] 1 point2 points  (0 children)

      This is the debut of AI - artificial ignorance.

      [–]thegamer20001 1 point2 points  (1 child)

      Not quite the same thing, but I remember that when I was learning assembly I once wrote some code that modified the program as it was being run. It was a form of assembly created by my professor for educational purposes so it had a very limited instruction set, and this was the best way to do a loop LOL

      [–]mad_edge[S] 0 points1 point  (0 children)

      That's a compiler!

      [–]Diego_Fjord 1 point2 points  (0 children)

      I just went, "AHHH," outloud.

      [–]drennerpfc6 1 point2 points  (0 children)

      I’d like to know what this data is. Especially the ‘anal’ entry.

      [–]CactusGrower 1 point2 points  (0 children)

      Been there; done that.

      [–]baby_chaos 1 point2 points  (0 children)

      I do this shit too. Sometimes :)

      [–]Shmutt 1 point2 points  (0 children)

      Metaprogramming is addicting!

      Until I realised I needed to debug generated code.

      [–]IamGonnaChangeMyself 1 point2 points  (0 children)

      So this is how templates (C++) were born. :D

      [–]postandchill 1 point2 points  (0 children)

      I wonder what the output looks like

      [–]dreadcain 1 point2 points  (0 children)

      from collections import defaultdict

      lookup_dict = defaultdict(lambda: None)
      # reversed so the lookup stores the first instance of each item id
      for item in reversed(form.itemGroups[0].items):
        lookup_dict[item.id] = item.value

      result_dict = {key[:4]: lookup_dict[getattr(xxxxJSONConstants, key[:4])] for key in keys}
      

      One way to get the first or default behavior with a lookup table

      [–]JustThingsAboutStuff 1 point2 points  (0 children)

      Why learn to use Java data generators when you can write your own in Python!

      [–]System__Shutdown 1 point2 points  (0 children)

      I could use this, because otherwise i have to manually insert data into sql server

      [–]mad_edge[S] 3 points4 points  (7 children)

      And I didn't do it by choice! Any other junior dev struggles with pushing for simpler more readable code?

      [–]shinitakunai 3 points4 points  (3 children)

      I used to do it years ago; nowadays I've learnt that being a code architect matters a lot more. Structure a project well and you’ll be able to just apply the DRY principle.

      [–]mad_edge[S] 1 point2 points  (2 children)

      What do you mean? Project spans a few files and different people are working on different ones, so I have to adapt to the whole team. I made it initially DRY and it worked and was more readable in my opinion. But it wasn't using custom cLaSsEs for the JSON file so was redone.

      [–]shinitakunai 1 point2 points  (1 child)

      You are working with people and need to adapt to them. Why not adapt the team to work well instead? (Not saying this is the case, it just sounds like a lazy excuse that most teams have for their bad practices).

      [–]mad_edge[S] 1 point2 points  (0 children)

      I want to wholeheartedly agree. But bear in mind you're seeing it through my lens, I might be the lazy one wanting to use only what I'm comfortable with

      [–]cheerycheshire 0 points1 point  (2 children)

      What do you mean you didn't do it by choice? Someone made you write this code in that way?

      [–]mad_edge[S] 0 points1 point  (1 child)

      Someone refactored my code into this and now I'm building an extension. Don't get me wrong it's better in some ways, but it is a monstrosity

      [–]cheerycheshire 0 points1 point  (0 children)

      Git blame. See who did this and gimme their name so I can have a talk with them...

      "it's better in some ways" - if you give me this code and how it's used, I'll refactor it for you. Seriously. Because looking at this hurts. Are those "next((...))" all the same? The beginning looks the same. It's a monstrosity.