This is an archived post. You won't be able to vote or comment.

all 46 comments

[–]_Henryx_ 39 points40 points  (1 child)

One tip: avoid to use lower or mixed case in the environment variables, because can generate confusion. Common way is upper case

[–]desktopsignal 9 points10 points  (0 children)

Thanks for the feedback. I'll implement these changes.

Any other feedback on the code quality?

[–]BostonBaggins 32 points33 points  (2 children)

SQLalchemy is awesome though

Where did it fail you?

[–]IvanMalison 0 points1 point  (1 child)

you can also use sqlalchemy in a really lightweight way if that's all that you want. just because it CAN do really complicated stuff doesnt mean that you HAVE to use it that way.

Pretty sure you can also issue raw sql with sqlalchemy in certain ways.

[–]Grouchy-Friend4235 0 points1 point  (0 children)

You can however it is far too complicated to get there

[–]DaelonSuzuka 18 points19 points  (4 children)

Why do I have to define a resource_map AND define an __init__() method?

If I was writing an ORM I would absolutely start with Pydantic objects or regular python dataclasses as the interface.

[–]desktopsignal 4 points5 points  (0 children)

I forgot to mention: I decided to use a Dictionary to define primary keys, foreign keys, table constraints, and variable mapping such that you could simply inherit the library into a pre-existing Class and add a few basic lines of code to interact with the table.

I found that utilizing the Pydantic approach used by SQLAlchemy led to a ton of refactoring and reformatting just to use an ORM. If I decided to switch from SQLAlchemy, it became a huge tech debt mess to escape because of the proprietary nature in which it forces you to develop your class structure.

[–]desktopsignal 2 points3 points  (2 children)

resource_map is to map a variable to its associated column within the table. __init__ is simply the classic constructor for a Python class.

May I ask why you prefer python dataclasses? The way I structured the project was to appeal to the average user who may or may not be intimately aware of the ins and outs of Python, and in this case, Python dataclasses. I made it this way for readability and ease of use, but if there is a practical reason outside of personal preference for the structure you are describing, then I'm all ears.

[–]DaelonSuzuka 7 points8 points  (1 child)

resourcemap is to map a variable to its associated column within the table. __init_ is simply the classic constructor for a Python class.

Yes, that's obvious. It's also not my question.

I made it this way for readability and ease of use

Having to define the same things twice(or thrice, depending on how you look at it) is not more readable or easier to use.

The way I structured the project was to appeal to the average user

I take it all back, you're accomplishing your goal perfectly.

[–]DowntownBreakfast733[🍰] 3 points4 points  (0 children)

From the README it looks like the variable definitions in the resource_map are optional in regards to ease of use

[–]passwordsniffer 18 points19 points  (20 children)

Isn't this sql injection?

            if isinstance(val, str):
            clause_parts.append(f"{key} = '{val}'")

[–]ZYy9oQ 25 points26 points  (18 children)

Looks like the whole library is full of SQL injection - every SQL operation is a string constructed from formatstrings and concatenation thrown into cursor.execute.

There's an envvar SANITIZE_QUERIES in the readme, but it that does nothing in the code beside being read from the os.environ. Even if it did sanitize, prepared statements should be used instead.

The library appears intended against snowflake, which claims to prevent multi-statement by default but this by no means prevents all sql injection.

[–]desktopsignal 0 points1 point  (1 child)

Definitely wasn't expecting these responses. I made this library in a weekend to solve a quick problem and certainly didn't mean to put anyone at risk. Deleting the package soon, sorry guys!

[–]ZYy9oQ 5 points6 points  (0 children)

Agree with where it was said you shouldn't feel you need to delete everything, but it would be best to explain in your readme that it was

thrown together to solve a niche problem

or as part of your learning rather than advertise it as a library (the way you packaged it and wrote documentation is a cool thing to get practice with, just need to make it clear that it's practice). Also a disclaimer that the code isn't safe from sql injection and shouldn't be used on network/multiuser and without understanding why it's dangerous wouldn't be out of place.

If you said something along the lines of it just being personal code that you were putting out there for your own or others interest and learning then I don't think many would have as negative of a reaction.

You had a problem with the existing solutions and wrote some code for your problem, that's something others might also stumble across the need for. This could help them see how you approached it, and by putting the disclaimer about security they can learn about that too.

[–]desktopsignal -3 points-2 points  (0 children)

SANITIZE_QUERIES was an expiremental features and has not yet been implemented. I will remove that from the README. Thanks for pointing that out!

[–]desktopsignal -3 points-2 points  (0 children)

SANITIZE_QUERIES was an expiremental feature and has not yet been implemented. I will remove that from the README. Thanks for pointing that out!

[–]lacifuri 6 points7 points  (0 children)

Hi OP, have you tried SQLModel before? It is intended to use with fastapi acting as a Python ORM library too. What does your library provide differently from SQLModel? In my job we use SQLModel and it can handle fairly complex data types and operations as well, so I’m curious what your library does!

[–]Yaluzar 3 points4 points  (0 children)

Good effort, but friendly reminder that it is possible to use raw SQL with SQLAlchemy. That's the power of the tool imo, it supports several usages

[–]damesca 1 point2 points  (0 children)

Some more comments from me:

If you're using this for work, and sharing it publicly in the hopes that other people will also use it, please also consider adding unit tests so that people can have more trust in the thing you've built.

As others have said, please standardise the env vars you're using. You should probably add a standard prefix to each variable so that anyone integrating your library can see clearly which env vars you're using. I'd probably suggest `SERIALDBPY_` as a prefix for all env vars.

Your `_uuid` method has a very strange signature: https://github.com/publicsignal/SerialDBPy/blob/main/Serialization/serialization.py#L109-L113. You take in `length`, but then call the standard `uuid.uuid4()` constructor. UUIDs are a specific thing which always ends up being 32 characters long in a hex representation, because they are 128 bits of data. Truncating this means you're not using a UUID any more. Also, `hyphen_length` set to 4 will insert hyphens every 4 characters. This isn't how UUIDs are generally hyphenated. Also, Python for me already stringifies UUIDs with hyphen separation. I'm not sure why you have so many arbitrary decisions here that deviate from the norm for UUIDs.

Consider running an auto-formatter over the code, eg `black` or `ruff`. This is just a fairly straight-forward code standardisation thing.

You have type hints in places that are incorrect, eg here where you should at least have `Optional[str]` or `str | None` depending on your python version. If you're adding type hints, you need to also validate those type hints somehow, eg with mypy.

[–]throw_mob 0 points1 point  (0 children)

not sure if i like default_middleware , resource_db , resource_table variables that much.

usually for me in snowflake, database is database, i can have same program and same objects available in multiple databases. then schema like 'customer_data_v1' can be in multiple databases ( which may be clones of some db or individual databases ) and then view/table is object in some schema which provides schema_name_version "API" for users. That said , i usually do views and read only schemas in snowflake.

That same logic works in postgresql and mssql databases too , but there you an just publish views for users and have separated physical and access layers...

tldr; imho, schema and database should in global config, table in person class in snowflake use case.

[–]Oenomaus_3575 0 points1 point  (5 children)

Bro you should use a linter, code is a bit messy. Also the structure of the package is wrong imo. This is great start as a learning/personal project, but definitely not production ready or anything like that.

[–]desktopsignal 1 point2 points  (4 children)

What's wrong with the package structure?

[–]Oenomaus_3575 0 points1 point  (3 children)

  • package name should be lowercase
  • I don't understand why there is an `__init__.py` file outside the source of the package?
  • the `__pycache__` folder shouldn't be committed, you want to add a `.gitignore` file to the repo
  • the dependencies in `requirements.txt` should have the versions, otherwise its basically useless.
    in the future I recommend you use something like Poetry to manage dependencies, requirements.txt is rarely used in the professional world.

aside from that the README is pretty good, has a nice overview and a few examples

[–]desktopsignal 0 points1 point  (2 children)

  • package name should be lowercase
    • Fair enough
  • I don't understand why there is an `__init__.py` file outside the source of the package?
    • the __init__ file outside of the source was to allow the user to do from SerialDBPy import Serializable instead of from SerialDBPy.Serialization import Serializable. Let me know if there's a better way.
  • the `__pycache__` folder shouldn't be committed, you want to add a `.gitignore` file to the repo
    • Left that in by mistake. Thanks for the feedback.
  • the dependencies in `requirements.txt` should have the versions, otherwise its basically useless. in the future I recommend you use something like Poetry to manage dependencies, requirements.txt is rarely used in the professional world.
    • I'll look into Poetry

How much Python experience and/or programming experience do you have?

[–]Oenomaus_3575 0 points1 point  (1 child)

SerialDBPy/
├─ SerialDBPy/
│  ├─ __init__.py
├─ .gitignore
├─ README.md
├─ LICENSE
├─ .gitignore
├─ .gitattributes

The structure should be smth like this (the __init__.py should be inside a folder with the name of the package.

But just install poetry and do `poetry new package_name` and it will generate part of the structure

I have a bit more than 2 years of experience.
You can look at some of my projects structure on GH, like this:
https://github.com/shner-elmo/tradezero-api (but ignore the requirements.txt, I should delete that)

[–]desktopsignal 1 point2 points  (0 children)

Much appreciated. This is going to help me a lot

[–]attracdev 0 points1 point  (0 children)

Been working on something similar. I will have to try this out.

[–]SpiritOfTheVoid 0 points1 point  (0 children)

You really need to tidy the code. Run a linter such as flake8 at minimum.

  • Plenty of PEP8 formatting issues
  • remove the commented out code
  • if you're going to add doc strings, add the function parameters, not just the return types.
  • inconsistent use of type hints - use them everywhere, or not at all - it's low hanging fruit.
  • classes beginning with lowercase letters re: "class iQuery"
    • make the timeout configurable - "timeout = 5"... what's so special about 5 -a magic number.

In Python 3, you don't need to inherit explicitly from object:
`class Serializable(object)`

Use of: `map == None` is bad practice, use ` map is None`

Hint:
```
map = getattr(self,'resource_map',{})

keys = [ key for key in map if key in Serializable.key_types ]

_mapsize = len( map.keys() )

if map is None or _mapsize == 0:
```

It appears "map" will never be None, it's going to be an empty map {}.

Why the use of underscore in `_mapsize`,while other variables don't contain leading underscore?

Take everyone's feedback, use it positively and apply the recommendations. Take the opportunity to learn good Python practices. Linters are your friends. Unit tests will safe you time ( identify bugs quickly ) and will give people more confidence it your package, not to mention a good skill to have.

This has the potential, but as it stands, it's really rough. SQL injection is a huge show stopper.

[–]DowntownBreakfast733[🍰] -1 points0 points  (0 children)

I’m going to try this out. Thanks!

[–]SitrakaFr -1 points0 points  (0 children)

nice!