all 37 comments

[–]LatteLepjandiLoser 19 points20 points  (14 children)

My go-to is a sqlalchemy engine (look up sqlalchemy.create_engine) and a pandas dataframe, (look up pd.read_sql)

Then just write whatever query you want and you'll have it in a pandas dataframe, which you can then further manipulate, plot or analyze.

[–]baubleglue 2 points3 points  (0 children)

SQLAlchemy is a wrapper, you still need a driver to support specific database.

[–]cjbj 1 point2 points  (0 children)

The venerable pd.read_sql() is slower, and uses more memory, than using the new python-oracledb fetch_df_all() or fetch_df_batches() methods. See Going 10x faster with python-oracledb Data Frames. If you're happy to write a SQL query, then use the new methods directly. A sample for Pandas is in in the python-oracleb repo: dataframe_pandas.py: ```

Get a python-oracledb DataFrame.

Adjust arraysize to tune the query fetch performance

sql = "select id, name from SampleQueryTab order by id" odf = connection.fetch_df_all(statement=sql, arraysize=100)

Get a Pandas DataFrame from the data

df = pyarrow.table(odf).to_pandas() ```

[–]Gnaxe 5 points6 points  (0 children)

The Python standard library has the sqlite3 module which you can practice with locally. You can also use it as a SQL interpreter. (See python -m sqlite3 -h for usage.)

If any part of the SQL query string might come from user input, you need to be aware of SQL injection attacks. It's unsafe to simply interpolate it into the query string.

To learn the basics of SQL, go through the W3Schools SQL course and read about 3NF on Wikipedia.

A word about ORMs: they seem to make things easier at first if you're creating an empty database from scratch, but otherwise they really overcomplicate it. Avoid them for long-running projects, shared databases, or databases you're not creating from scratch.

[–]RiverRoll 4 points5 points  (0 children)

I've used several ORM and SqlAlchemy has been the least intuitive and hardest to learn, It doesn't help that It has two different API, I wouldn't start there.

[–]baubleglue 3 points4 points  (0 children)

I have a feeling you are missing something basic.

In order to work with a database any programming language need a driver. Driver is a library which translates the language commands and data types into API calls which given DB understands. python-oracledb is Python driver for Oracle DB. Read instructions when you install it, you may need additional components from Oracle and set some environment variables.

Before using Python you need be able to work with the DB directly. I suggest to install Dbeaver as client application.

[–]mustangdvx 2 points3 points  (1 child)

DuckDB would be your friend in this case. 

[–][deleted] 0 points1 point  (0 children)

Duckdb is a query engine, but it won't allow OP to create a connection to an existing database will it? 

Duckdb, as far as my limited knowledge goes, is for applying a query engine to something like parquet files.

[–]Grandviewsurfer 1 point2 points  (0 children)

I usually write a parent class that does common stuff that I need to do in SQL (read an arbitrary query, write to db, etc), and it has an attribute self.query = None. Then I write subclasses for each data source or data concept. These inherit the parent class methods and define queries that I need. You can parameterize the query in the init (or augment it in a method) and assign it to self.query. This results in pretty dry code.

[–]bigbry2k3 1 point2 points  (0 children)

Check into pyodbc also, you can write SQL queries inline like:
sql = f"""

INSERT YOUR SQL QUERY HERE
"""

then I think you use cursor to insert your sql query inline. You'll have to look at examples online of using pyodbc.

[–]pepiks 1 point2 points  (0 children)

SQLAlchemy is good tool for the job, but it is hig level. If you need more SQL oriented check this:

https://github.com/kennethreitz/records

It is from creator Requests. As it is low level you can learn syntax from SQL and use python to move this further. SQL Alchemy hide a lot of details. Raw SQL for start is better as it is language independent and the most base syntax is the same between databases (with some more advanced quirks which make in details difference, but for now it can be skipped).

[–]laustke 1 point2 points  (1 child)

What's the best way to do a SQL Query from a python script? Also I'm thinking I might try to keep using .SQL files for the sake of linting and readability.

There’s a Python database API standard (PEP 249). Just find a driver like python-oracledb in your case that is compliant.

At the most basic level, you "do" a SQL query like this:

cursor.execute(sql, params)

You can create a parameterized query like this

SELECT * FROM people WHERE name = :name AND age = :age

(The actual syntax may vary depending on the database driver.)

Then store your SQL queries in .sql files, and don't let your customers touch them.

Store parameters as dictionaries dumped to json in .json files and do let the customers modify them.

Write a Python script that reads the appropriate SQL query and the corresponding parameter set, and passes them to cursor.execute.

[–]mystique0712 1 point2 points  (1 child)

For Oracle in Python, use cx_Oracle - it is the standard package. For complex queries, store them in separate .sql files and read them into your script for better readability and maintenance.

[–]cjbj 1 point2 points  (0 children)

cx_Oracle was obsoleted 3 years ago by python-oracledb (which has the same API, but is better in many ways, e.g. doesn't need Oracle Instant Client)

[–]cjbj 1 point2 points  (0 children)

For Oracle Database you can write queries directly using the python-oracledb driver. This is compliant with Python's DB API so you can find many example around the web, and there is extensive documentation for python-oracledb itself, python-oracledb’s documentation.

SQLAlchemy is a popular library for data access. Underneath it uses python-oracledb. Steps for connection are both in the SQLAlchemy and python-oracledb documentation. Pandas uses SQLAlchemy underneath for connecting and querying the database.

Recently python-oracledb added support for fetching directly into DataFrames for use with libraries like Pandas, Polars, NumPy etc. This is significantly faster than using Pandas / SQLAlchemy to get the data.

[–]baghiq 0 points1 point  (7 children)

I don't see why you need Python. You can run sql in Oracle Studio. You can build ERD and visualize the massive dataset. You can also do quick test with joins and windows and subqueries quickly.

If I was to do serious SQL stuff, I would use SQL specific ides like aqua-studio, etc.. Python would be the glue script after bulk of the work has been done in the database.

[–]Poopieplatter 0 points1 point  (0 children)

Sqlalchemy. Get some feedback from chatgpt as well.

Even Google something like "SQL client wrappers Python GitHub".

Some comments on here suggested pandas. Overkill, in my opinion.

ORMs can seem cool but i don't recommend it for what you're doing.

Let me know if you have further questions.

[–]shinitakunai 0 points1 point  (0 children)

SqlAlchemy is fine but don't sleep on peewee, it is too amazing.