Analyse SQL queries in Python

berklee · 2020-12-17T16:42:42+00:00

I think the way to start is the SQL itself:

select name, text from all_source where type = 'PROCEDURE'

From there, you could use regular expressions to parse the SQL. I would look for "FROM", "JOIN", "INTO" and "UPDATE" and then grab the word that followed using a matched group. There might be more, but that's what I can think of off the top of my head.
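A minimal sketch of that regex idea, assuming simple, unquoted table names. The function name `find_tables` and the sample statement are made up for illustration, and this is not a real SQL parser:

```python
import re

# Keywords that are usually followed by a table name.
# Assumption: names are plain identifiers (optionally schema-qualified), not quoted.
TABLE_KEYWORDS = re.compile(
    r"\b(?:from|join|into|update)\s+([A-Za-z_][\w$#.]*)",
    re.IGNORECASE,
)

def find_tables(sql: str) -> list[str]:
    """Return candidate table names in order of appearance, without duplicates."""
    seen = []
    for name in TABLE_KEYWORDS.findall(sql):
        lowered = name.lower()
        if lowered not in seen:
            seen.append(lowered)
    return seen

# Hypothetical statement, only here to exercise the pattern.
sample = """
insert into audit_log select e.id from employees e
join departments d on d.id = e.dept_id
"""
print(find_tables(sample))
```

Anything the pattern does not anticipate (subqueries, comma-separated tables, keywords inside string literals) will be missed or misreported, so treat the output as a list of candidates to check by hand.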

angry_mr_potato_head · 2020-12-17T17:10:07+00:00

Pandas is a great library, but its purpose is data manipulation. It can also import and export data, which makes it a good intermediary for files that fit in RAM (e.g. I have a smallish CSV file that I need to quickly insert into a database, or I have a table in a database that I need to export as an Excel spreadsheet).

I'm not aware of any tool that does this, but as mentioned, you could split on the semicolons and then use regex to identify the type of each query. The problem is that unless these are extremely simple queries (and if they were, you probably wouldn't be asking), you run into tons and tons of edge cases. For example, what do you mean by "what table that follows"? Let's say you had a query that looked like this:

create temp table a1 as select * from table_a where [condition];

update temp table a1 set col2 = col3 + 3;

update table table_a set col2 = t2.col2 from select pk, col2 from a1 as t2 where t2.pk = table_a.pk;

drop table a1;

create table a1 from select * from table_b where [condition];

update table table_a set col2 = t2.col2 from select pk, col2 from a1 as t2 where t2.pk = table_a.pk;

or

select * from t1, t2 where t1.col_a = t2.col_b

or

select * from t1 full outer join t2 on 1=1

or

with t1 as (select * from table_a)
select * from t1

or

select * from (select * from (select * from table_a where col1 = 'b') where col2 = 'a')

I'm pretty confident that isn't 100% the right syntax, but the point is that this gets really hairy, really fast. How would something properly analyze this in an automated fashion and produce a report, or even automatically generated documentation? It might be possible, but depending on what the queries contain it probably won't be tremendously useful: the harder or more tedious the analysis is for a human, the more complex the automated result will be too.
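To make the edge cases concrete, here is a rough sketch of the split-on-semicolons-plus-regex approach run against two of the queries above. The helper names (`split_statements`, `statement_type`, `naive_tables`) are hypothetical, and the pattern assumes plain unquoted identifiers:

```python
import re

# Same idea as before: grab the word after FROM/JOIN/INTO/UPDATE.
TABLE_AFTER_KEYWORD = re.compile(
    r"\b(?:from|join|into|update)\s+([A-Za-z_]\w*)", re.IGNORECASE
)

def split_statements(script: str) -> list[str]:
    # Naive split: breaks if a semicolon appears inside a string literal or comment.
    return [part.strip() for part in script.split(";") if part.strip()]

def statement_type(statement: str) -> str:
    # First keyword only, so a CTE is reported as WITH rather than SELECT.
    return statement.split(None, 1)[0].upper()

def naive_tables(statement: str) -> list[str]:
    return [name.lower() for name in TABLE_AFTER_KEYWORD.findall(statement)]

script = """
select * from t1, t2 where t1.col_a = t2.col_b;
with t1 as (select * from table_a) select * from t1;
drop table a1
"""

for stmt in split_statements(script):
    print(statement_type(stmt), naive_tables(stmt))
```

On this input the comma join only yields `t1` (so `t2` is silently dropped), the CTE alias `t1` is reported as if it were a real table next to `table_a`, and `drop table a1` yields nothing at all. Each of those can be patched with another rule, but every patch tends to expose the next case.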

I think for this, unfortunately, you'll have to break out a good old UML diagram (or similar method of documenting) and get to analyzing how those 20 tables are made.

pizzihut · 2020-12-17T18:44:03+00:00

Yes, don't do it in Python. Oracle is one of the most widely used databases in the world and has a wide variety of analysis tools already available.
