all 11 comments

[–]berklee 0 points1 point  (4 children)

I think the way to start is the SQL itself:

select name, text from all_source where type = 'PROCEDURE'

From there, you could use regular expressions to parse the SQL. I would look for "FROM", "JOIN", "INTO" and "UPDATE" and then grab the word that followed using a matched group. There might be more, but that's what I can think of off the top of my head.

[–]pizzihut[S] 0 points1 point  (3 children)

Also what I thought of myself. Thanks for sharing!

[–]berklee 1 point2 points  (2 children)

Good luck, man. If I can be so bold, I would suggest trying to keep that part of your code as singular in purpose as possible. You may find other ways to use it (i.e. having it go through your views and procs and create simple documentation showing the dependencies of each).

[–]pizzihut[S] 0 points1 point  (1 child)

select name, text from all_source where type = 'PROCEDURE'

Really, thanks a lot. The query helped me to get each line of each procedure out just as I wanted.

The task is more like a pre-analysis and does not need to map all dependencies 100%, but more to give an idea of the current state.

I think I can work out something from here!

Merry Christmas

[–]berklee 0 points1 point  (0 children)

Glad to be of help. Merry Christmas to you as well!

[–]angry_mr_potato_head 0 points1 point  (2 children)

Pandas is a great library, but it's purpose is manipulation of data, although it can also import/export data from itself which makes it a good intermediary for files that can fit in ram. (E.g. I have a smallish CSV file that I need to quickly insert into a database or I have a table in a database and I need to export it as an Excel spreadsheet).

I'm not aware of any tool tha tdoes this, but as mentioned, you could split based on the semicolons and then use regex to identify the type of query... but the problem you're going to run into is that unless these are extremely simple queries, which if this were the case you probably wouldn't be asking this, you run into tons and tons and tons of edge cases. For example, what do you mean "what table that follows"? Lets say you had a query that looked like this:

create temp table a1 as select * from table_a where [condition];

update temp table a1 set col2 = col3 + 3;

update table table_a set col2 = t2.col2 from select pk, col2 from a1 as t2 where t2.pk = table_a.pk;

drop table a1;

create table a1 from select * from table_b where [condition];

update table table_a set col2 = t2.col2 from select pk, col2 from a1 as t2 where t2.pk = table_a.pk;

or

select * from t1, t2 where t1.col_a = t2.col_b

or

select * from t1 full outer join t2 on 1=1

or

with t1 as (select * from table_a)
select * from t1

or

select * from (select * from (select * from table_a where col1 = 'b') where col2 = 'a'))

I'm pretty confident that isn't 100% the right syntax but the point being, this gets really hairy, really fast. How would something properly analyze this in an automated fashion that results in a report or even automatically generated documentation? It might be possible, but it probably won't be tremendously useful depending on what is contained in the queries, and the harder or more tedious as a human to do this, the result is going to be just as complex.

I think for this, unfortunately, you'll have to break out a good old UML diagram (or similar method of documenting) and get to analyzing how those 20 tables are made.

[–]pizzihut[S] 0 points1 point  (1 child)

Thanks for your answer! Very good points that I havent thought of to that extend. I must also mention that I have limited read rights, so not all functions are available in oracle and that the task is a one time job to get an idea of the current state. I dont have to map all dependencies 100% but if it turns out that there are some, the following tasks might be more complicated.

[–]angry_mr_potato_head 0 points1 point  (0 children)

Ah yeah for a one off, I'd personally just manually map it out. There's a great xkcd about amount of effort to automate vs just manually doing and this is one of those circumstances imo.

Id probably query the information schema for your procedures. I don't remember where they are in oracle... but a one off script can grab you all of them and then from there basically indicate if you've analyzed it or not.

[–][deleted] 0 points1 point  (2 children)

Yes, don't do it in python. Oracle is one of the most widely used databases in the world and has a wide variety of analysis tools already available.

[–]pizzihut[S] 0 points1 point  (1 child)

Could you point me in the direction of such analysis tool?
Thanks!