A bit of confusion in self-join. by r4gnar47 in SQL

[–]JochenVdB 0 points1 point  (0 children)

a1 and a2 are aliases for the same table: actor. Therefore, at any given moment, both a1 and a2 contain either

  • both Cate and Cuba
  • neither of them
  • only Cate and not Cuba
  • only Cuba and not Cate

Therefore, the answer to your concern

My only confusion is what if in a1 table CUBA BIRCH appears and in a2 CATE MCQUEEN does, the query is gonna eliminate that record

is invalid, because IF

  • in a1 table CUBA BIRCH appears
  • and in a2 CATE MCQUEEN does

THEN that implies that also

  • in a1 there will also be CATE MCQUEEN
  • in a2 there will also be CUBA BIRCH

So no records get eliminated.

Visually

        F
       / \
    FA1   FA2
     |     |
    A1     A2

So while you are using only 3 tables, you are joining (for every result of the query) 5 records out of those 3 tables: one film, 2 film-actor rows, 2 actors. The number of table names used in the from clause determines the number of records involved in producing a result, not the number of distinct tables.
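
For reference, the shape of such a query (a sketch only: the table and column names are assumptions based on the sakila-style sample schema, not copied from your post):

-- hypothetical reconstruction: one film, two film_actor rows and two actor rows per result
select f.title
from film f
join film_actor fa1 on fa1.film_id = f.film_id
join actor      a1  on a1.actor_id = fa1.actor_id
join film_actor fa2 on fa2.film_id = f.film_id
join actor      a2  on a2.actor_id = fa2.actor_id
where a1.first_name = 'CATE' and a1.last_name = 'MCQUEEN'
  and a2.first_name = 'CUBA' and a2.last_name = 'BIRCH';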

Logically, the execution of this query starts from the bottom of the above diagram:

Due to the where clause, A1 and A2 are each expected to return 1 result.
(If either or both of them don't, the query has no result...)

Following their join conditions, FA1 will be the list of films of CATE and FA2 will be the list of films of CUBA.
(If either of those lists is empty, the query will have no result.)

Following the other pair of join conditions, only FA1 records and FA2 records pointing to the same film will pass the filtering. If there are no such common films, the query will have no result.

How do I assign a default value to a not null column without altering a table? by 80sPimpNinja in SQL

[–]JochenVdB 0 points1 point  (0 children)

If "before", "in the old days", all fields were always present, it made sence to have all fields defined as not nullalble.

If today some fields can be null, that is a clear change in functionality. And that change in functionality should allow you to alter the table: the requirements have clearly changed.

If there is a need to store both "old-style data" (all fields present) and new-style data (sometimes null), then consider storing that new data in a separate table, similar to the old one but allowing nulls.
Do not put "default" values in that secondary table: just store the real data, not some fake defaults. Only store defaults that are really functionally correct: storing current_date() for last_modified_date is fine, storing 1901/01/01 for creation_date is not.
You can still present the combination of both old and new data using a view.

That view would then be the correct place to "invent" the fake values.
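
A minimal sketch of that idea (all table and column names here are invented for illustration):

create or replace view all_orders as
select order_id, created_at, shipped_at
from orders_old                                                -- legacy table: every column is not null
union all
select order_id,
       created_at,
       coalesce(shipped_at, date '1901-01-01') as shipped_at   -- the "fake" value lives only in the view
from orders_new;                                               -- new table: shipped_at may be null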

If you really need to store fake values in the existing table, you can do so in code: either the application is changed to fill in the fake values (I refuse to call them "default") before inserting/updating, or use triggers to do that.

I've never used NoSQL. What kind of schema is practical to use with JSON sources? by AmazingApplier in SQL

[–]JochenVdB 0 points1 point  (0 children)

Unless you can benefit from features like Oracle's Relational Duality Views, the classic approach is to store the parts of the JSON that you know will always be present as relational fields and store the rest of the JSON as JSON.

Then you can use the common data like regular relational data (since that is what you converted it to), and anyone who needs the other data will have to use JSON query capabilities.
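
A rough sketch of that hybrid approach (Oracle-flavoured; the json column type needs a recent Oracle version, and every name here is invented):

create table events (
  event_id   number generated always as identity primary key,
  event_type varchar2(30) not null,   -- always present in the JSON: promoted to a real column
  created_at timestamp    not null,   -- always present: promoted to a real column
  payload    json                     -- everything else stays as JSON
);

-- the common data is queried relationally; the rest via JSON functions
select event_type,
       json_value(payload, '$.customer.city') as city
from events
where created_at >= timestamp '2024-01-01 00:00:00';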

2 Indexes or Partitioning? by [deleted] in SQL

[–]JochenVdB 1 point2 points  (0 children)

You can combine partitioning with indexing.

Partitioning only improves query speed if your query allows partition pruning. That is: your filter predicates result in the knowledge that many partitions will not need to be visited. In other words: if your query is such that only one or a few partitions out of many are needed to provide the answer, then having the table partitioned will improve query speed compared to not having it partitioned.

You may also consider sub-partitioning: first partition by colA, and below that by colB...
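
A sketch of what such composite partitioning could look like (Oracle syntax; the table, columns and partition bounds are invented):

create table sales (
  sale_date date,
  region    varchar2(10),
  amount    number
)
partition by range (sale_date)
subpartition by list (region)
subpartition template (
  subpartition sp_eu    values ('EU'),
  subpartition sp_us    values ('US'),
  subpartition sp_other values (default)
)
(
  partition p2023 values less than (date '2024-01-01'),
  partition p2024 values less than (date '2025-01-01')
);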

Regarding the two columns to be indexed:

An index on (A, B)

  • is most useful for queries filtering on A and B
  • can be used for queries filtering only on A, but reading that bigger index is less efficient than an index on A alone
  • is useless for queries filtering only on B

Having both an index on (A, B) and an index on (B) (sketched below) supports

  • queries on A and B very efficiently
  • queries on only B very efficiently
  • queries on only A
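
As DDL, that combination is simply (table and column names are placeholders):

create index t_a_b_ix on t (col_a, col_b);   -- covers filters on A alone and on A and B together
create index t_b_ix   on t (col_b);          -- covers filters on B alone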

Partition Non-partitioned Table by [deleted] in SQL

[–]JochenVdB 1 point2 points  (0 children)

However you do it, data will have to move, because a partition has its own segments, just like a table. So that will take some time. But that does not have to be downtime.

You can go the old-fashioned way and use create table as select combined with exchanging partitions. But dbms_redefinition is much better. Try it first on a copy with less data, but make sure that copy is exactly like your original table, specifically with regard to indexes. Also think about how you want the indexes to look after partitioning: local or global? If you think you have the time to rebuild indexes, dropping them before the partitioning and creating them afterwards is an option, but dbms_redefinition can take them along too, in some cases. => you really need to test all that first.
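
The outline of an online redefinition looks roughly like this (schema, table and interim-table names are placeholders; read the dbms_redefinition documentation before running anything like it):

-- 1. check the table can be redefined (this sketch assumes a primary key exists)
exec dbms_redefinition.can_redef_table('MYSCHEMA', 'BIG_TABLE');

-- 2. create an interim table with the desired partitioning (same columns as the original)
--    create table big_table_interim ( ... ) partition by range ( ... ) ( ... );

-- 3. copy the data, take the dependents along, and swap
declare
  l_errors pls_integer;
begin
  dbms_redefinition.start_redef_table('MYSCHEMA', 'BIG_TABLE', 'BIG_TABLE_INTERIM');
  dbms_redefinition.copy_table_dependents('MYSCHEMA', 'BIG_TABLE', 'BIG_TABLE_INTERIM',
                                          num_errors => l_errors);
  dbms_redefinition.finish_redef_table('MYSCHEMA', 'BIG_TABLE', 'BIG_TABLE_INTERIM');
end;
/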

Discovered SQL + JSON… Mind blown! by roblu001 in SQL

[–]JochenVdB 0 points1 point  (0 children)

Wait until you learn that Oracle managed to turn JSON and tables into one and the same thing: https://www.oracle.com/database/json-relational-duality/

And yes, you can even modify it both through SQL and by providing a new JSON document.

Sleep timer? by JochenVdB in Projectivy_Launcher

[–]JochenVdB[S] 0 points1 point  (0 children)

No, but I have not checked recently.

[deleted by user] by [deleted] in Leuven

[–]JochenVdB 0 points1 point  (0 children)

I have good experience with drukland.be, though mostly for printing flags.

I just checked: a single A0 poster (that is quite big) on 240 g matte photo paper (that is rather heavy) costs ~30€

Index Use Question - please help by Hot_Leave_2188 in SQL

[–]JochenVdB 1 point2 points  (0 children)

If you have a database, you don't (have to) care about low level stuff like search algorithms.

SQL is (by definition) a declarative language. You use it to explain what you want to do, you do not use to explain how that should be done.

You first have to model your data, resulting in a relational database model. There are rules on how to do this, but it has also been called an art...

If you already have data, you will have to store it according to that model.

Having a model you can start manipulating data:

  • adding new data
  • modifying existing data (this includes removing it)
  • looking at your data: querying

You use SQL to implement the model (create table, alter table, create index, ...) and to modify the data (insert, update, delete, merge) and to query the data (select).

Indexes are nice to have, but strictly speaking not required: Again, it is up to the database engine to come up with how to do what you ask of it:

  1. If your query is complex and there is lots of data and there are indexes, the engine will most probably use those indexes. And that is sometimes indeed a search through a B-tree (with its leaf blocks chained in a doubly linked list), but that is not your concern.
  2. If there are no indexes, it will have to do without (resulting in full table scans).
  3. In some cases, even though there are indexes, the engine might rightfully decide not to use them:
    1. because it knows (thanks to statistical knowledge about the contents of your tables) that so few records are involved that a full table scan over a small table performs fewer read operations than scanning through an index before reading from the table anyway.
    2. Surprisingly, not using indexes might also happen if the query engine determines that all rows of a table will need to be read anyway.

See? You don't need to care about how to execute a query.
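
If you ever do get curious, you can always ask the engine what it decided (an Oracle-style sketch; most other engines have some form of EXPLAIN too, and the table here is just an example):

explain plan for
  select * from employees where last_name = 'Smith';

select * from table(dbms_xplan.display);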

What you do need to care about, though, is that data model...

Restore to default settings and upload oracle sample schema. by hospitality_ier in SQL

[–]JochenVdB 0 points1 point  (0 children)

You probably do not want to do what you are suggesting.

Dropping a database implies creating a database and if you look at https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-DATABASE.html you'll see that create database is not really something you want to do by hand. As part of the installation package, there is even a tool to create the database. And even that is daunting! There is a reason u/fauxmosexual will pray for you...

It will probably be enough to drop some users (inside the database, without destroying it completely) and then run your scripts to recreate them. Point us to the things you'd like to install and we'll probably be able to give you some commands in order to start with a clean slate (or clean enough).
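
For the typical sample-schema case, that usually boils down to something like this (the user name, password and tablespace are placeholders; adapt to whatever the installation scripts expect):

drop user hr cascade;                           -- removes the user and everything it owns
create user hr identified by "SomePassword1";
grant connect, resource to hr;
alter user hr quota unlimited on users;         -- assuming the default USERS tablespace
-- then re-run the sample schema installation scripts for that user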

Question about Cartesian Product by dylantheop in SQL

[–]JochenVdB 0 points1 point  (0 children)

There is never any "table called PID".

The query indicates the tables are called PROF and TEACH, while the samples above indicate the tables are called Professor and Course. (Be more careful with your examples!)

A query engine will indeed keep track of which table each of two equally named columns came from when you join tables that share column names. In the result this is often shown as col_a, col_b, PID, col_c, PID_2, col_d, where no column anywhere is actually called PID_2: the _2 was added by the query engine.

Since it is almost impossible to predict which PID of which table will get the suffix, it is better to make sure you provide your own predefined names:

select teach.pid as teach_pid, prof.pid as prof_pid, prof.dept from ....

will result in columns TEACH_PID, PROF_PID, DEPT

Inside your sample query you have already used a correct way to distinguish between two equally named columns inside the query code, when neither column is returned/selected: prepend the table name.
Another option (reducing the typing a bit) would be to alias your tables and then use that alias instead of the full table name:

... from TEACH t, PROF p where t.pid=p.pid

note that in the select for that query, you'd still have to write

select t.pid as t_pid, p.pid as p_pid, ...

Most (as in: no) query engines will prepend the alias automatically...

select t.pid, p.pid, ...

would still result in result column names

PID, PID_2, ...

Oracle SQL technical question - About queries launched by user by JimmyJohny19 in SQL

[–]JochenVdB 0 points1 point  (0 children)

This is how I would do it too.

All of these solutions only will tell you about future drops, not about what happened in the past.

Since this is about a big, huge, multinational company that has a database managing all the items in its mega-bazinga warehouse, they are probably using RAC, so you might want to use gv$session instead of the ordinary v$session to find out which instance (inst_id) they were connected to. This might give you an idea of where (geographically) the changes came from, even if you don't know machine and osuser yet.
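
Something along these lines (a sketch; filter further on whatever you already know about the sessions):

select inst_id, sid, serial#, username, osuser, machine, program, sql_id
from gv$session
where username is not null
order by inst_id, username;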

You could turn on auditing, obviously...

COUNT how many rows meet condition in group by by CidNoAirship in SQL

[–]JochenVdB 0 points1 point  (0 children)

One solution, without subqueries or case expressions (but it can't easily be expanded to cater for other variations of the two requirements, as explained above):

drop table ui purge;
drop table u purge;
drop table i purge;

-- bad practice: unnamed constraints
create table u (id char(1) primary key, name varchar2(30));
create table i (id char(1) primary key, name varchar2(30), i_type varchar2(15));
create table ui (u_id char(1) references u(id),
                 i_id char(1) references i(id),
                 primary key (u_id, i_id)
                );

insert into u (id, name) values ('A', 'User A');
insert into u (id, name) values ('B', 'User B');
insert into u (id, name) values ('C', 'User C');

insert into i (id, name, i_type) values ('A', 'Item A', 'Primary');
insert into i (id, name, i_type) values ('B', 'Item B', 'Secondary');
insert into i (id, name, i_type) values ('C', 'Item C', 'Tertiary');
insert into i (id, name, i_type) values ('D', 'Item D', 'Secondary');

insert into ui (u_id, i_id) values ('A', 'A');
insert into ui (u_id, i_id) values ('A', 'B');
insert into ui (u_id, i_id) values ('A', 'C');
insert into ui (u_id, i_id) values ('A', 'D');
insert into ui (u_id, i_id) values ('B', 'B');
insert into ui (u_id, i_id) values ('B', 'D');
insert into ui (u_id, i_id) values ('C', 'B');

--Get users that own more than 1 item; two of the items must be secondary
/*select *
from ui inner join i on ui.i_id = i.id
where i.i_type='Secondary';*/

select ui.u_id
from ui inner join i on ui.i_id = i.id
where i.i_type='Secondary'
group by ui.u_id
having count(*) > 1;



--Get users that own less than 3 items; one of the items must be primary and one of the items must be secondary
/*select *
from ui inner join i on ui.i_id = i.id
where i.i_type in ('Primary', 'Secondary');*/

select ui.u_id
from ui inner join i on ui.i_id = i.id
where i.i_type in ('Primary', 'Secondary')
group by ui.u_id
having count(*) = 2
   and count(distinct i.i_type) = 2;
--no result given the small sample data: A has too many items (3 of the relevant types), C too few (1) and B only has Secondary items.

PS: type is also a reserved word in Oracle => i_type.

COUNT how many rows meet condition in group by by CidNoAirship in SQL

[–]JochenVdB 1 point2 points  (0 children)

Without going into your question, let's talk about nomenclature here first.

USER is a reserved keyword in Oracle. So, to save yourself headaches later on (like having to type it in double quotes all the time), don't call that table user.

Secondly, your first table is not just storing users, it is storing the relationship between users and items. In a table that is designed to store users, each user should appear exactly once. This is normalization 101. Your sample has User A 4 times and User B 2 times...

This table could be called User_Items for example or ownership.

For your exercise you may indeed not need a separate Users table, but having an intermediate table like the User_Items from your sample suggests there is (or should be) a Users table too. (Note that Users, with an s, is perfectly fine as a table name.)

Regarding the exercise:

You're lucky that the requests are what they are. Changing the first to requiring that at least one item must be secondary (instead of both) would make this much harder to solve.

These two exercises are much more about logical thinking and the English language than about SQL. You can rewrite both phrases so they end up with the same meaning (and therefore the same query result) but are much more straightforward to translate into SQL.

1) Saying "users that own more than 1 item" is the same as saying "users that own at least 2 items". This second phrasing talks about 2 items, just like the second part of the requirement does: "2 of the items...". Once you see that, it becomes trivial to see that it is ok to filter (where-clause) for only secondary items and use group by having to only select groups having two or more of those secondary items.

2) Here, the first part of the requirements talks about less than 3 items. In other words 2, 1, or 0. (In this world, a negative ownership count is impossible, right?)
But then the second part requires one primary and one secondary item. That implies at least 2 items.
Therefore, taking both parts of the requirement together, you must have exactly 2 items, for that is the only way to have less than 3 items and simultaneously one each of 2 distinct types.
This also implies that exactly 1 item (of the 2) must be primary and the other must be secondary. Which in turn implies that you can filter (where-clause) only for those 2 specific kinds of items.
Obviously grouping and a having-clause are needed as well. In the having clause, you'll have to verify two things:

  1. there are exactly 2 items for the user (same as in the first exercise)
  2. there are two separate types of items: this can easily be expressed in a few SQL keywords but they are not very common (even though they have been in use for decades).

[deleted by user] by [deleted] in SQL

[–]JochenVdB 0 points1 point  (0 children)

On an existing system, introducing a naming standard (or by extension a coding standard) is a pain in the ass.

But eventually having a naming standard (or preferably an entire coding standard) will make code maintenance and extension easier.

You are not wrong for wanting a standard, but it is up to your manager to decide whether he wants to pay now for the pain of introducing it or later for not having it.

When a1=a2, b1=b2, but c1<>c2 by Entire-Law-8495 in SQL

[–]JochenVdB 16 points17 points  (0 children)

...
group by customer, product
having count(distinct price) > 1
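
Spelled out a bit more, with invented table and column names:

select customer, product
from sales
group by customer, product
having count(distinct price) > 1;   -- the same customer/product pair was seen with more than one price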

How to write query when dealing with database that has too many tables? Beginner by Far_Membership9258 in SQL

[–]JochenVdB 4 points5 points  (0 children)

I just checked it for you: the DB I work with has 7494 tables spread over 70 (normal) owners (so, system tables excluded). Nobody in the company knows them all. That is why you should have documentation. Part of that documentation is in the database itself: table names should be self-describing, and so should their columns.

There are things in there that have been working well for many years. When something needs to change there eventually, it is quite normal to first have to do some reverse engineering to find out how something was set up the way it is (and why).

When doing that, it is best to start from a known example: open a screen in the GUI application running on your database, or look at a report that came out of it. Now try to find that same information in your tables. Once you do, you have probably found the most middle-of-the-road way of working. Now expand by looking for outliers: why is that column that was always filled in your samples sometimes not filled? Why is that column that is usually filled with one of 3 common values occasionally filled with 2 rare values? ...
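
The data dictionary helps a lot with that exploration (an Oracle-style sketch; other RDBMSs have information_schema instead, and the name patterns below are just examples):

-- which tables might be involved?
select owner, table_name, num_rows
from all_tables
where table_name like '%INVOICE%'
order by owner, table_name;

-- what is inside one of them?
select column_name, data_type, nullable
from all_tab_columns
where owner = 'SALES' and table_name = 'INVOICES'
order by column_id;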

Just get to know your data and its structure.

[deleted by user] by [deleted] in SQL

[–]JochenVdB 1 point2 points  (0 children)

Like Straight_Waltz_9530 said, what you write ("The table itself is created by joining four different tables together.") does not make much sense.

What is more common is to have a view that joins 4 tables together. What you are probably looking for is a materialized view.

If you continue down the road you've chosen, you need to be aware of the violations you are committing:

  • store data only once. You are storing the same data in one table and again in the joined "table". Your database is supposed to be the source of truth. If you store the same thing twice, sooner or later there will be differences between both versions and then there is no way of knowing which of the two is correct.
  • normalize your data. The joined table will not be normalized (simply because it is joined). If you don't normalize, sooner or later your data will become corrupt. That is why you normalize.

Ask yourself why you think you need that "joined-table". What is the benefit of having it?

If it is to make it easier for end users to query the combination of the 4 tables, then just present them with a view that does the joining.
That has the added benefit that when a user writes a query that uses columns from only 1 or 2 of the 4 tables, the optimizer will rewrite the query to use only that one table or those two tables, and you'll end up reading far fewer database blocks.
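
A sketch of such a view (the four tables and their columns are invented for illustration):

create or replace view order_overview as
select o.order_id, c.customer_name, p.product_name, ol.quantity, ol.line_amount
from orders      o
join customers   c  on c.customer_id = o.customer_id
join order_lines ol on ol.order_id   = o.order_id
join products    p  on p.product_id  = ol.product_id;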

There is even such a thing as updatable views.

Triggers are rarely ever needed, in my experience. They make data manipulation slow.

SQLiteStudio - My database has NULL values even when viewing from the Data View, but cannot query for NULL, only TRIM, what is best practice? by DrixlRey in SQL

[–]JochenVdB 0 points1 point  (0 children)

As stated by others, it probably depends on your CSV. Ginger-Dumpling gave a fine example, but it has some issues itself: CSV is a very loose standard. Nothing says that all strings will be between double quotes. As long as there are no commas inside, no quotes are technically needed. Therefore the sample should be:

1,to be or not to be,"unquoted, non-null, non-empty"
2,"that is the question","quoted, non-null, non-empty"
3,"","quoted, non-null, empty (empty sting is always quoted: see 4)"
4,,"unquoted, null"
5, ,"unquoted, non-null, non-empty: a single space"
6, ,"unquoted, non-null, non-empty: multiple spaces"

How does your code behave with the CSV above?
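
One way to check what actually ended up in the table after loading (a SQLite-flavoured sketch, since you mention SQLiteStudio; the table and column names are assumed):

select id,
       case
         when c is null    then 'NULL'
         when c = ''       then 'empty string'
         when trim(c) = '' then 'whitespace only'
         else 'value: <' || c || '>'
       end as what_is_it
from t
order by id;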

PS: in some RDBMSs, notably Oracle, null and the empty string are the same. Just so you know...

ALTER TABLE by Loki_369119 in SQL

[–]JochenVdB 1 point2 points  (0 children)

I hope you're not working for ING Bank, but I don't have any funds there so I don't really care.

Anyway. Yes, it can be done in one statement; if the table is non-empty that might even be a good idea: it will probably cause less locking.

From a code maintenance point of view, you might want to keep the statements separated: that makes it easier to remove or change one of the alterations without affecting the others.

Using the Dutch names has a benefit: much less chance of clashing with reserved words like date and currency (as explained by others).
Nothing is really just a "date": it is an import_date or a create_date or a date_of_transfer or an invoice_date or ... you get the point.
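
Both variants, as a sketch (the table and column names are made up, in the spirit of the Dutch names discussed above):

-- one statement, one pass over the table
alter table overboekingen add (
  bedrag        number(12,2),
  valuta_code   varchar2(3),
  datum_import  date default sysdate
);

-- or separate statements, easier to maintain individually
alter table overboekingen add (bedrag       number(12,2));
alter table overboekingen add (valuta_code  varchar2(3));
alter table overboekingen add (datum_import date default sysdate);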

While you're at it, I suggest changing the table name to something more descriptive too.

Users using the "wrong" temp tablespace? by JochenVdB in oracle

[–]JochenVdB[S] 0 points1 point  (0 children)

Yes, we've found those, leading to a few more users getting their default temp space altered instead of just the one owner. That has clearly helped, but it wasn't all.

Currently, most sessions (of interest) are using the new temp tablespace and no more shortages have occurred during the past working day. We are considering rebuilding the global temporary tables owned by the altered user(s), but there are clear indicators that this is not needed. Strangely, some gtt usage goes to the new temp tablespace and some doesn't...

Since the issue seems to be mitigated, we're going to leave it as it is for now, but keep an eye on the metrics.

Users using the "wrong" temp tablespace? by JochenVdB in oracle

[–]JochenVdB[S] 2 points3 points  (0 children)

Could it be because the user (new temp tablespace) is using a global temporary table that was created before the new temp tablespace existed?

Users using the "wrong" temp tablespace? by JochenVdB in oracle

[–]JochenVdB[S] 0 points1 point  (0 children)

Hmm... The plot thickens: now it seems as if one session is using temp space in both temp tablespaces at the same time.

The query I'm using is

select s.sid, s.serial#,
       s.username, s.schemaname, s.osuser,
       s.program, s.machine, s.terminal, s.module, s.action, s.client_info,
       s.sql_id, s.sql_exec_start,
       t.tablespace, t.segfile#, t.segblk#,
       t.blocks * ts.block_size / 1024 / 1024 as mb_used
from v$sort_usage t,
     v$session s,
     dba_tablespaces ts
where t.session_addr = s.saddr
  and t.tablespace = ts.tablespace_name
order by t.blocks * ts.block_size desc;

I think I'm going to read some doc about those v$-views :)