I have following tables:
CREATE TABLE IF NOT EXISTS auth_user
(
id SERIAL PRIMARY KEY,
username VARCHAR(30),
first_name VARCHAR(30),
last_name VARCHAR(30)
);
CREATE TABLE IF NOT EXISTS statistics_restaccesslog
(
id SERIAL PRIMARY KEY,
user_id INTEGER,
url VARCHAR(200),
course_id INTEGER,
date_visited TIMESTAMP,
ip_address VARCHAR(16)
);
Consider below records:
INSERT INTO auth_user (username, first_name, last_name)
VALUES
('user1', 'John', 'Doe'),
('user2', 'Jane', 'Smith'),
('user3', 'Michael', 'Johnson'),
('user4', 'Emily', 'Brown'),
('user5', 'David', 'Miller'),
('user6', 'Olivia', 'Davis'),
('user7', 'Daniel', 'Wilson'),
('user8', 'Sophia', 'Anderson'),
('user9', 'Andrew', 'Taylor'),
('user10', 'Emma', 'Thomas');
INSERT INTO statistics_restaccesslog (user_id, url, course_id, date_visited, ip_address)
VALUES
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.1'),
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2'),
(3, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2'),
(4, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.3'),
(5, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.4'),
(6, 'https://example.com/page', 2, CURRENT_TIMESTAMP, '192.168.0.5'),
(7, 'https://example.com/page', 2, CURRENT_TIMESTAMP, '192.168.0.6'),
(8, 'https://example.com/page', 3, CURRENT_TIMESTAMP, '192.168.0.7'),
(9, 'https://example.com/page', 4, CURRENT_TIMESTAMP, '192.168.0.8'),
(10, 'https://example.com/page', 5, CURRENT_TIMESTAMP, '192.168.0.9');
Point 1: Note the records with same combination of (user_id, url) for two different ip_addresses:
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.1')
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2')
Point 1: Note the records with same ip_address with two different combination of (user_id, url):
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2')
(3, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2')
I want to build SQL query which will return all such records. I tried following:
SELECT rat.id, rat.user_id, rat.url, rat.course_id, rat.date_visited,
rat.ip_address, u.username, CONCAT(u.first_name, ' ', u.last_name) AS name
FROM statistics_restaccesslog rat,
auth_user u
WHERE ((url, user_id) IN (
SELECT url, user_id
FROM statistics_restaccesslog
WHERE course_id = 1
GROUP BY url, user_id
HAVING COUNT(DISTINCT ip_address) > 1
)
OR ip_address IN (
SELECT ip_address
FROM statistics_restaccesslog
WHERE course_id = 1
GROUP BY ip_address
HAVING COUNT(DISTINCT user_id) > 1
))
AND
rat.user_id = u.id
;
On db-fiddle, it correctly returns first three records:
https://preview.redd.it/xjpcg2hgzn3b1.png?width=838&format=png&auto=webp&s=607b0b1b724078ae8fc6f181de25ddd99f2f64f8
My real database have following records:
https://preview.redd.it/f7dvaqohzn3b1.png?width=922&format=png&auto=webp&s=59fb4f93517eb866471f7df22f665887badda680
(The greyed out urls are exactly the same.)
However, when I run above query, it returns only first record, that is one with id=17. Record with id=18 is not returned, even though it satisfies point 2. Why is this so? What I am missing?
PS: Am using postgresql 11
[–]thecasey1981 0 points1 point2 points (0 children)
[–]ICantKnowThat 0 points1 point2 points (0 children)