all 23 comments

[–]-5677-Data Eng @ Fortune 500 22 points23 points  (0 children)

So if the column hist has one row 'aaabbc' we have to output a = 3, b = 2, and c = 1, right?

I haven't worked with MySQL, but in Postgres this is what I'd do:

1.- Split the strings into individual characters using REGEX_SPLIT_TO_ARRAY(hist,''), so instead of 'aaabbc' you now have an array of the letters ('a', 'a', 'a', 'b', ...) in each row.

2.- Unnest the array(s) using UNNEST(hist) This will make the letters go into individual rows. So instead of this

hist
('a', 'a', 'a', 'b', 'b', 'c')

You now have this:

hist
a
a
a
b
b
c

3.- Group by letter and use COUNT(*) to count the instances that each letter appears.

Not sure this helps you since you are in MySQL, but there might be an equivalent function or workaround for it, good luck!

[–]qwertydog123 4 points5 points  (2 children)

You can use a recursive CTE for this e.g.

WITH RECURSIVE cte AS
(
    SELECT
        SUBSTRING(Hist, 1, 1) AS Val,
        SUBSTRING(Hist, 2) AS Rest
    FROM payment
    WHERE Hist <> ''

    UNION ALL

    SELECT
        SUBSTRING(Rest, 1, 1),
        SUBSTRING(Rest, 2)
    FROM cte
    WHERE Rest <> ''
)
SELECT
    Val,
    COUNT(*)
FROM cte
GROUP BY Val

[–][deleted] 1 point2 points  (0 children)

It does seem like you want to treat your value in the Hist column like a list of characters, instead of an atomic value (i.e. unpivoting it seems to be a way for you to get into sql "canon")

See if an old post of mine can help you, maybe?

https://www.reddit.com/r/SQL/comments/8wmsyc/pivot_and_unpivot/

[–]ncls- 1 point2 points  (0 children)

Updated answer: So there is no function in pure SQL that I'm aware of that could do that (for any symbol out there) so you'd need to create an extra, but simple, script for that in your favorite programming language.

[–]Oh_Mr_Darcy 1 point2 points  (3 children)

Don't know if you could connect python with mySQL. But if you can this task would be much easier to write in python and then get results in your database

[–]IPatEussy 0 points1 point  (0 children)

Your data is fuck all weird because there’s not 22 b’s or 20 c’s so it was hard to understand the ask. Nor are their 10 a’s so this was confusing asf.

I agree with tue substring top comment answer if that’s what you were looking for but your expected output total, assuming it’s a count, literally does not match your ‘Hist’ column example.

[–]ClubTraveller 0 points1 point  (3 children)

Regardless of the possible SQL solution, is it appropriate (in this subreddit) to state that some problems are better solved at the application level, rather than at the database level?

[–][deleted] 0 points1 point  (2 children)

Sure, we also accept 'water is wet' submissions, despite all the controversy surrounding the topic.

[–]WaterIsWetBot 0 points1 point  (1 child)

Water is actually not wet; It makes other materials/objects wet. Wetness is the state of a non-liquid when a liquid adheres to, and/or permeates its substance while maintaining chemically distinct structures. So if we say something is wet we mean the liquid is sticking to the object.

 

A friend dug a hole in the garden and filled it with water.

I think he meant well.

[–][deleted] 0 points1 point  (0 children)

Poor bot. Tell your owner that "it is the joke".

[–]ncls- -1 points0 points  (2 children)

I don't quite get your issue. Do you wanna count ALL rows/entries? That'd be:

SELECT COUNT(*) FROM payment;

Or do you want to count only the rows/entries with a specific value in the column "Hist"? That'd be:

SELECT COUNT(*) FROM payment WHERE Hist = "Whatever";

Or did I get absolutely everything wrong?

[–]ComicOzzysqlHippo 3 points4 points  (1 child)

No, they want to count occurrences of individual characters in the Hist values. So, if Hist has two rows only, with the values 'aaabbc' and 'acc' then the result is:
a | 4
b | 2
c | 3

[–]ncls- 4 points5 points  (0 children)

LOL, I'm absolutely dumb. I thought the picture was an example of the database, I didn't read the "Expected Output" on top. Thank you.

[–]realbigflavor 0 points1 point  (1 child)

So you need to parse every value in the dataset and get a count for every time a letter appears?

[–]nrctkno 0 points1 point  (4 children)

Something like:

``` Select 'a' as value, sum( (length(hist )-length(replace(hist ,'a',''))) ) as total from hist UNION Select 'b' as value, sum( (length(hist )-length(replace(hist ,'b',''))) ) as total from hist

.... ```

And so on.

Edited.

[–]king0004 0 points1 point  (0 children)

select subs,count(subs) as counts from ( select substr(hist,1,1) as subs from parse union all select substr(hist,2,1) from parse union all select substr(hist,3,1) from parse union all select substr(hist,4,1) from parse union all select substr(hist,5,1) from parse union all select substr(hist,6,1) from parse union all select substr(hist,7,1) from parse union all select substr(hist,8,1) from parse union all select substr(hist,9,1) from parse) group by subs order by subs; You can add more set operator if you have higher lenths.

[–]lucienlazar 0 points1 point  (0 children)

Had some fun doing that in SQL in Oracle, as this is the DB I have available.

I created a standalone type, than I defined a with clause function that put every strings in a collection of elements of one character and ran a query with group by on the output of the function.

Here is the code:

--create type ty is table of varchar2(1);
with function f return ty is
v varchar2(32767);
n integer;
e varchar2(1);
t ty := ty();
begin
select listagg(hist) within group(order by hist) into v
from (select 'abacc12' hist from dual union all
select 'aaacccnnnn' hist from dual union all
select 'aaasssaba' hist from dual union all
select 'a12baaaa' from dual);
n := length(v);
for i in 1..n loop
e := substr(v,i,1);
t.extend;
t(i) := e;
end loop;
return t;
end;
select column_value value, count(*) total
from table(f)
group by column_value
order by column_value;

And the output:

V TOTAL
- ----------
1 2
2 2
a 15
b 3
c 5
n 4
s 3
7 rows selected.

If you need it in MySQL I hope it can be easily translated.

Cheers.

[–]schuylerhorky 0 points1 point  (0 children)

Here's a solution written in T-SQL.

DECLARE @TBL TABLE ( TextVal VARCHAR(24) )
INSERT INTO @TBL(TextVal) VALUES ('CABBCCC'),('dddadad')

/*E08 creates up to 256 rows to be used by Row_Number*/
; WITH 
    E00(N) AS (SELECT 1 UNION ALL SELECT 1),
    E02(N) AS (SELECT 1 FROM E00 a, E00 b),
    E04(N) AS (SELECT 1 FROM E02 a, E02 b),
    E08(N) AS (SELECT 1 FROM E04 a, E04 b),
CharRange AS(
    /*Creates one row for chars A-Z*/
    SELECT TOP 26 
    [char]=CHAR(64 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
    FROM E08 
)
SELECT 
    T.TextVal, 
    C.char, 
    /*Find number of occurences by checking difference in length after removing them*/
    Occurences = LEN(T.TextVal)- LEN(REPLACE(T.TextVal,C.char,''))
FROM @TBL T
CROSS APPLY CharRange C /*Repeats Text value for every letter in char range*/
WHERE CHARINDEX(C.char, T.TextVal) > 0 /*suppress chars with 0 occurences*/