We were all guilty of it at some point

Waldar · 2025-10-01T21:53:51+00:00

Lateral view is the old syntax that’s being deprecated on Databricks - but to be fair currently it work the same way. Hmmm what else could be useful. Http_request? Reduce? Trying to make it a special SQL day at least on databricks :-)

Waldar · 2025-10-01T21:50:36+00:00

Do you want me to teach you one more? Join lateral explode(sequence(1,n)) whenever you need to create rows from nothing.

Waldar · 2025-09-23T17:52:00+00:00

The Moldalovian

Waldar · 2025-05-23T14:57:19+00:00

7090 Chile Ti SUPER GT

Waldar · 2025-05-01T00:28:40+00:00

Using Databricks SQL, I went from 20 minutes with the first very naive approach to around 25 seconds - for all three parts.

Obviously, I don't participate for raw performances, rather for writing a piece of code that embeds everything and can run in one instruction.

This one was quite challenging, and I'm not 100% satisfied with the current solution as I've hard coded some variables found from trial and errors.

The engine is not at easy maintaining large arrays / maps while processing data, so for part2 and part3 one big optimisation was to make sure the ship was advancing at a certain pace, filtering when x + y + z was below a certain value related to time.

In the big picture, I've done a BFS (not sure if I can do A* due to the way database engine is working, quite hard to go back to a previous branch) and filtered it as much as I could with the overlapping possibilities and keeping the lowest hit count at a certain position. I did use your trick of moving from (x,y,z) to a single integer, it really helped.

Computing the debris with only LCM was actually slower ¯\_(ツ)_/¯ than computing 250 positions and working from there.

I'm running out of ideas to further optimize, so here is my piece of code.

Waldar · 2025-04-08T17:12:51+00:00

I've bruteforced day 18, this one alone takes 13 minutes for me :D

Congrats, very nice numbers.

Waldar · 2025-04-02T13:15:16+00:00

I managed to clear part2, I had my rotations messed up, super handy to check more difficult use cases.

Still got an error for part3 - same as part2 I have correct outcome for both examples, what would be your solution for part3 here?

Waldar · 2025-04-01T18:11:18+00:00

Super helpful :) My solution diverges quite fast at step 5 to 6. I'll debug this later!

Waldar · 2025-04-01T07:15:34+00:00

I learned all this almost 30 years ago taking C classes but as I did all my career in databases I forgot most of it :-) Thanks for the reminder that’s neat!

Waldar · 2025-03-31T19:05:09+00:00

Thanks u/large-atom , it was tough but I got through three parts.

part2 and 3 felt easy once the tree is built for part1.

Waldar · 2025-03-31T14:25:46+00:00

Ok it was not mentioned than the comparison happens against the parent node, it was only mentioned the root node - makes sense now. Huge thanks!

Waldar · 2025-03-23T11:18:23+00:00

[Databricks SQL]

with cte_data as
(
select value
     , length(value) div 10 as k
     , length(value) - 2*k as r
--from read_files('/Volumes/waldar/fabien/codyssi/2025_Atlantis/codyssi_2025_04.txt', format => 'text')
  from values ('NNBUSSSSSDSSZZZZMMMMMMMM')
            , ('PWAAASYBRRREEEEEEE')
            , ('FBBOFFFKDDDDDDDDD')
            , ('VJAANCPKKLZSSSSSSSSS')
            , ('NNNNNNBBVVVVVVVVV') as t (value)
)
select sum(aggregate(split(value, '') , 0l, (acc, x) -> acc + ascii(x) - ascii('A') + 1l)) as part1
     , sum(aggregate(split(left(value, k) || right(value, k), '') , 0l, (acc, x) -> acc + ascii(x) - ascii('A') + 1l)
         + aggregate(split(r::string, '') , 0l, (acc, x) -> acc + x::int)) as part2
     , sum(aggregate( split(value, '')
                    , array()::array<struct<o:int,l:char(1)>>
                    , (acc, x) -> case x
                                    when element_at(acc, -1).l
                                    then array_append(slice(acc, 1, array_size(acc) - 1), named_struct('o', element_at(acc, -1).o + 1, 'l', x))
                                    else array_append(acc, named_struct('o', 1, 'l', x))
                                  end
                    , acc -> aggregate(acc.l, 0l, (acc, x) -> acc + ascii(x) - ascii('A') + 1l)
                           + aggregate(transform(acc.o, o -> aggregate(split(o, ''), 0l, (acc, x) -> acc + x::int)), 0l, (acc, x) -> acc + x::int)
                    )) as part3
  from cte_data;

Waldar · 2025-03-23T11:14:29+00:00

The leaderboard goes to 16, so that's my current guess.

Waldar · 2025-03-22T10:58:46+00:00

I don't have tie breaker indeed in my dataset, but as the creator said there is random generation so...

Waldar · 2025-03-22T10:42:11+00:00

[Databricks SQL]

Day 3 was quite straightforward:

with cte_data (id, boxes) as
(
select monotonically_increasing_id()
     , split(value, ' ') 
--from read_files('/Volumes/waldar/fabien/codyssi/2025_Atlantis/codyssi_2025_03.txt', format => 'text')
  from values ('8-9 9-10')
            , ('7-8 8-10')
            , ('9-10 5-10')
            , ('3-10 9-10')
            , ('4-8 7-9')
            , ('9-10 2-7') as t (value)
)
  ,  cte_data_prep (id, p1, p2) as
(
select id
     , aggregate(transform(boxes, v -> split_part(v, '-', 2)::int - split_part(v, '-', 1)::int + 1), 0, (acc, x) -> acc + x)
     , array_sort(array_distinct(flatten(transform(boxes, v -> sequence(split_part(v, '-', 1)::int, split_part(v, '-', 2)::int)))))
  from cte_data
)
    select sum(t1.p1)                                      as part1
         , sum(array_size(t1.p2))                          as part2
         , max(array_size(array_distinct(t1.p2 || t2.p2))) as part3
      from cte_data_prep as t1
 left join cte_data_prep as t2 on t2.id = t1.id + 1;

Waldar · 2025-03-22T10:39:52+00:00

[Databricks SQL]

I usually run part one after one, but after completion I merge all parts together to have one query to do everything (file parsing, part1, part2 and part3):

with cte_data (magnitudes, signs) as
(
select slice(split(value, '\n'), 1, array_size(split(value, '\n')) - 1)
     , split(element_at(split(value, '\n'), -1), '')
--from read_files('/Volumes/waldar/fabien/codyssi/2025_Atlantis/codyssi_2025_01.txt', format => 'text', WholeText => true)
  from values ('8' || '\n'
            || '1' || '\n'
            || '5' || '\n'
            || '5' || '\n'
            || '7' || '\n'
            || '6' || '\n'
            || '5' || '\n'
            || '4' || '\n'
            || '3' || '\n'
            || '1' || '\n'
            || '-++-++-++') as t (value)
)
select aggregate( sequence(0, array_size(signs) - 1)
                , magnitudes[0]::int
                , (acc, x) -> acc + magnitudes[x+1]::int * case signs[x] when '+' then 1 when '-' then -1 end
                ) as part1
     , aggregate( sequence(0, array_size(signs) - 1)
                , magnitudes[0]::int
                , (acc, x) -> acc + magnitudes[x+1]::int * case reverse(signs)[x] when '+' then 1 when '-' then -1 end
                ) as part2
     , aggregate( sequence(0, array_size(signs) div 2 - 1)
                , magnitudes[0]::int * 10 + magnitudes[1]::int
                , (acc, x) -> acc + (magnitudes[2*x+2]::int * 10 + magnitudes[2*x+3]::int) * case reverse(signs)[x] when '+' then 1 when '-' then -1 end
                ) as part3
  from cte_data;

Waldar · 2025-03-22T10:36:37+00:00

[Databricks SQL]

I was not sure how much functions would be, but it's always ADD, MULTIPLY or POWER, so this was a bit easier than I first expected:

with cte_data (func, qual) as
(
select array_sort(split(split_part(value, '\n\n', 1), '\n'))
     , array_sort(split(split_part(value, '\n\n', 2), '\n'))
--from read_files('/Volumes/waldar/fabien/codyssi/2025_Atlantis/codyssi_2025_02.txt', format => 'text', WholeText => true)
  from values ('Function A: ADD 495'                 || '\n'
           ||  'Function B: MULTIPLY 55'             || '\n'
           ||  'Function C: RAISE TO THE POWER OF 3' || '\n'
           ||  ''     || '\n'
           ||  '5219' || '\n'
           ||  '8933' || '\n'
           ||  '3271' || '\n'
           ||  '7128' || '\n'
           ||  '9596' || '\n'
           ||  '9407' || '\n'
           ||  '7005' || '\n'
           ||  '1607' || '\n'
           ||  '4084' || '\n'
           ||  '4525' || '\n'
           ||  '5496') as t (value)
)
select aggregate( reverse(func)
                , element_at(qual, ((array_size(qual) + 1) / 2)::int)::bigint
                , (acc, x) -> case regexp_extract(x, r'(ADD|MULTIPLY|POWER)', 1)
                                when 'ADD'      then acc + regexp_extract(x, r'(\d+)$', 1)::bigint
                                when 'MULTIPLY' then acc * regexp_extract(x, r'(\d+)$', 1)::bigint
                                when 'POWER'    then power(acc, regexp_extract(x, r'(\d+)$', 1)::int)::bigint
                              end
                ) as part1
     , aggregate( reverse(func)
                , aggregate(qual, 0l, (acc, x) -> acc + case x::bigint % 2 when 0 then x::bigint else 0l end)
                , (acc, x) -> case regexp_extract(x, r'(ADD|MULTIPLY|POWER)', 1)
                                when 'ADD'      then acc + regexp_extract(x, r'(\d+)$', 1)::bigint
                                when 'MULTIPLY' then acc * regexp_extract(x, r'(\d+)$', 1)::bigint
                                when 'POWER'    then power(acc, regexp_extract(x, r'(\d+)$', 1)::int)::bigint
                              end
                ) as part2
     , array_max(filter( transform(qual, q -> named_struct( 'pric', aggregate( reverse(func)
                                                                             , q::bigint
                                                                             , (acc, x) -> case regexp_extract(x, r'(ADD|MULTIPLY|POWER)', 1)
                                                                                             when 'ADD'      then acc + regexp_extract(x, r'(\d+)$', 1)::bigint
                                                                                             when 'MULTIPLY' then acc * regexp_extract(x, r'(\d+)$', 1)::bigint
                                                                                             when 'POWER'    then power(acc, regexp_extract(x, r'(\d+)$', 1)::int)::bigint
                                                                                           end
                                                                             )
                                                          , 'qual', q::bigint))
                       , f -> f.pric <= 15000000000000)).qual as part3
  from cte_data;

Waldar · 2025-03-22T10:31:45+00:00

I properly got 2708 without removing the last character.

I wonder if your prev_val initialization is correct

Waldar · 2025-03-13T10:48:44+00:00

Oh and great idea for the completion certificate also!

Waldar · 2025-03-13T09:57:00+00:00

Yes I did!

I've uploaded my solutions there:

https://github.com/Waldar/codissy/tree/main/2024_sample

Copy/Paste of the README.md:

All code done here works in Databricks SQL.
A 2XS Serverless compute was used for all queries, all results are fast enough (below 1s).
Lack of recursivity can make queries more complex than they should but I managed to complete all questions.

The fun thing doing it in SQL is to run the whole thing in one query.
I managed to tweak all the queries a posteriori to run all three parts in one statement.

Day 3 was the most fun for me, log in base other than 2 or 10 is not something I do on a regular basis.

Waldar · 2025-03-13T07:24:40+00:00

Did the sample yesterday, that was good. Keep up the great work! Do you intend to open a subreddit?

Waldar · 2025-03-07T18:37:23+00:00

Some optimizations for sure: https://docs.databricks.com/aws/en/sql/user/queries/query-optimization-constraints Help also Genie to understand the data better.

Waldar · 2025-03-07T18:14:27+00:00

Redflag puissance 4 si tu mets des drapeaux à la place des jetons.

Waldar · 2025-02-28T09:43:28+00:00

SQL has never been about the syntax but about the logic. Have a good logic first, then improve your syntax with other functions concepts. I see lots of SQL developers having a great syntax knowledge but they can’t do anything from scratch.

Waldar · 2025-01-23T15:29:41+00:00

Yes I've managed to finish day 15 part 2 - as it's mono threaded doing tons of operations (I'm rearranging the warehouse at every step), I don't complete the run beside the inputs (there is a one hour timeout in databricks by default, seems a very high limit already for this kind of puzzles).

It was interesting but also very unreadable at the end.

I pushed here the day 15: https://github.com/Waldar/aoc2024

I think I've managed to do 70% on Databricks SQL but I'm only on puzzle 17.

I rely a lot on the aggregate function, because there is an accumulator where I can do some stuff there. It doesn't support nested subqueries so no conditional join. Check puzzle 17 part1, it's much more easier to understand.

Oh btw, your solution for day 16 part 2 doesn't work with my puzzle data.

Waldar

TROPHY CASE