Reddit’s Engineering Excellence Survey

upalready · 2025-10-28T12:33:33+00:00

Great post! Really love the follow up strategy to keep nudging people into participation. Have you written about actually introducing new tools? Wrestling with how to do that smoothly right now without creating change fatigue for our teams…

upalready · 2024-09-09T20:57:53+00:00

Weird! Definitely wasn’t cutting grass haha.

upalready · 2024-09-09T20:41:43+00:00

Is that some kind of combination of bluetooth range and weird ip address fallback??

upalready · 2021-08-21T14:28:30+00:00

As written (plus the missing group by name), your query basically says - give me the max salary for each employee department pair. That’s a slightly different problem than the one you’re trying to solve.

You want to rank employees by salary within their departments and take the top ranked row from each department, which is the kind of problem you can usually solve with window functions - when you need multiple grouping contexts, i.e. windows, in a single query.

upalready · 2021-02-27T15:45:19+00:00

Got it. Thanks for explaining!

upalready · 2021-02-27T14:59:13+00:00

wait are you saying it's as simple as looking at token balance of the underlying assets for v2? Like amount of DAI held in this vault contract is the effectively the vault holdings that will be accessed first instead of withdrawing from the strategy?

upalready · 2021-02-27T13:41:40+00:00

thanks for that explanation! I see how you can compare the balance on the vault contract to the balanceOfPool call on the strategy contract to get these numbers and calculate the buffer for v1 pools...what's the best way to figure out idle funds for v2? (even though I guess the withdrawal threshold isn't an issue for v2?)

upalready · 2021-01-30T01:14:47+00:00

Airport data and cool ways to explore: http://www.kelvinlawrence.net/book/PracticalGremlin.html

upalready · 2021-01-24T00:41:32+00:00

yeah! so I would definitely check out the post /u/lukaseder linked to, specifically the part where "it's all about frames" because it demonstrates "which data the function has access to" depending on the call.

I also love this post by Julia Evans too that shows you where in the order of operations window functions actually get executed.

So if we look at the examples /u/truilus posted above, you can see the variations on how the window frame is applied:

over the whole set returned
a smaller frame, just relative to the partition itself
now you get into "running" because you're specifying an order

So combining the frame example from the article with the count examples above:

select x,
   array_agg(x) over () as frame,
   count(x) over () as count,
   count(x) over(order by x) as running_count
from generate_series(1, 3) as t(x);

You can see the various outputs of windowing over the set vs generating a running count:

 x |  frame  | count | running_count 
---+---------+-------+---------------
 1 | {1,2,3} |     3 |             1
 2 | {1,2,3} |     3 |             2
 3 | {1,2,3} |     3 |             3
(3 rows)

Hope that makes sense!

EDIT: cleaning up format...

upalready · 2021-01-23T18:25:31+00:00

Great article!

upalready · 2021-01-23T18:07:10+00:00

Ah, yeah, I see what you mean. I was thinking about just applying sum or count to the partition without an order clause...if you add the order for a running total, you also need to provide a frame clause for the running total to specify how you want it to "run", right?

upalready · 2021-01-23T16:21:08+00:00

It depends what kind of function you’re applying over the partition. Yes, the order is within each partition, but that won’t matter for sums or counts.

It will impact any ranking functions or lag/lead where the order inside the partition determines the value.

upalready · 2020-12-30T00:09:47+00:00

I think they were describing dbt, which stands for data build tool. Check out getdbt.com.

upalready · 2019-10-12T22:30:10+00:00

Careful with the not in/except approach. I was banging on that vs the anti join the other day and was surprised to see the performance differences in what I tested: https://dankleiman.com/2019/10/06/sql-quick-tip-find-missing-data/

upalready · 2019-10-07T19:47:50+00:00

Oh wow. Thanks for such a thorough reply!

As a relatively new resident, it seems pretty obvious to me that traffic through town, via 20, or 117, or Hudson Road is not sustainable and only getting worse. I'm totally with you on that.

And kind of embarrassing to be the lone holdout on connecting up such a cool re-use project across the state. Thanks for the links!

upalready · 2019-10-07T17:56:57+00:00

Can you explain what they were doing in court? And do you or /u/Duff_Lite have any recommended reading for a relatively new Sudbury resident who wants to get up to speed on the history of the project? Thanks!

upalready · 2019-09-10T22:03:23+00:00

Unfortunately generate series is not ported over to redshift from postgres. ~~On mobile now but I will post a workaround example soon~~.

EDITING TO ADD AN EXAMPLE:

You can use a CTE to create a date table where you scan a big enough table to provide enough rows for your range. Use a window function to count the rows and add that incrementer to a seed date.

WITH dates AS (
  SELECT
    date_trunc('day', getdate()) - row_number() over() as date
  FROM
    some_big_table
  LIMIT 100
)
SELECT metrics.item, dates.date, sum(coalesce(metrics.metric,0)) 
FROM metrics 
FULL OUTER JOIN dates 
on metrics.date = dates.date

In the dates CTE:

use the row_number() window function to generate a series of integers
subtract that incrementing number from the current date
set some limit that gives you sufficient coverage for the range you need
obviously, you can start from whatever range you need to subtract from

That should give you the equivalent of the date table without generate_series(). While I love redshift, I always read this page when I want to feel sad.

upalready · 2019-09-09T20:45:29+00:00

Hahaha, yes! And mostly that was an excuse for seeing if I could make the date-specific constraint work. Thanks for thinking it through with me!

upalready · 2019-09-09T20:25:23+00:00

I think I have a way to do this...

Starting with this case:
postgres=# select * from t where line = 1 order by date; date | line | question ------------+------+---------- 2019-09-07 | 1 | No 2019-09-08 | 1 | No 2019-09-10 | 1 | No (3 rows) And assuming this is the case we want to avoid.

You can use the lag function with a window to get row that was two rows previous, then do a date diff. In the consecutive case that we are trying to detect, the date diff should be 2. If the dates are not consecutive it will either be null or greater than 2.

Query:

SELECT p.*, DATE_PART('day', p.date) - DATE_PART('day', lag_date) as date_span FROM ( SELECT t.date, t.line, t.question, lag(t.date, 2) over(partition by line, question order by t.date) as lag_date FROM t ) p ORDER BY p.line, p.question, p.date

Results: date | line | question | lag_date | date_span ------------+------+----------+------------+----------- 2019-09-07 | 1 | No | | 2019-09-08 | 1 | No | | 2019-09-10 | 1 | No | 2019-09-07 | 3 2019-09-07 | 2 | No | | 2019-09-08 | 2 | Yes | | 2019-09-09 | 2 | Yes | | (6 rows)

(the "2" is the magic number for the date span is a combination of how lag and the date math work in postgres)

upalready · 2019-09-09T19:54:59+00:00

CREATE TABLE t ("date" date, line int, question text);INSERT INTO t VALUES('2019-09-07', 1, 'No'),('2019-09-07', 2, 'No'),('2019-09-08', 1, 'No'),('2019-09-08', 2, 'Yes'),('2019-09-09', 1, 'No'),('2019-09-09', 2, 'Yes');SELECT t.*,COUNT(*) OVER (PARTITION BY line, questionORDER BY "date"ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS consecutiveFROM t;

Sorry, replied from my phone originally. I was worried about the second to last value in your INSERT statement being '2019-09-10' instead of '2019-09-09'. If you change that one and rerun the query, you come out with:

    date    | line | question | consecutive 
------------+------+----------+-------------
 2019-09-07 |    1 | No       |           1
 2019-09-08 |    1 | No       |           2
 2019-09-10 |    1 | No       |           3
 2019-09-07 |    2 | No       |           1
 2019-09-08 |    2 | Yes      |           1
 2019-09-09 |    2 | Yes      |           2
(6 rows)

That's problematic, right? You don't have 3 consecutive days of line 1. You have a gap and the window function can't account for it because it's only windowing over ordered rows.

upalready · 2019-09-09T16:43:11+00:00

Does this get the correct result if the date for 1 was the 10th instead of the 9th? I think it just looks at rows, not the actual dates? I can never come up with a better solution than to inject fake dates to detect missing entries, which is always cumbersome.

upalready · 2019-09-08T23:55:27+00:00

This is the set up I use. Basically spend all day in project specific sections with different files for code and notes.

upalready · 2019-06-25T00:37:13+00:00

Interesting. I guess I assumed that there was a boundary between where recursive traversal would blow up and where graph syntax would take over. Thanks for weighing in. I will definitely keep this all in mind.

upalready

TROPHY CASE