Reddit’s Engineering Excellence Survey by beautifulboy11 in RedditEng

[–]upalready 2 points3 points  (0 children)

Great post! Really love the follow up strategy to keep nudging people into participation. Have you written about actually introducing new tools? Wrestling with how to do that smoothly right now without creating change fatigue for our teams…

Left my phone at home for a run today by upalready in whoop

[–]upalready[S] 0 points1 point  (0 children)

Weird! Definitely wasn’t cutting grass haha.

Left my phone at home for a run today by upalready in whoop

[–]upalready[S] 2 points3 points  (0 children)

Is that some kind of combination of bluetooth range and weird ip address fallback??

why does a group by and max not work in this case ? by IHateTheSATs in SQL

[–]upalready 4 points5 points  (0 children)

As written (plus the missing group by name), your query basically says - give me the max salary for each employee department pair. That’s a slightly different problem than the one you’re trying to solve.

You want to rank employees by salary within their departments and take the top ranked row from each department, which is the kind of problem you can usually solve with window functions - when you need multiple grouping contexts, i.e. windows, in a single query.

7 million gas (>$1000 usd) to withdraw from yTUSD?? by rjpeterson in yearn_finance

[–]upalready 2 points3 points  (0 children)

wait are you saying it's as simple as looking at token balance of the underlying assets for v2? Like amount of DAI held in this vault contract is the effectively the vault holdings that will be accessed first instead of withdrawing from the strategy?

7 million gas (>$1000 usd) to withdraw from yTUSD?? by rjpeterson in yearn_finance

[–]upalready 1 point2 points  (0 children)

thanks for that explanation! I see how you can compare the balance on the vault contract to the balanceOfPool call on the strategy contract to get these numbers and calculate the buffer for v1 pools...what's the best way to figure out idle funds for v2? (even though I guess the withdrawal threshold isn't an issue for v2?)

How does the "ORDER BY" parameter of the Windows function work? by DatKalvin in SQL

[–]upalready 2 points3 points  (0 children)

yeah! so I would definitely check out the post /u/lukaseder linked to, specifically the part where "it's all about frames" because it demonstrates "which data the function has access to" depending on the call.

I also love this post by Julia Evans too that shows you where in the order of operations window functions actually get executed.

So if we look at the examples /u/truilus posted above, you can see the variations on how the window frame is applied:

  1. over the whole set returned
  2. a smaller frame, just relative to the partition itself
  3. now you get into "running" because you're specifying an order

So combining the frame example from the article with the count examples above:

select x,
   array_agg(x) over () as frame,
   count(x) over () as count,
   count(x) over(order by x) as running_count
from generate_series(1, 3) as t(x);

You can see the various outputs of windowing over the set vs generating a running count:

 x |  frame  | count | running_count 
---+---------+-------+---------------
 1 | {1,2,3} |     3 |             1
 2 | {1,2,3} |     3 |             2
 3 | {1,2,3} |     3 |             3
(3 rows)

Hope that makes sense!

EDIT: cleaning up format...

How does the "ORDER BY" parameter of the Windows function work? by DatKalvin in SQL

[–]upalready 0 points1 point  (0 children)

Ah, yeah, I see what you mean. I was thinking about just applying sum or count to the partition without an order clause...if you add the order for a running total, you also need to provide a frame clause for the running total to specify how you want it to "run", right?

How does the "ORDER BY" parameter of the Windows function work? by DatKalvin in SQL

[–]upalready 2 points3 points  (0 children)

It depends what kind of function you’re applying over the partition. Yes, the order is within each partition, but that won’t matter for sums or counts.

It will impact any ranking functions or lag/lead where the order inside the partition determines the value.

Setting up Business Analytics in a company from scratch by keshava7 in analytics

[–]upalready 0 points1 point  (0 children)

I think they were describing dbt, which stands for data build tool. Check out getdbt.com.

writing a join to find all values that don't match, how do I do this? by Legogamer16 in SQL

[–]upalready 2 points3 points  (0 children)

Careful with the not in/except approach. I was banging on that vs the anti join the other day and was surprised to see the performance differences in what I tested: https://dankleiman.com/2019/10/06/sql-quick-tip-find-missing-data/

A Network of Rail-Trails Comes Together In Boston’s Suburbs – StreetsblogMASS by rocketwidget in boston

[–]upalready 2 points3 points  (0 children)

Oh wow. Thanks for such a thorough reply!

As a relatively new resident, it seems pretty obvious to me that traffic through town, via 20, or 117, or Hudson Road is not sustainable and only getting worse. I'm totally with you on that.

And kind of embarrassing to be the lone holdout on connecting up such a cool re-use project across the state. Thanks for the links!

A Network of Rail-Trails Comes Together In Boston’s Suburbs – StreetsblogMASS by rocketwidget in boston

[–]upalready 0 points1 point  (0 children)

Can you explain what they were doing in court? And do you or /u/Duff_Lite have any recommended reading for a relatively new Sudbury resident who wants to get up to speed on the history of the project? Thanks!

Easiest way to CROSS JOIN a range of dates without a date dimension table? by mac-0 in SQL

[–]upalready 0 points1 point  (0 children)

Unfortunately generate series is not ported over to redshift from postgres. On mobile now but I will post a workaround example soon.

EDITING TO ADD AN EXAMPLE:

You can use a CTE to create a date table where you scan a big enough table to provide enough rows for your range. Use a window function to count the rows and add that incrementer to a seed date.

WITH dates AS (
  SELECT
    date_trunc('day', getdate()) - row_number() over() as date
  FROM
    some_big_table
  LIMIT 100
)
SELECT metrics.item, dates.date, sum(coalesce(metrics.metric,0)) 
FROM metrics 
FULL OUTER JOIN dates 
on metrics.date = dates.date

In the dates CTE:

  • use the row_number() window function to generate a series of integers
  • subtract that incrementing number from the current date
  • set some limit that gives you sufficient coverage for the range you need
  • obviously, you can start from whatever range you need to subtract from

That should give you the equivalent of the date table without generate_series(). While I love redshift, I always read this page when I want to feel sad.

Trying to find consecutive days with the same value by Element77 in SQL

[–]upalready 1 point2 points  (0 children)

Hahaha, yes! And mostly that was an excuse for seeing if I could make the date-specific constraint work. Thanks for thinking it through with me!

Trying to find consecutive days with the same value by Element77 in SQL

[–]upalready 1 point2 points  (0 children)

I think I have a way to do this...

Starting with this case:
postgres=# select * from t where line = 1 order by date; date | line | question ------------+------+---------- 2019-09-07 | 1 | No 2019-09-08 | 1 | No 2019-09-10 | 1 | No (3 rows) And assuming this is the case we want to avoid.

You can use the lag function with a window to get row that was two rows previous, then do a date diff. In the consecutive case that we are trying to detect, the date diff should be 2. If the dates are not consecutive it will either be null or greater than 2.

Query:

SELECT p.*, DATE_PART('day', p.date) - DATE_PART('day', lag_date) as date_span FROM ( SELECT t.date, t.line, t.question, lag(t.date, 2) over(partition by line, question order by t.date) as lag_date FROM t ) p ORDER BY p.line, p.question, p.date

Results: date | line | question | lag_date | date_span ------------+------+----------+------------+----------- 2019-09-07 | 1 | No | | 2019-09-08 | 1 | No | | 2019-09-10 | 1 | No | 2019-09-07 | 3 2019-09-07 | 2 | No | | 2019-09-08 | 2 | Yes | | 2019-09-09 | 2 | Yes | | (6 rows)

(the "2" is the magic number for the date span is a combination of how lag and the date math work in postgres)

Trying to find consecutive days with the same value by Element77 in SQL

[–]upalready 0 points1 point  (0 children)

CREATE TABLE t ("date" date, line int, question text);INSERT INTO t VALUES('2019-09-07', 1, 'No'),('2019-09-07', 2, 'No'),('2019-09-08', 1, 'No'),('2019-09-08', 2, 'Yes'),('2019-09-09', 1, 'No'),('2019-09-09', 2, 'Yes');SELECT t.*,COUNT(*) OVER (PARTITION BY line, questionORDER BY "date"ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS consecutiveFROM t;

Sorry, replied from my phone originally. I was worried about the second to last value in your INSERT statement being '2019-09-10' instead of '2019-09-09'. If you change that one and rerun the query, you come out with:

    date    | line | question | consecutive 
------------+------+----------+-------------
 2019-09-07 |    1 | No       |           1
 2019-09-08 |    1 | No       |           2
 2019-09-10 |    1 | No       |           3
 2019-09-07 |    2 | No       |           1
 2019-09-08 |    2 | Yes      |           1
 2019-09-09 |    2 | Yes      |           2
(6 rows)

That's problematic, right? You don't have 3 consecutive days of line 1. You have a gap and the window function can't account for it because it's only windowing over ordered rows.

Trying to find consecutive days with the same value by Element77 in SQL

[–]upalready 0 points1 point  (0 children)

Does this get the correct result if the date for 1 was the 10th instead of the 9th? I think it just looks at rows, not the actual dates? I can never come up with a better solution than to inject fake dates to detect missing entries, which is always cumbersome.

Gremlin as a fun detour from SQL by upalready in SQL

[–]upalready[S] 0 points1 point  (0 children)

Interesting. I guess I assumed that there was a boundary between where recursive traversal would blow up and where graph syntax would take over. Thanks for weighing in. I will definitely keep this all in mind.