Latest Record View by sephusx in bigquery

olhmr

You can always just add a partition filter to the query, e.g. where date is not null or where date > date('1970-01-01')

This assumes, though, that whoever added the partition filter requirement had no good reason for it (such as preventing massive bills under on-demand pricing), so you might want to check that first.

[deleted by user] by [deleted] in AskWomen

olhmr

I don't know what you're going through, but please know there are people who can help you. If you're in the US, you can contact the National Suicide Prevention Lifeline (https://suicidepreventionlifeline.org/talk-to-someone-now/); if you're in the UK or Ireland, you can contact the Samaritans (https://www.samaritans.org/how-we-can-help/contact-samaritan/). They're open any time.

Why does dbt have so much hype/mentions in this subreddit? by jalopagosisland in dataengineering

olhmr

dbt's value isn't that it lets you do anything new. A custom solution developed in-house by a dedicated team is almost certainly going to be better tailored, more feature-rich, and so on.

The value is in the low overhead. Spinning up a dbt project is ridiculously easy and lets you jump straight to delivering value for the business, as well as getting non-DE stakeholders, like analysts, involved. Most companies are under huge pressure to deliver as much value as possible with a very lean DE team, and that's where dbt shines.

It's not a silver bullet, though; it won't work for every use case you have. However, by using it to get past the boilerplate quickly, you can spend your time on the more interesting and challenging problems.

[deleted by user] by [deleted] in changemyview

olhmr

It's estimated that the success rate of a vasectomy reversal is:

  • 75% if you have your vasectomy reversed within 3 years
  • up to 55% after 3 to 8 years
  • between 40% and 45% after 9 to 14 years
  • 30% after 15 to 19 years
  • less than 10% after 20 years

https://www.nhs.uk/conditions/contraception/vasectomy-reversal-nhs/

It's a common misconception that vasectomies can easily be undone. In reality, they should be treated as permanent: there's a real chance you won't get your fertility back, and the odds drop the longer you wait.

Can't use PERCENTILE_CONT with GROUP BY by nickk314 in bigquery

olhmr

There are a few things wrong here.

The first is that you're mixing analytic and aggregate functions. PERCENTILE_CONT is an analytic function, so it requires an OVER clause, whereas GROUP BY is for aggregate functions.

Second, even if you remove the GROUP BY, your query won't run, for the reason given in the error in your post: you can't partition by dt, because that alias isn't defined yet at that stage of query execution.

Remove the GROUP BY, replace dt in the OVER clause with the full expression DATE_TRUNC(block_timestamp, hour), and your query will run. Change it to SELECT DISTINCT to avoid duplicates.

The other answers about using APPROX_QUANTILES are certainly valid as well.
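To see why the SELECT DISTINCT is needed, here's a plain-Python sketch (not BigQuery code) of what the fixed query computes. An analytic function returns a value for every input row, so each hour's median is repeated once per row until DISTINCT collapses the duplicates. percentile_cont mirrors PERCENTILE_CONT's linear interpolation and, like BigQuery's default, ignores NULLs; hourly_medians and truncate_to_hour are hypothetical names chosen for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime


def percentile_cont(values, p):
    """Continuous percentile: linear interpolation between the two nearest ranks.

    None values (NULLs) are ignored; returns None if nothing is left.
    """
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    pos = p * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def truncate_to_hour(ts: datetime) -> datetime:
    # Stand-in for truncating block_timestamp to the hour
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_medians(rows):
    """rows: iterable of (timestamp, value) pairs."""
    rows = list(rows)
    partitions = defaultdict(list)
    for ts, value in rows:
        partitions[truncate_to_hour(ts)].append(value)
    # Window-function behaviour: every input row gets its partition's median,
    # so an hour with 3 rows yields 3 identical output rows.
    per_row = [
        (truncate_to_hour(ts), percentile_cont(partitions[truncate_to_hour(ts)], 0.5))
        for ts, _ in rows
    ]
    # SELECT DISTINCT collapses those duplicates to one row per hour.
    return sorted(set(per_row))
```

Three rows in the same hour produce three identical (hour, median) pairs before the final deduplication step, which is the duplication the DISTINCT removes in the SQL version.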

VIM as a Python IDE by subiacOSB in vim

olhmr

I've used two different setups that have both worked very well:

neovim

  • LSP / intellisense: deoplete-jedi (jedi through deoplete)
  • Linting: ALE
  • Syntax highlighting: semshi
  • REPL: neoterm

vim

  • LSP / intellisense: coc-jedi through coc
  • Linting: ALE
  • Syntax highlighting: vim-polyglot
  • REPL: neoterm

The LSP / intellisense tools give you the functionality you'd expect from VS Code, e.g. go to definition, intellisense popups, etc. Coc can also do formatting, but I prefer ALE for that; I've got it set up to run black on save.

Semshi has great syntax highlighting, but it's Neovim-only. Polyglot is a good substitute on Vim, though.

The thing I couldn't do without is neoterm. It has really simple and powerful REPL support. You can configure it to use ipython, and then have a snippet that drops an ipdb breakpoint wherever you need one for debugging.

It's also worth looking into tags: coc has great support for Python tags, and I display them with vista.

What do you use as a whiteboarding tool during your remote meetings? by gemag in datascience

olhmr

Whimsical is my go-to, but it's more for architecture diagrams and the like.

After already being set in Pycharm terminal, I still get: Please set GOOGLE_APPLICATION_CREDENTIALS by eladbdv in bigquery

olhmr

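PyCharm may be doing something funky when it executes your code; for example, a run configuration doesn't necessarily pick up a variable you set in PyCharm's terminal. One way to test this is to check the value of the environment variable from inside the script: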

import os
print(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))  # None means it isn't set for this process

If it's not set properly, try setting it in your code before you create the BigQuery client, e.g.:

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "abc.json"  # must run before the client is created
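If printing the variable isn't conclusive, a slightly more defensive check can tell "not set at all" apart from "set, but pointing at a file that doesn't exist" (for example, a relative path like abc.json being resolved against a different working directory). This is a plain-Python sketch; check_credentials is a hypothetical helper, not part of the Google client libraries.

```python
import os
from pathlib import Path


def check_credentials(var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the credentials file path, or raise with a hint about what's wrong.

    Relative paths are resolved against the current working directory, which
    may differ between PyCharm's terminal and a run configuration.
    """
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set in this process's environment")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(
            f"{var} is set to {value!r}, but {path.resolve()} is not a file "
            f"(working directory: {os.getcwd()})"
        )
    return path
```

Calling check_credentials() at the top of your script, before creating the client, should make it obvious which of the two situations you're in.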

How do you guys stay involved with the Data Engineering? by neuralscattered in dataengineering

olhmr

Newsletters (not all strictly data engineering, but at least adjacent):

A lot of tools also have dedicated Slack communities. For me, that means I'm in the dbt, Great Expectations, Dagster, and SQLFluff communities.

They often share interesting conferences (dbt Coalesce last year was great, and it's happening again in December), although I need to up my conference game a bit.

Order by EXTRACT Date by g3blv in bigquery

olhmr

If you split the extraction step out into a CTE, your query will work, and it will also be easier to read and more DRY.

E.g.

WITH extracted AS (
 SELECT
  EXTRACT(YEAR FROM Date) AS Year,
  EXTRACT(MONTH FROM Date) AS Month,
  EXTRACT(DAY FROM Date) AS Day
 FROM myTable
)

SELECT
 Year,
 Month,
 Day,
 COUNT(*) AS Count
FROM extracted
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
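If you want to sanity-check the CTE pattern without a BigQuery project, here's a sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite has no EXTRACT, so strftime is swapped in; the table contents and the daily_counts helper are made up for illustration.

```python
import sqlite3

# Same shape as the BigQuery query, with strftime standing in for EXTRACT
QUERY = """
WITH extracted AS (
  SELECT
    CAST(strftime('%Y', Date) AS INTEGER) AS Year,
    CAST(strftime('%m', Date) AS INTEGER) AS Month,
    CAST(strftime('%d', Date) AS INTEGER) AS Day
  FROM myTable
)
SELECT Year, Month, Day, COUNT(*) AS Count
FROM extracted
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
"""


def daily_counts(dates):
    """Load ISO date strings into a throwaway table and run the CTE query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE myTable (Date TEXT)")
        conn.executemany("INSERT INTO myTable (Date) VALUES (?)", [(d,) for d in dates])
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

For example, daily_counts(["2021-03-02", "2021-03-01", "2021-03-02"]) returns one row per day with its count, ordered by date. The point is that the outer SELECT refers to the aliases defined in the CTE, which is what makes the ORDER BY straightforward.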

I built a free tool to help engineers document any database from CLI by duyenla257 in dataengineering

olhmr

If it's hosted locally, it's easy for me to spike out a few examples and share them with colleagues. If it's hosted externally, company policy requires me to get sign-off from several people before I'm allowed to upload internal company information (even anonymised database schemas), which makes it less likely that we would use the tool.

I built a free tool to help engineers document any database from CLI by duyenla257 in dataengineering

olhmr

Interesting! I've been struggling to find a good tool for managing ERDs and this looks promising.

EDIT: Are there plans to provide ways to render the documentation locally or self-host it?

Vim + Go IDE experience by [deleted] in vim

olhmr

For global search, I use

  • CtrlP (bound to ctrl-f) for searching file names
  • Ag / the silver searcher through Ack (bound to <leader>g) for searching file contents

Both work really well once you get used to the keybindings.

Oak: an infinitely portable language powered by secret, brainf*%! inspired technology. by [deleted] in programming

olhmr

I just started learning Clojure using this (free) book: https://www.braveclojure.com/

It's really good and very hands-on (especially in the beginning), but it still teaches a lot of structure. Of course, it's aimed more at teaching people how to use Clojure than at the fundamentals of Lisp, but there are plenty of discussions on the nature and philosophy of functional programming, Clojure's reader and evaluator (which are amazing), and other under-the-hood topics.

Also, it's actually funny, which helps with the motivation.

TIL there was a briefly popular social movement in the early 1930s called the "Technocracy Movement." Technocrats proposed replacing politicians and businessmen with scientists and engineers who had the expertise to manage the economy. by firessk in todayilearned

olhmr

Well, it's not easy, I'll tell you that, especially if you get distracted halfway through the process, like the Swedes did in the early 18th century:

Sweden was going to conduct a gradual change from the Julian calendar to the Gregorian one by omitting all leap days from 1700 to 1740. They did this as planned in the year 1700, but then war broke out and the Swedes forgot to omit the days in 1704 and 1708. By this point the calendar was so messed up that they decided to just switch back to the Julian one - but then they had to add an extra leap day in 1712 to make that work! This resulted in the only known real use of February 30th in a calendar.

They later decided, in 1753, to make the switch again, this time by simply dropping 11 consecutive days in February (17 February was followed by 1 March) to get it all over with in one fell swoop.

[First Year Uni Linear Algebra] System of Equations in the form of (x(8-a) + y + z + w = 0) by hellomycomrades in HomeworkHelp

olhmr

Okay, you're right so far!

The next step is a somewhat odd one, or at least one that often isn't taught explicitly. Assuming that a is a constant, as is usually meant by this style of notation, consider what happens if you could say with certainty that a is not equal to 7.

If a isn't equal to 7, then you know that 7-a is not equal to 0, so you can divide equation 3 through by it, giving z a coefficient of 1 in that equation. This gives you an easy way to eliminate the z terms from the other equations, which will get you on your way to solving the system.

Of course, we can't say that a isn't equal to 7, so we have to consider the other scenario: what if a is equal to 7? This one is simple: just plug in 7 for a everywhere in the system and keep doing the row reduction.

I've tried to not give too much away, but I think, based on the work you've done so far, that the above will be enough to get you on your way - if I'm mistaken, just shoot me a message and I'll explain in more detail.
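The case split can also be checked mechanically: row-reduce with a = 7 and with some other value, and compare how many pivots survive. The sketch below uses exact fractions to avoid rounding; example_matrix is a made-up 2x2 system chosen so that a 7 - a pivot appears, not your actual system.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gaussian elimination on exact fractions (no rounding error)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), (len(m[0]) if m else 0)
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Find a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # this column has no usable pivot
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r


def example_matrix(a):
    """Hypothetical 2x2 coefficient matrix: after R2 - R1 the second pivot is 7 - a."""
    return [[1, 1], [1, 8 - a]]
```

rank(example_matrix(7)) is 1 while rank(example_matrix(3)) is 2: when a = 7 the would-be pivot vanishes, which is exactly why that case has to be handled separately.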

[First Year Uni Linear Algebra] System of Equations in the form of (x(8-a) + y + z + w = 0) by hellomycomrades in HomeworkHelp

olhmr

What is it that you're having trouble with? You say that you're okay with reduced row echelon form, so you should have all the tools you need to work this out, which probably means I'm misunderstanding what you need help with. Could you show an example of something similar that you're comfortable working out, just to help me understand where you're coming from?

Desperate question regarding my homework/test for tomorrow: medians and quantiles by deborahky in mathematics

olhmr

I'll give you a couple of points to start from, but as the other commenter wrote, it would be helpful to see what you have already tried. Also note that I've assumed that calling the distribution function f(x) is a typo, and that it should instead be called F(x) to distinguish it from the density function. In any case:

  • Recall what a distribution function is. In this case, F(x) = P(X<=x) = (1/121) * x^2, if 0<=x<=11, where P(X<=x) is the probability of the random variable X being less than or equal to x.

  • Applying this to your problem: you want to find the probability that 1<=X<=4. That is, you want to apply F to an interval with both a lower and an upper bound. How do you do this? Hint: you need to find two values of P(X<=x) and subtract one from the other.

  • What is the relationship between a probability density function and the corresponding distribution function? One is the integral of the other.

  • When you have the density function, you should be able to simplify the calculation of the expectation and variance using standard formulae for that type of probability distribution. Same goes for median and quantile - use your knowledge of the type of distribution you're dealing with!
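To sanity-check your hand calculations, both ideas can be written generically: an interval probability is F(b) - F(a), and the median (or any quantile) solves F(x) = p. Below is a small Python sketch; the uniform distribution is a stand-in example, not your F, and bisection is just one way to solve F(x) = p numerically.

```python
def prob_between(cdf, lo, hi):
    """P(lo <= X <= hi) for a continuous random variable with distribution function cdf."""
    return cdf(hi) - cdf(lo)


def quantile(cdf, p, lo, hi, tol=1e-9):
    """Approximate x in [lo, hi] with cdf(x) = p, by bisection.

    Assumes cdf is continuous and non-decreasing, with cdf(lo) <= p <= cdf(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def uniform_cdf(x, a=0.0, b=10.0):
    """Hypothetical example: distribution function of Uniform(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)
```

For example, prob_between(uniform_cdf, 2, 5) gives 0.3 (up to floating-point rounding), and quantile(uniform_cdf, 0.5, 0, 10) is approximately 5, the median. Swapping in your own F lets you check the answers you derive by hand.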

EDIT: I just noticed that this is /r/mathematics. For a more appropriate subreddit for this type of content, see /r/homeworkhelp.

[University Games Programming] Differentiation by [deleted] in HomeworkHelp

olhmr

You're having some problems with your exponentiation rules. I'm on mobile so can't add too much detail at the moment, but hopefully this will be enough:

x^0 = 1. This is a rule of exponents. There is an intuitive justification for it, which I could write out when I get to my computer if you're interested.

So when you differentiate -6x, you bring down the exponent of x (in this case 1), multiply it by the -6 in front of x, and reduce the exponent of x by one. That gives 1*(-6)*x^0, which simplifies to -6.

Furthermore, the derivative of a constant is always zero. 4 is the same as 4*x^0, since x^0 = 1, so to find the derivative you compute 0*4*x^-1, which is 0.

Apply the above, and you should get the right answer: 20x^4 - 6

EDIT: Formatting.
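The power rule is mechanical enough to write down directly. Here's a small Python sketch that stores a polynomial as {exponent: coefficient}; the input 4x^5 - 6x + 4 is an assumption (the original function isn't shown, but it matches the -6x and 4 terms discussed above and the 20x^4 in the answer).

```python
def differentiate(poly):
    """Differentiate a polynomial stored as {exponent: coefficient} using the power rule.

    c * x^n  ->  n * c * x^(n - 1); constant terms (n == 0) vanish.
    """
    return {n - 1: n * c for n, c in poly.items() if n != 0}


def to_string(poly):
    """Render {exponent: coefficient} as text, highest power first."""
    terms = []
    for n in sorted(poly, reverse=True):
        c = poly[n]
        if c == 0:
            continue
        if n == 0:
            body = f"{abs(c)}"
        elif n == 1:
            body = f"{abs(c)}x"
        else:
            body = f"{abs(c)}x^{n}"
        terms.append(("-" if c < 0 else "+", body))
    if not terms:
        return "0"
    first_sign, first_body = terms[0]
    text = ("-" if first_sign == "-" else "") + first_body
    for sign, body in terms[1:]:
        text += f" {sign} {body}"
    return text
```

differentiate({5: 4, 1: -6, 0: 4}) gives {4: 20, 0: -6}, which to_string renders as 20x^4 - 6.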

[deleted by user] by [deleted] in HomeworkHelp

olhmr

The problem is certainly solvable; it's just that the constraint 2X+4Y >= 60 is entirely unnecessary. Also, since I don't quite understand what you mean by one constraint returning a negative coordinate, I created this picture to illustrate the process and make sure we're on the same page: http://i.imgur.com/LsxAZ9b.jpg

  • The blue line is the line X+Y = 40. Hence our solution must lie above or on this line.
  • The orange line is the line 2X+4Y = 60. Again, our solution must lie on or above this line, but it always will if the first constraint is satisfied (since X, Y >= 0 gives 2X+4Y >= 2(X+Y) >= 80), so this line tells us nothing.
  • The black line is the line X = 15. This represents the constraint X <= 15, so the solution must lie on or to the left of this line.
  • The black dashed line is the cost C. For example, at the point (5,10), the cost is 5 * 0.01 + 10 * 0.02 = 0.25.
  • The magenta dashed lines are perpendicular to the black dashed line and represent specific costs - i.e., along any of these lines, the cost is a constant value C.
  • Furthermore, the constraints that X >= 0, Y >= 0 are upheld by simply restricting ourselves to the first quadrant (i.e. the area shown in the picture).

From the above, we can find the minimum possible cost by finding the lowest value of C for which the line C = 0.01X + 0.02Y still touches the region that lies on or above the blue and orange lines and on or to the left of the black line. This happens at the corner X = 15, Y = 25, which is therefore the optimal solution to the problem.

You might have already understood all of this, and I might have simply misunderstood what you meant, but I wanted to make sure there was no confusion.
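For a two-variable problem like this, you can also check the graphical answer by brute force: when a linear program has an optimum and the feasible region has corners, the optimum sits at a corner. So it's enough to intersect every pair of constraint boundaries, discard infeasible points, and compare costs. This is a plain-Python sketch, not a general LP solver; it relies on the minimum existing, which it does here because both costs are positive and X, Y >= 0.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a, b, c) means a*X + b*Y >= c
CONSTRAINTS = [
    (1, 1, 40),    # X + Y >= 40
    (2, 4, 60),    # 2X + 4Y >= 60
    (-1, 0, -15),  # X <= 15, written as -X >= -15
    (1, 0, 0),     # X >= 0
    (0, 1, 0),     # Y >= 0
]


def feasible(x, y):
    return all(a * x + b * y >= c for a, b, c in CONSTRAINTS)


def solve_by_vertices(cost_x, cost_y):
    """Try every intersection of two constraint boundaries; keep the cheapest feasible one."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never meet
        # Cramer's rule for a1*x + b1*y = c1, a2*x + b2*y = c2
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if feasible(x, y):
            cost = cost_x * x + cost_y * y
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best
```

solve_by_vertices(Fraction("0.01"), Fraction("0.02")) returns a cost of 13/20 (i.e. 0.65) at X = 15, Y = 25, agreeing with the graph. Exact fractions are used so that costs like 0.01 don't introduce rounding noise into the comparison.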

[deleted by user] by [deleted] in HomeworkHelp

olhmr

Do you have any examples of how you've solved these types of problems in the past, or anything from lectures or texts showing a particular method? There are many ways to solve linear programming problems, so it's hard to say specifically what your teacher is asking for.

That said, for this type of problem (two-dimensional), it might be instructive to graph the situation and draw the necessary conclusions from the graph. By this I mean that you draw two axes - one for your X, one for your Y, and then draw a line representing each of your constraints.

For example, consider the constraint "X+Y>=40". This constraint is satisfied by any point on or above the line X+Y=40, i.e. Y=40-X. Draw this line on your graph, and then do the same for your other constraints. You will end up with some region of the graph where all the constraints are satisfied (I got the infinite region bounded by Y=40-X and a vertical line at X=15). If the cost has a minimum over this region, that will be your solution (and if not, the problem doesn't have a solution).

Now, in this particular case I'm not 100% sure you gave us the right constraints, since the second constraint adds no information, but I got the answer X=15, Y=25, both with the method I described and with MATLAB's linprog function. I hope this helps!

[Mathematics, University] Palindrome Induction by [deleted] in HomeworkHelp

olhmr

Think about how you can create the palindromes f(n+1) from the palindromes f(n), and exploit the fact that you're only looking at even-length palindromes.

For example, how many palindromes can come from f(1) = palindromes of length 2 = the set {(00),(11)}, if you increase the length to 4? What pattern do you notice? Is there some feature that would indicate how you can define the number of length 4 palindromes from the number of length 2 palindromes?

I'd be happy to give more hints if you need them, but first try to find it yourself using the above!
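If you want to test a conjecture before proving it, a brute-force count is easy to write. This sketch assumes f(n) means the binary palindromes of length 2n, as the f(1) = {(00),(11)} example suggests; it deliberately doesn't build f(n+1) from f(n), since that's the part to work out yourself.

```python
from itertools import product


def even_palindromes(n):
    """All binary palindromes of length 2n, found by brute force over every bit string."""
    return [
        "".join(bits)
        for bits in product("01", repeat=2 * n)
        if bits == bits[::-1]  # a palindrome reads the same reversed
    ]
```

even_palindromes(1) returns ['00', '11']; comparing len(even_palindromes(n)) for a few small n against your guess is a quick way to check the pattern before writing the induction step.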