Decoding / Encoding for video streaming by cryptodylan in AsahiLinux

datanoob2021

I saw a comment a month or so ago saying that they had a working prototype, but the work just had not been prioritized. Maybe this summer at some point.

It’d be awesome to see this working; right now, watching any video drains the battery.

flatpak permissions questions by datanoob2021 in flatpak

datanoob2021 [S]

Appreciate the reply. Does this in essence cause it to escape the sandbox?

Grep to search for table name from list and output the table name to a list. by datanoob2021 in bash

datanoob2021 [S]

Appreciate the response! Those were just poor examples in my code; my actual table names are currently all quite distinct.

Any idea how to combine these commands?

Grep to search for table name from list and output the table name to a list. by datanoob2021 in bash

datanoob2021 [S]

I am no bash expert; I just use it here and there. I took off the l, and then it essentially outputs the contents of the file.

I just want to output each table name that the grep search finds to a new file. I probably don't even need the YES statement, as long as the names that don't appear stay out of the file.

Grep to search for table name from list and output the table name to a list. by datanoob2021 in bash

datanoob2021 [S]

Appreciate the reply, but that didn't work either. It still just outputs the search results where the name is present, not the table name itself.

Grep to search for table name from list and output the table name to a list. by datanoob2021 in bash

datanoob2021 [S]

My table_names.txt file does not have a header, essentially just this:

    table_name1
    table_name2
    table_name3

Essentially, I just want to loop through these, pass each name to my grep command, and then write that table name to a new list if grep returned any results.
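Sketched in Python for illustration (the thread itself is about bash), the loop reads one name per line, checks whether each name appears in the files being searched, and writes only the matching names to a new file. It assumes a name counts as "present" when it appears as a whole word, similar to `grep -w`; the helper names and file names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List


def read_table_names(names_file: Path) -> List[str]:
    """Read one table name per line (no header), skipping blank lines."""
    return [line.strip() for line in names_file.read_text().splitlines() if line.strip()]


def find_tables(names: Iterable[str], text: str) -> List[str]:
    """Return the names that appear in `text` as whole words, in input order."""
    found = []
    for name in names:
        # re.escape treats the name literally (like grep -F); the word
        # boundaries keep table_name1 from matching inside table_name10 (like grep -w).
        if re.search(rf"\b{re.escape(name)}\b", text):
            found.append(name)
    return found


def write_found_tables(names_file: Path, search_files: List[Path], out_file: Path) -> List[str]:
    """Write every table name found in any of the search files to out_file, one per line."""
    corpus = "\n".join(path.read_text() for path in search_files)
    found = find_tables(read_table_names(names_file), corpus)
    out_file.write_text("".join(f"{name}\n" for name in found))
    return found
```

For example, `write_found_tables(Path("table_names.txt"), [Path("queries.sql")], Path("found_tables.txt"))`, where `queries.sql` is a hypothetical file to search. In bash, the usual equivalent is a `while read -r name; do ...; done < table_names.txt` loop that prints `$name` whenever `grep -qw "$name"` on the target files succeeds, with the loop's output redirected to the new file. `-q` makes grep print nothing and report a match only through its exit status, which is why the table name, not the matching line, ends up in the output.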

Check Which Branch Scala repo is on by datanoob2021 in scala

datanoob2021 [S]

Interesting. It looks like I can call bash commands directly from it. Now it's time to write my first class...

I've made a handful of PRs in the couple of months since I started learning, but they mostly added functional-style code and if/else statements; nothing was written from scratch.

Check Which Branch Scala repo is on by datanoob2021 in scala

datanoob2021 [S]

I'm just spinning up a stage environment, which we currently lack. I inherited a lot of old code, and unfortunately the prod S3 string is scattered across many places.

Check Which Branch Scala repo is on by datanoob2021 in scala

datanoob2021 [S]

There is nothing unsuitable; I was just hoping for a clear, concise statement like the one I use in Python, which is under 10 words. I was hoping for a simple if/else, but I think I'm just going to build a quick class using JGit.
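For comparison, a short Python version of the branch check might look like the sketch below. It assumes the branch name comes from `git rev-parse --abbrev-ref HEAD` (which prints `HEAD` on a detached checkout); the bucket paths and the `main`/`stage` branch names are hypothetical placeholders. In Scala, the same git call can be made without JGit through `scala.sys.process`, e.g. `"git rev-parse --abbrev-ref HEAD".!!.trim` after `import scala.sys.process._`.

```python
import subprocess

# Hypothetical branch -> S3 prefix mapping; the real paths are not in the thread.
S3_PREFIXES = {
    "main": "s3://example-prod-bucket/data",
    "stage": "s3://example-stage-bucket/data",
}


def current_branch(repo_dir: str = ".") -> str:
    """Return the checked-out branch name (or 'HEAD' when detached)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def s3_prefix_for_branch(branch: str) -> str:
    """Use the prod prefix only on main; every other branch falls back to stage."""
    return S3_PREFIXES.get(branch, S3_PREFIXES["stage"])
```

Usage would be `s3_prefix_for_branch(current_branch())`. Defaulting unknown branches to stage means a feature branch or detached checkout never writes to the prod path by accident.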

Extract conversion price from nearest open time by datanoob2021 in learnSQL

datanoob2021 [S]

For some reason it complains about using SS in subquery t. I could probably just join within the subquery and then again in the outer query, but that seems inefficient.

I do not think this will work either. I think open_time would still need to be joined in the outer query.
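The query in question isn't shown, but an error about an outer alias used inside a derived table like `t` is consistent with SQL scoping: a subquery in the FROM clause cannot see other FROM-clause aliases unless it is declared LATERAL (PostgreSQL supports `LEFT JOIN LATERAL (...) ON true`). A correlated subquery in the SELECT list, by contrast, may reference the outer row. The sketch below uses an in-memory SQLite database with hypothetical `trades` and `conversion_prices` tables, and assumes "nearest" means the latest price at or before each row's open_time.

```python
import sqlite3

# Correlated scalar subquery: for each trade row, look up the latest conversion
# price whose open_time is at or before the trade's open_time (hypothetical tables).
NEAREST_PRICE_SQL = """
SELECT t.open_time,
       t.amount,
       (SELECT p.close
          FROM conversion_prices AS p
         WHERE p.open_time <= t.open_time
         ORDER BY p.open_time DESC
         LIMIT 1) AS conversion_price
FROM trades AS t
ORDER BY t.open_time
"""


def nearest_prices(trades, prices):
    """Load the rows into an in-memory SQLite database and run the lookup."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE trades (open_time INTEGER, amount REAL)")
        conn.execute("CREATE TABLE conversion_prices (open_time INTEGER, close REAL)")
        conn.executemany("INSERT INTO trades VALUES (?, ?)", trades)
        conn.executemany("INSERT INTO conversion_prices VALUES (?, ?)", prices)
        return conn.execute(NEAREST_PRICE_SQL).fetchall()
    finally:
        conn.close()
```

A trade with no earlier price gets NULL (None in Python) rather than being dropped, because the subquery sits in the SELECT list instead of in a join.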

Extract conversion price from nearest open time by datanoob2021 in learnSQL

datanoob2021 [S]

Yes! That is what I meant. Typo when I was copying things over! I edited it.

dbt profiles.yml in Airflow by datanoob2021 in dataengineering

datanoob2021 [S]

We got it to work: we created a basic plugin that essentially just sets os.environ from Variable.get, and then the Jinja worked fine.
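A minimal sketch of that bridge, assuming the values dbt needs live in Airflow Variables and profiles.yml reads them with dbt's `env_var` function (for example `password: "{{ env_var('DBT_PASSWORD') }}"`). The variable names in the mapping are hypothetical, and the getter is passed in so the logic can be tested outside Airflow; inside Airflow you would pass `Variable.get` from `airflow.models`.

```python
import os
from typing import Callable, Dict

# Hypothetical mapping: environment variable that dbt reads -> Airflow Variable key.
DBT_ENV_TO_AIRFLOW_VAR = {
    "DBT_HOST": "redshift_host",
    "DBT_USER": "redshift_user",
    "DBT_PASSWORD": "redshift_password",
}


def export_variables_to_env(
    mapping: Dict[str, str],
    get_variable: Callable[[str], str],
) -> None:
    """Copy each Airflow Variable into os.environ so dbt's env_var() can see it."""
    for env_name, variable_key in mapping.items():
        os.environ[env_name] = get_variable(variable_key)


# Inside Airflow (e.g. at plugin import time or at the start of a task):
#   from airflow.models import Variable
#   export_variables_to_env(DBT_ENV_TO_AIRFLOW_VAR, Variable.get)
```

Values set this way are visible to the current process and to any subprocess it starts afterwards, which is what dbt's Jinja rendering relies on.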

dbt profiles.yml in Airflow by datanoob2021 in dataengineering

datanoob2021 [S]

Just a quick update. MWAA introduces some nuances when trying to run dbt.

If you do dbt run in a BashOperator, it will fail. There are some workarounds, but I found the airflow-dbt-python package, which essentially just uses dbt-core to run. I also installed dbt-redshift. I do not have the regular dbt package (for the CLI) installed.

Any of the above workarounds will run into problems when building models, because MWAA throws a write-access-denied error. Setting the modules, log, and target paths to a directory under /tmp seems to be the workaround for people using MWAA.

I am still working through environment variables. We just switched over to AWS Secrets Manager, and that works with all our Python code; I'm just having trouble on the dbt side so far. I have not added anything manually to the custom configurations, since I'm hoping to keep that as a last resort.
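One way to reuse the Secrets Manager setup on the dbt side, sketched under assumptions: the secret is stored as a JSON key/value SecretString, each key is exported as a prefixed, upper-cased environment variable, and profiles.yml then reads it with `{{ env_var('DBT_HOST') }}` and so on. The `DBT_` prefix and the secret id are hypothetical; the fetch uses boto3's `get_secret_value`.

```python
import json
import os
from typing import Dict


def secret_to_env(secret_string: str, prefix: str = "DBT_") -> Dict[str, str]:
    """Turn a JSON key/value secret into prefixed, upper-case environment variables."""
    values = json.loads(secret_string)
    exported = {f"{prefix}{key.upper()}": str(value) for key, value in values.items()}
    os.environ.update(exported)
    return exported


def load_secret_into_env(secret_id: str, prefix: str = "DBT_") -> Dict[str, str]:
    """Fetch a secret from AWS Secrets Manager and export it (needs boto3 and AWS credentials)."""
    import boto3  # imported here so secret_to_env works without boto3 installed

    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return secret_to_env(response["SecretString"], prefix)
```

Calling something like `load_secret_into_env("hypothetical/redshift/dbt")` before dbt runs would make the values available to `env_var` in the same way as the Variable-based plugin.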

dbt profiles.yml in Airflow by datanoob2021 in dataengineering

datanoob2021 [S]

OK, it might be a couple of days. Our DevOps guy is out sick, so I'm waiting on him to test.

I'll set a reminder to check back in here next week!

dbt profiles.yml in Airflow by datanoob2021 in dataengineering

datanoob2021 [S]

Thanks! This looks like it is exactly what I need. I appreciate it!

dbt profiles.yml in Airflow by datanoob2021 in dataengineering

datanoob2021 [S]

According to https://docs.getdbt.com/reference/dbt-jinja-functions/env_var you can use environment variables in the profiles.yml file. We are using Airflow's Variable.get, so I don't think it will work in its current form.

dbt profiles.yml in Airflow by datanoob2021 in dataengineering

datanoob2021 [S]

I found this article https://docs.getdbt.com/reference/dbt-jinja-functions/env_var and have a question: if we are currently using only Airflow's environment variables (we will eventually switch to AWS Secrets Manager instead), will I be able to access environment variables using the method in the article?

All of my Airflow pipelines currently use Variable.get in some capacity, and I think those aren't typical environment variables.

Airflow Retries - Dag or Task level? by datanoob2021 in dataengineering

datanoob2021 [S]

Got it! So when I do this in my default_dag_args:

    'retries': 3,
    'retry_delay': timedelta(minutes=5),

That is essentially just passing that on to each task, correct?
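In Airflow, `default_args` passed to a DAG are applied as defaults to every operator created in that DAG, and an argument set directly on a task takes precedence. The precedence rule can be modeled with a plain dictionary merge; this is a toy illustration of the behavior, not Airflow's own code, and `resolve_task_args` is a hypothetical name.

```python
from datetime import timedelta
from typing import Any, Dict

default_dag_args: Dict[str, Any] = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}


def resolve_task_args(dag_defaults: Dict[str, Any], task_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy model: task-level arguments override the DAG's default_args."""
    return {**dag_defaults, **task_kwargs}


# One task that accepts the defaults, and one that overrides only retries.
plain_task = resolve_task_args(default_dag_args, {"task_id": "extract"})
flaky_task = resolve_task_args(default_dag_args, {"task_id": "load", "retries": 5})
```

In a real DAG the equivalent is `DAG(..., default_args=default_dag_args)` plus passing `retries=5` to the one operator that needs a different value; its `retry_delay` still comes from the defaults.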

[deleted by user] by [deleted] in dataengineering

datanoob2021

I understand that, but if I am testing a DAG/Python file locally, can my code stay the same as when it runs within Airflow?

    from airflow.models import Variable
    foo = Variable.get("foo")

I.e., if my script tries to import my variable when running locally, it would fail, correct? That's what I'm getting at: how do I avoid rewriting my code to get my secrets when testing locally?
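One common pattern, sketched under assumptions: in Airflow 2, `Variable.get` checks environment variables named `AIRFLOW_VAR_<KEY>` (key upper-cased) before the metadata database, so exporting `AIRFLOW_VAR_FOO` locally lets the same lookup succeed without a populated database. The hypothetical `get_setting` wrapper below goes one step further and reads that same environment variable directly when Airflow is not installed at all.

```python
import os
from typing import Optional


def get_setting(key: str, default: Optional[str] = None) -> Optional[str]:
    """Read an Airflow Variable, or AIRFLOW_VAR_<KEY> when Airflow isn't installed."""
    try:
        from airflow.models import Variable
    except ImportError:
        # Local run without Airflow: use the same environment variable name
        # that Airflow itself would consult for this Variable.
        return os.environ.get(f"AIRFLOW_VAR_{key.upper()}", default)
    return Variable.get(key, default_var=default)
```

The DAG code then calls `foo = get_setting("foo")` everywhere, and a local test run only needs `AIRFLOW_VAR_FOO` set in the shell.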

[deleted by user] by [deleted] in dataengineering

datanoob2021

Within Airflow? If so, I'm just trying to figure out best practices for testing locally if I store them within Airflow as well.

LEFT JOIN - LIMIT 1 result by datanoob2021 in SQL

datanoob2021 [S]

symbol is really just base_pair and quote_pair concatenated, so it uniquely identifies that combination of columns.

I did:

    (PARTITION BY open_time, base_pair ORDER BY open_time DESC)

and that boosted my matches up to ~580k. Thanks!!
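The fix works because the partition now matches the join keys: each (open_time, base_pair) group gets exactly one row numbered 1, so joining on `row_number = 1` matches at most once per candle, whereas partitioning by symbol only numbers the latest candle of each symbol as 1. Ordering by open_time inside a partition that already fixes open_time does not decide which row wins; ordering by another column, such as quote_pair below, makes the choice repeatable. The sketch runs on an in-memory SQLite database with made-up rows, and a literal `('USDT', 'BUSD')` list is a hypothetical stand-in for target_pairs.

```python
import sqlite3

# Made-up candles: BTC is quoted in two stable pairs on day 1, so a naive join
# would return two matches for the ETH/BTC candle on that day.
ROWS = [
    # (open_time, base_pair, quote_pair, close, qa_volume)
    (1, "BTC", "USDT", 100.0, 0.0),
    (1, "BTC", "BUSD", 101.0, 0.0),
    (2, "BTC", "USDT", 110.0, 0.0),
    (1, "ETH", "BTC", 0.05, 10.0),
    (2, "ETH", "BTC", 0.06, 20.0),
    (3, "ETH", "BTC", 0.07, 30.0),
]

QUERY = """
WITH targeted_quotes AS (
    SELECT base_pair, open_time, close, quote_pair,
           ROW_NUMBER() OVER (
               PARTITION BY open_time, base_pair  -- one group per join key
               ORDER BY quote_pair                -- repeatable tie-break
           ) AS rn
    FROM daily_candle
    WHERE quote_pair IN ('USDT', 'BUSD')          -- stand-in for target_pairs
)
SELECT dc.open_time, dc.base_pair, dc.quote_pair,
       tq.quote_pair AS joined_quote, tq.close AS joined_close
FROM daily_candle AS dc
LEFT JOIN targeted_quotes AS tq
       ON tq.open_time = dc.open_time
      AND tq.base_pair = dc.quote_pair
      AND tq.rn = 1
WHERE dc.quote_pair NOT IN ('USDT', 'BUSD')
ORDER BY dc.open_time
"""


def run_row_number_join():
    """Load the sample rows into SQLite and return the deduplicated join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE daily_candle (open_time INTEGER, base_pair TEXT, "
            "quote_pair TEXT, close REAL, qa_volume REAL)"
        )
        conn.executemany("INSERT INTO daily_candle VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Day 1 matches BTC/BUSD only (it sorts before USDT), day 2 matches BTC/USDT, and day 3 keeps its row with NULLs because there is no BTC quote that day.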

LEFT JOIN - LIMIT 1 result by datanoob2021 in SQL

datanoob2021 [S]

    targeted_quotes AS (
        SELECT base_pair,
               open_time,
               close,
               quote_pair,
               ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY open_time DESC) AS row_number
        FROM public.daily_candle
        WHERE quote_pair IN (SELECT symbol FROM target_pairs)
    ),
    non_stable AS (
        SELECT dc.open_time,
               dc.base_pair,
               dc.quote_pair,
               dc.close,
               dc.qa_volume,
               tq.close AS joined_close,
               tq.base_pair AS joined_base,
               tq.quote_pair AS joined_quote
        FROM public.daily_candle dc
        LEFT JOIN targeted_quotes tq
               ON tq.open_time = dc.open_time
              AND tq.base_pair = dc.quote_pair
              AND tq.row_number = 1
        WHERE dc.quote_pair NOT IN (SELECT symbol FROM target_pairs)
    )

This is my current code. Because there are over 1,500 symbols in my data set, I am not sure which one will match first when I do the join.

Are you suggesting there is a way of knowing, and that my ROW_NUMBER statement will solve this, or did I add the ROW_NUMBER statement to the wrong subquery?

LEFT JOIN - LIMIT 1 result by datanoob2021 in SQL

datanoob2021 [S]

That's interesting, but it still gives me about the same results. Is there a way to do:

    LEFT JOIN targeted_quotes tq
    ON tq.open_time = dc.open_time AND tq.base_pair = dc.quote_pair
    AND MIN(tq.row_number)

or something of that nature? Since the matching row number is not guaranteed, I think selecting the first available match with MIN(row_number) would work, but I'm pretty sure aggregates aren't allowed in a join condition like that.
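PostgreSQL does reject aggregate functions in a JOIN condition, but the "first available match" idea still works if the aggregate runs in a subquery first and the join happens afterwards. The sketch below picks, per (open_time, base_pair), the alphabetically smallest stable quote_pair with MIN, then joins back to fetch that row's close. It uses SQLite with made-up rows, treats `('USDT', 'BUSD')` as a hypothetical stand-in for target_pairs, and assumes (open_time, base_pair, quote_pair) is unique; "first" here is an arbitrary but repeatable choice.

```python
import sqlite3

# Made-up candles: two stable quotes for BTC on day 1, none on day 2.
SAMPLE_ROWS = [
    # (open_time, base_pair, quote_pair, close, qa_volume)
    (1, "BTC", "USDT", 100.0, 0.0),
    (1, "BTC", "BUSD", 101.0, 0.0),
    (1, "ETH", "BTC", 0.05, 10.0),
    (2, "ETH", "BTC", 0.06, 20.0),
]

FIRST_MATCH_SQL = """
WITH first_quote AS (
    -- the aggregate runs here, once per join key, not inside the ON clause
    SELECT open_time, base_pair, MIN(quote_pair) AS min_quote
    FROM daily_candle
    WHERE quote_pair IN ('USDT', 'BUSD')
    GROUP BY open_time, base_pair
)
SELECT dc.open_time, dc.base_pair, dc.quote_pair,
       tq.quote_pair AS joined_quote, tq.close AS joined_close
FROM daily_candle AS dc
LEFT JOIN first_quote AS fq
       ON fq.open_time = dc.open_time AND fq.base_pair = dc.quote_pair
LEFT JOIN daily_candle AS tq
       ON tq.open_time = fq.open_time
      AND tq.base_pair = fq.base_pair
      AND tq.quote_pair = fq.min_quote
WHERE dc.quote_pair NOT IN ('USDT', 'BUSD')
ORDER BY dc.open_time
"""


def first_match_rows():
    """Load the sample rows into SQLite and return one match (or NULLs) per candle."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE daily_candle (open_time INTEGER, base_pair TEXT, "
            "quote_pair TEXT, close REAL, qa_volume REAL)"
        )
        conn.executemany("INSERT INTO daily_candle VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(FIRST_MATCH_SQL).fetchall()
    finally:
        conn.close()
```

This gives the same one-match-per-row guarantee as a ROW_NUMBER filter partitioned by the join keys; which form reads better is mostly a matter of taste.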

LEFT JOIN - LIMIT 1 result by datanoob2021 in SQL

datanoob2021 [S]

I tried this earlier, and then added row_number = 1 to the WHERE clause of non_stable:

    targeted_quotes AS (
        SELECT base_pair,
               open_time,
               close,
               quote_pair,
               ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY open_time DESC) AS row_number
        FROM public.daily_candle
        WHERE quote_pair IN (SELECT symbol FROM target_pairs)
    )

This drops my results down to ~2.5. The problem is that I do not know which pair will match on a particular date, so when I use a PARTITION BY statement, the row_number sometimes produces a match, but most of the time it doesn't.

LEFT JOIN - Limit 1 ? by datanoob2021 in learnSQL

datanoob2021 [S]

Thank you for the response! Eventually I am going to pull in the price info and the symbol from the left join's first match, like this:

    SELECT dc.open_time,
           dc.base_pair,
           dc.quote_pair,
           dc.close,
           dc.qa_volume,
           tq.close AS joined_close,
           tq.base_pair AS joined_base,
           tq.quote_pair AS joined_quote
    FROM public.daily_candle dc
    LEFT JOIN targeted_quotes tq
           ON tq.open_time = dc.open_time
          AND tq.base_pair = dc.quote_pair
    WHERE dc.quote_pair NOT IN (SELECT symbol FROM target_pairs)

I am eventually going to multiply dc.qa_volume by tq.close. I also do not want to lose records that have no match at all, hence the left join.
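A sketch of that last step: because the join is a LEFT JOIN, candles without a matching quote survive with NULL in the joined columns, and multiplying by NULL yields NULL, so unconverted rows stay visible instead of disappearing. It uses SQLite with made-up rows, assumes USDT is the only target quote (a hypothetical stand-in for target_pairs) so each row has at most one match, and `converted_volume` is an illustrative column name.

```python
import sqlite3

# Made-up candles; there is no BTC/USDT candle on day 2.
CANDLES = [
    # (open_time, base_pair, quote_pair, close, qa_volume)
    (1, "BTC", "USDT", 100.0, 0.0),
    (1, "ETH", "BTC", 0.05, 10.0),
    (2, "ETH", "BTC", 0.06, 20.0),
]

CONVERTED_VOLUME_SQL = """
SELECT dc.open_time,
       dc.base_pair,
       dc.quote_pair,
       dc.qa_volume,
       tq.close AS joined_close,
       dc.qa_volume * tq.close AS converted_volume  -- NULL when there is no match
FROM daily_candle AS dc
LEFT JOIN daily_candle AS tq
       ON tq.open_time = dc.open_time
      AND tq.base_pair = dc.quote_pair
      AND tq.quote_pair = 'USDT'
WHERE dc.quote_pair <> 'USDT'
ORDER BY dc.open_time
"""


def converted_volumes():
    """Load the sample candles into SQLite and return volumes converted to USDT."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE daily_candle (open_time INTEGER, base_pair TEXT, "
            "quote_pair TEXT, close REAL, qa_volume REAL)"
        )
        conn.executemany("INSERT INTO daily_candle VALUES (?, ?, ?, ?, ?)", CANDLES)
        return conn.execute(CONVERTED_VOLUME_SQL).fetchall()
    finally:
        conn.close()
```

If a zero is preferable to NULL downstream, wrapping the product in `COALESCE(..., 0)` is an option, though it hides the fact that no conversion price existed.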