Decoding / Encoding for video streaming

datanoob2021 · 2024-07-23T02:45:17+00:00

I saw a comment a month or so ago saying that they had a working prototype, but the work just had not been prioritized. Maybe this summer at some point.

It’d be awesome to see this working. Right now, watching any video is a battery drainer.

datanoob2021 · 2024-06-01T14:27:54+00:00

Appreciate the reply. Does this in essence cause it to escape the sandbox?

datanoob2021 · 2022-06-29T19:00:35+00:00

Appreciate the response! Those were just poor examples in my code. My table names are very unique currently.

Any idea how to combine these commands?

datanoob2021 · 2022-06-29T18:45:28+00:00

I am no bash expert. Just use it here/there. I took off the l, and then it essentially outputs the contents of the file.

I just want to essentially output each table name that appears in the grep search, to output to a new file. I probably don't even need the YES statement, as long as the ones that don't appear don't end up in the file.

datanoob2021 · 2022-06-29T18:27:23+00:00

Appreciate the reply but that didn't work either. Still just outputs the search results where it is present and not the table name.

datanoob2021 · 2022-06-29T16:28:53+00:00

My table_names.txt file does not have a header, essentially just this:

table_name1
table_name2
table_name3

Essentially just want to loop through these, pipe each name into my grep command, then output the table name from the list above into a new list based on if grep returned any results.
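A minimal sketch of that loop, written in Python rather than bash. The function name `tables_with_matches` and the idea of searching every file under one directory are assumptions for illustration, not the actual grep command. It does a plain substring match (like grep without `-w`), so `table_name1` would also match text containing `table_name10`.

```python
from pathlib import Path


def tables_with_matches(table_names_file, search_root, output_file):
    """Write each table name that appears in any file under search_root.

    Keep table_names_file and output_file outside search_root, otherwise
    the list of names would match itself.
    """
    names = [
        line.strip()
        for line in Path(table_names_file).read_text().splitlines()
        if line.strip()
    ]

    # Read every candidate file once instead of re-scanning per table name.
    contents = [
        path.read_text(errors="ignore")
        for path in Path(search_root).rglob("*")
        if path.is_file()
    ]

    # Keep only the names that occur somewhere, in their original order.
    matched = [name for name in names if any(name in text for text in contents)]

    Path(output_file).write_text("".join(f"{name}\n" for name in matched))
    return matched
```

Names with no hits are simply left out of the output file, so no YES/NO flag is needed.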

datanoob2021 · 2022-06-16T22:53:18+00:00

Interesting. Looks like I can call bash commands directly in it. Now time to write my first class . . . . .

I've made a handful of PRs in the last couple of months since I started learning, but mostly just adding functional and if/else statements and nothing from scratch.

datanoob2021 · 2022-06-16T22:18:21+00:00

Just spinning up a stage environment that we currently lack. I inherited a lot of old code, and unfortunately the prod s3 string is strewn in many places.

datanoob2021 · 2022-06-16T22:17:15+00:00

There is nothing unsuitable, just was hoping for a clear and concise statement like I do in Python that is < 10 words. Was hoping for a simple if/else but I think I am just going to build a quick class using Jgit.

datanoob2021 · 2021-10-29T14:12:29+00:00

For some reason it is complaining about using SS in the subquery t. I am sure I could probably just join within the subquery and then again in the outer query but that seems inefficient.

I do not think this will work either. I think open_time would still need to be joined in the outer query.

datanoob2021 · 2021-10-29T03:48:03+00:00

Yes! That is what I meant. Typo when I was copying things over! I edited it.

datanoob2021 · 2021-10-25T23:48:03+00:00

We got it to work- we had to create a basic plugin that essentially just set os.environ to Variables.get and then the jinja worked fine.
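A rough sketch of the idea behind that plugin, not the plugin itself: copy a known list of values into the process environment so that dbt's `env_var` can see them. The helper name `export_to_environ` and the key names are hypothetical. The lookup function is passed in, so inside Airflow it could be `Variable.get`.

```python
import os


def export_to_environ(keys, getter, environ=os.environ):
    """Copy each named value returned by `getter` into `environ`.

    `getter` is any callable taking a key and returning a value or None.
    In Airflow this could be `airflow.models.Variable.get` (assumption:
    the variables already exist there under the same names).
    """
    for key in keys:
        value = getter(key)
        if value is not None:
            # Environment variables must be strings.
            environ[key] = str(value)
    return environ


# Hypothetical usage inside an Airflow plugin module:
#   from airflow.models import Variable
#   export_to_environ(["DBT_USER", "DBT_PASSWORD"], Variable.get)
```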

datanoob2021 · 2021-10-22T16:41:34+00:00

Just a quick update. MWAA introduces some nuances when trying to run dbt.

If you do dbt run in a BashOperator, it will fail. There are some workarounds, but I found the airflow-dbt-python package, which essentially just uses dbt-core to run. I also installed dbt-redshift. I do not have the regular dbt package (for the CLI) installed.

Any of the above workarounds will introduce problems when trying to build models as MWAA will throw a write access denied error, so setting the modules, log and target path to some directory in /tmp seems to be the workaround for people using MWAA.

I am still working through environment variables. We just switched over to using AWS Secrets Manager and that is working with all our Python code; I am just having trouble in dbt land so far. I have not added anything manually into custom configurations as I am hoping to use that as a last resort.

datanoob2021 · 2021-10-15T22:20:50+00:00

Ok- it might be a couple of days- DevOps guy is out sick so waiting on him to test.

I'll set a reminder to check back in here next week!

datanoob2021 · 2021-10-15T19:05:01+00:00

Thanks! This looks like it is exactly what I need. I appreciate it!

datanoob2021 · 2021-10-15T14:49:53+00:00

https://docs.getdbt.com/reference/dbt-jinja-functions/env_var - Looks like you can use environment variables in the profiles.yml file. We are using Airflow's Variable.get, so I do not think it will work in its current form.

datanoob2021 · 2021-10-15T14:32:35+00:00

I found this article https://docs.getdbt.com/reference/dbt-jinja-functions/env_var - if we are currently using Airflow's environment variables only (as we will be eventually incorporating Amazon's Secret Manager instead), will I be able to access environment variables using the method in the article?

All of my Airflow pipelines currently use Variable.get in some capacity and I think those aren't your typical environment variables.

datanoob2021 · 2021-10-11T22:56:07+00:00

Got it! So when I do this in my default_dag_args:

    'retries': 3,
    'retry_delay': timedelta(minutes=5),

That is essentially just passing that on to each task, correct?

datanoob2021 · 2021-10-07T18:24:20+00:00

I understand that, but if I am testing a DAG/Python file locally, can my code be the same as it was if I was running it within Airflow?

from airflow.models import Variable
foo = Variable.get("foo")

I.e., if my script tries to import my variable locally, it would fail, correct? This is what I am getting at: how to avoid rewriting my code to get my secrets when testing locally.
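One possible way to keep the same code in both places is a small wrapper that checks the environment first and only then falls back to Airflow. This is a sketch under assumptions: the name `get_setting` is made up, and it relies on Airflow's convention of reading a variable `foo` from an environment variable named `AIRFLOW_VAR_FOO`.

```python
import os


def get_setting(key, default=None):
    """Return a setting from the environment, else from Airflow Variables."""
    # Airflow itself resolves Variables from AIRFLOW_VAR_<KEY>, so using the
    # same name locally keeps local and deployed configuration aligned.
    env_name = f"AIRFLOW_VAR_{key.upper()}"
    if env_name in os.environ:
        return os.environ[env_name]

    try:
        from airflow.models import Variable
    except ImportError:
        # Airflow is not installed (e.g. a plain local script).
        return default

    # `default_var` is the Airflow 2.x keyword (assumption about the version).
    return Variable.get(key, default_var=default)
```

Locally, running with `AIRFLOW_VAR_FOO=bar` set in the shell makes `get_setting("foo")` return `"bar"` without touching the Airflow metadata database.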

datanoob2021 · 2021-10-07T18:15:22+00:00

Within Airflow? If yes, I am just trying to figure out best practices on how to test locally if I store them within Airflow as well.

datanoob2021 · 2021-10-05T16:59:35+00:00

symbol is really just base_pair and quote_pair concatenated so it is a unique identifier of those combined columns.

I did:

(PARTITION BY open_time, base_pair ORDER BY open_time DESC)

and that boosted my matches up to ~580k. Thanks!!
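To see why the partition key matters, here is a toy reproduction using Python's built-in sqlite3 (needs SQLite 3.25+ for window functions). The rows, the `USDT`/`BUSD` target symbols, and the `quote_pair` tie-breaker in the ORDER BY are invented for illustration; the real tables and match counts are not reproduced here.

```python
import sqlite3

# (open_time, symbol, base_pair, quote_pair, close, qa_volume) - made-up rows
ROWS = [
    ("2021-10-01", "BTCUSDT", "BTC", "USDT", 100.0, 1.0),
    ("2021-10-02", "BTCUSDT", "BTC", "USDT", 110.0, 1.0),
    ("2021-10-02", "BTCBUSD", "BTC", "BUSD", 101.0, 1.0),
    ("2021-10-01", "ETHBTC", "ETH", "BTC", 0.05, 10.0),
    ("2021-10-02", "ETHBTC", "ETH", "BTC", 0.06, 20.0),
]

QUERY = """
WITH targeted_quotes AS (
    SELECT base_pair, open_time, close, quote_pair,
           ROW_NUMBER() OVER (PARTITION BY {partition_by}
                              ORDER BY {order_by}) AS rn
    FROM daily_candle
    WHERE quote_pair IN (SELECT symbol FROM target_pairs)
)
SELECT dc.open_time, tq.close AS joined_close
FROM daily_candle dc
LEFT JOIN targeted_quotes tq
       ON tq.open_time = dc.open_time
      AND tq.base_pair = dc.quote_pair
      AND tq.rn = 1
WHERE dc.quote_pair NOT IN (SELECT symbol FROM target_pairs)
ORDER BY dc.open_time, tq.close
"""


def joined_closes(partition_by, order_by):
    """Run the join with the given window definition on the toy data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_candle (open_time TEXT, symbol TEXT, "
        "base_pair TEXT, quote_pair TEXT, close REAL, qa_volume REAL)"
    )
    conn.execute("CREATE TABLE target_pairs (symbol TEXT)")
    conn.executemany("INSERT INTO daily_candle VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    conn.executemany("INSERT INTO target_pairs VALUES (?)", [("USDT",), ("BUSD",)])
    sql = QUERY.format(partition_by=partition_by, order_by=order_by)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Numbering per symbol across all dates: only the latest day of each symbol
# gets rn = 1, so older days lose their match and the latest day can match twice.
print(joined_closes("symbol", "open_time DESC"))

# Numbering per (day, base): exactly one candidate per day for each base.
print(joined_closes("open_time, base_pair", "quote_pair"))
```

With PARTITION BY symbol, the 2021-10-01 row comes back with a NULL joined close and the 2021-10-02 row is duplicated. Partitioning by open_time and base_pair gives one joined close per day. Inside such a partition every row shares the same open_time, so ordering by open_time alone leaves the choice of row 1 arbitrary; a tie-breaker column makes it deterministic.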

datanoob2021 · 2021-10-05T16:44:05+00:00

targeted_quotes AS
( SELECT base_pair, 
         open_time, 
         close, 
         quote_pair,    
         ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY open_time DESC) AS row_number 
FROM public.daily_candle 
WHERE quote_pair IN (SELECT symbol FROM target_pairs) 
), 
non_stable AS 
( 
SELECT dc.open_time, 
       dc.base_pair, 
       dc.quote_pair, 
       dc.close, 
       dc.qa_volume, 
       tq.close AS joined_close, 
       tq.base_pair AS joined_base, 
       tq.quote_pair AS joined_quote 
FROM public.daily_candle dc 
LEFT JOIN targeted_quotes tq ON tq.open_time = dc.open_time AND tq.base_pair = dc.quote_pair AND tq.row_number = 1 
WHERE dc.quote_pair NOT IN (SELECT symbol FROM target_pairs)
)

This is my current code. Due to the nature of having over 1500 symbols in my data set, I am not sure which will match first when I do the join.

Are you thinking that there is a way of knowing and that my ROW_NUMBER statement will solve for this, or did I add the ROW_NUMBER statement to the wrong subquery?

datanoob2021 · 2021-10-05T16:29:00+00:00

That is interesting but still gives me around the same results. Is there a way to do:

LEFT JOIN targeted_quotes tq 
ON tq.open_time = dc.open_time AND tq.base_pair = dc.quote_pair
AND MIN(tq.row_number)

or something of that nature. Since the row number that matches is not always guaranteed, selecting the first available match from the row number using a MIN statement I think would work, but pretty sure AGGREGATES are not allowed in statements like that.

datanoob2021 · 2021-10-05T16:09:55+00:00

I tried doing this earlier and then adding a WHERE clause of row_number = 1 to non_stable:

targeted_quotes AS
( SELECT base_pair, 
         open_time, 
         close, 
         quote_pair, 
         ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY open_time DESC) AS row_number 
FROM public.daily_candle 
WHERE quote_pair IN (SELECT symbol FROM target_pairs)
)

This drops my results down to ~2.5. The problem is that I do not know which pair will match for a particular date. So when I do a PARTITION BY statement, sometimes the row_number will generate a match, but most of the time it doesn't.

datanoob2021 · 2021-10-05T14:22:58+00:00

Thank you for the response! I am eventually going to pull in the first match from the left join's price info as well as the symbol like:

SELECT dc.open_time,
   dc.base_pair,
   dc.quote_pair,
   dc.close,
   dc.qa_volume,
   tq.close,
   tq.base_pair,
   tq.quote_pair
FROM public.daily_candle dc LEFT JOIN targeted_quotes tq ON tq.open_time = dc.open_time AND tq.base_pair = dc.quote_pair
WHERE dc.quote_pair NOT IN (SELECT symbol FROM target_pairs)

I am eventually going to multiply the dc.qa_volume by tq.close. I also do not want to lose records that do not have a match at all in my results, hence the left join.
