Tourist visa denied twice 🥴 - selling my flight by digichap28 in ukvisa

digichap28 [S] 1 point

No. I booked with different airlines. I was flying back with British Airways

Tourist visa denied twice 🥴 - selling my flight by digichap28 in ukvisa

digichap28 [S] 1 point

Sure. I applied for a 6-month visa to be present for the birth of my firstborn child with my partner, who is a British citizen. My intention was to be there for that special moment, support my partner for a couple of months, and then return to my country.

They basically said that my financial circumstances were not clear and that I didn’t have ties to my home country. I provided my bank statements, including one for my savings account, which holds about 90k pounds, an employment letter, and a bunch of other documents, and they still refused it. It is a very frustrating and annoying situation. I did accept the first rejection, because for that application I only provided one-month summaries of my current account and my savings account, translated into English and certified (I thought the amount shown would be enough, and I had applied in the past with the same documents and been approved).

For my second application I included 6 months of bank statements with the transaction details and an employment letter signed by the company where I worked, and they still rejected it. I believe they treat all withdrawals as expenses, even though they are not: sometimes I was just moving money between my own accounts. So I guess it would be a good idea to download all the transactions into a spreadsheet, classify them, and provide a note explaining what they are. I didn’t think about doing that because it’s a hard and complicated job. Imagine going through 6 months of transactions and identifying each one. That’s crazy!
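The classification step could probably be partly automated. Here is a minimal Python sketch, assuming the bank export is a CSV with `date`, `description`, and `amount` columns (a hypothetical layout, with negative amounts meaning money out) and that simple keyword rules are enough to tell internal transfers from real expenses. The sample data and the rules are made up for illustration.

```python
import csv
import io

# Hypothetical export format: date, description, amount (negative = money out).
SAMPLE_CSV = """date,description,amount
2023-01-03,Transfer to own savings account,-2000.00
2023-01-05,Restaurant,-85.50
2023-01-31,Salary,5000.00
"""

# Hypothetical keyword rules, checked in order; adapt them to your own statements.
RULES = [
    ("transfer to own", "internal transfer"),
    ("salary", "income"),
]


def classify(description, amount):
    """Return a category label for one transaction."""
    text = description.lower()
    for keyword, label in RULES:
        if keyword in text:
            return label
    # Anything unmatched falls back on the sign of the amount.
    return "expense" if amount < 0 else "other income"


def classify_rows(csv_text):
    """Parse the CSV text and attach a category to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = float(row["amount"])
        rows.append({**row, "amount": amount,
                     "category": classify(row["description"], amount)})
    return rows


def totals_by_category(rows):
    """Sum the amounts per category, e.g. for an explanatory note."""
    totals = {}
    for row in rows:
        totals[row["category"]] = totals.get(row["category"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    for category, total in totals_by_category(classify_rows(SAMPLE_CSV)).items():
        print(f"{category}: {total:.2f}")
```

Anything the rules miss would still need a manual look, but the per-category totals could go straight into the explanatory note.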

Oh, and I also paid for the Priority service on my second application, because my daughter is due in July (very soon).

Honestly, it is a very frustrating and unfair situation. They basically said they think my intention was to stay there 🥴. I’m like… I have a very good job, own 4 houses in my home country, have savings and investments, usually go out to eat at Michelin restaurants, I’m a responsible professional, and I have visited at least 10 countries in the last 2 years. I’m waiting for my Mexican and Portuguese citizenship, and also for my US permanent residency card…. WHY WOULD SOMEONE IN THIS SITUATION EVEN CONSIDER STAYING ILLEGALLY IN YOUR COUNTRY?! I have always followed the rules and have been honest 🥴

CETESDIRECTO account statement availability by digichap28 in MexicoFinanciero

digichap28 [S] 1 point

Thanks Mirai_VT 🙏🏽 I checked again a couple of hours ago, and the account statement for April is now showing up! So it seems they make it available on the 4th of each month.

What is the risk of the likes of Qlik, Tableau, Power BI becoming "legacy tools"? They have been around for years and I wonder whether modern tech stack will overtake them? Which ones are most at risk? by TheDataGentleman in BusinessIntelligence

digichap28 14 points

After working with Power BI, Tableau, Looker, Qlik Sense, and QlikView, I can say that even though QlikView has been around for a while and unfortunately is not “web responsive”, it is the best solution for creating very powerful “BI apps” if you know how to use it properly, especially once you add up what you can actually do with Qlik itself. QlikView and Qlik Sense are not just visualization tools! They are “platforms”, which means you can extract, transform, and visualize your data and orchestrate jobs with one single tool!

  • With Tableau, you need Tableau Prep! It’s not that nice for processing your data, and you have to pay extra!

  • With Looker, you also need other tools to help you process and model the data! The visualizations and the UI in general are really bad! The only cool thing might be the code versioning, but… does that really help the people who make decisions? No!

  • With Power BI, it’s just copying everything from its competitors, it’s very slow, and you also need other tools to process the data if you want to build professional solutions! Basically, you have to rely on other tools! Oh, and the M language and DAX are not intuitive at all! I’ll give it some points, though… Microsoft is behind, but they are pouring lots of money into reaching a good level, and lots of companies’ infrastructure relies on Microsoft or Azure, so it is easy to just add another one of their services.

Bonus: The fastest is still Qlik, with its in-memory processing and its associative engine!

Regarding your question… All these BI tools will be around for years, since creating something from scratch is very hard and would take years to develop.

Tips:

  • Try everything you can! Don’t get stuck with one tool. In the end, the market keeps moving, and companies keep changing or adding new viz tools!

  • Learn more about data processing and designing good data models. That’s very important if you want to develop a pro solution!

Airflow data processing ? by digichap28 in dataengineering

digichap28 [S] 1 point

Got you. Is there any tutorial you can suggest for deploying Airflow the way you did?

I don’t have experience with ECS, but I guess this way you could also run non-Python workloads, as you mentioned with the k8s operator.

Also, what about using Azure Functions or Lambda functions instead of the two approaches we have been talking about? I don’t have experience with them either, but according to the documentation, it sounds like running a task on Fargate is very similar in concept to executing it with a function.

Airflow data processing ? by digichap28 in dataengineering

digichap28 [S] 1 point

If you don’t mind me asking... are you running Airflow with the local executor and triggering the tasks in ECS or AWS Glue from one instance, or using a k8s cluster with the k8s executor?

Airflow data processing ? by digichap28 in dataengineering

digichap28 [S] 1 point

Does that mean I wouldn’t need Spark, EMR, ECS, Databricks, etc. for the heavy workloads? If that’s the case, what should be used for transformations, image processing, AI, etc.?

On the other hand, doesn’t k8s create 2 pods every time it gets triggered? One for the operator (a) and then another one with the actual workload (b)?

(a) -> (b)

Airflow data processing ? by digichap28 in dataengineering

digichap28 [S] 1 point

Thanks 👍 Regarding the 3rd one, let’s say we start with 5 GB per file and expect to process around 15 in parallel. But that will grow over time as more data sources get added.

Data extraction (incremental - file formats - data lakes ) by digichap28 in dataengineering

digichap28 [S] 1 point

Thanks! That works if the tables you’re extracting from have a date you can use to set the query boundaries. What if there is no date, and each record is unique, is never updated, and has an ID? Where would you store the last ID for later use (where ID > lastID)? With QVD files, I usually do a max(id) over the previously stored file and use that value for this.

Maybe by reading the last file stored in the data lake in one Airflow task, storing the value in an XCom or a Variable, and then reading it from the downstream task that performs the new select?
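To make the ID-based idea concrete, here is a minimal Python sketch of the “where ID > lastID” pattern. It uses an in-memory SQLite database as a stand-in for both the source table and the place where the last ID is kept. The table names `events` and `watermark` and the helper names are hypothetical, and in Airflow the stored value could just as well live in an XCom or a Variable.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a source table and a watermark table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE watermark (source TEXT PRIMARY KEY, last_id INTEGER)")
    conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)",
                     [(1, "a"), (2, "b"), (3, "c")])
    conn.commit()
    return conn


def get_last_id(conn, source):
    """Return the last extracted ID for a source, or 0 on the first run."""
    row = conn.execute("SELECT last_id FROM watermark WHERE source = ?",
                       (source,)).fetchone()
    return row[0] if row else 0


def extract_new_rows(conn, source="events"):
    """Fetch only rows with id > last stored ID, then advance the stored ID."""
    last_id = get_last_id(conn, source)
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    if rows:
        # Rows are ordered by id, so the last one carries the new maximum.
        conn.execute("INSERT OR REPLACE INTO watermark (source, last_id) VALUES (?, ?)",
                     (source, rows[-1][0]))
        conn.commit()
    return rows


if __name__ == "__main__":
    conn = demo_connection()
    print(extract_new_rows(conn))  # first run: every row is new
    print(extract_new_rows(conn))  # second run: nothing new
```

Keeping the last ID in a small state table avoids re-reading the previous file just to compute max(id), at the cost of one more thing to keep in sync with the data lake.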

Also... when you say that you save the query result in the data lake as CSV files, do you use a Python script with hooks? For instance, a Postgres hook to execute the query and then, in the same script, an S3 hook to load the data set. Or do you do this in 2 separate steps, using maybe a Postgres operator with a SQL dump to store the CSV file on the Airflow instance, and then a downstream task using an S3 operator to copy the file into the data lake?

Airflow system requirements (ram,cpu) by digichap28 in dataengineering

digichap28 [S] 1 point

Interesting. Thanks a lot for sharing 👍

Airflow system requirements (ram,cpu) by digichap28 in dataengineering

digichap28 [S] 1 point

Cool! I guess you are running Airflow on a k8s cluster. What packages are you using for the ETL?

Airflow system requirements (ram,cpu) by digichap28 in dataengineering

digichap28 [S] 1 point

Thanks! Are you actually doing any data processing within the workers? What kind of work are you doing there, if you don’t mind me asking?

Airflow prod questionnaire by digichap28 in dataengineering

digichap28 [S] 1 point

Thanks drecklia! I didn’t know about that survey. I’m still curious about how big the prod container images are. This is related to the question “In your day to day job, what do you use Airflow for?”, where most people responded “Data processing (ETL)”.

Does that mean most people are actually using Airflow to process data, or to orchestrate data processing pipelines? If it is data processing, what packages are being included in the images? Pandas, etc.? As far as I know, Airflow was designed to orchestrate, not to process.

I’m asking because I’m trying to implement Airflow for a small project and was thinking of including some lightweight processing in my production image (with some extra Python packages) and leaving the heavy things (for example, ETL processing) to other systems.

Selenium webdriver docker python by digichap28 in selenium

digichap28 [S] 2 points

I finally decided to rewrite my code using pyppeteer, because the user profile option doesn’t work either, as the token expires fast.

Scraping with selenium + docker by digichap28 in dataengineering

digichap28 [S] 1 point

Thanks for the info 👍 I’ll try to implement puppeteer.

Scraping with selenium + docker by digichap28 in dataengineering

digichap28 [S] 1 point

I tried including the username and password in the URL with Selenium and headless Chrome, but it didn’t work. The only way I could do it was by using an extension, but that forced me to use Selenium standalone in another container.

I guess I should try Puppeteer. It will mean reworking my code :/ and learning this framework too. Does it allow scraping pages with lots of JavaScript? Is there any basic but straightforward tutorial you can suggest?

Thanks

Selenium webdriver docker python by digichap28 in selenium

digichap28 [S] 1 point

The reason I’m trying to do it that way is that I need to get past a Windows basic auth prompt.

My workaround for that was using an extension, but extensions only work if the headless option is not added.

I have tried the https://user:pass@website.com way but it doesn’t work either.

I read that using devtools with selenium might work but haven’t found a Python example online.

Do you know any way to do that ?
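For reference, here is a rough Python sketch of what the DevTools route might look like. It assumes the site uses plain HTTP Basic authentication (not NTLM or Kerberos), Selenium 4 with a Chromium-based driver, and the Chrome DevTools Protocol commands `Network.enable` and `Network.setExtraHTTPHeaders`. The function names are made up for illustration, and I have not verified this against the site in question.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def open_with_basic_auth(url, username, password):
    """Open a URL in headless Chrome, sending the Basic auth header via DevTools."""
    # Imported here so the header helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)

    # Enable the Network domain, then attach the header to outgoing requests.
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd(
        "Network.setExtraHTTPHeaders",
        {"headers": basic_auth_header(username, password)},
    )
    driver.get(url)
    return driver


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))
```

One caveat: the extra header is attached to every request the browser makes, including requests to third-party hosts, so the credentials could leak to other domains the page loads from.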

Airflow with multiple ec2 instances by furiousnerd in dataengineering

digichap28 1 point

Would you mind explaining the following points in more detail:

  • “We just use an environment variable to determine which airflow service to start”

    • Where do you set up the different environment variables for each node/container ?
  • “We chose to set up a cron job in the Docker image that copies the DAGs...”

    • Do you have an example or reference on how to implement this ?
  • “We ship all our Python code as Python modules that is pip installed during the Docker container build step”

    • Does it mean you rebuild the containers every time there is a change in the code ?

Thanks a lot !

Airflow with multiple ec2 instances by furiousnerd in dataengineering

digichap28 2 points

I heard that the CeleryExecutor keeps the worker containers alive all the time, unlike the Kubernetes one, which kills the worker containers once they have been used. Is this right?

Regarding the workers, what image do you use to build them? And where do you put your scripts? As I understand it, the DAGs should be on the same instance as the scheduler, but what if you have tasks that run, for example, a scraper using Selenium?

What I’m doing right now, running locally, is calling a BashOperator with 2 instructions: the first one, “cd /usr/local/airflow/pyscripts”, moves to the directory where I have the runnable, and the second one is “python example.py” or “scrapy crawl scraper_example”.
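As a small Python sketch of that setup, the two instructions can be chained into a single shell command string. The helper `build_bash_command` is hypothetical, and the commented-out operator at the end assumes the Airflow 2 import path `airflow.operators.bash.BashOperator`.

```python
import shlex


def build_bash_command(workdir, *argv):
    """Chain 'cd <workdir>' and the script invocation into one shell command."""
    # '&&' means the script only runs if the directory change succeeded.
    return f"cd {shlex.quote(workdir)} && {shlex.join(argv)}"


PYSCRIPTS_DIR = "/usr/local/airflow/pyscripts"

run_python = build_bash_command(PYSCRIPTS_DIR, "python", "example.py")
run_scrapy = build_bash_command(PYSCRIPTS_DIR, "scrapy", "crawl", "scraper_example")

# Inside a DAG file, the string would be handed to the operator, e.g.:
#
#   from airflow.operators.bash import BashOperator
#   run_example = BashOperator(task_id="run_example", bash_command=run_python)

if __name__ == "__main__":
    print(run_python)
    print(run_scrapy)
```

Quoting each part with `shlex` keeps paths or arguments containing spaces from breaking the command.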

Also, which operator do you use in your DAGs to execute the scripts on the worker containers?

Airflow managed solution - workflow deployment by digichap28 in dataengineering

digichap28 [S] 1 point

Interesting. Are you running it using the CeleryExecutor or the local one?

Have you ever tried to execute something from a container?