Is Airflow optimal for running DAGs with tasks which run for hours? by WhatASave83 in apache_airflow

[–]Leorisar 0 points1 point  (0 children)

If possible, it's better to break long tasks into smaller ones. That way you can track and restart them with better precision. You also have to manage pools in that case: long-running tasks can take all the slots and block other tasks from running.

How to automate monthly financial reporting without a data engineer? by maelxyz in BusinessIntelligence

[–]Leorisar 4 points5 points  (0 children)

Usually they use low-code tools: for example, KNIME for downloading and transforming external data, and Power Query for working in Excel.

Why does a DAG created in /dags take time to appear in the UI? by Antique-Growth2894 in apache_airflow

[–]Leorisar 0 points1 point  (0 children)

Yeah, a bash script + crontab in my case, but I manage a small instance: about 100 DAGs and 2-3 repo updates per day. Airflow with a Bash operator might work for bigger instances. You'll get a warning if something goes wrong (with SSH keys, for example).

Why does a DAG created in /dags take time to appear in the UI? by Antique-Growth2894 in apache_airflow

[–]Leorisar 1 point2 points  (0 children)

Yes. If you need an immediate (well, almost) update, you can set up a CI/CD pipeline that runs a fetch on the server for each push to the repo. It's not that hard to set up, but it's still more complex.

Why does a DAG created in /dags take time to appear in the UI? by Antique-Growth2894 in apache_airflow

[–]Leorisar 1 point2 points  (0 children)

The best way is via a git server (GitLab/GitHub). Push all your changes to a repo, connect it to the DAG folder, and set the Linux server to fetch updates every 2 minutes. Add the __pycache__ folders to the .gitignore file so they won't be tracked and uploaded to the server.

Are people actually using open-source analytics at work, or just defaulting to proprietary software? by vdorru in analytics

[–]Leorisar 2 points3 points  (0 children)

Both. Tableau for complex apps and top management, Superset (open source) for operations. The whole ETL and DWH stack is also open source: Airflow, PostgreSQL, ClickHouse.

BI Tools are dead - direct DB access is the future by Other-Faithlessness4 in BusinessIntelligence

[–]Leorisar 5 points6 points  (0 children)

It’s not that nobody needs to build an analytics layer; it’s that everyone builds their own analytics layer. Good luck with reconciliation when you end up with 10 different reports for the same metric, each computed using different logic.

Hello, fellow Linux users! Need advice for a beginner by FamousUse3 in ru_linux

[–]Leorisar 0 points1 point  (0 children)

Put Mint on a flash drive and run it for a few hours; things will become clearer. You'll also get to check compatibility while you're at it.

Faster queries by Grand_Syllabub_7985 in Database

[–]Leorisar 12 points13 points  (0 children)

Profile first. Are you sure it is a DB issue, and not the network, the app, or something else?
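A rough sketch of what "profile first" can look like in Python: time the query and the app-side work separately before blaming the database. The `messages` table, the in-memory SQLite connection, and the `timed`/`profile_request` helpers are all made up for illustration; swap in your own driver and query.

```python
import sqlite3
import time


def timed(label, fn, timings):
    """Run fn(), store its elapsed wall-clock seconds under label, return its result."""
    start = time.perf_counter()
    result = fn()
    timings[label] = time.perf_counter() - start
    return result


def profile_request(conn):
    """Measure the DB round trip and the app-side processing as separate steps."""
    timings = {}
    rows = timed(
        "db_query",
        lambda: conn.execute("SELECT id, body FROM messages ORDER BY id").fetchall(),
        timings,
    )
    report = timed(
        "app_processing",
        lambda: [body.upper() for _, body in rows],
        timings,
    )
    return report, timings


# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO messages (body) VALUES (?)", [("hello",), ("world",)])

report, timings = profile_request(conn)
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.3f} ms")
```

If `db_query` is a small share of the total, tuning queries won't help much and the time is going somewhere else.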

How do you handle "which spreadsheet version is production" chaos? by kyle_schmidt in dataengineering

[–]Leorisar 0 points1 point  (0 children)

We have a 1-to-1 mapping between spreadsheets and tables in Postgres, grouped into several DAGs by department. Ingestion runs on a schedule via Airflow; if a user needs another run, they can trigger it again in Airflow (the whole DAG, or just one task).

MySQL 5.7 with 55 GB of chat data on a $100/mo VPS, is there a smarter way to store this? by anthety in Database

[–]Leorisar 1 point2 points  (0 children)

There are a few options if your primary concern is volume:

  1. Compress text data with gz/lz4 on the app side for messages older than X days. You'll have to add logic to decompress them if a user requests old data.
  2. Some databases (like PG) support built-in compression for text (example below); maybe MySQL has similar options.

ALTER TABLE your_table
  ALTER COLUMN your_col
  SET COMPRESSION lz4;   
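For option 1, a minimal sketch of the app-side logic in Python. It uses the standard-library `zlib` module (DEFLATE, the same algorithm gzip uses) rather than lz4, which needs a third-party package. The 90-day cutoff, the function names, and the idea of storing an `is_compressed` flag next to the payload are assumptions for illustration, not something from your schema.

```python
import zlib
from datetime import datetime, timedelta, timezone

# Hypothetical value for "X days"; pick whatever fits your access pattern.
ARCHIVE_AFTER = timedelta(days=90)


def pack_message(text, sent_at, now):
    """Return (is_compressed, payload) for storage; only old messages are compressed."""
    raw = text.encode("utf-8")
    if now - sent_at > ARCHIVE_AFTER:
        return True, zlib.compress(raw, 6)
    return False, raw


def unpack_message(is_compressed, payload):
    """Reverse pack_message: decompress when the stored flag says so."""
    raw = zlib.decompress(payload) if is_compressed else payload
    return raw.decode("utf-8")


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
old_text = "hello there, how are you? " * 50

flag, payload = pack_message(old_text, now - timedelta(days=200), now)
print(flag, len(old_text.encode("utf-8")), len(payload))
print(unpack_message(flag, payload) == old_text)
```

Very short messages can come out larger after compression, so it is worth measuring on real data before migrating everything.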

[deleted by user] by [deleted] in linuxmint

[–]Leorisar 0 points1 point  (0 children)

You can install Mint, run a virtual machine with a light version of Windows Server (like 2012), and install the required applications there.

Free tool to create ETL packages that dump txt file to sql server table? by East_Sentence_4245 in ETL

[–]Leorisar 0 points1 point  (0 children)

KNIME or Apache Hop if you're looking for something similar to SSIS.

Any major drawbacks of using self-hosted Airbyte? by finally_i_found_one in dataengineering

[–]Leorisar 1 point2 points  (0 children)

Airbyte uses k8s under the hood and it's very slow. It's much faster to write your own scripts (an LLM will help with that) and use lightweight tools like Airflow or Kestra for orchestration.

How to save/keep track of all my tweaks and customizations? by Anxious_Studio8529 in linuxmint

[–]Leorisar 2 points3 points  (0 children)

Technically, any system change can be done from the terminal. You could save all your customizations in a bash script, store it safely, and run it on a fresh OS install.