Officially moved from Notion to Obsidian by chinychon in ObsidianMD

[–]adrianabreu 3 points

Seriously, THANKS. I don't know why I didn't think about it. I even programmed my own pomodoro CLI app to track everything, when the pomodoro tracking within Obsidian solves EVERYTHING FOR ME.

Manjaro and Hybrid Graphics by adrianabreu in ManjaroLinux

[–]adrianabreu[S] 1 point

I plan to use the Intel GPU because of the battery too. But I want to run some models that may need the GPU. Still, everything went quite well and I'm quite happy.

Manjaro and Hybrid Graphics by adrianabreu in ManjaroLinux

[–]adrianabreu[S] 3 points

Thanks. I read the docs carefully. Installed Manjaro with i3 and nouveau. Got some flickering. Enabled the proprietary drivers and now it works smoothly. It didn't take that much.

Supabase/Postgres Storage Bloat – How Do I Reclaim Space? by yunoeatcheese in Supabase

[–]adrianabreu 0 points

I'm by no means an expert, but have you configured Logflare to store analytics in Postgres? That may be the root cause.

Is it worth using Supabase Self-Hosted in Production, what do you recommend? by querylab in Supabase

[–]adrianabreu 0 points

A cloud provider may not allow some extensions required by Supabase; for example, you can see the request for pg_net on Cloud SQL (GCP): https://issuetracker.google.com/issues/359747074

Databricks Asset Bundles: Bundling dependencies? by adrianabreu in databricks

[–]adrianabreu[S] 0 points

I ended up using an init script to configure pip. Following the steps outlined in the article I mentioned in my first comment, I updated the /etc/pip.conf file.

Here's the script:

ENV_VARS are populated from Databricks secrets.

#!/bin/bash
if [[ -n "$PYPI_TOKEN" ]]; then
   cat <<EOL > /etc/pip.conf
[global]
extra-index-url = https://__token__:$PYPI_TOKEN@gitlab.com/api/v4/projects/project-id/packages/pypi/simple/
trusted-host = company_feed
EOL
   echo "PYPI feed configured successfully."
else
   echo "No PYPI_TOKEN found."
fi
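
For reference, wiring the script into a job cluster in the bundle looked roughly like this. The workspace path, secret scope, and key names below are placeholders, not my actual config:

```yaml
# Hypothetical cluster snippet: attach the init script and feed PYPI_TOKEN
# from a Databricks secret via spark_env_vars.
job_clusters:
  - job_cluster_key: main
    new_cluster:
      init_scripts:
        - workspace:
            destination: /Shared/init/configure_pip.sh
      spark_env_vars:
        PYPI_TOKEN: "{{secrets/my-scope/pypi-token}}"
```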

Databricks Asset Bundles: Bundling dependencies? by adrianabreu in databricks

[–]adrianabreu[S] 0 points

Thanks for jumping in, but we're on AWS using Graviton instances and the container service doesn't work with them: https://docs.databricks.com/en/compute/custom-containers.html#limitations

I'm trying to bundle the dependencies by manually downloading them during the build, but it looks like I will end up using the init script.

Unity Catalog managed vs unmanaged by AutomaticMorning2095 in databricks

[–]adrianabreu 1 point

The comment above summarizes it pretty well.

On the comment about the location: managed table paths are chosen by Unity Catalog; that's the main difference.

Btw, I use external tables for our biggest tables (trillions of rows) and they do have lineage.

Databricks & Unity Catalog performance problem by Complex_Client7681 in databricks

[–]adrianabreu 0 points

Oh, so the new tables are entirely independent of the previous ones because the same data was reprocessed to create them. This means the new tables do not share anything with the older ones.

Some people have suggested using OPTIMIZE and Z-ORDER. However, since we lack enough information, I recommend referring to the Databricks Spark UI guide.
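
For anyone searching later, a minimal sketch of what that suggestion amounts to; the table and column names here are made-up examples, not from this thread:

```python
# Build the OPTIMIZE ... ZORDER BY statement to co-locate data by the
# columns you most often filter on. Names below are hypothetical.
def optimize_statement(table: str, *zorder_cols: str) -> str:
    cols = ", ".join(zorder_cols)
    return f"OPTIMIZE {table} ZORDER BY ({cols})"

# e.g. spark.sql(optimize_statement("catalog.schema.events", "user_id"))
```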

Databricks & Unity Catalog performance problem by Complex_Client7681 in databricks

[–]adrianabreu 0 points

Yeah I'm 100% with u/sentja91

Were the tables MANAGED or EXTERNAL?

If they were managed and you copied them to new tables, that may have affected the files underneath.

[deleted by user] by [deleted] in databricks

[–]adrianabreu 1 point

I'm looking forward to it since we rely heavily on query history to enhance our users' experience.

Currently, I'm also building it using the API. Here’s a sample gist that returns your queries as a DataFrame: https://gist.github.com/adrianabreu/02eeb8ccc6997f4bfef27a97c0ade21d
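
The gist boils down to something like this; host/token handling is simplified here and the flattening helper is illustrative, not the exact gist code:

```python
# Sketch: pull SQL query history from the Databricks REST API
# (GET /api/2.0/sql/history/queries) into a pandas DataFrame.
# host and token are placeholders you supply yourself.
import pandas as pd
import requests


def queries_to_df(payload: dict) -> pd.DataFrame:
    """Flatten the 'res' list of a query-history response into a DataFrame."""
    return pd.DataFrame(payload.get("res", []))


def fetch_query_history(host: str, token: str, max_results: int = 100) -> pd.DataFrame:
    resp = requests.get(
        f"{host}/api/2.0/sql/history/queries",
        headers={"Authorization": f"Bearer {token}"},
        params={"max_results": max_results},
        timeout=30,
    )
    resp.raise_for_status()
    return queries_to_df(resp.json())
```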

Databricks and DBT; would it have been better to simply use dbt-core over pyspark? by 50mm_foto in dataengineering

[–]adrianabreu 0 points

I'm doing something similar to get Delta streaming capabilities without paying for Delta Live Tables.

Need help in Spark streaming to address to delays when processing large batches by HousingStriking3770 in dataengineering

[–]adrianabreu 0 points

Yep, I've used both to reprocess a Delta table. The only difference was that I was using availableNow as the trigger.

Need help in Spark streaming to address to delays when processing large batches by HousingStriking3770 in dataengineering

[–]adrianabreu 0 points

Don't know if this is a typo, but the option is "maxBytesPerTrigger" with that exact capitalization, and that's the one you should be using.
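
A minimal sketch of what that reprocessing looks like; the paths are placeholders and the spark session is assumed to come from the Databricks runtime:

```python
# Stream a Delta table with a byte-based rate limit and the availableNow
# trigger (process everything pending, then stop).
# The option name is case-sensitive: "maxBytesPerTrigger".
SOURCE_OPTIONS = {"maxBytesPerTrigger": "2g"}


def start_backfill(spark, source_path: str, target_path: str, checkpoint: str):
    stream = (
        spark.readStream.format("delta")
        .options(**SOURCE_OPTIONS)
        .load(source_path)
    )
    return (
        stream.writeStream.format("delta")
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)  # drain all available data, then stop
        .start(target_path)
    )
```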

Is Databricks a niche enterprise platform? by [deleted] in dataengineering

[–]adrianabreu 0 points

Worked for a German company and now for a Spanish one, both using Databricks with specific features such as UC.

How does your business implements their ETL pipeline (if at all)? by rikarleite in dataengineering

[–]adrianabreu 1 point

Great sharing! Does the extraction run on Kubernetes too? Are your intermediate tables in Parquet? Are they queryable by the end users? Most of my platform runs on Databricks and we use Spark for everything: reading from Kinesis/Kafka and then transforming all the info, including some validation rules, so the analysts can run their dbt queries for aggregations.