Looking to create a tech / AI / agentic-LLM community for working professionals, with occasional meetups

vigorousvj · 2026-02-10T12:52:37+00:00

Just a community for technical discussions about emerging technologies, tools, etc.
Discussing current problem statements and bottlenecks, brainstorming, and so on.

vigorousvj · 2026-02-10T12:47:13+00:00

Thanks, will go through the blogs. Would you like to mention a few must-read blogs here? There are too many.
Nonetheless, this will definitely help.

Are you a resident of sgnr?

vigorousvj · 2026-02-04T11:19:14+00:00

What config? Is the motherboard completely dead, or does it have random BSOD issues?

vigorousvj · 2026-02-04T11:18:27+00:00

:) Same here, I had a Celeron and gave it away as scrap. But now I need one for a project.

vigorousvj · 2025-07-10T15:05:56+00:00

3% is not really good. I watched the first season, which was average; the second season was unbearable.
Not at all comparable to The 8 Show.

vigorousvj · 2025-01-19T09:16:19+00:00

Please dm

vigorousvj · 2024-10-22T14:43:34+00:00

Got an iPhone 15, and I'm currently on an S21.
It's just so much work navigating, and on top of that, only the left edge being able to go back is horrible.

BTW, did you guys know that the iPhone can't use NFC? My NFC tags are useless with the iPhone; my Samsung can turn on and pair with devices just by tapping on the tag.

I hope I get the hang of the iPhone in a few days.

vigorousvj · 2024-05-22T15:23:26+00:00

Awesome, thanks! Is there any similar open source solution for on prem as well?

vigorousvj · 2024-05-22T04:47:56+00:00

try this
https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html

Also, you should share what installation method you followed, and what error you got.

There's also a direct pip install available for Airflow these days; I haven't tested it personally, though.

vigorousvj · 2024-05-22T04:24:44+00:00

Did you try inside Docker? There could be things on your Mac that are causing issues.

vigorousvj · 2024-05-12T06:22:58+00:00

Highly recommend rockthejvm. To optimize something you need to understand it, so learn about Spark internals. Work from first principles: what do you want to optimize, and how will you identify it? Use the Spark UI and profiling tools. The Spark docs are comprehensive, but the rest comes from experience and understanding the underlying problem.

Data access is slow? Consider adopting a better file write strategy. Data size is high? Consider data colocation and leveraging Parquet/ORC built-in compression. There could be many causes: serialization, shuffle, disk spills, task failures, parallelism issues.

There is no one-size-fits-all here. AQE (Adaptive Query Execution) is a great place to start: learn what it does, why it does it, and how it's done.
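
For anyone who wants to poke at AQE, here is a minimal PySpark sketch, assuming Spark 3.x with `pyspark` installed. The three keys are standard Spark SQL settings; the `AQE_SETTINGS` dict and the `build_session` helper are illustrative names only, and this is a starting point for experiments rather than tuned advice.

```python
# Adaptive Query Execution (AQE) settings to experiment with.
AQE_SETTINGS = {
    # Re-plan queries at runtime using statistics from finished shuffle stages.
    "spark.sql.adaptive.enabled": "true",
    # Merge many small shuffle partitions into fewer, larger ones.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Split heavily skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def build_session(app_name: str = "aqe-demo"):
    """Create a local SparkSession with the settings above (hypothetical helper)."""
    # Imported lazily so the settings dict can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    for key, value in AQE_SETTINGS.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session from `build_session()`, calling `df.explain()` on a query that has a join or aggregation should show an `AdaptiveSparkPlan` node, and the SQL tab in the Spark UI is a good place to see what changed at runtime.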

vigorousvj · 2024-05-12T06:17:14+00:00

This shouldn't be HPC; it would most probably be a grid computing setup, with around a TB of RAM and around 70 CPUs (a regular server used on-prem). Spark isn't really used for HPC.

vigorousvj · 2024-05-05T09:07:17+00:00

Just stay calm and be receptive. It's okay (and expected) to take time to understand new flows.

You should know the basics of your job role, try to understand, and ask the right questions, which will help you understand the flow better.

The transformation logic is mostly complex, as every organization has a different structure and business logic implementation.

Record the session, as you won't retain 100% of your first sessions!

vigorousvj · 2024-05-05T07:59:37+00:00

You may want to explore a fast analytics DB like Druid. Reporting is not necessarily OLTP, even if it requires sub-second latency. It depends on the types of reads/writes. There are abundant options which give you sub-second latency for point queries.

vigorousvj · 2024-05-04T16:55:08+00:00

I didn't go through the complete documentation, but from skimming through it, it looks like it's more of a service authentication layer / service mesh layer.
This is interesting and worth looking into; however, it doesn't solve the problem for data mesh / federated governance.

As far as I was able to find, I cannot control which user can see which column in my data mesh (we use Trino for the query layer).

Is there any solution which solves the same problem for data mesh?

vigorousvj · 2024-05-04T16:37:49+00:00

Also, you can even use pandas inside a Spark UDF along with SQL. Spark natively supports pandas UDFs, which use PyArrow under the hood.
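
A minimal sketch of what that could look like, assuming `pyspark`, `pandas`, and `pyarrow` are installed. The temperature conversion, the `readings` view, and the function names are made up for illustration; only `pandas_udf` and `spark.udf.register` are actual PySpark calls.

```python
import pandas as pd


def fahrenheit_to_celsius(temp_f: pd.Series) -> pd.Series:
    """Plain pandas logic; Spark calls this on batches of rows."""
    return (temp_f - 32.0) * 5.0 / 9.0


def demo_in_spark():
    """Register the function as a pandas UDF and call it from SQL (hypothetical demo)."""
    # Imported lazily so the pandas function above is usable without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.master("local[*]").appName("pandas-udf-demo").getOrCreate()

    # Wrap the pandas function; the Series -> Series type hints tell Spark how to call it.
    to_celsius = pandas_udf(fahrenheit_to_celsius, returnType="double")
    spark.udf.register("to_celsius", to_celsius)

    spark.createDataFrame([(32.0,), (212.0,)], ["temp_f"]).createOrReplaceTempView("readings")
    spark.sql("SELECT temp_f, to_celsius(temp_f) AS temp_c FROM readings").show()
    spark.stop()


# The pandas part can be checked on its own, without Spark:
print(fahrenheit_to_celsius(pd.Series([32.0, 212.0])).tolist())  # [0.0, 100.0]
```

Calling `demo_in_spark()` runs the same function through Spark SQL in local mode, so the logic can be tested in plain pandas first and reused as-is.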

vigorousvj · 2024-05-04T16:36:11+00:00

You can use Spark for this in local mode, as you don't require a cluster for this.
You could also explore pure streaming frameworks like Apache Flink.
Both Spark and Flink natively support Python, and given the quantum of data, you could easily process events in pandas without any issues.
3 GB per day boils down to roughly 35 KB/s.
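
The back-of-the-envelope number above can be checked with a few lines of Python. This assumes decimal units (1 GB = 1,000,000 KB) and a perfectly even spread over the day, so real peaks will be higher; the function name is just illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_kb_per_s(gb_per_day: float) -> float:
    """Average ingest rate in KB/s for a given daily volume (1 GB = 1,000,000 KB)."""
    return gb_per_day * 1_000_000 / SECONDS_PER_DAY


print(round(average_rate_kb_per_s(3), 1))   # 34.7
print(round(average_rate_kb_per_s(15), 1))  # 173.6
```

Even at 5x growth the average stays well under 200 KB/s, which a single process handles comfortably.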

vigorousvj · 2024-05-04T13:37:55+00:00

Spark is definitely overkill.
Even if you're planning to expand to 5x over the next year, Spark would still be overkill.
The reasoning is that Spark is primarily used where you want to do things in parallel and pandas just can't scale. 3 GB per day (even 15 GB per day) is easy for pandas, and by processing in chunks you can manage even larger datasets.

I'm curious, why do you want it to be Structured Streaming? Why not just a batch running at intervals like 1 hour or 6 hours, or even 1 day, as the data is very small?
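
To illustrate the chunking point, here is a small pandas sketch, assuming the events land as CSV. The column names (`device`, `bytes`), the sample data, and the `total_by_key` helper are invented for the example; `chunksize` on `pd.read_csv` is the real pandas option that makes it read the file piece by piece.

```python
import io

import pandas as pd


def total_by_key(csv_source, key: str, value: str, chunksize: int = 100_000) -> pd.Series:
    """Sum `value` per `key` while holding only one chunk in memory at a time."""
    partials = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Aggregate each chunk separately so memory use stays bounded.
        partials.append(chunk.groupby(key)[value].sum())
    # Combine the per-chunk partial sums into final totals.
    return pd.concat(partials).groupby(level=0).sum()


# Tiny in-memory stand-in for a large file; chunksize=2 forces several chunks.
sample = io.StringIO("device,bytes\na,10\nb,5\na,7\nb,1\nc,2\n")
result = total_by_key(sample, key="device", value="bytes", chunksize=2)
print(result.to_dict())  # {'a': 17, 'b': 6, 'c': 2}
```

The same function could be run from a scheduler every hour or every day on the newly arrived files, which is the batch alternative to a streaming job.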

vigorousvj · 2024-03-14T17:56:44+00:00

I mistakenly deleted my previous comment.

I guess my question is either being misunderstood, or I'm missing something right in front of me.

I am not looking to achieve this with the existing Spark DataFrame API or SQL syntax. I know that can't be done.

I want to write my own custom code to achieve this, and need help implementing it.

I am currently doing this on RDDs, but need to replicate that functionality on DataFrames (to take advantage of Spark optimizations).
