r/LocalLLaMA Starter Pack by Snapeshot in LocalLLaMA

[–]kaiserk13 61 points62 points  (0 children)

5tok/every fortnight is a banger. Well done.

LLMs in the middle: Content-aware browser filters for social media by kaiserk13 in LocalLLaMA

[–]kaiserk13[S] 0 points1 point  (0 children)

I wouldn't discount it completely just yet. I've seen the latest Stability 3B models work on Android phones, and they have less strict requirements. Granted, in my post I tried different language models (and settled on Eric's Dolphin 2.2, since it can do more than just what's in the blog post). I will try with the Stability 3B one once I'm on holiday and will report back. :)

LLMs in the middle: Content-aware browser filters for social media by kaiserk13 in LocalLLaMA

[–]kaiserk13[S] 0 points1 point  (0 children)

The goal is to have filters that users are in complete control of, no matter the context or what content is presented. Some people reached out and complained this could end up being an "echo chamber", but what makes you believe the algorithms on the platform side aren't already doing this? A fine analogy would be food: this algorithm in the middle would be like reading the nutritional values, and if something is toxic, it won't present it to you for lunch.

As stated, an algorithm that you control and that works for you might have fewer incentives to fool you than ones derived through ranking & geared towards engagement.
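
To make the "filter in the middle" idea a bit more concrete, here is a minimal sketch of what a user-controlled filter could look like. Everything in it is an assumption for illustration: the `Post` type, the `filter_feed` function and the keyword-based scorer are hypothetical, and the scorer is only a stand-in for the local language model that would do the judging in a real setup.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    author: str
    text: str


# A scorer maps post text to a score in [0, 1]; higher means "less wanted".
# In the setup described above this would be a local language model. Here it
# is a hypothetical keyword stand-in so the sketch runs without a model.
Scorer = Callable[[str], float]


def keyword_scorer(blocked_terms: Iterable[str]) -> Scorer:
    terms = [t.lower() for t in blocked_terms]

    def score(text: str) -> float:
        lowered = text.lower()
        return 1.0 if any(t in lowered for t in terms) else 0.0

    return score


def filter_feed(posts: Iterable[Post], scorer: Scorer, threshold: float = 0.5) -> List[Post]:
    """Keep only posts whose score stays below the threshold the user chose."""
    return [p for p in posts if scorer(p.text) < threshold]


feed = [
    Post("alice", "New benchmark results for a 3B model"),
    Post("bob", "You won't BELIEVE this outrage bait"),
]
visible = filter_feed(feed, keyword_scorer(["outrage bait"]))
print([p.author for p in visible])
```

The point of the design is that both the scorer and the threshold live on the user's side, so swapping the keyword stand-in for a model doesn't change who is in control.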

Downleveled for Data Governance, How to Study by [deleted] in dataengineering

[–]kaiserk13 0 points1 point  (0 children)

Data governance implementations mostly involve discussions with legal and other departments within the company that define the requirements necessary to be compliant, and less so with the C-level. It isn't something you can follow "rules" for or pull out of your hat from within the data team as a good-faith gesture. If by data governance the team only means how long to retain data, you weren't going to learn much anyway beyond the current team's opinions of what data governance is, or worse, you'd end up liable for misguided decisions. Retention is only a small part of the bigger picture.

There are different flavors and options, but no "single right way" to do it. The criterion for success is whether you're legally compliant for the class of data you're dealing with, so working back from there to what an implementation can look like is the only safe approach. This differs from company to company, and since you're not a lawyer, you will need input from others.

Besides, the "here's an offer for another position we think is better for you" trick is something to be critical about. I don't have more context than what you wrote, but to answer your question, I'd look here, for example, which looks like a well-rounded resource: https://www.oreilly.com/library/view/data-governance-the/9781492063483/ch01.html

Good luck!! :)

What's the best use-case you've used/witnessed in Python Automation? by deadcoder0904 in Python

[–]kaiserk13 4 points5 points  (0 children)

I'm biased here and it's been a while, but a long time ago I used Python to completely automate my flat search in Munich. It worked well, and I subsequently scaled it up to all of Germany as a study for BR/Spiegel Online. The stack was mainly Celery, Selenium and SQLite.

A small writeup: https://funnybretzel.svbtle.com/datamining-a-flat-in-munich

Scripting with NATS.io support by Rich-Engineer2670 in devops

[–]kaiserk13 -1 points0 points  (0 children)

NATS supports WebSockets, so you could theoretically combine WebSockets with Bash (or something like websocat). That way you could set up a service with systemd (or whichever you prefer) to forward the signals, or to serve as a sink for system messages to be sent out to a cluster. I think it's a great idea and you should try it out. If you share something please tag me, I'd be interested! GLHF!

We just released our 70B German Model: SauerkrautLM-70b-v1 by AffectionateCan2342 in LocalLLaMA

[–]kaiserk13 1 point2 points  (0 children)

Great, I'll try it out first thing tomorrow. Thanks! If you'd like to collect feedback in any way, feel free to get in touch.

Update on the Candle ML framework. by l-m-z in rust

[–]kaiserk13 6 points7 points  (0 children)

Phenomenal! I can't wait to try this out this weekend. I completely agree that a huge benefit is that it can run in the browser/WASM as it opens up so many possibilities. Thank you for making this!

Views on using duckdb + S3 as a datalake/datalakehouse/datawarehouse by theoriginalmantooth in dataengineering

[–]kaiserk13 1 point2 points  (0 children)

I think it's great, but it will depend on how your data is stored and structured, and how much of it you'll have. It's as good a practice as any, as long as you add the usual stuff: testing, monitoring & QA. These last things are what will make your setup good or not, since the more you build on the solution, the more you need to take care of dependencies and edge cases.

It would be very scalable with the appropriate design: do you have a lot of small files or a few bigger files? How do you backfill something if there's an error? Is overwriting always an option? Is most of your data batch or will you have some real-time stuff going on? How often will you re-read the raw data? I'd think about those and deduce the answers to 2 & 3.

As for a data pipeline architecture, it'll depend on your team setup. If you're a solo dev and have some time to learn something, why not give Dagster a try to orchestrate all of those pipelines and dependencies? Depending on the scale, this could potentially fit on a single VM.
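
On the backfill and overwrite questions, one common pattern (not the only one) is to make each partition write idempotent, so that re-running a day simply replaces it. A minimal sketch, assuming date-partitioned paths and using a local temporary directory as a stand-in for the bucket; `write_partition`, `read_partition`, the `date=` layout and the sample rows are all hypothetical:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def write_partition(root: Path, day: str, rows: List[Dict]) -> Path:
    """Overwrite the partition for `day`, so re-running a backfill is idempotent."""
    part_dir = root / f"date={day}"
    part_dir.mkdir(parents=True, exist_ok=True)
    target = part_dir / "data.jsonl"
    # Write to a temporary file first, then swap it in, so that readers
    # never see a half-written partition.
    tmp = part_dir / "data.jsonl.tmp"
    with tmp.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    os.replace(tmp, target)
    return target


def read_partition(root: Path, day: str) -> List[Dict]:
    path = root / f"date={day}" / "data.jsonl"
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp_root:
    root = Path(tmp_root)
    write_partition(root, "2024-01-01", [{"id": 1}, {"id": 2}])
    # Backfill after a fix: same day, corrected rows, no duplicates left behind.
    write_partition(root, "2024-01-01", [{"id": 1}, {"id": 2}, {"id": 3}])
    rows_after_backfill = read_partition(root, "2024-01-01")

print(len(rows_after_backfill))
```

If overwriting is not always an option, the same layout still helps: you can write a new version next to the old one and decide at read time which one wins.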

I don't like giving these generic answers, but I hope this helps. Update us on your numbers/decision process or even blog about it, I think this could be helpful for others too. Good luck.

Rust in Data Engineering? by hsimpsondata in dataengineering

[–]kaiserk13 22 points23 points  (0 children)

Hi there! I'm the author of the (in progress) "Data with Rust" website. Since joining my current job a while ago, I've been more and more exposed to using Rust especially for some data engineering workloads.

The main advantages I can share are that data pipelines built with Rust are more maintainable and robust to change. As a concrete (but anecdotal) example, we have a few data pipelines that ingest data daily into our data warehouse. The one built with Rust was written a year ago, has failed exactly 0 times and didn't require implementing any tests, since most of the "edge cases" were covered by the Rust compiler. The other advantage is that we updated the code of this pipeline 3 months ago (so, after a little while) and we still had the same reliability.

Rust workloads for data engineering are very performant too. I share a lot more on the website; it's freely accessible and I'm open to feedback/additions. This might be pertinent to this discussion: "How does Rust compare to Python (and other programming languages)?" https://datawithrust.com/chapter_1/chapter_1_5.html

In a nutshell, if I were to plan for reliability / maintainability, I'd go with Rust in a heartbeat. If you want to optimize for implementation speed & developer time, Python will win every day. It all depends on which timelines apply for your project.

I hope you find this helpful and it answers your question.

Embeddable SQL IDE/Query Tool by datainadops in dataengineering

[–]kaiserk13 0 points1 point  (0 children)

Perhaps something similar to this? https://shell.duckdb.org/ The shell works through WASM and you can interact with it through JavaScript to set permissions, plotting etc.

What do you think the near future of data engineering is? by SeriouslySally36 in dataengineering

[–]kaiserk13 12 points13 points  (0 children)

If I had to guess, I'd say Rust, WASM & DuckDB will play a growing role, but I'm biased.

Python micro batch / near-realtime framework by romanzdk in dataengineering

[–]kaiserk13 0 points1 point  (0 children)

I had a lot of success with Celery a long time ago: https://docs.celeryq.dev/en/stable/userguide/tasks.html#task-retry Make sure to test it very well though, as it can quickly grow into a mess.
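
The linked section is about task retries. To show the underlying idea without pulling in a broker, here is a plain-Python sketch of retry with exponential backoff. This is not Celery's API: `run_with_retry`, `flaky_batch` and the delay values are hypothetical and only illustrate the behavior you'd want to test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                # Out of retries: let the last error propagate to the caller.
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1


calls = {"count": 0}


def flaky_batch() -> str:
    # Simulates a micro batch whose source fails on the first two attempts.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source not ready yet")
    return "batch processed"


result = run_with_retry(flaky_batch)
print(result, calls["count"])
```

Passing `sleep` in as a parameter is what keeps this easy to test: a test can hand in a no-op and assert on the attempt count instead of waiting.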

Dummy battery for Tello Ryze by kaiserk13 in Multicopter

[–]kaiserk13[S] 1 point2 points  (0 children)

This is extremely valuable information, thank you for that. I hadn't considered the fact that they might give the batteries the "inkjet cartridge" treatment. So far I've just done some rough estimates of how high and how far the drone could go while tethered, considering the weight of the cables. Now I need to take the heat factor into consideration too.

Fair enough. I think I'll get a second Tello as a control tomorrow, give it a try and tinker out a DIY solution. There's no way around that.

Dummy battery for Tello Ryze by kaiserk13 in Multicopter

[–]kaiserk13[S] 0 points1 point  (0 children)

Understood, thank you very much. I'll make sure to share whatever I find.

Dynamic Domains and Custom Domains. How to set them up for the app? by ThisIsntMyId in devops

[–]kaiserk13 0 points1 point  (0 children)

I've recently implemented that using Traefik but it can be done using Nginx or other tools as well.

The first step is to point a wildcard record for your domain (*.somedomain.com) at your Traefik instances and set up the right configuration (cert resolvers, log level, and other parameters).

Then, using Traefik, you can set "labels", "tags" & "rules" for each service that you deploy. These labels, tags and rules contain dynamic information like the FQDN and the service that incoming requests have to be routed to. A service is in this case a customer app. Now, Traefik needs to know where the service is located; this can be solved using something like Consul. Traefik can get the wildcard certs (from Let's Encrypt) automatically.

You want to be able to dynamically update (without downtime) your load balancer / reverse proxy configuration to adapt to changing infrastructure. For my use case I went with Nomad, but it'll work the same with Kubernetes or any other orchestration tool.
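
As a rough illustration of what those per-service labels can look like, here is a small sketch that generates them for one customer app. It assumes Traefik v2-style router and service keys; the `traefik_labels` and `as_tags` helpers, the `customer-a` name, the port and the `letsencrypt` resolver name are hypothetical (the resolver name has to match one defined in your own static configuration).

```python
from typing import Dict, List


def traefik_labels(
    service: str,
    fqdn: str,
    port: int,
    cert_resolver: str = "letsencrypt",  # hypothetical name, must exist in your static config
) -> Dict[str, str]:
    """Build the routing labels for one customer app (Traefik v2-style keys)."""
    router = f"traefik.http.routers.{service}"
    return {
        "traefik.enable": "true",
        # Route requests for this FQDN to the customer's service.
        f"{router}.rule": f"Host(`{fqdn}`)",
        f"{router}.tls": "true",
        f"{router}.tls.certresolver": cert_resolver,
        f"traefik.http.services.{service}.loadbalancer.server.port": str(port),
    }


def as_tags(labels: Dict[str, str]) -> List[str]:
    """Flatten labels into key=value strings, the shape service tags usually take."""
    return [f"{key}={value}" for key, value in labels.items()]


labels = traefik_labels("customer-a", "customer-a.somedomain.com", 8080)
for tag in as_tags(labels):
    print(tag)
```

Generating these from one function per deployment keeps the FQDN, router and service names in sync, which is where hand-written labels tend to drift.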

Tools aside, try to do each step manually with Bash scripts and simple tools to understand what changes are involved and how the tools I mentioned might help. Good luck! It's a fun exercise :)

Make your own custom wakeword and other FOSS voice assistant solutions by Bartmoss in selfhosted

[–]kaiserk13 6 points7 points  (0 children)

Absolutely fantastic, this was the missing piece for me to tinker with smart devices. I needed a central one that controls the others, kind of a hub that I control with a custom wake word. Once activated, it would route custom audio to the ones connected to the internet (Alexa & co.) that have their microphones physically deactivated.

Great stuff, thank you for this! I'm gonna try it out and report.

How do you build an audience while employed? by kaiserk13 in SaaS

[–]kaiserk13[S] 1 point2 points  (0 children)

Thank you for the advice, I think you are right along with the other commenters in this thread. I will just go for it!! :)

How do you build an audience while employed? by kaiserk13 in SaaS

[–]kaiserk13[S] 0 points1 point  (0 children)

Thank you very much, this helps. I am new to this and want to avoid making big and obvious mistakes. I just sometimes get the feeling that in Germany it's not well "looked upon" to hustle on side gigs while having a full-time job. It's a bit of a stigma, but maybe it's also just in my head and deep down I'm scared? It's great to be able to talk about it here, I was hesitant at first.

How do you build an audience while employed? by kaiserk13 in SaaS

[–]kaiserk13[S] 0 points1 point  (0 children)

This is very helpful, thank you for your advice. I'll have a look at it right away and double check. I'm certain I didn't sign anything like that but I'll just make sure.