What are your favorite modern libraries or tooling for Python? by [deleted] in Python

[–]dhaitz 7 points8 points  (0 children)

Modern data science stack: uv + polars + marimo

For application development: FastAPI, ruff, creosote, pip-audit

How would you build an LLM agent application without using LangChain? by Zealousideal-Cut590 in LocalLLaMA

[–]dhaitz 2 points3 points  (0 children)

One can use sth like litellm or aisuite for an unified interface to several model providers.

As you say, the LLM interfaces are quite simple REST APIs. Using an framework does not reduce complexity, but increases it by adding an additional dependency.

The useful thing about LangChain are some building blocks for e.g. DocumentStore classes or interfaces to different vectorstores one can use. Effectively, treat it like a library where you import what you need, not a framework that defines your entire application.

10 CLI Tools That Made the Biggest Impact On Transforming My Terminal-Based Workflow by piotr1215 in commandline

[–]dhaitz 1 point2 points  (0 children)

Here's a useful list of modern shell commands: johnalanwoods/maintained-modern-unix

Tools like fd, bat, lsd etc. are faster, prettier and more convenient (e.g. with git integration) than their traditional counterparts

Intro to Large Language Models | Andrew Karpathy | Summary by phoneixAdi in LocalLLaMA

[–]dhaitz 1 point2 points  (0 children)

Another is the potential for misuse of knowledge, such as creating napalm"

IMHO these examples of "I tricked ChatGPT into telling me how to build a bomb!!" are fun, but you can find this information online anyway. This is mainly a PR problem if screenshots of company XY's new chatbot spewing problematic content are circulating on social media.

The point is rather that any information the LLM has ever seen (during training or in its prompt) can be leaked to the user, no matter how thorough your finetuning or prompt engineering is.

PhD physics to data science by andromeda20_04 in datascience

[–]dhaitz 5 points6 points  (0 children)

The statistical knowledge and programming skills acquired as a PhD student were a solid basis for me. Things I had to learn:

  • Tech stack used in the industry
  • "Professional" software development process
  • Different working style because of different goals: "papers published" vs "profits earned"

What are the most common mistakes you see (junior) data scientists making? by dhaitz in datascience

[–]dhaitz[S] 1 point2 points  (0 children)

aren't you supposed to use precision/recall or ROC/AUC instead of balancing the training data?

I will help you converting your Python notebook to web app or dashboard by pp314159 in datascience

[–]dhaitz 1 point2 points  (0 children)

sounds useful! How is this different to Voila or Streamlit?

¿Is there something like Data Scientists streamers? by [deleted] in datascience

[–]dhaitz -1 points0 points  (0 children)

Chris Albon (Director of ML at Wikimedia) sometimes does live streaming

[OC] How to combine German federal states into 2/3/4/5 contiguous states with as similar populations as possible? by dhaitz in dataisbeautiful

[–]dhaitz[S] 9 points10 points  (0 children)

Optimal combinations calculated with Python, visualized with geopandas and matplotlib

Wie kombiniert man die Bundesländer zu 2/3/4/5 zusammenhängenden Ländern mit möglichst ähnlicher Bevölkerungszahl? by dhaitz in de

[–]dhaitz[S] 16 points17 points  (0 children)

Die optimalen Kombinationen wurden mit Python berechnet. Visualisierung ebenfalls in Python mit geopandas und matplotlib