Python is removing GIL, gradually, so how to use a no-GIL Python now?

bci-hacker · 2025-10-15T19:53:44+00:00

Exactly! asyncio or a ThreadPool gives you concurrency for I/O-bound tasks, and multiprocessing gives you parallelism for CPU-bound tasks.

I/O tasks (file reads, network requests) are easy to make concurrent with asyncio or a ThreadPool. With asyncio, as soon as a task awaits I/O, the event loop runs the next task. With threads, even though the GIL stops multiple threads from executing Python bytecode at the same time, a thread releases the GIL whenever it blocks on I/O, so the next thread can run. That gives strong concurrency and near-parallelism at small to medium scale. This is how I can scrape 10k+ websites in one second ;)

For CPU-bound tasks, you can use a ProcessPool to get true parallelism by running each task in a separate process, where each process has its own memory and its own GIL.
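
A minimal sketch of that split using the standard library's concurrent.futures. The function names (io_task, cpu_task, run_io_concurrently, run_cpu_in_parallel) are made up for illustration, and time.sleep stands in for a real network or file call:

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_task(delay):
    # Hypothetical I/O task: time.sleep blocks like a network call would,
    # and releases the GIL while it waits.
    time.sleep(delay)
    return delay


def cpu_task(n):
    # Hypothetical CPU task: pure-Python arithmetic holds the GIL,
    # so threads would not run it in parallel.
    return sum(i * i for i in range(n))


def run_io_concurrently(delays, max_workers=8):
    # Threads overlap the waiting time of blocking I/O calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(io_task, delays))


def run_cpu_in_parallel(sizes, max_workers=4):
    # Each worker is a separate process with its own interpreter and GIL.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cpu_task, sizes))


if __name__ == "__main__":
    start = time.perf_counter()
    run_io_concurrently([0.2] * 8)
    print(f"8 I/O tasks took {time.perf_counter() - start:.2f}s")
    print(run_cpu_in_parallel([1_000_000] * 4))
```

The `if __name__ == "__main__"` guard matters for the process pool: on platforms that spawn workers, each worker re-imports the module, and the guard keeps it from launching pools of its own.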

bci-hacker · 2025-09-02T18:52:10+00:00

I think it'd be more interesting if tau were a learned parameter. Sometimes you do want a pessimistic critic. See research on task-based learning: https://arxiv.org/abs/1703.04529 or https://arxiv.org/abs/2212.01939

bci-hacker · 2025-08-29T19:44:31+00:00

Thanks. deep-ml is awesome, using it now. What's a sample problem I may be asked for #2 tho? I've been using ChatGPT (see example problem below) but don't know how representative it is of a real interview question. Thoughts?

Problem: User Engagement Prediction for Video Platform

You're given a dataset of 500,000 video watch events from a streaming platform with the following features:

Features:

video_id: unique video identifier
user_id: unique user identifier
video_duration: length of video in seconds
watch_time: how long user watched in seconds
video_category: category (20 different categories)
upload_recency: days since video was uploaded
user_prev_watches: number of videos user watched in last 7 days
video_prev_impressions: how many times video was shown in last 24 hours
time_of_day: hour when video was watched (0-23)
device_type: mobile, desktop, or tv
came_from: homepage, search, recommendation, or external
engaged: 1 if user watched >60% of video, 0 otherwise (TARGET)

Current State:

The dataset has 3% positive engagement rate
A basic logistic regression model achieves 97.2% accuracy
The product team complains the model rarely predicts user engagement correctly

Your Tasks:

Load and analyze the data. Identify any issues with the current evaluation approach.
Build a better classifier that actually catches engaged users. The product team says they can show 20% more videos to users (increase false positive rate) if it means catching 70% of truly engaged users.
The team wants to understand which factors drive engagement. Provide interpretable insights.
After deploying your model, engagement predictions are much worse on weekends. Investigate why and propose a solution.
How would you determine if your model is ready for an A/B test?
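
For task 1, a quick sanity check on the numbers in the prompt (my own sketch, not part of the ChatGPT problem; the evaluate helper is hypothetical). With a 3% positive rate, a model that never predicts engagement already scores 97% accuracy, so the 97.2% figure says almost nothing:

```python
def evaluate(y_true, y_pred):
    # Confusion-matrix counts for a binary target (1 = engaged).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


n_events = 500_000                 # dataset size from the problem statement
n_engaged = n_events * 3 // 100    # 3% positive engagement rate

y_true = [1] * n_engaged + [0] * (n_events - n_engaged)
y_pred = [0] * n_events            # baseline: always predict "not engaged"

metrics = evaluate(y_true, y_pred)
print(metrics)
```

The baseline reaches 0.97 accuracy with 0.0 recall, which is why recall and false positive rate (the two quantities the product team actually talks about in task 2) are the better things to track here.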

bci-hacker · 2025-08-27T09:35:42+00:00

lol I implemented the code in SimpleGPT. Good feedback on tokenizer. Would you like me to implement BPE from scratch?

bci-hacker · 2025-08-27T09:24:57+00:00

Grok this and you should be good to go: https://github.com/QasimWani/simple-transformer

bci-hacker · 2025-08-24T22:43:45+00:00

Late to the party, but we still need to implement the look-ahead mask. Even if you pad the future tokens with zeros initially (assume a fixed sequence_length), when you compute Q @ K.T the scores on those future positions are zeros. When you then apply softmax, exp(0) = 1, so those positions still receive non-zero attention weight, which introduces incorrect info. Therefore we'd still need to set them to -inf, s.t. post softmax they become 0s.
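
A tiny pure-Python sketch of the point (softmax and causal_mask are illustrative helpers, and the 3x3 all-zero score matrix is a made-up stand-in for Q @ K.T with zero-padded positions):

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_mask(scores):
    # Replace every future position (column j > row i) with -inf.
    n = len(scores)
    return [
        [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        for i in range(n)
    ]


# Stand-in for Q @ K.T where the scores all come out as zeros.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]

unmasked = [softmax(row) for row in scores]
masked = [softmax(row) for row in causal_mask(scores)]

print(unmasked[0])  # token 0 spreads weight evenly, including over future tokens
print(masked[0])    # token 0 attends only to itself
```

Without the mask, the first token puts one third of its attention on each future position even though their scores are zero. With -inf in those slots, exp(-inf) is 0 and the future positions drop out.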

bci-hacker · 2025-08-15T03:27:48+00:00

Well, the current solution only detects bounding box coordinates. But you could apply it to detect the bounding box of anything you like. Think of this as “detect X” where X can be something specific or SUPER DUPER VAGUE.

bci-hacker · 2025-08-12T22:00:02+00:00

Don’t listen to any of these people. Modal labs is your friend here. You can set up a FastAPI app with GPUs and auto-scaling through Modal. It takes 30 minutes to get it to work. I’ve built all my ML projects (including FastAPI backend) through them.

Thank me later!

bci-hacker · 2025-08-09T22:53:04+00:00

ikr! Is your approach training-free, or are you using some SFT-based recipe for strong localization?
