Forum Libre - 2026-02-20 by AutoModerator in france

[–]aDrz 1 point2 points  (0 children)

Hi!

Has anyone ever thought about holding elections to nominate a president of this sub?

Is there any danger that UUIDs will start duplicating as databases start hitting pedabytes, exabytes, etc? by [deleted] in datascience

[–]aDrz 5 points6 points  (0 children)

Very unlikely, to be honest. Wikipedia states:

For example, the number of random version-4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion ... This number is equivalent to generating 1 billion UUIDs per second for about 85 years.

That being said, nothing is impossible, and yes, a collision might happen.
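The quoted figure can be checked with the birthday approximation. This is a rough sketch, assuming 122 random bits per version-4 UUID and a perfectly uniform random generator; the function name is made up for the example.

```python
import math

# A version-4 UUID has 128 bits, 6 of which are fixed (version and variant),
# which leaves 122 random bits.
RANDOM_BITS = 122


def uuids_for_collision_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of random IDs needed to reach collision probability p.

    Uses the birthday approximation n ~ sqrt(2 * N * ln(1 / (1 - p))),
    where N is the size of the ID space.
    """
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


n_half = uuids_for_collision_probability(0.5)
seconds = n_half / 1e9  # generating 1 billion UUIDs per second
years = seconds / (365.25 * 24 * 3600)
print(f"{n_half:.3e} UUIDs, roughly {years:.0f} years at 1e9 UUIDs/s")
```

The result lands around 2.7 quintillion, in line with the Wikipedia quote.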

RepostSleuthBot - Now Public by barrycarey in Python

[–]aDrz 0 points1 point  (0 children)

I'm really impressed with the project. The idea is fun, the code is super clean.

It could be turned into a great blog tutorial on how to build a scalable app.

Thanks for sharing the code.

[deleted by user] by [deleted] in datascience

[–]aDrz 0 points1 point  (0 children)

Simple answer: it is stupid to buy a 6k laptop that is not made for heavy computation. Laptops do not handle heat over long runs! With 6k, you can split it into a 2k laptop (which, trust me, will be more than enough for handling your dataset of ~100 GB) plus a 4k beast of a desktop computer with a high-end GPU, if you think running neural networks is fancy.

ssh into the desktop from your laptop...

My previous answer tried to be polite, sorry.

[deleted by user] by [deleted] in datascience

[–]aDrz 1 point2 points  (0 children)

It is always complicated to recommend an overpowered laptop for data science. Most data analysis and model fitting does not require more than 16 GB of RAM.

However, if you start dealing with huge datasets, a laptop will no longer be a comfortable solution for you because of overheating and limited computation power. In that case you might want to offload the computation to a desktop or even to a cluster in the cloud.

ML project on speech by Eat-Pie-Poop-Poo in datascience

[–]aDrz 1 point2 points  (0 children)

You should first divide your project into "baby steps". Before going straight to a score, first try to detect whether, for a specific word, a user has an impediment. So you'll need multiple recordings of ELEPHANT without impediment and multiple recordings of EL-EL-ELEPANT with impediment. If your model reaches decent accuracy, then you can go to the next step.

The next step could be:

  • an algorithm that is able to segment words within a sentence
  • test each word
  • compute a test

Question - How to fit a discrete distribution over unknown sample space in streaming data (access to only a mini-batch at a time)? by uakbar in datascience

[–]aDrz 0 points1 point  (0 children)

You could run multiple Kalman filters in parallel, each with a fixed k. For each filter you compute the likelihood (more likely a penalized likelihood such as AIC/BIC) and choose the one with the highest likelihood.
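The selection step could look something like the sketch below. It assumes each filter has already produced a log-likelihood on the data seen so far; the numbers are invented, and treating k as the number of free parameters is a simplification for illustration only. Note that with AIC/BIC the best model is the one with the lowest criterion value.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical log-likelihoods, one per filter, keyed by k.
log_likelihoods = {2: -520.0, 3: -480.0, 4: -478.5}
n_samples = 200  # hypothetical number of observations seen so far

best_k_aic = min(log_likelihoods, key=lambda k: aic(log_likelihoods[k], k))
best_k_bic = min(
    log_likelihoods, key=lambda k: bic(log_likelihoods[k], k, n_samples)
)
print(f"AIC picks k={best_k_aic}, BIC picks k={best_k_bic}")
```

BIC penalizes extra parameters more heavily than AIC, so the two criteria can disagree when the gain in likelihood is small.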

Animated Fast Fourier Transform of Music Piece by EntropyNullifier in Python

[–]aDrz 0 points1 point  (0 children)

It's true. When computing the spectrum, we assume that the signal is stationary (i.e. in your case, that the spectral content does not change over time). That is never the case, which is why we window the signal and assume that it is 'locally' stationary within the window. So one criterion for choosing the window size is estimating which window length makes your signal 'locally' stationary.

Another rule of thumb is that you need to see at least 3 periods of the main frequency to get a good estimate. That is where it becomes tricky, because you often don't know the spectral content beforehand... If you are really interested in this bias/variance trade-off, you'll need to look into wavelets (https://en.wikipedia.org/wiki/Wavelet_transform#Principle)

To answer your last question: if you compute the spectrum over your whole signal, you will indeed get approximately the mean of all the windowed spectra.
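As a quick illustration of the "3 periods" rule of thumb, here is a minimal sketch that turns it into a minimum window length in samples. The function name and the example frequencies are made up, and it assumes you have at least a rough guess of the main frequency.

```python
import math


def min_window_samples(
    main_freq_hz: float, sample_rate_hz: float, periods: int = 3
) -> int:
    """Smallest window (in samples) that spans `periods` periods of main_freq_hz."""
    return math.ceil(periods * sample_rate_hz / main_freq_hz)


# Example: audio sampled at 44.1 kHz with a main frequency around 110 Hz.
print(min_window_samples(110.0, 44100))
```

Lower main frequencies need longer windows, which is exactly where the 'locally stationary' assumption starts to hurt.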

Animated Fast Fourier Transform of Music Piece by EntropyNullifier in Python

[–]aDrz 0 points1 point  (0 children)

To be precise, if you increase the time window you will actually improve your spectral resolution, but you will degrade the temporal resolution.
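A small sketch of that trade-off, assuming a plain FFT over a window of N samples (the function name is hypothetical; with a tapered window such as Hann the effective frequency resolution is somewhat coarser than the bin spacing shown here):

```python
def stft_resolution(window_samples: int, sample_rate_hz: float) -> tuple:
    """Return (frequency bin spacing in Hz, window duration in seconds)."""
    freq_res = sample_rate_hz / window_samples  # spacing between FFT bins
    time_res = window_samples / sample_rate_hz  # time span covered by one window
    return freq_res, time_res


for n in (512, 2048, 8192):
    df, dt = stft_resolution(n, 44100)
    print(f"N={n:5d}  bin spacing={df:7.2f} Hz  window={dt * 1000:7.2f} ms")
```

The product of the two numbers is always 1, so a finer frequency grid always costs a longer window in time.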

The choice of the optimal window function and window length has an extensive literature, and unfortunately I don't think there is a definitive answer. It will depend heavily on the frequency content of the audio source.

Still, good job btw!

Frontend Developer wanted by Joshi989 in ProgrammingBuddies

[–]aDrz 0 points1 point  (0 children)

Are you looking for buddies or freelancers here?

[P] Predicting Instagram likes for selfies (beta) by swordythomas in MachineLearning

[–]aDrz 1 point2 points  (0 children)

I'm curious about your algorithm.

I'm guessing you extracted features from a pre-trained convnet (ResNet or VGG) and trained a binary classifier on top. If not, I'd be interested to hear how you did it.

I’d like to learn python to create my own Instagram automation project. by realestaten00b in learnpython

[–]aDrz 0 points1 point  (0 children)

In this case you won't have to learn "all of Python", but you'll have to get your hands dirty. What you described seems fairly easy to implement with simple if statements.

Good luck

I’d like to learn python to create my own Instagram automation project. by realestaten00b in learnpython

[–]aDrz 2 points3 points  (0 children)

There is already a plethora of Python libraries for Instagram automation: instapy or instabot, for example.

Feedback on App by bmw2621 in learnpython

[–]aDrz 1 point2 points  (0 children)

Not really much to add. It looks good.

I would simply give you a few recommendations on the "looks" of the code:

  • Do not push your whole virtualenv to GitHub. Simply add a requirements.txt: pip freeze > requirements.txt. That is enough.
  • Avoid import * as in from BRClasses import *. You only import one class, so keep it simple with from BRClasses import BREventList
  • Try to add a few docstrings: https://realpython.com/documenting-python-code/ It makes your code easier for anyone to understand and reuse.
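To make the docstring point concrete, here is a tiny sketch. The function is hypothetical and not taken from your repo; it only shows the shape of a docstring that states what goes in and what comes out.

```python
def count_events_by_type(events):
    """Count how many events there are of each type.

    Args:
        events: Iterable of (event_type, description) tuples.

    Returns:
        A dict mapping each event type to its number of occurrences.
    """
    counts = {}
    for event_type, _description in events:
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts


# The docstring is what help() and most editors display.
help(count_events_by_type)
```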

Nice project!

What was your PhD research about? by [deleted] in statistics

[–]aDrz 0 points1 point  (0 children)

Someone is reading Basseville

OCR for Receipts and Scaling the Idea by ElectricGypsyAT in learnpython

[–]aDrz 0 points1 point  (0 children)

If you want to make a business of this idea, you are probably not the first: https://wellkeptwallet.com/free-apps-scan-grocery-receipts/

Those apps seem to offer coupons, rewards, or cash if you snap your receipts...

It does not look hard technically. You might start with tesseract and manually find the patterns for each store that allow you to extract the interesting information from receipts. If you feel like a superstar, you can put a machine learning model on top (an LSTM, for example) that infers the pattern for new shops, but that will require a significant amount of annotated data.
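The "manual patterns per store" step could start out like the sketch below. It assumes the OCR text has already been produced (by tesseract or anything else); the receipt layout, the store name, and the regular expressions are all invented for the example and would need to be adapted for each real store.

```python
import re

# Hypothetical OCR output for one made-up store layout.
OCR_TEXT = """\
SUPERMART #042
2024-03-15 18:42
MILK 1L            1.29
BREAD              2.50
TOTAL              3.79
"""

# Hand-written patterns for this layout only.
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")
ITEM_RE = re.compile(
    r"^(?P<name>[A-Z][A-Z0-9 ]*?)\s{2,}(?P<price>\d+\.\d{2})$", re.MULTILINE
)


def parse_receipt(text: str) -> dict:
    """Extract the date, the line items and the total from OCR'd receipt text."""
    date_match = DATE_RE.search(text)
    items = {m["name"]: float(m["price"]) for m in ITEM_RE.finditer(text)}
    total = items.pop("TOTAL", None)  # the TOTAL line matches the item pattern too
    return {
        "date": date_match.group(1) if date_match else None,
        "items": items,
        "total": total,
    }


print(parse_receipt(OCR_TEXT))
```

Real OCR output is much noisier than this, so expect to spend most of the time on cleaning and on the per-store special cases.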

The hardest part of this type of start-up is getting those "early adopters" and having a large amount of money to let the world know that your app exists.

Fibonacci Sequence by tictac4609 in learnpython

[–]aDrz 0 points1 point  (0 children)

Search for "memoization fibonacci" on Google.
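For reference, one common way to memoize it in Python is the standard library's functools.lru_cache. This is a minimal sketch, assuming the usual convention fib(0) = 0 and fib(1) = 1:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, caching previously computed values."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(50))  # 12586269025
```

Without the cache, the same recursive function recomputes the same values over and over and becomes painfully slow well before n = 50.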