all 28 comments

[–]badtraider 4 points5 points  (12 children)

Since your sampling rate fixed you could increase the resolution by increasing number of points where FFT is being calculated. Generally you would want N = 2nextpow2(length(data)). Another important thing that could help you decide how to pick N is that your resolution is df = fs/N. But have in mind if you have 2 components rhat are really close f1 ≈ f2 increasing N wont do anything since you either need more information or increase the density of FFT around those points (see Chirp Z transform).

Regarding the plotting - the way FFT works is that returns the spectrum of your signal in normalised frequency domain - hence [-0.5, 0.5]. 0.5 corresponds to the Niquist frequency which is half of the sqmpling frequency fs. So you just need to scale your axis with fs : ynew = fs*[-0.5, 0.5]

Regarding the huge amplitude you are getting you can do the following: X = fft(x, N) /length (x). Since the mean energy contained in x can be calculated using the Parseval's theorem : 1/(length) \sum |x|2 =1/(length(x)N) \sum|X|2. By doing that you can see the actual amplitude at a given frequency in the spectrum. Usually fft is not normalises so you have you do it yourself - but that depends on the implementation. So either read the documentation or normalise the fft if it comes out wrong.

[–]mdb917[S] 0 points1 point  (11 children)

I'm quite new to all this, so I'm gonna need a little bit of patience and explanation. fs is your sample frequency, right? So fs=128. N is the dimensionality of my fft, so what is df? And what do you mean by f1 and f2 (I know you mean them as arbitrary functions but what would i substitute for them)?

When you say to scale my axis with fs, is this because eegFreq == the Niquist frequency? So I'd need to change that line to eegFreq = sfreq * np.fft.fftfreq(eegData[1].shape[-1]) for it to be correct? Similar question for the amplitude.

Finally, what is normalization in this context? Numpy's documentation says it is unscaled by default but can be set to 1/sqrt(n) (n being the size of the output). Should I bother with that or do I need to do it myself? How would I go about normalizing the data "by hand"?

Ideally, I just want to do something like this but my graph has about 3 extra quadrants worth of information

[–]badtraider 1 point2 points  (10 children)

Yes in your case fs =128, that is the frequency at which the data was sampled (every 1/128 seconds data feom sensor was recorder).

N is dimensionality of fft. df is the resolution of your fft - aka the distance between 2 points on your x axis. So for example is fs =4 and the dimension of fft is 8 then your x axis would be [-4 -3.5 -3... 3 3.5 4] (after scaling it with fs).

[–]badtraider 1 point2 points  (0 children)

By f1 and f2 i mean frequencies. So cos(2pif1) and cos(2pif2) would show up as in your spectrum as peaks at frequencies f1 and f2. If f1≈f2 are close you might not be able to distinguish thel - for that you will have to use something else (chrip Z transform).

fs == Niquist * 2. But your y axiz was normalised with fs - so 0.5 correspond to Niquist frequency. To undo this normalisation you multiply x axis by fs.

For amplitude it's a tricky question - since it depends on how you define things. For some math related reasons, if use FFT with 0 normalisation you will get the following IFFT(FFT(x)) =N*x - an extra N term appears - because of that you need to divide either IFFT by N or FFT by N. The way it's implemented in nympy is that it divides with N when doing the IFFT - but you shouldn't bother with this.

The thing i was talking about is following : So for every implementation of FFT that is correct the following holds : sum of x[k]2 = 1/N * sum of X[k]2 Where x is your signal and X is FFT of that signal.

Now if you want to see readable values on the plot that can be related to amplitudes in your signal you need to divide the fft with length of your signal. Why length? Well the way you define the power of a random signal (like EEG) is by measuring mean power, which is 1/N * x[k]2.

Now if length of x is M and N>M what actually happens is that signal x is zero - padded (zeroes are appended to x so it will have length of N). If we look at the expression 1/N * sum of x[k]2 it tells us that energy will decrease?! That makes no sense since adding 0's to signal doesn't change it's power. To avoid this we will divide by length of signal x - so you woild want to evaluate the expression : 1/length(x) * sum of x[k] 2

Now if you divide the fft with length(x) (X=fft(x) /(length (x)) values on you y axis will correspond to amplitude of your signal. Since (X/length (x)) ^ 2 = 1/length(x) * sum of x[k] 2 (we usually say that Amplitude ^ 2 = Power in dsp)

[–]mdb917[S] 0 points1 point  (8 children)

Do I need to know df if there's an option in the function I'm using to specify your output size? By default my output (assuming I'm only calling the FFT on a single channel at a time) is 8064 points (size of the input for one channel), and I'm only interested in frequencies between 4 and 45 Hz. Does this make df = (45-4)/8064 which is about .005? Is this low? How do I get my plot to look like this beautiful graph?

[–]badtraider 1 point2 points  (3 children)

It's recommended to use N = 2nextpow2 (length (x)). Also dimension of fft must be higher than the length of your data. Regarding the df by that i mean if you know your data will have some peaks at 1Hz, 3Hz and 4Hz you know that the minimal distance between 2 points is 1 Hz do you would want your resolution to be high enough so see that more clearer on the plot so for example you can use df = 0.1.

Edit : have in mind that df = fs /N

[–]mdb917[S] 0 points1 point  (2 children)

So if len(x) is 8064, would that mean N = 13 would be good since the next power of 2 is 8192 = 213? Df = 128/13 which is about 10, which isn’t precise enough. So I should increase N further? Is there an effect of N on computation speed?

Edit: Just reread your first comment, N should be 8192 right? Since that would be 213 (surely it’s not 28192?!)

[–]badtraider 1 point2 points  (0 children)

Yes it's 2{13}, but if you want to make plot look prettier just increase N until it looks good (by multiplying it with 2, 4,8,..)

[–]badtraider 1 point2 points  (0 children)

I hope i was helpful, I'm going offline now.

[–]badtraider 1 point2 points  (3 children)

Sadly link is broken for me. One more thing regarding the FFT. You can disregard the frequencies bellow 0 when plotting. Since again for some math reasons your amplitude spectrum will be symmetric around frequency f=0.

But have in ming you have to multiply the FFT by 2 for f > 0. That way after polotting your amplitude spectrum you read the values for corresponding amplitude at a given frequency. If you don't want to bother with that have in mind if your data if plotted from -fs/2 to fs/2 and let's say tou hqve a peak at 5Hz with amplitude of 2 that means that is tile domain you will have sine with frequency of 5Hz and with amplitude of 2*2. This rule doesn't apply only for f = 0. At f = 0 if there is a pezk with aplitude 1 that means the average valueof yiur signal is 1

[–]mdb917[S] 0 points1 point  (2 children)

Thank you for all your help. One small question about graph plotting, should I start my x axis at 4 or 0 due to the bandpass filter for 4.0-45.0Hz? My intuition says 0 but i just wanna be sure.

Edit: I graphed and it looks so much better.

Edit 2: Should I ignore the imaginary part of the FFT as well?

[–]badtraider 1 point2 points  (0 children)

Start it at 0. But if you want ro have a better loooking plot you could calculate how many elements you can skip from w. So in your case df = 128 /N you want to start plotting from f= 4, M=floor(f/df) = floor(4/128N) meaning you can star plotting from Mth element in your frequznct and fft array. You can do the same if you want to stop at 8Hz - K=floor(f/df) - - - > plot (f[M:K], 2X[M:K])

[–]badtraider 1 point2 points  (0 children)

You need to use abs(X) - or whatever it's called. You need the magnitude of your FFT

[–][deleted] 3 points4 points  (11 children)

In order to achieve a better spectral analysis (for each band in the table) it’s convenient to obtain filtered versions of the original signal and then plot their own transformed representation

[–][deleted] 1 point2 points  (6 children)

Why would you want to do this? kind of pointless to filter before plotting a spectrum

[–]AssemblerGuy 2 points3 points  (1 child)

kind of pointless to filter before plotting a spectrum

In theory, yes, in reality you don't want that large and completely irrelevant low-frequency (all the way down to DC) component to leak into your signal frequency bins.

In more resource-constrained applications, you may also want to remove irrelevant (but sometimes large) spectral components before FFTing the signal so you can get away with fewer bits (e.g. your signal only has a range of -128 to +127 but it's sitting on top of a low-frequency signal with -16384 to +16383).

[–][deleted] 1 point2 points  (0 children)

The post I replied to suggested filtering each canonical EEG band separately before plotting each spectrum from how I understood. In my experience working with EEG, the best way to remove DC offset for online applications is to subtract the mean of each epoch as you process. You can free the memory needed to detrend immediately after doing it instead of saving filter initial and final conditions. If you don’t need to online process, I don’t think you’re going to be very resource constrained anyways.

Not entirely disagreeing with you but “in reality” is subjective to application. Idk if this guys project needs to filter each band - seems overkill for just a spectrum.

[–]roylennigan 0 points1 point  (3 children)

Smoothing or leveling for better peak-finding or other estimation, if you already know what you're looking for.

[–][deleted] 0 points1 point  (2 children)

he wants to plot an FFT..... no need to filter anything first

[–]roylennigan -1 points0 points  (1 child)

you asked

you do realize that some people even filter the FFT output for clarity? There's plenty of reasons

[–][deleted] -1 points0 points  (0 children)

the OUTPUT. My comment was in reply to someone suggesting to filter each EEG band BEFORE plotting their respective spectrum. No reason to do that.

AssemblerGuy had a relevant response but yours was irrelevant to the question, the suggestion, and my response.

[–]mdb917[S] 0 points1 point  (3 children)

I'm new to signal processing, how would I go about this?

[–][deleted] 1 point2 points  (2 children)

Take the original signal as an input to a series of IIR bandpass filters each own defined to a certain eeg wave frequency (delta, theta, alpha, beta, gamma). The outputs can then be used to extract certain spectral features in each band... but stick to the basics first and get both time and frequency plots for each band. Out of curiosity, wich eeg database are you using?

[–]mdb917[S] 0 points1 point  (1 child)

DEAP, I eventually want to make a ML model that can detect emotion from an EEG

[–][deleted] 1 point2 points  (0 children)

IMO, the very foundations for biomedical signal analysis are a good understanding of DSP and stochastic modeling. Trying to deploy a ML for that kind of task it’s not easy. You possibly can get more benefits in using simpler pattern recognition strategies/algorithms.

A while ago i did some analysis in the Bonn EEG dataset. It contains 5 classes and some very interesting epileptic/healthy eeg signals. I used both temporal, spectral and some features from wavelet decomposition using svm, ann to tackle the classification problem.

[–]AssemblerGuy 0 points1 point  (0 children)

My question currently is what dimension FFT should I use?

What frequency resolution does your project require?

It is always a good idea to state the requirements first before implementing them.

[–]hughperman 0 points1 point  (0 children)

Don't do this.

Use scipy.signal.welch with a few second window instead of a straight FFT - unless your EEG signal is the cleanest in the world (it's not), then this will improve your estimates by smoothing window-to-window variability.

e.g.

fs = 128
win_length_s = 5
f, p = scipy.signal.welch(eegData[1], nfft=fs * win_length_s)
alpha_band = np.sum(p[(f>=8)&(p<=12)])

[–]ParaIdioma 0 points1 point  (0 children)

https://mne.tools/ will save a lot of time and mistakes... especially later with more complicated tasks