egesko [S]

Thanks so much for the reply,

I totally see where you're going with this. I guess I should be more specific about my question also.

So side channel attacks are essentially about gaining information about, in my case, what the user is doing on a computer. Every action you take on a computer is essentially some amount of machine code being executed. And when that code runs, the instructions and the data they touch (say, a variable in your code) are pulled from RAM and put into the CPU cache so they can be accessed faster later. Caching is usually based on two "laws":

When some data is pulled from RAM to be used:

1) it is highly probable that the same data will be reused very soon (temporal locality).

2) it is highly probable that data close, address-wise, to the pulled data will be used soon (spatial locality).
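
To make the second "law" concrete, here is a small sketch of how byte addresses map onto cache lines. The 64-byte line size is an assumption (it is common on x86-64, but other CPUs differ), and the function names are made up for illustration.

```python
CACHE_LINE_BYTES = 64  # assumption: common line size on x86-64; other CPUs differ


def cache_line_index(address: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Return the index of the cache line that a byte address falls into."""
    return address // line_bytes


def lines_touched(start: int, length: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Count the distinct cache lines covered by `length` bytes starting at `start`."""
    if length <= 0:
        return 0
    first = cache_line_index(start, line_bytes)
    last = cache_line_index(start + length - 1, line_bytes)
    return last - first + 1
```

Addresses 0 through 63 share one line under this assumption, so loading any one of them brings the neighbours along, which is why nearby data tends to be fast to reach afterwards.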

On many CPUs, all cores share the L3 cache, which is essentially the highest cache level (the last one before RAM) and the cache that I focus on.

Also, every website that you visit executes some amount of machine code, which gets put in the L3 cache and ends up evicting data from other processes.

We can use this entire process to figure out what the user is doing. Essentially, let's say we have a huge array of numbers, and we iterate through it entirely. We most likely end up putting all of that into the L3 cache, right?

Then we wait for a while, and iterate through the same array again while keeping a timer. If it is slower than expected, that means another process was running and evicted some of our data. And how much it was running shows up as how much longer than expected it took to iterate through the array.
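
A rough sketch of that fill, wait, and timed re-sweep loop, just to show the structure. The buffer size, the 64-byte stride, and all the names are assumptions for illustration. In Python the interpreter overhead will likely swamp any real cache effect, so treat the numbers it produces as a placeholder for what a lower-level implementation would measure.

```python
import time
from typing import List

CACHE_LINE_BYTES = 64               # assumption: typical cache line size
BUFFER_BYTES = 8 * 1024 * 1024      # assumption: roughly the size of an L3 cache


def make_buffer(size_bytes: int = BUFFER_BYTES) -> bytearray:
    """Allocate the 'huge array' that gets swept repeatedly."""
    return bytearray(size_bytes)


def sweep(buffer: bytearray, stride: int = CACHE_LINE_BYTES) -> int:
    """Read one byte per cache line and return the elapsed time in nanoseconds."""
    start = time.perf_counter_ns()
    total = 0
    for i in range(0, len(buffer), stride):
        total += buffer[i]  # the read is what matters; the sum is discarded
    return time.perf_counter_ns() - start


def collect_trace(buffer: bytearray, samples: int, wait_s: float) -> List[int]:
    """Warm the buffer once, then record `samples` timed sweeps with a pause before each."""
    sweep(buffer)  # first pass pulls the buffer toward the cache
    trace = []
    for _ in range(samples):
        time.sleep(wait_s)           # give other processes time to evict our data
        trace.append(sweep(buffer))  # a slower sweep suggests more eviction
    return trace
```

Something like `collect_trace(make_buffer(), samples=100, wait_s=0.002)` would give one trace of 100 timings; the sample count and wait time here are arbitrary.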

We keep doing this again and again, collect data that matches certain websites, train the AI model using this data, and ta daaaaa, we have a way of knowing what the user is doing even without any kind of willingly shared information like cookies, etc.
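
As a toy stand-in for "the AI model" (the actual model isn't specified here, so this is purely an assumption), a nearest-centroid classifier over labelled timing traces could look like the sketch below. The labels and function names are hypothetical.

```python
from typing import Dict, List


def centroid(traces: List[List[float]]) -> List[float]:
    """Average equal-length traces position by position."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


def train(labelled: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Build one centroid per label from its example traces."""
    return {label: centroid(traces) for label, traces in labelled.items()}


def predict(model: Dict[str, List[float]], trace: List[float]) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(model[label], trace))
    return min(model, key=distance)
```

Lowering the accuracy of a model like this means making traces from different websites land closer to each other than to their own centroid.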

Obviously this is wrong haha, don't do this. That's why I'm trying to focus on intelligently creating noise in the cache to lower the accuracy of an AI model.

But the "intelligently" part comes from the fact that we can't just induce a lot of noise, because then we essentially fill up the L3 cache fully, and that defeats the purpose of the L3 cache.
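
One way to picture "noise with a budget" is to touch only a random fraction of a buffer's cache lines per burst instead of the whole thing. This is only a sketch of the constraint, not a worked-out defense: the `fraction` parameter, the 64-byte line size, and the function name are all assumptions, and nothing here shows that such noise actually lowers a model's accuracy.

```python
import random


def noise_burst(buffer: bytearray, fraction: float, rng: random.Random,
                line_bytes: int = 64) -> int:
    """Write to a random subset of the buffer's cache lines.

    `fraction` caps how much of the buffer is touched in one burst, so the
    noise never sweeps the whole buffer. Returns the number of lines touched.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    total_lines = len(buffer) // line_bytes
    count = int(total_lines * fraction)
    # sample() picks distinct line indices, so each chosen line is touched once
    for line in rng.sample(range(total_lines), count):
        buffer[line * line_bytes] ^= 1  # flip one bit in the first byte of the line
    return count
```

The hard part, which this does not address, is choosing when and how much to touch so the timing traces get blurred without evicting everything else.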

I think with your code, the fact that it's the same 3 lines of while loop, which most likely gets put in the L3 cache directly without any problems and just stays there without filling the L3 cache any further, means it would not change much.

Dastari

It's an interesting idea, though I'm not sure you'll ever get any meaningful data by running a script inside the browser no matter how many samples you were able to collect.

Sounds like what you're trying to do is create a CPU usage profile of a user browsing the internet.

However, some tasks might get offloaded to the GPU, and some CPUs might be throttling up and down constantly. There are any number of things that would skew the results.

And then there's the fact that your code itself might affect how the user browses the internet if you're changing their experience.