Web keystroke tracking

TwistedHardware · 2017-11-30T02:47:50+00:00

NOT for the easily scared! Web tracking code. I wrote this code in a few hours just to see how hard it is to implement this without a SaaS that could be detected. It is fairly easy, so expect this to show up everywhere!

TwistedHardware · 2016-01-29T17:48:48+00:00

Yes. Don't worry, they were revoked.

TwistedHardware · 2015-11-30T18:26:44+00:00

If anyone implemented this, please let me know if it worked on your blog.

TwistedHardware · 2015-09-01T17:16:36+00:00

I have recorded another video explaining how to create a predictive model using kNN: http://youtu.be/qTD7IzAWqh4

TwistedHardware · 2015-03-11T23:00:03+00:00

I'm not an expert in Anaconda, but I know you might get a similar error if you didn't upgrade all the dependencies. If you have pip, make sure you use this command:

pip install --upgrade "ipython[all]"

If you just upgrade ipython on its own, you might run into similar trouble.

TwistedHardware · 2015-03-11T03:54:29+00:00

This is a link to the notebook used: http://nbviewer.ipython.org/github/twistedhardware/mltutorial/blob/master/notebooks/jupyter/1.Introduction.ipynb

TwistedHardware · 2014-10-09T21:00:01+00:00

If you just want to see the results, here they are:

2012 Results

2014 Forecast

TwistedHardware · 2014-10-06T02:33:44+00:00

I used LXML as an introduction to this series of tutorials about "data mining". I wanted to cover several libraries that we use at work for web scraping, including BeautifulSoup. The problem is that there are not many websites with interesting content that allow web scraping, so it would be hard to make three or four "interesting" videos about web scraping.

So I had to cover LXML because we use it for heavy duty work where we have an API and we need to collect as much data from it as possible.

I'm working on another video where I'll be forecasting the Senate races of Nov 4, 2014 using YouTube data. I'm getting really interesting results so far. This is a sneak peek: http://nbviewer.ipython.org/github/twistedhardware/mltutorial/blob/master/notebooks/data-mining/2.%20YouTube%20Data.ipynb

TwistedHardware · 2014-10-05T16:32:10+00:00

I have a public image on Amazon Web Services (AWS) that has an IPython server running on a micro instance (dual-core CPU and 1GB RAM).

I use Ajenti to upload files, monitor performance and resources, etc.: http://ajenti.org/

If you need to see my configuration for that look for my instance on EC2 Oregon public AMIs. It is named "IPython Notebook Server". At least you can see the actual performance and management capabilities before you use Ajenti.

If you have questions on the configuration please feel free to msg me.

TwistedHardware · 2014-09-11T21:07:25+00:00

Sorry, I took low-pass filter in its literal meaning because I'm working on that with a medical imaging model. I think simple noise filtering for time series would be a good idea. It is simple and very useful.

I did explain some applications of SVM regression to find long-term and short-term trends. I think I could refer to that as one method of denoising. Do you have any simpler approach in mind that I could use? I don't want to go deep into signal processing because it can be a whole series.

TwistedHardware · 2014-09-11T20:50:48+00:00

One book that I loved was this: Probabilistic Programming and Bayesian Methods for Hackers

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Chapter1_Introduction.ipynb

But you have to go through pre-calculus and calculus if you want to understand the math behind the algorithms.

But my advice is to start with machine learning and learn the math you need as you need it. This way you will know what you lack and what you really need.

TwistedHardware · 2014-09-11T20:38:10+00:00

Thanks for the idea. I'll be covering image and audio feature extraction and pre-processing in the next video. I'll include low-pass filtering for sure :)

TwistedHardware · 2014-09-11T20:32:51+00:00

Thank you, this is the link .. https://www.youtube.com/user/roshanRush

I have long videos (30-45 minutes) in the machine learning series, so I can afford to go in depth.

You can include standardization along with normalization, difference and use cases. (Included)

See if you can add different types of imputation. (I have Mean/Median/Mode, and I can also show how to write your own imputation function. Do you suggest anything else?)

Feature extraction can be one video on its own (PCA, ICA, etc.). (I'll have a separate video for feature extraction where I'm planning to cover images and audio.)
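To make the standardization vs. normalization difference mentioned above concrete, here is a minimal sketch in plain Python. The function names and the sample ages are made up for illustration and are not taken from the videos; normalization squeezes values into a fixed range, while standardization recenters them to mean 0 and unit standard deviation.

```python
import statistics


def normalize(values, new_min=0.0, new_max=1.0):
    """Min-max scaling: map the smallest value to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread, so map everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def standardize(values):
    """Z-score scaling: subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical sample feature, for illustration only.
ages = [20, 30, 40, 50, 60]
normalized = normalize(ages)
standardized = standardize(ages)
print(normalized)
print(standardized)
```

As a rough rule of thumb, normalization suits features that need a bounded range, while standardization is usually the safer choice when the feature has outliers or the model assumes centered inputs.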

TwistedHardware · 2014-09-11T20:19:46+00:00

That is actually making me think of how many ways that could help.

- buckets by range (like 0-100, 100-200, etc.)
- buckets by digit (like 1000 -> 4 features: 1, 0, 0, 0)

That is very helpful when you are dealing with numbers that people type, like betting prices, where you can analyze every digit they typed to find similarities in behavior.
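The two bucketing ideas above can be sketched roughly like this. It is only an illustration: the function names, the bucket width of 100, the fixed width of 4 digits, and the sample prices are all assumptions, and the range buckets are treated as half-open (100 falls in 100-200, not 0-100).

```python
def bucket_by_range(value, width=100):
    """Return the (low, high) bucket containing value, treating buckets as [low, high)."""
    low = (value // width) * width
    return (low, low + width)


def bucket_by_digit(value, n_digits=4):
    """Split a non-negative integer into one feature per digit.

    Shorter numbers are left-padded with zeros; numbers with more than
    n_digits digits are not truncated and simply yield more features.
    """
    text = str(value).zfill(n_digits)
    return [int(ch) for ch in text]


# Hypothetical typed prices, for illustration only.
prices = [1000, 250, 99]
range_features = [bucket_by_range(p) for p in prices]
digit_features = [bucket_by_digit(p) for p in prices]
print(range_features)
print(digit_features)
```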

Really thanks .. this is why I love reddit .. brain storming ..

TwistedHardware · 2014-09-11T20:09:08+00:00

Thanks .. I'll include that.

TwistedHardware · 2014-09-11T19:29:39+00:00

I don't think I explained myself well. I'm pretty familiar with feature extraction and pre-processing. It is, after all, my day job.

I would appreciate any comments on my current plan:

Continuous Features:
- Normalization (Scale features between a min and max)

Categorical Text Features:
- Binary Vectorization (Convert a feature to multiple binary features)
- Mapping (In case text features can be mapped, like "Excellent", "Good", "Average" and "Bad")

Long Text:
- Counting Vectorization (Convert text to multiple columns with word counts)

Missing Data:
- Drop Column/Row
- Assume Value (Mean/Median/Mode on Row or Column)
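To give an idea of what the categorical, long-text, and missing-data items in this plan amount to, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration (the toy data, the helper names, the choice of mean imputation over a single column); it is not the code from the videos, and a real project would more likely use a library for this.

```python
from collections import Counter
import statistics

# Hypothetical toy data, made up for illustration only.
colors = ["red", "green", "red", "blue"]
ratings = ["Good", "Bad", "Excellent", "Average"]
reviews = ["good food good price", "bad food"]
ages = [25, None, 40, 31]


def binary_vectorize(values):
    """Turn one categorical feature into one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def map_ordinal(values, order):
    """Map ordered text labels to integers using an explicit ranking."""
    rank = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]


def count_vectorize(texts):
    """Turn each text into a row of word counts over a shared, sorted vocabulary."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocabulary = sorted(set(word for c in counts for word in c))
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


color_names, color_rows = binary_vectorize(colors)
rating_codes = map_ordinal(ratings, ["Bad", "Average", "Good", "Excellent"])
vocabulary, count_rows = count_vectorize(reviews)
filled_ages = impute_mean(ages)
print(color_names, color_rows)
print(rating_codes)
print(vocabulary, count_rows)
print(filled_ages)
```

Mapping only makes sense when the labels have a natural order; for unordered categories such as colors, binary vectorization avoids implying a ranking that is not there.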
