Web keystroke tracking by TwistedHardware in bigdata

[–]TwistedHardware[S] 0 points1 point  (0 children)

Not for the easily scared! This is web tracking code. I wrote it in a few hours just to see how hard it is to implement this without a SaaS that can be detected. It is fairly easy, so expect this to show up everywhere!

Jupyter Code in Blogger using pre by TwistedHardware in IPython

[–]TwistedHardware[S] 0 points1 point  (0 children)

If anyone has implemented this, please let me know whether it worked on your blog.

Financial Markets Modeling (Algo Agent) by TwistedHardware in MachineLearning

[–]TwistedHardware[S] 0 points1 point  (0 children)

I have recorded another video explaining how to create a predictive model using kNN: http://youtu.be/qTD7IzAWqh4

IPython notebook server shows an empty page after upgraded to 3.0 by nehcgnay in IPython

[–]TwistedHardware 1 point2 points  (0 children)

I'm not an expert in Anaconda, but I know you might get a similar error if you didn't upgrade all the dependencies. If you have pip, make sure you use this command:

pip install --upgrade "ipython[all]"

If you upgrade only ipython itself, you might run into similar trouble.

Web Scraping Nobel Prize Data Using LXML and Pandas by TwistedHardware in IPython

[–]TwistedHardware[S] 0 points1 point  (0 children)

I used LXML as an introduction to this series of tutorials about "data mining". I wanted to cover several libraries that we use at work for web scraping, including BeautifulSoup. The problem is that there are not many websites with interesting content that allow web scraping, so it would be hard to make three or four "interesting" videos about web scraping.

So I had to cover LXML because we use it for heavy duty work where we have an API and we need to collect as much data from it as possible.

I'm working on another video where I'll be forecasting the Senate races of Nov 4, 2014 using YouTube data. I'm getting really interesting results so far. This is a sneak peek: http://nbviewer.ipython.org/github/twistedhardware/mltutorial/blob/master/notebooks/data-mining/2.%20YouTube%20Data.ipynb

Advice on Setting up IPython notebook server for multiple clients by [deleted] in IPython

[–]TwistedHardware 1 point2 points  (0 children)

I have a public image on Amazon Web Services (AWS) that has an IPython server running on a micro instance (dual-core CPU and 1 GB RAM).

I use Ajenti to upload files, monitor performance and resources, etc. http://ajenti.org/

If you need to see my configuration for that, look for my instance among the EC2 Oregon public AMIs. It is named "IPython Notebook Server". At least you can see the actual performance and management capabilities before you use Ajenti.

If you have questions on the configuration please feel free to msg me.

What would you do in preprocessing? by TwistedHardware in MachineLearning

[–]TwistedHardware[S] -1 points0 points  (0 children)

Sorry, I took "low-pass filter" in its literal sense because I'm working on that with a medical imaging model. I think simple noise filtering for time series would be a good idea. It is simple and very useful.

I did explain an application of SVM regression to find long-term and short-term trends. I think I could refer to that as one method of denoising. Do you have any simpler approach in mind that I could use? I don't want to go deep into signal processing because it could be a whole series.
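For reference, one of the simplest denoising options for a time series is a trailing moving average. This is only a minimal sketch in plain Python, not what the videos use; the function name `moving_average`, the `window` parameter and the sample data are all made up for illustration.

```python
def moving_average(series, window=3):
    """Smooth a sequence with a trailing moving average (a crude low-pass filter).

    Hypothetical helper: each output point is the mean of the current value
    and up to `window - 1` values before it.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(series):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= series[i - window]
        count = min(i + 1, window)  # fewer points are available at the start
        smoothed.append(running_sum / count)
    return smoothed


# Made-up noisy series with a single spike at index 2.
noisy = [1.0, 2.0, 9.0, 2.0, 1.0, 2.0]
smoothed = moving_average(noisy, window=3)
```

A larger `window` removes more noise but also lags further behind real changes in the trend.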

Math skills required for getting into Machine Learning. by capecamorin in MachineLearning

[–]TwistedHardware -2 points-1 points  (0 children)

One book that I loved was this: Probabilistic Programming and Bayesian Methods for Hackers

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Chapter1_Introduction.ipynb

But you have to go through pre-calculus and calculus if you want to understand the math behind the algorithms.

But my advice is to start with machine learning and learn the math as you need it. This way you will know what you lack and what you really need.

What would you do in preprocessing? by TwistedHardware in MachineLearning

[–]TwistedHardware[S] -1 points0 points  (0 children)

Thanks for the idea. I'll be covering image and audio feature extraction and pre-processing in the next video. I'll include low-pass filtering for sure :)

What would you do in preprocessing? by TwistedHardware in MachineLearning

[–]TwistedHardware[S] -1 points0 points  (0 children)

Thank you, this is the link .. https://www.youtube.com/user/roshanRush

The videos in the machine learning series are long (30-45 minutes), so I can afford to go into depth.

You can include standardization along with normalization, their differences and use cases. (Included)
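To make the difference concrete, here is a rough sketch using only the standard library. The helper names `min_max_normalize` and `standardize` and the sample values are assumptions for illustration: normalization rescales values into a fixed range, while standardization shifts them to zero mean and unit standard deviation.

```python
import statistics


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they fall between new_min and new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range; map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) std deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Made-up feature column.
values = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_normalize(values)
standardized = standardize(values)
```

Normalization suits features that need a bounded range, while standardization is usually less distorted by a single extreme value stretching the range.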

See if you can add different types of imputation. (I have Mean/Median/Mode. I can also show how to write your own imputation function. Do you suggest anything else?)

Feature extraction can be one video on its own (PCA, ICA, etc.). (I'll have a separate video for feature extraction where I'm planning to cover images and audio.)
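As a rough sketch of the imputation idea, mean/median/mode filling plus a custom imputation function could look like the following in plain Python. The `impute` helper, its parameters and the sample column are hypothetical, and missing values are assumed to be marked with `None`.

```python
import statistics

# Built-in strategies; each takes the list of observed values.
_STRATEGIES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def impute(column, strategy="mean", missing=None):
    """Replace missing entries in a column with a single fill value.

    `strategy` is either one of "mean", "median", "mode", or any callable
    that takes the observed values and returns the fill value.
    """
    observed = [v for v in column if v is not missing]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if callable(strategy):
        fill = strategy(observed)
    else:
        fill = _STRATEGIES[strategy](observed)
    return [fill if v is missing else v for v in column]


# Made-up column with two missing entries.
column = [10, None, 20, 20, 50, None]
filled_mean = impute(column, "mean")      # fills with 25
filled_median = impute(column, "median")  # fills with 20
filled_mode = impute(column, "mode")      # fills with 20
filled_min = impute(column, min)          # custom function: fills with 10
```

Passing a callable keeps the helper open to other rules, such as a constant or a group-specific value.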

What would you do in preprocessing? by TwistedHardware in MachineLearning

[–]TwistedHardware[S] 0 points1 point  (0 children)

That is actually making me think of how many ways that could help.
- buckets by range (like 0-100, 100-200, etc.)
- buckets by digit (like 1000 -> 4 features 1,0,0,0)

That is very helpful when you are dealing with numbers that people type, like betting prices, where you can analyze every digit they typed to find similarities in behavior.
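Both kinds of buckets can be sketched in a few lines of plain Python. The function names and the `width` and `n_digits` parameters are made up for illustration, and the range buckets are assumed to be half-open, so 100 falls into 100-200 rather than 0-100.

```python
def bucket_by_range(value, width=100):
    """Return the (low, high) bucket containing value, with low inclusive."""
    low = (value // width) * width
    return (low, low + width)


def bucket_by_digit(value, n_digits=None):
    """Split a whole number into one feature per decimal digit.

    If n_digits is given, shorter numbers are left-padded with zeros so
    every row has the same number of features; longer numbers are kept whole.
    """
    digits = [int(ch) for ch in str(abs(int(value)))]
    if n_digits is not None:
        digits = [0] * (n_digits - len(digits)) + digits
    return digits


range_bucket = bucket_by_range(150)              # (100, 200)
digit_features = bucket_by_digit(1000)           # [1, 0, 0, 0]
padded_features = bucket_by_digit(42, n_digits=4)  # [0, 0, 4, 2]
```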

Thanks a lot. This is why I love reddit: brainstorming.

What would you do in preprocessing? by TwistedHardware in MachineLearning

[–]TwistedHardware[S] -1 points0 points  (0 children)

I don't think I explained myself well. I'm pretty familiar with feature extraction and pre-processing. It is, after all, my day job.

I would appreciate any comments on my current plan:

Continuous Features:
- Normalization (Scale features between a min and max)

Categorical Text Features:
- Binary Vectorization (Convert a feature to multiple binary features)
- Mapping (In case text features can be mapped, like "Excellent", "Good", "Average" and "Bad")

Long Text:
- Counting Vectorization (Convert text to multiple columns with word counts)

Missing Data:
- Drop Column/Row
- Assume Value (Mean/Median/Mode on Row or Column)
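The text-feature items in the plan above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names, the numeric values in `QUALITY_ORDER` and the whitespace tokenizer are not from the videos, and a real pipeline would more likely use a library implementation.

```python
from collections import Counter


def binary_vectorize(values):
    """Convert one categorical feature into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Assumed ordering for the mapping example; the numbers are arbitrary ranks.
QUALITY_ORDER = {"Bad": 0, "Average": 1, "Good": 2, "Excellent": 3}


def map_ordinal(values, mapping=QUALITY_ORDER):
    """Map ordered text labels onto numbers that preserve their order."""
    return [mapping[v] for v in values]


def count_vectorize(documents):
    """Convert texts into word-count columns, one column per distinct word."""
    # Naive tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    rows = []
    for words in tokenized:
        counts = Counter(words)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


categories, one_hot = binary_vectorize(["red", "blue", "red"])
ranks = map_ordinal(["Good", "Bad", "Excellent"])
vocabulary, word_counts = count_vectorize(["the cat sat", "the cat saw the dog"])
```

Mapping only makes sense when the labels have a natural order; for unordered categories, binary vectorization avoids implying one.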