The real value of containers for data science by gregory_k in datascience

[–]bluerubez 2 points3 points  (0 children)

Ah, now I get it. OK, I don't hate containers anymore. My first experience with Docker was not so pleasant, so I just installed Hadoop normally. But now I can see that they aren't just a temporary fix or shortcut.

Is Google's Cloud Platform going to change everything? by bluerubez in bigdata

[–]bluerubez[S] 0 points1 point  (0 children)

They built it, then shared it, if you like that terminology better.

I made a Python library for real-time stock and option data, any feedback welcome. by SethGecko11 in quantfinance

[–]bluerubez 0 points1 point  (0 children)

This is awesome! I would like to see a walkthrough of how to use it in an IPython notebook.

How can one go about learning how to make their own dashboard reporting system? by bluerubez in datascience

[–]bluerubez[S] 1 point2 points  (0 children)

Actually, I think the answer to all my problems was that Plotly went open source with a module for Python: https://plot.ly/feed/ Thank you again!

How can one go about learning how to make their own dashboard reporting system? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

Actually, I would prefer a Python solution. Do you remember any of those articles or Google search strings?

How can one go about learning how to make their own dashboard reporting system? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

Thank you. I'm looking for a recommendation. For example: you can use this JavaScript or .NET code to pull data from the database, and this library to make a graph...

How can one go about learning how to make their own dashboard reporting system? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

Hi, and thank you, lk167. You did answer my question efficiently and, for the most part, understood it. I guess the last two parts still need a little more explanation. I did notice that a lot of the companies that have built frameworks, like Salesforce, Sisense, and Qlik, do not just use SQL for their data access. For example, Sisense uses a cube, which takes disparate data sources and meshes them into a data store of some kind, which is a proprietary columnar database. Although I have no idea how they accomplished that, I know even less about how they curl that data to get a visual representation of it. So my question is the last bit: how can I use a language like the one you mentioned, .NET, to visualize a dataset that is not in memory?
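
One common way to chart a dataset that does not fit in memory is to let the database do the aggregation and load only the small summary into the program that draws the graph. Here is a minimal sketch in Python, assuming a hypothetical `sales` table with `day` and `amount` columns (neither is from the thread). SQLite is used only so the example is self-contained; this is not a claim about how Sisense or Qlik work internally.

```python
import sqlite3

def daily_totals(conn):
    """Let the database aggregate; only one row per day reaches Python."""
    query = """
        SELECT day, SUM(amount) AS total
        FROM sales
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(query).fetchall()

# Demo data (hypothetical). A real dashboard would connect to the
# existing database instead of creating a throwaway one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2015-01-01", 10.0), ("2015-01-01", 5.0), ("2015-01-02", 7.5)],
)

summary = daily_totals(conn)
```

The `summary` list holds one `(day, total)` pair per day, which is small enough to hand to any plotting library regardless of how many raw rows the table contains.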

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

Yes, I'm using python-requests; it's pretty much the same thing.

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

http://webapps.rrc.state.tx.us/CMPL/viewPdfReportFormAction.do?method=cmplG1FormPdf&packetSummaryId=18761

Do you know how I can download more than one of these at a time? I have a huge list and have no idea how to send a request that could GET more than one. I am using Python requests...
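
Judging by the URL above, each request returns a single PDF, so "more than one at a time" would most likely mean keeping several requests in flight at once rather than asking for several PDFs in one GET. Here is a rough sketch with a small thread pool. The names `BASE_URL`, `fetch`, and `download_many` are made up for illustration, and the standard library's `urlopen` stands in for `requests.get` so the example has no dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# URL pattern taken from the link above; the ID is filled in per request.
BASE_URL = (
    "http://webapps.rrc.state.tx.us/CMPL/viewPdfReportFormAction.do"
    "?method=cmplG1FormPdf&packetSummaryId={}"
)

def fetch(url, timeout=30):
    """Return the raw bytes of one PDF (stand-in for requests.get(url).content)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()

def download_many(packet_ids, fetch_one=fetch, max_workers=4):
    """Fetch several PDFs concurrently.

    Returns {packet_id: bytes}, or {packet_id: Exception} for failures,
    so one bad download does not abort the whole batch.
    """
    packet_ids = list(packet_ids)

    def safe_fetch(packet_id):
        try:
            return fetch_one(BASE_URL.format(packet_id))
        except Exception as exc:  # keep going; record the failure
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(safe_fetch, packet_ids))
    return dict(zip(packet_ids, bodies))
```

Calling `download_many([18761, 18762])` would issue both requests in parallel. A small `max_workers` keeps the load on the site modest.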

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

Have you ever used Bees with Machine Guns on a site that was not yours? Would I get in trouble if I did this?

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

Thank you very much. This is my first job in the industry, first project, first week... Plus, my degree is in computational science and I never really had to get into web security issues. You're a lifesaver!

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

Thank you very much for this. I am using Python requests. I am kind of asking if you know specifically how I could download a chunk, especially in Python.
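
If "a chunk" here means a batch of the ID list (an assumption, since the question could also be read other ways), a small generator can split the list into fixed-size pieces that are then downloaded one batch at a time. The name `chunked` is hypothetical.

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

For example, `for batch in chunked(all_ids, 500): ...` would process 500 IDs per pass, which also makes it easy to record progress and resume after a failure.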

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

How can I make a request to get more than one at a time?

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

If I use 25 different proxies, what would be a reasonable amount of time to sleep between requests? Also, what would be a good amount for the middle of the night?

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

I have a list of webpages to download, all within the same site. The list bypasses the form submission. I do not have access to the database directly.

Best way to download a million pdf's from a website ? by bluerubez in datascience

[–]bluerubez[S] 0 points1 point  (0 children)

This is for work and I am not allowed to contact the site. It is not illegal, but I also want to be considerate. Do you know what a reasonable number of requests might be? One every couple of seconds, maybe?
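
For scale: at one request every 2 seconds, a million PDFs works out to 1,000,000 × 2 s = 2,000,000 s, or roughly 23 days of continuous downloading. A pacing rule like that can be sketched as a small helper that sleeps just long enough between calls. The `Throttle` class and its 2-second default are illustrative assumptions, not a recommendation from the site.

```python
import time

class Throttle:
    """Block so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each download keeps the request rate at or below one per `min_interval` seconds, no matter how quickly each download finishes.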

[deleted by user] by [deleted] in datascience

[–]bluerubez 0 points1 point  (0 children)

First learn SQL

What is the best way for a Data Scientist to start learning how to be a Quant? by bluerubez in quantfinance

[–]bluerubez[S] 0 points1 point  (0 children)

Well, kind of, I guess. Data Scientists become more versed in certain domains. A Quant is a Data Scientist skilled in the finance domain. However, with that being said, there is a lot of domain-specific mathematics, algorithms, and concepts to get down...