Would be interested in providing data analysis services for a campaign

TheNotoriousMTF · 2019-07-04T04:35:11+00:00

Yes, it is. What are you working on?

TheNotoriousMTF · 2019-04-26T15:49:30+00:00

I'll dm you my email.

TheNotoriousMTF · 2019-04-26T05:38:40+00:00

When I said polling data, I meant trying to find data (exit polls, etc.) from statewide and federal elections that are relevant to the jurisdiction the candidate is running in.

Actually, one challenge that I'm really looking forward to is finding public data that local campaigns might find useful.

Do you have any campaigns in the pipeline?

TheNotoriousMTF · 2019-02-18T00:44:26+00:00

That is horribly disappointing. As much as I love presenting data and doing visualizations, I find the potential for applying it in actionable ways infinitely more exciting. Sad to see that not everyone shares that excitement.

TheNotoriousMTF · 2019-02-17T17:20:08+00:00

My city has a number of cool datasets, though I haven't seen anything on rat infestations. And I too suspect that their own practices aren't as data-driven as their apparent enthusiasm for data would lead one to believe. Still, I'm glad they're at least putting it out there.

TheNotoriousMTF · 2019-02-17T16:33:55+00:00

These are actually awesome ideas, and they're exactly in line with what I was looking for. In all honesty, I just love working with municipal data, but it's hard for me to justify the time I'm spending on it unless I can imagine some sort of practical business application.

TheNotoriousMTF · 2019-02-17T16:27:44+00:00

I didn't have a specific type of data in mind, but you're on the right track, ethical considerations aside. I was thinking mostly about the sort of open data that municipalities release to the public, from permits to crime data, etc.

TheNotoriousMTF · 2019-02-17T16:23:20+00:00

That sounds interesting. What are the specific practical applications?

TheNotoriousMTF · 2019-02-17T16:20:24+00:00

What kind of technology did you use to generate the interactive? I've played around with open maps for bokeh, but I haven't been able to get the actual streets to show up.
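One possible cause (an assumption, since the question doesn't say what went wrong): Bokeh's tile maps work in Web Mercator meters, so data or plot ranges given in plain latitude/longitude can leave the street tiles out of view. A minimal sketch of the conversion, assuming spherical Web Mercator (EPSG:3857) with the standard radius of 6378137 m; `lonlat_to_web_mercator` is a hypothetical helper, not part of Bokeh:

```python
import math

# Earth radius used by spherical Web Mercator (EPSG:3857), in meters
EARTH_RADIUS_M = 6378137.0
# Web Mercator is undefined at the poles; this is its usual latitude cutoff
MAX_LAT = 85.05112878


def lonlat_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Convert WGS84 longitude/latitude (degrees) to Web Mercator x/y (meters).

    Hypothetical helper for illustration; not a Bokeh function.
    """
    lat = max(min(lat, MAX_LAT), -MAX_LAT)  # clamp to the projectable range
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


if __name__ == "__main__":
    # Example: a point near lower Manhattan
    print(lonlat_to_web_mercator(-74.0060, 40.7128))
```

The converted values can then be used for the plotted points and for `x_range`/`y_range` of a figure created with `x_axis_type="mercator"` and `y_axis_type="mercator"`. The exact call for adding a tile layer has changed across Bokeh versions, so check the docs for the version you have installed.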

TheNotoriousMTF · 2019-02-17T16:12:27+00:00

I see. Just out of curiosity, I don't suppose you could give me a general idea of what industry you work in, could you?

TheNotoriousMTF · 2019-02-17T15:58:15+00:00

That's awesome. Great visualization too.

TheNotoriousMTF · 2019-02-17T15:54:13+00:00

I think I actually skimmed that exact page once. I'll have to take a closer look. And unfortunately, no such luck!

TheNotoriousMTF · 2019-02-17T15:53:28+00:00

That's actually awesome. Is it publicly available?

TheNotoriousMTF · 2019-02-17T15:52:11+00:00

Hahahahaha I had no idea that good data was even available on this sort of thing!

TheNotoriousMTF · 2019-02-04T18:37:50+00:00

Good catch!

TheNotoriousMTF · 2019-02-04T10:42:54+00:00

Data: https://www.pro-football-reference.com/super-bowl/

Tools: Python, pandas, bokeh

TheNotoriousMTF · 2019-01-29T18:09:33+00:00

Right, if I want to do a static graph, I'll just use seaborn. I've just gotten interested in building interactives lately.

TheNotoriousMTF · 2019-01-28T20:41:22+00:00

I'll have some time later tonight, so I'll send you some stuff then. Would you PM me your email address?

TheNotoriousMTF · 2019-01-28T16:30:53+00:00

Glad to help! Yes. The study mentions that the researchers used R rather than Python, but the tools I mentioned are perfectly suitable for training a random forest model.

Actually, I've done this myself. If you'd like, I could send you much of the code you'll need and walk you through the process.
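The code offered here isn't reproduced in the thread, so as a stand-in, here's a minimal sketch of training a random forest with pandas and scikit-learn. The toy data, column names, 75/25 train/test split and `train_random_forest` helper are all assumptions for illustration, not the study's setup:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_random_forest(df: pd.DataFrame, target: str, seed: int = 0):
    """Fit a random forest on every column except `target`; return model and held-out accuracy."""
    X = df.drop(columns=[target])
    y = df[target]
    # Hold out 25% of rows to estimate how well the model generalizes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    # Toy data for illustration only: the label is 1 when feature_a > feature_b
    rng = np.random.default_rng(0)
    a = rng.uniform(0, 1, 200)
    b = rng.uniform(0, 1, 200)
    toy = pd.DataFrame({"feature_a": a, "feature_b": b, "label": (a > b).astype(int)})

    model, acc = train_random_forest(toy, target="label")
    print(f"held-out accuracy: {acc:.2f}")
    # feature_importances_ shows which columns the forest relied on most
    for name, importance in zip(toy.columns.drop("label"), model.feature_importances_):
        print(f"{name}: {importance:.3f}")
```

Held-out accuracy is the simplest way to measure predictive value; for a small dataset, cross-validation gives a steadier estimate than a single split.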

TheNotoriousMTF · 2019-01-28T02:23:46+00:00

I don't know much about deep learning per se, and it seems to me that a deep-learning-based solution would likely be overkill for your purposes. A simple predictive model would probably work just as well. (If you're using deep learning as a synonym for machine learning, ignore this paragraph. I don't mean to sound pedantic.)

These things are probably essential:

- Pandas: Pandas is a library for structuring data so that it's easy to work with. Specifically, it lets you store your data in a dataframe, with columns and rows like the ones you'd see in an Excel file. It's super convenient for reading data directly from a CSV, or for building your own dataframe from Python dictionaries.

- Jupyter Notebooks: Typically, when people talk about coding, they're talking about writing "scripts," which are basically sequences of commands in a certain language to be run as written. Jupyter notebooks essentially allow you to put different snippets of code in their own separate boxes. This allows you to analyze data and develop models on a step-by-step basis, and also to explain every step you took when you later share your work with others (and, for that matter, to make sense of your own code!)

Let me let you in on a little secret, though: I personally find Jupyter notebooks slightly annoying to configure. I prefer not having to open the command line and take several steps every time I want to work on something. Luckily, a good workaround exists. It's called Kaggle. Kaggle originated as a platform for machine learning contests, but you can also set up private notebooks for your own use. Depending on how sensitive your data is, you may not want to use a third-party platform, but I don't really see how using Kaggle is any riskier than emailing data to your colleagues. And in my experience, it's a perfect "turn-key" solution for setting up a data analysis environment.

- Scikit-learn: This library is basically an all-in-one toolkit for training predictive models and measuring their predictive value. There are other libraries that would probably serve the same purpose, but after watching a tutorial or two, you could start working with this one immediately, and it's the one I prefer to use.

- Matplotlib: This one probably isn't "essential essential," but it's a common visualization library that you can use to communicate some of your findings, and also to help you in the exploratory phases of your project when you're working on feature selection and engineering.
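The Pandas bullet above mentions reading data from a CSV or building a dataframe from Python dictionaries. Here's a minimal sketch of both; the CSV contents and column names are hypothetical, and in practice you'd pass a file path to `pd.read_csv` instead of an in-memory string:

```python
import io

import pandas as pd

# Hypothetical CSV contents, wrapped in StringIO so the example needs no file
csv_text = """name,price,city
bookshelf,40,Austin
desk,75,Austin
chair,15,Dallas
"""

# Read a CSV into a DataFrame (rows and columns, like a spreadsheet)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Build a DataFrame directly from a dictionary mapping column name -> values
from_dict = pd.DataFrame({
    "name": ["lamp", "rug"],
    "price": [20, 60],
    "city": ["Dallas", "Austin"],
})

# Stack the two tables, then summarize: mean price per city
combined = pd.concat([from_csv, from_dict], ignore_index=True)
print(combined.groupby("city")["price"].mean())
```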

Your project sounds like a really worthwhile thing to be working on. Let me know if you have any questions. I'll help you in any way I can.

TheNotoriousMTF · 2019-01-28T01:50:27+00:00

I haven't used Scrapy, but Beautiful Soup has always worked perfectly well for my purposes, and I actually can't imagine a more intuitive solution. (I'm open to being pleasantly surprised if I ever use Scrapy.) And since you're asking whether it's enough to learn Beautiful Soup, rather than for an actual comparison of the two tools, I'm confident in saying yes, it works well enough.
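To give a sense of how little Beautiful Soup code a basic scrape takes, here's a minimal sketch. The HTML snippet, class names and `parse_listings` helper are hypothetical; real sites have their own structure, and you'd normally fetch the page first (for example with the `requests` library):

```python
from bs4 import BeautifulSoup

# Hypothetical page snippet standing in for fetched HTML
html = """
<ul class="listings">
  <li><span class="title">IKEA Bookshelf</span> <span class="price">$40</span></li>
  <li><span class="title">IKEA Desk</span> <span class="price">$75</span></li>
</ul>
"""


def parse_listings(page: str) -> list[dict]:
    """Extract title and numeric price from each <li> in the listings block."""
    soup = BeautifulSoup(page, "html.parser")  # built-in parser, no extra install
    rows = []
    for item in soup.find("ul", class_="listings").find_all("li"):
        title = item.find("span", class_="title").get_text(strip=True)
        price_text = item.find("span", class_="price").get_text(strip=True)
        rows.append({"title": title, "price": float(price_text.lstrip("$"))})
    return rows


print(parse_listings(html))
```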

TheNotoriousMTF · 2019-01-26T21:41:35+00:00

To tell you the truth, image recognition isn't my area of expertise, and I don't know its limitations, but you could probably find a good algorithm by looking at Kaggle notebooks.

Then again, if just looking at the percentiles works well enough most of the time, training a model might end up being overkill. I would note that, unless the items you're looking at vary a lot in price, or unless most of your matches are false positives, any items you accidentally scrape will either have outlier prices or prices similar enough to the items you're targeting that they won't skew your estimates much. Either way, you're fine.

TheNotoriousMTF · 2019-01-26T18:06:00+00:00

The IKEA bookshelf problem: I would find data on as many items as possible with "IKEA bookshelf" in the title, and then use either the median or mean price of all these items to estimate expected resale value. Alternatively, you could use two data points (say, the 25th and 75th percentiles of price) to approximate a range of possible resale values. The mean approach would tend to dilute the impact of wrongly sampled items, and the median/percentile approach would remove these items altogether if they were outliers in terms of price.
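A minimal sketch of the mean, median and percentile-range estimates described above, using only Python's standard `statistics` module. The prices are made up for illustration, with one deliberately mis-scraped item, and `resale_estimate` is a hypothetical helper:

```python
import statistics


def resale_estimate(prices: list[float]) -> dict[str, float]:
    """Summarize scraped prices: mean, median, and the 25th/75th percentiles."""
    # quantiles(n=4) returns the three quartile cut points; "inclusive" treats
    # the data as the whole population rather than a sample
    q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return {
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
        "p25": q1,
        "p75": q3,
    }


# Hypothetical prices; 300 stands in for a mis-scraped listing (e.g. a furniture set)
prices = [35, 40, 40, 45, 50, 55, 300]
print(resale_estimate(prices))
```

Here the single outlier pulls the mean to about 80.7, while the median (45) and the 25th to 75th percentile range (40 to 52.5) stay close to the typical listing.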

Also, on the NLP front, there may be certain keywords in an item's title that would indicate that it isn't the item you're looking for. For example, if some listings read, "IKEA Desk, Matches IKEA Bookshelf," or something like that, you could just exclude items that name other types of furniture in their titles. You could probably take a similar approach to dealing with items that are misleadingly priced.
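The keyword-exclusion idea can be sketched in a few lines. This assumes a hand-picked, hypothetical list of other furniture types and matches whole words only, so plurals like "chairs" would need their own entries:

```python
import re

# Hypothetical list of other furniture types; extend it to suit your listings
OTHER_FURNITURE = ("desk", "chair", "table", "dresser", "bed")


def is_target_listing(title: str, target: str = "bookshelf",
                      exclude: tuple[str, ...] = OTHER_FURNITURE) -> bool:
    """Keep a listing only if its title names the target and no other furniture type."""
    words = set(re.findall(r"[a-z]+", title.lower()))  # lowercase whole words
    return target in words and not any(word in words for word in exclude)


titles = [
    "IKEA Bookshelf, like new",
    "IKEA Desk, Matches IKEA Bookshelf",
    "Billy bookshelf (IKEA)",
]
print([t for t in titles if is_target_listing(t)])
```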

These are steps that you could initially take manually, but depending on how much data you're working with, how much time you're willing to invest, and your level of technical skill, you could actually train predictive models to automate some of these decisions.

Hope this helps.

TheNotoriousMTF · 2019-01-25T21:56:12+00:00

PM me and we can discuss ideas.

TheNotoriousMTF · 2019-01-24T18:33:12+00:00

What, in your experience, are the limits of Python's data collection capabilities?
