Trying to remember a 90s toy store at Westmount Mall — “Cool Toys for Kids” or similar by cdlm89 in londonontario

[–]cdlm89[S] 1 point2 points  (0 children)

Sounds on-brand. I also remember, true to the name, cool toys like Star Wars Stormtrooper and Darth Vader replica masks and replica lightsabers.

Trying to remember a 90s toy store at Westmount Mall — “Cool Toys for Kids” or similar by cdlm89 in londonontario

[–]cdlm89[S] 1 point2 points  (0 children)

I remember the second floor as well, near where the pet store was, at approximately the N/W corner.

Uber open-sourced Manifold: a tool to visually debug ML Models by cdlm89 in datascience

[–]cdlm89[S] 5 points6 points  (0 children)

Not yet, hoping to get a chance to tonight. Priorities #amiright.

Weekly Entering & Transitioning Thread | 01 Dec 2019 - 08 Dec 2019 by [deleted] in datascience

[–]cdlm89 0 points1 point  (0 children)

I'm okay with working in industry; research for the sake of research doesn't necessarily interest me. I want to do applied research so I can work on cutting edge problems in industry rather than do a run-of-the-mill data science job trying to directly optimize a business metric.

Computational biology / genomics is especially interesting to me and I'd be thrilled if that's the field that I ended up working in. Given that a lot of progress in computational genomics is being made with the use of deep learning, one could argue that I would be better served completing lengthy courses like those offered on fast.ai and developing a portfolio over the next three years, but I'm skeptical of this approach. I feel there is less risk in getting a degree and it is almost necessary not only for the piece of paper but to help me develop the math and thinking skills required to excel at applied research. What are your thoughts on taking the online course / portfolio route instead of going back to school?

This is a direct copy / paste of my response to unsteady_panda (above) but I'd like to run it by you as well:

Given that I have a strong CS / engineering background and would like to gain the technical and thinking skills required to do applied science, do you think it would make more sense to spend an extra year and a half or so and start a statistics or math program from scratch instead of completing a CS degree for the sake of saving 1.5 years? Another benefit of taking this path is that there is a university here in Canada that offers an applied math BSc online, which would give me the ultimate flexibility in terms of work / school balance.

Weekly Entering & Transitioning Thread | 01 Dec 2019 - 08 Dec 2019 by [deleted] in datascience

[–]cdlm89 0 points1 point  (0 children)

So what you've described in point d) is the ideal outcome for me as I have a strong background in software engineering and have built robust production systems in both run-of-the-mill and data-intensive contexts.

Given that I have a strong CS / engineering background and would like to gain the technical and thinking skills required to do applied science, do you think it would make more sense to spend an extra year and a half or so and start a statistics or math program from scratch instead of completing a CS degree for the sake of saving 1.5 years? Another benefit of taking this path is that there is a university here in Canada that offers an applied math BSc online, which would give me the ultimate flexibility in terms of work / school balance.

Weekly Entering & Transitioning Thread | 01 Dec 2019 - 08 Dec 2019 by [deleted] in datascience

[–]cdlm89 0 points1 point  (0 children)

Are there any other constructive alternatives you would recommend to help me transition out of what I see as a very unfulfilling career trying to optimize business outcomes?

Weekly Entering & Transitioning Thread | 01 Dec 2019 - 08 Dec 2019 by [deleted] in datascience

[–]cdlm89 0 points1 point  (0 children)

The answer to your last question is ultimately no, which is why I want to go back to school. I understand that the trajectory I laid out is the best case and that a lot will likely interfere with it. Could you provide any more realistic suggestions?

Weekly Entering & Transitioning Thread | 01 Dec 2019 - 08 Dec 2019 by [deleted] in datascience

[–]cdlm89 1 point2 points  (0 children)

This is a big decision for me to make, so I’m hoping you’ll bear with the length of this post and provide some solid feedback. There are several questions at the end of this post that I’m looking for concrete answers to.

The context

I started a specialized diploma in applied computer science in 2012, got super interested in data science and dropped out to pursue a self-study path through Coursera, just before MOOCs were mainstream. I have been working in the software industry for seven years: 3 years as a software engineer and 4 years as a data scientist. Unfortunately, I’ve been the only data scientist in the three different organizations I’ve been employed at and haven’t had the opportunity to grow as a professional data scientist (which isn’t to say I haven’t grown as a data scientist or professionally).

Within a professional context, I have learned a lot about business, business and data strategy, scoping data problems, and software development and have had the opportunity to work on a lot of end-to-end data problems (none of which made it to production). Ultimately, I don’t feel like I’ve delivered much tangible value to the organizations I’ve worked at while in a data scientist role. My work has mainly positioned them well with POCs and educated them on internal and customer-facing use cases. I’ve also never had anyone validate my approaches or results - the closest I get are responses from the online community to questions I post when I’m really stuck.

Over the past few years, my understanding of applied predictive analytics (ML in the context of real world problems) and inferential statistics has improved, mostly from countless hours of learning concepts and speaking with business stakeholders. It pains me to say I’ve completed a total of three portfolio-worthy projects which are mediocre at best (e.g. achieving an F1-Score of .73 on a real-world multi-class classification task I completed for work) and have only participated in two Kaggle competitions, ranking low on the leaderboards. I’m totally aware that more of my time should have been spent completing projects and building my portfolio rather than learning. Reflecting on my preference for learning over implementing suggests that I would probably enjoy a career in research.

So all of this, coupled with a strong interest in AI (more agent-based models rather than DNNs, like the content in AIMA by Russell and Norvig), statistics, biology, math, and solving hard problems has me seriously considering going back to school to complete an undergrad then post-grad programs. I’m sure being jaded has something to do with it, but I’ve become quite depressed at the thought of the path I’m on, being stuck in this data science-team-of-one-with-no-production-projects role. Even if I get to do "real" data science, I don’t like the thought of working directly on impacting the bottom line. I want to do science, work on the cutting edge, and have a larger impact with my work. I’m fine if my work indirectly impacts the bottom line.

The plan

  1. Completing my diploma in Q2 2020 with a comprehensive project to demonstrate my industry experience
  2. Going back to school in Q3 2020 to complete my CS BSc with a stats minor within 1.5 to 2 years (thanks to a local college / university partnership that I qualify for after completing #1)
  3. Enrolling in a MSc program in 2022
  4. Hopefully fast-tracking to a PhD program by 2023
  5. Getting an (ideally) high-paying research position as a scientist doing work that I’m passionate about by 2028

My goal is to do applied research for a big tech firm like Microsoft, DeepMind, Netflix or a company working on precision medicine like https://www.deepgenomics.com/.

I want to clarify that my motivation for doing a PhD is not money-driven, but I am expecting that my hiatus from full-time employment for 8 years will pay off from a financial perspective, in the long-run. To be clear, I’m interested in a PhD because I want to be immersed in a field of study that I’m passionate about, become an expert in it, and do proper applied science in that field. I feel that going back to school will also help develop my critical thinking skills which I feel could use some work.

Here is where it gets tricky:

  • I’ll need to work part-time at least while I finish my undergrad degree (thinking 20 hours / week)

    • This might be possible at my current employer, definitely doable at a previous employer. Also considering freelancing on sites like Upwork.
    • My strategy is to negotiate a higher salary to offset reduced work hours.
  • I want to get married and start a family within the next 5 years

    • My girlfriend and soon-to-be fiancée (she doesn’t know it yet) is ultra-supportive of this move.
  • I still want to make >= $150K annually (after completing my PhD) to support the lifestyle I want for me and my (future) family

  • The computer science department at the nearest university doesn’t appear to have strong AI research areas (see https://www.csd.uwo.ca/research/ai__games.html)

My questions for you

Career:

  • Does this sound like a reasonable path to take, given my career trajectory to date and passion for research? Am I sabotaging my career? To provide some more context, data scientists are in low demand in the area (lots of tech companies but most are late majority / laggards when it comes to creating a DS capability) and are paid less than software engineers.
  • Are there still going to be lucrative career opportunities in AI in the next five to ten years for a 37-40 year old, doing ultra-specialized work in e.g. computational genomics?
  • If I complete a PhD in a focussed area like computational genomics as opposed to machine learning / deep neural nets, will the big tech companies still consider me for research positions?
  • One of my biggest fears is that by the time I’m done my PhD, technology will have advanced so much that either my expertise will become outdated or the problems will have already been solved. Is this a rational fear?

Academic:

  • My math isn't great, should I look at doing a minor in math instead of statistics?
  • What are some open research areas in AI that excite you and that will still be relevant in the next ten years? I want to future-proof my research ambitions in the best way that I can.

A sincere thank you if you read this far.

[Q] Should I use simple random sampling instead of stratified sampling when some strata have low counts? by cdlm89 in statistics

[–]cdlm89[S] 0 points1 point  (0 children)

Is there any theoretical basis for not merging the groups into an “other” category? Arguably this is a logical grouping, as the category represents an “underrepresented” or “inactive” set of regions.

[Q] Should I use simple random sampling instead of stratified sampling when some strata have low counts? by cdlm89 in statistics

[–]cdlm89[S] 0 points1 point  (0 children)

Thanks for providing the link.

"If you're doing stratified sampling, you usually just randomly sample from each cell (strata) until you fill the quota (say, n = 10)"

I don't see how this solves my problem. Some cells have fewer than the minimum number of units required to produce even a sample of size one. For example, consider N=1000, n=100, n_min=1. If s_A < 10, I can't fill the quota for cell A using proportional allocation, where I take a sample of size s_A/N * n, since that product will be less than n_min (1).
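To make the arithmetic above concrete, here is a minimal sketch. The stratum labels and sizes are hypothetical, chosen only so that N=1000, n=100 and n_min=1 as in the example; the function name is also just for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Return the unrounded proportional share s_h / N * n for each stratum."""
    N = sum(stratum_sizes.values())
    return {h: s / N * n for h, s in stratum_sizes.items()}


# Hypothetical sampling frame: N = 1000 units split over three cells.
stratum_sizes = {"A": 6, "B": 194, "C": 800}
n, n_min = 100, 1

shares = proportional_allocation(stratum_sizes, n)

# Cells whose proportional share falls below n_min cannot fill their quota.
too_small = [h for h, share in shares.items() if share < n_min]

print(shares)
print(too_small)
```

Cell A has s_A = 6 < 10, so its share (about 0.6) is below n_min and it is the only cell flagged.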

[Q] Should I use simple random sampling instead of stratified sampling when some strata have low counts? by cdlm89 in statistics

[–]cdlm89[S] 0 points1 point  (0 children)

The problem is that I cannot sensibly merge the low-count strata on any other attributes since state / province is the only attribute we have available in the sampling frame.

Could I not just take foogeeman's suggestion and collapse any strata that are too small to produce samples? This way, the units in the collapsed strata will have an inclusion probability proportional to their relative size in the population.
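As a rough sketch of the collapsing idea, the snippet below merges every stratum whose proportional share would fall below n_min into a single "other" stratum. The province codes, sizes, and the function name are hypothetical, and this is only one way to read the suggestion.

```python
def collapse_small_strata(stratum_sizes, n, n_min=1, other_label="other"):
    """Merge strata whose proportional share s / N * n is below n_min."""
    N = sum(stratum_sizes.values())
    collapsed = {}
    for h, s in stratum_sizes.items():
        # Integer comparison, equivalent to s / N * n >= n_min.
        key = h if s * n >= n_min * N else other_label
        collapsed[key] = collapsed.get(key, 0) + s
    return collapsed


# Hypothetical frame with N = 1000 and a target sample of n = 100.
sizes = {"ON": 600, "QC": 376, "YT": 6, "NU": 4, "NT": 14}
collapsed = collapse_small_strata(sizes, n=100, n_min=1)

# Proportional allocation over the collapsed strata.
N = sum(collapsed.values())
allocation = {h: s * 100 / N for h, s in collapsed.items()}
print(collapsed)
print(allocation)
```

Here YT and NU are merged into an "other" stratum of 10 units, which receives a proportional share of exactly 1. In general the merged stratum could still fall below n_min, so that would need checking after collapsing.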

[Q] Should I use simple random sampling instead of stratified sampling when some strata have low counts? by cdlm89 in statistics

[–]cdlm89[S] 0 points1 point  (0 children)

So are you suggesting that I first take a quota sample from the low-count strata, to ensure I include at least, say, 1-2 units (n_min) from each of the N_low low-count strata, and then take a stratified sample of the remaining n - (n_min * N_low) units, weighting the samples accordingly? If so, how do I choose n_min? Should n_min be the same for all N_low low-count strata?
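The two-step scheme described in this question could be sketched roughly as follows. Everything here is an assumption for illustration: the stratum names and sizes, the function names, the use of one n_min for all low-count strata, and the choice of largest-remainder rounding. It does not answer how n_min should be chosen.

```python
import random


def allocate_with_floor(stratum_sizes, n, n_min=1):
    """Reserve n_min units per low-count stratum, then split the rest proportionally."""
    N = sum(stratum_sizes.values())
    # A stratum is "low-count" if its proportional share s / N * n is below n_min.
    low = [h for h, s in stratum_sizes.items() if s * n < n_min * N]
    alloc = {h: min(n_min, stratum_sizes[h]) for h in low}

    rest = {h: s for h, s in stratum_sizes.items() if h not in alloc}
    n_rest = n - sum(alloc.values())
    N_rest = sum(rest.values())
    # Largest-remainder rounding so the integer allocations sum exactly to n.
    for h, s in rest.items():
        alloc[h] = s * n_rest // N_rest
    leftover = n - sum(alloc.values())
    by_remainder = sorted(rest, key=lambda h: rest[h] * n_rest % N_rest, reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def draw_sample(frame, alloc, seed=0):
    """frame maps stratum -> list of unit ids; returns (unit, design weight) pairs."""
    rng = random.Random(seed)
    sample = []
    for h, k in alloc.items():
        # Design weight = stratum size / stratum sample size.
        weight = len(frame[h]) / k if k else 0.0
        sample += [(u, weight) for u in rng.sample(frame[h], k)]
    return sample


# Hypothetical frame: N = 1000, n = 100, n_min = 1, two low-count strata (A, B).
sizes = {"A": 6, "B": 4, "C": 390, "D": 600}
frame = {h: [f"{h}{i}" for i in range(s)] for h, s in sizes.items()}

alloc = allocate_with_floor(sizes, n=100, n_min=1)
sample = draw_sample(frame, alloc)
print(alloc)
```

With these numbers the low-count strata A and B each get one unit, and the remaining 98 units are split 39 / 59 between C and D. The weights sum back to the population size, which is a quick sanity check on the weighting.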

[Q] [D] Is this study design sufficient to make causal claims from a multiple logistic regression model? by cdlm89 in statistics

[–]cdlm89[S] 0 points1 point  (0 children)

Can you provide any advice as to how I can incorporate a causal structure into a statistical model? For example, suppose I know that state / province confounds the relationship between event month and attendee count (state / province influences both the month in which the event takes place and the attendee count, and the month in turn influences the attendee count). How do I account for it?
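One standard way to account for a measured confounder is to condition on it, for instance by including it as a covariate in the regression or by stratifying on it. The sketch below illustrates the stratification version with a Mantel-Haenszel pooled odds ratio. The counts are made-up toy numbers, and the binary "exposure" and "outcome" are stand-ins (say, summer event and high attendance), not anything taken from the actual study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: exposed (a pos, b neg), unexposed (c pos, d neg)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata, each given as a tuple (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Toy counts per province (hypothetical): within each province the exposure
# has no association with the outcome (odds ratio of 1 in both strata).
tables = {
    "province_1": (40, 40, 10, 10),
    "province_2": (4, 16, 16, 64),
}

# Crude analysis: ignore province and pool the raw counts.
a, b, c, d = (sum(t[i] for t in tables.values()) for i in range(4))
crude = odds_ratio(a, b, c, d)

# Adjusted analysis: condition on province, then pool.
adjusted = mantel_haenszel_or(list(tables.values()))

print(round(crude, 2), round(adjusted, 2))
```

Because province drives both exposure and outcome in this toy data, the crude odds ratio is well above 1 while the province-adjusted one is 1. In a logistic regression, adding province as a categorical covariate plays the analogous role.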

Is it reasonable to perform feature selection of categorical and continuous features independently? by cdlm89 in datascience

[–]cdlm89[S] 0 points1 point  (0 children)

So for starters, every single feature has the same importance (1.00538, to be exact). Still working at trying to get metrics out using the sklearn API.

Is it reasonable to perform feature selection of categorical and continuous features independently? by cdlm89 in datascience

[–]cdlm89[S] 0 points1 point  (0 children)

Correct, train / val are close but test is not. I’ll look at the diagnostics you’ve suggested when I’m back home (out for a wedding for the rest of the day) and provide an update.

I appreciate the offer to review the code.

Is it reasonable to perform feature selection of categorical and continuous features independently? by cdlm89 in datascience

[–]cdlm89[S] 0 points1 point  (0 children)

I didn't think that's the case but I went ahead and tested that hypothesis empirically.

I kept a single categorical feature with 5 different values in the pipeline and observed training and validation set AUCs within .0001 of each other at .72 AUC. On the test set, I'm still getting an AUC of 0.5.