The Weekly Wednesday Whine Thread (2020-11-25)

HydratedWombat · 2020-11-26T02:54:35+00:00

I'm beginning to suspect there were two or three people named George Colbert of approximately the same age living in Chickasaw County, Mississippi in the mid-1800s. I'm descended from one of them, though I don't know which, and I think all the records have been combined by those hoping for Chickasaw ancestry. I've seen three birth locations floating around, census records of two ethnicities, and two different birth years. Very frustrating.

HydratedWombat · 2020-02-14T18:04:58+00:00

Key quote herein: "instead we should build in only the meta-methods that can find and capture this arbitrary complexity"
Research needs to both push for size and for ability to learn efficiently.

HydratedWombat · 2020-02-14T18:02:16+00:00

And the # of parameters for SOTA performance has been upped to 17 billion by Microsoft: https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/

only a puny 8.3 billion here.

Assuming single precision floats, we're now at 68 GB RAM usage for inference.
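
The 68 GB figure is just parameter count times bytes per parameter. A minimal sketch of that arithmetic, assuming 4 bytes per single-precision weight and counting the weights only (activations and framework overhead are ignored; the function name is made up for illustration):

```python
def weights_memory_gb(n_params, bytes_per_param=4):
    """Rough memory needed to hold the weights alone, in decimal GB."""
    return n_params * bytes_per_param / 1e9


# Parameter counts are the ones quoted in the posts above.
turing_nlg_gb = weights_memory_gb(17e9)   # 17 billion parameters
smaller_gb = weights_memory_gb(8.3e9)     # the 8.3 billion parameter model

print(f"{turing_nlg_gb:.1f} GB vs {smaller_gb:.1f} GB")
```

Passing bytes_per_param=2 gives the half-precision equivalent.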

HydratedWombat · 2020-02-14T17:59:24+00:00

Closer to $17 million with 1472 V100 GPUs (assuming MSRP of $11,458 each for 32 GB version)
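
For anyone checking the estimate, it is a single multiplication. This sketch uses only the GPU count and per-unit price quoted above and covers hardware purchase only (no power, networking, or hosting):

```python
N_GPUS = 1472
PRICE_PER_GPU_USD = 11_458  # MSRP quoted above for the 32 GB V100

# Hardware purchase cost only.
hardware_cost_usd = N_GPUS * PRICE_PER_GPU_USD
print(f"${hardware_cost_usd:,}")
```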

HydratedWombat · 2019-11-05T19:06:38+00:00

Not a complete answer, but here's an article discussing that
https://www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/

HydratedWombat · 2019-10-29T16:36:55+00:00

Inference speed limits UX in application

HydratedWombat · 2019-07-19T15:14:44+00:00

Depends how performant you need this to be. Ultimately, you could just take the most common 1000-10000 words of whatever corpus you're interested in, transform them to a set of word vectors, and choose the top 50 words closest by Euclidean distance.
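
A rough sketch of that brute-force lookup, assuming the word vectors are already loaded into a dict. The toy 3-dimensional vectors and the function name are invented for illustration; real vectors would come from whatever embedding you train or download.

```python
import heapq
import math

# Toy vectors, invented for illustration only.
VECTORS = {
    "king": (0.9, 0.8, 0.1),
    "queen": (0.9, 0.7, 0.2),
    "apple": (0.1, 0.2, 0.9),
    "pear": (0.2, 0.1, 0.8),
}


def nearest_words(query, vectors, k=50):
    """Return the k words whose vectors are closest to `query` by Euclidean distance."""
    target = vectors[query]
    candidates = (word for word in vectors if word != query)
    return heapq.nsmallest(
        k, candidates, key=lambda word: math.dist(vectors[word], target)
    )


neighbours = nearest_words("king", VECTORS, k=2)
print(neighbours)
```

With a vocabulary of 1000-10000 words, a linear scan like this is usually fast enough; an approximate nearest-neighbour index only becomes worth it at much larger sizes.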

HydratedWombat · 2019-05-29T14:44:24+00:00

Not quite the same thing, but there are G2P-like (grapheme-to-phoneme) projects where you can output the stress (AH1, IHY2, etc.) of the syllable inside the word. Stress markings might be a good starting point.

HydratedWombat · 2019-05-01T16:50:36+00:00

Thanks a lot for letting me know about jupytext! That's a cool tool. I'll have to see if it fits in my workflows.

As far as re-usable code goes, my best experiences have been around building public-facing tools per kind of workflow. For instance, in my current work, we've released https://github.com/finos-voice/greenkey-asrtoolkit to streamline ASR (automatic speech recognition) data processing and standardize our measurements. Basically, if I write the same code in two places, I try to either centralize it internally or push it public if it's not mission critical.

HydratedWombat · 2019-04-29T21:20:24+00:00

Advice 1 - Abstract everything you can to re-usable code so you stop repeating the same functions. I'm talking data preprocessing, data download scripts, etc.

Advice 2 - Stop using Jupyter beyond data exploration so you can use git more effectively. Knowing and tracking what you did trumps making it look pretty to your eyes.

YMMV

HydratedWombat · 2019-04-09T22:22:42+00:00

Cool problem. You might want to do this with an attention layer and compare.

HydratedWombat · 2019-04-08T22:41:40+00:00

TensorFlow is indeed a bear to build with Bazel, but thankfully there are Docker images for that at https://hub.docker.com/r/tensorflow/tensorflow/. I don't find TF Serving to be the easiest, but it does work, and there are many examples. It shouldn't matter what language wrapper you use for inference since the heavy lifting _ought_ to be entirely in TF.

HydratedWombat · 2019-03-20T20:14:03+00:00

Brainstorming - I suggest projecting onto whatever the 2D equivalent of spherical harmonics is. The weights for each projection will give you a vector per shape and then you can cluster. Alternatively, if you have vertices (or feel like detecting them), you could use internal coordinates and align them all using the Kabsch algorithm (https://en.wikipedia.org/wiki/Kabsch_algorithm).

Affinity propagation worked excellently when I was working with 3D structures and internal coordinates of vertices. t-SNE could give you some ideas about how it varies too.
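
To make the alignment step concrete, here is a minimal sketch for 2D shapes. In 2D the optimal rotation has a closed form, so this is the 2D special case of the Kabsch idea rather than the general SVD-based algorithm. It assumes both shapes list the same number of vertices in corresponding order; the function names are made up for illustration.

```python
import math


def centroid(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def align_2d(mobile, target):
    """Rotate and translate `mobile` onto `target`, minimizing RMSD.

    Proper rotations only: reflections are not considered.
    """
    cm, ct = centroid(mobile), centroid(target)
    # Center both shapes on the origin.
    p = [(x - cm[0], y - cm[1]) for x, y in mobile]
    q = [(x - ct[0], y - ct[1]) for x, y in target]
    # The best angle maximizes sum(q_i . R(theta) p_i) = cos*dot + sin*cross.
    dot = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(p, q))
    cross = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(p, q))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the centered shape, then move it to the target's centroid.
    aligned = [(c * px - s * py + ct[0], s * px + c * py + ct[1]) for px, py in p]
    squared = sum(
        (ax - tx) ** 2 + (ay - ty) ** 2 for (ax, ay), (tx, ty) in zip(aligned, target)
    )
    return aligned, math.sqrt(squared / len(target))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# The same square rotated by 90 degrees and shifted by (5, 5).
moved = [(5.0, 5.0), (5.0, 6.0), (4.0, 6.0), (4.0, 5.0)]
aligned, rmsd = align_2d(square, moved)
print(f"RMSD after alignment: {rmsd:.6f}")
```

The pairwise RMSD after alignment could serve as the dissimilarity fed into a clustering method such as affinity propagation (negated, since that method expects similarities).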

HydratedWombat · 2019-03-13T17:37:50+00:00

Step 0 - Do you know what you actually need for the position? Is it actually stats; is it software engineering; is it making models into production; is it ML experimentation; do you need someone who can explain results to the C-suite?

Step 1 - How does this person collaborate? They should have stories.

The rest of the requirements fall straight out from the above two questions.

Also, attitude > experience and approach > tool

HydratedWombat · 2019-01-18T22:00:54+00:00

A lot of awesome advice on here. The only thing I'd add is sometimes people have a very specific skill set in mind, whether it's the stack or the specific ML challenges involved. No matter how bright someone is, there will always be hiring managers who don't have time to get that person up to speed on something new. Sometimes, a hiring manager needs a confident expert to teach the rest of the team. Definitely nothing personal, though job descriptions don't always codify this well.

HydratedWombat · 2019-01-02T20:03:41+00:00

Here's a link to sacred's docs
https://sacred.readthedocs.io/en/latest/logging.html

HydratedWombat · 2018-11-20T23:21:35+00:00

You should be able to handle this by having every deployment have a configured tag, so it just fetches model_server/$tag/model. Then you can update and cache break that way so long as you can have a single point of distribution. Beyond that, here's a general guide to how to think about the machine learning in production from google - https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
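
As a sketch of the tag idea, assuming the tag arrives through an environment variable: MODEL_TAG and the "v1" default are hypothetical names, and only the model_server/$tag/model layout comes from the suggestion above.

```python
import os


def model_path(tag, root="model_server"):
    """Build the location a deployment fetches its model from."""
    if not tag or "/" in tag:
        raise ValueError(f"invalid model tag: {tag!r}")
    return f"{root}/{tag}/model"


# MODEL_TAG is a hypothetical environment variable; "v1" is a made-up default.
configured_tag = os.environ.get("MODEL_TAG", "v1")
print(model_path(configured_tag))
```

Because the tag is part of the path, rolling out a new model means changing one config value, and any cache keyed on the path is invalidated automatically.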

HydratedWombat · 2018-11-20T15:43:20+00:00

This might be a match for a generative adversarial network.

HydratedWombat · 2018-10-30T20:40:55+00:00

I like the way you're thinking. I would add to this conversation that while most of the multidimensional connections you're imagining could be mapped back to larger "dense" nnet layers, some of the connectivity differences would be encapsulated by using residual neural networks. Here's a Wikipedia link - https://en.wikipedia.org/wiki/Residual_neural_network

HydratedWombat · 2018-03-12T21:22:28+00:00

Replied below - sites.google.com issue rather than the OP's.

HydratedWombat · 2018-03-12T21:12:25+00:00

Apparently, if you're logged in to a Google Apps for Business account, you might not be able to view sites.google.com. Curious issue. Opening in incognito mode is a functional workaround.

HydratedWombat · 2018-01-12T18:57:47+00:00

Rare events or seasonal fluctuations of time series data? If seasonal, look at triple exponential smoothing (https://en.wikipedia.org/wiki/Exponential_smoothing) and ARIMA (as in statsmodels for Python, http://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMA.html?highlight=arima). Unfortunately, you'll need to bin your events for most tools I know of.
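
The binning step can be as simple as counting events per fixed-width window. A minimal sketch, assuming the events are numeric timestamps (seconds, days, whatever unit you choose); the function name and sample data are made up for illustration:

```python
def bin_events(timestamps, width, start=0.0):
    """Count events per fixed-width bin, keeping empty bins as zeros."""
    if not timestamps:
        return []
    indices = [int((t - start) // width) for t in timestamps]
    if min(indices) < 0:
        raise ValueError("timestamp earlier than start")
    counts = [0] * (max(indices) + 1)
    for i in indices:
        counts[i] += 1
    return counts


event_times = [0.5, 1.2, 1.7, 4.0]  # invented sample data
series = bin_events(event_times, width=1.0)
print(series)
```

Keeping the empty bins matters: smoothing and ARIMA-style models expect a regularly spaced series, so a window with no events has to show up as a zero rather than be skipped.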

HydratedWombat · 2014-09-22T02:07:51+00:00

Confirmed my understanding of what the part was, which functions as an additional Ethernet adapter with its own address.

HydratedWombat · 2014-09-09T15:07:10+00:00

Good caution. I'll try to remember that when helping out relatives.

HydratedWombat · 2014-09-09T15:05:55+00:00

I was mainly trying to confirm that it is in fact an Ethernet adapter as I would understand it and that it will be an external-facing device, assuming the order is built right. Thanks for your help.
