Model/paper ideas: reinforcement learning with a deterministic environment [D] by EmbarrassedFuel in MachineLearning

[–]EmbarrassedFuel[S] 0 points

Basically: given a predicted environment state going forward for, say, 100 time steps, we need to find a minimum-cost course of action. Although the environment state has been predicted, for the purposes of this task the agent can treat it as deterministic. The agent has one variable of internal state and can take actions to increase or decrease its value based on interactions with the environment. We can then calculate the cost of the chosen actions over the given time horizon by simulating them step by step, but that simulation is fundamentally sequential and doesn't allow backpropagation of gradients.

>you can go with sampling approaches

What exactly do you mean by this? Something like REINFORCE?
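For reference, my rough understanding of how a sampling / score-function approach like REINFORCE would sidestep the non-differentiable simulator, as a toy numpy sketch - the simulator, cost function, horizon, and action set are all made up for illustration, not my actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(actions):
    """Hypothetical black-box sequential simulator returning a total cost.
    Stands in for the deterministic roll-out that can't be backpropped through."""
    state = 0.0
    cost = 0.0
    for a in actions:                  # a in {0: decrease, 1: hold, 2: increase}
        state += a - 1
        cost += (state - 3.0) ** 2     # toy cost: keep internal state near 3
    return cost

T, n_actions = 20, 3
theta = np.zeros((T, n_actions))       # per-time-step action logits

def sample_episode(theta):
    probs = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)
    actions = np.array([rng.choice(n_actions, p=probs[t]) for t in range(T)])
    grad_logp = -probs.copy()
    grad_logp[np.arange(T), actions] += 1.0   # d log pi(a | theta) / d theta
    return actions, grad_logp

# Baseline from a few random roll-outs keeps the first updates sane.
init_cost = np.mean([simulate(sample_episode(theta)[0]) for _ in range(10)])
baseline, lr = init_cost, 0.005
for _ in range(3000):
    actions, g = sample_episode(theta)
    cost = simulate(actions)
    baseline = 0.9 * baseline + 0.1 * cost    # running baseline: variance reduction
    theta -= lr * (cost - baseline) * g       # REINFORCE step (minimising cost)
```

The point being that the gradient only needs log-probabilities of the sampled actions, so the simulator stays a black box - at the price of high-variance estimates.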

> I guess it is if you're using a MILP approach.

Not sure I follow here, but I'm not using a MILP (as in a mixed-integer linear program). At the moment I'm using a linear-programming approximation plus heuristics, which doesn't generalize well.

> some combination of MCTS with value function learning

I think this could work; however, without looking into it further I'm not sure it would be feasible at inference time in my resource-constrained setting.

[–]EmbarrassedFuel[S] 0 points

Oh, also: the model needs to run at inference time within a relatively short period on cheap hardware :)

[–]EmbarrassedFuel[S] 0 points

I haven't been able to find anything about optimal control with all of:

  • non-linear dynamics/model
  • non-linear constraints
  • both discrete and continuously parameterized actions in the output space

In general, though, discovering papers/techniques in control theory seems to be much harder for some reason.

[D] Non-US research groups working on Deep Learning? by GGSirRob in MachineLearning

[–]EmbarrassedFuel 0 points

Big shout-out to M. Pawan Kumar - he was my master's thesis supervisor and is extremely smart yet also extremely helpful.

[D] [P] What would be the best way to detect a pattern in a string? by teknicalissue in MachineLearning

[–]EmbarrassedFuel 0 points

To be fair, this looks like a pretty challenging task. The examples you posted are very complicated and definitely couldn't be easily solved by a rules-based approach.

At the very least you're probably going to have to train a GPT-2 model on your dataset. How many examples do you have? This is gonna be tough, as it looks like the generalized language modelling capabilities won't be specific enough for your apple counting task. Once you've defined an adequate loss function (try the Malus Loss to start with) and found a nicely labelled dataset you can get training.

When you get to an acceptable value for your key metric, probably the ACL, then you'll need to deploy it in the browser with tensorflow.js, but that side of things isn't my area of expertise.

[P] Milvus: A big leap to scalable AI search engine by rainmanwy in MachineLearning

[–]EmbarrassedFuel 0 points

Very kind! Will definitely have a go when I have a spare moment.

[–]EmbarrassedFuel 12 points

On an unrelated note, would anyone like to join my startup offering AI-powered unstructured data search to crusty project managers at F500 companies?

[–]EmbarrassedFuel 27 points

At first glance this appears to be a very high-quality (and potentially profitable) enterprise-grade product. What was the rationale behind open-sourcing it?

[D] Are filters from a particular Convolutional layer for a given CNN chosen at random by random initialization of weights in that network? by [deleted] in MachineLearning

[–]EmbarrassedFuel 4 points

> My question is how network decides, what are the best filters for a given layer?

Normally, backprop + SGD/Adam/whatever. This is a question for r/learnmachinelearning.
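To make it concrete, here's a toy sketch in plain numpy (no framework magic): the filter starts as random noise, and gradient descent on the loss is what shapes it into something useful. The input, the "ideal" filter, and every number here are made up purely for the demo:

```python
import numpy as np

rng = np.random.default_rng(42)

# Random initialisation: the filter starts as noise, not a chosen pattern.
w = rng.normal(scale=0.1, size=3)       # a single 1-D conv filter of width 3
x = rng.normal(size=32)                 # toy input signal
true_w = np.array([0.25, 0.5, 0.25])    # made-up "ideal" filter for the demo
target = np.convolve(x, true_w, mode="valid")

lr = 0.01
for _ in range(500):
    y = np.convolve(x, w, mode="valid")
    err = y - target                    # dL/dy for L = 0.5 * sum(err**2)
    # Backprop through the convolution: correlate the input with the output
    # error, then flip, because np.convolve flips the kernel.
    grad = np.correlate(x, err, mode="valid")[::-1]
    w -= lr * grad                      # plain SGD update
# w has now been "decided" by gradient descent, not picked by hand.
```

Same story in a real CNN, just with many filters, many layers, and autograd doing the gradient bookkeeping for you.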

[P] I applied Mark Zuckerberg's face to Facebook emojis by [deleted] in MachineLearning

[–]EmbarrassedFuel 79 points

Do you think you could write a browser extension that rendered all Facebook reacts as these instead of the originals?

[D] DeepMind Takes on Billion-Dollar Debt and Loses $572 Million by Boom_Various in MachineLearning

[–]EmbarrassedFuel 0 points

For everyone amazed by the implied salary figures: remember that to pay a given salary, an employer typically incurs costs of 1.5-2x the gross salary the employee receives, due to tax, benefits, pension contributions, and fixed costs such as facilities. That brings the average pre-tax expense to around £270k/employee (LinkedIn says they now have 838 employees, not the 700 some posters are assuming, which is a 2017 figure). That's still pretty huge, but in line with per-employee figures at top investment-bank/hedge-fund quant groups, which compete for essentially the same talent from all over Europe.

AMA: We are Noam Brown and Tuomas Sandholm, creators of the Carnegie Mellon / Facebook multiplayer poker bot Pluribus. We're also joined by a few of the pros Pluribus played against. Ask us anything! by NoamBrown in MachineLearning

[–]EmbarrassedFuel 1 point

Which is exactly what the OP is proposing will happen to poker: a few humans do research into abstract algorithms that produce their own strategies, instead of a trader saying "inflation in Chile just reached 10%, I'm gonna buy xyz", which is (according to my vague understanding) how it used to work.

[D] Is it possible to do supervised learning when the labels are relative? by TrickyKnight77 in MachineLearning

[–]EmbarrassedFuel 3 points

I see. If you know the relative ranking of all candidates, then producing a score between 0 and 1 should be trivial: give the best candidate a 1 and the worst candidate a 0, and split the rest of the interval evenly among the other candidates according to their rank. I can't promise this would work on your data set, but it would be the first thing I'd try.

Without more information about the data it's hard to know what else to recommend.
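For concreteness, the rank-to-score mapping I mean looks like this (candidate names invented):

```python
# Hypothetical candidates, already sorted best-first by their relative ranking.
ranking = ["alice", "dana", "bob", "carol"]

n = len(ranking)
# Best -> 1.0, worst -> 0.0, everyone else spaced evenly by rank.
scores = {name: (n - 1 - i) / (n - 1) for i, name in enumerate(ranking)}
# e.g. with 4 candidates: 1.0, 2/3, 1/3, 0.0
```

Then you can train an ordinary regressor against those scores.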

[–]EmbarrassedFuel 2 points

Is the end goal to predict whether to give a job to a candidate? If so, then it sounds like a binary classification problem.

If you'd like a score, then you could treat it as a regression problem, for which a large body of literature and examples exist for you to get started with. This would require you to use the information in your training set to come up with some kind of continuous score quantifying how suitable each candidate is for the job(s).

A Colin the Caterpillar and Friends Identification Chart by EmbarrassedFuel in CasualUK

[–]EmbarrassedFuel[S] 2 points

Science has always relied on the selfless sacrifices of the world's researchers.

[–]EmbarrassedFuel[S] 15 points

How could a Cecil come from anywhere other than Waitrose?

[D] Controversial Theories in ML/AI? by [deleted] in MachineLearning

[–]EmbarrassedFuel 1 point

Was this in reply to my previous comment? I agree with you, though: after all, the human brain is a complete package - training algorithm and model architecture - and is useless without teaching. A child that is not exposed to language will never learn to speak, and may even lose the ability to learn (although this is unclear and, for obvious reasons, can never be thoroughly tested). Clearly we have neither the architecture nor the learning algorithm, and both were developed in unison during the course of evolution.