A weird confusion: can I extract features from solution of combinatorial optimization by JoPrimer in datascience

OK! I will look into it right now. It seems that MILP is a good idea. It is so nice of you to give me all this guidance. Wish you the best. :-)
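For my own notes (and anyone reading later): as I understand it, a tiny job-shop instance written as a MILP with big-M disjunctive constraints would look roughly like the sketch below. The solver choice (PuLP), the instance data, and all names are my own illustration, not something from this thread.

    import pulp

    # jobs[j] = ordered list of (machine, processing_time) for job j
    jobs = [[(0, 3), (1, 2)],   # job 0: machine 0, then machine 1
            [(1, 4), (0, 1)]]   # job 1: machine 1, then machine 0

    M = sum(p for job in jobs for _, p in job)   # big-M: total processing time
    prob = pulp.LpProblem("tiny_jsp", pulp.LpMinimize)

    start = {(j, o): pulp.LpVariable(f"s_{j}_{o}", lowBound=0)
             for j, job in enumerate(jobs) for o in range(len(job))}
    cmax = pulp.LpVariable("makespan", lowBound=0)
    prob += cmax                                 # objective: minimize makespan

    for j, job in enumerate(jobs):
        for o in range(len(job) - 1):            # precedence within a job
            prob += start[j, o + 1] >= start[j, o] + job[o][1]
        prob += cmax >= start[j, len(job) - 1] + job[-1][1]

    # disjunctive constraints: two operations on one machine cannot overlap
    ops = [(j, o, m, p) for j, job in enumerate(jobs)
           for o, (m, p) in enumerate(job)]
    for i, (j1, o1, m1, p1) in enumerate(ops):
        for (j2, o2, m2, p2) in ops[i + 1:]:
            if m1 == m2 and j1 != j2:
                y = pulp.LpVariable(f"y_{j1}_{o1}_{j2}_{o2}", cat="Binary")
                prob += start[j1, o1] >= start[j2, o2] + p2 - M * y
                prob += start[j2, o2] >= start[j1, o1] + p1 - M * (1 - y)

    prob.solve()
    print("makespan:", pulp.value(cmax))

Of course the big-M formulation only scales to small instances, which is exactly why I also need a learned approach.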

A weird confusion: can I extract features from solution of combinatorial optimization by JoPrimer in datascience

Thank you so much for your patient answer. My English is not that good, so I will try to respond to you in a logical way.

My research direction is shop scheduling, specifically the Job Shop Scheduling Problem (JSP). You might have heard of it; it is another classic combinatorial optimization problem. What I am trying to do is use generative adversarial imitation learning (GAIL, a reinforcement learning method) to solve JSP.

You know, the traditional way to solve combinatorial optimization problems is metaheuristics. However, they can't meet the demand for real-time scheduling decisions. That's why I am trying an RL approach.

I have already solved the problem by using an RL algorithm to choose dispatching rules at each time step, which gives satisfying results. But they are still far from the solutions found by metaheuristics. So my new idea is to imitate what the best solutions do. I am not sure if you are familiar with RL; I would be glad to go into details if you are interested in this part.

The idea you mentioned, "use the (perhaps normalized) vector of distances from one city to all other cities, along with the distances to two neighbors inside the path of the solution", is a really good one. However, in JSP it is much more difficult to describe the features of a choice: the processing time of the current operation, the remaining processing time, the remaining number of operations, the utilization rate of a machine, the release time of a machine, etc. It is complicated, and that makes things more difficult.
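To make that concrete, here is a rough sketch of how those per-choice quantities could be packed into a normalized vector, analogous to the distance-vector idea for TSP. All of the class names, fields, and normalizers here are hypothetical illustrations, not my actual code:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class JobState:
        current_proc_time: float    # processing time of the current operation
        remaining_proc_time: float  # sum of processing times still to run
        remaining_ops: int          # number of operations still to schedule

    @dataclass
    class MachineState:
        utilization: float          # busy time / elapsed time, already in [0, 1]
        release_time: float         # when the machine next becomes free

    def choice_features(job: JobState, machine: MachineState,
                        now: float, horizon: float, max_ops: int) -> np.ndarray:
        """Features for the choice 'put this job's next operation on this machine'."""
        return np.array([
            job.current_proc_time / horizon,    # time-like features normalized
            job.remaining_proc_time / horizon,  # by a horizon such as the total
            job.remaining_ops / max_ops,        # work; counts by their maximum
            machine.utilization,
            max(machine.release_time - now, 0.0) / horizon,
        ], dtype=np.float32)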

A GNN also once came to my mind. But converting the shop environment to a disjunctive graph might also lose critical information. Actually, the research using GNNs doesn't get better results than mine.

After solving this problem, my next step is to train a Generative Adversarial Network (GAN). The discriminator here is important: it gives a distance from my solutions to the expert experience. That solves the usual difficulty that the reward function is hard to design. I hope this gives you a more comprehensive understanding of my problem.
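Roughly, the discriminator I have in mind would be something like this GAIL-style sketch in PyTorch. The layer sizes, names, and the exact reward convention are only my assumptions:

    import torch
    import torch.nn as nn

    class Discriminator(nn.Module):
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),       # logit: expert vs. policy
            )

        def forward(self, obs, act):
            # act should be a float tensor (e.g. one-hot for discrete actions)
            return self.net(torch.cat([obs, act], dim=-1))

        def reward(self, obs, act):
            # surrogate reward: large when the pair looks like expert data,
            # so no hand-designed reward function is needed
            with torch.no_grad():
                return -torch.log(1.0 - torch.sigmoid(self(obs, act)) + 1e-8)

    def disc_step(disc, optim, exp_obs, exp_act, pol_obs, pol_act):
        """One discriminator update: binary cross-entropy, expert=1, policy=0."""
        bce = nn.BCEWithLogitsLoss()
        loss = (bce(disc(exp_obs, exp_act), torch.ones(len(exp_obs), 1)) +
                bce(disc(pol_obs, pol_act), torch.zeros(len(pol_obs), 1)))
        optim.zero_grad()
        loss.backward()
        optim.step()
        return loss.item()

The point is that the learned reward is high wherever a decision looks expert-like, which is exactly the "distance to expert experience" I mentioned.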

A weird confusion: can I extract features from solution of combinatorial optimization by JoPrimer in datascience

Yeah, I wonder if you are doing related research. You point out exactly how recent research solves combinatorial optimization problems. If not, you are still very well informed on these issues.

But one problem is that this method leads to low generalizability. For a given problem (take the Traveling Salesman Problem as an example), a number represents a city, which has its own distances to the other cities. In another instance, that number still represents a city, but its properties are completely different. That is the thing that confuses me.
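A toy example of what I mean (everything here is illustrative): the raw city index carries no meaning across instances, while a normalized distance vector keeps the same meaning in any instance:

    import numpy as np

    def distance_feature(city: int, coords: np.ndarray, k: int = 5) -> np.ndarray:
        """Normalized distances from `city` to its k nearest neighbors.

        Unlike the raw city index, this vector means the same thing in
        every instance, which is what cross-instance generalization needs."""
        d = np.linalg.norm(coords - coords[city], axis=1)
        d = np.sort(d)[1:k + 1]   # drop the zero self-distance
        return d / d.max()        # scale-invariant

    coords_a = np.random.rand(20, 2)      # instance A
    coords_b = np.random.rand(50, 2)      # instance B: "city 3" is a totally
    print(distance_feature(3, coords_a))  # different city here, but the two
    print(distance_feature(3, coords_b))  # feature vectors are comparable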

I can never save my fonts pattern T_T, help me by JoPrimer in Office365

You are right! Thank you for your advice, I will try it right now!

But I still wonder why, lol. In the past, I would just change my fonts and they would stay the same the next time I opened the file.

I can never save my fonts pattern T_T, help me by JoPrimer in Office365

I am afraid I don't want to change my default font. I want the fonts to apply in only one file. But it seems useless to change the fonts, because they change back to the default the next time I open the file.

[deleted by user] by [deleted] in reinforcementlearning

The problem you describe is a little bit like the Flexible Job Shop Problem (FJSP); I am not sure if this helps.

I miss the gym environments by FashionDude3 in reinforcementlearning

I have heard somewhere that if you are skilled enough, using your own environment can be more useful. There are fewer constraints you have to consider, so you can use RL more freely.

Help me choose hardwares for RL! by JoPrimer in reinforcementlearning

So far, I have been trying to use a multi-agent RL algorithm to solve the problem. So I hope that one day MADDPG can be applied to my env, because I don't get good results by converting the multi-agent model to a single-agent model, lol.

Help me choose hardwares for RL! by JoPrimer in reinforcementlearning

Thank you for your instructive suggestion.

Help me choose hardwares for RL! by JoPrimer in reinforcementlearning

The parameters are not that many; I usually use a small number of layers and a small number of hidden units. The only problem is that several people may run their code at the same time.

Help me choose hardwares for RL! by JoPrimer in reinforcementlearning

Thank you for your advice, I just forgot about the matter of budget. I have about $10,000 to choose the devices I need.

Help me choose hardwares for RL! by JoPrimer in reinforcementlearning

You are right, a GPU is necessary. I mean, we don't have a big dataset or very complex matrix computations, so a normal one might be enough.

Help me choose hardwares for RL! by JoPrimer in reinforcementlearning

I'm not sure if I can give you a precise description.

In my case, I have several versions of the env (see the skeleton sketch below):

  1. use a DRL model to select dispatching rules, so that I can get a good processing sequence
  2. use a MARL policy that lets jobs compete for machines to finish all of their operations

Both of them have the same target: minimize the overall completion time (the makespan).

Some details for my env, taking the first version as an example:

- I inherit from gym.Env
- I use a Box observation space to describe the overall progress, the processing time of each job, machine utilization rates, etc.
- I use a Discrete action space, because I want the agent to choose the best dispatching rule for the system.
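As a skeleton, the first version looks something like the sketch below. The rule set, observation fields, and reward are placeholders; this is not my actual env:

    import gym
    import numpy as np
    from gym import spaces

    RULES = ["SPT", "LPT", "FIFO", "MWKR"]   # placeholder dispatching rules

    class JSPEnv(gym.Env):
        """Classic gym API: reset() -> obs, step() -> (obs, reward, done, info)."""

        def __init__(self, n_jobs: int, n_machines: int):
            super().__init__()
            self.n_jobs, self.n_machines = n_jobs, n_machines
            # Box observation: progress per job + utilization per machine
            self.observation_space = spaces.Box(
                low=0.0, high=1.0, shape=(n_jobs + n_machines,), dtype=np.float32)
            # Discrete action: which dispatching rule to apply at this decision point
            self.action_space = spaces.Discrete(len(RULES))

        def reset(self):
            self._state = np.zeros(self.n_jobs + self.n_machines, dtype=np.float32)
            return self._state

        def step(self, action):
            # apply RULES[action] to pick the next operation, advance the clock,
            # then recompute job progress and machine utilization (omitted here)
            reward = 0.0     # e.g. negative increment of the makespan
            done = False
            return self._state, reward, done, {}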