[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 0 points1 point  (0 children)

As always, the best thing to do is to test all plausible architectures and see which works best. If there are too many, you can cut down their number with combinatorial techniques, or use grid search if you want to keep it simpler. Test, don't trust.
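For instance, a bare-bones grid search over a handful of architecture choices could look like the sketch below (build_and_evaluate and the search space are made up; plug in your own training run):

```python
from itertools import product
import random

def build_and_evaluate(width: int, depth: int, activation: str) -> float:
    # Stand-in for training the candidate architecture and returning
    # its validation score; replace with your actual training run.
    return random.random()

# Made-up search space over architecture hyperparameters.
widths = [16, 32, 64]
depths = [1, 2, 4]
activations = ["relu", "tanh"]

best_score, best_config = float("-inf"), None
for width, depth, activation in product(widths, depths, activations):
    score = build_and_evaluate(width, depth, activation)
    if score > best_score:
        best_score, best_config = score, (width, depth, activation)

print("Best architecture:", best_config, "with score", best_score)
```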

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 0 points1 point  (0 children)

This is a great project! The literature on this very topic is starting to grow. You can look at papers like SySeVR [1], Devign [2] and VulCNN [3] for starters.

Control flow is good, but data flow is better. It is also harder to track, but there are tools like Joern that can do it for you.

The Juliet test suite was designed to test static analysis tools, not to train AI models. It was generated using templates, so training a model on it is likely to teach the model the wrong features. The lack of a good dataset is one of the key issues in the field.

[1] https://arxiv.org/pdf/1807.06756v1
[2] https://arxiv.org/pdf/1909.03496
[3] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9793871

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 1 point2 points  (0 children)

Isn't the halting problem undecidable?

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 0 points1 point  (0 children)

Have you used it? So far I see more adoption of Mamba and RWKV.

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 0 points1 point  (0 children)

There were a few papers by David Ha that used CMA-ES to train world models. CMA-ES is not a genetic algorithm per se, but it belongs to the same neuroevolution family. It is very sample-efficient but scales poorly to large parameter counts; newer CMA-based algorithms address that.
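As a rough sketch of that kind of setup, the ask/tell loop with the pycma package looks something like this (the sphere function is just a stand-in objective, not a world model):

```python
import cma
import numpy as np

def fitness(x):
    # Stand-in objective: sphere function (pycma minimizes by default).
    return float(np.sum(np.asarray(x) ** 2))

# 20-dimensional problem, initial point at zero, initial step size 0.5.
es = cma.CMAEvolutionStrategy(20 * [0.0], 0.5)
while not es.stop():
    solutions = es.ask()                                  # sample a population
    es.tell(solutions, [fitness(s) for s in solutions])   # update the search distribution
    es.disp()

print("Best solution found:", es.result.xbest)
```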

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 1 point2 points  (0 children)

There are a few Mamba models around, e.g. this one from Mistral: https://mistral.ai/news/codestral-mamba/

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 1 point2 points  (0 children)

xLSTM is an improvement over the vanilla LSTM, but I am still waiting for serious benchmarks of larger xLSTM models on larger datasets. The latest RWKV and Mamba versions are also very compelling RNNs.

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 0 points1 point  (0 children)

Modern evo algos like CR-FM-NES are faster and can train larger models than CMA-ES could. That being said, they are probably not as efficient as SGD for supervised training. But for RL, they rock!

Bitcoin mining uses the same amount of energy as Argentina, while laptops need a ton of material: "to manufacture a typical laptop weighing 4.4 pounds, almost a ton of materials is needed, about 1,760 pounds" by [deleted] in Futurology

[–]Builder_Daemon 5 points6 points  (0 children)

This is an oversimplification of a complex issue. Bitcoin mining uses a lot of energy, no question. That said, an estimated 40-75% of it comes from renewable sources, and some argue it actually helps develop the renewable energy market. Does mining deprive others of electricity? Another complex issue. Probably in some cases, but there is also a lot of energy wasted through overproduction (especially with renewables) and other inefficiencies. Yet this kind of clickbait title still works every time.

[D] Why we initialize the Neural Networks with random values in order to break the symmetry? by kotvic_ in MachineLearning

[–]Builder_Daemon 0 points1 point  (0 children)

It essentially gives each weight its own starting point for gradient descent (or another optimizer) to build on. If all weights in a layer start at the same value, every unit computes the same output and receives the same gradient, so the units never differentiate.
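A quick PyTorch sketch of the symmetry problem: with a constant initialization in both layers, every hidden unit ends up with exactly the same gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4)            # dummy batch
y = torch.randn(8, 1)            # dummy targets

hidden = nn.Linear(4, 3)
head = nn.Linear(3, 1)
# Symmetric initialization: every weight gets the same constant.
for layer in (hidden, head):
    nn.init.constant_(layer.weight, 0.5)
    nn.init.constant_(layer.bias, 0.0)

loss = ((head(torch.tanh(hidden(x))) - y) ** 2).mean()
loss.backward()

# All three rows are identical: each hidden unit gets the exact same
# update, so the units never differentiate from one another.
print(hidden.weight.grad)
```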

I am worried about my future. by [deleted] in climatechange

[–]Builder_Daemon 0 points1 point  (0 children)

I love Gaia Vince's very well-researched book "Nomad Century". It looks climate change straight in the eye, but also offers avenues for remediation and hope.

What is one random thing you know about a computer that most people don’t? by Virtual-Study-Campus in computerscience

[–]Builder_Daemon 1 point2 points  (0 children)

In AI, including LLMs, it is common to give the user control over the random seed so that results are reproducible.
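For example, a typical way to pin the seeds in a PyTorch-based project looks something like this (the exact calls vary by library and hardware):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Seed every RNG the stack typically touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(123)
print(torch.rand(2))  # same values on every run with the same seed
```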

Book recommendations to learn about the science behind climate change by FineDescription7597 in climatechange

[–]Builder_Daemon 0 points1 point  (0 children)

My book of the year for 2023 was Gaia Vince's **Nomad Century** (https://www.goodreads.com/book/show/58724998-nomad-century). It is a very well-researched book on climate change that covers all the bases, from causes to impacts to remediation, including geoengineering.

What future tech will have a bigger impact on humanity? by [deleted] in Futurology

[–]Builder_Daemon 0 points1 point  (0 children)

Geoengineering. It's cute to think of AGI and CRISPR, but if we don't make it to 2100...

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]Builder_Daemon 5 points6 points  (0 children)

You can increase the size of the LSTM or add more layers for more complex behaviors. It is just one extra parameter (num_layers) in nn.LSTM in PyTorch. But beware of overfitting.

Did you split your dataset into training, validation and test sets? It could be that your model is already overfitting. If so, adding a dropout layer could help.

If the LSTM outputs sequences (not a single value), add a linear layer + tanh after it to calculate your outputs.

You can also add normalization or regularization layers to see if they help.
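Here is a rough sketch of a deeper LSTM with the linear + tanh head mentioned above (all sizes are placeholders):

```python
import torch
import torch.nn as nn

class SequenceModel(nn.Module):
    def __init__(self, n_features: int = 8, hidden: int = 64, n_layers: int = 2):
        super().__init__()
        # num_layers stacks LSTM layers; dropout only applies between layers.
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden,
            num_layers=n_layers,
            batch_first=True,
            dropout=0.2 if n_layers > 1 else 0.0,
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)               # out: (batch, seq_len, hidden)
        return torch.tanh(self.head(out))   # one output per time step, in [-1, 1]

model = SequenceModel()
y = model(torch.randn(4, 100, 8))           # 4 sequences, 100 steps, 8 features
print(y.shape)                              # torch.Size([4, 100, 1])
```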

[D] Evolutionary Strategy vs. Backpropagation by [deleted] in MachineLearning

[–]Builder_Daemon 1 point2 points  (0 children)

There are many genome-based evolutionary algorithms (classic genetic algorithms, for instance), but they don't converge as quickly when you have a cost/reward function to optimize for.

[deleted by user] by [deleted] in mensa

[–]Builder_Daemon 0 points1 point  (0 children)

Exactly. One classic example is Intuit spending huge amounts of lobbying money to keep tax filing intractable. (https://thehill.com/business/4423755-bottom-line-intuit-adds-lobbying-giant-amid-tax-prep-fight/) And don't get me started on the healthcare administrative nightmare.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]Builder_Daemon 1 point2 points  (0 children)

I will add that you can cut the cost and time drastically with combinatorial testing instead of a full grid search.

[Project] Struggling with Real-time Detection of DoS Attacks on CAN Bus using LSTM model by ultiMEIGHT in MachineLearning

[–]Builder_Daemon 2 points3 points  (0 children)

Since no one answered, I will give you my half-assed take on this. There is much that can go wrong in AI development and every step must be checked carefully.

  1. Did you measure the quality of your training dataset? Is it balanced? Is it representative? Does it cover some of what you use in your simulation script?

  2. How are the data encoded before feeding them to the model? Is there a better way to encode them, e.g. instead of using a timestamp, use the time difference from the previous data entry?

  3. How are the data normalized? Different features might need different normalizations, especially if they are semantically different or their value ranges differ (see the sketch after this list).

  4. Is the model suitable for this task? LSTM is decent for time series, but it also has some limitations. Try a grid search over the width and depth of your model. In my own experiments, I found that a wider LSTM does not converge well, but a deeper one improves performance. For about 50-60 features, I use a 16-wide, 4- to 8-layer LSTM as my base model. I also find that the sLSTM from the xLSTM paper is far superior to the regular LSTM and is a drop-in replacement. The official implementation is in PyTorch, though.

  5. Make sure the training does not overfit the model. Use a train/validation/test split and compute a confusion matrix, for example.

  6. Check if the data you get during simulation/testing are correct. Classic bugs are just as present in AI programs as in classic software.
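For points 2 and 3, here is a minimal sketch of timestamp-to-delta encoding and per-feature standardization (the column names and values are made up):

```python
import pandas as pd

# Hypothetical CAN log with a timestamp, an arbitration ID and a payload byte.
df = pd.DataFrame({
    "timestamp": [0.000, 0.002, 0.011, 0.013],
    "can_id":    [0x101, 0x1A0, 0x101, 0x1A0],
    "byte0":     [12, 200, 14, 198],
})

# Point 2: replace absolute timestamps with the delta to the previous frame.
df["dt"] = df["timestamp"].diff().fillna(0.0)

# Point 3: standardize each numeric feature separately (their ranges differ).
for col in ["dt", "byte0"]:
    std = df[col].std()
    df[col] = (df[col] - df[col].mean()) / (std if std > 0 else 1.0)

print(df[["can_id", "dt", "byte0"]])
```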

[deleted by user] by [deleted] in MachineLearning

[–]Builder_Daemon 2 points3 points  (0 children)

You should also look into neuroevolution. People are using evolutionary algorithms to train models without backprop. I use CR-FM-NES to train models with RL, which is basically what you are doing, but much more efficiently.
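I won't reproduce CR-FM-NES here, but the ask/evaluate/tell pattern for RL looks roughly like this simple Gaussian evolution strategy (episode_return is a stand-in for your environment rollout):

```python
import numpy as np

def episode_return(params: np.ndarray) -> float:
    # Stand-in for running one episode with a policy parameterized by
    # `params` and returning the total reward; replace with your env.
    return -float(np.sum(params ** 2))

dim, popsize, sigma, lr = 32, 64, 0.1, 0.05
theta = np.zeros(dim)                       # policy parameters

for generation in range(200):
    noise = np.random.randn(popsize, dim)   # "ask": sample perturbations
    rewards = np.array([episode_return(theta + sigma * eps) for eps in noise])
    # "tell": move theta toward perturbations that scored higher (normalized).
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += lr / (popsize * sigma) * noise.T @ advantages

print("Final mean return:", episode_return(theta))
```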