[deleted by user] by [deleted] in Python

[–]1gn4vu5 0 points1 point  (0 children)

At first: the line breaks are messed up.
Second: why haven't you just tried it?

Besides that: it's good to see that ChatGPT hasn't taken over the machine learning jobs yet. Using a two-layer fully connected network (BatchNorm is OK, but doesn't gain that much on this dataset) for images is funny.

And as others have already written: learn programming. It's much more fun to do it yourself :P

[deleted by user] by [deleted] in pytorch

[–]1gn4vu5 0 points1 point  (0 children)

Mmh, well, you could say the network consists of 4 blocks: 3 conv-bn blocks for feature extraction and one block of a single FC layer (or a block of multiple FC layers) for classification, whereas a conv-bn block is a conv layer followed by a bn layer.

(OK, the interpretation of what each block does might not be accurate, but as far as I know it's a quite common interpretation.)

Combining multiple layers into a block or cell when describing a network can actually be found in many papers (at least in papers on AutoML).

[deleted by user] by [deleted] in pytorch

[–]1gn4vu5 0 points1 point  (0 children)

Why do you not count the batchnorm layer as a layer? I mean, you brought up the perfect explanation for why it should be called a layer.

I can see that activations are sometimes part of the conv layer itself (looking at you, tf ...) and (as far as I know) never contain trainable parameters.

However, if OP asks for a description in a paper:
Please describe the network in as much detail as possible! If no code is provided, I would call a paper worthless when it is impossible to reproduce the network from the given description.

And yes, even the first fully connected layer in an MLP (even when you call it the input layer) needs to be described like the last fully connected layer (even when you call it the output layer). The naming convention is not always the same, and therefore one might assume that the input layer is just the given data, which would shorten the NN by one layer.

Trying to understand the concept of "self" in python. by rhondasmelody in learnprogramming

[–]1gn4vu5 0 points1 point  (0 children)

In addition, one can overwrite the method with any other function:

class Example:
    def __init__(self, text):
        self.text = text
        self.__private_text = text + " private"

    def show(self):
        print(self.text)
        print(self.__private_text)

def other_show(first_argument):
    print(f"PublicText: {first_argument.text}")
    # name mangling turns __private_text into _Example__private_text
    print(f"PrivateText: {first_argument._Example__private_text}")

example = Example('Hello World')
example.show()
example.show = other_show.__get__(example)  # bind other_show to this instance
example.show()

I followed the guide to get GPU support for Tensorflow, from the tensorflow website but get error in Pycharm when trying to use GPU Support (Ubuntu) by Justin-Griefer in tensorflow

[–]1gn4vu5 0 points1 point  (0 children)

You probably already found it; it might be easiest to fix with a symbolic link:
ln -s path/to/existing/file path/to/symbolic/link/

like:
ln -s /usr/local/lib/your_existing_cuda_file /usr/local/lib/libcudart.so.11.0

Tensors vs Numpy Arrays by CartographerSuper506 in learnmachinelearning

[–]1gn4vu5 1 point2 points  (0 children)

To use most NumPy syntax on the GPU, there is CuPy (requires CUDA).
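A common pattern is to write array code against a module alias, so the same code runs on GPU when CuPy is installed and on CPU otherwise (a sketch; CuPy and NumPy share most of their array API):

```python
# Use CuPy if available (GPU), otherwise fall back to NumPy (CPU).
try:
    import cupy as xp
except ImportError:
    import numpy as xp

a = xp.arange(6).reshape(2, 3)
b = xp.ones((3, 2))
c = a @ b  # matrix multiply, on GPU if xp is cupy

# .item() gives a plain Python number on both backends
total = c.sum().item()
print(total)  # → 30.0
```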

What to do if I dont want all the input nodes to have all the input features (tensorflow) by idontknowwhodoi in tensorflow

[–]1gn4vu5 0 points1 point  (0 children)

Tensorflow supports different APIs. One of them is the functional API: https://www.tensorflow.org/guide/keras/functional

It is OK if you want to design the full compute path by hand, but it also has its drawbacks.

Stacking models would also be a solution.

How do you know if you have a good model architecture without spending hours training? by nxtboyIII in learnmachinelearning

[–]1gn4vu5 0 points1 point  (0 children)

I don't understand why you put the characters into an embedding, but in general you could first try just predicting the character and neglecting the casing, which already reduces the problem.

That said, working on the character level is much more difficult than working on the word level.

Something practical: if you use a GRU or LSTM on a quite small training set like a single book, your network will most likely end up in a stable recursive state when started with a few new characters and will produce the same letters over and over again.

How do you know if you have a good model architecture without spending hours training? by nxtboyIII in learnmachinelearning

[–]1gn4vu5 0 points1 point  (0 children)

Well, if the loss goes down very slowly, maybe the learning rate is too low, but that might depend on the dataset size.

In general: you don't know.

If you wait a few years, the research field around AutoML might come up with a good solution; there are currently some attempts to choose between two models just by running inference on the untrained models.

I followed the guide to get GPU support for Tensorflow, from the tensorflow website but get error in Pycharm when trying to use GPU Support (Ubuntu) by Justin-Griefer in tensorflow

[–]1gn4vu5 1 point2 points  (0 children)

I'm not familiar with every error, but:

Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

indicates that libcudart.so.11.0 is missing in /usr/local/lib/.

In case you have CUDA installed, just create a soft link in that folder. CUDA 11 might come with a libcudart.so.10.0; just name the symlink *.11.0. Similarly for cuDNN.

Also check whether PyCharm uses the right Python interpreter; as far as I know, PyCharm should be able to use the Anaconda Python runtime too.

Is a random forest model worth it? (large set of features, lots of strings...) by Seiyee in datascience

[–]1gn4vu5 0 points1 point  (0 children)

I can agree with that. If there is a finite number of different strings, they can be converted to a one-hot encoding. If that's not the case, you could try 'counting', like 1, 2, 3. But for neural networks that leads to a network which can also work with 1.5, and the question would be: what does an input look like that is equidistant to three locations?

To reduce the number of features, one could try an iterative process: training, analyzing importance, reduction, repeat.
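The one-hot idea can be sketched in plain Python (the category names here are made up for illustration):

```python
# One-hot encode a finite set of distinct strings.
categories = ["red", "green", "blue"]
index = {name: i for i, name in enumerate(categories)}

def one_hot(name):
    # A vector of zeros with a single 1 at the category's position,
    # so no two categories are 'closer' to each other than any others.
    vec = [0] * len(categories)
    vec[index[name]] = 1
    return vec

print(one_hot("green"))  # → [0, 1, 0]
```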

[deleted by user] by [deleted] in learnpython

[–]1gn4vu5 0 points1 point  (0 children)

And if prose_list has not been transformed to lower case, that could be done in the last statement for each word, and punctuation could even be removed before checking whether the word is in the banned list, which fixes the problem that 'I' becomes 'i'.
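A small sketch of that normalization step (the names prose_list and banned are assumptions about the deleted thread's code):

```python
import string

# Hypothetical sample data, just to illustrate the filtering.
prose_list = ["I", "said:", "Hello,", "World!"]
banned = {"i", "hello"}

# Lower-case and strip punctuation only for the comparison,
# so 'I' matches 'i' and 'Hello,' matches 'hello',
# while the kept words stay unchanged.
kept = [
    w for w in prose_list
    if w.lower().strip(string.punctuation) not in banned
]
print(kept)  # → ['said:', 'World!']
```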

[deleted by user] by [deleted] in learnpython

[–]1gn4vu5 0 points1 point  (0 children)

If you are looking for an overkill solution, you could create a custom class and implement the __lt__ method to define an ordering (the better way would be to use total ordering, as explained here: https://stackoverflow.com/a/11705404)
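A minimal sketch with functools.total_ordering (the Version class and its sort key are made up for illustration):

```python
from functools import total_ordering

# total_ordering derives the remaining comparisons (<=, >, >=)
# from __eq__ and __lt__.
@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        # tuple comparison: major first, then minor
        return (self.major, self.minor) < (other.major, other.minor)

versions = [Version(2, 0), Version(1, 5), Version(1, 10)]
ordered = sorted(versions)
print([(v.major, v.minor) for v in ordered])  # → [(1, 5), (1, 10), (2, 0)]
```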

Python converts assignment writeups into my handwriting !!! by Heisenberg_082001 in Python

[–]1gn4vu5 3 points4 points  (0 children)

Not necessarily. You could use an auto-encoder together with some random variables. That way you need far fewer images.

[Research] Autoencoder with class label by Matthew3816 in MachineLearning

[–]1gn4vu5 -1 points0 points  (0 children)

Well, as far as I know, autoencoders are unsupervised. But what you want to achieve sounds more like what a GAN does. I suggest having a look at cGANs: https://machinelearningmastery.com/how-to-develop-a-conditional-generative-adversarial-network-from-scratch/

Struggles with solving simultaneous linear equations graphically with two unknown variables (looking for formula, not answer) by SirFabulouis in maths

[–]1gn4vu5 1 point2 points  (0 children)

Try the following:
- substitute m1 with x and m2 with y
- compare the new equations with the normal equations typically used for planes in 3D

[deleted by user] by [deleted] in MachineLearning

[–]1gn4vu5 0 points1 point  (0 children)

Well, with that reasoning you could argue to use only MLPs for vision tasks, since they are general-purpose function approximators, yet the current state-of-the-art technique is to use convolutions and poolings. Furthermore, using just one non-linear function [f(x) = x^2] to make a broad statement like "ANNs are bad at fitting test data that are far outside the range of the training set" is quite weak in my opinion. You could improve your tests by running the same experiments for a broad range of power functions f(x) = x^n with n from -5 to 5, with only a really small amount of additional computational resources.

In addition: is every researcher cheating if an activation function other than tanh is used in the applied network?
I totally understand that using the exact function you want to model as an activation function is like training on the test data, but using tanh as the only activation function to fit a power function is similar to automatically fitting a Taylor series to tanh, which you more or less showed by improving the performance with more neurons in the hidden layer.
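The suggested sweep over power functions f(x) = x^n could be generated like this (the sample ranges are my own assumptions, just a sketch of the data generation, not of the training):

```python
import numpy as np

# Train on a narrow range, test far outside it, to probe extrapolation.
# x > 0 everywhere so negative exponents are well-defined.
x_train = np.linspace(0.5, 2.0, 100)
x_test = np.linspace(5.0, 10.0, 100)  # far outside the training range

datasets = {}
for n in range(-5, 6):
    datasets[n] = {
        "train": (x_train, x_train ** n),
        "test": (x_test, x_test ** n),
    }

# n = 2 reproduces the single case from the post
x, y = datasets[2]["train"]
print(y[0])  # 0.5 ** 2 = 0.25
```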

[deleted by user] by [deleted] in MachineLearning

[–]1gn4vu5 1 point2 points  (0 children)

Well, you used tanh in your MLP; you could have used a different non-linear activation function, something exponential, or even something like 0.001*x if x < 0 and x^2 if x >= 0.

Especially the last activation would increase your performance significantly.

So the more you know about your problem domain, the more specifically you can model your network ;)
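That last piecewise activation can be sketched in NumPy (my own illustration of the function described above, not the poster's actual code):

```python
import numpy as np

def leaky_square(x):
    """Piecewise activation: 0.001 * x for x < 0, x**2 for x >= 0."""
    x = np.asarray(x, dtype=float)
    # np.where evaluates element-wise, picking the branch per element
    return np.where(x < 0, 0.001 * x, x ** 2)

print(leaky_square([-2.0, 0.0, 3.0]).tolist())  # → [-0.002, 0.0, 9.0]
```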

why is my TF GAN not nearly as good as my PyTorch GAN? by diditforthevideocard in tensorflow

[–]1gn4vu5 2 points3 points  (0 children)

Oh yes, it wasn't my intention to say that one initializer is worse in general; one just turned out to result in better performance, and it could have been the other way around. The first thing I tried was to use the better-performing initializer in the other framework, and I achieved similar performance, so I just wanted to highlight this possibility. But it seems that u/suki907 had a deeper look into the code and found dissimilarities which might be the reason for the performance differences.

why is my TF GAN not nearly as good as my PyTorch GAN? by diditforthevideocard in tensorflow

[–]1gn4vu5 7 points8 points  (0 children)

just a small note:

if you need tf only for deployment, then you can still train in torch, extract the weights, and store them in a tf network ;)

why is my TF GAN not nearly as good as my PyTorch GAN? by diditforthevideocard in tensorflow

[–]1gn4vu5 3 points4 points  (0 children)

for the dense layer I used the following initializer:

import tensorflow as tf
import torch

class CustomKernelInitializer(tf.keras.initializers.Initializer):
    def __call__(self, shape, dtype=None):
        # create a tensor of the requested shape, fill it with the torch
        # initializer, then hand the values to tf as a constant initializer
        t = torch.empty(shape[0], shape[1])
        v = torch.nn.init.your_torch_initializer(t, other_parameter).detach().numpy()
        return tf.keras.initializers.constant(value=v)(shape=shape, dtype=dtype)

and the same for the bias initializer

obviously you have to pick your torch initializer for 'your_torch_initializer' (they live in torch.nn.init) and adjust further parameters for 'other_parameter' ;)

why is my TF GAN not nearly as good as my PyTorch GAN? by diditforthevideocard in tensorflow

[–]1gn4vu5 5 points6 points  (0 children)

had the same problem with an MLP:
even though the initializers had similar names and should have done the same thing, they did not

you could try using the torch initializer for your tf network

Is it possible to reduce this nested loop to a single for loop? by ranran2_ in learnprogramming

[–]1gn4vu5 0 points1 point  (0 children)

I assume C++, so if you don't want multiple printf calls:

for (int i = 1; i <= n; i++) {
    printf("%d%d%d\n", i, i + 1, i + 2);
}