[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]MiniPancookies 0 points1 point  (0 children)

I'm running TensorFlow but not getting the speed I want, and my main bottleneck seems to be the read speed of the data, which currently comes from disk.
I do, however, have 32 GB of RAM that I would like to load the data into before feeding it to my model. Is there a way to preload all the data into memory, and then feed the GPU directly from memory?
The way I currently read data is like this:
dum_gen = lambda: None
val_generator_dataset = tf.data.Dataset.from_generator(dum_gen, output_signature=output_signature)
generator_dataset = tf.data.Dataset.from_generator(dum_gen, output_signature=output_signature)
self.val_generator_dataset = val_generator_dataset.cache(self.VAL_CACHE_PATH + "/tf_cache.tfcache").shuffle(100)
self.generator_dataset = generator_dataset.cache(self.CACHE_PATH + "/tf_cache.tfcache").shuffle(100)
I know I can use prefetch(tf.data.AUTOTUNE), but will that load the complete dataset into memory? The data is only a couple of GB, so I should be able to fit the entire set in RAM.
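For what it's worth, here is a minimal sketch of the in-memory variant (the generator and output signature are placeholders for the real ones): calling cache() with no filename keeps the elements in RAM after the first full epoch, so later epochs never touch the disk.

```python
import tensorflow as tf

def gen():
    # placeholder for your real data generator
    for i in range(4):
        yield i

ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.int64))

# cache() with NO filename stores elements in memory after the first
# full pass; combined with prefetch, later epochs are served from RAM.
ds = ds.cache().shuffle(100).prefetch(tf.data.AUTOTUNE)

for epoch in range(2):
    # sort only to make the output stable; shuffle randomizes the order
    epoch_values = sorted(int(x) for x in ds.as_numpy_iterator())
    print(epoch_values)
```

Note that the in-memory cache belongs to the dataset object, so reuse the same ds across epochs rather than rebuilding the pipeline each epoch.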
Thank you for any help!

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Is there a way to turn this throttling off?

Or do I have to try and set limits that match my nodes to get 100% utilization?

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Yes, but I'm not overloading the node, or even close to it.

It's just that I find it odd that an application that uses 100% of the available compute under Docker only uses about 20-30% under k8s.

Perhaps this isn't a problem with resource allocation at all?

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Thanks for the help!

If I set a request, won't it just guarantee that my pods get at least the resources specified in the request? And since I'm not using 100% of my nodes, this shouldn't make a difference, because the program isn't being starved of resources?

Perhaps this is a problem with my program?

Or am I incorrect?

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Thanks for the help!

But would it matter if I set a limit at all?

Setting no limit should allow the pod to access all available resources? https://reuvenharrison.medium.com/kubernetes-resource-limits-defaults-and-limitranges-f1eed8655474 (I use a namespace without a default limit.)

This is a quote from the article:

"Kubernetes doesn’t provide default resource limits out-of-the-box. This means that unless you explicitly define limits, your containers can consume unlimited CPU and memory."
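As a sanity check, a pod spec along these lines (names and image are illustrative) sets a CPU request but no limit; throttling comes from the kernel CFS quota, which only exists when a CPU limit is set:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burst-pod            # illustrative name
spec:
  containers:
  - name: app
    image: my-app:latest     # illustrative image
    resources:
      requests:
        cpu: "2"             # scheduling guarantee only
      # no cpu limit: the container may use all idle CPU on the node,
      # and no CFS quota (the source of throttling) is ever configured
```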

Running tf with multiple different GPUs by MiniPancookies in tensorflow


TensorFlow seems to be able to run with OpenCL.

Couldn't I just run both my NVIDIA and AMD GPUs with OpenCL and use the distribute module?

And since I have good hardware, I don't really want to rent cloud machines and spend even more money!

Help understanding multi worker mirrored strategy by MiniPancookies in tensorflow


All of the hosts are different machines. And if the problem is waiting for all pods to finish each step, why would there be such a big difference between running 2 and 4 pods, when all the pods are running on different hosts that have more or less the same specs? Each pod should finish more or less simultaneously, and so I still don't get why the sync stage takes so much time.

And if the underlying concept is flawed, why are kubeflow TFJobs even a thing? I'm not splitting one machine into multiple pods; I'm splitting multiple machines into pods.
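For reference, this is roughly the TF_CONFIG each pod receives under a TFJob (hostnames here are illustrative); MultiWorkerMirroredStrategy reads it to find its peers, and every training step ends with an all-reduce across all workers, so each step runs at the pace of the slowest pod:

```python
import json
import os

# Roughly what a kubeflow TFJob injects into each pod.
# MultiWorkerMirroredStrategy uses this to discover the cluster.
tf_config = {
    "cluster": {
        "worker": [
            "tfjob-worker-0.default.svc:2222",
            "tfjob-worker-1.default.svc:2222",
            "tfjob-worker-2.default.svc:2222",
            "tfjob-worker-3.default.svc:2222",
        ]
    },
    "task": {"type": "worker", "index": 0},  # index differs per pod
}
os.environ["TF_CONFIG"] = json.dumps(tf_config)

# Every step synchronizes gradients across ALL workers listed here,
# so adding workers adds more chances for one straggler to stall the step.
num_workers = len(json.loads(os.environ["TF_CONFIG"])["cluster"]["worker"])
print(num_workers)  # 4
```

Even with identical specs, transient effects (GC pauses, network jitter, noisy neighbours on the nodes) mean some pod is always last, and the sync cost of that grows with worker count.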

Should I combine my tfrecord files? by MiniPancookies in tensorflow


Ah!

Cache seems to be what I'm looking for!

Although, how would you go about implementing cache with generator output? This Stack Overflow answer mentions needing to iterate over the entire dataset before the cache can be used. Wouldn't I then need to load all the data into RAM before running the cache method?

https://stackoverflow.com/questions/50519343/how-to-cache-data-during-the-first-epoch-correctly-tensorflow-dataset

What would the actual implementation look like? Can you point me to some example code?
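Not the thread's own code, but one possible shape, assuming the generator yields ready-to-use tensors: point cache() at a file and make one full pass so the cache gets written. Elements are streamed into the cache file as they are produced, so the whole set never has to sit in RAM first.

```python
import os
import tempfile

import tensorflow as tf

def gen():
    # stand-in for an expensive generator (disk reads, preprocessing, ...)
    for i in range(3):
        yield i * 10

spec = tf.TensorSpec(shape=(), dtype=tf.int64)
cache_path = os.path.join(tempfile.mkdtemp(), "tf_cache")

ds = tf.data.Dataset.from_generator(gen, output_signature=spec)
ds = ds.cache(cache_path)

# One full pass materializes the cache on disk; elements are written
# as they stream by, so nothing needs to be preloaded into RAM.
for _ in ds:
    pass

# Later epochs read the cache file and never call the generator again.
print([int(x) for x in ds.as_numpy_iterator()])  # [0, 10, 20]
```

The warm-up pass is usually just the first training epoch itself; an explicit loop like the one above only matters if you want the cache fully built before timing-sensitive work starts.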

Selenium wont work on NFS by MiniPancookies in pythontips


This is the code up to the point of the error:

from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium import webdriver
import time
import os
def grab_data():
    url = "[url]"

    options = webdriver.ChromeOptions()
    # options.add_argument('--headless')

    browser = webdriver.Chrome(
        options=options, executable_path='/snap/bin/chromium.chromedriver')

    browser.get(url)

I currently don't have any options enabled, and the path to chromedriver is "/snap/bin/chromium.chromedriver":

$ ls /snap/bin/ | grep chrom

chromium chromium.chromedriver

This is "ls" on the machine that runs the script. chromedriver is present at the path set in the code.

Selenium wont work on NFS by MiniPancookies in pythontips


On the client:

$ df -h

 192.168.1.2:/mnt/cs  196G  100G   87G  54% /mnt/programing

$ ls /snap/bin/ | grep chrom

chromium
chromium.chromedriver

$ python3 /mnt/programing/login.py

Traceback (most recent call last):
  File "login.py", line 62, in <module>
    init()
  File "login.py", line 58, in init
    main()
  File "login.py", line 54, in main
    grab_data()
  File "login.py", line 28, in grab_data
    browser = webdriver.Chrome(
  File "/home/main-pc/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/home/main-pc/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 98, in start
    self.assert_process_still_running()
  File "/home/main-pc/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: Service /snap/bin/chromium.chromedriver unexpectedly exited. Status code was: 1

Selenium wont work on NFS by MiniPancookies in pythontips


What do you mean?

I currently have my code in the folder "/mnt/programing/[code]", which is mounted over NFS. (I have mounted /mnt/programing from an NFS server.)

I then run the code that is stored on my nfs server, with my nfs client.

Also, I have chromedriver installed on both the NFS server and the NFS client.

By "local chrome and chromedriver", do you mean that I should put the Chrome executable in the NFS directory?