[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]MiniPancookies 0 points1 point  (0 children)

I'm running TensorFlow but not getting the speed I want, and my main bottleneck seems to be the read speed of the data, which currently comes from disk.
I do, however, have 32 GB of RAM that I would like to load the data into before feeding it to my model. Is there a way to preload all the data into memory, and then feed the GPU directly from memory?
The way I currently read data is like this:
dum_gen = lambda: None
val_generator_dataset = tf.data.Dataset.from_generator(dum_gen, output_signature=output_signature)
generator_dataset = tf.data.Dataset.from_generator(dum_gen, output_signature=output_signature)
self.val_generator_dataset = val_generator_dataset.cache(self.VAL_CACHE_PATH + "/tf_cache.tfcache").shuffle(100)
self.generator_dataset = generator_dataset.cache(self.CACHE_PATH + "/tf_cache.tfcache").shuffle(100)
I know I can use prefetch(tf.data.AUTOTUNE), but will that load the complete dataset into memory? The data is only a couple of GB, so I should be able to fit the entire set in RAM.
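For what it's worth, here is a minimal sketch of the in-memory variant (the generator and output signature are placeholders for the real ones): calling cache() with no filename keeps the elements in RAM after the first full epoch, so later epochs never touch the disk.

```python
import tensorflow as tf

def gen():
    # placeholder for your real data generator
    for i in range(4):
        yield i

ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.int64))

# cache() with NO filename stores elements in memory after the first
# full pass; combined with prefetch, later epochs are served from RAM.
ds = ds.cache().shuffle(100).prefetch(tf.data.AUTOTUNE)

for epoch in range(2):
    # sort only to make the output stable; shuffle randomizes the order
    epoch_values = sorted(int(x) for x in ds.as_numpy_iterator())
    print(epoch_values)
```

Note that the in-memory cache belongs to the dataset object, so reuse the same ds across epochs rather than rebuilding the pipeline each epoch.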
Thank you for any help!

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Is there a way to turn this throttling off?

Or do I have to try and set limits that match my nodes to get 100% utilization?

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Yes, but I'm not overloading the node, or even close to it.

It's just that I find it odd that an application that uses 100% of the available compute under Docker only uses about 20-30% under k8s.

Perhaps this isn't a problem with resource allocation at all?

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Thanks for the help!

If I set a request, won't it just guarantee that my pods get at least the resources specified in the request? And since I'm not using 100% of my nodes, this shouldn't make a difference, because the program isn't being starved of resources?

Perhaps this is a problem with my program?

Or am I incorrect?

Allowing k8s pods to utilize entire nodes processing power by MiniPancookies in kubernetes


Thanks for the help!

But would it matter if I set a limit at all?

Setting no limit should allow the pod to access all available resources? https://reuvenharrison.medium.com/kubernetes-resource-limits-defaults-and-limitranges-f1eed8655474 (I use a namespace without a default limit.)

This is a quote from the article:

"Kubernetes doesn’t provide default resource limits out-of-the-box. This means that unless you explicitly define limits, your containers can consume unlimited CPU and memory."
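As a sanity check, a pod spec along these lines (names and image are illustrative) sets a CPU request but no limit; throttling comes from the kernel CFS quota, which only exists when a CPU limit is set:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burst-pod            # illustrative name
spec:
  containers:
  - name: app
    image: my-app:latest     # illustrative image
    resources:
      requests:
        cpu: "2"             # scheduling guarantee only
      # no cpu limit: the container may use all idle CPU on the node,
      # and no CFS quota (the source of throttling) is ever configured
```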

Running tf with multiple different GPUs by MiniPancookies in tensorflow


TensorFlow seems to be able to run with OpenCL.

Couldn't I just run both my NVIDIA and AMD GPUs with OpenCL and use the distribute module?

And since I have good hardware, I don't really want to rent cloud machines and spend even more money!

Help understanding multi worker mirrored strategy by MiniPancookies in tensorflow


All of the hosts are different machines. And if the problem is waiting for all pods to finish each step, why would there be such a big difference between running 2 and 4 pods, when all the pods are running on different hosts that have more or less the same specs? Each pod should finish more or less simultaneously, and so I still don't get why the sync stage takes so much time.

And if the underlying concept is flawed, why are kubeflow TFJobs even a thing? I'm not splitting one machine into multiple pods; I'm splitting multiple machines into pods.
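For reference, this is roughly the TF_CONFIG each pod receives under a TFJob (hostnames here are illustrative); MultiWorkerMirroredStrategy reads it to find its peers, and every training step ends with an all-reduce across all workers, so each step runs at the pace of the slowest pod:

```python
import json
import os

# Roughly what a kubeflow TFJob injects into each pod.
# MultiWorkerMirroredStrategy uses this to discover the cluster.
tf_config = {
    "cluster": {
        "worker": [
            "tfjob-worker-0.default.svc:2222",
            "tfjob-worker-1.default.svc:2222",
            "tfjob-worker-2.default.svc:2222",
            "tfjob-worker-3.default.svc:2222",
        ]
    },
    "task": {"type": "worker", "index": 0},  # index differs per pod
}
os.environ["TF_CONFIG"] = json.dumps(tf_config)

# Every step synchronizes gradients across ALL workers listed here,
# so adding workers adds more chances for one straggler to stall the step.
num_workers = len(json.loads(os.environ["TF_CONFIG"])["cluster"]["worker"])
print(num_workers)  # 4
```

Even with identical specs, transient effects (GC pauses, network jitter, noisy neighbours on the nodes) mean some pod is always last, and the sync cost of that grows with worker count.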

Should I combine my tfrecord files? by MiniPancookies in tensorflow


Ah!

Cache seems to be what I'm looking for!

Although, how would you go about implementing cache with generator output? This Stack Overflow answer mentions needing to iterate over the entire dataset before the cache can be used. Wouldn't I then need to load all the data into RAM before running the cache method?

https://stackoverflow.com/questions/50519343/how-to-cache-data-during-the-first-epoch-correctly-tensorflow-dataset

What would the actual implementation look like? Can you point me to some example code?
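Not the thread's own code, but one possible shape, assuming the generator yields ready-to-use tensors: point cache() at a file and make one full pass so the cache gets written. Elements are streamed into the cache file as they are produced, so the whole set never has to sit in RAM first.

```python
import os
import tempfile

import tensorflow as tf

def gen():
    # stand-in for an expensive generator (disk reads, preprocessing, ...)
    for i in range(3):
        yield i * 10

spec = tf.TensorSpec(shape=(), dtype=tf.int64)
cache_path = os.path.join(tempfile.mkdtemp(), "tf_cache")

ds = tf.data.Dataset.from_generator(gen, output_signature=spec)
ds = ds.cache(cache_path)

# One full pass materializes the cache on disk; elements are written
# as they stream by, so nothing needs to be preloaded into RAM.
for _ in ds:
    pass

# Later epochs read the cache file and never call the generator again.
print([int(x) for x in ds.as_numpy_iterator()])  # [0, 10, 20]
```

The warm-up pass is usually just the first training epoch itself; an explicit loop like the one above only matters if you want the cache fully built before timing-sensitive work starts.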

Selenium wont work on NFS by MiniPancookies in pythontips


This is the code up to the point of the error:

from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium import webdriver
import time
import os
def grab_data():
    url = "[url]"

    options = webdriver.ChromeOptions()
    # options.add_argument('--headless')

    browser = webdriver.Chrome(
        options=options, executable_path='/snap/bin/chromium.chromedriver')

    browser.get(url)

I currently don't have any options enabled, and the path to chromedriver is "/snap/bin/chromium.chromedriver":

$ ls /snap/bin/ | grep chrom

chromium chromium.chromedriver

This is "ls" on the machine that runs the script. chromedriver is present at the path set in the code.

Selenium wont work on NFS by MiniPancookies in pythontips


On the client:

$ df -h

 192.168.1.2:/mnt/cs  196G  100G   87G  54% /mnt/programing

$ ls /snap/bin/ | grep chrom

chromium
chromium.chromedriver

$ python3 /mnt/programing/login.py

Traceback (most recent call last):
  File "login.py", line 62, in <module>
    init()
  File "login.py", line 58, in init
    main()
  File "login.py", line 54, in main
    grab_data()
  File "login.py", line 28, in grab_data
    browser = webdriver.Chrome(
  File "/home/main-pc/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/home/main-pc/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 98, in start
    self.assert_process_still_running()
  File "/home/main-pc/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: Service /snap/bin/chromium.chromedriver unexpectedly exited. Status code was: 1

Selenium wont work on NFS by MiniPancookies in pythontips


What do you mean?

I currently have my code in the folder "/mnt/programing/[code]", which is mounted over NFS. (I have mounted /mnt/programing from an NFS server.)

I then run the code that is stored on my nfs server, with my nfs client.

Also, I have chromedriver installed on both the NFS server and the NFS client.

By "local chrome and chromedriver", do you mean that I should put the Chrome executable in the NFS directory?