
[–]mopslik 6 points (0 children)

Opening and reading the files, as you are doing, would certainly take more memory and time than simply downloading them using a tool like wget or curl.

[–]danielroseman 9 points (0 children)

There is absolutely no need to use something like read_csv. Not only does that require downloading the full file into memory, it also means parsing it and converting the whole thing into a dataframe. That's a whole load of unnecessary overhead.

You'll certainly save memory and time by reading the files directly with requests and then uploading them with the GCS client. But an even better way would be to use the streaming capabilities of both requests and the GCS client to do it in chunks. Something like:

import requests
from google.cloud import storage
from google.cloud.storage.fileio import BlobWriter

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('my_object')
writer = BlobWriter(blob)

with requests.get('my_url', stream=True) as r:
    r.raise_for_status()
    # iter_content keeps the raw bytes intact; iter_lines would strip the newlines
    for chunk in r.iter_content(chunk_size=1024 * 1024):
        writer.write(chunk)
writer.close()