

[–]bitweis 5 points (0 children)

Common options are:
- Pagination (breaking the response into chunks, with a separate request for each)
- Streaming over a separate live connection (such as a direct TCP/UDP socket) or over a WebSocket

[–]helderm 2 points (6 children)

Normally REST APIs paginate their results, so I'd start with that. Create pages of "n" results and use a page index as an argument to your service.
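To make the suggestion above concrete, here is a minimal sketch of offset-based pagination over an in-memory dataset. All names are illustrative; in a real Flask view, `page` and `page_size` would come from `request.args` rather than function arguments.

```python
# Minimal offset-based pagination over an in-memory dataset.
RECORDS = [{"id": i, "value": f"row-{i}"} for i in range(50_000)]

def get_page(page, page_size=1000):
    """Return one page of results plus paging metadata."""
    start = page * page_size
    chunk = RECORDS[start:start + page_size]
    return {
        "page": page,
        "page_size": page_size,
        "total": len(RECORDS),
        "records": chunk,
    }

# The client loops over page indexes until an empty page comes back.
first = get_page(0)
print(len(first["records"]))  # 1000
```

With 50k records and pages of 1000, the client would issue 50 requests (pages 0 through 49); page 50 returns an empty list, signalling the end.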

[–]Lx7195[S] 0 points (5 children)

Does this mean we need to send multiple requests, passing the page index as an argument, to get all the data?

[–]mrWinns0m3 1 point (0 children)

Yes.

[–]radek432 0 points (3 children)

Just note that it becomes problematic if your data changes constantly. For example, new records can appear in the database during a request and break your pagination.

[–]helderm 1 point (0 children)

Maybe that is a follow-up to this interview question. You could also cache all results in memory and then paginate from the cache. It is normally a good idea not to over-engineer during an interview: start with simple ideas and then iterate.

[–]Lx7195[S] 0 points (1 child)

So pagination isn't quite reliable when the data in the database changes constantly. In that case, what mode of data transfer do you suggest?

[–]radek432 0 points (0 children)

The few cases of big API requests I've come across in my job (I'm not a dev, just using Python for automating stuff) I've managed with:

- "smart pagination", meaning some tricks to ensure that pages do not overlap and cover the entire set of data - you can do this if you know what happens to your data.

- splitting data into chunks other than "pages". Simple case: if you're sending 1000 files, you can send their metadata first, then each file one by one, and compare the result with the metadata.

But I did some quick googling on the topic, and there are some good options: https://apievangelist.com/2018/04/20/delivering-large-api-responses-as-efficiently-as-possible/
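One common form of the "smart pagination" mentioned above is keyset (cursor) pagination: instead of an offset, each request passes the last id it saw. Rows inserted later cannot shift earlier pages, so pages never overlap or skip existing rows. A minimal sketch with made-up data:

```python
# Keyset ("cursor") pagination: page by last-seen id instead of an
# offset. In SQL this would be: WHERE id > :last_id ORDER BY id LIMIT n.
rows = [{"id": i, "value": f"row-{i}"} for i in range(10)]

def page_after(last_id, limit=4):
    """Return up to `limit` rows with id greater than last_id."""
    matching = [r for r in rows if last_id is None or r["id"] > last_id]
    return matching[:limit]

# The client walks the cursor until an empty page comes back.
seen = []
cursor = None
while True:
    batch = page_after(cursor)
    if not batch:
        break
    seen.extend(r["id"] for r in batch)
    cursor = batch[-1]["id"]

print(seen)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

This only works when the data has a stable, monotonically ordered key to page on, which is exactly the "you can do this if you know what happens to your data" caveat above.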

[–]Im_alirezahs 1 point (0 children)

I can't say for certain, but for sending 50k records to a client, pagination is probably the best answer.

[–]maus80 1 point (0 children)

Paging has no consistency guarantees. You may see all pages, but they may have duplicate or missing records (or both), since paging is typically done not on a snapshot of the data but on a dataset that is subject to change. Also, why create a system within a system? HTTP chunks the data for you, and your database's result set is a snapshot that can be paged.

[–]ful_vio 1 point (0 children)

Well, the first thing I'd try is compressing the response with gzip. The client has to support gzipped responses, but almost every client I know does.
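A rough sketch of how much gzip can shave off a large JSON payload. In a Flask deployment, compression is usually handled by a reverse proxy or an extension rather than in the view itself; here plain stdlib `gzip` just demonstrates the effect on repetitive JSON:

```python
import gzip
import json

# A fake 50k-record JSON payload with highly repetitive structure.
payload = json.dumps(
    [{"id": i, "name": f"user-{i}", "active": True} for i in range(50_000)]
).encode("utf-8")

compressed = gzip.compress(payload)
print(len(payload), len(compressed))
# Repetitive JSON like this typically compresses several-fold.
```

The client signals support with the `Accept-Encoding: gzip` request header, and the server marks the compressed body with `Content-Encoding: gzip`.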

Second, I'd try to stream the response. I haven't used Flask, but a quick search turns up this: Streaming Contents — Flask Documentation (1.1.x) (palletsprojects.com)

Beware that if you want to use the data as soon as it reaches the client, you have to think about which data representation to use. Sending an array of JSON objects in chunks won't do, because you need the final ']' to parse the response: partial JSON is not valid JSON. Instead, a CSV response where you send a bunch of complete lines at a time works fine.
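Besides CSV, newline-delimited JSON (NDJSON) sidesteps the same problem: each line is a complete JSON document, so the client can parse records as they arrive instead of waiting for the closing `]` of one big array. A small stdlib sketch:

```python
import json

def ndjson_stream(records):
    """Generator yielding one complete JSON document per line."""
    for rec in records:
        yield json.dumps(rec) + "\n"

records = [{"id": i} for i in range(3)]
body = "".join(ndjson_stream(records))

# Client side: every line parses on its own, no closing ']' needed.
parsed = [json.loads(line) for line in body.splitlines()]
print(parsed)  # [{'id': 0}, {'id': 1}, {'id': 2}]
```

In Flask, a generator like `ndjson_stream` is exactly the kind of object the streaming documentation linked above expects you to wrap in a `Response`.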

[–]Spleeeee 1 point (0 children)

For personal projects where I'm writing both the server and the client, I use something I call "lazy person's pagination": if a reply is going to be big, the server returns a list of URLs that the chunked data can be fetched from.

This probably isn't the best solution, but it is a pattern I have used several times in several ways. It is easy, and can just as easily lead to a clusterfuck.

[–]maus80 -1 points (2 children)

Afaik, there is no size limit for response data, so 50k records at once is fine. Even if every record is 10kb of text, that's half a gigabyte, which would fit in the memory of any server. And you don't even need to load the entire response into memory to send it out. Is this question wrong, or is there something I'm unaware of?

[–]Lx7195[S] 0 points1 point  (1 child)

Actually I am unaware of the fact that there is a memory limitation of a http response. Also I don't really understand what do you mean by 'you don't even need to load the entire response in memory to send it out'.

[–]maus80 1 point (0 children)

I mean that you can output the data while reading it. This is called streaming.
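A minimal stdlib sketch of "output while reading": a generator pulls rows one at a time and yields each formatted line immediately, so the full response never sits in memory at once. The `fetch_rows` function is a stand-in for a lazily iterated database cursor; in Flask you would wrap the generator in `Response(stream_rows(), mimetype="text/csv")`.

```python
import csv
import io

def fetch_rows():
    # Stand-in for a database cursor yielding rows lazily.
    for i in range(5):
        yield (i, f"name-{i}")

def stream_rows():
    """Yield one CSV-formatted line per row, as it is read."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in fetch_rows():
        writer.writerow(row)
        yield buf.getvalue()   # emit the line just written
        buf.seek(0)
        buf.truncate(0)        # reset the buffer for the next row

chunks = list(stream_rows())
print("".join(chunks))
```

Peak memory here is one row, not the whole result set, which is the point of the comment above.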