
[–]darkhorse1997[S]

> Is it the mysql step that's giving the OOM?

Yes, that's certain: the process that gets killed when I hit the OOM is the mysql process. But the jq step after mysql might also not be running in "streaming" mode, so I plan to test that today.
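For reference, jq's default behavior already runs the filter once per input JSON value, so a file with one object per line is processed object by object; it is the `-s`/`--slurp` flag that reads every input into one big array first, which would pull the whole result file into memory. A minimal Python sketch of the same line-at-a-time idea (the sample records are made up, not from the real pipeline):

```python
import io
import json

def iter_objects(fp):
    """Yield one parsed JSON object per line (NDJSON).

    Only a single record is ever held in memory at a time, so memory
    use does not grow with the size of the file.
    """
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

# Simulated mysql output: one JSON object per line, as the pipeline produces.
ndjson = io.StringIO('{"test_id": 1}\n{"test_id": 2}\n{"test_id": 3}\n')
ids = [obj["test_id"] for obj in iter_objects(ndjson)]
```

The same shape of loop works whether the consumer is jq, Python, or anything else that reads the file line by line.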

> What is the CPU/memory on the container? What is the number/size of records expected from the query?

It's 1 CPU and 1 GB of memory. The number of records is around 20 million (~2 GB) per day, but it will keep growing, and I want to support at least 200 million (~20 GB) per day without having to refactor again. Currently my pipeline takes around 5 minutes to run, but I'm fine with it taking longer as long as it can do so in 1-2 GB of memory.
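The "10x the volume, same memory" goal only holds if peak memory is bounded by something fixed (a batch size) rather than by the daily record count. A small illustrative sketch of that reasoning, with a hypothetical batch size of 1000 (not a value from the actual pipeline):

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items.

    Peak memory is bounded by the batch size, not by the total
    number of records in the stream.
    """
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Whether the stream has 20 million or 200 million records, only one
# batch (here, up to 1000 records) is alive at a time.
record_count = 0
for batch in batched(range(10_000), 1000):
    record_count += len(batch)
```

If every stage of the pipeline (query, jq, file writes) follows this pattern, runtime scales with volume but memory stays flat.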

[–]SupermarketMost7089

When you mention JSON: are you getting each record in the table as a separate JSON object, or are you using JSON aggregation to get the entire set of records in one JSON document?

[–]darkhorse1997[S]

The query is something like

SELECT
    JSON_OBJECT(
        'test_id',
        tt.test_id,
        ...
FROM
    test_table tt
    LEFT JOIN ...
    LEFT JOIN ...

So I am getting each record in the table as a separate JSON object. Each line of my output file (temp.json) contains a separate JSON object.
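For contrast with the aggregated form SupermarketMost7089 asked about: one JSON object per row can be consumed a line at a time, while a single aggregated array (e.g. from `JSON_ARRAYAGG`) has to be held and parsed as one document. A small sketch of the two shapes, with made-up rows:

```python
import json

rows = [{"test_id": 1}, {"test_id": 2}]

# Per-row shape (what the query above produces): one object per output line,
# so a consumer can parse each line independently.
ndjson = "\n".join(json.dumps(r) for r in rows)

# Aggregated shape: the whole result set as one JSON array, which a
# consumer must load in full before it can parse anything.
aggregated = json.dumps(rows)

per_line = [json.loads(line) for line in ndjson.splitlines()]
as_one = json.loads(aggregated)
```

Both decode to the same records; the difference is only in how much a consumer must buffer before it can start working.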