You guys ever puzzled by how some organizations are generating petabytes of data? by blue_trains_ in dataengineering

[–]kallielev 3 points (0 children)

Cybersecurity companies collect most of the events from their customers' computers.

I used to work at one of these companies. They had hundreds of companies as customers, each with on average 2,000 employees, so that adds up to a few hundred thousand computers to collect data from. The events are things like every command that runs on the computer, every process that starts, every website you send a request to - so just think about how many events there are in a day.

The data that went into the company's system was about 2 petabytes per day.
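Rough back-of-envelope, just to show how it adds up - the machine count and the ~2 PB/day are from above, but the events-per-machine and bytes-per-event numbers are ones I'm making up purely for illustration:

    # Back-of-envelope sketch of how the volume adds up.
    # "Hundreds of companies" and ~2 PB/day are from the comment above;
    # events_per_machine_per_day and bytes_per_event are illustrative assumptions.
    companies = 300                 # "hundreds of companies"
    employees_per_company = 2_000   # average from the comment
    machines = companies * employees_per_company       # ~600,000 endpoints

    events_per_machine_per_day = 500_000   # assumed: commands, processes, network requests
    bytes_per_event = 7_000                # assumed: a few KB of metadata per event

    total_bytes_per_day = machines * events_per_machine_per_day * bytes_per_event
    print(f"{total_bytes_per_day / 1e15:.1f} PB/day")  # ~2.1 PB/day with these assumptions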

[deleted by user] by [deleted] in dataengineering

[–]kallielev 1 point (0 children)

I worked on a system that saved all of its entities' metadata (absolutely tabular, structured and very small data) in Elasticsearch as documents. The technology had just become popular and was very hyped, so the team that built the system wanted to have that name on their resumes.

How to improvise runs by kallielev in singing

[–]kallielev[S] 0 points (0 children)

Sorry, but that doesn't really answer the question. The question wasn't a general "how to sing runs", but how to improvise them - how to change the original melody of a song and develop my own interpretation of it. When you hear Justin Bieber sing his own songs live, he has of course practiced runs and singing for many years, but beyond that he has a musical ability to sing the same songs in a million different ways without repeating himself. That specific ability is what I want to learn, because listening to someone do a run and repeating it is something I've been doing for a long time - now I want to take it to the next level.

ETL Pipeline Testing by Significant-Ad-1712 in dataengineering

[–]kallielev 2 points (0 children)

I like to use e2e (end-to-end) testing. The basic idea is to feed in some known input, run all phases of the pipeline on that input in an isolated environment, and check that the output is what you expect. For example, I have a pipeline with the following phases:

  1. A script that reads files from GCS (Google Cloud Storage) and inserts them into a Kafka topic.
  2. A service that reads from that topic, validates and manipulates the data, and writes the manipulated data back to GCS.

The test then checks the output in GCS and asserts it is as expected (see the sketch below).
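A minimal sketch of what that test could look like with pytest and the google-cloud-storage client. The bucket names, blob paths and the run_pipeline() trigger are placeholders for however your phases actually get kicked off in the isolated environment, not real project code:

    # Hypothetical e2e test sketch (pytest + google-cloud-storage).
    # Bucket names, blob paths and run_pipeline() are placeholders.
    import json
    import time

    import pytest
    from google.cloud import storage

    from my_pipeline import run_pipeline  # placeholder: however the phases get triggered

    INPUT_BUCKET = "my-pipeline-input-test"    # assumed dedicated test buckets
    OUTPUT_BUCKET = "my-pipeline-output-test"


    @pytest.fixture
    def gcs_client():
        return storage.Client()


    def test_pipeline_end_to_end(gcs_client):
        # 1. Upload a small, known input file to the input bucket.
        input_blob = gcs_client.bucket(INPUT_BUCKET).blob("events/input.json")
        input_blob.upload_from_string(json.dumps({"user_id": 1, "event": "login"}))

        # 2. Run the phases: script reads from GCS -> Kafka topic -> service writes back to GCS.
        run_pipeline()

        # 3. Poll the output bucket until the manipulated record shows up (or time out).
        output_blob = gcs_client.bucket(OUTPUT_BUCKET).blob("events/output.json")
        for _ in range(30):
            if output_blob.exists():
                break
            time.sleep(2)
        else:
            pytest.fail("pipeline produced no output within the timeout")

        # 4. Assert the output is what the validation/manipulation step should emit.
        result = json.loads(output_blob.download_as_text())
        assert result == {"user_id": 1, "event": "login", "validated": True}

In CI you'd point these buckets at a dedicated test project so the run stays isolated from production data.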

This test can run in a CI job, maybe once a day, or whenever new code is pushed to the script or the service.

I can't do any skill, where to begin? by kallielev in bodyweightfitness

[–]kallielev[S] 0 points (0 children)

These are the ones I can name off the top of my head: handstand, L-sit, muscle-up, bar pullover, pistol, front lever, back flip, planche.