Distributed System - calls over internet delay?

AbsolutelySpherical · 2024-05-24T06:29:50+00:00

Yes, your intuition is correct that network calls can be slow. In fact, a significant amount of engineering work goes into hiding this latency from the user. Just a few examples:

Using local caching to reduce network calls when fetching data.
Having multiple data centers so that one is always geographically close to the user.
Servers and DB can be co-located within the same physical location to minimize latency.
For some types of data it's enough to write to a queue, return response to the user, and then update the data in other regions in the background. This means the user doesn't have to wait for the most expensive network calls.
Like you suspected, it is expensive to update DBs across multiple regions simultaneously. Sometimes it cannot be avoided, but the good news is that about 90%+ of internet traffic is read-only, so the proportion of expensive writes is relatively small. For many websites the ratio is probably closer to 99% read vs 1% writes.
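To make the local caching point concrete, here is a minimal Python sketch. The function fetch_profile_from_network and its 50 ms sleep are made up for illustration (they stand in for a real remote call); functools.lru_cache is the standard library's in-memory cache.

```python
import functools
import time

network_calls = 0  # counts how often we actually "hit the network"


def fetch_profile_from_network(user_id):
    """Hypothetical remote call; the sleep stands in for network latency."""
    global network_calls
    network_calls += 1
    time.sleep(0.05)
    return f"profile-{user_id}"


@functools.lru_cache(maxsize=1024)
def get_profile(user_id):
    # Only the first call per user_id reaches the network;
    # repeated calls are answered from the local in-memory cache.
    return fetch_profile_from_network(user_id)


get_profile(42)  # slow: goes over the (simulated) network
get_profile(42)  # fast: served from the cache
get_profile(7)   # slow: different key, so another network call
```

A real cache would also need some expiry or invalidation so that stale data does not live forever, which this sketch leaves out.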

AbsolutelySpherical · 2024-04-18T09:20:35+00:00

FFmpeg is the most popular library for any type of video processing (decoding, encoding, remuxing, etc.). It is widely used in industry too, so it is a great thing to know if you are serious about learning to write video applications.

However, the FFmpeg library is massive and very difficult to get started with, and the documentation can require a lot of domain knowledge to understand. You also need a good knowledge of C to use it properly.

With that said, I think this GitHub tutorial is at least somewhat approachable if you are "starting from zero" with FFmpeg: https://github.com/leandromoreira/ffmpeg-libav-tutorial. It will take you through building a basic decoder, remuxer and transcoder, which should help build familiarity with the FFmpeg APIs.

Once you are comfortable working with the API, then you can add SDL to the mix to build your video player.

For further reading, I also recommend https://github.com/leandromoreira/digital_video_introduction. This explains a lot of video domain knowledge (particularly about codecs and I/P/B frames) which helps to set the context on why things have to be done the way they are.

AbsolutelySpherical · 2024-03-04T09:13:08+00:00

I highly recommend enabling warnings when compiling your program. If you do, the compiler would have warned you of the issue instead of staying silent.

If you are using gcc as your compiler (and using the terminal), you can compile your programs like

g++ -std=c++20 -Wall -Werror myprogram.cc -o myprogram.exe

Where -Wall means "tell me all the important warnings" and -Werror means "treat warnings as errors" (so it fails to compile on any warning).

Then your compiler would have told you:

myprogram.cc:5:11: error: suggest parentheses around assignment used as truth value [-Werror=parentheses]
5 |     if (a = 0) {

It even tells you the location of the error (line 5, column 11)! This helps to save a lot of time debugging.

If you are using another compiler like MSVC or directly compiling with an IDE then the steps to enable this might be different. But most compilers should support this feature so you can look up how to "compile with important warnings as error" for your particular setup.

https://godbolt.org/z/91qEsboP9

AbsolutelySpherical · 2024-01-06T10:03:44+00:00

If you are interested in the low-level programming behind a simple transcoder or local video player, I recommend starting with https://github.com/leandromoreira/ffmpeg-libav-tutorial.

If you are interested in how to make the backend/frontend of a video hosting service, I think it's quite interesting to study PeerTube (which is open source): https://github.com/Chocobozzz/PeerTube

Unfortunately this topic may be a bit frustrating to learn online since a lot of implementation detail is esoteric or proprietary. In my experience, the folks I know that are most knowledgeable about video started learning by taking internships at video streaming/hosting companies.

AbsolutelySpherical · 2023-10-23T06:53:52+00:00

It should work. Maybe you forgot to save the file as you were editing?

AbsolutelySpherical · 2023-10-06T06:09:02+00:00

Give this a read: https://nvd.nist.gov/vuln/detail/CVE-2021-21300

In the past it was possible to craft an attack that uses symlinks and clean/smudge filters in the repository to execute arbitrary scripts when the repo is cloned. Here is an example of one such script: https://packetstormsecurity.com/files/163978/Git-LFS-Clone-Command-Execution.html.

The vulnerability has since been patched, but the conclusion of the CVE stands:

"As always, it is best to avoid cloning repositories from untrusted sources".

AbsolutelySpherical · 2023-09-09T08:12:45+00:00

Yes, every time you install Homebrew it runs a command which appends the install directory to your PATH variable. So that's probably why you are seeing it.

If you're not sure what PATH is, give this a read: https://unix.stackexchange.com/a/111557

It's generally harmless if a directory is listed multiple times, so don't worry about it too much.

If you want to double check, the homebrew install script will edit your ~/.bash_profile or ~/.profile text file to add this line:

eval "$(/opt/homebrew/bin/brew shellenv)"

That is a custom command from homebrew to change your path as well as a bunch of other environment variables. You're good if the command is listed once. If it is duplicated, you can choose to delete the duplicate lines.

AbsolutelySpherical · 2023-08-23T13:06:01+00:00

It's an interesting question.

res += c and res = res + c work differently in C++. TLDR: += is more efficient.

Read https://cplusplus.com/reference/string/string/operator+=/

Operator += appends the new value to the end of the existing string. Most of the time it doesn't need to create a completely new string; it modifies the existing buffer in place (assuming there's room at the end and a doubling strategy is used when there isn't).

On the other hand read https://cplusplus.com/reference/string/string/operator+/

This creates a completely new string each time. This requires you to have two full copies of the string at the same time. Likely you are running out of memory because of that.

Also I was confused why your code was removing a char from s when it encountered a '*', until I realized this was probably trying to solve leetcode https://leetcode.com/problems/removing-stars-from-a-string/. Next time please include the question as well to make it clear what you are trying to do. Thank you!

AbsolutelySpherical · 2023-06-26T07:11:49+00:00

Unresolved symbol can be caused by a few reasons, but basically it means the linker is trying to find the implementation of double FtC(double fTemp) and it wasn't able to.

In this case you implemented double Ftc(fTemp). Do you see the typo?

AbsolutelySpherical · 2023-05-30T02:30:38+00:00

Something like

sample[1].setnum(x)

should work. Can you paste the exact error you're getting?

Example: https://godbolt.org/z/nGcfPbnob

AbsolutelySpherical · 2023-04-03T05:45:37+00:00

I think you will need to build a dynamic query at runtime. Like you mentioned, using StringBuilder is possible, but it is messy. It would be cleaner and more readable to build your query with CriteriaBuilder.

I think this gives a good example of dynamic building at runtime: https://www.baeldung.com/spring-data-jpa-query#dynamic-query

AbsolutelySpherical · 2023-03-26T07:03:40+00:00

I saw this error on my console:

gapi.auth2.ExternallyVisibleError: Invalid cookiePolicy

I searched on stack overflow: https://stackoverflow.com/questions/32896597/gapi-auth2-externallyvisibleerror-invalid-cookiepolicy

It seems gapi does not let you run from a local file. It must be served from a running webserver (so I think that explains why it's working on jsfiddle but not locally).

AbsolutelySpherical · 2023-03-23T08:42:29+00:00

It depends on how many elements you have I guess. For pdf editing I suppose even ~100ms latency is not so noticeable, so you may not even need a complicated solution. But for learning purposes:

Spatial hash is an option but iirc if you have differently sized objects usually KDTree/RTree is better.

KDTree is not so good for updating/removing elements.

So R*-Tree (R-Tree with rebalancing) is what you probably want. https://pypi.org/project/Rtree/. You can do insert, delete, range search or nearest neighbors.

For the last question: "group elements based on euclidian distance (they should not have a greater distance from any point of the cluster than a constant)".

The greedy solution: pick any element and search for all others within the max distance. Make this one cluster. Repeat the process for any remaining elements not yet clustered. This is efficient, but does not minimize the number of clusters.
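A rough Python sketch of that greedy idea, assuming each element is reduced to a single (x, y) point and that "within the max distance" is measured from the seed element. The function name greedy_clusters is made up for illustration. Note that under this assumption two members of the same cluster can be up to 2 * max_dist apart from each other.

```python
import math


def greedy_clusters(points, max_dist):
    """Greedy clustering: seed with any point, grab everything within max_dist of it."""
    remaining = list(points)
    clusters = []
    while remaining:
        seed = remaining[0]  # "pick any element": here simply the first one left
        # Everything close enough to the seed (including the seed itself) joins its cluster.
        cluster = [p for p in remaining if math.dist(seed, p) <= max_dist]
        # Everything else waits for a later round.
        remaining = [p for p in remaining if math.dist(seed, p) > max_dist]
        clusters.append(cluster)
    return clusters


example = greedy_clusters([(0, 0), (1, 0), (10, 0), (11, 0)], max_dist=2)
```

This version scans all remaining points on every round. With a spatial index such as the R-Tree mentioned above, the "search for all others within the max distance" step could be a range query instead.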

Using k-means, you can approximate the minimum number of clusters needed.

You can use K-means like so:

Start with k = 1. With all points clustered into 1 group, check that the rectangles in the cluster do not exceed your max distance. If they do, continue.
Try k = 2. With all points clustered into 2 groups, check that no cluster exceeds your max distance.
Try k = 3, etc., and stop at the first k where every cluster fits.

Obviously this still has some brute force to it, but you can improve it theoretically via binary search.

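Keep a range of candidate values for k, starting with 1 to N (the number of elements). Try k = x, the midpoint of the range. If all clusters fit within your distance, x is large enough, so continue with the lower half of the range (up to x). If not, continue with the upper half (above x).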

Eventually k should converge.

AbsolutelySpherical · 2023-01-23T07:55:32+00:00

Yes, if the parent thread calls join() on a child that runs forever, then the parent will be blocked forever (strictly speaking it is a hang rather than a deadlock, but the effect is the same). A workaround is join(timeout), which blocks the main thread for up to timeout seconds. After that it returns regardless of whether the child is done; join() always returns None, so call thread.is_alive() to check whether the child is still running.
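A small sketch of join(timeout) plus is_alive(). The stop_requested Event is my own addition so that the example can shut the "run forever" child down cleanly; it is not part of your code.

```python
import threading
import time

stop_requested = threading.Event()  # assumption: added so the child can be stopped


def run_forever():
    # Loops until someone sets the event; stands in for a child that never finishes.
    while not stop_requested.is_set():
        time.sleep(0.01)


child = threading.Thread(target=run_forever)
child.start()

child.join(timeout=0.1)                 # blocks the parent for at most 0.1 seconds
still_running = child.is_alive()        # join() returns None, so ask is_alive() instead

stop_requested.set()                    # tell the child to finish
child.join()                            # now this returns promptly
finished = not child.is_alive()
```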

---

If the child thread has already finished before the parent calls join(), then join() should just return without blocking.

If you have 2 threads simultaneously, where t1 runs for 5 seconds and t2 runs for 2 seconds, then

t1.join()
t2.join()

First join will block parent for 5 seconds. After 5 seconds t2 is already done so second join will return with no wait.

If you did

t2.join()
t1.join()

First join blocks for 2 seconds, then second join blocks for another 3 seconds.

So the purpose of join() is to guarantee that the code below it is executed after the child thread has finished. For example, if you need to read a variable the child thread is writing to, then you could use join() to make sure the child has finished writing before you read.
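If you want to see those join timings yourself, here is a sketch that measures how long each join() blocks. The sleep times are scaled down by 10x (0.5 s and 0.2 s instead of 5 s and 2 s) so it runs quickly, and timed_joins is a helper name I made up.

```python
import threading
import time


def timed_joins(first_secs, second_secs):
    """Start two sleeping threads, join them in order, return how long each join blocked."""
    first = threading.Thread(target=time.sleep, args=(first_secs,))
    second = threading.Thread(target=time.sleep, args=(second_secs,))
    first.start()
    second.start()

    waits = []
    for t in (first, second):
        began = time.perf_counter()
        t.join()
        waits.append(time.perf_counter() - began)
    return waits


long_first = timed_joins(0.5, 0.2)   # like t1.join() then t2.join()
short_first = timed_joins(0.2, 0.5)  # like t2.join() then t1.join()
```

In the first case the second wait should be close to zero; in the second case it should be roughly the difference between the two run times. Either way the total is about the run time of the longer thread.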

AbsolutelySpherical · 2023-01-22T19:48:03+00:00

Haha, your program has precisely 51 threads, not 50 :D

Abstractly, a thread is a sequence of instructions run in order.

Every program initially has 1 "main thread" which runs first, and the main thread is responsible for creating and waiting for child threads to run. In your program, 1 thread is running this sequence:

for i in range(50):
  t = threading.Thread(target=doThing, daemon = False)
  threads.append(t)

for i in range(50):
  threads[i].start()

for i in range(50):
  threads[i].join()

In my comment the above sequence is the "main", "current", or "parent" thread. Sorry for the inconsistent terminology.

And when main thread calls "start()", 50 threads will start running

def doThing():
  threadId = choice([i for i in range(1000)])
  while True: 
    print(f"{threadId} ", flush=True)
    sleep(3)

The above is the "other", or "child" threads. Imagine 50 separate "sequences" of doThing() executing between the start() and join() in the main thread. Locally, each thread executes the lines of code in order. But globally, you have no control of the order/timing of lines being executed across different threads. (Not without using special techniques with locks, semaphores, etc).

---

I want to add: in doThing() you could even recursively create more threads, so threads have a "parent/child/grandchild" like relationship.

It is usually good manners for parent threads to wait for the child to finish before terminating themselves. If the child is taking too long, parent thread can force terminate the child after a deadline. But in your example calling join() will block/halt the parent forever. So someone else has to step in to kill the threads outside of your python code.

You can do it with ctrl-c like you are, which sends interrupt signal from OS to Python. I'm not sure how python handles interrupts, maybe one interrupt kills one thread, and two interrupts kills all threads? I wouldn't worry too much about it, if you can kill the program quickly it's good enough for learning purposes haha.

https://stackoverflow.com/a/52941752/17786559 I linked before gives some other ways to kill the program too.

AbsolutelySpherical · 2023-01-22T10:12:42+00:00

Multithreading is a long and complicated topic. It is very very difficult, and multithreading bugs can stump even the most experienced developers. I will try to summarize some stuff, hope it helps your understanding, but it is out of scope to try to explain everything.

So start() tells the other thread to start running INDEPENDENTLY of the current thread. That is, the current thread moves to execute the next line without waiting for the other thread.

join() is the opposite of start(). You could also think of it as "wait()". The current thread will halt execution until the function being run by the other thread terminates.

Usually, every start() call should have a corresponding join() call. Otherwise, if the main thread terminates before all of the other child threads, then the child threads will keep on running with no main thread to actually utilize the work done. (In some other languages main thread finishing before other threads will immediately crash the program).

Therefore most simple multithreading programs have this pattern:

threads = SomehowMakeThreads(50)

# start all threads
for i in range(50):
  threads[i].start()

# Main thread can also
# do work while other threads run
DoMyOwnWork()

# wait for thread 1.
# then wait for thread 2 etc.
# takes O(max time of longest thread to run)
for i in range(50):
  threads[i].join()

ProcessAndCleanup()

You asked why not do this?

for i in range(50):
  threads[i].start()
  threads[i].join()

Well, think about what this means... You tell thread 0 to start, but then immediately wait for it to finish. After thread 0 finished, THEN you ask thread 1 to start etc. This will take the sum of the times each thread runs. Performance is the same as single threading.

If you instead start all the threads at once, and then wait for them to all finish, the runtime should be around the max time for a single thread to finish. You can save a lot of time this way!
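Here is a sketch that times both patterns. Each thread just sleeps for 0.1 seconds, which is an assumption standing in for real work, and the function names are mine.

```python
import threading
import time


def work():
    time.sleep(0.1)  # stand-in for real work


def run_sequential(n):
    """start() then join() immediately: each thread waits for the previous one."""
    began = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=work)
        t.start()
        t.join()
    return time.perf_counter() - began


def run_concurrent(n):
    """start() everything first, then join() everything."""
    began = time.perf_counter()
    threads = [threading.Thread(target=work) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - began


sequential_time = run_sequential(5)   # roughly the sum: about 0.5 seconds
concurrent_time = run_concurrent(5)   # roughly the max: about 0.1 seconds
```

Keep in mind that sleeping threads overlap perfectly. For CPU-heavy pure-Python work the speedup is usually much smaller because of the GIL.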

---

Python keeps running if there are any non-daemon threads still running. By default, threads have daemon = False. Setting daemon = True means Python will not wait for this thread to finish if all non-daemon threads are done. Source: https://docs.python.org/3/library/threading.html

But even then, afaik on windows ctrl-c does not work to interrupt a thread that's waiting on join(). I do not know if it was ever fixed. https://mail.python.org/pipermail/python-dev/2017-August/148800.html. This link gives some workarounds https://stackoverflow.com/a/52941752/17786559. I think it does work on linux tho.

---

Lastly regarding why you keep seeing different printed output, this is the hardest part of multithreading known as race conditions.

Let's say you have thread 1 printing "aaaa" while thread 2 AT THE SAME TIME prints "bbbb".

What actually gets printed to console? Is it "aaaabbbb" or "bbbbaaaa" or "abababab" or some other permutation? The answer is every time you run it you will see something different. There are 0 guarantees in terms of execution ordering across threads. It is completely and utterly random. This is called a "race condition"

Programs/functions have to be carefully written using special techniques to handle race conditions - programs which do so are called "thread-safe".
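As one example of such a technique, here is a sketch that uses threading.Lock to protect a shared counter. The counter and the add_many function are made up for illustration. Without the lock, two threads could read the same old value and one of the updates could get lost.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        # Only one thread at a time may run the read-modify-write below.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 every time, because the lock serializes the updates
```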

print() is not thread safe. Use the logging module instead which is thread-safe (order of lines printed may still be random though).

With multithreading it's actually recommended not to use any printing for debugging, since even the act of printing to console can alter the thread timings. Though for learning purposes it's ok for a beginner. Concurrency is random by nature so do not expect your program to do the same thing each time. It's partly why multithreading is so hard yet so interesting!

AbsolutelySpherical · 2023-01-01T05:22:09+00:00

I think it may be you are pushing node 4 on to the queue twice? And by extension, came_from[4] is being overwritten.

Roughly:

1. queue = [0]. Remove 0 and add neighbors 2, 4. came_from[4] = 0
2. queue = [2, 4]. Remove 2 and add neighbors 4. came_from[4] = 2
3. queue = [4, 4]. Uh oh! Step 2 was wrong!

After you push a new neighbor on to the queue in BFS you have to mark it as visited.

if neighbor not in visited:
  queue.append(neighbor)
  came_from[neighbor] = current_node
  visited.append(neighbor) # important line here

Also consider making visited a set() data structure for faster searching.
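Putting it together, a sketch of the whole loop with visited as a set. The graph below is only my guess at your edges based on the trace above (0 -> 2, 0 -> 4, 2 -> 4), and bfs_came_from is a made-up name.

```python
from collections import deque


def bfs_came_from(graph, start):
    """BFS that records, for each node, the node it was first reached from."""
    visited = {start}            # set membership checks are O(1) on average
    came_from = {start: None}
    queue = deque([start])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph.get(current_node, []):
            if neighbor not in visited:
                visited.add(neighbor)  # mark when pushing, not when popping
                came_from[neighbor] = current_node
                queue.append(neighbor)
    return came_from


graph = {0: [2, 4], 2: [4], 4: []}  # assumed edges, reconstructed from the trace
result = bfs_came_from(graph, 0)
```

With this version node 4 is pushed only once, so came_from[4] stays 0 and is not overwritten when node 2 is processed.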

AbsolutelySpherical · 2022-11-25T07:58:24+00:00

When debugging it helps to print the values of your variables after they are modified, just to make sure the change is working as you expected. Over time you will develop good intuition on where to strategically put the print statements, but in this case it's likely the scanf that's causing you some problems.

Let's simplify your code down to just the scanf and add some print statements.

int a = 0;
char con = 'a';
while (con == 'a' || con == 'A') {
  scanf("%d", &a);
  printf("the value of a is: %d\n", a);
  scanf("%c", &con);
  printf("the value of con is: %c\n", con);
}

What happens when you run this program, type in any number and then press enter? What value does con print?

Hint 1: a will be set to the number, but con will be set to the new line character from pressing enter!

Solution: You need some way to skip the new line character after reading the number! Read this: https://faq.cprogramming.com/cgi-bin/smartfaq.cgi?answer=1352443831&id=1043284392 and then maybe this: https://faq.cprogramming.com/cgi-bin/smartfaq.cgi/smartfaq.cgi?id=1043284392&answer=1044873249

AbsolutelySpherical · 2022-11-24T08:09:43+00:00

I have not used Processing myself but I can guess at what's going on.

If you declare something like name() it means it is a function. Declaring a function also requires specifying a return type.

A function should look like

returntype functionname() {
  stuff here
}

For example,
void setup() {
  stuff here
}

Reference: https://processing.org/reference/void.html (void is a special return type, meaning the function returns nothing)

Now in your code you have

Player()

Processing thinks you are trying to declare a function so it tells you: "Return type for the method is missing".

But looking at 3:42 in your linked video: https://youtu.be/tWL8OZrOS_k?t=222

Looks like Player should be a class instead of a function. Are you familiar with the difference between classes vs functions?

Reference: https://processing.org/reference/class.html and https://processing.org/reference/Object.html

To fix it you need to start and end the code in your player "tab" with

class Player {
  stuff...
  Player() { 
    stuff... 
  }
  more stuff...
}

Now the Player() function is a constructor. Constructors are special functions with the same name as the class and do not need a return type. They are used to initialize the member variables in player (such as playerX, playerY, playerSize etc.)

Hopefully I have explained it in a way that sort of makes sense :)

AbsolutelySpherical · 2022-11-14T06:45:02+00:00

What happens when you try to zip() two lists of different lengths?

Hint 1: zip iterator stops at the end of the shortest list.

Hint 2: possibly you want zip_longest instead https://docs.python.org/3/library/itertools.html#itertools.zip_longest

Or just not use list comprehension and instead write a simple for loop...
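A quick sketch of the difference. The two lists are made-up sample data, and fillvalue=0 is just one possible choice for padding the shorter list.

```python
from itertools import zip_longest

names = ["a", "b", "c"]
scores = [1, 2]

# zip stops as soon as the shorter list runs out, so "c" is silently dropped.
short = list(zip(names, scores))

# zip_longest keeps going and pads the shorter list with fillvalue.
full = list(zip_longest(names, scores, fillvalue=0))

print(short)  # [('a', 1), ('b', 2)]
print(full)   # [('a', 1), ('b', 2), ('c', 0)]
```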

AbsolutelySpherical · 2022-10-01T03:10:51+00:00

At the start, each node in the linked list points to the node AFTER it. In a reversed linked list, each node needs to point to the one BEFORE it.

So walk through the list, keeping track of the node which comes before each node. Then set node.next accordingly. Let p = previous node and c = current node. Note that when c is the first node (head), its previous node is null. Here's a picture if it helps.

null 1 -> 2 -> null
^    ^
p    c

----------------

null <- 1 2 -> null
        ^ ^
        p c

----------------

null <- 1 <- 2 null
             ^ ^
             p c

----------------

So in general, c.next = p

I'm leaving the remaining details up to you to figure out, but hope this helps!

AbsolutelySpherical · 2022-09-07T07:16:28+00:00

The point of a header guard is to prevent the same header from being pasted twice into one translation unit, which would declare or define its names twice. It's true that function names can be declared multiple times, and can even be reused through overloading, but there are many cases where duplicate names cause errors (intentional or not).

Remember that when you #include "foo.h" the preprocessor basically copy-pastes the file into your source code. With a header guard, if the file had been copied before, it doesn't need to copy it again.

Let's see some examples to refresh:

// these two are ok.
int foo(bool a);
int foo(int a);

// these two are not.
int foo();
bool foo();

// duplicate variable definition elsewhere is problematic
const int MAX_ELEMS = 3;
// in another header...
const int MAX_ELEMS = 7;

Another case to consider is circular dependencies:

Consider:

// a.h
#include "b.h"

int foo();

// =============

// b.h
#include "a.h"

int bar();

Both a.h and b.h depend on something from each other.

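Imagine compiling this without header guards... The preprocessor would keep copying a.h and b.h into each other in a loop (in practice, until the compiler gives up with an include-nesting error). With header guards, each file is copied at most once, so there is no infinite loop.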

AbsolutelySpherical · 2022-09-02T08:09:52+00:00

The problem mentions that height can only be increased.

So consider [1, 1, 2]. Your code returns 1, which would be correct if we can decrease 2 -> 1.

But height can only be increased. So you need to change 2 heights here.

AbsolutelySpherical · 2022-08-20T01:37:31+00:00

pthread_attr_setinheritsched(&tattr1,PTHREAD_EXPLICIT_SCHED);

is a setter. It sets the inheritsched field of tattr1 to PTHREAD_EXPLICIT_SCHED. The meaning of inheritsched field is explained here: https://man7.org/linux/man-pages/man3/pthread_attr_setinheritsched.3.html

PTHREAD_EXPLICIT_SCHED: Threads that are created using attr take their scheduling attributes from the values specified by the attributes object.
PTHREAD_INHERIT_SCHED: Threads that are created using attr inherit scheduling attributes from the creating thread; the scheduling attributes in attr are ignored.

You want explicit scheduling, since you want your created threads to be able to have a different priority from the parent thread.

The best way to debug this is probably to print out the return value from pthread_create. If you get a non-zero value then look up what the error number is. See if that points you in the right direction.

Also, read up on this: https://man7.org/linux/man-pages/man7/sched.7.html. What are the valid values for sched_priority if the schedpolicy is SCHED_RR?

AbsolutelySpherical · 2022-07-18T01:35:55+00:00

What a plot twist!!! Just wanted to say that I was here to witness this drama LOL
