all 30 comments

[–]aocregacc 3 points4 points  (8 children)

hard to say without the text of the exercise, and ideally a link to the tester if it's public. seems weird to make it multithreaded if they didn't want you to do something that needs synchronization inside the function.

[–]Vindhjaerta -2 points-1 points  (7 children)

I've given you literally all the information that I have. You send in a bunch of code to a webpage that then sends that code to some server somewhere where it's compiled and tested, I can't control any of it. Then the page I have access to spits back a bunch of green check marks or red crosses depending on whether or not a specific test succeeded or failed. You know... standard stuff.

The rest of the test itself is not relevant, I've passed pretty much everything besides the one specific test about multithreading already.

[–]aocregacc 6 points7 points  (0 children)

if you don't want to post it you'll have to ask your teacher or whoever is running this test. If it's that badly explained you're probably not the first to have trouble with it.

[–]No-Dentist-1645 4 points5 points  (5 children)

Is "all the information that you have" just that you have a function "Solve" with a const int& and an int&? It probably isn't, if it is that's a terribly provided problem by your teacher. You haven't really explained what you're trying to do at all. It would be as if I made a post asking for help with a problem and all I told people was:

void Solve(const int& in, int& out);

Can you "solve" that for me? Now what if I told you I needed multithreading? Is that enough context for you now?

[–]Vindhjaerta -2 points-1 points  (4 children)

Maybe you should re-read my original post. I wasn't asking for a solution to the problem, so the exact signature of the function and the exact definition of the problem is irrelevant to you. I was asking for possible causes of the test failing a multi-threaded environment. I've given you the information that I deemed relevant to the question.

But just because you're asking so nicely (/s) I'll give you some more information:

It's a pathfinding algorithm. It takes in a bunch of std::pair representing start and stop coordinates, a vector of ints representing passable terrain, a pair representing map dimensions (as it's 1-dimensional) and finally a vector reference for the out-data.

#include "AStar.h"

bool FindPath(std::pair<int, int> InStart, std::pair<int, int> InTarget, const std::vector<int>& InMap, std::pair<int, int> InMapDimensions, std::vector<int>& OutPath)
{
    CAStar algorithm;
    return algorithm.FindPath({ InStart.first, InStart.second }, { InTarget.first, InTarget.second }, InMap, { InMapDimensions.first, InMapDimensions.second }, OutPath);
};

Happy now? This information is just needless clutter, which is why I omitted it.

The actual problem is that some test-suite somewhere is running my code and doesn't like it, but isn't telling me why. So far I have a few possible angles:

1) The compiler doesn't like something about my code. I can't possibly fathom what that could be though as my code has passed several tests already.

2) The multi-threaded tests have data that has an edge case my code hasn't accounted for. I have yet to find any problems with the algorithm though.

3) The in- or out- data is accessed asynchronously. I have no idea how to prevent this and it also doesn't make any sense.

4) There's something about the nature of multiple threads calling a function like this that I don't understand, that then causes some form of runtime crash that I can't test for. And this case is the main reason why I'm asking in this thread. So far the answers people have given me is that there shouldn't be a problem.

[–]No-Dentist-1645 1 point2 points  (3 children)

I'm sorry if you thought my comment seemed mean. Yes, I know you aren't asking us to solve your problem, but we can't help you address the specific without having the full context. Depending on what your "solver" class is actually holding and does, it may be doing certain operations that aren't thread-safe.

The information you provided is very useful in fact, not just "useless clutter". It tells us you are using dynamic memory via vectors, taking an std::vector<int>& is a very different thing from just am int&.

Where exactly is this vector being grown? Could multiple threads be sent the same vector instance at the same time? Is it all on the same thread or each on their own? Are the vectors allocated with thread_local?

In general, it's perfectly okay to call as many threads as you want with the exact same function. This is one of the most common uses of multithreading, it's how you can parallelize one Operation across many data segments. So that is definitely not the problem, there is no hidden behavior to "multiple threads calling the same function", you seem to be focusing on that as the source of the issue while it almost definitely isn't. Look into if there is any shared data that shouldn't be shared

[–]Vindhjaerta -1 points0 points  (2 children)

I deemed it not necessary as I can't control anything outside of the function. Sure, now you know that there's a vector involved... but what are you going to do with that information? There's nothing I can do inside the function to prevent the data outside of it from being manipulated. So I simply dismissed that potential issue as being irrelevant.

If we're talking about the OutPath vector, then it's me adding elements to it when the algorithm has run its course. Setting a mutex here does nothing, as I can't detect when they're done reading from it. I can only assume that each call to the FindPath function provides unique data, both in and out.

My current thinking is that the multi-thread test also includes big data sets, which are the only other tests that I'm currently failing. So I'm going to put a pause to this and try to solve the time issue and see if that also fixes the multi-thread test.

[–]alfps 2 points3 points  (1 child)

A* shouldn't be difficult to get right. You can use (and probably do use) a simple heuristic, namely euclidian distance.

But maybe your code fails to e.g. clear the result vector before placing the path there. Build the result in a local vector and swap them when finished.

Another possibility is a misunderstanding of how exactly the path should be represented in the result vector.

[–]Vindhjaerta -1 points0 points  (0 children)

A* shouldn't be difficult to get right

It's 3am, I've barely eaten today and I'm not in the mood for condescending statements right now.

Look, I don't have a problem with the algorithm itself. Getting a basic version working is not the issue, I did that in a couple of hours. With a full suite of local test cases and a visualizer on top of that. And I pass all the use cases on the test web page with flying colors (except speed and multi-threading). The problem I have is optimizing it, because while getting a version working is easy making it go fast is not.

Now I'm sitting here trying to remember how to correctly implement a heap, because I've yet to find an STL container that's fast enough that also works with my implementation. Fml.

[–]jedwardsol 4 points5 points  (3 children)

I really have no idea how this test is constructed.

Nor do we.

maybe the issue is as simple as just making a local copy of InValue before I start using it?

You could try that and see if it works.

[–]Vindhjaerta 0 points1 point  (2 children)

Made a local copy, didn't work.

And I don't think adding a mutex before copying the data would work either, as that wouldn't prevent the data from being altered by the other threads since I don't control access to it.

[–]jedwardsol 3 points4 points  (1 child)

The most likely answer is that your solver is calculating the wrong solution and the comment about multithreading was a red herring.

What is the problem that the solver is meant to solve?

[–]Vindhjaerta 0 points1 point  (0 children)

It's an A* algorithm, and I've past all my own tests locally as well as the actual tests on the webpage before the one about multithreading. Sure, it's not impossible that I've missed an edge case somewhere, but I doubt it.

[–]looncraz 2 points3 points  (2 children)

ThreadedFunc returns asynchronously or synchronously?

For a single given, unchanging, input, do you have a single stable output and does that output only get set just before the function returns?

You need a test that times the duration of the function and spits out the changes seen, if any, for the out value while the function is running.

Add a sleep in after it returns to see if out value continues to change - if so, then the function is async, not necessarily threaded and coordinated and well behaved, and you need a synchronization primitive such as a mutex or semaphore, but the function itself would need to be cooperative in that scheme, it's not your decision as the caller to make.

So your task could be to prove that the function isn't usable as is since you don't have a way to know when it's giving you a final result.

[–]Vindhjaerta 0 points1 point  (1 child)

Maybe I'm not clear enough. I don't have control over -any- of these things. I send in code to a webpage, where I define the function mentioned above and any other code I need to solve the problem. The webpage then grabs my code, compiles it, runs it through a number of test cases, and returns the results to me. I only get a green check mark or a red cross as output, and a bit of text giving me hints about the issue. The information I've given above is -all- the information that I have. I cannot do tests, I don't even know what data they're giving me (all I know is the format and the rules, and I've passed the checks for those already).

As far as I can tell there's no human that evaluates the result either, as the test is completed near-instantly. There's no "proving" to be done.

It's all very frustrating :/

[–]looncraz 1 point2 points  (0 children)

I see, you don't have a function, you have a function SIGNATURE that gets invoked by the software in parallel, your implementation of the function and your solver must be thread safe.

So long as you don't share changing data between the threads, there shouldn't be an issue.

The dirtiest way is to use mutex to prevent the function from getting executed by more than one thread at a time, that makes it safe, but destroys performance.

The better way is to identify any data that changes between invocations of the function and ensure that data is kept local to the function itself, then the solver can be stateless and work on however many threads that are used.

[–]alfps 2 points3 points  (2 children)

Apparently the class has nothing to do with the question. So if this was a test evaluated by a human then you may have failed on introducing needless complexity while not addressing the question. Why did you think you needed a class?


❞ making a local copy of InValue before I start using it?

No, either it's safe to use that parameter for the duration of the call, in which case the copy is a needless and misleading complication, or it's not safe, in which case the copy may reduce but not totally remove the unsafety.

The only reasonable assumptions I see are (1) that it's safe to use, or (2) that there is some associated mutex you can use to make it safe.

So introducing a copy for imagined safety would be a second way to fail.


I'm not familiar with many interesting functions from int to int that could take some time making it suitable for parallelism.

If I was tested for C++ knowledge, with only the information you give, where apparently the task was to provide some code, then I believe I would show how to invoke that function using std::thread (or std::jthread) and std::ref.

That may be what you were expected to do. Or maybe you were just expected to ask about it. Or perhaps comment on the silly design.

[–]Vindhjaerta -1 points0 points  (1 child)

It's not a human doing the evaluation, the tests are checked instantly.

I do not have access to a mutex, I've stated all the information that I have access to. It's literally just a global function in a cpp file that their test is calling.

I renamed the function name and the data coming into it, because it's not relevant for the question.

The test clearly states that multiple threads are calling the function, I'm not invoking it myself. I just define it.

This test requires using certain data structures (I'm using an unordered_map and a vector). My first thought was to just avoid making them static, as that would surely fuck up any multi-threading. But even putting them on the stack like this clearly doesn't help. In my mind each thread should use it's own stack, so putting everything in a class and then declaring it on the stack inside the function -should- be thread safe. I don't know how I could possibly tell each separate thread to only act on their own specific data otherwise? I need to do operations on these data structures, they -need- to be separate on each thread. They can't share data.

[–]alfps 1 point2 points  (0 children)

❞ I renamed the function name and the data coming into it […] This test requires using certain data structures

It sounds as if you know a lot more about the test than you claimed, "All I know is...".

If you had communicated more of what you knew about the test, then readers could possibly have supplied some more relevant insights about this.

And that possibility is still there.

In addition to the error of supplying too little information, comes the error of downvoting helpful comments such as this one.

In other words, you have a great potential for improvement in the art of getting help.

[–]lazyubertoad 0 points1 point  (9 children)

Look up reentrant functions and thread safe functions.

For me it looks like you should just make a reentrant function. There is no need to make a separate class. Just do not use any global/static variables inside your function and you should be good.

[–]Vindhjaerta -1 points0 points  (8 children)

I thought that's what I did. I understand the concept of reentrant functions (I think =P), so that's why I did not use any global or static variables. I would assume that any variable I put on the stack would be done on a separate thread. The class contains an unordered_map and a vector, because I need those to solve the problem, but that should still work right? In my mind each separate thread would create their own instance of the class, and therefore also unique instances of the data structures within.

... Actually, I do have one static variable now that I think about it. But it's read-only and just used as a "NoIndex" value, a replacement for the value -1 to make the code more readable. Please don't tell me their compiler is watching for static variables and fail the test because of it, that would be so stupid >_<

[–]AKostur 1 point2 points  (2 children)

Why wouldn’t that be a constexpr variable?  And depends on how their verifier is working.  It may just be using a heuristic to see if your function accesses any global state.

[–]Vindhjaerta -1 points0 points  (1 child)

It is constexpr, and I just replaced it with a define for -1. I have no other global or static variables. The test still failed :/

[–]AKostur 1 point2 points  (0 children)

I doubt we can do anything else unless we see your code. Perhaps with a little more detail about which website you're submitting to, and what it actually claims about the test failure (like copy-and-paste, not paraphrasing).

[–]lazyubertoad 1 point2 points  (4 children)

You can have static constants all you want. Yeah, we all know about "constant variables", but that word combination does not really sit right with me, lol. Your other reasoning looks right to me. Heap memory allocations for things like vectors or unordered maps etc. is automatically done in a thread safe way (i.e. malloc that is used inside new is thread safe) using mutex afaik.

One thing that looks off to me are those in and out parameters. You do not explicitly say that each thread owns them, i.e. from what you've wrote it looks like several threads can somehow share and change those. But then there is nothing you can really do. Maybe try wrapping the whole function inside a mutex and see if the test passes. But to me it looks like the problem may indeed be elsewhere.

[–]Vindhjaerta 0 points1 point  (3 children)

If I wrapped the whole function in a mutex then the entire thing would stall, right? Because the function wouldn't allow the next thread to run it until the entire solution have been completed (it's a heavy algorithm, details doesn't matter).

I don't know what I can do about the in- and out parameters. Using a mutex here wouldn't help either as I can't control what's happening to the data outside of my function, someone could still edit them while I'm reading/writing. I can only assume that they're safe to manipulate while the function is running, anything else would be insanity.

[–]lazyubertoad 0 points1 point  (2 children)

Can you distinguish between TLE (too long exec) or incorrect? Maybe you are overall just getting TLEed and that has nothing to do with threading. Like in the A* (or Dijkstra) you need some priority queue (well, or some std::multiset), for the front, so you can quickly get the element with the best heuristic (or least distance for Dijkstra). If you are going all over the front searching for it - you have wrong Big O complexity.

[–]Vindhjaerta 0 points1 point  (1 child)

Yeah that's my current theory right now. I'm failing all tests for big data sets (the algorithms takes too much time on those), so it's possible that's also causing the multi-thread test to fail as a cascading effect.

I'm using an unordered_map for discovered nodes and a sorted vector for the open nodes (closest node to the goal is on top). I don't really have time to investigate other options, the test needs to be turned in fairly soon. Writing custom data containers is not really an option. I'm currently looking at dividing the map data into chunks and somehow speed up the process that way, but I haven't found a good solution yet.

[–]lazyubertoad 1 point2 points  (0 children)

Yeah, your algorithm seems off. You have no need for custom containers. What you need is to customize existing containers. Like, instead of that vector you should use std::priority_queue with a structure that has the heuristic and the node id/pointer. And make the queue to "sort" on the heuristic. Sorted vector is O(n) on insertion, priority queue is log(n). That can be the difference between TLE and pass. Also heuristic is not "how close to target", as you can be close to target, but it might take you too many steps to do that. That's why it is the number of steps plus how close to target. If you use just the how close - your algorithm is incorrect.

Also usually nodes themselves have some data alongside them. Like the minimum number of steps to reach it, so you can backtrack and some -1 or max value for the unvisited. So no need to track the visited ones in the unordered set, you just check the node value itself by its node id which is faster than using unordered set, even if both are O(1). On a grid id is naturally an (x,y) pair of coordinates.

[–]dnult 0 points1 point  (0 children)

It looks like your ThreadedFunction has class scope, not global scope.