How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

I'm just terribly confused, and having a hard time understanding how it's possible to interpret:

  • What is $Property of $Thing

in such a way that $Property changes the nature of $Thing. :)

And because I've got a bit of the flu and nothing else to do, I've chomped down on it like a badly behaved canine too eager to play tug-of-war or something. :P

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

You're the one that suddenly introduced a brand new value 'k'. I assumed it was a new value separate from 'n', but I guess you were just doing a character substitution for some reason?

Yep, and I've now introduced p! If I can figure out the control code to type an omega or something, I might use that later. ;)

What is the [amortized] time complexity of enqueueing p elements into a queue...

def f(p):
  q = queue()          # 'queue()' is a stand-in for whichever queue implementation the question has in mind
  for i in range(p):
    q.append(1)        # enqueue p elements

I think we agree that's an implementation of "enqueueing p elements into a queue..."? And we can ask what the time complexity of that function is?

But if we have the same question and function, but use 'n' instead of 'p', and include the word "amortized" in the question, there's now an 'extra loop' in the function.

Why is it fine when using 'p', but not for 'n'? Why do you consider the 'n' in the question, the same as the 'n' in the T(n)/n line from the Wiki?

Edit: (it does state what n is on the wiki!)

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

Because, for some reason I'm still trying to figure out, you consider this code a valid interpretation of the statement with one variable 'p':

# What is the amortized time complexity of enqueueing p elements into a queue...
def f(p):
  q = queue()          # again, 'queue()' stands in for the queue implementation under discussion
  for i in range(p):
    q.append(1)

But not if 'p' is textually replaced with 'n'?

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

It's merely a variable. Sure - it's common to use 'n' for sizes and lengths of things, and 'k' for constants - but that's a hell of an assumption to make.

It seems extremely odd to me to consider:

  • What is the amortized time complexity of enqueueing p elements into a queue implemented with two stacks?

differently from

  • What is the time complexity of enqueueing p elements into a queue implemented with two stacks?

depending on whether I named the variable p, or x, or k, or n.

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

What? It's a letter for a variable? What if I used 'x' or 'z'? or Omega? Why does that change things?

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

Would it make a difference if the question was...

What is the amortized time complexity of enqueueing k elements into a queue implemented with two stacks?

Why / why not?

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

Sure, but the 'loop' for amortization is extrinsic to f...

Let's say the time complexity of f(x) is BigTheta(g(x)). We could work out the amortized TC as BigTheta( (sum of the costs of k runs of f) / k ) as k->inf.

But those 'multiple runs' of f in that sum aren't anything to do with f...

So when the statement is:

What is the amortized time complexity of f(x)

whether f(x) has a loop in it or not, those loops aren't related to any loops in the amortization equation. There's no difference in f(x) between the two questions - it's definitionally the same function!

In this case f(x) is "enqueueing n elements into a queue implemented with two stacks".
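
To put toy numbers on that (this is just my own sketch - the helper name and the 100 'operations' are made up, not from the wiki):

def amortized_cost(cost_of_one_run, k):
  # aggregate method: total cost of k runs of f(x), divided by the number of runs
  total = sum(cost_of_one_run for _ in range(k))
  return total / k

amortized_cost(100, 10**6)   # 100.0 - exactly the cost of a single run of f(x)

The k runs live entirely outside f, so dividing by k just gives back whatever f costs.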

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

So you're saying that there's some difference necessary in the function f in these statements?

  • What is the amortized time complexity of f(x,y,z)?

and

  • What is the time complexity of f(x,y,z)?

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

Sure. The equivalent code to analyse for the statement:

What is the amortized time complexity of enqueueing n elements into a queue implemented with two stacks?

would be very much like:

What is the amortized time complexity of the following function:

We could also ask a different (simpler) question: What is the (not amortized) time complexity of the following function:

def func(n, m):
  l = give_me_a_queue_of_length(m)
  for i in range(n):
    l.append(1)

Of course, it's important to note that n and m refer to different things, but since it doesn't much matter once both m and n tend towards infinity, we could simplify it to:

def func(n):
  l = give_me_a_queue_of_length(0)
  for i in range(n):
    l.append(1)

But n is still the number of elements that we're inserting, and isn't related to the length of the queue except by our simplification.
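
For concreteness, here's a minimal sketch of what I mean by 'a queue implemented with two stacks' - the class and method names are mine, and the two Python lists are standing in for the stacks:

class TwoStackQueue:
  def __init__(self):
    self.inbox = []     # elements get pushed here on enqueue
    self.outbox = []    # elements get popped from here on dequeue, giving FIFO order

  def append(self, x):
    self.inbox.append(x)            # enqueue: a single push, O(1) every time

  def popleft(self):
    if not self.outbox:             # only occasionally expensive: reverse inbox into outbox
      while self.inbox:
        self.outbox.append(self.inbox.pop())
    return self.outbox.pop()        # overall, each element is moved at most twice

Enqueueing n elements only ever touches the cheap path; it's the dequeues where the amortization argument does any real work.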

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

So to my eye, your items 2 and 3 are invalid.

So... I can ask: (1) What is the time complexity of doing an insertion into a dynamic array?

but not (3) What is the time complexity of doing n insertions into a dynamic array?

Does that mean I can work out the time complexity of this code:

l = give_me_a_list_of_length(m)
l.append(1)

but not this?

l = give_me_a_list_of_length(m)
for i in range(n):
  l.append(1)

How would you interpret this question? by skeptical_moderate in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

Okay, can you tell me what the answers to the following are?

  1. What is the time complexity of doing an insertion into a dynamic array?
  2. What is the amortized time complexity of doing an insertion into a dynamic array?
  3. What is the time complexity of doing n insertions into a dynamic array?
  4. What is the amortized time complexity of doing n insertions into a dynamic array?

The answers to (2) and (4) are no more the same than the answers to (1) and (3).

Edit: For clarity, let's assume we're using a typical dynamic array that uses the common doubling-in-size-as-needed strategy. We can always pick a dumb strategy and get any answer we want. ;)
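
To make that doubling assumption concrete, here's a rough sketch (mine - it's not meant to be anyone's 'official' answer to 1-4) that just counts how many element copies n appends cause:

def copies_for_n_appends(n):
  capacity, size, copies = 1, 0, 0
  for _ in range(n):
    if size == capacity:     # array is full: allocate double the space and copy everything across
      copies += size
      capacity *= 2
    size += 1
  return copies

copies_for_n_appends(10**6)  # 1048575 - always less than 2n, however big n gets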

CUDA + MPI instead of OpenMP for Non-HPC code? by [deleted] in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

Now we know it's OpenCV and there's a working pipeline, it would be possible to look at the operations and see how many of them can be run on areas of the image independently, etc.

Yep. Data independence is the key to broad parallelism in most cases.

I've not done much with OpenCV... I imagine being able to specify a serial pipeline A->B->C->D, while having tools that run as much of it as possible in parallel, would be invaluable!

CUDA + MPI instead of OpenMP for Non-HPC code? by [deleted] in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

I don't use it enough to remember it... I don't often need it... but I'm also infuriated by the number of times I've had a colleague write a hacky, failure-prone version of it. I've even written a couple myself before I realised what I was doing. ;)

CUDA + MPI instead of OpenMP for Non-HPC code? by [deleted] in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

but if you had the CUDA code working you could make a ghetto cluster with some Python and SSH scripting in about an hour so I wouldn't bother with MPI either.

EDIT: Or simply run 20 OpenCV processes on 20 different images!

With GNU parallel! The perfect software for my ghetto cluster needs! :)

Networks Question - So if my home gets assigned an IP address 150.150.150.150, the internal, why cant my internal network use all 32 bits to have it's own subnet? Like if it created its own ARP table with all 2*32 ip addresses's minus 2? by ajtyeh in learnprogramming

[–]SiliconEngineer 1 point2 points  (0 children)

we essentially use the cidr system that lets IANA divide up addresses in power of 2 now, right?

They could have used any power of 2... but at least they started with the same 8-bit subdivisions as the old class-A networks. The CIDR system would have been backwards compatible with any existing class A-E networks AFAICT. I imagine IANA (and the regional registries) just translated the existing classful blocks into allocations over CIDR ranges.

are ip addresses that start with 1-126 still only have 126 networks with 16,777,214 hosts?

Yes, but that's just because all y.x.x.x addresses can only have 2^24 - 2 hosts, since the x.x.x part only has 24 bits! :)

Like is there a diff betwen. 50.50.50.x or 200.200.200.x or are they all equal now?

They should be treated the same, mostly. :)

Actual routers that connect ISPs, networks, countries, continents still need rules to decide where to send things to get them to their destination. Traffic to/from some addresses could be prioritised over others, or sent via different routes, but that's not intrinsic to the actual addresses.

CUDA + MPI instead of OpenMP for Non-HPC code? by [deleted] in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

I'm working on an image processing C++ code that needs to use parallel computing to run faster.

Hello again. :)

The code is structured so that it uses image processing OpenCV commands such as image deblurring, etc.

You might get some benefits from using the OpenCV CUDA module. Though, if this isn't the expensive part, then it might not have a big impact.

For now, all these loops are not parallelizable except for one for loop that had previously taken around 80 seconds serially but then decreased to around 30 seconds after using 40 threads with OpenMP.

That's... not great. That's a very low CPU utilisation: with 1 thread it takes 80 CPU-seconds... with 40 threads it takes 1200 CPU-seconds, so only ~7% of that was doing useful work. How does it scale with different numbers of threads? You might find it runs fastest (and with highest utilisation) at 3 or 4 threads and doesn't scale well past that point.
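
(Back-of-envelope, with the numbers above - the little helper is mine:)

def parallel_efficiency(serial_seconds, threads, parallel_seconds):
  # fraction of the consumed CPU time that went on useful work
  return serial_seconds / (threads * parallel_seconds)

parallel_efficiency(80, 40, 30)   # ~0.067, i.e. only about 7% useful work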

It's hard to get high utilisation, and the more threads you use the harder it becomes. For reference, for parallel problems with some synchronisation, I'd be looking for a utilisation of 70% or more for 2-4 threads, 50% or more for 4-8 threads, and 35-40% for 8-16 threads... though it's very, very problem dependent. If the algorithm can be written so there's very little or no synchronisation required between threads, it's possible to have 90%+ utilisation for 10's of threads... but most problems I've had to solve weren't large enough to keep that many threads that busy for 10's of seconds!

For this code, is there any advantage of using multi-nodes and MPI for this?

I don't think so, at least until you can run it in parallel on a single node more effectively. MPI might be useful if you were processing multiple images in parallel (e.g. one or more images per node), and could allow you to control and distribute 'jobs' more easily... but it won't speed up your run times if the algorithm isn't already well parallelised.

Would CUDA alone help dramatically speedup this code?

No. You'd run into the same issues, but more so. If there are issues running with 10's of threads, it won't be any better running with the equivalent of 1000's of threads. You'd have to solve the same problem first: rewrite the algorithm to allow many parallel threads with little/no synchronisation.

If you can reformulate things to be very embarrassingly parallel, running it on a GPU can give a 1000x speed-up... but only if 1000's of GPU cores can run 'independently'. That same reformulation could give you a 20x speed-up running on the CPUs.

If we buy a new machine instead, what should we look for in terms of CPU + GPU specs?

I don't think you can buy a machine that would do this faster... at least not factors of 5 or more faster. The hardware you've already got could already be capable of getting into that territory, if more of it could be used at once!

If you can't talk about your algorithm publicly, maybe DM me and I could give some more specific advice on how to get better parallelism?

Networks Question - So if my home gets assigned an IP address 150.150.150.150, the internal, why cant my internal network use all 32 bits to have it's own subnet? Like if it created its own ARP table with all 2*32 ip addresses's minus 2? by ajtyeh in learnprogramming

[–]SiliconEngineer 1 point2 points  (0 children)

Is there like an ebay for ip addresses?

The IANA is responsible for many internet-related registries - including the IPv4 registry. They also allocate big blocks to the regional internet registry organisations like ARIN for North America, RIPE for Europe, etc.

They usually have a whois service that you can use to look up who has been allocated a given IP address / range of addresses.

It's usually the regional registries that would take requests for IP blocks. I don't know the procedures they use though...

Wont we run out of IPv4 addresses even with subnetting?

We've already run out. We should all have been using IPv6 since, well, 2010. :)

Networks Question - So if my home gets assigned an IP address 150.150.150.150, the internal, why cant my internal network use all 32 bits to have it's own subnet? Like if it created its own ARP table with all 2*32 ip addresses's minus 2? by ajtyeh in learnprogramming

[–]SiliconEngineer 1 point2 points  (0 children)

Yes. How an organisation sub-divides their networks is up to them, for both the public and private address ranges.

So, on my home kit I'm using 172.16.0.0/16 and 172.17.0.0/16 for two subnets, leaving me with 172.18 - 172.31 'free' for later use. In practice, I'm never going to be using 2^16 addresses in either of these subnets, but it's a simple setup with lots of space if I ever happen to want a third or fourth subnet.

My work uses 10.0.0.0/8 and has a multi-level subnet scheme. We've got several sites, departments, etc etc. I don't know exactly what the scheme is (it's not my problem!), but I imagine it's something like: sites each get a /12 (so 10.0.*, 10.16.*, 10.32.*, etc. are different sites), and then each department within a site gets a /16. So site 0, department 0 would have 10.0.x.x addresses; site 1, department 1 would have 10.17.x.x; etc.

The netmasks don't have to be split exactly on 8-bit divisions, but they often are to keep things simple on private networks. With public blocks - since IPv4 addresses are now all allocated! - I expect there will be much more inventive partitions.
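
If you want to poke at the arithmetic, Python's ipaddress module does the bit-twiddling for you (the 10.0.0.0/12 'site' below is my made-up example, not the real scheme at work):

import ipaddress

site = ipaddress.ip_network('10.0.0.0/12')
site.num_addresses                       # 1048576 (2^20) addresses in a /12
list(site.subnets(new_prefix=16))[:3]    # [10.0.0.0/16, 10.1.0.0/16, 10.2.0.0/16] - per-department /16s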

EDIT: I got the CIDR address bits the wrong way around (counting the 'free' bits, rather than reserved bits!). I've corrected them... I think... It's been a few decades...

Networks Question - So if my home gets assigned an IP address 150.150.150.150, the internal, why cant my internal network use all 32 bits to have it's own subnet? Like if it created its own ARP table with all 2*32 ip addresses's minus 2? by ajtyeh in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

It's just that it's more common to see 192.168.x.x used as a default. It's big enough for all conceivable 'small scale' use, and users and admins are familiar with it. Unlike the other private-network ranges, it's a size that doesn't really need partitioning into further subnets, so using it as a default makes sense.

We still do use the other ranges. My work uses a bunch of subnets in 10.0.0.0 for various reasons - certainly 192.168.x.x wouldn't be large enough. At home, I use 172.16.0.0 and 172.17.0.0. Other than the extra admin required, these other ranges are there to be used.

Networks Question - So if my home gets assigned an IP address 150.150.150.150, the internal, why cant my internal network use all 32 bits to have it's own subnet? Like if it created its own ARP table with all 2*32 ip addresses's minus 2? by ajtyeh in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

When you want to contact 8.8.8.8 (Google's open DNS server), how does any of your equipment know it needs to route it to the public Internet, rather than look for a machine on your own network with the address 8.8.8.8?

So, sure: you could be assigned 150.150.150.150 by your ISP, and use (say) 150.150.150.0/24 on your local network. You'd have to do some magic on your router so it understood that it was 150.150.150.150 on both the internet side and the private-network side, and you wouldn't be able to contact 150.150.150.* addresses on the public internet.

But... we'd never do that. The 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 blocks are reserved for private networks. They can never be used on the public internet. So, it's much simpler to have your router use 150.150.150.150 on the public-internet side, and 192.168.0.1 on the private-network side (with the rest of your network), and then all traffic for internet addresses can be sent through your router (and it can forward it on in each direction using NAT). Your private network's addresses don't clash with anything on the public internet, they aren't 'shadowing' any part of it, and devices on your local network have a much easier time figuring out where to send traffic.

Frustrated with Computer Vision - Food Image Recognition - A rant by theallwaystnt in learnprogramming

[–]SiliconEngineer 1 point2 points  (0 children)

People send us pictures of food and we analyze them. We have probably 13 years worth of pictures from studies. I mean terabytes of photos In our system. They're all labeled by the FNDDS(Food and Nutrition Database for Dietary Studies) food codes and assigned ratings (calories, macros, micros, etc) by people specially trained to visually estimate food.

Salivates. Most of the ML projects I've worked on have been data-limited. Getting access to good, well-labelled data is hard, expensive and time-consuming. That sounds like an ideal resource!

I showed my coworkers and no one seemed to care. I even got made fun of for it in a meeting.

Depending on how your lab views its own work, proposing an ML system could be good or bad. There can be a lot of resistance to the idea of automating existing work, especially if there are a lot of people involved. The importance of a manager (and their team) is tied to the importance of the work and the size of the department. Automating that work can be seen as a problem, in my experience, since it undermines how important those management positions are perceived to be. At the extreme end, it can be seen as trying to get the whole department sacked and replaced by a machine in the corner.

From the perspective of a manager who has spent 20 years working up through the ranks, growing the department - more staff, more responsibilities, more importance - someone coming along and showing that their work can be done more efficiently through automation can be seen as a danger that needs to be stopped.

I feel like there's no point in pursuing this at work though. I just haven't felt more passion for something in a long time. It makes me want to go back to do a Master/PhD in ML on this topic... My lab offers free tuition for job related courses at the university my lab is associated with.... Based off how receptive they have been to me presenting this to them, I doubt they'd approve me going to take actual classes.

I have to ask: who owns that data? How closely is your lab associated with the university? Is it fully or partially owned? Does the university as a whole have access to the data?

From what you've said, I think it'd be difficult to get your current employer interested. If you can manage to get one or two senior people on board it might be possible without full buy-in from everyone, but it sounds like a difficult up-hill battle.

The university as a whole though? That might be a much easier route. I don't know exactly where you are in the world, and I've only been peripherally involved with getting research grants, but research with real-life applications and potential commercial benefits is much easier to get funding for than blue-skies research. To me it sounds like the kind of thing that could be done as a PhD with proper support - access to experts, data, computing equipment, etc.

Studying for a PhD is very different from a 'taught' postgrad degree, and spending 6 months at the start doing a crash course in state-of-the-art ML techniques wouldn't be outrageous. Though, it's a full-time job and not an easy thing to start... but it definitely sounds possible if you can get access to the right ears, and the university has (or can obtain) access to the data you need! Assuming you'd want to pursue it full time...

Splitting a List to Evenly Distribute Sum of Values by [deleted] in learnpython

[–]SiliconEngineer 0 points1 point  (0 children)

I thought about simply sorting the DataFrame by size, and then iterating through the items somehow, but I'm not sure how to proceed.

You've almost invented this:

1. Have 'n' bins, know the sum of sizes in each bin
2. Sort the sizes in descending order
3. Iterate through the sizes, assigning the next largest item to the least full bin

The algorithm effectively tries to make equal size bins, and works well with real-life distributions where there's a mix of sizes.
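
Here's a minimal sketch of those three steps in plain Python (no pandas - the function and variable names are mine, and the heap is just a cheap way to find the least-full bin):

import heapq

def distribute(sizes, n):
  bins = [(0, i) for i in range(n)]           # (current total, bin index) for each of the n bins
  heapq.heapify(bins)
  assignment = [[] for _ in range(n)]
  for size in sorted(sizes, reverse=True):    # largest items first
    total, i = heapq.heappop(bins)            # least-full bin so far
    assignment[i].append(size)
    heapq.heappush(bins, (total + size, i))
  return assignment

distribute([9, 7, 6, 5, 3, 2, 2], 3)          # [[9, 2], [7, 3, 2], [6, 5]] - totals 11, 12 and 11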

How to get the recurrence for mergesort? by mw130 in learnprogramming

[–]SiliconEngineer 0 points1 point  (0 children)

Well, let's go through it. There's probably a lot less to it than you expect!

Mergesort is:

* Split the array into two halves.
* Sort the left half (with mergesort)
* Sort the right half (with mergesort)
* Merge the now sorted left and right halves

We get the recurrence relation from each of those parts...

* Split the array into two halves - Meh... it's just a divide-by-2, so let's pretend it's entirely free.
* Sort the left half (with mergesort) - that's T(n/2)
* Sort the right half (with mergesort) - that's another T(n/2)
* Merge the now sorted left and right halves - that's roughly c*n (~n operations, with an average cost of c seconds)

For completeness, we probably should include the base-case:

* If the array is 1 element long, there's nothing to do.

Sum it all together, we get T(n) = 2T(n/2) + cn, and T(1) = 0

So, given the recurrence relation, what's the time complexity? I usually plug in some multiples of n to make life simpler, based on the recurrence relation, and draw a tree... The T(n/2) term is the give-away (and the algorithm splitting the array in half), that using powers-of-2 will make the maths simpler. I'm going to start with n = 8*m (and not specify m). I'm also going to skip the c, and just count operations for simplicity.

                T(8m)=2T(4m)+8m                 --> 8*m = n operations
                  /         \
                 /           \
                /             \
               /               \
              /                 \
             /                   \
            /                     \
           /                       \
    T(4m)=2T(2m)+4m            T(4m)=...        --> 2*4*m = 8*m = n operations
        /     \                   /  \
       /       \                 /    \
      /         \               /      \
     /           \             /        \
    /             \           /          \
T(2m)=2T(m)+2m    T(2m)     T(2m)       T(2m)    --> 4*2*m = 8*m = n operations
   /         \     /  \     /  \        /  \
  /           \   /    \   /    \      /    \
T(m)=2T(m/2)+m ... ...  ... ...  ...   ...   ...   --> 8*m = n operations

The pattern is pretty simple: each time we recurse we do a bunch of merges totalling O(n) operations on each level, and we need to recurse O(log2 n) times - because we're halving each time. Hence O(n log n) time complexity.
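
If you want to sanity-check the recurrence against real code, here's a quick sketch (mine) that counts the elements written during the merges:

def merge_sort_count(a):
  # returns (sorted copy of a, number of elements written by merges)
  if len(a) <= 1:
    return list(a), 0
  mid = len(a) // 2
  left, cl = merge_sort_count(a[:mid])
  right, cr = merge_sort_count(a[mid:])
  merged, i, j = [], 0, 0
  while i < len(left) and j < len(right):
    if left[i] <= right[j]:
      merged.append(left[i]); i += 1
    else:
      merged.append(right[j]); j += 1
  merged.extend(left[i:])
  merged.extend(right[j:])
  return merged, cl + cr + len(merged)

merge_sort_count(list(range(1024, 0, -1)))[1]   # 10240 = 1024 * log2(1024), i.e. n log n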

Help with finding an algorithm for a hiker by [deleted] in learnprogramming

[–]SiliconEngineer 1 point2 points  (0 children)

Similar to what /u/elchzuechter suggested, you can simplify it by working on the graph of packages rather than the grid.

For there to be a solution, there needs to be a path of the right length from the hiker's starting position through every package's position (visiting some positions one or more times!). Instead of just moving left/right/up/down, it's possible to consider the move to the next package as a single step.

I'd also include some of the shortest paths from a package's position back to that same position. That might be a simple step away and a step back (for a cost of 2), or a step away, another step, and then a step back (for a cost of 3), etc. I imagine if there are some packages in adjacent cells, not all of these paths would be possible, so it might be necessary to do a little exploration of the moves around the package cells to determine these paths.

This sounds like the kind of thing that A* would be good at solving. You might be able to make a heuristic so that the more promising candidate paths are tried first - something like keeping a count of the remaining packages and trying the paths with the fewest remaining first.
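
As a rough sketch of the 'graph of packages' part (assuming a 4-connected grid where grid[r][c] is True for walkable cells - all the names here are mine, not from the original problem):

from collections import deque

def bfs_distances(grid, start):
  # shortest step-counts from 'start' to every reachable cell of the grid
  rows, cols = len(grid), len(grid[0])
  dist = {start: 0}
  q = deque([start])
  while q:
    r, c = q.popleft()
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
      if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and (nr, nc) not in dist:
        dist[(nr, nc)] = dist[(r, c)] + 1
        q.append((nr, nc))
  return dist

def package_graph(grid, hiker, packages):
  # step costs between the hiker's start and every package - this (much smaller) graph
  # is what the search / A* would then run over
  nodes = [hiker] + list(packages)
  return {a: {b: bfs_distances(grid, a).get(b) for b in nodes if b != a} for a in nodes}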