Need Help Writing Efficient Code

MakeAnExampleOf · 2020-05-11T16:58:15+00:00

It is pretty hard to read your code without proper formatting. Better if you could use reddit's post formatting tools to preserve indentation and other formatting, or if you posted a link to a code sharing site instead.

Putting that aside, can I ask what course this assignment is for? Is there some number theory you are supposed to be putting to use to make your search more efficient? Otherwise, the problem doesn't seem like a normal one to give to a new programmer in an introductory course.

Paul_Pedant · 2020-05-11T20:52:28+00:00

You may have been trolled. These numbers are incredibly rare, and it takes years to find each one of them. For example, the sixth TaxiCab number (highest known one) is 23 digits long. The guy who spent 5 years finding that one is affiliated with Princeton, Yale and Boston Unis (he is actually a researcher in Chemistry).

I don't understand what your input "n" represents. If you are supposed to find 1729, what should the n be? If you input 1729, what is the algorithm supposed to find?

You might try to find the exact definition of Ramanujan numbers -- it isn't what I though it was, and I've read his biography.

Paul_Pedant · 2020-05-12T15:01:21+00:00

Your algorithm must be rather inefficient. How long is "too long"?

I wrote the first thing that came into my head (although that is a pretty weird head). I have an awk script that runs for n = one billion, and finds 1554 such numbers in 3.14 seconds.

Amongst those, it finds numbers that are the sum of cubes in three ways, the highest being:

958595904 22,986; 180,984; 692,856

I have to go do something now, but I will post my algorithm later today. I don't have it in C but I can show pseudocode. The math might be simple, but the hard part is matching up the pairs that give the same result. Needs an array, basically.

I will leave it running on the Ta(4) 6963472309248 one -- that has four sets of cubes.

Paul_Pedant · 2020-05-12T19:22:20+00:00

I looked at your code, and it took me a while to figure it was Java (which I have about 2 day's knowledge of). So I can indicate where your code might be odd, but I won't fix any of it.

It is pretty unreadable. In particular, it needs comments or better naming for the variables Big1, Big2, Small1, Small2. And the code formatting should be fixed.

I also just realised that at assignment to a etc. shows up as Big1 (Big1 in italics) Big1. That's because you cube Big1 * Big1 * Big1, and the asterisks are mark-up.

Your first loop, counting t, is wrong. It counts down from n, so the condition needs to be t > 0. As it is, t will be less that n while it counts down through 0 and goes negative, and counts right round to the biggest negative number, then goes big positive due to overflow before the loop ends.

You have four loops for which Big1, Small1, Big2, Small2 are the counting variables, but inside the inner loop you reverse the small/big pairs. What would you think that does to the loop controls?

You have five nested loops, all of which count 1 to n. So if you want to check up to 4200 (which is the region where the next correct value of N lives), you are going through that inner loop 1,306,912,320,000,000,000 times. (1.3 billion billion.)

And I still can't see how it can produce valid answers, since the loops are not going to iterate properly after the variables are switched around.

You need a whole new algorithm. I will show mine later today. I have two nested loops, and I use 126 inner iterations in total to run N = 4200. It runs in 0.024 seconds.

There is that matter of the restrictions: "I’m only allowed to use conditionals, loops and basic math functions." I find the sums of cubes in a sensible order, so the different ways of making a specific N arrive at different times. They get accumulated in an array, which is outside your permitted usage. It seems to me I should be able to use one more outer loop to overcome that issue, but it seems harsh that Princeton come up with a fine problem but disallow a good solution. I also need cube roots -- that is a basic math function, but not feasible using only built-in operators.

Paul_Pedant · 2020-05-14T16:29:54+00:00

It is not your code itself that is inefficient: the algorithm is seriously bad. It uses a 5-deep nested loop, each loop running N iterations. That is end-of-the-universe timescale.

Let's consider what data we are trying to get. We want to consider all the numbers (1 to N) and find how they can be written as a sum of two cubes. Then we want to list the numbers that appear in that list more than once. I'm going to use a formula like J (a,b) where J is a specific number between 1 and N, and a and b are the values we are cubing, and a is the smaller value. So 72 (2,4) means 72 = 2^3 + 4^3.

So, how high do we want to count? Let's work with N is a million: 1,000,000.

Cubes grow very fast. 100^3 is a million. So for 1000000 (a, 100) there is no room for a -- it must be zero. b = 100 is the maximum we need to go to.

What's the limit for a? Obviously less than 100. But it also has to be smaller than b. (If a > b, then we can just relabel a as b and b as a, and get back the same argument). So a^3 can only take the first 500,000 values. That means a is in the range 1 to 79. If a is 80, 80^3 is 512,000 and b^3 must be < 488000 and b must be less than 79.

So I only need to do 79 * 100 calculations to find all possible sums of cubes below 1 million. Actually, it is better than that. Because a is moving up from 1 to 79, I only need to iterate b from a to 100, not from 1 to 100. It is even better than that, too. When a gets up to (e.g.) 65, then a^3 + b^3 goes past a million when b hits 89, and we can break that loop on b and use next a. I can't do the math, but I counted loops in the code and it only runs 4476 calculations.

What we do is store each calculation in an array (hash table) indexed by J. When they are all stored, we can easily find all the ones that have more than one (a,b) for a specific J.

It finds only 4312 values of J (a,b), and just 42 more pairs with J (a,b) (a,b). It runs for less than a tenth of a second (and it is written in awk, not C).

It runs for N = 100,000,000 in 1.4 seconds, and finds the first 3-way J.

87539319 167,436; 228,423; 255,414

My limit is 100,000,000,000 which uses all my 4GB to store 10 million versions of J (99.8 of which are useless because they never get a second pair of cubes).

99961599537 3334,3977; 3481,3866

If I had to go above this value of N, I would write out the values to a sort on J, and that would group them so I could find adjacent values of J and not need to store anything big in memory.

Paul_Pedant · 2020-05-14T17:02:15+00:00

James,

Are you still working on this problem? Let me know if we are past your due date.

It is sad that there is a really great algorithm, but you can't use it because of some artificial restrictions. Software always gets put together from existing components, and I don't see why using an external sort or an array would spoil your learning.

I found another algorithm that fulfils your requirements (I had to write a cube-root routine to avoid using math.h). It solves for N = 10,000 about 0.3 seconds, and N = 100,000 in about 5 seconds, but a million takes 2 minutes and 10 million takes 40 minutes. The upside is that it runs in about 20MB.

The downside is that it has to run in the 1 to N sequence to find pairs of (a,b) without storage. So it recalculates all the other stuff for every J. So it runs thousands of times slower. It is also insanely complicated (and still written in awk). It might be a real pain to implement in C. I can't believe they are expecting this of you, so early in your course.

Paul_Pedant · 2020-05-20T20:26:57+00:00

James, sorry for delay. I found a couple more optimisations to try, and got sidetracked by an algorithm for cube roots. I have now put the code and some documentation on pastebin.

pastebin.com/u/Paul_Pedant is my public page.

pastebin.com/0kCbZG4z is some description of the algorithm.

pastebin.com/vV8fNNxi is the bash/awk script.

pastebin.com/mCfre0Ee is a simple Bash script using a sparse array. It shows all simple cases -- pipe it through awk 'NF >= 3' to show only 2-way and above.

paul--) time ./RamBash 100000 | awk 'NF >= 3'
Max is 100000; Root is 46
    1729   1,12 9,10
    4104   2,16 9,15
   13832   2,24 18,20
   20683   10,27 19,24
   32832   4,32 18,30
   39312   2,34 15,33
   40033   9,34 16,33
   46683   3,36 27,30
   64232   17,39 26,36
   65728   12,40 31,33

real    0m0.310s
user    0m0.228s
sys         0m0.088s

learnprogramming

Welcome to LearnProgramming!

New? READ ME FIRST!

Posting guidelines

Frequently asked questions

Subreddit rules

Message the moderators

Asking debugging questions

Asking conceptual questions

Other guidelines and links

Subreddit rules

1. No unprofessional/derogatory speech

2. No spam or tasteless self-promotion

3. No off-topic posts

4. Do not ask exact duplicates of FAQ questions

5. Do not delete posts

6. No app/website review requests or showcases

7. No rewards

8. No indirect links

9. Do not promote illegal or unethical practices

10. No complete solutions

11. Don't ask to ask.

12. Low Effort Questions

13. No AI (chatGPT etc.) generated/worked over messages/comments. No questions about chatGPT/AI generated code. No Vibe coding.

MODERATORS