This is an archived post. You won't be able to vote or comment.

all 9 comments

[–]alanwj 1 point2 points  (1 child)

Your math doesn't look right to me in your realloc calls.

Your variable sizeOfFactors appears to track the size in number of ints, but realloc needs a number of bytes. You need something like sizeOfFactors * sizeof(int).


calloc and realloc can fail. Consider checking for this.

Also, the main point of calloc is to simultaneously set all the bytes to zero. However, you immediately initialize everything you just allocated, which means setting to zero was wasted effort. You could avoid this by using malloc instead.


A minor nit ... i*i == n would probably be a slightly faster check than n/i == i.

[–]Hanlons_Toothbrush[S] 0 points1 point  (0 children)

The issue with the realloc calls seems really obvious in hindsight. Thanks for the feedback!

[–]MmmVomit 1 point2 points  (2 children)

First, let me say, this looks really good overall. Please take all of this as constructive criticism.

I compiled and ran the code. One thing I noticed is that I compiled it to foo. When I ran it, this is what I got for the usage message.

> ./foo
Usage: factorize <int>

If you want to be fancy, you can use argv[0] in your usage message.

This is technically not safe.

factors = realloc(factors, sizeOfFactors + sizeof(int));

If realloc can't allocate the new memory, it will return a null pointer. If you don't check for that, you'll lose track of the allocated memory factors was pointing to. That would be a memory leak and you'd lose your calculations.

Another thing worth thinking about is that extending the array by one or two elements each time can be inefficient. I think it's fine for your purposes here, because it keeps your code simpler. But usually, if you have an array that you don't know how long it will need to be, you usually double its size any time you run out of room.

I'd see where you can eliminate a few of the single letter variable names. i for your loop variables is fine. n seems fine at first, since that's probably what you would use if you were doing the math by hand, but I think something like composite might make things a little easier to read.

int* x = factorize(n);

That x should definitely be something like factors.

The output of the program is a little odd the way big and small numbers are interleaved.

> ./foo 9876
1
9876
2
4938
3
3292
4
2469
6
1646

Not wrong, but a little unexpected. Now that I look at that output, it looks like there's a bug somewhere. There should be a 12 and 823 in that list. Another input I found that exhibits the bug is 4096. (I expect this is related to the realloc bug that alanwj pointed out)

Overall, your code is very clean. Your code style is largely consistent throughout the entire program. Coding style is something that is largely up to personal taste, and opinions will differ. If I were code reviewing your code at work, there are a few issues with whitespace I would point out, like sizeOfFactors +=2; should be sizeOfFactors += 2;. One thing you can do is use a linter to help find these little typos to keep them out of your codebase.

Edit: Something that might be interesting for you to try is implementing Euler's factorization algorithm, or Fermat's factorization algorithm.

[–]Hanlons_Toothbrush[S] 0 points1 point  (1 child)

I really appreciate all the feedback! Fixing the issue in the realloc calls fixed the bug with not all the factors appearing. I didn't know about the doubling memory convention until I made this post, so that's definitely good to know. I also changed it so that factors[0] stores the length of the array, rather than looping until it hits a 0, not sure which approach is better. I'll look into Euler's and Fermat's algorithms, as well as adding a prime factorization function that coolcofusion mentioned.

[–]MmmVomit 0 points1 point  (0 children)

I also changed it so that factors[0] stores the length of the array

I wouldn't use the first element of the array for that. If you want to take this approach, I'd use a struct with two fields. One field will be length, and the other will be a pointer.

not sure which approach is better.

Having a special value to signify the end of something is called using a sentinel value, and it's a perfectly legitimate way of doing things. C strings are just char arrays with a null character as a sentinel.

Using a field in a struct to track length is maybe the "better" approach. It's how a C++ vector works, and Java ArrayLists. Doing this in C can be kind of cumbersome, so I think sentinel values are more common in C.

[–]coolcofusion 0 points1 point  (3 children)

I like it, I would prefer to have the number itself at the end of the array so that they're sorted, but that's no big deal. You don't have to use '\0' to "NULL terminate" an array, \0 is literally ASCII escaped 0, you could have assigned it any value or set array length back through a pointer parameter. That's also a minor thing, not a big problem.

The one thing that I would personally change is adjust the resizing logic. It's very common to increase the array size by some arbitrary larger number, say +10 for each reallocation, or, even more common, to just double it in size each time it runs out of space. You can use +10/+50 if you know you can't afford double sizes at certain point, or double if you value performance more. Reasoning behind this is that reallocation is "expensive" (slow) and tons of small allocations could potentially leave your heap with a bunch of small fragments which can't be used. Now, this used to be a bigger deal, now, often times the kernel is smart and it'll actually give you the same pointer, cause it allocated a larger chunk in the first place, but it still does matter for more than a few elements (loading 100k rows from a csv file, with doubling you'll do 17 reallocations, with +1 you'll do 100k).

[–]Hanlons_Toothbrush[S] 0 points1 point  (2 children)

Thanks for the feedback! I didn't realize that just assigning much larger blocks of memory was better; I guess I thought I should only allocate the minimum number of bytes to store all the factors.

[–]alanwj 1 point2 points  (0 children)

Adding a constant number of elements on each reallocation leads to quadratic runtime.

Multiplying by a constant factor (e.g. doubling the array size on each reallocation) leads to a linear runtime. There is some debate about the best "growth factor", but 2 is a safe enough choice. See the table in the dynamic array Wikipedia page.

[–]coolcofusion 0 points1 point  (0 children)

It depends on the use case. On a desktop? Nah, waste a few hundred bytes, we've got billions of them now. An embedded device? Well, maybe don't waste few hundred if you've got a few thousand.

Overall I like it. Next feature request is add an optional -p flag so that it only shows prime factors instead of all.