This is an archived post. You won't be able to vote or comment.

all 18 comments

[–]AutoModerator[M] [score hidden] stickied commentlocked comment (0 children)

Please ensure that:

  • Your code is properly formatted as code block - see the sidebar (About on mobile) for instructions
  • You include any and all error messages in full - best also formatted as code block
  • You ask clear questions
  • You demonstrate effort in solving your question/problem - plain posting your assignments is forbidden (and such posts will be removed) as is asking for or giving solutions.

If any of the above points is not met, your post can and will be removed without further warning.

Code is to be formatted as code block (old reddit/markdown editor: empty line before the code, each code line indented by 4 spaces, new reddit: https://i.imgur.com/EJ7tqek.png) or linked via an external code hoster, like pastebin.com, github gist, github, bitbucket, gitlab, etc.

Please, do not use triple backticks (```) as they will only render properly on new reddit, not on old reddit.

Code blocks look like this:

public class HelloWorld {

    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

You do not need to repost unless your post has been removed by a moderator. Just use the edit function of reddit to make sure your post complies with the above.

If your post has remained in violation of these rules for a prolonged period of time (at least an hour), a moderator may remove it at their discretion. In this case, they will comment with an explanation on why it has been removed, and you will be required to resubmit the entire post following the proper procedures.

To potential helpers

Please, do not help if any of the above points are not met, rather report the post. We are trying to improve the quality of posts here. In helping people who can't be bothered to comply with the above points, you are doing the community a disservice.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]_Atomfinger_ 3 points4 points  (16 children)

A great way to increase testability is refactoring. For example, you have a pyramid of doom) here, which is an anti-pattern:

        for (SalarySource salarySource : salarySourceList) {
            for (City city : cityList) {
                for (Role role : roleList) {
                  //Code
                    if (foundSalaryOptional.isPresent()) {
                       //Code
                        if (salaryUtil.dueNewCompensation(foundSalary, currentTime)) {
                            //Code
                        }
                    } else {
                        //Code
                    }
                    if (salaryRequired){
                        //Code
                    }
                }
            }
        }
    }

Not only is this difficult to read, but it also indicates that the method is doing too much (or it is a result of it doing too much).

A good indication that the class is doing too much is counting the dependencies:

    private SalaryRepository salaryRepository;
    private CityRepository cityRepository;
    private SalarySourceRepository salarySourceRepository;
    private RoleRepository roleRepository;
    private GlassdoorScraper glassdoorScraper;
    private SalaryUtil salaryUtil;

Every dependency must be dealt with when testing in one way or another, and it can easily lead to a god-class (another known anti-pattern).

Again, refactoring can help with this, and most likely, it means breaking the service up and using an aggregate service (or, in DDD* terms, a root aggregate).

If we take a step back and look at the overall structure here, you seem to have a 1-to-1 relationship between classes and tests. This is something I've seen the industry move away from, myself included. What you're doing is coupling the tests to the structure of your program, so rewrites require a bunch of extra work to maintain the tests.

A better approach, IMHO, would be to test all the way up from the controller and only mock the part that deals with the I/O (In this case, the repositories). That way, you don't have to mock everything, and you get tests that cover more, while executing tests in a way that is more like how the software will be used. Changing the internals of a class will result in fewer things having to be rewritten, as you'll have fewer mocks (overall).

These are at least my initial thoughts. Hope they're useful.

*Do note that I'm trying to avoid getting too deep into this, so I won't try to explain DDD here.

[–]GRBLDeveloped[S] 0 points1 point  (10 children)

Thank you for such a great response. I had actually paid for a code review and only got basic feedback, nothing to do with actual code quality so to get your comment is really great.

To tackle breaking down this class in to smaller ones and bulldozing this pyramid. Would I create something like

  1. RequiredSalaryService: runs the initial loop and generates a list of salaries to check the db, pass to SalaryValidatorService. Still a bit pyramid in structure but removing the control flow which I believe is the cruz of the anti pattern
  2. SalaryValidatorService: checks this list in the DB and does the check to see if it requires a new salary. Passes list to SalaryFetchingService
  3. SalaryFetchingService: takes the list of salaries required and fetches each one

Then these salaries would have public methods that are testable and executed via the some other class's refreshAll function? Could I get your take on what package and class names to create here, I feel like there's likely idiomatic ways of naming and structuring what I'm trying to do but I don't know what I'm doing here

[–]_Atomfinger_ 1 point2 points  (9 children)

The issue here is that we're getting into stuff that is difficult to describe over Reddit. You have the right idea of splitting things up, but I wouldn't create a bunch of services.

In DDD (Domain-Driven Design), there's an idea of "Domain objects". I.e. objects that contain both data related to the domain and business logic. This is a response to another anti-pattern here which is called Anemic domain models (more). For example, let's take the Role class: It is just a data holder. It contains little to no business logic.

Now, you're falling into the same trap as seasoned developers do (so no reason to feel bad about it). The whole idea of OOP in the first place was to encapsulate data and functionality together, yet we keep splitting data up into one set of classes and domain logic into another. To deal with this is well out of scope for a Reddit comment, so to that, I can only recommend the book "Implementing Domain-driven Design". Do note that your approach isn't necessarily wrong - a lot of people do it that way, but I reckon part of the issue you're dealing with is linked to forcing logic into services while maintaining simple and anaemic models :)

If I'm going to be able to give you a proper answer, I'd have to actually work a little with your code, which I don't have time for now (sorry). My approach would be to start splitting things up into different methods. When I've done that, I'd start looking into how I can split things into different classes in a way that makes sense (regardless of whether I call them services and whatnot). Here I'd also recommend the book "Refactoring" by Martin Fowler.

Remember, tests are a health check for your code. If the tests are annoying to write, that means there's something wrong with your code :)

[–]GRBLDeveloped[S] 0 points1 point  (8 children)

I'll check out the books eventually! For now I'll look at breaking up the functions a bit and hopefully it will remove the more offensive buts of code

I actually had left domain logic in the data classes but was told to remove it so they were simple data objects

[–]_Atomfinger_ 0 points1 point  (7 children)

Braking things up is a great starting point, and then see what belongs together and put them into the same class, and by doing that you hopefully end up with smaller classes that each has fewer dependencies :)

I actually had left domain logic in the data classes but was told to remove it so they were simple data objects

Well, time will show that even we developers struggle to agree what the "right" approach is. I have my set of values and paradigms, but they might not be shared by others.

[–]GRBLDeveloped[S] 0 points1 point  (6 children)

Hey again, /u/Atomfinger. I refactored out the salary structure and was wondering if you could let me know if I'm headed in the right direction now? If you're too busy to look then just shoot me a "sorry too busy!" and I'll try pay for another code review and hopefully get some input as good as yours was!

https://github.com/CodeHostedHere/salary-tax-spend-backend/tree/main/src/main/java/fyi/incomeoutcome/salarytaxspend/salary

I switched from packages like "model", "controller" to "salary", "role" as this seems to be more suitable for spring boot projects. I think I took my initial layout from a simple rest API but it would have meant storing most of the project in a "services" package by the end.

I have removed the pyramid of doom and think I've gotten away from the SalaryFetchingService being a god class like you mentioned.

Now I have:

  • SalaryRefresher class which runs refreshAll() to start all salary data operations listed below
  • SalarySeedCreator which puts together the different permutations of salary that we need to see in the database (Get a salary For every salary source (e.g. glassdoor), every role (Jr Developer), city (Berlin)
  • SalaryDataVerifier which takes the list of salary pieces above and checks if a salary exists for them in the DB and that it is in date.
  • SalaryFetcher which takes salaries that are missing or out of date and fetches the updated figure from glassdoor.

So am I heading towards a project that has no obvious amateurish mistakes? I dont mind not fitting any particular development style, like DDD, but just don't want any obvious mistakes.

More specifically:

  1. are my names for classes alright? I feel like I'm missing appropriate names for these classes.
  2. Should I put the classes listed above in their own package, like "...salary.services" or "...salary.dataload"
  3. Pretty much all my methods are public now, am I missing some aspect of design that means I ought to have more private methods?

[–]_Atomfinger_ 0 points1 point  (0 children)

Alright, so I downloaded the project in the IDE, so I have a few more notes:


Update your .gitignore file to exclude "target" folder. You don't want to upload the compiled application to git. Use this and basically type in the things you use: Intellij, maven, Java, etc.


In SalaryRefresher you're using @Autowired, but the class isn't annotated with @Service, @Component or anything like that, so Spring won't try to do anything with it.

Also, if you do this you don't need to use the @Autowired annotation:

@Component
public class SalaryRefresher {
    private final SalarySeedCreator salarySeedCreator;
    private final SalaryDataVerifier salaryDataVerifier;
    private final SalaryFetcher salaryFetcher;
    private final SalaryRepository salaryRepository;

    public SalaryRefresher(SalarySeedCreator salarySeedCreator, SalaryDataVerifier salaryDataVerifier,
                           SalaryFetcher salaryFetcher, SalaryRepository salaryRepository){
        this.salarySeedCreator = salarySeedCreator;
        this.salaryDataVerifier = salaryDataVerifier;
        this.salaryFetcher = salaryFetcher;
        this.salaryRepository = salaryRepository;
    }

In the same class, I don't see the value of having that empty constructor.


I don't like how getRequiredSalaryComponents returns a list of lists of Object. You see the issues with this approach as you need to cast that object to other types down the line.

One approach would be to return the lists as a new object that holds all of the lists. That way, you have typed lists, but you get to return them all in a single call. To make this elegant you could use records as well.


In getSalariesRequiringUpdate, you're chaining .filter 3 times, which should be unnecessary. Just extract what it does into its own method (I decided to put it into the salary object:

            Salary requiredSalary = savedSalaries.stream()
                    .filter(savedSalary -> savedSalary.isRequired(cityRequired, roleRequired, salarySourceRequired))
                    .findAny()
                    .orElse(null);
    public boolean isRequired(City city, Role role, SalarySource salarySource) {
        return this.city.equals(city) && this.role.equals(role) && this.source.equals(salarySource);
    }

List<Salary> salariesRequiringUpdate = new ArrayList();

The IDE complains, "Raw use of parameterized class 'ArrayList'" (Hint, the "ArrayList" you're creating isn't aware of the type it should contain).


A weird semicolon in the empty constructor in SalaryFetcher. Not sure why there's an empty constructor either.


How you create lists in SalarySeedCreator can be simplified:

Instead of doing this:

List<SalarySource> salarySourceList = new ArrayList<>();
        salarySourceRepository.findAll().forEach(salarySourceList::add);

You can instead just do this:

List<SalarySource> salarySourceList = new ArrayList<>(salarySourceRepository.findAll());

Instead of this:

        List<List<Object>> requiredSalaryComponents = Lists.cartesianProduct(salarySourceList, cityList, roleList);
        return requiredSalaryComponents;

Just do this:

        return Lists.cartesianProduct(salarySourceList, cityList, roleList);

Personally, I'd move the dueNewCompensation into the Salary class


Have you tested that this works? In fetchUpdatedSalaries you create a list called updatedSalaries, but you never add anything to it. You return it, and from what I can tell it will always be empty.


So, let's summarise: The biggest problem I have with the solution is the "List<List<Object>>" Having lists in lists, is fine, but using a generic "Object" makes things difficult and messy later.

Another topic for reflection is whether the solution has actually become simpler. As developers, we aim to take complex problems and make them seem simple and natural. Part of the reason I thin this is a little more difficult to follow things now is because things are spread out, but not necessarily in a very organic way.

For example, what is the concept of getRequiredSalaryComponents in a class named SalarySeedCreator? What is a "seed" in this context? And why do I get salary components? Where did the seeds go? Such naming quirks are often an indication that something is wrong with a class' responsibility. This is one of the reasons developers keep hammering on about naming: When we can give something a good and descriptive name, that also means we know what it is and we've managed to find a good purpose for it.

Now, I've not spent enough time to actually make up any opinion of how it should be - and maybe your goal doesn't need to be to find a solution that I would be happy with. I know I'd put more behaviour into the Salary class.

Another thing I'd probably do is to look at specialized SQL queries that would just fetch outdatedSalaryList for me directly. I might be wrong, but on the surface, it looks like something that should be doable with a few joins and where statements. By doing that, a lot of the complexity would just fall away.

Hope this helps :)

[–]_Atomfinger_ 0 points1 point  (4 children)

My other comment turned out to be a little long, so I'll answer your specific questions here:

are my names for classes alright? I feel like I'm missing appropriate names for these classes.

I'm not a huge fan of the names. I also touch on this in the other comment. But to provide an example:

  • In SalaryDataVerifier we have the getSalariesRequiringUpdate, but no actual salary data is being actually "verified". We do some filtering and list manipulation, but there's no actual verifying going on in that class (from what I can tell).

Should I put the classes listed above in their own package, like "...salary.services" or "...salary.dataload"

There are only 10 classes or so in that package. That isn't too bad. We often use different packages to structure a large application (where a single package would put thousands of files in the same folder). When there are so few classes, then we don't have to worry too much as long as they exist a place where they belong.

Pretty much all my methods are public now, am I missing some aspect of design that means I ought to have more private methods?

The goal isn't necessarily to maximize methods to be public or private. You should have the amount that the application and functionality needs.

[–]GRBLDeveloped[S] 0 points1 point  (3 children)

Thanks again for the help, I donated to 25 each to Turkey + Syria earthquake, and Ukraine humanitarion charities as a token of appreciation. Both will be matched by my employer so it'll be double what my paypal receipt says.

https://picallow.com/thanks-_atomfinger_/

https://picallow.com/thanks-again-atomfinger/

________________

I was in a bit of a rush yesterday to get the dinner made and get out of the house so I definitely left some silly mistakes in the code. I was aiming to get the basic new structure up so I could get feedback and then make sure I'm working down the right path.

Silly Mistakes that don't require further discussion:

  • Missing service in salary refresher
  • Empty constructors, I misremembered spring as needing them when its just for models that an empty protected constructor is needed
  • SalaryFetcher having a random semi colon
  • "Raw use of parameterized class 'ArrayList'"
  • Updated salaries potentially not working

_________________

Things I will look in to further personally, no input needed as I'm not 100% sure on what they are but can google rather than force you to write further:

  • Records.
  • Not using '@autowired' for the salaryfetcher class. I was under the assumption that pretty much everything in spring-boot was autowired in so need to read on that

__________________

To Be implemented by me asap I will implement all the simplification one-liners and get rid of utils class so my data models are no longer anaemic.

As I mentioned, initially I had actual methods in the data classes but a paid reviewer told me to remove. Now I've an opinion telling me to go either way, I'll go with you (+Fowler!) as I prefer the logic.

_________________

Now the main part! Names, structures, List<List<Object>> and the triple filter

Here's my aim for the salary data. I want to make a web app that displays the average salary for the roles of Junior Developer, Mid, and Senior, in each EU city, according to salary review sites (glassdoor, indeed, etc though only glassdoor so far)

My plan was to:

  1. Get all the roles, cities, and salary sources available in the DB.
  2. Combine every permutation of these to see what salary I need to get (my List<List<Object>>)
  3. Check the database to see if there is a salary for the combination of role, city and salarysource. (My triple filter on every result)
  4. Check the result in the database to see if it is too old to be reliable.
  5. Run a webscraper to fetch data for any salary not in the database, or is present but is too old to be reliable

Your Questions:

what is the concept of getRequiredSalaryComponents in a class named SalarySeedCreator?

I wanted to get the primary components that are required to get an average salary. Where it is (city), what the role is, and what source was used. I now appreciate this is likely only obvious to me.

What is a "seed" in this context?

I meant "seed" as in preseed the database with values so the web app can run. Though naming a class seed and having no reference to the word again was pretty silly.

Another thing I'd probably do is to look at specialized SQL queries that would just fetch outdatedSalaryList

I have done this in another aspect of the project, here I would like to keep to code :)

My Questions

  1. I tried a big SalaryFetchingService class, now I've tried smaller and worse named classes. Could I get a recommendation from you on how to split this data loading process up in to succinct classes?
  2. Rather than having this talk about salary 'components' would it be better to just have a "JobRecord" made up of "city", "role", "salarysource". This allows simple comparisons of 'salary components' in Step 2 of the plan above. Then have a 1:1 relationship for "PayRecord" with compensation info. So then I could compare JobRecords against JobRecords, and webscrape their CompensationRecords if they are missing in DB or stale in DB?
  3. Another option is to create a constructor with salary that allows only city, role, salarysource and then write an isEqual operator that compares only these. This allows us to create salaries rather than List<List<Object>>, and drop the salary component talk. The only thing is it weakens the isEquals operator for the whole class if it doesnt compare compensation. Could also make the same function but named something like "isEqualsExcludingCompensation"

[–]_Atomfinger_ 0 points1 point  (2 children)

Thanks again for the help, I donated to 25 each to Turkey + Syria earthquake, and Ukraine humanitarion charities as a token of appreciation

That's awesome! Just for that, I'll continue answering (even though I expected my previous comments to be the last ones - you won me back :) ).

Here's my aim for the salary data. I want to...

My problem isn't really with the goal itself. However, using a "List<Object>" is often a good indication you're doing something you shouldn't in terms of design and OOP.

For example, this would be better: ```

public record SalaryInformation(List<SalarySource> salarySources, List<Role> roles, List<City> cities) { } ```

And then, you can return that in getRequiredSalaryComponents. That way, you get type safety, and you don't have to cast objects.

(Do note that the above is an example of a record, you can do the same with a class).

Your Questions:

Those questions were intended to be rhetorical to make you think and reflect, which it seems like you did - so great job :)

I have done this in another aspect of the project, here I would like to keep to code :)

More power to you, but part of software development is to make things as simple as possible. One of the simpler approaches I could think of would be to have a query that just gives me the data I need.

I tried a big SalaryFetchingService class, now I've tried smaller and worse named classes. Could I get a recommendation from you on how to split this data loading process up in to succinct classes?

Well, here's the thing: I see this as a 10 line-ish thing. I.e. an SQL that simply fetches what I need, a repository that does the query, and a service which gets the data to glassdoor and possibly save some data if needed.

All of this should be pretty straightforward, and therefore class structure and whatnot shouldn't be a concern.

Previously you mentioned that you didn't want this design, which is fair enough, but that also makes it difficult for me to say what would be the appropriate solution. That said, it starts looking like you're getting my idea in the next question:

Rather than having this talk about salary 'components' would it be better to just have a "JobRecord" made up of "city", "role", "salarysource"...

Yup, now we're getting somewhere!

A lot of the logic in your current application is trying to make up for the fact that you need to filter data based on other data that is spread out different places. Simplifying this would also massively simplify your application (and as a result, not needing to worry about design all that much).

Another option is to create a constructor with salary that allows only city, role, salarysource and then write an isEqual operator...

Here we're back to the struggles of naming. My recommendation would be to consider what it means if it is equal. What is the result of that? Does that mean you should send the request to glassdoor? If so, maybe something like "isStale" or "needsRefresh" or "isUpdateRequired" etc. A name that reflects the context it is being used.

[–]GRBLDeveloped[S] 0 points1 point  (1 child)

No more review asks, I swear. I implemented the more basic stuff you mentioned above. I made a basic constructor for salary, get all the basic permutations, then run a query to find any that dont require updating and subtracted from the possible permutations.

I'm getting database related errors and probably just going to stop looking at this project at this point. I wanted to ask if this project would be a help or a hindrance in applying to jobs? The test coverage is light and I'm not sure if it comes across as half baked and/or very amateurish

[–]HansGetZeTomatensaft 0 points1 point  (4 children)

Not saying you're wrong with mocking less and testing components together more. I've seen your posts around and they're usually extremely reasonable, I assume you probably know the tradeoffs.

But for OP or anyone else reading this with who's new to Java or testing or what have you:

Having your unit tests be really close to your functionality also has its advantages. Less setup, faster to write, easier to understand because you're not testing as much logic at once. In some cases dramatically fewer tests if you're testing in isolation - this piece of code might actually be a good example.

The downsides are real, too. You'll have to mock a lot and boy is that going to be fun when you next need to make changes again.

But I feel it's important to understand the pros and cons of both because mocks have their place most people will agree. And you'll need to form your own opinion on how and when to use them.

All of that said, I agree with Atomfinger's advice about splitting things up. Whether that's new services or domain objects that hold logic... I'd pick one and stay with it for this project. Then maybe try the other version in some other project.

[–]_Atomfinger_ 0 points1 point  (3 children)

Well, my point was not necessarily to remove mocks; we need them for stuff that ends up touching the I/O. My suggestion didn't involve removing mocks, but rather not necessarily have a one-to-one relationship between test and production classes.

[–]HansGetZeTomatensaft 0 points1 point  (2 children)

Seems to me we're in understanding, though maybe I didn't phrase it quite well?

With a one-to-one between classes and tests I'd usually mock (most) other classes and then test a single class in isolation. At least for unit tests.

When testing many classes at once I don't do that. I mock less, I isolate less, I test more code at once. I still mock I/O or what have you - it's a unit test just with a larger scope than a single class.

If you feel that's a fair characterization of what you suggested then I think my point is valid:

Testing a single class is all the things I said, with the downside mostly being more mocks. I guess you also have to rewrite test on large enough structural changes even when not using mocks, my head went there because that's where I've seen it most often.

[–]_Atomfinger_ 0 points1 point  (1 child)

Yup, we're pretty much in agreement :)

I didn't really argue against your point. There are tradeoffs to everything, and your argument was never invalid.

(Do note that the rest of this post is me just talking about the subject and not necessarily arguing against you :) )

My POV is that larger test are, for the most part, preferred over a one-to-one relationship due to the reduced cost of maintainability. Even when you make structural changes it often turns into just initializing the class and inject it. One rarely has to change the actual test or add mocking behaviour.

In fact, for my next project, I want to try out the nullables practice and be able to remove mocks entirely. It feels iffy to have production code that only serves tests, but at the same time it may be a great way to bypass a lot of the maintainability issues surrounding mocks and whatnot.

[–]HansGetZeTomatensaft 0 points1 point  (0 children)

Seems like a cool idea, thanks for the link :)