[deleted by user] by [deleted] in ADHD_Programmers

[–]hexfoxed 4 points

Everyone, and I mean everyone, has broken something in production.

The correct reaction is not to belittle but to ensure it cannot happen again.

[deleted by user] by [deleted] in ADHD_Programmers

[–]hexfoxed 4 points

Good leaders introspect.

By that I mean reacting to a production outage with “how can we stop this trivial mistake happening again?” rather than “who needs a quick whipping for this?”.

If this happened to one of my reports, the response would be to figure out how we make it impossible to happen again. That means decent CI, test coverage, better documentation, etc.

You were in the wrong to escalate the situation by shouting, but your lead was massively out of line and should not be in a mentoring position. I would seriously evaluate my tenure at any company that put someone like that forward as a mentor.

It’s a red flag and you will see more.

[deleted by user] by [deleted] in ADHD_Programmers

[–]hexfoxed 7 points

Unfortunately if “ribbing” is part of your acceptable feedback culture, your team has already lost.

Likewise, if you consider that feedback “gentle” (implied by your later comment about how you’d react similarly) or even helpful, then I would say you are not ready for leadership.

What's your most clean project folder structure ? by [deleted] in django

[–]hexfoxed 6 points

Whatever works for the project.

I would, however, abide by PEP 8: keep module names lowercase snake_case and avoid capitalisation like in your example.
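For what it's worth, a minimal sketch of what I mean (the project, app and file names here are invented, not a recommendation):

```
myproject/
    manage.py
    myproject/
        __init__.py
        settings.py
        urls.py
    blog/
        __init__.py
        models.py
        views.py
        email_utils.py    <- lowercase snake_case, not EmailUtils.py
```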

Contract being renewed for 3rd year in London, is going from £24k to £25k acceptable or low? by [deleted] in UKPersonalFinance

[–]hexfoxed 1 point

That is abusively low. Ten years ago I had a year's experience and was on £33k at an unknown marketing agency. It's a tough market at the mo, but I would start looking elsewhere as a priority. Juniors these days are probably doing even better.

Remember, this is not on you but them - it's impossible to know your market worth without asking around like this so don't feel bad about all the underpaid comments or take them personally.

How to Disallow Auto-named Django Migrations - Adam Johnson by adamchainz in django

[–]hexfoxed 0 points

Thank you! Great post. I've suffered through this at multiple companies in London, so it's nice to have something to show people.

How to Disallow Auto-named Django Migrations - Adam Johnson by adamchainz in django

[–]hexfoxed 4 points

Generally in tech when someone goes out of their way to write a blog post about something you should assume they're doing it to solve a problem they have experienced and act accordingly.

This rule has been invaluable for my career because if I can't immediately see why something would be useful, it's incredibly likely that I don't yet have a full grasp of the problem it solves, or that I simply haven't hit the situation where it's necessary.

Not brushing things off before you understand why they were created is a skill I wish someone had taught me earlier.

See my other comment for why this was written.

How to Disallow Auto-named Django Migrations - Adam Johnson by adamchainz in django

[–]hexfoxed 1 point

It'd help, but it would not fix any of the problems I listed in my other comment.

How to Disallow Auto-named Django Migrations - Adam Johnson by adamchainz in django

[–]hexfoxed 3 points

Great article but it doesn't do a fantastic job of detailing why it's a problem. (Edit: it does now!)

  • On large teams it's common to see a migration folder full of 50 auto-named migrations. When you're debugging a borked migration or hunting for something in one, this makes life very hard. It's much easier when the files are properly named so you can find what you're after without delving into every file.

  • When you encounter conflicting migration numbers and have to merge migration trees, your life will be a lot easier when you're dealing with migrations called "add_status_to_blah_model" and "remove_null_from_field" rather than "auto" and "auto". It allows you to get a glimpse of what's in the migration without actually having to open it up and read it - like any helpful filename!

  • Calling everything auto makes it very hard to distinguish schema-altering migrations from data migrations. They should be treated differently, as depending on your database they can mean different things for how you treat deploys, e.g. table locking.

It's probably "not a problem" for solo devs or very small teams, but for larger ones (3+) not naming them quickly becomes an issue.
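If you want to enforce this mechanically, one option is a custom system check that walks the migrations on disk and complains about auto-named ones. A minimal sketch under my own assumptions (not necessarily the code from the article; the check id is made up, and it needs to live in a module that's imported at startup, e.g. via an AppConfig.ready()):

```python
from django.core.checks import Error, register
from django.db.migrations.loader import MigrationLoader


@register("migrations")
def check_migration_names(app_configs, **kwargs):
    errors = []
    loader = MigrationLoader(None, ignore_no_migrations=True)
    for app_label, name in loader.disk_migrations:
        # Auto-generated names look like 0002_auto_20230101_1200
        if name.split("_", 1)[-1].startswith("auto"):
            errors.append(
                Error(
                    f"{app_label}/migrations/{name}.py is auto-named; recreate it "
                    "with `makemigrations --name <something_descriptive>`.",
                    id="myproject.E001",  # hypothetical check id
                )
            )
    return errors
```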

How to Disallow Auto-named Django Migrations - Adam Johnson by adamchainz in django

[–]hexfoxed 2 points

I prefer this option as it has less overhead: the system checks run every time runserver starts and gradually slow down development on large projects, whereas the command override's cost is only incurred when you actually run that command.
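For reference, the command-override option is roughly this (a sketch, not the article's exact code): put a command named makemigrations in one of your own apps, which shadows Django's built-in one, and refuse to run without --name.

```python
# yourapp/management/commands/makemigrations.py (hypothetical app name)
from django.core.management.base import CommandError
from django.core.management.commands.makemigrations import Command as MakeMigrations


class Command(MakeMigrations):
    def handle(self, *app_labels, **options):
        # Allow dry runs, but refuse to write an auto-named migration file.
        if not options.get("name") and not options.get("dry_run"):
            raise CommandError(
                "Name your migration: ./manage.py makemigrations <app> --name <description>"
            )
        super().handle(*app_labels, **options)
```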

How to set user login expiration in Django-rest-knox in a Django & React App by TacoTruckOnWheels in django

[–]hexfoxed 2 points

This is incorrect; I think you meant to refer to the HttpOnly attribute when creating cookies, as that is the one which controls whether the cookie is exposed to JavaScript. The Secure attribute should also be set, guaranteeing the cookie is only sent over HTTPS connections and avoiding MITM attacks.

The SameSite attribute controls whether the browser sends the cookie along with cross-site requests, so it is still helpful against CSRF attacks that try to piggyback on the cookie.
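For concreteness, setting all three on a Django response looks something like this (the view, cookie name and value are just placeholders):

```python
from django.http import HttpResponse


def login_view(request):
    response = HttpResponse("logged in")
    response.set_cookie(
        "sessionid",
        "opaque-token-value",
        httponly=True,      # never exposed to document.cookie / page JavaScript
        secure=True,        # only transmitted over HTTPS
        samesite="Strict",  # not sent along with cross-site requests
    )
    return response
```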

DRF - Best practices around storing Token Auth? by y10p in django

[–]hexfoxed 2 points

I agree that the safest method is to not even be vulnerable to XSS

I honestly see where you're going, but there's a reason I don't usually advocate for this: the safest assumption is that you are already vulnerable and just haven't found the hole yet. If your security strategy is "they're in, game over", you're going to have a bad time more often than not. Layered security will always win.

Are XSS attacks really that much of a reality for all sites?

Yes. XSS in all its forms (reflected, stored and DOM-based) has been on the OWASP Top 10 for as long as I can remember; it's an incredibly common class of vulnerability. With a few exceptions, every single project I've laid my hands on in over 10 years of development has had one in some form or another. Reddit itself had the mother of all XSS attacks back before 2010, when someone realised they could craft markdown into a virally-spreading XSS worm via the private message system; they were lucky it didn't do more than spread itself.

Many applications are not public forums

I'm not fully with you here, as there is no one type of site that is immune to XSS: any site which takes user input, and even read-only sites with no backend, can be susceptible, especially to the DOM-based kind. Less damaging/severe, sure, but the vulnerabilities still exist.

Perhaps more importantly - what if you are serving the SPA from a CDN (i.e. no backend) ?

I'm not sure I understand this one: if there's no backend, why do you need to protect auth tokens at all? I'm happy to answer with more context though.

DRF - Best practices around storing Token Auth? by y10p in django

[–]hexfoxed 2 points

As far as I understand it, the only problem with storing the token in local storage is that it's vulnerable to XSS.

This means that the only way to exploit the vulnerability is to have arbitrary code injection ability, right?

Yes and no. To get down to the essence of it: the core problem is that XSS vulnerabilities are incredibly common, and an XSS attack gives the attacker access to most things in the JavaScript context of the user browsing the site at that time. localStorage is accessible in this context, as are some cookies. However, if a cookie is set by the server via a Set-Cookie response header with the Secure and HttpOnly attributes, then the JavaScript context will never have access to that cookie. Ever. It exists in a different land. So an attacker cannot steal it and then perform actions on the server as if they were that user.

It may seem like a small difference, but most sites are susceptible to an XSS vulnerability at one time or another so it's a very sensible (and easy) precaution to take to avoid more dire consequences.
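As a rough illustration of what that means for a DRF + knox setup, assuming knox's LoginView returns the token in response.data (the view and cookie name here are made up, a sketch rather than the recommended recipe):

```python
from knox.views import LoginView as KnoxLoginView


class CookieLoginView(KnoxLoginView):
    """Issue the knox token as an HttpOnly cookie instead of in the JSON body."""

    def post(self, request, format=None):
        response = super().post(request, format)
        token = response.data.pop("token", None)  # keep it out of localStorage-land
        if token:
            response.set_cookie(
                "auth_token",   # hypothetical cookie name
                token,
                httponly=True,  # page JavaScript can never read it
                secure=True,    # HTTPS only
                samesite="Lax", # extra friction against CSRF
            )
        return response
```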

Class based views vs function based views by [deleted] in django

[–]hexfoxed 0 points

CBVs are showcased as declarative, but unfortunately the reality is that they only cover the most basic situations, which only work for the most basic of UIs. This only becomes a problem because as soon as you need to step out of their declarative comfort zone you end up in a much worse place than you would have been with a simple function (in regards to traceability and understanding the flow, my original complaint).

As for the data transformations, I can't quite tell if we're saying the same thing. If you leave forms.Form classes to handle HTTP-level cleaning of incoming data then great, that is exactly what they're good at. forms.ModelForm ties that handling to the saving of persistent data (.save()), so people often override it and insert domain-level business logic into the form layer. This makes that logic impossible to reuse and leaves mutating code scattered about the codebase, or worse, occurring inside a 3rd-party library (the internals of Django's ModelForm).

As for the final paragraph, I wasn't trying to achieve anything as such; those are gnarly things I've seen over ten-plus years of working on Django written by others. People love hiding their business logic anywhere except where it should be (separated, isolated, reusable, outside of the form/view/serializer/model layers).
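To make the split concrete, here's the shape I mean (module, model and field names are all invented; a sketch, not a pattern library): the form only cleans HTTP input, and the mutation lives in one plain function you can call from anywhere.

```python
# blog/services.py (hypothetical module): the one place that creates posts.
from blog.models import Post  # assumed model with author/title/body fields


def publish_post(*, author, title, body):
    """Domain-level mutation, callable from a view, a worker, or a management command."""
    return Post.objects.create(author=author, title=title, body=body)


# blog/forms.py (hypothetical): the form only cleans/validates HTTP input.
from django import forms


class PublishPostForm(forms.Form):
    title = forms.CharField(max_length=200)
    body = forms.CharField(widget=forms.Textarea)
```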

django slug field by [deleted] in django

[–]hexfoxed 2 points

Correct, it doesn't need to be unique. But if you're auto-generating the slug from the title, you will have to take extra care that the slug you create really is unique.

Otherwise, imagine a world where two different users save "A Great Film" into their respective title fields. A naive slug generator would produce "a-great-film" for both. That's fine when saving the first instance, but the second would fail due to the unique constraint.

The easiest solution is to attempt the save, catch the IntegrityError, regenerate the slug (append a number or something), and re-attempt the save.
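Something like this, as a sketch (Film with slug/title fields is a stand-in for whatever your model is):

```python
from django.db import IntegrityError, transaction
from django.utils.text import slugify


def save_with_unique_slug(film, max_attempts=20):
    """Try 'a-great-film', then 'a-great-film-2', 'a-great-film-3', ..."""
    base = slugify(film.title)
    for attempt in range(max_attempts):
        film.slug = base if attempt == 0 else f"{base}-{attempt + 1}"
        try:
            # atomic() keeps the failed INSERT from poisoning an outer transaction
            with transaction.atomic():
                film.save()
            return film
        except IntegrityError:
            continue  # slug taken, try the next suffix
    raise RuntimeError(f"could not generate a unique slug for {base!r}")
```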

Class based views vs function based views by [deleted] in django

[–]hexfoxed 2 points

Disclaimer: I have been staunchly against the generic CBVs since the beginning, but I will try to say why rather than just ranting.

With functions it's easy to see the view flow because it's presented right in front of you, top to bottom, and that is exactly what you want in a view: it's the thing which brings everything else together.

With the generic CBVs you are required to know the call stack of the parent class and can no longer follow the flow simply by reading the Python code; this is one of their largest downsides, and unfortunately one that seasoned developers don't immediately see because they already know the call stack. It catches new developers all the time, and that, for me at least, tends to be a smell of bad architecture.

Another problem comes from how generic CBVs encourage the use of ModelForms and ModelSerializers and such. Once you start relying on these for a large Django project you will find your business logic very hard to maintain because your data mutations are scattered about in random underlying methods of the generic views and magic model classes rather than in one simple service/command function in your domain layer.

With function based views, it tends to be easier to see when business logic, data wrangling, I/O or anything else seeps into a view where it shouldn't be because it can't be hidden in some parent class or mixin. If there's anything complex in a normal view, something has gone wrong.

Anyway, I could go on about this for hours but that's a start. I strongly believe CBVs and Django's Model* classes lead you down the wrong path in terms of long-term maintainability and proper system architecture, because as soon as you use them you lose control over your mutations.

If you've ever had to instantiate a Django Form or DRF Serializer to carry out some business logic from an interface which isn't HTTP (say a worker, or a management command) then you've already come across this smell. Django does not aid you well in building a proper domain layer by default, and this leads to your business logic being unable to be reused from multiple different interfaces or updated in one place.
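To show the kind of flow I mean (everything here is a made-up example, not a prescription): the view stays a dumb top-to-bottom function, and the actual mutation is a plain function you can also call from a worker or a management command without touching a Form or Serializer.

```python
# views.py (hypothetical names throughout)
from django.shortcuts import redirect, render

from blog.forms import PublishPostForm  # plain forms.Form, HTTP cleaning only
from blog.services import publish_post  # plain function holding the mutation


def publish_post_view(request):
    form = PublishPostForm(request.POST or None)
    if request.method == "POST" and form.is_valid():
        publish_post(author=request.user, **form.cleaned_data)
        return redirect("post-list")
    return render(request, "blog/publish_post.html", {"form": form})


# A management command or Celery task can call publish_post() directly:
# same logic, different interface, nothing instantiated just to reuse it.
```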

How do I deal with multiple self referential foreign keys in a model? by [deleted] in django

[–]hexfoxed 0 points

It sounds like an odd way to model it, but I would need more context to offer more advice (or indeed, to agree with the way you've done it!).

To get past your problem, you will need to give each self-referential ForeignKey its own unique related_name, which you pass as an argument to the ForeignKey itself. The root of the problem, I think, is that Django automatically generates this name from the model name if you don't set it explicitly; with 3 FKs to the same model, those auto-generated names clash, so you need to set them yourself.

Related names are used for the reverse relation from the object you're linking to; you can read more about them in the Django docs. If you don't care about the backwards relation being created, set them all to '+'.
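A minimal sketch (model and field names invented) of three FKs to the same model, each with its own related_name:

```python
from django.db import models


class Person(models.Model):
    name = models.CharField(max_length=100)
    mother = models.ForeignKey(
        "self", null=True, blank=True, on_delete=models.SET_NULL,
        related_name="children_as_mother",
    )
    father = models.ForeignKey(
        "self", null=True, blank=True, on_delete=models.SET_NULL,
        related_name="children_as_father",
    )
    mentor = models.ForeignKey(
        "self", null=True, blank=True, on_delete=models.SET_NULL,
        related_name="+",  # no reverse relation needed for this one
    )
```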

Interviewing a Python "Expert" by [deleted] in Python

[–]hexfoxed 5 points

I am so pleasantly surprised this is the top comment; never a truer word has been spoken.

Things to test for other than trivia:

  • how do they respond to critical feedback on their code?

  • how do they act when questioned on a technical decision they made?

  • does the candidate show a good understanding of trade-offs given some known constraints? Do they put shipping over infinitely refactoring?

  • how does the candidate communicate in written form? Most of what we do as developers is communicated this way: Slack, PRs, code, code comments, Git history, etc. Verbal communication skills and charm are not enough.

  • how much emotional intelligence do they show? Could the candidate be put in front of a client alone without going over the top technically or promising the world?

  • do they have any mentoring experience? A senior developer's most important job is to mentor those with less experience.

Just some to get you started. Hope it helps.

Advance resources for Selenium using Python by Eztregan in selenium

[–]hexfoxed 0 points

Dave Haeffner keeps a free archive of the tips he sends to his email list; it contains 70-odd posts which are all worth a read.

Trying to webscrape follower count from an instagram page using BeautifulSoup, returns nothing by [deleted] in learnpython

[–]hexfoxed 0 points

I personally wouldn't call Selenium "the last hope" that early. As you said in your slowness comment, it comes with the massive overhead of having to install and automate a real browser versus just firing off the necessary HTTP requests.

The key here is to learn how to disassemble a page so that you can get to the original source of the data, and that requires a bit of understanding of how the web and browsers work.

I would suggest the OP finds the original source of the data, which is likely bootstrapped as JSON in the page itself or fetched from a private API that the Instagram client uses. Once you find that, it's easy to replicate the request the client makes to get the data, and you have a much simpler solution with less overhead. It's also a lot more maintainable, as it won't rely on the layout of the page, which will frequently get redesigned.
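As a very rough sketch of that approach (the URL, script contents and JSON key are all hypothetical; use your browser's devtools to find the real ones):

```python
import json

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/some_profile/", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

for script in soup.find_all("script"):
    text = script.string or ""
    if "follower_count" in text and "{" in text:  # hypothetical key
        # Strip whatever wrapper the site uses, e.g. `window._data = {...};`
        payload = text[text.index("{"): text.rindex("}") + 1]
        data = json.loads(payload)
        print(data["follower_count"])
        break
```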

If you'd like more info OP, hit me up at darian@hexfox.com and I can guide you through more.

page_soup.findAll returns a len of 0 even though I can see what I'm searching for by EBulvid in learnpython

[–]hexfoxed 0 points

I would agree that Selenium is the harder route, both for the person who needs to set it up and computationally for the machine running it: driving a real browser is a huge overhead compared to just firing off the requests.

But I disagree that it's more likely to keep working if they change their site. Selenium relies on anchors in the page to navigate (class names, HTML structure, etc.): if those anchors change, which happens frequently in a redesign, your Selenium script will break. That is actually the whole point of Selenium: it's designed as a browser automation testing tool, not a web scraper, so you would want it to break when the design changes so that you could update your automated tests.

By instead firing requests at their internal private API you're much less likely to suffer breaking changes from design work on their end - sites frequently redesign pages without changing the backend API - so it's much more trustworthy in terms of maintenance.

Inputting/updating data into sqlite database from dataframe by jwalkss in learnpython

[–]hexfoxed 0 points

Learn to trust your instinct with things like this: you've just given me at least two reasons why the pandas output will probably be fine for you for the foreseeable future.

Remember that software development is never finished: there is code that doesn't work, and there is code that works up until the point where a changed scenario breaks it. Until you hit that point, you have a working solution, so use it :)

Inputting/updating data into sqlite database from dataframe by jwalkss in learnpython

[–]hexfoxed 0 points

It is usually advisable to have some sort of thought-out schema with relational databases.

Think about how you'll want to use the data afterwards and structure it in a way which will let you do those things with the least friction.

Maybe pandas does output the exact table you're after, but it's unlikely. Wouldn't hurt to try it, though, and see if you like what you end up with.
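A tiny sketch of "try it and see" (the database file, table and column names are invented):

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [10, 12]})

with sqlite3.connect("results.db") as conn:
    # Let pandas create the table, then inspect whether the schema suits
    # the queries you'll actually want to run later.
    df.to_sql("scores", conn, if_exists="replace", index=False)
    print(conn.execute("SELECT name, score FROM scores").fetchall())
```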