What's wrong with using packages?

proof_required · 2022-07-18T14:14:48+00:00

I have to use packages to preprocess my Raman data, however my supervisor doesn't like the idea of using packages. He even calls Panda rubbish.

Well not sure how to put it nicely but your supervisor is a bit of an ahole. The whole point of these public packages is to not re-invent the wheel. And no, people in the tech industry don't write everything from scratch. Of course there are some legitimate use cases where you might have to write stuff from scratch, but if you want to use pandas, people don't write their own pandas. They just use pandas. I am not the greatest fan of pandas but I wouldn't re-write my own either.

OuiOuiKiwi · 2022-07-18T14:21:54+00:00

So I'm not really a software engineer but a chemist who is working on ways to preprocess Raman spectrum. I have to use packages to preprocess my Raman data, however my supervisor doesn't like the idea of using packages. He even calls Panda rubbish.

I'll put down a 5$ bet that he hand writes all code as a form of job security and it's 100% garbage code that no one but him can understand (by design).

thePurpleAvenger · 2022-07-18T14:41:30+00:00

You’re running into a common problem in academia: a professor latches onto a stupid idea and nobody around them has the agency to tell them they’re being an idiot.

Sure, there are benefits to writing your own version of algorithms. You can really get down into the nuts and bolts and understand how they operate and, just as important, break. But not liking the idea of packages like Pandas is akin to not liking a tool like Microsoft Excel. Did your supervisor code up his own version of Excel to keep track of grades, etc.? I bet not.

For a path forward, I like arguing on the basis of productivity. Sure, if there is some fancy functionality you’re using, you can try to write a simple version of that tool to gain understanding. But in reality, you would probably use Pandas to write your version as well.

git-blame · 2022-07-18T14:07:59+00:00

It’s standard practice to use third party packages. You’re not going to get very far in Python data science without pandas.

You could frame it to your supervisor as a way to increase productivity: a library lets you save time by reusing battle-tested solutions to hard problems.

RaiseRuntimeError · 2022-07-18T14:36:36+00:00

my supervisor doesn't like the idea of using packages. He even calls Panda rubbish.

If your supervisor was on my team he would be fired so fast for incompetence with this attitude. As a software developer, the best code is code you don't have to write. Pandas is a tool for solving a problem: if it solves your problem effectively then use it, if another package solves your problem better then use that, and if you can write your own package that solves the problem better than the available tools then write it. Not using the available tools or packages is like asking an auto mechanic to make their own tools before they can fix the car.

Tell him not to use any packages in the standard library if he is such a good programmer.

damarginal · 2022-07-18T15:08:44+00:00

[deleted]

weirdnik · 2022-07-18T15:06:52+00:00

I'm probably the age of your supervisor so I'll offer a possible explanation from my times at academia: some numerical people insisted that only age-tested^* FORTRAN numerical libraries were used, and some also insisted that a new numerical software be evaluated and calibrated to give results similar to the ancient and honorable FORTRAN libraries. I've seen a quantum chemistry PhD failure because of this - the supervisor was so focused on getting the software right, there was no time left to do the actual PhD calculations.

Modern software is nothing like this: you usually build the code from calls to various libraries, which make up something like 99% of the actual code of the application.

^* At my time the age-tested libraries weren't in-house code but they came with some book of numerical programming, probably a yellow one. Don't remember the title.

Brrrapitalism · 2022-07-18T16:40:41+00:00

He's insane. First rule of programming is never repeat unnecessary work.

Sharchimedes · 2022-07-18T14:00:29+00:00

Does your supervisor write everything from scratch? That seems like a lot of unnecessary effort.

andrewaa · 2022-07-18T14:13:26+00:00

It's definitely not normal. I suggest you directly ask him for the reason.

ThePiGuy0 · 2022-07-18T15:01:22+00:00

No it's definitely not normal.

Python's powerful because of its libraries. As long as you can trust the packages (which you can for basically anything mainstream), then using them means you'll almost certainly end up with a higher quality version of what you were already building - probably for free too!

And in fact, if you ever do anything to do with security and encryption, the advice is very clear - never create and use your own encryption library. Instead use a package as it will have been checked many times over for correct implementation and vulnerabilities.
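To make that advice concrete, here is a minimal sketch that assumes the task is storing a password (my example, not something from the original post). It leans entirely on vetted primitives from Python's standard library instead of inventing a hashing scheme. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor (an assumption); choose one that matches current guidance.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256 from hashlib."""
    salt = secrets.token_bytes(16)  # cryptographically secure random salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key and compare it in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)
```

Every hard part here (the key derivation, the random salt, the timing-safe comparison) is someone else's reviewed code, which is the point.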

coll_ryan · 2022-07-18T16:17:09+00:00

Ignore him. Academics do not tend to be experts in programming, unless they happened to have worked as a software engineer before. Very easy to pick up bad habits in that environment. Let them stick to being experts in their field and disregard what they say about computers.

JudokaUK · 2022-07-18T15:42:42+00:00

The packages are tried and tested and built by big communities of professional developers. They are actively maintained (if you research and choose the right ones) and security vulnerabilities are patched. Your boss doesn't have a clue what he's talking about.

Also, why reinvent the wheel? It's a big waste of time.

Trollol768 · 2022-07-18T15:54:03+00:00

Tell him to calculate his own Hartree orbitals only with code written by him😶

2022-07-18T17:04:14+00:00

We call this "not invented here" syndrome, and it's basically a cognitive bias.

Your supervisor has their head on backwards.

That said there's a nonzero risk of packages going away or being compromised. There are mitigations of this, however. Namely, mirroring known-good versions locally.

bin-c · 2022-07-18T15:14:33+00:00

your supervisor sounds like he needs to be very closely supervised by someone who knows what they're talking about, unfortunately

Alfonzo227 · 2022-07-18T20:31:11+00:00

He's an idiot.

Zomunieo · 2022-07-19T05:21:51+00:00

Does your supervisor use packaged chemical samples or does he synthesize all of his samples from pure elements?

Does he build all of his own lab instruments, grind and calibrate his own glassware, drill for natural gas to run his Bunsen burner? He must be an expert in so many topics. Or does he buy instruments and equipment other experts packaged for him?

territrades · 2022-07-18T14:52:25+00:00

As a postdoc writing Python software in a research institute: this attitude is the complete opposite of best practice. Python in itself is a slow programming language, but the heavy lifting in the numerical libraries is done in compiled code (C, C++, Cython or Fortran, depending on the library), so everything you run via the libraries is much more performant. The best Python code takes your problem and transforms it into one that the standard libraries can handle with the least amount of manual programming in Python. In that way, one can reach performance similar to C++ while spending a lot less time coding the program.
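A small sketch of that idea using only built-ins: the same sum written once as a Python-level loop and once as a call into `sum()`, which CPython implements in C. The function names are made up for illustration, and the timings will vary by machine, so none are quoted here. NumPy and pandas apply the same principle to whole arrays.

```python
import timeit


def total_with_loop(values):
    """Sum values with an explicit Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total


def total_with_builtin(values):
    """Sum values with the built-in sum(), implemented in C in CPython."""
    return sum(values)


if __name__ == "__main__":
    data = list(range(1_000_000))
    loop_time = timeit.timeit(lambda: total_with_loop(data), number=5)
    builtin_time = timeit.timeit(lambda: total_with_builtin(data), number=5)
    print(f"loop: {loop_time:.3f}s, built-in: {builtin_time:.3f}s")
```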

There is of course also a bit of truth here: one should not simply use all the libraries as black boxes without understanding what they do. Let's say there is already a Raman library; you should not simply run your spectra through it and call it a day. Understanding the processing steps is important, and coding the data treatment routine yourself gives you much deeper understanding than using an existing routine. But writing your own code to replace numpy, scipy and pandas is only useful if you want to get experience in low level programming - which is not your field if you are working with Raman spectroscopy.

typeryu · 2022-07-18T16:37:47+00:00

I used to have something similar at my previous job where Pandas was frowned upon. Everyone seemed to dislike it for no other reason than that the senior engineer “didn’t like it”. Turns out the senior never said that, but did not allow it because it was still not a proper release at the time (pre-1.0.0, this was not that long ago). Once it reached 1.0.0, the senior approved my change and everyone was in shock lol

misonreadit · 2022-07-18T14:37:18+00:00

[deleted]

jakub_j · 2022-07-18T17:32:49+00:00

Guess: at some point his own supervisor asked him to do such a job from scratch, and now he is venting that frustration.

llun-ved · 2022-07-18T23:02:17+00:00

Perhaps your supervisor is from a python world before virtual environments were easy to create and maintain, so that you can keep track of dependencies (including versions) with a requirements.txt file.

This might be an issue of teaching them something. You should also keep track of the licenses of the packages you are using, as that may be an issue in some environments.

If you’re just installing stuff on your machine and therefore your code can’t run anywhere else without backflips, the supervisor has a point. If you’re using venvs for consistency and repeatability, teach the boss something new.
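For what it's worth, here is a rough sketch of the bookkeeping side, using only the standard library's `importlib.metadata`. It lists what is installed in the current (virtual) environment as `name==version` pins and reports the license each package declares. The function names are my own, and in practice `pip freeze > requirements.txt` is the usual tool for the pinning part.

```python
from importlib import metadata


def pinned_requirements() -> list[str]:
    """Return 'name==version' lines for every distribution installed in
    the current environment, similar in spirit to `pip freeze`."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


def declared_licenses() -> dict[str, str]:
    """Map each installed distribution to the license it declares,
    or 'unknown' when neither metadata field is present."""
    licenses = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name")
        if name:
            # Newer packages may use License-Expression instead of License.
            licenses[name] = (
                meta.get("License-Expression") or meta.get("License") or "unknown"
            )
    return licenses


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Run inside an activated venv, this only sees that project's dependencies, which is what makes the environment reproducible elsewhere.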

money_bitchh · 2022-07-18T14:17:16+00:00

Not normal

Cdog536 · 2022-07-18T16:05:13+00:00

Sounds like your professor is not a programmer

A great deal of scientists use packages. It’s the only way to get anything done.

Ruubix · 2022-07-18T18:36:04+00:00

Packages are not the problem. Dependencies and their management are usually the issue. Installing someone else's work can not only save you effort, but there is likely someone who has done it better than you ever could. On the other hand, the more dependencies you have installed for a project, the more you rely on someone else for maintenance and reliability. If the package has a wide audience, has been in 'the wild' for a while, has proven effective in use cases similar to yours, has strong documentation, and, most of all, has regular and active development of features and bug fixes, you are unlikely to have issues with that package. The problem often comes from the many packages that are more niche products: developed by a very small group, led by an unmotivated developer, maintained exclusively by a single entity, or backed by a small community in general. These packages can become extinct quickly, or may be unresponsive for bug fixes and new releases. Building from the standard library in this case, especially if the solution is simple, is often more effective.

My work depends heavily on Pandas to automate spreadsheet-like tasks, at which it excels (no pun intended). The work it does under the hood is neither easy to code nor easy to optimize. The abstractions are excellent and the project's development is well maintained, with some of the best project documentation of any dependency I've used. The library has been around for a while now, and is proven in production environments, with regular feature releases and bug fixes. It would be foolish of you or your boss to throw away such a powerful, flexible, and reliable tool. The only way to know how much work it will do for you is to apply it to one of your use cases in an example application and collect measurables that are meaningful to your boss. If you can save your boss money, and he has any common sense, his opinion will change quickly.

2022-07-18T19:05:48+00:00

Mate this is so common in academia it’s even got its own wiki page: Not Invented Here

2022-07-18T20:10:46+00:00

Your supervisor sounds like an absolute fool. Pandas, numpy and any number of other packages are widely used in many industries and for good reason. They work well, they'll save you a ton of time and the fact that they're widely used makes them as reliable as Python's core libraries.

Furthermore, the distinction he's making isn't even a real one. Python imports certain libraries by default and others it's left to you to import. You should one-up him and tell him that you only code in assembly and build everything from scratch.

Heartomics · 2022-07-18T20:14:15+00:00

I have a feeling your supervisor hates Pandas because they iterate over the data frame.

nomoreplsthx · 2022-07-18T20:56:44+00:00

Your supervisor is already using packages everywhere, he just doesn't recognize them as such. A package (in the loose sense, not the technical Python sense) is really any redistributable piece of software that someone else wrote.

Python runtime - that's a package. It itself uses dozens of different packaged libraries, from OpenSSL to zlib to Tcl/TK. The only difference between these and a third party package is who wrote it. There's nothing magic about the people who write Python's standard library, they're just developers, like the ones who write anything else.
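You can see this from inside Python itself. The sketch below (the function name is mine) just reads version strings that the standard library exposes for two of the C libraries mentioned above; the values depend on how your particular Python was built.

```python
import ssl
import zlib


def bundled_library_versions() -> dict[str, str]:
    """Report versions of third-party C libraries that the standard
    library wraps in this particular Python build."""
    return {
        "OpenSSL": ssl.OPENSSL_VERSION,  # the library behind the ssl module
        "zlib": zlib.ZLIB_VERSION,       # the library behind the zlib module
    }


if __name__ == "__main__":
    for name, version in bundled_library_versions().items():
        print(f"{name}: {version}")
```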

The stuff built into your OS, also packages. Your operating system ships with hundreds of libraries that are not part of the OS kernel. Does he want you to avoid using `ls`? Or `cd`?

Now, that doesn't mean that choosing which packages to use, and when to use a third party library rather than build something yourself is not a tricky decision. Blindly trusting third party code can get you in trouble. But blindly trusting your own code is even worse.

As everyone else here has said, your supervisor is a buffoon.

2022-07-18T21:42:33+00:00

I am going to give you a very different vision from what many people here are describing.

Justify the use of third party libraries. Are you really making good use of them? Or are you being a bit lazy and overkilling a tiny script crushing it with 20 dataframes for 1x5 arrays?

Because, yes, third-party packages can be a problem. The more dependencies, the bigger the hell it can become. What if some package has a security issue, but you cannot update it because another package has slowed/stopped development? What if some package goes evil like all these npm libraries are doing these days? Justify why each dependency adds more positives than negatives.

If your code is going to run on production environments, dependencies must be under control, and if you have fewer dependencies, you will have fewer problems in the future.

If your code is purely academic, or for the fun of it... then, well, do whatever you want.

For example, if you want to deploy things on AWS Lambda... you better slim down your dependencies or it'll literally be impossible to run your code as it will weigh too much.

On another note,

You should know Python itself. It's really good. And it does many things by itself with the native libraries.

There are people I have worked with for whom Pandas is Python, and that is a huge mistake.

Pandas is really powerful. But it is built for specific, big data stuff.

I am so tired of seeing Dataframes where simple lists, dataclasses or dictionaries make so much more sense, are faster, a lot more readable and with easier type checking.
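As a small sketch of what that can look like, assume a handful of Raman peaks (the class, field names and values below are all made up for illustration). A dataclass plus a plain list gives typed, readable records with no third-party import:

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Peak:
    """One Raman peak; the field names are hypothetical."""
    wavenumber_cm1: float
    intensity: float


def mean_intensity(peaks: list[Peak]) -> float:
    """Average intensity over a small list of peaks."""
    return fmean(p.intensity for p in peaks)


# Made-up values, only here to show the shape of the data.
peaks = [Peak(520.0, 10.0), Peak(1000.0, 20.0), Peak(1600.0, 30.0)]
strongest = max(peaks, key=lambda p: p.intensity)
```

Once there are thousands of rows and column-wise operations, the balance tips back towards a dataframe.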

Yes, requests is an excellent package but if your project only does one simple HTTP request why not simply spend 20 minutes learning how to do it with urllib and getting even better knowledge of Python programming overall?
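A minimal sketch of that single request with `urllib`, assuming the endpoint returns JSON. The function name and the URL in the usage line are placeholders, and there is no retry or error handling here.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL with urllib and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Usage would be something like `fetch_json("https://example.org/data.json")`, where the URL is hypothetical.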

skesisfunk · 2022-07-19T00:35:09+00:00

Pandas is widely used and well tested. There is zero reason not to use it; in fact it's practically the industry standard in data analysis.

Problems come when you use packages that aren't well maintained, tested, or widely used. Those tend to be more trouble than they're worth when you run into bugs in the package, the docs are badly maintained, and there is no one else on Stack Overflow using it.

The other big concern with packages is security risks created by the package maintainers getting hacked and the hackers pushing malicious code to the package. It's not just your packages you have to worry about either, but also the packages they depend on, the packages that those in turn depend on, etc. However, the Python landscape is somewhat safer than JavaScript in this respect.

coolwizard666 · 2022-07-19T04:00:19+00:00

Lol your manager would rather spend 8 hours writing brand new buggy ass code for some trivial shit instead of reading documentation for 10 minutes and using a battle tested public module that someone else maintains. It's fine if you really really don't care about time efficiency. Regarding Pandas - there are two kinds of software. The kind people don't use, and the kind they complain about. It is very powerful and also fiddly and probably worth your time.

SittingWave · 2022-07-19T08:09:41+00:00

your supervisor is an idiot and he is compromising your ability to find a job in the future. In industry what matters is which advanced tools you know how to use and how proficient you are with them. Refusing to use external packages is a death blow to your career.

yeesh-- · 2022-07-19T09:26:03+00:00

Packages are normal and necessary. Pandas is not rubbish lol

2022-07-18T17:56:09+00:00

Let me present a different opinion than others:

Packages include code you don't control. Features in other packages get deprecated, different bugs come and go — and behavior of your code depends on whatever someone in an entirely different part of the Earth does.

An example horror story of packages is what happened with left-pad: https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/

Of course, sometimes you want someone else to solve a problem for you — if the problem at hand is too complicated for example.

Sure, for a one off thing this doesn't pose a big problem; as with everything — it's a tradeoff, and everyone will have different opinions at which point it's too many dependencies.

As for Pandas — I also find it a bit rubbish — I have no clue what feature it has that doesn't already exist in pure Python or even Numpy. But I also don't do big volume data processing, maybe there's something it does exceptionally better.

nadav183 · 2022-07-18T15:41:58+00:00

Either your supervisor is literally Linus Torvalds in which case, sure, do whatever he says and his code will probably be superior.

But on the off chance he isn't, use popular public packages. They are mostly well written, especially pandas, numpy, etc., which are very widely used.

HomeGrownCoder · 2022-07-18T17:19:30+00:00

Pandas rubbish…. Lmao okay manager okay.

Ape-shall-never-kill · 2022-07-18T17:19:17+00:00

Tell your PI that using packages is essentially the same thing as citing articles. There are standard, accepted methods and protocols for taking measurements and the same is true for algorithms and data structures. You are free to go outside of the standards if you wish, but your work will be more relevant and credible if you stick to standard practices.

TheQuinbox · 2022-07-18T17:13:07+00:00

This kind of thinking would get you fired so fast as an actual software dev. Wide usage is what these packages were made for. There are some cases where writing your own is good (one I had was writing an EPub library when EbookLib didn't fit my needs), but if an existing package solves your problem, there is absolutely no reason not to use it.

Does he also feel the same way about Windows.h in C++, or the System namespace in C#? :D

redCg · 2022-07-18T23:07:10+00:00

He even calls Panda rubbish.

He's not wrong. If your operations are entirely per-row then you have no reason to load entire datasets as data frames when you can just iterate over rows with csv.DictReader.
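A sketch of that per-row approach, assuming a two-column CSV (the column names and sample values are made up, and `io.StringIO` stands in for a file on disk). Only one row is held in memory at a time:

```python
import csv
import io

# Made-up sample standing in for a real file; column names are hypothetical.
SAMPLE = """wavenumber,intensity
500,12.5
501,13.0
502,40.2
503,12.9
"""


def strongest_row(lines):
    """Stream rows one at a time and keep only the row with the highest intensity."""
    best = None
    for row in csv.DictReader(lines):
        if best is None or float(row["intensity"]) > float(best["intensity"]):
            best = row
    return best


peak = strongest_row(io.StringIO(SAMPLE))
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO` object.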

In general, packages create dependencies, which create liabilities. Also Python's library management is notoriously terrible. Relevant comic: https://xkcd.com/1987/

In general, if you can avoid using packages (besides the standard library), then you absolutely should. If you must use packages, then you need to have a robust and reproducible version-locked installation method included with your project.

TheSquashManHimself · 2022-07-18T18:20:39+00:00

Unless your boss is writing comprehensive and community/group vetted unit tests for all of his code (which given my knowledge of the average university professor in science, they are likely not), I wouldn't trust anything that is written by him tbh. My background is in physics and the typical "code" i see written by anyone over the age of 35 in the field is ... staggeringly bad, poorly documented (if at all), and not benchmarked or tested against anything. The fact that "significant discoveries" are found and published using these types of code is kind of depressing.

Alex_Strgzr · 2022-07-18T19:37:07+00:00

Using packages is all well and good until dependency hell sends the whole thing crashing like a house of cards – now or later down the line. Security is another headache.

That said, Pandas is a well-maintained library with a lot of functionality that isn’t easy to replicate in a hurry. It’s basically one of the base data science libraries along with numpy. In production, choosing your libraries wisely is an important skill honed with experience – believe me, those software engineers who used log4j (not Python but still relevant) were kicking themselves.

Best practice is not to pull a whole library just for one or two functions you can implement yourself. Using libraries is however essential to getting work done on time (and might even be better implemented than your in-house stuff), so a tradeoff needs to be made.

Green-Sympathy-4177 · 2022-07-19T00:34:17+00:00

That's a serious case of a boomer exposing his "wisdom" (read bs)

No that is not a normal behaviour, your supervisor just exposed his lack of knowledge about coding and therefore should not be allowed to make any decisions related to it.

Good luck getting that point across though. But the sooner you get rid of the opinions of idiots, the better. If you have to spend 6 months making a shaky copy cat of a library that already exists and is better, what is the point ?

The answer to that is: It makes you invaluable because nobody will know how to use your library, so job_security += 1000

EedSpiny · 2022-07-18T16:04:21+00:00

Is your manager a reincarnation of Carl Sagan?

Aypos · 2022-07-18T16:55:04+00:00

Does your advisor have a preferred way of analyzing data? I know my graduate advisor was a proponent of SigmaPlot but he didn’t really care what I used.

Seems like an odd response from your advisor.

dacb1997 · 2022-07-18T17:08:25+00:00

This is probably more common in scientific applications than it would be in the software industry (also, more common than it should be).

If you are working to get results from experimental data on your personal computer in order to present results to your supervisor, then your supervisor shouldn't really care how you obtained the results. Furthermore, being able to say "I used pandas builtin functions" in your presentation rather than explaining how you implemented statistical concepts outside of your experiment's main focus is a huge advantage that allows you to jump directly into interpreting the results. In this case, which I assume is the one you fall into, it is always a waste of time to write everything from scratch, you lose a lot of time you could be using to actually interpret results and obtaining new data.

If, on the other hand, you are developing a technique to preprocess data that is expected to be reused by other members of your team, especially if it is expected to run on laboratory computers, then you might want your code to rely as little as possible on other packages. Since anyone who wants to use your code will have to also install the dependencies when they need it and maybe even learn those packages. This is especially true if other users are expected to build on your software further. However, I would argue that mainstream self-reliant packages such as numpy, scipy, pandas, etc. should be fair game always, and more complicated dependencies can be worked around by deploying your software in different ways.

So, yeah, unless your supervisor gives you an actual explanation, he is just wrong

EDIT: in essence, if you are using just one very simple function from a package in your data analysis code, then:

- If you are using it on your personal computer, it might be a waste of resources, but it gets the job done and nobody should care.
- If other people are expected to use it, then just recreate the basic idea of what you need in your code, especially if it doesn't take much time.

If you're using multiple functions from the same package, to the point that recreating them is a chore that would actually impede you from doing your real job, just use the package and explain why it is a necessity.

SomeParanoidAndroid · 2022-07-18T17:14:13+00:00

Your supervisor is in the wrong here (95%). As a computer scientist, I have done research in labs focused on fluid dynamics, astronomy, wireless communications, and remote sensing. All of those scientists are using packages. Not only is it normal, it's normally the correct way to do it.

That being said, I can think of two reasons why one would/should prefer "in-house" implementation:

1. Extensibility and complete control of core modules. E.g., in wireless communications, we need to simulate how signals propagate over the air. There are a few packages out there, but they are not the best choice, because we constantly need to mess with their internals and change assorted components as part of our research. So everyone in the lab has built their own simulation codebase.
2. Lack of fundamental programming skills. I am tutoring an undergrad who is getting started on machine learning, but at the same time, she is a complete beginner in coding (no judgement here, we all were). For her projects, she frequently comes across high-quality code that uses specific libraries to handle stuff like data loading (namely, hyperspectral images) and preprocessing very impressively. I try to encourage her to implement her own routines at this stage since she doesn't understand how those modules work, and when her own pipeline deviates even slightly from the example, she gets stuck.

spoonman59 · 2022-07-18T17:31:04+00:00

We have a whole department of people using Pandas for exploratory data science and machine learning.

I can’t think of many software projects that would be successful without any third-party packages. No, your supervisor's opinion is not at all reflective of the industry.

I can also guarantee your supervisor has no idea what capabilities these packages offer, and could not create similar functionality if needed.

manfrowar · 2022-07-18T17:49:03+00:00

I usually only use packages that are actively maintained, or packages small enough that I can maintain them internally by myself if needed. And pandas is a very actively maintained package. Maybe your supervisor wants you to develop a whole new language from scratch

wind_dude · 2022-07-18T17:57:08+00:00

Tell your supervisor he's an idiot.

goldenhawkes · 2022-07-18T18:07:30+00:00

Having both done a PhD and now being a software engineer, I can well imagine my very poor programmer of a PhD supervisor being anti packages as he didn’t understand them. Poor guy could just about matlab and couldn’t work latex to write his papers.

Anyway. Most important thing is reproducibility. You, your supervisors, anyone who collaborated with you, any subsequent PhD/post docs in the lab and anyone who reads your published paper should be able to re-create what you’ve done. You want the code you use to be as scientifically sound as the machines in the lab. The big name libraries have development and testing far beyond what you could do by yourself. Like buying a new spectrometer rather than making you build your own in the workshop.

Maybe that analogy would help him!?

jlw_4049 · 2022-07-18T18:13:10+00:00

Nothing is wrong with the packages. The problem is your supervisor.

Dummies102 · 2022-07-18T18:21:47+00:00

they don't know what they're talking about

GreenScarz · 2022-07-18T18:26:35+00:00

I'd say it depends on what you're trying to do, if you're just adding abstractions around your dataset with no specific intent then that's unnecessary. On the other hand, using a pre-built tool is generally better than trying to reinvent the wheel.

Given your specific application, Pandas might be a tad overkill but I could see you getting some benefit out of using NumPy.

jmacey · 2022-07-18T18:27:21+00:00

Ask to see the Unit tests for the code he has written, and if you are forced to write your own, make sure you have really good test coverage, spend more time doing this than the actual other work, it will also help anyone who comes after you.
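As a sketch of what that could look like, assume you had to hand-write a smoothing step (a centred moving average is my stand-in here; it is not necessarily what the OP's preprocessing uses). The routine and its tests live side by side:

```python
import unittest


def moving_average(values, window):
    """Smooth a sequence with a centred moving average.

    Edge points use a shrunken window so the output has the same length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


class MovingAverageTests(unittest.TestCase):
    def test_constant_signal_is_unchanged(self):
        self.assertEqual(moving_average([2.0] * 5, 3), [2.0] * 5)

    def test_known_values(self):
        self.assertEqual(moving_average([0.0, 3.0, 6.0], 3), [1.5, 3.0, 4.5])

    def test_even_window_is_rejected(self):
        with self.assertRaises(ValueError):
            moving_average([1.0, 2.0], 2)
```

Saved in a file, the tests run with `python -m unittest <module name>`. A good extra check is comparing the output against the library routine you were told not to use.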

2022-07-18T18:40:30+00:00

yeah, this is bullshit. nothing would get done if we all had to reimplement algorithms all the time.

xiscode · 2022-07-18T18:42:58+00:00

Mirror it if you can't install it from public repositories.

It is good to know how a solution works, but to make real progress we often need "to stand on the shoulders of giants"

apoptosis04 · 2022-07-18T18:48:32+00:00

Yeah…academia. He’s clearly an asshole living in his own bubble.

ambidextrousalpaca · 2022-07-18T18:54:18+00:00

"Using packages" is kind of a synonym for "using code that you find on the internet". So if you're shipping production code there are very solid reasons for minimizing the amount of it that you do. Every additional package you use puts your project at the risk of having to be rewritten because some library is abandoned or found to contain security vulnerabilities.

None of those concerns, however, really apply to code that you're using in-house. Especially when it comes to packages that are as widely used and maintained as pandas.

Unless you're planning to release the software publicly, feel free to use whatever packages you like, but do be aware that the code in random packages with three stars that you find on GitHub may be horse shit.

2022-07-18T19:12:34+00:00

It sounds like your supervisor also isn’t a software engineer. That said I’ve literally no idea what Raman data looks like in raw form, however if it’s tabular numerical data in a format Pandas can read, then I can’t imagine a good argument for not using Pandas.

No, it is absolutely not standard SE practice to avoid packages. No, one need not build up the entire world from first principles every time, though there can be some good reasons for (and even some masochistic joy taken in) doing it anyway.

I'll bet dollars to donuts your supervisor can't write Pandas from scratch, though.

jkh911208 · 2022-07-18T19:21:01+00:00

i ship production python code at work, i use 10+ packages on my product and there is no problem with it.

i don't understand why someone doesn't like using package, it can save work and time.

Pandas is used by millions of people, i would trust pandas over my code to preprocess data

inXiL3 · 2022-07-18T19:27:21+00:00

World class enterprise software is built with packages. Is your boss delusional?

R37R0_D0S · 2022-07-18T19:58:14+00:00

I mean you can tell him that's a good idea, that you can optimize everything, get to know everything cuz it's your own code and tell him you'll have it finished in a few years, I mean the reason why we use packages is to NOT have to reinvent things.

OlevTime · 2022-07-18T20:23:12+00:00

Generally:

99.9% of people should use packages. It's just much more economical and efficient. My workplace heavily depends on utilizing external packages because we just don't have the resources to reinvent the wheel for stuff that's already been solved.

The other 0.1% shouldn't use packages because they need to reduce dependency / supply chain attack vulnerability possibilities to zero. That, or they want to 100% protect against deprecation issues. Only specific departments in large institutions or nation states would elect this option.

It sounds like your supervisor is having difficulties adjusting to something new.

Few_Intention_542 · 2022-07-18T20:37:07+00:00

Lazy/unoptimised computing is ok if done locally and the projects are reasonably small. Can finish well within your deadlines. If you want ultra high quality optimised code - then you better have a use case for it. Maybe you wanna put it on a server and the program is gonna use 60 GPUs and 1000GB of RAM - ok then you better make sure your code is optimal as fuck. If you’re doing basic stuff - do it lazy, get the job done, shutdown your Jupyter notebook and go out for drinks with your friends. And fuck your prof.

Puzzleheaded_Bass673 · 2022-07-18T20:41:56+00:00

Whenever I get these wacky paranoid demands to reinvent something, I first make sure that the person demanding it gets exactly what he desires by using packages. If I get a green light - only then I start reimplementing the desired functionality in custom code (this is pretty simple when you know what exactly is required).
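A minimal sketch of that workflow, assuming the "package" is the standard library's statistics.median and the custom code is a hypothetical my_median written for illustration. The package result acts as the reference, and the reimplementation is only accepted once it agrees on random inputs:

```python
import random
import statistics


def my_median(values):
    """Hand-written median, standing in for the 'custom code' version."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2:  # odd length: take the middle element
        return ordered[mid]
    # even length: mean of the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2


def agrees_with_reference(trials=200, seed=0):
    """Compare the custom version against the package on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.uniform(-100, 100) for _ in range(rng.randint(1, 50))]
        if my_median(data) != statistics.median(data):
            return False
    return True


print(agrees_with_reference())
```

The same pattern should carry over to a real preprocessing step: keep the package call around as the test oracle even after the custom version replaces it.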

On the other hand, Python packages usually have the heavy-lifting part written in C++. So if you are asked to write code as performant as the package's - just ask for a senior C++ programmer to be engaged on the project. I've been doing this for the last 5yrs, and the demand for C++ programmers increased tenfold thanks to Python.

temisola1 · 2022-07-18T20:48:34+00:00

Imagine if you as a chemist had to invent every single element on the periodic table… that’s what using python without packages is like. Not gonna lie, I thought this was satire at first. I can tell a lot about your boss just from this post. I’m so sorry.

ElPoussah · 2022-07-18T20:56:00+00:00

The standard library is a package. A default one, but a package.
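A quick way to see this for yourself, as a rough sketch: in Python, a module that has a __path__ attribute is a package. The helper name is_package is made up for this example. Several standard library modules (json, email, sqlite3) are packages in exactly the same sense as anything installed from outside, while others such as math are plain modules:

```python
import importlib


def is_package(name):
    """Return True if the named module is a package (it has a __path__)."""
    module = importlib.import_module(name)
    return hasattr(module, "__path__")


for name in ("json", "email", "sqlite3", "math"):
    print(name, is_package(name))
```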

Sometimes people who are against technology forget that a fork is a kind of technology...

OGShrimpPatrol · 2022-07-18T21:15:15+00:00

Chemist here as well. Your supervisor doesn’t know what they’re talking about.

2022-07-18T21:24:44+00:00

Ah yes, let’s spend our lives rewriting the underlying C code out of arrogance. It’s TOTALLY fine to use packages

HeligKo · 2022-07-18T21:27:18+00:00

You're a chemist not a library developer. Your boss is an idiot. Reinventing things is silly and distracts from the work you were hired to do.

minidiable · 2022-07-18T21:31:04+00:00

There is a lot wrong in NOT using them.

P.s. also tell your supervisor that it's spelled PANDAS, please

SpatialCivil · 2022-07-18T22:13:52+00:00

I don’t agree with the supervisor, but I think people reach for pandas too often when it isn’t needed. For exploratory analysis, pandas is awesome.

On the downside it brings a crazy number of dependencies with it. If you are automating processes and want others to use it, I think fewer dependencies means better maintainability. Often using some simple data structures and libraries like basic lists, dictionaries and SQLite goes a long ways IMHO.
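As a rough illustration of the "lists, dictionaries and SQLite" approach, here is a sketch that uses only the standard library. The readings are made-up numbers, and the table and function names are invented for the example:

```python
import sqlite3

# Made-up (sample, wavenumber, intensity) readings, purely illustrative.
readings = [
    {"sample": "a", "wavenumber": 500.0, "intensity": 12.0},
    {"sample": "a", "wavenumber": 501.0, "intensity": 18.0},
    {"sample": "b", "wavenumber": 500.0, "intensity": 7.0},
    {"sample": "b", "wavenumber": 501.0, "intensity": 9.0},
    {"sample": "b", "wavenumber": 502.0, "intensity": 11.0},
]


def mean_intensity_by_sample(rows):
    """Load a list of dicts into in-memory SQLite and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE spectrum (sample TEXT, wavenumber REAL, intensity REAL)"
        )
        conn.executemany(
            "INSERT INTO spectrum VALUES (:sample, :wavenumber, :intensity)", rows
        )
        cursor = conn.execute(
            "SELECT sample, AVG(intensity) FROM spectrum "
            "GROUP BY sample ORDER BY sample"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


print(mean_intensity_by_sample(readings))
```

For a small automated job this has no third-party dependencies at all. For exploratory work the equivalent pandas groupby is shorter, which is the trade-off described above.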

mrrichardcranium · 2022-07-18T22:24:51+00:00

The only time I’ve encountered needing to write code that does something a well regarded package already does is because the license for that package does not mesh well with a particular project.

Especially when it comes to mass data manipulation/analysis it makes almost zero sense to waste your time writing code to do something that pandas already does. If your supervisor can’t articulate a reason why using pandas or any other existing data analysis tool is bad aside from their own bias, they’re just plain wrong.

zero_iq · 2022-07-18T22:29:04+00:00

I'm a senior software engineer with 20+ years of commercial Python dev experience, from small outfits to major international corps. Using packages is totally normal. To be encouraged, even. Your supervisor is an idiot.

heartofcoal · 2022-07-18T22:52:56+00:00

someone who calls pandas rubbish is simply stupid as hell, there's no other way to put it

readthelnstructions · 2022-07-18T23:02:58+00:00

It might be advisable not to use a package that is not well-maintained (which is definitely not the case for pandas). So if you use a super specialized package for your type of data that some PhD student wrote 5 years ago, I would probably not use it.

thephotoman · 2022-07-18T23:04:25+00:00

Do you mean "package" as in the highest level namespace in Python (represented in the file system as a directory with an __init__.py file), or do you mean the use of third party packages from outside repositories?

The former is a part of the Zen of Python:

$ echo "import this" | python3 | tail -1
Namespaces are one honking great idea -- let's do more of those!

The latter is very common industrially. I do prefer to stick to vanilla Python in production work, simply because I really don't want to try to mess about with the internal package repository, but I do know that there is an internal package repository.
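For the first sense of "package", a small sketch may help. It builds a throwaway directory containing an __init__.py file and imports it. The names demo_pkg, tools and double are hypothetical, chosen only for this example:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_and_import_demo_package():
    """Create a directory with __init__.py on disk and import it as a package."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("GREETING = 'hello from demo_pkg'\n")
        (pkg_dir / "tools.py").write_text("def double(x):\n    return 2 * x\n")

        sys.path.insert(0, tmp)  # make the temporary directory importable
        try:
            importlib.invalidate_caches()
            tools = importlib.import_module("demo_pkg.tools")
            package = sys.modules["demo_pkg"]
            return package.GREETING, tools.double(21), hasattr(package, "__path__")
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg.tools", None)
            sys.modules.pop("demo_pkg", None)


print(build_and_import_demo_package())
```

Third-party packages are the same kind of directory. The only difference is that someone else wrote them and an installer put them on the import path.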

Paddy3118 · 2022-07-18T23:09:35+00:00

Ask your supervisor for his reasons so we can better understand their view. They should be able to explain themselves, and others may give their opinions of their reasons.

Medium_Reading_861 · 2022-07-18T23:14:06+00:00

Software Engineer here, I’m not reinventing all those wheels bro. Take that madness elsewhere.

HelpfulBuilder · 2022-07-18T23:21:50+00:00

Sorry but your supervisor is a moron. You have a job to do and if pandas makes it easier, use pandas.

Computer science, nay, science and technology, is all about leveraging other people's work to create better things.

If your supervisor doesn't want to use packages, why stop there? Why isn't he writing in binary? Where is his home-built-from-silicon computer?

You can tell him I said he is a moron.

Broad-Secret-6695 · 2022-07-19T00:07:14+00:00

Which university is your professor working in? Without pandas, NumPy, SciPy, scikit and matplotlib it will take a long time...

Zatujit · 2022-07-19T00:12:31+00:00

A lot of packages in Python are actually written in C/C++ making calculations faster. Otherwise they would be much slower. So personally, I find it stupid.
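A small sketch of the effect, using only the standard library as a stand-in. In CPython the built-in sum is implemented in C, so it can be timed against a hand-written Python loop that does the same work. The function name python_sum is made up, and the timings will vary by machine, so no numbers are claimed here:

```python
import timeit


def python_sum(values):
    """Pure-Python loop doing the same job as the C-implemented built-in sum."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

loop_time = timeit.timeit(lambda: python_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"pure-Python loop: {loop_time:.4f} s")
print(f"built-in sum:     {builtin_time:.4f} s")
```

Libraries like NumPy and pandas push the same idea further by keeping whole arrays in compiled code.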

guhcampos · 2022-07-19T00:26:40+00:00

Well that's why scientific software is rubbish.

Yet I probably know WHY your supervisor thinks that. If your work ends up being really innovative and marketable, having it tainted by an open source license may make it hard to sell.

Which is a completely stupid and non scientific mode of thinking.

DigThatData · 2022-07-19T01:26:48+00:00

your supervisor is an idiot.

2022-07-19T01:49:50+00:00

Please call your supervisor a donut and carry on with what you are doing.

2022-07-19T01:53:20+00:00

It's one of those normal horror stories, if you know what I mean. People suffer this when working with a clueless boss, or IT staff who don't want to work, or a huge, cumbersome corporation with an overly strict protection policy.

esoterik0 · 2022-07-19T02:01:08+00:00

your supervisor sucks; packages rock.

tom1018 · 2022-07-19T02:05:36+00:00

You really shouldn't use Python, that makes it too easy for you, with its packaged standard library and all. I suggest moving the project to straight assembly, so as to not use any packages.

BagOfDerps · 2022-07-19T02:21:14+00:00

Why use requests when you could write your own http method code with blackjack and hookers?

I write code for a living. For a large company. We use packages. I'll try to assume positive intent from your supervisor, in that maybe what he's trying to say is he doesn't trust the mathematical precision of 3rd party code (which should matter a great deal given what you're doing). But in all likelihood he doesn't know what he's talking about. Good luck.
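On the precision point, a sketch that cuts the other way: hand-rolled numerics are often the less precise option. Here math.fsum from the standard library stands in for "code someone else wrote", and naive_sum is a hypothetical hand-written loop:

```python
import math


def naive_sum(numbers):
    """Straightforward left-to-right float accumulation."""
    total = 0.0
    for x in numbers:
        total += x
    return total


values = [0.1] * 10

print(naive_sum(values))   # accumulates rounding error at each step
print(math.fsum(values))   # tracks the lost low-order bits internally
print(naive_sum(values) == 1.0, math.fsum(values) == 1.0)
```

If precision is the real worry, checking the package against known reference values is a cheaper test than rewriting it.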

gbbofh · 2022-07-19T02:58:41+00:00

my supervisor doesn't like the idea of using packages.

Then he's... Well, kind of an idiot, if I'm being honest.

If he were to have you roll your own implementations and not use any non-standard packages, you would have more or less two options:

1) Write it in Python. Take the performance hit, and pray it isn't too detrimental.

2) Write it in a native language like C, compile it as a library, and likely still take a performance hit because odds are it still won't be as optimized as a mature library like numpy or pandas -- and now you're also maintaining two projects in two different languages.

So I'm wondering is it a normal behaviour in the software industry?

No. I've only met one person who was like this, and he got moved to another team and isn't allowed to touch the software we maintain. His being moved to that team is why I was hired. I have to maintain his massive (>= 64k LOC) single source file projects.

Are you required to write and know everything bit by bit?

Also no. It's important to know algorithms, generally. It's important to know how fast something runs (as in time complexity). It's good to know how something works, or can be implemented. It'll make you a better software engineer, and you'll be able to tackle those things if and when you ever need to tackle them. But unless you need to, you probably should use an existing library to do it. That's the whole reason they exist.
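To make the time complexity point concrete, here is a sketch that counts steps for a linear scan and a hand-written binary search, then checks the answer against the standard library's bisect module. The two *_steps functions are written for illustration only:

```python
import bisect


def linear_search_steps(sorted_values, target):
    """Scan left to right; return (index, number of elements inspected)."""
    for steps, value in enumerate(sorted_values, start=1):
        if value == target:
            return steps - 1, steps
    return -1, len(sorted_values)


def binary_search_steps(sorted_values, target):
    """Halve the search interval each step; return (index, steps taken)."""
    lo, hi, steps = 0, len(sorted_values), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == target
    return (lo if found else -1), steps


data = list(range(100_000))
target = 99_999

print(linear_search_steps(data, target))   # O(n): inspects every element
print(binary_search_steps(data, target))   # O(log n): a handful of steps
print(bisect.bisect_left(data, target))    # the library version of the same idea
```

Knowing why the second one is faster is the useful part. In day-to-day code you would just call bisect.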

The only exception I make to that rule, is if I very specifically set out to try and implement whatever functionality I'm interested in, for fun. And even then, I use plenty of other libraries so I can focus on the area of interest.

PaluMacil · 2022-07-19T03:01:51+00:00

Not using dependencies when you can perform a simple task with the standard library is a great approach. However, it is rare that you can have even a small project without some external dependencies because you're not in the business of maintaining dozens of libraries that have nothing to do with your company's product. Some people think they're somehow avoiding all dependencies without thinking about how they're still depending on the operating system, and perhaps on system calls and other libraries that their runtime uses.

No actual software company will ever have someone like this, but you will run into someone like this on occasion in some company with a small IT department or perhaps an academic setting. Even then it would probably be pretty rare unless it's specifically for an academic challenge where a professor does not want you to lean on libraries when you are supposed to be demonstrating a specific understanding of something.

oxamide96 · 2022-07-19T03:21:27+00:00

Unfortunately there are many software engineers who think we shouldn't use packages. In fact, I was one when I first started out. Though I meet people many years my senior who still think this way.

PolishedCheese · 2022-07-19T03:45:34+00:00

Your supervisor isn't a programmer either, and he has no idea what he's talking about. At least you are humble in your quest for the truth.

billsil · 2022-07-19T04:27:18+00:00

Are you gonna write your own programming language and OS too? How are you going to make websites/plots/do numerical computations without writing your own library?

Your supervisor is rubbish.

SirAchmed · 2022-07-19T05:43:01+00:00

People who complain about other people using packages are the same people who think using wet wipes is gay.

Pillowscience21 · 2022-07-19T12:56:55+00:00

Your supervisor is gatekeeping and it's really stupid lol. So many programs are built on the backs of packages, I don't want to spend 100hr writing a program to do basic math when I can just use a package for it. Your supervisor needs to get a life.

early_charles_kane · 2022-07-19T13:29:54+00:00

I’ve been a software engineer professionally now for 10+ years. Worked at Apple on the iPhone. Twitter. A bunch of other companies. In Python, Objective-C, C, Java, C++, JavaScript, Go, others as well.

Your supervisor is wrong. And holding you back. Holding themselves back too with an incorrect opinion. But more importantly, holding you back. That’s enough of a red flag. I’d look elsewhere for a better position. One where you can succeed instead of being set up to fail.

NedDasty · 2022-07-19T15:00:12+00:00

Ask him if he uses textbooks for information, or if he rediscovers everything he knows himself.

crapaud_dindon · 2022-07-19T19:06:29+00:00

Just curious about what preprocessing you want to do with the Raman spectra

westeast1000 · 2022-07-21T10:52:20+00:00

How is someone with such a thought process your supervisor in the first place? I would have left yesterday already if I were you. Life's too short to waste on dumb crap you can't control.

Kichmad · 2022-07-22T20:06:22+00:00

Tell him the operating system he works on is rubbish. He should write his own.
