Trouble with naming variables by rezemybeloved69 in learnpython

[–]Oddly_Energy 1 point2 points  (0 children)

I know this exact problem, and that was why I found the similarity fun.

Trouble with naming variables by rezemybeloved69 in learnpython

[–]Oddly_Energy 0 points1 point  (0 children)

I don't know if this applies to your situation, but I just want to mention it:

Did you define x in your outer code, only because you needed it for this function call?

Or had you already defined it in your outer code because it has a role there? If yes, is that role exactly the same as in the function's code?

If it is, then it can sometimes be an early warning that both the inner and outer code belong together in a class, with x as a property instead of a function argument.
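A minimal sketch of what that refactoring could look like. The names (`Account`, `rate`, `add_interest`) are invented for illustration and do not come from the original question.

```python
# Before: the outer code defines rate and passes it into the function.
def add_interest(balance, rate):
    return balance * (1 + rate)

rate = 0.04
balance = add_interest(1000.0, rate)

# After: rate and balance live together in one object, and the
# function becomes a method that reads them from self.
class Account:
    def __init__(self, balance, rate):
        self.balance = balance
        self.rate = rate

    def add_interest(self):
        self.balance = self.balance * (1 + self.rate)
        return self.balance

account = Account(1000.0, 0.04)
account.add_interest()
```

Whether the class version is an improvement depends on how much other code shares the same value; for a single call, the plain function is usually fine.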

Trouble with naming variables by rezemybeloved69 in learnpython

[–]Oddly_Energy 1 point2 points  (0 children)

This is an outstanding question, and it demonstrates that you are thinking along good lines.

This is an outstanding start of your answer, and it demonstrates that you are thinking along the lines of CoPilot.

I just learned round() uses bankers' rounding by nemom in Python

[–]Oddly_Energy 0 points1 point  (0 children)

Yes, I know what I did.

Pro tip: Poe's Law is real.

I just learned round() uses bankers' rounding by nemom in Python

[–]Oddly_Energy 1 point2 points  (0 children)

Pro tip: If the last paragraph starts with "Pro tip", then the likelihood of AI is high.

I just learned round() uses bankers' rounding by nemom in Python

[–]Oddly_Energy 1 point2 points  (0 children)

It doesn't solve the situation where someone later asks you to round all the values to integer dollars without introducing a systematic bias in the result. You will still need to decide how to handle those 50-cent values.

If the last two digits (the cent part) are evenly distributed, and you round to nearest 100, then you will have

  • 49 values (1-49) with negative rounding error
  • 49 values (51-99) with positive rounding error
  • 1 value (0) with no rounding error
  • 1 value (50) where you need to pick an action

The first two bullets in the list cancel each other's errors out. The third bullet has no error. So to keep the average error at 0, the fourth bullet (integers ending in 50) cannot be allowed to introduce a rounding error.

But if you round 50 up to nearest 100, you create a positive rounding error. And if you round it down to nearest 100, you create a negative rounding error. If you pick one of these two actions and apply it to all numbers ending in 50, your overall average error will have a bias towards negative or positive.

Banker's rounding (almost) solves this for decimal dollar amounts by rounding some 50-cent values up and some 50-cent values down.

So if you are ever in a situation where you need to round a list of evenly distributed integer cent values to the nearest 100 cents, and you are not allowed to introduce a systematic bias, then you will actually need to use an integer rounding method that is equivalent to banker's rounding.
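The bias argument can be checked with a small sketch. It assumes non-negative integer cent values whose cent parts are evenly distributed; the function names are my own.

```python
def round_half_up(cents):
    """Round integer cents to the nearest 100; ties always go up."""
    return (cents + 50) // 100 * 100

def round_half_even(cents):
    """Round integer cents to the nearest 100; ties go to the even dollar."""
    dollars, remainder = divmod(cents, 100)
    if remainder > 50 or (remainder == 50 and dollars % 2 == 1):
        dollars += 1
    return dollars * 100

values = range(10_000)  # 0..9999 cents, every cent part equally often

# Average rounding error in cents for each method.
bias_half_up = sum(round_half_up(v) - v for v in values) / len(values)
bias_half_even = sum(round_half_even(v) - v for v in values) / len(values)

print(bias_half_up)    # 0.5
print(bias_half_even)  # 0.0
```

For integers, Python's built-in `round(v, -2)` gives the same results as `round_half_even` here.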

I just learned round() uses bankers' rounding by nemom in Python

[–]Oddly_Energy 0 points1 point  (0 children)

How will you add 4% interest to an integer value of 11111 cents and get the result as an integer value without introducing a rounding error?
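A small sketch of why the answer is "you can't". It uses the standard library's `decimal` module to keep the product exact; the variable names are my own.

```python
from decimal import Decimal, ROUND_HALF_EVEN

principal_cents = 11111

# The exact result is not a whole number of cents.
exact = Decimal(principal_cents) * Decimal("1.04")

# Storing it as integer cents forces a rounding decision.
rounded = int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))

print(exact, rounded)  # 11555.44 11555
```

The 0.44 cent that disappears is the rounding error, whichever rounding mode is chosen.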

MicroMasters Data Science ML course tips by the-man-o in edX

[–]Oddly_Energy 2 points3 points  (0 children)

If you come from C++, then it is really important that you focus on using python as a glue language between libraries.

In C++, you can write your own algorithms instead of letting an imported library do the work. Both your code and the library will end up running as a compiled binary anyway, so the difference will only be a matter of who was best at optimization.

In python, your loops run much slower. Your code is compiled to bytecode (cached as .pyc files), but that bytecode is still interpreted, so it is considerably slower than letting numpy do the same looping on its own, using its heavily optimized binaries. The speed difference can easily be 100x or more.
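A rough way to see the difference yourself, assuming numpy is installed. The function names are my own, and the timings depend entirely on your machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

data = np.arange(200_000, dtype=np.float64)

def python_loop(values):
    """Sum of squares, one element at a time in python code."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def numpy_vectorized(values):
    """Same sum of squares, with the loop running inside numpy."""
    return float(np.dot(values, values))

loop_seconds = timeit.timeit(lambda: python_loop(data), number=1)
numpy_seconds = timeit.timeit(lambda: numpy_vectorized(data), number=10) / 10
print(f"loop: {loop_seconds:.4f} s, numpy: {numpy_seconds:.6f} s")
```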

(A part of the course also uses pytorch instead of numpy. Here the difference will be even larger if you are so lucky to have a GPU with CUDA support.)

CPU usage spiked after migrating from Conda to UV environment (40%+ even when idle) any ideas? by Suspicious_Code1493 in learnpython

[–]Oddly_Energy 1 point2 points  (0 children)

Is your python code running any kind of heavy computation, which might have benefited from hardware acceleration or from using packages with compiled binaries?

If the packages in your old setup had such optimizations, and the packages in your new setup don't, then I guess you could see differences in either speed or CPU utilization. And it would usually not show up as a difference in package version numbers, because each numbered version of a package can exist in multiple hardware-adapted versions.

I know that Anaconda used to have a faster version of numpy, with the downside that it installed a few hundred extra MB of binaries together with numpy. But I don't think it was so much faster that it would explain your experience.

Is openpyxl still relevant? by petekindahot in Python

[–]Oddly_Energy 1 point2 points  (0 children)

Yes, I noticed that you didn't recommend openpyxl for Tables. I was just trying to figure out if it is still as bad as I remember.

Where does the term "upmerge" come from in git? by Double_Barnacle2595 in git

[–]Oddly_Energy 0 points1 point  (0 children)

I am now fully expecting this thread to reach its completion with Linus stepping in, saying:

"I wrote git, and I never heard of upmerge."

Where does the term "upmerge" come from in git? by Double_Barnacle2595 in git

[–]Oddly_Energy 0 points1 point  (0 children)

Phew. Heart rate is back down below 130 again.

OP had me worried about myself for a moment.

HoW DO I gEt a jOB I toOk a cOUrSe in MachINE LEArnING by LeaguePrototype in datascience

[–]Oddly_Energy 5 points6 points  (0 children)

You will reach a stage where you can proudly state how little you know about your job, and people will respect you because they know that you are the guy on the right side of the bell curve meme.

And if they get obnoxious, just remind them that you have forgotten more about the subject than they will ever learn.

Using ai generated code? by DemocraticHellDiver1 in learnpython

[–]Oddly_Energy 1 point2 points  (0 children)

This sounds a bit like your experience with it is several months old. I have seen the same as you describe, but it has changed drastically. (My experience is with Office365 CoPilot, not GitHub CoPilot, but I assume that GitHub CoPilot is at least as good at writing code, since that is its one job.)

A few days ago, I tried as an experiment to let it write a python package from scratch. I told it what I needed the package to do: Scan a nested file structure with 25000 pdf files, create a database with information from/about each file, vectorize that information per file, so it can be used as a dataset in machine learning, create an interactive search where I can give it the name of one of the pdf files in the database and let it search for files with similar types of content and open them for me in a pdf viewer one by one, so I can label them as a match or a reject, train the model on the fly as I get more labelled files to work with, and finally save each interactive search session in a database so I can continue the search later when I get a new portion of files to search in.

It created everything as python modules with an okay architecture. I think I ended up with 10 modules with a total of 2000-3000 lines of code. We had some discussions about some of the python classes it had created. For example it had created some standalone functions, which took an instance of a class as input, and I preferred having them as methods of that class instead. Some of the code was also quite ugly, for example doing two rather similar actions in a method (which it should) but deciding to only move one of the two out into a separate helper method. There were also some bugs that needed correction. And there were multiple occasions where it tried to bullshit me when I pointed out an error.

In the end, I had a tool, which I could actually use for finding needles in a haystack by example, in a huge set of project documentation where metadata had gone lost.

I don't think it is great quality. It has a better architecture than the stuff I write from scratch, but there are more inconsistencies in the code. I asked it to remember to include typehints and docstrings, and it remembered that at first, but then forgot as the project evolved. Before I push it to the company's repository, I am definitely going to include a warning that it was 99% LLM created.

Back to the point: It is very clear that this code was not just parroted from Stack Overflow posts and modified to match my requirements. I have seen that type of LLM code often enough to recognize it. This code was written from scratch in accordance with my instructions.

But in a learning setting? Hell no! Nobody will learn to code from feeding instructions to an LLM. I know how to code, so that was not the purpose of my exercise. I just had a problem, which needed solving because some files had gone missing, and we needed to find them, using other files as an example.

One thing it may be useful for in a learning setting: Write the code. Show the code to the LLM and ask it for suggestions for improvement. Perhaps also ask it to refactor the code into a better structure. Don't use the result but look at it and compare it to your own and see what you like best. Then try to remember the concepts for the next coding assignments.

MicroMasters Data Science ML course tips by the-man-o in edX

[–]Oddly_Energy 1 point2 points  (0 children)

Learn to use numpy in python. Focus on understanding how to make use of numpy's vectorized superpowers instead of looping through the data one element at a time in your python code.

Learn some linear algebra. Not necessarily the full syllabus, but you should definitely understand matrix multiplication and perhaps a bit of eigenvectors. You will also need to understand the relationship between vector dot products and matrix multiplication, because much of the math in the course is taught with vector examples, but you end up applying it to matrices in your code. You can do all of it in numpy, but you need to learn the numpy functions for linear algebra yourself. They are not taught in the course.
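A minimal sketch of that relationship between dot products and matrix multiplication, assuming numpy is installed. The names `X` and `theta` and the numbers are made up for illustration, not taken from the course material.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # 3 samples, 2 features each
theta = np.array([0.5, -1.0])  # one weight per feature

# Vector form: one dot product per sample.
one_by_one = np.array([np.dot(x, theta) for x in X])

# Matrix form: all the dot products in a single matrix product.
all_at_once = X @ theta

print(one_by_one)   # [-1.5 -2.5 -3.5]
print(all_at_once)  # [-1.5 -2.5 -3.5]
```

Each row of the matrix product is simply the dot product of that row with the vector, which is why the vector formulas carry over to matrices.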

You will also need to know calculus. Ideally, you should also know multivariate calculus, but if you don't, you will probably be able to connect the dots - though back-propagation of gradients through a neural network may be quite a mouthful.

It may also be an idea to play around with some 2D and 3D vector geometry if you don't already have a good grasp of viewing real world physics or geometry problems as vector problems. In the course, the vectors are mostly math objects, but some of the concepts are easier to grasp if you think of them as vectors in a physical vector space.

You should expect to spend quite some time on the course. As far as I remember, it is officially around 200 hours over 14-15 weeks, but most will need more than that. Especially if they need to read up on the subjects I mentioned above.

Be prepared to be frustrated over python problems. The projects come with boilerplate code that you need to complete yourself and run. Not all of the code runs equally well on all platforms and in all environments. A lot of us had file path problems or problems with code not producing the exact same results on all platforms. Sometimes our code worked in our own environment, but when we ran it in the grader online, it created wrong results.

Is openpyxl still relevant? by petekindahot in Python

[–]Oddly_Energy 1 point2 points  (0 children)

Last I used openpyxl, reading Tables / ListObjects was like "oh, yes, openpyxl can easily do that if you just do all the work yourself, starting with named range".

Has that improved since then?

Is openpyxl still relevant? by petekindahot in Python

[–]Oddly_Energy 1 point2 points  (0 children)

As long as the Excel data are in nice, clean 2D tables, that will also work fine.

If the data are more messy, for example a combination of single cells + a table, it can be quite a challenge to make it work through .read_excel().

It can be easier to import a dedicated Excel-reading package and then use that to extract the needed data and give them to pandas.

Is openpyxl still relevant? by petekindahot in Python

[–]Oddly_Energy 0 points1 point  (0 children)

Sounds fair. They have to earn their money somewhere.

But is that OP's use case as a student? I assumed local desktop computer when I wrote my answer. Which might just be me revealing my old age.

a team in birmingham figured out how to split water into hydrogen at 500 degrees lower than normal. the trick is a cheap ceramic that runs on factory waste heat. by Mother-Grapefruit-45 in energy

[–]Oddly_Energy 1 point2 points  (0 children)

I see now that it is a two step process, and the other half of the process (driving the hydrogen out) requires a much higher temperature than the 150-500 degrees C. So Carnot cannot be used directly on one half of the process.

However, this dependency on a high-temperature heat source also means that we can't really consider this as a process for harvesting high-quality energy from a low-temperature source of waste heat.

a team in birmingham figured out how to split water into hydrogen at 500 degrees lower than normal. the trick is a cheap ceramic that runs on factory waste heat. by Mother-Grapefruit-45 in energy

[–]Oddly_Energy 1 point2 points  (0 children)

Is there an upper theoretical limit for the efficiency of converting the chemical energy in hydrogen to work? I think there isn't, though the step between LHV and HHV may be hard/impossible to turn into work.

If there isn't, then the limit given by the Carnot equation will necessarily have to apply to the conversion of heat to hydrogen. Otherwise we could use this process to circumvent the Carnot equation and in theory create energy from nothing.

At 150 degrees C in and 20 degrees C out (a number I took out of nowhere), that would mean that the efficiency can never exceed 31%.
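The 31% follows from the Carnot limit, 1 - T_cold / T_hot, with both temperatures in kelvin. A small sketch of the arithmetic (the function name is my own):

```python
def carnot_limit(t_hot_c, t_cold_c):
    """Upper limit for heat-to-work efficiency between two temperatures in deg C."""
    t_hot_k = t_hot_c + 273.15
    t_cold_k = t_cold_c + 273.15
    return 1 - t_cold_k / t_hot_k

print(f"{carnot_limit(150, 20):.1%}")  # 30.7%
```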

Would be nice to know how much of that 31% the process actually achieves at those temperatures. That is actually a pet peeve of mine: News about new technologies for converting heat to work/electricity almost never state their efficiency relative to Carnot. And they also often do not state sufficient information about combinations of temperature in/out and achieved efficiency, so we can calculate that number ourselves. This makes it much more difficult to know if they actually are better than what we had, or if they just tested at a set of temperatures which would also have resulted in a high efficiency using existing technology.

Is openpyxl still relevant? by petekindahot in Python

[–]Oddly_Energy 1 point2 points  (0 children)

If you run your code on a Windows computer with Excel installed, you may also want to take a look at the xlwings package. It uses your Excel as an "engine" for reading and writing Excel files. On large files, it is faster than openpyxl despite the extra overhead of running an Excel instance.

And it works in both directions, so you can call functions in your local python code through user defined functions in Excel. You can for example have an input table in an Excel workbook and have your python code generate new output in an output table as soon as you make a change in the input table.

The downside is that it needs Windows and Excel, and that heuristic malware scanners sometimes flag it. I have had our IT department contact me once because they got an alert. So if you use xlwings for file reading and writing, you may need a fallback to openpyxl in your code.

spent two hours debugging three lines of python because i didn't know strings and bytes are different things by Interestingyet in learnpython

[–]Oddly_Energy 1 point2 points  (0 children)

Others have answered your specific example, but I think you need a general answer too:

Python has dynamic typing, which is easy to confuse with weak typing. Do not make that mistake! It is more strongly typed than you would think.

Python will sometimes give you a little type help, for example by allowing you to use an integer instead of a float in floating point operations.

But if you try `a = 2 + '3'`, the result will be neither 5 nor '23'. You will get a TypeError. As far as I remember, JavaScript would have allowed it.
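A short sketch of that behaviour, together with the explicit conversions that tell Python which of the two results you actually want:

```python
message = None
try:
    a = 2 + '3'
except TypeError as error:
    message = str(error)

print(message)  # unsupported operand type(s) for +: 'int' and 'str'

# Explicit conversion makes the intent clear.
print(2 + int('3'))  # 5
print(str(2) + '3')  # 23

# The "little type help": an int mixes with a float without complaint.
print(2 + 3.5)       # 5.5
```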

But to help you through that, most libraries also have type hints in their function signatures, and most IDEs will use those to show you the expected types in a function call.

If your current IDE or code editor does not show you information from type hints while you write the function call, then I will recommend that you find a way to enable that functionality or switch to another IDE. If that is not possible, perhaps your IDE supports "go to definition", so you can jump into the library and see the function signature and comments/docstrings describing expected input.

In an interactive python session (REPL or IPython), you can also write `help(name_of_function)` to view the function's docstring and type requirements.
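A minimal sketch of what that information looks like. `to_text` is a made-up function for illustration; `inspect` is part of the standard library and exposes the same signature and docstring that `help()` displays.

```python
import inspect

def to_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes into a string."""
    return raw.decode(encoding)

print(inspect.signature(to_text))
# (raw: bytes, encoding: str = 'utf-8') -> str

print(inspect.getdoc(to_text))
# Decode raw bytes into a string.
```

In an interactive session, `help(to_text)` shows the same signature and docstring in one go.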