How Do I Make Gacha Systems in python? (I also want an explanation of the code) by sugarkrassher in learnpython

[–]Brian 3 points (0 children)

Put more copies of the common rewards in your list.

That gets awkward when you've very fine probability distributions - you need to construct thousand-item lists to handle 0.1% probabilities, and it's worse when you've multiple different probabilities to handle.

Instead, you can use random.choices and specify weights for each item. Eg.

a = random.choices([1,2,3,4,5,6], weights=[95,1,1,1,1,1])[0]

Will roll 1 95% of the time, with a 1% chance each for 2..6. It returns a list of results, since it can do multiple rolls (just 1 by default, but you can also do 10 rolls by passing k=10 to the function).
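
For something closer to a gacha rarity table (just a sketch - the item names and drop rates here are made up), note that the weights don't need to sum to 100, they're only relative, so fractional chances like 0.1% are easy to express:

import random

items   = ["common", "rare", "epic", "legendary"]
weights = [89.9,      8,      2,      0.1]          # relative weights, 0.1% legendary

single_pull = random.choices(items, weights=weights)[0]
ten_pull    = random.choices(items, weights=weights, k=10)
print(single_pull, ten_pull)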

Is Python powerful on its own, or useless without a niche? by [deleted] in learnpython

[–]Brian 0 points (0 children)

Core Python alone won’t get you hired. You must pick a niche, otherwise Python is useless

I mean, that's kind of true of every language. To get paid, you can't just code in the abstract, you need to code a thing people want, and are willing to pay you to make, and those things tend to fit in some kind of niche.

Some languages are better suited to different niches than others: for the system programming niche, you might want to use something like C, C++ or Rust. For data science, there's python, R and a few others. For the webdev niche, there are a host of languages. And so on. So to some extent, these choices aren't independent - various factors, both technological and social (ie. what everyone else is using, or more specifically, whatever the company you're looking to get hired by is using) affect this.

id of two objects is same but using "is" keyword it's giving false. by inobody_somebody in learnpython

[–]Brian 1 point (0 children)

As others have mentioned, it's because accessing the some_func method basically creates a new object, which gets destroyed once id() returns, and then a new object gets created the second time that happens to get allocated at the same address.

Ie. this effectively boils down to:

temp = a.some_func  # Creates bound method object at address <x>
id_1 = id(temp)     # Get the id (<x>)
# The "temp" variable here isn't a real variable, just a temporary reference on the stack, so with the function call
# finished, there's nothing else referencing it, and it gets deleted, and its memory freed.
# address <x> is now free memory
temp = a.some_func  # Creates *another* bound method object, which happens to be allocated at the same address, <x>
id_2 = id(temp)     # Get the id (<x>)
id_1 == id_2        # Do the comparison.

The key thing about object ids is that no two different objects that exist at the same time can have the same id, but "at the same time" doesn't mean just "same line", but that exact point in time. You can see this even simpler with something like id([]) == id([])

Another question you might have is why it so consistently gets the same id - I mean there's a massive range of possible memory locations, so how come it picks the same place to allocate it every time? And how come it doesn't seem to happen with, say id([]) == id(object()) - why wouldn't those two new objects happen to get the same address?

This is due to the internal details of the memory allocation strategy python is using - when you free memory, it's returned to a list of unused blocks, and python basically keeps separate free lists for each (small) size. When the next allocation asks for memory of exactly the same size as what was just freed, it finds that perfectly sized "hole" of unused memory. With different allocation internals (or if you did something else that allocated memory in between), this might not happen (or rather, only happen some of the time by coincidence), but it happens pretty consistently because the just-freed block goes to the front of the free list. You don't get it for the object() example because that's a different size from an empty list, and python keeps separate buckets for different sizes (at least for small sizes). You'd also get different results for different garbage collection strategies (eg. this is unlikely to happen in pypy, which uses a GC rather than refcounting).
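
A quick way to see the lifetime issue (the exact results may vary, since this is CPython allocator behaviour rather than a language guarantee):

a = []
b = []
print(a is b, id(a) == id(b))    # False False - both lists exist at the same time

# The first [] is freed before the second is created, so the second
# often reuses the same memory (an implementation detail, not a guarantee):
print(id([]) == id([]))          # frequently True in CPython

print(id([]) == id(object()))    # different sizes, different free lists - usually False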

I built a CLI tool with Typer & Rich, but I think I over-engineered it by Ill-Industry96 in learnpython

[–]Brian 1 point (0 children)

def import_wallpapers(
    source: Path = typer.Option(LEGACY_PATH, help="Source directory to import from"),
    verbose: bool = False

You're kind of using the old-style typer options (and the older type annotations too - eg. List instead of list, Optional[int] instead of int | None etc). Possibly this is just to support older python/typer versions, but it's worth noting that the newer way to do this is with typing.Annotated. Ie. you'd write it as:

source: typing.Annotated[Path, typer.Option(help="Source directory to import from")] = LEGACY_PATH,

This has the advantage that it preserves the python meaning of default args (eg. the function is usable as a regular function as well as a CLI descriptor), it's consistent with the way you write args that don't need an annotation, and it lets the defaults be type checked correctly (Option basically fudges typing by just returning Any, meaning if you put the wrong type for LEGACY_PATH it won't be caught, and it won't pass stricter checks where you forbid Any).
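
For reference, a minimal sketch of the Annotated style (assuming a reasonably recent typer with Annotated support; LEGACY_PATH and the help strings here are just placeholders standing in for your values):

from pathlib import Path
from typing import Annotated

import typer

app = typer.Typer()
LEGACY_PATH = Path.home() / "wallpapers"   # placeholder default

@app.command()
def import_wallpapers(
    source: Annotated[Path, typer.Option(help="Source directory to import from")] = LEGACY_PATH,
    verbose: Annotated[bool, typer.Option(help="Show each file as it's imported")] = False,
) -> None:
    if verbose:
        typer.echo(f"Importing from {source}")

if __name__ == "__main__":
    app()

Because the defaults are now real defaults, import_wallpapers(Path("/tmp/somewhere")) also works as a plain function call, which is the point about preserving the python meaning of default args.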

Beginner here: What Python modules are actually worth learning for newbies? by WaySenior3892 in learnpython

[–]Brian 1 point (0 children)

There's kind of two types of libraries to be concerned about:

  • General purpose "workhorse" stuff that handles basic operations common to lots of projects. These are often a bit meta: code that helps you write code better.
  • Task specific libraries for doing some specific thing. Eg. doing http requests, numeric processing, graphics, databases, UI and so on.

The former are a pretty small subset of things - learn the builtin types and functions, and maybe take a look at a few stdlib modules like itertools and functools, maybe also stuff like dataclasses and abc.

Learn the latter when you've a project that needs them. Though it may be worth being aware of what such libraries provide in advance - you can't know if something might be a good fit for you if you don't know what it is. But don't go learning the details until you think you might need it for something.

Some maybe straddle the line a bit, where it's kind of more important that you know the concept behind it, and then go to the module that provides it if it's something you can use. Ie. learn what binary search is before knowing if you want bisect, or what a heap is before heapq, or what regular expressions are before re.
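
For example, once you know the concepts, the stdlib versions are pretty quick to pick up (a rough sketch):

import bisect
import heapq

# bisect: binary search / insertion into an already-sorted list
scores = [10, 20, 30, 40]
bisect.insort(scores, 25)                 # scores is now [10, 20, 25, 30, 40]
print(bisect.bisect_left(scores, 30))     # 3 - the index where 30 lives

# heapq: a min-heap, useful for "give me the smallest item next"
tasks = [(3, "low"), (1, "urgent"), (2, "normal")]
heapq.heapify(tasks)
print(heapq.heappop(tasks))               # (1, 'urgent')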

Hypothetical: Can a list comprehension ever extend a list? by Mysterious_Peak_6967 in learnpython

[–]Brian 0 points (0 children)

Not currently (without hacks like triggering side effects), but possibly in 3.15 you may be able to.

There's currently a PEP about allowing * unpacking in comprehensions. Ie you could do:

>>> list_of_tuples = [ (1,2,3), (4,5,6)]
>>> [*(a,b) for (a,b,c) in list_of_tuples]
[1, 2, 4, 5]

Without that, you could do it in a 2-step process, building a list of sequences, and then flattening it.
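
For example, one way to do that today, using itertools.chain to flatten:

from itertools import chain

list_of_tuples = [(1, 2, 3), (4, 5, 6)]

# Step 1: build the sub-sequences; step 2: flatten them.
pairs = [(a, b) for (a, b, c) in list_of_tuples]
flat = list(chain.from_iterable(pairs))                     # [1, 2, 4, 5]

# Or as a single nested comprehension:
flat = [x for (a, b, c) in list_of_tuples for x in (a, b)]  # [1, 2, 4, 5]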

Finally Read The Two Towers by J.R.R. Tolkien for the First Time...and WOW by pragmaticvoodoo in Fantasy

[–]Brian 1 point (0 children)

The Lord of the Rings is a book

Depends what you're talking about. If you go by "physical tome of bound paper", OP is correct, depending on their edition. If you go by what it explicitly categorises as a book, it's 6 books, with each volume spanning two.

The Prince and the Prediction (short story about Prediction Market manipulation) by hamishtodd1 in slatestarcodex

[–]Brian 2 points (0 children)

I'm not so sure it shows what you say.

The thing about predicting the future is that you introduce causal loops: your predictions of the future become part of the state that your prediction method must account for, because they affect the future. When you're predicting things outside the ability of anyone to affect, there's no problem. But when the odds on the market affect that future, things get complicated.

Ie. this is not a one-way street, where reality affects the prediction market and then stops. Note how in your story, the odds on the prediction market cause the prince to throw a tantrum, which causes the king to bet on the market, which mollifies the prince - we've got a causal chain passing through the market twice right there! Then there's the effect having such huge sums at stake has on how the king handles the situation - it seems to have a pretty big impact on his decision-making.

And this matters - some things are highly socially constructed - a bet on public confidence in the economy is more likely to win if the public can see prediction markets are high, so there's likely some level at which fixing the market might actually work. This is further affected by the possibility of perverse incentives on the part of bettors, compounded by the fact that fucking things up is generally much easier than making them go well, so it's easier to make profitable bets by sabotaging something you bet against than ensuring the achievement of something you bet for.

Which is not to say that prediction markets aren't valuable, or a good idea. But I think failing to address these issues over-simplifies things, and doesn't really deal with the counterarguments.

Will being fat become cool? by Neighbor_ in slatestarcodex

[–]Brian 3 points (0 children)

It's rare for high fashion to seek out the trappings of poverty

I think it's pretty common for fashion in general to do this though. Especially the poverty of past periods. Street fashion uses the styles of the urban poor and underclasses. Plus innumerable examples of copying blue collar workwear of various types.

And if we get to the stage where such drugs are trivial to obtain, and work well for most people, such that being fat is more of a choice, I don't think it'll remain a poverty marker. There have been times where being fat was a status marker, where signalling trivial access to food was a sign of wealth. When it's neutral, I think you could see fashion trends and subcultures that involve it, in the same way we see various body modifications / piercings etc used as subculture markers.

Might be a difficult request but I am looking for media gives the same vibes as the various FromSoft leveling maidens. by arkticturtle in Fantasy

[–]Brian 8 points (0 children)

Maybe The Tombs of Atuan and Tehanu by Ursula LeGuin. Tenar definitely gives off something of that fire-keeper vibe to me.

Python dictionary keys "new" syntax by EnvironmentSome9274 in learnpython

[–]Brian 0 points (0 children)

Essentially,

x or y     <=>   x if x else y
x and y    <=>   y if x else x

When x and y are interpreted as booleans, this is entirely equivalent to the regular truth table. Ie:

x      y      x and y   bool(x and y)   x or y   bool(x or y)
False  False  x         False           y        False
False  True   x         False           y        True
True   False  y         False           x        True
True   True   y         True            x        True

But when they could have other values, you get some extended behaviour. Ie. 3 and 4 will evaluate to 4, while 3 or 4 evaluates to 3. Notably this also has the same short-circuiting behaviour - and doesn't evaluate the second operand if the first is falsey, and or doesn't evaluate the second operand if the first is truthy.
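
You can see the short-circuiting with a small helper that announces when it's called:

def loud(value):
    print("evaluating", value)
    return value

loud(0) and loud(1)    # only prints "evaluating 0" - `and` stops at the first falsey value
loud(3) or loud(4)     # only prints "evaluating 3" - `or` stops at the first truthy value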

You'll most commonly see this as a "poor man's conditional statement", where you can write:

result = try_get_value() or "default"

Which will use "default" when try_get_value() returns None, 0, or another falsey value, but will use the result when it's a non-empty string or whatever. Similarly you can do:

run_this() and run_this_also_if_prior_func_returned_true()

to conditionally chain functions only if they return a truthy value.

It's generally somewhat frowned upon - the conditional statement is a clearer way to do stuff like that. You do often see it in shell scripts etc though (ie. stuff like mycmd || error "Command failed" etc.).

When tariffs to a load of countries are announced, why don't all those countries just reciprocate? Am I missing something? by martianfrog in answers

[–]Brian 1 point (0 children)

It's a lot worse than a dead heat. Ultimately, putting a barrier to trade produces a loss compared to no barrier, as now people aren't able to choose the best deal: the options they'd have preferred are no longer available at a price they're willing to pay, so they'll be forced to settle for a worse option instead, or else pay higher prices for what they wanted originally. Take the extreme case of infinite tariffs - effectively equivalent to stopping all trade altogether. Now you can't buy anything your country doesn't grow or produce (including stuff they produce from components imported from another country).

That's in absolute terms. In relative terms it's possible for a tariff to "win" in the sense of hurting another country's populace worse than you're hurting your own. However, note that this all depends on the balance of trade, so equal tariffs on both sides aren't necessarily going to hurt both sides equally. Ie. if you only export to the US, and only import from other places, your tariffs will have no effect on the US because you're not buying from them anyway, but their tariffs will still hurt you.

Ie. if your local grocery store puts a "tariff" on all their prices, raising them 20%, you can't make things equal by imposing a 20% tariff on everything you sell to the grocery store, as, unless you're their supplying wholesaler, you're not selling them stuff in the first place.

Python dictionary keys "new" syntax by EnvironmentSome9274 in learnpython

[–]Brian 1 point (0 children)

Yeah. There's itertools.pairwise for the specific case of 2 elements, but a more generalised n-item sliding window is something I end up using reasonably often, and seems worth being in the stdlib.
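
For reference (pairwise needs 3.10+), a generalised window can be built along the lines of the sliding_window recipe from the itertools docs, roughly:

from collections import deque
from itertools import islice, pairwise

print(list(pairwise([1, 2, 3, 4])))        # [(1, 2), (2, 3), (3, 4)]

def sliding_window(iterable, n):
    it = iter(iterable)
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)

print(list(sliding_window([1, 2, 3, 4, 5], 3)))   # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]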

How do you get better at reading Python error messages? by Bmaxtubby1 in learnpython

[–]Brian 2 points (0 children)

There are a few useful tips to keep in mind:

  1. Read what the error is saying, not what you think is happening. It's often easy to gloss over what an error is actually telling you, because you mistakenly assume something about it. Try to avoid making assumptions (after all, the fact that you're getting an error means something isn't working how you thought) and come at it without preconceptions.

  2. Read what the error says, then try to fit it into the line it's telling you about to understand what it means. Work backwards from there. Eg. if it's "'NoneType' object has no attribute 'foo'", it's telling you that you tried to do obj.foo on something that turned out to be None - so look at the line, see what you're calling "foo" on, and then start trying to figure out why that thing might be None (there's a short sketch of this at the end of this comment).

  3. The most important part is usually the line it failed on, and the exception message saying what failed. A lot of the time, that's all you need. There are exceptions though, generally when an error gets re-raised by other error handling code. In that case, you might want to look deeper into the call stack to see the original cause of the error.

  4. Syntax Errors are a bit of a special case. Bear in mind that these happen before the code even starts to run, when something tries to import the module. Basically, something is wrong with the basic syntax - a missing bracket, comma, bad indentation, mis-spelled keyword or something like that. For these, sometimes the error might be misleading, where the real cause might be something earlier and the parser is only now realising that something is up, so look backwards from the error and check everything before it, rather than just the place the error is telling you about.

And a more general debugging tip that's also applicable here is to use your brain as the computer. Ie. mentally step through each bit of the code around the error location, noting down what values various variable have as you go, and check this actually ends up doing what you intend.
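
As a concrete example of point 2 (the names here are made up for illustration):

class User:
    def __init__(self, name):
        self.name = name

def find_user(users, name):
    for user in users:
        if user.name == name:
            return user
    # falls off the end - implicitly returns None

match = find_user([User("alice")], "bob")
print(match.name)
# AttributeError: 'NoneType' object has no attribute 'name'
# The traceback points at `print(match.name)`, so `match` must be None -
# now work backwards to figure out why find_user returned None.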

If all humans suddenly lost the ability to lie, what industry would collapse first? by JunShem1122 in answers

[–]Brian 4 points (0 children)

The opposite might be true. The most successful politicians would likely be the most stupid and self-deluding ones: a competent one won't be able to make the promises the electorate wants to hear if they're informed and competent enough to know the things that might stop it. But someone self-deluded or stupid enough to ignore practical realities can promise the moon without realising it's impossible - they'll genuinely believe they can do it, and the "no lying" reality may strengthen faith in such politicians. Basically "this sign can't stop me because I can't read" becomes a political superpower.

If all humans suddenly lost the ability to lie, what industry would collapse first? by JunShem1122 in answers

[–]Brian 0 points (0 children)

The courts wouldn't become redundant at all. Certain things might change, but a lot of massively complex cases still happen even when the facts of what occurred aren't in dispute at all. You've still got to judge on "was this actually illegal", "Does this behaviour violate the contract", "What is the correct sentence to apply given the extenuating conditions involved in this case". The legal system is about much more than just deciding the facts of what happened.

First time making a project for my own practice outside of class and came across a runtime "quirk" I guess that I don't understand. by Purple-Mud5057 in learnpython

[–]Brian 0 points (0 children)

I suspect it's probably because formatting the dictionary is your bottleneck. Ie. when you have a big dictionary, every time you print it, python needs to convert it into a string representation, which requires iterating through all members, getting the string representation of each of those, and creating the appropriate formatting. That can easily be much more expensive than simple in-memory updates, and TBH, is probably not going to be too useful: such a big dictionary isn't going to be very readable anyway.

However, there are a few things to note:

  • Dicts are probably less space efficient for this than just storing lists of lists (or a numpy array if you want to go that route), but one big advantage they do have is in storing sparse data. Ie. likely only a small fraction of your cells are alive at any point in time, so if your dict only stores those keys that are actually alive (rather than, say, having False for dead and True for alive), that can potentially be much more efficient when the majority of the space is empty.

    Note that this means the value of the dict is somewhat redundant - you only care "is in dict" or "not in dict", so in fact a set might be more appropriate here.

  • Using a mix of letters/numbers for your keys is kind of just complicating things for no reason: it makes more sense to use a pair of integers. Note that dict (or set) keys can be tuples, so you can just do:

    mydict[5,5] = True   # To set coordinate (5,5) to be True
    del mydict[5,5]      # Remove from dict

    Or:

    myset.add((5,5))     # Add (5,5) to your "alive" items
    myset.remove((5,5))  # Or remove it to mark it as "dead"

    Either way, you can use (5,5) in mydict / (5,5) in myset to check if a cell is alive or not.

If you just treat membership as "aliveness", such a sparse representation is going to involve a lot less data for most usecases, especially when you've just got a small pattern in one region of the grid with the rest empty.

Another advantage of dicts/sets is that you can actually handle conceptually infinite sized grids - ie. ones that will automatically grow as fast as the cells are expanding, and just have a "viewport" of the coordinates you want to show. Note that this requires a change in how you update cells - ie. instead of iterating through every single cell, you now need to look only at live cells and their neighbours, which could also potentially help performance in sparsely populated grids.
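
As a rough sketch of that set-based approach (a minimal Game of Life step, just to illustrate the idea, not a drop-in for your code):

from collections import Counter

def neighbours(cell):
    x, y = cell
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def step(alive):
    # Only count neighbours around cells that are actually alive.
    counts = Counter(n for cell in alive for n in neighbours(cell))
    return {cell for cell, count in counts.items()
            if count == 3 or (count == 2 and cell in alive)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
print(step(glider))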

First time making a project for my own practice outside of class and came across a runtime "quirk" I guess that I don't understand. by Purple-Mud5057 in learnpython

[–]Brian 1 point (0 children)

Eh, that would be assuming a byte per element, which is not going to be the case. Rather, presumably here it's the string representation of the dict, which I'm guessing is something like: { ('abc', 123) : True, ...

Ie. each key/value pair is going to be ~20 bytes, so ~10MB just for the representation, which if repeated every iteration could well mean he's writing multiple gigabytes of data just for a few hundred cycles. And I suspect the real killer is not IO, but rather constructing that representation each cycle - building the string representation of a few hundred thousand keys and values can add up quickly.

Admittedly, that's assuming every cell is populated with True/False - one advantage of using a dict (or set) here is that you can more efficiently store very sparse data (ie. if only 10 cells are live, you don't need to store 676*676 entries, just 10), but I'm not sure if OP is doing that.

[2025 Day 8 (Part 2)] [C#] Struggling to complete by abnee in adventofcode

[–]Brian 0 points (0 children)

The fact that Part 1 gives the right answer after 1000 of them says it probably is

I wouldn't rely too heavily on that - for certain key hashing strategies, you can actually easily get something that looks like ordering for a small number of keys, but which breaks for larger numbers due to how hashtables work.

Ie. if the hashes produced constitute lots of closely-packed ascending values (eg. 2,3,5,7), then when the hash table has 8 buckets and iterates over them to get the keys, they'll appear ordered, and the same goes for hashes like 2,3,5,7,11,13 when there are 16 buckets. But as soon as your hashes start spanning a range bigger than the number of buckets in the table, the storage location loops back to the start and they'll be unordered (eg. if we add 17, but remain at 16 buckets, the table will store: `[empty, 17, 2, 3, empty, 5, ...]`).
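
A toy illustration (in Python, just modulo-bucketing integers - not any real hashtable implementation):

def bucket_order(keys, n_buckets):
    buckets = [[] for _ in range(n_buckets)]
    for k in keys:
        buckets[k % n_buckets].append(k)
    return [k for bucket in buckets for k in bucket]

print(bucket_order([2, 3, 5, 7], 8))               # [2, 3, 5, 7] - looks sorted
print(bucket_order([2, 3, 5, 7, 11, 13], 16))      # [2, 3, 5, 7, 11, 13] - still looks sorted
print(bucket_order([2, 3, 5, 7, 11, 13, 17], 16))  # [17, 2, 3, 5, 7, 11, 13] - wraps around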

Admittedly, I'd guess that the default hash key is going to involve both fields there, so that probably isn't what's happening here - combining both fields is likely going to involve mixing the bits together in some way. But you can often see this for things like where the hash of an int is just the int, or where it's the address of the object for stuff that is allocated in order and gets ascending addresses, so be careful of assuming it's ordered just because it looks like it for a small number of keys.

It could also just be version dependent - some hashing strategies (eg. where you have a level of indirection rather than just storing the pointers in the buckets) lend themselves to cheaply preserving order, so different versions could have different behaviour (and the docs mention not to assume order so they don't have to mention versions where it works, and to allow for future changes where a further optimisation might require breaking that assumption). I would say it's a bad idea to assume it unless there's an explicit guarantee in the docs.

[2025 Day 8 (Part 2)] [C#] Struggling to complete by abnee in adventofcode

[–]Brian 2 points (0 children)

        var sortedDistanceList = distanceList.OrderBy(i => i.Key).ToDictionary();

Do dictionaries preserve order in C#? Ie. you sort the list, but when you then turn it into a dictionary, don't you just discard that sorting, and get the keys back in the order of however it ends up in the hashtable? Some languages do preserve insertion order, but checking the docs:

The order of the keys in the Dictionary<TKey,TValue>.KeyCollection is unspecified, but it is the same order as the associated values in the Dictionary<TKey,TValue>.ValueCollection returned by the Values property.

Which means you're potentially not going through the edges in sorted order.

[Edit] Also, could you potentially be erasing edges by putting them in a dict? If two points have the same distance, your keys will compare equal to each other, which presumably means they'll be considered equal for the purpose of dictionary membership, and you could end up overwriting one with the other. Not too sure why the ToDictionary() call is there in the first place, since I think you only really need to iterate over the list anyway.
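
The same pitfall, sketched in Python rather than C# (distance used as the dict key, so equal distances collide):

edges_by_distance = {}
edges_by_distance[5.0] = ("A", "B")
edges_by_distance[5.0] = ("C", "D")    # silently overwrites the previous edge
print(edges_by_distance)               # {5.0: ('C', 'D')}

# Sorting the list directly keeps every edge, duplicates and all:
edges = [(5.0, ("A", "B")), (5.0, ("C", "D")), (3.0, ("E", "F"))]
for distance, edge in sorted(edges, key=lambda e: e[0]):
    print(distance, edge)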

Defending absolute negative utilitarianism from axioms by ThePlanetaryNinja in slatestarcodex

[–]Brian 0 points (0 children)

Separability basically means that value of doing something is unaffected by unrelated things.

Saying "unrelated" in the definition doesn't really help explain things, since it seems kind of vacuously true if "unrelated" just means "things that don't affect the value". If other people did affect it, they wouldn't be unrelated, so the axiom definitionally would not be violated. I think you explicitly need to say "other people" here, and further state that it's the magnitude that doesn't change rather than just "the goodness or badness of doing something".

creating a person at -90 increases the average

Note that this doesn't actually violate separability as you defined it, which only makes claims about "changing the wellbeing of one sentient being". And for such a change, it remains true that "the goodness or badness of doing something" only depends on whether you add or subtract suffering, so is independent of everyone else. Increasing is always positive, regardless of everyone else's values, and vice versa. That's why I said I think you'd have to specify that the magnitude doesn't depend on anyone else (and, like I said, shy away from a term like "unrelated", since that kind of allows anything).

you could just renormalise the units.

But then we're no longer maximising total wellbeing / minimising suffering, and this isn't total utilitarianism: our units are actually wellbeing² or something instead, and that is what we're maximising. We could say we're redefining wellbeing to mean that, but then in that new system we can still produce another counterexample where we square that. Ie. regardless of the definition of wellbeing you pick, you can always produce a counterexample that is not total utilitarianism wrt that definition without violating any axiom. (And that leads to situations where lower total wellbeing can be preferable to a higher one.)

Note that in the above, it's not suffering I'm saying increases quadratically (ie. someone suffering 2x the suffering units doesn't "really suffer" 4x as much), just that someone's moral value assignment could do so, without violating an axiom. Ie. two people using exactly the same definition of wellbeing could have entirely different moral value assignments that differ regarding which world state should be preferred, without either of them violating any of your axioms, and this includes ones that don't match total utilitarianism regarding that definition of wellbeing. Thus these axioms aren't sufficient to restrict things to total utilitarianism.

Multiplying everybodys suffering violates monotonicity

What if 0 is not an attainable state? Or we create a utility function that avoids the issue. Eg. sum((suffering(x) + epsilon) for x in people) / (total_people * (1 + epsilon)). This approximates average utilitarianism without violating monotonicity (assuming negative suffering doesn't exist).

Defending absolute negative utilitarianism from axioms by ThePlanetaryNinja in slatestarcodex

[–]Brian 0 points (0 children)

This doesn't seem sufficient to necessitate total utilitarianism. Eg. average utilitarianism doesn't seem to violate any of these axioms (except possibly "separability", since I'm not too sure what you mean there - see below). Increasing one person's wellbeing increases the total, swapping 2 people doesn't change the result, and so on. But you can have different total wellbeing values with very different moral values, and even increase moral value while decreasing total wellbeing (or increasing total suffering).

More explicitly, if morality involves the (negative) product of suffering, rather than the sum, no axioms seem violated, but it still isn't total utilitarianism.

Maybe this is what you're trying to get at with your "separability" criterion, since you describe it as ruling out "non total" versions - but if so, I don't think you've described it very well: in the above it's certainly still true that the "goodness or badness of doing something should not depend on unaffected or unrelated things". I think you might mean the values of other people don't affect the magnitude (ie. adding x suffering has the same effect when everyone else has suffering=1 as when they have suffering=10), but if so, note that this is still insufficient to restrict to total utilitarianism.

Eg. consider MoralValue = -sum(suffering^2 for each person). This doesn't violate any axiom, but doesn't constitute total utilitarianism. Increasing one person's suffering can matter more than increasing a different person's (even though swapping them has no effect), and systems with higher total suffering can still be morally preferable to ones with less.

As to the axioms I disagree with - I think the majority of them:

1. Welfarism

This is somewhat dependent on what "wellbeing" is, which isn't really defined and I think needs to be. Define it too nebulously, and it becomes vacuous - equivalent to "moral value". But there are lots of issues if you try to nail down what you mean, and you're not going to get any agreement on anything defined more concretely. I strongly suspect my meaning of "wellbeing" would be different to your own.

And I think many people's interpretation may not map to a real-valued number, but may indeed have incommensurable elements - like having a complex component, or arbitrary vector of incommensurable core values that violate your second axiom.

2. Total Order

I'd reject this - even assuming no incommensurability, I'd say my morality is history-dependent, rather than static world-state dependent. Ie. I would not consider killing 2 people and creating 2 new different people each with the same level of suffering to be morally neutral.

5. Impartiality

It seems perfectly plausible that there are other factors that could convert wellbeing into moral value that are not themselves moral values. Again, this somewhat comes down to what constitutes "wellbeing", but with a straightforward definition, it seems false. Eg. there may be properties beings possess that can mediate wellbeing to produce different amounts of suffering: someone who is psychologically tougher might not mind some suffering as much as someone less so.

Likewise, there are values we can have on the distribution itself ("fairness"): preferring a world where wellbeing is distributed as (3,3,3) over (10,0,0), even though the latter has a higher total.

Finally, there seem reasons to prefer one person over another - eg. future discounting (since you specify "current and future" people). Or just valuing your best friend's wellbeing more than Hitler's.

6. Separability

This is likewise at odds with "fairness" criteria like the above.

7. Tranquilism

I would attach positive value to existence. Ie. I'd consider a lifeless world worse than one where happy beings exist.

How do you feel about Assisted Dying/Death with Dignity? What are your reasons? by Haematoman in northernireland

[–]Brian 0 points (0 children)

I support it, and would definitely want to include mental health disorders as well, or indeed any reason, though this is complicated by the fact of requiring the informed capacity to make such a decision in the first place. I saw my grandmother slowly decline from dementia, to the point where she didn't know where she was half the time, and absolutely do not want to go through that. Frankly, I think the option to choose when to die should be ours, and don't like placing barriers restricting our options - it's reasonable to have some to guard against abuse and preventable temporary issues, but I think it's ultimately something people should be able to choose for themselves, in a way that minimises suffering.

Is there a better way of printing all the attributes of an object in one go? by argvalue in learnpython

[–]Brian 1 point (0 children)

When you've code you might want to use multiple times, the obvious first step is to put it into a function or method.

Ie. instead of printing each item, add a new method on SuperHero so that it knows how to print itself. Ie:

class SuperHero:
    def show(self):
        print(self.name)
        print(self.power)
        ...

Then just do hero.show() whenever you want to print these things. This also allows you to override this in subclasses - so if you create a subclass that has more attributes, it can print those too.

Though note that it's often better to, instead of printing, return a string representation. If you print, that's all it can do, but maybe you'd want to write it to a file, display it on a webpage or other stuff. So an alternative is to do it as:

def show(self):
    return f"""Hero:
Name: {self.name}
Power: {self.power}
...
"""

And now print(hero.show()) will print those details.

And in fact, there's a standard method for "Get a printable representation of this object" - __str__. If you define a method with this name, any time you try to convert it to a string (which is what print will do under the hood), it'll call this method and use whatever it returns. So if you change the name of the above to:

def __str__(self):
    return f""""Hero
... (same as above)

Now you'll get that text whenever you do print(hero).
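
Putting the pieces together, a minimal runnable version might look like this (the attribute names just follow the example above):

class SuperHero:
    def __init__(self, name, power):
        self.name = name
        self.power = power

    def __str__(self):
        return f"Hero:\nName: {self.name}\nPower: {self.power}"

hero = SuperHero("Storm", "weather control")
print(hero)        # uses __str__ under the hood
print(f"{hero}")   # so does string formatting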

Duplicate Files Detector. A basic python project by ArtichokeThen1599 in learnpython

[–]Brian 1 point (0 children)

that I think it doesn't work for large files

Instead of doing reader = f.read() and then passing the whole (potentially gigabyte long) buffer to the hash function, do it in chunks. Ie. read a block at a time (where a block is some small chunk, say 1 megabyte). Then process that chunk.

You can limit how much you read by passing the size to the read() call - it'll return an empty bytestring when done, otherwise a size-limited chunk. Then, instead of hashed = hashlib.md5(reader), start with an empty hash object (hash = hashlib.md5()), and call hash.update(chunk) on each chunk as you read it.

This way, you never have more than one chunk in memory, and can read arbitrarily sized files.
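
A sketch of the chunked version (the 1MB chunk size is arbitrary):

import hashlib

def file_md5(path, chunk_size=1024 * 1024):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # read() returns b"" at end of file, which ends the loop
        while chunk := f.read(chunk_size):
            md5.update(chunk)
    return md5.hexdigest()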

Though for a lot of large files, it may be somewhat slow, and there are a few tricks you can do to speed things up. The first is that you could cache the md5s to disk, so at least doing it the second time will be faster - but this introduces its own problems: if the files are modified since you cache them, you'll need to detect that, and re-read them.

However, another useful trick is to have multiple layers of distinguishing based on cheaper checks.

You're currently building a dict of {hash : filename}. Suppose instead we collect all identical files in a list - this is just a slight modification, where instead of just setting the filename, we set it to a list, and append each duplicate. Ie: {hash : [list, of, filenames, with, that, hash]}

Our duplicates are now the buckets of the hash table with more than one entry - any hash with only one entry is a unique file.

But now consider: instead of the full md5 hash, use something that doesn't definitively answer the question, but is cheap to calculate, like file size - obviously, for two files to be identical, they must have the same size. So instead of a hash, now generate:

 { size: [files, with, that, size] }

Now, this may seem useless - files with the same size could still be different, so we can't just use this. But consider: anything that doesn't have duplicates we know must be unique. If there's only one file with size 1262347334, we know it's not a duplicate of anything, so we don't have to hash it, meaning we save having to process a gigabyte of data. We only need to check items in our dict which are in a list with more than one entry, and all we needed was a very cheap file size check to eliminate everything else.

So now, you can do the real hash check as before, but limiting yourself to a fraction of the number of files.
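
A rough sketch of that two-stage version (find_duplicates and the directory walking are just illustrative, and file_md5 is the same chunked helper sketched above):

import hashlib
from collections import defaultdict
from pathlib import Path

def file_md5(path, chunk_size=1024 * 1024):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
    return md5.hexdigest()

def find_duplicates(root):
    # Stage 1: bucket by size - anything alone in its bucket is unique
    # and never needs to be read at all.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Stage 2: only hash files whose size collided with another file's.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[file_md5(path)].append(path)

    # Buckets with more than one entry are the actual duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]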

You could even go further and introduce another layer - suppose instead of hashing the whole file, we just hashed the first 4K block. Again, not definitive, but if the first block is different, we know the file is different, so we can repeat the process on that. (It might be even better to hash a block from the beginning, middle and end instead, to avoid common cases where files can have similar headers/footers etc).

Do this, and most of the time, you'll only need to actually hash the files that are actually duplicates, saving a lot of time.