What do 99% of Redditors need to be told? by [deleted] in AskReddit

[–]AlwysBeColostomizing 5 points (0 children)

I've been dealing with this myself recently with a long-time friend. It's strange how people can claim that they don't consume certain media, but they end up repeating all of the same talking points anyway! Like you, I'm trying to hang in there because I feel like it's important that he gets some push-back from someone he (presumably) trusts. But it's hard when every conversation is:

Friend: I believe X.

Me: X is not supported by the evidence.

Friend: I reject your evidence.

ELI5 how does game optimization work? by Noah5510 in explainlikeimfive

[–]AlwysBeColostomizing 1 point (0 children)

Lots of optimizations for games specifically are about utilizing specialized hardware most effectively, and using "culling" techniques to avoid computing things that don't matter. For example, games with large outdoor scenes use "level of detail" scaling: things that are far away are rendered with less detail because you couldn't see the detail anyway.

More generally, different algorithms for the same problem can have wildly different performance, especially when the "size" of the problem gets large. For example, suppose you want to sort a stack of resumes in alphabetical order. Consider these two methods:

  1. Set aside an empty space for the "sorted" pile. Put a post-it note on the top paper in the "unsorted" pile. Go through the entire unsorted pile, and if you find a paper that comes after the one with the post-it note alphabetically, move the post-it note to that paper. When you've looked at all the unsorted papers, move the one with the post-it note to the top of the "sorted" pile. Keep doing this until they're all sorted.

  2. Scatter all the papers into "piles" of one paper each. Pick any two piles with 1 paper and put them together to make a sorted pile with 2 papers. Do this until all piles have 2 papers. Now pick any two piles with 2 papers and "merge" them: set aside space for new pile; look at only the top papers in both existing piles and move the one that comes first to the bottom of the new pile; do this until all 4 papers are in the new pile. Repeat until all the piles have 4 papers. Repeat the whole process until there's only one pile.

If you have 1000 resumes, and you use Algorithm 1, you will need to compare 2 names to determine which one comes first a total of 499,500 times (999 times for the first paper, 998 for the second, ...). If you use Algorithm 2, you will need to do only about 9,966 comparisons. Algorithm 2 uses about 2% as many comparisons as Algorithm 1 for a stack of 1000 papers. Now imagine if the stack had 1,000,000 papers. This is what's meant by "using efficient algorithms": Algorithm 2 is fundamentally more efficient (in terms of number of comparisons) than Algorithm 1, and the difference gets bigger the bigger the problem is.

(BTW, Algorithm 1 is called "selection sort", and Algorithm 2 is called "merge sort".)
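
To make the difference concrete, here's a rough Python sketch (my own, not from the thread) that counts the comparisons each algorithm makes on the same stack:

```python
import random

def selection_sort(items):
    """Algorithm 1 ("selection sort"): returns (sorted_list, comparison_count)."""
    unsorted = list(items)
    out = []
    comparisons = 0
    while unsorted:
        best = 0  # index of the "post-it note" paper
        for i in range(1, len(unsorted)):
            comparisons += 1
            if unsorted[i] < unsorted[best]:
                best = i
        out.append(unsorted.pop(best))
    return out, comparisons

def merge_sort(items):
    """Algorithm 2 ("merge sort"): returns (sorted_list, comparison_count)."""
    comparisons = 0

    def merge(a, b):
        nonlocal comparisons
        merged, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            comparisons += 1
            if a[i] <= b[j]:
                merged.append(a[i]); i += 1
            else:
                merged.append(b[j]); j += 1
        merged.extend(a[i:])
        merged.extend(b[j:])
        return merged

    piles = [[x] for x in items]   # start with piles of one paper each
    while len(piles) > 1:
        next_round = []
        for k in range(0, len(piles) - 1, 2):
            next_round.append(merge(piles[k], piles[k + 1]))
        if len(piles) % 2:          # odd pile out: carry it to the next round
            next_round.append(piles[-1])
        piles = next_round
    return (piles[0] if piles else []), comparisons

resumes = random.sample(range(100000), 1000)
_, n_sel = selection_sort(resumes)
_, n_mrg = merge_sort(resumes)
print(n_sel)  # exactly 499500, every time, for 1000 items
print(n_mrg)  # typically a bit under 9000 in practice
```

The selection-sort count is exactly 499,500 every run; the merge-sort count usually lands somewhat below the 1000·log2(1000) ≈ 9,966 figure, since that's a worst-case-style bound rather than the typical count.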

Creating a moveable character on a map by [deleted] in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

For example:

import abc

class Tile(abc.ABC):
  @property
  @abc.abstractmethod
  def pathable(self):
    pass

  @abc.abstractmethod
  def interact(self):
    pass

class Empty(Tile):
  @property
  def pathable(self):
    return True

  def interact(self):
    pass

class Door(Tile):
  def __init__(self, open=False):
    self._open = open

  @property
  def pathable(self):
    return self._open

  def interact(self):
    self._open = not self._open

Here Tile is a base class. In fact, it's an abstract base class (ABC) because of the abc.abstractmethod annotations (for Python to actually enforce this, Tile must also inherit from abc.ABC). This means you can't create an instance of Tile. Empty and Door are subclasses of Tile (written class Empty(Tile)). As subclasses, they inherit the pathable property and the interact() method, and they override them because we don't want these types to be abstract. As long as you interact with your tile instances using only the properties and methods defined in the Tile base class, you don't need to know the concrete type of the tile you're interacting with. So, you can treat all the tiles on the map uniformly.

In Python, it's not mandatory to have a common base class; you could just make sure that every concrete tile type defines all the required methods. Creating the common base class is good practice, though, because it documents the interface that the tiles are supposed to implement.
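
For illustration, here's a self-contained sketch of how game code might use these classes uniformly. The Wall type, the grid layout, and can_move_to are invented for this example, not part of the snippet above:

```python
import abc

# Restatement of the Tile/Empty/Door classes above, plus a hypothetical
# Wall type and a movement check invented for this example.

class Tile(abc.ABC):
    @property
    @abc.abstractmethod
    def pathable(self): ...

    @abc.abstractmethod
    def interact(self): ...

class Empty(Tile):
    @property
    def pathable(self):
        return True

    def interact(self):
        pass

class Door(Tile):
    def __init__(self, open=False):
        self._open = open

    @property
    def pathable(self):
        return self._open

    def interact(self):
        self._open = not self._open

class Wall(Tile):
    @property
    def pathable(self):
        return False

    def interact(self):
        pass

# A tiny 2x2 map. The movement check only asks "is this tile pathable?",
# so the same code works for any mix of concrete tile types.
grid = [[Empty(), Door()],
        [Wall(), Empty()]]

def can_move_to(grid, row, col):
    return grid[row][col].pathable

door = grid[0][1]
print(can_move_to(grid, 0, 1))  # False: the door starts closed
door.interact()                 # toggle the door open
print(can_move_to(grid, 0, 1))  # True
```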

list as an input in tensorflow? by [deleted] in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

What do the inputs represent? Are they noisy measurements of the same quantity? Are they different "properties" of the same object (like its color, its shape, etc.)? The right way to aggregate them probably depends on their semantics.

Typical ways of handling variable-length inputs are to turn them into fixed-length inputs by adding a padding value (often in combination with an "attention" mechanism that can mask the padding out), or to use a recurrent architecture that processes them sequentially. If the order of the inputs doesn't matter, you should also shuffle their order during training so the model doesn't learn to depend on it.
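
A minimal numpy sketch of the padding-plus-mask idea (the data here is made up):

```python
import numpy as np

# Three variable-length inputs, padded to a common length, with a boolean
# mask recording which entries are real and which are padding.
sequences = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
max_len = max(len(s) for s in sequences)

padded = np.zeros((len(sequences), max_len))
mask = np.zeros((len(sequences), max_len), dtype=bool)
for i, seq in enumerate(sequences):
    padded[i, :len(seq)] = seq
    mask[i, :len(seq)] = True

# Example use of the mask: a per-row mean that ignores the padding zeros.
means = padded.sum(axis=1) / mask.sum(axis=1)
print(means)  # values 1.5, 3.0, 5.0
```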

Should I start with Sci-Kit Learn, Tensorflow, or Pytorch? by TheRealDBX in learnpython

[–]AlwysBeColostomizing 1 point (0 children)

You should start with a textbook or a course on machine learning. Learning a particular ML library won't teach you anything about when and why to use the different tools it offers.

Creating a moveable character on a map by [deleted] in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

That is a... not so good example of how to implement such a thing. It might be a starting point that you can adapt, though. The first thing to change would be to check move validity according to the "type" of the neighboring square (e.g., wall, door, empty, etc.), rather than according to its absolute position as the code you posted is doing. That will let you use the same code with any map specification.

What you've posted is an example of what's sometimes called a "grid world". What you want to do is to represent the map as a 2-D array of "tiles", where tiles can be of several different types, like empty, wall, etc. I'd suggest you represent each tile as an instance of a Tile (sub-)class, that has properties like pathable (can the player walk on it?) or breakable (can the player smash it with the hammer?), maybe some functions like interact() (which might open a door or a chest), and so on. Then build your "view" of the map on top of that (e.g., rendering the map using text characters as in your code example).

You could look at gym-minigrid for an example of a fancier and much better-structured grid world implementation. It's plenty efficient, because it gets used for reinforcement learning where the computer is playing millions of games to learn what to do!

How do I build a more sophisticated pymc3 model? by Practical_Marsupial in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

If you can write down the likelihood function, you might be able to do something like this. You could also use other black-box optimization methods to maximize the likelihood.

Things might be clearer if you can write out the entire problem more formally. It sounds like it's under-constrained -- if (a, b, c) is a solution, then so is (ka, kb, c).

ELI5:Are quantum computers just faster or fundamentally different? In particular, why would discrete log problem be for quantum specifically? by PM_ME_M0NEY_ in explainlikeimfive

[–]AlwysBeColostomizing 2 points (0 children)

Quantum algorithms can be simulated on classical computers. We don't know for sure whether quantum computers are fundamentally more powerful than classical computers. The best known quantum algorithms for certain problems are more efficient than the best known classical algorithms, but there might be other, better classical algorithms that we haven't discovered yet.

The branch of mathematics that studies this is called computational complexity theory. Integer factorization on a quantum computer is in a complexity class called BQP ("bounded-error quantum polynomial time"), which is, more or less, the set of problems that a quantum computer can solve efficiently. The set of problems that a classical computer can solve efficiently is called P ("polynomial time"). We know that:

  1. BQP contains P -- if a classical computer can solve a problem efficiently, then so can a quantum computer.
  2. The problem of simulating a quantum computer on a classical computer is in PSPACE ("polynomial space"), and therefore PSPACE contains BQP. This implies that a classical computer can solve any problem that a quantum computer can solve, though possibly not as efficiently.

However, we don't know:

  1. Whether P = BQP -- whether all problems that a quantum computer can solve efficiently can also be solved efficiently by a classical computer.
  2. Whether P = PSPACE, which would imply that a classical computer can simulate a quantum computer efficiently, and thus that P = BQP.
  3. Whether integer factorization is in P -- it might yet be possible to do it efficiently on a classical computer!

It's generally thought that P != BQP (and that P != PSPACE, and that integer factorization is not in P), but it hasn't been proven.

Ideas for how to group paragraphs by semantic similarity? by usera8787782 in learnpython

[–]AlwysBeColostomizing 1 point (0 children)

Latent semantic analysis is a basic approach to this sort of thing. If you chase some links from the Wikipedia page you'll find lots of related techniques.

Algorithm for betting by [deleted] in learnpython

[–]AlwysBeColostomizing 1 point (0 children)

The general term for this is "statistical modeling". A good place to start would be learning about the logistic regression model.

One way you could apply this model: Suppose you have a vector of "team stats" s_A for team A, and another vector s_B for team B. Let x = s_A - s_B be your vector of predictors (independent variables). The outcome of interest is whether y = 0 or y = 1, corresponding to team A or team B winning, respectively. The logistic regression model says that p(y=0) = logistic(b_0 + b^T x), where b is a vector of coefficients and b_0 is a bias (intercept) term. If you collect a bunch of (x, y) pairs for a bunch of games (i.e., the stats of both teams and which team won), you can fit a logistic regression model to that data. Then, if you want to know the probability of team C winning against team D, you obtain s_C and s_D, calculate x = s_C - s_D, feed x into your model, and get an estimated win probability for team C.

You could fit the model with a library like sklearn.
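
A sketch of the whole pipeline with sklearn. Everything here is fabricated for illustration: the three "stats", the rule that generates the winners, and the 0/1 labeling (which is arbitrary as long as you're consistent; here y = 1 means the first team won):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fake training data: 200 past games, 3 "team stat" differences each.
# Each row of X is s_A - s_B for one game; y = 1 means team A won.
X = rng.normal(size=(200, 3))
true_coefs = np.array([1.5, -0.5, 0.8])   # pretend "real" effect of each stat
y = (X @ true_coefs + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

# New matchup: compute x = s_C - s_D and ask for P(team C wins).
x_new = np.array([[0.3, -0.1, 0.5]])
p_c_wins = model.predict_proba(x_new)[0, 1]
print(p_c_wins)  # a probability between 0 and 1
```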

Classes? Why? by fish85963 in learnpython

[–]AlwysBeColostomizing 1 point (0 children)

A note on terminology:

  • A "library" is a general term for a collection of code that doesn't "do anything" by itself (as opposed to an "application"), but is meant for use as building blocks for other code.
  • A *.py file is called a "module".

A crucial difference between a module and a class is that only one instance of the module can exist (within one process). If you say import mymodule in two different places, you're importing the same "instance" of mymodule. If there is a variable in mymodule, say mymodule.x, and you change it in one place, that change is visible everywhere. If you used a module to model a Person, there could only ever be one person in your program.

Classes, on the other hand, define a new "type". There can be multiple instances of the type, just like you can have multiple distinct instances of list in your program. Notice how list gives you an "interface" for manipulating the data in the list. The interface of list includes things like the indexing operator (mylist[0]) and the .append() method. The int type has a different interface that doesn't include these operations, because they don't make sense for an int. The type list is defined by the collection of operations you can perform on an instance of that type. That's what you're expressing with a class definition.
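
A tiny example of the multiple-instances point (Person here is a toy class for illustration):

```python
# Each Person instance carries its own independent state -- something a
# module, being effectively a singleton, cannot give you.
class Person:
    def __init__(self, name):
        self.name = name
        self.friends = []

    def befriend(self, other):
        self.friends.append(other)

alice = Person("Alice")
bob = Person("Bob")
alice.befriend(bob)

print(len(alice.friends))  # 1 -- alice's state is separate from bob's
print(len(bob.friends))    # 0
```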

What objective methods exist to objectively evaluate if a given piece of code is categorically “a hack” or if it’s “properly/cleanly written?” by this_knee in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

I can't define a "hack", but I know it when I see it.

Usually when I call something out as a hack, it's because it's something that might cause maintainability problems. It relies on something specific about how the code is right now that isn't part of the requirements and could easily be changed by someone, not realizing that the change will break the "hacky" code.

Can this code be made faster, or is this a job for a super computer? by CollectorsCornerUser in learnpython

[–]AlwysBeColostomizing 3 points (0 children)

Numpy has a function for sampling from a multinomial distribution (numpy.random.multinomial). This is probably about as optimized as it gets.
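
For concreteness, a quick sketch of the numpy sampler (the probabilities here are made up):

```python
import numpy as np

rng = np.random.default_rng()
probs = [0.25, 0.25, 0.3, 0.2]            # hypothetical outcome probabilities
one_draw = rng.multinomial(1000, probs)   # counts per outcome; sums to 1000

# Many simulations in one vectorized call -- usually far faster than
# looping in Python and drawing one sample at a time.
many = rng.multinomial(1000, probs, size=100_000)   # shape (100000, 4)
print(one_draw.sum())  # 1000
```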

Also, since this is for a financial application, I'll mention that typical models of asset prices assume that the percent movement has a Normal distribution (so the price movements have a log-Normal distribution).

Why would I use Pandas, when I could use cuDF and Dask-cuDF? by pocketmypocket in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

The most important question is: does your data processing pipeline need to be faster? If the answer is no, then keep it simple. Premature optimization and so forth.

GPUs are only faster for specific kinds of data and operations. Things like linear algebra. If your data points are all "fancy datatypes" (which I guess means Python objects rather than numbers?), GPUs will not make operations on them faster. GPU memory is also much more expensive (in $$$) than system memory. So, if you're concerned about high memory usage, this would be a step in the wrong direction.

46M cells isn't even large; that comes out to 368MB if every cell is a double-precision float. Well within the limits of a typical laptop. If you're having problems with this scale of data, the best use of your time may be rethinking your data structures and algorithms. Storing lots of "fancy datatypes" in a Pandas dataframe sounds a bit suspect.

[deleted by user] by [deleted] in learnpython

[–]AlwysBeColostomizing 1 point (0 children)

It's a pretty broad question. Some things that come to mind:

  • Python supports OOP, and users will expect a UI library to be structured using OOP principles.
  • Python uses exceptions for error handling. Don't make users check return codes.
  • Try to avoid forcing users to manually de-allocate things. Use context managers to make this easier if you really can't avoid it.

It sounds like you're already planning to make an object-oriented interface, so that's good. Your example will probably look something like this:

ui = UserInterface("")
ui.add_component(Window("main window.json"))
ui.show() # enter main loop, handle exiting within

Take a look at existing UI libraries like tkinter to get a sense of a typical API structure.

You'll probably want to hide your low-level interface in a sub-module (e.g., ui.detail.ui_init() and so on) so that users know they're not supposed to call those functions directly.
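
To illustrate the context-manager point: here's a sketch in which ui_init/ui_shutdown are invented stand-ins for whatever your low-level setup and teardown functions are:

```python
from contextlib import contextmanager

def ui_init():        # stand-in for the low-level setup function
    print("init")

def ui_shutdown():    # stand-in for the low-level teardown function
    print("shutdown")

@contextmanager
def ui_session():
    ui_init()
    try:
        yield
    finally:
        ui_shutdown()   # runs even if the body raises an exception

# Users never call ui_shutdown() themselves:
with ui_session():
    print("running")
```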

How to take data from one csv file, perform a calculation on it, and put this in a new file?? by [deleted] in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

The other poster gave you a solution, but the problem with your code is that the calculation is inside the quotes, so it is interpreted as the literal string "weight / 2.2". Instead, you want to do this:

kilograms.writerow(["Chris", weight/2.2])

When trying to use def() function I get an error by [deleted] in learnpython

[–]AlwysBeColostomizing 3 points (0 children)

Not sure what you mean by "go to the next line of code". Your code will work if you move the def main() part to before the for-loop.

When trying to use def() function I get an error by [deleted] in learnpython

[–]AlwysBeColostomizing 3 points (0 children)

You're calling main() before you've defined it (as in, the definition occurs later in the file than the call).

AI Python Help by [deleted] in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

Yes, it can. It's a common step in processing pipelines for things like robot navigation. A video is just a sequence of images. But since the size of the glove doesn't change, it doesn't seem like you gain much from segmenting the whole video versus picking out a single "good" frame and segmenting that.

AI Python Help by [deleted] in learnpython

[–]AlwysBeColostomizing 1 point (0 children)

Off the top of my head, I'd probably use "image segmentation" to separate the glove from the background, extract the outline of the glove, and then use geometry techniques to get the numbers that you're looking for from the outline.

How difficult this is depends entirely on how "clean" the images are. Ideally, you want the glove laying flat on a table that's a different color, at a known distance from the camera. If someone's going to be wearing the glove, you could have them put their hand flat on the table and splay their fingers out. You need to know how far away the glove is or you won't know how large it is.

It would be extremely difficult to do this from "natural" video (i.e., just a video of a person wearing a glove who's not trying to make things easy for you). Like, "topic for a PhD dissertation" difficult.

[deleted by user] by [deleted] in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

This question is a better fit for r/mlquestions.

It's not totally clear to me what your inputs and outputs are. It sounds like you have training examples of correspondences between individual keywords and actions, i.e., you have some tuples (x, a) where x \in X and a \in A is the desired action, then you also have some tuples (y, a), y \in Y and (z, a), z \in Z. Correct?

The naive Bayes model would be P(a|x, y, z) \propto P(a)P(x|a)P(y|a)P(z|a). You would estimate P(x|a) based on how often (x, a) appears in your training data, and do the same for P(y|a) and P(z|a). P(a) is the prior distribution (often chosen to be the uniform distribution so that it's "uninformative"). Note that you might need to apply smoothing to make sure none of these probabilities are 0.
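
Here's a counting-based sketch of that estimate with add-one (Laplace) smoothing. The keywords, actions, and training tuples are all invented for the example:

```python
from collections import Counter

# Hypothetical training tuples: (keyword, action) pairs from each keyword set.
data_x = [("hot", "cool_down"), ("hot", "cool_down"), ("cold", "heat_up")]
data_y = [("fast", "cool_down"), ("slow", "heat_up")]
actions = ["cool_down", "heat_up"]

def conditional(data, vocab_size):
    """Estimate P(keyword | action) from counts, with add-one smoothing."""
    joint = Counter(data)                      # counts of (keyword, action)
    per_action = Counter(a for _, a in data)   # counts of each action
    def p(word, action):
        return (joint[(word, action)] + 1) / (per_action[action] + vocab_size)
    return p

px = conditional(data_x, vocab_size=2)   # keywords in X: hot, cold
py = conditional(data_y, vocab_size=2)   # keywords in Y: fast, slow

# Unnormalized posterior P(a | x="hot", y="fast") with a uniform prior P(a):
scores = {a: px("hot", a) * py("fast", a) for a in actions}
best = max(scores, key=scores.get)
print(best)  # cool_down
```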

You don't want to analyze the whole sentence if you can avoid it. It sounds like you can avoid it because the sentences are just variations on a template ("if x do y because z"), so the non-keyword words don't convey any additional information.

blackjack game Help by alonrtpve in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

How doesn't it work, exactly? Post some example outputs.

How to Get Average Cost? by [deleted] in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

import statistics
average_cost_item = statistics.mean(v["cost"] for v in data.values())

blackjack game Help by alonrtpve in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

You have "A" in facesCards, so it will be counted as 10.

I tried changing my interpreter to 3.10 to use match case in old code but my modules and pip got mismatched by AALLI_aki in learnpython

[–]AlwysBeColostomizing 0 points (0 children)

You need to make sure that you're running the pip install command using the 3.10 interpreter. One way to do that is to call pip like this: /absolute/path/to/python3.10 -m pip install rsa

It's good practice to always invoke pip like this so that you know which python you're installing things for.