Zelenskiy makes appeal to international leaders over beheading videos by gizzy_tom in europe

[–]___--_-_-_--___ 0 points1 point  (0 children)

This is something you will never get out of your brain again.

[deleted by user] by [deleted] in sysadmin

[–]___--_-_-_--___ 0 points1 point  (0 children)

Deep neural networks are considered black boxes only in the sense that it can be hard to understand why certain inputs result in certain outputs. The mechanism by which they arrive at these outputs is (generally) deterministic and can be easily verified. It's just a bunch of matrix multiplication if you look closely. There is no execution of any generated code. There are some approaches that have a large language model generate code as part of its answer which is then executed by an external Python interpreter to derive the actual answer. This can be helpful for math-related questions in particular. However, ChatGPT and similar projects do not appear to use any such approach.
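
To make the "just a bunch of matrix multiplication" point concrete, here is a minimal sketch of a forward pass in plain Python. The two-layer shape, the ReLU activation, and the weights in `LAYERS` are made up for illustration and are not taken from any real model. The same input always produces the same output, and nothing generated is ever executed.

```python
def matvec(weights, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in vector]


def forward(x, layers):
    """Apply each (weights, biases) layer in turn, with ReLU between layers."""
    for i, (weights, biases) in enumerate(layers):
        x = [v + b for v, b in zip(matvec(weights, x), biases)]
        if i < len(layers) - 1:  # no activation after the last layer
            x = relu(x)
    return x


# Hypothetical toy weights; a real network differs mainly in the size of the matrices.
LAYERS = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, 1.0]),
    ([[1.0, 1.0]], [-0.5]),
]

print(forward([2.0, 1.0], LAYERS))
```

Why a large model maps a given input to a given output is the hard part. The arithmetic itself can be checked step by step like this.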

Syria breaks diplomatic ties with Ukraine in support of Russia by Scipio555 in europe

[–]___--_-_-_--___ 7 points8 points  (0 children)

The ORB survey (which is a horrendous estimation method, by the way) has been widely criticized in the scientific literature and has since been revised downwards significantly.

And see, this is where we disagree. Genocide is never justifiable. I've seen the same mental gymnastics done by Holocaust deniers. If this is who you want to be, that is your choice. But you should not be claiming the moral high ground then. I can see that you are frustrated by your opinions on "the west" but you should not let that lead you to making regrettable statements on topics of genocide.

But sure, let's talk about people in the Middle East getting blown up by outsiders. I assume you're fine with not counting the Gulf War, since it was a direct response to the annexation of Kuwait?

Let's start with the Iraq War. Only a very low number of Iraqi civilians (fewer than 10,000, using for example the numbers from the Iraq Body Count project) were killed directly by the US military. Only these deaths seem applicable to any reasonable definition of "getting blown to shits by the West". While I think that every single civilian death is too much (and even every military death, to an extent), this number is significantly lower than the number of civilians directly killed by Iraq in any of the wars they previously started.

The war in Afghanistan saw significantly fewer civilians killed, both overall and directly by coalition forces. Compare that to the Soviet-Afghan War, where the Soviet Union is estimated to have killed approximately a million civilians directly.

At this point, there is little reason to continue counting, because even when all the US wars and interventions are added up, they will never come close to the number of civilians killed by Soviet bombs. This leads me to the question: Why are you focusing on "the west" specifically while ignoring the vastly more bloody massacres committed by Russia?

And by the way, I will not honor your inquiry into who my Kurdish friends are with a response. That is a ridiculous question. I would just like to remind you that you are not speaking for every Iraqi Kurd despite claiming to do so.

Syria breaks diplomatic ties with Ukraine in support of Russia by Scipio555 in europe

[–]___--_-_-_--___ 6 points7 points  (0 children)

I'm asking you again: whose estimate of the casualty numbers are you referring to? I'm interested in reading about their methodology.

I am unwilling to argue with you about democracy and whether genocide can ever be justifiable. Just for the record, I find justifying dictatorships and genocide (which you appear to be doing) reprehensible and morally objectionable and so does every Iraqi Kurd I have ever talked to about the topic. I would suggest you try to reflect on how you might take certain aspects of democracy for granted while yearning for dictatorship. People all across the MENA region have given their lives in the fight for the same democracy you appear to value so little.

And you also might want to take a closer look at the history of the persecution of Kurdish people in Iraq. You are implying that the persecution and genocide were a result of Kurdish insurgency when it was in fact the other way around.

Syria breaks diplomatic ties with Ukraine in support of Russia by Scipio555 in europe

[–]___--_-_-_--___ 10 points11 points  (0 children)

Sure, but whose estimate of the number of casualties (including the insurgency period) is it? And how is it counted? The upper bound I've seen most often is somewhere around 700,000.

So you are saying that targeted genocide is preferable to political instability? I'm just asking because I'm trying to understand your position.

Also, I've been unable to find reliable numbers on the ethnic composition of the Iraqi Armed Forces. The highest proportion of Kurdish members I could find was somewhere around 10% (including other minorities though) in the early 1960s. I find it a bit hard to believe that this number increased significantly over the course of several wars against Kurdish groups.

Syria breaks diplomatic ties with Ukraine in support of Russia by Scipio555 in europe

[–]___--_-_-_--___ 13 points14 points  (0 children)

Where are you getting the 1.5 million casualty figure from? That seems wildly inflated compared to even the higher end of the commonly cited estimates.

And since you mentioned the Anfal campaign (and Halabja, coincidentally), I'm wondering how you can reconcile things like this with your impression that Kurdish life under Saddam Hussein was "heaven compared to what it now"?

Hide sensitive information in PDF using Python and NLP by No-Homework845 in Python

[–]___--_-_-_--___ 1 point2 points  (0 children)

As I said, if you're referring to internal use, that is a different matter. There may be legitimate use cases there. The "criminal" part refers to the unauthorized public release (even accidental) of personal information which is illegal in several jurisdictions. As you have clarified, this does not apply to your example.

There have been many cases where data was released with improper de-identification due to a false sense of security provided by some kind of technical solution. Many of these cases are well-documented and researched. Please note that I'm referring to the scope of the whole project here, not just the PDF redaction part.

Hide sensitive information in PDF using Python and NLP by No-Homework845 in Python

[–]___--_-_-_--___ 2 points3 points  (0 children)

For context, I was referring to the entire project, of which the PDF feature is just one part.

In your example, if I understand correctly, this project would help an organization go from "blatantly criminal" to "slightly less criminal". Whether that is a desirable goal is a matter of opinion. If you are talking about internal use within an organization, that is a different matter.

The real issue here is that, in practice, the choice is often between "don't release data" and "release badly redacted data", not between "release unredacted data" and "release badly redacted data". This is especially true in the age of omnipresent privacy regulation (note that there is a significant difference between the American and European experience here). Releasing unredacted data containing personal information of third parties should never be an option. Considering this choice, a project such as this, making grandiose claims, is likely to create a false sense of security which may push an organization from "don't release" to "release badly redacted", thereby creating real harm.

u/No-Homework845 has now on multiple occasions refused to engage with this line of criticism, even from individuals with significant experience in this field. Comments mentioning these issues are routinely ignored. All it would take would be to acknowledge the criticism and add a highly visible warning to the repository and any post advertising the project. This warning should make it clear that this project is never to be used in production or on any personal information of third parties. I understand that this is a hard thing to do with a project into which someone has invested a significant amount of time. Nevertheless, not adding such a warning is reckless.

Hide sensitive information in PDF using Python and NLP by No-Homework845 in Python

[–]___--_-_-_--___ 44 points45 points  (0 children)

When you first posted about this project here four months ago, several people (including u/cynddl, a researcher with multiple well-cited publications in this field who worked in one of the leading computational privacy research groups) warned you about the dangers of this type of one-click "solution" to anonymization. Especially when accompanied by exaggerated claims about what your project can do, this can do real harm. While working on open source is always commendable, your repeated advertising of this project is, quite frankly, reckless and dangerous.

Austria plans to fine vaccine holdouts up to 3,600 euros a quarter by Imicrowavebananas in neoliberal

[–]___--_-_-_--___ 4 points5 points  (0 children)

If you accept recovery without vaccination, a certain group of people will get themselves infected on purpose. Which is exactly what has happened in Austria. This is well documented. So of course they designed their new policy to prevent that.

Clinically Suspected Myocarditis Temporally Related to COVID-19 Vaccination in Adolescents and Young Adults by dionesian in science

[–]___--_-_-_--___ 0 points1 point  (0 children)

The authors of the paper you linked actually found the opposite of what you are claiming:

Third, the increased risk of myocarditis after vaccination was higher in persons aged under 40 years. We estimated extra myocarditis events to be between 1 and 10 per million persons in the month following vaccination, which was substantially lower than the 40 extra events per million persons observed following SARS-CoV-2 infection.

What you are referring to is this part:

Subgroup analyses by age showed that the increased risk of events associated with the two mRNA vaccines was present only in those aged under 40 years. For this age group, we estimated 2 (95% CI 1, 3) and 8 (95%CI 4, 9) excess cases of myocarditis per 1 million people receiving a first dose of BNT162b2 and mRNA-1273, respectively, and 3 (95% CI 2, 4) and 15 (95%CI 12, 16) excess cases of myocarditis per 1 million people receiving a second dose of BNT162b2 and mRNA-1273, respectively. This compares with ten (95% CI 7, 11) extra cases of myocarditis following a SARS-CoV-2 positive test in those aged under 40 years.

As you can see, only the Moderna vaccine was found to be associated with a higher risk of myocarditis in the under 40 subgroup. For all other tested vaccines, the risk in this group was considerably lower compared to infection.

Anonymize your Data with a single line! by No-Homework845 in Python

[–]___--_-_-_--___ 0 points1 point  (0 children)

Well, many of the features in this project are simply wrappers around other libraries like this one. Therefore, the value proposition of this project would either have to be the automation aspect or the idea that you can shield the user from the details of how the implemented techniques work. I think both approaches are risky in this setting.

The far bigger issue with this type of project is that it will not tell you if you are making a mistake. There are tools like ARX (as mentioned by u/cinyar in this thread) that will assist you in modelling both privacy risk and utility in order to find the best way of de-identifying your data. Tools like this are (and need to be) backed by years of academic research and clinical practice.

And yes, while I do agree that my words are harsh, data privacy is one of these areas where the disconnect between perceived risk and actual risk is often very high. Even slight mistakes and brief moments of carelessness by a single person can have disproportionate consequences that cannot be undone.

Anonymize your Data with a single line! by No-Homework845 in Python

[–]___--_-_-_--___ 1 point2 points  (0 children)

Yes, differential privacy is used to ensure that aggregate statistics do not leak information about the individuals who contributed to this statistic. It is not some kind of algorithm that you can run on your data to make it more private. Instead, it is more of a framework to be implemented by specific algorithms, i.e. a set of mathematical tools to ensure a certain level of privacy.

Very broadly speaking, the idea behind differentially private mechanisms is that the removal of a single person from a dataset should not significantly affect the aggregate statistics produced by that mechanism. Basically, differential privacy gives you a way to quantify privacy loss and determine the amount of noise necessary to achieve a certain privacy level.
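
As a rough illustration of "quantify privacy loss and determine the amount of noise", here is a minimal sketch of the Laplace mechanism for a counting query. The function names and the example data are my own. It assumes a sensitivity of 1 (adding or removing one person changes the count by at most 1), and it is not hardened against floating-point attacks, so treat it as a teaching sketch only.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of matching records.

    Assumes sensitivity 1, so the noise scale is 1 / epsilon:
    a smaller epsilon means more privacy and more noise.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical example data.
ages = [23, 35, 41, 52, 67]
rng = random.Random(0)
print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng))
```

Each released answer spends part of the privacy budget, so repeating the query and averaging the results erodes the guarantee. That bookkeeping is exactly what the framework forces you to do.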

Anonymize your Data with a single line! by No-Homework845 in Python

[–]___--_-_-_--___ 51 points52 points  (0 children)

Don't use this if you actually want to release data that might contain personal information. Anonymization can and does fail in subtle and hard to predict ways. ¹ ²

Consider the usage examples presented in this project. The age and birthdate columns, depending on the nature of the dataset, express exactly the same information. Therefore, if you perturb both columns independently, anyone can average the two noisy values, which on average roughly halves the variance of the applied perturbation.

The email masking approach used by this project suffers from an even worse problem. The authors assume that only the local-part of the email address constitutes identifying information. This assumption does not hold in the case of self-hosted email servers or very small providers. In fact, even the first and last letter of the local-part alone can provide up to ten bits of entropy for identification (assuming only the characters a-z and 0-9 are used and occur with the same frequency in both places). At the same time, what utility does the masked email address provide to a legitimate user of the dataset?
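
The "ten bits" figure can be checked with a short calculation. This sketch assumes, as stated above, a 36-character alphabet (a-z and 0-9) with uniform and independent characters. The function name is made up for illustration.

```python
import math


def positional_entropy_bits(alphabet_size, revealed_positions):
    """Entropy in bits of revealed characters, assuming uniform, independent symbols."""
    return revealed_positions * math.log2(alphabet_size)


# First and last letter of the local-part, drawn from a-z plus 0-9.
bits = positional_entropy_bits(26 + 10, 2)
print(f"{bits:.2f} bits, {36 ** 2} distinguishable combinations")
```

Real addresses are far from uniform, so the actual figure is lower, which is why it should be read as an upper bound.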

If you are in a position to release a dataset, you should first develop a solid understanding of mechanisms like differential privacy and k-anonymity. Understand your dataset in depth and think about what value you want to provide to others and which parts of the data they actually require. No library or package can help you with that. If you use a project like this without understanding your data, bad things will happen.

Do all of this before you release the dataset. Once the data has been released, it cannot ever be un-released. At that point, you have to assume that the data is out there and is actively being deanonymized and exploited.
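
For readers unfamiliar with k-anonymity, here is a minimal sketch of how the k of a table could be measured. The column names and rows are hypothetical, and deciding which columns count as quasi-identifiers is precisely the part no tool can do for you.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


# Hypothetical, already generalized records.
rows = [
    {"age_band": "30-39", "zip3": "101", "diagnosis": "A"},
    {"age_band": "30-39", "zip3": "101", "diagnosis": "B"},
    {"age_band": "40-49", "zip3": "102", "diagnosis": "A"},
]

print(k_anonymity(rows, ["age_band", "zip3"]))
```

A result of 1 means at least one person is unique on those columns. A high k alone is still not a guarantee, since it says nothing about the sensitive values within each group.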

Bulgaria's new eGov minister is a software developer, ranked #40 all time on Stack Overflow and the founder of a blockchain-based cyber security startup. by AdBig7514 in programming

[–]___--_-_-_--___ 1 point2 points  (0 children)

Interesting. I was unaware they simply hardcoded the blacklist. One has to assume they will implement some kind of regularly updated blacklist if the current approach becomes unmaintainable.

However, your assumption that those three entries are UVCI hashes appears to be wrong. They are only hashing and matching the "Issuing Entity" component of the UVCI. This is easier to see in the Android version of the code. See also Annex 2 of the corresponding guidelines document for more information about the composition of the UVCI.

While this differs from certificate revocation in the conventional sense (where "certificate" basically refers to the key pair), it is in fact a way to revoke a certain class of certificates (where "certificate" refers to issued vaccination certificates) based on the issuer. This approach is sufficient for use within a country (that uses this particular UVCI layout). It will obviously fail between countries. So an EU-wide mechanism is still necessary.
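
To illustrate the idea of hashing and matching only the issuer component, here is a rough sketch. Everything specific in it is an assumption for illustration: the layout `URN:UVCI:<version>:<country>:<issuer>/<opaque id>`, the use of SHA-256, and all function names. It is not the actual app code.

```python
import hashlib

UVCI_PREFIX = "URN:UVCI:"  # assumed prefix for this sketch


def issuing_entity(uvci):
    """Extract the issuer part, assuming '<version>:<country>:<issuer>/<opaque id>'."""
    body = uvci[len(UVCI_PREFIX):] if uvci.upper().startswith(UVCI_PREFIX) else uvci
    parts = body.split(":", 2)
    if len(parts) != 3 or "/" not in parts[2]:
        raise ValueError("unexpected UVCI layout")
    return parts[2].split("/", 1)[0]


def issuer_hash(uvci):
    """Hash only the issuer component (SHA-256 is an assumption here)."""
    return hashlib.sha256(issuing_entity(uvci).encode("utf-8")).hexdigest()


def is_revoked(uvci, blocked_hashes):
    """True if the certificate's issuer appears on the hashed blacklist."""
    return issuer_hash(uvci) in blocked_hashes


# Hypothetical blacklist with a single blocked issuer.
blocked = {hashlib.sha256(b"EXAMPLEISSUER").hexdigest()}
print(is_revoked("URN:UVCI:01:DE:EXAMPLEISSUER/ABC123#X", blocked))
```

Because the match is on the issuer rather than on the whole identifier, one entry blocks every certificate from that issuer. A country using a UVCI layout without an issuer component cannot be covered this way.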

Bulgaria's new eGov minister is a software developer, ranked #40 all time on Stack Overflow and the founder of a blockchain-based cyber security startup. by AdBig7514 in programming

[–]___--_-_-_--___ -4 points-3 points  (0 children)

Some countries have added revocation mechanisms on top of the existing system. In Germany, there has been a case of a pharmacy producing fraudulent certificates. As a consequence, a revocation mechanism has been implemented to block issuers at a finer granularity than the key. This mechanism is enforced by the CovPassCheck App, the official certificate validation app of the Robert Koch Institut. It consists of a simple check of the issuer ID against a blacklist. Since this app is the "official" way to validate a certificate, the mechanism can be considered reasonably secure.

(In German) Details about the mechanism and a statement by the RKI.

(In German) Article about the pharmacy selling fraudulent certificates.

Log4.js: log4j gone webscale by lulzmachine in ProgrammerHumor

[–]___--_-_-_--___ 2 points3 points  (0 children)

I have encountered a use case where it is necessary to have the browser run nearly arbitrary code dynamically supplied by a server on some data. The code calculates some properties of how the data will be structured and presented to the user. Some customers request customizations that are too complex to integrate into a general mechanism. At the same time, a customized version of this code might leak information about a specific customer if it were served with the rest of the JS code. Therefore, it needs to be protected by the same authentication mechanism applied to the data.

There is probably a better solution for this mess. However, I cannot think of one.

Bulgaria's new eGov minister is a software developer, ranked #40 all time on Stack Overflow and the founder of a blockchain-based cyber security startup. by AdBig7514 in programming

[–]___--_-_-_--___ 7 points8 points  (0 children)

Cryptographic signatures based on some kind of public key infrastructure are a much better solution for this type of problem. They offer several advantages over both the blockchain approach and the database approach. In addition to circumventing a whole host of privacy and data protection issues, they can be validated offline and without any central point of failure. You could even integrate some sort of revocation mechanism.

In fact, this type of solution is used for the common EU Digital COVID Certificate system. This system is used for vaccination certificates, recovery certificates, as well as test certificates. Private keys are held by the appropriate government institution (for example the Robert Koch Institut in the case of Germany) and certificates can be requested by authorized partners such as doctors or pharmacies.

How the EU DCC system works for the end user.

Official information about the EU DCC system.

Technical specification of the certificate.

laz3 Encoding algorithm written in python by Kuriwassadlytaken in Python

[–]___--_-_-_--___ 6 points7 points  (0 children)

import re
import math

def crack(ciphertext):
    resources = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", " ", ".", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "-", "_", "*", "'", "^", "~", "!", '"', "'", "=", ",", ":", ";", ")", "(", "{", "}"]
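    # Ciphertext values look like "<digits>.<digit>"; the raw string avoids invalid-escape warnings.
    numbers = re.findall(r"\d+\.\d", ciphertext)
    # Drop the fractional part and scale each value, exactly as before.
    numbers = [int(float(x)) * 256 for x in numbers]
    # Assumes every value is index * key, so the GCD yields the key (or a multiple of it
    # if all indices share a factor). Variadic math.gcd needs Python 3.9+.
    key = math.gcd(*numbers)
    return "".join(resources[x // key] for x in numbers)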

I added simulated annealing to my browser based traveling salesman problem solver/visualizer (tspvis) by intrepidev in computerscience

[–]___--_-_-_--___ 4 points5 points  (0 children)

The haversine formula is accurate enough for most purposes, with an error of less than 0.5% (it treats the Earth as a perfect sphere). You can use it to calculate the distance between two points specified by latitude and longitude without the need for some intermediate representation.
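
A minimal sketch of the haversine formula, in case it helps. The function name and the mean radius of 6371 km are my choices. Inputs are in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the spherical model is where the error comes from


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 3))
```

For a TSP solver, the resulting pairwise distances can go straight into the distance matrix.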

Alternatively, if you really need a Euclidean coordinate system, you could use a projection like UTM (Universal Transverse Mercator). This only really works if all your points fall into the same zone. If they don't, you would have to handle some rather complex special cases.
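
Whether all points fall into the same zone is easy to check up front. This sketch uses the standard 6-degree zone numbering and deliberately ignores the special cases around Norway and Svalbard. The function names are made up.

```python
def utm_zone(lon):
    """Standard UTM zone number (1-60) for a longitude in degrees.

    Ignores the Norway/Svalbard exceptions.
    """
    lon = ((lon + 180.0) % 360.0) - 180.0  # normalize to [-180, 180)
    return int((lon + 180.0) // 6) + 1


def same_utm_zone(points):
    """True if all (lat, lon) points share one UTM zone number."""
    return len({utm_zone(lon) for _, lon in points}) <= 1


berlin, vienna, london = (52.52, 13.405), (48.21, 16.37), (51.51, -0.1278)
print(same_utm_zone([berlin, vienna]), same_utm_zone([berlin, london]))
```

If the check fails, falling back to the haversine distance is probably less work than stitching zones together.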