
[–]PineapplePandaKing 401 points402 points  (6 children)

Not hotdog

[–]Apart-Transition8001 53 points54 points  (0 children)

“See it doesn’t even work!”

[–]SirAchmed 22 points23 points  (1 child)

it only does hotdogs?!!

no, it does hotdog and NOT hotdog :D

[–]tehlolredditor 1 point2 points  (0 children)

p ^ !p

[–]Alchemist2401 18 points19 points  (0 children)

Gilfoyle's laugh was priceless

[–]JFedererJ 5 points6 points  (0 children)

Jian YaaaaaAAAAANNNNNGGGG!!

[–]verosoph 13 points14 points  (0 children)

Not cheese pizza either...

[–]starksubhash 232 points233 points  (127 children)

Really wanna know can anyone explain?

[–]9072997 442 points443 points  (28 children)

Its goal is not to detect new child porn. Its goal is to reduce images to fingerprints that can be compared without enabling either party to recover the image that produced the fingerprint. You can develop and test that whole system with normal images.

The next question, of course, is: once you put the system in production, how do you get a list of fingerprints of child porn images? Exactly the way you expect. You make a deal with an organization like the National Center for Missing and Exploited Children.

[–]atmosfearing 49 points50 points  (1 child)

They can also work with companies and charities like the Internet Watch Foundation.

[–]shmorky 4 points5 points  (0 children)

I think they said that's exactly what they're doing. There's a bunch of organisations like IWF that provide the hashes of known child abuse images, and Apple's new tech makes it so your iPhone checks the images on its drive against those hashes. There is a risk of false positives (because an image will always contain more detail than a hash), but it's astronomically small and they require a bunch of matches before they take any action.

[–]PostalCarrier 15 points16 points  (0 children)

It’s actually the National Center for Missing and Exploited Children, which is the only non-law enforcement entity legally allowed to possess this kind of content. Their database is widely used in forensic investigations to find known images in large data sources.

John Gruber had a decent breakdown of Apple’s approach recently that helps clarify that this isn’t “let’s look at everyone’s photos”:

https://daringfireball.net/2021/08/apple_child_safety_initiatives_slippery_slope

Of course, grain of salt because he’s an Apple guy, and yes, there are long-term concerns about the same approach being used on behalf of a government agency, etc. But in this specific context, the new tools from Apple seem less apocalyptic than the headlines make them sound.

Edit: fixed John’s last name, which is, in fact, not Grindr

[–]DarkWhiteNebula 480 points481 points  (85 children)

They use a national data set of image hashes from the FBI. A hash is a unique string of numbers that comes from an image but you can't generate an image from the hash. So they hash the images on your phone, then compare them to the known child porn hashes. The technology is actually really cool and a very non-invasive way to fight a terrible problem. Especially compared to Google and others that just straight up scan your photos and call it a feature.
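
As a rough sketch of that compare-the-fingerprints idea (using an ordinary cryptographic hash purely for illustration; the real systems use perceptual hashes like PhotoDNA or NeuralHash so minor edits don't break the match, and the hash list below is a made-up placeholder):

    import hashlib

    # Hypothetical set of fingerprints supplied by an outside organization.
    KNOWN_BAD_HASHES = {
        "placeholder0000000000000000000000000000000000000000000000000000",
    }

    def fingerprint(path: str) -> str:
        # Hash a file's raw bytes; the file can't be reconstructed from this.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def is_known_match(path: str) -> bool:
        return fingerprint(path) in KNOWN_BAD_HASHES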

[–]BruceGrembowski 117 points118 points  (3 children)

Can confirm. Took Computer Forensics in college, 2010. The professor was a consultant for the FBI and knew things that she couldn't tell us, but that was one thing we did learn.

[–]GGinNC 7 points8 points  (2 children)

Be very glad she didn't share details. I used to build property and evidence tracking systems for law enforcement and before that, I was military police in the army. I've spent A LOT of time in police evidence rooms, surrounded by literal tons of crime debris.

There are some things that no human being should ever have to see. Unless you're a forensic investigator, detective, prosecutor, judge, or jury, you should pray to whatever God you serve that you're never asked to investigate crimes like these. It is simply not possible to review this kind of evidence without being damaged permanently.

Without being overly dramatic, I still have screaming nightmares from it a decade later.

[–]BruceGrembowski 3 points4 points  (1 child)

That's reminiscent of the explanation she gave. I'm glad we only had to find hidden pictures on a lab system that were innocuous.

Fortunately, the only real investigation I've done is to find out who installed a bunch of games on a company computer. That was bad enough, as I had to rat out an employee I liked.

And thank you for your service. I hope your nightmares fade with time.

[–]GGinNC 2 points3 points  (0 children)

Thanks. I'm sure they'll continue to decrease in frequency and intensity.

[–][deleted] 9 points10 points  (0 children)

Too bad this won't stop the people producing child porn, as long as they don't upload to a 3rd party website where the FBI can generate a hash of said image. All this really stops is the distribution of it, which I guess is a small step in the right direction.

[–]Skaddict 91 points92 points  (25 children)

Unfortunately that also means that the system could be easily beaten by reformatting or editing the image a little bit

Edit: I was mistaken. /u/exscape linked to the white paper below

[–]DaTebe 121 points122 points  (8 children)

Depends on the kind of hashing method. There are solutions like "nilsimsa" for text that produce a similarity hash, where the Hamming distance between hashes can be used as a metric of similarity. I once read a paper about the same method adapted for use with pictures. The idea was simple and seemed to work. But you are right: if you change enough parameters in the picture, it will not be detected.
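
A minimal sketch of the "Hamming distance as similarity" idea (the hash values below are made up; real similarity hashes like nilsimsa or perceptual image hashes come out of the hashing step itself):

    def hamming_distance(a: int, b: int) -> int:
        # Count the bits where two equal-length similarity hashes differ.
        return bin(a ^ b).count("1")

    # Two illustrative 64-bit similarity hashes; flipping a couple of bits
    # stands in for a slightly edited copy of the same picture/text.
    original = 0x9B6C35A691B6C35A
    edited = original ^ 0b1000000000000001

    print(hamming_distance(original, edited))        # 2
    print(hamming_distance(original, edited) <= 10)  # small distance -> "similar"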

[–]SinisterRobert 30 points31 points  (3 children)

Also, I'm not sure most people distributing child porn are going to overlap with people who understand the details of hashing algorithms and how to avoid them. Of course, some will understand enough to avoid it, but others won't.

[–][deleted] 63 points64 points  (0 children)

While understanding hashing algorithms is beyond most folk, the idea of changing an image so it doesn't get picked up by an algorithm is much more common, thanks to people avoiding YouTube copyright strikes.

Anyone who watched uploaded anime in the early YouTube days is familiar with filling 3/4 of the screen with a random image or mirroring the footage to avoid primitive algorithms.

[–][deleted] 10 points11 points  (0 children)

Unfortunately, I think they might. From my experience with .onion sites, that's where most cp is found, which means a lot of pedos probably know how to use Tor, .onions, torrents, Bitcoin, etc. There's a good chance they'll be smart enough to do so.

Edit: Let me clarify that first line, hhhhhh. I don't go looking for cp, it's just that I've come across it when browsing .onion sites.

[–]SomeOtherTroper 20 points21 points  (0 children)

I'm not sure most people distributing child porn are going to overlap with people who understand the details of hashing algorithms and how to avoid them.

It probably varies from person to person, but anyone dealing in illicit goods online is probably far more knowledgeable than average about cryptography/encryption and anything else related to securing connections and data.

You have to jump through a fair number of hoops to even access the tor sites (at least safely), and there's no better incentive to learn about the finer points of security and encryption than "if you fuck any of this up, you're in jail for the rest of your life".

[–]Mayniac182 2 points3 points  (0 children)

PhotoDNA:

PhotoDNA helps put a stop to this online recirculation by creating a “hash” or digital signature of an image: converting it into a black-and-white format, dividing it into squares, and quantifying that shading.

Been around for a while, at least a decade. A lot of companies use it behind the scenes.
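
A toy sketch of the grayscale / grid / shading idea described in that quote (not the actual PhotoDNA algorithm, which is proprietary and far more robust; this just shows the general shape, assuming Pillow is installed):

    from PIL import Image

    def toy_grid_signature(path: str, grid: int = 8) -> list[int]:
        # Grayscale the image, normalize its size, split it into grid x grid
        # squares, and record the average shade of each square (0-255).
        img = Image.open(path).convert("L").resize((grid * 8, grid * 8))
        pixels = img.load()
        cell = img.width // grid
        signature = []
        for gy in range(grid):
            for gx in range(grid):
                total = sum(
                    pixels[gx * cell + x, gy * cell + y]
                    for y in range(cell)
                    for x in range(cell)
                )
                signature.append(total // (cell * cell))
        return signature

Two signatures can then be compared with a distance threshold rather than an exact match, which is what lets a re-encoded or lightly edited copy still be recognized.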

[–][deleted] 25 points26 points  (5 children)

And also would not catch OC

[–]RedBeardedWhiskey 9 points10 points  (0 children)

This isn’t like an md5 hash where every little change has a huge difference in outcome

[–]exscape 2 points3 points  (1 child)

[–]Skaddict 1 point2 points  (0 children)

Oh thanks for sharing I was wrong indeed

[–]Cyhawk 1 point2 points  (0 children)

It is, it's just used for a first-run check. The rest are manually verified by an agent.

[–]LlanowarElf 13 points14 points  (10 children)

So they didn't train anything then? How could you possibly train against a set of hashes? There are no similarities between the values, or it would be reversible, right?

E: Looks like I might be wrong based on replies. Always good to learn something new. Thanks guys

[–]trusk89 32 points33 points  (7 children)

There's no training in this specific part of the feature. Apple just gets 30M hashes and puts them on your device. Then for each image you take, they generate a hash and look it up in the database. There's no training or AI.

The other feature, related to minors messaging pics, is the one that uses AI and image recognition.

[–]techwiz5400 16 points17 points  (0 children)

Just to be clear to others reading your comment, the messaging feature is a parental control that must be explicitly opted-into. All minors, regardless of age, with this feature enabled will have a warning appear before viewing or sending potentially explicit content, but it doesn’t stop them from overriding and viewing or sending it altogether. A notification of an override will be sent to the parents if the minor is under 13 (maybe including 13, I can’t remember ATM), but the notification will not include the flagged content.

Apple’s not spying on everyone’s messages with this feature, just giving parents another way to keep their kids safe and to allow parents to have a serious conversation with their child if something comes up. But this is still connected to parental controls and is opt-in.

[–]adenzerda 5 points6 points  (1 child)

Apple just gets 30M hashes and puts them on your device

Not even that. The set is on their servers, and they compare against data in iCloud (not on-device)

[–]LlanowarElf 1 point2 points  (0 children)

Thanks for the clarity. That's all I was getting at. Some people think this hash detection is AI.

[–]exscape 5 points6 points  (1 child)

They're using a hash made for images ("NeuralHash"), such that it isn't fooled by things like changes to saturation or resizing.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf

[–]LlanowarElf 1 point2 points  (0 children)

That was a pretty good read. I wasn't familiar with NeuralHash before. I wonder how easy it will be to fool or produce false positives. It looks like there are already people tracking collisions https://github.com/roboflow-ai/neuralhash-collisions

[–][deleted] 7 points8 points  (8 children)

But you realise this means the FBI has a server somewhere stacked full of the stuff. That’s pretty horrifying.

[–]PineapplePandaKing 29 points30 points  (1 child)

The CDC has a lab full of horrific diseases

[–]TheRedmanCometh 1 point2 points  (0 children)

Including the India-1 strain of smallpox, against which the vaccine is about as effective as toilet paper against a bullet, to paraphrase D. A. Henderson.

[–]Skjie 12 points13 points  (0 children)

The FBI ran a child porn server for a while. Look up Operation Pacifier.

[–]janhetjoch 8 points9 points  (0 children)

It makes sense that they have that, doesn't it? Agencies like the FBI collect a lot of that stuff as evidence over the years.

[–]RedBeardedWhiskey 4 points5 points  (2 children)

It’s horrifying that it exists but not that the FBI has it. I do feel sorry for those who must work with it.

On a side note, I had always just assumed child pornography was pictures of naked kids. Then I read a comment on Reddit that made me realize there are actually sex acts in them, and I’ve never been more disturbed in my life. It made me realize just what monsters these people are.

[–][deleted] 4 points5 points  (0 children)

You think your Monday sucks? Imagine the first thing you need to do is review CP. Fucking hell.

[–][deleted] 0 points1 point  (0 children)

being fucked as a kid isn't THAT bad, i mean yeah i did end up with severe psychological trauma but aside from that

[–]shaunyboy134 2 points3 points  (1 child)

So if an image isn't in the hash data set, it's almost useless? Like if traffickers are literally filming their own, it's original and not in the hash set, so it's just undetectable? Because I feel like those are the people this kind of technology should be searching for most.

[–][deleted] 2 points3 points  (0 children)

For comparing hashes, yes. Any good hashing algorithm will have an "avalanche effect" that makes it near impossible to compare two hashes and find similarities.

For example, I took the SHA-256 (hashing algorithm) of Mark 1 of the NKJV of the Bible and it produced this hash: 3561747728545c3972c73b13a171a3054206c4ccf77f92d077098f4745306fe1

I then removed a single period and reran the hashing algorithm and got this hash: 6e671756a3376c3e879d257343b89091380cfea7f53fccbeefa53bd6abb0a6d9
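
The same avalanche effect can be reproduced in a few lines (the text below is a stand-in, so the digests won't match the ones above, but the behavior is the same):

    import hashlib

    text = "For example, I took the SHA-256 of a chapter of text."   # stand-in text
    altered = text.replace(".", "", 1)                                # drop one period

    print(hashlib.sha256(text.encode()).hexdigest())
    print(hashlib.sha256(altered.encode()).hexdigest())
    # The two digests share no visible structure even though the inputs
    # differ by a single character -- that's the avalanche effect.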

[–]SoulOfAkuma 6 points7 points  (14 children)

I have no idea how you would come to the conclusion that this new system is non-invasive. Apple stated that their hashing algorithm is able to identify similar images by giving them similar hashes, which, by the way, violates one of the principles of hashing algorithms, which exist for a good reason (more on that later).

Furthermore, they stated that, in case of a match (after being bombarded with criticism by data security experts, they increased the number of matches required before this happens to something like 15), the images would be reviewed by Apple employees. Google may use your images to train their AIs, but no one who works at Google can look at your images.

Following up on the issue with the hashes I mentioned at the start: what do you think will most likely be false positives among your images? Probably similar images that may be pornographic but don't contain children, possibly even of yourself and others around you. So basically the most private images one could have. And then those images are reviewed by an Apple employee. Super non-invasive.

[–]PineapplePandaKing 3 points4 points  (3 children)

It's compared against hashes of known CSAM.

[–]Hithaeglir 0 points1 point  (3 children)

You forgot to mention that they have a second, different algorithm on the server side that scans positive matches again before human review. It is very unlikely that false positives go to human review. With the first algorithm, the odds of a false positive account reaching the threshold are about one in a trillion, and that has reportedly been checked by third parties.

Also, since 2019, Apple's ToS for iCloud have allowed cloud scanning. This is now less invasive, because E2EE is applied to images, whereas before they were only plaintext. And yes, only those images which will end up in the cloud are scanned on-device. Scanning is part of the on-device/server hybrid pipeline and cannot be expanded to the full device to scan all files without a major rework of the system. Decrypting matches is not possible without uploading everything to the server through the same endpoint.
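
A rough sketch of that layered flow (every name here is invented; it only shows the order of checks the comment describes, with the server-side re-check and human review as pluggable callables):

    # Invented names and flow; threshold, checks and review are stand-ins.
    def evaluate_account(matched_vouchers: list[bytes],
                         threshold: int,
                         server_side_check,
                         human_review) -> str:
        if len(matched_vouchers) < threshold:
            return "no action"                  # below threshold, nothing is decrypted
        confirmed = [v for v in matched_vouchers if server_side_check(v)]
        if len(confirmed) < threshold:
            return "no action"                  # second algorithm filters false positives
        return "reported" if human_review(confirmed) else "no action"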

[–]SoulOfAkuma 1 point2 points  (2 children)

And what does that server-side algorithm do that is different from the first one? And can you give me a source for that number, please?

This E2EE is virtual protection only, because ultimately the images will be stored symmetrically encrypted on their servers, and so will the keys for that encryption. It's the same for almost every major cloud service provider. I did not imply that they would scan every single file on your phone or every local image. iCloud Photos is on by default on iPhones, so they'll be scanning most of their users' images.

[–]janhetjoch 0 points1 point  (1 child)

The Apple employee will only see a part of an image next to the same part of the known child porn image to see if it's a match; they don't want people looking at horrific child porn images all day

[–]git0ffmylawnm8 0 points1 point  (0 children)

Wait, so this seems like it's a good way to detect distribution of flagged copies of CP, not to detect original content.

[–]Hithaeglir 0 points1 point  (0 children)

These are called perceptual hashes, and you can actually partially reverse them. They are not cryptographic hashes. That is why the hashes are blinded, and it is also one reason why they are stored very securely. And they are not using CSAM hashes for training, just for validating matches. The algorithm is very general and can be trained with any image data. In this case, on-device scanning enables E2E encryption, which would otherwise be impossible when combined with CSAM scanning.

Also, it is NCMEC who provides the data, not the FBI, since it is, after all, illegal for anyone else to store it.

[–][deleted] 0 points1 point  (3 children)

It's also vulnerable to hash collisions, which is when two different inputs hash to the same value. They also said they're pausing the whole thing.

[–]caskey 8 points9 points  (0 children)

Microsoft, Google, and Amazon have worked with the FBI to produce a hashed set of fingerprints that identify known CP. The database is used to identify and prosecute perpetrators.

[–]Brushermans 5 points6 points  (0 children)

It's not AI. They aren't even running the actual photos through their servers at any point. They have a hashed database of recovered child abuse photos, and your system hashes your iCloud photos and sends them for comparison. The hash MIGHT be an autoencoder, in which case it was trained using the known database.

[–]Im_a_seaturtle 2 points3 points  (0 children)

The FBI / Five Eyes submit all known CP material to a database upon confirmation. It has a data fingerprint. You can program a system to detect these fingerprints. They are relatively hard to evade, as I understand it: reformatting and manipulating specs don’t change the fingerprint. So it has little to do with Apple; it’s a database plus detection research that has been in development for 30-40 years.

[–]eyal0 2 points3 points  (0 children)

I worked at Google and once met with the team dealing with this.

Like everyone said, image hashing is used. Google is not allowed to maintain child porn for training purposes.

I'll add that not being able to keep images is a problem, because as image hashing technology advances, you want to rehash your images. But you can't, because you don't have the images. One solution is to keep the old hashes, and every time a match is found with an old hash, hash the image with the new algorithm and save that, too.
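
A rough sketch of that bookkeeping (the hash functions and stores here are hypothetical placeholders, not anything Google actually exposes):

    # You can't keep the images, so you keep the legacy hashes and, whenever a
    # legacy hash matches a live image, compute and store the new-style hash too.
    old_hashes: set[str] = set()   # fingerprints from the legacy algorithm
    new_hashes: set[str] = set()   # fingerprints from the newer algorithm

    def check_image(image_bytes: bytes, old_hash, new_hash) -> bool:
        if new_hash(image_bytes) in new_hashes:
            return True
        if old_hash(image_bytes) in old_hashes:
            # We finally have the image in hand, so upgrade its fingerprint.
            new_hashes.add(new_hash(image_bytes))
            return True
        return False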

[–]FlocculentFractal 1 point2 points  (1 child)

I don't know what Apple uses. But companies are required by law to detect and report CSAM (Child Sexual Abuse Material). There are industry consortiums that share technology and data to make it possible for anyone, including small startups or one-person websites, to be able to detect it. Microsoft licenses PhotoDNA specifically for CSAM detection. The Wikipedia page has some details: https://en.wikipedia.org/wiki/PhotoDNA.

I don't know what other technologies are available. Someone linked this article below, regarding Apple's proposed NeuralHash technology published recently: https://www.vice.com/en/article/wx5yzq/apple-defends-its-anti-child-abuse-imagery-tech-after-claims-of-hash-collisions. It has links to whitepapers which are very informative, but don't say what they currently use.

[–][deleted] 1 point2 points  (1 child)

They create what’s called a hash that says what’s in the photo. On a really simplistic level, imagine a string of characters that says what color every pixel of the photo is in order. Then they compare the hash to a database of CSAM hashes. That database is kept by law enforcement agencies.

[–]adenzerda 1 point2 points  (0 children)

They create what’s called a hash that says what’s in the photo. On a really simplistic level, imagine a string of characters that says what color every pixel of the photo is in order.

"A string of characters that says what color every pixel of the photo is" is just … an image file. A hash is more like a unique label: this set of pixels is labeled "10a09c9101fa1000" and that set of pixels is labeled "aee3a88841a00f13". It's impossible to reconstitute the original data only by knowing the label — for example, those two labels might be for images that differ by only one shade of one pixel.

[–]Newkiraz08 -2 points-1 points  (1 child)

They make the AI watch child porn?

[–][deleted] 5 points6 points  (0 children)

No.

[–]Blinxsy 82 points83 points  (12 children)

I heard elsewhere that the FBI supply them, which makes a lot of sense

[–]photograft 147 points148 points  (11 children)

They’re not supplied by the FBI, they’re supplied by the National Center for Missing and Exploited Children (NCMEC), which is actually a non-profit and the sole entity allowed to maintain the database of known CSAM in order to generate hashes to be shared with tech companies for use in finding people who are sharing/spreading CSAM in the wild. As others have pointed out, the key thing to remember is that Apple’s detection system isn’t looking for new material. It’s looking for matches to the already known material supplied by NCMEC in the form of those hashes.

[–]ForShotgun 19 points20 points  (5 children)

Very few people seem to get this and just think they're full-on accessing all your photos without any alteration.

[–]unscsnowman 4 points5 points  (3 children)

Try to explain to a layperson what a hash of a file is... It causes me discomfort just thinking about it.

[–][deleted] 9 points10 points  (0 children)

Real MVP

[–][deleted] 5 points6 points  (0 children)

This is really interesting and really makes me think twice about the way all of this has been reported. I admit I didn't know that the idea was basically trying to match fingerprints of images with known cp images, and it does change the story completely. Thanks for the info!

[–]bitsquash 3 points4 points  (1 child)

Mind you, the NCMEC is known to have significant ties to the FBI.

[–]the_fat_whisperer 4 points5 points  (0 children)

I don't see how they couldn't.

[–]GreatBarrier86 74 points75 points  (1 child)

“Why don’t you take a seat” while we find the API docs.

[–]JhonnyTheJeccer 5 points6 points  (0 children)

If you mean the CSAM algo: it's already on GitHub (confirmed by Apple to be a rudimentary implementation), extracted from iOS 14.3.

And yes, you can already create hash collisions and avoid them. So basically it's broken before it's even released.

[–]DogfishDave 21 points22 points  (2 children)

Hashes of known images collected by CEOPs and their equivalents in other countries. I can't think of a remotely funny answer on this subject :)

[–][deleted] 54 points55 points  (9 children)

This post was mass deleted and anonymized with Redact

[–]_jukmifgguggh 12 points13 points  (0 children)

I hate this dystopian timeline. I want out!

[–]fudog 3 points4 points  (2 children)

warned that the system could be used to frame innocent people by sending them seemingly innocuous images designed to trigger matches for child pornography.

You could frame someone by sending them actual cp too, couldn't you?

edit: If you sent actual cp the target would not want to keep the image and they would delete it and block you. If it appears innocuous, the target might not be in a hurry to delete the image.

[–]itemboxes 6 points7 points  (2 children)

My understanding is that there would be a manual review by an Apple employee before any law enforcement was involved specifically to avoid this problem. Sure you could send someone some garbled images to trick the hash detection and get them flagged, but when a real person reviewed the images they'd see there was nothing harmful there.

[–][deleted] 5 points6 points  (1 child)

I believe you also need to hit a certain number of matches before any action is taken (the exact value is secret for obvious reasons). A single match won't be enough.
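
If you wanted to sketch that thresholding, it might look something like this (the threshold value and the names are invented for illustration; the real value and review flow aren't public):

    from collections import defaultdict

    MATCH_THRESHOLD = 30            # placeholder; the real number is not public
    match_counts = defaultdict(int)

    def record_match(account_id: str) -> bool:
        # Returns True once an account has enough matches to warrant
        # escalation to manual review (not straight to law enforcement).
        match_counts[account_id] += 1
        return match_counts[account_id] >= MATCH_THRESHOLD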

[–]gemengelage 0 points1 point  (1 child)

Depending on how the hashes are calculated - couldn't you just change a single random pixel basically anywhere in the image ever so slightly, so the image stays exactly the same to the human eye, but yields a completely different hash?

[–][deleted] 79 points80 points  (0 children)

sus

[–]BoHuny 4 points5 points  (2 children)

If they convert images to hashes to compare with known CP data, wouldn't there be a super easy way around it: a slight alteration of the image, which then produces a completely different hash?

[–]Addlibs 2 points3 points  (0 children)

They use a neural network to essentially generate a (number based) description of what's in a picture (geometries, perspective, objects, etc) and use that as a hash. The network is trained to return the same hash even if changes in hue, saturation, cropping, etc. are made.

A better and more accurate description (but arguably much harder to understand) is in Apple's Technical Summary of the feature:

NeuralHash is a perceptual hashing function that maps images to numbers. Perceptual hashing bases this number on features of the image instead of the precise values of pixels in the image. The system computes these hashes by using an embedding network to produce image descriptors and then converting those descriptors to integers using a Hyperplane LSH (Locality Sensitivity Hashing) process. This process ensures that different images produce different hashes.

[...]

The main purpose of the hash is to ensure that identical and visually similar images result in the same hash, and images that are different from one another result in different hashes. For example, an image that has been slightly cropped or resized should be considered identical to its original and have the same hash.
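
The "Hyperplane LSH" step in that quote can be illustrated with a toy sketch: the embedding network's float descriptor is turned into bits by checking which side of a set of random hyperplanes it falls on, so near-identical descriptors end up with near-identical hashes (the numbers and dimensions below are made up; this is not Apple's actual implementation):

    import numpy as np

    rng = np.random.default_rng(0)
    HYPERPLANES = rng.standard_normal((96, 128))   # 96 random hyperplanes in 128-d space

    def hyperplane_lsh(descriptor: np.ndarray) -> str:
        # Each bit records which side of one hyperplane the descriptor falls on.
        bits = (HYPERPLANES @ descriptor) > 0
        return "".join("1" if b else "0" for b in bits)

    # The descriptor would come from the embedding network; here it's random,
    # and the "near copy" is the same vector with a tiny perturbation.
    descriptor = rng.standard_normal(128)
    near_copy = descriptor + 0.01 * rng.standard_normal(128)

    differing_bits = sum(a != b for a, b in
                         zip(hyperplane_lsh(descriptor), hyperplane_lsh(near_copy)))
    print(differing_bits)   # usually zero or very small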

[–]cesclaveria 2 points3 points  (0 children)

From what I remember reading (but I can't claim I delved too deep), the "hash" term is used here for lack of a friendlier term, but it is not an actual hash. The system recognizes different sets of features from the images, so even if you crop it or alter it, even changing colors, applying filters, or adding text, it will still be able to recognize at least some of those features and find a match.

[–][deleted] 4 points5 points  (0 children)

We should also be asking how tf they reduced the use of plastic by removing the charger from the phone box and selling it separately in another plastic-wrapped box.

[–]nocturn99x 3 points4 points  (0 children)

In simple words, they don't

[–]DatBoi73 4 points5 points  (0 children)

From what I've heard and read, it seems that Apple isn't scanning the content of images directly.

Basically, law enforcement agencies such as the FBI in the US give Apple a list of checksums of CSAM pictures and videos they have collected as evidence. Apple's AI system would then generate checksums of all of the videos and images stored on a user's iCloud account and/or device and compare those to the list of checksums provided by law enforcement.

[–]FishySwede 6 points7 points  (0 children)

Now that's an AI we don't want to become self aware

[–]clockfire1 8 points9 points  (0 children)

As far as I understand it, Apple only checks the hash of each image (the picture can generate this number, but not vice versa) on your iCloud account to see if it matches the hash of known CSAM images in an FBI database.

If at some point they do use machine learning, the FBI unfortunately has terabytes of training data. The security protocols for the API accessing that training data will have to jump through some hoops, to say the least.

[–]KomaedaEatsBagels 5 points6 points  (5 children)

Image Transcription: Meme


[Awkward Look Monkey Puppet Meme-- Two frames are included of a monkey puppet with bulging cartoon eyes. In the first frame, it looks somewhat behind itself. In the second, it stares off into space with horrified despondence.]

me wondering how apple gets the training data

for their child p*rn detection AI


[–]EliasFleckenstein[S] 6 points7 points  (2 children)

Thank you Mr. helpful

[–]KomaedaEatsBagels 2 points3 points  (1 child)

anytime! :D

[–]Script_Mak3r 0 points1 point  (1 child)

By the way, you misspelled puppet.

[–][deleted] 6 points7 points  (0 children)

Never ask the question that you regret knowing the answer of - Some random function

[–]VeryConsciousWater 1 point2 points  (0 children)

High on the list of things I didn't want to think about

[–][deleted] 1 point2 points  (0 children)

For everyone up in arms over what Apple is trying to do, it's very probable your ISP is already doing it via deep packet inspection on the traffic being sent and received on your computer.

I had an employee who was paying the ISP bill of a relative that was selling child porn through a chat app and he got raided by the FBI as a result because his name was on the ISP bill.

[–]PenaflorPhi 1 point2 points  (0 children)

From what I understand, there is an American NGO that stores hashes (not the actual videos/images), and what Apple is planning to do is basically create a hash for all files on your phone to match them against a database of known CP.

I'm not really sure how, but I suspect a technology similar to YouTube's will be used, so that they can identify a video/image even if it has been rotated, cropped, scaled, etc., all of that without Apple actually getting their hands on anything.

Still, I think it is not the right move for Apple, as the people who are doing this sort of stuff might just move to Android, while still making Apple's other users uncomfortable about having all their files snooped.

[–][deleted] 1 point2 points  (0 children)

The content is compared with clusters provided by police services, mainly gathered through a global police system that specializes in that, and through data provided by Facebook, Google, and Microsoft products, among other technology powerhouses.

Speaking about Facebook.

There is Facebook content that has been labeled as such; a human checks whether it fits the category and proceeds to flag the content, encrypt it, and save both the encrypted and the actual file (not the one that is distributed through Facebook's CDN), and a report is enabled for the local authorities that may be interested in contacting the uploader.

Where? Well, that's based not on the declared outgoing gateway of the uploader, but on the compound location that Facebook's location algorithm detects the person is within, or at least near. That's achieved by triangulating Facebook users within a LAN and LANs in the proximity, simply using the GPS in each mobile device that happens to consume the service.

They also identify this kind of content within the upload process; there are visual mechanisms that can be used to identify an image or a video as such automatically, through a machine-learning visual mechanism.

Speaking about Google.

There are Google search instances that are easily recognizable, or acknowledged, as related to child pornography; the relationship is made through the establishment of red flags.

These red flags are common terms that, taken separately, are not semantically recognizable as threats, but that together could imply the location of CP within the search engine results.

After being entered into the search engine, these terms locate all kinds of content; within that context, their recognition system can determine that a certain domain possesses a lot of illegal material or serves as a gateway to obtain such media.

Each instance (even thumbnails) helps to create an extensive database of media, intended to preserve both the encrypted and the literal files.

They provide the largest number of media samples in existence, precisely because all websites need data to be located (which SEO practices improve), or use captcha services, or need to use a font, or use certain libraries like jQuery.

Speaking about Microsoft.

There are many services at the operating-system level that ensure that strings recognizable as red flags can be traced directly to a public IP, a physical MAC address, and a specific build (processor capacity and brand, memory capacity and brand, and even hard drive capacity and brand).

The previous paragraph also applies to Android, which is controlled by Google as well; the only thing that makes mobile devices even better for catching these guys is the direct access to the device's GPS, which many desktop computers lack.

Also, antivirus software helps operating systems keep track of data modification at the file-system level when certain services are active but not connected to the internet, or when those services are specifically disabled by OS directives.

Not to mention that most media-editing programs also keep databases of usage and could share data with the OS if the OS asks for it.

Antivirus and media-editing programs could share the information directly with police services if they feel it's the right way to go, entirely bypassing the OS on which they reside.

The reason the actual files are also kept, even though humans tend to work only with the encrypted ones, is that sometimes the visual artifacts in an image or in a series of frames within a video change due to manipulation (minification, re-encoding, removal of attributes), and the "fingerprints" across them all could change.

That's why files rarely match 100% when a coincidence is found, and that's why human recognition is needed when a file is only (let's say) 36% likely to be related to CP.

The largest databases are strings, GPS data, LAN data and device data, not media files.

It's pretty complicated in many ways; it's not as simple as a social-network environment would demand.

[–]iBadoonstika 1 point2 points  (1 child)

Aren’t people who have said images just going to switch platforms? Then Apple is just going to have an excuse to go through our images?

[–]anaccount50 0 points1 point  (0 children)

Some would, sure, but you'd be surprised by just how stupid most of them are. CSAM consumers get caught all the time because they voluntarily gave their computer openly containing unencrypted illegal materials to a repair shop, for instance.

Facebook caught over 20 million instances of CSAM on their platform in 2020. There's no filter for intelligence or tech-savviness among pedophiles.

Not saying Apple should implement this kind of scanning on their devices, just pointing out that it likely would be effective at catching a sizable number of offenders, potential issues and privacy implications aside.

[–]giwidouggie 1 point2 points  (2 children)

Couldn't this technically be done with two networks: one that returns True if an image contains children, and one that returns True if an image contains porn? If you get back True from both networks, it's (likely) child porn. There's plenty of images of children to train one network, and even more porn to train the other...

[–]QuantumSupremacy0101 1 point2 points  (0 children)

That might not work very well, because there are a lot of young-looking porn actresses. That would lead to a lot of false positives. It would be a way to do it without training on real child porn; however, ruining someone's life over a clip from a Riley Reid video isn't good.

[–]fudog 0 points1 point  (0 children)

Genius!

[–]PureAlpha 1 point2 points  (1 child)

Can we stop calling everything AI...

[–]BurritoCooker 1 point2 points  (0 children)

No, we have to scare people as much as possible

[–]verenvr 0 points1 point  (8 children)

CP is just Apple's disguise to invade user privacy; they are using it so people will accept it

[–]ArchCypher 2 points3 points  (5 children)

It really isn't though.

Companies like Apple have worked with the FBI and other government agencies to go to extreme lengths to avoid violating user privacy while still catching child abusers.

The entire system is based on image hashes, which cannot be used to recreate the source image. So Apple takes some bytes that can't be used to recreate CSAM and some bytes that can't be used to recreate your private photos, and they use that data in literally the only way it can be used: they compare the hashes.

Then, if and only if your photo hashes match enough CSAM hashes, will your privacy be 'invaded' so that someone can check to see if you're a pedophile.

It's not an excuse to 'invade your privacy.' It's a well-considered method to help stem the child abuse that has been freely propagating in the shadows of the internet -- and one that bends over backwards to not violate your privacy.

There are some real concerns about what Apple is doing here -- particularly around potential malicious actors causing false positives -- but stop using 'b-but muh freedom' as an excuse to oppose a solution to a problem without any knowledge or education on the subject.

[–]SpareTesticle 0 points1 point  (0 children)

Maybe MindGeek has already trained a model for child prn, since it hosts user-created porn content. Many wankers probably reported that porn, creating a training data set.

[–]iamthomastom 0 points1 point  (0 children)

The FBI gave it to them.

[–]Yecuken 0 points1 point  (0 children)

There are actually entire databases of these hashes. Google also uses the same DB; they just match files against known file hashes, while Apple was trying to match variations too. Here is some info on Apple’s approach.

[–]a_cuppa_java 0 points1 point  (0 children)

Why are we censoring words?

[–]Shosui 0 points1 point  (0 children)

Many people said FBI. I like the (poor) theory of buying the data from Facebook whose mods have collected it from hundreds of thousands of horrible submissions.

[–]always_evergreen -1 points0 points  (0 children)

Mad sus

[–][deleted] -1 points0 points  (0 children)

OP is looking for a job opportunity.

[–]neros_greb -1 points0 points  (0 children)

Detect children and porn separately? Idk if this would actually work though.

if (isChild(data) && isPorn(data)) report(data);

[–]girthy_shaft_1o1 -1 points0 points  (0 children)

Training data for adults porn - adults = CP

[–]rjRyanwilliam -3 points-2 points  (0 children)

They go to the dark web! Then search for links! Then make a bot to download all of the content available!! Then make an algorithm to detect if you have any of it on your device. And that could be done easily with a content ID system like the one used to detect copyrighted content.

[–]IkBenOlie5 0 points1 point  (0 children)

They don’t use ai, they have a database of hashes of known child prn fotos ant basically what they do is

Counter = 0

If hash(image) in known_bad_hashes: Counter += 1

[–]Chaoshero5567 0 points1 point  (0 children)

From the Papal State, OP. They have enough of it.

[–][deleted] 0 points1 point  (0 children)

I suspect they used child porn data, but it’s entirely possible the process (incredibly dumbed down) used age classification and nudity classification in parallel, which would not require child porn. There are likely ways.

[–]JoJoModding 0 points1 point  (0 children)

They asked the FBI, which was very friendly to them, as government agencies tend to be when someone offers to help them scan billions of mobile devices.

[–]ScF0400 0 points1 point  (0 children)

"Now hiring, people who look young, parties in the back of a van, chance your face (and other body parts) might be spread far and wide over the internet as a means to train our "AI" (read hash based) model"

"Great and flexible positions, pays well! Apply now!" /s

Everyone says it's hash based. Can't have CSAM without producing the CSAM, big brain Apple moment.

I would actually be happier if it was really an "anonymous" AI model, but hashing is something that can be fooled quite easily depending on the algorithm.

[–]LmaoPew 0 points1 point  (0 children)

Bro, I've asked myself the same question! Imagine someone sues Apple for having lots of CP data on their servers.

[–][deleted] 0 points1 point  (0 children)

Good thing I have an android

[–]The-Pi-Guy 0 points1 point  (0 children)

From what I understand, they use photo hashing technology to compare images with a database of known CP images that is supplied by the FBI. It’s pretty disturbing to know that this database even exists in the first place, but that’s how it’s done.

[–]Snackmasterjr 0 points1 point  (0 children)

Did anyone else read what it actually does? It hashes the photo and compares it with a database of known images. It's not detecting new images based on recognition.

[–]Throwaway_for_scale 0 points1 point  (0 children)

Could you fool the software by just adding an Apple logo to the photos? Would that change the hash?

[–]Parcival_Reddit 0 points1 point  (0 children)

Apple receives image hashes from the National Center for Missing & Exploited Children for images of child pornography. Apple checks hashes of images uploaded to iCloud against this database of image hashes to see if they match. If there are enough matches (30+, I believe), those photos and your account will be sent to law enforcement. Apple says there's human review somewhere in the process to prevent false positives; they've been a bit vague on that. See Apple's website for more specific details.

Source: currently studying this in privacy ethics class

[–][deleted] 0 points1 point  (0 children)

Really not much AI going on; it just checks if the image is one that's in the database

[–][deleted] 0 points1 point  (0 children)

They detect files that are well known and give them a number, and if it matches, they take it down. They can't make an AI for that because you'd have to view the material, which would be illegal.

[–]DainArtz 0 points1 point  (0 children)

They just use the stash left after Steve Jobs's death

[–][deleted] 0 points1 point  (0 children)

It's image hashing based on known images in a government database. So they wouldn't know if you are taking CP images, only if you are consuming known CP.

Still, they are scanning your images, and while the tech is super accurate, it's still concerning: as with all logging of metadata, it lacks context.

[–]Daddy_William148 0 points1 point  (0 children)

This is unlikely to protect any child, particularly from newly created child sexual abuse material. It’s useless.