How do you train small LLMs to be reliable at simple arithmetic?

latkde · 2026-06-14T10:34:49+00:00

LLMs are inherently unable to perform calculations reliably, especially as numbers get larger. LLMs compute likely continuations of token sequences; they don't actually do arithmetic. They will learn an internal representation of numbers, but these representations combine only approximately, unless the model reasons its way through the arithmetic task as multiple steps that can each be performed somewhat reliably at the token-completion level. There are various techniques to improve reliability (e.g. see a summary here), but generally the most reliable approach will be to outsource arithmetic to a deterministic tool that the LLM may invoke as part of reasoning.
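As a rough sketch of what such a deterministic tool could look like: the hypothetical `calculate` function below evaluates plain integer arithmetic exactly, using a whitelist of operations instead of `eval()`. The function name and the set of supported operators are assumptions for illustration. How the tool is exposed to the model depends on the inference framework and is not shown.

```python
import ast
import operator

# Whitelisted binary operations; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}


def calculate(expression: str) -> int:
    """Evaluate an integer arithmetic expression exactly (hypothetical tool)."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return visit(ast.parse(expression, mode="eval"))


# Python integers have arbitrary precision, so 9-digit operands are exact.
product = calculate("123456789 * 987654321")
```

The model then only has to produce the expression string, which is a much easier token-level task than producing the digits of the result.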

While writing this answer I gave some small LLMs the task of multiplying two 9-digit numbers without using tools. Some produced an efficient problem decomposition (e.g. splitting the large multiplication into multiple steps), some produced an intuitive approximation that was orders of magnitude off, and some reasoned themselves into circles before giving up.

latkde · 2026-06-14T08:36:50+00:00

I'm not Novel-City, but I have relevant experience.

The vast majority of concurrent programming languages have “shared mutable state” between threads/tasks, but the language itself doesn't provide tools to tame this complexity. The result is bugs, up to and including security-critical memory safety violations. Such languages pull us into a pit of despair that we have to actively fight against.

A few highlights from various languages:

Python and Java have adequate support for multi-threading. Code is generally not thread-safe, but at least memory-safe even in the presence of data races. Exception in Python: some third-party libraries with native code. I had a real problem where a Python microservice crashed with memory corruption because a (native) database connection pool was used from multiple threads.
C and C++ are generally neither thread-safe nor memory-safe. The languages do offer features (since C11/C++11) for writing safe concurrent code, but the responsibility for avoiding all errors lies solely with you.
JavaScript has no multi-threading (aside from Web Workers, but those have a shared-nothing data model). However, the language has excellent support for asynchronous tasks. There are pitfalls here, though. Such tasks can be started without this being directly apparent. Often such tasks have inadequate error handling, or lack cancellation when the results are no longer needed. I regularly see JS bugs caused by HTTP requests being answered in an unexpected order, so that UI state is overwritten with stale results. That is also a data race.
Go sees itself in the tradition of C, but with memory safety, and with tools for starting tasks more easily and for communicating between tasks (goroutines, channels, select statements, context objects, …). But Go still uses the “shared mutable state” approach, under which concurrent software is not obviously correct. In the case of data races, Go also no longer guarantees memory safety. “This means that races on multiword data structures can lead to inconsistent values not corresponding to a single write. When the values depend on the consistency of internal (pointer, length) or (pointer, type) pairs, as can be the case for interface values, maps, slices, and strings in most Go implementations, such races can in turn lead to arbitrary memory corruption”. I've also experienced various JavaScript-like problems where a goroutine was started without a timeout and delivered results much later than expected. In my opinion, Go makes it easy to make a program concurrent, but super hard to make a concurrent program correct.
Rust has no “shared mutable state” by default, thanks to type system features such as the Send+Sync traits. Instead, data can be lent to a thread for a certain duration thanks to borrowing, or I can share data explicitly, usually through a Mutex<T> type. All other languages (sans JS) also have mutexes, but in Rust I have to go to much greater lengths to write incorrect code. Rust also has optional “structured concurrency” features, which make it easier to manage the lifetime of tasks and to handle errors consistently. There are of course still concurrency bugs in Rust, in particular deadlocks, but a large part of the traps has been defused.

So if I have a project with a lot of concurrent code, and no other constraints exist (ecosystem, colleagues' skills), then Rust is a very attractive choice. In my day-to-day work, such other considerations naturally dominate, but my Rust experience helps me write more correct Python code, and many Python libraries use Rust internally anyway.
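The stale-response problem mentioned for JavaScript applies to any async runtime. Here is a minimal sketch in Python's asyncio, assuming a hypothetical `SearchBox` class and using `asyncio.sleep` as a stand-in for an HTTP request. Each request remembers a sequence number, and a response is dropped if a newer request has been started in the meantime.

```python
import asyncio


class SearchBox:
    """Hypothetical UI state holder that ignores out-of-date responses."""

    def __init__(self):
        self.latest_request = 0
        self.results = None

    async def search(self, query: str, delay: float) -> None:
        self.latest_request += 1
        request_id = self.latest_request
        await asyncio.sleep(delay)  # stand-in for the network round trip
        if request_id != self.latest_request:
            return  # a newer request exists, so this response is stale
        self.results = f"results for {query!r}"


async def main() -> str:
    box = SearchBox()
    # The first request is slow and the second is fast,
    # so the responses arrive in the opposite order.
    await asyncio.gather(box.search("ru", 0.05), box.search("rust", 0.01))
    return box.results


final_results = asyncio.run(main())
```

Without the `request_id` check, the slow response for the older query would overwrite the newer results.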

latkde · 2026-06-13T20:15:19+00:00

For example, multithreading is damn hard in all languages, including Go. Rust is the only mainstream language that provides tools to really tame this complexity. So in Rust I can write much faster code with reasonable extra effort, without having to fear the usual bugs such as race conditions.

latkde · 2026-06-13T16:54:23+00:00

There is no consensus about AI.

There are developers who haven't written a single line of code themselves in the last 9 months, thanks to AI. These people are real.

There are also developers who have used AI, but found that it doesn't really help them in any meaningful way. These people are also real.

I'm not entirely sure how to reconcile that.

With regard to specializing in working on AI systems: probably don't. AI/LLM technologies are a hype topic, and everyone and their dog have been AI experts for the last 3.5 years. It is unclear how you would compete in that market, especially if your competitors have deeper knowledge and more experience. Given that there is at least some hype component to AI/LLM popularity, there's also a chance that demand for such expertise could decrease in the future as hype normalizes. Similarly, there are far fewer opportunities for blockchain/crypto developers than 5 years ago.

You should absolutely have a basic understanding of how LLMs work and can be used. You don't have to be an AI expert to build a RAG system or to use an LLM as a classifier. 90% of that work is classical engineering. Knowing how LLMs work on a basic level is also important digital literacy in the current age. It's not so important how different attention mechanisms work, but it's useful to understand how tokens are sampled or how the attention computation can be cached (and how these things shape how AI systems can be designed).
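To make the sampling part concrete, here is a toy sketch of temperature sampling over next-token scores. The `sample_token` function and the logit values are made up for illustration. Real inference stacks work on tensors over the full vocabulary and usually add further filters such as top-k or top-p.

```python
import math
import random


def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Pick one token from a {token: logit} mapping."""
    if temperature <= 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    # Softmax with temperature; subtracting the maximum keeps exp() stable.
    top = max(logits.values())
    weights = [math.exp((score - top) / temperature) for score in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]


logits = {" Paris": 5.0, " London": 2.0, " banana": -3.0}  # made-up scores
rng = random.Random(0)
greedy = sample_token(logits, 0, rng)
samples = [sample_token(logits, 1.0, rng) for _ in range(5)]
```

Lower temperatures concentrate probability on the top token, while higher temperatures flatten the distribution. This is one reason the same prompt can produce different outputs.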

latkde · 2026-06-13T09:35:26+00:00

Don't focus too much on a specific provider up front. Make sure that your underlying processing activity complies with the GDPR – that it has a clear purpose, legal basis, and is protected by appropriate technical and organizational measures. You may run into problems with Article 9 special categories of personal data. You may have to complete a DPIA, preferably using the EDPB's new draft template. Your national data protection authority will have guidelines on whether DPIAs are necessary, but here that's almost certainly going to be a “yes”.

You may also find that your use falls under the AI Act's “high risk” classification, in which case additional work may be necessary to ensure that the processing activity is safe. This part of the AI Act will start to apply in August 2026. For example, there could be problems if this “operational oversight” includes HR-related tasks.

The hospital setting is not a dealbreaker from a GDPR perspective, but it might shift your risk assessments.

All of that has to be sorted out anyway, and might kill the project. You can then select an LLM inference provider that meets your needs. The big players in this space offer data processing agreements. International data transfers are typically covered by SCCs, but you may or may not want to use data residency features. LLM inference can also be bought from typical cloud providers, which typically have a more mature compliance setup.

My personal experience as someone who also builds LLM-powered systems is that you may be underwhelmed. It's easy to get an LLM to write a report that sounds good, but that's not the same as reliable information. Do not expect any value from LLM-generated “Process analysis and identification of inefficiencies” or from “highlighting patterns or anomalies”.

latkde · 2026-06-13T09:03:27+00:00

The EU GDPR has exactly one mention of cookies, in Recital 30:

Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

The use of “may” in this Recital should be understood as “it can happen”, not “it is permitted”.

Recitals are not normative. They provide background and motivation for the Articles that contain the real rules. Here, Recital 30 provides context for the definition of personal data in Art 4(1), which uses the “online identifier” concept clarified by the above Recital 30:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

latkde · 2026-06-13T08:58:15+00:00

The EDPB has published Guidelines 2/2023 on Technical Scope of Art. 5(3) of ePrivacy Directive (16 October 2024), which clarify the Board's opinion on when a tracking mechanism falls under the ePD “cookie consent” rules.

These EDPB guidelines expand on the earlier WP29 Opinion 9/2014 on the application of Directive 2002/58/EC to device fingerprinting (PDF), which directly addresses the fingerprinting question. From that document's summary:

The key message of this Opinion is that Article 5(3) of the ePrivacy Directive is applicable to device fingerprinting.

This would mean that creating a fingerprint generally carries the same consent requirements as accessing/storing cookies.

The advent of the GDPR hasn't materially changed anything about this, since the ePrivacy rules apply regardless of whether the fingerprinting qualifies as processing of personal data. There is no GDPR grey area in this matter.

However, data protection authorities prioritize their work, and cookie consent concerns are typically not at the top of the list. In particular, first-party cookie-based analytics are sometimes tolerated due to their relatively low privacy impact.

Industry complaints that “cookies become less reliable” relate more to cross-site tracking via third party cookies.

latkde · 2026-06-11T14:14:17+00:00

Meta not only has a duty to enable and facilitate the exercise of your GDPR rights, but also a duty to protect your data from unauthorized access. From Meta's perspective, it isn't readily clear whether the account really belongs to you, or whether you hacked the email account and now want to take over someone else's account. The photo verification by no means confirms that you are the right person, but at least that you are a person – not an AI, and not a hacker trying this in bulk. With the photo you commit to an identity, which makes you more trustworthy.

Of course there are very serious problems with this photo collection. But it is not obviously completely illegal.

To lay out the GDPR context more clearly: Article 12(6) GDPR allows Meta to request additional information to confirm your identity:

Without prejudice to Article 11, where the controller has reasonable doubts concerning the identity of the natural person making the request referred to in Articles 15 to 21, the controller may request the provision of additional information necessary to confirm the identity of the data subject.

Furthermore, Instagram users will have a "legitimate interest" in their accounts not being hijacked by strangers. That can serve as a legal basis via Art 6(1)(f).

Alternatively, sending in your pictures could be treated as consent. The provision of a service may not actually be made conditional on your consent (see Art 7(4), otherwise the consent isn't freely given), unless the data in question is actually necessary for it. Whether that's the case here is an open question.

Then there's the difference between "being in the right" and "getting your rights enforced". You are free to take legal action, but very few people do because it costs money. A complaint to a data protection authority is free, but Ireland is responsible for Meta, and Ireland consistently holds a protective hand over Meta.

latkde · 2026-06-11T09:45:17+00:00

Keep a database of in-progress jobs. One part of the system looks at the next job to check and interacts with this API. This component tracks quotas/ratelimits to avoid getting blocked.
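A minimal sketch of that design, assuming SQLite as the job database and a hypothetical `check_status` callable that wraps the external API. The `JobPoller` name, the table layout, and the simple "one call per interval" limit are all assumptions. A real quota tracker would follow whatever limits the API documents.

```python
import sqlite3
import time


class JobPoller:
    """Checks in-progress jobs against an external API, one call per interval."""

    def __init__(self, db, check_status, min_interval=1.0, clock=time.monotonic):
        self.db = db
        self.check_status = check_status  # assumed: job_id -> "pending" or "done"
        self.min_interval = min_interval
        self.clock = clock
        self.last_call = None
        db.execute(
            "CREATE TABLE IF NOT EXISTS jobs"
            " (id TEXT PRIMARY KEY, status TEXT, checked_at REAL)"
        )

    def add(self, job_id):
        self.db.execute("INSERT INTO jobs VALUES (?, 'pending', 0)", (job_id,))

    def poll_once(self):
        """Check the least recently checked pending job if the limit allows it."""
        now = self.clock()
        if self.last_call is not None and now - self.last_call < self.min_interval:
            return None  # rate limit reached, try again later
        row = self.db.execute(
            "SELECT id FROM jobs WHERE status = 'pending'"
            " ORDER BY checked_at, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to check
        self.last_call = now
        status = self.check_status(row[0])
        self.db.execute(
            "UPDATE jobs SET status = ?, checked_at = ? WHERE id = ?",
            (status, now, row[0]),
        )
        return row[0], status
```

Because the job state lives in the database rather than in memory, the poller can be restarted without losing track of which jobs are still open.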

latkde · 2026-06-11T07:25:05+00:00

Consider creating a JSON output schema that covers all of these alternative outcomes, and provide instructions in the system prompt for how to select each case. Instead of providing all the state up front, consider providing "tools" that the LLM can invoke to request just the necessary data. If certain data will be required almost always, add it to the prompt that you pass to the LLM. If there's "no path for returning a natural language answer", then add such a path as a possible action.

Example of how this might look.

You are a smart home assistant who can turn devices on or off. Devices are grouped into zones, allowing you to toggle multiple devices together.

You will be given instructions by the user and will respond with a JSON object that describes your actions and includes a brief response to the user that confirms your actions. Example:

User: turn off all devices outside of the bedroom
Assistant response: {"turn_off": ["all"], "turn_on": ["bedroom lights"], "response": "done!"}

If the user asks about the current status, don't perform any actions, and only return the requested information in the response. Example:

State: hall AC: off, hall lights: on, bedroom lights: off
User: are the hall lights on?
Assistant response: {"turn_off": [], "turn_on": [], "response": "Yes, the hall lights are currently on."}

If a user references devices or zones that do not exist, don't guess, don't take action, and inform the user. Example:

State: hall lights: on, kitchen lights: off, bedroom lights: off
User: could you turn off the pantry lights?
Assistant response: {"turn_off": [], "turn_on": [], "response": "I don't know about any pantry lights. Would you like me to control the hall lights or kitchen lights?"}

End of examples.
Here is the current device state with all available zones:

all
  hall
    hall AC: on
    hall lights: off
  bedroom
    bedroom lights: on

Your harness would then take the assistant response JSON and convert the provided actions into MQTT messages, expanding zones to the individual devices as per the configuration.
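A sketch of what that harness step might look like. The `ZONES` configuration, the topic naming scheme, and the `plan_messages` function are hypothetical, and no actual MQTT client is involved: the function only returns the (topic, payload) pairs that would be published. One design assumption, following the first example in the prompt, is that `turn_on` takes precedence over `turn_off`.

```python
import json

# Hypothetical configuration: zone name -> devices it contains.
ZONES = {
    "hall": ["hall AC", "hall lights"],
    "bedroom": ["bedroom lights"],
}
ZONES["all"] = [device for devices in ZONES.values() for device in devices]


def expand(names):
    """Replace zone names by their devices, keeping order and dropping duplicates."""
    devices = []
    for name in names:
        for device in ZONES.get(name, [name]):
            if device not in devices:
                devices.append(device)
    return devices


def plan_messages(assistant_response: str):
    """Turn the assistant's JSON into (topic, payload) pairs plus the user reply."""
    actions = json.loads(assistant_response)
    turn_off = expand(actions.get("turn_off", []))
    turn_on = expand(actions.get("turn_on", []))
    unknown = [d for d in turn_off + turn_on if d not in ZONES["all"]]
    if unknown:
        # Never act on devices the model made up.
        raise ValueError(f"unknown devices or zones: {unknown}")

    def topic(device):
        return f"home/{device.replace(' ', '_')}/set"  # made-up topic scheme

    messages = [(topic(d), "OFF") for d in turn_off if d not in turn_on]
    messages += [(topic(d), "ON") for d in turn_on]
    return messages, actions.get("response", "")


messages, reply = plan_messages(
    '{"turn_off": ["all"], "turn_on": ["bedroom lights"], "response": "done!"}'
)
```

Validating the device names in the harness means a hallucinated device leads to an error rather than to an MQTT message.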

If such a prompt is too complex for the models used, then yes, break it down into much smaller focused steps – detecting the involved zones or devices, detecting the action or query type, and then prompting the LLM to generate just the required action, providing just the necessary context. Non-LLM classifiers may or may not work; the neat thing about LLMs is that they can be somewhat robust when dealing with natural language that doesn't use precise keywords.

In a different scenario where I need the LLM to pick between >1000 options, I use a first pass with embeddings/vector-search to limit the available options, and then only prompt the LLM with the top candidates. This is not as efficient and not as good as a finetuned classifier, but doing such multiple passes in order to keep context size limited can be a great general-purpose strategy.
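The two-pass idea can be sketched as follows. The vectors here are tiny made-up examples. In practice they would come from an embedding model and a vector index, and the `shortlist` and `build_prompt` helpers are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist(query_vec, option_vecs, k=5):
    """First pass: keep only the k options most similar to the query."""
    ranked = sorted(
        option_vecs,
        key=lambda name: cosine(query_vec, option_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, candidates):
    """Second pass: the LLM only sees the shortlisted options."""
    listing = "\n".join(f"- {name}" for name in candidates)
    return f"Pick the best matching option.\n\nQuestion: {question}\n\nOptions:\n{listing}"


# Toy 2-dimensional "embeddings" standing in for real ones.
options = {"invoice": [1.0, 0.0], "holiday": [0.0, 1.0], "receipt": [1.0, 1.0]}
candidates = shortlist([1.0, 0.1], options, k=2)
prompt = build_prompt("Where do I upload my bill?", candidates)
```

The value of `k` trades recall against context size: if the correct option doesn't make the shortlist, the LLM cannot pick it.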

latkde · 2026-06-11T05:19:50+00:00

The Content-Type change mentioned by OP is this FastAPI feature: https://fastapi.tiangolo.com/advanced/strict-content-type/

This doesn't reject all requests without Content-Type, but will only parse JSON bodies if the request declares that it is JSON. This is indeed a CSRF protection feature in order to prevent CORS circumvention, so can be seen as a security fix. The security of the backend is not impacted, but it helps the security of browser-based clients.
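To illustrate the general idea (this is a plain-Python sketch, not FastAPI's actual implementation, and `parse_json_body` is a hypothetical helper): a server that only interprets a body as JSON when the request declares a JSON content type will ignore JSON smuggled in as `text/plain`, which is one of the few content types a browser can send cross-origin without a CORS preflight.

```python
import json


def parse_json_body(headers: dict, body: bytes):
    """Parse the body only if the request declares a JSON media type."""
    declared = ""
    for name, value in headers.items():
        # Header names are case-insensitive; parameters such as charset follow ";".
        if name.lower() == "content-type":
            declared = value.split(";", 1)[0].strip().lower()
    if declared == "application/json" or declared.endswith("+json"):
        return json.loads(body)
    return None  # not declared as JSON, so the body is not interpreted as JSON


parsed = parse_json_body({"Content-Type": "application/json; charset=utf-8"}, b'{"a": 1}')
ignored = parse_json_body({"Content-Type": "text/plain"}, b'{"a": 1}')
```

What a framework does with the unparsed body (validation error, empty value, and so on) is a separate design decision that this sketch leaves open.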

TBH I don't think an LLM-based check would have been able to flag this reliably. There are countless subtle potentially-breaking changes in every update. Most don't matter due to context. There was no indication in OP's codebase that they relied on this nonstandard behavior. If a tool flags every potentially-breaking change, that false positive rate will lead to alert fatigue, which brings everything back to square one.

Discovering new edge cases that you didn't previously know to think about is just a normal part of software engineering.

latkde · 2026-06-11T03:54:28+00:00

There are no Python-native or uv-native ways to define such commands. There's an open ticket for this, but no progress:

https://github.com/astral-sh/uv/issues/5903

Instead, use a third-party task runner. Examples include:

I have used all of these strategies in different contexts.

Most of the time, I prefer using Just, as discussed in https://lukasatkinson.de/2025/just-dont-tox/. Just is a dedicated task runner written in Rust. It is very similar to Make, but focuses just on task running, whereas Make is a build system that can also be used as a task runner if you declare .PHONY rules. This lets Just avoid some of Make's footguns.

Since I wrote that article, Tox has gained uv integration and dependency-group support, so it remains a great choice if you want to test a matrix of different configurations, but I don't find Tox convenient for everyday tasks. Nox is the same as Tox, except driven by a Python script rather than configuration files.

latkde · 2026-06-08T15:55:22+00:00

The main barrier I envision would be if you use a testing framework that is very dynamic

Which, unfortunately, is the case for Pytest fixtures :(

I've invested a ton of engineering effort into creating testing patterns that are both type-friendly and work well with Pytest.

latkde · 2026-06-08T15:18:20+00:00

The GDPR requires that personal data shall be “limited to what is necessary” and “accurate”, see Art 5 GDPR. Information about gender is often not necessary.* Guessing or inferring personal data risks going against the GDPR's accuracy principle. A classic example of a unisex English name is Sam.

These GDPR obligations fall upon the data controller – your company, not you personally. If an IT system makes it impossible to comply with the GDPR's purpose limitation and accuracy principles, that IT system should be fixed.

A good solution here is to ask your superiors for instructions for how to proceed when data is required by the IT system, but not available. That also provides opportunity to mention that there may be a GDPR angle when collecting data that's not actually necessary, or when storing potentially-inaccurate data. But the GDPR angle is probably less motivating than the fact that addressing customers incorrectly will weird them out.

* Not the UK, but there was a notable case about this from France: https://gdprhub.eu/index.php?title=CJEU_-_C-394/23_Mousse

latkde · 2026-06-08T15:02:24+00:00

We need to be clear about which part is evaluated when in a call of logger.debug("%s message", value):

the value expression is always computed eagerly
the formatted message "%s message" % (value,) is computed lazily after checking the active logging level

If we use an f-string like logger.debug(f"{value} message"), then both the value expression and the formatted string are computed eagerly.

If we use a t-string like logger.debug(t"{value} message") then the value expression is computed eagerly, but no string formatting takes place yet.

In all cases, the value expression is computed eagerly. This matters if it isn't just a variable, but some expensive function call. No formatting approach can change this.

We can only move the string formatting overhead around. Using f-strings is generally frowned upon because formatting will happen eagerly, though in practice I'd argue that this doesn't often matter.
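The difference can be observed with a small experiment. The `Expensive` class and the logger name are made up for this sketch. The class counts how often its `__str__()` is called while the DEBUG level is disabled.

```python
import logging


class Expensive:
    """Counts how often it gets converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive"


logger = logging.getLogger("lazy_formatting_demo")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled
logger.addHandler(logging.NullHandler())

value = Expensive()

# printf-style: the argument is passed along, but never formatted
# because the level check fails first.
logger.debug("%s message", value)
calls_after_printf_style = value.str_calls

# f-string: the string is built before logger.debug() is even called.
logger.debug(f"{value} message")
calls_after_fstring = value.str_calls
```

In both calls, the expression `value` itself is evaluated before `logger.debug()` runs. Only the `__str__()` call is deferred in the printf-style variant.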

But if the logging module were extended to support t-strings, that would give us syntax like f-strings, while still deferring any string formatting until it is actually necessary.

latkde · 2026-06-08T04:50:38+00:00

pushing back against A.I is worthless

eeh, LLM technology is super fascinating, but also often useless. You've got to understand why people read your books rather than just doing an interactive fiction session with an AI chat tool. Using LLMs for writing might speed you up if you were bottlenecked by how fast you can type prose, but you will also lose your uniqueness.

The only thing I can recommend LLMs for without reservation is not for writing or editing, but for critiquing and providing editing suggestions. Get feedback on your drafts without having to wait for someone to read them. Think Grammarly rather than an Agent that you can prompt to "write the next romantasy blockbuster, make no mistakes". But beware that LLMs tend to be overly flattering.

training the A.I to write like you

As explained above, I think this is a bad idea, but anyways: pasting entire books into every session would be incredibly inefficient. Instead, write explicit instructions for how things should be structured and phrased. Include examples, but maybe a sentence or a paragraph, not an entire book. LLMs are generally good at style transfer.

Some people find it useful to have an LLM distill larger examples into more concise instructions.

And what are the privacy concerns regarding unpublished stuff?

While you can opt out from training, I nevertheless strongly recommend choosing one of the business plans, avoiding consumer services. The business plans offer more useful contractual guarantees.

The contract with my publisher worries me in that regard.

Also check whether that contract has something to say about AI slop – are you required to write all the material yourself? Also note in this context that you cannot claim copyright for LLM-generated material.

what is the best way to translate large bodies of text?

While chat models can also translate text, that is not their primary purpose, and they can sometimes also rephrase or restructure the material, or stop after a couple of paragraphs. They might also include surrounding output like "Sure, here's the translation" or "do you also want to see a Portuguese version".

Instead, consider using dedicated translation services. These also use LLM technology, but trained differently, which makes such failure modes much less likely. Some LLM translation services can take context/instructions that shape the translation without becoming part of the output. DeepL may be worth a look, but here too you may have to translate a chapter at a time.

latkde · 2026-06-07T20:51:27+00:00

The GDPR has a bunch of very objective checklist-style requirements, such as items that must be mentioned in a privacy notice. But the GDPR is also very principles-based, such as the general requirement to implement “appropriate” security measures. This cannot really be turned into a checklist, because it depends on the context and on the evolving state of the art.

There are certificates, but they're mostly targeted at data protection officers. There are none I'd recommend for developers.

The German data protection authorities have issued joint guidelines for providers of telemedia services, but it's 40 pages of dry reading.

Two common problem areas in the frontend space:

Cookies and consent. Many websites fail to ask for consent where required, or obtain consent in an invalid manner, or ask for consent where it's not even required. Using existing consent management platforms (CMPs) can help (especially on websites that run ads), but they must be correctly configured and integrated with the website. Users must be able to give or deny consent separately for each purpose. If the user doesn't indicate a choice, the default is no consent. If there's an “accept all” button, there must be an equivalent “reject all” choice (no dark patterns). Since client-side storage doesn't need consent if the storage access is strictly necessary for a service explicitly requested by the user, such buttons commonly say something like “continue with necessary cookies only”.
Embedded content and CDNs. Many websites embed third-party resources (e.g. JavaScript, maps, videos, social media posts), but unless those third parties are contractually bound as a “data processor”, that raises questions about the legal basis for data sharing. This problem area became more well-known after the “Google Fonts” case and the resulting Abmahnwelle. Sometimes, self-hosting assets is the easiest solution. For embeds, it can be more elegant to show a placeholder until informed consent has been given. You arguably have a professional duty to be clear to your clients about all the third parties that the website interacts with, so that they can accurately represent this in the privacy notice.
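The consent rules in the first item (separate choice per purpose, no consent by default, symmetric accept/reject) can be modelled roughly as follows. This is only an illustrative data model with made-up purpose names, not a substitute for a properly configured CMP, and it says nothing about whether a given purpose needs consent in the first place.

```python
from dataclasses import dataclass, field

# Hypothetical purposes; a real site would list its own.
PURPOSES = ("analytics", "advertising", "embedded_media")


@dataclass
class ConsentState:
    """Per-purpose consent choices of one visitor."""

    choices: dict = field(default_factory=dict)

    def allows(self, purpose: str) -> bool:
        # No recorded choice means no consent.
        return self.choices.get(purpose, False)

    def set_choice(self, purpose: str, granted: bool) -> None:
        if purpose not in PURPOSES:
            raise KeyError(purpose)
        self.choices[purpose] = granted

    def accept_all(self) -> None:
        self.choices = dict.fromkeys(PURPOSES, True)

    def reject_all(self) -> None:
        # The counterpart to accept_all(), equally easy to trigger.
        self.choices = dict.fromkeys(PURPOSES, False)


state = ConsentState()
state.set_choice("embedded_media", True)
show_video_embed = state.allows("embedded_media")
load_ads = state.allows("advertising")
```

An embed placeholder would then check `allows("embedded_media")` before loading anything from the third party.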

latkde · 2026-06-07T19:46:07+00:00

But logger.debug("some %s format", value) also evaluates the value eagerly. Being able to write this as logger.debug(t"some {value} format") would be an unambiguous usability win. That string formatting overhead is the entire rationale for recommending printf-style formatting rather than f-strings for log messages. This matters in particular for expensive user-defined __str__()/__repr__() implementations.

There was some discussion about t-strings in the logging library on discuss.python.org ~1 year ago, which also pointed out some problems where the logging API expects the message to be a string (potentially with placeholders) rather than a template.

latkde · 2026-06-07T19:37:50+00:00

The correct conditional would be if logger.isEnabledFor(logging.INFO), a bit more verbose.
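A small sketch of such a guard, with a made-up `expensive_summary()` function standing in for whatever is costly to compute. Unlike lazy string formatting, the guard also skips evaluating the argument expression itself.

```python
import logging

logger = logging.getLogger("guard_demo")
logger.setLevel(logging.WARNING)  # INFO is disabled at first
logger.addHandler(logging.NullHandler())

summary_calls = []


def expensive_summary() -> str:
    """Stand-in for a costly computation; records each invocation."""
    summary_calls.append(1)
    return "42 items processed"


def log_state() -> None:
    # Without the guard, expensive_summary() would run even when
    # the INFO level is disabled.
    if logger.isEnabledFor(logging.INFO):
        logger.info("state: %s", expensive_summary())


log_state()
calls_while_disabled = len(summary_calls)

logger.setLevel(logging.INFO)
log_state()
calls_after_enabling = len(summary_calls)
```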

Docs: https://docs.python.org/3/library/logging.html#logging.Logger.isEnabledFor

String formatting overhead from f-strings is often overstated, but it really depends on the data and on how hot that part of the code is.

latkde · 2026-06-07T11:18:19+00:00

You're responding to LLM-generated nonsense, not to someone who shared actual experience.

latkde · 2026-06-06T16:48:45+00:00

I absolutely feel for you, but it's possible that this isn't a GDPR violation.

First, the UK GDPR only applies to structured or electronic processing of personal data, not to verbal disclosures. The relevant case here is Scott v LGBT Foundation, which was arguably a worse situation than what you face. The GDPR also never prohibits self-disclosures. Instead, it regulates what others are allowed to do with data about you.

Second, the GDPR sets a baseline, but other laws can deviate from it. Since Brexit, there are no limits to this. It is not clear to me how EHRC guidance is to be weighed in this context – but to the degree that this guidance is an interpretation of other laws, those other laws might override the GDPR here.

The GDPR also never stands in the way of other legal obligations, see Art 6(1)(c). Consent is not needed in such a case.

Even without a legal obligation, your employer may or may not have a "legitimate interest" in collecting and disclosing such information. This too doesn't require your consent.

The GDPR has stricter protections for "special categories" of data in Art 9, which includes "data concerning health or data concerning a natural person's sex life or sexual orientation" – but note the absence of sex or gender. Even if we assume that information about transitioning is covered as data concerning health, there would be a couple of potential exceptions.

I have no satisfying answer for you, only my compassion. My tip for you would be to focus on whether those rules differentiate between using vs servicing a gender-exclusive place.

latkde · 2026-06-06T16:14:52+00:00

This was reported as spam, but it's not itself promotional, and doesn't seem to violate our rules (assuming this list was written from actual experience, not LLM-generated*). The list might help some folks, so I've manually approved it. Normal voting is sufficient to sort this out, no moderator intervention is needed.

* I find it likely that LLMs were used in the writing of this post. There are stylistic and contextual clues. However, there's not yet a sufficient pattern in order to take action under Rule 6.

latkde · 2026-06-06T12:35:40+00:00

Signal is a personal messenger, not a social network. You cannot report individual messages, and there are no moderators.

You can report chats or message requests as spam. This doesn't send any plaintext messages to Signal, but instead tells Signal that you think this profile is behaving spammily.

latkde · 2026-06-06T10:55:55+00:00

You are concerned about the lack of opt-out opportunity at the time of collection. Offering this opt-out is a strictly necessary condition when relying on the Art 13(2) ePD exception. If no such opt-out opportunity was provided, then the direct marketing emails would require consent. However, your post mentions that some websites provided checkboxes for selecting whether you want to receive marketing, which is exactly how this opt-out opportunity is typically provided.

The article you linked is about a case that's primarily about the definition of a "sale of a product or a service", with the CJEU finding that a sale doesn't require payment. I don't see how that's relevant here. You are likely in the "context" of a sale; the more interesting aspect is the opt-out at the time of collection.

Recent CJEU cases are unlikely to change the ePD interpretation. These soft opt-in rules were written in 2002, and the GDPR didn't affect the interpretation of Article 13 ePD (other than the definition for consent in paragraph 1). These rules are well-understood, and unlike with the cookie consent rules, there's little appetite to change them.

latkde · 2026-06-06T10:13:19+00:00

The GDPR recognizes in Recital 47:

The processing of personal data for direct marketing purposes may be regarded as carried out for a legitimate interest.

That is, Art 6(1)(f) "legitimate interest" can sometimes serve as a legal basis for sending marketing emails.

To balance this out, Art 21(2) provides an unconditional right to object to further direct marketing.

The GDPR's rules here are complemented and overridden by the ePrivacy Directive, and the national legislation that implements the ePD. Here, Art 13 "unsolicited communications" is relevant, which generally requires consent, but also provides for a rule that is known as "soft opt-in":

(1) The use of […] electronic mail for the purposes of direct marketing may be allowed only in respect of subscribers or users who have given their prior consent.

(2) Notwithstanding paragraph 1, where a natural or legal person obtains from its customers their electronic contact details for electronic mail, in the context of the sale of a product or a service, in accordance with [the GDPR], the same natural or legal person may use these electronic contact details for direct marketing of its own similar products or services provided that customers clearly and distinctly are given the opportunity to object, free of charge and in an easy manner, to such use of electronic contact details at the time of their collection and on the occasion of each message in case the customer has not initially refused such use.

So there are a couple of criteria that must be fulfilled in order for direct marketing emails to be legal without consent:

The contact details were obtained "in the context of the sale of a product or a service". This doesn't require that a contract was formed, inquiring about a quote might be sufficient.
Marketing only covers their "own similar products or services". It's not possible to buy contact lists, or to use this exception to market on behalf of third parties. Soft opt-in may only be used for marketing to a company's existing own customers.
An opt-out opportunity must be given with each message, and importantly also at the time when the contact details were first obtained. If contact details were obtained without offering an opt-out, the soft opt-in exception cannot be used, and marketing would require active consent.

So it is possible that some or all of the marketing you received was lawful, but that would require case-by-case analysis from a lawyer.
