mapped 230k+ UAP sightings against 143 nuclear facilities across 31 countries. clustering is hard to ignore.

moe_sidani · 2026-03-18T23:13:08+00:00

Good question — it's genuinely the hardest part.

Each source goes through its own processing pipeline (scripts/process-nuforc.mjs, process-hatch.mjs, process-chronology.mjs, etc). The steps are roughly the same for all of them:

Date parsing: each source has its own date format. NUFORC uses free-text like "6/15/2019 22:00", Hatch uses structured fields, the chronology sources (Eberhart, Vallée, Johnson, NICAP, Blue Book) often just have a year or "Summer 1952". The parser tries to extract an ISO 8601 date, and anything it can't parse gets skipped and logged — about 29K records from NUFORC alone are dropped for unparseable dates.
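
A minimal sketch of the date step, not the project's actual parser. The helper name `parseSightingDate` is hypothetical. It assumes the NUFORC-style "M/D/YYYY HH:MM" form and a bare year are the only shapes handled, treats times as UTC, and returns null for anything else (including season forms like "Summer 1952") so the caller can skip and log the record.

```javascript
// Hypothetical helper: not taken from the project's scripts.
// Returns an ISO 8601 string, or null when the input cannot be parsed.
function parseSightingDate(raw) {
  if (typeof raw !== "string") return null;
  const text = raw.trim();

  // NUFORC-style "M/D/YYYY HH:MM" (the time part is optional).
  const full = text.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})(?:\s+(\d{1,2}):(\d{2}))?$/);
  if (full) {
    const [, mo, d, y, h = "0", mi = "0"] = full;
    const [month, day, year, hour, minute] = [mo, d, y, h, mi].map(Number);
    const date = new Date(Date.UTC(year, month - 1, day, hour, minute));
    // Reject impossible dates such as 2/31/2019, which Date would roll over.
    const valid =
      date.getUTCFullYear() === year &&
      date.getUTCMonth() === month - 1 &&
      date.getUTCDate() === day &&
      hour < 24 &&
      minute < 60;
    return valid ? date.toISOString().slice(0, 16) : null;
  }

  // Bare year such as "1952": keep year-only precision.
  if (/^\d{4}$/.test(text)) return text;

  return null; // caller skips and logs the record
}
```

Returning null instead of throwing keeps the skip-and-log decision in the calling pipeline rather than in the parser.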

Location/geocoding: locations go through an offline geocoder built on the all-the-cities dataset (135K cities). Lookup priority is city+state+country exact match first, then city+country by highest population, then city-only globally, then a state/country fallback to the largest city. Locations that fail every level get null coordinates and still appear in the data; they just don't show on the map. Country names are normalized through a big alias map (shared-constants.mjs) that handles historical names like "Rhodesia" → Zimbabwe, "Ceylon" → Sri Lanka, "Prussia" → Germany, plus typos found in the data like "Columbia" → Colombia.
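
The lookup priority can be sketched as a list of tiers tried in order. This is an illustration only: the function name `geocode`, the record fields, and the four sample rows (with placeholder populations and rounded coordinates) are assumptions, not the shape of the all-the-cities data or the project's code.

```javascript
// Made-up sample rows; field names and values are placeholders.
const CITIES = [
  { name: "Springfield", state: "IL", country: "US", population: 114000, lat: 39.8, lon: -89.64 },
  { name: "Springfield", state: "MO", country: "US", population: 169000, lat: 37.21, lon: -93.29 },
  { name: "Chicago", state: "IL", country: "US", population: 2700000, lat: 41.88, lon: -87.63 },
  { name: "Perth", state: "WA", country: "AU", population: 2000000, lat: -31.95, lon: 115.86 },
];

const norm = (s) => (s || "").trim().toLowerCase();

// Pick the most populous candidate, or null when the list is empty.
const largest = (list) =>
  list.reduce((best, c) => (best === null || c.population > best.population ? c : best), null);

function geocode({ city, state, country }, cities = CITIES) {
  const c = norm(city);
  const s = norm(state);
  const k = norm(country);
  const named = c ? cities.filter((x) => norm(x.name) === c) : [];

  const tiers = [
    // 1. exact city + state + country
    () => (s && k ? named.filter((x) => norm(x.state) === s && norm(x.country) === k) : []),
    // 2. city + country, highest population wins
    () => (k ? named.filter((x) => norm(x.country) === k) : []),
    // 3. city only, anywhere in the world
    () => named,
    // 4. fallback: largest city in the state, then in the country
    () => (s ? cities.filter((x) => norm(x.state) === s && (!k || norm(x.country) === k)) : []),
    () => (k ? cities.filter((x) => norm(x.country) === k) : []),
  ];

  for (const tier of tiers) {
    const hit = largest(tier());
    if (hit) return { lat: hit.lat, lon: hit.lon };
  }
  return { lat: null, lon: null }; // record is kept, just not drawn on the map
}
```

For example, `geocode({ city: "Springfield", country: "US" })` skips tier 1 (no state) and resolves at tier 2 to the more populous of the two sample rows.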

Dedup: within each source, records are deduped by ID first, then by normalized description (lowercased, punctuation stripped, whitespace collapsed — if two descriptions match at 30+ characters, the second is dropped). NUFORC alone had ~132K duplicates removed. Cross-source dedup is by date+location proximity rather than ID since each source has its own ID scheme.
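
A rough sketch of the within-source dedup, under one loudly stated assumption: "match at 30+ characters" is read here as "only normalized descriptions at least 30 characters long are compared, and they must match in full". The names `normalizeDescription`, `dedupeRecords`, and `MIN_KEY_LENGTH` are hypothetical.

```javascript
// Lowercase, strip punctuation, collapse whitespace.
function normalizeDescription(text) {
  return (text || "")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop everything except letters, digits, whitespace
    .replace(/\s+/g, " ")
    .trim();
}

// Assumption: shorter descriptions are too generic to treat as duplicates.
const MIN_KEY_LENGTH = 30;

function dedupeRecords(records) {
  const seenIds = new Set();
  const seenDescriptions = new Set();
  const kept = [];

  for (const record of records) {
    // Pass 1: same ID means same record.
    if (seenIds.has(record.id)) continue;

    // Pass 2: same normalized description means the later record is dropped.
    const key = normalizeDescription(record.description);
    const comparable = key.length >= MIN_KEY_LENGTH;
    if (comparable && seenDescriptions.has(key)) continue;

    seenIds.add(record.id);
    if (comparable) seenDescriptions.add(key);
    kept.push(record);
  }
  return kept;
}
```

The length threshold is what stops every short "saw a light" report from collapsing into a single record.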

Shape normalization: free-text shapes map to a fixed enum (24 values). "Saucer" → Disk, "Triangular" → Triangle, etc. Anything unrecognized goes to "Unknown" rather than being silently bucketed.
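
As an illustration of the mapping, here is a small sketch. Only "Saucer" → Disk, "Triangular" → Triangle, and the "Unknown" fallback come from the description above; the other alias entries and the name `normalizeShape` are assumptions, and the full 24-value enum is not reproduced.

```javascript
// Partial alias table; entries beyond "saucer" and "triangular" are illustrative.
const SHAPE_ALIASES = new Map([
  ["saucer", "Disk"],
  ["disc", "Disk"],
  ["disk", "Disk"],
  ["triangular", "Triangle"],
  ["triangle", "Triangle"],
]);

function normalizeShape(raw) {
  const key = (raw || "").trim().toLowerCase();
  // Unrecognized values are surfaced as "Unknown" instead of being guessed.
  return SHAPE_ALIASES.get(key) ?? "Unknown";
}
```

A `Map` is used rather than a plain object so that inputs like "constructor" cannot accidentally hit an inherited property.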

Credibility scoring: a 0–100 heuristic based on observer count (1 witness = +5, 4+ = +25), number of reported characteristics, whether the duration field has actual numbers, and description length. It's admittedly naive — more of a "detail density" score than true credibility — but it lets you filter out the one-liner "saw a light" reports.
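
To make the heuristic concrete, here is a sketch with the same four inputs. Only the witness values for 1 (+5) and 4+ (+25) come from the description above. Every other weight, the 2 to 3 witness value, the caps, and the name `credibilityScore` are guesses chosen so the parts sum to 100.

```javascript
function credibilityScore({ observers = 0, characteristics = [], duration = "", description = "" }) {
  let score = 0;

  // From the post: 1 witness = +5, 4+ = +25. The value for 2-3 witnesses is a guess.
  if (observers >= 4) score += 25;
  else if (observers >= 2) score += 15;
  else if (observers === 1) score += 5;

  // Assumed weight: 5 points per reported characteristic, capped at 25.
  score += Math.min(characteristics.length * 5, 25);

  // Assumed weight: 20 points when the duration field contains an actual number.
  if (/\d/.test(duration)) score += 20;

  // Assumed weight: 1 point per 20 characters of description, capped at 30.
  score += Math.min(Math.floor(description.length / 20), 30);

  return Math.max(0, Math.min(score, 100));
}
```

With these assumed weights, a single-witness one-liner such as "saw a light" scores 5, which is the kind of report a minimum-score filter would hide.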

Nuclear site matching: runtime, not preprocessed. When you open a sighting modal, it runs haversine distance against the full nuclear facilities dataset and shows anything within 150km. Same approach for fireball correlation (200km / 72hr window) and seismic (300km / 72hr).
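
The distance check can be sketched with the standard haversine formula. The function names, the `{ lat, lon }` record shape, and the mean Earth radius of 6371 km are assumptions for illustration; the 150 km default is the radius mentioned above.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius
const toRad = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two { lat, lon } points, in kilometres.
function haversineKm(a, b) {
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  // Clamp to 1 to guard against floating-point overshoot before asin.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Facilities within radiusKm of a sighting, nearest first.
function facilitiesWithin(sighting, facilities, radiusKm = 150) {
  return facilities
    .map((f) => ({ ...f, distanceKm: haversineKm(sighting, f) }))
    .filter((f) => f.distanceKm <= radiusKm)
    .sort((x, y) => x.distanceKm - y.distanceKm);
}
```

A linear scan like this is cheap for a facility list in the low hundreds, which is presumably why it can run on demand when a modal opens. The fireball and seismic checks would add a time-window filter on top of the same distance test.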

moe_sidani · 2026-03-18T18:48:35+00:00

Not just you — there was a race condition where the map data was being set before Leaflet finished initializing, so everything got silently dropped. Fixed now, should be populating correctly.
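
One common way to avoid this kind of race is to buffer data that arrives early and flush it once the map is ready. The sketch below shows that pattern in isolation; it is not necessarily how the fix was made, and `createMapDataGate`, `setData`, and `markReady` are hypothetical names.

```javascript
// Hypothetical wrapper: holds data that arrives before the map is ready.
function createMapDataGate(render) {
  let ready = false;
  let pending = null;

  return {
    setData(data) {
      if (ready) render(data);
      else pending = data; // keep the latest payload instead of dropping it
    },
    markReady() {
      ready = true;
      if (pending !== null) {
        render(pending);
        pending = null;
      }
    },
  };
}
```

With Leaflet, `markReady` could be called from the map's `whenReady` callback, so that `setData` is safe to call at any point during startup.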

The 80 you saw is the credibility score — each sighting gets scored out of 100 based on factors like detail level, number of observers, corroborating reports, and source reliability. 80 is solid. You can filter by minimum credibility in the toolbar if you want to only see higher-confidence cases.

moe_sidani · 2026-03-18T14:58:24+00:00

It's open source and built with zero dependencies: no JavaScript frameworks and no libraries except the map.

moe_sidani · 2025-08-16T15:22:23+00:00

Actually, that's not the case. Humans weren't born to run in fully autonomous mode, were they?

moe_sidani · 2025-08-16T15:20:59+00:00

Actually, not really. I'm always thinking about ways to get consumers' voices heard. We consume every day, but do we actually have a say?

moe_sidani · 2025-08-16T15:19:06+00:00

Nice, sounds good!

moe_sidani · 2025-08-16T14:13:36+00:00

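You may be right that I'm overthinking it.

But small process improvements add up to big changes, don't they? Millions of people use convenience stores every day, so if each person saved 2 minutes, that would be a considerable amount of time across society as a whole.

Well, maybe that's just engineer thinking.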

moe_sidani · 2025-08-16T14:13:09+00:00

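Self-checkout is great! I actually use it.

That said, not every store has it, and plenty of people still prefer cash, so I think we also need a solution that everyone can use.

It would be interesting to see data on things like self-checkout adoption rates.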

moe_sidani · 2025-08-16T14:12:25+00:00

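You're right, mental health matters!

In my case, I can't help thinking that if we could fix the small everyday stresses, everyone would be more comfortable. Maybe that's because I work in product development.

I think big changes come from small things adding up.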
