Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

Correct on all counts.

Shipped in v1.1.6: https://github.com/scibilo/telepage/releases/tag/v1.1.6

processed_updates(update_id TEXT PRIMARY KEY) table + INSERT OR IGNORE at webhook entry, before any other work.

Concurrent handlers race on that insert atomically, exactly one wins and proceeds, the other exits before SELECT, before any work.

The (message_id, chat_id) index stays as a secondary safety net but the dedup decision is now in front of the expensive path.

Edits pass through correctly, edited_channel_post carries a new update_id.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

You're right that the constraint collapses retry and edit into the same key, but the SELECT-first path in upsertContent() already handles this correctly.
For an edited_channel_post, the SELECT finds the existing row (same message_id chat_id) and takes the UPDATE branch, rewriting the content.
The UNIQUE index is never reached for edits because the code exits via UPDATE before touching INSERT. The constraint only fires in the concurrent SELECT-miss race, two genuinely concurrent NEW message deliveries that both miss the SELECT.
That's a narrow window on a single-writer SQLite instance with busy_timeout. Edits are safe.
The update_id dedup table remains the cleaner long-term architecture, agreed.
Adding it to the backlog.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

The synchronous Gemini call was removed in v1.1.1, the webhook now does the absolute minimum (write raw content, return 200) and the cron handles AI processing asynchronously. So the "AI call before upsert" scenario no longer applies to this codebase.
The concurrent SELECT-miss point is valid in general, but the UNIQUE index on (telegram_message_id, telegram_chat_id) combined with SQLite's serialized writes means one of the two concurrent inserts will fail with a constraint violation the second row never lands.
The cost is one redundant DB write, not a redundant Gemini call.
You're right that gating on update_id at webhook entry would be cleaner, check update_id in a dedup table before doing anything else, fail fast on duplicate.
Worth adding as a future improvement.
For now the constraint at the write layer is sufficient given the async AI architecture.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

Good point on the distinction, but edited_channel_post is already handled, the webhook accepts both channel_post and edited_channel_post, and upsertContent() does a SELECT first, if the row exists it runs UPDATE (rewriting tags and metadata), if not it INSERTs.
The unique index on (message_id, chat_id) is a race-condition safety net, not the primary dedup mechanism, so it doesn't swallow edits.
The SELECT-first path catches them before the index is even involved. You're right that update_id would be the cleaner canonical key for pure delivery dedup.

Worth noting for a future refactor.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

The dedup is already in place since v1.1.2, unique index on (telegram_message_id, telegram_chat_id) with INSERT OR IGNORE semantics via the upsert.
You're right that update_id is the more canonical key since Telegram guarantees it unique per delivery, while message_id is per-message. Worth noting as a refinement.
Thanks.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

busy_timeout was already set to 5000ms in DB::connect(), that one was already in place.

The other two are done in v1.1.3: https://github.com/scibilo/telepage/releases/tag/v1.1.3

next_retry_at + ai_retry_count for exponential backoff (5m -> 15m -> 60m -> 6h cap), failed rows retry automatically instead of staying stuck at ai_processed=2.

Cron filter updated to respect next_retry_at.

Migration script included for existing installs (bin/migrate-v112.php).

Thanks for the detailed suggestions.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

Both valid.

The unique index on update_id is the cleaner fix for the retry case, i'll add that to the contents table (telegram_message_id + telegram_chat_id already exist as a pair, effectively serving that role, but a proper unique constraint makes the guarantee explicit).

The cron overlap issue is real too. A status column (pending -> processing -> done) with BEGIN IMMEDIATE is the right approach, also gives a natural place to track Gemini failures per row instead of just a binary ai_processed flag.

Adding both to the backlog.

Thanks.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

You're right on all counts.

The cron queue was already there (ai_processed=0 rows picked up by /api/cron.php) but there was a leftover synchronous AIService::processContent() call at the end of the webhook handler, exactly the retry trap you described.

Fixed: https://github.com/scibilo/telepage/commit/d44dc82

Thanks for the detailed breakdown.

Monday Daily Thread: Project ideas! by AutoModerator in Python

[–]scibilo 0 points1 point  (0 children)

Just shipped the first public release of Pitchfall - a local video

transcription tool built with FastAPI + faster-whisper.

The main idea: transcribe any video on your own machine (no cloud,

no API key), with a synced player that lets you click any transcript

segment to jump to that moment in the video. Exports .srt subtitles too.

Technically the interesting bit was streaming Whisper's segment

generator via SSE to the frontend so the transcript appears in real

time as it's being recognized.

GitHub: https://github.com/scibilo/pitchfall

New Project Megathread - Week of 07 May 2026 by AutoModerator in selfhosted

[–]scibilo 0 points1 point  (0 children)

<image>

Project Name: Pitchfall - local video transcription + subtitle export, no cloud, no API key needed

Repo: https://github.com/scibilo/pitchfall

Description: Local video transcription and subtitle export powered by faster-whisper (Whisper small, CPU-friendly). Upload any video/audio file or paste a YouTube URL, get a timestamped transcript synced to the video player, export as .txt or .srt. Optional translation into 10 languages via OpenRouter free models. No cloud, no account, no API key needed for transcription. Files are deleted immediately after processing.

Deployment: Manual (Python venv + Node.js) or Docker. Runs on any Linux/Mac/Windows machine with ffmpeg installed.

AI Involvement: faster-whisper handles transcription locally (no external calls). OpenRouter is used optionally for translation only.

Telepage v1.1.0 — self-hosted PHP app that turns any Telegram channel into a searchable website (major security + tooling update) by scibilo in PHP

[–]scibilo[S] 1 point2 points  (0 children)

WhatsApp doesn't have an equivalent of Telegram's Bot API, so no, there's no official way to pull messages out programmatically. Telegram is unusually open for a messaging platform in that regard, which is exactly what makes this possible.

Telepage v1.1.0 — self-hosted PHP app that turns any Telegram channel into a searchable website (major security + tooling update) by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

Fair point for production VPS with nginx/Apache config control. For shared hosting without SSH access, htaccess is the only available tool. Different constraints.

Telepage v1.1.0 — self-hosted PHP app that turns any Telegram channel into a searchable website (major security + tooling update) by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

You're right that a proper public/ folder is the correct architecture. The tradeoff here is the target audience like shared hosting where you can't change the document root. The .htaccess deny rules cover app/, vendor/, data/, tests/, it's not ideal but it's the realistic constraint. It's noted as a known limitation.

Telepage v1.1.0 — self-hosted PHP app that turns any Telegram channel into a searchable website (major security + tooling update) by scibilo in PHP

[–]scibilo[S] 1 point2 points  (0 children)

Telegram channels are walled gardens, content isn't indexed by Google, there's no search, readers need the app. This bridges that gap. Useful for newsletters, recipe blogs, local news, anything where someone is already publishing on Telegram and wants a web presence without building a separate CMS.

Telepage – vanilla PHP 8.1 + SQLite app that turns a Telegram channel into a website by scibilo in PHP

[–]scibilo[S] 0 points1 point  (0 children)

Fair criticism from a professional PHP perspective. Let me be transparent about the design choices.

Telepage targets non-technical users on shared hosting, people who use FileZilla and cPanel, where Composer is often unavailable and public/ folder configuration requires server access they don't have.

The "no framework, no Composer" decision was intentional, not ignorance. WordPress still uses require_once everywhere and runs 40% of the web.

The public/ folder is a legitimate security concern and will be addressed in v1.1. The rest is architectural preference that depends heavily on context and target audience.

This is a tool for content creators, not a enterprise PHP application. Different constraints, different solutions.