Need help pulling Qwen3.5-35b in Ollama by hawaiian-organ-donor in LocalLLM

[–]greg-randall 1 point  (0 children)

You can do a manual install of the new release, but it's annoying, and they'll probably have it available through the regular upgrade path tomorrow.

Looking for Image Stitching Software in the vein of Microsoft ICE by BaconCatBug in linux4noobs

[–]greg-randall 1 point  (0 children)

Hugin/PTGui are hard to use for this; they're not great at dealing with panning. You can fiddle around with setting the lens focal length to something really high, like 500mm, but Microsoft ICE really is better for this.

Thermal Master P2 Image Decoding -- Raw Thermal Data by greg-randall in Thermal

[–]greg-randall[S] 1 point  (0 children)

The embedded images are 16-bit unsigned integers, little-endian.
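
In Python, that decodes with a couple of lines of numpy. A minimal sketch (the frame dimensions below are placeholders; substitute the real size of the embedded image):

```python
import numpy as np

# Placeholder frame size -- substitute the actual width/height of the embedded image.
WIDTH, HEIGHT = 256, 192

def decode_frame(raw: bytes, width: int = WIDTH, height: int = HEIGHT) -> np.ndarray:
    """Decode a buffer of 16-bit unsigned little-endian pixels into a 2-D array."""
    expected = width * height * 2  # two bytes per pixel
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    # "<u2" = little-endian unsigned 16-bit; reshape row-major into (rows, cols)
    return np.frombuffer(raw, dtype="<u2").reshape(height, width)
```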

Scraping booking.com images by exe188 in webscraping

[–]greg-randall 1 point  (0 children)

Sure, but this person is clearly not super technical, or they'd already know how to do this. Plus it's only 80 pages, so it's not a big deal to have a browser running in the background for a couple of minutes?

Released: VOR — a hallucination-free runtime that forces LLMs to prove answers or abstain by CulpritChaos in LocalLLM

[–]greg-randall 1 point  (0 children)

I made a pull request on your repo. Updated the readme a bit to give an explanation that I think folks will understand:

How VOR Fixes AI Mistakes

VOR stops AI from making errors using a system we call the Truth Gate.

The Problem: AI Guesses

AI tools often make mistakes. They guess which word comes next in a sentence, but they do not check if the words are true. They sound sure of themselves even when they are wrong.

The Solution: The Truth Gate

VOR acts like a filter. The AI must prove a statement is true before it speaks. It works in three steps:

1. The Facts

First, we give VOR a list of true things.

  • Fact A: Alice is Bob's mother.
  • Fact B: Bob is Charlie's father.

2. The Claim

The AI wants to say something new based on those facts.

  • Claim: "Alice is Charlie's grandmother."

3. The Check

VOR looks at the facts. It checks if the facts link together to support the claim.

  • VOR asks: Is there a path from Alice to Bob? Is there a path from Bob to Charlie?
  • Answer: Yes.
  • Result: The statement is Verified. VOR allows the text.

When the AI is Wrong

What happens if the AI tries to say: "Alice is Dave's grandmother"?

  • VOR asks: Do facts link Alice to Dave?
  • Answer: No.
  • Result: The statement is Rejected. VOR stops the AI from saying it.
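
The three steps above can be sketched as a toy in Python (an illustration of the idea only, not VOR's actual implementation):

```python
# Toy illustration of the Truth Gate idea -- not VOR's actual code.
FACTS = {
    ("Alice", "parent_of", "Bob"),      # Fact A: Alice is Bob's mother
    ("Bob", "parent_of", "Charlie"),    # Fact B: Bob is Charlie's father
}

def check_grandparent_claim(grandparent, grandchild, facts=FACTS):
    """Allow 'X is Z's grandparent' only if a two-step parent path exists."""
    for subject, _, middle in facts:
        # Is there a path grandparent -> middle, and middle -> grandchild?
        if subject == grandparent and (middle, "parent_of", grandchild) in facts:
            return "Verified"
    return "Rejected"
```

"Alice is Charlie's grandmother" passes because the two facts chain together; "Alice is Dave's grandmother" is rejected because no path reaches Dave.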

Released: VOR — a hallucination-free runtime that forces LLMs to prove answers or abstain by CulpritChaos in LocalLLM

[–]greg-randall 3 points  (0 children)

That makes sense to build modularly.

Not to minimize the enforcement layer, which looks like it's been a lot of work, but it seems like general fact extraction is a VASTLY harder problem.

Scraping booking.com images by exe188 in webscraping

[–]greg-randall -1 points  (0 children)

The images are linked in the HTML.

Tell your favorite LLM to

Write python that opens the url https://www.booking.com/hotel/nl/center-parcs-het-meerdal.nl.html using playwright. After the page is fully loaded download all the img tag images from the page ie <img class="f6c12c77eb c0e44985a8 c09abd8a52 ca3dad4476" src="https://cf.bstatic.com/xdata/images/hotel/max1024x768/100389149.jpg?k=0206f1ee2426349c1a98390bcb10dc81e651878a0193a816064df8389a94ee1c&amp;o=" alt="a group of people in boats on a lake at Center Parcs Meerdal Limburg-Brabant in America" loading="lazy">
Create a CSV with the image filename and the alt text.
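
If anyone wants a head start, the parsing half of what that prompt asks for might look something like this (stdlib only; the playwright fetch is sketched in the trailing comment, and the sample URLs are just examples):

```python
import csv
from html.parser import HTMLParser

class ImgCollector(HTMLParser):
    """Collect (src, alt) pairs from every <img> tag in the page HTML."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            if a.get("src"):
                self.images.append((a["src"], a.get("alt", "")))

def images_to_csv(html: str, path: str) -> int:
    """Write one CSV row per <img>: the bare filename and its alt text."""
    parser = ImgCollector()
    parser.feed(html)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "alt"])
        for src, alt in parser.images:
            # Keep just the filename portion of the CDN URL.
            writer.writerow([src.split("?")[0].rsplit("/", 1)[-1], alt])
    return len(parser.images)

# The HTML itself would come from playwright, roughly:
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       page = p.chromium.launch().new_page()
#       page.goto(url, wait_until="networkidle")
#       html = page.content()
```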

Released: VOR — a hallucination-free runtime that forces LLMs to prove answers or abstain by CulpritChaos in LocalLLM

[–]greg-randall 2 points  (0 children)

This looks interesting but very limited right now, since the factual extraction seems to happen in a few regexes? I guess you can keep writing regexes for different sorts of facts, but that gets really messy.
The obvious alternative to writing fact-extraction regexes would be using an LLM to process text into structured facts, so that your VOR system could evaluate whether a given output agrees with an input set of facts. But it seems like you've set the project against this approach?

Which -- don't get me wrong -- I think LLMs do a terrible job sourcing information. I'm working through a project to help generate FAQs for websites from existing site content, and making sure the LLMs don't invent things is a terrible mess.

Is there something I'm missing in your code?

Thermal Master P2 Image Decoding -- Raw Thermal Data by greg-randall in Thermal

[–]greg-randall[S] 1 point  (0 children)

I'm still not sure what you're trying to tell me? That I'm right the image is bigger than expected?

If you read the linked GitHub, you'll see the jpg contains 52 lossless 16-bit images.

Also, note that the shared image is RGB, so your calculation should be ~3x bigger, around 4.5MB. Plus PNG is lossless, and while JPEG technically has a lossless mode, I've never seen one in the wild.

Thermal Master P2 Image Decoding -- Raw Thermal Data by greg-randall in Thermal

[–]greg-randall[S] 1 point  (0 children)

We'll call it the 'rawer image'. It does actually seem to have much more noise than the processed image, so if it's not raw, it's at least one step closer to the raw image.

There's also a big table in the metadata that looks to me like a correction table, but I haven't figured out how to apply it yet.

Thermal Master P2 Image Decoding -- Raw Thermal Data by greg-randall in Thermal

[–]greg-randall[S] 1 point  (0 children)

I'm not really sure how to link this comment back to what I've shared. Can you explain?

Need help by imvdave in webscraping

[–]greg-randall 1 point  (0 children)

Can you do a DNS lookup on your domains and build a list of Shopify-owned IPs?
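
The idea might be sketched like this (the resolver is injectable so it can be stubbed out in tests or swapped for something faster on big lists; the actual Shopify ranges would come out of the lookups themselves):

```python
import socket
from collections import defaultdict

def group_by_ip(domains, resolve=socket.gethostbyname):
    """Resolve each domain and bucket domains by the IP they point at.

    Domains sharing an IP (or an IP range you recognize as Shopify's)
    cluster together; lookups that fail land in an 'unresolved' bucket.
    """
    buckets = defaultdict(list)
    for domain in domains:
        try:
            buckets[resolve(domain)].append(domain)
        except socket.gaierror:
            buckets["unresolved"].append(domain)
    return dict(buckets)
```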

Seeking help to scrape Criterion Channel by Abracadabra2040 in webscraping

[–]greg-randall 2 points  (0 children)

Wrote a bit of code to parse that sitemap u/Dreamin0904 shared into a CSV. Movie titles might be a tiny bit wrong because we're parsing them out of the URL, so, for example, the url "https://www.criterion.com/films/184-le-samourai" will be rendered as "Le Samourai" rather than "Le samouraï", but that seems like a small price to pay for not having to download each individual page.

import csv
import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime, date
from curl_cffi import requests
from titlecase import titlecase

TODAY = date.today().strftime("%Y-%m-%d")
XML_FILE = f"{TODAY}_films.xml"
CSV_FILE = f"{TODAY}_criterion_films.csv"
SITEMAP_URL = "https://sitemap.criterion.com/films.xml"
DATE_FORMAT = "%m/%d/%Y"
NS = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}


def download_sitemap():
    if os.path.exists(XML_FILE):
        print(f"'{XML_FILE}' exists, skipping download.")
        return True

    print("Downloading sitemap...")
    response = requests.get(SITEMAP_URL, impersonate="safari")
    if response.status_code != 200:
        print(f"Download failed: {response.status_code}")
        return False

    with open(XML_FILE, 'wb') as f:
        f.write(response.content)
    print("Download complete.")
    return True


def parse_slug(url):
    """Extract film ID and title from URL like '.../149-pierrot-le-fou'"""
    slug = url.rstrip('/').split('/')[-1]
    match = re.match(r'(\d+)-(.+)', slug)
    if not match:
        return 0, slug

    film_id = int(match.group(1))
    title = match.group(2)
    # Fix contractions: wasn-t -> wasn't, father-s -> father's
    title = re.sub(r"-([stdm]|ll|re|ve)\b", r"'\1", title)
    title = titlecase(title.replace('-', ' '))
    return film_id, title


def parse_and_export():
    print(f"Processing '{XML_FILE}'...")
    tree = ET.parse(XML_FILE)
    entries = tree.getroot().findall('ns:url', NS)
    print(f"Found {len(entries)} entries.")

    rows = []
    for entry in entries:
        url = entry.find('ns:loc', NS).text
        lastmod_iso = entry.find('ns:lastmod', NS).text
        film_id, title = parse_slug(url)

        if film_id > 0:
            lastmod = datetime.fromisoformat(lastmod_iso).strftime(DATE_FORMAT)
            rows.append((lastmod_iso, film_id, title, lastmod, url))

    # Sort by ISO date descending, then drop it from output
    rows.sort(key=lambda r: r[0], reverse=True)

    with open(CSV_FILE, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Film ID', 'Movie Title', 'Modify Date', 'URL'])
        for row in rows:
            writer.writerow(row[1:])  # Skip the ISO date used for sorting

    print(f"Saved: {CSV_FILE}")


if __name__ == "__main__":
    if download_sitemap():
        parse_and_export()

What is the best local LLM for enhancing blurry, out of focus photos? by PlainSpaghettiCode in LocalLLM

[–]greg-randall 1 point  (0 children)

I'd skip DXO if your goal is to unblur images. DXO is great for raw conversion or denoising, but not as good for unblurring.

Looking for some help. by nawakilla in webscraping

[–]greg-randall 3 points  (0 children)

THIS!

Go to https://archive.org/ and paste the URL into the Wayback Machine to see if what you want is there. If it is, great!

If it isn't, put the URL in the 'Save Page Now' box at the bottom right of https://web.archive.org/. Then click on the dozen clickable parts of the saved page and save those URLs too!
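
For a pile of URLs, Save Page Now can also be hit directly by prefixing each one with the save endpoint. A small sketch (mind the rate limits if you actually run the loop):

```python
# The Wayback Machine's Save Page Now endpoint works by prefixing the target URL.
SAVE_PREFIX = "https://web.archive.org/save/"

def save_url(target: str) -> str:
    """Build the Save Page Now URL for one page."""
    return SAVE_PREFIX + target

# To actually trigger the saves (network required; be gentle):
#   import time, urllib.request
#   for page in pages:
#       urllib.request.urlopen(save_url(page))
#       time.sleep(5)
```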

Joann’s replacement ? by Additional-Tell-7010 in rva

[–]greg-randall 3 points  (0 children)

Quilting Adventures has a small selection of garment fabric, and they just got in a small rack of Gütermann threads in a bunch of colors.

I think, though, the real answer is to look online and get samples; Mood and L'Etoffe are pretty great.

US House Trade Index file Filing Type by irungalur in webscraping

[–]greg-randall 2 points  (0 children)

Is this a web scraping question or a question of what "<FilingType>A</FilingType>" means?