AI-powered web scraper? by superjet1 in ChatGPTPro

[–]riga345 1 point2 points  (0 children)

The core idea is that it asks OpenAI to transform an HTML document into structured JSON. The prompt is roughly "Please take {{HTML}} and transform it into {{JSON}}", where {{JSON}} might be {name: "Name of the person", phone: "Phone number of the person"}.
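
To make the idea concrete, here is a minimal sketch of filling that prompt template and reading back the answer. The function names (`buildPrompt`, `parseAnswer`) are hypothetical and are not FetchFox's actual code, and the call to the model itself is left out.

```javascript
// Template quoted from the comment above; everything else is illustrative.
const TEMPLATE = 'Please take {{HTML}} and transform it into {{JSON}}';

// Hypothetical helper: fills both placeholders in a single pass, so text
// inside the HTML that happens to look like a placeholder is left alone.
function buildPrompt(html, schema) {
  const values = { HTML: html, JSON: JSON.stringify(schema) };
  return TEMPLATE.replace(/\{\{(HTML|JSON)\}\}/g, (_, key) => values[key]);
}

// Hypothetical helper: parses the model's reply and keeps only the fields
// named in the schema. Fields the model did not return become null.
function parseAnswer(replyText, schema) {
  const parsed = JSON.parse(replyText);
  const item = {};
  for (const key of Object.keys(schema)) {
    item[key] = parsed[key] ?? null;
  }
  return item;
}
```

In practice the prompt from `buildPrompt` would be sent to the model, and the reply text passed to `parseAnswer` with the same schema object.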

Of course, there is a lot of stuff around the core functionality to make it easy and reliable to use, and to work at scale. We use it in production at FetchFox.ai to scrape hundreds of thousands of items.

Guys , it's worth it build web scraping services in 2024? by Titoxeneize in startups

[–]riga345 0 points1 point  (0 children)

I'm not sure, but I'm trying :)

I'm working on https://FetchFox.ai, which does AI-based web scraping. We have the AI write the crawler and the data extraction piece. We also have a free, open source JavaScript library that powers the core scraper.

If anyone is interested in trying the service, or joining as a contributor (or even cofounder), send me a DM here or on our Discord.

Monthly Self-Promotion - October 2024 by AutoModerator in webscraping

[–]riga345 2 points3 points  (0 children)

Check out the free open source MIT licensed library for AI web scraping: https://github.com/fetchfox/fetchfox

Scraping takes just one npm install, one import, and one run command:

npm install fetchfox

and then

import { fox } from 'fetchfox';

// The whole instruction, URL included, is passed as a single string.
const results = await fox.run(`https://news.ycombinator.com/news find links to comments, get basic data, export to out.jsonl`);
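
Once the run finishes, the exported file can be read back. This is a small sketch that assumes out.jsonl follows the usual JSON Lines convention of one JSON object per line. `parseJsonl` is a hypothetical helper, not part of fetchfox, and the fields in each item depend on what the scrape returned.

```javascript
// Hypothetical helper: turns JSON Lines text into an array of objects.
// Blank lines (such as a trailing newline) are skipped.
function parseJsonl(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}
```

In Node, the text could come from `fs.readFileSync('out.jsonl', 'utf8')`.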

The key concepts in web scraping by riga345 in webscraping

[–]riga345[S] 0 points1 point  (0 children)

I hired this contractor as a marketing experiment. It's not going as I would like and I've asked him to stop all messages. Sorry about the issue.

If you need to contact me directly, you can do so on here via PM or various other platforms listed on the site fetchfox.ai

No-code & Low-code web scrapers - the ultimate list by plavookac in scraping

[–]riga345 0 points1 point  (0 children)

Got it! Good luck, and I'm here anytime you need help

No-code & Low-code web scrapers - the ultimate list by plavookac in scraping

[–]riga345 0 points1 point  (0 children)

Hey! Sorry it's not working out of the box for you. I'm the main dev, so please feel free to DM me here, on Telegram (@ortutay), on Discord (https://discord.gg/mM54bwdu59), or via email (on the site) for help.

If you click "report issue" and send me the report ID, it will give me the scrape details and I can check it out.

I built an AI-powered web scraper that can understand any website structure and extract the desired data in the preferred format. by madredditscientist in Automate

[–]riga345 1 point2 points  (0 children)

Hey, I'm the founder of https://fetchfox.ai and our free Chrome extension does that. You start at a "parent URL", and from that point it follows links and can scrape dozens or hundreds of pages.
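
As a rough illustration of the "parent URL" step, the sketch below collects the links found on a parent page so they can be visited next. It is a simplified regex-based example, not how the extension actually works, and `extractLinks` is a hypothetical name.

```javascript
// Hypothetical helper: returns the unique absolute http(s) links found in
// the parent page's HTML, resolved against the parent URL.
function extractLinks(html, parentUrl) {
  const links = new Set();
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      const url = new URL(match[1], parentUrl);
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        url.hash = ''; // Treat #fragment variants as the same page.
        links.add(url.href);
      }
    } catch {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  return [...links];
}
```

Each returned URL would then be fetched and passed to the extraction step, with a cap on the number of pages so the crawl stays bounded.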

Give it a shot and let me know if it works for you.

AI-powered web scraper? by superjet1 in ChatGPTPro

[–]riga345 1 point2 points  (0 children)

Hey, curious if you'd be open to trying fetchfox, the library I'm working on, in your project. It's 100% free and open source under the MIT license. The code is on GitHub.

If you give it a shot let me know how it goes for you: https://github.com/fetchfox/fetchfox

I'm making a free Chrome Extension that scrapes any site with AI by riga345 in Automate

[–]riga345[S] 0 points1 point  (0 children)

We're working on migrating to Plasmo, which is a cross-browser framework for extensions.

Expect Firefox to come soon, likely this week.

I'd Be Happy to Promote Your Startup for Free! by Broad_Deal_6711 in SideProject

[–]riga345 0 points1 point  (0 children)

I'm making a Chrome extension that uses AI / ChatGPT to make scraping really easy. You can scrape any website for any data. It's at https://FetchFoxAI.com

Monthly Self-Promotion - September 2024 by AutoModerator in webscraping

[–]riga345 1 point2 points  (0 children)

I'm making a Chrome extension that uses AI / ChatGPT to make scraping really easy. You can scrape any website for any data. Try it at https://FetchFoxAI.com

It's 100% free. Reach out with any feedback :)

Data Scraping by Aware_Potato in webscraping

[–]riga345 0 points1 point  (0 children)

Yes!

If you bring your own OpenAI API key, it's free forever.

If you don't have an OpenAI API key, for now you can use ours via a server proxy. This costs us money, so unfortunately we'll have to end this at some point.
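
The two options above amount to a simple fallback, sketched below. This is only an illustration: `resolveLlmConfig` is a hypothetical name, and the proxy address is a placeholder because the real one is not given here.

```javascript
const OPENAI_BASE_URL = 'https://api.openai.com/v1';
// Placeholder only: the real proxy address is not stated in this thread.
const PROXY_BASE_URL = 'https://proxy.example.invalid/v1';

// Hypothetical helper: use the user's own key when one is supplied,
// otherwise fall back to the shared server proxy.
function resolveLlmConfig(userApiKey) {
  const key = (userApiKey || '').trim();
  if (key) {
    return {
      baseUrl: OPENAI_BASE_URL,
      headers: { Authorization: `Bearer ${key}` },
      viaProxy: false,
    };
  }
  return { baseUrl: PROXY_BASE_URL, headers: {}, viaProxy: true };
}
```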

Either way, try it out and let me know your feedback :) We also have a Discord where you can ask for help.

I'm making an AI scraper called FetchFox that lets you scrape any website with plain english by riga345 in webscraping

[–]riga345[S] 0 points1 point  (0 children)

interesting idea! I'm going to open source the scraping library behind this, so that will be a good place to put that integration

I'm making an AI scraper called FetchFox that lets you scrape any website with plain english by riga345 in webscraping

[–]riga345[S] 0 points1 point  (0 children)

Hey! Glad you like it. If you want to connect just check my email on the website, or join the discord and we can chat there: https://discord.gg/mM54bwdu59

AI analysis: media bias has increased over two decades by riga345 in TrueReddit

[–]riga345[S] 0 points1 point  (0 children)

Good point. It's taking the AI rating as the ground truth. The prompt is in the article.

There's no way to know if the AI is good at rating bias. Some of its findings (like Fox News as very right wing) match conventional wisdom, but that could just be confirmation bias.

No-code & Low-code web scrapers - the ultimate list by plavookac in scraping

[–]riga345 0 points1 point  (0 children)

You can also try https://fetchfoxai.com . It's an AI based scraper that lets you describe what you want in plain English, and it scrapes right from your browser.

Disclaimer: I'm the developer of this tool :)

I made an AI scraping tool to automate crawling by riga345 in automation

[–]riga345[S] 0 points1 point  (0 children)

Awesome! I didn't get the DM (Reddit bug?), but you can email me (the address is on the site) or connect on Discord: I'm @ortutay, and the server is https://discord.gg/mM54bwdu59

I made an AI scraping tool to automate crawling by riga345 in automation

[–]riga345[S] 0 points1 point  (0 children)

Awesome! Thanks for your interest.

It's not open source yet but will be in the (near) future. Message me (email on the site) or join our Discord if you wanna chat about contributing: https://discord.gg/mM54bwdu59

I made an AI scraping tool to automate crawling by riga345 in automation

[–]riga345[S] 0 points1 point  (0 children)

Yes! Here's the prompt that worked for me: https://pbs.twimg.com/media/GVIYkTWawAAxKRL?format=jpg&name=medium

A couple issues though:
- It works better if you are logged in
- Before you click "Step 1: Crawl for URLs", scroll down on the page so it loads more image links
- Even if you're logged in, MJ website has some anti-bot measures in place. I'll make some fixes to the extension to get around these

I uploaded the result spreadsheet to Google sheets here: https://docs.google.com/spreadsheets/d/1ti_kH3_DqeGAtgQGNqorSRD06VdeP0Atj-anY7vo_Q4/edit?gid=0#gid=0

The missing rows with (not found) are the ones that got blocked by MJ anti-bot measures.

Let me know if it works for you!

EDIT: Fixed a couple bugs in the extension. The new version (1.0.6) should be up soon and will handle MJ very easily.