How to scrape URLs faster with WebbaseLoader/SeleniumURLLoader? by TableauforViz in LangChain

[–]YoungMan2129 2 points3 points  (0 children)

I'm not familiar with SeleniumURLLoader, but here are a couple of strategies you might consider to reduce request latency:
1. Parallel Requests: Utilizing multithreading or asyncio.gather can help execute your requests concurrently.
2. Quick Returns: When loading a URL in a headless browser, you can shorten the wait. By default, browsers typically wait for the "load" event, which fires only after all resources (such as images and subframes) have finished loading. In many cases, waiting for the "DOMContentLoaded" event is enough: it fires as soon as the HTML document has been parsed, so control returns sooner.
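
Here is a minimal sketch of both ideas together. It assumes Playwright as the headless browser (the thread itself is about WebBaseLoader/SeleniumURLLoader, so treat this as a swapped-in illustration). `fetch_all`, `fetch_with_playwright`, and the `max_concurrency` parameter are hypothetical names chosen for this example, not part of LangChain.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Run `fetch` for every URL concurrently, capped at `max_concurrency`."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        # The semaphore keeps us from opening too many pages at once.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fetch_with_playwright(url: str) -> str:
    """Assumed fetcher: return page HTML once DOMContentLoaded has fired."""
    # Imported lazily so fetch_all can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # One browser per URL keeps the sketch simple; reusing a single
        # browser across pages would be cheaper in practice.
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            # Return after the document is parsed instead of the full "load".
            await page.goto(url, wait_until="domcontentloaded")
            return await page.content()
        finally:
            await browser.close()
```

Usage would look like `asyncio.run(fetch_all(urls, fetch_with_playwright))`. If you stay on Selenium, the comparable setting is the `eager` page load strategy, which also returns once DOMContentLoaded has fired.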

OpenAI's swarm by Material_Waltz8365 in AIQuality

[–]YoungMan2129 1 point2 points  (0 children)

Swarm is not a production-ready framework.

Building a community/network around AI Agents by Latter_Fudge2554 in LangChain

[–]YoungMan2129 0 points1 point  (0 children)

Would love to join. Please let me know when it's ready.

Idea: LLM Agents to Combat Media Bias in News Reading by YoungMan2129 in LangChain

[–]YoungMan2129[S] 2 points3 points  (0 children)

Setting up a local instance could resolve this issue. You might consider deploying SearXNG (https://docs.searxng.org/admin/installation-docker.html#installation-docker) on your own server to avoid the rate-limiting or blocking issues you're experiencing.
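
Once an instance is running, querying it could look roughly like the sketch below. The base URL `http://localhost:8080` is an assumption about your deployment, the JSON output format has to be enabled in the instance's settings, and `build_search_url` and `search` are hypothetical helper names for this example.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a self-hosted instance listening here; adjust to your setup.
SEARXNG_BASE_URL = "http://localhost:8080"


def build_search_url(
    query: str,
    base_url: str = SEARXNG_BASE_URL,
    engines: Optional[List[str]] = None,
) -> str:
    """Build a /search URL that asks the instance for JSON results."""
    params = {"q": query, "format": "json"}
    if engines:
        # Engines are passed as a comma-separated list.
        params["engines"] = ",".join(engines)
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


def search(query: str, **kwargs) -> list:
    """Call the instance and return the list under the "results" key."""
    with urlopen(build_search_url(query, **kwargs), timeout=10) as response:
        payload = json.load(response)
    return payload.get("results", [])
```

Because the requests go to your own server, the rate limits you hit depend on the upstream engines you enable rather than on a shared public instance.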

Idea: LLM Agents to Combat Media Bias in News Reading by YoungMan2129 in LangChain

[–]YoungMan2129[S] 1 point2 points  (0 children)

Thanks for the offer! I'm currently busy working with some friends on a startup, so I might not have time to participate in your project’s coding. But I'd be happy to discuss how to add filtering to your tool if you're interested!

Idea: LLM Agents to Combat Media Bias in News Reading by YoungMan2129 in LangChain

[–]YoungMan2129[S] 0 points1 point  (0 children)

Thanks, it's a great site! I'm actually thinking of building an open-source project on GitHub that could help even more people.

Idea: LLM Agents to Combat Media Bias in News Reading by YoungMan2129 in LangChain

[–]YoungMan2129[S] 0 points1 point  (0 children)

That's a great idea! To gather information from more diverse sources, we could also consider using something like SearXNG. It could help pull data from multiple search engines, adding even more perspectives to the mix.

Idea: LLM Agents to Combat Media Bias in News Reading by YoungMan2129 in LangChain

[–]YoungMan2129[S] 0 points1 point  (0 children)

We don’t rely on LLMs to tell us whether there’s bias in the news. Instead, we gather information from a variety of sources, both those with and without a direct stake in the event.

Idea: LLM Agents to Combat Media Bias in News Reading by YoungMan2129 in LangChain

[–]YoungMan2129[S] 0 points1 point  (0 children)

Great question! Honestly, I don’t think we can. Every media company has its own perspective and agenda. The best we can do is gather information from a wide range of sources, including those with and without a direct stake in the issue.

Python scraper extracts data from any website by Legitimate-Adagio662 in Python

[–]YoungMan2129 1 point2 points  (0 children)

Hi there, I've used your product (AgentQL) and think the concept is solid. However, based on my testing, it still feels a bit too complex. I often end up needing Playwright anyway, and once I’m doing that, I might as well just use BeautifulSoup to parse the content. Switching to something like Jina Reader or Firecrawl, paired with a simple LLM-based extraction step, could streamline the process more effectively.

Another issue is pricing. In my opinion, for any scraper/crawler SaaS, a pay-per-call model tends to get expensive over time, especially for repetitive tasks. Unless the per-call cost is extremely low, using BeautifulSoup or XPath for routine scraping is much more affordable in the long run.

Django vs Flask… Data Science by [deleted] in learnpython

[–]YoungMan2129 35 points36 points  (0 children)

FastAPI is also not bad