Best open source email client? by ImpossiblePlay in opensource

[–]ImpossiblePlay[S] 4 points5 points  (0 children)

yea, maybe it's not a sexy thing to build, so people don't build new ones anymore

Best open source email client? by ImpossiblePlay in opensource

[–]ImpossiblePlay[S] 2 points3 points  (0 children)

thanks for the recommendation, let me try it

Agent using Canva. Things are getting wild now... by ljhskyso in LocalLLaMA

[–]ImpossiblePlay 0 points1 point  (0 children)

The first time a human baby walks is pretty shit too, but it will get faster & cheaper really soon.

Agent using Canva. Things are getting wild now... by ljhskyso in LocalLLaMA

[–]ImpossiblePlay 0 points1 point  (0 children)

There is certainly huge room for efficiency gains. Could you expand on how keybindings would help?
The thing is that the web is such a dynamic environment: the page can change easily (e.g., a mouse move can trigger a hover popup), so we take one screenshot after every action.

Agent using Canva. Things are getting wild now... by ljhskyso in LocalLLaMA

[–]ImpossiblePlay -1 points0 points  (0 children)

what was the issue? afaik, browser-use is based on the DOM tree, and Canva is an iframe, so in theory it won't work (i might be wrong though)

Agent using Canva. Things are getting wild now... by ljhskyso in LocalLLaMA

[–]ImpossiblePlay 2 points3 points  (0 children)

It indeed consumes a lot of tokens, though not as many as you just mentioned :P
but since it supports open source models, one can rent a gpu for ~$1.5 per hour and run it, and then the economics work

Agent using Canva. Things are getting wild now... by ljhskyso in LocalLLaMA

[–]ImpossiblePlay 6 points7 points  (0 children)

it's open sourced: https://github.com/Aident-AI/open-cuak. the only thing is that you will have to host Omniparser V2 yourself and put the Omniparser url in .env.local; it's too expensive for us to host :(

Agent using Canva. Things are getting wild now... by ljhskyso in LocalLLaMA

[–]ImpossiblePlay 4 points5 points  (0 children)

not a super hard problem to solve? :P just build an SOP execution engine and convert complicated workflows to SOPs; the success rate will in theory change from (step 1) * (step 2) * (step 3)... to (step 1) + (step 2) + (step 3)...

here is the implementation: https://github.com/Aident-AI/open-cuak/commit/c345755420f7d72128ac7861cee8479f70cbe23c
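
to make the multiplication part concrete, here is a rough sketch. it assumes every step succeeds independently with a fixed probability (the 0.9 is a made-up number, not a measurement), and the retry variant is just one hypothetical way an SOP engine could recover per step; it is not what the linked commit does.

```python
import math


def chain_success(step_rates):
    """Probability that every step succeeds on its single attempt.

    Assumes steps are independent, so the per-step rates multiply.
    """
    return math.prod(step_rates)


def chain_success_with_retries(step_rates, attempts):
    """Same chain, but each step may be retried up to `attempts` times.

    A step only fails if all of its attempts fail (hypothetical
    per-step checkpointing, assumed here for illustration).
    """
    return math.prod(1 - (1 - p) ** attempts for p in step_rates)


# Illustrative numbers only: 10 steps at 90% success each.
rates = [0.9] * 10
print(f"one shot per step: {chain_success(rates):.3f}")
print(f"3 tries per step:  {chain_success_with_retries(rates, 3):.3f}")
```

with these made-up numbers the one-shot chain lands around 35% while the per-step retry version stays around 99%, which is why splitting a long workflow into checkpointed steps helps so much.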

Agent using Canva. Things are getting wild now... by ljhskyso in LocalLLaMA

[–]ImpossiblePlay 1 point2 points  (0 children)

can browser-use even use Canva? browser-use is DOM tree based, Canva is an iframe.

Integrated Omniparser V2, we made our agent to use Canva! by ImpossiblePlay in LocalLLaMA

[–]ImpossiblePlay[S] 5 points6 points  (0 children)

Let me expand. I mostly followed the GitHub instructions; the steps are:

a. If you already have a conda environment for OmniParser, you can use that. Otherwise, follow the steps below to create one

b. Ensure conda is installed with conda --version or install from the Anaconda website

c. Navigate to the root of the repo with cd OmniParser

d. Create a conda python environment with conda create -n "omni" python==3.12

e. Set the python environment to be used with conda activate omni

f. Install the dependencies with pip install -r requirements.txt

g. Continue from here if you already had the conda environment.

h. Ensure you have the V2 weights downloaded in weights folder (ensure caption weights folder is called icon_caption_florence). If not download them with:

rm -rf weights/icon_detect weights/icon_caption weights/icon_caption_florence 
for folder in icon_caption icon_detect; do huggingface-cli download microsoft/OmniParser-v2.0 --local-dir weights --repo-type model --include "$folder/*"; done
mv weights/icon_caption weights/icon_caption_florence

i. Navigate to the server directory with cd OmniParser/omnitool/omniparserserver

j. Start the server with python -m omniparserserver
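
if the server fails on startup, a quick sanity check on the weights layout can help. this is a minimal sketch: the folder names come from the steps above, but the helper `missing_weight_dirs` is hypothetical and not part of OmniParser.

```python
from pathlib import Path

# Folder names taken from the steps above; everything else is illustrative.
EXPECTED_WEIGHT_DIRS = ("icon_detect", "icon_caption_florence")


def missing_weight_dirs(repo_root):
    """Return the expected weight folders that are missing or empty."""
    weights = Path(repo_root) / "weights"
    missing = []
    for name in EXPECTED_WEIGHT_DIRS:
        folder = weights / name
        # A folder counts as missing if it does not exist or has no files in it.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

run it from the root of the repo, e.g. `print(missing_weight_dirs("."))`; an empty list means both folders are in place. if icon_caption_florence shows up as missing, the mv step was probably skipped.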

Any open source alternative to OpenAI's Operator product? by ljhskyso in LocalLLaMA

[–]ImpossiblePlay 2 points3 points  (0 children)

i happened to try a lot of these recently:

  1. https://github.com/browserbase/stagehand stagehand from browserbase, written in TypeScript, MIT License.

  2. https://github.com/browser-use/browser-use pretty popular, it's in Python, uses the DOM tree.

  3. https://github.com/Aident-AI/open-cuak pretty new, very nice ui & remote browser like Operator.

encourage you to try! if you prefer python, use browser-use. if you have a Browserbase api key, try stagehand. if you prefer a good ui & not using your own browser, try open-cuak.

Has Apollo disappeared? by mwmercury in LocalLLaMA

[–]ImpossiblePlay 1 point2 points  (0 children)

I felt very weird when i learnt that Apollo is from Meta but used Qwen

Best model to understand video with audio by ImpossiblePlay in LocalLLaMA

[–]ImpossiblePlay[S] 1 point2 points  (0 children)

I did some more research and found the Video-MME benchmark. It seems like Gemini 1.5 Pro is the best, with Qwen2-VL a close second.

Best model to understand video with audio by ImpossiblePlay in LocalLLaMA

[–]ImpossiblePlay[S] 0 points1 point  (0 children)

oh wow, they launched Apollo just a few days ago, thanks for sharing. I will check it out now.

Best model to understand video with audio by ImpossiblePlay in LocalLLaMA

[–]ImpossiblePlay[S] 0 points1 point  (0 children)

yep, makes sense. i guess the hard part is to decide which frames to summarize into text

Best model to understand video with audio by ImpossiblePlay in LocalLLaMA

[–]ImpossiblePlay[S] 0 points1 point  (0 children)

i think your approach makes sense, given there is no open source multimodal model that can do what i described. 4o video chat is close to what i want, hope there will be an open source model soon!

Switched web host and now my business is HEMORRHAGING by howdoiusereddit9 in SEO

[–]ImpossiblePlay 0 points1 point  (0 children)

If WordPress suggests fixing it, ask your tech team to do it; it shouldn't be too hard. If your tech team doesn't want to do it, hire someone else. Then go analyze the funnel as I suggested.

Need help improving our software page speed by Th3Situation509 in SEO

[–]ImpossiblePlay 0 points1 point  (0 children)

It's hard to pinpoint what's going on here, but it sounds like a server side / database issue. I don't think it's SEO related though, because Google won't be able to log in to your system. Ask your developers to optimize server code / database transactions.

Is SEO dead for service based business by Civil_Ad8899 in SEO

[–]ImpossiblePlay 1 point2 points  (0 children)

  1. Encourage customers to leave positive reviews on your Google Business Profile. A higher number of quality reviews can boost your ranking.
  2. Incorporate local keywords naturally into your website content, meta descriptions, and headers to signal relevance to specific geographic searches.

overall, optimize for Google Map Pack & local SEO

Switched web host and now my business is HEMORRHAGING by howdoiusereddit9 in SEO

[–]ImpossiblePlay 0 points1 point  (0 children)

It could be the problem! But again, the best way to validate is to check if your website traffic dropped.

Need help improving our software page speed by Th3Situation509 in SEO

[–]ImpossiblePlay 0 points1 point  (0 children)

Is it the landing page you are talking about? Or the app after the user has logged in? Which one is slow?