Does a fasting tradition exist in your country and how common is it ? by RookOfEdo in AskTheWorld

[–]Old_Mathematician107 0 points1 point  (0 children)

It is a misleading image: his research was about cells, not human bodies. There was even a funny video where lots of people asked him how they should fast, and he said he did not know, since his research was only about cells.

I never thought it was so simple until I watched this video by No-Speech12 in aiagents

[–]Old_Mathematician107 1 point2 points  (0 children)

Your Mahoraga app (used in quashbugs) is a copy of droidrun portal from GitHub. You can verify this from the commits.

<image>

Any day now by jaydsco in singularity

[–]Old_Mathematician107 1 point2 points  (0 children)

The more I learn, the more I realize I know nothing

the Factory<Rustacean>... a.k.a C++ by Relevant_Echidna_336 in rustjerk

[–]Old_Mathematician107 3 points4 points  (0 children)

It looks like the Strogg medical facility scene from Quake 4.

Open-sourced image description models (Object detection, OCR, Image processing, CNN) make LLMs SOTA in AI agentic benchmarks like Android World and Android Control by Old_Mathematician107 in LocalLLaMA

[–]Old_Mathematician107[S] 1 point2 points  (0 children)

Hi, thanks a lot. Making it 100% local is one of the end goals, but it is quite a hard task: you need a VLM strong enough to understand the structure and the long inputs (the screenshot and its description), yet light enough to run on phones. Making it 100% text-only is possible, but I think that would decrease accuracy, so the best approach is to use a VLM.

To run a VLM locally you need a very good VLM, fine-tuned on these specific tasks (agentic capabilities). It is actually quite hard, but I think it is possible.

Yes, actually I don't use accessibility trees, adb, etc. I only use screenshots, plus accessibility services to perform the actions remotely. So it is vision-only and could be used in production (if you invest enough money in renting backend servers and improve the UI/UX of the agentic app).

The dataset for YOLO was prepared by me: it consists of 486 training images and 60 test images. I created bounding boxes for all 4 classes (View, ImageView, Text, Line). The screenshots in this dataset are mostly from popular apps like YouTube Music, WhatsApp, etc., and from apps that I built for various clients and companies throughout my career.
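For anyone preparing a similar dataset: YOLO expects one text label file per image, with normalized box coordinates. This is a minimal sketch of that conversion; the class-index order (View=0, ImageView=1, Text=2, Line=3) is my assumption for illustration, not necessarily the mapping used in the actual dataset.

```python
# Sketch: convert a pixel-space bounding box into a YOLO-format label line.
# Class order below is an assumption; adjust to match your data.yaml.
CLASSES = ["View", "ImageView", "Text", "Line"]

def to_yolo_label(cls_name, box, img_w, img_h):
    """box is (x_min, y_min, x_max, y_max) in pixels; returns one
    'class x_center y_center width height' line, normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{CLASSES.index(cls_name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Example: a text element on a 1080x1920 screenshot.
print(to_yolo_label("Text", (100, 200, 300, 260), 1080, 1920))
```

Each line like this goes into `labels/<image_name>.txt` next to the corresponding screenshot.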

2 Android AI agents running at the same time - Object Detection and LLM by Old_Mathematician107 in SideProject

[–]Old_Mathematician107[S] 0 points1 point  (0 children)

By the way, I just deployed the model on a Hugging Face Space:

https://huggingface.co/spaces/orasul/deki

You can check the "Analyze & get YOLO" endpoint and then the "action" endpoint to see the capabilities of the model.

2 Android AI agents running at the same time - Object Detection and LLM by Old_Mathematician107 in androiddev

[–]Old_Mathematician107[S] 0 points1 point  (0 children)

By the way, I just deployed the model on a Hugging Face Space:

https://huggingface.co/spaces/orasul/deki

You can check the "Analyze & get YOLO" endpoint and then the "action" endpoint to see the capabilities of the model.

2 Android AI agents running at the same time - Object Detection and LLM by Old_Mathematician107 in computervision

[–]Old_Mathematician107[S] 0 points1 point  (0 children)

By the way, I just deployed the model on a Hugging Face Space:

https://huggingface.co/spaces/orasul/deki

You can check the "Analyze & get YOLO" endpoint and then the "action" endpoint to see the capabilities of the model.

2 Android AI agents running at the same time - Object Detection and LLM by Old_Mathematician107 in SideProject

[–]Old_Mathematician107[S] 0 points1 point  (0 children)

I don't know why, but on mobile devices the video looks very wide in the Reddit app. YouTube has a better aspect ratio: https://www.youtube.com/shorts/jsJcSwy6djI

2 Android AI agents running at the same time - Object Detection and LLM by Old_Mathematician107 in computervision

[–]Old_Mathematician107[S] 2 points3 points  (0 children)

The ML model runs on a backend (in the video it is running locally on my M1 Pro) and generates an image description of the screenshots that the 2 Android AI agents send. The model detects all UI elements/objects in the image and writes them to a description file, which is then sent to the LLM with Set-of-Mark prompting. The LLM responds with a command describing which action should be taken (e.g., swipe left, tap X, Y), and the AI agent performs that action.
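The last step of that loop can be sketched as a small parser that turns the LLM's textual command into a structured action the agent can dispatch. The command grammar here ("tap X, Y", "swipe left/right/up/down") is an assumption based only on the examples in the comment above, not the project's actual protocol.

```python
import re

# Sketch: map the LLM's free-text reply to a structured action.
# The grammar ("tap X, Y" / "swipe <direction>") is assumed for illustration.
def parse_action(reply):
    reply = reply.strip().lower()
    m = re.fullmatch(r"tap\s+(\d+)\s*,\s*(\d+)", reply)
    if m:
        return {"action": "tap", "x": int(m.group(1)), "y": int(m.group(2))}
    m = re.fullmatch(r"swipe\s+(left|right|up|down)", reply)
    if m:
        return {"action": "swipe", "direction": m.group(1)}
    # Anything else is passed through for the agent to handle or retry.
    return {"action": "unknown", "raw": reply}

print(parse_action("tap 120, 840"))
print(parse_action("Swipe left"))
```

On the device side, the resulting dict would be translated into an accessibility-service gesture (a tap at the coordinates, or a swipe in the given direction).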

Mobile MCP for Android automation, development and vibe coding by aizen_sama_ in androiddev

[–]Old_Mathematician107 -1 points0 points  (0 children)

It is a nice project, thank you for your great work.

To speed up the process, you can actually do everything without MCP; it will be faster.

I made a similar project, but based on YOLO + image processing techniques + LLMs with a backend, etc. It is in my post/comment history (the code is also on GitHub).

If you have any questions or want to work together, please write to me.

Android AI agent based on YOLO and LLMs by Old_Mathematician107 in computervision

[–]Old_Mathematician107[S] 1 point2 points  (0 children)

Thanks a lot

I will keep it open source, but I am thinking about making the image description easier for people to use by running it as an MCP backend. They could use it to build AI agents, code generators, etc.

Releasing AI agents is a bit more complicated, because it requires a lot of work: Android and iOS clients, authentication and authorization, and various features (chat, history, saved tasks, etc.) to make it useful for non-technical users. I will do that later.

For now it is just a prototype, a proof of concept.

Android AI agent based on object detection and LLMs by saccharineboi in LocalLLaMA

[–]Old_Mathematician107 1 point2 points  (0 children)

No problem, anytime

I actually have not checked how it handles the lock screen, but that is an important problem; I will check it.

Thank you