Hey HN,

I made Droidrun, an open-source tool that lets AI agents execute tasks directly in Android apps by mimicking human interaction.
It allows you to build agents that interact with Android UI elements using natural language prompts or goals.
This lets you build custom mobile app automation and scraping workflows without needing app-specific APIs or complex reverse engineering.
I think a lot of people are going to try to build their own mobile agents from scratch, so the idea is to provide the groundwork/library for the hard parts, sparing developers from repeating these steps (a rough sketch of this plumbing follows the list):
- Parse the Android UI structure in an LLM-friendly way (identifying elements, states, and descriptions).
- Provide robust primitives for interacting with any UI element (taps, swipes, text input).
- Create reusable components for mobile agent perception and action.
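To make those hard parts concrete, here is a minimal sketch of what that plumbing can look like when built directly on adb and uiautomator. To be clear, this is not Droidrun's code; the function names and the clickable-only filter are assumptions made for illustration, just to show the kind of groundwork the library is meant to save you from rewriting:

    import re
    import subprocess
    import xml.etree.ElementTree as ET

    def adb(*args: str) -> str:
        # Run an adb command against the connected device and return its stdout.
        return subprocess.run(["adb", *args], capture_output=True, text=True, check=True).stdout

    def dump_ui() -> list[dict]:
        # Perception: dump the current UI hierarchy with uiautomator and flatten it
        # into a compact, LLM-friendly list of interactive elements.
        adb("shell", "uiautomator", "dump", "/sdcard/window_dump.xml")
        xml = adb("shell", "cat", "/sdcard/window_dump.xml")
        elements = []
        for node in ET.fromstring(xml).iter("node"):
            if node.get("clickable") != "true":  # keep only elements the agent can act on
                continue
            x1, y1, x2, y2 = map(int, re.findall(r"\d+", node.get("bounds", "")))
            elements.append({
                "index": len(elements),
                "text": node.get("text") or node.get("content-desc"),
                "resource_id": node.get("resource-id"),
                "center": [(x1 + x2) // 2, (y1 + y2) // 2],
            })
        return elements

    def tap(element: dict) -> None:
        # Action primitive: tap the center of an element.
        x, y = element["center"]
        adb("shell", "input", "tap", str(x), str(y))

    def type_text(text: str) -> None:
        # Action primitive: type into the focused field (adb needs spaces escaped as %s).
        adb("shell", "input", "text", text.replace(" ", "%s"))

    def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> None:
        # Action primitive: swipe between two points, e.g. to scroll a feed.
        adb("shell", "input", "swipe", str(x1), str(y1), str(x2), str(y2), str(duration_ms))

Even this toy version hides edge cases (waiting for the UI to settle, stale dumps, scrollable containers, keyboards), which is exactly the repetitive work we want the library to absorb.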
To better showcase the power of direct UI interaction, I made a few demos on our X and GitHub:
- Scroll TikTok, search for cat videos, and comment “I love cats”
- Go to Amazon, extract the top 3 headphone products, and send them via WhatsApp to Chris
- Go on X and make a post: “hello world”
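For a sense of how goals like these map onto the primitives sketched above, here is a rough perceive, decide, act loop. Again, this is not Droidrun's API; the JSON action schema, the prompt, and the model name are assumptions made purely for illustration:

    import json

    from openai import OpenAI  # any LLM client works; OpenAI is only an assumption here

    client = OpenAI()
    SYSTEM = (
        "You control an Android phone. Given a goal and the on-screen elements, "
        'reply with JSON only: {"action": "tap"|"type"|"swipe"|"done", "index": int, "text": str}.'
    )

    def run_goal(goal: str, max_steps: int = 20) -> None:
        # Tiny perceive -> decide -> act loop over the primitives from the earlier sketch.
        for _ in range(max_steps):
            elements = dump_ui()  # perceive
            reply = client.chat.completions.create(
                model="gpt-4o",  # model choice is an assumption
                messages=[
                    {"role": "system", "content": SYSTEM},
                    {"role": "user", "content": f"Goal: {goal}\nElements: {json.dumps(elements)}"},
                ],
            )
            step = json.loads(reply.choices[0].message.content)  # decide
            if step["action"] == "done":
                break
            if step["action"] == "tap":  # act
                tap(elements[step["index"]])
            elif step["action"] == "type":
                type_text(step["text"])
            elif step["action"] == "swipe":
                swipe(540, 1600, 540, 400)  # scroll down; coordinates assume a 1080x2400 screen

    run_goal('Go on X and make a post: "hello world"')

Most of the difficulty lives around this loop (recovering from stale elements, deciding when a step actually finished, keeping the element list small enough for the context window), and that is the part we want the library to package up.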
We are a team of four and just open-sourced this after getting great initial interest (900+ waitlist signups within 72 hours, and 1.5k stars in the 24 hours after the GitHub launch).