give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.
    $ bun run src/kernel.ts
    enter your goal: open youtube and search for "lofi hip hop"

    --- step 1/30 ---
    think: i'm on the home screen. launching youtube.
    action: launch (842ms)

    --- step 2/30 ---
    think: youtube is open. tapping search icon.
    action: tap (623ms)

    --- step 3/30 ---
    think: search field focused.
    action: type "lofi hip hop" (501ms)

    --- step 4/30 ---
    action: enter (389ms)

    --- step 5/30 ---
    think: search results showing. done.
    action: done (412ms)
every goal runs as a loop: dump the accessibility tree, filter to interactive elements, send them to an llm, execute the chosen action, repeat.
captures the screen via uiautomator dump and parses the accessibility xml into tappable elements with coordinates and state.
sends screen state + goal to an llm. the model returns think, plan, action - it explains its reasoning before acting.
executes the chosen action via adb - tap, type, swipe, launch, press back. 22 actions available.
if the screen doesn't change for 3 steps, stuck recovery kicks in. if the accessibility tree comes back empty, the agent falls back to screenshots.
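stripped of retries and logging, the loop fits in a screen of code. a hedged sketch of its shape - every helper name in `Deps` is an illustrative stand-in, not a real export of kernel.ts, and the real loop also covers the 22 actions, retries, and session logging:

```ts
// a minimal sketch of the perceive -> think -> act loop described above.
// every name in Deps is an illustrative stand-in, not a real export of kernel.ts.
interface UiElement { text: string; x: number; y: number; clickable: boolean }
interface Decision { think?: string; plan?: string; action: string }

interface Deps {
  dumpAccessibilityTree(): Promise<UiElement[]>;    // uiautomator dump -> parsed, filtered elements
  captureScreenshot(): Promise<Uint8Array>;         // vision fallback when the tree is empty
  askLLM(input: { goal: string; elements: UiElement[]; screenshot?: Uint8Array; stuck: boolean }): Promise<Decision>;
  executeAction(decision: Decision): Promise<void>; // tap / type / swipe / launch / back ... via adb
}

export async function runGoal(goal: string, deps: Deps, maxSteps = 30, stuckThreshold = 3): Promise<void> {
  let lastScreen = "";
  let stuckCount = 0;

  for (let step = 1; step <= maxSteps; step++) {
    const elements = await deps.dumpAccessibilityTree();
    const screen = JSON.stringify(elements);

    // stuck recovery: the screen hasn't changed for stuckThreshold steps in a row
    stuckCount = screen === lastScreen ? stuckCount + 1 : 0;
    lastScreen = screen;

    // empty accessibility tree -> fall back to a screenshot
    const screenshot = elements.length === 0 ? await deps.captureScreenshot() : undefined;

    const decision = await deps.askLLM({ goal, elements, screenshot, stuck: stuckCount >= stuckThreshold });
    console.log(`--- step ${step}/${maxSteps} ---`);
    if (decision.think) console.log(`think: ${decision.think}`);
    console.log(`action: ${decision.action}`);

    if (decision.action === "done") return;
    await deps.executeAction(decision);
  }
}
```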
type a goal, chain goals across apps with ai, or run deterministic steps with no llm calls.
run it and describe what you want. the agent figures out the rest.
    $ bun run src/kernel.ts
    enter your goal: send "running late, 10 mins" to Mom on whatsapp
chain goals across multiple apps. natural language steps, the llm navigates.
    {
      "name": "weather to whatsapp",
      "steps": [
        { "app": "com.google...", "goal": "search chennai weather" },
        { "goal": "share to Sanju" }
      ]
    }
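save it as a json file and pass it to the kernel with the --workflow flag, e.g. `bun run src/kernel.ts --workflow weather-to-whatsapp.json` (the filename here is just an example).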
fixed taps and types. no llm, instant execution. for repeatable tasks.
    appId: com.whatsapp
    name: Send WhatsApp Message
    ---
    - launchApp
    - tap: "Contact Name"
    - type: "hello from pocketagent"
    - tap: "Send"
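each step in a flow like this maps to one or two adb commands. a rough sketch of the idea - this is an illustration, not the actual flow.ts, and `findElementCenter` stands in for whatever parses the uiautomator dump and locates an element by its visible text:

```ts
// illustrative sketch of a deterministic step runner - not the real flow.ts.
import { $ } from "bun";

type FlowStep =
  | "launchApp"
  | { tap: string }
  | { type: string };

async function runStep(
  appId: string,
  step: FlowStep,
  // hypothetical helper: finds an element by text in the uiautomator dump
  findElementCenter: (text: string) => Promise<{ x: number; y: number }>,
) {
  if (step === "launchApp") {
    // launch the app by package id
    await $`adb shell monkey -p ${appId} -c android.intent.category.LAUNCHER 1`;
  } else if ("tap" in step) {
    // look up the element by its visible text, then tap its center
    const { x, y } = await findElementCenter(step.tap);
    await $`adb shell input tap ${x} ${y}`;
  } else if ("type" in step) {
    // type into the currently focused field (spaces must be escaped as %s for input text)
    await $`adb shell input text ${step.type.replace(/ /g, "%s")}`;
  }
}
```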
delegate to on-device ai apps, control phones remotely, turn old devices into always-on agents.
open google's ai mode, ask a question, grab the answer, forward it to whatsapp. or ask chatgpt something and share the response to slack. the agent uses apps on your phone as tools - no api keys needed for those services.
install tailscale on phone + laptop. connect adb over the tailnet. your phone is now a remote agent - control it from anywhere. run workflows from a cron job at 8am every morning.
    # from anywhere:
    adb connect <phone-tailscale-ip>:5555
    bun run src/kernel.ts --workflow morning.json
that android in a drawer can now send standups to slack, check flight prices, digest telegram channels, forward weather to whatsapp. it runs apps that don't have apis.
unlike predefined button flows, the agent actually thinks. if a button moves, a popup appears, or the layout changes - it adapts. it reads the screen, understands context, and makes decisions.
across any app installed on the device.
22 actions + 6 multi-step skills. here's the reality.
one command. installs bun and adb if missing, clones the repo, sets up .env.
curl -fsSL https://pa.rpaby.pw/install.sh | sh
or do it manually:
    # install adb
    brew install android-platform-tools

    # install bun (required - npm/node won't work)
    curl -fsSL https://bun.sh/install | bash

    # clone and setup
    git clone https://github.com/unitedbyai/pocketagent.git
    cd pocketagent && bun install
    cp .env.example .env
edit .env - fastest way to start is groq (free tier):
    LLM_PROVIDER=groq
    GROQ_API_KEY=gsk_your_key_here

    # or run fully local with ollama (no api key)
    # ollama pull llama3.2
    # LLM_PROVIDER=ollama
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free | no | fastest to start |
| ollama | free (local) | yes* | no api key, runs on your machine |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
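groq, openrouter, ollama, and openai all expose openai-compatible chat completions endpoints, so switching providers mostly means switching base urls. a sketch of that idea - not the actual llm-providers.ts, and bedrock is left out because it goes through the aws sdk instead:

```ts
// illustrative provider selection - not the real llm-providers.ts.
// these four providers all expose openai-compatible /chat/completions endpoints.
const BASE_URLS: Record<string, string> = {
  groq: "https://api.groq.com/openai/v1",
  openrouter: "https://openrouter.ai/api/v1",
  ollama: "http://localhost:11434/v1", // local, no api key
  openai: "https://api.openai.com/v1",
};

async function chat(provider: string, apiKey: string, model: string, messages: { role: string; content: string }[]) {
  const res = await fetch(`${BASE_URLS[provider]}/chat/completions`, {
    method: "POST",
    headers: { "content-type": "application/json", authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model, messages }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}
```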
download and install the companion app on your android device.
enable usb debugging in developer options, plug in via usb.
    adb devices   # should show your device
    cd pocketagent && bun run src/kernel.ts
| key | default | what |
|---|---|---|
| MAX_STEPS | 30 | steps before giving up |
| STEP_DELAY | 2 | seconds between actions |
| STUCK_THRESHOLD | 3 | steps before stuck recovery |
| VISION_MODE | fallback | off / fallback / always |
| MAX_ELEMENTS | 40 | ui elements sent to llm |
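for reference, this is roughly how those keys map onto env reads - key names and defaults are taken from the table above, but the code is a sketch, not the real config.ts:

```ts
// illustrative config loading - key names and defaults match the table above,
// but this is a sketch, not the actual config.ts.
const env = process.env;

export const config = {
  maxSteps: Number(env.MAX_STEPS ?? 30),             // steps before giving up
  stepDelay: Number(env.STEP_DELAY ?? 2),            // seconds between actions
  stuckThreshold: Number(env.STUCK_THRESHOLD ?? 3),  // steps before stuck recovery
  visionMode: (env.VISION_MODE ?? "fallback") as "off" | "fallback" | "always",
  maxElements: Number(env.MAX_ELEMENTS ?? 40),       // ui elements sent to the llm
};
```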
ready to use. workflows are ai-powered (json), flows are deterministic (yaml).
| file | purpose |
|---|---|
| kernel.ts | main loop |
| actions.ts | 22 actions + adb retry |
| skills.ts | 6 multi-step skills |
| workflow.ts | workflow orchestration |
| flow.ts | yaml flow runner |
| llm-providers.ts | 5 providers + system prompt |
| sanitizer.ts | accessibility xml parser |
| config.ts | env config |
| constants.ts | keycodes, coordinates |
| logger.ts | session logging |