Mobile-Agent: The Powerful GUI Agent Family
Updated Dec 2, 2025 - Python
Build multimodal language agents for fast prototyping and production
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
[COLM 2024] ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Build an end-to-end system that ingests inventory report PDFs/images, runs OCR to normalize and extract tabular data, stores the cleaned dataset, and exposes a secure, conversational agent that can answer business queries over the data (aggregation, filtering, joins, trends), returning tables, charts, and exportable results.
A multimodal coding assistant that can analyze images containing code problems and generate solutions in multiple programming languages.