francedot/acu: A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.

Last updated: 2025/02/18 at 10:58 PM

Klenance

Reinforcement Learning for Long-Horizon Interactive LLM Agents (Feb. 2025)

RL approach (LOOP) for training interactive digital agents (IDAs) directly in their target environments
32B parameter agent outperforms OpenAI o1 by 9 percentage points on AppWorld

Large Action Models: From Inception to Implementation (Dec. 2024)

Comprehensive framework for developing LAMs that can perform real-world actions beyond language generation
Details key stages including data collection, model training, environment integration, grounding and evaluation

Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation (Dec. 2024)

Uses a process reward model to guide a VLM agent's action selection at inference time during GUI navigation

SpiritSight Agent: Advanced GUI Agent with One Look (Dec. 2024)

Single-shot GUI interaction approach

AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs (Dec. 2024)

Uses LLMs to annotate the functionality of GUI elements automatically, scaling up GUI grounding data

Simulate Before Act: Model-Based Planning for Web Agents (Dec. 2024)

Novel model-based planning approach using LLM world models
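
As a rough illustration of the "simulate before act" idea, the sketch below picks an action by first predicting the outcome of each candidate and then scoring the predicted state. It is a minimal sketch under stated assumptions, not the paper's implementation: `choose_action`, `toy_world_model`, and `toy_score` are hypothetical names, and the two toy functions stand in for what would be LLM calls in a real agent.

```python
from typing import Callable, Iterable, Optional


def choose_action(
    state: str,
    candidates: Iterable[str],
    world_model: Callable[[str, str], str],
    score: Callable[[str], float],
) -> Optional[str]:
    """Return the candidate whose simulated outcome scores highest."""
    best_action, best_score = None, float("-inf")
    for action in candidates:
        predicted = world_model(state, action)  # simulate, do not execute
        value = score(predicted)                # judge the imagined outcome
        if value > best_score:
            best_action, best_score = action, value
    return best_action


# Hypothetical stand-ins for LLM calls: a lookup table of page transitions
# and a scorer that prefers the page matching the task.
TRANSITIONS = {
    ("home", "click_search"): "search_results",
    ("home", "click_cart"): "cart",
}


def toy_world_model(state: str, action: str) -> str:
    # Unknown (state, action) pairs are assumed to leave the page unchanged.
    return TRANSITIONS.get((state, action), state)


def toy_score(predicted_state: str) -> float:
    return 1.0 if predicted_state == "search_results" else 0.0


print(choose_action("home", ["click_cart", "click_search"], toy_world_model, toy_score))
```

Only the chosen action would then be executed in the real environment; the other candidates are never run, which is the point of planning against a model rather than the live website.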

Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents (Dec. 2024)

Novel autonomous skill discovery framework for web agents
Code

Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents (Dec. 2024)

Novel framework for contextualizing web pages to enhance LLM agent decision making

Digi-Q: Transforming VLMs to Device-Control Agents via Value-Based Offline RL (Dec. 2024)

Novel value-based offline RL approach for training VLM device-control agents

Magentic-One (Nov. 2024)

Multi-agent system with orchestrator-led coordination
Strong performance on GAIA, WebArena, and AssistantBench

Agent Workflow Memory (Sep. 2024)

Induces reusable workflows from past trajectories and supplies them to the agent as memory for later tasks
Code

The Impact of Element Ordering on LM Agent Performance (Sep. 2024)

Study of how the order in which UI elements are presented affects agent performance
Code

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents (Aug. 2024)

Combines guided Monte Carlo Tree Search with self-critique and DPO-based fine-tuning on agent interactions
Website

OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models (Aug. 2024)

Open platform for web-based agent deployment
Code

Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems (Jul. 2024)

Hierarchical architecture with flexible DOM distillation
Novel denoising method for web navigation

Apple Intelligence Foundation Language Models (Jul. 2024)

On-device (~3B parameter) and server-based language models, the latter designed for Private Cloud Compute
Covers model architecture, training data, and inference optimization

Tree search for language model agents (Jul. 2024)

Multi-step reasoning and planning with best-first tree search
Inference-time search applied to LLM-based agents in interactive web environments
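
For readers unfamiliar with best-first search, the following is a generic sketch of the technique, not the paper's code. The function name `best_first_search` and the toy integer state space are assumptions made for illustration; in an agent setting, `expand` would propose next states by taking actions and `value` would be a learned or model-based estimate of how promising a state is.

```python
import heapq
from typing import Callable, Hashable, Iterable, Optional


def best_first_search(
    start: Hashable,
    expand: Callable[[Hashable], Iterable[Hashable]],
    value: Callable[[Hashable], float],
    is_goal: Callable[[Hashable], bool],
    budget: int = 50,
) -> Optional[Hashable]:
    """Expand the highest-value state first until a goal or the budget is hit."""
    counter = 0  # tie-breaker so the heap never has to compare states
    # heapq is a min-heap, so values are negated to pop the best state first.
    frontier = [(-value(start), counter, start)]
    seen = {start}
    while frontier and budget > 0:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        budget -= 1  # one expansion consumed
        for nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                counter += 1
                heapq.heappush(frontier, (-value(nxt), counter, nxt))
    return None


def demo_best_first() -> Optional[int]:
    # Toy problem: reach 10 from 1 using the moves +1 and *2.
    target = 10
    return best_first_search(
        start=1,
        expand=lambda n: [n + 1, n * 2],
        value=lambda n: -abs(target - n),  # closer to the target is better
        is_goal=lambda n: n == target,
    )


print(demo_best_first())
```

The `budget` parameter caps the number of expansions, which matters for agents because every expansion corresponds to real environment interaction or model calls.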

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning (Jun. 2024)

Offline-to-online RL: initializes a pretrained VLM with offline RL, then continues training on real device interactions
Code

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration (Jun. 2024)

Multi-agent collaboration for mobile device operation
Code

Octopus Series: On-device Language Models for Computer Control (Apr. 2024)

v4: Graph of language models with functional tokens integration (Apr. 2024)
v3: Sub-billion parameter multimodal model for edge devices (Apr. 2024)
v2: Super agent for Android and iOS (Apr. 2024)
v1: Function calling of software APIs (Apr. 2024)
Website
Code

AutoWebGLM: Bootstrap and reinforce a large language model-based web navigating agent (Apr. 2024)

Novel approach for real-world web navigation and bilingual benchmark
Code

Cradle: Empowering Foundation Agents towards General Computer Control (Mar. 2024)

Focus on general computer control using Red Dead Redemption II as a case study
Code

Android in the Zoo: Chain-of-Action-Thought for GUI Agents (Mar. 2024)

Novel Chain-of-Action-Thought framework for Android interaction
Code

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (Feb. 2024)

Vision-language model for computer control
Code

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (Feb. 2024)

Framework for building generalist OS agents; its FRIDAY agent self-improves on unfamiliar applications
Code

UFO: A UI-Focused Agent for Windows OS Interaction (Feb. 2024)

Specialized for Windows OS interaction
Code

CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation (Feb. 2024)

Novel comprehensive environment perception (CEP) approach for exhaustive GUI perception
Introduces conditional action prediction (CAP) for reliable action response

Intention-in-Interaction (IN3): Tell Me More! (Feb. 2024)

Novel benchmark for evaluating user intention understanding in agent designs
Introduces model experts for robust user-agent interaction

Dual-view visual contextualization for web navigation (Feb. 2024)

Novel approach for automatic web navigation with language instructions
Key ideas: HTML elements, visual contextualization

ScreenAI: A Vision-Language Model for UI and Infographics Understanding (Feb. 2024)

Specialized for mobile UI and infographics understanding
Novel approach for visual interface comprehension

GPT-4V(ision) is a Generalist Web Agent, if Grounded (Jan. 2024)

Demonstrates GPT-4V capabilities for web interaction
Code

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (Jan. 2024)

Visual perception for mobile device interaction
Code

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models (Jan. 2024)

End-to-end approach for web interaction
Code

CogAgent: A Visual Language Model for GUI Agents (Dec. 2023)

Works across PC and Android platforms
Code

AppAgent: Multimodal Agents as Smartphone Users (Dec. 2023)

Focused on smartphone interaction
Code

LASER: LLM Agent with State-Space Exploration for Web Navigation (Sep. 2023)

Models web navigation as state-space exploration
Code

AndroidEnv: A Reinforcement Learning Platform for Android (May 2021)

Reinforcement learning platform for Android interaction
Code
