---
title: ScouterAI
emoji: 👓
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: true
license: apache-2.0
tag: agent-demo-track
short_description: The agent capable of detecting over 9000 entities on images
---
# ScouterAI - The Vision enhanced Agent
Welcome to ScouterAI, my [Agents - MCP Hackathon](https://huggingface.co/Agents-MCP-Hackathon) submission.
This app falls under Track 3: Agentic Demo.
The goal of the app is to demonstrate the capabilities of agentic LLMs combined with more "traditional" deep learning computer vision.
LLMs (and VLMs) are great at interacting with users and understanding their queries, but they are not (yet) capable of precise perception of the images presented to them.
Computer vision models, such as object detection or image segmentation models, are tailored to these tasks but require some engineering to wrap them and make them user-ready.
The idea of the agentic demo is to give a powerful LLM access to expert vision models, such as object detection or image segmentation models.
The agent can then fulfill precise perception tasks on any object present in the image: detection, location, classification, masking, counting, etc.
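
As a rough illustration of how tasks such as counting and location reduce to post-processing a detector's output, the sketch below works on a hard-coded list of detections. The dictionary layout (`label`, `score`, and a `box` with `xmin`/`ymin`/`xmax`/`ymax`) mirrors what the `transformers` object-detection pipeline returns. The sample values, the `count_objects` and `box_centers` helper names, and the 0.5 score threshold are assumptions made for this example and are not taken from ScouterAI itself.

```python
from collections import Counter

# Hypothetical detector output; the values are made up for illustration.
detections = [
    {"label": "car", "score": 0.97,
     "box": {"xmin": 10, "ymin": 40, "xmax": 110, "ymax": 100}},
    {"label": "car", "score": 0.42,
     "box": {"xmin": 120, "ymin": 45, "xmax": 180, "ymax": 95}},
    {"label": "person", "score": 0.88,
     "box": {"xmin": 150, "ymin": 20, "xmax": 190, "ymax": 120}},
    {"label": "car", "score": 0.81,
     "box": {"xmin": 200, "ymin": 50, "xmax": 300, "ymax": 130}},
]


def count_objects(detections, min_score=0.5):
    """Count detections per label, ignoring those scored below min_score."""
    return Counter(d["label"] for d in detections if d["score"] >= min_score)


def box_centers(detections, label, min_score=0.5):
    """Return the (x, y) center of each confident box with the given label."""
    centers = []
    for d in detections:
        if d["label"] == label and d["score"] >= min_score:
            b = d["box"]
            centers.append(((b["xmin"] + b["xmax"]) / 2,
                            (b["ymin"] + b["ymax"]) / 2))
    return centers


counts = count_objects(detections)
car_centers = box_centers(detections, "car")
print(counts)
print(car_centers)
```

The low-confidence second car is dropped by the threshold, which is the kind of filtering decision an agent would need to make explicit when answering a "how many" question.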
## Overview
In this preliminary app, the agent is a CodeAgent provided by the smolagents framework.
Its interface consists of a chat interface with examples and a gallery that displays the agent's work.
The agent is provided with a set of tools:
- Task model retriever: a RAG tool which, given a task (object-detection or image-segmentation) and a query (e.g. car), returns a list of models, each with its model id and the list of classes it is capable of detecting/segmenting. The list is based on a curated dataset of all the models available on the Hugging Face Hub, returns the mo
- Computer vision models: any object detection or image segmentation model available on Hugging Face
- Image processing functions: resizing, cropping, ...
- Image annotation functions: label, bounding box and mask annotators
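
The task model retriever can be pictured with the toy lookup below. It is only a sketch under stated assumptions: the tool is described above as RAG over a curated dataset, whereas this version does plain substring matching on class names over a tiny in-memory catalog. The `CATALOG` entries, the model ids, and the `retrieve_models` function are hypothetical placeholders, not real Hub models or ScouterAI code.

```python
# Hypothetical catalog: model ids and class lists are placeholders,
# not real entries from the Hugging Face Hub.
CATALOG = [
    {"model_id": "example-org/street-detector",
     "task": "object-detection",
     "classes": ["car", "truck", "bicycle", "person"]},
    {"model_id": "example-org/pet-detector",
     "task": "object-detection",
     "classes": ["cat", "dog"]},
    {"model_id": "example-org/road-segmenter",
     "task": "image-segmentation",
     "classes": ["road", "sidewalk", "car"]},
]


def retrieve_models(task, query, catalog=CATALOG):
    """Return catalog entries for `task` with a class name matching `query`.

    Matching is a case-insensitive substring test, used here as a simple
    stand-in for the retrieval step of a RAG tool.
    """
    query = query.strip().lower()
    results = []
    for entry in catalog:
        if entry["task"] != task:
            continue
        if any(query in cls.lower() for cls in entry["classes"]):
            results.append({"model_id": entry["model_id"],
                            "classes": entry["classes"]})
    return results


matches = retrieve_models("object-detection", "car")
print(matches)
```

Returning the class list alongside each model id lets the caller check which exact label string to filter on once the chosen model has been run.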
To complete a user request
## Use-cases
## Stack
- Agent framework: smolagents
- LLM: Anthropic
- Compute: Modal