manu committed · Commit 40e26e1 · verified · 1 Parent(s): 98d3f8a

Update app.py

Files changed (1):
  1. app.py +9 -6
app.py CHANGED
@@ -272,15 +272,16 @@ SYSTEM1 = (
 """
 You are a PDF research agent with a single tool: visual_deepsearch_image_search(query: string, k: int).
 Act iteratively:
-1) Split the user question into 1–4 focused sub-queries. You can use the provided page images to help you ask relevant follow-up queries. Sub-queries should be asked as natural-language questions, not just keywords.
-2) For each sub-query, call visual_deepsearch_image_search (k=5 by default; increase to up to 10 if you need to go deep).
-3) You will receive the output of visual_deepsearch_image_search as a list of indices corresponding to page numbers. Print the page numbers out and stop generating. An external system will take over and convert the indices into images for you.
-4) Analyze the images received to find the information you were looking for. If you are confident that you have all the information needed for a complete response, stop early and provide a final answer. Otherwise, run new search calls using the tool to find additional missing information.
-5) Repeat the process for up to 5 rounds of iterations and 20 searches in total. If info is missing, keep searching with new keywords and queries.
+1) If you are given images, analyze them to find the information you were looking for. If you are confident that you have all the information needed for a complete response, provide a final answer. Most often, you should run new search calls using the tool to find additional missing information.
+2) To run new searches, split the query into 1–3 focused sub-queries. You can use any provided page images to help you ask relevant follow-up queries. Sub-queries should be asked as natural-language questions, not just keywords.
+3) For each sub-query, call visual_deepsearch_image_search (k=5 by default; increase to up to 10 if you need to go deep).
+4) You will receive the output of visual_deepsearch_image_search as a list of indices corresponding to page numbers. Print the page numbers out and stop generating. An external system will take over and convert the indices into images for you.
+5) Go back to step 1: analyze the images received to find the information you were looking for. If you are confident that you have all the information needed for a complete response, provide a final answer. Otherwise, run new search calls using the tool to find additional missing information.

 Workflow:
 • Use ONLY the provided images for grounding and cite as (p.<page>).
 • If an answer is not present, say “Not found in the provided pages.”
+• Never do more than three rounds of refinement. If you are past round 3, it's time to gather all information and produce the final answer if you haven't done so yet.

 Deliverable:
 • Return a clear, standalone Markdown answer in the user's language. Include concise tables for lists of dates/items when useful, and cite the page numbers used for each fact.
@@ -388,8 +389,10 @@ def stream_agent(question: str,
     parts: List[Dict[str, Any]] = []
     if round_idx == 1:
         parts.append({"type": "input_text", "text": question})
+    elif round_idx < 5:
+        parts.append({"type": "input_text", "text": f"Continue reasoning with the newly attached pages which are from round {round_idx}. Ground your answer in these images, or query for new pages with the search tool if you are in round 3 or less. Otherwise, write your final answer."})
     else:
-        parts.append({"type": "input_text", "text": "Continue reasoning with the newly attached pages. Remember you should probably further query the search tool."})
+        parts.append({"type": "input_text", "text": "Time to produce the final answer grounded in the pages. Do not use the tool to query for new pages."})

     parts += _build_image_parts_from_indices(attached_indices)
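The second hunk's round-gated prompt selection can be sketched in isolation. `build_text_part` below is a hypothetical standalone helper that mirrors the `round_idx` branching from the commit; it is not the actual `stream_agent` code, and the round-5 cutoff is taken from the `elif round_idx < 5` guard in the diff.

```python
from typing import Any, Dict

def build_text_part(question: str, round_idx: int) -> Dict[str, Any]:
    """Pick the text prompt for a given agent round (sketch of the diffed logic)."""
    # Round 1: send the user's question verbatim.
    if round_idx == 1:
        text = question
    # Rounds 2-4: keep reasoning over the newly attached pages;
    # further tool searches are allowed only through round 3.
    elif round_idx < 5:
        text = (
            f"Continue reasoning with the newly attached pages which are "
            f"from round {round_idx}. Ground your answer in these images, "
            f"or query for new pages with the search tool if you are in "
            f"round 3 or less. Otherwise, write your final answer."
        )
    # Round 5 and beyond: force a final answer, no more tool calls.
    else:
        text = (
            "Time to produce the final answer grounded in the pages. "
            "Do not use the tool to query for new pages."
        )
    return {"type": "input_text", "text": text}
```

This keeps the prompt-selection policy in one place, so the cap on search rounds stated in the system prompt and the per-round user message cannot drift apart.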