Spaces: dolphinium/pc-ai-data-analyst-dup

dolphinium committed 28 days ago

Commit 30f0de3, 1 parent: 175882a

add quoting rules to prompt while crafting solr filter query

Files changed (1)

llm_prompts.py +8 -7

@@ -1,3 +1,4 @@
 """
 Contains the prompt templates for interacting with the Gemini LLM.
@@ -32,8 +33,7 @@ def get_analysis_plan_prompt(natural_language_query, chat_history, search_fields
         # The search_fields are now pre-mapped, so we can use them directly
         formatted_fields = "\n".join([f"  - {field['field_name']}: {field['field_value']}" for field in search_fields])
         dynamic_fields_prompt_section = f"""
----
-### MANDATORY DYNAMIC FILTERS
 An external API has identified the following field-value pairs from the user query.
 **You MUST use ALL of these fields and values to construct the `query_filter`.**
@@ -59,6 +59,7 @@ Your most important job is to think like an analyst and choose a `analysis_dimen
     *   For `group.sort`: If `analysis_measure` involves a function on a field (e.g., `sum(total_deal_value_in_million)`), you MUST use the full function: `group.sort: 'sum(total_deal_value_in_million) desc'`.
     *   If `analysis_measure` is 'count', you MUST OMIT the `group.sort` parameter entirely.
     *   For sorting, NEVER use 'date_year' directly for `sort` in `terms` facets; use 'index asc' or 'index desc' instead. For other sorts, use 'date'.
 5.  On **Qualitative Data** Group Operation:
     * We need to show user **standout examples** for each category chosen.
     For example: if user asks for "USA approved drugs last 5 years" We need to show user standout examples for each year. In this context: standout means the news with the biggest deals in million for each year for example.
@@ -117,7 +118,7 @@ This is the most critical part of your task. A bad choice leads to a useless, bo
   "analysis_dimension": "company_name",
   "analysis_measure": "sum(total_deal_value_in_million)",
   "sort_field_for_examples": "total_deal_value_in_million",
-  "query_filter": "date:["2023-01-01T00:00:00Z" TO \"2023-12-31T23:59:59Z\"]",
   "quantitative_request": {{
     "json.facet": {{
       "companies_by_deal_value": {{
@@ -159,7 +160,7 @@ This is the most critical part of your task. A bad choice leads to a useless, bo
   "analysis_dimension": "news_type",
   "analysis_measure": "count",
   "sort_field_for_examples": "date",
-  "query_filter": "therapeutic_category_s:infections AND date:["2025-01-01T00:00:00Z" TO *]",
   "quantitative_request": {{
     "json.facet": {{
       "news_by_type": {{
@@ -306,7 +307,7 @@ You are a world-class Python data visualization expert specializing in Matplotli
 Your primary task is to generate a single, insightful, and robust Python script to visualize the provided data. The visualization should directly answer the user's analytical goal.
 **1. User's Analytical Goal:**
-"{query_context}"
 **2. Aggregated Data (from Solr Facets):**
 ```json
@@ -331,7 +332,7 @@ You MUST follow these rules meticulously to ensure the code runs without errors
 1.  **Imports:** You must import `matplotlib.pyplot as plt`, `seaborn as sns`, and `pandas as pd`.
 2.  **Use Pandas:** ALWAYS parse the `facet_data` into a pandas DataFrame. This is more robust and flexible than iterating through dictionaries directly.
 3.  **Figure and Axes:** Use `fig, ax = plt.subplots()` to create the figure and axes objects. This gives you better control.
-4.  **Styling:** Apply a clean and professional style, for example: `plt.style.use('seaborn-v0_8-whitegrid')` and use a suitable Seaborn palette (e.g., `palette='viridis'`).
 5.  **NO `plt.show()`:** Your code will be run on a server. **DO NOT** include `plt.show()`.
 6.  **Save the Figure:** The execution environment expects a Matplotlib figure object named `fig`. Your code does not need to handle the saving path directly, but it **MUST** produce the final `fig` object correctly. The calling function will handle saving it.
 7.  **Titles and Labels:** You MUST set a clear and descriptive title and labels for the x and y axes. The title should reflect the user's query.
@@ -401,4 +402,4 @@ plt.tight_layout()
 Now, generate the raw Python code to create the best possible visualization for the user's goal based on the provided data.
 Do not wrap the code in ```python ... ```.
-"""
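
The `query_filter` lines touched by this commit differ only in how the inner double quotes are escaped. The sketch below is not part of the commit. It shows two standard Python behaviours that bear on the change: in a regular (non-raw) string literal `\"` is just a double quote, and `json.dumps` produces a JSON string value whose inner quotes are escaped. Whether the template in `llm_prompts.py` is a raw string is not visible in this diff, so the first point is an assumption about how it applies here. The names `solr_filter`, `json_line` and `is_valid_json` are hypothetical.

```python
import json

# Filter value copied from the example in the diff.
solr_filter = 'date:["2023-01-01T00:00:00Z" TO "2023-12-31T23:59:59Z"]'

# In a regular (non-raw) Python literal, \" is simply a double quote,
# so these two literals produce the same string.
plain = 'date:["2025-01-01T00:00:00Z" TO *]'
escaped = "date:[\"2025-01-01T00:00:00Z\" TO *]"
same_string = plain == escaped

# json.dumps escapes the inner quotes, giving a value that round-trips.
json_line = '"query_filter": ' + json.dumps(solr_filter)
round_trip = json.loads("{" + json_line + "}")["query_filter"]


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Pasting the value between quotes without escaping breaks the JSON.
unescaped_ok = is_valid_json('{"query_filter": "' + solr_filter + '"}')

print(same_string, round_trip == solr_filter, unescaped_ok)
```

If the prompt is meant to show the model a literal backslash before each inner quote, building the example line with `json.dumps` (or writing `\\"` in a non-raw template) is one way to get that.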

 """
 Contains the prompt templates for interacting with the Gemini LLM.
         # The search_fields are now pre-mapped, so we can use them directly
         formatted_fields = "\n".join([f"  - {field['field_name']}: {field['field_value']}" for field in search_fields])
         dynamic_fields_prompt_section = f"""
+---### MANDATORY DYNAMIC FILTERS
 An external API has identified the following field-value pairs from the user query.
 **You MUST use ALL of these fields and values to construct the `query_filter`.**
     *   For `group.sort`: If `analysis_measure` involves a function on a field (e.g., `sum(total_deal_value_in_million)`), you MUST use the full function: `group.sort: 'sum(total_deal_value_in_million) desc'`.
     *   If `analysis_measure` is 'count', you MUST OMIT the `group.sort` parameter entirely.
     *   For sorting, NEVER use 'date_year' directly for `sort` in `terms` facets; use 'index asc' or 'index desc' instead. For other sorts, use 'date'.
+    *   **Quoting**: When a field value in the `query_filter` contains spaces (e.g., 'phase 3'), you MUST enclose it in double quotes (e.g., `highest_phase:("phase 3" OR "phase 2")`).
 5.  On **Qualitative Data** Group Operation:
     * We need to show user **standout examples** for each category chosen.
     For example: if user asks for "USA approved drugs last 5 years" We need to show user standout examples for each year. In this context: standout means the news with the biggest deals in million for each year for example.
   "analysis_dimension": "company_name",
   "analysis_measure": "sum(total_deal_value_in_million)",
   "sort_field_for_examples": "total_deal_value_in_million",
+  "query_filter": "date:[\"2023-01-01T00:00:00Z\" TO \"2023-12-31T23:59:59Z\"]",
   "quantitative_request": {{
     "json.facet": {{
       "companies_by_deal_value": {{
   "analysis_dimension": "news_type",
   "analysis_measure": "count",
   "sort_field_for_examples": "date",
+  "query_filter": "therapeutic_category_s:infections AND date:[\"2025-01-01T00:00:00Z\" TO *]",
   "quantitative_request": {{
     "json.facet": {{
       "news_by_type": {{
 Your primary task is to generate a single, insightful, and robust Python script to visualize the provided data. The visualization should directly answer the user's analytical goal.
 **1. User's Analytical Goal:**
+\"{query_context}\"
 **2. Aggregated Data (from Solr Facets):**
 ```json
 1.  **Imports:** You must import `matplotlib.pyplot as plt`, `seaborn as sns`, and `pandas as pd`.
 2.  **Use Pandas:** ALWAYS parse the `facet_data` into a pandas DataFrame. This is more robust and flexible than iterating through dictionaries directly.
 3.  **Figure and Axes:** Use `fig, ax = plt.subplots()` to create the figure and axes objects. This gives you better control.
+4.  **Styling:** Apply a clean and professional style, for example: `plt.style.use('seaborn-v0_8-whitegrid')` and use a suitable Seaborn palette (e.g., `palette='viridis'`)
 5.  **NO `plt.show()`:** Your code will be run on a server. **DO NOT** include `plt.show()`.
 6.  **Save the Figure:** The execution environment expects a Matplotlib figure object named `fig`. Your code does not need to handle the saving path directly, but it **MUST** produce the final `fig` object correctly. The calling function will handle saving it.
 7.  **Titles and Labels:** You MUST set a clear and descriptive title and labels for the x and y axes. The title should reflect the user's query.
 Now, generate the raw Python code to create the best possible visualization for the user's goal based on the provided data.
 Do not wrap the code in ```python ... ```.
+"""
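
The quoting rule added to the prompt could also be enforced in code before a filter reaches Solr. The following is a minimal sketch under that assumption, not something this commit contains. `quote_solr_value` and `build_field_filter` are hypothetical helper names, and the escaping covers only backslashes and double quotes inside a quoted phrase.

```python
from typing import List


def quote_solr_value(value: str) -> str:
    """Wrap a value in double quotes when it contains whitespace."""
    if any(ch.isspace() for ch in value):
        # Escape backslashes and embedded quotes so the phrase stays intact.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return value


def build_field_filter(field: str, values: List[str]) -> str:
    """Build `field:value` or `field:(v1 OR v2 ...)` for one field."""
    quoted = [quote_solr_value(v) for v in values]
    if len(quoted) == 1:
        return f"{field}:{quoted[0]}"
    return f"{field}:({' OR '.join(quoted)})"


print(build_field_filter("highest_phase", ["phase 3", "phase 2"]))
print(build_field_filter("therapeutic_category_s", ["infections"]))
```

The first call reproduces the example from the new prompt rule, `highest_phase:("phase 3" OR "phase 2")`, while a single value without spaces is left unquoted.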