dolphinium committed on
Commit 30f0de3 · 1 Parent(s): 175882a

add quoting rules to prompt while crafting solr filter query

Files changed (1)
  1. llm_prompts.py +8 -7
llm_prompts.py CHANGED
@@ -1,3 +1,4 @@
 
1
  """
2
  Contains the prompt templates for interacting with the Gemini LLM.
3
 
@@ -32,8 +33,7 @@ def get_analysis_plan_prompt(natural_language_query, chat_history, search_fields
32
  # The search_fields are now pre-mapped, so we can use them directly
33
  formatted_fields = "\n".join([f" - {field['field_name']}: {field['field_value']}" for field in search_fields])
34
  dynamic_fields_prompt_section = f"""
35
- ---
36
- ### MANDATORY DYNAMIC FILTERS
37
 
38
  An external API has identified the following field-value pairs from the user query.
39
  **You MUST use ALL of these fields and values to construct the `query_filter`.**
@@ -59,6 +59,7 @@ Your most important job is to think like an analyst and choose a `analysis_dimen
59
  * For `group.sort`: If `analysis_measure` involves a function on a field (e.g., `sum(total_deal_value_in_million)`), you MUST use the full function: `group.sort: 'sum(total_deal_value_in_million) desc'`.
60
  * If `analysis_measure` is 'count', you MUST OMIT the `group.sort` parameter entirely.
61
  * For sorting, NEVER use 'date_year' directly for `sort` in `terms` facets; use 'index asc' or 'index desc' instead. For other sorts, use 'date'.
 
62
  5. On **Qualitative Data** Group Operation:
63
  * We need to show user **standout examples** for each category chosen.
64
  For example: if user asks for "USA approved drugs last 5 years" We need to show user standout examples for each year. In this context: standout means the news with the biggest deals in million for each year for example.
@@ -117,7 +118,7 @@ This is the most critical part of your task. A bad choice leads to a useless, bo
117
  "analysis_dimension": "company_name",
118
  "analysis_measure": "sum(total_deal_value_in_million)",
119
  "sort_field_for_examples": "total_deal_value_in_million",
120
- "query_filter": "date:["2023-01-01T00:00:00Z" TO \"2023-12-31T23:59:59Z\"]",
121
  "quantitative_request": {{
122
  "json.facet": {{
123
  "companies_by_deal_value": {{
@@ -159,7 +160,7 @@ This is the most critical part of your task. A bad choice leads to a useless, bo
159
  "analysis_dimension": "news_type",
160
  "analysis_measure": "count",
161
  "sort_field_for_examples": "date",
162
- "query_filter": "therapeutic_category_s:infections AND date:["2025-01-01T00:00:00Z" TO *]",
163
  "quantitative_request": {{
164
  "json.facet": {{
165
  "news_by_type": {{
@@ -306,7 +307,7 @@ You are a world-class Python data visualization expert specializing in Matplotli
306
  Your primary task is to generate a single, insightful, and robust Python script to visualize the provided data. The visualization should directly answer the user's analytical goal.
307
 
308
  **1. User's Analytical Goal:**
309
- "{query_context}"
310
 
311
  **2. Aggregated Data (from Solr Facets):**
312
  ```json
@@ -331,7 +332,7 @@ You MUST follow these rules meticulously to ensure the code runs without errors
331
  1. **Imports:** You must import `matplotlib.pyplot as plt`, `seaborn as sns`, and `pandas as pd`.
332
  2. **Use Pandas:** ALWAYS parse the `facet_data` into a pandas DataFrame. This is more robust and flexible than iterating through dictionaries directly.
333
  3. **Figure and Axes:** Use `fig, ax = plt.subplots()` to create the figure and axes objects. This gives you better control.
334
- 4. **Styling:** Apply a clean and professional style, for example: `plt.style.use('seaborn-v0_8-whitegrid')` and use a suitable Seaborn palette (e.g., `palette='viridis'`).
335
  5. **NO `plt.show()`:** Your code will be run on a server. **DO NOT** include `plt.show()`.
336
  6. **Save the Figure:** The execution environment expects a Matplotlib figure object named `fig`. Your code does not need to handle the saving path directly, but it **MUST** produce the final `fig` object correctly. The calling function will handle saving it.
337
  7. **Titles and Labels:** You MUST set a clear and descriptive title and labels for the x and y axes. The title should reflect the user's query.
@@ -401,4 +402,4 @@ plt.tight_layout()
401
 
402
  Now, generate the raw Python code to create the best possible visualization for the user's goal based on the provided data.
403
  Do not wrap the code in ```python ... ```.
404
- """
 
1
+
2
  """
3
  Contains the prompt templates for interacting with the Gemini LLM.
4
 
 
33
  # The search_fields are now pre-mapped, so we can use them directly
34
  formatted_fields = "\n".join([f" - {field['field_name']}: {field['field_value']}" for field in search_fields])
35
  dynamic_fields_prompt_section = f"""
36
+ ---### MANDATORY DYNAMIC FILTERS
 
37
 
38
  An external API has identified the following field-value pairs from the user query.
39
  **You MUST use ALL of these fields and values to construct the `query_filter`.**
 
59
  * For `group.sort`: If `analysis_measure` involves a function on a field (e.g., `sum(total_deal_value_in_million)`), you MUST use the full function: `group.sort: 'sum(total_deal_value_in_million) desc'`.
60
  * If `analysis_measure` is 'count', you MUST OMIT the `group.sort` parameter entirely.
61
  * For sorting, NEVER use 'date_year' directly for `sort` in `terms` facets; use 'index asc' or 'index desc' instead. For other sorts, use 'date'.
62
+ * **Quoting**: When a field value in the `query_filter` contains spaces (e.g., 'phase 3'), you MUST enclose it in double quotes (e.g., `highest_phase:("phase 3" OR "phase 2")`).
63
  5. On **Qualitative Data** Group Operation:
64
  * We need to show user **standout examples** for each category chosen.
65
  For example: if user asks for "USA approved drugs last 5 years" We need to show user standout examples for each year. In this context: standout means the news with the biggest deals in million for each year for example.
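
The quoting rule added in this hunk asks the LLM to wrap multi-word values in double quotes. The same rule could also be enforced in code before a filter reaches Solr. The following is a minimal sketch under stated assumptions: `quote_solr_value` and `build_or_filter` are hypothetical helpers that do not appear in `llm_prompts.py`, and values are assumed to arrive as plain strings. It only handles whitespace and embedded quotes, not other Solr special characters.

```python
def quote_solr_value(value: str) -> str:
    """Wrap a Solr field value in double quotes if it contains whitespace.

    Hypothetical helper for illustration; not part of this commit.
    """
    if any(ch.isspace() for ch in value):
        # Escape backslashes and embedded double quotes so the phrase stays intact.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return value


def build_or_filter(field: str, values: list[str]) -> str:
    """Build `field:(v1 OR v2 ...)`, quoting each value only when needed."""
    joined = " OR ".join(quote_solr_value(v) for v in values)
    return f"{field}:({joined})"


print(build_or_filter("highest_phase", ["phase 3", "phase 2"]))
# highest_phase:("phase 3" OR "phase 2")
print(build_or_filter("therapeutic_category_s", ["infections"]))
# therapeutic_category_s:(infections)
```

A post-processing step like this would act as a safety net for cases where the model ignores the prompt rule; it is not a replacement for the rule itself.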
 
118
  "analysis_dimension": "company_name",
119
  "analysis_measure": "sum(total_deal_value_in_million)",
120
  "sort_field_for_examples": "total_deal_value_in_million",
121
+ "query_filter": "date:[\"2023-01-01T00:00:00Z\" TO \"2023-12-31T23:59:59Z\"]",
122
  "quantitative_request": {{
123
  "json.facet": {{
124
  "companies_by_deal_value": {{
 
160
  "analysis_dimension": "news_type",
161
  "analysis_measure": "count",
162
  "sort_field_for_examples": "date",
163
+ "query_filter": "therapeutic_category_s:infections AND date:[\"2025-01-01T00:00:00Z\" TO *]",
164
  "quantitative_request": {{
165
  "json.facet": {{
166
  "news_by_type": {{
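
One point worth checking about the `\"` sequences added to the `query_filter` examples: in a regular (non-raw) Python string, `\"` is an escape that renders as a plain `"`, so the backslash would not reach the model. The sketch below assumes the prompt template is a regular f-string, like `dynamic_fields_prompt_section` above; the diff does not show the opening of the enclosing string, so this is an assumption. The variable names are illustrative only.

```python
import json

# Assumption: the template is a regular (non-raw) f-string, so `{{`/`}}`
# render as literal braces and backslash escapes are processed.
single = f"""{{"query_filter": "date:[\"2023-01-01T00:00:00Z\" TO *]"}}"""
double = f"""{{"query_filter": "date:[\\"2023-01-01T00:00:00Z\\" TO *]"}}"""


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(single)                 # {"query_filter": "date:["2023-01-01T00:00:00Z" TO *]"}
print(is_valid_json(single))  # False
print(double)                 # {"query_filter": "date:[\"2023-01-01T00:00:00Z\" TO *]"}
print(is_valid_json(double))  # True
```

If the intent is for the model to see JSON-escaped quotes in the example, the source would need `\\"` (or a raw string) under this assumption.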
 
307
  Your primary task is to generate a single, insightful, and robust Python script to visualize the provided data. The visualization should directly answer the user's analytical goal.
308
 
309
  **1. User's Analytical Goal:**
310
+ \"{query_context}\"
311
 
312
  **2. Aggregated Data (from Solr Facets):**
313
  ```json
 
332
  1. **Imports:** You must import `matplotlib.pyplot as plt`, `seaborn as sns`, and `pandas as pd`.
333
  2. **Use Pandas:** ALWAYS parse the `facet_data` into a pandas DataFrame. This is more robust and flexible than iterating through dictionaries directly.
334
  3. **Figure and Axes:** Use `fig, ax = plt.subplots()` to create the figure and axes objects. This gives you better control.
335
+ 4. **Styling:** Apply a clean and professional style, for example: `plt.style.use('seaborn-v0_8-whitegrid')` and use a suitable Seaborn palette (e.g., `palette='viridis'`)
336
  5. **NO `plt.show()`:** Your code will be run on a server. **DO NOT** include `plt.show()`.
337
  6. **Save the Figure:** The execution environment expects a Matplotlib figure object named `fig`. Your code does not need to handle the saving path directly, but it **MUST** produce the final `fig` object correctly. The calling function will handle saving it.
338
  7. **Titles and Labels:** You MUST set a clear and descriptive title and labels for the x and y axes. The title should reflect the user's query.
 
402
 
403
  Now, generate the raw Python code to create the best possible visualization for the user's goal based on the provided data.
404
  Do not wrap the code in ```python ... ```.
405
+ """