Commit 30f0de3 by dolphinium (parent: 175882a)

add quoting rules to prompt while crafting solr filter query

llm_prompts.py  CHANGED  +8 -7

Old version (left side of the split diff):

@@ -1,3 +1,4 @@
 """
 Contains the prompt templates for interacting with the Gemini LLM.
 

@@ -32,8 +33,7 @@ def get_analysis_plan_prompt(natural_language_query, chat_history, search_fields
 # The search_fields are now pre-mapped, so we can use them directly
 formatted_fields = "\n".join([f" - {field['field_name']}: {field['field_value']}" for field in search_fields])
 dynamic_fields_prompt_section = f"""
-
-### MANDATORY DYNAMIC FILTERS
 
 An external API has identified the following field-value pairs from the user query.
 **You MUST use ALL of these fields and values to construct the `query_filter`.**
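To see what the model actually receives in this section, here is a minimal sketch that runs the `formatted_fields` expression from the hunk above on hypothetical input. The field names and values are made up for illustration; in the real code they come pre-mapped from the external API.

```python
# Hypothetical pre-mapped fields; the real list comes from the external API.
search_fields = [
    {"field_name": "therapeutic_category_s", "field_value": "infections"},
    {"field_name": "highest_phase", "field_value": "phase 3"},
]

# Same expression as in get_analysis_plan_prompt: one " - name: value" bullet per field.
formatted_fields = "\n".join(
    [f" - {field['field_name']}: {field['field_value']}" for field in search_fields]
)
print(formatted_fields)
```

Note that a value such as `phase 3` reaches the model unquoted; turning it into valid filter syntax is left to the model.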

@@ -59,6 +59,7 @@ Your most important job is to think like an analyst and choose a `analysis_dimen
 * For `group.sort`: If `analysis_measure` involves a function on a field (e.g., `sum(total_deal_value_in_million)`), you MUST use the full function: `group.sort: 'sum(total_deal_value_in_million) desc'`.
 * If `analysis_measure` is 'count', you MUST OMIT the `group.sort` parameter entirely.
 * For sorting, NEVER use 'date_year' directly for `sort` in `terms` facets; use 'index asc' or 'index desc' instead. For other sorts, use 'date'.
 5. On **Qualitative Data** Group Operation:
 * We need to show user **standout examples** for each category chosen.
 For example: if user asks for "USA approved drugs last 5 years" We need to show user standout examples for each year. In this context: standout means the news with the biggest deals in million for each year for example.
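The two `group.sort` rules in this hunk reduce to one branch on `analysis_measure`. A minimal sketch, assuming the request parameters are collected in a plain dict; the helper name is hypothetical, and treating every non-`count` measure like a function is an assumption the prompt does not spell out.

```python
def group_sort_params(analysis_measure: str) -> dict:
    """Hypothetical helper mirroring the prompt's two group.sort rules."""
    if analysis_measure == "count":
        # Rule: for 'count', omit group.sort entirely.
        return {}
    # Rule: for a function such as sum(...), sort groups by the full function, descending.
    # Assumption: any other non-count measure is handled the same way.
    return {"group.sort": f"{analysis_measure} desc"}

print(group_sort_params("sum(total_deal_value_in_million)"))
print(group_sort_params("count"))
```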

@@ -117,7 +118,7 @@ This is the most critical part of your task. A bad choice leads to a useless, bo
 "analysis_dimension": "company_name",
 "analysis_measure": "sum(total_deal_value_in_million)",
 "sort_field_for_examples": "total_deal_value_in_million",
-"query_filter": "date:["2023-01-01T00:00:00Z" TO \"2023-12-31T23:59:59Z\"]",
 "quantitative_request": {{
 "json.facet": {{
 "companies_by_deal_value": {{

@@ -159,7 +160,7 @@ This is the most critical part of your task. A bad choice leads to a useless, bo
 "analysis_dimension": "news_type",
 "analysis_measure": "count",
 "sort_field_for_examples": "date",
-"query_filter": "therapeutic_category_s:infections AND date:["2025-01-01T00:00:00Z" TO *]",
 "quantitative_request": {{
 "json.facet": {{
 "news_by_type": {{

@@ -306,7 +307,7 @@ You are a world-class Python data visualization expert specializing in Matplotli
 Your primary task is to generate a single, insightful, and robust Python script to visualize the provided data. The visualization should directly answer the user's analytical goal.
 
 **1. User's Analytical Goal:**
-"{query_context}"
 
 **2. Aggregated Data (from Solr Facets):**
 ```json

@@ -331,7 +332,7 @@ You MUST follow these rules meticulously to ensure the code runs without errors
 1. **Imports:** You must import `matplotlib.pyplot as plt`, `seaborn as sns`, and `pandas as pd`.
 2. **Use Pandas:** ALWAYS parse the `facet_data` into a pandas DataFrame. This is more robust and flexible than iterating through dictionaries directly.
 3. **Figure and Axes:** Use `fig, ax = plt.subplots()` to create the figure and axes objects. This gives you better control.
-4. **Styling:** Apply a clean and professional style, for example: `plt.style.use('seaborn-v0_8-whitegrid')` and use a suitable Seaborn palette (e.g., `palette='viridis'`)
 5. **NO `plt.show()`:** Your code will be run on a server. **DO NOT** include `plt.show()`.
 6. **Save the Figure:** The execution environment expects a Matplotlib figure object named `fig`. Your code does not need to handle the saving path directly, but it **MUST** produce the final `fig` object correctly. The calling function will handle saving it.
 7. **Titles and Labels:** You MUST set a clear and descriptive title and labels for the x and y axes. The title should reflect the user's query.

@@ -401,4 +402,4 @@ plt.tight_layout()
 
 Now, generate the raw Python code to create the best possible visualization for the user's goal based on the provided data.
 Do not wrap the code in ```python ... ```.
-"""

New version (right side of the split diff):

+
 """
 Contains the prompt templates for interacting with the Gemini LLM.
 

 # The search_fields are now pre-mapped, so we can use them directly
 formatted_fields = "\n".join([f" - {field['field_name']}: {field['field_value']}" for field in search_fields])
 dynamic_fields_prompt_section = f"""
+---### MANDATORY DYNAMIC FILTERS
 
 An external API has identified the following field-value pairs from the user query.
 **You MUST use ALL of these fields and values to construct the `query_filter`.**

 * For `group.sort`: If `analysis_measure` involves a function on a field (e.g., `sum(total_deal_value_in_million)`), you MUST use the full function: `group.sort: 'sum(total_deal_value_in_million) desc'`.
 * If `analysis_measure` is 'count', you MUST OMIT the `group.sort` parameter entirely.
 * For sorting, NEVER use 'date_year' directly for `sort` in `terms` facets; use 'index asc' or 'index desc' instead. For other sorts, use 'date'.
+* **Quoting**: When a field value in the `query_filter` contains spaces (e.g., 'phase 3'), you MUST enclose it in double quotes (e.g., `highest_phase:("phase 3" OR "phase 2")`).
 5. On **Qualitative Data** Group Operation:
 * We need to show user **standout examples** for each category chosen.
 For example: if user asks for "USA approved drugs last 5 years" We need to show user standout examples for each year. In this context: standout means the news with the biggest deals in million for each year for example.
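The new quoting rule can be applied mechanically. Below is a minimal sketch of a hypothetical helper that builds an `OR` clause and wraps only space-containing values in double quotes. It assumes values contain no embedded double quotes or other Solr special characters; a production version would need to escape those as well.

```python
def solr_or_clause(field: str, values: list) -> str:
    """Hypothetical helper: build `field:(v1 OR v2 ...)`, quoting values with spaces."""
    def quote(value: str) -> str:
        # Quoting rule: a value containing spaces is enclosed in double quotes.
        # Assumption: values contain no embedded double quotes.
        return f'"{value}"' if " " in value else value

    return f"{field}:(" + " OR ".join(quote(v) for v in values) + ")"

print(solr_or_clause("highest_phase", ["phase 3", "phase 2"]))
# highest_phase:("phase 3" OR "phase 2")
```

This reproduces the example in the rule itself; single-word values such as `infections` pass through unquoted.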

 "analysis_dimension": "company_name",
 "analysis_measure": "sum(total_deal_value_in_million)",
 "sort_field_for_examples": "total_deal_value_in_million",
+"query_filter": "date:[\"2023-01-01T00:00:00Z\" TO \"2023-12-31T23:59:59Z\"]",
 "quantitative_request": {{
 "json.facet": {{
 "companies_by_deal_value": {{
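The `query_filter` line is meant to be a JSON string value, and JSON requires a double quote inside a string to be written as `\"`. A quick standard-library check of both forms. The strings are trimmed to the single `query_filter` key for illustration.

```python
import json

# Inner quotes left bare: the JSON string ends early at `date:[`, so parsing fails.
bare = '{"query_filter": "date:["2023-01-01T00:00:00Z" TO "2023-12-31T23:59:59Z"]"}'
# Inner quotes written as \" (raw literal keeps the backslashes): valid JSON.
escaped = r'{"query_filter": "date:[\"2023-01-01T00:00:00Z\" TO \"2023-12-31T23:59:59Z\"]"}'

try:
    json.loads(bare)
    bare_ok = True
except json.JSONDecodeError:
    bare_ok = False

parsed = json.loads(escaped)
print(bare_ok)                 # False
print(parsed["query_filter"])  # date:["2023-01-01T00:00:00Z" TO "2023-12-31T23:59:59Z"]
```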

 "analysis_dimension": "news_type",
 "analysis_measure": "count",
 "sort_field_for_examples": "date",
+"query_filter": "therapeutic_category_s:infections AND date:[\"2025-01-01T00:00:00Z\" TO *]",
 "quantitative_request": {{
 "json.facet": {{
 "news_by_type": {{
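The filter strings in these examples share a fixed shape: a quoted ISO-8601 UTC timestamp for each closed bound, a bare `*` for an open bound, and `AND` between clauses. Here is a minimal sketch of building that shape in code. The helper name is hypothetical and not part of `llm_prompts.py`.

```python
from datetime import datetime, timezone
from typing import Optional


def solr_date_range(field: str, start: Optional[datetime], end: Optional[datetime]) -> str:
    """Hypothetical helper: build field:["start" TO "end"], using * for an open bound."""
    def bound(d: Optional[datetime]) -> str:
        if d is None:
            return "*"  # open-ended bound stays unquoted
        # Pass timezone-aware datetimes; they are normalized to UTC with a trailing Z.
        return '"' + d.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + '"'

    return f"{field}:[{bound(start)} TO {bound(end)}]"


fq = " AND ".join([
    "therapeutic_category_s:infections",
    solr_date_range("date", datetime(2025, 1, 1, tzinfo=timezone.utc), None),
])
print(fq)
# therapeutic_category_s:infections AND date:["2025-01-01T00:00:00Z" TO *]
```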

 Your primary task is to generate a single, insightful, and robust Python script to visualize the provided data. The visualization should directly answer the user's analytical goal.
 
 **1. User's Analytical Goal:**
+\"{query_context}\"
 
 **2. Aggregated Data (from Solr Facets):**
 ```json
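Whether the new backslashes reach the model depends on how Python reads the template. The doubled `{{`/`}}` braces elsewhere in the file suggest an f-string. Assuming it is an ordinary (non-raw) one, `\"` is simply an escape for `"`. The old and new forms of this line then render to the same prompt text, and keeping a backslash in the prompt would require `\\"`. Single-line and triple-quoted strings share these escape rules, so a one-line check is enough. The `query_context` value is hypothetical.

```python
query_context = "deal value by company"  # hypothetical value

old = f'"{query_context}"'          # form before the commit
new = f'\"{query_context}\"'        # form after the commit: \" is just " here
literal = f'\\"{query_context}\\"'  # what a backslash-escaped quote in the prompt needs

print(old == new)  # True
print(literal)     # \"deal value by company\"
```

If the template is a raw string (`rf"""..."""`), the backslashes would instead be kept as written.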

 1. **Imports:** You must import `matplotlib.pyplot as plt`, `seaborn as sns`, and `pandas as pd`.
 2. **Use Pandas:** ALWAYS parse the `facet_data` into a pandas DataFrame. This is more robust and flexible than iterating through dictionaries directly.
 3. **Figure and Axes:** Use `fig, ax = plt.subplots()` to create the figure and axes objects. This gives you better control.
+4. **Styling:** Apply a clean and professional style, for example: `plt.style.use('seaborn-v0_8-whitegrid')` and use a suitable Seaborn palette (e.g., `palette='viridis'`)
 5. **NO `plt.show()`:** Your code will be run on a server. **DO NOT** include `plt.show()`.
 6. **Save the Figure:** The execution environment expects a Matplotlib figure object named `fig`. Your code does not need to handle the saving path directly, but it **MUST** produce the final `fig` object correctly. The calling function will handle saving it.
 7. **Titles and Labels:** You MUST set a clear and descriptive title and labels for the x and y axes. The title should reflect the user's query.

 
 Now, generate the raw Python code to create the best possible visualization for the user's goal based on the provided data.
 Do not wrap the code in ```python ... ```.
+"""
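Rules 5 and 6, together with the closing "Do not wrap the code" instruction, define a contract with the caller: raw Python, no `plt.show()`, and a top-level `fig`. The calling function is not part of this diff. The following is therefore a hypothetical sketch of how a caller might enforce that contract before saving the figure. All names are illustrative.

```python
import re

# Matches a whole response wrapped in a ```python ... ``` (or bare ```) fence.
FENCE = re.compile(r"^```(?:python)?\s*\n(.*?)\n?```\s*$", re.DOTALL)


def prepare_generated_code(code: str) -> str:
    """Hypothetical pre-flight check mirroring the prompt's output rules."""
    code = code.strip()
    match = FENCE.match(code)
    if match:
        # The prompt forbids fences, but strip one defensively if it appears.
        code = match.group(1)
    if "plt.show()" in code:
        raise ValueError("generated code must not call plt.show()")
    return code


def run_generated_code(code: str):
    """Hypothetical runner: execute the code and return the `fig` it defines."""
    namespace: dict = {}
    # exec runs arbitrary model output: only do this inside a sandboxed environment.
    exec(prepare_generated_code(code), namespace)
    if "fig" not in namespace:
        raise ValueError("generated code did not define `fig`")
    return namespace["fig"]
```

Checking the fence and `plt.show()` before executing keeps a malformed response from ever running.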