Commit 5e8a58c
Parent(s): 4fa74b5

Added RAG Evaluation Code

Files changed:
- README.md +34 -9
- app.py +1 -1
- eval.py → eval_app.py +2 -2
- requirements.txt +1 -0
- src/generation/generate_response.py +24 -2
- src/retrieval/retriever_chain.py +32 -2
- src/test/eval_custom_model.py +100 -0
- src/test/eval_rag.py +36 -2
- src/test/eval_results_20240325111437.csv +0 -86
README.md
CHANGED
@@ -14,16 +14,15 @@ license: mit
 
 Welcome to OpenPages IntelliBot, your intelligent and efficient chatbot powered by the state-of-the-art Retrieval-Augmented Generation (RAG) technique and Large Language Model (LLM).
 
-
-
-**Streamlit:** [](https://nk-openpages-intellibot.streamlit.app)
+**Streamlit:** [](https://nk-openpages-intellibot.streamlit.app)
 
+**Gradio:** [](https://huggingface.co/spaces/nikhilkomakula/llm-rag-op-chatbot)
 
-## What is
+## What is OpenPages IntelliBot?
 
 OpenPagesIntelliBot leverages cutting-edge AI technologies to provide you with instant and accurate responses about OpenPages, its features, solutions / modules it offers and its trigger framework. By combining the power of RAG and Zephyr LLM, OpenPagesIntelliBot ensures that you receive contextually relevant information.
 
-## How RAG Works?
+## How Retrieval-Augmented Generation (RAG) Works?
 
 
 
@@ -136,7 +135,18 @@ docker ps -a
 docker logs -f llm-rag-op-chatbot
 ```
 
-## 
+## RAG Evaluation:
+
+Used [DeepEval](https://github.com/confident-ai/deepeval) open-source LLM evaluation framework for evaluating the performance of the RAG pipeline. Below metrics are used to evaluate its performance:
+
+* **Answer Relevancy:** Measures the quality of your RAG pipeline's generator by evaluating how relevant the `actual_output` of your LLM application is compared to the provided `input`.
+* **Faithfulness:** Measures the quality of your RAG pipeline's generator by evaluating whether the `actual_output` factually aligns with the contents of your `retrieval_context`.
+* **Contextual Relevancy:** Measures the quality of your RAG pipeline's retriever by evaluating the overall relevance of the information presented in your `retrieval_context` for a given `input`.
+* **Hallucination:** Determines whether your LLM generates factually correct information by comparing the `actual_output` to the provided `context`.
+* **Bias:** Determines whether your LLM output contains gender, racial, or political bias.
+* **Toxicity:** Evaluates toxicness in your LLM outputs.
+
+## REST API (Gradio Only):
 
 **Note:** Navigate to the chat interface UI in the browser and locate `Use via API` and click on it. A fly over opens on the right hand side. Capture the URL under the title named `API documentation`.
 
@@ -150,17 +160,23 @@ docker logs -f llm-rag-op-chatbot
 * **Vector Database :** ChromaDB
 * **Orchestration Framework :** LangChain
 * **Embedding Model :** BAAI/bge-large-en-v1.5
-* **Large Language Model :** huggingfaceh4/zephyr-7b-
+* **Large Language Model :** huggingfaceh4/zephyr-7b-alpha
 * **UI Framework** : Streamlit & Gradio
 
+## CI/CD:
+
+* **Streamlit.io**
+  * Committing any changes to the git branch `deploy-to-streamlit` will deploy to streamlit.io.
+* **Hugging Face Spaces**
+  * Committing any changes to the git branch `deploy-to-hf-spaces` will deploy to Hugging Face Spaces as a Docker space.
+
 ## Streamlit.io Deployment:
 
 If you are encountering issues with `sqlite` version, then run the following steps:
 
 * Add the following dependency to `requirements.txt`:
 
-
-
+  `pysqlite3-binary==0.5.2.post3`
 * Add the following block of code to `streamlit_app.py` at the beginning of the file:
 
 ```
@@ -172,6 +188,15 @@ sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
 
 **Note:** If running locally for Streamlit UI interface and if you hit any errors with `pysqlite3`, try removing whatever is mentioned above.
 
+## Enhancements:
+
+* Different advanced retrieval methods could be used.
+* Context re-ranking can be implemented.
+* Latest LLMs could be used for better performance.
+* Could be converted to a `conversational` AI chatbot.
+* Utilize better PDF parsers and experiment with `chunk_size` and `chunk_overlap` properties.
+* Fine-tuning the LLM on the proprietary dataset might improve the results.
+
 ## Contact Me:
 
 For any inquiries or feedback, please contact me at [nikhil.komakula@outlook.com](mailto:nikhil.komakula@outlook.com).
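The `streamlit_app.py` block the README refers to is only partially visible here (its last line, `sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')`, shows up in the hunk header). A minimal sketch of the usual workaround, assuming the standard three-line `pysqlite3` shim; only the last line is confirmed by the diff context:

```python
# Hedged sketch: swap the stdlib sqlite3 module for pysqlite3-binary so ChromaDB
# gets a recent SQLite build on Streamlit Cloud. The first two lines are an
# assumption; only the final line appears in the hunk header above.
__import__("pysqlite3")
import sys

sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
```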
app.py
CHANGED
@@ -10,7 +10,7 @@ def run_streamlit_interface():
     subprocess.run(["streamlit", "run", "streamlit_app.py"])
 
 def run_rag_evaluate():
-    subprocess.run(["python", "eval.py"])
+    subprocess.run(["python", "eval_app.py"])
 
 # Main function to determine which interface to run
 def main():
eval.py → eval_app.py
RENAMED
@@ -3,7 +3,7 @@ from dotenv import find_dotenv, load_dotenv
 
 # import functions
 from src.test.eval_rag import evaluate_rag
-from src.generation.generate_response import 
+from src.generation.generate_response import get_qa_chain_eval
 
 def main():
 
@@ -12,7 +12,7 @@ def main():
     load_dotenv(find_dotenv())
 
     # get the qa chain
-    qa_chain = 
+    qa_chain = get_qa_chain_eval()
 
     # evaluate the qa chain
     evaluate_rag("qa_chain", qa_chain)
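Pieced together from the two hunks above, the renamed evaluation entry point presumably reads roughly as follows; everything outside the shown hunks (notably the `if __name__` guard) is an assumption, not the verbatim file:

```python
# eval_app.py -- hedged reconstruction from the hunks above.
from dotenv import find_dotenv, load_dotenv

# import functions
from src.test.eval_rag import evaluate_rag
from src.generation.generate_response import get_qa_chain_eval

def main():
    # load environment variables (e.g. the Hugging Face API token)
    load_dotenv(find_dotenv())

    # get the qa chain
    qa_chain = get_qa_chain_eval()

    # evaluate the qa chain
    evaluate_rag("qa_chain", qa_chain)

if __name__ == "__main__":
    main()  # assumed entry point; app.py invokes this file via `python eval_app.py`
```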
requirements.txt
CHANGED
@@ -9,3 +9,4 @@ tensorflow==2.16.1
 gradio==4.21.0
 pandas==2.2.1
 streamlit==1.32.2
+deepeval==0.21.21
src/generation/generate_response.py
CHANGED
@@ -1,7 +1,9 @@
 # import libraries
 import time
 from typing import Optional
-
+
+# import functions
+from src.retrieval.retriever_chain import get_base_retriever, load_hf_llm, create_qa_chain, create_qa_chain_eval
 
 # constants
 HF_MODEL = "huggingfaceh4/zephyr-7b-alpha" # "mistralai/Mistral-7B-Instruct-v0.2" # "google/gemma-7b"
@@ -138,4 +140,24 @@ def has_global_variable():
     if 'global_qa_chain' in globals():
         return True
 
-    return False
+    return False
+
+# get the qa chain for evaluation
+def get_qa_chain_eval():
+    """
+    Instantiates QA Chain for evaluation.
+
+    Returns:
+        Runnable: Returns an instance of QA Chain.
+    """
+
+    # get retriever
+    retriever = get_base_retriever(embedding_model=EMBEDDING_MODEL, k=4, search_type="mmr")
+
+    # instantiate llm
+    llm = load_hf_llm(repo_id=HF_MODEL, max_new_tokens=512, temperature=0.4)
+
+    # instantiate qa chain
+    qa_chain = create_qa_chain_eval(retriever, llm)
+
+    return qa_chain
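For orientation, `get_qa_chain_eval()` wires the same base retriever and Zephyr endpoint as the regular chain but builds the evaluation variant from `create_qa_chain_eval` (next file). Based on how `eval_rag.py` consumes it later in this commit, its output is a dict rather than a bare string; a hedged usage sketch, with an illustrative question:

```python
# Hedged usage sketch; the question and exact dict values are illustrative.
from src.generation.generate_response import get_qa_chain_eval

qa_chain = get_qa_chain_eval()
response = qa_chain.invoke("What is FastMap?")

# eval_rag.py reads these three keys from the response:
print(response["query"])    # the original question
print(response["context"])  # the retrieved Document objects
print(response["result"])   # the generated answer
```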
src/retrieval/retriever_chain.py
CHANGED
@@ -1,9 +1,9 @@
 # import libraries
-import os
 from langchain_core.prompts import ChatPromptTemplate
 from langchain_community.llms import HuggingFaceEndpoint
 from langchain_core.runnables import RunnablePassthrough
 from langchain_core.output_parsers import StrOutputParser
+from langchain_core.runnables import RunnableParallel
 
 # import functions
 from ..indexing.build_indexes import retrieve_indexes
@@ -90,7 +90,7 @@ def create_qa_chain(retriever, llm):
     Returns:
         Runnable: Returns qa chain.
     """
-
+
     def format_docs(docs):
         return "\n\n".join(doc.page_content for doc in docs)
 
@@ -101,3 +101,33 @@ def create_qa_chain(retriever, llm):
         | StrOutputParser()
     )
     return qa_chain
+
+
+# define retrieval chain for evaluation
+def create_qa_chain_eval(retriever, llm):
+    """
+    Instantiates qa chain for evaluation.
+
+    Args:
+        retriever (VectorStoreRetriever): Vector store.
+        llm (HuggingFaceEndpoint): HuggingFace endpoint.
+
+    Returns:
+        Runnable: Returns qa chain.
+    """
+
+    def format_docs(docs):
+        return "\n\n".join(doc.page_content for doc in docs)
+
+    rag_chain_from_docs = (
+        RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
+        | create_prompt_template()
+        | llm
+        | StrOutputParser()
+    )
+
+    rag_chain_with_source = RunnableParallel(
+        {"context": retriever, "query": RunnablePassthrough()}
+    ).assign(result=rag_chain_from_docs)
+
+    return rag_chain_with_source
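The evaluation chain differs from `create_qa_chain` only in that it keeps the retrieved documents and the query alongside the answer, using LangChain's `RunnableParallel(...).assign(...)` pattern. A self-contained toy sketch of that pattern, with stand-in lambdas instead of the real retriever, prompt, and LLM (the stand-ins are assumptions for illustration only):

```python
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

# Stand-ins for the real components (hypothetical, illustration only).
fake_retriever = RunnableLambda(lambda q: [f"doc about {q}"])
fake_generator = RunnableLambda(lambda x: f"answer based on {x['context']}")

# Fan out the input: run retrieval and pass the raw query through in parallel,
# then attach the generator's output under the "result" key.
chain = RunnableParallel(
    {"context": fake_retriever, "query": RunnablePassthrough()}
).assign(result=fake_generator)

print(chain.invoke("FastMap"))
# {'context': ['doc about FastMap'], 'query': 'FastMap', 'result': "answer based on ['doc about FastMap']"}
```

This output shape (query, context, and result in one dict) is what lets the evaluation code score the retriever and the generator separately.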
src/test/eval_custom_model.py
ADDED
@@ -0,0 +1,100 @@
+# import functions
+from deepeval.models.base_model import DeepEvalBaseLLM
+from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric, ContextualRelevancyMetric, HallucinationMetric, BiasMetric, ToxicityMetric
+from deepeval.test_case import LLMTestCase
+
+class LLM(DeepEvalBaseLLM):
+    def __init__(
+        self,
+        model,
+        model_name
+    ):
+        self.model = model
+        self.model_name = model_name
+
+    def load_model(self):
+        return self.model
+
+    def generate(self, prompt: str) -> str:
+        model = self.load_model()
+        return model(prompt)
+
+    async def a_generate(self, prompt: str) -> str:
+        return self.generate(prompt)
+
+    def get_model_name(self):
+        return self.model_name
+
+def eval_answer_relevancy_metric(llm: LLM, question: str, answer: str, context: list):
+    answer_relevancy_metric = AnswerRelevancyMetric(model=llm, threshold=0.5, include_reason=True)
+    test_case = LLMTestCase(
+        input=question,
+        actual_output=answer,
+        retrieval_context=context
+    )
+
+    answer_relevancy_metric.measure(test_case)
+    return answer_relevancy_metric.score
+
+def eval_faithfulness_metric(llm: LLM, question: str, answer: str, context: list):
+    faithfulness_metric = FaithfulnessMetric(model=llm, threshold=0.5, include_reason=True)
+    test_case = LLMTestCase(
+        input=question,
+        actual_output=answer,
+        retrieval_context=context
+    )
+
+    faithfulness_metric.measure(test_case)
+    return faithfulness_metric.score
+
+def eval_contextual_relevancy_metric(llm: LLM, question: str, answer: str, context: list):
+    contextual_relevancy_metric = ContextualRelevancyMetric(model=llm, threshold=0.5, include_reason=False)
+    test_case = LLMTestCase(
+        input=question,
+        actual_output=answer,
+        retrieval_context=context
+    )
+
+    contextual_relevancy_metric.measure(test_case)
+    return contextual_relevancy_metric.score
+
+def eval_hallucination_metric(llm: LLM, question: str, answer: str, context: list):
+    hallucination_metric = HallucinationMetric(model=llm, threshold=0.5, include_reason=True)
+    test_case = LLMTestCase(
+        input=question,
+        actual_output=answer,
+        context=context
+    )
+
+    hallucination_metric.measure(test_case)
+    return hallucination_metric.score
+
+def eval_bias_metric(llm: LLM, question: str, answer: str):
+    bias_metric = BiasMetric(model=llm, threshold=0.5, include_reason=True)
+    test_case = LLMTestCase(
+        input=question,
+        actual_output=answer
+    )
+
+    bias_metric.measure(test_case)
+    return bias_metric.score
+
+def eval_toxicity_metric(llm: LLM, question: str, answer: str):
+    toxicity_metric = ToxicityMetric(model=llm, threshold=0.5, include_reason=True)
+    test_case = LLMTestCase(
+        input=question,
+        actual_output=answer
+    )
+
+    toxicity_metric.measure(test_case)
+    return toxicity_metric.score
+
+def eval_rag_metrics(llm: LLM, question: str, answer: str, context: list) -> dict:
+    return {
+        "AnswerRelevancyMetric": eval_answer_relevancy_metric(llm, question, answer, context),
+        "FaithfulnessMetric": eval_faithfulness_metric(llm, question, answer, context),
+        "ContextualRelevancyMetric": eval_contextual_relevancy_metric(llm, question, answer, context),
+        # "HallucinationMetric": eval_hallucination_metric(llm, question, answer, context),
+        # "BiasMetric": eval_bias_metric(llm, question, answer),
+        # "ToxicityMetric": eval_toxicity_metric(llm, question, answer),
+    }
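A hedged usage sketch of the wrapper above, mirroring how `eval_rag.py` (next file) wires it up; the question, answer, and context strings are illustrative, and DeepEval will call the wrapped judge model itself to score each metric:

```python
from src.retrieval.retriever_chain import load_hf_llm
from src.test.eval_custom_model import LLM, eval_rag_metrics

# Wrap a Hugging Face endpoint as the DeepEval judge model
# (repo id and display name mirror the EVAL_LLM constants added in eval_rag.py).
hf_eval_llm = load_hf_llm(repo_id="mistralai/Mistral-7B-v0.1", max_new_tokens=512, temperature=0.4)
judge = LLM(model=hf_eval_llm, model_name="Mistral 7B")

scores = eval_rag_metrics(
    judge,
    question="What is FastMap?",
    answer="FastMap automates importing object data into OpenPages.",
    context=["FastMap is a productivity tool that automates the process of importing data ..."],
)
print(scores)  # {'AnswerRelevancyMetric': ..., 'FaithfulnessMetric': ..., 'ContextualRelevancyMetric': ...}
```

Note that, as defined above, only the first three metrics are active; the hallucination, bias, and toxicity calls are commented out inside `eval_rag_metrics`.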
src/test/eval_rag.py
CHANGED
@@ -4,12 +4,31 @@ import time
 import datetime
 import pandas as pd
 
+# import functions
+from src.test.eval_custom_model import LLM, eval_rag_metrics
+from src.retrieval.retriever_chain import load_hf_llm
+
 # constants
+EVAL_LLM = "mistralai/Mistral-7B-v0.1" # "mistralai/Mistral-7B-Instruct-v0.2"
+EVAL_LLM_NAME = "Mistral 7B"
 EVAL_FILE_PATH = "./src/test/eval_questions.txt"
 EVAL_RESULTS_FILE_NAME = "eval_results_{0}.csv"
 EVAL_RESULTS_PATH = "./src/test"
 
 
+# format context documents as list
+def format_docs_as_list(docs):
+    """
+    Converts context documents as list
+
+    Args:
+        docs (list): List of Document objects
+
+    Returns:
+        list: Returns list of documents.
+    """
+    return [doc.page_content for doc in docs]
+
 # load eval questions
 def load_eval_questions():
     """
@@ -42,25 +61,40 @@ def evaluate_rag(chain_name, rag_chain):
     columns = ["Chain", "Question", "Response", "Time"]
     df = pd.DataFrame(columns=columns)
 
+    # load evaluation questions
     eval_questions = load_eval_questions()
+
+    # instantiate hf llm
+    hf_eval_llm = load_hf_llm(repo_id=EVAL_LLM, max_new_tokens=512, temperature=0.4)
+
+    # instantiate deepeval llm
+    eval_custom_model = LLM(model_name=EVAL_LLM_NAME, model=hf_eval_llm)
 
     for question in eval_questions:
 
         start_time = time.time()
-
+        response = rag_chain.invoke(question)
+        print("Response", response)
         end_time = time.time()
+
+        query = response['query']
+        answer = response['result']
+        context = format_docs_as_list(response['context'])
+        metrics = eval_rag_metrics(eval_custom_model, question, answer, context)
 
         row = {
             "Chain": chain_name,
-            "Question": 
+            "Question": query,
             "Response": answer,
             "Time": "{:.2f}".format(round(end_time - start_time, 2)),
+            "Metrics": metrics
         }
 
         print("*" * 100)
         print("Question:", question)
         print("Answer:", answer)
         print("Response Time:", "{:.2f}".format(round(end_time - start_time, 2)))
+        print("Metrics:", metrics)
         print("*" * 100)
 
         df = pd.concat([df, pd.DataFrame.from_records([row])])
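The per-question rows are written out using the `EVAL_RESULTS_FILE_NAME` template; judging by the CSV deleted below (`eval_results_20240325111437.csv`), the placeholder is presumably filled with a `YYYYMMDDHHMMSS` timestamp, roughly as in this sketch (the exact `strftime` format is an assumption, not shown in the hunks):

```python
import datetime

EVAL_RESULTS_FILE_NAME = "eval_results_{0}.csv"  # constant from eval_rag.py

# Assumed timestamp format, inferred from the deleted results file name.
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
print(EVAL_RESULTS_FILE_NAME.format(timestamp))  # e.g. eval_results_20240325111437.csv
```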
src/test/eval_results_20240325111437.csv
DELETED
@@ -1,86 +0,0 @@
-Chain,Question,Response,Time
-qa_chain,What is FastMap?," FastMap is a productivity tool that integrates with IBM OpenPages with Watson's export feature and automates the process of importing and batch processing object data into OpenPages with Watson. It uses a data load template, typically in Microsoft Excel format, to capture data for import. When importing data into OpenPages with Watson, FastMap validates the data and, if no errors are found, populates the repository with the new or updated records. It supports the import of various types of objects, but not File and Signature objects or the Comment field's system field. Some general rules apply when entering object data into a FastMap data load template, such as associating child objects using specific columns and not importing certain fields like read-only fields or those not in the Admin view for the object. Additionally, FastMap import does not evaluate field dependency rules, allowing users to stage data requiring them to enter required data during subsequent updates. Overall, FastMap helps streamline the process of importing large amounts of data into OpenPages with Watson, reducing manual input and increasing efficiency.",4.01
-qa_chain,What is a Role Template?," A Role Template in IBM OpenPages with Watson solutions is a predefined set of permissions and access controls that determine the level of access and functionality granted to a user in the system. It includes application permissions, feature access, and Object Access Controls (OACs) for specific object types. Role Templates help simplify the process of managing user access and ensure consistent security policies across the organization. By assigning a Role Template to a user, you can quickly and easily grant them the required permissions and access levels for their job function.",1.85
-qa_chain,What is the purpose of Object Reset?," The purpose of Object Reset in IBM OpenPages with Watson is to allow users to efficiently update multiple objects at once based on predefined rules. This feature helps to simplify repetitive data entry tasks and reduce errors associated with manually updating large numbers of objects. It also provides flexibility in terms of what objects are affected by the reset, as users can choose specific business entities and define rulesets to control the behavior of the reset. Overall, Object Reset helps to streamline data management processes and improve efficiency in IBM OpenPages with Watson.",2.56
-qa_chain,What is the purpose of Reporting Periods?," Reporting periods in IBM OpenPages with Watson serve as ""snapshots"" of the current state of the repository during specific points in time, typically at the end of a quarter or year. These periods allow organizations to archive and report on historical data while maintaining the ability to view and analyze current data. Reporting periods facilitate compliance and regulatory requirements, provide insights into trends and patterns, and enable comparisons between different periods. Additionally, they can be used to automate object resets, which modify object properties based on predefined rulesets. Overall, reporting periods help organizations manage and analyze their data more efficiently and effectively.",2.05
-qa_chain,List the system variables used in Expressions.,"1. $APPLICATION_URL$: A URL for OpenPages.
-2. $COGNOS_URL$: A URL for IBM Cognos Analytics.
-3. $ApplicationText/application text key$): Application text content.
-4. $TASK_VIEW_URL$): A URL to an object task view.
-5. $System Fields:Task View URL$): Deprecated. Use [$TASK_VIEW_URL$].
-6. $Setting/OpenPages/...$): A registry setting value.
-7. $END_USER$): The user name of the signed on user.
-8. $TODAY$): Today's date.
-9. $DaysFromNow/field group:field name$): A day count from today to a given date.
-10. ${asset.id}$: Inserts the identifier of the underlying asset.
-11. ${asset.name}$: Inserts the asset name of the underlying asset.
-12. ${asset.description}$: Inserts the asset description of the underlying asset.",2.46
-qa_chain,Provide the steps to configure Watson Assistant in OpenPages?," Unfortunately, I don't have access to your specific environment or installation details. However, here are the general steps to configure Watson Assistant in OpenPages:
-
-1. Ensure that you have a Watson Assistant service instance created and configured with the necessary intents and entities.
-
-2. In OpenPages, go to the Administration Console > Integrations > Watson Assistant.
-
-3. Click on the ""Add"" button to create a new Watson Assistant integration.
-
-4. Enter a name and description for the integration and select the appropriate Watson Assistant service instance from the list.
-
-5. Configure the authentication settings by providing the API key and URL of your Watson Assistant service.
-
-6. Select the appropriate language and version of Watson Assistant.
-
-7. Map the Watson Assistant intents and entities to the corresponding OpenPages business objects and fields.
-
-8. Save the configuration and test the integration by running a sample request through the Watson Assistant REST API.
-
-9. Verify that the response is correctly mapped to the appropriate OpenPages business object and field values.
-
-10. Deploy the integration to make it available to users in the OpenPages application.
-
-Note: These steps may vary depending on your specific implementation and version of OpenPages and Watson Assistant. It's recommended to refer to the official documentation for detailed instructions and best practices.",1.94
-qa_chain,What is the difference between PRE and POST position in Triggers?,"
-In Triggers, the position ""PRE"" refers to events that occur prior to the actual execution of an operation by the system. These triggers allow for additional processing of business logic before the operation is carried out. For example, during the creation of a GRC object, a PRE trigger has access to all the necessary information about the object to be created, but the system has not yet taken action to create the object and persist its values. PRE triggers are mandatory for deletions, associations, and disassociations.
-
-On the other hand, the position ""POST"" refers to events that occur after an operation has been performed by the system but before the transaction is committed. These triggers provide further processing of additional business logic after the operation has been completed. POST triggers are mandatory for creating and updating operations.
-
-In summary, PRE triggers execute before the operation takes place, while POST triggers execute after the operation has been completed.",2.56
-qa_chain,What are the features of Operational Risk Management in OpenPages?,"1. Loss Events: This feature allows organizations to track, assess, and manage both internal and external events that could lead to operational losses. Multiple impact events and recoveries associated with operational losses can also be managed through this feature.
-
-2. Risk and Control Self Assessments (RCSA): This feature helps in identifying, measuring, and mitigating risks, as well as testing and documenting internal controls.
-
-3. Key Risk Indicators (KRIs) and Key Performance Indicators (KPIs): These features enable tracking of performance metrics that may indicate the presence or state of a risk condition or trend.
-
-4. Scenario Analysis: This feature is used to identify and measure specific types of risks, particularly low-frequency, high-severity events.
-
-5. External Loss Events: This feature allows for the import of loss data from various sources such as IBM FIRST Risk Case Studies, ORX, and ORIC loss databases for scenario analysis, benchmarking, and report generation.
-
-6. Issue Management and Remediation (IMR): This feature includes issue creation and assignment, action creation and assignment, remediation performance, issue closure, and reporting.
-
-7. Reporting, Monitoring, and Analytics: This feature provides reporting, monitoring, and analytics capabilities for operational risk management.
-
-Note: The above features are part of IBM OpenPages Operational Risk Management. Other related features like Regulatory Compliance Management, Third Party Risk Management, and IT Governance with RiskLens are also available in IBM OpenPages.",2.56
-qa_chain,What are the different permissions that can be delegated to a user group administrator?," There are six security management permissions that can be delegated to a user group administrator: Manage, Lock, Unlock, Reset Password, Assign Role, and Browse. These permissions allow the administrator to perform various user-provisioning functions such as creating, modifying, and associating users and groups, locking and unlocking user accounts, resetting passwords, assigning roles to users and groups, and browsing users and groups within their respective groups. The specific permissions required for each user-provisioning function are listed in Table 38 provided in the text material. It's important to note that these permissions should be granted carefully and only to trusted individuals as they have significant impact on the overall security and functionality of the system.",8.57
-qa_chain,What are the different access controls available for non-participants for a standard stage within a workflow?," Table 142 in the text provides the different access controls available for non-participants for a standard stage within a workflow:
-
-Access control for the stage
-Can view the object when it's at this stage
-Can edit the object when it's at this stage
-Can see the Actions button in views
-Strict
-No
-No
-No
-Read
-Yes
-No
-No
-Open
-Depends on standard access controls
-Depends on standard access controls
-No
-No Override
-Depends on standard access controls
-Depends on standard access controls
-Yes
-
-Note: For workflow participants (assignees, oversight users, and subscribers), the standard access controls apply, and they only know their part of the process. Non-participants may require additional access controls based on their roles and responsibilities within the organization. The ""Override"" option allows you to define whether to override these standard access controls for the workflow stage for non-participants.",41.15