Commit 0a6f6d8
Parent(s): d11d425
Base RAG

Files changed:
- .dockerignore +2 -0
- .gitignore +95 -0
- Dockerfile +13 -0
- README.md +122 -2
- app.py +23 -0
- dotenv +1 -0
- images/RAG_workflow.png +0 -0
- notebooks/.gitkeep +0 -0
- notebooks/hf_llm_rag_1.ipynb +0 -0
- requirements.txt +10 -0
- src/__init__.py +0 -0
- src/data/.gitkeep +0 -0
- src/data/__init__.py +0 -0
- src/data/load_dataset.py +37 -0
- src/generation/__init__.py +0 -0
- src/generation/generate_response.py +51 -0
- src/indexing/.gitkeep +0 -0
- src/indexing/__init__.py +0 -0
- src/indexing/build_indexes.py +153 -0
- src/retrieval/.gitkeep +0 -0
- src/retrieval/__init__.py +0 -0
- src/retrieval/retriever_chain.py +99 -0
- src/test/__init__.py +0 -0
- src/test/eval_questions.txt +10 -0
- src/test/eval_rag.py +66 -0
- src/ui/__init__.py +0 -0
- src/ui/chat_interface.py +33 -0
.dockerignore
ADDED
@@ -0,0 +1,2 @@
+# python virtual environment
+.lroc-venv
.gitignore
ADDED
@@ -0,0 +1,95 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# DotEnv configuration
+.env
+
+# Database
+*.db
+*.rdb
+
+# Pycharm
+.idea
+
+# VS Code
+.vscode/
+
+# Spyder
+.spyproject/
+
+# Jupyter NB Checkpoints
+.ipynb_checkpoints/
+
+# exclude data from source control by default
+# /data/
+
+# Mac OS-specific storage files
+.DS_Store
+
+# vim
+*.swp
+*.swo
+
+# Mypy cache
+.mypy_cache/
+
+# Python virtual environment
+.lroc-venv/
+
+# references
+/references/
Dockerfile
ADDED
@@ -0,0 +1,13 @@
+FROM python:3.11.5-slim
+
+WORKDIR /app
+
+COPY app.py requirements.txt /app/
+COPY src /app/src
+COPY indexes /app/indexes
+
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
+
+# Use ENTRYPOINT to specify the command to run when the container starts
+ENTRYPOINT ["python", "app.py"]
README.md
CHANGED
@@ -1,2 +1,122 @@
-
-
+# OpenPages IntelliBot
+
+Welcome to OpenPages IntelliBot, your intelligent and efficient chatbot powered by the state-of-the-art Retrieval-Augmented Generation (RAG) technique and a Large Language Model (LLM).
+
+## What is OpenPages IntelliBot?
+
+OpenPages IntelliBot leverages cutting-edge AI technologies to provide you with instant and accurate responses about OpenPages, its features, the solutions/modules it offers, and its trigger framework. By combining the power of RAG and the Zephyr LLM, OpenPages IntelliBot ensures that you receive contextually relevant information.
+
+## How Does RAG Work?
+
+![RAG Workflow](./images/RAG_workflow.png)
+
+[Image Credit](https://huggingface.co/learn/cookbook/en/rag_evaluation)
+
+#### Step 1: Data Collection
+
+Gather all the data needed for your application. In the case of OpenPages IntelliBot, this includes the administrators guide, the solutions/modules offerings, the users guide, and the trigger developer guide.
+
+#### Step 2: Data Chunking
+
+Data chunking is the process of breaking your data down into smaller, more manageable pieces. For instance, if you have a lengthy 100-page user manual, you might break it down into different sections, each potentially answering different customer questions.
+
+This way, each chunk of data is focused on a specific topic. When a piece of information is retrieved from the source dataset, it is more likely to be directly applicable to the user's query, since we avoid including irrelevant information from entire documents.
+
+This also improves efficiency, since the system can quickly obtain the most relevant pieces of information instead of processing entire documents.
+
+#### Step 3: Document Embeddings
+
+Now that the source data has been broken down into smaller parts, it needs to be converted into a vector representation. This involves transforming text data into embeddings, which are numeric representations that capture the semantic meaning behind text.
+
+In simple words, document embeddings allow the system to understand user queries and match them with relevant information in the source dataset based on the meaning of the text, instead of a simple word-to-word comparison. This method ensures that the responses are relevant and aligned with the user's query.
+
+#### Step 4: Data Retrieval
+
+When a user query enters the system, it must also be converted into an embedding or vector representation. The same model must be used for both the document and query embeddings to ensure uniformity between the two.
+
+Once the query is converted into an embedding, the system compares the query embedding with the document embeddings. It identifies and retrieves chunks whose embeddings are most similar to the query embedding, using measures such as cosine similarity.
+
+These chunks are considered to be the most relevant to the user's query.
+
+#### Step 5: Response Generation
+
+The retrieved text chunks, along with the initial user query, are fed into a language model. The model uses this information to generate a coherent response to the user's question through a chat interface.
+
+## Sources used for OpenPages IntelliBot:
+
+OpenPages IntelliBot can answer questions related to:
+
+- OpenPages Administration
+- OpenPages Solutions or Modules
+- OpenPages Trigger Development
+
+## How to Use OpenPages IntelliBot?
+
+1. Simply type your query or question into the chat interface.
+2. OpenPages IntelliBot will process your query using the RAG model and provide you with a contextually relevant response.
+
+## Get Started to Run Locally:
+
+**Step 1:** Download the Git repository
+
+**Step 2:** Install dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+**Step 3:** Rename the `dotenv` file to `.env` and set `HUGGINGFACEHUB_API_TOKEN` to your API token.
+
+**Step 4:** Run the application
+
+```bash
+python app.py
+```
+
+## Build and Run Container Locally:
+
+**Step 1:** Build the image (replace `<docker_id>` with your Docker ID)
+
+```bash
+docker build --tag <docker_id>/llm-rag-op-chatbot .
+```
+
+**Step 2:** Run the container (replace `<docker_id>` with your Docker ID and `<api_token>` with your Hugging Face API token)
+
+```bash
+docker run -it -d --name llm-rag-op-chatbot -p 5555:5555 -e HUGGINGFACEHUB_API_TOKEN=<api_token> <docker_id>/llm-rag-op-chatbot:latest
+```
+
+**Note 1:** List all containers
+
+```bash
+docker ps -a
+```
+
+**Note 2:** Review the logs
+
+```bash
+docker logs -f llm-rag-op-chatbot
+```
+
+## Technologies Used:
+
+* **PDF Parser:** PyMuPDFLoader
+* **Vector Database:** ChromaDB
+* **Orchestration Framework:** LangChain
+* **Embedding Model:** BAAI/bge-large-en-v1.5
+* **Large Language Model:** huggingfaceh4/zephyr-7b-beta
+
+## Contact Me:
+
+For any inquiries or feedback, please contact me at [nikhil.komakula@outlook.com](mailto:nikhil.komakula@outlook.com).
+
+## License:
+
+This project is licensed under the [MIT License](https://opensource.org/licenses/MIT) - see the [LICENSE](LICENSE) file for details.
+
+---
+
+**Note:** OpenPages IntelliBot is for demonstration purposes only and may not provide accurate information in all scenarios. Always verify critical information from reliable sources.
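The embedding and retrieval steps above (Steps 3 and 4) can be illustrated with a minimal sketch that uses the same `sentence-transformers` package pinned in `requirements.txt`; the chunks and query below are made-up placeholders, not content from the OpenPages guides.

```python
# Sketch of Steps 3-4: embed chunks and a query with the same model,
# then rank chunks by cosine similarity (dot product of normalized vectors).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # embedding model used by this repo

chunks = [
    "FastMap lets administrators import object data from an Excel workbook.",      # placeholder text
    "A trigger runs custom code before (PRE) or after (POST) an operation.",       # placeholder text
    "Reporting periods capture a snapshot of the repository at a point in time.",  # placeholder text
]
query = "What does FastMap do?"

doc_vecs = model.encode(chunks, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec  # cosine similarities
best = int(np.argmax(scores))
print(f"Best chunk (score={scores[best]:.3f}): {chunks[best]}")
```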
app.py
ADDED
@@ -0,0 +1,23 @@
+# import libraries
+import sys
+from dotenv import find_dotenv, load_dotenv
+
+# import functions
+from src.test.eval_rag import evaluate_rag
+from src.ui.chat_interface import create_chatinterface
+from src.generation.generate_response import get_qa_chain, set_global_qa_chain, generate_response
+
+# find .env automatically by walking up directories until it's found, then
+# load up the .env entries as environment variables
+load_dotenv(find_dotenv())
+
+if __name__ == "__main__":
+
+    # get the qa chain
+    qa_chain = get_qa_chain()
+
+    if len(sys.argv) > 1:
+        evaluate_rag("qa_chain", qa_chain)
+    else:
+        set_global_qa_chain(qa_chain)
+        create_chatinterface(generate_response).launch(server_name="0.0.0.0", server_port=5555)
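A usage note on the two code paths above: any extra command-line argument switches `app.py` into evaluation mode, otherwise the Gradio UI is launched. The same flow can be driven from a Python session; a minimal sketch, assuming a valid `HUGGINGFACEHUB_API_TOKEN` in `.env` and an `indexes/` directory (or PDFs to build it from):

```python
# Sketch: exercise both modes of app.py without the command line.
from dotenv import find_dotenv, load_dotenv
from src.generation.generate_response import get_qa_chain, set_global_qa_chain, generate_response
from src.test.eval_rag import evaluate_rag

load_dotenv(find_dotenv())
qa_chain = get_qa_chain()

# chat path: answer one question directly instead of launching the UI
set_global_qa_chain(qa_chain)
print(generate_response("What is FastMap?", history=None))

# eval path: run the questions from eval_questions.txt and write a CSV report
evaluate_rag("qa_chain", qa_chain)
```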
dotenv
ADDED
@@ -0,0 +1 @@
+HUGGINGFACEHUB_API_TOKEN="YOUR API KEY GOES HERE"
images/RAG_workflow.png
ADDED
Binary file added.

notebooks/.gitkeep
ADDED
File without changes

notebooks/hf_llm_rag_1.ipynb
ADDED
The diff for this file is too large to render; see the raw diff.
requirements.txt
ADDED
@@ -0,0 +1,10 @@
+# external requirements
+python-dotenv==1.0.1
+chromadb==0.4.24
+langchain==0.1.11
+langchain-community==0.0.27
+pymupdf==1.23.26
+sentence-transformers==2.5.1
+tensorflow==2.16.1
+gradio==4.21.0
+pandas==2.2.1
src/__init__.py
ADDED
File without changes

src/data/.gitkeep
ADDED
File without changes

src/data/__init__.py
ADDED
File without changes

src/data/load_dataset.py
ADDED
@@ -0,0 +1,37 @@
+# import libraries
+import os
+from langchain_community.document_loaders import PyMuPDFLoader
+
+# constants
+DATA_DIR = "../../data/"
+
+
+# load data
+def load_documents():
+    """
+    Loads documents into memory.
+
+    Raises:
+        e: Any exception while loading the documents.
+
+    Returns:
+        list: An array of documents.
+    """
+
+    documents = []
+    try:
+        for root, _, files in os.walk(DATA_DIR):
+            for file in files:
+                if file.endswith(".pdf"):
+                    print(f"Reading File: {file}")
+
+                    # read PDF
+                    loader = PyMuPDFLoader(os.path.join(root, file))
+                    document = loader.load()
+
+                    # append to docs
+                    documents += document
+    except Exception as e:
+        print("Error while loading the data!", e)
+        raise e
+    return documents
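A quick sanity check of the loader, assuming PDF files are present under the directory that `DATA_DIR` resolves to at runtime:

```python
# Sketch: load the PDFs and inspect the Documents PyMuPDFLoader returns
# (one Document per page, with source/page metadata).
from src.data.load_dataset import load_documents

docs = load_documents()
print(f"Loaded {len(docs)} pages")
if docs:
    first = docs[0]
    print(first.metadata.get("source"), first.metadata.get("page"))
    print(first.page_content[:200])
```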
src/generation/__init__.py
ADDED
File without changes

src/generation/generate_response.py
ADDED
@@ -0,0 +1,51 @@
+# import libraries
+from src.retrieval.retriever_chain import get_base_retriever, load_hf_llm, create_qa_chain
+
+# constants
+HF_MODEL = "huggingfaceh4/zephyr-7b-beta"  # "mistralai/Mistral-7B-Instruct-v0.2" # "google/gemma-7b"
+
+
+# get the qa chain
+def get_qa_chain():
+    """
+    Instantiates QA Chain.
+
+    Returns:
+        Runnable: Returns an instance of QA Chain.
+    """
+
+    # get retriever
+    retriever = get_base_retriever(k=4, search_type="mmr")
+
+    # instantiate llm
+    llm = load_hf_llm(repo_id=HF_MODEL, max_new_tokens=512, temperature=0.4)
+
+    # instantiate qa chain
+    qa_chain = create_qa_chain(retriever, llm)
+
+    return qa_chain
+
+
+def set_global_qa_chain(local_qa_chain):
+    global global_qa_chain
+    global_qa_chain = local_qa_chain
+
+
+# function to generate response
+def generate_response(message, history):
+    """
+    Generates response based on the question being asked.
+
+    Args:
+        message (str): Question asked by the user.
+        history (dict): Chat history. NOT USED FOR NOW.
+
+    Returns:
+        str: Returns the generated response.
+    """
+
+    # invoke chain
+    response = global_qa_chain.invoke(message)
+    print(response)
+
+    return response
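The commented-out repo IDs next to `HF_MODEL` suggest the endpoint model is meant to be swappable. A minimal sketch of building a chain against one of those alternatives with the same helper functions; the model choice here is only an example:

```python
# Sketch: build a QA chain against an alternative Hugging Face endpoint.
from src.retrieval.retriever_chain import get_base_retriever, load_hf_llm, create_qa_chain

retriever = get_base_retriever(k=4, search_type="mmr")
llm = load_hf_llm(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",  # example alternative from the HF_MODEL comment
    max_new_tokens=512,
    temperature=0.4,
)
qa_chain = create_qa_chain(retriever, llm)

print(qa_chain.invoke("What is a Role Template?"))
```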
src/indexing/.gitkeep
ADDED
File without changes

src/indexing/__init__.py
ADDED
File without changes

src/indexing/build_indexes.py
ADDED
@@ -0,0 +1,153 @@
+# import libraries
+import os
+from typing import List, Optional
+from transformers import AutoTokenizer
+from langchain_community.vectorstores import Chroma
+from sentence_transformers import SentenceTransformer
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.embeddings import HuggingFaceBgeEmbeddings
+from langchain.docstore.document import Document
+
+# import functions
+from ..data.load_dataset import load_documents
+
+# constants
+INDEX_DIR = "indexes/"
+EMBEDDING_MODEL = "BAAI/bge-large-en-v1.5"
+
+
+# instantiate embedding model
+def load_embedding_model():
+    """
+    Load the embedding model.
+
+    Returns:
+        HuggingFaceBgeEmbeddings: Returns the embedding model.
+    """
+
+    # check if GPU is available
+    import tensorflow as tf
+
+    device = "cuda" if tf.test.gpu_device_name() else "cpu"
+    print("device:", device)
+
+    hf_bge_embeddings = HuggingFaceBgeEmbeddings(
+        model_name=EMBEDDING_MODEL,
+        model_kwargs={"device": device},
+        encode_kwargs={
+            "normalize_embeddings": True
+        },  # set True to compute cosine similarity
+    )
+
+    # To get the value of the max sequence_length, we will query the underlying `SentenceTransformer` object used in the RecursiveCharacterTextSplitter.
+    print(
+        f"Model's maximum sequence length: {SentenceTransformer(EMBEDDING_MODEL).max_seq_length}"
+    )
+
+    return hf_bge_embeddings
+
+
+# split documents
+def chunk_documents(
+    chunk_size: int,
+    knowledge_base: List[Document],
+    tokenizer_name: Optional[str] = EMBEDDING_MODEL,
+) -> List[Document]:
+    """
+    Split documents into chunks of maximum size `chunk_size` tokens and return a list of documents.
+
+    Args:
+        chunk_size (int): Chunk size.
+        knowledge_base (List[Document]): Loaded documents.
+        tokenizer_name (Optional[str], optional): Embedding Model name. Defaults to EMBEDDING_MODEL.
+
+    Returns:
+        List[Document]: Returns chunked documents.
+    """
+
+    text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
+        AutoTokenizer.from_pretrained(tokenizer_name),
+        chunk_size=chunk_size,
+        chunk_overlap=int(chunk_size / 10),
+        add_start_index=True,
+        strip_whitespace=True,
+        separators=["\n\n", "\n", ".", ""],
+    )
+
+    docs_processed = []
+    for doc in knowledge_base:
+        docs_processed += text_splitter.split_documents([doc])
+
+    # Remove duplicates
+    unique_texts = {}
+    docs_processed_unique = []
+    for doc in docs_processed:
+        if doc.page_content not in unique_texts:
+            unique_texts[doc.page_content] = True
+            docs_processed_unique.append(doc)
+
+    return docs_processed_unique
+
+
+# generate indexes
+def generate_indexes():
+    """
+    Generates indexes.
+
+    Returns:
+        ChromaCollection: Returns vector store.
+    """
+
+    # load documents
+    documents = load_documents()
+
+    # chunk documents to honor the context length
+    chunked_documents = chunk_documents(
+        SentenceTransformer(
+            EMBEDDING_MODEL
+        ).max_seq_length,  # We choose a chunk size adapted to our model
+        documents,
+        tokenizer_name=EMBEDDING_MODEL,
+    )
+
+    # save indexes to disk
+    vector_store = Chroma.from_documents(
+        documents=chunked_documents,
+        embedding=load_embedding_model(),
+        collection_metadata={"hnsw:space": "cosine"},
+        persist_directory=INDEX_DIR,
+    )
+
+    return vector_store
+
+
+# load indexes from disk
+def load_indexes():
+    """
+    Loads indexes into memory.
+
+    Returns:
+        ChromaCollection: Returns vector store.
+    """
+
+    vector_store = Chroma(
+        persist_directory=INDEX_DIR, embedding_function=load_embedding_model()
+    )
+    return vector_store
+
+
+# retrieve vector store
+def retrieve_indexes():
+    """
+    Retrieves indexes.
+
+    Returns:
+        ChromaCollection: Returns vector store.
+    """
+
+    if [f for f in os.listdir(INDEX_DIR) if not f.startswith(".")] == []:
+        print("Generating indexes...")
+        return generate_indexes()
+    else:
+        print("Loading existing indexes!")
+        return load_indexes()
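To see what lands in the Chroma store independently of the LLM, a small sketch that loads (or builds, on first run) the indexes and runs a plain similarity search:

```python
# Sketch: query the persisted Chroma index directly.
from src.indexing.build_indexes import retrieve_indexes

vector_store = retrieve_indexes()  # builds the index on first run, loads it afterwards
hits = vector_store.similarity_search("How do reporting periods work?", k=3)
for doc in hits:
    print(doc.metadata.get("source"), "-", doc.page_content[:120])
```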
src/retrieval/.gitkeep
ADDED
File without changes

src/retrieval/__init__.py
ADDED
File without changes

src/retrieval/retriever_chain.py
ADDED
@@ -0,0 +1,99 @@
+# import libraries
+import os
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_community.llms import HuggingFaceEndpoint
+from langchain_core.runnables import RunnablePassthrough
+from langchain_core.output_parsers import StrOutputParser
+
+# import functions
+from ..indexing.build_indexes import retrieve_indexes
+
+
+# instantiate base retriever
+def get_base_retriever(k=4, search_type="mmr"):
+    """
+    Instantiates base retriever.
+
+    Args:
+        k (int, optional): Top k results to retrieve. Defaults to 4.
+        search_type (str, optional): Search type (mmr or similarity). Defaults to 'mmr'.
+
+    Returns:
+        VectorStoreRetriever: Returns base retriever.
+    """
+
+    # get the vector store of indexes
+    vector_store = retrieve_indexes()
+
+    base_retriever = vector_store.as_retriever(
+        search_type=search_type, search_kwargs={"k": k}
+    )
+
+    return base_retriever
+
+
+# define prompt template
+def create_prompt_template():
+    """
+    Creates prompt template.
+
+    Returns:
+        PromptTemplate: Returns prompt template.
+    """
+    prompt_template = """
+    <|system|>
+    You are an AI assistant for question-answering tasks. Use the provided context to answer the question. If you don't know the answer, just say that you don't know. The generated answer should be relevant to the question being asked, short and concise. Do not be creative and do not make up the answer.</s>
+    {context}</s>
+    <|user|>
+    {query}</s>
+    <|assistant|>
+    """
+    chat_prompt_template = ChatPromptTemplate.from_template(prompt_template)
+    return chat_prompt_template
+
+
+# define llm
+def load_hf_llm(repo_id, max_new_tokens=512, temperature=0.2):
+    """
+    Loads Hugging Face Endpoint for inference.
+
+    Args:
+        repo_id (str): HuggingFace Model Repo ID.
+        max_new_tokens (int, optional): Maximum number of new tokens to generate. Defaults to 512.
+        temperature (float, optional): Temperature setting. Defaults to 0.2.
+
+    Returns:
+        HuggingFaceEndpoint: Returns HuggingFace Endpoint.
+    """
+
+    hf_llm = HuggingFaceEndpoint(
+        repo_id=repo_id,
+        max_new_tokens=max_new_tokens,
+        temperature=temperature,
+        do_sample=True,
+        repetition_penalty=1.1,
+        return_full_text=False,
+    )
+    return hf_llm
+
+
+# define retrieval chain
+def create_qa_chain(retriever, llm):
+    """
+    Instantiates qa chain.
+
+    Args:
+        retriever (VectorStoreRetriever): Vector store.
+        llm (HuggingFaceEndpoint): HuggingFace endpoint.
+
+    Returns:
+        Runnable: Returns qa chain.
+    """
+
+    qa_chain = (
+        {"context": retriever, "query": RunnablePassthrough()}
+        | create_prompt_template()
+        | llm
+        | StrOutputParser()
+    )
+    return qa_chain
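Since `create_qa_chain` pipes `{context, query}` through the prompt template, the LLM, and a string parser, the exact Zephyr-style prompt the endpoint receives can be previewed by formatting the template on its own; a minimal sketch, with a placeholder context string:

```python
# Sketch: preview the rendered prompt without calling the endpoint.
from src.retrieval.retriever_chain import create_prompt_template

prompt = create_prompt_template()
messages = prompt.format_messages(
    context="FastMap imports object data from an Excel workbook.",  # placeholder for retrieved chunks
    query="What is FastMap?",
)
print(messages[0].content)
```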
src/test/__init__.py
ADDED
File without changes

src/test/eval_questions.txt
ADDED
@@ -0,0 +1,10 @@
+What is FastMap?
+What is a Role Template?
+What is the purpose of Object Reset?
+What is the purpose of Reporting Periods?
+List the system variables used in Expressions.
+Provide the steps to configure Watson Assistant in OpenPages?
+What is the difference between PRE and POST position in Triggers?
+What are the features of Operational Risk Management in OpenPages?
+What are the different permissions that can be delegated to a user group administrator?
+What are the different access controls available for non-participants for a standard stage within a workflow?
src/test/eval_rag.py
ADDED
@@ -0,0 +1,66 @@
+# import libraries
+import os
+import time
+import datetime
+import pandas as pd
+
+# constants
+EVAL_FILE_PATH = "./src/test/eval_questions.txt"
+EVAL_RESULTS_FILE_NAME = "eval_results_{0}.csv"
+EVAL_RESULTS_PATH = "./src/test"
+
+
+# load eval questions
+def load_eval_questions():
+    """
+    Loads eval questions into memory.
+
+    Returns:
+        list: Returns list of questions.
+    """
+
+    eval_questions = []
+    with open(EVAL_FILE_PATH, "r") as file:
+        for line in file:
+            # Remove the newline character
+            item = line.strip()
+            eval_questions.append(item)
+
+    return eval_questions
+
+
+# evaluate rag chain
+def evaluate_rag(chain_name, rag_chain):
+    """
+    Evaluates the rag pipeline based on eval questions.
+
+    Args:
+        chain_name (str): QA Chain name.
+        rag_chain (Runnable): QA Chain instance.
+    """
+
+    columns = ["Chain", "Question", "Response", "Time"]
+    df = pd.DataFrame(columns=columns)
+
+    eval_questions = load_eval_questions()
+
+    for question in eval_questions:
+
+        start_time = time.time()
+        answer = rag_chain.invoke(question)
+        end_time = time.time()
+
+        row = {
+            "Chain": chain_name,
+            "Question": question,
+            "Response": answer,
+            "Time": "{:.2f}".format(round(end_time - start_time, 2)),
+        }
+
+        df = pd.concat([df, pd.DataFrame.from_records([row])])
+
+    CSV = EVAL_RESULTS_FILE_NAME.format(
+        datetime.datetime.now().strftime("%Y%m%d%H%M%S")
+    )
+    print(os.path.join(EVAL_RESULTS_PATH, CSV))
+    df.to_csv(os.path.join(EVAL_RESULTS_PATH, CSV), index=False)
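This is the function `app.py` calls when an extra command-line argument is supplied; it can also be invoked directly once a chain is built:

```python
# Sketch: run the evaluation loop and write a timestamped eval_results_*.csv under ./src/test.
from src.generation.generate_response import get_qa_chain
from src.test.eval_rag import evaluate_rag

evaluate_rag("qa_chain", get_qa_chain())
```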
src/ui/__init__.py
ADDED
File without changes

src/ui/chat_interface.py
ADDED
@@ -0,0 +1,33 @@
+# import libraries
+import gradio as gr
+
+# import functions
+from src.test.eval_rag import load_eval_questions
+
+
+# create chatbot interface
+def create_chatinterface(generate_response):
+    """
+    Instantiates the gradio chat interface.
+
+    Args:
+        generate_response (callable): Function that generates the response.
+
+    Returns:
+        gr.ChatInterface: Returns the gradio ChatInterface instance.
+    """
+
+    chat_interface = gr.ChatInterface(
+        fn=generate_response,
+        textbox=gr.Textbox(
+            placeholder="Type your question here!", container=False, scale=7
+        ),
+        title="OpenPages IntelliBot",
+        description="Ask me about OpenPages (v9.0), its features, solutions / modules it offers and the trigger framework. Authored by Nikhil Komakula (nikhil.komakula@outlook.com).",
+        theme=gr.themes.Default(primary_hue="blue"),
+        examples=load_eval_questions(),
+        cache_examples=False,
+        concurrency_limit=None
+    )
+
+    return chat_interface
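As wired up in `app.py`, the interface is served on port 5555; a minimal launch sketch (`share=True` is an optional Gradio flag for a temporary public link, not something this commit uses):

```python
# Sketch: wire the chat UI to the response function and launch it,
# mirroring the else-branch of app.py.
from src.generation.generate_response import get_qa_chain, set_global_qa_chain, generate_response
from src.ui.chat_interface import create_chatinterface

set_global_qa_chain(get_qa_chain())
demo = create_chatinterface(generate_response)
demo.launch(server_name="0.0.0.0", server_port=5555)  # add share=True for a public link
```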