from typing import Dict, Optional
from dataclasses import dataclass
from haystack.dataclasses import ChatMessage

@dataclass
class DatasetConfig:
    name: str
    split: str = "train"
    content_field: str = "content"
    fields: Optional[Dict[str, str]] = None  # Dictionary of field mappings (None when unset)
    prompt_template: Optional[str] = None

# Default configurations for different datasets
DATASET_CONFIGS = {
    "awesome-chatgpt-prompts": DatasetConfig(
        name="fka/awesome-chatgpt-prompts",
        content_field="prompt",
        fields={
            "role": "act",
            "prompt": "prompt"
        },
        prompt_template="""
        Given the following context where each document represents a prompt for a specific role,
        please answer the question while considering both the role and the prompt content.
        
        Available Contexts:
        {% for document in documents %}
            {% if document.meta.role %}Role: {{ document.meta.role }}{% endif %}
            Content: {{ document.content }}
            ---
        {% endfor %}
        
        Question: {{question}}
        Answer:
        """
    ),
    "settings-dataset": DatasetConfig(
        name="syntaxhacker/rag_pipeline",
        content_field="context",
        fields={
            "question": "question",
            "answer": "answer",
            "context": "context"
        },
        prompt_template="""
        Given the following context about software settings and configurations,
        please answer the question accurately based on the provided information.
        
        For each setting, provide a clear, step-by-step navigation path and include:
        1. The exact location (Origin Type > Tab > Section > Setting name)
        2. What the setting does
        3. Available options/values
        4. How to access and modify the setting
        5. Reference screenshots (if available)
        
        Format your answer as:
        "To [accomplish task], follow these steps:

        Location: [Origin Type] > [Tab] > [Section] > [Setting name]
        Purpose: [describe what the setting does]
        Options: [list available values/options]
        How to set: [describe interaction method: toggle/select/input]
        
        Visual Guide:
        [Include reference image links if available]

        For more details, you can refer to the screenshots above showing the exact location and interface."

        Available Contexts:
        {% for document in documents %}
            Setting Info: {{ document.content }}
            Reference Answer: {{ document.meta.answer }}
            ---
        {% endfor %}

        Question: {{question}}
        Answer:
        """
    ),
    "seven-wonders": DatasetConfig(
        name="bilgeyucel/seven-wonders",
        content_field="content",
        fields={},  # No additional fields needed
        prompt_template="""
        Given the following information about the Seven Wonders, please answer the question.
        
        Context:
        {% for document in documents %}
            {{ document.content }}
        {% endfor %}
        
        Question: {{question}}
        Answer:
        """
    ),
    "psychology-dataset": DatasetConfig(
        name="jkhedri/psychology-dataset",
        split="train",
        content_field="question",  # Assuming we want to use the question as the content
        fields={
            "response_j": "response_j",  # Response from one model
            "response_k": "response_k"   # Response from another model
        },
        prompt_template="""
        Given the following context where each document represents a psychological inquiry,
        please answer the question based on the provided responses.

        Available Contexts:
        {% for document in documents %}
            Question: {{ document.content }}
            Response J: {{ document.meta.response_j }}
            Response K: {{ document.meta.response_k }}
            ---
        {% endfor %}

        Question: {{question}}
        Answer:
        """
    ),
    "developer-portfolio": DatasetConfig(
        name="syntaxhacker/developer-portfolio-rag",
        split="train",
        content_field="answer",
        fields={
            "question": "question",
            "answer": "answer",
            "context": "context"
        },
        prompt_template="""
        Given the following context about a software developer's skills, experience, and background,
        please answer the question accurately based on the provided information.
        
        For each query, provide detailed information about:
        1. Technical skills and programming languages
        2. Machine learning and AI experience
        3. Projects and professional experience
        4. Tools and frameworks used
        5. Personal interests and learning approach
        
        Available Contexts:
        {% for document in documents %}
            Question: {{ document.meta.question }}
            Answer: {{ document.content }}
            Context: {{ document.meta.context }}
            ---
        {% endfor %}
        
        Question: {{question}}
        Answer:
        """
    ),
}
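
# Illustrative sketch, not part of the original pipeline: one way a raw dataset
# row could be split into document content and metadata using a config's
# `content_field` and `fields`. It assumes each `fields` entry maps a metadata
# key to a source column name (e.g. "role" -> "act"), which matches how the
# templates above read `document.meta.<key>`. The helper name
# `row_to_content_and_meta` is hypothetical.
def row_to_content_and_meta(row, content_field, fields=None):
    """Return (content, meta) for one dataset row given as a dict."""
    content = str(row.get(content_field, ""))
    # Copy only the mapped columns that are actually present in the row.
    meta = {
        meta_key: row[column]
        for meta_key, column in (fields or {}).items()
        if column in row
    }
    return content, meta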

# Default configuration for embedding and LLM models
MODEL_CONFIG = {
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "llm_model": "gemini-2.0-flash-exp",
}
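
# Illustrative sketch (hypothetical helper, not in the original module): look up
# a named entry in a registry such as DATASET_CONFIGS and fail with a message
# that lists the valid names. The registry is passed in explicitly, so the
# helper assumes nothing about where callers keep their configs.
def get_config(name, registry):
    """Return registry[name], or raise KeyError naming the available keys."""
    try:
        return registry[name]
    except KeyError:
        available = ", ".join(sorted(registry))
        raise KeyError(f"Unknown dataset '{name}'. Available: {available}") from None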