{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of document chunks:  14800\n",
      "\n",
      "Sample document chunk(metadata not the vector): \n",
      "id=0 payload={'link': 'https://www.ros.org/', 'type': 'Document', 'chunk': 0, 'text': 'ROS: Home Why ROS? Getting Started Community Ecosystem ROS - Robot Operating System The Robot Operating System (ROS) is a set of software libraries and tools that help you build robot applications. From drivers to state-of-the-art algorithms, and with powerful developer tools, ROS has what you need for your next robotics project. And it\\'s all open source. What is ROS? ROS Videos \" Install Jazzy Jalisco Jazzy Jalisco is our latest ROS 2 LTS release targeted at the Ubuntu 24.04 (Noble) and'} vector=None shard_key=None order_value=None \n",
      "\n",
      "Number of githb chunks:  3600\n",
      "\n",
      "Sample github chunk(with_vector=false): \n",
      "id=0 payload={'link': 'https://github.com/ros2/ros2/tree/rolling/README.md', 'type': 'Github', 'chunk': 0, 'text': \"#About TheRobotOperatingSystem(ROS)isasetofsoftwarelibrariesandtoolsthathelpyoubuildrobotapplications. Fromdriverstostate-of-the-artalgorithms,andwithpowerfuldevelopertools,ROShaswhatyouneedforyournextroboticsproject. Andit'sallopensource. Fullprojectdetailson[ROS.org](https://ros.org/) #GettingStarted LookingtogetstartedwithROS? Our[installationguideishere](https://www.ros.org/blog/getting-started/).\"} vector=None shard_key=None order_value=None \n",
      "\n",
      "\n",
      "Sample search result(n=10): \n",
      "id=45 version=45 score=0.5391361 payload={'link': 'https://docs.nav2.org/', 'type': 'Document', 'chunk': 40, 'text': 'types of tasks like object following, complete coverage navigation, and more. Nav2 is a production-grade and high-quality navigation framework trusted by 100+ companies worldwide. It provides perception, planning, control, localization, visualization, and much more to build highly reliable autonomous systems. This will compute an environmental model from sensor and semantic data, dynamically path plan, compute velocities for motors, avoid obstacles, and structure higher-level robot behaviors.'} vector=None shard_key=None order_value=None\n",
      "id=9180 version=9180 score=0.511093 payload={'link': 'https://docs.nav2.org/migration/Iron.html', 'type': 'Document', 'chunk': 39, 'text': 'not specifically address here. BehaviorTree.CPP upgraded to version 4.5+ Since we migrated from version 3.8 to 4.5, users must upgrade their XML and source code accordingly. You can refer to [this page](https://www.behaviortree.dev/docs/migration) for more details, but the main changes are: XML must be changed. This [python script can help](https://github.com/BehaviorTree/BehaviorTree.CPP/blob/master/convert_v3_to_v4.py). The syntax of SubTrees has changed; the one of SubTreePlus was adopted,'} vector=None shard_key=None order_value=None\n",
      "id=9922 version=9922 score=0.5105795 payload={'link': 'https://moveit.ai/blog/', 'type': 'Document', 'chunk': 31, 'text': 'September 19, 2015 MoveIt! Upcoming Events - RoboBusiness 2015 Come meet MoveIt! developers and users at RoboBusiness 2015 in San Jose... September 17, 2015 Report on First MoveIt! Community Meeting Watch video of the First MoveIt! Community Meeting in case you missed it. Thank you for coming to the MoveIt! Community Meeting and thanks to the present... July 02, 2015 MoveIt! goes underwater! MoveIt! on an underwater Girona500 AUV robot and 4-DOF arm for autonomous underwater manipulation...'} vector=None shard_key=None order_value=None\n",
      "id=540 version=540 score=0.51053035 payload={'link': 'https://docs.nav2.org/concepts/index.html', 'type': 'Document', 'chunk': 56, 'text': 'to their task. When the behavior tree ticks the corresponding BT node, it will call the action server to process its task. The action server callback inside the server will call the chosen algorithm by its name (e.g. FollowPath) that maps to a specific algorithm. This allows a user to abstract the algorithm used in the behavior tree to classes of algorithms. For instance, you can have N plugin controllers to follow paths, dock with charger, avoid dynamic obstacles, or interface with a tool.'} vector=None shard_key=None order_value=None\n",
      "id=7618 version=7618 score=0.50761116 payload={'link': 'https://docs.nav2.org/configuration/packages/configuring-savitzky-golay-smoother.html', 'type': 'Document', 'chunk': 39, 'text': 'plugin that will take in an input path and smooth it using a simple and fast smoothing technique based on Savitzky Golay Filters. It uses a digital signal processing technique designed to reduce noise distorting a reference signal, in this case, a path. It is useful for all types of planners, but particularly in NavFn to remove tiny artifacts that can occur near the end of paths or Theta* to slightly soften the transition between Line of Sight line segments without modifying the primary path.'} vector=None shard_key=None order_value=None\n",
      "id=1067 version=1067 score=0.50312483 payload={'link': 'https://docs.nav2.org/setup_guides/algorithm/select_algorithm.html', 'type': 'Document', 'chunk': 48, 'text': 'not suitable for ackermann and legged robots since they have turning constraints. That being said, these plugins are best used on robots that can drive in any direction or rotate safely in place, such as circular differential and circular omnidirectional robots. Another planner plugin is the Smac Hybrid-A* planner that supports arbitrary shaped ackermann and legged robots. It is a highly optimized and fully reconfigurable Hybrid-A* implementation supporting Dubin and Reeds-Shepp motion models.'} vector=None shard_key=None order_value=None\n",
      "id=60 version=60 score=0.5007378 payload={'link': 'https://moveit.ai/', 'type': 'Document', 'chunk': 2, 'text': 'given pose, even in over-actuated arms Control Execute time-parameterized joint trajectories to low level hardware controllers through common interfaces 3D Perception Connect to depth sensors and point clouds with Octomaps Collision Checking Avoid obstacles using geometric primitives, meshes, or point cloud data Companies using MoveIt Powerful 3D Interactive Visualizer Out-of-the box visual demonstrations in Rviz allow new users experimentation with various planning algorithms around obstacles.'} vector=None shard_key=None order_value=None\n",
      "id=9196 version=9196 score=0.49414897 payload={'link': 'https://docs.nav2.org/migration/Iron.html', 'type': 'Document', 'chunk': 55, 'text': 'planner. When enforce_path_inversion is true, the path handler will prune the path to the first time the directions change to force the controller to plan to the inversion point and then be set the rest of the path, once in tolerance. The Path Align critic also contains a parameter use_path_orientations which can be paired with it to incentivize aligning the path containing orientation information to better attempt to achieve path inversions where requested and not do them when not requested.'} vector=None shard_key=None order_value=None\n",
      "id=404 version=404 score=0.4938618 payload={'link': 'https://docs.nav2.org/development_guides/devcontainer_docs/devcontainer_guide.html', 'type': 'Document', 'chunk': 43, 'text': 'needed for building the project, as reused by the projects CI. For example, the dever stage modifies /etc/bash.bashrc to automatically source install/setup.bash from the underlay workspace, ensuring all VS Code extensions are loaded with the correct environment, while avoiding any race conditions during installation and startup. To speed up the initial build, images layers from this builder stage are cached by pulling the same image tag used by the projects CI, hosted from the image registry.'} vector=None shard_key=None order_value=None\n",
      "id=523 version=523 score=0.48727226 payload={'link': 'https://docs.nav2.org/concepts/index.html', 'type': 'Document', 'chunk': 39, 'text': 'with the concepts required to appreciating and working with this project. ROS 2 ROS 2 is the core middleware used for Nav2. If you are unfamiliar with this, please visit the ROS 2 documentation before continuing. Action Server Just as in ROS, action servers are a common way to control long running tasks like navigation. This stack makes more extensive use of actions, and in some cases, without an easy topic interface. It is more important to understand action servers as a developer in ROS 2.'} vector=None shard_key=None order_value=None\n"
     ]
    }
   ],
   "source": [
    "from shared import getQdrantClient, getEmbeddingsModel\n",
    "qClient = getQdrantClient()\n",
    "\n",
    "# Count every chunk in the Document collection, keeping the first one as a sample\n",
    "numDocumentChunks = 0\n",
    "# Note with_vectors defaults to False, so the vectors are not returned\n",
    "chunks = qClient.scroll(collection_name='Document', limit=100)\n",
    "while True:\n",
    "    for chunk in chunks[0]:\n",
    "        if numDocumentChunks == 0:\n",
    "            sampleDocumentChunk = chunk\n",
    "        numDocumentChunks += 1\n",
    "    # chunks[1] is the offset of the next page; None means this was the last page\n",
    "    if chunks[1] is None:\n",
    "        break\n",
    "    chunks = qClient.scroll(collection_name='Document', limit=100, with_payload=False, offset=chunks[1])\n",
    "print(\"Number of document chunks: \", numDocumentChunks)\n",
    "if numDocumentChunks > 0:\n",
    "    print(\"\\nSample document chunk(metadata not the vector): \")\n",
    "    print(sampleDocumentChunk, '\\n')\n",
    "\n",
    "# Count every chunk in the Github collection, keeping the first one as a sample\n",
    "numGithubChunks = 0\n",
    "# Note with_vectors defaults to False, so the vectors are not returned (since they are very large)\n",
    "chunks = qClient.scroll(collection_name='Github', limit=100)\n",
    "while True:\n",
    "    for chunk in chunks[0]:\n",
    "        if numGithubChunks == 0:\n",
    "            sampleGithubChunk = chunk\n",
    "        numGithubChunks += 1\n",
    "    # Count the current page before checking whether another page exists\n",
    "    if chunks[1] is None:\n",
    "        break\n",
    "    chunks = qClient.scroll(collection_name='Github', limit=100, with_payload=False, offset=chunks[1])\n",
    "print(\"Number of github chunks: \", numGithubChunks)\n",
    "if numGithubChunks > 0:\n",
    "    print(\"\\nSample github chunk(with_vector=false): \")\n",
    "    print(sampleGithubChunk, '\\n')\n",
    "\n",
    "# Show a sample search\n",
    "embeddingsModel = getEmbeddingsModel()\n",
    "results = qClient.search(\n",
    "    collection_name=\"Document\",\n",
    "    query_vector=embeddingsModel.embed_query(\"How many companies is Nav2 trusted by worldwide?\"),\n",
    "    limit=10\n",
    ")\n",
    "print(\"\\nSample search result(n=10): \")\n",
    "for result in results:\n",
    "    print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of chunks to embed:  285569\n",
      "Chunks currently embedded:  18400\n"
     ]
    }
   ],
   "source": [
    "# Check how many chunks total will be processed by the FeaturePipeline\n",
    "from shared import getMongoClient\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "\n",
    "\n",
    "texts = []\n",
    "# Create a MongoDB connection\n",
    "mongoHost = getMongoClient()\n",
    "mongoDatabase = mongoHost[\"twin\"]\n",
    "collections = mongoDatabase.list_collection_names()\n",
    "for collection in collections:\n",
    "    mongoCollection = mongoDatabase[collection]\n",
    "    results = mongoCollection.find()\n",
    "    for result in results:\n",
    "        # Collect the raw content of each document; splitting happens below\n",
    "        texts.append(result[\"content\"])\n",
    "\n",
    "# Keep only printable ASCII characters (code points 32 to 126)\n",
    "cleanTexts = []\n",
    "for text in texts:\n",
    "    cleanTexts.append(\"\".join(char for char in text if 32 <= ord(char) <= 126))\n",
    "\n",
    "numChunks = 0\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=500,\n",
    "    chunk_overlap=20,\n",
    "    length_function=len,\n",
    "    is_separator_regex=False,\n",
    ")\n",
    "for text in cleanTexts:\n",
    "    # Count the chunks produced for this text\n",
    "    numChunks += len(text_splitter.split_text(text))\n",
    "\n",
    "print(\"Total number of chunks to embed: \", numChunks)\n",
    "# numDocumentChunks and numGithubChunks come from the first cell\n",
    "print(\"Chunks currently embedded: \", numDocumentChunks+numGithubChunks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cosine Similarity for related sentences: 0.523006986899456\n",
      "Cosine Similarity for unrelated sentences: 0.32259653091273344\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# How cosine similarity works (reuses embeddingsModel from the first cell)\n",
    "queryEmbedding = embeddingsModel.embed_query(\"What is the weather like?\")\n",
    "documentEmbedding = embeddingsModel.embed_documents([\"It is raining today.\", \"ROS is an open source platform\"])\n",
    "\n",
    "\n",
    "def cosine_similarity(vec1, vec2):\n",
    "    # Dot product of the two vectors divided by the product of their lengths\n",
    "    dot_product = np.dot(vec1, vec2)\n",
    "    norm_vec1 = np.linalg.norm(vec1)\n",
    "    norm_vec2 = np.linalg.norm(vec2)\n",
    "    return dot_product / (norm_vec1 * norm_vec2)\n",
    "\n",
    "\n",
    "similarity1 = cosine_similarity(queryEmbedding, documentEmbedding[0])\n",
    "similarity2 = cosine_similarity(queryEmbedding, documentEmbedding[1])\n",
    "print(\"Cosine Similarity for related sentences:\", similarity1)\n",
    "print(\"Cosine Similarity for unrelated sentences:\", similarity2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qdrant_client.http.models import Distance, VectorParams\n",
    "\n",
    "# WARNING: destructive. Deletes both collections and every vector inside them\n",
    "qClient.delete_collection(collection_name=\"Document\")\n",
    "qClient.delete_collection(collection_name=\"Github\")\n",
    "# Recreate the empty collections; size must match the embedding model's output dimension\n",
    "qClient.create_collection(\n",
    "    collection_name=\"Document\",\n",
    "    vectors_config=VectorParams(size=3072, distance=Distance.COSINE)\n",
    ")\n",
    "qClient.create_collection(\n",
    "    collection_name=\"Github\",\n",
    "    vectors_config=VectorParams(size=3072, distance=Distance.COSINE)\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}