{
"cells": [
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of document chunks: 14800\n",
"\n",
"Sample document chunk(metadata not the vector): \n",
"id=0 payload={'link': 'https://www.ros.org/', 'type': 'Document', 'chunk': 0, 'text': 'ROS: Home Why ROS? Getting Started Community Ecosystem ROS - Robot Operating System The Robot Operating System (ROS) is a set of software libraries and tools that help you build robot applications. From drivers to state-of-the-art algorithms, and with powerful developer tools, ROS has what you need for your next robotics project. And it\\'s all open source. What is ROS? ROS Videos \" Install Jazzy Jalisco Jazzy Jalisco is our latest ROS 2 LTS release targeted at the Ubuntu 24.04 (Noble) and'} vector=None shard_key=None order_value=None \n",
"\n",
"Number of githb chunks: 3600\n",
"\n",
"Sample github chunk(with_vector=false): \n",
"id=0 payload={'link': 'https://github.com/ros2/ros2/tree/rolling/README.md', 'type': 'Github', 'chunk': 0, 'text': \"#About TheRobotOperatingSystem(ROS)isasetofsoftwarelibrariesandtoolsthathelpyoubuildrobotapplications. Fromdriverstostate-of-the-artalgorithms,andwithpowerfuldevelopertools,ROShaswhatyouneedforyournextroboticsproject. Andit'sallopensource. Fullprojectdetailson[ROS.org](https://ros.org/) #GettingStarted LookingtogetstartedwithROS? Our[installationguideishere](https://www.ros.org/blog/getting-started/).\"} vector=None shard_key=None order_value=None \n",
"\n",
"\n",
"Sample search result(n=10): \n",
"id=45 version=45 score=0.5391361 payload={'link': 'https://docs.nav2.org/', 'type': 'Document', 'chunk': 40, 'text': 'types of tasks like object following, complete coverage navigation, and more. Nav2 is a production-grade and high-quality navigation framework trusted by 100+ companies worldwide. It provides perception, planning, control, localization, visualization, and much more to build highly reliable autonomous systems. This will compute an environmental model from sensor and semantic data, dynamically path plan, compute velocities for motors, avoid obstacles, and structure higher-level robot behaviors.'} vector=None shard_key=None order_value=None\n",
"id=9180 version=9180 score=0.511093 payload={'link': 'https://docs.nav2.org/migration/Iron.html', 'type': 'Document', 'chunk': 39, 'text': 'not specifically address here. BehaviorTree.CPP upgraded to version 4.5+ Since we migrated from version 3.8 to 4.5, users must upgrade their XML and source code accordingly. You can refer to [this page](https://www.behaviortree.dev/docs/migration) for more details, but the main changes are: XML must be changed. This [python script can help](https://github.com/BehaviorTree/BehaviorTree.CPP/blob/master/convert_v3_to_v4.py). The syntax of SubTrees has changed; the one of SubTreePlus was adopted,'} vector=None shard_key=None order_value=None\n",
"id=9922 version=9922 score=0.5105795 payload={'link': 'https://moveit.ai/blog/', 'type': 'Document', 'chunk': 31, 'text': 'September 19, 2015 MoveIt! Upcoming Events - RoboBusiness 2015 Come meet MoveIt! developers and users at RoboBusiness 2015 in San Jose... September 17, 2015 Report on First MoveIt! Community Meeting Watch video of the First MoveIt! Community Meeting in case you missed it. Thank you for coming to the MoveIt! Community Meeting and thanks to the present... July 02, 2015 MoveIt! goes underwater! MoveIt! on an underwater Girona500 AUV robot and 4-DOF arm for autonomous underwater manipulation...'} vector=None shard_key=None order_value=None\n",
"id=540 version=540 score=0.51053035 payload={'link': 'https://docs.nav2.org/concepts/index.html', 'type': 'Document', 'chunk': 56, 'text': 'to their task. When the behavior tree ticks the corresponding BT node, it will call the action server to process its task. The action server callback inside the server will call the chosen algorithm by its name (e.g. FollowPath) that maps to a specific algorithm. This allows a user to abstract the algorithm used in the behavior tree to classes of algorithms. For instance, you can have N plugin controllers to follow paths, dock with charger, avoid dynamic obstacles, or interface with a tool.'} vector=None shard_key=None order_value=None\n",
"id=7618 version=7618 score=0.50761116 payload={'link': 'https://docs.nav2.org/configuration/packages/configuring-savitzky-golay-smoother.html', 'type': 'Document', 'chunk': 39, 'text': 'plugin that will take in an input path and smooth it using a simple and fast smoothing technique based on Savitzky Golay Filters. It uses a digital signal processing technique designed to reduce noise distorting a reference signal, in this case, a path. It is useful for all types of planners, but particularly in NavFn to remove tiny artifacts that can occur near the end of paths or Theta* to slightly soften the transition between Line of Sight line segments without modifying the primary path.'} vector=None shard_key=None order_value=None\n",
"id=1067 version=1067 score=0.50312483 payload={'link': 'https://docs.nav2.org/setup_guides/algorithm/select_algorithm.html', 'type': 'Document', 'chunk': 48, 'text': 'not suitable for ackermann and legged robots since they have turning constraints. That being said, these plugins are best used on robots that can drive in any direction or rotate safely in place, such as circular differential and circular omnidirectional robots. Another planner plugin is the Smac Hybrid-A* planner that supports arbitrary shaped ackermann and legged robots. It is a highly optimized and fully reconfigurable Hybrid-A* implementation supporting Dubin and Reeds-Shepp motion models.'} vector=None shard_key=None order_value=None\n",
"id=60 version=60 score=0.5007378 payload={'link': 'https://moveit.ai/', 'type': 'Document', 'chunk': 2, 'text': 'given pose, even in over-actuated arms Control Execute time-parameterized joint trajectories to low level hardware controllers through common interfaces 3D Perception Connect to depth sensors and point clouds with Octomaps Collision Checking Avoid obstacles using geometric primitives, meshes, or point cloud data Companies using MoveIt Powerful 3D Interactive Visualizer Out-of-the box visual demonstrations in Rviz allow new users experimentation with various planning algorithms around obstacles.'} vector=None shard_key=None order_value=None\n",
"id=9196 version=9196 score=0.49414897 payload={'link': 'https://docs.nav2.org/migration/Iron.html', 'type': 'Document', 'chunk': 55, 'text': 'planner. When enforce_path_inversion is true, the path handler will prune the path to the first time the directions change to force the controller to plan to the inversion point and then be set the rest of the path, once in tolerance. The Path Align critic also contains a parameter use_path_orientations which can be paired with it to incentivize aligning the path containing orientation information to better attempt to achieve path inversions where requested and not do them when not requested.'} vector=None shard_key=None order_value=None\n",
"id=404 version=404 score=0.4938618 payload={'link': 'https://docs.nav2.org/development_guides/devcontainer_docs/devcontainer_guide.html', 'type': 'Document', 'chunk': 43, 'text': 'needed for building the project, as reused by the projects CI. For example, the dever stage modifies /etc/bash.bashrc to automatically source install/setup.bash from the underlay workspace, ensuring all VS Code extensions are loaded with the correct environment, while avoiding any race conditions during installation and startup. To speed up the initial build, images layers from this builder stage are cached by pulling the same image tag used by the projects CI, hosted from the image registry.'} vector=None shard_key=None order_value=None\n",
"id=523 version=523 score=0.48727226 payload={'link': 'https://docs.nav2.org/concepts/index.html', 'type': 'Document', 'chunk': 39, 'text': 'with the concepts required to appreciating and working with this project. ROS 2 ROS 2 is the core middleware used for Nav2. If you are unfamiliar with this, please visit the ROS 2 documentation before continuing. Action Server Just as in ROS, action servers are a common way to control long running tasks like navigation. This stack makes more extensive use of actions, and in some cases, without an easy topic interface. It is more important to understand action servers as a developer in ROS 2.'} vector=None shard_key=None order_value=None\n"
]
}
],
"source": [
"from shared import getQdrantClient, getEmbeddingsModel\n",
"qClient = getQdrantClient()\n",
"\n",
"# Count every chunk in the Document collection, keeping the first one as a sample\n",
"numDocumentChunks = 0\n",
"sampleDocumentChunk = None\n",
"offset = None\n",
"while True:\n",
"    # with_vectors defaults to False, so the (very large) vectors are not returned.\n",
"    # The payload is only needed on the first page, for the sample chunk.\n",
"    points, offset = qClient.scroll(collection_name='Document', limit=100, with_payload=(numDocumentChunks == 0), offset=offset)\n",
"    for chunk in points:\n",
"        if sampleDocumentChunk is None:\n",
"            sampleDocumentChunk = chunk\n",
"        numDocumentChunks += 1\n",
"    # Stop only after the last page (next offset is None) has been counted\n",
"    if offset is None:\n",
"        break\n",
"print(\"Number of document chunks: \", numDocumentChunks)\n",
"if numDocumentChunks > 0:\n",
"    print(\"\\nSample document chunk(metadata not the vector): \")\n",
"    print(sampleDocumentChunk, '\\n')\n",
"\n",
"# Count every chunk in the Github collection the same way\n",
"numGithubChunks = 0\n",
"sampleGithubChunk = None\n",
"offset = None\n",
"while True:\n",
"    points, offset = qClient.scroll(collection_name='Github', limit=100, with_payload=(numGithubChunks == 0), offset=offset)\n",
"    for chunk in points:\n",
"        if sampleGithubChunk is None:\n",
"            sampleGithubChunk = chunk\n",
"        numGithubChunks += 1\n",
"    if offset is None:\n",
"        break\n",
"print(\"Number of github chunks: \", numGithubChunks)\n",
"if numGithubChunks > 0:\n",
"    print(\"\\nSample github chunk(with_vector=false): \")\n",
"    print(sampleGithubChunk, '\\n')\n",
"\n",
"# Show a sample search against the Document collection\n",
"embeddingsModel = getEmbeddingsModel()\n",
"results = qClient.search(\n",
"    collection_name=\"Document\",\n",
"    query_vector=embeddingsModel.embed_query(\"How many companies is Nav2 trusted by worldwide?\"),\n",
"    limit=10\n",
")\n",
"print(\"\\nSample search result(n=10): \")\n",
"for result in results:\n",
"    print(result)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of chunks to embed: 285569\n",
"Chunks currently embedded: 18400\n"
]
}
],
"source": [
"# Check how many chunks in total will be processed by the FeaturePipeline\n",
"from shared import getMongoClient\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
"\n",
"texts = []\n",
"# Create a MongoDB connection\n",
"mongoHost = getMongoClient()\n",
"mongoDatabase = mongoHost[\"twin\"]\n",
"collections = mongoDatabase.list_collection_names()\n",
"for collection in collections:\n",
"    mongoCollection = mongoDatabase[collection]\n",
"    results = mongoCollection.find()\n",
"    for result in results:\n",
"        # Collect the raw content of each document; it is split into chunks below\n",
"        texts.append(result[\"content\"])\n",
"\n",
"# Keep only printable ASCII characters (codes 32 to 126)\n",
"cleanTexts = []\n",
"for text in texts:\n",
"    cleanTexts.append(\"\".join(char for char in text if 32 <= ord(char) <= 126))\n",
"\n",
"numChunks = 0\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
"    chunk_size=500,\n",
"    chunk_overlap=20,\n",
"    length_function=len,\n",
"    is_separator_regex=False,\n",
")\n",
"for text in cleanTexts:\n",
"    # Split each cleaned document and count the resulting chunks\n",
"    numChunks += len(text_splitter.split_text(text))\n",
"\n",
"print(\"Total number of chunks to embed: \", numChunks)\n",
"# numDocumentChunks and numGithubChunks come from the first cell\n",
"print(\"Chunks currently embedded: \", numDocumentChunks+numGithubChunks)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cosine Similarity for related sentences: 0.523006986899456\n",
"Cosine Similarity for unrelated sentences: 0.32259653091273344\n"
]
}
],
"source": [
"import numpy as np\n",
"# How cosine similarity works (embeddingsModel comes from the first cell)\n",
"\n",
"queryEmbedding = embeddingsModel.embed_query(\"What is the weather like?\")\n",
"documentEmbedding = embeddingsModel.embed_documents([\"It is raining today.\", \"ROS is an open source platform\"])\n",
"def cosine_similarity(vec1, vec2):\n",
"    # Dot product of the two vectors divided by the product of their lengths\n",
"    dot_product = np.dot(vec1, vec2)\n",
"    norm_vec1 = np.linalg.norm(vec1)\n",
"    norm_vec2 = np.linalg.norm(vec2)\n",
"    return dot_product / (norm_vec1 * norm_vec2)\n",
"similarity1 = cosine_similarity(queryEmbedding, documentEmbedding[0])\n",
"similarity2 = cosine_similarity(queryEmbedding, documentEmbedding[1])\n",
"print(\"Cosine Similarity for related sentences:\", similarity1)\n",
"print(\"Cosine Similarity for unrelated sentences:\", similarity2)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from qdrant_client.http.models import Distance, VectorParams\n",
"# Delete all collections and vectors inside them\n",
"qClient.delete_collection(collection_name=\"Document\")\n",
"qClient.delete_collection(collection_name=\"Github\")\n",
"# Recreate the empty collections\n",
"qClient.create_collection(\n",
"    collection_name=\"Document\",\n",
"    vectors_config=VectorParams(size=3072, distance=Distance.COSINE)\n",
")\n",
"qClient.create_collection(\n",
"    collection_name=\"Github\",\n",
"    vectors_config=VectorParams(size=3072, distance=Distance.COSINE)\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}