---
language: en
library_name: FlexRAG
tags:
- FlexRAG
- retrieval
- search
- lexical
- RAG
- IR
---

# FlexRAG Retriever

This is a FlexRetriever created with the [`FlexRAG`](https://github.com/ictnlp/FlexRAG) library (version `0.3.0`).

## Retriever Attributes

The `enwiki_2018_atlas` retriever is a FlexRetriever over the English Wikipedia corpus from December 2018. It is designed for information retrieval tasks: given a query, it searches the corpus and returns the relevant documents.

The corpus was created by the [Atlas](https://github.com/facebookresearch/atlas) project, and the indexes were built with the [FlexRAG](https://github.com/ictnlp/FlexRAG) library.

| Corpus Attribute | Value                                                           |
| ---------------- | --------------------------------------------------------------- |
| Language         | English                                                         |
| Domain           | Wikipedia                                                       |
| Saved Fields     | title, section, text                                            |
| Size             | 30.4M (26.9M text, 2.7M infobox)                                |
| Dump Date        | Dec 2018                                                        |
| Provider         | [Atlas](https://github.com/facebookresearch/atlas)              |
| License          | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |

| Index Attribute | Value                                                           |
| --------------- | --------------------------------------------------------------- |
| Index Name      | bm25                                                            |
| Index Type      | Sparse                                                          |
| Index Method    | Lucene                                                          |
| Indexed Fields  | title, section, text (concat)                                   |
| Preprocessing   | LengthFilter(min_char=10, max_char=4096)                        |
| Provider        | [FlexRAG](https://github.com/ictnlp/flexrag)                    |
| License         | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |

| Index Attribute | Value                                                           |
| --------------- | --------------------------------------------------------------- |
| Index Name      | contriever                                                      |
| Index Type      | Dense                                                           |
| Index Method    | IVFPQ                                                           |
| Indexed Fields  | title, section, text (concat)                                   |
| Query Encoder   | `facebook/contriever-msmarco`                                   |
| Passage Encoder | `facebook/contriever-msmarco`                                   |
| Preprocessing   | LengthFilter(min_char=10, max_char=4096)                        |
| Provider        | [FlexRAG](https://github.com/ictnlp/flexrag)                    |
| License         | [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) |
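
The "Indexed Fields" and "Preprocessing" rows above can be pictured with a small, self-contained sketch. It is not FlexRAG's code: `concat_fields` and `length_filter` are hypothetical helpers, and it assumes that the length filter drops (rather than truncates) passages outside the bounds and that the length is measured on the concatenated fields. FlexRAG's actual `LengthFilter` may differ on both points.

```python
from typing import Iterable, Iterator

# Bounds taken from the tables above: LengthFilter(min_char=10, max_char=4096).
MIN_CHAR = 10
MAX_CHAR = 4096


def concat_fields(doc: dict, fields=("title", "section", "text")) -> str:
    """Join the indexed fields into one string, skipping missing or empty ones."""
    return " ".join(str(doc[f]) for f in fields if doc.get(f))


def length_filter(docs: Iterable[dict]) -> Iterator[dict]:
    """Keep documents whose concatenated length lies within [MIN_CHAR, MAX_CHAR].

    Assumption: out-of-range documents are dropped, not truncated.
    """
    for doc in docs:
        if MIN_CHAR <= len(concat_fields(doc)) <= MAX_CHAR:
            yield doc


docs = [
    {"title": "Batman", "section": "Origin", "text": "Bruce Wayne is a fictional character."},
    {"title": "", "section": "", "text": "Stub"},          # shorter than MIN_CHAR
    {"title": "Long", "section": "", "text": "x" * 5000},  # longer than MAX_CHAR
]
kept = list(length_filter(docs))
print([d["title"] for d in kept])
```

Only the first toy document survives the filter; the other two fall outside the character bounds.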
## Usage

### Installation

You can install the `FlexRAG` library with `pip`:

```bash
pip install flexrag faiss-cpu
```

### Loading the `FlexRAG` retriever

You can use this retriever for information retrieval tasks. Here is an example:

```python
from flexrag.retriever import LocalRetriever

# Load the retriever from the HuggingFace Hub
retriever = LocalRetriever.load_from_hub("FlexRAG/enwiki_2018_atlas")

# You can retrieve relevant documents now
results = retriever.search("Who is Bruce Wayne?")
```
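The `bm25` index ranks passages by lexical overlap with the query. As a rough illustration of that kind of scoring, here is a toy Okapi BM25 scorer over three in-memory passages. It is a textbook sketch, not the Lucene implementation behind the index: the `tokenize` and `bm25_scores` helpers are hypothetical, the tokenizer is a plain regex, and `K1 = 1.2`, `B = 0.75` are commonly used defaults assumed here rather than values confirmed for this index.

```python
import math
import re
from collections import Counter

K1, B = 1.2, 0.75  # commonly used BM25 defaults (assumed, not read from the index)


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a stand-in for a real analyzer)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, passages: list[str]) -> list[float]:
    """Score every passage against the query with the Okapi BM25 formula."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many passages each term occurs.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with K1 and is normalised by passage length via B.
            score += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * len(d) / avg_len))
        scores.append(score)
    return scores


passages = [
    "Bruce Wayne is the secret identity of Batman.",
    "Gotham City is a fictional city.",
    "Paris is the capital of France.",
]
scores = bm25_scores("Who is Bruce Wayne?", passages)
best = max(range(len(passages)), key=scores.__getitem__)
print(passages[best])
```

The first passage wins because it matches the rare terms `bruce` and `wayne`, while the other two only share the common term `is`.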
### Running the RAG demo with the retriever

You can run the **GUI application** of the RAG assistant with this retriever. Here is an example:

```bash
python -m flexrag.entrypoints.run_interactive \
    assistant_type=modular \
    modular_config.used_fields=[title,text] \
    modular_config.retriever_type="FlexRAG/enwiki_2018_atlas" \
    modular_config.response_type=original \
    modular_config.generator_type=openai \
    modular_config.openai_config.model_name='gpt-4o-mini' \
    modular_config.openai_config.api_key=$OPENAI_KEY \
    modular_config.do_sample=False
```
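The `used_fields=[title,text]` option selects which saved fields of each retrieved passage are passed on to the generator. The sketch below shows one way such a selection could turn into a prompt. It is purely illustrative: `build_prompt` is a hypothetical helper, the prompt layout is invented for this example, and FlexRAG's modular assistant uses its own template.

```python
def build_prompt(question: str, contexts: list[dict], used_fields=("title", "text")) -> str:
    """Format the selected fields of each retrieved passage, then append the question."""
    blocks = []
    for i, ctx in enumerate(contexts, start=1):
        # Only the fields listed in used_fields reach the prompt; others are ignored.
        body = "\n".join(f"{field}: {ctx[field]}" for field in used_fields if ctx.get(field))
        blocks.append(f"[{i}]\n{body}")
    return "\n\n".join(blocks) + f"\n\nQuestion: {question}\nAnswer:"


# Toy passages with the three saved fields of this corpus: title, section, text.
contexts = [
    {
        "title": "Batman",
        "section": "Fictional biography",
        "text": "Bruce Wayne is the secret identity of Batman.",
    },
]
prompt = build_prompt("Who is Bruce Wayne?", contexts)
print(prompt)
```

With `used_fields` limited to `title` and `text`, the `section` value never appears in the prompt, which keeps the context shorter at the cost of dropping the section heading.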
## License

Because the corpus is released under the [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license, the retriever is distributed under the same license.

FlexRAG Related Links:

* [Documentation](https://flexrag.readthedocs.io/en/latest/)
* [GitHub Repository](https://github.com/ictnlp/flexrag)