---
title: LLM-Compare
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
license: mit
short_description: Compare outputs from text-generation models side by side
models:
  - HuggingFaceH4/zephyr-7b-beta
  - NousResearch/Hermes-3-Llama-3.1-8B
  - mistralai/Mistral-Nemo-Base-2407
  - meta-llama/Llama-2-70b-hf
  - aaditya/Llama3-OpenBioLLM-8B
---

# LLM Comparison Tool

A Gradio web application that allows you to compare outputs from different Hugging Face models side by side.

## Features

- Compare outputs from two different LLMs simultaneously
- Simple and clean interface
- Support for multiple Hugging Face models
- Text generation via Hugging Face's Inference API (see the sketch below)
- Error handling and user feedback
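
The comparison itself presumably comes down to one Inference API call per selected model. Below is a minimal sketch of such a call using `huggingface_hub.InferenceClient`; the `query_model` helper and its error handling are illustrative assumptions, not necessarily what `app.py` does.

```python
import os

from huggingface_hub import InferenceClient


def query_model(model_id: str, prompt: str, token: str) -> str:
    """Ask one model for a completion, returning the text or an error message."""
    client = InferenceClient(model=model_id, token=token)
    try:
        # chat_completion works for the instruction-tuned models listed below
        response = client.chat_completion(
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,
        )
        return response.choices[0].message.content
    except Exception as exc:  # surface API errors to the UI instead of crashing
        return f"Error querying {model_id}: {exc}"


if __name__ == "__main__":
    token = os.environ["HF_TOKEN"]
    prompt = "Explain beam search in one sentence."
    for model in ("HuggingFaceH4/zephyr-7b-beta", "microsoft/Phi-3.5-mini-instruct"):
        print(f"--- {model} ---\n{query_model(model, prompt, token)}\n")
```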

## Setup

1. Clone this repository.

2. Create and activate the conda environment:

   ```bash
   conda env create -f environment.yml
   conda activate llm-compare
   ```

3. Create a `.env` file in the root directory and add your Hugging Face API token:

   ```
   HF_TOKEN=your_hugging_face_token_here
   ```

   You can get a token from your Hugging Face profile settings (https://huggingface.co/settings/tokens).
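
If `app.py` loads this token with python-dotenv (an assumption; the README only describes the `.env` file), the loading step would look roughly like this:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # copies HF_TOKEN from .env into the process environment
HF_TOKEN = os.getenv("HF_TOKEN")
if not HF_TOKEN:
    raise RuntimeError("HF_TOKEN is not set; add it to your .env file")
```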

## Running the App

1. Make sure you have activated the conda environment:

   ```bash
   conda activate llm-compare
   ```

2. Run the application:

   ```bash
   python app.py
   ```

3. Open your browser and navigate to the URL shown in the terminal (typically http://localhost:7860).

## Usage

1. Enter your prompt in the text box.
2. Select two different models from the dropdown menus.
3. Click "Generate Responses" to see the outputs.
4. The responses will appear in the chatbot interface below each model selection (see the sketch after this list for how such a layout can be wired up).
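
For reference, here is a stripped-down sketch of how a two-model layout like this can be built in Gradio. The component labels and the `generate`/`compare` helpers are illustrative assumptions, not a copy of the actual `app.py`.

```python
import os

import gradio as gr
from huggingface_hub import InferenceClient

MODELS = [
    "HuggingFaceH4/zephyr-7b-beta",
    "meta-llama/Llama-3.1-8B-Instruct",
    "microsoft/Phi-3.5-mini-instruct",
    "Qwen/QwQ-32B",
]


def generate(model_id: str, prompt: str) -> str:
    client = InferenceClient(model=model_id, token=os.getenv("HF_TOKEN"))
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}], max_tokens=512
    )
    return out.choices[0].message.content


def compare(prompt, model_a, model_b):
    def as_chat(model_id):
        # Chatbot with type="messages" expects a list of {"role", "content"} dicts
        return [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": generate(model_id, prompt)},
        ]

    return as_chat(model_a), as_chat(model_b)


with gr.Blocks(title="LLM-Compare") as demo:
    prompt = gr.Textbox(label="Prompt")
    with gr.Row():
        model_a = gr.Dropdown(MODELS, value=MODELS[0], label="Model A")
        model_b = gr.Dropdown(MODELS, value=MODELS[1], label="Model B")
    with gr.Row():
        chat_a = gr.Chatbot(type="messages", label="Model A output")
        chat_b = gr.Chatbot(type="messages", label="Model B output")
    gr.Button("Generate Responses").click(
        compare, [prompt, model_a, model_b], [chat_a, chat_b]
    )

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```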

## Models Available

- HuggingFaceH4/zephyr-7b-beta
- meta-llama/Llama-3.1-8B-Instruct
- microsoft/Phi-3.5-mini-instruct
- Qwen/QwQ-32B

## Notes

- Make sure you have a valid Hugging Face API token with appropriate permissions.
- Response times may vary depending on the model size and server load.