Commit aac64a6 · Parent: 83b15cf
Initial commit: LLM Comparison App

Files changed:
- .gitignore        +2   -0
- README.md         +62  -1
- app.py            +178 -0
- environment.yml   +79  -0
.gitignore
ADDED
@@ -0,0 +1,2 @@
+.DS_Store
+.env
README.md
CHANGED
@@ -11,4 +11,65 @@ license: mit
 short_description: Compare outputs from text-generation models side by side
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+<!-- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference -->
+
+# LLM Comparison Tool
+
+A Gradio web application that lets you compare outputs from different Hugging Face models side by side.
+
+## Features
+
+- Compare outputs from two different LLMs simultaneously
+- Simple and clean interface
+- Support for multiple Hugging Face models
+- Text generation using Hugging Face's Inference API
+- Error handling and user feedback
+
+## Setup
+
+1. Clone this repository
+
+2. Create and activate the conda environment:
+```bash
+conda env create -f environment.yml
+conda activate llm_compare
+```
+
+3. Create a `.env` file in the root directory and add your Hugging Face API token:
+```
+HF_TOKEN=your_hugging_face_token_here
+```
+You can get your token from your [Hugging Face profile settings](https://huggingface.co/settings/tokens).
+
+## Running the App
+
+1. Make sure you have activated the conda environment:
+```bash
+conda activate llm_compare
+```
+
+2. Run the application:
+```bash
+python app.py
+```
+
+3. Open your browser and navigate to the URL shown in the terminal (typically `http://localhost:7860`)
+
+## Usage
+
+1. Enter your prompt in the text box
+2. Select two different models from the dropdown menus
+3. Click "Generate Responses" to see the outputs
+4. The responses appear in the chatbot interfaces below each model selection
+
+## Models Available
+
+- HuggingFaceH4/zephyr-7b-beta
+- meta-llama/Llama-3.1-8B-Instruct
+- microsoft/Phi-3.5-mini-instruct
+- Qwen/QwQ-32B
+
+## Notes
+
+- Make sure you have a valid Hugging Face API token with appropriate permissions
+- Response times may vary depending on the model size and server load
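For step 3 of Setup, it can help to confirm that the token in `.env` actually authenticates before launching the app. A minimal sketch (the filename `token_check.py` is illustrative, not part of this commit), using the `whoami` helper from `huggingface_hub`:

```python
# token_check.py: quick sanity check for the HF_TOKEN loaded from .env (illustrative)
import os

from dotenv import load_dotenv
from huggingface_hub import whoami

load_dotenv()
token = os.getenv("HF_TOKEN")
if not token:
    raise SystemExit("HF_TOKEN is not set; see the Setup section of the README.")

# whoami raises an error for an invalid token, otherwise returns account info
info = whoami(token=token)
print(f"Token OK, authenticated as: {info['name']}")
```

If this prints your account name, the startup check in app.py will pass as well.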
app.py
ADDED
@@ -0,0 +1,178 @@
+import os
+import gradio as gr
+from dotenv import load_dotenv
+from huggingface_hub import InferenceClient
+
+# Load environment variables
+load_dotenv()
+HF_TOKEN = os.getenv("HF_TOKEN")
+
+if not HF_TOKEN:
+    raise ValueError("Please set the HF_TOKEN environment variable")
+
+# Available models
+AVAILABLE_MODELS = [
+    "HuggingFaceH4/zephyr-7b-beta",
+    "meta-llama/Llama-3.1-8B-Instruct",
+    "microsoft/Phi-3.5-mini-instruct",
+    "Qwen/QwQ-32B",
+]
+
+# Initialize inference client
+inference_client = InferenceClient(token=HF_TOKEN)
+
+
+def get_model_response(prompt, model_name, temperature_value, do_sample):
+    """Get a response from a Hugging Face model via the Inference API."""
+    try:
+        # Build kwargs dynamically
+        generation_args = {
+            "prompt": prompt,
+            "model": model_name,
+            "max_new_tokens": 100,
+            "do_sample": do_sample,
+            "return_full_text": False,
+        }
+
+        # Only include temperature if sampling is enabled
+        if do_sample and temperature_value > 0:
+            generation_args["temperature"] = temperature_value
+
+        return inference_client.text_generation(**generation_args)
+
+    except Exception as e:
+        return f"Error: {e}"
+
+
+def compare_models(prompt, model1, model2, temp1, temp2, do_sample1, do_sample2):
+    """Compare outputs from the two selected models."""
+    if not prompt.strip():
+        warning = [
+            {"role": "user", "content": prompt},
+            {"role": "assistant", "content": "Please enter a prompt"},
+        ]
+        return warning, warning, gr.update(interactive=True)
+
+    response1 = get_model_response(prompt, model1, temp1, do_sample1)
+    response2 = get_model_response(prompt, model2, temp2, do_sample2)
+
+    # Format responses for chatbot display (type="messages" expects role/content dicts)
+    chat1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
+    chat2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
+
+    return chat1, chat2, gr.update(interactive=True)
+
+
+# Toggle the temperature slider when the sampling checkbox changes.
+# A single gr.update per event avoids listing the same slider twice in outputs.
+def update_slider_state(enabled):
+    if enabled:
+        return gr.update(interactive=True, elem_classes=[])
+    # Grey out and reset to 0 while sampling is disabled
+    return gr.update(interactive=False, elem_classes=["disabled-slider"], value=0)
+
+
+# Create the Gradio interface
+with gr.Blocks(css="""
+    .disabled-slider { opacity: 0.5; pointer-events: none; }
+""") as demo:
+    gr.Markdown("# LLM Comparison Tool")
+    gr.Markdown("Compare outputs from different Hugging Face models side by side.")
+
+    with gr.Row():
+        prompt = gr.Textbox(
+            label="Enter your prompt",
+            placeholder="Type your prompt here...",
+            lines=3,
+        )
+
+    with gr.Row():
+        submit_btn = gr.Button("Generate Responses")
+
+    with gr.Row():
+        with gr.Column():
+            model1_dropdown = gr.Dropdown(
+                choices=AVAILABLE_MODELS,
+                value=AVAILABLE_MODELS[0],
+                label="Select Model 1",
+            )
+            do_sample1 = gr.Checkbox(
+                label="Enable sampling (random outputs)",
+                value=False,
+            )
+            temp1 = gr.Slider(
+                label="Temperature (higher = more creative, lower = more predictable)",
+                minimum=0,
+                maximum=1,
+                step=0.1,
+                value=0.0,
+                interactive=False,
+                elem_classes=["disabled-slider"],
+            )
+            chatbot1 = gr.Chatbot(
+                label="Model 1 Output",
+                show_label=True,
+                height=300,
+                type="messages",
+            )
+
+        with gr.Column():
+            model2_dropdown = gr.Dropdown(
+                choices=AVAILABLE_MODELS,
+                value=AVAILABLE_MODELS[1],
+                label="Select Model 2",
+            )
+            do_sample2 = gr.Checkbox(
+                label="Enable sampling (random outputs)",
+                value=False,
+            )
+            temp2 = gr.Slider(
+                label="Temperature (higher = more creative, lower = more predictable)",
+                minimum=0,
+                maximum=1,
+                step=0.1,
+                value=0.0,
+                interactive=False,
+                elem_classes=["disabled-slider"],
+            )
+            chatbot2 = gr.Chatbot(
+                label="Model 2 Output",
+                show_label=True,
+                height=300,
+                type="messages",
+            )
+
+    def start_loading():
+        # Disable the button while generation is in flight
+        return gr.update(interactive=False)
+
+    # Handle form submission: disable the button, then run the comparison
+    submit_btn.click(
+        fn=start_loading,
+        inputs=None,
+        outputs=submit_btn,
+        queue=False,
+    ).then(
+        fn=compare_models,
+        inputs=[prompt, model1_dropdown, model2_dropdown, temp1, temp2, do_sample1, do_sample2],
+        outputs=[chatbot1, chatbot2, submit_btn],
+    )
+
+    do_sample1.change(
+        fn=update_slider_state,
+        inputs=[do_sample1],
+        outputs=[temp1],
+    )
+
+    do_sample2.change(
+        fn=update_slider_state,
+        inputs=[do_sample2],
+        outputs=[temp2],
+    )
+
+if __name__ == "__main__":
+    demo.launch()
+    # demo.launch(share=True)
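To exercise the same Inference API call that `get_model_response` builds, a minimal sketch for trying one model from a plain Python session; the prompt string is an arbitrary illustration, and `do_sample=False` mirrors the app's default deterministic decoding:

```python
# try_model.py: one-off call mirroring get_model_response's defaults (illustrative)
import os

from dotenv import load_dotenv
from huggingface_hub import InferenceClient

load_dotenv()
client = InferenceClient(token=os.getenv("HF_TOKEN"))

# Greedy decoding: do_sample=False, so no temperature is passed
output = client.text_generation(
    prompt="Explain the difference between a list and a tuple in Python.",
    model="HuggingFaceH4/zephyr-7b-beta",
    max_new_tokens=100,
    do_sample=False,
    return_full_text=False,
)
print(output)
```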
environment.yml
ADDED
@@ -0,0 +1,79 @@
+name: llm_compare
+channels:
+  - conda-forge
+dependencies:
+  - bzip2=1.0.8=hfdf4475_7
+  - ca-certificates=2025.4.26=hbd8a1cb_0
+  - libexpat=2.7.0=h240833e_0
+  - libffi=3.4.6=h281671d_1
+  - liblzma=5.8.1=hd471939_1
+  - liblzma-devel=5.8.1=hd471939_1
+  - libsqlite=3.49.2=hdb6dae5_0
+  - libzlib=1.3.1=hd23fc13_2
+  - ncurses=6.5=h0622a9a_3
+  - openssl=3.5.0=hc426f3f_1
+  - pip=25.1.1=pyh8b19718_0
+  - python=3.10.17=h93e8a92_0_cpython
+  - python-dotenv=1.1.0=pyh29332c3_1
+  - readline=8.2=h7cca4af_2
+  - setuptools=80.8.0=pyhff2d567_0
+  - tk=8.6.13=h1abcd95_1
+  - wheel=0.45.1=pyhd8ed1ab_1
+  - xz=5.8.1=h357f2ed_1
+  - xz-gpl-tools=5.8.1=h357f2ed_1
+  - xz-tools=5.8.1=hd471939_1
+  - pip:
+      - aiofiles==24.1.0
+      - annotated-types==0.7.0
+      - anyio==4.9.0
+      - certifi==2025.4.26
+      - charset-normalizer==3.4.2
+      - click==8.1.8
+      - exceptiongroup==1.3.0
+      - fastapi==0.115.12
+      - ffmpy==0.5.0
+      - filelock==3.18.0
+      - fsspec==2025.5.0
+      - gradio==5.29.0
+      - gradio-client==1.10.0
+      - groovy==0.1.2
+      - h11==0.16.0
+      - httpcore==1.0.9
+      - httpx==0.28.1
+      - huggingface-hub==0.31.2
+      - idna==3.10
+      - jinja2==3.1.6
+      - markdown-it-py==3.0.0
+      - markupsafe==3.0.2
+      - mdurl==0.1.2
+      - numpy==2.2.6
+      - orjson==3.10.18
+      - packaging==25.0
+      - pandas==2.2.3
+      - pillow==11.2.1
+      - pydantic==2.11.4
+      - pydantic-core==2.33.2
+      - pydub==0.25.1
+      - pygments==2.19.1
+      - python-dateutil==2.9.0.post0
+      - python-multipart==0.0.20
+      - pytz==2025.2
+      - pyyaml==6.0.2
+      - requests==2.32.3
+      - rich==14.0.0
+      - ruff==0.11.10
+      - safehttpx==0.1.6
+      - semantic-version==2.10.0
+      - shellingham==1.5.4
+      - six==1.17.0
+      - sniffio==1.3.1
+      - starlette==0.46.2
+      - tomlkit==0.13.2
+      - tqdm==4.67.1
+      - typer==0.15.4
+      - typing-extensions==4.13.2
+      - typing-inspection==0.4.1
+      - tzdata==2025.2
+      - urllib3==2.4.0
+      - uvicorn==0.34.2
+      - websockets==15.0.1
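Because the pip pins above determine runtime behavior (for example, `gr.Chatbot(type="messages")` in app.py assumes a recent Gradio), a small check can confirm the environment resolved as pinned. A sketch, assuming the `llm_compare` environment is active; the subset of packages checked is an arbitrary illustration:

```python
# check_env.py: verify a few key pins from environment.yml (illustrative subset)
import importlib.metadata as md

PINS = {
    "gradio": "5.29.0",
    "huggingface-hub": "0.31.2",
    "python-dotenv": "1.1.0",
}

for package, expected in PINS.items():
    installed = md.version(package)
    status = "ok" if installed == expected else "MISMATCH"
    print(f"{package}: installed {installed}, pinned {expected} [{status}]")
```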