Spaces: yusufs/vllm-inference (Paused)
1 contributor · History: 28 commits

Latest commit 13a5c22 by yusufs, 7 months ago:
feat(reduce-max-num-batched-tokens): Reducing max-num-batched-tokens even the error state it want to reduce max_model_len
| File | Size | Last commit | When |
|---|---|---|---|
| .gitignore | 5 Bytes | feat(first-commit): follow examples and tutorials | 7 months ago |
| Dockerfile | 1.17 kB | feat(hf_token): set hf token during build | 7 months ago |
| README.md | 1.67 kB | feat(download-model): add download model at runtime | 7 months ago |
| download_model.py | 503 Bytes | feat(reduce-max-num-batched-tokens): Reducing max-num-batched-tokens even the error state it want to reduce max_model_len | 7 months ago |
| main.py | 6.7 kB | feat(parse): parse output | 7 months ago |
| openai_compatible_api_server.py | 24.4 kB | feat(endpoint): add prefix /api on each endpoint | 7 months ago |
| poetry.lock | 426 kB | feat(refactor): move the files to root | 7 months ago |
| pyproject.toml | 416 Bytes | feat(refactor): move the files to root | 7 months ago |
| requirements.txt | 9.99 kB | feat(first-commit): follow examples and tutorials | 7 months ago |
| run.sh | 1.55 kB | feat(reduce-max-num-batched-tokens): Reducing max-num-batched-tokens even the error state it want to reduce max_model_len | 7 months ago |
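The commit messages suggest that run.sh launches vLLM's OpenAI-compatible server and that the author lowered `--max-num-batched-tokens` to fit the Space's GPU, even though vLLM's startup error suggests lowering `max_model_len` instead (vLLM rejects a batched-token budget smaller than the model's context length when chunked prefill is off). The actual contents of run.sh are not shown here, so the following is only a hypothetical sketch; the model name, port, and numeric values are assumptions, while `--max-model-len` and `--max-num-batched-tokens` are real vLLM flags:

```shell
#!/usr/bin/env bash
# Hypothetical launch script; the real run.sh in this Space is not reproduced above.
# Keeping --max-num-batched-tokens >= --max-model-len avoids vLLM's startup error
# about the batched-token budget being smaller than the context length.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --max-model-len 2048 \
  --max-num-batched-tokens 2048 \
  --port 7860
```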