Spaces: yusufs/vllm-inference (Paused)
1 contributor · History: 28 commits

Latest commit 13a5c22 by yusufs, 7 months ago:
feat(reduce-max-num-batched-tokens): Reducing max-num-batched-tokens even the error state it want to reduce max_model_len
| File | Size | Last commit | When |
|---|---|---|---|
| .gitignore | 5 Bytes | feat(first-commit): follow examples and tutorials | 7 months ago |
| Dockerfile | 1.17 kB | feat(hf_token): set hf token during build | 7 months ago |
| README.md | 1.67 kB | feat(download-model): add download model at runtime | 7 months ago |
| download_model.py | 503 Bytes | feat(reduce-max-num-batched-tokens): Reducing max-num-batched-tokens even the error state it want to reduce max_model_len | 7 months ago |
| main.py | 6.7 kB | feat(parse): parse output | 7 months ago |
| openai_compatible_api_server.py | 24.4 kB | feat(endpoint): add prefix /api on each endpoint | 7 months ago |
| poetry.lock | 426 kB | feat(refactor): move the files to root | 7 months ago |
| pyproject.toml | 416 Bytes | feat(refactor): move the files to root | 7 months ago |
| requirements.txt | 9.99 kB | feat(first-commit): follow examples and tutorials | 7 months ago |
| run.sh | 1.55 kB | feat(reduce-max-num-batched-tokens): Reducing max-num-batched-tokens even the error state it want to reduce max_model_len | 7 months ago |
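The commit messages suggest that run.sh launches vLLM's OpenAI-compatible server and that the author lowered `--max-num-batched-tokens` to fit the Space's GPU, even though vLLM's startup error suggests lowering `max_model_len` instead (vLLM rejects a batched-token budget smaller than the model's context length when chunked prefill is off). The actual contents of run.sh are not shown here, so the following is only a hypothetical sketch; the model name, port, and numeric values are assumptions, while `--max-model-len` and `--max-num-batched-tokens` are real vLLM flags:

```shell
#!/usr/bin/env bash
# Hypothetical launch script; the real run.sh in this Space is not reproduced above.
# Keeping --max-num-batched-tokens >= --max-model-len avoids vLLM's startup error
# about the batched-token budget being smaller than the context length.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --max-model-len 2048 \
  --max-num-batched-tokens 2048 \
  --port 7860
```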