lhl (leonardlin)

AI & ML interests: None yet

Organizations: AUGMXNT, unalignment, LocalLLaMA, Shisa.AI, Ki No Kokoro AI Collective (機の心AI集団), Social Post Explorers, Hugging Face Discord Community, meti-ubitus

leonardlin's activity

posted an update 2 days ago
I'm excited to announce the official release of our Shisa V2 405B model:
shisa-ai/shisa-v2-llama3.1-405b

It's the strongest model ever trained in Japan, and even goes toe-to-toe w/ GPT-4o and DeepSeek-V3 in JA MT-Bench.

For all the details, be sure to check out the post and overview report here: https://shisa.ai/posts/shisa-v2-405b/
posted an update 13 days ago
BTW, in case anyone wants to kick the tires and test their Japanese (日本語), I have our Shisa V2 405B model up and running temporarily: https://chat.shisa.ai/
posted an update about 2 months ago
Happy to announce the release of Shisa V2, the latest generation of our bilingual Japanese-English language models. After hundreds of ablations and months of work, we're releasing some of the strongest open Japanese models at 7B, 8B, 12B, 14B, 32B, and 70B! Full announcement here: https://shisa.ai/posts/shisa-v2/ or visit the Shisa V2 HF collection: shisa-ai/shisa-v2-67fc98ecaf940ad6c49f5689
reacted to nroggendorff's post with 👀 8 months ago
When huggingface patches this, I'm going to be really sad, but in the meantime, here you go:

When AutoTrain creates a new space to train your model, it does so via the huggingface API. If you modify the code so that it includes a premade README.md file, you can add these two lines:

---
app_port: 8080 # or any integer besides 7860 that's greater than 2 ** 10
startup_duration_timeout: 350m
---


This will tell huggingface to listen for the iframe on your port, instead of the one autotrain is actually hosting on, and because startup time isn't charged, you get the product for free. (you can take this even further by switching compute type to A100 or something)
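As a rough illustration, the front matter described above could be assembled like this (a minimal sketch; the integration point inside AutoTrain's Space-creation code is an assumption, and the README body is a placeholder):

```python
# Minimal sketch (hypothetical): assembling the README.md front matter
# described above before it gets included in the Space-creation request.
front_matter = {
    "app_port": 8080,                      # any port != 7860 above 2 ** 10
    "startup_duration_timeout": "350m",
}
lines = ["---"]
lines += [f"{key}: {value}" for key, value in front_matter.items()]
lines += ["---", "", "# Placeholder Space"]
readme = "\n".join(lines)
print(readme)
```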
reacted to anakin87's post with 🔥 12 months ago
🧪 RAG Evaluation with 🔥 Prometheus 2 + Haystack

📝 Blog post: https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2
📓 Notebook: https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/prometheus2_evaluation.ipynb

─── ⋆⋅☆⋅⋆ ───

When evaluating LLMs' responses, proprietary models like GPT-4 are commonly used due to their strong performance.
However, relying on closed models presents challenges related to data privacy 🔒, transparency, controllability, and cost 💸.

On the other hand, open models typically do not correlate well with human judgments and lack flexibility.


🔥 Prometheus 2 is a new family of open-source models designed to address these gaps:
🔹 two variants: prometheus-eval/prometheus-7b-v2.0; prometheus-eval/prometheus-8x7b-v2.0
🔹 trained on open-source data
🔹 high correlation with human evaluations and proprietary models
🔹 highly flexible: capable of performing direct assessments and pairwise rankings, and allowing the definition of custom evaluation criteria.

See my experiments with RAG evaluation in the links above.
posted an update 12 months ago
My weekend project ended up being some testing between torchtune, axolotl, and unsloth. I *think* it's a 1:1 comparison of what LoRA fine-tuning performance looks like across the different hardware I have in my dev boxes (4090, 3090, 7900 XTX, W7900), with a few other interesting tidbits.

Tonight I wrote up a WandB report (the panel editor is super broken in Firefox 😔) that sums up some of the more interesting bits from the results: https://wandb.ai/augmxnt/train-bench/reports/torchtune-vs-axolotl-vs-unsloth-Trainer-Comparison--Vmlldzo4MzU3NTAx
posted an update 12 months ago
reacted to thomwolf's post with 🔥 about 1 year ago
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high quality web-scale datasets, detailing all the steps and learnings that came in our recent 15 trillion tokens 🍷FineWeb release

Inspired by the distill.pub interactive-graphics papers, we set out to write the most extensive, enjoyable, and in-depth tech report we could draft, so prepare for a 45-min read with interactive graphics and all.

And that's not all: in this article we also introduce 📚FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA.

We also make a number of surprising observations on the "quality" of the internet itself which may challenge some of the general assumptions about web data (not saying more, I'll let you draw your own conclusions ;)

HuggingFaceFW/blogpost-fineweb-v1
replied to their post about 1 year ago

I mean, it's obviously not running my model (it's a brand new JA/EN ablation), so not sure why it'd be attached...

posted an update about 1 year ago
Interesting - I've just seen my first HF spam on one of my new model uploads: shisa-ai/shisa-v1-llama3-70b - someone has attached an SEO spam page as an HF Space to the model!?! Wild. Who do I report this to?
replied to their post about 1 year ago

Also, I tested the new https://huggingface.co/DataPilot/ArrowPro-7B-KUJIRA model and it appears to be the real deal - very impressive performance, trained by a 15-yo (!) @Holy-fox. Note that using the sampler settings I detailed improved its score as well (otherwise it suffered from looping errors too).

I'll be aiming to beat that with the Llama 3 8B, and to beat Command R Plus with the 70B, in the coming days.

replied to their post about 1 year ago

I'll just add a note on the sampler parameters for testing that I found improved performance for virtually every model I tested: temperature 0.2, min_p 0.1, frequency_penalty 0.5 (a frequency/repetition penalty is required to minimize looping errors that otherwise creep into most of these models)
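For reference, here's what those settings look like as a request payload for an OpenAI-compatible chat endpoint (a sketch - the model name is a placeholder, and min_p is a server-specific extension, supported by e.g. vLLM's OpenAI-compatible server):

```python
# Sketch: the sampler settings above as an OpenAI-compatible request payload.
# Model name is hypothetical; min_p support depends on the server (e.g. vLLM).
payload = {
    "model": "my-ja-model",  # placeholder
    "messages": [{"role": "user", "content": "自己紹介してください。"}],
    "temperature": 0.2,
    "min_p": 0.1,
    "frequency_penalty": 0.5,  # repetition penalty to minimize looping errors
}
```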

posted an update about 1 year ago
For those with an interest in JA language models, this Llama 3 70B test ablation looks like the current strongest publicly released, commercially usable, open model available. A lot of caveats, I know, but it also matches gpt-3.5-turbo-0125's JA performance, which is worth noting, and it is tuned *exclusively* with the old shisa-v1 dataset (so its chart position will be very short-lived).

shisa-ai/shisa-v1-llama3-70b

augmxnt/ultra-orca-boros-en-ja-v1
posted an update about 1 year ago
llm-jp-eval is currently one of the most widely used benchmarks for Japanese LLMs and is half of WandB's comprehensive Nejumi LLM Leaderboard scoring. I was seeing some weirdness in results I was getting and ended up in a bit of a rabbit hole. Here's my article on evaling llm-jp-eval: https://huggingface.co/blog/leonardlin/llm-jp-eval-eval

I've set up a fork of Lightblue's Shaberi testing framework, which uses LLM-as-a-Judge style benchmarks as something probably more representative of real-world LLM strength in Japanese. Here's how the new base model ablations are looking:
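As a rough sketch of the LLM-as-a-Judge scoring step (structure assumed for illustration - this is not Shaberi's actual code): a judge model is prompted to rate each answer 1-10, and the numeric score is parsed out of its reply.

```python
import re

def parse_score(judge_reply: str) -> int:
    """Extract the first 1-10 integer rating from a judge model's reply."""
    match = re.search(r"\b(10|[1-9])\b", judge_reply)
    if match is None:
        raise ValueError(f"no score found in: {judge_reply!r}")
    return int(match.group(1))

# The judge's free-form reply is reduced to a single comparable number.
print(parse_score("Rating: 8. The answer is fluent, natural Japanese."))  # prints 8
```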
reacted to mrfakename's post with ❤️ about 1 year ago
posted an update about 1 year ago
I've been doing some evals and tuning, and this chat template repo maintained by @chujiezheng is great: https://github.com/chujiezheng/chat_templates

Here's also a simple script for checking what the output looks like:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("augmxnt/shisa-7b-v1")
messages = [
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

print()
print('Chat Template:')
print(tokenizer.chat_template)
print()
print('---')
print()

print(tokenizer.apply_chat_template(messages, tokenize=False))
replied to mlabonne's post over 1 year ago

BTW, I was trying to get a tree for https://huggingface.co/mlabonne/AlphaMonarch-7B and it was getting caught in a recursion loop. I first added caching on the ModelCard, assuming that would sort things out, but it didn't, so I hacked in some logic to prevent revisits (I also added some weak handling for missing models, since those were looping as well - AIDC-ai-business/Marcoroni-7B-v3, for example, has disappeared).

Anyway, my updated code still has broken chart rendering (a cyclic graph - which is what was causing the looping issues), but at least it will get a list of the model lineage, which was good enough for my purposes. In case anyone wants to move this forward or needs a reference for the looping issues: https://colab.research.google.com/drive/1-7w_pPWPCCQQpQ7LrvlKIdhyHsoCHH4E?usp=sharing
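For anyone hitting the same problem, the revisit-prevention idea boils down to keeping a visited list during traversal (a minimal sketch with a toy lineage graph - the real code walks Hub model cards):

```python
# Cycle-safe depth-first walk of a model lineage graph. Revisits are skipped,
# so merge lineages that form cycles can't recurse forever.
def walk_lineage(model_id, get_parents, visited=None):
    if visited is None:
        visited = []
    if model_id in visited:  # already seen: cycle or shared ancestor
        return visited
    visited.append(model_id)
    for parent in get_parents(model_id):
        walk_lineage(parent, get_parents, visited)
    return visited

# Toy graph with a cycle (A -> B -> A), like the merge loops described above.
toy_graph = {"A": ["B"], "B": ["A"]}
print(walk_lineage("A", lambda m: toy_graph.get(m, [])))  # prints ['A', 'B']
```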

reacted to hunkim's post with ❤️ over 1 year ago