Frankai123 committed
Commit 6ae928a · verified · 1 Parent(s): 5e377c4

Update README.md

Files changed (1):
  1. README.md +12 -5
README.md CHANGED
@@ -90,12 +90,19 @@ This multi-level distillation strategy enables DMind-1-mini to maintain high Web
 
 ## 2. Evaluation Results
 
-![DMind-1 Web3 Performance](figures/dmind-1-web3-performance.jpeg)
-We evaluate **DMind-1** and **DMind-1-mini** using the DMind Benchmark, a domain-specific evaluation suite tailored to assess large language models in the Web3 context. The benchmark spans 1,917 expert-reviewed questions across nine critical categories—including Blockchain Fundamentals, Infrastructure, Smart Contracts, DeFi, DAO, NFT, Token Economics, Meme, and Security. It combines multiple-choice and subjective open-ended tasks, simulating real-world challenges and requiring deep contextual understanding, which provides a comprehensive assessment of both factual knowledge and advanced reasoning.
-Under this rigorous evaluation:
-- DMind-1 ranked 1st among 24 leading models, outperforming both proprietary (e.g., Grok-3) and open-source (e.g., DeepSeek-R1) LLMs, with a normalized Web3 score of 77.44
-- DMind-1-mini also performed strongly, ranking 2nd overall with a normalized Web3 score of 74.12. This demonstrates the effectiveness of our compact distillation pipeline
+![DMind-1 Web3 Performance](figures/normalized-performance-with-price.jpeg)
+
+We evaluate DMind-1 and DMind-1-mini using the [DMind Benchmark](https://huggingface.co/datasets/DMindAI/DMind_Benchmark), a domain-specific evaluation suite designed to assess large language models in the Web3 context. The benchmark includes 1,917 expert-reviewed questions across nine core domain categories and features both multiple-choice and open-ended tasks that measure factual knowledge as well as contextual reasoning.
+
+To complement accuracy metrics, we conducted a **cost-performance analysis** comparing benchmark scores against publicly available input token prices across 24 leading LLMs. In this evaluation:
+
+- **DMind-1** achieved the highest Web3 score while maintaining one of the lowest input token costs among top-tier models such as Grok 3 and Claude 3.5 Sonnet.
+
+- **DMind-1-mini** ranked second, retaining over 95% of DMind-1's performance with greater efficiency in latency and compute.
+
+Both models sit in the most favorable region of the score-versus-price curve, delivering state-of-the-art Web3 reasoning at significantly lower cost. This balance of quality and efficiency makes the DMind models highly competitive for both research and production use.
+
 
 
 
 ## 3. Use Cases
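
The score-versus-price comparison described in the updated section can be sketched roughly as follows. This is a minimal illustration of ranking models by benchmark score per dollar of input-token cost; the model names, scores, and prices below are invented placeholders, not the benchmark's published figures.

```python
# Sketch of a score-vs-price cost-performance ranking.
# All numbers are illustrative placeholders, NOT published benchmark data.

models = {
    # name: (web3_score, usd_per_million_input_tokens)
    "model-a": (77.0, 0.20),
    "model-b": (74.0, 0.10),
    "model-c": (70.0, 3.00),
    "model-d": (65.0, 1.50),
}

def score_per_dollar(entry):
    """Benchmark score obtained per dollar of input-token spend."""
    score, price = entry
    return score / price

# Rank models from best to worst cost-performance.
ranked = sorted(models.items(), key=lambda kv: score_per_dollar(kv[1]), reverse=True)

for name, (score, price) in ranked:
    print(f"{name}: score={score:.1f}, ${price:.2f}/M input tokens, "
          f"score-per-$={score / price:.1f}")
```

A model in the "most favorable region" of the curve is one with a high score and a low price, i.e. a high score-per-dollar ratio under this simple metric.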