arxiv:2406.10515

Reactor Mk.1 performances: MMLU, HumanEval and BBH test results

Published on Jun 15, 2024

Authors:

Abstract

Reactor Mk.1, ARCs large language model, demonstrates superior performance in efficiency and potency through benchmarking, outperforming GPT-4o, Claude Opus, and Llama 3 on various datasets.

AI-generated summary

The paper presents the performance results of Reactor Mk.1, ARCs flagship large language model, through a benchmarking process analysis. The model utilizes the Lychee AI engine and possesses less than 100 billion parameters, resulting in a combination of efficiency and potency. The Reactor Mk.1 outperformed models such as GPT-4o, Claude Opus, and Llama 3, with achieved scores of 92% on the MMLU dataset, 91% on HumanEval dataset, and 88% on BBH dataset. It excels in both managing difficult jobs and reasoning, establishing as a prominent AI solution in the present cutting-edge AI technology.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2406.10515 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2406.10515 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2406.10515 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.