
Opus Released – AWS Leads the Way for Foundational Model Hosting (Currently)

Learn how Claude3 Opus, now available on Amazon Bedrock, outperforms its peers on common evaluation benchmarks for AI systems.
Brenden Judson Trek10
Brenden Judson | Apr 18 2024

This week, AWS made Claude3 Opus, Anthropic’s industry-leading foundational model, publicly available on Amazon Bedrock. The much-anticipated announcement follows Amazon’s groundbreaking $4 billion investment in Anthropic and its Generative AI (GenAI) capabilities.

The release gives AWS an edge over other model hosting providers, or at least puts it in the same echelon as the very best. According to standard benchmarks, Claude3 Opus is the industry-leading foundational model for overall intelligence. Some of these benchmarks, taken from Anthropic’s official announcement, are shown below. Note that GPT4-Turbo is omitted from them; see footnote 1 at the end of this post.

Image Source: https://www.anthropic.com/news...

The top-performing model a hosting provider offers often carries more weight than the rest of its catalog. When approaching a GenAI workload, it is usually most effective to start with the state-of-the-art foundational model, see whether it can address your problem, and only once that is confirmed optimize toward more efficient models, shorter prompts, and so on. Andrej Karpathy described this approach in more detail in his State of GPT talk last year.
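The "start with the best model, then optimize down" workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model labels, the `evaluate` callback, the `min_score` and `tolerance` thresholds, and the stub scores are all hypothetical and are not taken from Bedrock, Anthropic, or Karpathy's talk. In practice `evaluate` would run your own test prompts against a hosted model and return a quality score.

```python
from typing import Callable, Optional, Sequence


def pick_cheapest_adequate_model(
    models: Sequence[str],
    evaluate: Callable[[str], float],
    min_score: float = 0.8,
    tolerance: float = 0.02,
) -> Optional[str]:
    """Pick a model by starting at the top and stepping down.

    `models` must be ordered from most capable (and most expensive)
    to least capable (and cheapest). All names and thresholds here
    are hypothetical placeholders.
    """
    if not models:
        raise ValueError("models must not be empty")

    # Step 1: check whether the state-of-the-art model solves the task at all.
    baseline = evaluate(models[0])
    if baseline < min_score:
        return None  # even the best model is not good enough yet

    # Step 2: walk toward cheaper models while quality stays near the baseline.
    chosen = models[0]
    for model in models[1:]:
        if evaluate(model) >= baseline - tolerance:
            chosen = model
        else:
            break  # quality dropped too far; stop stepping down
    return chosen


# Stub scores standing in for a real evaluation run (made-up numbers).
stub_scores = {"large": 0.92, "medium": 0.91, "small": 0.70}
choice = pick_cheapest_adequate_model(
    ["large", "medium", "small"], lambda m: stub_scores[m]
)
```

With these stub scores the function settles on the mid-sized model: it stays within the tolerance of the top model's score, while the smallest one falls well below it. The same loop could be extended to shorter prompts or other cost optimizations once the baseline has been confirmed.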

A Bit of Sobering Reflection

Claude3 Opus will not be the industry standard forever, and probably not even for very long. Sam Altman has publicly stated that he expects an iteration between GPT4 and GPT5 (most likely GPT4.5) this year, and more recent speculation says it could arrive as soon as June. See OpenAI’s Community Forum for the related potential leak. This pace is not exclusive to OpenAI: the graphs below show the broader trend of accelerating development around GenAI technology. Even as that development speeds up, I think we can be confident that AWS is serious about GenAI, is making the investments needed to keep pace in the LLM arms race we find ourselves in, and will stay in the top echelon of model hosting providers. We will certainly continue to monitor the landscape.

Images Source: A Comprehensive Overview of Large Language Models

Conclusion

Three things make Amazon Bedrock a very compelling choice for hosting your foundational models over other major providers: AWS’s guarantee that customer data stays the customer’s own (your GenAI workloads are not used to train models or for other purposes, and data in your AWS account stays in your account), its full ecosystem of native tooling and services, and its position hosting state-of-the-art foundational models.

Footnotes

1. This set of benchmarks was published before the release of GPT4-Turbo, so GPT4-Turbo is not included in them. While AWS and Anthropic would be the first to point out that Claude3 Opus outperforms GPT4, the comparison with the newer GPT4-Turbo is not yet definitive. GPT4-Turbo still needs to be publicly put through leading benchmarks like MMLU, HumanEval, GAIA, etc. The popular LMSYS Chatbot Arena Leaderboard has GPT4-Turbo slightly ahead of, or in the same echelon as, Claude3 Opus. This could be misleading and may be attributable to testing bias in the LMSYS leaderboard. I predict that once GPT4-Turbo has been through the complete set of benchmarks, we will see a slight edge for Opus. It is worth noting that while GPT4-Turbo increased the context window from 8k tokens to 128k tokens, Claude3 Opus offers a 200k context window. The subjective improvement in performance from GPT4 to GPT4-Turbo is somewhat surprising, as the turbo models are branded as the cheaper, more efficient models. Whatever the results, Claude3 Opus remains in the top tier of current foundational models.