Introducing the next generation of Claude (2024)

Today, we're announcing the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Each successive model offers increasingly powerful performance, allowing users to select the optimal balance of intelligence, speed, and cost for their specific application.

Opus and Sonnet are now available to use in claude.ai and the Claude API, which is now generally available in 159 countries. Haiku will be available soon.

Claude 3 model family

A new standard for intelligence

Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.

All Claude 3 models show increased capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.

Below is a comparison of the Claude 3 models to those of our peers on multiple benchmarks [1] of capability:

[Figure: benchmark comparison of the Claude 3 models and peer models]

Near-instant results

The Claude 3 models can power live customer chats, auto-completions, and data extraction tasks where responses must be immediate and in real time.

Haiku is the fastest and most cost-effective model on the market for its intelligence category. It can read an information- and data-dense research paper on arXiv (~10k tokens) with charts and graphs in less than three seconds. Following launch, we expect to improve performance even further.

For the vast majority of workloads, Sonnet is 2x faster than Claude 2 and Claude 2.1 with higher levels of intelligence. It excels at tasks demanding rapid responses, like knowledge retrieval or sales automation. Opus delivers similar speeds to Claude 2 and 2.1, but with much higher levels of intelligence.

Strong vision capabilities

The Claude 3 models have sophisticated vision capabilities on par with other leading models. They can process a wide range of visual formats, including photos, charts, graphs and technical diagrams. We’re particularly excited to provide this new modality to our enterprise customers, some of whom have up to 50% of their knowledge bases encoded in various formats such as PDFs, flowcharts, or presentation slides.

Fewer refusals

Previous Claude models often made unnecessary refusals that suggested a lack of contextual understanding. We’ve made meaningful progress in this area: Opus, Sonnet, and Haiku are significantly less likely to refuse to answer prompts that border on the system’s guardrails than previous generations of models. As shown below, the Claude 3 models show a more nuanced understanding of requests, recognize real harm, and refuse to answer harmless prompts much less often.

[Figure: refusal rates for the Claude 3 models and previous generations]

Improved accuracy

Businesses of all sizes rely on our models to serve their customers, making it imperative for our model outputs to maintain high accuracy at scale. To assess this, we use a large set of complex, factual questions that target known weaknesses in current models. We categorize the responses into correct answers, incorrect answers (or hallucinations), and admissions of uncertainty, where the model says it doesn’t know the answer instead of providing incorrect information. Compared to Claude 2.1, Opus demonstrates a twofold improvement in accuracy (or correct answers) on these challenging open-ended questions while also exhibiting reduced levels of incorrect answers.
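The three-way categorization described above can be illustrated with a small tally. This is a minimal sketch, not the evaluation harness itself: the function name `summarize`, the label strings, and the two sets of grades are all made up for illustration and do not reproduce any reported result.

```python
from collections import Counter

# The three categories named in the text; the label strings are an assumption.
CATEGORIES = ("correct", "incorrect", "uncertain")


def summarize(labels):
    """Return the share of graded responses that falls in each category."""
    if not labels:
        raise ValueError("no graded responses")
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {category: counts[category] / len(labels) for category in CATEGORIES}


# Made-up grades for two hypothetical models, ten questions each.
baseline = ["correct"] * 3 + ["incorrect"] * 4 + ["uncertain"] * 3
candidate = ["correct"] * 6 + ["incorrect"] * 2 + ["uncertain"] * 2

baseline_shares = summarize(baseline)
candidate_shares = summarize(candidate)

# Ratio of correct-answer shares between the two hypothetical models.
improvement = candidate_shares["correct"] / baseline_shares["correct"]
print(baseline_shares, candidate_shares, improvement)
```

Keeping "uncertain" separate from "incorrect" is the point of the scheme: a model that admits it does not know is counted differently from one that gives a wrong answer.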

In addition to producing more trustworthy responses, we will soon enable citations in our Claude 3 models so they can point to precise sentences in reference material to verify their answers.

Long context and near-perfect recall

The Claude 3 family of models will offer a 200K context window at launch. However, all three models are capable of accepting inputs exceeding 1 million tokens and we may make this available to select customers who need enhanced processing power.

To process long context prompts effectively, models require robust recall capabilities. The 'Needle In A Haystack' (NIAH) evaluation measures a model's ability to accurately recall information from a vast corpus of data. We enhanced the robustness of this benchmark by using one of 30 random needle/question pairs per prompt and testing on a diverse crowdsourced corpus of documents. Claude 3 Opus not only achieved near-perfect recall, surpassing 99% accuracy, but in some cases, it even identified the limitations of the evaluation itself by recognizing that the "needle" sentence appeared to be artificially inserted into the original text by a human.
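To make the setup concrete, here is a rough sketch of how one such prompt might be assembled: pick one needle/question pair at random, insert the needle somewhere in a corpus of documents, and append the question. The function names, the filler documents, the two sample pairs, and the substring-based scoring are assumptions for illustration, not the benchmark's actual implementation.

```python
import random


def build_niah_prompt(documents, pairs, rng):
    """Build one evaluation prompt: a haystack with a single needle, plus its question."""
    needle, question = rng.choice(pairs)
    # Insert the needle at a random document boundary (start, between documents, or end).
    position = rng.randrange(len(documents) + 1)
    haystack = documents[:position] + [needle] + documents[position:]
    prompt = "\n\n".join(haystack) + "\n\nQuestion: " + question
    return prompt, needle, question


def recalled(answer, expected):
    """Crude check: count the answer as a hit if it contains the expected phrase."""
    return expected.lower() in answer.lower()


# Hypothetical corpus and needle/question pairs (the text mentions 30 pairs; two shown).
documents = ["First filler document.", "Second filler document.", "Third filler document."]
pairs = [
    ("The secret code word is maple.", "What is the secret code word?"),
    ("The meeting was moved to Thursday.", "Which day was the meeting moved to?"),
]

prompt, needle, question = build_niah_prompt(documents, pairs, random.Random(0))
print(prompt)
```

Drawing the pair at random per prompt means a model cannot score well by memorizing a single fixed needle, which is the robustness improvement the paragraph describes.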

Responsible design

We’ve developed the Claude 3 family of models to be as trustworthy as they are capable. We have several dedicated teams that track and mitigate a broad spectrum of risks, ranging from misinformation and CSAM to biological misuse, election interference, and autonomous replication skills. We continue to develop methods such as Constitutional AI that improve the safety and transparency of our models, and have tuned our models to mitigate privacy issues that could be raised by new modalities.

Addressing biases in increasingly sophisticated models is an ongoing effort and we’ve made strides with this new release. As shown in the model card, Claude 3 shows fewer biases than our previous models according to the Bias Benchmark for Question Answering (BBQ). We remain committed to advancing techniques that reduce biases and promote greater neutrality in our models, ensuring they are not skewed towards any particular partisan stance.

While the Claude 3 model family has advanced on key measures of biological knowledge, cyber-related knowledge, and autonomy compared to previous models, it remains at AI Safety Level 2 (ASL-2) per our Responsible Scaling Policy. Our red teaming evaluations (performed in line with our White House commitments and the 2023 US Executive Order) have concluded that the models present negligible potential for catastrophic risk at this time. We will continue to carefully monitor future models to assess their proximity to the ASL-3 threshold. Further safety details are available in the Claude 3 model card.

Easier to use

The Claude 3 models are better at following complex, multi-step instructions. They are particularly adept at adhering to brand voice and response guidelines, and developing customer-facing experiences our users can trust. In addition, the Claude 3 models are better at producing structured output in popular formats like JSON, making it simpler to instruct Claude for use cases like natural language classification and sentiment analysis.
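As an illustration of why JSON output simplifies classification workflows, the sketch below shows one way an application might ask for a JSON reply and then validate it before use. The prompt wording, the label set, and the `parse_sentiment` helper are hypothetical; no API call is made here, and the reply string stands in for whatever text a model returns.

```python
import json

# Hypothetical label set for a sentiment-analysis use case.
ALLOWED_LABELS = {"positive", "negative", "neutral"}

# Hypothetical prompt; doubled braces are literal braces after str.format().
PROMPT_TEMPLATE = (
    "Classify the sentiment of the review below. "
    'Reply with only a JSON object of the form {{"sentiment": "<label>"}}, '
    "where <label> is one of: positive, negative, neutral.\n\nReview: {review}"
)


def parse_sentiment(reply_text):
    """Parse a reply expected to be a JSON object with a valid "sentiment" label."""
    data = json.loads(reply_text)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = data.get("sentiment")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return label


prompt = PROMPT_TEMPLATE.format(review="Arrived late, but the product works well.")
example_reply = '{"sentiment": "neutral"}'  # stand-in for a model reply
print(parse_sentiment(example_reply))
```

Validating the parsed label against a fixed set keeps downstream code from acting on malformed or unexpected replies.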

Model details

Claude 3 Opus is our most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what’s possible with generative AI.

Cost (input | output, $ per million tokens)	$15 | $75
Context window	200K*
Potential uses	Task automation: plan and execute complex actions across APIs and databases, interactive coding; R&D: research review, brainstorming and hypothesis generation, drug discovery; Strategy: advanced analysis of charts & graphs, financials and market trends, forecasting
Differentiator	Higher intelligence than any other model available.

*1M tokens available for specific use cases; please inquire.

Claude 3 Sonnet strikes the ideal balance between intelligence and speed—particularly for enterprise workloads. It delivers strong performance at a lower cost compared to its peers, and is engineered for high endurance in large-scale AI deployments.

Cost (input | output, $ per million tokens)	$3 | $15
Context window	200K
Potential uses	Data processing: RAG or search & retrieval over vast amounts of knowledge; Sales: product recommendations, forecasting, targeted marketing; Time-saving tasks: code generation, quality control, parse text from images
Differentiator	More affordable than other models with similar intelligence; better for scale.

Claude 3 Haiku is our fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with unmatched speed. Users will be able to build seamless AI experiences that mimic human interactions.

Cost (input | output, $ per million tokens)	$0.25 | $1.25
Context window	200K
Potential uses	Customer interactions: quick and accurate support in live interactions, translations; Content moderation: catch risky behavior or customer requests; Cost-saving tasks: optimized logistics, inventory management, extract knowledge from unstructured data
Differentiator	Smarter, faster, and more affordable than other models in its intelligence category.
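The per-million-token prices listed for the three models translate into a per-request cost with simple arithmetic. The sketch below uses the prices from the tables above; the function name, the short model keys, and the sample token counts are assumptions for illustration.

```python
from decimal import Decimal

# USD per million tokens as (input, output), taken from the tables above.
PRICES = {
    "opus": (Decimal("15"), Decimal("75")),
    "sonnet": (Decimal("3"), Decimal("15")),
    "haiku": (Decimal("0.25"), Decimal("1.25")),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of one request at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (price_in * input_tokens + price_out * output_tokens) / Decimal(1_000_000)


# Hypothetical request: a 10,000-token document in, a 1,000-token summary out.
for model in PRICES:
    print(model, request_cost(model, 10_000, 1_000))
```

For that hypothetical request the listed prices work out to $0.225 on Opus, $0.045 on Sonnet, and $0.00375 on Haiku, which shows how much the choice of model matters at scale.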

Model availability

Opus and Sonnet are available to use today in our API, which is now generally available, enabling developers to sign up and start using these models immediately. Haiku will be available soon. Sonnet is powering the free experience on claude.ai, with Opus available for Claude Pro subscribers.

Sonnet is also available today through Amazon Bedrock and in private preview on Google Cloud’s Vertex AI Model Garden—with Opus and Haiku coming soon to both.

Smarter, faster, safer

We do not believe that model intelligence is anywhere near its limits, and we plan to release frequent updates to the Claude 3 model family over the next few months. We're also excited to release a series of features to enhance our models' capabilities, particularly for enterprise use cases and large-scale deployments. These new features will include Tool Use (aka function calling), interactive coding (aka REPL), and more advanced agentic capabilities.

As we push the boundaries of AI capabilities, we’re equally committed to ensuring that our safety guardrails keep pace with these leaps in performance. Our hypothesis is that being at the frontier of AI development is the most effective way to steer its trajectory towards positive societal outcomes.

We’re excited to see what you create with Claude 3 and hope you will give us feedback to make Claude an even more useful assistant and creative companion. To start building with Claude, visit anthropic.com/claude.

Footnotes

[1] This table shows comparisons to models currently available commercially that have released evals. Our model card shows comparisons to models that have been announced but not yet released, such as Gemini 1.5 Pro. In addition, we’d like to note that engineers have worked to optimize prompts and few-shot samples for evaluations and reported higher scores for a newer GPT-4T model. Source.

FAQs

Is Claude better than ChatGPT?

Claude is better at understanding human language and can provide more accurate responses thanks to its ethical, constitutional design. But ChatGPT does a better job at tasks such as content creation, image generation and voice chat, and offers free, unlimited access to a range of services you have to pay for on Claude.

What is the difference between Anthropic's Claude and GPT-4?

Mixed Evaluations: Claude 3 Opus outperforms with 86.8%, in contrast to GPT-4's 83.1%. Knowledge Q&A: Claude 3 Opus marginally leads with 96.4%, while GPT-4 is close behind at 96.3%. Common Knowledge: Claude 3 Opus scores 95.4%, slightly better than GPT-4's 95.3%.

Can Claude generate code?

Code Generation: Code generation has become an attractive feature and a key competitive advantage with every new AI model release. Claude can generate code snippets, understand different programming languages, explain code functionality, and assist in debugging.

What is Claude 3 good for?

You can use the AI chatbot to analyze, organize, and extract helpful information from the data. Personal Assistant: Claude 3 is best known for generating safe output, following user prompts, and understanding the user intent.

Is Claude AI Premium worth it?

The Bottom Line: Understanding the Metrics That Matter

It begs the question: Are the advanced metrics and the promise of marginally better performance enough to justify the cost of a premium subscription? For most, the answer might lean towards a practical and budget-friendly no.

Can Claude AI be detected?

Phrasly analyzes the text for specific patterns and anomalies typical of AI-generated content, making it possible to distinguish between human and AI-written texts. Therefore, despite its advanced capabilities, Claude 3 Opus can still be detected by current AI detection technologies.

Is Claude faster than GPT?

How does Claude AI line up against ChatGPT? The various Claude offerings map well to OpenAI's language model offerings: Claude-Instant is the cheaper and faster offering, similar to GPT-3.5, whereas Claude-2 is the cutting-edge but slower model, competitive with GPT-4.

How fast is Claude compared to OpenAI?

Claude took 5 minutes to execute 52 prompts. OpenAI took 7 minutes. Forgot to mention, Claude appears to be a lot more rate-limited than OpenAI. Hit quite a few concurrency rate limits, but as long as you have auto-retry, it is a non-issue.

Is GPT-4 more powerful than ChatGPT?

Compared to GPT-3.5 and the free ChatGPT version that accompanies it, GPT-4 is a better solution for users who want more diverse content outputs and inputs, require more accurate and nuanced outcomes, and need a heavier emphasis on enterprise safety and privacy.

Is Claude AI safe?

Claude (anthropic.com/product; sign up at claude.ai) is a generative AI chatbot from Anthropic that has sophisticated safety features built into its analytical systems. It doesn't use multimodal content or live search, instead providing highly capable and reliable text-based information management.

How much does Claude cost?

Claude AI is free with a usage limit, currently around 30 messages a day. All you need is an email address to gain access to the free version. Upgrading for access to Anthropic's faster and more intelligent Claude AI models costs $20 a month, billed annually.

Who is behind Claude AI?

The company behind Claude AI

Claude was developed by Anthropic, which bills itself as an AI safety and research firm. Based in San Francisco, Anthropic was founded in 2021 by former OpenAI (the company that makes ChatGPT and DALL-E) executives and researchers. Google and Amazon are major investors.

Why is Claude AI better than ChatGPT?

Claude has a noticeably more "human" and empathetic approach than ChatGPT, which tends to come off as more robotic and rational. While both models are effective at analysis, Claude's larger context window makes it better for longer documents.

Which AI is best for coding?

If finding issues and vulnerabilities is your only responsibility in the software development process, Snyk is the best free AI tool for coding. It uses machine learning and different analysis techniques to find what's wrong with the code.

Can Claude do image generation?

Can Claude generate, produce, edit, manipulate or create images? No, Claude is an image understanding model only.

Is Claude 3 Opus better than GPT-4?

Among the three models, Claude 3 Opus performs significantly better, with a score of 90.7%, followed by Gemini Ultra with 79% and GPT-4 with 74.5%.

Is there a better AI than ChatGPT 4?

Best overall: Claude 3. Best for Live Data: Google Gemini. Most Creative: Microsoft Copilot. Best for Research: Perplexity.

Why is Claude AI better?

Chatty Claude-y

It answers questions in easy-to-understand, human-like language, which makes it an ideal AI chatbot for most people. It's like ChatGPT, but with more refinement towards natural and less robotic language. It also has more up-to-date training data, going up to August 2023 as opposed to September 2021.

Is Claude AI safe to use?

Claude AI is a chatbot that's built with safety and ethics in mind. It's developed using a safe Large Language Model (LLM) and carefully designed to produce honest, harmless, and helpful content. Hence, I'd say it's safe to use Claude AI, and it does not collect or store your personal information.
