Amazon Bedrock Pricing

Now that we understand the basics of Amazon Bedrock, let's explore the pricing options and cost optimization strategies. Amazon Bedrock offers different pricing models to accommodate various use cases and workload patterns.

Pricing Models

On-Demand Mode

  • Pay-as-you-go with no commitment required
  • Pricing structure:
    • Text models: Charged for every input and output token processed
    • Embeddings models: Charged for every input token processed
    • Image models: Charged for every image generated
  • Works only with the base models provided as part of Amazon Bedrock (a token-cost sketch follows this list).
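
To make the token-based pricing concrete, here is a minimal sketch of estimating the cost of a single on-demand text-model call. The per-1,000-token prices are hypothetical placeholders, not real Amazon Bedrock rates, which vary by model and region.

```python
# Rough on-demand cost estimator. The per-1,000-token prices below are
# hypothetical placeholders; real prices depend on the model and region.

PRICE_PER_1K_INPUT_TOKENS = 0.0008   # USD, placeholder
PRICE_PER_1K_OUTPUT_TOKENS = 0.0016  # USD, placeholder


def estimate_on_demand_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one on-demand text-model invocation."""
    input_cost = input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
    output_cost = output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    return input_cost + output_cost


# Example: a 1,200-token prompt that produces an 800-token answer
print(f"${estimate_on_demand_cost(1200, 800):.6f}")
```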

If you want additional cost savings, you can use Batch mode.

Batch Mode

  • Make multiple predictions at a time with output delivered as a single file in Amazon S3
  • Discounts of up to 50% compared to on-demand pricing
  • Trade-off: Responses are delivered later than real-time
  • Ideal for cost savings when immediate results aren't required (a batch-job sketch follows this list)
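
A typical batch flow is to upload a JSONL file of prompts to Amazon S3 and create a batch (model invocation) job; the results are written back to S3 when the job completes. The sketch below assumes the boto3 `create_model_invocation_job` API, and the job name, model ID, role ARN, and bucket paths are placeholders.

```python
import boto3

bedrock = boto3.client("bedrock")  # control-plane client, not bedrock-runtime

# All identifiers below are placeholders for illustration only.
response = bedrock.create_model_invocation_job(
    jobName="nightly-summaries-batch",
    modelId="amazon.titan-text-express-v1",  # a batch-capable base model
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-input/prompts.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}
    },
)
print(response["jobArn"])  # results are written back to S3 when the job finishes
```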

Provisioned Throughput

  • Purchase model units for a specific time period (e.g., one month or six months)
  • Provides guaranteed throughput: a maximum number of input and output tokens processed per minute
  • Primary benefit: Maintains capacity and performance
  • Does not necessarily provide cost savings
  • Works with base models but is required for:
    • Fine-tuned models
    • Custom models
    • Imported models
  • Note: Cannot use on-demand mode with custom or fine-tuned models
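
As a rough illustration of purchasing capacity, here is a minimal sketch assuming the boto3 `create_provisioned_model_throughput` call; the provisioned model name, custom-model ARN, and unit count are placeholders. Remember that this reserves throughput rather than saving money.

```python
import boto3

bedrock = boto3.client("bedrock")

# Placeholder name, model ARN, and unit count; model units are billed for the
# whole commitment term regardless of how much traffic you actually send.
response = bedrock.create_provisioned_model_throughput(
    provisionedModelName="my-finetuned-model-capacity",
    modelId="arn:aws:bedrock:us-east-1:123456789012:custom-model/my-finetuned-model",
    modelUnits=1,                    # each unit guarantees a tokens-per-minute rate
    commitmentDuration="SixMonths",  # or "OneMonth"
)
print(response["provisionedModelArn"])
```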

Model Improvement Pricing

Understanding the cost implications of different model improvement approaches:

1. Prompt Engineering

  • Uses techniques to improve prompts and model outputs
  • No additional computation or fine-tuning required
  • Very cheap to implement
  • No further model training needed

2. RAG (Retrieval Augmented Generation)

  • Uses external knowledge base to supplement model knowledge
  • Less complex, with no changes to the foundation model itself
  • No retraining or fine-tuning required
  • Additional costs include:
    • Vector database maintenance
    • System to access the vector database
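
To see where the extra RAG costs show up, here is a back-of-the-envelope monthly estimate. Every price and volume in it is a hypothetical assumption used only to show the cost components, not a real quote.

```python
# Back-of-the-envelope monthly RAG cost estimate. Every number is a
# hypothetical assumption; substitute your own prices and volumes.

queries_per_month = 100_000
avg_query_tokens = 50        # tokens embedded per user query
avg_input_tokens = 1_500     # prompt + retrieved context sent to the text model
avg_output_tokens = 300

price_per_1k_input = 0.0008       # USD, placeholder
price_per_1k_output = 0.0016      # USD, placeholder
price_per_1k_embedding = 0.0001   # USD, placeholder
vector_db_monthly = 250.0         # USD, placeholder for hosting the vector database

generation = queries_per_month * (
    avg_input_tokens / 1000 * price_per_1k_input
    + avg_output_tokens / 1000 * price_per_1k_output
)
embedding = queries_per_month * avg_query_tokens / 1000 * price_per_1k_embedding
total = generation + embedding + vector_db_monthly
print(f"Estimated monthly RAG cost: ${total:,.2f}")
```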

3. Instruction-Based Fine-Tuning

  • Fine-tunes the model with specific instructions
  • Requires additional computation
  • Used to steer how the model answers questions and set the tone
  • Uses labeled data

4. Domain Adaptation Fine-Tuning

  • Most expensive option
  • Adapts the model by training it on domain-specific datasets
  • Requires gathering extensive data and retraining the model
  • Uses unlabeled data (unlike instruction-based fine-tuning)
  • Requires intensive computation

Cost Savings Strategies

Pricing Model Selection

  • On-demand pricing: Great for unpredictable workloads with no long-term commitments
  • Batch mode: Achieve up to 50% discounts when you can wait for results
  • Provisioned throughput: Not a cost-saving measure; use it when you need to reserve capacity and guarantee throughput

Model Configuration

  • Temperature, Top K, and Top P parameters: Modifying these has no impact on pricing
  • Model size: Smaller models are generally cheaper, but this varies by provider

Token Optimization

The main driver of cost savings in Amazon Bedrock is optimizing token usage:

  • Minimize input tokens: Write prompts as efficiently as possible
  • Minimize output tokens: Keep outputs concise and short
  • Focus on token optimization as the primary cost reduction strategy
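
As a quick illustration of why token optimization matters, the comparison below reuses the hypothetical per-1,000-token prices from the on-demand sketch above and contrasts a verbose request with a trimmed version of the same request at scale.

```python
# Compare a verbose request with a trimmed version of the same request,
# reusing the hypothetical placeholder prices from the on-demand sketch above.

price_in, price_out = 0.0008, 0.0016  # USD per 1,000 tokens, placeholders


def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out


calls_per_month = 1_000_000
verbose = cost(input_tokens=2_000, output_tokens=600)  # long prompt, long answer
concise = cost(input_tokens=800, output_tokens=200)    # trimmed prompt, "be brief"

print(f"Verbose: ${verbose * calls_per_month:,.2f}/month")
print(f"Concise: ${concise * calls_per_month:,.2f}/month")
```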

That's the key information about Amazon Bedrock pricing and cost optimization strategies. The main takeaway is that token usage is the primary cost driver, so optimizing your prompts and outputs is essential for cost management.

Here is the PDF link for better understanding; read it first.