Terraform Data Sources: A Complete Guide & Examples

Introduction
Step-by-Step Guide
Code Example
Additional Notes
Summary
Conclusion
References

Introduction

Imagine building a house from a pre-designed blueprint. You wouldn't redraw every detail yourself; you would look up the room dimensions and window placements you need. Terraform data sources work similarly. They are read-only: they let your configuration look up information about resources that exist outside it (or are managed by another configuration) without creating, changing, or managing them. You declare the data source type and any filters, Terraform fetches the data, and you reference the result elsewhere in your configuration. This makes configurations more modular, reusable, and dynamic. For example, you can look up an existing AWS S3 bucket's attributes, retrieve the latest AMI ID for a region, or read outputs from other Terraform configurations. In short, data sources connect your code to the infrastructure that already exists.

Step-by-Step Guide

Imagine you're building a house with a pre-designed blueprint. You wouldn't redraw every detail of the blueprint yourself, right? Instead, you'd refer to the blueprint for information like room dimensions or window placements.

Terraform data sources work similarly. They let you fetch information from various sources without managing those sources directly within your Terraform code.

Think of data sources as read-only tools. They don't create or modify infrastructure; they simply retrieve information about existing resources.

Here's how it works:

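You define a data source: You specify the data source type (such as an AWS S3 bucket or an Azure VM) and any arguments or filters that identify what to look up.
Terraform fetches the data: When you run terraform plan or terraform apply, Terraform asks the provider (for example, AWS or Azure) to read the requested information. Data sources are normally read during planning; if their arguments depend on values that are not known until apply, the read is deferred to the apply step.
You access the data: You reference the result elsewhere in your configuration as data.<TYPE>.<NAME>.<ATTRIBUTE>, for example to use an existing resource's ID or other properties.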
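As a minimal illustration of these three steps, the sketch below looks up the default VPC in whatever region the AWS provider is configured for. It assumes an AWS provider is already configured and that the region still has a default VPC (otherwise the plan fails); the output names are hypothetical.

```hcl
# Step 1: declare a data source of type "aws_vpc" with the local name "default".
# The argument acts as a filter: match the region's default VPC.
data "aws_vpc" "default" {
  default = true
}

# Step 2 happens when you run terraform plan or apply: Terraform reads the VPC.
# Step 3: reference its attributes as data.<TYPE>.<NAME>.<ATTRIBUTE>.
output "default_vpc_id" {
  value = data.aws_vpc.default.id
}

output "default_vpc_cidr" {
  value = data.aws_vpc.default.cidr_block
}
```

Nothing in this configuration creates or changes the VPC; removing the data block only removes the lookup.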

This approach offers several benefits:

Modularity: You can separate resource management from data retrieval, making your code cleaner and more organized.
Reusability: You can reuse the same data source across different parts of your configuration or even in different projects.
Dynamic Configuration: Data sources enable you to create dynamic configurations that adapt to existing infrastructure.

For instance, you could use a data source to:

Look up an existing AWS S3 bucket: Given the bucket's name, a data source returns attributes such as its ARN and region, so you don't have to hardcode them.
Retrieve the latest AMI ID for a specific region: Your instances can use the newest image that matches your filters at plan time, instead of a hardcoded AMI ID that goes stale.
Access outputs from other Terraform configurations: The terraform_remote_state data source lets one configuration read another's outputs, so you can share information between different parts of your infrastructure.

In essence, data sources act as bridges between your Terraform code and the real world, providing you with the information you need to manage your infrastructure effectively.

Code Example

The examples below show how to use Terraform data sources to fetch information about existing resources and use it in your configuration: looking up an existing AWS S3 bucket, retrieving the latest AMI ID for a region, and accessing outputs from another Terraform configuration.

1. Get the ID of an existing AWS S3 bucket:

data "aws_s3_bucket" "example" {
  bucket = "your-existing-bucket-name"
}

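resource "aws_instance" "example" {
  # ... other instance configurations ...
  # The instance needs an IAM instance profile that allows s3:GetObject
  # on this bucket, and an AMI with the AWS CLI installed.

  # Use the bucket ID retrieved from the data source
  # (for S3 buckets, the ID is the bucket name)
  user_data = <<-EOF
#!/bin/bash
aws s3 cp s3://${data.aws_s3_bucket.example.id}/your-script.sh /tmp/your-script.sh
chmod +x /tmp/your-script.sh
/tmp/your-script.sh
  EOF
}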

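This code defines an aws_s3_bucket data source that looks up an existing bucket named "your-existing-bucket-name". The EC2 instance's user_data script then references data.aws_s3_bucket.example.id. For S3 buckets the ID is simply the bucket name, so the main gains here are an early failure at plan time if the bucket does not exist, and access to other attributes, such as the bucket's ARN and region, without hardcoding them.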
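Because an S3 bucket's ID is just its name, the data source earns its keep when you read attributes you would otherwise hardcode. As a sketch, assuming the instance's IAM role should be allowed to download the script from this bucket, the bucket's ARN can feed an IAM policy. The policy name read-scripts is hypothetical, and attaching the policy to the instance's role is left out.

```hcl
# Build a policy document from the bucket's ARN.
# aws_iam_policy_document is itself a data source.
data "aws_iam_policy_document" "read_scripts" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${data.aws_s3_bucket.example.arn}/*"] # every object in the bucket
  }
}

resource "aws_iam_policy" "read_scripts" {
  name   = "read-scripts" # hypothetical policy name
  policy = data.aws_iam_policy_document.read_scripts.json
}
```

Note that aws_iam_policy_document is computed locally by the provider rather than fetched from AWS, so data sources can also produce values, not only look up existing resources.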

2. Retrieve the latest AMI ID for a specific region:

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical Ubuntu
}

resource "aws_instance" "example" {
  # ... other instance configurations ...

  # Use the latest AMI ID retrieved from the data source
  ami = data.aws_ami.ubuntu.id
}

This code defines an aws_ami data source that finds the most recent Ubuntu 20.04 AMI published by Canonical (owner 099720109477) in the region configured on the AWS provider. The filters narrow the search, most_recent = true picks the newest match, and data.aws_ami.ubuntu.id supplies that AMI ID when creating the EC2 instance.
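Two details are worth making explicit. AMI IDs differ between regions, so the region searched comes from the AWS provider configuration. And because the lookup runs on every plan, a newly published Ubuntu image changes the ami argument, which makes Terraform replace the instance. The sketch below is one way to handle both in a variant of the aws_instance.example resource above; the region eu-west-1 and instance type t3.micro are placeholders.

```hcl
provider "aws" {
  region = "eu-west-1" # placeholder: the region the aws_ami lookup searches
}

resource "aws_instance" "example" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro" # placeholder instance size

  lifecycle {
    # Changing ami forces a new instance; ignore AMI updates after creation.
    ignore_changes = [ami]
  }
}
```

Whether to ignore AMI changes is a design choice: ignoring them keeps existing instances stable, while letting them through rolls instances onto new images at the cost of replacing them.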

3. Access outputs from other Terraform configurations:

Imagine you have two separate Terraform configurations, each in its own directory (for example, network/ and app/ side by side): one for networking and another for deploying applications. The application configuration can use a data source to read outputs from the networking configuration.

Networking Configuration (network.tf):

resource "aws_subnet" "example" {
  # ... subnet configurations ...
}

output "subnet_id" {
  value = aws_subnet.example.id
}

Application Configuration (app.tf):

data "terraform_remote_state" "network" {
  backend = "local"
  config = {
    path = "../network/terraform.tfstate"
  }
}

resource "aws_instance" "example" {
  # ... other instance configurations ...

  # Use the subnet ID retrieved from the network configuration
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}

This code defines a terraform_remote_state data source that reads the networking configuration's state file. The expression data.terraform_remote_state.network.outputs.subnet_id returns that configuration's subnet_id output, which the application configuration uses when creating the EC2 instance. The networking configuration must be applied first so that its state contains the output, and only its root-module outputs are visible this way.
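The local backend only works when both configurations share a filesystem. If the networking state is stored remotely, the same data source can read it from that backend instead. The sketch below assumes an S3 backend; the bucket, key, and region values are placeholders.

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"   # placeholder state bucket
    key    = "network/terraform.tfstate" # placeholder state object key
    region = "us-east-1"                 # placeholder region of the state bucket
  }
}
```

The reference data.terraform_remote_state.network.outputs.subnet_id stays the same. Keep in mind that reading remote state requires access to the networking configuration's whole state snapshot, which can contain sensitive values; looking the subnet up directly with a provider data source such as aws_subnet avoids that coupling.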

These are just a few examples of how Terraform data sources can be used. By leveraging data sources, you can write more dynamic, modular, and reusable Terraform code.

Additional Notes

Data sources are particularly useful in multi-environment setups. You can use the same codebase to deploy to different environments (e.g., development, staging, production) and use data sources to fetch environment-specific values.
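As a sketch of that pattern, assuming each environment's subnet carries Environment and Tier tags (the tag names, tag values, and variable name are hypothetical), a single variable selects which subnet the data source returns. The aws_subnet data source fails if no subnet or more than one subnet matches, so the tags must identify exactly one.

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment, for example dev, staging, or prod"
}

# Look up the single subnet tagged for this environment (hypothetical tags).
data "aws_subnet" "app" {
  tags = {
    Environment = var.environment
    Tier        = "app"
  }
}

output "app_subnet_id" {
  value = data.aws_subnet.app.id
}
```

The same code then serves every environment; you choose one at run time, for example with terraform apply -var="environment=staging".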
Data sources can help avoid hardcoding values in your configuration. This makes your code more flexible and easier to maintain.
Terraform providers offer a wide range of built-in data sources; each provider's documentation on the Terraform Registry lists the data sources it supports.
When no built-in data source fits, the external data source (from the hashicorp/external provider) runs a program or script you supply and exposes its JSON output, which lets you integrate with almost any source of data.
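A minimal sketch of the external data source, assuming a hypothetical script at scripts/lookup.py. The provider passes the query to the program as a JSON object on stdin and expects it to print a single JSON object whose values are all strings. The query key environment and the result key value are hypothetical.

```hcl
terraform {
  required_providers {
    external = {
      source = "hashicorp/external"
    }
  }
}

# Runs the (hypothetical) script and parses its JSON output.
data "external" "lookup" {
  program = ["python3", "${path.module}/scripts/lookup.py"]

  query = {
    environment = "staging" # hypothetical input passed to the script on stdin
  }
}

output "lookup_value" {
  # "value" is a key the script is assumed to return
  value = data.external.lookup.result.value
}
```

Because the script runs on every plan, it should be fast and free of side effects, just like any other data source read.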
Be mindful of performance: Terraform reads every data source on each plan and apply, so many lookups against remote APIs slow down runs and can run into provider API rate limits.
Use data sources strategically to improve the modularity, reusability, and flexibility of your Terraform code.

By understanding and effectively utilizing data sources, you can significantly enhance your infrastructure management workflows with Terraform.

Summary

This article explains how Terraform data sources simplify infrastructure management by retrieving information about existing resources.

Analogy: Data sources are like blueprints for your existing infrastructure. Instead of recreating details, you reference them for information.

How they work:

Define: Specify the data source type (e.g., AWS S3 bucket) and any filters.
Fetch: Terraform retrieves the requested information from the source.
Access: Use the data within your configuration (e.g., referencing an existing resource ID).

Benefits:

Modularity: Separates resource management from data retrieval.
Reusability: Use the same data source across different configurations or projects.
Dynamic Configuration: Adapt configurations to existing infrastructure.

Examples:

Look up an existing AWS S3 bucket and read its attributes.
Retrieve the latest AMI ID for a specific region.
Access outputs from other Terraform configurations.

Key takeaway: Data sources bridge your Terraform code and your infrastructure, providing essential information for effective management.

Conclusion

In conclusion, Terraform data sources are essential for managing infrastructure efficiently. They act as bridges to the real world, providing your code with up-to-date information about existing resources. By using data sources, you can create more modular, reusable, and dynamic configurations, ultimately leading to more robust and maintainable infrastructure as code.

References

Data Sources - Configuration Language | Terraform | HashiCorp ... | Data sources allow Terraform to use information defined outside of Terraform, defined by another separate Terraform configuration, or modified by functions.
Terraform Data Sources - How They Are Utilized (Example) | What a data source is in Terraform, how data sources work, and how to use them, including data sources in remote state.
Query data sources | Terraform | HashiCorp Developer | Examples of data sources include machine image IDs from a cloud provider or Terraform outputs from other configurations.
Use Terraform Data Sources for a Better Infrastructure as Code | How to leverage Terraform data sources to manage Infrastructure as Code more efficiently, from ControlMonkey.
Understanding Terraform Data Sources in Modules | Random Blurbs ... | A blog post on troubleshooting data sources in modules.
Terraform Data Sources: Everything You Should Know | Insights, best practices, and tips for efficient infrastructure management with data sources.
Data Sources in Terraform resources explained with example ... | Includes an example of using the aws_s3_bucket data source to retrieve information about an existing Amazon S3 bucket.
Data sources best practice : r/Terraform | Reddit discussion thread on data source best practices.
Is it possible to use variable data sources in a for_each? - Terraform ... | Forum question about looking up several terraform_remote_state data sources, one per AWS account, with for_each.