Yoni Leitersdorf

Treating your cloud infrastructure as code (IaC) lets you scale your cloud environment to keep up with the growing demand for your applications, and it makes the roll-out and upkeep of that environment more consistent and repeatable. There are many IaC technologies. Each leading cloud service provider offers its own (CloudFormation for AWS, ARM Templates for Azure, etc.). There are also Terraform and Pulumi, which are cross-cloud solutions that are gaining steam.

Today, you typically rely on your security team to manually review your IaC and catch potential security issues before deployment. Unfortunately, your security team is struggling to keep up with the demand. The good news is that we are beginning to see tools like Checkov, tfsec and AWS CloudFormation Guard. These tools work in conjunction with your infrastructure automation tools (e.g. AWS CloudFormation, Terraform, etc.) to identify potential security risks early in the development cycle. They scan the Terraform or CloudFormation files, and they can integrate with your CI/CD pipeline to prevent insecure infrastructure from being deployed.

While these tools are a technological advancement, there are two main pitfalls with them:

  1. These tools are limited in scope because they evaluate only the IaC. By analyzing just the “build state” of the environment, they cannot stitch it together with the “live state” of that environment. They provide a partial view, specific to the sub-section of the environment at hand. As a result, many security issues can go undetected! You would need to fall back on the manual process to catch these issues, otherwise they will be missed.
  2. These tools do not understand the complexity of the relationships between resources. For example, how problematic a security group is depends on what types of resources it is associated with, what ports they have open, what subnets they are connected to, etc. Without taking this into account, the tools generate many false positives and false negatives. This means noise, as well as missed security violations.

We will look at two specific examples to illustrate the above-mentioned challenges:

1) False Positives, False Negatives

Most tools will check that no security group allows ingress from 0.0.0.0/0 or ::/0 to port 22 for SSH. This can get extremely noisy. What if you are not using that security group? Or what if you are, but you’re doing so in a subnet where the NACL disallows port 22, or there’s no routing to the Internet? In any of these cases, you would want to suppress the false positive alert.

module "vpc_example_complete-vpc" {
  source  = "terraform-aws-modules/vpc/aws//examples/complete-vpc"
}

resource "aws_security_group" "sg" {
  vpc_id = module.vpc_example_complete-vpc.vpc_id

  ingress {
    from_port = 22
    protocol = "tcp"
    to_port = 22
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "test" {
  ami           = "ami-07cda0db070313c52"
  instance_type = "t2.micro"
  subnet_id     = module.vpc_example_complete-vpc.private_subnets[0]
  vpc_security_group_ids = [aws_security_group.sg.id]
  associate_public_ip_address = true
}

Can you spot in the above Terraform the reason why having port 22 open to the Internet is not actually an issue? Hint: it’s in the subnet. Notice that the EC2 instance is created in a private subnet. The VPC module used generates private and public subnets, and the private subnets don’t have any routes to the Internet.

Evaluating this Terraform file without understanding the relationships between the resources will cause a false positive. Because the private subnet has no route to the Internet, SSH traffic from outside cannot reach the instance, so the security group, while ill-configured, is not a real problem to deal with at this time. It’s important to keep noise levels low and manageable.

So, without understanding the context, the relationship between resources, you risk having many false positives obstructing your pipeline, as well as false negatives.
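To make the difference concrete, here is a minimal Python sketch of a naive rule next to a context-aware one. It is not how any particular scanner is implemented. The classes and field names (`Subnet`, `SecurityGroup`, `Instance`, `has_internet_route`, and so on) are hypothetical simplifications of what a tool would extract from the Terraform above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subnet:
    name: str
    has_internet_route: bool  # assumed flag: route table points at an Internet Gateway
    nacl_allows_ssh: bool     # assumed flag: the subnet's NACL permits inbound TCP/22


@dataclass
class SecurityGroup:
    name: str
    open_ssh_to_world: bool   # ingress on port 22 from 0.0.0.0/0 or ::/0


@dataclass
class Instance:
    name: str
    subnet: Subnet
    security_groups: List[SecurityGroup]
    has_public_ip: bool


def naive_findings(groups: List[SecurityGroup]) -> List[str]:
    """Flag every group that opens SSH to the world, regardless of how it is used."""
    return [g.name for g in groups if g.open_ssh_to_world]


def contextual_findings(groups: List[SecurityGroup], instances: List[Instance]) -> List[str]:
    """Flag a group only if some instance using it is actually reachable from the Internet."""
    findings = []
    for g in groups:
        if not g.open_ssh_to_world:
            continue
        exposed = any(
            g.name in {sg.name for sg in inst.security_groups}
            and inst.has_public_ip
            and inst.subnet.has_internet_route
            and inst.subnet.nacl_allows_ssh
            for inst in instances
        )
        if exposed:
            findings.append(g.name)
    return findings


# Mirrors the example above: world-open SSH, but the instance sits in a private subnet.
private = Subnet("private_subnets[0]", has_internet_route=False, nacl_allows_ssh=True)
sg = SecurityGroup("sg", open_ssh_to_world=True)
test = Instance("test", private, [sg], has_public_ip=True)

print(naive_findings([sg]))               # ['sg'] (the false positive)
print(contextual_findings([sg], [test]))  # [] (no route to the Internet, so no finding)
```

The same contextual rule also stays quiet for a security group that is not attached to anything, which is the other source of noise mentioned earlier.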

2) Lack of understanding of how resources generated by separate processes impact one another

Imagine you have created an RDS database, placing it in a subnet that has Internet access, but with no public IP. In such a case, the database is not publicly accessible.

resource "aws_vpc" "nondefault" {
  ...
}

resource "aws_network_acl" "ec2_nacl" {
  ... (this NACL allows all traffic, like the default NACL) ...
}

resource "aws_subnet" "nondefault_1" {
  vpc_id = aws_vpc.nondefault.id
  cidr_block = "10.1.1.128/25"
  
   ...
}

resource "aws_subnet" "nondefault_2" {
  vpc_id = aws_vpc.nondefault.id
  cidr_block = "10.1.1.0/25"
  
   ....
}

resource "aws_route_table" "nondefault_1" {
   ...(route to the internet via an Internet Gateway)...
}

resource "aws_db_subnet_group" "db" {
  name = "rds_db"
  subnet_ids = [aws_subnet.nondefault_1.id, aws_subnet.nondefault_2.id]

}

resource "aws_security_group" "db" {
  vpc_id = aws_vpc.nondefault.id
  ingress {
    from_port = 3306
    protocol = "tcp"
    to_port = 3306
    cidr_blocks = [aws_subnet.nondefault_1.cidr_block]
  }
}

resource "aws_db_instance" "test" {
  ... (usual DB parameters) ...
  db_subnet_group_name = aws_db_subnet_group.db.name
  vpc_security_group_ids = [ aws_security_group.db.id]
  publicly_accessible = false
}

So, not publicly accessible, right? Now imagine someone else, working with the same cloud account, is intending to deploy a new EC2 instance separately, on the same subnet. This EC2 instance would be publicly accessible.

resource "aws_security_group" "publicly_accessible_sg" {
  vpc_id = aws_vpc.nondefault.id
  ingress {
    from_port = 0
    protocol = "tcp"
    to_port = 65000
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port = 0
    protocol = "tcp"
    to_port = 65000
    cidr_blocks = ["0.0.0.0/0"]
  }
}

// This instance can potentially be used to hop into the DB
resource "aws_instance" "public_ins" {
  ami = "ami-0130bec6e5047f596"
  instance_type = "t3.nano"
  associate_public_ip_address = true
  vpc_security_group_ids = [aws_security_group.publicly_accessible_sg.id]
  subnet_id = "subnet-samesubnetasdbwascreatedonabove"

}

What’s wrong here? Neither Terraform file is necessarily a security concern on its own. Together, however, they can expose the RDS database to outside access: the database’s security group allows port 3306 from the entire subnet CIDR, so a publicly accessible EC2 instance placed in that subnet can be used as a hop into the database. The owner of the EC2 instance Terraform file inadvertently introduced a new security concern to the RDS instance, and had no idea! That owner could even have been a third party.

So, without stitching together the cloud environment and the proposed plan, it’s very easy to overlook potentially serious security concerns.
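The sketch below illustrates, in Python, why the check has to run on the live environment and the proposed plan together. It is an illustration under simplifying assumptions, not a description of any real tool: `DbNode`, `VmNode`, `merge` and `hop_risks` are hypothetical names, and `internet_facing` stands in for the combination of a public IP, a world-open security group and a route to the Internet.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DbNode:
    name: str
    trusted_cidrs: List[str]  # CIDRs the DB's security group allows on the DB port


@dataclass
class VmNode:
    name: str
    subnet_cidr: str
    internet_facing: bool     # assumed summary of public IP + open SG + Internet route


def merge(live: Dict[str, list], planned: Dict[str, list]) -> Dict[str, list]:
    """Stitch the live state and the proposed plan into one view of the environment."""
    return {
        "databases": live["databases"] + planned["databases"],
        "instances": live["instances"] + planned["instances"],
    }


def hop_risks(env: Dict[str, list]) -> List[Tuple[str, str]]:
    """Return (instance, database) pairs where an Internet-facing instance
    sits in a network range that the database's security group trusts."""
    risks = []
    for db in env["databases"]:
        trusted = [ipaddress.ip_network(c) for c in db.trusted_cidrs]
        for vm in env["instances"]:
            if not vm.internet_facing:
                continue
            subnet = ipaddress.ip_network(vm.subnet_cidr)
            if any(subnet.overlaps(t) for t in trusted):
                risks.append((vm.name, db.name))
    return risks


# Live state: the RDS database from the first Terraform file, already deployed.
live = {"databases": [DbNode("test", ["10.1.1.128/25"])], "instances": []}
# Proposed plan: the second Terraform file, which only contains the EC2 instance.
planned = {"databases": [], "instances": [VmNode("public_ins", "10.1.1.128/25", True)]}

print(hop_risks(planned))               # [] (the plan looks harmless on its own)
print(hop_risks(live))                  # [] (so does the live environment)
print(hop_risks(merge(live, planned)))  # [('public_ins', 'test')]
```

Each side evaluated alone reports nothing. The finding only appears once the two views are merged, which is the point of the example above.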

Another approach

At Indeni, we saw the above challenges and examples, as well as many others, and decided to take them head-on. We’re working on a new product, called Cloudrail, that is capable of analyzing IaC files together with the cloud environments they are targeting, and of executing complicated rules that rely on inter-resource relationships (“context”).

Our goal is to have Cloudrail integrated into CI/CD, catching security issues in IaC code before they make it into the production environment.

We have released v0.1. It’s an early version that should only be used in development and small test environments at this point. We would very much welcome your feedback on it. Please take it for a spin, and even use a few test cases we’ve provided here:

https://github.com/indeni/cloudrail-demo

Summary

In a perfect world, you would always compare the intent expressed in your IaC files with the real-life environment, so that you catch all possible security issues before deployment and are only notified of true security risks.

We would love to hear your thoughts on this and what other challenges you may have experienced. Are you thinking about security in IaC? Are you looking to integrate security validation in your CI/CD? How much time is this process consuming today? What are your developers thinking about all this?

Please join us in the Cloudrail support channel on Slack. You can join at this link:

https://indeni.com/cloudrail-user-support/

See you in the #cloudrail-user-support channel.