AWS S3 in Elixir with ExAws


In this article we see how to store and retrieve files on AWS S3 from Elixir, with the help of ExAws.

We start by setting up an AWS account and credentials, then configure an Elixir application and look at the basic upload and download operations with small files.

Then we see how to deal with large files, making multipart uploads and using presigned URLs to create a download stream that processes data on the fly.

Create an IAM user, configure permissions and credentials

If you don’t have an Amazon Web Services account yet, you can create one at https://aws.amazon.com/ and use the free tier for the first 12 months, which includes up to 5 GB of free S3 storage.

Be sure to check all the limits of the free tier before you start using the service. Always take a look at the billing page to keep track of the usage.

To access AWS S3 resources, we first need to create an AWS IAM (Identity and Access Management) user with limited permissions.

Once logged into the AWS console, go to the users section of the security credentials page and click on Add user.

Menu on the top-right side of the AWS console
Create a new IAM user

When creating a user, we need to set a username and, most importantly, enable Programmatic access: this means the user can access AWS resources programmatically via the API.

Username and Programmatic access

Then we set the permissions, attaching the AmazonS3FullAccess policy and limiting the user to just the S3 service.

AmazonS3FullAccess policy

Now, this policy is fine for this demo, but it’s still too broad: a user, or an app, can access all the buckets, files and settings of S3.

By creating a custom policy, we can limit the user permissions to only the needed S3 actions and buckets. More on this in the AWS User Policy Examples.

Once the user is created, we can download the Access Key ID and the Secret Access Key. You must keep these keys secret, because whoever has them can access your AWS S3 resources.

IAM user Access key ID and Secret access key

To create an S3 bucket using the AWS console, go to the S3 section and click on Create bucket. Set a bucket name (I’ve used poeticoding-aws-elixir) and be sure to block all public access.

Bucket name and region

Block all public access

Configure ex_aws and environment variables

Let’s create a new Elixir application and add the dependencies needed to make ex_aws and ex_aws_s3 work:

# mix.exs
def deps do
  [
    {:ex_aws, "~> 2.1"},
    {:ex_aws_s3, "~> 2.0"},
    {:hackney, "~> 1.15"},
    {:sweet_xml, "~> 0.6"},
    {:jason, "~> 1.1"}
  ]
end

ExAws, by default, uses the hackney HTTP client to make requests to AWS.

We create the config/config.exs configuration file, where we set the access key ID and the secret access key:

# config/config.exs

import Config

config :ex_aws,
  json_codec: Jason,
  access_key_id: {:system, "AWS_ACCESS_KEY_ID"},
  secret_access_key: {:system, "AWS_SECRET_ACCESS_KEY"}

The default ExAws JSON codec is Poison. If we want to use another library, like Jason, we need to explicitly set the json_codec property.

We don’t want to write our keys in the configuration file: first, because anyone with access to the code can see them; second, because we want them to be easy to change.

We can use environment variables: by passing {:system, "AWS_ACCESS_KEY_ID"} and {:system, "AWS_SECRET_ACCESS_KEY"} tuples the application gets the keys from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

If you are on a Unix or Unix-like system (like macOS or Linux), you can set these environment variables in a script

# .env file
export AWS_ACCESS_KEY_ID="your access key"
export AWS_SECRET_ACCESS_KEY="your secret access key"

and load them with source

$ source .env
$ iex -S mix

Keep this script secret. If you are using git, remember to add it to .gitignore so that it is never committed.

If you don’t want to keep these keys in a script, you can always pass them when launching the application or iex

$ AWS_ACCESS_KEY_ID="..." \
  AWS_SECRET_ACCESS_KEY="..." \
  iex -S mix

If you’re on a Windows machine, you can set the environment variables using the Command Prompt or PowerShell

# Windows CMD (no quotes: CMD would store them as part of the value)
set AWS_ACCESS_KEY_ID=...

# Windows PowerShell
$env:AWS_ACCESS_KEY_ID="..."
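
A missing variable only surfaces when the first request fails, so it can help to check the environment up front. The following is a minimal sketch using only the standard library; the EnvCheck module name is hypothetical and not part of ExAws.

```elixir
defmodule EnvCheck do
  # The two variables referenced by the {:system, ...} tuples in config/config.exs.
  @required ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]

  # Returns the names of the required variables that are unset or empty.
  # `env` defaults to the current environment (a map of string keys and values).
  def missing(env \\ System.get_env()) do
    Enum.filter(@required, fn name -> Map.get(env, name) in [nil, ""] end)
  end
end

case EnvCheck.missing() do
  [] -> IO.puts("AWS credentials found in the environment")
  names -> IO.puts("Missing environment variables: " <> Enum.join(names, ", "))
end
```

Passing the environment as an argument keeps the function easy to try in iex with a hand-written map.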

Listing the buckets

Now we have everything ready: credentials, application dependencies and ex_aws configured with environment variables. So let’s try the first request.

# load the environment variables
$ source .env

# run iex
$ iex -S mix

iex> ExAws.S3.list_buckets()
%ExAws.Operation.S3{
  http_method: :get,
  parser: &ExAws.S3.Parsers.parse_all_my_buckets_result/1,
  path: "/",
  service: :s3,
  ...,
}

The ExAws.S3.list_buckets() function doesn’t send the request itself; it returns an ExAws.Operation.S3 struct. To make the request we use ExAws.request or ExAws.request!

iex> ExAws.S3.list_buckets() |> ExAws.request!()

%{
  body: %{
    buckets: [
      %{
        creation_date: "2019-11-25T17:48:16.000Z",
        name: "poeticoding-aws-elixir"
      }
    ],
    owner: %{ ... }
  },
  headers: [
    ...
    {"Content-Type", "application/xml"},
    {"Transfer-Encoding", "chunked"},
    {"Server", "AmazonS3"},
    ...
  ],
  status_code: 200
}

ExAws.request! returns a map with the HTTP response from S3. With get_in/2 we can get just the bucket list

ExAws.S3.list_buckets()
|> ExAws.request!()
|> get_in([:body, :buckets])

[%{creation_date: "2019-11-25T17:48:16.000Z", name: "poeticoding-aws-elixir"}]
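
ExAws.request! raises when something goes wrong, while ExAws.request returns an {:ok, response} or {:error, reason} tuple. Here is a small sketch of how the non-raising variant could be used; BucketLister is a hypothetical module name, and the shape of the error reason is deliberately left unmatched.

```elixir
defmodule BucketLister do
  # Extracts the bucket names from the result of ExAws.request/1.
  def bucket_names({:ok, %{body: %{buckets: buckets}}}) do
    {:ok, Enum.map(buckets, & &1.name)}
  end

  # Any error (bad credentials, network failure, ...) is passed through as is.
  def bucket_names({:error, reason}), do: {:error, reason}

  # Sends the request and returns {:ok, names} or {:error, reason}.
  def list do
    ExAws.S3.list_buckets()
    |> ExAws.request()
    |> bucket_names()
  end
end
```

With the response shown above, BucketLister.list() would evaluate to {:ok, ["poeticoding-aws-elixir"]}.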

put, list, get and delete

With ExAws, the easiest way to upload a file to S3 is with ExAws.S3.put_object/4

iex> local_image = File.read!("elixir_logo.png")
<<137, 80, 78, 71, 13, 10, 26, 10, 0, 0, ...>>
    
iex> ExAws.S3.put_object("poeticoding-aws-elixir", "images/elixir_logo.png", local_image) \
...> |> ExAws.request!()

%{
  body: "",
  headers: [...],
  status_code: 200
}

The first argument is the bucket name, then we pass the object key (the path) and the third is the file’s content, local_image. As a fourth argument we can pass a list of options like storage class, meta, encryption etc.
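
As an illustration of the fourth argument, the sketch below sets a content type and a piece of custom metadata. LogoUpload is a hypothetical module, and the metadata key and value are made up for this example.

```elixir
defmodule LogoUpload do
  @bucket "poeticoding-aws-elixir"

  # Options passed as the fourth argument of ExAws.S3.put_object/4.
  def opts do
    [
      content_type: "image/png",
      meta: [{"uploaded-by", "demo"}]
    ]
  end

  # Builds the upload operation for a local file, without sending it.
  def operation(path) do
    key = "images/" <> Path.basename(path)
    ExAws.S3.put_object(@bucket, key, File.read!(path), opts())
  end
end

# To actually send the request:
# LogoUpload.operation("elixir_logo.png") |> ExAws.request!()
```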

Using the AWS management console, on the S3 bucket’s page, we can see the file we’ve just uploaded.

Uploaded file visible on AWS Management console

We list the bucket’s objects with ExAws.S3.list_objects

iex> ExAws.S3.list_objects("poeticoding-aws-elixir") \
...> |> ExAws.request!() \
...> |> get_in([:body, :contents])

[
  %{
    e_tag: "\"...\"",
    key: "images/elixir_logo.png",
    last_modified: "2019-11-26T14:40:34.000Z",
    owner: %{ ... },
    size: "29169",
    storage_class: "STANDARD"
  }
]
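
Note that :size comes back as a string. The sketch below sums the sizes of the objects under a prefix, using the :prefix option of list_objects and ExAws.stream! to follow pagination lazily; BucketStats is a hypothetical module name.

```elixir
defmodule BucketStats do
  # Sums the :size field of the listed objects (S3 returns sizes as strings).
  def total_size(objects) do
    objects
    |> Stream.map(fn %{size: size} -> String.to_integer(size) end)
    |> Enum.sum()
  end

  # Lists only the keys starting with `prefix` and sums their sizes.
  # ExAws.stream! requests further pages only as the stream is consumed.
  def prefix_size(bucket, prefix) do
    ExAws.S3.list_objects(bucket, prefix: prefix)
    |> ExAws.stream!()
    |> total_size()
  end
end

# BucketStats.prefix_size("poeticoding-aws-elixir", "images/")
```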

Passing the bucket name and object key to ExAws.S3.get_object/2, we get the file’s content.

iex> resp = ExAws.S3.get_object("poeticoding-aws-elixir", "images/elixir_logo.png") \
...> |> ExAws.request!()
    
%{
  body: <<137, 80, 78, 71, 13, 10, 26, ...>>,
  headers: [
    {"Last-Modified", "Tue, 26 Nov 2019 14:40:34 GMT"},
    {"Content-Type", "application/octet-stream"},
    {"Content-Length", "29169"},
    ...
  ],
  status_code: 200
}

The request returns a response map with the whole file’s content in :body.

iex> File.read!("elixir_logo.png") == resp.body
true

We can delete the object with ExAws.S3.delete_object/2.

iex> ExAws.S3.delete_object("poeticoding-aws-elixir", "images/elixir_logo.png") \
...> |> ExAws.request!()

%{
  body: "",
  headers: [
    {"Date", "Tue, 26 Nov 2019 15:04:35 GMT"},
    ...
  ],
  status_code: 204
}

Listing the objects again we see, as expected, that the bucket is now empty.

iex> ExAws.S3.list_objects("poeticoding-aws-elixir") \
...> |> ExAws.request!() \
...> |> get_in([:body, :contents])

[]

Multipart upload and large files

The image in the example above is just ~30 KB, so we can simply use put_object and get_object to upload and download it. These functions have limits, though: both keep the whole file content in memory, and a single PUT operation can upload objects of at most 5 GB.

S3 and the ExAws client support multipart uploads: a file is divided into parts (5 MB by default), which are sent separately and in parallel to S3. If the upload of a part fails, ExAws retries only that 5 MB part.

With multipart uploads we can upload objects from 5 MB to 5 TB. ExAws uses file streams, so the whole file is never kept in memory.

Let’s consider numbers.txt, a relatively large text file we’ve already seen in another article, Elixir Stream and large HTTP responses: processing text (you can download it from https://www.poeticoding.com/downloads/httpstream/numbers.txt).

numbers.txt is 125 MB, much smaller than the 5 GB limit imposed by the single PUT operation. But to me this file is large enough to benefit from a multipart upload!

iex> ExAws.S3.Upload.stream_file("numbers.txt") \
...> |> ExAws.S3.upload("poeticoding-aws-elixir", "numbers.txt") \
...> |> ExAws.request!()

# returned response
%{
  body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<CompleteMultipartUploadResult>...",
  headers: [
    {"Date", "Tue, 26 Nov 2019 16:34:08 GMT"},
    {"Content-Type", "application/xml"},
    {"Transfer-Encoding", "chunked"},
  ],
  status_code: 200
}
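
For a rough idea of how many parts this upload is split into, we can divide the file size by the part size and round up. This is a sketch with a hypothetical Parts module; it assumes the default 5 MB part size and a file of exactly 125 MiB, while the real size of numbers.txt may differ slightly.

```elixir
defmodule Parts do
  @mb 1024 * 1024

  # Number of parts needed for `size` bytes, rounding up (ceiling division).
  def count(size, chunk_size \\ 5 * @mb) do
    div(size + chunk_size - 1, chunk_size)
  end
end

# Assuming a file of exactly 125 MiB; for the real file,
# File.stat!("numbers.txt").size gives the size in bytes.
Parts.count(125 * 1024 * 1024)
# => 25
```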

To have an idea of what ExAws is doing, we can enable the debug option in the ex_aws configuration

# config/config.exs
   
config :ex_aws,
  debug_requests: true,
  json_codec: Jason,
  access_key_id: {:system, "AWS_ACCESS_KEY_ID"},
  secret_access_key: {:system, "AWS_SECRET_ACCESS_KEY"}

We should see multiple parts being sent at the same time

17:11:24.586 [debug] ExAws: Request URL: "...?partNumber=2&uploadId=..." ATTEMPT: 1
17:11:24.589 [debug] ExAws: Request URL: "...?partNumber=1&uploadId=..." ATTEMPT: 1

Multipart upload timeout

When the file is large, the upload can take a while. To upload the parts concurrently, ExAws uses Elixir Tasks. The default timeout for the upload of a part is 30 seconds, which might not be enough on a slow connection.

** (exit) exited in: Task.Supervised.stream(30000)
    ** (EXIT) time out

We can change the timeout by passing a new :timeout to ExAws.S3.upload/4, 120 seconds in this example.

ExAws.S3.Upload.stream_file("numbers.txt")
|> ExAws.S3.upload(
  "poeticoding-aws-elixir", "numbers.txt", 
  [timeout: 120_000]) 
|> ExAws.request!()
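
On a slow connection it may also help to send fewer parts at the same time. The sketch below combines the :timeout option with :max_concurrency on ExAws.S3.upload and :chunk_size on stream_file, as I understand these options from the ExAws documentation. LargeUpload is a hypothetical module and the values are arbitrary.

```elixir
defmodule LargeUpload do
  @mb 1024 * 1024

  # At most 2 parts in flight, with 120 seconds allowed for each part.
  def upload_opts, do: [max_concurrency: 2, timeout: 120_000]

  # Builds the multipart upload operation with 10 MB parts, without sending it.
  def operation(path, bucket, key) do
    path
    |> ExAws.S3.Upload.stream_file(chunk_size: 10 * @mb)
    |> ExAws.S3.upload(bucket, key, upload_opts())
  end
end

# To actually send the request:
# LargeUpload.operation("numbers.txt", "poeticoding-aws-elixir", "numbers.txt")
# |> ExAws.request!()
```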

Download a large file

To download a large file it’s better to avoid get_object, which holds the whole file content in memory. With ExAws.S3.download_file/4, instead, we can download the data in chunks and save them directly into a file.

ExAws.S3.download_file(
  "poeticoding-aws-elixir", 
  "numbers.txt", "local_file.txt"
) 
|> ExAws.request!()

presigned urls and download streams – process a file on the fly

Unfortunately we can’t use ExAws.S3.download_file/4 to get a download stream and process the file on the fly.

However, we can generate a presigned URL, a unique and temporary URL, and then download the file with a library like mint or HTTPoison.

iex> ExAws.Config.new(:s3) \
...> |> ExAws.S3.presigned_url(:get, "poeticoding-aws-elixir", "numbers.txt")

{:ok, "https://...?X-Amz-Credential=...&X-Amz-Expires=3600"}

By default, the URL expires after one hour – with the :expires_in option we can set a different expiration time (in seconds).

iex> ExAws.Config.new(:s3) \
...> |> ExAws.S3.presigned_url(:get, "poeticoding-aws-elixir",
...>   "numbers.txt", [expires_in: 300]) # 300 seconds

{:ok, "https://...?X-Amz-Credential=...&X-Amz-Expires=300"}

Now that we have the URL, we can use Elixir Streams to process the data on the fly and calculate the sum of all the lines in numbers.txt. The HTTPStream code, and how it works, is described in the article Elixir Stream and large HTTP responses: processing text.

# generate the presigned URL
ExAws.Config.new(:s3) 
|> ExAws.S3.presigned_url(:get, "poeticoding-aws-elixir", "numbers.txt")

# returning just the URL string to the next step
|> case do
  {:ok, url} -> url
end

# using HTTPStream to download the file in chunks
# getting a Stream of lines
|> HTTPStream.get()
|> HTTPStream.lines()

## converting each line to an integer
|> Stream.map(fn line-> 
  case Integer.parse(line) do
    {num, _} -> num
    :error -> 0
  end
end)

## sum the numbers
|> Enum.sum()

|> IO.inspect(label: "result")

In the first two lines we generate a presigned url. Then, with HTTPStream.get we create a stream that lazily downloads the file chunk by chunk, transforming chunks into lines with HTTPStream.lines, mapping the lines into integers and summing all the numbers. The result should be 12468816.
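
To see the chunks-to-lines step in isolation, here is a minimal standard-library sketch that runs the same kind of pipeline on an in-memory list of chunks. ChunkLines is a hypothetical module written for this example; it is not the HTTPStream implementation, only an assumption about how such a transformation can be done.

```elixir
defmodule ChunkLines do
  # Turns a stream of binary chunks into a stream of lines. A line that is
  # split across two chunks is buffered until its newline arrives.
  def lines(chunks) do
    chunks
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, "" -> {[], ""}
      :eof, rest -> {[rest], ""}
      chunk, buffer ->
        parts = String.split(buffer <> chunk, "\n")
        {complete, [rest]} = Enum.split(parts, -1)
        {complete, rest}
    end)
  end

  # Same conversion as in the pipeline above: non-numeric lines count as 0.
  def sum(lines) do
    lines
    |> Stream.map(fn line ->
      case Integer.parse(line) do
        {num, _} -> num
        :error -> 0
      end
    end)
    |> Enum.sum()
  end
end

# The number 23 is split across the first two chunks.
["1\n2", "3\n4", "\n5"]
|> ChunkLines.lines()
|> ChunkLines.sum()
|> IO.inspect(label: "result")
# prints: result: 33
```

Because both steps are lazy streams, only the current chunk and the unfinished line are held in memory at any time.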