Handling Failed SQS Events Using an AWS Dead-Letter Queue (DLQ)


Amazon Simple Queue Service (SQS) is a fully managed queuing service that offers a secure, durable, hosted queue, letting us integrate and decouple distributed software components. One of the most useful features SQS provides is support for dead-letter queues. When we use SQS to queue messages for our services, there may be times when a message gets corrupted or, for some reason, the application cannot consume it. This is where a DLQ comes in.

How Does a Dead-Letter Queue Work?

Sometimes, messages can’t be processed because of a variety of possible issues, such as erroneous conditions within the producer or consumer application or an unexpected state change that causes an issue with your application code. Sometimes, producers and consumers might fail to interpret aspects of the protocol that they use to communicate, causing message corruption or loss. Also, hardware errors on the consumer might corrupt the message payload.

The redrive policy specifies the source queue, the dead-letter queue, and the conditions under which Amazon SQS moves messages from the former to the latter: when the consumer of the source queue fails to process a message a specified number of times (the maxReceiveCount), SQS moves that message to the dead-letter queue. Some important points to remember are:

  • To specify a dead-letter queue, you can use the console or the AWS SDK for Java. You must do this for each queue that sends messages to a dead-letter queue. Multiple queues of the same type can target a single dead-letter queue.
  • The dead-letter queue of a FIFO queue must also be a FIFO queue. Similarly, the dead-letter queue of a standard queue must also be a standard queue.
  • You must use the same AWS account (and the same AWS Region) for the dead-letter queue and for the other queues that send messages to it.
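The redrive policy is stored on the source queue as a JSON-valued RedrivePolicy attribute that holds the DLQ's ARN (deadLetterTargetArn) and the maxReceiveCount. The Java sketch below builds that value and checks the rules above before you apply it; the class and method names are hypothetical, and the ARNs in the example are placeholders. It also checks the Region, which the AWS documentation requires to match as well. With the AWS SDK for Java 2.x, you would then pass the resulting string under QueueAttributeName.REDRIVE_POLICY in a CreateQueue or SetQueueAttributes request.

```java
// Hypothetical helper (not part of the AWS SDK): builds the JSON value of the
// "RedrivePolicy" queue attribute after checking the constraints listed above.
final class RedrivePolicyBuilder {

    // SQS queue ARNs have the form arn:aws:sqs:<region>:<account-id>:<queue-name>
    private static String[] parseQueueArn(String arn) {
        String[] parts = arn.split(":");
        if (parts.length != 6 || !parts[0].equals("arn") || !parts[2].equals("sqs")) {
            throw new IllegalArgumentException("Not an SQS queue ARN: " + arn);
        }
        return parts;
    }

    static String build(String sourceQueueArn, String deadLetterQueueArn, int maxReceiveCount) {
        String[] source = parseQueueArn(sourceQueueArn);
        String[] dlq = parseQueueArn(deadLetterQueueArn);

        // FIFO queue names end in ".fifo"; the source queue and its DLQ must be the same type.
        if (source[5].endsWith(".fifo") != dlq[5].endsWith(".fifo")) {
            throw new IllegalArgumentException("Source queue and DLQ must both be FIFO or both be standard");
        }
        // Index 3 is the Region and index 4 the account ID; both must match.
        if (!source[3].equals(dlq[3]) || !source[4].equals(dlq[4])) {
            throw new IllegalArgumentException("Source queue and DLQ must be in the same account and Region");
        }
        if (maxReceiveCount < 1) {
            throw new IllegalArgumentException("maxReceiveCount must be at least 1");
        }
        // maxReceiveCount: how many times a message may be received before SQS moves it to the DLQ.
        return String.format("{\"deadLetterTargetArn\":\"%s\",\"maxReceiveCount\":\"%d\"}",
                deadLetterQueueArn, maxReceiveCount);
    }
}
```

For example, build("arn:aws:sqs:us-east-1:123456789012:orders", "arn:aws:sqs:us-east-1:123456789012:orders-dlq", 5) returns {"deadLetterTargetArn":"arn:aws:sqs:us-east-1:123456789012:orders-dlq","maxReceiveCount":"5"}, while pairing orders.fifo with a standard DLQ is rejected before any request is sent.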

What Are the Benefits of Dead-Letter Queues?

The main purpose of a dead-letter queue is to handle message failures. Using one lets us:

  • Configure an alarm for any messages delivered to a dead-letter queue.
  • Examine logs for exceptions that might have caused messages to be delivered to a dead-letter queue.
  • Analyse the contents of messages delivered to a dead-letter queue to diagnose software or the producer’s or consumer’s hardware issues.
  • Determine whether you have given your consumer sufficient time to process messages.

Handling Failures Using a DLQ

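The first approach is to leverage the SQS visibility timeout as a retry mechanism: if the consumer does not delete a message before its visibility timeout expires, the message becomes visible in the queue again and can be received and retried. You can explore this approach here.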
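One way to build retries on the visibility timeout is to leave a failed message undeleted and lengthen its visibility timeout on each attempt, so that retries back off. The sketch below only computes that delay; the VisibilityBackoff class and the doubling strategy are assumptions for illustration, not part of SQS. It takes the attempt number from the message's ApproximateReceiveCount attribute and respects the 12-hour maximum visibility timeout.

```java
// Hypothetical helper for visibility-timeout-based retries; not part of the AWS SDK.
final class VisibilityBackoff {

    // SQS does not allow a visibility timeout longer than 12 hours (43,200 seconds).
    static final int MAX_VISIBILITY_TIMEOUT_SECONDS = 12 * 60 * 60;

    /**
     * Returns the visibility timeout (in seconds) to set after a failed attempt.
     *
     * @param receiveCount the message's ApproximateReceiveCount (1 on the first receive)
     * @param baseSeconds  delay after the first failure; doubled on each later failure
     */
    static int nextVisibilityTimeout(int receiveCount, int baseSeconds) {
        if (receiveCount < 1 || baseSeconds < 0) {
            throw new IllegalArgumentException("receiveCount must be >= 1 and baseSeconds >= 0");
        }
        // base, 2 * base, 4 * base, ...; the shift is capped so the long cannot overflow.
        long delay = (long) baseSeconds << Math.min(receiveCount - 1, 30);
        return (int) Math.min(delay, MAX_VISIBILITY_TIMEOUT_SECONDS);
    }
}
```

With a 30-second base, the first three failures give 30, 60 and 120 seconds, and later attempts are capped at 43,200. With the AWS SDK for Java 2.x, the result would be passed to ChangeMessageVisibilityRequest.builder().visibilityTimeout(...) along with the failed message's receipt handle. Because every retry increments the receive count, a message that keeps failing will eventually exceed maxReceiveCount if a redrive policy is configured.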

The other approach to handling failed messages is to use a dead-letter queue.

A dead-letter queue is a separate queue that other (source) queues use to isolate messages their consumers cannot process successfully.

  • A dead-letter queue must be created before it can be designated as a dead-letter queue.
  • We can assign a dead-letter queue to a source queue when creating the source queue, or later by updating its redrive policy.
  • If the source queue is FIFO, the dead-letter queue must be FIFO as well.
  • The message retention period of the dead-letter queue should be longer than that of the source queue, because for standard queues a message's expiration is based on its original enqueue timestamp.
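To make the redrive behaviour concrete, here is a small in-memory model in Java. It does not call SQS, and the class and method names are hypothetical. It follows the documented rule that a message is moved to the DLQ once its receive count exceeds maxReceiveCount, and it simplifies the visibility timeout away: a failed, undeleted message just goes back to the end of the queue. A one-line check for the retention guideline is included as well.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// In-memory model of a source queue with a redrive policy; no AWS calls are made.
final class SimulatedRedrive {

    static final class Message {
        final String body;
        int receiveCount; // plays the role of SQS's ApproximateReceiveCount

        Message(String body) {
            this.body = body;
        }
    }

    final Deque<Message> sourceQueue = new ArrayDeque<>();
    final Deque<Message> deadLetterQueue = new ArrayDeque<>();
    private final int maxReceiveCount;

    SimulatedRedrive(int maxReceiveCount) {
        this.maxReceiveCount = maxReceiveCount;
    }

    void send(String body) {
        sourceQueue.addLast(new Message(body));
    }

    /** Performs one receive; returns false only when the source queue is empty. */
    boolean receiveOnce(Predicate<String> consumer) {
        Message m = sourceQueue.pollFirst();
        if (m == null) {
            return false;
        }
        if (m.receiveCount >= maxReceiveCount) {
            // Another receive would exceed maxReceiveCount, so the message goes to the DLQ instead.
            deadLetterQueue.addLast(m);
            return true;
        }
        m.receiveCount++;
        if (!consumer.test(m.body)) {
            // Processing failed and the message was not deleted: it becomes visible again.
            sourceQueue.addLast(m);
        }
        // On success the message is not re-queued, which models deleting it.
        return true;
    }

    /** Retention guideline: the DLQ should keep messages longer than the source queue. */
    static boolean retentionLooksSafe(long sourceRetentionSeconds, long dlqRetentionSeconds) {
        return dlqRetentionSeconds > sourceRetentionSeconds;
    }
}
```

For example, with new SimulatedRedrive(3), a message whose processing always fails is delivered to the consumer three times and is moved to the dead-letter queue on the next receive, while a message that succeeds is deleted after its first delivery.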

Now the question arises: what do we do once a failed message lands in the dead-letter queue?

When a message arrives in the DLQ, we can analyze what might have caused the error by reviewing the relevant logs, make any necessary changes to our stack, and then run a redrive function to retry those messages.

A redrive function is how we get failed messages out of the DLQ and back into the original pipeline to retry the operation. The redrive function can be an AWS::Serverless::Function (an AWS SAM resource) that contains the retry logic for failed messages, and that logic can be implemented to suit our use case.
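The core of such a redrive function could be sketched in Java as below. It is an in-memory illustration rather than a Lambda handler: the two Deques stand in for the DLQ and the source queue, and DlqRedriver and its parameters are hypothetical. The per-run cap and the send-before-delete order are design choices of this sketch, not requirements of SQS.

```java
import java.util.Deque;

// Hypothetical redrive logic modelled with in-memory queues; a real function would call
// SendMessage on the source queue and DeleteMessage on the DLQ instead.
final class DlqRedriver {

    /**
     * Moves up to maxMessages bodies from the DLQ back to the source queue, oldest first.
     * Each message is re-sent before it is removed from the DLQ, so an interruption between
     * the two steps can duplicate a message but never lose it.
     *
     * @return how many messages were moved
     */
    static int redrive(Deque<String> deadLetterQueue, Deque<String> sourceQueue, int maxMessages) {
        int moved = 0;
        while (moved < maxMessages && !deadLetterQueue.isEmpty()) {
            String body = deadLetterQueue.peekFirst();
            sourceQueue.addLast(body);   // step 1: send to the source queue
            deadLetterQueue.pollFirst(); // step 2: delete from the DLQ
            moved++;
        }
        return moved;
    }
}
```

Because a failure between the two steps can produce a duplicate, the consumer of the source queue should tolerate repeated messages (standard SQS queues already deliver at least once). Run the redrive only after fixing the root cause; otherwise the messages will simply fail again and return to the DLQ.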

You can check the implementation for this method here.

For more blogs on AWS, check out Knoldus Blogs.

You can check out the documentation here.
