· Cloud Architecture  · 8 min read

Real-Time Alerts and Zero Data Loss: The AWS Lambda Pattern with Effective Monitoring You Can't Ignore

Discover how to create a reliable and monitored asynchronous processing system with AWS Lambda. This pattern combines SQS queues, dead letter queues (DLQs), and CloudWatch alarms to prevent data loss and provide real-time alerts.

Introduction

Have you ever woken up to a barrage of error messages, only to realize that your application lost critical data overnight? If not, you’ve at least had nightmares about the possibility.

Imagine running an online store during a holiday sale. The last thing you want is for order processing to fail silently. This pattern ensures every order is processed or flagged immediately for intervention.

Or consider an event-driven architecture, where missing a single event can disrupt an entire workflow. With automated alarms watching your queues, you catch issues before they ripple through the rest of the system.

But what if I told you there’s a straightforward way to shield yourself from this menace? A pattern that not only prevents data loss but also keeps you in the loop with real-time alerts? I’m thrilled to share this simple, yet game-changing AWS Lambda pattern that combines SQS queues, dead letter queues (DLQs), and CloudWatch alarms. It’s the most-used Level 3 (L3) construct in our quiver.

Let’s dive into making your asynchronous processing bulletproof!

Understanding the Challenge

Asynchronous Processing with Lambda

Asynchronous processing with Lambda functions is like ordering food and not having to wait by the door: you get on with your life while your meal is on its way. It’s fantastic for performance and scalability. But if something goes wrong with your order, you might not find out until you’re hungry and disappointed.

In tech terms, an asynchronous invocation returns as soon as the event is accepted, so the caller never learns whether processing succeeded. Lambda retries a failed asynchronous invocation only a limited number of times (twice by default), and if no dead letter queue or failure destination is configured, the event is then discarded. Even worse, you might not notice until it has happened many times. No errors, no alerts: the event is just gone.

Data loss doesn’t just hurt your application’s integrity; it erodes trust with your users and stakeholders. And in a world where data is king, that’s a price too high to pay.

Dead Letter Queues (DLQs)

When a message can’t be processed after a certain number of retries, it’s sent to the DLQ. This ensures that failed messages aren’t discarded into the void. After the issue is resolved, you can reprocess the messages with just a few clicks/commands.

But here’s the catch: DLQs don’t scream for attention. Without monitoring, they’re like silent guardians—you won’t know there’s an issue unless you check manually.
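To make the retry-then-park behaviour concrete, here is a minimal in-memory sketch of what happens to a message whose handler keeps failing. It is a toy model, not how SQS is implemented: `deliver`, `Message`, and the array standing in for the DLQ are hypothetical names, and `maxReceiveCount` mirrors the redrive-policy setting of the same name used later in this post.

```typescript
// Toy model of an SQS redrive policy. Not real SQS code: the DLQ is a plain array.
interface Message {
  id: string;
  body: string;
  receiveCount: number;
}

type Handler = (message: Message) => void;

// Hands one message to the handler until it succeeds or has been
// received maxReceiveCount times; after that it is parked in the DLQ.
function deliver(
  message: Message,
  handler: Handler,
  maxReceiveCount: number,
  deadLetterQueue: Message[],
): 'processed' | 'dead-lettered' {
  while (message.receiveCount < maxReceiveCount) {
    message.receiveCount += 1;
    try {
      handler(message);
      return 'processed';
    } catch {
      // Processing failed: the message becomes visible again and is retried.
    }
  }
  deadLetterQueue.push(message);
  return 'dead-lettered';
}

const dlq: Message[] = [];
const alwaysFails: Handler = () => {
  throw new Error('processing failed');
};

const outcome = deliver(
  { id: 'order-1', body: '{"total": 42}', receiveCount: 0 },
  alwaysFails,
  3,
  dlq,
);
```

With `maxReceiveCount` set to 3, the failing message is handed to the handler three times and then lands in `dlq`, where it waits quietly until someone looks.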

An Infrastructure Pattern to the Rescue

So, how do we turn our silent DLQ into a screaming canary? By integrating it with CloudWatch alarms and SNS notifications. This pattern transforms your DLQ from a passive repository into an active monitoring system.

Monitored Lambda Queue Architecture

Meet the Players

  • AWS SQS Queue: Acts as the reliable messenger, ensuring your events are queued and ready for processing.
  • Lambda Function: The worker that processes messages from the queue.
  • CloudWatch Alarms: Keeps a vigilant eye on your DLQ and sends real-time alerts.
  • SNS Topic: The megaphone that broadcasts alerts to your communication channels.
  • Simple Email Service: Sends email notifications to keep you informed. This can be replaced or extended with other communication services, like Slack or SMS.

What This Pattern Brings to the Table

  • Zero Data Loss: Every message is accounted for—either processed successfully or captured for review.
  • Real-Time Monitoring: Immediate alerts enable you to address issues before they escalate.
  • Scalability: Efficiently handles high-throughput applications without missing a beat.
  • Cost-Effectiveness: Reduces expenses associated with data loss and recovery efforts.

Let’s Build It Together!

Here’s the pattern as CDK code, ready to plug and play. (You can find the full implementation in our GitHub Repository)

1. The Blowhorn for Alerts

Error notifications will be distributed via AWS SNS (Simple Notification Service). This managed service lets you attach multiple subscribers of different types.

const errorTopic = new Topic(this, 'ErrorNotificationTopic', {
  topicName: 'ErrorNotificationTopic',
});
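The topic does nothing until something subscribes to it. As a sketch, an email subscription could be attached as follows. The stack name `AlertingStack` and the address `ops@example.com` are placeholders, the imports assume AWS CDK v2, and the recipient has to confirm the subscription email before SNS delivers anything.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { EmailSubscription } from 'aws-cdk-lib/aws-sns-subscriptions';
import { Construct } from 'constructs';

// Hypothetical stack, shown only to give the topic a home.
export class AlertingStack extends Stack {
  readonly errorTopic: Topic;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    this.errorTopic = new Topic(this, 'ErrorNotificationTopic', {
      topicName: 'ErrorNotificationTopic',
    });

    // Placeholder address: replace it with your team's mailbox.
    this.errorTopic.addSubscription(new EmailSubscription('ops@example.com'));
  }
}
```

Other subscription types from the same module can be added to the same topic if email alone is not enough.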

2. Setting Up the Dead Letter Queue

Let’s start with the safety net itself. A DLQ is really just a standard SQS queue. For our purpose, we combine it with an alarm. Let’s call this construct MonitoredDeadLetterQueue.

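// Imports assume AWS CDK v2 (aws-cdk-lib).
import { Duration } from 'aws-cdk-lib';
import { Alarm } from 'aws-cdk-lib/aws-cloudwatch';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import { ITopic } from 'aws-cdk-lib/aws-sns';
import { Queue, QueueEncryption, QueueProps } from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

// A standard SQS queue that raises a CloudWatch alarm as soon as it holds a message.
export class MonitoredDeadLetterQueue extends Queue {
  readonly alarm: Alarm;

  constructor(
    scope: Construct,
    id: string,
    errorTopic: ITopic,
    // Defaults apply only when no props are passed at all.
    props: QueueProps = {
      retentionPeriod: Duration.days(14),
      encryption: QueueEncryption.SQS_MANAGED,
    }
  ) {
    super(scope, id, props);

    // Fires when at least one message is visible in the queue.
    this.alarm = new Alarm(this, 'Alarm', {
      alarmDescription: 'There are messages in the Dead Letter Queue',
      evaluationPeriods: 1,
      threshold: 1,
      metric: this.metricApproximateNumberOfMessagesVisible(),
    });

    // Publish every ALARM transition to the error topic.
    this.alarm.addAlarmAction(new SnsAction(errorTopic));
  }
}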

3. Creating the Message Queue

Next, we set up the message queue that our Lambda function will process.

const monitoredDeadLetterQueue = new MonitoredDeadLetterQueue(
  this,
  'MonitoredDeadLetterQueue',
  errorTopic,
);

const messageQueue = new Queue(this, 'MessageQueue', {
  queueName: 'MessageQueue',
  encryption: QueueEncryption.SQS_MANAGED,
  deadLetterQueue: {
    queue: monitoredDeadLetterQueue,
    maxReceiveCount: 3,
  },
});

4. Crafting the Lambda Function

Replace this with whatever Lambda function you want to monitor and protect against data loss. As a placeholder, we create a simple function that fails on execution, which gives us the opportunity to test the construct after deployment.

const exampleLambda = new Function(this, 'ExampleFunction', {
  runtime: Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: Code.fromInline(`
    exports.handler = async function(event) {
      throw new Error('Not today, son');
    }`),
});

5. Connecting the DLQ to the Lambda

We’ll attach the message queue to our Lambda function, ensuring it processes available messages. The DLQ is configured on the message queue to handle failed processing.

messageQueue.grantConsumeMessages(exampleLambda);
exampleLambda.addEventSource(new SqsEventSource(messageQueue));
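The placeholder function from step 4 always throws, which sends the whole batch back to the queue. In a real handler you may prefer to report only the records that failed. The sketch below assumes the event source is created with `reportBatchItemFailures: true` (an option of `SqsEventSource` that the code above does not set) and uses a hypothetical `processOrder` function. The local interfaces mirror only the fields of the SQS event that the sketch needs.

```typescript
// Minimal local types covering only the fields used here.
interface SqsRecord {
  messageId: string;
  body: string;
}
interface SqsEvent {
  Records: SqsRecord[];
}
interface SqsBatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Hypothetical business logic: rejects anything that is not valid JSON.
async function processOrder(body: string): Promise<void> {
  JSON.parse(body);
}

// Processes every record and reports the failed ones by message ID.
// With partial batch responses enabled, only the reported messages
// return to the queue; the others are deleted.
export async function handler(event: SqsEvent): Promise<SqsBatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];

  for (const record of event.Records) {
    try {
      await processOrder(record.body);
    } catch (error) {
      console.error(`Failed to process message ${record.messageId}`, error);
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures };
}
```

Messages reported this way still count toward `maxReceiveCount`, so a record that keeps failing ends up in the monitored DLQ while its healthy neighbours are not reprocessed.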

6. Wrapping Up the Final Construct

The message queue, combined with the monitored DLQ, and attached to the Lambda function, is a composite construct. We call it MonitoredLambdaQueue.

export class MonitoredLambdaQueue extends Construct {
  constructor(
    scope: Construct,
    id: string,
    lambdaFunction: IFunction,
    errorTopic: ITopic
  ) {
    super(scope, id);

    const monitoredDeadLetterQueue = new MonitoredDeadLetterQueue(
      this,
      'MonitoredDeadLetterQueue',
      errorTopic
    );

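    const messageQueue = new Queue(this, 'MessageQueue', {
      // A fixed queueName allows only one instance of this construct per account and region.
      // Omit it to let CloudFormation generate a unique name.
      queueName: 'MessageQueue',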
      encryption: QueueEncryption.SQS_MANAGED,
      deadLetterQueue: {
        queue: monitoredDeadLetterQueue,
        maxReceiveCount: 3,
      },
    });

    messageQueue.grantConsumeMessages(lambdaFunction);
    lambdaFunction.addEventSource(new SqsEventSource(messageQueue));
  }
}
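Using the construct from a stack could then look roughly like this. Treat it as a sketch: the import path `./monitored-lambda-queue`, the `lambda` asset directory, and the `OrderProcessor` function are assumptions for illustration, and the imports assume AWS CDK v2.

```typescript
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { Code, Function, Runtime } from 'aws-cdk-lib/aws-lambda';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';
// Hypothetical path: point this at wherever you keep the construct.
import { MonitoredLambdaQueue } from './monitored-lambda-queue';

export class MonitoredLambdaQueueStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const errorTopic = new Topic(this, 'ErrorNotificationTopic');

    // The function you actually want to protect against silent failures.
    const orderProcessor = new Function(this, 'OrderProcessor', {
      runtime: Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: Code.fromAsset('lambda'), // hypothetical directory containing index.js
    });

    // Wires up the message queue, the monitored DLQ, and the event source.
    new MonitoredLambdaQueue(this, 'MonitoredQueue', orderProcessor, errorTopic);
  }
}

const app = new App();
new MonitoredLambdaQueueStack(app, 'MonitoredLambdaQueueStack');
```

Passing the function and the topic in from outside keeps the construct free of business logic, so the same wiring can wrap any queue-driven Lambda.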

7. Testing

Time to put our setup to the test! We’ll simulate a failure in the Lambda function to see if our monitoring works.

Deploy your stack, trigger some messages, and watch as the CloudWatch alarm notifies you of the failure. Success!

This is what the standard email notification will look like.

You are receiving this email because your Amazon CloudWatch Alarm "MonitoredLambdaQueueStack-MonitoredQueueMonitoredQueueDLQAlarm32E97FE8-7tCe1ZLyO61j" in the EU (Frankfurt) region has entered the ALARM state, because "Threshold Crossed: 1 datapoint [1.0 (18/11/24 08:33:00)] was greater than or equal to the threshold (1.0)." at "Monday 18 November, 2024 08:39:18 UTC".

View this alarm in the AWS Management Console:
https://eu-central-1.console.aws.amazon.com/cloudwatch/deeplink.js?region=eu-central-1#alarmsV2:alarm/MonitoredLambdaQueueStack-MonitoredQueueMonitoredQueueDLQAlarm32E97FE8-7tCe1ZLyO61j

Alarm Details:
- Name:                       MonitoredLambdaQueueStack-MonitoredQueueMonitoredQueueDLQAlarm32E97FE8-7tCe1ZLyO61j
- Description:                There are messages in the Dead Letter Queue
- State Change:               INSUFFICIENT_DATA -> ALARM
- Reason for State Change:    Threshold Crossed: 1 datapoint [1.0 (18/11/24 08:33:00)] was greater than or equal to the threshold (1.0).
- Timestamp:                  Monday 18 November, 2024 08:39:18 UTC
- AWS Account:                012345678910
- Alarm Arn:                  arn:aws:cloudwatch:eu-central-1:012345678910:alarm:MonitoredLambdaQueueStack-MonitoredQueueMonitoredQueueDLQAlarm32E97FE8-7tCe1ZLyO61j

Threshold:
- The alarm is in the ALARM state when the metric is GreaterThanOrEqualToThreshold 1.0 for at least 1 of the last 1 period(s) of 300 seconds.

Monitored Metric:
- MetricNamespace:                     AWS/SQS
- MetricName:                          ApproximateNumberOfMessagesVisible
- Dimensions:                          [QueueName = MonitoredQueue-DLQ]
- Period:                              300 seconds
- Statistic:                           Maximum
- Unit:                                not specified

State Change Actions:
- OK:
- ALARM: [arn:aws:sns:eu-central-1:012345678910:DLQEmailTopic]
- INSUFFICIENT_DATA:

The Bottom Line

The MonitoredLambdaQueue is your guard against silent failures in asynchronous Lambda functions.

Why It’s Awesome

  • Retention Period: Keeps failed messages for a configurable amount of time, allowing you to investigate.
  • Encryption: Secures your data with SQS-managed encryption.
  • Customizable Alerts: Easily configure alarm actions to suit your needs.

Reusability

You can reuse this construct across multiple projects, promoting consistency and saving development time. A simple yet reliable go-to pattern that brings great value to your architecture.

Pro Tips

  • Implement Retry Logic: Incorporate exponential backoff and jitter in your Lambda functions to handle transient errors gracefully.
  • Set Meaningful Thresholds: Avoid alert fatigue by setting appropriate thresholds.
  • Multiple Notification Channels: Use various channels to ensure you never miss an alert.
  • Regular Testing: Periodically test your alarms to confirm they’re functioning correctly.
  • Robust Error Handling: Implement detailed logging within your Lambda functions to aid in troubleshooting.
  • Security Matters: Leverage IAM roles and policies to enforce the principle of least privilege.
  • Cost Awareness: Monitor your DLQs and clean up processed or obsolete messages to minimize costs.
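For the retry tip above, a minimal sketch of exponential backoff with full jitter might look like this. The helper names `backoffDelayMs` and `withRetry`, and the default base and cap values, are illustrative assumptions and not part of any AWS library.

```typescript
// Full-jitter exponential backoff: the delay is drawn uniformly from
// the range 0 up to min(capMs, baseMs * 2^attempt).
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries an async operation up to maxAttempts times, sleeping between tries.
// The last error is rethrown so the message can still end up in the DLQ.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  delayFor: (attempt: number) => number = backoffDelayMs,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        await sleep(delayFor(attempt));
      }
    }
  }
  throw lastError;
}
```

Inside a handler, a flaky downstream call could be wrapped as `await withRetry(() => callDownstream(record))`, where `callDownstream` stands for your own code. Keep the total retry time well below the function timeout and the queue’s visibility timeout, otherwise the message may be redelivered while the first attempt is still running.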

Conclusion

By integrating this AWS Lambda pattern into your architecture, you’re not just preventing data loss—you’re enhancing the reliability and transparency of your entire system. Real-time alerts empower you to act swiftly, keeping your applications running smoothly and your users happy.

Don’t wait for a data loss catastrophe to make a change. Start implementing this pattern today, and enjoy the peace of mind that comes with knowing you’re in control.

Let’s Keep the Conversation Going

Have you tried this pattern, or do you have questions about implementing it? Does it remind you of a pattern that you love and regularly use? I’d love to hear your thoughts and experiences. Feel free to use our contact form or reach out to me on X.
