Chapter 15 of 20

Amazon Kinesis & Data Streaming

Domain 3 — High-Performing Architectures (24%)
Question 1 (Scenario)

A company needs to ingest clickstream data in real-time from 1 million concurrent users, process it with multiple independent consumers, and replay the data for reprocessing if a bug is found. Which service is MOST appropriate?

Explanation

Kinesis Data Streams is the best fit. It stores records for a configurable retention period (24 hours by default, extendable up to 365 days). Multiple independent consumer applications can read the same data simultaneously, and each can re-read from any point within the retention window, which enables replay. Replay and independent multi-consumer reads are the key differentiators from SQS, where a message is deleted once a consumer has processed it.
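To make the replay idea concrete, here is a minimal sketch of how a consumer might ask to start reading from a past point in time. The helper name `replay_iterator_request` and its arguments are hypothetical; it only builds the keyword arguments that could be passed to boto3's `get_shard_iterator` call (with `ShardIteratorType="AT_TIMESTAMP"`), and it assumes the stream still uses the default 24-hour retention unless told otherwise.

```python
from datetime import datetime, timedelta, timezone


def replay_iterator_request(stream_name, shard_id, replay_from,
                            retention_hours=24, now=None):
    """Build kwargs for a GetShardIterator request that starts at a timestamp.

    Hypothetical helper for illustration: it checks that the replay point
    is still inside the retention window, then returns a dict that could be
    passed as boto3.client("kinesis").get_shard_iterator(**request).
    """
    now = now or datetime.now(timezone.utc)
    oldest_available = now - timedelta(hours=retention_hours)
    if replay_from < oldest_available:
        # Records older than the retention window have already expired.
        raise ValueError("replay point is outside the retention window")
    return {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": "AT_TIMESTAMP",
        "Timestamp": replay_from,
    }
```

Each consumer application would build its own iterator this way, so one consumer replaying old data does not affect the position of any other consumer.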

Question 2 (Scenario)

A company wants to load streaming IoT sensor data directly into Amazon S3 and Amazon Redshift in near real-time, without writing custom consumer code. Which Kinesis service is MOST appropriate?

Explanation

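Kinesis Data Firehose is the easiest way to reliably load streaming data into AWS data stores. It buffers incoming data and delivers it to destinations (S3, Redshift, OpenSearch, HTTP endpoints), with optional data transformation via Lambda. A batch is delivered when the buffer interval elapses or the buffer size is reached, whichever comes first; with the classic minimum interval of 60 seconds, this makes Firehose near real-time rather than real-time. There are no consumer applications to write or manage.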
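The "time or size, whichever comes first" buffering rule can be illustrated with a toy simulation. This is only a sketch of the general idea, not Firehose's actual implementation: the function `flush_points`, its parameters, and the assumption that the interval timer starts when the first record enters an empty buffer are all hypothetical.

```python
def flush_points(records, buffer_size_bytes, buffer_interval_s):
    """Simulate a buffer that flushes on size or elapsed time.

    records: list of (arrival_time_s, size_bytes), sorted by arrival time.
    Returns a list of (flush_time_s, flushed_bytes).
    """
    flushes = []
    buffered = 0
    window_start = None  # arrival time of the first record in the buffer

    for arrival, size in records:
        # The interval expired before this record arrived: flush first.
        if window_start is not None and arrival - window_start >= buffer_interval_s:
            flushes.append((window_start + buffer_interval_s, buffered))
            buffered, window_start = 0, None

        if window_start is None:
            window_start = arrival
        buffered += size

        # The size threshold was reached: flush immediately.
        if buffered >= buffer_size_bytes:
            flushes.append((arrival, buffered))
            buffered, window_start = 0, None

    # Whatever is left is flushed when its interval runs out.
    if buffered:
        flushes.append((window_start + buffer_interval_s, buffered))
    return flushes
```

With a 1,000-byte buffer and a 60-second interval, a burst of records flushes as soon as the size threshold is crossed, while a lone trailing record waits for the full interval. That waiting period is why Firehose is described as near real-time.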

Question 3 (Knowledge)

How is the capacity of a Kinesis Data Stream measured, and what are the limits per shard?

Explanation

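Capacity is measured in shards; a shard is the base unit of throughput. Each shard supports writes of up to 1 MB/s or 1,000 records/second (whichever limit is hit first) and reads of up to 2 MB/s, with at most 5 read transactions/second. The 2 MB/s read limit is shared by all standard consumers of that shard. To scale up, add shards (shard splitting); to scale down, remove them (shard merging). With Enhanced Fan-Out, each registered consumer gets its own dedicated 2 MB/s per shard, delivered via HTTP/2 push.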
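The per-shard limits above translate directly into a sizing rule: take whichever limit demands the most shards. The following is a rough sketch under the assumption that all readers are standard consumers sharing the 2 MB/s per-shard read limit; the function name `shards_needed` and its parameters are hypothetical.

```python
import math

# Per-shard limits as stated in the explanation above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0  # shared by all standard (non fan-out) consumers


def shards_needed(write_mb_s, write_records_s, shared_read_mb_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(shared_read_mb_s / READ_MB_PER_SHARD),
    )
```

For example, a workload of many tiny records (1 MB/s in total but 8,000 records/second) needs 8 shards, because the record-count limit binds before the byte limit does.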

Question 4 (Knowledge)

A Kinesis Data Stream has 6 shards. What is the maximum write (ingestion) throughput for the stream?

Explanation

Write (ingest) capacity = number of shards × 1 MB/s. Read capacity = number of shards × 2 MB/s. For 6 shards, the maximum write throughput is 6 MB/s and the read throughput is 12 MB/s. If you need more capacity, increase the shard count. Note that 12 MB/s is the read figure, not the write figure: the exam often tests whether you can tell the ingest limit from the read limit.
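The arithmetic can be checked with a few lines of code. This is just a worked example of the two multiplications above; the function name `stream_capacity` is hypothetical.

```python
def stream_capacity(shards):
    """Return aggregate stream limits for a given shard count."""
    return {
        "write_mb_s": shards * 1,  # 1 MB/s ingest per shard
        "read_mb_s": shards * 2,   # 2 MB/s read per shard
    }


six_shards = stream_capacity(6)
print(six_shards["write_mb_s"], "MB/s write,", six_shards["read_mb_s"], "MB/s read")
```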

Question 5 (Knowledge)

Which Kinesis service allows you to run real-time SQL queries on streaming data and output results to another Kinesis stream or Firehose?

Explanation

Kinesis Data Analytics supports two modes: SQL-based (simpler, for basic windowed aggregations) and Apache Flink-based (for complex stateful stream processing). It reads from Kinesis Data Streams or Firehose and can write results to Kinesis Data Streams, Firehose, Lambda, or other destinations. Athena, by contrast, runs serverless batch queries on data at rest in S3, not on streaming data.
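To show what a "basic windowed aggregation" computes, here is a small plain-Python sketch of a tumbling-window count. It is not Kinesis Data Analytics code and makes no claim about how the service implements windows; the function `tumbling_window_counts` and the `(timestamp, key)` event shape are assumptions made for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_s):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_s, key) pairs.
    Returns {(window_start_s, key): count}.
    """
    counts = Counter()
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window of length window_s.
        window_start = (timestamp // window_s) * window_s
        counts[(window_start, key)] += 1
    return dict(counts)
```

In a streaming SQL application, the same result would typically be expressed as a grouped count over a time window and emitted continuously as each window closes, rather than computed over a finished list.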