Chapter 15: Amazon Kinesis & Data Streaming

Question 1 (Scenario)

A company needs to ingest clickstream data in real time from 1 million concurrent users, process it with multiple independent consumers, and replay the data for reprocessing if a bug is found. Which service is MOST appropriate?

A. Amazon SQS Standard: messages are deleted after consumption; no replay capability.
B. Amazon SNS: pub/sub notifications; no persistence or replay capability.
C. Amazon Kinesis Data Streams: durable real-time streaming with configurable data retention (up to 365 days) and replay capability. ✓ Correct
D. Amazon SQS FIFO: ordered delivery but no data replay after consumption.

Explanation

Kinesis Data Streams stores records for a configurable retention period: 24 hours by default, extendable to 7 days (extended retention) and up to 365 days (long-term retention). Multiple independent consumer applications can read the same data simultaneously, each tracking its own position in the stream. Any consumer can re-read from any point within the retention window, which enables replay. This is the key differentiator from SQS, where a message is deleted once it has been consumed.
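The retention and replay behavior described above can be sketched with a toy in-memory log. This is not the Kinesis API: `ToyStream`, `put`, and `read_from` are hypothetical names invented for illustration, and timestamps are plain seconds.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Hypothetical stand-in for one shard: an append-only log with retention."""
    retention_seconds: float
    records: list = field(default_factory=list)  # (timestamp, payload) pairs

    def put(self, timestamp: float, payload: str) -> None:
        # Records are appended, never deleted by a reader.
        self.records.append((timestamp, payload))

    def read_from(self, start_time: float, now: float) -> list:
        """Return payloads at or after start_time that are still within retention."""
        oldest_allowed = now - self.retention_seconds
        cutoff = max(start_time, oldest_allowed)
        return [payload for ts, payload in self.records if ts >= cutoff]


stream = ToyStream(retention_seconds=24 * 3600)  # 24-hour default retention
for hour, event in enumerate(["click-a", "click-b", "click-c"]):
    stream.put(timestamp=hour * 3600, payload=event)

now = 3 * 3600
analytics_view = stream.read_from(start_time=0, now=now)   # consumer 1
billing_view = stream.read_from(start_time=0, now=now)     # consumer 2, independent
replayed = stream.read_from(start_time=3600, now=now)      # re-read after a bug fix
expired = stream.read_from(start_time=0, now=30 * 3600)    # past the retention window
```

Both consumers see all three events because reading does not remove anything. The replay starts from an earlier point in the log, and once the retention window has passed, nothing is left to read.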

Question 2 (Scenario)

A company wants to load streaming IoT sensor data directly into Amazon S3 and Amazon Redshift in near real-time, without writing custom consumer code. Which Kinesis service is MOST appropriate?

A. Amazon Kinesis Data Firehose: fully managed delivery of streaming data to S3, Redshift, OpenSearch, and Splunk; no consumer code required. ✓ Correct
B. Amazon Kinesis Data Streams: requires custom consumer applications (KCL, Lambda) to process and deliver data.
C. AWS Glue: ETL service for batch data transformation, not real-time streaming delivery.
D. Amazon MSK (Managed Streaming for Apache Kafka): managed Kafka; requires consumer development.

Explanation

Kinesis Data Firehose is the easiest way to reliably load streaming data into AWS data stores. It buffers incoming data and delivers it to destinations (S3, Redshift, OpenSearch, Splunk, HTTP endpoints), with optional data transformation via Lambda. Data is delivered when the buffer interval elapses (commonly cited as 60 seconds) or when the buffer size is reached, whichever comes first, which makes Firehose near real-time rather than real-time. There are no consumers to manage.
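The "interval or size, whichever comes first" rule can be illustrated with a small simulation. This is a minimal sketch, not Firehose itself: `ToyBuffer` is a hypothetical class, the thresholds are made up, and the interval is only checked when a record arrives, whereas a real service would flush on a timer.

```python
class ToyBuffer:
    """Hypothetical buffer that flushes on size OR elapsed time."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.pending = []         # records waiting for delivery
        self.pending_bytes = 0
        self.window_start = None  # arrival time of the first pending record
        self.delivered = []       # list of delivered batches

    def add(self, now: float, record: bytes) -> None:
        # Interval rule: deliver the old batch if it has waited long enough.
        if self.pending and now - self.window_start >= self.max_seconds:
            self.flush()
        if not self.pending:
            self.window_start = now
        self.pending.append(record)
        self.pending_bytes += len(record)
        # Size rule: deliver as soon as the buffer is full.
        if self.pending_bytes >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.delivered.append(self.pending)
            self.pending = []
            self.pending_bytes = 0
            self.window_start = None


buf = ToyBuffer(max_bytes=10, max_seconds=60)
buf.add(0, b"aaaa")
buf.add(1, b"bbbbbb")   # 10 bytes buffered: size threshold triggers delivery
buf.add(5, b"cc")
buf.add(70, b"dd")      # 65 s since b"cc" arrived: interval triggers delivery
```

After these calls, two batches have been delivered, the first because of size and the second because of time, and `b"dd"` is still waiting in the buffer.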

Question 3 (Knowledge)

How is the capacity of a Kinesis Data Stream measured, and what are the limits per shard?

A. By number of topics; each topic handles up to 10 MB/s ingest and 20 MB/s read.
B. By number of shards; each shard supports 1 MB/s or 1,000 records/second ingest, and 2 MB/s read throughput. ✓ Correct
C. By number of producers; each producer can write at a fixed rate regardless of shard count.
D. By total storage capacity in GB across all shards.

Explanation

A Kinesis shard is the base unit of throughput. Each shard provides 1 MB/s or 1,000 records/second of write capacity (whichever limit is reached first) and 2 MB/s of read capacity, shared by all standard consumers, with up to 5 read transactions/second. To scale up, add shards by splitting existing ones; to scale down, merge shards. With Enhanced Fan-Out, each registered consumer gets a dedicated 2 MB/s per shard, delivered over HTTP/2 push.
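As a rough sizing aid, the per-shard limits above can be turned into a shard-count estimate. This is a minimal sketch under stated assumptions: `shards_needed` is a hypothetical helper, the read figure is the total for standard (shared-throughput) consumers, and Enhanced Fan-Out is ignored.

```python
import math

# Per-shard limits quoted in the explanation above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(ingest_mb_s: float, records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(ingest_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


# Byte-bound workload: 4.5 MB/s of ingest dominates.
byte_bound = shards_needed(ingest_mb_s=4.5, records_s=2000, read_mb_s=6)

# Record-bound workload: many small records dominate.
record_bound = shards_needed(ingest_mb_s=0.5, records_s=8000, read_mb_s=1)
```

The second case shows why the records/second limit matters: a stream of many tiny records can need far more shards than its byte volume alone suggests.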

Question 4 (Knowledge)

A Kinesis Data Stream has 6 shards. What is the maximum write (ingestion) throughput for the stream?

A. 2 MB/s (1 shard read capacity).
B. 3 MB/s (half the total shard count).
C. 6 MB/s (6 shards × 1 MB/s ingest per shard). ✓ Correct
D. 12 MB/s (6 shards × 2 MB/s read per shard; that is the read capacity, not write).

Explanation

Write (ingest) capacity = number of shards × 1 MB/s. Read capacity = number of shards × 2 MB/s. For 6 shards: write = 6 MB/s, read = 12 MB/s. Option D's 12 MB/s is therefore the read throughput, not the write throughput. If you need more capacity, increase the shard count. The exam often tests whether you can tell ingest limits from read limits.
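The arithmetic for this question can be checked in a few lines. `stream_capacity` is a hypothetical helper that applies the per-shard figures quoted in this chapter; it is a sketch, not an AWS API.

```python
def stream_capacity(shard_count: int) -> dict:
    """Aggregate stream limits from the per-shard figures used in this chapter."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    return {
        "write_mb_s": shard_count * 1,           # 1 MB/s ingest per shard
        "write_records_s": shard_count * 1000,   # 1,000 records/s ingest per shard
        "read_mb_s": shard_count * 2,            # 2 MB/s shared read per shard
    }


six_shards = stream_capacity(6)
```

For 6 shards, `six_shards["write_mb_s"]` is 6 and `six_shards["read_mb_s"]` is 12, matching options C and D respectively.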

Question 5 (Knowledge)

Which Kinesis service allows you to run real-time SQL queries on streaming data and output results to another Kinesis stream or Firehose?

A. Amazon Kinesis Data Streams: captures and stores streaming data; does not run SQL queries.
B. Amazon Kinesis Data Firehose: delivers data to destinations; does not perform real-time analytics.
C. Amazon Kinesis Data Analytics: runs SQL (or Apache Flink) on streaming data from Kinesis Data Streams or Firehose in real time. ✓ Correct
D. Amazon Athena: queries data in S3 using SQL, not real-time streaming data.

Explanation

Kinesis Data Analytics supports two modes: SQL-based (simpler, for basic windowed aggregations) and Apache Flink-based (for complex stateful stream processing). It reads from Kinesis Data Streams or Firehose and can write results to Kinesis Data Streams, Firehose, Lambda, or other destinations. Athena, by contrast, is for serverless batch queries on data already stored in S3.
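To make "basic windowed aggregation" concrete, here is a plain-Python sketch of a tumbling-window count, the kind of query the SQL mode is typically used for. It is not Kinesis Data Analytics or Flink code: `tumbling_window_counts` and the sample events are hypothetical, and timestamps are seconds.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds: int) -> dict:
    """Group (timestamp, key) events into fixed, non-overlapping windows.

    Returns {window_start: Counter of keys seen in that window}.
    """
    windows = {}
    for ts, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = int(ts // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


page_views = [(1, "home"), (12, "cart"), (59, "home"), (61, "home"), (125, "cart")]
per_minute = tumbling_window_counts(page_views, window_seconds=60)
```

Here the first 60-second window holds two `home` views and one `cart` view, and each later window holds one event. A streaming engine computes the same result incrementally as records arrive instead of over a finished list.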
