TechWriterDev
  • Cloud
    • AWS
      • 00_Doubts
      • CloudPractitioner
        • Cloud Computing
        • AWS Global Infrastructure
        • Introduction to AWS EC2
        • Elastic load balancer(ELB)
        • 04_Messaging_Queuing
        • Aditional Computing Service
        • Accessing AWS resources
        • AWS Networking
        • Storage
        • Amazon Database Solutions
        • Monitoring Tools
        • AWS Security
        • Distributed Denial Of Service Attacks
      • DeveloperAssociate
        • References
        • AWS DVA-C02 Services Index
        • Services
          • 00_IAM
            • Identity and Access Management (IAM)
            • Account Protection Mechanisms
            • Access Mechanism of AWS Resources
            • Security Tools
            • Responsibility Model
            • Advanced Concepts
          • 01_EC2
            • Elastic Compute Cloud (EC2)
            • EC2 Volume Types
            • Amazon Machine Image (AMI)
            • AWS charges for IPv4 address
          • 02_SecurityGroups
            • Security Groups
          • 03_Elastic_LoadBalancing
            • Terminology
            • Elastic load balancer
            • Features
            • Basics
          • 04_AutoScaling
            • Auto Scaling
          • 05_RDS
            • Relational Database Service (RDS)
            • Aurora
            • Security
            • RDS Proxy
          • 06_ElastiCache
            • Cache
            • Cache Offerings
          • 07_Route53
            • Basics of DNS
            • Route 53
          • 08_VPC
            • Virtual Private Cloud (VPC)
          • 09_S3
            • Simple Storage Service (S3)
            • S3 Features
            • S3 Encryption
            • S3 Features
            • S3 Bucket Policy and IAM Policy
          • 10_ECS
            • Elastic Container Service (ECS)
            • Elastic Container Registry (ECR)
            • AWS Copilot
          • 11_EKS
            • Elastic Kubernetes Service (EKS)
          • 12_SDK_CLI_Tips
            • Access AWS Resources
          • 13_CloudFront
            • Cloud Front
          • 14_Messaging
            • Simple Queue Service (SQS)
            • Simple Notification Service (SNS)
            • Fan Out Pattern
            • Kinesis
            • Compare and Contrast
          • 15_ElasticBeanStalk
            • Elastic Beanstalk
          • 16_CloudFormation
            • CloudFormation
            • Dynamic References
          • 17_Monitoring
            • AWS Monitoring
            • AWS CloudWatch
            • CloudWatch Alarms
            • Synthetics Canary
            • Amazon EventBridge (formerly CloudWatch Events)
            • X-Ray
            • OpenTelemetry
            • CloudTrail
          • 18_Lambda
            • Lambda
            • Lambda Integrations
            • Configuring Lambda
            • Lambda Layers
          • 19_API_Gateway
            • API Gateway
            • API Gateway Integrations
          • 20_DynamoDB
            • DynamoDB
            • Operations
            • Indexes
            • DynamoDB Accelerator (DAX)
            • DynamoDB Streams
            • Transactions
            • Integrations
          • 21_CICD
            • CICD
            • CodeCommit
            • CodePipeline
            • CodeBuild
            • CodeDeploy
            • CodeArtifact
            • CloudGuru
          • 22_SAM
            • Serverless Application Model (SAM)
          • 23_CDK
            • Cloud Development Kit (CDK)
          • 24_StepFunctions
            • Step Functions
            • Types of step function
          • 25_AppSync
            • AppSync
          • 26_Amplify
            • Amplify
          • 27_STS
            • Security Token Service (STS)
          • 28_DirectoryService
            • Active Directory
          • 29_KMS
            • Encryption
            • KMS API
            • Features
            • Cloud Hardware Security Module (HSM)
          • 30_SSM_Store
            • SSM Parameter Store
          • 31_SecretsManager
            • Secrets Manager
          • 32_Cognito
            • Cognito
      • Questions
        • AWS_Region
        • EC2
        • IAM
  • Database
    • MongoDb
      • Mongo db Basics
      • Mongo DB Atlas
      • Document
      • Import-Export based on Data Format
      • Mongo Shell Commands
      • Query Operators
      • Indexes
      • Upsert
      • MongoDB Aggregation Framework
      • Aggregation Framework Operators
    • PostgreSQL
      • POSTGRE SQL DataTypes
      • About table
      • Constraints
  • Technologies
    • RabbitMQ
      • RabbitMQ Concepts
      • Introduction to Exchanges
      • Introduction to Queues
    • Terraform
      • 00_Introduction
      • Configuration blocks
      • Commands
      • Variables
      • Terraform Cloud
      • Modules
  • Languages
    • Java
      • Logging
        • Getting Started
      • 00_Core
        • 00_Basics
          • Java Vs C++
          • Object oriented principles
          • Steps to compile a java program
          • JVM Internals
          • Understanding Java Development Kit
          • What is JIT Compiler?
          • Java data types
          • 07_identifiers_type_conversion
          • 08_references_and_packages
          • Steps for attaching scanner
        • Concurrency
          • 00_Threads
            • Threads
          • 01_ExecutorFramework
            • Executor Framework
            • Asynchronous Computation
      • 01_Backend
        • 01_HttpAndWebServerBasics
          • HTTP
          • Content Type
          • Web Server
        • 02_J2EE_Basics
          • J2EE_Basics
          • Why HttpServlet classs is declared as abstract class BUT with 100 % concrete functionality ?
        • 03_TomCatAndSession
          • What is a Session?
          • WebContainer
        • 04_PageNavigation
          • Cookies Additional Information
          • Page Navigation Techniques
        • 05_AboutServlet
          • CGI v/s Servlet
          • Executor Framework
          • Servlet Life cycle
          • SERVLET CONFIG
          • Servlet Context
          • Servlet Listener (web application listener)
        • 08_SpringBoot
          • Spring Boot
          • Some common annotations used in spring eco system
        • 09_SpringDataJPA
          • Spring Data JPA
        • Java_Language_Changes
          • JDK enhancement tracking reference
        • 06_ORM_Hibernate
          • readmes
            • Hibernate
            • Advantages of Hibernate
            • Hibernate Caching
            • Hibernate API
            • Hibernate Query API
            • Hibernate Annotations and JPQL
            • Entity and Value Type
        • 07_SpringFramework
          • bean_validation
            • Bean Validation
          • core
            • readme
              • Spring
              • Spring Framework Modules
              • Spring MVC Request flow
              • Dependency Injection
              • Spring Beans
              • 06_Spring_Framework_Annotations
      • 03_Tools
        • Maven
          • Maven
  • SoftwareEngineering
    • DesignPatterns
      • Notes
        • Basics
        • OOP
        • SOLID Principles
        • 03_Creational
          • Abstract Factory (aka Kit)
          • Builder
          • Factory Method (aka Virtual constructor)
          • Prototype
          • Singleton
        • 04_Structural
          • Adapter (aka Wrapper)
          • Bridge (aka Handle | Body)
          • Composite
          • Decorator (aka Wrapper)
          • Facade
          • Flyweight
          • Proxy (aka Surrogate)
        • 05_Behavioral
          • Chain of Responsibility
          • Command (aka Action | Transaction)
          • Iterator (aka Cursor)
          • Observer (aka Publish-Subscribe | Dependents)
          • Strategy (aka Policy)
    • Principles
      • REST
        • REST
  • Tools
    • Containers
      • Docker
        • Docker
        • Docker Image
        • Commands
        • Compose
        • Best Practices
      • Kubernetes
        • Kubernetes
    • VCS
      • Git
        • Quick reference of useful Git commands
Powered by GitBook
On this page
  • Icon
  • About
  • Kinesis Data Stream
  • Icon
  • About
  • Kinesis Data Firehose (earlier knowns as Delivery Streams)
  • Icon
  • About
  • Kinesis Data Analytics
  • Icon
  • About
  • Kinesis Video Streams
  • Icon
  • About
  • References
  1. Cloud
  2. AWS
  3. DeveloperAssociate
  4. Services
  5. 14_Messaging

Kinesis

PreviousFan Out PatternNextCompare and Contrast

Last updated 4 months ago

Icon

Kinesis

About

  • Its a managed, serverless proprietary offering from AWS.

  • It makes it easy to collect, process and analyze streaming data in real time (upto 200ms/70ms).

  • Real time data can be Application Logs, Metrics, Website clickstreams, IoT Telemetry data.

  • There are 4 services that makes up Kinesis

    • Kinesis Data Stream

    • Kinesis Data Firehose

    • Kinesis Data Analytics

    • Kinesis Video Streams

  • Supports replay capability.

Kinesis Data Stream

Icon

About

  • Helps to capture, process and store data streams.

  • Use case is in big data.

  • Shards define the stream capacity in terms of ingestion and consumption rates.

Kinesis Record

  • Partition Key decides on to which shard the record will be stored.

  • Hence partition key should be highly distributed, to avoid hot partition problem.

Producers

  • Producers send data (Kinesis record) to Kinesis Data Streams. Producers can be applications, Kinesis Producer Library (KPL), Kinesis Agent. At low-level they rely on AWS SDK.

  • Producers produces Kinesis Record that will be ingested into Kinesis Data Stream.

  • Producers can send records (Write Throughput) at a rate of 1MB/sec or 1000 messages/sec per shard.

  • PutRecord API is used to insert records to Kinesis.

  • Use batching with PutRecords API to reduce costs and increase throughput.

  • Partition Key of the Kinesis Record goes through hash function, which will determine the shard on to which the record goes to.

  • Overproducing records by producers leads to ProvisionedThroughputExceeded exception. To avoid this problem,

    • Use a highly distributed partition key OR

    • Retries by using exponential backoff OR

    • Scale out by increasing shards

Consumers

  • Consumers read the data from Kinesis Data Stream using Kinesis Consumer Library (KCL).

  • There are various consumption modes,

    • Shared (classic) Fan-out Consumer

      • 2MB/sec per shard shared across all consumers.

      • This uses GetRecords() API.

      • Uses Pull model.

      • Useful when there is low number of consuming applications.

      • Per shard only 5 GetRecords() API calls/sec.

      • Latency is approx 200ms.

      • Less costly option

      • Returns upto 10 MB or upto 10000 Records and then it will throttle for 5 seconds.

    • Enhanced Fan-out Consumer (EFO)

      • 2MB/sec per shard per consumers.

      • This uses SubscribeToShard() API.

      • Uses Push model over HTTP/2 connection.

      • Latency is approx 70ms.

      • Higher cost

      • Multiple consuming applications for same stream.

      • Soft Limit of 5 Consumer applications per data stream by default.

  • Consumers can be Kinesis Data Firehose, Lambda, Applications through KCL, Kinesis Data Analytics etc.

  • Retention period can be 1 day to 365 days.

  • Data once inserted in KDS, it can't be deleted.

  • It is possible to replay the data.

  • Data inserted to same partition goes to same shard and thus maintains key based ordering.

Lambda as Consumer Integration

  • Supports both Classic and Enhanced Fan-out consumers.

  • Read records in batches.

  • Batch size and Batch window can be configured.

  • In case of errors, Lambda retries untill succeeds or data expired.

  • Can process 10 batches/shard simultaneously.

Kinesis Client Library (KCL)

  • A java library that helps read the record from a Kinesis Data Stream with distributed applications sharing the read work load.

  • Each shard is to be read by only one KCL instances.

    • 4 shards -> maximum 4 KCL instances

    • 6 shards -> maximum 6 KCL instances

  • Progress is checkpointed into DynamoDB (needs IAM Access).

  • Track other workers and share the work among shards using DynamoDB.

  • KCL can be run on EC2 instances, Elastic BeanStalk, on-premises servers and so on.

  • Records are read in order at the shard level.

  • Versions:

    • KCL 1.x version (supports shared fanout consumers)

    • KCL 2.x version (supports shared/enhanced fanout consumers)

  • When there are more shards than KCL instances then Kinesis will detect this and split the work between KCL applications.

Capacity modes

  • Provisioned mode (historic mode)

    • You chose the number of shards provisioned, scale manually or using API.

    • Each shard gets 1MB/sec for input or 1000 records/sec.

    • Each shard gets 2MB/sec for output or enhanced fan-out consumer.

    • You pay per shard provisioned per hour.

  • On-Demand mode

    • No need to provision or manage capacity.

    • Default capacity provisioned 4 MB/s or 4000 records/sec.

    • Scales automatically based on observed throughput peak during last 30 days.

    • At Max it offers write capacity of 200 MiB/sec and 2000 records/sec and read capacity of 400 MiB/sec. Supports upto 20 consumers when using EFO consumption mode.

    • Offers pricing models to pay per stream/hour & data in/out per GB.

Kinesis Operations

  • There are two operations

Shard Splitting

  • Used to split the shards in to two.

  • This operation is typically used to increase the capacity of the stream.

  • Used to divide hot shard, and increase the capacity. As capacity is increased so does the cost.

  • The old shard will be closed and data will be expired as per the retention policy following which shard will be deleted.

  • A shard cant be split into more than two shards in a single operation.

Shard Merging

  • As shards can be split it can also be merged, when mulitple shards have low traffic.

  • Only two shards (cold shards) can be merged in a single operation.

  • This operation reduces cost and capacity.

  • Once merged the old shard will be deleted once data is expired as per the retention policy.

Security

  • Control access/authorization using IAM policies.

  • Encryption in flight using HTTPS endpoints.

  • Encryption at rest using KMS.

  • Can implement encryption/decryption of data on client side.

  • VPC Endpoints are available for Kinesis to access within VPC.

  • Can monitor all API calls using CloudTrail.

Kinesis Data Firehose (earlier knowns as Delivery Streams)

Icon

About

  • Fully managed service, near real-time, auto scaling, serverless service used to send data streams into AWS data stores.

  • Data stream producers can be AWS Kinesis Data Stream, CloudWatch Logs, AWS IoT.

  • No replay capability and no data storage.

  • It can optionally do a data transformation.

  • Only need to pay for data going through Firehose.

  • Once data is received from producers, it can be written in batches into destinations.

  • Destinations can be of

    • AWS Destinations

      • S3

      • Redshift (It first writes data to S3 and use a COPY command)

      • OpenSearch

    • Third-Party Partner Integrations

      • New Relic

      • MongoDB

      • DataDog

      • Dynatrace

      • Splunk etc

    • Custom Destinations

      • HTTP Endpoint

  • Once all data has been send to Destinations or if it failed to send the data, it can be send to S3 bucket as backup.

  • When sending data to different destination, data can be encrypted and compressed.

  • Near Real Time data writes with buffer interval of 60 to 900 seconds, by default it being 300 seconds.

  • Buffer size is minimum of 1 MB and maximum is 128 MB, with default being 5 MB.

  • Supports many data formats, conversions, transformations, compression.

  • Supports custom data transformation using Lambda.

Kinesis Data Analytics

Icon

About

For SQL Applications

  • Fully managed service, auto-scaling with Pay as you go only for data going through Kinesis Data Analytics.

  • Analyse data streams with SQL or Apache Flink

  • It can read from Kinesis Data Streams or Kinesis Data Firehose.

  • Possible to join reference data from S3 to enrich data in real time.

  • Data can be outputed to (sinks),

    • Kinesis Data Streams

    • Kinesis Data Firehose

      • From this it can be send to S3, OpenSearch, Redshift or other destinations as mentioned before.

  • Use cases includes,

    • Time-series analytics

    • Real-time metrics

    • Real-time dashboards

Amazon Managed Services for Apache Flink

  • Data can be sourced from Kinesis Data Stream or Amazon MSK.

  • With this service one can run any Flink Application on a managed cluster on AWS.

  • It provides automatic provisioning of compute resources, parallel computation and automatic scaling.

  • Application backups (implemented as checkpoints and snapshots).

  • It does not support read from Kinesis Firehose (use Kinesis Analytics SQL Analytics instead).

Kinesis Video Streams

Icon

About

  • Capture, process and store video streams.

References

Kinesis Data Stream Icon

They are made of multiple . And shards must be provisioned ahead of time.

A Kinesis Record consists of Partition Key, Sequence Number (unique per partition key per shard) and Data blob of size upto (1 MB) and .

There is no direct auto-scaling support in Kinesis, the scaling has to managed manually. The auto-scaling setup need to be architected as mentioned .

Kinesis Data Firehose Icon
Kinesis Data Analytics

Use to process and analyze streaming data.

Kinesis Video Streams

shards
more
here
Flink
Kinesis v/s Kafka