DynamoDB


Brief about NoSQL

  • It is a NoSQL (non-relational), distributed database.

  • NoSQL databases typically have no support for query joins, or only limited support.

  • They don't perform aggregations such as SUM, AVG, etc.

  • They scale horizontally.

About

  • NoSQL distributed database.

  • Fully managed database with replication across multiple AZs.

  • Scales easily and can handle millions of requests per second, trillions of rows, and 100 TB of storage.

  • Fast and consistent in performance.

  • It is integrated with IAM for security, authorization and administration.

  • Does not support SQL-like constraints such as NOT NULL, PRIMARY KEY, UNIQUE KEY, etc.

  • Can do event-driven programming with DynamoDB Streams.

  • Low cost, with Standard and Infrequent Access (IA) table classes.

Concepts

  • DynamoDB is made of tables.

  • Each table has a Primary Key, which must be specified at creation time.

  • Each table can have an infinite number of items.

  • Each item has attributes, which can be added over time and can be null.

  • A table has one of two table classes

    • Standard class - frequently accessed data

    • Standard-IA class - infrequently accessed data

  • Maximum size of item is 400KB.

  • Supported data types are (see the sketch after this list)

    • Scalar types: String, Number, Binary, Null, Boolean

    • Document types: List, Map

    • Set types: String Set, Number Set, Binary Set.
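
  • A minimal Python (boto3) sketch of an item that uses scalar, document, and set types, assuming a hypothetical Users table keyed on user_id and AWS credentials/region already configured; the low-level client tags each attribute with its type descriptor:

    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.put_item(
        TableName="Users",
        Item={
            "user_id": {"S": "u-123"},                    # String (partition key here)
            "age": {"N": "42"},                           # Number (always sent as a string)
            "active": {"BOOL": True},                     # Boolean
            "nickname": {"NULL": True},                   # Null attribute
            "address": {"M": {                            # Map (document type)
                "city": {"S": "Pune"},
                "zip": {"S": "411001"},
            }},
            "scores": {"L": [{"N": "10"}, {"N": "20"}]},  # List (document type)
            "tags": {"SS": ["aws", "dynamodb"]},          # String Set
        },
    )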

Choosing Primary Keys

  1. Partition Key (HASH)

    • Partition key must be unique.

    • Must be diverse, so that data can be distributed.

  2. Partition Key and Sort Key (HASH and RANGE)

    • The combination must be unique.

    • Data is grouped by partition key (see the table-creation sketch after this list).
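
  • A minimal Python (boto3) sketch of creating a table with a partition key and sort key, assuming a hypothetical Orders table:

    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.create_table(
        TableName="Orders",
        KeySchema=[
            {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
            {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )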

Read Write Capacity Modes

  • There are two modes

    • Provisioned

    • On-Demand

Provisioned

  • The user specifies Read Capacity Units (RCU) and Write Capacity Units (WCU).

  • Capacity needs to be planned beforehand, though auto-scaling of throughput can be set up to meet demand.

  • Pay for the provisioned read and write capacity units.

  • Throughput above the provisioned RCU and WCU can temporarily be absorbed using Burst Capacity.

    • Once this burst capacity is exhausted, you will see ProvisionedThroughputExceededException. To recover from such failures, one can rely on the exponential-backoff retry mechanism already supported by the SDK, select a better partition key, or use DynamoDB Accelerator (DAX).

    • Such an exception occurs due to hot keys, hot partitions, or very large items.

Write Capacity Unit (WCU)

  • Represents one write per second for an item up to 1 KB in size.

  • If items are larger than 1 KB, more WCUs are consumed (the item size is rounded up to the next whole KB).

  • Formula to calculate WCU is,

    WCU = (number of writes per second) x ceil(item size in KB)
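
  • For example (illustrative numbers): writing 10 items per second, each 2.5 KB, needs 10 x ceil(2.5) = 10 x 3 = 30 WCU; writing 6 items per second of 0.5 KB each still needs 6 x 1 = 6 WCU, since sizes are rounded up to 1 KB.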

Read Capacity Unit (RCU)

  • There are two types of reads

    • Strongly Consistent Read

      • Set the ConsistentRead parameter to true in API calls.

      • Consumes twice the RCU.

    • Eventually Consistent Read

      • It is the default read mode.

      • May return stale data if an item is read just after a write.

  • Represents 1 Strongly Consistent Read per second or 2 Eventually Consistent Reads per second, for an item up to 4 KB in size.

  • If the item size is not a multiple of 4 KB, round it up to the next multiple of 4 KB.

  • If an item is larger than 4 KB, more RCUs are consumed.

  • Formula to calculate RCU is,

    RCU = (number of strongly consistent reads per second) x ceil(item size in KB / 4)
        = (number of eventually consistent reads per second) x ceil(item size in KB / 4) / 2
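
  • For example (illustrative numbers): 10 strongly consistent reads per second of 6 KB items need 10 x ceil(6/4) = 10 x 2 = 20 RCU; the same traffic with eventually consistent reads needs only 10 RCU.

  • A minimal Python (boto3) sketch of a strongly consistent read, assuming the hypothetical Orders table from above:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # ConsistentRead=True forces a strongly consistent read (costs 2x the RCU).
    response = dynamodb.get_item(
        TableName="Orders",
        Key={"customer_id": {"S": "c-1"}, "order_date": {"S": "2024-01-01"}},
        ConsistentRead=True,
    )
    print(response.get("Item"))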

On-Demand

  • It's the default mode.

  • Reads and writes automatically scale up/down with your workloads.

  • No capacity planning needed and hence no throttling.

  • Pay only for the capacity you use.

  • Roughly 2.5 times more expensive than provisioned mode.

  • Charged based on Read Request Units (RRU) and Write Request Units (WRU).

  • Use cases include unknown workloads, unpredictable application traffic, etc.

  • One can switch between both provisioned and on-demand modes once every 24 hours.

Internal Partitions

  • Data is stored in partitions.

  • Based on the partition key sent by the application, the partition to write to is selected.

  • The partition key is given as input to an internal hashing algorithm; the resulting hash determines the partition in which the item is stored.

  • The following formulas give the number of partitions,

    No. of partitions by capacity = (Total RCU / 3000) + (Total WCU / 1000)

    No. of partitions by size = Total Data Size / 10 GB

    No. of partitions = ceil(max(No. of partitions by capacity, No. of partitions by size))

  • RCU and WCU are spread evenly across partitions.
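
  • For example (illustrative numbers): a table provisioned with 6,000 RCU and 2,000 WCU holding 16 GB of data gives 6000/3000 + 2000/1000 = 4 partitions by capacity and 16/10 = 1.6 partitions by size, so ceil(max(4, 1.6)) = 4 partitions, each receiving roughly 1,500 RCU and 500 WCU.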

Write Sharding

  • To solve the Hot Partition issue, which arises when data is not evenly distributed because the partition key has too few distinct values, one can add a suffix/prefix to the partition key value to get a better distribution.

  • There are two methods to create the suffix (see the sketch after this list),

    • Sharding with a random suffix.

    • Sharding with a calculated suffix using a hashing algorithm.
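
  • A minimal Python sketch of the calculated-suffix approach, assuming a hypothetical order_id attribute is used to derive a stable suffix across 10 shards:

    import hashlib

    NUM_SHARDS = 10  # illustrative shard count

    def sharded_partition_key(customer_id: str, order_id: str) -> str:
        """Derive a deterministic suffix from order_id so writes spread across
        NUM_SHARDS partitions but the suffix can be recomputed when reading."""
        digest = hashlib.md5(order_id.encode("utf-8")).hexdigest()
        suffix = int(digest, 16) % NUM_SHARDS
        return f"{customer_id}#{suffix}"

    # e.g. "c-1#3" -- the same order_id always maps to the same shard suffix
    print(sharded_partition_key("c-1", "o-42"))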

Throttling

  • If the application exceeds the provisioned WCU or RCU at the partition level, it will receive ProvisionedThroughputExceededException.

  • Reasons could be one of the following,

    • Hot Keys : too many reads on one key.

    • Hot Partitions : too many requests hitting a single partition.

    • Very large items, as RCU and WCU depend on item size.

  • To solve the above problems one could (see the sketch after this list),

    • Use exponential backoff on retries (included in the SDK).

    • Distribute partition keys as much as possible.

    • If RCU is being throttled due to a Hot Keys issue, use DynamoDB Accelerator (DAX).
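
  • A minimal Python (boto3) sketch of an exponential-backoff retry around a read of the hypothetical Orders table (in practice the AWS SDKs already retry throttled calls with backoff by default):

    import random
    import time
    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")

    def get_item_with_backoff(key, max_attempts=5):
        """Retry throttled reads, doubling the wait each attempt (with jitter)."""
        for attempt in range(max_attempts):
            try:
                return dynamodb.get_item(TableName="Orders", Key=key)
            except ClientError as err:
                if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                    raise  # not a throttling error, surface it
                # exponential backoff: 0.1s, 0.2s, 0.4s, ... plus random jitter
                time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
        raise RuntimeError("Read kept getting throttled after retries")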

TTL

  • Deletes items after an expiry timestamp (see the sketch after this list).

  • It doesn't consume any WCU.

  • The TTL attribute must be of Number data type, containing a Unix epoch timestamp.

  • The TTL attribute name, which defines the expiration time, must be set in the table's TTL configuration.

  • Expired items are deleted within 48 hours of expiration.

    • An expiration process scans and marks items as expired.

    • A deletion process scans and deletes the expired items.

  • Expired items may still appear in read results until they are deleted, so filter them out if not needed.

  • Expired items are also deleted from any indexes (LSI or GSI) that contain them.

  • A delete operation for each expired item enters DynamoDB Streams, which can be used to recover expired items.

  • Use cases include reducing stored data by keeping only current items, adhering to regulatory obligations, etc.

  • Also offers a graph view of CloudWatch metrics to see the deleted items.
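
  • A minimal Python (boto3) sketch of enabling TTL and writing an item that expires in 24 hours, assuming the hypothetical Orders table and a hypothetical TTL attribute named expireAt:

    import time
    import boto3

    dynamodb = boto3.client("dynamodb")

    # Tell the table which Number attribute holds the expiry epoch timestamp.
    dynamodb.update_time_to_live(
        TableName="Orders",
        TimeToLiveSpecification={"Enabled": True, "AttributeName": "expireAt"},
    )

    # Write an item whose expireAt is 24 hours from now (Unix epoch seconds).
    dynamodb.put_item(
        TableName="Orders",
        Item={
            "customer_id": {"S": "c-1"},
            "order_date": {"S": "2024-01-01"},
            "expireAt": {"N": str(int(time.time()) + 24 * 3600)},
        },
    )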

CLI Options

  • --projection-expression : One or more attributes to obtain in the output.

  • --filter-expression : Filter items before they are returned.

  • --page-size : Retrieve the list of items part by part, with the specified page size (the default is 1000 items). Behind the scenes the call is made part by part and a single combined result is returned; this avoids timeouts.

  • --max-items : Maximum number of items to show. Returns a NextToken; if there are no more items to show, no NextToken is returned.

  • --starting-token : Specify the NextToken from a previous call to retrieve the next set of items.

  • Examples
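
  • As a rough SDK equivalent of the options above, a minimal Python (boto3) sketch that projects attributes, filters on a hypothetical numeric total attribute, and paginates through the hypothetical Orders table:

    import boto3

    dynamodb = boto3.client("dynamodb")

    scan_kwargs = {
        "TableName": "Orders",
        "ProjectionExpression": "customer_id, order_date",  # like --projection-expression
        "FilterExpression": "total > :min",                  # like --filter-expression
        "ExpressionAttributeValues": {":min": {"N": "100"}},
        "Limit": 25,                                         # page size per request
    }

    items = []
    while True:
        response = dynamodb.scan(**scan_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")          # plays the role of NextToken
        if not last_key:
            break
        scan_kwargs["ExclusiveStartKey"] = last_key          # like --starting-token

    print(len(items))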

Session State Cache

  • It is a serverless alternative to ElastiCache to store session state.

  • ElastiCache is in memory.

  • Both are key/value stores.

  • EFS, attached as a network drive to EC2 instances, is a good choice for saving session state to disk.

  • Note that EBS and Instance Store can only be used for local caching, not shared caching.

  • S3 is not suitable, as it has higher latency and is not meant for small objects.

Write Types

  • There are different types of writes (a conditional-write sketch follows this list)

    • Conditional Writes : Only write if a condition expression succeeds.

    • Concurrent Writes : The second write overwrites the first write.

    • Atomic Writes : Writes happen atomically, i.e., they either completely succeed or completely fail.

    • Batch Writes : Write many items at a time.
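
  • A minimal Python (boto3) sketch of a conditional write against the hypothetical Orders table; the put only succeeds if no item with the same partition key already exists:

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")

    try:
        dynamodb.put_item(
            TableName="Orders",
            Item={
                "customer_id": {"S": "c-1"},
                "order_date": {"S": "2024-01-01"},
                "total": {"N": "100"},
            },
            # Fail instead of silently overwriting an existing item.
            ConditionExpression="attribute_not_exists(customer_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            print("Item already exists; write was rejected")
        else:
            raise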

Fine Grained Access Control - Direct Client Access

  • To let clients access DynamoDB directly, rather than creating IAM users for each of them, use Identity Providers like Google (which use the OpenID Connect protocol behind the scenes) to exchange identity tokens for temporary AWS credentials.

  • Use the temporary AWS credentials with a restricted IAM role whose policy is based on conditions.

  • The above setup can limit access to items and attributes in DynamoDB based on the user's identity.

  • A sample policy is sketched after this list.

    • LeadingKeys in such a policy limits access at the row level, to items whose partition key matches that particular user.

    • Similarly, Attributes limits access to specific attributes of the table.
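
  • A sketch of such a policy, built here as a Python dictionary with a hypothetical table ARN and attribute list, using the dynamodb:LeadingKeys and dynamodb:Attributes condition keys together with the Cognito identity variable:

    import json

    # Hypothetical fine-grained access policy: a web-identity user may only touch
    # items whose partition key equals their Cognito identity id, and only the
    # listed attributes.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
                "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"],
                        "dynamodb:Attributes": ["customer_id", "order_date", "total"],
                    },
                    "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
                },
            }
        ],
    }

    print(json.dumps(policy, indent=2))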

Security

  • VPC Endpoints allow access to DynamoDB without using the Internet.

  • Access fully controlled by IAM.

  • Encryption at rest using AWS KMS and in transit using SSL/TLS.

Backup and Restore

  • Point-in-time Recovery like RDS, with no performance impact.

  • Normal backup and restore.

Global Tables

  • These are multi-region, fully replicated, high-performance DynamoDB tables.

  • This replication is done using DynamoDB Streams.

DynamoDB Local

  • This allows you to run DynamoDB on your local machine.

  • This allows you to test and develop applications that use DynamoDB without internet access.

Migrations

  • To migrate data to and from DynamoDB, we have the AWS Database Migration Service (DMS).

  • It supports different databases as source and destination, such as MongoDB, Oracle, S3, MySQL, etc.
