DynamoDB
It's a NoSQL (non-relational), distributed database.
NoSQL databases typically have no support, or only limited support, for SQL-style query joins.
They don't perform aggregations such as SUM, AVG, etc.
They scale horizontally.
DynamoDB is a NoSQL distributed database.
Fully managed database with replication across multiple AZs.
Scales easily and can handle millions of requests per second, trillions of rows, and 100 TB of storage.
Fast and consistent in performance.
It is integrated with IAM for security, authorization and administration.
Can do event-driven programming with DynamoDB Streams.
Low cost, with Standard and Infrequent Access (IA) table classes.
DynamoDB is made of tables.
Each table has a Primary Key, which must be specified at creation time.
Each table can have an infinite number of items.
Each item has attributes.
Maximum size of an item is 400 KB.
Data Types supported are
Scalar types: String, Number, Binary, Null, Boolean
Document types: List, Map
Set types: String Set, Number Set, Binary Set.
Partition Key (HASH)
The partition key must be unique for each item.
It must be diverse, so that data is distributed evenly.
Partition Key and Sort Key (HASH and RANGE)
The combination must be unique.
Data is grouped by partition key.
There are two modes
Provisioned
On-Demand
It's the default mode.
Users must specify Read Capacity Units (RCU) and Write Capacity Units (WCU).
Users need to plan capacity beforehand, though they can set up auto-scaling of throughput to meet demand.
Pay for provisioned read and write capacity units.
Throughput above the provisioned RCU and WCU can temporarily be exceeded using Burst Capacity.
Once this capacity is exhausted, you will see a ProvisionedThroughputExceededException.
One can use an exponential-backoff retry mechanism to recover from such failures.
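The exponential-backoff retry just described can be sketched as follows (the function name and parameters are illustrative, not part of any AWS SDK):

```python
import random
import time

def with_backoff(operation, max_retries=5, base_delay=0.05):
    """Retry `operation` with exponential backoff and jitter.

    `operation` is any callable that may raise a retryable error,
    e.g. a ProvisionedThroughputExceededException from an AWS SDK call.
    """
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Sleep base_delay * 2^attempt, plus random jitter to
            # avoid all clients retrying at the same instant.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Note that the AWS SDKs (e.g. boto3) already implement this kind of retry behaviour by default, so hand-rolling it is rarely necessary.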
Write Capacity Unit (WCU)
Represents one write per second for an item up to 1 KB in size.
If items are larger than 1 KB, more WCUs are consumed (the item size is rounded up to the next whole KB).
Formula to calculate WCU is,
WCU = (writes per second) × ceil(item size in KB)
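As a quick sketch, the WCU calculation can be expressed in code (the function name is illustrative):

```python
import math

def wcu(item_size_kb, writes_per_second):
    # One WCU = one write/second of an item up to 1 KB;
    # larger items are rounded up to the next whole KB.
    return math.ceil(item_size_kb) * writes_per_second

# e.g. 10 writes/second of 1.5 KB items:
# ceil(1.5) = 2 KB -> 2 WCU per write -> 20 WCU total
```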
Read Capacity Unit (RCU)
There are two types of reads
Strongly Consistent Read
Set ConsistentRead parameter to be true in API calls.
Consumes twice the RCUs of an eventually consistent read.
Eventually Consistent Read
It is the default reading strategy.
May return stale data if a read happens just after a write.
Represents 1 Strongly Consistent Read per second or 2 Eventually Consistent Reads per second, for an item up to 4 KB in size.
If the item size is not a multiple of 4 KB, round it up to the next multiple of 4 KB.
If item is larger than 4 KB, more RCUs are consumed.
Formula to calculate RCU is,
RCU (strongly consistent) = (reads per second) × ceil(item size in KB / 4)
For eventually consistent reads, divide this by 2 (rounded up).
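The same RCU calculation, sketched in code (function names are illustrative):

```python
import math

def rcu_strong(item_size_kb, reads_per_second):
    # One RCU = 1 strongly consistent read/second of an item up to 4 KB;
    # item size is rounded up to the next multiple of 4 KB.
    return math.ceil(item_size_kb / 4) * reads_per_second

def rcu_eventual(item_size_kb, reads_per_second):
    # Eventually consistent reads need half the RCUs (rounded up).
    return math.ceil(rcu_strong(item_size_kb, reads_per_second) / 2)

# e.g. 10 strongly consistent reads/second of 6 KB items:
# ceil(6/4) = 2 -> 2 RCU per read -> 20 RCU total
# The same workload with eventually consistent reads needs 10 RCU.
```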
Reads and Writes automatically scales up/down with your workloads.
No capacity planning needed and hence no throttling.
Pay only for the capacity you use.
Around 2.5 times more expensive than provisioned mode.
Charged based on Read Request Units (RRU) and Write Request Units (WRU).
Use cases include unknown workloads, unpredictable application traffic, etc.
One can switch between the two modes once every 24 hours.
Data is stored in partitions.
Based on the partition key sent by the application, the partition to write to is selected.
The partition key is given as input to a hashing algorithm; the resulting hash is then used to determine the partition.
The following formulas give the number of partitions,
No. of partitions by capacity = (RCU_Total / 3000) + (WCU_Total / 1000)
No. of partitions by size = Total data size / 10 GB
No. of partitions = ceil(max(No. of partitions by capacity, No. of partitions by size))
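A small sketch of this partition-count formula (the function name is illustrative):

```python
import math

def num_partitions(total_rcu, total_wcu, total_size_gb):
    # A partition can sustain up to 3000 RCU, 1000 WCU, and 10 GB of data.
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# e.g. 6000 RCU + 2000 WCU over 50 GB:
# by capacity = 2 + 2 = 4, by size = 5 -> 5 partitions
```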
RCU and WCU are spread evenly across partitions.
To solve the Hot Partition issue, which occurs when data is not evenly distributed because the partition key has too few distinct values, one can add a suffix to the partition key value to get better distribution.
There are two methods to create the suffix.
Sharding using random suffix
Sharding using calculated suffix.
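The two suffix strategies can be sketched like this (shard count, key names, and the hash choice are all illustrative assumptions):

```python
import random

N_SHARDS = 10  # assumed shard count

def random_shard_key(partition_key):
    # Random suffix: spreads writes evenly, but a read must
    # query all N_SHARDS keys and merge the results.
    return f"{partition_key}#{random.randrange(N_SHARDS)}"

def calculated_shard_key(partition_key, attribute_value):
    # Calculated suffix: derived deterministically from a known
    # attribute, so the exact shard can be recomputed at read time.
    # (A simple byte sum stands in for a real hash function here.)
    return f"{partition_key}#{sum(attribute_value.encode()) % N_SHARDS}"
```

For example, a date-based partition key like `2024-01-01` could be sharded as `2024-01-01#3` by hashing the order ID, making both writes and targeted reads spread across partitions.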
If an application exceeds the provisioned WCU or RCU at the partition level, it will receive a ProvisionedThroughputExceededException.
Reasons could be one of the following,
Hot Keys: too many reads on a single key.
Hot Partitions
Very large items, as RCU and WCU depend on size of items.
To solve the above problems one could,
Use exponential backoff (included in the SDKs).
Distribute Partition keys as much as possible.
If RCU is being throttled due to a Hot Key issue, use DynamoDB Accelerator (DAX).
Deletes items automatically after an expiry timestamp.
It doesn't consume any WCU.
The TTL attribute must be of the Number data type, holding a Unix Epoch Timestamp.
Expired items are deleted within 48 hours of expiration.
If expired items appear in read results, filter them out.
Expired items are deleted from any indexes (LSI or GSI) as well.
A delete operation for each expired item enters DynamoDB Streams, which can be used to recover expired items.
Use cases include reducing stored data by keeping only current items, adhering to regulatory obligations, etc.
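A minimal sketch of working with TTL, assuming a hypothetical TTL attribute named `expire_at` (the attribute name is whatever you configure on the table):

```python
import time

def item_with_ttl(pk, days_to_live):
    # The TTL attribute must be a Number holding a Unix epoch
    # timestamp in seconds.
    expire_at = int(time.time()) + days_to_live * 24 * 60 * 60
    return {"pk": pk, "expire_at": expire_at}

def is_live(item, now=None):
    # Since deletion can lag expiration by up to 48 hours, reads
    # should filter out items whose TTL has already passed.
    return item["expire_at"] > (now or time.time())
```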
--projection-expression
One or more attributes to obtain as output
--filter-expression
Filter items before being returned
--page-size
Retrieves the list of items part by part, with the specified page size (by default 1000 items). Behind the scenes the calls are made page by page and a single combined result is returned.
--max-items
Maximum number of items to return in the CLI output. If more results exist, it returns a NextToken.
--starting-token
To specify the NextToken from a previous call, to retrieve the next set of items.
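The NextToken pagination loop behind these flags can be sketched generically (the function and the `fetch_page` callable are illustrative, standing in for repeated paginated calls):

```python
def scan_all(fetch_page, starting_token=None):
    """Collect items across pages using NextToken-style pagination.

    `fetch_page(token)` stands in for one paginated call (e.g. one
    `aws dynamodb scan --starting-token ...` invocation) and returns
    (items, next_token), with next_token None on the last page.
    """
    items, token = [], starting_token
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        if token is None:
            return items
```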
It is a serverless alternative to ElastiCache for storing session state.
ElastiCache is in-memory.
Both are key/value stores.
EFS as a network drive is a great choice for saving to disk.
Note that EBS and Instance Store can only be used for local caching, not shared caching.
S3 is not suitable, as it has higher latency and is not meant for small objects.
There are different types of writes
Conditional Writes
Concurrent Writes
Atomic Writes
Batch Writes
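Conditional writes are the most commonly tested of these; their semantics can be sketched with a tiny in-memory stand-in (this mimics a `ConditionExpression` like `attribute_not_exists(pk)` in the DynamoDB API, it is not the API itself):

```python
class ConditionalWriteFailed(Exception):
    """Raised when the write's condition is not met,
    analogous to ConditionalCheckFailedException."""

def conditional_put(table, key, item, expect_absent=True):
    # The put succeeds only if the key does not already exist,
    # which makes concurrent "create if missing" writes safe:
    # exactly one writer wins, the others get an error.
    if expect_absent and key in table:
        raise ConditionalWriteFailed(key)
    table[key] = item
```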
To let users directly access DynamoDB, rather than creating IAM users, use an Identity Provider like Google (which behind the scenes uses the OpenID Connect protocol) to exchange an identity token for temporary AWS credentials.
Use the temporary AWS credentials with a restricted IAM role that uses conditions.
This setup can limit access to items and attributes in DynamoDB based on the user's identity.
VPC Endpoints are available to access DynamoDB without using the Internet.
Access fully controlled by IAM.
Encryption at rest using AWS KMS and in transit using SSL/TLS.
Point-in-time Recovery, like in RDS, with no performance impact.
On-demand backup and restore.
These are multi-region, fully replicated, high-performance DynamoDB tables.
This replication is done using DynamoDB Streams.
This allows running DynamoDB on a local machine, so applications can be developed and tested without Internet access.
To migrate data to and from DynamoDB, use AWS DMS.
It supports many databases as source or destination, such as MongoDB, Oracle, S3, MySQL, etc.
A sample policy would look like the one below,
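A sketch of such a fine-grained policy, assuming a hypothetical table named `MyTable` and Google as the identity provider; the `dynamodb:LeadingKeys` condition key restricts access to items whose partition key equals the federated user's ID:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToOwnItemsOnly",
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:Query",
        "dynamodb:PutItem"
      ],
      "Resource": ["arn:aws:dynamodb:*:*:table/MyTable"],
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${accounts.google.com:sub}"]
        }
      }
    }
  ]
}
```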