DynamoDB
It's a NoSQL (non-relational), distributed database.
NoSQL databases typically have no support, or only limited support, for SQL-style query joins.
They don't perform aggregations such as SUM, AVG, etc.
They scale horizontally.
DynamoDB is a NoSQL distributed database.
Fully managed database with replication across multiple AZs.
Scales easily and can handle millions of requests per second, trillions of rows, and 100 TB of storage.
Fast and consistent in performance.
It is integrated with IAM for security, authorization and administration.
Can do event-driven programming with DynamoDB Streams.
Low cost, with Standard and Infrequent Access (IA) table classes.
DynamoDB is made of tables.
Each table has a Primary Key, which must be specified at creation time.
Each table can have an infinite number of items.
Each item has attributes.
Maximum size of an item is 400 KB.
Data Types supported are
Scalar types: String, Number, Binary, Null, Boolean
Document types: List, Map
Set types: String Set, Number Set, Binary Set.
Partition Key (HASH)
The partition key must be unique for each item.
It must be diverse, so that data is distributed evenly.
Partition Key and Sort Key (HASH and RANGE)
The combination must be unique.
Data is grouped by partition key.
There are two modes
Provisioned
On-Demand
It's the default mode.
Users must specify Read Capacity Units (RCU) and Write Capacity Units (WCU).
Users need to plan capacity beforehand, though they can set up auto-scaling of throughput to meet demand.
Pay for provisioned read and write capacity units.
Throughput above the provisioned RCU and WCU can temporarily be exceeded using Burst Capacity.
Once this capacity is exhausted, you will see a ProvisionedThroughputExceededException.
One can use an exponential-backoff retry mechanism to recover from such failures.
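The exponential-backoff retry just described can be sketched as follows (the function name and parameters are illustrative, not part of any AWS SDK):

```python
import random
import time

def with_backoff(operation, max_retries=5, base_delay=0.05):
    """Retry `operation` with exponential backoff and jitter.

    `operation` is any callable that may raise a retryable error,
    e.g. a ProvisionedThroughputExceededException from an AWS SDK call.
    """
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Sleep base_delay * 2^attempt, plus random jitter to
            # avoid all clients retrying at the same instant.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Note that the AWS SDKs (e.g. boto3) already implement this kind of retry behaviour by default, so hand-rolling it is rarely necessary.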
Write Capacity Unit (WCU)
Represents one write per second for an item up to 1 KB in size.
If items are larger than 1 KB, more WCUs are consumed (the item size is rounded up to the next whole KB).
Formula to calculate WCU is,
WCU = (writes per second) × ceil(item size in KB)
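As a quick sketch, the WCU calculation can be expressed in code (the function name is illustrative):

```python
import math

def wcu(item_size_kb, writes_per_second):
    # One WCU = one write/second of an item up to 1 KB;
    # larger items are rounded up to the next whole KB.
    return math.ceil(item_size_kb) * writes_per_second

# e.g. 10 writes/second of 1.5 KB items:
# ceil(1.5) = 2 KB -> 2 WCU per write -> 20 WCU total
```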
Read Capacity Unit (RCU)
There are two types of reads
Strongly Consistent Read
Set ConsistentRead parameter to be true in API calls.
Consumes twice the RCUs of an eventually consistent read.
Eventually Consistent Read
It is the default reading strategy.
May return stale data if a read happens just after a write.
Represents 1 Strongly Consistent Read per second or 2 Eventually Consistent Reads per second, for an item up to 4 KB in size.
If the item size is not a multiple of 4 KB, round it up to the next multiple of 4 KB.
If item is larger than 4 KB, more RCUs are consumed.
Formula to calculate RCU is,
RCU (strongly consistent) = (reads per second) × ceil(item size in KB / 4)
For eventually consistent reads, divide this by 2 (rounded up).
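The same RCU calculation, sketched in code (function names are illustrative):

```python
import math

def rcu_strong(item_size_kb, reads_per_second):
    # One RCU = 1 strongly consistent read/second of an item up to 4 KB;
    # item size is rounded up to the next multiple of 4 KB.
    return math.ceil(item_size_kb / 4) * reads_per_second

def rcu_eventual(item_size_kb, reads_per_second):
    # Eventually consistent reads need half the RCUs (rounded up).
    return math.ceil(rcu_strong(item_size_kb, reads_per_second) / 2)

# e.g. 10 strongly consistent reads/second of 6 KB items:
# ceil(6/4) = 2 -> 2 RCU per read -> 20 RCU total
# The same workload with eventually consistent reads needs 10 RCU.
```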
Reads and Writes automatically scales up/down with your workloads.
No capacity planning needed and hence no throttling.
Pay only for the capacity you use.
Around 2.5 times more expensive than provisioned mode.
Charged based on Read Request Units (RRU) and Write Request Units (WRU).
Use cases include unknown workloads, unpredictable application traffic, etc.
One can switch between the two modes once every 24 hours.
Data is stored in partitions.
Based on the partition key sent by the application, the partition to write to is selected.
The partition key is given as input to a hashing algorithm; the resulting hash is then used to determine the partition.
The following formulas give the number of partitions,
No. of partitions by capacity = (RCU_Total / 3000) + (WCU_Total / 1000)
No. of partitions by size = Total data size / 10 GB
No. of partitions = ceil(max(No. of partitions by capacity, No. of partitions by size))
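A small sketch of this partition-count formula (the function name is illustrative):

```python
import math

def num_partitions(total_rcu, total_wcu, total_size_gb):
    # A partition can sustain up to 3000 RCU, 1000 WCU, and 10 GB of data.
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# e.g. 6000 RCU + 2000 WCU over 50 GB:
# by capacity = 2 + 2 = 4, by size = 5 -> 5 partitions
```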
RCU and WCU are spread evenly across partitions.
To solve the Hot Partition issue, which occurs when data is not evenly distributed because the partition key has too few distinct values, one can add a suffix to the partition key value to get better distribution.
There are two methods to create the suffix.
Sharding using random suffix
Sharding using calculated suffix.
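The two suffix strategies can be sketched like this (shard count, key names, and the hash choice are all illustrative assumptions):

```python
import random

N_SHARDS = 10  # assumed shard count

def random_shard_key(partition_key):
    # Random suffix: spreads writes evenly, but a read must
    # query all N_SHARDS keys and merge the results.
    return f"{partition_key}#{random.randrange(N_SHARDS)}"

def calculated_shard_key(partition_key, attribute_value):
    # Calculated suffix: derived deterministically from a known
    # attribute, so the exact shard can be recomputed at read time.
    # (A simple byte sum stands in for a real hash function here.)
    return f"{partition_key}#{sum(attribute_value.encode()) % N_SHARDS}"
```

For example, a date-based partition key like `2024-01-01` could be sharded as `2024-01-01#3` by hashing the order ID, making both writes and targeted reads spread across partitions.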
If an application exceeds the provisioned WCU or RCU at the partition level, it will receive a ProvisionedThroughputExceededException.
Reasons could be one of the following,
Hot Keys: too many reads on a single key.
Hot Partitions
Very large items, as RCU and WCU depend on size of items.
To solve the above problems one could,
Use exponential backoff (included in the SDKs).
Distribute Partition keys as much as possible.
If RCU is being throttled due to a Hot Key issue, use DynamoDB Accelerator (DAX).
Deletes items automatically after an expiry timestamp.
It doesn't consume any WCU.
The TTL attribute must be of the Number data type, holding a Unix Epoch Timestamp.
Expired items are deleted within 48 hours of expiration.
If expired items appear in read results, filter them out.
Expired items are deleted from any indexes (LSI or GSI) as well.
A delete operation for each expired item enters DynamoDB Streams, which can be used to recover expired items.
Use cases include reducing stored data by keeping only current items, adhering to regulatory obligations, etc.
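A minimal sketch of working with TTL, assuming a hypothetical TTL attribute named `expire_at` (the attribute name is whatever you configure on the table):

```python
import time

def item_with_ttl(pk, days_to_live):
    # The TTL attribute must be a Number holding a Unix epoch
    # timestamp in seconds.
    expire_at = int(time.time()) + days_to_live * 24 * 60 * 60
    return {"pk": pk, "expire_at": expire_at}

def is_live(item, now=None):
    # Since deletion can lag expiration by up to 48 hours, reads
    # should filter out items whose TTL has already passed.
    return item["expire_at"] > (now or time.time())
```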
--projection-expression
One or more attributes to obtain as output
--filter-expression
Filter items before being returned
--page-size
Retrieves the list of items part by part, with the specified page size (by default 1000 items). Behind the scenes the calls are made page by page and a single combined result is returned.
--max-items
Maximum number of items to return in the CLI output. If more results exist, it returns a NextToken.
--starting-token
To specify the NextToken from a previous call, to retrieve the next set of items.
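The NextToken pagination loop behind these flags can be sketched generically (the function and the `fetch_page` callable are illustrative, standing in for repeated paginated calls):

```python
def scan_all(fetch_page, starting_token=None):
    """Collect items across pages using NextToken-style pagination.

    `fetch_page(token)` stands in for one paginated call (e.g. one
    `aws dynamodb scan --starting-token ...` invocation) and returns
    (items, next_token), with next_token None on the last page.
    """
    items, token = [], starting_token
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        if token is None:
            return items
```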
It is a serverless alternative to ElastiCache for storing session state.
ElastiCache is in-memory.
Both are key/value stores.
EFS as a network drive is a great choice for saving to disk.
Note that EBS and Instance Store can only be used for local caching, not shared caching.
S3 is not suitable, as it has higher latency and is not meant for small objects.
There are different types of writes
Conditional Writes
Concurrent Writes
Atomic Writes
Batch Writes
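Conditional writes are the most commonly tested of these; their semantics can be sketched with a tiny in-memory stand-in (this mimics a `ConditionExpression` like `attribute_not_exists(pk)` in the DynamoDB API, it is not the API itself):

```python
class ConditionalWriteFailed(Exception):
    """Raised when the write's condition is not met,
    analogous to ConditionalCheckFailedException."""

def conditional_put(table, key, item, expect_absent=True):
    # The put succeeds only if the key does not already exist,
    # which makes concurrent "create if missing" writes safe:
    # exactly one writer wins, the others get an error.
    if expect_absent and key in table:
        raise ConditionalWriteFailed(key)
    table[key] = item
```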
To let users directly access DynamoDB, rather than creating IAM users, use an Identity Provider like Google (which behind the scenes uses the OpenID Connect protocol) to exchange an identity token for temporary AWS credentials.
Use the temporary AWS credentials with a restricted IAM role that uses conditions.
This setup can limit access to items and attributes in DynamoDB based on the user's identity.
VPC Endpoints are available to access DynamoDB without using the Internet.
Access fully controlled by IAM.
Encryption at rest using AWS KMS and in transit using SSL/TLS.
Point-in-time Recovery, like in RDS, with no performance impact.
On-demand backup and restore.
These are multi-region, fully replicated, high-performance DynamoDB tables.
This replication is done using DynamoDB Streams.
This allows running DynamoDB on a local machine, so applications can be developed and tested without Internet access.
To migrate data to and from DynamoDB, use AWS DMS.
It supports many databases as source or destination, such as MongoDB, Oracle, S3, MySQL, etc.
A sample policy would look like the one below,
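A sketch of such a fine-grained policy, assuming a hypothetical table named `MyTable` and Google as the identity provider; the `dynamodb:LeadingKeys` condition key restricts access to items whose partition key equals the federated user's ID:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToOwnItemsOnly",
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:Query",
        "dynamodb:PutItem"
      ],
      "Resource": ["arn:aws:dynamodb:*:*:table/MyTable"],
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${accounts.google.com:sub}"]
        }
      }
    }
  ]
}
```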