DynamoDB
Brief about NoSQL
DynamoDB is a NoSQL (non-relational), distributed database.
NoSQL databases typically don't support query joins, or support them only to a limited extent.
They generally don't perform aggregations such as SUM, AVG, etc.
They scale horizontally.
About
NoSQL distributed database.
Fully managed database with replication across multiple AZs.
Scales easily and can handle millions of requests per second, trillions of rows, and hundreds of TB of storage.
Fast and consistent in performance.
It is integrated with IAM for security, authorization and administration.
Does not support SQL-style constraints such as NOT NULL, UNIQUE, or FOREIGN KEY; only the primary key identifies an item uniquely.
Supports event-driven programming with DynamoDB Streams.
Low cost, with Standard and Standard-Infrequent Access (Standard-IA) table classes.
Concepts
DynamoDB is made of tables.
Each table has a primary key, which must be specified at creation time.
Each table can have an unlimited number of items.
Each item has attributes, which can be added over time and can be null.
A table can use one of two table classes:
Standard class - frequently accessed data
Standard-IA class - infrequently accessed data
The maximum size of an item is 400 KB.
Supported data types:
Scalar types: String, Number, Binary, Null, Boolean
Document types: List, Map
Set types: String Set, Number Set, Binary Set.
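DynamoDB's low-level API tags every value with a type descriptor (S, N, B, NULL, BOOL, L, M, SS, NS, BS). These descriptors match the types listed above. The hypothetical `to_attribute_value` helper below is a minimal sketch of that mapping in the JSON wire form, where binary values are base64-encoded. It is not the AWS SDK's serializer. Python floats are deliberately rejected to avoid precision loss.

```python
import base64
from decimal import Decimal


def to_attribute_value(value):
    """Hypothetical helper: map a Python value to DynamoDB's low-level type descriptors."""
    if value is None:
        return {"NULL": True}
    # bool must be checked before numbers, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        # Numbers travel as strings to preserve precision.
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        # Binary values are base64-encoded in the JSON wire format.
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        # Sorting only makes the output deterministic; DynamoDB sets are unordered.
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(base64.b64encode(v).decode("ascii") for v in value)}
        raise TypeError("set elements must all be strings, numbers, or bytes")
    raise TypeError(f"unsupported type: {type(value).__name__}")
```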
Choosing Primary Keys
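Partition Key (HASH)
The partition key must be unique for each item.
It must be diverse (high cardinality) so that data is distributed evenly across partitions.
Partition Key and Sort Key (HASH and RANGE)
The combination must be unique for each item.
Data is grouped by partition key, and items within a partition are ordered by sort key.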
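Here is a minimal in-memory model of how a composite primary key behaves. The `ToyTable` class and its methods are hypothetical and are not the DynamoDB API. Items are addressed by the (partition key, sort key) pair, and writing the same pair again replaces the item. A query on one partition key returns that partition's items in sort-key order.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of a table with a partition key + sort key."""

    def __init__(self):
        # partition key -> {sort key -> item}
        self._partitions = defaultdict(dict)

    def put_item(self, pk, sk, attrs):
        # The (pk, sk) pair identifies the item; writing it again replaces the item.
        self._partitions[pk][sk] = {"pk": pk, "sk": sk, **attrs}

    def query(self, pk):
        # Items sharing a partition key are grouped together and
        # returned in ascending sort-key order.
        items = self._partitions.get(pk, {})
        return [items[sk] for sk in sorted(items)]
```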
Read/Write Capacity Modes
There are two modes:
Provisioned
On-Demand
Provisioned
You specify Read Capacity Units (RCU) and Write Capacity Units (WCU).
You need to plan capacity beforehand, though you can set up auto scaling of throughput to meet demand.
You pay for the provisioned read and write capacity units.
Throughput can temporarily exceed the provisioned RCU and WCU by using burst capacity.
Once burst capacity is also used up, requests fail with ProvisionedThroughputExceededException. This exception is typically caused by hot keys, hot partitions, or very large items. To recover, retry with exponential backoff (already built into the AWS SDKs), choose a better-distributed partition key, or, for read-heavy workloads, use DynamoDB Accelerator (DAX).
Write Capacity Unit (WCU)
One WCU represents one write per second for an item up to 1 KB in size.
If items are larger than 1 KB, more WCUs are consumed; the item size is rounded up to the next whole KB.
Formula to calculate WCU:
WCU = Writes per second × ceil(Item size in KB / 1 KB)
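The WCU rule (one WCU per 1 KB per write per second, with the size rounded up to a whole KB) can be written as a small Python function. `write_capacity_units` is a hypothetical helper name. Rates given per minute must first be converted to per second.

```python
import math


def write_capacity_units(writes_per_second, item_size_kb):
    """Provisioned WCU for a steady write rate (hypothetical helper)."""
    # Each write costs one WCU per 1 KB, with the size rounded up to a whole KB.
    units_per_write = max(1, math.ceil(item_size_kb))
    # Round the total up, in case the write rate is fractional.
    return math.ceil(writes_per_second * units_per_write)
```

For example, 10 writes per second of 2 KB items need 20 WCU. 6 writes per second of 4.5 KB items need 6 × 5 = 30 WCU.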
Read Capacity Unit (RCU)
There are two types of reads:
Strongly Consistent Read
Set the ConsistentRead parameter to true in API calls (e.g., GetItem, Query, Scan).
Consumes twice as many RCUs as an eventually consistent read.
Eventually Consistent Read
This is the default read behavior.
May return stale data if the read happens just after a write.
One RCU represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
If the item size is not a multiple of 4 KB, round it up to the next multiple of 4 KB.
If the item is larger than 4 KB, more RCUs are consumed.
Formula to calculate RCU:
Strongly consistent: RCU = Reads per second × ceil(Item size in KB / 4 KB)
Eventually consistent: RCU = (Reads per second / 2) × ceil(Item size in KB / 4 KB), rounded up
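This sketch applies the RCU rule: each read is billed in 4 KB units, and one RCU covers either one strongly consistent read or two eventually consistent reads per second. `read_capacity_units` is a hypothetical helper name.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """Provisioned RCU for a steady read rate (hypothetical helper)."""
    # Round the item size up to the next multiple of 4 KB.
    units_per_read = max(1, math.ceil(item_size_kb / 4))
    rcu = reads_per_second * units_per_read
    if not strongly_consistent:
        # One RCU covers two eventually consistent reads per second.
        rcu = rcu / 2
    return math.ceil(rcu)
```

For example, 10 strongly consistent reads per second of 6 KB items need 10 × 2 = 20 RCU. 16 eventually consistent reads per second of 12 KB items need 16 / 2 × 3 = 24 RCU.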
On-Demand
It is the default mode.
Reads and writes automatically scale up and down with your workload.
No capacity planning is needed, so throttling from under-provisioned capacity is largely avoided.
You pay only for the capacity you use.
It is about 2.5 times more expensive than provisioned mode.
Charged based on Read Request Units (RRU) and Write Request Units (WRU).
Use cases include unknown workloads, unpredictable application traffic, etc.
You can switch between provisioned and on-demand modes once every 24 hours.
Internal Partitions
Data is stored in partitions.
The partition to write to is selected based on the partition key sent by the application.
The partition key value is given as input to an internal hash function, and the resulting hash determines the partition.
The number of partitions can be estimated as follows:
No. of partitions by capacity = (RCU_total / 3000) + (WCU_total / 1000)
No. of partitions by size = Total data size / 10 GB
No. of partitions = ceil(max(No. of partitions by capacity, No. of partitions by size))
RCU and WCU are spread evenly across partitions.
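The partition formulas in this section can be turned into a quick estimator. This sketch uses the figures of 3000 RCU, 1000 WCU and 10 GB per partition. DynamoDB manages partitions internally, so treat the result as an estimate only. Both function names are hypothetical.

```python
import math


def estimate_partitions(rcu_total, wcu_total, total_size_gb):
    """Estimate the partition count using 3000 RCU, 1000 WCU and 10 GB per partition."""
    by_capacity = rcu_total / 3000 + wcu_total / 1000
    by_size = total_size_gb / 10
    # A table always has at least one partition.
    return max(1, math.ceil(max(by_capacity, by_size)))


def per_partition_throughput(rcu_total, wcu_total, total_size_gb):
    """RCU and WCU are assumed to be spread evenly across the partitions."""
    partitions = estimate_partitions(rcu_total, wcu_total, total_size_gb)
    return rcu_total / partitions, wcu_total / partitions
```

For example, 4000 RCU, 1500 WCU and 25 GB give max(1.33 + 1.5, 2.5) = 2.83, which rounds up to 3 partitions.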
Write Sharding
To solve the hot partition issue, which occurs when data is not evenly distributed because the partition key has too few distinct values, you can add a suffix or prefix to the partition key value to get a better distribution. There are two ways to create the prefix or suffix:
Sharding with a random suffix.
Sharding with a calculated suffix, using a hashing algorithm.
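The two suffix strategies can be sketched as follows. `SHARD_COUNT`, the `#` separator, and the choice of SHA-256 are assumptions made for illustration. A random suffix spreads writes most evenly, but a reader must then query every shard. A calculated suffix lets a reader who knows the hashed attribute (here a hypothetical `item_id`) go straight to the right shard.

```python
import hashlib
import random

SHARD_COUNT = 10  # hypothetical number of shards per logical partition key


def random_suffix_key(base_key):
    """Random suffix: spreads writes evenly, but reads must query every shard."""
    return f"{base_key}#{random.randint(0, SHARD_COUNT - 1)}"


def calculated_suffix_key(base_key, item_id):
    """Calculated suffix: hash a known attribute so the shard can be recomputed on read."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % SHARD_COUNT
    return f"{base_key}#{shard}"


def all_shard_keys(base_key):
    """Every sharded key to query (e.g., in parallel) to read back all data for base_key."""
    return [f"{base_key}#{i}" for i in range(SHARD_COUNT)]
```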
Throttling
If an application exceeds the provisioned WCU or RCU at the partition level, it receives ProvisionedThroughputExceededException.
Possible reasons include:
Hot keys: one partition key is read too often.
Hot partitions: one partition receives a disproportionate share of the traffic.
Very large items, as RCU and WCU consumption depends on item size.
To solve these problems, you can:
Use exponential backoff (included in the SDK).
Distribute partition keys as much as possible.
If reads are throttled due to hot keys, use DynamoDB Accelerator (DAX).
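The AWS SDKs already retry throttled requests, but the idea behind exponential backoff is easy to sketch. `ThrottledError` is a hypothetical stand-in for ProvisionedThroughputExceededException, and the delays and attempt limit are arbitrary assumptions. Each retry waits a random time below a cap that doubles on every attempt ("full jitter"), so many throttled clients don't retry in lockstep.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, max_delay=2.0,
                      sleep=time.sleep):
    """Retry operation() on throttling, with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # The cap grows as base_delay * 2^attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Waiting a random time below the cap spreads out competing retries.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable so the retry logic can be tested without actually waiting.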
TTL
Automatically deletes items after an expiry timestamp.
It doesn't consume any WCU.
The TTL attribute must be of the Number data type and hold a Unix epoch timestamp (in seconds).
The name of the TTL attribute, which holds the expiration time, must be configured at the table level.
Expired items are deleted within 48 hours of expiration.
An expiration process scans the table and expires items.
A deletion process scans for expired items and deletes them.
Expired items that have not been deleted yet can still appear in read, query, and scan results; filter them out if they are not needed.
Expired items are also deleted from any indexes (LSI or GSI) that contain them.
A delete operation for each expired item enters DynamoDB Streams, which can be used to recover expired items.
Use cases include reducing stored data by keeping only current items and adhering to regulatory obligations.
CloudWatch metrics offer a graph view of the number of deleted items.
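A TTL value is simply a Number holding epoch seconds, and items that have expired but are not yet deleted can be filtered out on the client. In this sketch the attribute name `expires_at` is a hypothetical choice, and it must match whatever name is configured on the table. `now` can be passed explicitly so the functions are deterministic.

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical name; must match the table's TTL setting


def ttl_in(seconds_from_now, now=None):
    """Return a TTL value: a Unix epoch timestamp in seconds, stored as a Number."""
    now = time.time() if now is None else now
    return int(now + seconds_from_now)


def drop_expired(items, now=None):
    """Client-side filter for expired items that TTL has not deleted yet."""
    now = time.time() if now is None else now
    # Items without the TTL attribute never expire.
    return [
        item for item in items
        if TTL_ATTRIBUTE not in item or item[TTL_ATTRIBUTE] > now
    ]
```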
CLI Options
--projection-expression
One or more attributes to retrieve.
--filter-expression
Filters items before they are returned to you. The filter is applied after items are read, so it does not reduce consumed RCU.
--page-size
The CLI still retrieves the full list of items, but behind the scenes it makes multiple API calls, each fetching the specified page size (default 1000 items), and combines them into a single result. This helps avoid timeouts.
--max-items
Maximum number of items to show in the output. If more items remain, a NextToken is returned; if there are no more items, no NextToken is returned.
--starting-token
Specify the NextToken from a previous call to retrieve the next set of items.
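The `--max-items` / `--starting-token` loop can be sketched in plain Python. `fetch_page` is a hypothetical stand-in for one CLI call over an in-memory list. Real NextToken values are opaque strings, not the list offsets used here.

```python
ITEMS = [f"item-{i}" for i in range(7)]  # hypothetical data standing in for a table


def fetch_page(max_items, starting_token=None):
    """Hypothetical stand-in for one call with --max-items / --starting-token."""
    start = int(starting_token) if starting_token else 0
    page = ITEMS[start:start + max_items]
    end = start + len(page)
    # A NextToken is returned only while more items remain.
    next_token = str(end) if end < len(ITEMS) else None
    return page, next_token


def fetch_all(max_items):
    """Follow NextToken until it is absent, collecting every item."""
    results, token = [], None
    while True:
        page, token = fetch_page(max_items, token)
        results.extend(page)
        if token is None:
            return results
```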
Examples
Session State Cache
DynamoDB is a serverless alternative to ElastiCache for storing session state. ElastiCache is in-memory; both are key/value stores.
EFS, as a network drive, is a good choice for saving session state to disk.
Note that EBS and Instance Store can only be used for local caching, not shared caching.
S3 is not suitable, as it has higher latency and is not meant for small objects.
Write Types
There are different types of writes:
Conditional Writes: write only if the condition expression evaluates to true.
Concurrent Writes: when two writes target the same item, the second write overwrites the first.
Atomic Writes: writes happen atomically, i.e., they either completely succeed or fail.
Batch Writes: write many items at a time.
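This toy in-memory store contrasts concurrent and conditional writes. All names are hypothetical, except that ConditionalCheckFailedException is the error DynamoDB returns when a condition fails. The conditional path implements optimistic locking with a version attribute. DynamoDB evaluates the condition and the write atomically on the server, whereas this toy is not thread-safe.

```python
class ConditionalCheckFailed(Exception):
    """Hypothetical stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyStore:
    """Hypothetical in-memory store contrasting plain and conditional writes."""

    def __init__(self):
        self.items = {}

    def put(self, key, item):
        # Plain (concurrent) write: the last writer wins.
        self.items[key] = item

    def put_if_version(self, key, item, expected_version):
        # Conditional write: succeed only if the stored version matches (optimistic locking).
        current = self.items.get(key, {}).get("version", 0)
        if current != expected_version:
            raise ConditionalCheckFailed(
                f"expected version {expected_version}, found {current}")
        self.items[key] = {**item, "version": expected_version + 1}
```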
Fine Grained Access Control - Direct Client Access
To let clients access DynamoDB directly, rather than creating IAM users for each of them, use an identity provider such as Google (which uses the OpenID Connect protocol behind the scenes) and exchange its token for temporary AWS credentials.
The temporary AWS credentials are tied to a restricted IAM role whose policy uses conditions.
This setup can limit access to items and attributes in DynamoDB on a per-user basis.
The restriction is expressed through condition keys in the role's policy:
dynamodb:LeadingKeys limits access at the row level: the user can only access items whose partition key matches their own user ID.
Similarly, dynamodb:Attributes limits access to specific attributes of the table.
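Below is a sketch of such a role policy, built as a Python dict and printed as JSON. The region, account ID, table name, attribute names, and read-only action list are placeholders. `${accounts.google.com:sub}` is the IAM policy variable for a Google-federated user's ID. Check the result against the IAM documentation for DynamoDB fine-grained access control before using it.

```python
import json

# Placeholders: region, account ID and table name are hypothetical.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/UserData"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:BatchGetItem", "dynamodb:Query"],
            "Resource": TABLE_ARN,
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Row level: the partition key must equal the caller's Google user ID.
                    "dynamodb:LeadingKeys": ["${accounts.google.com:sub}"],
                    # Attribute level: only these (hypothetical) attributes may be read.
                    "dynamodb:Attributes": ["UserId", "Score", "UpdatedAt"],
                },
                # Requests must ask for specific attributes rather than whole items.
                "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```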
Security
VPC endpoints allow access to DynamoDB without going through the internet.
Access is fully controlled by IAM.
Encryption at rest using AWS KMS and in transit using SSL/TLS.
Backup and Restore
Point-in-time recovery (PITR), like RDS, with no performance impact.
On-demand backup and restore.
Global Tables
Multi-region, fully replicated, high-performance DynamoDB tables.
Replication is done using DynamoDB Streams.
DynamoDB Local
Runs DynamoDB on your local machine.
Lets you develop and test applications that use DynamoDB without an internet connection.
Migrations
To migrate data to and from DynamoDB, use AWS Database Migration Service (DMS).
It supports various sources and targets, such as MongoDB, Oracle, MySQL, and S3.