

Database design in DynamoDB: Bookmark storage


I'm interested in best practices for setting up tables and indices for a given set of query requirements. I have a basic understanding of the related concepts, such as partition and sort keys or LSI and GSI secondary indices, but I have trouble putting it all together and designing one or more tables with indices that support a concrete example.
The example I'm looking at is a "Bookmark storage", where multiple users can store bookmarks to URLs and annotate them with a number of tags. A User has multiple Urls (= bookmarks). Each Url has a date and can have one or more Tags.
A bookmark might have the following basic structure:
{
  "user": "watQuadrat",
  "url": "http://stackoverflow.com",
  "date": 1494161436362,
  "tags": [ "forum", "programming" ]
}
My biggest question at this point is how to set up the table structure so that I can accommodate the various ways the data could be queried, e.g.:
List all Tags for a User, sorted by how often the user used a tag
List all Tags for a User, sorted alphabetically
List all Tags for an Url, sorted by how often this tag was given for the url
List all Tags matching a given search string, sorted by how often the tag was used (e.g. search for "shop", return all tags that match such as "shopping" order by how often they were used)
List all Urls for a User, sorted by date
List all Urls for a User and a Tag, sorted by date
List all Urls for a Tag, sorted by how often the tag was given to each url
List all Users for an Url, sorted by date
How would this be designed so that I can perform all of these queries in a performant way? Would you design this any differently when additionally trying to reduce cost?
Considering the scenario you have described, I would design the table as shown below. I have assumed that a user can create only one bookmark for a given URL. I have also introduced a derived attribute called TagCount, which holds the number of tags on that bookmark.
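As a rough illustration of the derived attribute, the bookmark from the question could be turned into a table item like this. This is only a sketch: the attribute name `Tags` and the helper `to_item` are assumptions made for the example, while `UserID`, `Url`, `Date` and `TagCount` follow the names used in this answer.

```python
def to_item(bookmark: dict) -> dict:
    """Map a bookmark (as shown in the question) to a table item.

    TagCount is derived from the tag list at write time, so it has to be
    recomputed whenever the tags of a bookmark change.
    """
    tags = list(bookmark.get("tags", []))
    return {
        "UserID": bookmark["user"],
        "Url": bookmark["url"],
        "Date": bookmark["date"],
        "Tags": tags,              # hypothetical attribute name
        "TagCount": len(tags),     # derived attribute
    }


item = to_item({
    "user": "watQuadrat",
    "url": "http://stackoverflow.com",
    "date": 1494161436362,
    "tags": ["forum", "programming"],
})
print(item["TagCount"])
```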
Table Structure
  Primary partition key : UserID
  Primary sort key : Url
Local Secondary Indexes
  Index 1
    Partition key : UserID
    Sort key : Date
  Index 2
    Partition key : UserID
    Sort key : TagCount
Global Secondary Indexes
  Index 1
    Partition key : Url
    Sort key : Date
  Index 2
    Partition key : Url
    Sort key : TagCount
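The structure above could be written down as the parameters of a `create_table` request, roughly as follows. This is a sketch under assumptions: the table name `Bookmarks`, the index names, the on-demand billing mode and the `ALL` projection are choices made for the example, not part of the design itself. The function only builds the parameter dictionary and does not call AWS.

```python
def bookmark_table_definition(table_name: str = "Bookmarks") -> dict:
    """Build create_table parameters for the bookmark table and its indexes."""

    def key_schema(partition_key: str, sort_key: str) -> list:
        return [
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": sort_key, "KeyType": "RANGE"},
        ]

    def index(name: str, partition_key: str, sort_key: str) -> dict:
        return {
            "IndexName": name,
            "KeySchema": key_schema(partition_key, sort_key),
            # Assumption: project every attribute into the index.
            "Projection": {"ProjectionType": "ALL"},
        }

    return {
        "TableName": table_name,
        # Assumption: on-demand billing, so no throughput settings are needed.
        "BillingMode": "PAY_PER_REQUEST",
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "UserID", "AttributeType": "S"},
            {"AttributeName": "Url", "AttributeType": "S"},
            {"AttributeName": "Date", "AttributeType": "N"},
            {"AttributeName": "TagCount", "AttributeType": "N"},
        ],
        "KeySchema": key_schema("UserID", "Url"),
        # LSIs share the table's partition key (UserID).
        "LocalSecondaryIndexes": [
            index("UserID-Date", "UserID", "Date"),
            index("UserID-TagCount", "UserID", "TagCount"),
        ],
        # GSIs use Url as their partition key.
        "GlobalSecondaryIndexes": [
            index("Url-Date", "Url", "Date"),
            index("Url-TagCount", "Url", "TagCount"),
        ],
    }


definition = bookmark_table_definition()
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("dynamodb").create_table(**definition)`. Note that local secondary indexes have to be defined when the table is created, whereas global secondary indexes can also be added later.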
With this design you can do your queries in the following manner.
List all Tags for a User, sorted by count
Query using LSI UserID-TagCount
List all Tags for an Url, sorted by count
Query using GSI Url-TagCount
List all Tags matching a given string, sorted by count
I assume the search string is matched against the URL. If so, you will have to perform a scan.
List all Urls for a User, sorted by date
Query using LSI UserID-Date
List all Urls for a User and a Tag, sorted by date
Query the LSI UserID-Date with a filter expression that matches the tag
List all Urls for a Tag, sorted by count
You will have to do a scan here
List all Users for an Url, sorted by date
Query GSI Url-Date
If you are concerned about cost, you can drop some of the GSIs, depending on the query patterns you expect.
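Two of the queries listed above might look roughly like this when expressed as parameters for the low-level `query` operation. The table name `Bookmarks`, the index names, the `Tags` attribute and both function names are assumptions made for this sketch. Placeholders such as `#uid` are used for attribute names because some names (for example `Date`) are reserved words in DynamoDB expressions.

```python
def urls_for_user_by_date(user_id, tag=None, newest_first=True,
                          table_name="Bookmarks"):
    """Query parameters: all Urls for a User (optionally one Tag), by date."""
    params = {
        "TableName": table_name,
        "IndexName": "UserID-Date",            # LSI, assumed name
        "KeyConditionExpression": "#uid = :uid",
        "ExpressionAttributeNames": {"#uid": "UserID"},
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
        # False returns items in descending order of the sort key (Date).
        "ScanIndexForward": not newest_first,
    }
    if tag is not None:
        # The filter runs after the items are read, so the query still
        # consumes read capacity for all of the user's bookmarks.
        params["FilterExpression"] = "contains(#tags, :tag)"
        params["ExpressionAttributeNames"]["#tags"] = "Tags"
        params["ExpressionAttributeValues"][":tag"] = {"S": tag}
    return params


def users_for_url_by_date(url, newest_first=True, table_name="Bookmarks"):
    """Query parameters: all Users for a Url, sorted by date."""
    return {
        "TableName": table_name,
        "IndexName": "Url-Date",               # GSI, assumed name
        "KeyConditionExpression": "#url = :url",
        "ExpressionAttributeNames": {"#url": "Url"},
        "ExpressionAttributeValues": {":url": {"S": url}},
        "ScanIndexForward": not newest_first,
    }


by_tag = urls_for_user_by_date("watQuadrat", tag="forum")
by_url = users_for_url_by_date("http://stackoverflow.com")
```

Either dictionary could be passed as `boto3.client("dynamodb").query(**by_tag)`. The tag filter is convenient but not cheap, which is one reason a separate tag table is worth considering.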
Update 1
Considering the updated requirement, since there are many queries based on the tag, I think there should be a second table with the following structure:
Primary partition key : TagName
Primary sort key : UserID
Global Secondary Index
Partition key : UserID
Sort key : Usage (a derived attribute similar to TagCount: the total usage of the tag)
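Since Usage is a derived attribute, it has to be maintained whenever a bookmark is stored. One possible way, sketched here under assumptions, is an atomic counter via the `ADD` action of `update_item`, followed by a query on the GSI for a user's most used tags. The table name `Tags`, the index name `UserID-Usage` and both function names are hypothetical.

```python
def increment_tag_usage(tag_name, user_id, table_name="Tags"):
    """update_item parameters that add 1 to Usage for one (tag, user) pair."""
    return {
        "TableName": table_name,
        "Key": {"TagName": {"S": tag_name}, "UserID": {"S": user_id}},
        # ADD creates the attribute if it does not exist yet.
        "UpdateExpression": "ADD #usage :one",
        "ExpressionAttributeNames": {"#usage": "Usage"},
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }


def tags_for_user_by_usage(user_id, table_name="Tags"):
    """Query parameters: all Tags for a User, most used first."""
    return {
        "TableName": table_name,
        "IndexName": "UserID-Usage",           # GSI, assumed name
        "KeyConditionExpression": "#uid = :uid",
        "ExpressionAttributeNames": {"#uid": "UserID"},
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
        "ScanIndexForward": False,             # highest Usage first
    }


# One update per tag of the bookmark from the question.
updates = [increment_tag_usage(t, "watQuadrat") for t in ["forum", "programming"]]
top_tags = tags_for_user_by_usage("watQuadrat")
```

Each entry in `updates` could be sent with `boto3.client("dynamodb").update_item(**params)`, and `top_tags` with `query(**top_tags)`.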
