How to apply hyperloglog to a timeseries stream


Can someone explain, or link to an explanation of, how counting the cardinality of a set with HyperLogLog (HLL) can be used for time series analysis?
I'm pretty sure druid.io does exactly this, but I'm looking for a general explanation of how to do this with HLL alone, without any specific library / database or specific HLL implementation.
A naive way of doing that would be to prefix a timestamp to the things we are counting. For example, using the Redis HLL API, if you are counting events from second 1000001 up to second 1000060:
PFADD SOMEHLLVAR "1000001-event1" "1000001-event2" ...
PFADD SOMEHLLVAR "1000002-event1" "1000002-event3" ...
PFADD SOMEHLLVAR "1000003-event2" "1000003-event3" ...
# Get count of occurrences of event1 in a minute-long range
# (pseudocode: the real PFCOUNT takes key names, not elements):
PFCOUNT "1000001-event1" -> 1
PFCOUNT "1000002-event1" -> 1
PFCOUNT "10000..-event1" -> ..
PFCOUNT "1000060-event1" -> 0
...add all numbers! -> 2
One problem with this approach is that you would need to iterate through every second in a given range to find, say, the count of a specific event in the last minute.
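One library-independent way to sidestep the per-element lookup is to keep one HLL sketch per time bucket and merge the sketches for a range by taking the register-wise maximum. The following is only a minimal sketch of that idea, not how Redis or Druid implement it: the precision `P`, the SHA-1 based hash, and the helper names `record` and `distinct` are all assumptions made for illustration.

```python
import hashlib
import math
from collections import defaultdict

P = 10          # assumed precision: 2**10 = 1024 registers per sketch
M = 1 << P


def _hash64(value: str) -> int:
    """64-bit hash taken from the first 8 bytes of SHA-1 (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


class HLL:
    def __init__(self):
        self.reg = [0] * M

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h >> (64 - P)                      # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.reg[idx] = max(self.reg[idx], rank)

    def merge(self, other: "HLL") -> "HLL":
        """Union of two sketches: register-wise maximum."""
        out = HLL()
        out.reg = [max(a, b) for a, b in zip(self.reg, other.reg)]
        return out

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / M)
        est = alpha * M * M / sum(2.0 ** -r for r in self.reg)
        zeros = self.reg.count(0)
        if est <= 2.5 * M and zeros:
            est = M * math.log(M / zeros)        # linear counting for small sets
        return est


# One sketch per time bucket, keyed by the bucket's start timestamp.
buckets = defaultdict(HLL)


def record(ts: int, item: str, width: int = 1) -> None:
    """Add an item to the sketch of the bucket containing timestamp ts."""
    buckets[ts - ts % width].add(item)


def distinct(start: int, end: int) -> float:
    """Estimate distinct items seen in buckets with start <= ts < end."""
    merged = HLL()
    for ts, sketch in buckets.items():
        if start <= ts < end:
            merged = merged.merge(sketch)
    return merged.count()


if __name__ == "__main__":
    for sec in range(1000001, 1000061):
        record(sec, "user-a")
        record(sec, f"user-{sec % 7}")
    print(round(distinct(1000001, 1000061)))
```

Because merging is a union, an item seen in many buckets is still counted once for the whole range. Coarser buckets (for example `width=60`) trade time resolution for fewer sketches to merge per query. Note that this estimates distinct items per range, not the number of occurrences of one item.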
Using the hyperUnique aggregator in Druid requires a bit of coordination between the ingestion side and the query side.
On the ingestion side, in your list of aggregators, you need to include a "hyperUnique" aggregator whose fieldName matches the dimension you eventually want to run unique counts over. This creates a new metric that contains HLL "sketches". Once your data is ingested and queryable, you use the same "hyperUnique" aggregator on the query side to query the metric you ingested. You can try this out with a timeseries query (http://druid.io/docs/latest/TimeseriesQuery.html).
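To make the pairing between the two sides concrete, here is a rough sketch that builds both aggregator entries as Python dictionaries and prints the query as JSON. The datasource `events`, the dimension `user_id`, the metric name `unique_users`, and the interval are hypothetical; the JSON keys follow the Druid documentation as I understand it, so check them against the version you run.

```python
import json


def hyper_unique(name: str, field_name: str) -> dict:
    """Build a hyperUnique aggregator entry (same shape at ingestion and query time)."""
    return {"type": "hyperUnique", "name": name, "fieldName": field_name}


# Ingestion side: sketch the hypothetical "user_id" dimension into a
# metric named "unique_users".
ingestion_aggregators = [
    {"type": "count", "name": "count"},
    hyper_unique("unique_users", "user_id"),
]

# Query side: aggregate the ingested metric, one result row per minute.
timeseries_query = {
    "queryType": "timeseries",
    "dataSource": "events",                      # hypothetical datasource
    "granularity": "minute",
    "intervals": ["2015-01-01T00:00:00Z/2015-01-01T01:00:00Z"],
    "aggregations": [hyper_unique("unique_users", "unique_users")],
}

if __name__ == "__main__":
    print(json.dumps(timeseries_query, indent=2))
```

The point to notice is that the query-side `fieldName` refers to the metric name created at ingestion, not to the original dimension.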
By the way, check out groups.google.com/forum/#!forum/druid-development for more questions about HLL and Druid.
