counting


How to apply hyperloglog to a timeseries stream


Can someone explain or link to an explanation about how counting the cardinality of a set with HLL can be used for time series analysis?
I'm pretty sure druid.io does exactly this, but I'm looking for a general explanation of how to do this with HLL alone, without any specific library / database or specific HLL implementation.
A Naive way of doing that would be by prefixing a timestamp on the things we are counting. E.g., using redis HLL API as an example, if you are counting events, starting from second 1000001 up to second 1000060:
PFADD SOMEHLLVAR "1000001-event1" "1000001-event2" ...
PFADD SOMEHLLVAR "1000002-event1" "1000002-event3" ...
PFADD SOMEHLLVAR "1000003-event2" "1000003-event3" ...
# Get count of occurrences of event1 in a minute long range:
PFCOUNT "1000001-event1" -> 1
PFCOUNT "1000002-event1" -> 1
PFCOUNT "10000..-event1" -> ..
PFCOUNT "1000060-event1" -> 0
...add all numbers! -> 2
Just one of the problems this would have is that you would need to iterate through each second in a given range to find out, say, the count of specific events in the last minute.
Using the hyperUnique aggregator in Druid requires a bit of coordination between the ingestion side and the query side.
On the ingestion side, in your list of aggregators, you need to include a "hyperUnique" aggregator where the fieldName matches the dimension you wish to eventually run unique counts over. This creates a new metric that contains HLL "sketches". When your data is ingested and queryable, you use the same "hyperUnique" aggregator on the query side to query for the metric you ingested. You can try out a timeseries query (http://druid.io/docs/latest/TimeseriesQuery.html)
BTW, check out groups.google.com/forum/#!forum/druid-development for more questions about HLL and druid.

Related Links

How to answer queries of type l,r,k which finds number of elements in an array in range l to r which occurs atleast k times?
Dafny and counting of occurences
Counting the number of capital letters in each row in [R]
python: count values of dictionary
How to apply hyperloglog to a timeseries stream
Counting the number of occurrences of C in each line and outputting this number plus the total number of characters in that line
Digit Counting Issue in C++ Program
Binary Strings of the form *111*

Categories

HOME
qt
elasticsearch
barcode-scanner
android-emulator
wine
yeoman-generator-angular
slide
parsley.js
graphdb
smartgwt
google-cloud-logging
android-contacts
rpmbuild
coroutine
data-type-conversion
unity-container
workday
firemonkey-style
running-object-table
gravity-forms-plugin
fatfs
fusion
credit-card
publishing
wampsharp
pdfminer
numerics
code-rally
eclipse-gmf
assert
reset
praat
klee
notesview
permission-denied
moinmoin
destroy
deployd
gestures
android-sharing
line-intersection
htmlspecialchars
mongodb-aggregation
test-data
clrs
case-when
alertify
gmaps.js
ltrace
adobe-reader
d3v4
gapi
ipywidgets
ndk-build
jpda
visual-studio-monaco
gcloud-node
between
unsigned
adjacency-list
exists
android-async-http
outlook.com
opencyc
uitest
sequence-sql
componentart
actionpack
html-escape-characters
findersync
wicket-1.5
inequality
xcode6.3.1
otl
senchatouch-2.4
assetic
cloo
unrealscript
iirf
postgres-xc
towers-of-hanoi
getopt-long
uitextfielddelegate
monocross
sigar
radscheduler
tablet-pc
3des
formal-semantics
aptitude
ccl
solandra
bucket
source-code-protection
comment-conventions
code-camp
html-generation

Resources

Database Users
RDBMS discuss
Database Dev&Adm
javascript
java
csharp
php
android
javascript
java
csharp
php
python
android
jquery
ruby
ios
html
Mobile App
Mobile App
Mobile App