110/25/05 Jie Gao, CSE590-fall05 1Robust Aggregation in Sensor Robust Aggregation in Sensor NetworksNetworksJie GaoComputer Science DepartmentStony Brook University10/25/05 Jie Gao, CSE590-fall05 2PapersPapers• [Shrivastava04] Nisheeth Shrivastava, ChiranjeebBuragohain, Divy Agrawal, Subhash Suri, Medians and Beyond: New Aggregation Techniques for Sensor Networks, ACM SenSys '04, Nov. 3-5, Baltimore, MD. • [Nath04] Suman Nath, Phillip B. Gibbons, Zachary Anderson, and Srinivasan Seshan, Synopsis Diffusion for Robust Aggregation in Sensor Networks". In proceedings of ACM SenSys'04.• [Considine04] Jeffrey Considine, Feifei Li, George Kollios, and John Byers, Approximate Aggregation Techniques for Sensor Databases, Proc. ICDE, 2004.• [Przydatek03] Bartosz Przydatek, Dawn Song, Adrian Perrig, SIA: Secure Information Aggregation in Sensor Networks, Sensys’03.10/25/05 Jie Gao, CSE590-fall05 3Problem I: medianProblem I: median10/25/05 Jie Gao, CSE590-fall05 4Problem I: medianProblem I: median• Computing average is simple on an aggregation tree.– Each node x stores the average a(x) and the number of nodes in its subtree n(x).– The average of a node x can be computed from its children u, v. n(x)=n(u)+n(v). a(x)=(a(u)n(u)+a(v)n(v))/n(x).• Computing the median with a fixed amount of message is hard.– We do not know the rank of u’s median in v’s dataset.– We resort to approximations.xu v10/25/05 Jie Gao, CSE590-fall05 5Median and random samplingMedian and random sampling• Problem: compute the median a of n unsorted elements {ai}. • Take a random sample of k elements. Compute the median x. • Claim: x has rank within (½+ε)n and (½-ε)n with probability at least 1-2/exp{2kε2}. (Proof left as an exercise.)• Choose k=ln(2/δ)/(2ε2), then x is an approximate median with probability 1-δ.• A deterministic algorithm?• How about approximate histogram?• What if a sensor generates a list of values?10/25/05 Jie Gao, CSE590-fall05 6QuantileQuantiledigest (qdigest (q--digest)digest)• A data structure that answers – Approximate quantile query: median, the kth largest reading. – Range queries: the kth to lth largest readings.– Most frequent items.– Histograms.• Properties:– Deterministic algorithm.– Error-memory trade-off.– Confidence factor.– Support multiple queries.210/25/05 Jie Gao, CSE590-fall05 7QQ--digestdigest• Exact data: frequency of data value {f1, f2,…,fσ}.• Compress the data:– detailed information concerning frequent data are preserved; – less frequently occurring values are lumped into larger buckets resulting in information loss.• Buckets: the nodes in a binary partition of the range [1, σ]. Each bucket v has range [v.min, v.max].• Only store non-zero buckets.• Digest property:– Count(v) n/k. (except leaf)– Count(v) + Count(p) + Count(s) > n/k. (except root)parentsibling10/25/05 Jie Gao, CSE590-fall05 8ExampleExampleInput data bucketed Q-digestInformation loss10/25/05 Jie Gao, CSE590-fall05 9Construct a qConstruct a q--digestdigest• Each sensor constructs a q-digest based on its value.• Check the digest property bottom up: two “small”children’s count are added up and moved to the parent.10/25/05 Jie Gao, CSE590-fall05 10Merging two qMerging two q--digestsdigests• Merge q-digests from two children• Add up the values in buckets• Re-evaluate the digest property bottom up.Information loss: t undercounts since some of its value appears on ancestors.10/25/05 Jie Gao, CSE590-fall05 11Space complexity and error boundSpace complexity and error bound1. A q-digest with compression parameter k has at most 3k buckets.• By property 2, for buckets Q,– v∈Q[Count(v) + Count(p) + Count(s)] > |Q| n/k.– v∈Q[Count(v) + Count(p) + Count(s)] 3v∈QCount(v)=3n.– |Q|<3k.• Any value that should be counted in v can be present in one of the ancestors.1. Count(v) has max error logσ⋅n/k.– Error(v) ancestor pCount(p) ancestor pn/k logσ⋅n/k.2. MERGE maintains the same relative error.– Error(v) iError(vi) ilogσ⋅ni/k logσ⋅n/k.10/25/05 Jie Gao, CSE590-fall05 12Median and Median and quantilequantilequeryquery• Given q∈(0, 1), find the value whose rank is qn.• Relative error ε=|r-qn|/n, where r is the true rank.• Post-order traversal on Q, sum the counts of all nodes visited before a node v, which is the lower bound on the # of values less than v.max. Report it when it is first time larger than qn.• Error bound: logσ /k = 3logσ /m, where m=3k is the storage bound for each sensor.310/25/05 Jie Gao, CSE590-fall05 13Other queriesOther queries• Inverse quantile: given a value, determine its rank.– Traverse the tree in post-order, report the sum of counts v for which x>v.max, which is within [rank(x), rank(x)+εn]• Range query: find # values in range [l,h].– Perform two inverse quantile queries and take the difference. Error bound is 2εn.• Frequent items: given s∈(0, 1), find all values reported by more than sn sensors.– Count the leaf buckets whose counts are more than sn.– Small false positive: values with count between (s-ε)n and sn may also be reported as frequent.10/25/05 Jie Gao, CSE590-fall05 14Simulation setupSimulation setup• A typical aggregation tree (BFS tree) on 40 nodes in a 200 by 200 area. In the simulation they use 4000~8000 nodes.10/25/05 Jie Gao, CSE590-fall05 15Simulation setupSimulation setup• Random data;• Correlated data:3D elevation value from Death Valley.10/25/05 Jie Gao, CSE590-fall05 16Histogram v.s. qHistogram v.s. q--digestdigest• Comparison of histogram and q-digest. 10/25/05 Jie Gao, CSE590-fall05 17Tradeoff between error and Tradeoff between error and msgmsgsize size 10/25/05 Jie Gao, CSE590-fall05 18Saving on message size Saving on message size410/25/05 Jie Gao, CSE590-fall05 19Problem II: Aggregation along a Problem II: Aggregation along a spanning tree in practicespanning tree in practice• The impact of link dynamics on aggregation tree.• If a link fails, the data from the entire subtree is lost.– Wrong aggregated value;– Inconsistency.10/25/05 Jie Gao, CSE590-fall05 20Problem II: Aggregation along a Problem II: Aggregation along a spanning tree in practicespanning tree in practice• Solution: use multi-path routing (e.g., DAG) to improve robustness under link dynamics.• But if both paths succeed, the the same data is received twice!• This is ok for some
View Full Document