DOC PREVIEW
USC CSCI 599 - fft

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Efficient Similarity Search in Sequence Databases Rakesh Agrawal, Christos Faloutsos and Arun SwamiSimilaritySimilarity QueriesExtracting Features from SequencesDiscrete Fourier TransformDFT Concept IDFT Concept IIDFT CharacteristicsProposed TechniqueEuclidean distance featuresSlide 11Number of Fourier coefficientsPerformance ExperimentsSlide 14Different Sequence Set SizeVarying Sequence LengthDiscussionSummaryEfficient Similarity Search in Efficient Similarity Search in Sequence DatabasesSequence Databases Rakesh Agrawal, Christos Faloutsos and Arun Swami Rakesh Agrawal, Christos Faloutsos and Arun SwamiLeila KaghazianSimilaritySimilarity•Exact queries •Similarity–Identify companies with similar pattern of growth–Determine products with similar selling pattern–Discover stocks with similar movement in stock prices.Similarity QueriesSimilarity Queries•Whole Matching. The sequences to be compared have the same length n.–Range Query. Given a query sequences that are similar within distance “e”.–All-Pairs queries. Given N sequences, find the pairs of sequences that are within “e” of each other.•Subsequence Matching. The query sequence is smaller; we look for a subsequence in the large sequence that best matches the query sequence.Extracting Features from Extracting Features from SequencesSequences•For numerical sequences, extracting K features, mapping it to k-dimensional space and using multidimensional index methods (R*-tree, R-tree,grid-files,…) to store and search these points.•Completeness of feature extracting•Dimensionality “curse”Discrete Fourier TransformDiscrete Fourier Transform•All periodic waves can be generated by combining Sin and Cos waves of different frequencies•Number of Frequencies may not be finite•Fourier Transform Decomposes a Periodic Wave into its Component FrequenciesDFT Concept IDFT Concept IDFT Concept IIDFT Concept IIDFT CharacteristicsDFT Characteristics•Completeness of feature extracting•Dimensionality curse•Parseval theorem gives that Euclidean distance between two signals x and y in the time domain is the same as their Euclidean distance in the frequency domainProposed TechniqueProposed Technique•Obtain the coefficients of DFT of each sequence in the database•Build a multidimensional index (F-index) using the first fc (<5)Fourier coefficients.•For a range query, obtain the first fc Fourier coefficients of the query.•For an all-pairs query, doing a spatial join using the F-index (superset of the answer set)•The actual answer set is obtained in a post-processing stepEuclidean distance featuresEuclidean distance features•Euclidean distance is useful in many cases•It can be used with any other type of similarity measure•Euclidean distance is the optimal distance measure of estimation if signals are corrupted by Gaussian additive noise•It is preserved under orthonormal transformsDFT CharacteristicsDFT Characteristics•Preserves the distance•Is easy to compute •Concentrate the energy of the signal in few coefficients•It’s a orthonormal transform•The data dependent ones–+ better performance–- expensive data reorganization if data set evolves over time•Data independent ones(DFT, DCT, wavelet)Number of Fourier coefficients Number of Fourier coefficients •Worst-case signal is White noise when xt is completely independent of its neighbors.–It has the same energy in every frequency means all frequency are equally important. This is bad for F-index.•Random walks (brown noise)–Stock movements and exchange rates•Primary and secondary trends correspond to strong, low frequency signals while minor trends corresponds to weak, high frequency signalsPerformance ExperimentsPerformance Experiments•How to choose the number of Fourier coefficients to be retained (cut-off frequency fc) in the F-index method. –A larger fc•reduces the false hits•increases the search time.•How does the search time grow as a function of number of sequences in the database?•How does the length n of the sequences affect the performance?Range QueriesAll-Pairs QueriesNumber of Fourier coefficients Number of Fourier coefficientsDifferent Sequence Set SizeDifferent Sequence Set SizeAll-Pairs QueriesRange QueriesVarying Sequence LengthVarying Sequence LengthAll-Pairs QueriesRange QueriesDiscussion Discussion •The minimum execution time for both range and all-pairs queries is achieved for a small number of fc•Increasing the number of sequences in the database results in higher gains for this method•Increasing the length of the sequence n also results in higher gain for the methodSummary Summary •Use DFT to extract sequence features•Only first few coefficient is strong enough •DFT is orthonotmal•Use R*-tree for indexing•Use Euclidean distance •Complexity is


View Full Document

USC CSCI 599 - fft

Documents in this Course
Week8_1

Week8_1

22 pages

Week2_b

Week2_b

10 pages

LECT6BW

LECT6BW

20 pages

LECT6BW

LECT6BW

20 pages

5

5

44 pages

12

12

15 pages

16

16

20 pages

Nima

Nima

8 pages

Week1

Week1

38 pages

Week11_c

Week11_c

30 pages

afsin

afsin

5 pages

October5b

October5b

43 pages

Week11_2

Week11_2

20 pages

final

final

2 pages

c-4

c-4

12 pages

0420

0420

3 pages

Week9_b

Week9_b

20 pages

S7Kriegel

S7Kriegel

21 pages

Week4_2

Week4_2

16 pages

sandpres

sandpres

21 pages

Week6_1

Week6_1

20 pages

4

4

33 pages

Week10_c

Week10_c

13 pages

LECT7BW

LECT7BW

19 pages

24

24

15 pages

14

14

35 pages

Week9_c

Week9_c

24 pages

Week11_67

Week11_67

22 pages

Week1

Week1

37 pages

LECT3BW

LECT3BW

28 pages

Week8_c2

Week8_c2

19 pages

Week5_1

Week5_1

19 pages

LECT5BW

LECT5BW

24 pages

Week10_b

Week10_b

16 pages

Week11_1

Week11_1

43 pages

Week7_2

Week7_2

15 pages

Week5_b

Week5_b

19 pages

Week11_a

Week11_a

29 pages

LECT14BW

LECT14BW

24 pages

T7kriegel

T7kriegel

21 pages

0413

0413

2 pages

3

3

23 pages

C2-TSE

C2-TSE

16 pages

10_19_99

10_19_99

12 pages

s1and2-v2

s1and2-v2

37 pages

Week10_3

Week10_3

23 pages

jalal

jalal

6 pages

1

1

25 pages

T3Querys

T3Querys

47 pages

CS17

CS17

15 pages

porkaew

porkaew

20 pages

LECT4BW

LECT4BW

21 pages

Week10_1

Week10_1

25 pages

wavelet

wavelet

17 pages

October5a

October5a

22 pages

p289-korn

p289-korn

12 pages

2

2

33 pages

rose

rose

36 pages

9_7_99

9_7_99

18 pages

Week10_2

Week10_2

28 pages

Week7_3

Week7_3

37 pages

Load more
Download fft
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view fft and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view fft 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?