Determining applications and characteristics of encrypted wireless traffic
Chris, CMPE 257, 3/9/2011

On Inferring Application Protocol Behaviors in Encrypted Network Traffic
Inferring Speech Activity from Encrypted Skype Traffic

Several fundamental security mechanisms for restricting access to network resources rely on the ability of a reference monitor to inspect the contents of traffic as it traverses the network. As an administrator, you want to know the type of traffic to determine whether it is acceptable. As a user, you may need to know that someone can figure out what you are doing even if your traffic is encrypted.

Traditional packet inspection does not work if the contents are encrypted: there are no port numbers or TCP flags to check. The same techniques are also valuable when traffic is not encrypted but is being disguised as another type of traffic. You can still view packet size, timing, and direction.

The authors gathered traffic from the first 10 minutes of every quarter hour over a two-month period on George Mason University's OC-3 link. While not wireless, this gave them more data to experiment with. They extracted inbound packets on the ports for SMTP (25), HTTP (80), HTTP over SSL (443), FTP (20), SSH (22), and Telnet (23), and outbound packets for SMTP and AIM. Traffic on these ports was marked with the corresponding labels without further inspecting the packets. Results are compared back to this labeling.

To generate the data for the encrypted test, the packets were then encrypted with AES at 512 bits. The only thing left was the timing, size, and direction of the packets.

For each trace, all connections for the given protocol are sorted in order of arrival, then split into several smaller epochs of constant length s. Count the number of packets of each type during each epoch. There are 4 types: small or not, inbound or outbound. s is on the order of several seconds.

Construct a k-Nearest Neighbor (k-NN) classifier to assign a label to each epoch based on the number of packets of each type. To build the k-NN classifier, a random day from the data set was used as a training set, labeled by the earlier port classification. To classify each new
epoch, they use the Kullback-Leibler distance to determine which vectors in the training set are nearest to the vector of counts for the given epoch.

s: epoch length. k: how many nearest neighbors. Recognition rates tend to increase with both s and k. Increasing s gives a bigger sample but does not allow for analyzing shorter traces, where an adversary could hide quickly. The paper chose s = 10 seconds as an acceptable tradeoff.

Given the list of epoch labels, the traffic is assigned the most frequently occurring label. Evaluate this classifier by applying it against another day's traffic.

By using the Kullback-Leibler distance to construct classifiers for short slices of time, the slices can be combined to build a classifier for longer traces, which performs well on aggregate traffic where only a single application protocol is involved.

Since we cannot assume all flows in the aggregate carry the same application, we need to expand to a multi-flow protocol detector. In general, a network administrator (or a hacker) is concerned with detecting the presence of a few specific protocols within the aggregate.

Modify the k-NN classifier: label the vectors in the training set based on whether they contain an instance of the target protocol(s). Run an experiment similar to before, but with aggregate traffic. Flag the aggregate as containing the target protocol if some percentage of the epochs return True. Tune by adjusting the percentage.

Build statistical models for the sequence of packets produced by each protocol of interest, and then use these models to identify the protocol in use in new connections. Use techniques based on profile hidden Markov models. There is difficulty with protocols that have more than one typical behavior, like SSH (SCP for bulk transfer, and interactive sessions), versus FTP (always bulk transfer).

The profile HMM is best described as a left-right model built around two long parallel chains of hidden states. Each chain has one state per packet in the TCP connection, and each state emits symbols with a probability
distribution specific to its position in the chain. The addition of a second match state per position was intended to allow the model to better represent the correlation between successive packets.

Server Match state: matches only packets observed traveling from the server to the client.
Client Match state: matches packets traveling in the opposite direction.
A transition from a Client Match state to a Server Match state indicates that a typical packet for the given protocol was observed traveling from the client to the server, followed by a similarly typical packet on its way from the server to the client.

Insert: allows for one or more extra packets inserted in an otherwise conforming sequence, between two normal parts of the session.
Delete: allows for the usual packet at a given position to be omitted from the sequence.
Transitions from the Delete state in each column to the Insert state in the next column allow for a normal packet at the given position to be removed and replaced with a packet which does not fit the profile. In practice, the Insert states represent duplicate packets and retransmissions, while the Delete states account for packets lost in the network or dropped by the detector.

The Viterbi classifier finds each model's best explanation for how the packets in the sequence were generated (whether by normal application behavior, TCP retransmissions, etc.), represented by the Viterbi path, and the likelihood of each model's explanation, i.e., the Viterbi path probability. It then picks the model that provides the best explanation for the observed packets.

Using the classifiers from before would be computationally intensive. Instead of building models for each protocol, build one for the target protocol and one for the noise in the network. This was able to test 3200 connections in 15 seconds.

What if the connections are aggregated and put in an encrypted tunnel? Wireless users VPN back to work. A user may try to further hide traffic by putting it all in an SSH tunnel.

A model-based technique which
enables us to accurately track the number of connections in a network-layer tunnel which carries traffic for only a single application protocol. The approach is founded on a few basic assumptions about the behavior of the tunneled TCP connections and their associated packets. These assumptions, while not entirely correct for real traffic, nevertheless allow us to employ simple and usable models which, as we demonstrate later, produce reasonable results for a variety of protocols.

Assumption 1: The process Nt
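
Returning to the epoch-based k-NN classifier described earlier, the counting, distance, and voting steps can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the 64-byte cutoff for a "small" packet, and the additive smoothing constant are all assumptions made here so the example runs, and the training vectors are made-up numbers.

```python
import math
from collections import Counter

def count_vector(packets, small_threshold=64):
    """Count packets of the 4 types in one epoch.

    packets: list of (size_bytes, direction), direction is "in" or "out".
    small_threshold is a hypothetical cutoff; the slides do not give one.
    Returns [small-in, large-in, small-out, large-out].
    """
    counts = [0, 0, 0, 0]
    for size, direction in packets:
        base = 0 if direction == "in" else 2
        offset = 0 if size <= small_threshold else 1
        counts[base + offset] += 1
    return counts

def kl_distance(p_counts, q_counts, eps=1e-6):
    """Kullback-Leibler distance between two count vectors.

    Counts are smoothed by eps (an assumption, to avoid log(0)) and
    normalized into probability distributions first.
    """
    p = [c + eps for c in p_counts]
    q = [c + eps for c in q_counts]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))

def classify_epoch(vec, training, k=3):
    """Label one epoch by majority vote among its k nearest training vectors."""
    nearest = sorted(training, key=lambda item: kl_distance(vec, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def classify_trace(epoch_vectors, training, k=3):
    """Label a whole trace with the most frequent epoch label."""
    labels = [classify_epoch(v, training, k) for v in epoch_vectors]
    return Counter(labels).most_common(1)[0][0]

# Toy training set (invented counts), labeled by port as in the slides.
training = [
    ([50, 2, 40, 1], "ssh"),
    ([45, 3, 42, 2], "ssh"),
    ([5, 60, 20, 1], "http"),
    ([4, 55, 18, 2], "http"),
]
trace = [[48, 2, 41, 1], [47, 3, 40, 2], [6, 58, 19, 1]]
print(classify_trace(trace, training, k=3))
```

Larger k and longer epochs give each vote more evidence, which matches the observation that recognition rates rise with both s and k.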
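
The model-selection step of the Viterbi classifier described earlier can be sketched in the same spirit. The example below uses two small ordinary HMMs (a hypothetical "target" model whose states alternate between client and server packets, and a one-state "noise" model) rather than a full profile HMM with Match, Insert, and Delete states. The symbol alphabet ("c"/"C" for small/large client packets, "s"/"S" for server packets) and all probabilities are invented for illustration.

```python
import math

def _log(p):
    """log(p) with log(0) mapped to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi_log_prob(obs, start, trans, emit):
    """Log probability of the single best hidden-state path for obs.

    start[s], trans[s][t], and emit[s][symbol] are probabilities;
    missing transition or emission entries are treated as 0.
    """
    states = list(start)
    best = {s: _log(start[s]) + _log(emit[s].get(obs[0], 0.0)) for s in states}
    for symbol in obs[1:]:
        best = {
            t: max(best[s] + _log(trans[s].get(t, 0.0)) for s in states)
            + _log(emit[t].get(symbol, 0.0))
            for t in states
        }
    return max(best.values())

def classify(obs, models):
    """Pick the model whose Viterbi path best explains the packets."""
    scores = {name: viterbi_log_prob(obs, *m) for name, m in models.items()}
    return max(scores, key=scores.get)

# Hypothetical target model: client and server packets mostly alternate.
target = (
    {"client": 1.0, "server": 0.0},
    {"client": {"server": 0.9, "client": 0.1},
     "server": {"client": 0.9, "server": 0.1}},
    {"client": {"c": 0.9, "C": 0.1}, "server": {"s": 0.2, "S": 0.8}},
)
# Hypothetical noise model: one state, every symbol equally likely.
noise = (
    {"n": 1.0},
    {"n": {"n": 1.0}},
    {"n": {"c": 0.25, "C": 0.25, "s": 0.25, "S": 0.25}},
)
models = {"target": target, "noise": noise}

print(classify(["c", "S", "c", "S"], models))
print(classify(["S", "S", "C", "C"], models))
```

Comparing one target model against one noise model, instead of one model per protocol, keeps the number of Viterbi passes per connection at two, which is the cost saving the slides describe.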