Tuesday, September 9, 2014

Forensic Triage for Mobile Phones with DEC0DE (6)

6 Related Work

Our work is related to a number of works in both reverse engineering and forensics. We did not compare DEC0DE against these works as each has a significant limitation or assumption that does not apply well to the criminal investigation of phones.

Polyglot [2], Tupni [6], and Dispatcher [1] are instrumentation-based approaches to reverse engineering. Since binary instrumentation is a complex, time-consuming process, it is poorly suited to mobile phone triage. Moreover, our goal is different from that of Polyglot, Tupni, and Dispatcher. We seek to extract information from the data rather than reverse engineer the full specification of the device’s format.

Other previous works have attempted to parse machine data without examining executables. Discoverer [5] attempts to derive the format of network messages given samples of data. However, Discoverer is limited to identifying exactly two types of data, “text” and “binary”, and extending it to additional types is a challenge. Overall, it does not capture the rich variety of types that DEC0DE can distinguish.

LearnPADS [7,8,25] is another sample-based system. It is designed to automatically infer the format of ad hoc data, creating a specification of that format in a custom data description language (called PADS). Since LearnPADS relies on explicit delimiters, it is not applicable to mobile phones.

Cozzie et al. [4] use Bayesian unsupervised learning to locate data structures in memory, forming the basis of a virus checker and botnet detector. Unlike DEC0DE, their approach is not designed to parse the data but rather to determine if there is a match between two instances of a complex data structure in memory.

In our preliminary work [23], we used the Cocke-Younger-Kasami (CYK) algorithm [10] to parse the records of phones. While this effort influenced the development of DEC0DE, it was much more limited in scope and function.

The idea of extracting records from a physical memory image is similar to file carving. File carving is focused on identifying large chunks of data that follow a known format, e.g., JPEGs or MP3s. Some file carving techniques match known file headers to file footers [18,20] when they appear contiguously in the file system. More advanced techniques can match pieces of images fragmented in the file system, relying on domain-specific knowledge about the file format [19]. In contrast, our goal is to identify and parse small sequences of bytes into records, all without any knowledge of the file system. Moreover, we seek to identify information within unknown formats that only loosely resemble the formats we’ve previously seen.

DEC0DE’s filtering component is similar to a number of previous works. Block hashes have been used by Garfinkel [9] to find content that is of interest on a large drive by statistically sampling the drive and comparing it to a Bloom filter of known documents. This recent work has much in common with both the rsync algorithm [22], which detects differences between two data stores using block signatures, as well as the Karp-Rabin signature-based string search algorithm [13], among others.
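To make the block-hash idea concrete, here is a minimal sketch of filtering one image against the blocks of a reference image. It is an illustration only: the function names, the use of SHA-1, the fixed non-overlapping blocks, and the tiny block size are all assumptions, and an exact Python set stands in for the Bloom filter mentioned above.

```python
import hashlib


def block_hashes(image: bytes, block_size: int) -> set[str]:
    """Hash every fixed-size, non-overlapping block of an image (hypothetical helper)."""
    return {
        hashlib.sha1(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image), block_size)
    }


def filter_known_blocks(image: bytes, known: set[str], block_size: int) -> list[tuple[int, bytes]]:
    """Return (offset, block) pairs whose hash does not appear in the known set."""
    kept = []
    for i in range(0, len(image), block_size):
        block = image[i:i + block_size]
        if hashlib.sha1(block).hexdigest() not in known:
            kept.append((i, block))
    return kept


# Toy data: the two images share their first block and differ in the second.
BLOCK_SIZE = 4  # assumed, unrealistically small so the example is easy to read
reference_image = b"AAAABBBB"
target_image = b"AAAACCCC"

known = block_hashes(reference_image, BLOCK_SIZE)
remaining = filter_known_blocks(target_image, known, BLOCK_SIZE)
```

Only the block at offset 4 survives the filter, so later stages would have half as much data to parse in this toy case.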

7 Conclusions

We have addressed the problem of recovering information from phones with unknown storage formats using a combination of techniques. At the core of our system, DEC0DE, we leverage a set of probabilistic finite state machines that encode a flexible description of typical data structures. Using a classic dynamic programming algorithm (Viterbi), we are able to infer call logs and address book entries. We make use of a number of techniques to make this approach efficient, processing data in about 15 minutes for a 64-megabyte image that has been acquired from a phone. First, we filter data that is unlikely to contain useful information by comparing block hash sets among phones of the same model. Second, our implementation of Viterbi and the state machines we encoded are efficiently sparse, collapsing a great deal of information in a few states and transitions. Third, we are able to improve upon Viterbi’s result with a simple decision tree.
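As a rough illustration of the dynamic programming step, the sketch below runs a textbook Viterbi decoder over a toy two-state machine. It is not DEC0DE's implementation: the states, the observation classes, and every probability are invented for the example, and the dictionaries with missing entries treated as zero only gesture at what a sparse encoding might look like.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence (textbook Viterbi, in log space)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] = (log-probability of the best path ending in s at step t, predecessor)
    first = observations[0]
    best = [{s: (log(start_p.get(s, 0)) + log(emit_p[s].get(first, 0)), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Missing transitions are treated as probability zero.
            score, prev = max(
                ((best[-1][p][0] + log(trans_p[p].get(s, 0)), p) for p in states),
                key=lambda pair: pair[0],
            )
            column[s] = (score + log(emit_p[s].get(obs, 0)), prev)
        best.append(column)

    # Follow predecessor links back from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model: is each byte part of a number field or a text field?
states = ["number", "text"]
start_p = {"number": 0.5, "text": 0.5}
trans_p = {
    "number": {"number": 0.8, "text": 0.2},
    "text": {"number": 0.2, "text": 0.8},
}
emit_p = {
    "number": {"digit": 0.9, "letter": 0.1},
    "text": {"digit": 0.2, "letter": 0.8},
}

labels = viterbi(["digit", "digit", "letter", "letter"], states, start_p, trans_p, emit_p)
```

With these made-up probabilities, the decoder labels the first two observations as "number" and the last two as "text".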

Our evaluation was performed across a variety of phone models from a variety of manufacturers. Overall, we are able to obtain high performance for previously unseen phones: an average recall of 97% and precision of 80% for call logs, and an average recall of 93% and precision of 52% for address books. Moreover, at the expense of recall dropping to 14%, we can increase precision to 94% by culling results that don’t match between call logs and address book entries on the same phone.
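For readers less familiar with the two metrics, the following sketch shows how precision and recall are computed and why culling unmatched results trades one for the other. The phone numbers and the helper name are made up for illustration and have no connection to the evaluation data.

```python
def precision_recall(extracted: set, ground_truth: set) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / ground truth."""
    correct = len(extracted & ground_truth)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical data: four true address book numbers, five extracted candidates.
truth = {"555-0100", "555-0101", "555-0102", "555-0103"}
extracted = {"555-0100", "555-0101", "555-0102", "999-9999", "000-0000"}
before = precision_recall(extracted, truth)

# Culling: keep only candidates that also appear in the (hypothetical) call log.
call_log_numbers = {"555-0100"}
culled = extracted & call_log_numbers
after = precision_recall(culled, truth)
```

In this toy case, precision rises from 0.6 to 1.0 while recall falls from 0.75 to 0.25, the same direction of trade-off as the culling step described above.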

Acknowledgments. This work was supported in part by NSF award DUE-0830876. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. We are grateful for the comments and assistance of Jacqueline Feild, Marc Liberatore, Ben Ransford, Shane Clark, Jason Beers, and Tyler Bonci.
