DF2 lecture 1
20 Jan. 2013
The lecture started with an introduction to the course, and then we talked about the "whole picture" of forensics for a while before we went deeper into ontology and tolerant searching by looking at some papers on recent work in the field.
Some of the initial subjects were:
- SANS certificate: It was highly recommended to get certified, but the courses are expensive and usually only realistic as a part of a work situation.
- The field of forensics is huge, even if we focus only on digital forensics. Automation, standardizing and benchmarking of methods are important.
- Being an expert in all domains is impossible: Be an expert on smaller domains.
- It's important to understand how the algorithms work (compared to "push the button" analysis). Dual tool verification is important. Know the weaknesses of the algorithms you use. Redundancy is a method used in safety: Independent hardware and software development.
- The Daubert standard is important: it governs how courts deal with expert witness testimony. It builds on the older Frye standard, which is a "lite version". The Daubert factors are: testing (in field conditions), peer review/publication, known or potential error rate (ideally close to 0), standardization (the method is used in the same way every time) and acceptance in the relevant scientific community (from Frye) (source1, source2)
- There is going to be a talk on the legal framework held by the police department
- How reliable is a digital representation of physical evidence, such as a scanned document?
Ontology
Definition of ontology: the difference between a taxonomy and an ontology is that a taxonomy is hierarchical, while an ontology is a reference-based system (like a wiki).
Forensics Wiki was mentioned as a good source of information.
The file format used is usually XML (is JSON a better choice?). The goal is to represent knowledge. What is knowledge, meaning and understanding? Is the knowledge gathered manually, written by experts, or learned automatically?
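To make the taxonomy/ontology distinction concrete, here is a minimal sketch in Python. All terms and relation names (`examined_by`, `contains`, `identified_by`) are made up for illustration and do not come from the lecture or from any published ontology. The taxonomy is a strict tree where each term has one parent; the ontology is a list of typed references between arbitrary concepts, serialized to JSON as one possible alternative to XML.

```python
import json

# Taxonomy: every term has exactly one parent, which gives a strict hierarchy.
# (Hypothetical terms, for illustration only.)
taxonomy_parent = {
    "digital forensics": "forensics",
    "disk forensics": "digital forensics",
    "network forensics": "digital forensics",
}

# Ontology: typed references between arbitrary concepts, stored as
# (subject, predicate, object) triples. (Hypothetical relations.)
ontology_triples = [
    ("disk image", "examined_by", "disk forensics"),
    ("disk image", "contains", "file system"),
    ("file system", "contains", "file"),
    ("file", "identified_by", "hash value"),
]


def ancestors(term):
    """Walk up the taxonomy tree and return all ancestors of a term."""
    chain = []
    while term in taxonomy_parent:
        term = taxonomy_parent[term]
        chain.append(term)
    return chain


def related(concept):
    """Return all (relation, other concept) pairs that start at a concept."""
    return [(p, o) for s, p, o in ontology_triples if s == concept]


def to_json(triples):
    """Serialize the triples as a JSON array of objects."""
    records = [{"subject": s, "predicate": p, "object": o} for s, p, o in triples]
    return json.dumps(records, indent=2)
```

In the taxonomy you can only move up and down the tree, while in the ontology a concept such as "disk image" can point to any other concept through a named relation.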
"Web ontology language" on w3.org/TR/owl-features
- Paper: Digital Forensics Ontology Framework (Jarle Kittilsen, Katrin Franke, Bernhard Hämmerli)
- Mindmap: digital forensics (Jarle Kittilsen, Katrin Franke, Bernhard Hämmerli)
Search
Fast search is important when searching through a lot of data. We have fast search algorithms like Boyer-Moore, but they only find exact matches of the search string.
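As a rough illustration of why this family of algorithms is fast, here is a sketch of the Horspool simplification of Boyer-Moore (it uses only a bad-character shift table, not the full Boyer-Moore rule set). The function name `horspool_search` is my own. The point to notice is that the search window can jump several bytes at a time instead of moving one byte per step.

```python
def horspool_search(text: bytes, pattern: bytes) -> list:
    """Return the start offsets of every exact occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each byte in the pattern (except the last position),
    # the distance from its last occurrence to the end of the pattern.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Jump based on the text byte aligned with the pattern's last position;
        # bytes that do not occur in the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For example, `horspool_search(b"xxabcxxabc", b"abc")` finds the pattern at offsets 2 and 7. A single changed byte in the text is enough to make the match fail, which is the limitation mentioned above.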
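How do we find similar strings? Use edit distance, which is more general than Hamming distance (Hamming distance only counts differing positions in strings of equal length). Changes between two strings are described in terms of consecutive inserts and deletes: how many characters must be added and removed (an insert and a delete at the same position combine into a replace) in order to change the first string into the other.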
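A minimal sketch of both distances in Python. I am assuming the common Levenshtein convention here, where a replace counts as one operation (rather than as one insert plus one delete); the function names are my own.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count differing positions; only defined for strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of inserts, deletes and replaces."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletes
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]
```

For strings of equal length with only replaced characters the two agree, for example "malware" and "malwar3" have distance 1 in both. Edit distance also handles strings of different length: "kitten" to "sitting" needs two replaces and one insert, so the distance is 3.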
- Paper: Cross-Computer Malware Detection in Digital Forensics
- The first exercise is going to be about search algorithms
From machine learning, self-organizing maps (SOM) can group feature vectors by Euclidean distance. The cluster distance threshold must be calibrated. The goal is to find "outliers" that are possible candidates for malicious behavior.
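A small sketch of the idea in pure Python, under several assumptions of my own: the map is one-dimensional with four nodes, it is trained only on baseline samples that are believed to be normal, and a sample counts as an outlier when its Euclidean distance to the nearest node exceeds a threshold. The function names and all parameter values are hypothetical, not taken from the lecture or the paper.

```python
import math
import random


def train_som(samples, n_nodes=4, epochs=50, lr0=0.5, seed=0):
    """Train a tiny 1-D self-organizing map and return its node vectors."""
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen training sample.
    nodes = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # decays from 1 towards 0
        lr = lr0 * frac                          # learning rate shrinks over time
        radius = max(1.0, (n_nodes / 2) * frac)  # neighbourhood shrinks too
        for x in rng.sample(samples, len(samples)):
            # Best matching unit: the node closest to x in Euclidean distance.
            bmu = min(range(n_nodes), key=lambda k: math.dist(nodes[k], x))
            for k in range(n_nodes):
                # Gaussian neighbourhood over the 1-D grid of node indices.
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                nodes[k] = [w + lr * h * (xi - w) for w, xi in zip(nodes[k], x)]
    return nodes


def quantization_error(nodes, x):
    """Euclidean distance from x to its closest node."""
    return min(math.dist(node, x) for node in nodes)


def find_outliers(nodes, samples, threshold):
    """Return the samples whose distance to the map exceeds the threshold."""
    return [x for x in samples if quantization_error(nodes, x) > threshold]
```

One possible calibration, again only an assumption, is to set the threshold to a multiple of the largest quantization error seen on the baseline data, for example `threshold = 3 * max(quantization_error(nodes, x) for x in baseline)`. Samples far away from everything in the baseline then show up in `find_outliers`, and those are the candidates to inspect by hand.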