10 years ago

This lecture consisted of two parts: reverse engineering, followed by fuzzy searching (catching up from the first lecture).

Reverse engineering
This lecture was based on the book "Malware Analyst's Cookbook and DVD", focusing on the first chapters.

  • Discussions on defining forensics and forensic readiness (post-mortem analysis vs. preparing in advance).
  • Discussions around anonymity solutions: why we want to be anonymous, Tor, ethical hacking etc. Used for good and used for bad. See freehaven.net/anonbib for papers on anonymity.
  • Cryptographic hash functions and why we need them. How to deal with similar files: similarity-preserving hash functions (fuzzy hashing). Example: ssdeep. Used to detect related malware, self-modifying code and code reuse.
  • Antivirus scanners and online analysis services, for example Anubis for analyzing unknown binaries. Problem: uploaded samples could contain sensitive information.
  • Difficulty of tracking the source, fast-flux technology. Useful resources:
    1. http://www.domaintools.com/research/reverse-ip/
    2. GeoTracking
    3. Wayback Machine: http://archive.org/web/web.php
    4. Tool: nslookup
    5. Tool: whois
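
To illustrate the point about cryptographic hashes and similar files from the list above: changing a single byte gives a completely different digest, which is why plain hashes cannot tell us that two samples are related. The sketch below uses made-up sample bytes, and `difflib.SequenceMatcher` is only a crude stand-in for a similarity score. It is not how ssdeep works (ssdeep uses context-triggered piecewise hashing).

```python
import difflib
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def similarity(a: bytes, b: bytes) -> float:
    """Crude similarity score in [0, 1]; a stand-in, NOT the ssdeep algorithm."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


# Two made-up "samples" that differ in a single byte (assumption for illustration).
sample_a = b"MZ" + b"\x90" * 64 + b"payload v1"
sample_b = b"MZ" + b"\x90" * 64 + b"payload v2"

hash_a = sha256_hex(sample_a)
hash_b = sha256_hex(sample_b)
# Count the hex digits that differ between the two digests.
differing = sum(1 for x, y in zip(hash_a, hash_b) if x != y)

print(hash_a)
print(hash_b)
print("%d of %d hex digits differ" % (differing, len(hash_a)))
print("similarity: %.2f" % similarity(sample_a, sample_b))
```

The digests share almost nothing, while the similarity score stays close to 1. A fuzzy hash tries to give that second kind of answer without having to keep both files around.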

Malware Analyst's Cookbook and DVD: Tools and Techniques for Fighting Malicious Code

  • Ch1: Anonymizing activity: Decloak.net, proxies, IP, user agent, cookies (and Flash cookies), Tor.
  • Ch2: Honeypots: Dionaea, Nepenthes, mwcollectd. Used for discovering new vulnerabilities, trends, statistics and early warning.
  • Ch3: Malware classification: BinDiff GUI, ssdeep (fuzzy hashing to find similar files), ClamAV and YARA (signature scanning engines), PEiD (identify packers with signatures).
  • Ch4: Sandboxes and multi-AV scanners: Online scanning services (VirusTotal, Jotti, NoVirusThanks), NSRL RDS (reference data set).
  • Ch5: Domains and IPs: GeoLite, matplotlib, fast-flux domains technique, reverse DNS visualization, Whois services (Shadowserver).
  • Ch6: Documents, shellcode and URLs: SpiderMonkey (JavaScript), heap spray technique, JSunPack (JavaScript unpacker), OfficeMalScanner, Graphviz: shellcode visualizer (page 193).
  • Ch7: Malware lab setup: Snort, Deep Freeze, redirect HTTP/HTTPS via proxy for sniffing (page 225), Wireshark, simulate the Internet with INetSim.
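
The signature scanning mentioned under Ch3 boils down to searching files for known byte patterns. Below is a toy sketch of that idea only. The `SIGNATURES` table and the `scan` function are hypothetical names made up for this example, and this is not the rule format or API of ClamAV, YARA or PEiD, which support far richer rules (wildcards, offsets, conditions).

```python
from typing import Dict, List, Tuple

# Hypothetical signature table: name -> byte pattern to look for.
SIGNATURES: Dict[str, bytes] = {
    "upx_packer_marker": b"UPX!",
    "eicar_test_string": b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[Tuple[str, int]]:
    """Return (signature name, offset of first occurrence) for each signature found in data."""
    hits = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)
        if offset != -1:
            hits.append((name, offset))
    # Report hits in the order they appear in the data.
    return sorted(hits, key=lambda hit: hit[1])


# Made-up buffer standing in for the contents of a file.
sample = b"MZ" + b"\x00" * 10 + b"UPX!" + b"\x00" * 10
print(scan(sample))
```

Exact byte matching like this breaks as soon as the malware changes a single byte in the pattern, which is one reason fuzzy hashing is listed next to the signature engines.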

See part 2 of reverse engineering.

Fuzzy search
We spent about an hour trying to implement fuzzy search, followed by some tips. The basic idea is to use edit distance and set limits on the number of insertions and deletions (a substitution is a combination of a delete and an insert). There are libraries available to build on, such as python-Levenshtein for Python. I had already implemented Boyer–Moore string search, so I went directly for writing something fuzzy. The code (far from usable):

# coding=utf8
# Naive fuzzy substring search (Python 3): slide over the haystack and count
# how many characters line up with the search criteria.
import os
os.system('clear') # empty the console (Unix-like systems only)

haystack = "A verry long string with lots' of erors in it cr3ated får testing purp0ses"
haystack = haystack.lower()
haystack_length = len(haystack)
search_criteria = "lots of errors"
search_criteria = search_criteria.lower()
search_criteria_length = len(search_criteria)
# How far ahead in the search criteria we may look for the next matching character
search_criteria_window = 0.25
search_criteria_range = int(search_criteria_window * search_criteria_length)
# Fraction of the search criteria that must match to count as a hit
fuzzyness = 0.80
fuzzyness_criteria = int(fuzzyness * search_criteria_length)
match_positions = []

print("Searching in \n\t'%s'\n for \n\t'%s'\n\n" % (haystack, search_criteria))

for haystack_pos in range(haystack_length):
    search_criteria_position = 0
    current_match = 0
    for i in range(search_criteria_length):
        for j in range(search_criteria_range):
            try:
                if haystack[haystack_pos + i] == search_criteria[search_criteria_position + j]:
                    current_match += 1
                    search_criteria_position += 1 + j
                    break
            except IndexError:
                # Ran past the end of the haystack or the search criteria
                continue

    if current_match >= fuzzyness_criteria:
        print("Found a match at position %s\n %s\n\n" % (haystack_pos, haystack[haystack_pos: haystack_pos + search_criteria_length]))
        match_positions.append(haystack_pos)
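
For comparison, here is a minimal sketch of the edit-distance approach described in the tips. It is not the code from the lecture and it does not use python-Levenshtein. It is a plain dynamic-programming version in which insertions, deletions and substitutions each cost 1, and `fuzzy_find` (a name made up for this example) lets a match start anywhere in the haystack, the variant usually attributed to Sellers.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; insert, delete and substitute all cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def fuzzy_find(needle: str, haystack: str, max_distance: int):
    """Return (end_index, distance) for each haystack position where some
    substring ending there is within max_distance edits of needle."""
    # column[i] = best distance between needle[:i] and a substring ending here.
    column = list(range(len(needle) + 1))
    hits = []
    for end, hay_char in enumerate(haystack, start=1):
        new_column = [0]  # an empty needle prefix matches anywhere at no cost
        for i, needle_char in enumerate(needle, start=1):
            new_column.append(min(
                column[i] + 1,                               # extra haystack char
                new_column[i - 1] + 1,                       # missing needle char
                column[i - 1] + (needle_char != hay_char),   # substitute or match
            ))
        column = new_column
        if column[-1] <= max_distance:
            hits.append((end, column[-1]))
    return hits


haystack = "a verry long string with lots' of erors in it cr3ated får testing purp0ses"
for end, distance in fuzzy_find("lots of errors", haystack, max_distance=2):
    print("match ending at %d with distance %d: ...%s" % (end, distance, haystack[max(0, end - 14):end]))
```

The search only reports where a match ends. Recovering the start position would require keeping track of the path through the table, which I left out to keep the sketch short.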