Friday, April 8, 2016

Some raw thoughts on current digital forensics, IT security and data science

Recently I’ve been tasked with writing down some thoughts as discussion ideas and teasers on current digital forensics, IT security and data science. Some of these had been floating around for a long time, more as reactions to events than as a real effort to start a serious discussion.
At first glance digital forensics and data science do not have much in common, especially given how digital forensics is approached and executed today. What is usually not taken into account is that digital forensics is part of both computer science and forensic science, two very different fields. At the moment digital forensics is a new field being incorporated into forensics, so its digital specifics should be recognized and incorporated into the traditional forensic environment.
To start, some definitions should be stated. First we can introduce forensics and digital forensics. Forensics is “The application of scientific knowledge to legal problems” (Merriam-Webster), which includes forensic medicine, physics, chemistry, dentistry, fingerprints, DNA, firearm analysis and accounting, all traditional sciences. On the other hand, for digital forensics we have the first idea of “Forensic Computing” by W. Venema and D. Farmer in the late 1990s: “Gathering and analyzing data in a manner as free from distortion or bias as possible to reconstruct data or what has happened in the past on a system.” When this definition of forensic computing is expanded with digital evidence, we get what is digital forensics in the current sense. Wikipedia, under “Digital forensics and Computer forensics”, defines it as: “Computer forensics, sometimes known as computer forensic science is a branch of digital forensic science pertaining to evidence found in computers and digital storage media. The goal of computer forensics is to examine digital media in a forensically sound manner with the aim of identifying, preserving, recovering, analyzing and presenting facts and opinions about the digital information”. In this context digital evidence or electronic evidence is defined as “any probative information stored or transmitted in digital form that a party to a court case may use at trial.”
To make things more difficult, digital evidence is the key element of digital forensics, which makes it hard to accept in traditional forensics and law, where sound physical evidence is the gold standard. Also, forensic science usually does not deal with large amounts of data but with specific scientific scenarios and analyses that result in limited datasets, which leads to a different sensitivity to, and understanding of, data and computer science.
Even the basic Locard principle on which forensic science is built has its digital twist. Locard’s Exchange Principle states that "Every contact leaves a trace" (Prof. Edmond Locard, c. 1910). It holds perfectly in the digital world: log analysis was one of the first branches of IT security and digital forensics to evolve. One of the key forensic principles is not to change evidence; applied to digital forensics, this means working with read-only copies of the data, with hash signatures providing proof that the data has not been changed. Translated into practical computing, this means that processing can be done in parallel on those copies, limited only by media and processing bandwidth.
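The idea of proving that a working copy has not been changed can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular forensic tool does it: the function names `sha256_of_file` and `copy_is_unchanged` are made up for this post, and SHA-256 is just an assumed choice of hash.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    # Open read-only in binary mode so the copy itself is never modified.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_unchanged(path, recorded_digest):
    """Compare the current digest of a working copy with the digest recorded at acquisition."""
    return sha256_of_file(path) == recorded_digest.lower()
```

The digest is recorded once when the copy is made; any later call to `copy_is_unchanged` with that recorded value shows whether the copy still matches.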
The core problem of digital forensics today is the problem of processing huge volumes of data. To be honest, this is a big unspoken obstacle which is often overlooked and sometimes not understood by digital forensic practitioners and even vendors. Disk sizes have skyrocketed from megabytes to tens of terabytes; the sheer volume of data in which relevant digital evidence is hidden is a huge problem. Just to create a forensic copy of a one-terabyte disk you need at least 3 hours, and this is before any analysis can be done. After that step, the even more time-consuming process of finding and extracting digital evidence starts, and it usually takes much longer; sometimes days are spent on it. This step is the analysis in digital forensics and is conceptually very close to the data mining process.
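To put the 3-hour figure in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes a decimal terabyte (10^12 bytes) and a constant sustained transfer rate, which real disks and interfaces will not deliver exactly; the helper names are made up for illustration.

```python
TB = 10**12  # decimal terabyte in bytes (assumption)


def required_throughput_mb_s(size_bytes, hours):
    """Sustained rate in MB/s needed to copy size_bytes in the given number of hours."""
    return size_bytes / (hours * 3600) / 1e6


def imaging_hours(size_bytes, throughput_mb_s):
    """Hours needed to copy size_bytes at a constant rate given in MB/s."""
    return size_bytes / (throughput_mb_s * 1e6) / 3600


rate = required_throughput_mb_s(1 * TB, 3)
print(f"1 TB in 3 hours needs about {rate:.1f} MB/s sustained")
print(f"10 TB at that rate takes about {imaging_hours(10 * TB, rate):.0f} hours")
```

One terabyte in 3 hours works out to roughly 93 MB/s sustained, and at the same rate a 10 TB disk needs about 30 hours for the copy alone.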
Current mainstream digital forensic tools are not capable of efficient parallelism, automation or scripting, and are limited to Microsoft Windows platforms on Intel architecture, the “general purpose PC paradigm”, which is not the best choice for fast and efficient data processing.
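As a small illustration of what scripted, parallel processing of a read-only image could look like, here is a sketch that hashes fixed-size segments of an image file concurrently. It is an assumption-laden toy, not a description of any existing tool: `hash_segment` and `hash_segments_parallel` are hypothetical names, threads are used for simplicity, and each segment is read fully into memory, so the segment size has to stay modest.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_segment(path, offset, length):
    """Read one segment of the image read-only and return (offset, SHA-256 hex digest)."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read(length)
    return offset, hashlib.sha256(data).hexdigest()


def hash_segments_parallel(path, segment_size, workers=4):
    """Hash every segment of the file in a thread pool; returns {offset: digest}."""
    total_size = os.path.getsize(path)
    offsets = range(0, total_size, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda off: hash_segment(path, off, segment_size), offsets)
        return dict(results)
```

Because every worker only reads, the segments can be processed independently, and the per-segment digests can later be used to recheck just one part of an image instead of the whole disk.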

Current problems and developments in computing make these issues practically unsolvable without using knowledge and experience from other computer science fields, especially from data science. From the data point of view, we can separate digital forensics into two broad categories, classic postmortem forensics and live forensics, depending on whether we are dealing with static data or dynamically changing data. In both situations we have to work with raw data and transform it into meaningful digital evidence. This is even more significant if we are talking about incident response in modern networked systems. We can approach each end node involved in an incident as a data source which has to be collected and analyzed; this is a situation where we have very different types of data, from raw binary disk and memory images to process structures, elaborate log information or a local agent database. At the moment all this data is handled separately, not as part of one picture. To address these issues efficiently, data science knowledge should be used to refine the methods and tools of digital forensics.
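One simple way to picture "one picture" instead of separately handled sources is a common event record and a merged timeline. The sketch below is purely illustrative: the `Event` fields, the source labels and the `merge_timelines` helper are assumptions made up for this post, and it presumes each source has already been parsed into time-sorted events.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # when the event happened
    source: str          # e.g. "disk", "memory", "log" (illustrative labels)
    node: str            # end node the data was collected from
    detail: str          # free-text description of the finding


def merge_timelines(*sources):
    """Merge per-source event lists, each already sorted by time, into one timeline."""
    return list(heapq.merge(*sources, key=lambda event: event.timestamp))


# Hypothetical findings from two different data sources on the same node.
disk_events = [Event(datetime(2016, 4, 8, 10, 0), "disk", "host-a", "file created")]
log_events = [
    Event(datetime(2016, 4, 8, 9, 58), "log", "host-a", "remote login"),
    Event(datetime(2016, 4, 8, 10, 5), "log", "host-a", "logout"),
]

for event in merge_timelines(disk_events, log_events):
    print(event.timestamp.isoformat(), event.source, event.node, event.detail)
```

Once everything is in one structure like this, the usual data science tooling for filtering, grouping and correlating records can be applied across sources rather than per tool.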

11.04.2016 link to draft presentation  for this discussion 
