Without going deep into any theoretical discussion, it is quite obvious that digital forensic tasks are actually very well suited for parallel processing. The key enabler is read-only access to data in most data-intensive operations, but parallelism can be applied elsewhere too, given the nature of the digital forensic process and its data-processing steps:
- acquisition,
- analysis and
- reporting.
The first data-processing step in any digital forensics task is the acquisition of data from a device or medium.
Device acquisition is a serial task, since without live access we have only one channel to the device. Parallelism here is more a question of the device itself than of the forensic tool: if, for example, we have more than one access channel to the device's data and we are in read-only mode, acquisition can be parallel too.
The key element is read-only access to the data.
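To make that concrete, here is a minimal Python sketch of the multi-channel case: several workers read disjoint byte ranges of the same evidence, strictly read-only, in parallel. The image name `evidence.dd` and the segment count are hypothetical, and a plain file stands in for real multi-channel device access, which an actual imager would have to negotiate with the hardware.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor

IMAGE_PATH = "evidence.dd"   # hypothetical raw image, opened read-only
SEGMENTS = 4                 # pretend each worker has its own access channel

def hash_segment(args):
    """Read one disjoint byte range of the image and hash it;
    read-only access means workers cannot corrupt the evidence."""
    offset, length = args
    h = hashlib.sha256()
    with open(IMAGE_PATH, "rb") as f:    # each worker opens its own handle
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(1 << 20, remaining))
            if not chunk:
                break
            h.update(chunk)
            remaining -= len(chunk)
    return offset, h.hexdigest()

if __name__ == "__main__":
    size = os.path.getsize(IMAGE_PATH)
    step = -(-size // SEGMENTS)          # ceiling division
    ranges = [(i * step, min(step, size - i * step)) for i in range(SEGMENTS)]
    with ProcessPoolExecutor(max_workers=SEGMENTS) as pool:
        for offset, digest in pool.map(hash_segment, ranges):
            print(f"segment at offset {offset}: {digest}")
```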
In the other steps, such as analysis, the situation is only slightly different. Analysis is usually data extraction and reconstruction, which finally results in a reduction of data size. The data we use in this step is under read-only access, while the results of processing are written out and possibly read back into the process. These two modes of data access are well separated in the analysis process, so each analysis task can run in parallel with other tasks without corrupting data. In most situations these new results are actually metadata. Such metadata is much smaller than the original data and can be fed back into the analysis cycle if necessary. As shown, parallelism can be used for most of the analysis steps as well.
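As a minimal sketch of that separation, assume extraction has already produced files in a directory (the name `extracted_files` is hypothetical): each analysis task reads the evidence read-only and emits a small metadata record, so any number of such tasks can run side by side.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor

EVIDENCE_DIR = "extracted_files"   # hypothetical output of the extraction step

def analyse_file(path):
    """One independent analysis task: it reads the evidence read-only
    and returns a small metadata record, never touching the source."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "path": path,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "magic": data[:4].hex(),   # crude file-type hint from the header bytes
    }

if __name__ == "__main__":
    files = [os.path.join(EVIDENCE_DIR, name) for name in os.listdir(EVIDENCE_DIR)]
    with ProcessPoolExecutor() as pool:
        metadata = list(pool.map(analyse_file, files))
    # the records are far smaller than the evidence and can be fed
    # back into the analysis cycle for further passes
    print(f"{len(metadata)} metadata records produced")
```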
To illustrate this in more detail, we can discuss two important forensic tasks: indexing and raw search.
Indexing is specific in that it can generate almost the same volume of data as the original data. It is also a highly repetitive task: newly recovered or unlocked documents have to be indexed and their data added to the existing index structure. The operations are very disk-intensive, but again they can be done effectively in parallel, especially if the index structure is stored in a database. It may sound strange, but raw search is very close to the indexing process; in the phase of building the index structure it is in fact the same thing. Simply said, we have to extract raw data from the disk and find in that data the words to be indexed, which is exactly what raw search does. The conclusion is again the same: parallelism can be used here too, and it matters especially for indexing and search, since they require processing huge amounts of data.
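A rough illustration of both tasks at once, again assuming a hypothetical raw image `evidence.dd`: each worker scans one chunk of raw data for words, which is exactly the extraction step shared by raw search and index building, and returns a partial inverted index that is merged at the end.

```python
import os
import re
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

IMAGE_PATH = "evidence.dd"            # hypothetical raw image
CHUNK = 8 * 1024 * 1024               # 8 MiB of raw data per worker task
WORD = re.compile(rb"[A-Za-z]{4,}")   # crude tokenizer for printable words

def index_chunk(offset):
    """Scan one chunk of raw data for words, exactly as a raw search would,
    and return a partial inverted index mapping word -> offsets."""
    partial = defaultdict(list)
    with open(IMAGE_PATH, "rb") as f:
        f.seek(offset)
        data = f.read(CHUNK)
    for match in WORD.finditer(data):
        partial[match.group().lower()].append(offset + match.start())
    return partial

if __name__ == "__main__":
    size = os.path.getsize(IMAGE_PATH)
    index = defaultdict(list)
    with ProcessPoolExecutor() as pool:
        for partial in pool.map(index_chunk, range(0, size, CHUNK)):
            for word, hits in partial.items():   # the merge is serial but cheap
                index[word].extend(hits)
    print(f"{len(index)} distinct words indexed")
```

A real indexer would overlap adjacent chunks slightly so that words crossing a chunk boundary are not lost, and would store the merged structure in a database rather than in memory.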
Forensic data processing usually generates metadata which presents a new logical view of the original data. Good examples are bookmarks, so much loved in digital forensics.
As for report creation, the same approach works again: the report is compiled from data and bookmarks, and the data is not changed in that process, so it can be parallelised too.
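A small sketch of that idea, with hypothetical bookmark records: each report section is rendered from one bookmark in parallel, the evidence is only read, and the cheap merge at the end is the only serial part.

```python
from concurrent.futures import ProcessPoolExecutor

# hypothetical bookmark records produced during analysis
BOOKMARKS = [
    {"title": "Suspicious e-mail", "path": "evidence.dd", "offset": 102400},
    {"title": "Deleted document",  "path": "evidence.dd", "offset": 409600},
]

def render_section(bookmark):
    """Build one report section from a bookmark; the underlying data
    is only read, never modified, so sections render independently."""
    with open(bookmark["path"], "rb") as f:
        f.seek(bookmark["offset"])
        preview = f.read(64)
    return (f"## {bookmark['title']}\n"
            f"offset: {bookmark['offset']}\n"
            f"preview: {preview.hex()}\n")

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        sections = list(pool.map(render_section, BOOKMARKS))
    report = "# Case report\n\n" + "\n".join(sections)   # serial, cheap merge
    print(report)
```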
So what is the conclusion?
Parallelism is highly desirable in digital forensics, but we don't yet have tools which are very effective at using it. This is changing right now, with varying levels of success among vendors.
My opinion is that vendors are landlocked in their tools, and that the real advantage of parallelism lies in forensic tools which can be fully automated, can freely and easily cooperate, can be scripted, and can work in a standardised way on highly parallel computing infrastructure :)
I'll elaborate on this later, when talking about what such tools and systems have to be able to do and what existing knowledge we already have.