I recently gave a short presentation about scanned documents and EnCase during my most recent EnCase Enterprise Examination training. During practice and discussion we touched on the issues of scanned documents and paper evidence. By definition, paper documents are not digital evidence, but their content and metadata can be part of an investigation. The same applies to picture processing when we have a picture of a document. Basically, you can read the paper evidence and enter it into the case by hand, or use some automation and let the software work for you.
The theory is simple: scan the paper and process the results as digital evidence. In practice plenty of things can be hard. Scan quality is the first one that comes to mind, then language and alphabet support in the OCR software.
Language and alphabet support (localisation) is not a new issue in digital forensics practice; there have always been a lot of problems with non-Latin character sets and non-English languages. The same goes for OCR, where it is actually the most important factor, since it determines how readable the recognised text is. Some forensic packages come with an embedded OCR, but sometimes you have to use an external tool that works better for your chosen language and alphabet.
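To make the language choice concrete, here is a minimal sketch that assumes the external tool is the open-source Tesseract OCR engine (my assumption for illustration, not something tied to any forensic package). The function name build_ocr_command and the language codes used are only examples; Tesseract takes one or more installed language packs through its -l option, joined with a plus sign.

```python
def build_ocr_command(image_path, output_base, languages):
    """Build a Tesseract command line for the given image and languages.

    languages is a list of Tesseract language codes, for example
    ["srp", "eng"]; the matching language data must be installed.
    """
    if not languages:
        raise ValueError("at least one OCR language is required")
    # Tesseract syntax: tesseract <image> <output base> -l lang1+lang2
    return ["tesseract", str(image_path), str(output_base),
            "-l", "+".join(languages)]


if __name__ == "__main__":
    # Only prints the command; running it needs Tesseract installed.
    print(" ".join(build_ocr_command("page_001.png", "page_001", ["srp", "eng"])))
```

The returned list can be handed to subprocess.run when Tesseract is available, which keeps the language decision in one place and easy to change per case.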
There are other issues, especially the ability to automate the process. A fully automated process goes faster, makes fewer mistakes and can be reused, as with any automated solution. Sometimes that means scripting, sometimes using wizards, depending on the OCR you have.
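As a rough idea of what the scripting route could look like, the sketch below plans a batch OCR run over a folder of scans. It is only an illustration under my own assumptions: the OCR is the Tesseract command-line tool, the names plan_batch and IMAGE_SUFFIXES are made up, and the SHA-256 hash of each scan is recorded so the recognised text can later be tied back to the exact image it came from.

```python
import hashlib
from pathlib import Path

# Assumed set of scan formats; adjust to what your scanner produces.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_batch(scan_dir, out_dir, languages="eng"):
    """List one OCR job per scanned image, in file name order."""
    jobs = []
    for image in sorted(Path(scan_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not a scan
        out_base = Path(out_dir) / image.stem
        jobs.append({
            "image": str(image),
            "sha256": sha256_of(image),
            # Tesseract syntax: tesseract <image> <output base> -l <langs>
            "command": ["tesseract", str(image), str(out_base), "-l", languages],
        })
    return jobs
```

Each job's command can then be run with subprocess.run(job["command"], check=True), and the list of image names and hashes saved next to the OCR output as a simple record of what was processed.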
Scanning itself is also worth mentioning. Intuitively it is just putting papers into a scanner, but sometimes it is not so simple. It can also mean taking pictures of a book with a digital camera, which needs a whole setup of stand, lights and cameras, none of it forensic equipment. There is a very interesting blog, DIY Book Scanning, that is worth reading, especially if your lab is on a tight budget.
20.5.2014
A very nice paper, "Optical Character Recognition" by Irene Ferraz, was posted through LinkedIn;
it gives an excellent description of the links between forensics and OCR.
ELO (http://www.eloweb.eu) showing the process of OCR