Content Data Store (CDS) Compression and Enhancement Technique…


We are aggressively adding new features to the Content Data Store (CDS) system. One of them, which I am going to discuss here, is compression (no big data application is complete without it). And what if I told you that in CDS we combine compression with enhancement of visual images and scanned documents?

Our compression technique has two additional features:

  1. Smaller: reduces file size, saving roughly 80% of the space taken by the original image or scanned document.
  2. Clearer: isolates the foreground by identifying the background color, then chooses a small number of representative colors.
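The two steps in the "Clearer" item can be sketched roughly as follows. This is a minimal illustration, not the CDS script: it assumes the background is the most frequent color after dropping the low bits of each channel, treats any pixel farther than a fixed RGB distance from it as foreground, and uses SciPy's `kmeans`/`vq` to pick and assign the representative foreground colors. The names `find_background_color` and `shrink_colors`, and the `threshold`/`num_colors` defaults, are hypothetical choices made for this example.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq


def find_background_color(pixels: np.ndarray, bits: int = 6) -> np.ndarray:
    """Most frequent color after dropping low-order bits (pixels: (N, 3) uint8).

    Coarse quantization groups near-identical paper shades so the page color wins.
    """
    shift = 8 - bits
    coarse = ((pixels >> shift) << shift).astype(np.uint32)
    # Pack R, G, B into one integer per pixel so np.unique can count colors.
    packed = (coarse[:, 0] << 16) | (coarse[:, 1] << 8) | coarse[:, 2]
    values, counts = np.unique(packed, return_counts=True)
    top = int(values[np.argmax(counts)])
    return np.array([(top >> 16) & 0xFF, (top >> 8) & 0xFF, top & 0xFF], dtype=np.uint8)


def shrink_colors(pixels: np.ndarray, num_colors: int = 8, threshold: float = 40.0):
    """Return (palette, labels).

    palette[0] is the background color; palette[1:] are k-means centers of the
    foreground pixels. labels[i] is the palette index assigned to pixels[i].
    """
    background = find_background_color(pixels)
    # Foreground = pixels whose RGB distance from the background exceeds threshold.
    diff = pixels.astype(np.float64) - background.astype(np.float64)
    is_fg = np.sqrt((diff ** 2).sum(axis=1)) > threshold

    labels = np.zeros(len(pixels), dtype=np.uint8)  # 0 = background
    palette = background[np.newaxis, :].astype(np.float64)
    fg = pixels[is_fg].astype(np.float64)
    if len(fg) > 0:
        k = min(num_colors - 1, len(np.unique(fg, axis=0)))
        # Deterministic start: foreground pixels at evenly spaced brightness ranks.
        order = np.argsort(fg.sum(axis=1))
        guess = fg[order[((np.arange(k) + 0.5) * len(fg) / k).astype(int)]]
        # RGB channels share one scale, so we skip whitening here.
        centers, _ = kmeans(fg, guess)
        codes, _ = vq(fg, centers)  # nearest center for each foreground pixel
        labels[is_fg] = codes + 1
        palette = np.vstack([palette, centers])
    return np.clip(np.rint(palette), 0, 255).astype(np.uint8), labels
```

To apply it to a scanned page, flatten the image first (for example `np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)` with PIL) and reshape `labels` back to the page's height and width. In this sketch every background pixel collapses onto palette entry 0, so paper texture and scanner noise vanish; that is where the "clearer" look comes from, and it also makes the result far more compressible.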

Another important feature is performance. We don't use the API provided by Office Lens or similar services; instead, we have a small Python script that calls NumPy, SciPy, and the Python Imaging Library (PIL). As a result, converting megabytes of images or scanned documents is very fast.
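Here is a rough sketch of how PIL (Pillow) could write a reduced page out compactly. The post does not say which output format CDS uses, so this assumes a palette-mode ("P") PNG; `encode_indexed_png` and `encode_rgb_png` are hypothetical helper names. The idea is that once each pixel is only an index into a handful of colors, PNG can store it at 1 to 8 bits per pixel instead of 24 bits of RGB.

```python
import io

import numpy as np
from PIL import Image


def encode_indexed_png(palette: np.ndarray, labels: np.ndarray, width: int, height: int) -> bytes:
    """Store per-pixel palette indices as a palette-mode PNG.

    palette: (K, 3) uint8 colors with K <= 256; labels: (width * height,) indices.
    """
    index_img = Image.fromarray(labels.reshape(height, width).astype(np.uint8))  # mode "L"
    index_img.putpalette(palette.astype(np.uint8).tobytes())  # switches "L" to "P"
    buf = io.BytesIO()
    index_img.save(buf, format="PNG", optimize=True)
    return buf.getvalue()


def encode_rgb_png(rgb: np.ndarray) -> bytes:
    """Baseline for comparison: the same page saved as a full-color PNG."""
    buf = io.BytesIO()
    Image.fromarray(rgb).save(buf, format="PNG", optimize=True)
    return buf.getvalue()
```

Comparing `len(encode_indexed_png(...))` with `len(encode_rgb_png(...))` for the same page gives a quick measure of the savings on your own documents; the actual ratio depends on the scan, so it is worth measuring rather than assuming.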

