Short Description: Command-Line Corrupt Office 2007 Text Extractor extracts text from corrupted docx, xlsx and pptx files in cases where the respective Office 2007 or 2010 programs cannot recover even the text. It also works on undamaged files.
Long Description 1: Developed by Ccy, author of HaHa Zip, and built on Delphi Zip, Command-Line Corrupt Office 2007 Text Extractor extracts text from corrupted docx, xlsx and pptx files in cases where the respective Office 2007 programs cannot recover even the text. It also works on undamaged files. Docx, xlsx and pptx files are standard zip archives holding mostly XML files. Delphi Zip ignores zip file corruption, allowing access to the XML data despite problems in the zip structure.
Long Description 2: Developed by Ccy, author of HaHa Zip, and built on Delphi Zip, Command-Line Corrupt Office 2007 Text Extractor can often recover text from corrupt Office 2007 docx, xlsx and pptx files when the respective Office 2007 or 2010 programs cannot salvage even the basic text or data.
Office 2007 files in the Office Open XML format are zipped collections of XML files. These files can suffer two kinds of corruption: damage to the zip structure, and damage to the XML files that hold the actual text or data and/or the formatting. The unzipping module used in Office 2007 and 2010 appears to be more finicky than the InfoZip module used by Command-Line Corrupt Office 2007 Text Extractor. As a result, this program can often still extract the underlying XML as raw material even when Office 2007 and 2010 cannot get at it.
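To illustrate the zip-structure side, here is a minimal Python sketch of one common way to read entries from a damaged archive: scan the raw bytes for local file headers (signature PK\x03\x04) instead of relying on the central directory at the end of the file, which is the part a truncated file usually loses. This is an illustration under assumptions, not the code used by Command-Line Corrupt Office 2007 Text Extractor, Delphi Zip or InfoZip; the functions salvage_entries and inflate_partial are hypothetical names.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER_SIG = b"PK\x03\x04"               # zip local file header signature
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # fixed 30-byte part of that header


def inflate_partial(raw: bytes, chunk: int = 1024) -> bytes:
    """Hypothetical helper: inflate a raw deflate stream, keeping whatever
    was decoded before the first error instead of discarding everything."""
    d = zlib.decompressobj(-15)  # negative wbits = raw deflate, as stored in zip entries
    out = []
    for i in range(0, len(raw), chunk):
        try:
            out.append(d.decompress(raw[i:i + chunk]))
        except zlib.error:
            break  # damaged data: stop here but keep the earlier output
        if d.eof:
            break  # end of this entry's stream; the remaining bytes belong to later entries
    return b"".join(out)


def salvage_entries(data: bytes) -> dict:
    """Hypothetical sketch: recover entries by scanning for local file headers
    rather than trusting the central directory at the end of the archive."""
    entries = {}
    pos = data.find(LOCAL_HEADER_SIG)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(data):
        (_sig, _ver, _flags, method, _time, _date, _crc,
         comp_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, pos)
        name_start = pos + LOCAL_HEADER.size
        name = data[name_start:name_start + name_len].decode("utf-8", errors="replace")
        body_start = name_start + name_len + extra_len
        if method == 8:    # deflate: read until the stream ends or breaks
            entries[name] = inflate_partial(data[body_start:])
        elif method == 0:  # stored: trust the size recorded in the header
            entries[name] = data[body_start:body_start + comp_size]
        pos = data.find(LOCAL_HEADER_SIG, pos + 4)
    return entries


if __name__ == "__main__":
    # Build a tiny docx-like archive, then cut off its 22-byte end-of-central-directory record.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("word/document.xml", "<w:t>Hello</w:t>")
    damaged = buf.getvalue()[:-22]
    for name, content in salvage_entries(damaged).items():
        print(name, content)
```

In the demo, Python's own zipfile module refuses the truncated archive, while the header scan still returns word/document.xml. A real tool would also need to handle data descriptors, encryption and false signature matches inside compressed data, which this sketch ignores.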
As for the second type of corruption, XML is by design very unforgiving of file damage. Judging from the errors returned when trying to salvage text from corrupt docx and pptx files, or data from xlsx files, Office 2007 and 2010 appear to use a standard XML parser. Command-Line Corrupt Office 2007 Text Extractor, on the other hand, uses code that is more tolerant of XML errors.
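For the XML side, the sketch below contrasts a standard parser with a more forgiving pattern-based scan. It is an illustration under assumptions, not the program's actual method: Python's xml.etree.ElementTree stands in for the "standard interpreter", and a regular expression over WordprocessingML text runs (w:t elements inside w:p paragraphs) stands in for the tolerant approach. The names strict_text and tolerant_text are hypothetical, and only docx is covered; xlsx and pptx keep their text in other elements.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# A complete <w:t> text run, with or without attributes such as xml:space.
W_T = re.compile(r"<w:t(?:\s[^>]*)?>(.*?)</w:t>", re.DOTALL)


def strict_text(xml: str) -> str:
    """Standard parser: a single well-formedness error aborts the whole document."""
    root = ET.fromstring(xml)  # raises ET.ParseError on damaged XML
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        runs = [t.text or "" for t in p.iter(W_NS + "t")]
        if runs:
            paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def tolerant_text(xml: str) -> str:
    """Hypothetical tolerant scan: keep every complete <w:t> run, paragraph by
    paragraph, and simply skip whatever is malformed or cut off."""
    paragraphs = []
    for chunk in xml.split("</w:p>"):
        runs = W_T.findall(chunk)
        if runs:
            # html.unescape decodes &amp;, &lt; and so on; it is looser than strict XML rules.
            paragraphs.append("".join(unescape(r) for r in runs))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    body = ('<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
            '<w:body><w:p><w:r><w:t>Salvaged &amp; kept</w:t></w:r></w:p>'
            '<w:p><w:r><w:t xml:space="preserve">Second </w:t></w:r>'
            '<w:r><w:t>paragraph</w:t></w:r></w:p><w:p><w:r><w:t>Cut off')
    print(tolerant_text(body))
    try:
        strict_text(body)
    except ET.ParseError as err:
        print("strict parser failed:", err)
```

On a document truncated mid-paragraph, strict_text raises ParseError and yields nothing, while tolerant_text still returns every paragraph before the damage. The trade-off is that a pattern scan can also accept text from markup a real parser would rightly reject.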