Office 2007 has a much nicer document format. The new format is XML
based rather than some proprietary binary format, and unlike the
XML format introduced in Office 2003, this is the real deal, not a format
that only supports a subset of the functionality.
But the new format, DOCX for Word, is itself actually a ZIP file
containing a whole range of other documents. Most of these are XML;
there can be other file types as well, but the XML files are
what it is all about. So ZIP and XML means we should have an easy time
opening and reading these files from our .NET programs. After all, both
are pretty standard and have plenty of tools and libraries available.
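To see the "it is just a ZIP file" point for yourself, here is a minimal sketch that lists the entries of a DOCX with a general-purpose ZIP API. It assumes .NET Framework 4.5 or later (where System.IO.Compression.ZipFile exists) and references to System.IO.Compression.dll and System.IO.Compression.FileSystem.dll. The module name ZipPeek is made up for illustration, and MyDocument.docx is the same sample file name used in the rest of this post.

```vbnet
Imports System
Imports System.IO.Compression

' Hypothetical module name, used only for this illustration.
Module ZipPeek
    Sub Main()
        ' Assumption: .NET Framework 4.5+ with references to
        ' System.IO.Compression and System.IO.Compression.FileSystem.
        ' Open the DOCX as an ordinary ZIP archive (read-only).
        Using archive As ZipArchive = ZipFile.OpenRead("MyDocument.docx")
            For Each entry As ZipArchiveEntry In archive.Entries
                ' FullName is the path inside the archive,
                ' for example word/document.xml.
                Console.WriteLine("{0} ({1} bytes)", entry.FullName, entry.Length)
            Next
        End Using
    End Sub
End Module
```

Any ZIP tool or library would do the same job; this one just happens to ship with later versions of the framework.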
But life is even better than that. Microsoft has decided to add native
support for reading and writing these files in .NET, in the
System.IO.Packaging namespace to be exact. To use it you don't even
have to have Office installed on your machine. Take a look at the
following code for an example:
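
Imports System.IO
Imports System.IO.Packaging
Imports System.Xml

Module Module1
    Sub Main()
        ' Open the existing document read-only; the default overload would
        ' create an empty package if the file did not exist.
        Using doc As Package = Package.Open("MyDocument.docx", FileMode.Open, FileAccess.Read)
            Dim part As PackagePart

            ' List the URI of every part stored in the package.
            Console.WriteLine("===== Document parts. =====")
            For Each part In doc.GetParts()
                Console.WriteLine(part.Uri)
            Next

            Console.WriteLine("Press any key to continue and show the contents.")
            Console.ReadKey()

            ' The main body of a Word document lives in /word/document.xml.
            Console.WriteLine("===== /word/document.xml contents. =====")
            part = doc.GetPart(New Uri("/word/document.xml", UriKind.Relative))

            Using reader As XmlReader = XmlReader.Create(part.GetStream())
                While reader.Read()
                    ' Print text nodes only, so markup does not produce blank lines.
                    If reader.NodeType = XmlNodeType.Text Then
                        Console.WriteLine(reader.Value)
                    End If
                End While
            End Using

            Console.WriteLine("Press any key to end.")
            Console.ReadKey()
        End Using
    End Sub
End Module
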
The only slightly confusing thing is getting the System.IO.Packaging
namespace to resolve. You would tend to look for a
System.IO.Packaging.dll to reference, either in the GAC or on the file system,
but it just isn't there. Instead you need to add a reference to
WindowsBase.dll. It should be in the list of .NET assemblies, but if not you
can find it in the "C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0" folder.
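
With the reference in place you can go a little further than dumping raw text. The following is a rough sketch, not a definitive implementation, that finds the main document part through the package relationships instead of hard-coding /word/document.xml, and prints one line per paragraph. The module name ParagraphDump and the helper PrintParagraphs are made up for illustration. The two URIs are the standard WordprocessingML namespace and the "officeDocument" relationship type. The sketch assumes a plain document where all text sits in w:t elements inside w:p paragraphs.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Packaging
Imports System.Text
Imports System.Xml

' Hypothetical module name, used only for this illustration.
Module ParagraphDump
    ' WordprocessingML main namespace (the "w:" prefix in document.xml).
    Const WordNs As String = _
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    ' Relationship type pointing from the package root to the main document part.
    Const OfficeDocRel As String = _
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"

    Sub Main()
        Using doc As Package = Package.Open("MyDocument.docx", FileMode.Open, FileAccess.Read)
            For Each rel As PackageRelationship In doc.GetRelationshipsByType(OfficeDocRel)
                ' Resolve the relationship target against its source (the package root).
                Dim partUri As Uri = PackUriHelper.ResolvePartUri(rel.SourceUri, rel.TargetUri)
                Dim part As PackagePart = doc.GetPart(partUri)
                Using content As Stream = part.GetStream(FileMode.Open, FileAccess.Read)
                    Using reader As XmlReader = XmlReader.Create(content)
                        PrintParagraphs(reader)
                    End Using
                End Using
            Next
        End Using
    End Sub

    ' Hypothetical helper: collects the text of each w:t element and writes
    ' one console line when the enclosing w:p paragraph ends.
    Private Sub PrintParagraphs(ByVal reader As XmlReader)
        Dim buffer As New StringBuilder()
        While reader.Read()
            If reader.NamespaceURI <> WordNs Then Continue While
            If reader.NodeType = XmlNodeType.Element AndAlso reader.LocalName = "t" Then
                ' ReadString returns the element's text and leaves the reader
                ' on the closing w:t tag, so the next Read() carries on normally.
                buffer.Append(reader.ReadString())
            ElseIf reader.NodeType = XmlNodeType.EndElement AndAlso reader.LocalName = "p" Then
                Console.WriteLine(buffer.ToString())
                buffer.Length = 0
            End If
        End While
    End Sub
End Module
```

Going through the relationship is a design choice rather than a requirement: Word always seems to put the body in /word/document.xml, but the format only promises that the officeDocument relationship points at it.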