As I stated in my Documents as Code post, text formats such as Markdown work well with Git because Git was written to track source code, which is plain text. For the same reason, Git by default cannot show what has changed between two revisions of a binary document.
So, if others write most of their documentation in Microsoft Word or OpenOffice Writer, how can you use git diff to see how the content changes from commit to commit?
First, create a git repository:
    git init binary_diff
    cd binary_diff/
Then, create an .odt document named file.odt and add a simple line of text such as “hello.” Stage the file and commit it to the repo:
    git add file.odt
    git commit -m "Create file.odt with hello"
Now, change the text in the doc to “Hello Solar System.” Add and commit the updated doc:
git commit -am "Update the file.odt file"
Let’s see the git log output:
    git log --oneline
    f14e810 (HEAD -> main) Update the file.odt file
    a2f8e6a Create file.odt with hello
Next, run git diff between the first and last commits. Git reports only that the binary files differ, not what changed:
    git diff a2f8e6a..f14e810
    diff --git a/file.odt b/file.odt
    index e08debd..02d4dce 100644
    Binary files a/file.odt and b/file.odt differ
Not very helpful, huh?
To get readable diffs for these binary files, do the following. First, create a .gitattributes file and add the following:
    *.docx diff=docx
    *.odt diff=odt
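Before going further, it can help to confirm that Git actually picks up these patterns. The sketch below uses git check-attr for that. REPO is a hypothetical variable introduced only for this example: point it at your own repository, or leave it unset and the commands run in a throwaway one.

```shell
# REPO is a hypothetical variable for this example: set it to your repository,
# or leave it unset to try the commands in a scratch directory.
REPO="${REPO:-$(mktemp -d)}"
git init -q "$REPO"   # harmless if the repository already exists

# Append the two patterns shown above (skip this if .gitattributes already has them).
printf '%s\n' '*.docx diff=docx' '*.odt diff=odt' >> "$REPO/.gitattributes"

# Ask Git which diff driver applies to each path; the files need not exist yet.
git -C "$REPO" check-attr diff -- file.odt file.docx
```

If the patterns are in effect, the last command should report the odt driver for file.odt and the docx driver for file.docx.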
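Then, add this to the .git/config file. The textconv setting tells Git to run each version of the file through pandoc, which must be installed, and to diff the resulting plain text: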
    [diff "docx"]
        textconv = pandoc --to=plain
    [diff "odt"]
        textconv = pandoc --to=plain
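If you would rather not edit .git/config by hand, the same settings can be written with git config. This is a minimal sketch, not the only way to do it. REPO is again a hypothetical variable that defaults to a scratch repository, and the pandoc check is only a convenience added for this example.

```shell
# REPO is a hypothetical variable for this example: set it to your repository,
# or leave it unset to try the commands in a scratch directory.
REPO="${REPO:-$(mktemp -d)}"
git init -q "$REPO"   # harmless if the repository already exists

# Equivalent to adding the [diff "docx"] and [diff "odt"] sections by hand.
git -C "$REPO" config diff.docx.textconv "pandoc --to=plain"
git -C "$REPO" config diff.odt.textconv "pandoc --to=plain"

# The converter has to be on PATH for git diff to use it; warn if it is missing.
command -v pandoc >/dev/null || echo "pandoc not found: install it before running git diff" >&2

# Show the textconv entries that were written to .git/config.
git -C "$REPO" config --get-regexp '^diff\..*\.textconv$'
```

Keep in mind that .git/config is local to your clone. The .gitattributes file can be committed and shared, but everyone who wants readable diffs has to add the textconv settings to their own configuration.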
Now, run git diff on the first and last commits again. This time Git shows what changed in the text:
    git diff a2f8e6a..f14e810
    diff --git a/file.odt b/file.odt
    index 02d4dce..e08debd 100644
    --- a/file.odt
    +++ b/file.odt
    @@ -1 +1 @@
    -hello
    +Hello Solar System
You get the same result when you diff .docx files.
This fix enables you to view how the .docx/.odt files have changed between the various commits.
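One optional refinement: pandoc runs every time Git needs to convert a version of a document, which can be slow for long files. Git has a cachetextconv setting for diff drivers that stores the converted text so it can be reused. The sketch below turns it on for both drivers. As before, REPO is a hypothetical variable that defaults to a scratch repository.

```shell
# REPO is a hypothetical variable for this example: set it to your repository,
# or leave it unset to try the commands in a scratch directory.
REPO="${REPO:-$(mktemp -d)}"
git init -q "$REPO"   # harmless if the repository already exists

# Cache converted text so pandoc is not re-run for versions Git has already converted.
git -C "$REPO" config diff.docx.cachetextconv true
git -C "$REPO" config diff.odt.cachetextconv true

# Read the setting back to confirm it was stored.
git -C "$REPO" config --get diff.odt.cachetextconv
```

If you ever want the original “Binary files differ” notice for a single command, git diff accepts the --no-textconv option.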