The Git Index and Recovering Files
This question on Stack Overflow asks about recovering files that were added
to the Git index using git add but were subsequently removed from the index.
I provided an answer to that direct question, since you can recover these
changes, but I also wanted to dig a little deeper into what happens when you
add something to the index.
When I git add a file, Git adds that file to the object database and places
the object information in the staging area. For example, I can create a new
file and git add it to my staging area, then examine my staging area using
the git ls-files --stage command to see the details (including the object ID)
of what's staged:
% echo "new file" > newfile.txt
% git add newfile.txt
% git ls-files --stage
100644 40ee2647744341be918c15f1d0c5e85de4ddc5ed 0 file.txt
100644 3748764a2c3a132adff709b1a6cd75499c11b966 0 newfile.txt
So this file is a normal git blob at this point, and lives inside the git repository, even though I haven't committed these changes yet:
% ls -Fls .git/objects/37
total 1
1 -r--r--r-- 1 ethomson Administ 26 May 9 09:26 48764a2c3a132adff709b1a6cd75499c11b966
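Rather than listing the objects directory by hand, the staged blob can also be inspected with Git's own plumbing commands. The following is a minimal sketch, assuming a throwaway repository created with mktemp (my own setup, not part of the original walkthrough); it checks that the ID in the index matches what git hash-object computes for the file, and reads the object back with git cat-file:

```shell
# Throwaway repository so the sketch does not touch real work (an assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "new file" > newfile.txt
git add newfile.txt

# Object ID recorded in the index for newfile.txt (second field of --stage output).
staged_id=$(git ls-files --stage newfile.txt | awk '{print $2}')

# hash-object computes the ID Git would assign to the file's current contents.
computed_id=$(git hash-object newfile.txt)

# cat-file -t reports the object's type; cat-file -p prints its contents.
obj_type=$(git cat-file -t "$staged_id")
obj_body=$(git cat-file -p "$staged_id")

echo "$staged_id $obj_type"
echo "$obj_body"
```

The two IDs should agree because the blob is named after its contents, not after the file that produced it.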
Because Git has already created a blob and recorded it in the index, you can stage changes to a file and then keep editing it: only the staged contents will be committed, not the subsequent, unstaged modifications.
If I append some data to this file, it will have both staged changes and unstaged changes:
% echo an addendum >> newfile.txt
% git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
        new file: newfile.txt
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified: newfile.txt
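To see the two versions side by side, the staged copy can be read straight out of the index. This is a small sketch, again assuming a temporary repository of my own; the ":newfile.txt" argument is Git's revision syntax for "the copy of this path in the index":

```shell
# Throwaway repository (an assumption, not the author's working repository).
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "new file" > newfile.txt
git add newfile.txt
echo "an addendum" >> newfile.txt

# ":path" names the copy of the file recorded in the index.
staged=$(git show :newfile.txt)

# The working directory copy includes the unstaged addendum.
working=$(cat newfile.txt)

echo "staged:  $staged"
echo "working: $working"
```

Only the staged copy exists as a blob at this point; the addendum lives solely in the working directory.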
If I were to commit right now, it would only commit the staged changes,
leaving the unstaged changes behind. But what if I forgo these changes
entirely and do a git reset --hard? In that case, my index and working
directory are reset to HEAD, but the staged blob is kept in the object
database, so I can recover the staged version if I ran the reset by mistake.
(The addendum was never added, so there is no blob for it to recover.)
% git reset --hard HEAD
% ls -Fls .git/objects/37
total 1
1 -r--r--r-- 1 ethomson Administ 26 May 9 09:26 48764a2c3a132adff709b1a6cd75499c11b966
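The same check can be scripted without looking inside .git/objects. The sketch below assumes a temporary repository with one placeholder commit (so that HEAD exists) and a made-up commit identity; it records the blob ID before the reset and uses git cat-file -e, which exits successfully only when the object exists:

```shell
# Throwaway repository with one commit so that HEAD exists (an assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "existing" > file.txt
git add file.txt
# The identity below is a placeholder for illustration only.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "initial commit"

echo "new file" > newfile.txt
git add newfile.txt
blob_id=$(git ls-files --stage newfile.txt | awk '{print $2}')

git reset -q --hard HEAD

# The file is gone from the working directory and from the index...
if [ -e newfile.txt ]; then file_state="present"; else file_state="removed"; fi
index_entry=$(git ls-files --stage newfile.txt)

# ...but the blob itself is still in the object database.
if git cat-file -e "$blob_id"; then blob_state="kept"; else blob_state="missing"; fi

echo "working file: $file_state, blob: $blob_state"
```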
Generally, though, I won't know the object ID of the file I've just
misplaced, so I would use the git fsck tool, which checks the integrity of
the Git repository and shows me any objects that are not "reachable", either
because they were part of a commit that is not on a branch anymore, or
because I git added a file and did not commit it. My newfile.txt is one of
these unreachable objects:
% git fsck
Checking object directories: 100% (256/256), done.
dangling blob 3748764a2c3a132adff709b1a6cd75499c11b966
Unfortunately, its filename is not stored in the object database (since identical contents would have the same object regardless of name), so if you have many dangling blobs, you will have to examine each one:
% git show 3748764
new file
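When there are many dangling blobs, examining each one by hand gets tedious. One possible approach, sketched here under my own assumptions (a temporary repository, and a second hypothetical file named notes.txt), is to loop over the IDs that git fsck reports and print the first line of each blob next to its ID:

```shell
# Throwaway repository (an assumption); notes.txt is a hypothetical second file.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Stage two files, then drop them from the index so their blobs dangle.
echo "new file" > newfile.txt
echo "other notes" > notes.txt
git add newfile.txt notes.txt
git rm -q --cached newfile.txt notes.txt

# fsck prints lines like "dangling blob <id>"; keep only the blob IDs,
# then show each ID with the first line of its contents.
summary=$(git fsck 2>/dev/null |
    awk '$1 == "dangling" && $2 == "blob" {print $3}' |
    while read -r id; do
        printf '%s %s\n' "$id" "$(git cat-file -p "$id" | head -n 1)"
    done)

echo "$summary"
```

The first line is only a hint; for binary files or files with similar openings, the full contents would still need a closer look.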
Once I determine which dangling blob I want to recover, I can put it back on
the filesystem by redirecting the output of git show:
% git show 3748764 > newfile.txt
And the file is recovered!
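As an alternative to redirecting each blob individually, git fsck has a --lost-found option that writes every dangling blob's contents to a file under .git/lost-found/other/, named after its object ID. This is a brief sketch in a temporary repository of my own, not something the original answer relied on:

```shell
# Throwaway repository (an assumption, not the author's working repository).
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "new file" > newfile.txt
git add newfile.txt
blob_id=$(git ls-files --stage newfile.txt | awk '{print $2}')

# Simulate the mistake: unstage the file and delete the working copy.
git rm -q --cached newfile.txt
rm newfile.txt

# --lost-found writes each dangling blob's contents into
# .git/lost-found/other/<object id>.
git fsck --lost-found >/dev/null 2>&1

recovered=$(cat ".git/lost-found/other/$blob_id")
echo "$recovered"
```

The recovered files still carry object IDs rather than their original names, so they have to be identified by content and renamed by hand.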