How to identify duplicate files on Linux

Identifying files that share disk space relies on the fact that such files share the same inode, the data structure that stores all the information about a file except its name and its content. If two or more files have different names, or live in different directories, yet share an inode, they also share content, ownership, permissions and everything else recorded in that inode.
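To check the shared-inode idea directly, you can compare what stat reports for two names. The sketch below defines a hypothetical helper, same_inode (not a standard command), and assumes GNU coreutils stat, where %d prints the device number and %i the inode number. It compares both, because an inode number is only unique within a single file system.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if two path names refer to the same
# inode on the same file system, i.e. they are hard links to one file.
# Assumes GNU coreutils stat: -c FORMAT, %d = device number, %i = inode number.
# stat does not follow symbolic links by default, so a symlink reports its
# own inode rather than the inode of the file it points to.
same_inode() {
    [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}
```

For example, same_inode myfile mytwin && echo "same file" should print same file for a hard-linked pair, while comparing a file with a symbolic link to it fails.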

These files are often referred to as “hard links”, as opposed to symbolic links, which simply point to other files by containing their names. Symbolic links are easy to pick out in a long (ls -l) listing: they show an “l” in the first position and a -> symbol followed by the name of the file being referenced.

$ ls -l my*
-rw-r--r-- 4 shs shs   228 Apr 12 19:37 myfile
lrwxrwxrwx 1 shs shs     6 Apr 15 11:18 myref -> myfile
-rw-r--r-- 4 shs shs   228 Apr 12 19:37 mytwin
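The same distinction can be made in a script. The sketch below defines a hypothetical describe_link helper that reports either the target of a symbolic link or the hard-link count of a regular name. It assumes readlink and GNU coreutils stat, where -c '%h' prints the same hard-link count that appears in the second column of ls -l output.

```shell
#!/bin/sh
# Hypothetical helper: say whether a name is a symbolic link (and what it
# points to) or an ordinary name, and if so how many hard links its inode has.
# Assumes readlink and GNU coreutils stat (-c '%h' = number of hard links).
describe_link() {
    if [ -L "$1" ]; then
        # A symlink is a separate file that just stores the target's name.
        printf '%s: symlink -> %s\n' "$1" "$(readlink -- "$1")"
    else
        # For any other name, report the link count stored in the inode.
        printf '%s: %s link(s)\n' "$1" "$(stat -c '%h' -- "$1")"
    fi
}
```

In a scratch directory where mytwin is a hard link to myfile and myref is a symlink to it, describe_link myref prints myref: symlink -> myfile and describe_link myfile prints myfile: 2 link(s).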

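Identifying hard links is not as obvious. The link count in the second column of the listing above (4 for both myfile and mytwin) tells you that a file has other names, but not what or where they are. Within a single directory, though, the links are still quite easy to find: list the files with ls -i, sort them by inode number, and any hard links will end up next to each other. In this type of ls output, the first column shows the inode numbers.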

$ ls -i | sort -n | more
 786692 never leave home angry
 788000 myfile	<==
 788000 mytwin	<==
 800247 nmap-notes
 801865 Name_Labels.pdf
 920242 NFCU_Docs

Scan the output for repeated inode numbers. Names that share an inode number, like myfile and mytwin here, are hard links to the same file.
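Scanning by eye gets tedious in a large directory. As a sketch, assuming GNU find (-maxdepth and -printf are GNU extensions) and a POSIX awk, the hypothetical dup_inodes function below prints only the groups of names that share an inode, one group per line. It takes names from find rather than parsing ls output, so names containing spaces, such as never leave home angry, come through intact.

```shell
#!/bin/sh
# Hypothetical helper: print groups of names in one directory (default: .)
# that share an inode number, one group per line: "INODE: name1, name2".
# Assumes GNU find (-maxdepth, -printf) and POSIX awk.
# Names containing newlines are not handled.
dup_inodes() {
    # -links +1 keeps only regular files whose link count is above 1;
    # %i is the inode number and %f the name without leading directories.
    find "${1:-.}" -maxdepth 1 -type f -links +1 -printf '%i %f\n' |
        sort -n |
        awk '{
            ino = $1
            # Everything after "INODE " is the name, so spaces survive.
            name = substr($0, length(ino) + 2)
            count[ino]++
            names[ino] = (count[ino] == 1) ? name : names[ino] ", " name
        }
        END {
            # A file whose other links live in other directories shows up
            # only once here, so report only inodes seen more than once.
            for (i in count)
                if (count[i] > 1) print i ": " names[i]
        }' |
        sort -n
}
```

For the directory above, the output should include a line like 788000: myfile, mytwin.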

If, on the other hand, you simply want to know whether one particular file is hard-linked to other files, there’s an easier way than scanning through a list of what may be hundreds of files. The find command’s -samefile option will do the work for you.

$ find . -samefile myfile

Note that the starting location you give the find command determines how much of the file system is scanned for matches. In the example above, find looks in the current directory and all of its subdirectories, and it lists myfile itself along with any other names that share its inode.
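Because hard links cannot span file systems, there is no point searching beyond the file system that holds the file. The sketch below defines a hypothetical find_hard_links helper that, by default, starts at that file system's mount point and uses find's -xdev option to avoid descending into other mounted file systems. It assumes GNU coreutils stat (%m prints the mount point) and a find that supports -samefile, such as GNU find.

```shell
#!/bin/sh
# Hypothetical helper: list every name for FILE's inode.
# Usage: find_hard_links FILE [START_DIR]
# With no START_DIR, the search starts at the mount point of the file system
# holding FILE; -xdev keeps find on that one file system.
# Assumes GNU coreutils stat (%m = mount point) and GNU find (-samefile).
find_hard_links() {
    file=$1
    start=${2:-$(stat -c '%m' -- "$file")}
    # Errors from unreadable directories are discarded.
    find "$start" -xdev -samefile "$file" 2>/dev/null
}
```

Searching from a mount point such as / can take a while, so passing a narrower starting directory, as in find_hard_links myfile ., trades completeness for speed.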

