Duplicate File Finder
#1


Here's a duplicate file finder Perl script that I've modified a few times. It groups files of the same size and then compares their MD5 hashes to find duplicates. This is very useful if you have copies of music or other files somewhere in the same directory tree under different names: the script will detect those copies, and you can delete them manually to free up space on your hard drive.

I didn't make it remove files automatically, so that you have the option to decide for yourself whether or not to delete them.

Code:
#!/usr/bin/perl -w

use strict;
use File::Find;
use Digest::MD5;

my %files;       # size in bytes => list of file paths with that size
my $wasted = 0;  # total bytes taken up by redundant copies
find(\&check_file, $ARGV[0] || ".");

local $" = "\n";  # print list elements one per line
foreach my $size (sort {$b <=> $a} keys %files) {
  next unless @{$files{$size}} > 1;  # a unique size can't have duplicates
  my %md5;
  foreach my $file (@{$files{$size}}) {
    open(FILE, '<', $file) or next;
    binmode(FILE);
    push @{$md5{Digest::MD5->new->addfile(*FILE)->hexdigest}}, $file;
    close(FILE);
  }
  foreach my $hash (keys %md5) {
    next unless @{$md5{$hash}} > 1;  # only report hashes shared by 2+ files
    print "\n";
    print "\n";
    print "($size bytes) Duplicate Files:\n";
    print "@{$md5{$hash}}\n";
    print "\n";
    $wasted += $size * (@{$md5{$hash}} - 1);
  }
}

# Insert thousands separators into the byte count.
1 while $wasted =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
print "\n";
print "######################################################\n";
print "                                                      \n";
print "  You have $wasted bytes total in duplicate files     \n";
print "                                                      \n";
print "######################################################\n";
print "\n";

# Record every regular file, keyed by its size (stat field 7).
sub check_file {
  -f && push @{$files{(stat(_))[7]}}, $File::Find::name;
}

Put this in the directory that you want to search and run it by its filename from the command prompt. It will compare files across subfolders as well.

Enjoy
Reply
#2
No one active in Perl programming, I'm assuming? Smile Even if you don't understand it, I would recommend using this script to find duplicate files. It narrows the candidates down by file size, then compares their MD5 hashes, so any byte-for-byte copies will be detected. If the MD5 differs, the file isn't the same or has been modified.
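For anyone who doesn't read Perl, the same size-then-hash approach can be sketched in Python (a rough equivalent written for illustration, not the script above; `find_duplicates` and the chunk size are my own choices):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    # Pass 1: group files by size; files of different sizes can't be equal.
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: only hash the size groups that contain more than one file.
    duplicates = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)
        duplicates.extend(g for g in by_hash.values() if len(g) > 1)
    return duplicates
```

Like the Perl version, it reports the duplicate groups and leaves deleting them up to you.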

Enjoy Smile
Reply
#3
This is very interesting. I might actually use this later since I have a ton of images I need to sort through. It's pretty simple for what it does as well. I like.
Reply
#4
You'll find it very useful. I'm surprised there aren't more people with experience in Perl scripting. They're missing out on a lot of good things you can do with the language.
Reply
#5
(02-14-2011, 09:06 PM)Infinity Wrote: You'll find it very useful. I'm surprised there aren't more people with experience in Perl scripting. They're missing out on a lot of good things you can do with the language.

I personally like Ruby, it has the best bits of Perl and Python. Nice useful script though. Bad music in the YT vid. Tongue
Someone with no history is nothing but suspicious.
Reply
#6
(04-30-2011, 03:06 AM)eax Wrote: I personally like Ruby, it has the best bits of Perl and Python. Nice useful script though. Bad music in the YT vid. Tongue

Found it through AudioSwap, or whatever that YouTube feature is called. I'd already gone through the rest of my YouTube playlist from iTunes, so I didn't know what to add. The music is just there so you don't get bored listening to silence; it has no effect on the actual video. You can mute it if you want.
Reply
#7
What to do with that code?
Reply
#8
(05-08-2011, 09:28 AM)Bengan Wrote: What to do with that code?

You put it into a .pl file and run it with your Perl interpreter from the command line. On Windows you'll need a Perl distribution such as ActivePerl installed.
Reply
#9
A script to find and remove duplicate files in one or more directories. The program gains speed by keeping file reads to a minimum: in many cases it reads only small chunks from unique files, and only files with duplicates are read completely.
Reply
#10
(05-14-2011, 11:49 AM)andrewgail Wrote: A script to find and remove duplicate files in one or more directories. The program gains speed by keeping file reads to a minimum: in many cases it reads only small chunks from unique files, and only files with duplicates are read completely.

You don't know what you're talking about, sorry to say.

1) It doesn't remove the duplicate files; you delete them yourself.
2) It's not a compiled program; it's a script that gets interpreted.
3) It doesn't read small chunks of unique files and only full files of duplicates, because it has to hash every size-matched file completely before it can determine which ones are duplicates in the first place.
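For what it's worth, the chunk-based speed-up described in post #9 is a real technique, it's just not what this script does. A hypothetical sketch in Python (names and the 4 KB prefix size are my own choices): hash only the first few kilobytes of each file as a cheap prefilter, and fully hash only files whose prefixes collide:

```python
import hashlib
from collections import defaultdict

CHUNK = 4096  # prefix size for the cheap first pass (arbitrary choice)

def _md5(path, limit=None):
    # Hash the first `limit` bytes, or the whole file when limit is None.
    h = hashlib.md5()
    with open(path, "rb") as f:
        h.update(f.read(limit) if limit else f.read())
    return h.hexdigest()

def duplicates_with_prefilter(paths):
    # Pass 1: group by a hash of just the first CHUNK bytes.
    by_prefix = defaultdict(list)
    for path in paths:
        by_prefix[_md5(path, CHUNK)].append(path)

    # Pass 2: fully hash only files whose prefixes collided.
    groups = []
    for candidates in by_prefix.values():
        if len(candidates) < 2:
            continue  # unique prefix -> this file was only partially read
        by_full = defaultdict(list)
        for path in candidates:
            by_full[_md5(path)].append(path)
        groups.extend(g for g in by_full.values() if len(g) > 1)
    return groups
```

That way unique files cost only one small read, while size- and prefix-matched files pay for a full read. The posted Perl script instead fully hashes every file in a matching size group.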
Reply

