I needed to delete all but one copy of each duplicated file from a bunch of experiments, and I didn't care about file names. Here's the bash; it works.
#!/bin/bash
# -type f restricts the search to regular files; mapfile keeps paths with spaces intact
mapfile -t list < <(find ./DIRECTORY_CONTAINING_FILES -type f)
declare -a sums;
cnt=${#list[@]}
echo "creating md5sum list"
# hash every file, including the last one, so it can take part in the compare below
for ((x = 0; x < cnt; x++))
do
sums[$x]=$(md5sum "${list[$x]}" | cut -d ' ' -f 1)
#echo ${sums[$x]}
progress=$(echo "scale=2;($x/$cnt)*100" | bc)
echo -ne "progress $progress %\r"
done
echo "doing compare"
for ((x = 0; x < $cnt -1; x++))
do
for ((y = x+1; y < $cnt; y++))
do
if [ "${sums[$x]}" == "${sums[$y]}" ];then
#echo ${sums[$x]} " - " ${sums[$y]}
if [ "${list[$x]}" != "${list[$y]}" ]; then
#remove '#' and combine next 2 lines to enable
printf 'Delete file\n%s\n%s\n' "${list[$y]}" "${list[$x]}"
# && rm -f "${list[$y]}"
fi
fi
done
done
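The pairwise compare above checks every checksum against every other one. As a rough sketch of the same idea in a single pass, assuming bash 4+ (for associative arrays) and GNU find, sort and md5sum, each checksum can be remembered the first time it is seen. The function name list_duplicates is made up for illustration and is not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print every file under "$1" whose content
# matches a file that was already seen earlier in the walk.
list_duplicates() {
    local dir=$1 sum file
    declare -A seen    # md5 checksum -> first path found with that content
    while IFS= read -r -d '' file; do
        # Read via stdin so odd file names cannot alter md5sum's output format.
        sum=$(md5sum < "$file" | cut -d ' ' -f 1)
        if [[ -n ${seen[$sum]+set} ]]; then
            printf '%s\n' "$file"      # duplicate of ${seen[$sum]}
        else
            seen[$sum]=$file           # first copy, keep it
        fi
    done < <(find "$dir" -type f -print0 | sort -z)
}
```

Calling list_duplicates ./DIRECTORY_CONTAINING_FILES only prints the duplicates. Like the echo in the original script, that leaves room to check the list before removing anything.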
If you didn't care about the filenames, could you have just renamed all the files to their md5sum? Duplicates would then overwrite each other, leaving one copy.
for f in *
do
mv -- "$f" "$(md5sum -- "$f" | cut -f1 -d" ")"
done
ha! Your bash is consistently better than mine. HOW DO YOU DO IT?!