Oct 8 2007, 04:19 PM
Post
#1
|
|
|
Super Member Group: Members Posts: 515 Joined: 29-September 06 Member No.: 16,228 |
Can't really Google what I can't name
I have a massive file of URLs, and I want to make sure each one appears in it only once. I'm not really bothered what the solution is written in, even a JS or PHP script, or just something to Google would be brilliant.
|
|
|
Oct 8 2007, 11:18 PM
Post
#2
|
|
|
Living at the Datacenter Group: [HOSTED] Posts: 696 Joined: 30-June 06 From: Australia Member No.: 14,219 |
So you are looking for duplicate lines in the same file? A very time-consuming method would be to put all the links in a document editor (Notepad or Winword on Windows, OpenOffice or gedit on Linux) and search for each URL; if there is a second copy, remove it.
Many of the duplicate finders on the internet (search Google) deal with whole files, not individual lines.
|
|
|
Oct 9 2007, 04:20 AM
Post
#3
|
|
|
Absolute Newbie Group: Admin Posts: 888 Joined: 20-February 05 From: Indianapolis, Indiana, USA (Midwest) Member No.: 2,714 |
Well, there are a couple of ways to do this.
The first is to write a script that reads the file, puts the contents in an array (separated by line), then does a duplicate value check on the array. Then rewrite the file from the cleaned array.
The second is to copy the contents of your file and paste them into a spreadsheet. Sort the spreadsheet and then evaluate the data after it is sorted. You can manually check whether there are duplicates, as each copy will be right below the original. You could automate the search by using a spreadsheet formula that copies the entry into the next column over only if the entry above it is not the same. Your new column of data would then be free of duplicates, but it might have a hole or two where a duplicate entry wasn't copied over. I use this method frequently for various lists.
Also, instead of copying the entry to the next column when there isn't a match, you could print a message when there is a match in the row above, then just delete the duplicate line.
Hope this helps.
vujsa
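For what it's worth, the script approach above could look roughly like this in Python. This is only a minimal sketch, not a tested tool: the names `dedupe_lines` and `dedupe_file` are made up for illustration, and it assumes one URL per line, treats URLs as duplicates only when they match exactly after trimming whitespace, and drops blank lines.

```python
from pathlib import Path


def dedupe_lines(lines):
    """Return the lines with duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for line in lines:
        url = line.strip()          # ignore leading/trailing whitespace
        if not url or url in seen:  # skip blank lines and repeats
            continue
        seen.add(url)
        unique.append(url)
    return unique


def dedupe_file(src, dst):
    """Read src, write the de-duplicated lines to dst, return how many lines were dropped."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    unique = dedupe_lines(lines)
    Path(dst).write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return len(lines) - len(unique)  # count includes blank lines that were dropped
```

Calling something like `dedupe_file("urls.txt", "urls_clean.txt")` (both file names are placeholders) writes the cleaned list to a new file, so the original stays untouched if anything goes wrong.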
|
|
|
Oct 9 2007, 05:32 AM
Post
#4
|
|
|
Advanced Member Group: Members Posts: 170 Joined: 30-July 07 Member No.: 23,704 |
Quote (vujsa): "The second is to copy the contents of your file and paste them into a spreadsheet. Sort the spreadsheet and then evaluate the data after it is sorted. You can manually check whether there are duplicates, as each copy will be right below the original. You could automate the search by using a spreadsheet formula that copies the entry into the next column over only if the entry above it is not the same. Your new column of data would then be free of duplicates, but it might have a hole or two where a duplicate entry wasn't copied over. I use this method frequently for various lists. Also, instead of copying the entry to the next column when there isn't a match, you could print a message when there is a match in the row above, then just delete the duplicate line."
Actually, this is one way to do it, and it's a great way. If you are using Microsoft Excel, you can use the Advanced Filter function, which filters out all duplicates, so you don't have to look through your data manually. The steps are:
1. Sort all the data.
2. Trim the data so that no entry has spaces at the end.
3. Click Data -> Filter -> Advanced Filter.
4. Under Action, select "Filter the list, in-place".
5. Check "Unique records only".
6. Click OK.
That will filter out all the duplicate records. Hope this helps too. If you are using another spreadsheet, like OpenOffice, you can do the same thing, but I don't know the steps.
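If you would rather not open a spreadsheet at all, the same sort, trim, and compare-with-the-row-above idea can be sketched in a few lines of Python. This is just an illustration of the steps above, not what Excel does internally; the function name `unique_sorted` is made up, and it assumes the entries are plain text lines.

```python
def unique_sorted(entries):
    """Sort the entries, then keep each one only if it differs from the one before it."""
    # Trim trailing spaces first so "url " and "url" count as the same entry,
    # and drop entries that are empty after trimming.
    cleaned = sorted(e.rstrip() for e in entries if e.rstrip())
    result = []
    for entry in cleaned:
        # After sorting, any duplicate sits directly below its original.
        if not result or entry != result[-1]:
            result.append(entry)
    return result
```

Unlike the in-place filter, this returns a new sorted list, so the original order of the entries is lost.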
|
|
|
Oct 10 2007, 04:39 PM
Post
#5
|
|
|
Super Member Group: Members Posts: 515 Joined: 29-September 06 Member No.: 16,228 |
Wow... I've been using Excel in college all term and still didn't know things like that! Done, ty.