Well, for better search engine optimization, I created an AUTO-KEYWORD generating script. This was the procedure I followed:
Read the entire contents.
Converted it into a string.
Removed the HTML tags and the whitespace characters.
Filtered out only valid words without special characters (only the hyphen is permitted).
Once all the filtering was done, general words like "the", "a" etc. were removed.
After that, the string was converted into an array.
Using the return value of the following function, I calculated the frequency of the words.
[array_count_values() returns an array using the values of the input array as keys and their frequency in input as values. ]
Then the top 15 words were selected, converted into a string and printed out.
They were embedded in BOLD characters automatically!
Now the **** thing that happened was that all the spelling mistakes, things like LOL and OK, and hundreds of different words started coming up in the top 20. Any suggestions on how I can get only valid, meaningful English words as the keywords?
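The procedure described in this post can be sketched roughly as follows. This is not the original script (which is not shown in the thread and, judging by array_count_values(), was written in PHP); it is a minimal Python sketch in which `collections.Counter` stands in for array_count_values(). The names `top_keywords`, `as_bold` and the contents of `STOP_WORDS` are assumptions made for illustration.

```python
import html
import re
from collections import Counter

# Hypothetical stop-word list; the post only names "the" and "a".
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def top_keywords(page, limit=15):
    """Return up to `limit` of the most frequent words in an HTML page."""
    text = re.sub(r"<[^>]+>", " ", page)   # crude removal of HTML tags
    text = html.unescape(text).lower()     # decode entities, normalise case
    # Approximation of "valid words": runs of letters, hyphens allowed inside.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
    words = [w for w in words if w not in STOP_WORDS]
    counts = Counter(words)                # word -> frequency
    return [word for word, _ in counts.most_common(limit)]


def as_bold(keywords):
    """Join the keywords into one string, each wrapped in <b> tags."""
    return ", ".join(f"<b>{w}</b>" for w in keywords)
```

Nothing in this sketch checks whether a word is real English, so misspellings and chat abbreviations rank as high as any other frequent word, which is exactly the problem described above.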
Auto-KeyWord Generating Script.
Started by OpaQue, Sep 11 2004 02:17 AM
5 replies to this topic
#2
Posted 11 September 2004 - 10:30 AM
Hm, sounds difficult. I can only think of two ways right now. The first is to store all the "bad" words and get the engine to exclude them. The second is to have a dictionary and get the engine to validate words based on whether they can be found in the dictionary.
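Both ideas can be sketched in a few lines. This is only an illustration in Python, not code from the thread; `filter_keywords`, `load_dictionary` and the word-list file format (one word per line) are assumptions.

```python
def filter_keywords(words, bad_words=frozenset(), dictionary=None):
    """Drop words on the bad-word list and, if a dictionary is given,
    drop words that are not found in it."""
    kept = []
    for word in words:
        w = word.lower()
        if w in bad_words:
            continue                      # way 1: explicit exclusion list
        if dictionary is not None and w not in dictionary:
            continue                      # way 2: must be a dictionary word
        kept.append(word)
    return kept


def load_dictionary(path):
    """Load a word list (assumed: one word per line) into a set."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}
```

For example, `filter_keywords(["lol", "script", "diffrent"], bad_words={"lol"}, dictionary={"script", "keyword"})` keeps only `"script"`. With the dictionary held in a set, each lookup is cheap; most of the cost is in loading the word list in the first place.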
#3
Posted 07 March 2005 - 08:25 AM
Considering that most of the abbreviations that people use are three letters and there are few words that people will actually type into a search engine that are three letters, you could block all words that are three letters or shorter. This is not the best idea because people will want to search for things like "cat" or "dog." You can definitely block words that are one or two letters, because those are only inconsequential words. You can also have a vowel/consonant checker, and if a word contains no vowels, no consonants, three or more vowels in a row, or three or more consonants in a row, block it. I may be able to help more if you explain to me what the purpose of displaying the top 15 searched-for words is; sometimes understanding the reasoning leads to interesting solutions.
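The length rule and the vowel/consonant checker could look something like the sketch below. It is written in Python for illustration; `looks_like_word` and its `min_length` parameter are made-up names, and treating "y" as a consonant is an assumption.

```python
import re

VOWEL = "[aeiou]"
CONSONANT = "[b-df-hj-np-tv-z]"   # every letter except a, e, i, o, u


def looks_like_word(word, min_length=3):
    """Apply the length rule and the vowel/consonant rules from the post."""
    w = word.lower()
    if len(w) < min_length:
        return False                       # one- and two-letter words
    if not re.search(VOWEL, w) or not re.search(CONSONANT, w):
        return False                       # no vowels, or no consonants
    if re.search(VOWEL + "{3,}", w) or re.search(CONSONANT + "{3,}", w):
        return False                       # three or more of a kind in a row
    return True
```

Trying it on a few words shows the trade-off. "ok" and "brb" are blocked, but "lol" passes because it alternates consonant and vowel. Ordinary words such as "script", "string" and "search" are blocked by the three-consonants rule, so that rule probably needs loosening before real use.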
#4
Posted 26 March 2005 - 10:41 PM
I don't use that script now.
The main reason is that it puts a lot of load on the server.
Removing words below 4 characters is a good idea.
Comparing results with a dictionary is not a good idea. It puts load on the server again :-(
Also, many times the word which is repeated most is not necessarily relevant to the topic. I now use the topic titles as "Page titles" and also as headings. The title is repeated again on the page to improve keyword density.
#6
Posted 29 March 2005 - 03:07 AM
I would recommend excluding all words under 4 characters.
It's a good idea, but as you said, it puts a lot of load on the server. Maybe use a table: each post is parsed for keywords as it is added, and each keyword is looked up in the table. If the row exists, add to its count; if it doesn't, insert a new row.
Then use the caching feature to cache the top words in the forum. That would mean the load would be minimal, as the query is added to a query already in use.
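A rough sketch of the table idea, using Python's built-in sqlite3 module purely for illustration. The table name `keyword_counts`, the function names and the in-process `_cache` dict are all assumptions; the dict is only a stand-in for the forum software's own caching feature, which is not described in the thread.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_counts ("
        "word TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    return conn


def record_post(conn, words):
    """Count the keywords of one post: bump existing rows, insert new ones."""
    for word in words:
        cur = conn.execute(
            "UPDATE keyword_counts SET hits = hits + 1 WHERE word = ?", (word,)
        )
        if cur.rowcount == 0:              # no row yet for this word
            conn.execute(
                "INSERT INTO keyword_counts (word, hits) VALUES (?, 1)", (word,)
            )
    conn.commit()


_cache = {}                                # stand-in for the forum's cache


def top_words(conn, limit=15, refresh=False):
    """Return the top words, querying the table only when asked to refresh."""
    if refresh or limit not in _cache:
        rows = conn.execute(
            "SELECT word FROM keyword_counts ORDER BY hits DESC, word LIMIT ?",
            (limit,),
        ).fetchall()
        _cache[limit] = [row[0] for row in rows]
    return _cache[limit]
```

The counting work is paid once, when a post is saved, and page views read the cached list. The cached list goes stale until something calls `top_words(conn, refresh=True)`, for example on a timer or after every so many new posts.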