> Auto-KeyWord Generating Script.
OpaQue
post Sep 11 2004, 02:17 AM
Post #1


Administrator

Group: Admin
Posts: 458
Joined: 26-August 04
Member No.: 1



Well, for better search engine optimization, I created an auto-keyword generating script. This is the procedure I followed:

Read the entire contents.
Converted it into a string.
Removed the HTML tags and the whitespace characters.
Filtered out everything except valid words without special characters (only the hyphen is permitted).
Once all the filtering was done, general words like "the", "a", etc. were removed.
After that, the string was converted into an array.
Using the following function, I calculated the frequency of the words.
[array_count_values() returns an array using the values of the input array as keys and their frequency in the input as values.]
Then the top 15 words were selected, converted into a string, and printed out.
They were embedded in BOLD characters automatically!
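
The steps above could be sketched roughly as follows. This is a Python illustration, not the original script (which presumably was PHP, given array_count_values()). The names `STOP_WORDS`, `top_keywords`, and `as_bold` are made up for the example, and `collections.Counter` stands in for array_count_values().

```python
import re
from collections import Counter
from html import unescape

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def top_keywords(html, limit=15):
    """Return the `limit` most frequent words found in an HTML string."""
    # Replace tags with spaces and decode entities such as &amp;.
    text = unescape(re.sub(r"<[^>]+>", " ", html))
    # Keep alphabetic words only; a hyphen is allowed inside a word.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    # Drop general words like "the" and "a".
    words = [w for w in words if w not in STOP_WORDS]
    # Counter maps each word to its frequency, like array_count_values().
    counts = Counter(words)
    return [word for word, _ in counts.most_common(limit)]

def as_bold(keywords):
    """Join the keywords into one string, each wrapped in <b> tags."""
    return ", ".join(f"<b>{w}</b>" for w in keywords)
```

Calling `as_bold(top_keywords(page_html))` would give the bold keyword string. Nothing in this sketch checks whether a word is real English, which is exactly the problem described next.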

Now the **** thing that happened was that all the spelling mistakes, things like LOL and OK, and hundreds of different words started coming up in the top 20. Any suggestions on how I can make only valid, meaningful English words show up as the keywords?
dissipate
post Sep 11 2004, 10:30 AM
Post #2


Advanced Member

Group: [HOSTED]
Posts: 120
Joined: 2-September 04
Member No.: 100



Hm, sounds difficult. Only two ways I can think of right now: the first is to store all the "bad" words and get the engine to exclude them. The second is to have a dictionary and get the engine to validate words based on whether they can be found in the dictionary.
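
Both ideas fit in one small filter. A minimal Python sketch, assuming the candidate words have already been extracted; `filter_keywords`, `load_dictionary`, and the one-word-per-line file format are assumptions made for illustration.

```python
def load_dictionary(path):
    """Read a word list (one word per line) into a set for fast lookups."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}

def filter_keywords(words, blocked=frozenset(), dictionary=None):
    """Drop blocked words and, if a dictionary is given, unknown words."""
    kept = []
    for word in words:
        w = word.lower()
        if w in blocked:
            continue  # first approach: explicit "bad word" list
        if dictionary is not None and w not in dictionary:
            continue  # second approach: must appear in the dictionary
        kept.append(word)
    return kept
```

Because the dictionary is a set held in memory, each lookup is a single hash check; the main cost is loading the word list once.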
vizskywalker
post Mar 7 2005, 08:25 AM
Post #3


Techno-Necromancer

Group: Members
Posts: 1,018
Joined: 13-January 05
From: The Net
Member No.: 2,127



Considering that most of the abbreviations that people use are three letters, and there are few words that people will actually type into a search engine that are three letters, you could block all words that are three letters or shorter. This is not the best idea, because people will want to search for things like "cat" or "dog." You can definitely block words that are one or two letters, because those are only inconsequential words. You can also have a vowel/consonant checker, and if a word contains no vowels, no consonants, three or more vowels in a row, or three or more consonants in a row, block it. I may be able to help more if you explain to me what the purpose of displaying the top 15 searched-for words is; sometimes understanding the reasoning leads to interesting solutions.
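
The length and vowel/consonant rules can be written as a single check. A rough Python sketch; the name `looks_plausible` is made up, and treating "y" as a consonant is an assumption.

```python
import re

VOWELS = "aeiou"  # assumption: "y" counts as a consonant
# Three or more vowels in a row, or three or more consonants in a row.
_RUN = re.compile(r"[aeiou]{3,}|[^aeiou]{3,}")

def looks_plausible(word, min_length=3):
    """Apply the length and vowel/consonant rules to a single word."""
    w = word.lower()
    # Block one- and two-letter words and anything that is not plain a-z.
    if len(w) < min_length or not (w.isascii() and w.isalpha()):
        return False
    has_vowel = any(c in VOWELS for c in w)
    has_consonant = any(c not in VOWELS for c in w)
    if not (has_vowel and has_consonant):
        return False
    # Block runs of three or more vowels or consonants.
    return _RUN.search(w) is None
```

With these rules "ok" and "brb" are blocked, but "lol" still passes, and real words such as "script" (three consonants in a row) are blocked too, so the check is only a coarse filter.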
OpaQue
post Mar 26 2005, 10:41 PM
Post #4


Administrator

Group: Admin
Posts: 458
Joined: 26-August 04
Member No.: 1



I don't use that script now.

The main reason is that it puts a lot of load on the server.

Removing words below 4 characters is a good idea.

Comparing results with a dictionary is not a good idea; it puts load on the server again :-(

Also, quite often the word that is repeated most is not necessarily relevant to the topic. I now use the topic titles as page titles and also as headings. The title is repeated again on the page to improve keyword density :)
vizskywalker
post Mar 26 2005, 10:46 PM
Post #5


Techno-Necromancer

Group: Members
Posts: 1,018
Joined: 13-January 05
From: The Net
Member No.: 2,127



So it was for the astahost and trap17 sites. I noticed the changes in a little bit, and I think they are good ones.
Muddyboy
post Mar 29 2005, 03:07 AM
Post #6


Member [ Level 1 ]

Group: Members
Posts: 33
Joined: 29-March 05
Member No.: 3,343



I would recommend excluding all words under 4 characters.

It's a good idea, but as you said, it puts a lot of load on the server. Maybe use a table: each post is parsed for keywords, which are then added to the table. If the word already exists, add to its count; if it doesn't, insert a new row.

Then use the caching feature to cache the top words in the forum. That would mean the load would be minimal, as the query is added to a query already in use :)
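
The table idea might look something like this. It is a Python sketch using the standard `sqlite3` module; the table name `keyword_counts`, the helper names, and the in-process `_cache` dictionary are all assumptions standing in for the forum's own database and caching feature.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the database and create the counting table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_counts ("
        "word TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    return conn

def record_post(conn, words):
    """Add to the count of each known word; insert a row for new words."""
    for word in words:
        cur = conn.execute(
            "UPDATE keyword_counts SET hits = hits + 1 WHERE word = ?", (word,)
        )
        if cur.rowcount == 0:  # no existing row was updated
            conn.execute(
                "INSERT INTO keyword_counts (word, hits) VALUES (?, 1)", (word,)
            )
    conn.commit()

_cache = {}  # hypothetical stand-in for the forum's cache, keyed by limit

def top_words(conn, limit=15, refresh=False):
    """Return the cached top words, querying only when asked to refresh."""
    if refresh or limit not in _cache:
        rows = conn.execute(
            "SELECT word FROM keyword_counts ORDER BY hits DESC, word LIMIT ?",
            (limit,),
        ).fetchall()
        _cache[limit] = [row[0] for row in rows]
    return _cache[limit]
```

Each new post costs only a few small writes, and page views read the cached list, which can be refreshed on a schedule instead of on every request.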