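function getKeyWords($content){
    // Split on single spaces; every space-delimited chunk becomes a token.
    $tokens = explode(" ", $content);
    // 'num_words' holds the running total of tokens, so it starts at 0.
    $keywords = array("num_words" => 0);
    foreach ($tokens as $word) {
        if (!array_key_exists($word, $keywords)) {
            //echo "word not found! $word\n";
            $keywords[$word] = 1;
        } else {
            $keywords[$word]++;
        }
        // Count every token, whether or not it has been seen before.
        $keywords['num_words']++;
    }
    return $keywords;
}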
Basically, it breaks the content into an array of tokens and counts how many times each one appears. As it stands, anything separated by spaces is treated as a token, so fragments like 'alt="text' will be counted as words. Some modifications are needed for this to work on an HTML document, such as some fancy regular-expression searches =). I will keep posting any updates to this function...
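For anyone who wants to try the HTML idea now, here is a rough sketch of one way it could go. It is not the finished update to the function above, just an assumption about what the regular-expression approach might look like. The name getKeyWordsFromHtml and the separate 'words' sub-array are made up for this example: it strips the tags with strip_tags, decodes entities, and splits on anything that is not a letter, digit, or apostrophe.

```php
<?php
// Hypothetical helper for illustration only; not the original getKeyWords.
function getKeyWordsFromHtml($html) {
    // Put a space before every tag so "<p>a</p><p>b</p>" does not collapse into "ab".
    $spaced = str_replace('<', ' <', $html);
    // Remove the markup, so attributes such as alt="text" never become tokens.
    $text = strip_tags($spaced);
    // Turn entities like &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Split on runs of anything that is not a letter, digit, or apostrophe,
    // and drop the empty strings that repeated separators would produce.
    $tokens = preg_split("/[^\\p{L}\\p{N}']+/u", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // Keep the total apart from the per-word counts so a word that happens
    // to be spelled "num_words" cannot clash with the total.
    return array(
        'num_words' => count($tokens),
        'words'     => array_count_values($tokens),
    );
}
```

Calling getKeyWordsFromHtml('<p>The cat</p><img alt="text"> the <b>Cat</b>') counts "the" and "cat" twice each and never sees the alt attribute. Lowercasing with strtolower only folds ASCII letters, so accented words would need something like mb_strtolower if the mbstring extension is available.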