May 31, 2007
Extracting Knowledge from Unstructured Data Sources
Still catching up on my WallStrip interviews… an excerpt from this one with Tim Wolters, CTO of Collective Intellect:
“We funnel people into what we call ‘topic nets,’ and topic nets are neighborhoods of individuals that talk about certain topics. So we’re able to assess over time what the credibility of sources are based on the past information they’ve posted about those topics, and then we create sort of a probability distribution model about how likely is this new post by this person to be really good on this topic so you should pay attention to it.”
Sounds fascinating but I’m not sure if it works; I’d need to see it in action.
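For a feel of what "how likely is this new post by this person to be really good on this topic" could mean, here is a minimal Python sketch. The interview does not say how Collective Intellect actually models this. The Beta-style smoothing, the `TopicCredibility` class, and the good/bad labels on past posts are all assumptions made purely for illustration.

```python
from collections import defaultdict


class TopicCredibility:
    """Hypothetical per-(source, topic) track record, smoothed with a Beta(1, 1) prior."""

    def __init__(self, prior_good=1.0, prior_bad=1.0):
        self.prior_good = prior_good
        self.prior_bad = prior_bad
        # (source, topic) -> [number of good posts, number of bad posts]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, source, topic, was_good):
        """Store whether a past post by `source` on `topic` turned out to be good."""
        self.counts[(source, topic)][0 if was_good else 1] += 1

    def probability_good(self, source, topic):
        """Posterior mean probability that the next post on `topic` is good."""
        good, bad = self.counts.get((source, topic), (0, 0))
        total = good + bad + self.prior_good + self.prior_bad
        return (good + self.prior_good) / total


model = TopicCredibility()
for outcome in (True, True, True, False):
    model.record("some_blogger", "fed_policy", outcome)

print(model.probability_good("some_blogger", "fed_policy"))  # 3 good, 1 bad so far
print(model.probability_good("newcomer", "fed_policy"))      # no history: falls back to the prior
```

The smoothing keeps a source with one lucky post from scoring a perfect 1.0, and a source with no history sits at the prior until it builds a record.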
Time: 6:13 pm (UTC+8)
May 31st, 2007 at 6:22 pm
Yeah, they use text mining, and it works all right. The trick is figuring out the correct set of weights to attach to blogs and other sources. You wouldn’t want to have Maoxian.com skew the market much, would you? :)
May 31st, 2007 at 6:37 pm
This could be no more fancy than rankings for Amazon book reviewers based on the percentage of favorable clicks they’ve received.
May 31st, 2007 at 8:18 pm
bjk: I’m sure they gauge clicks and readership too but some people’s opinions carry more weight than others.
I keep telling people the Fed should cut rates but they don’t listen to me. If Bernanke were to make a blog post hinting at it, the markets would react.
These systems can get complex really fast.
June 5th, 2007 at 5:36 am
Hey guys. Yup, our technology works. And Tom is exactly correct that the trick is in how you weight the various sources. We use the concept of “Maven Density” (how many important humans, not bots, link to you and speak highly of you in those links) and “Topic Density” (what % of the time you talk about a topic). Clicks and raw links are a very “’90s” way of tracking things. Of course, I’m their sales guy and not the CTO, so you should probably talk to him.
-DK from Collective Intellect
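To make the two measures in the comment above concrete, here is a rough Python sketch. It follows only the one-line definitions given in the comment. The `InboundLink` record, the sentiment scale, the assumption that mavens and bots have already been identified, and the choice to multiply the two numbers into a single weight are all hypothetical and not Collective Intellect's actual method.

```python
from dataclasses import dataclass


@dataclass
class InboundLink:
    from_source: str
    is_bot: bool
    is_maven: bool    # assumed: the linker was already judged "important"
    sentiment: float  # assumed scale: -1.0 (hostile) to 1.0 (glowing)


def topic_density(post_topics, topic):
    """Fraction of a source's posts that mention `topic`."""
    if not post_topics:
        return 0.0
    return sum(topic in topics for topics in post_topics) / len(post_topics)


def maven_density(links):
    """Number of favorable links that come from important, non-bot sources."""
    return sum(
        1 for link in links
        if link.is_maven and not link.is_bot and link.sentiment > 0
    )


def source_weight(post_topics, links, topic):
    """Assumed combination: topic focus scaled by favorable maven links."""
    return topic_density(post_topics, topic) * maven_density(links)


posts = [{"fed", "rates"}, {"china"}, {"fed"}, {"tech"}]
links = [
    InboundLink("respected_analyst", is_bot=False, is_maven=True, sentiment=0.8),
    InboundLink("spam_aggregator", is_bot=True, is_maven=True, sentiment=0.9),
    InboundLink("grumpy_economist", is_bot=False, is_maven=True, sentiment=-0.5),
]
print(topic_density(posts, "fed"))         # half of the posts mention "fed"
print(maven_density(links))                # only the first link counts
print(source_weight(posts, links, "fed"))
```

Raw link counts would have scored all three inbound links equally. Filtering on who links and how they talk about you is the difference the comment is pointing at.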