Matthew Young

M.S. Computer Engineering

Knowledge Representation and Reasoning Group
University of Illinois, Urbana-Champaign

Extracting Knowledge from the Wikipedia

The Wikipedia is, arguably, the single largest source of knowledge on the planet. Commonsense database efforts by the AI community, such as Cyc and OpenMind, simply can't compare in scope and magnitude to this little collaborative project that has become a cultural phenomenon.

As an AI researcher, I find it very tempting to use the Wikipedia as a knowledge base for commonsense reasoning, or as an expert for training algorithms. Unfortunately, unlike Cyc and OpenMind, the knowledge in the Wikipedia is somewhat difficult to work with. It is all encoded in natural language, there is no logical structure defining hierarchies or connections between different topics, and what structure does exist is loosely defined and minimally implemented at best.

This page is the home of my attempts to overcome these barriers and extract useful, machine-readable information from the Wikipedia.

This setting lets you choose whether the knowledge extractor does some additional work on its results, assigning each one a "relevance score" in an attempt to give better results.