Ontology

17 jan. 2013

Ontology reading group blog at hig.no
The paper we read was "NLP Techniques for Term Extraction and Ontology Population"

Ontology is a subtopic we will touch upon in digital forensics and machine learning, so I attended this reading group to get an understanding of what it is. It is related to a previous presentation I attended last year on document image analysis, where Dr. Umapada talked about his work on making machines translate written text into a format digital machines can understand.

Ontology, on the other hand, is derived from the branch of philosophy that studies reality, what is. To be more specific, one could say it is about building a model of language in logical terms: detecting the meaning represented by words as "classes" and the relationships between them. The idea of an apple is related to fruits, food, and perhaps computers in some minds. These associations or links between concepts can be used to infer (deduce) meaning in new contexts. It is important to separate the concept of a thing from an instance of it. Those familiar with object-oriented programming should know the difference: an apple is a concept. An instance of an apple, the red apple lying on the table, has properties (weight, color, shape, the atoms it consists of) that are unique to that instance. We still classify it as an apple.
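
For those who, like me, find the object-oriented analogy the easiest way in, here is a minimal Python sketch of the concept/instance distinction. The class names and properties are of course just made up for illustration.

```python
# Concept vs. instance, in object-oriented terms.
# The classes and attributes are illustrative assumptions, nothing more.

class Fruit:
    """The general concept 'fruit'."""
    pass

class Apple(Fruit):
    """The concept 'apple' -- an is-a relation to Fruit."""
    def __init__(self, color, weight_grams):
        # Properties that vary per instance, not per concept.
        self.color = color
        self.weight_grams = weight_grams

# An instance: the particular red apple lying on the table.
table_apple = Apple(color="red", weight_grams=150)

# We still classify the instance as an Apple (and, by inheritance, a Fruit).
print(isinstance(table_apple, Apple))  # True
print(isinstance(table_apple, Fruit))  # True
```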

Voice recognition software found on smartphones tries to achieve the same goal of translating a human question in natural language into something the computer can answer. These kinds of systems are still very basic, recognizing simple patterns. What we really want in ontology is to create a model of language so that a machine can "understand" the meaning and answer complex questions that were not predefined.

We know that different fields of knowledge use different terminology. A priest and a medical doctor have learned different things in order to practice in their domains. Trying to "catch it all" is still considered too complex, so the focus is usually on a small subset (a domain). If we compare this to how humans learn, we know that children start by learning the most common words we all share in order to function in daily life. I wonder if this is still the best way to tackle the problem, as more complex concepts are built upon simpler ones. Learning what a table and a spoon are might seem easy, while getting a grip on the concept of "an entity" might require something to build on.

One problem mentioned by one of the professors was how to get people to use the same ontology and share the knowledge.
In the paper we read today, the ontology was illustrated as a tree connecting words in a hierarchical structure, with the most general concepts at the root (a toy sketch of such a tree follows below). This structure can be learned (machine learning) or created by "domain experts", humans doing the thinking. I have no belief in humans doing the work; we need to make software that allows the machine to "learn from experience", and create some kind of feedback mechanism that lets the machine know whether what it did was correct or not (or to what degree it was correct).
One question I wonder about is whether a human can learn to read if he/she is not allowed to interact with anything or anyone...
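
As a concrete picture of what such a tree looks like, here is a toy Python sketch with invented concepts. A real domain ontology would of course be far larger, and learned or curated rather than hard-coded.

```python
# A toy ontology as a tree: each concept maps to its subconcepts,
# with the most general concept ("entity") at the root.
# The concepts and structure are invented for illustration only.

ontology = {
    "entity": ["food", "artifact"],
    "food": ["fruit"],
    "fruit": ["apple"],
    "artifact": ["furniture", "computer"],
    "furniture": ["table"],
}

def ancestors(concept, tree=ontology):
    """Walk upwards from a concept to the root by inverting the tree."""
    parent = {child: p for p, children in tree.items() for child in children}
    path = []
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path

print(ancestors("apple"))  # ['fruit', 'food', 'entity']
```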

In order to analyze how well the algorithm maps an ontology in a domain, a template is created by experts, a "gold standard". The generated structure is then compared to it, and a weighting based on how far each node is from its "true position" is calculated, as there is no binary right or wrong.
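
My rough reading of that evaluation, sketched in Python below: for each term, count the tree edges between the parent the algorithm chose and the parent in the gold standard, so a slightly misplaced node is penalized less than a wildly misplaced one. The distance measure and the toy trees are my own simplifications, not the exact metric from the paper.

```python
# Distance-weighted comparison against a gold-standard tree (my simplification).

def tree_distance(a, b, parent):
    """Number of edges between two concepts, given a child -> parent map."""
    def path_to_root(n):
        path = [n]
        while n in parent:
            n = parent[n]
            path.append(n)
        return path
    pa, pb = path_to_root(a), path_to_root(b)
    # Find the lowest common ancestor and add the two distances to it.
    for i, node in enumerate(pa):
        if node in pb:
            return i + pb.index(node)
    return len(pa) + len(pb)  # no common ancestor found

# Hypothetical gold-standard tree (child -> parent).
gold_parent = {"apple": "fruit", "fruit": "food", "food": "entity",
               "table": "furniture", "furniture": "entity"}

# Where the algorithm placed each term vs. where the experts placed it.
placements = {
    "apple": ("fruit", "fruit"),      # correctly placed: distance 0
    "table": ("food", "furniture"),   # misplaced: distance 2
}

for term, (predicted, true) in placements.items():
    print(term, tree_distance(predicted, true, gold_parent))
```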

Similarity of words can be used to find synonyms, and the term "named entity recognition" was used. I did not quite get what it was...
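
As I understand the word-similarity part, the idea is that words appearing in similar contexts end up with similar count vectors, so a high cosine similarity hints at synonymy or at least relatedness. A toy sketch with made-up counts:

```python
# Word similarity from context counts (toy numbers, invented for illustration).

import math

# Hypothetical counts of how often each term co-occurs with some context words.
vectors = {
    "doctor":    {"patient": 8, "hospital": 6, "sermon": 0},
    "physician": {"patient": 7, "hospital": 5, "sermon": 0},
    "priest":    {"patient": 1, "hospital": 0, "sermon": 9},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

print(cosine(vectors["doctor"], vectors["physician"]))  # high -> likely synonyms
print(cosine(vectors["doctor"], vectors["priest"]))     # low  -> different concepts
```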