Ontologies

3.1. Definition

Many definitions of ontology are available in the literature. In our context, we adopt the following one: "an ontology provides the common vocabulary of a specific domain and defines, more or less formally, terms meaning and some of their relationships" (Gomez-Perez, 1999). From a Knowledge Engineering point of view, an ontology is a set of concepts, each represented by a label, together with a set of relations connecting these concepts. The major relationships include the "isa" relationship and the "partOf" relationship.
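As an illustration of this view, the following minimal Python sketch stores an ontology as a set of concept labels plus a child-to-parent map for the "isa" relationship. The class name `Ontology`, its methods, and the sample hierarchy (Professor, Faculty, Employee, Person) are assumptions made for the example only; they are not taken from the SHOE or Thoth ontologies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Concept labels plus a child -> parent map for the "isa" relationship."""

    concepts: set[str] = field(default_factory=set)
    isa: dict[str, str] = field(default_factory=dict)

    def add_isa(self, child: str, parent: str) -> None:
        """Record that `child` "isa" `parent`, registering both labels as concepts."""
        self.concepts.update((child, parent))
        self.isa[child] = parent

    def ancestors(self, label: str) -> list[str]:
        """Return the chain of "isa" parents of a concept, nearest parent first."""
        chain = []
        seen = {label}
        while label in self.isa:
            label = self.isa[label]
            if label in seen:  # stop if the hierarchy accidentally contains a cycle
                break
            seen.add(label)
            chain.append(label)
        return chain


# Hypothetical hierarchy, for illustration only.
onto = Ontology()
onto.add_isa("Professor", "Faculty")
onto.add_isa("Faculty", "Employee")
onto.add_isa("Employee", "Person")
print(onto.ancestors("Professor"))  # ['Faculty', 'Employee', 'Person']
```

A single-parent map is enough for a tree-shaped "isa" hierarchy; other relationships such as "partOf" would need their own maps.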

We have chosen XML to store our results and our ontologies. The SHOE DTD is convenient for our purposes, but it lacks some important information that we present later, so we have extended this DTD accordingly. Our ontology DTD is described in Appendix 1. Currently, we only handle the "isa" relationship, but other relationships can (and should) be used. As an example, an extract of the Thoth's University ontology is shown in Appendix 2.

A major question concerning ontologies is where to find them and how to build them. First, ontologies exist for specialized fields of interest, for example in the SHOE project, in the KA² project, and in the Knowledge Sharing Effort public Library (KSEL). Similarly, general ontologies can be found in Web indexes such as AltaVista or Yahoo! (Labrou and Finin, 1999). Second, a subset of a thesaurus can be used and extended to build ontologies. For Martin (1995), such a thesaurus is a wide linguistic ontology: users should rarely have to add intermediate types, but rather specialize precise types of WordNet in order to express the shades of meaning needed for the application (the disambiguation process described in the next section is then easier). Finally, tools (manual, semi-automatic or automatic) can be used to build ontologies from a set of typical data (KA² project).

3.2. Disambiguating concept labels

In Knowledge Engineering research, a concept is considered unambiguous and unique. However, in usual ontologies, each concept is represented by a single label. This choice is valid for most AI processes but too superficial for linguistically based ones. Indeed, in our context a concept label is used as a natural language entity, that is, a term, and such a term is inherently ambiguous. For example, the term "chair" used as a concept label may denote either the armchair or the faculty member of a University. We therefore disambiguate the labels that represent the concepts of an ontology, using hierarchical heuristics based on the ontology and the WordNet thesaurus (Miller, 1990). After this process, some concept labels may still have more than one related sense (generally between two and four). Since the ontology often represents knowledge of a specialized domain, in most cases the process cannot be completely automatic, and the user has to decide which sense to keep when necessary.

A thesaurus can be viewed as a linguistic ontology (Guarino et al., 1999; Borgo et al., 1997). In WordNet, a "concept", called a sense, is defined by a single set of synonyms, called a synset. Therefore, a concept in a thesaurus is unambiguous. For example, the first sense (sense 1) of the term "Person" is the synset {<"individual#1", "person#1", "human#1", "mortal#1", "soul#2", "somebody#1", "someone#1">} (for each term, its own sense number is given after the '#' symbol). We use WordNet because it is freely available for research purposes and because it is a broad-coverage linguistic ontology (70 000 nodes). However, for text retrieval, it does not include cross-part-of-speech semantic relationships, it makes too many fine-grained sense distinctions, and it lacks domain information (O'Hara et al., 1998).
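The "term#sense" notation can be made concrete with a small in-memory thesaurus. The Python sketch below is only an illustration: the "Person" synset is the one quoted above, whereas the two "chair" synsets are invented placeholders rather than WordNet data, and the function names `build_index` and `candidate_senses` are hypothetical.

```python
# A toy thesaurus: each synset is a set of word senses written "term#n".
SYNSETS = [
    frozenset({"individual#1", "person#1", "human#1", "mortal#1",
               "soul#2", "somebody#1", "someone#1"}),
    frozenset({"chair#1", "seat#1"}),            # placeholder: the piece of furniture
    frozenset({"chair#2", "professorship#1"}),   # placeholder: the faculty position
]


def build_index(synsets):
    """Map every word sense "term#n" to the synset that contains it."""
    index = {}
    for synset in synsets:
        for word_sense in synset:
            index[word_sense] = synset
    return index


def candidate_senses(index, term):
    """Return the sense numbers known for a term, in ascending order."""
    prefix = term.lower() + "#"
    return sorted(int(ws[len(prefix):]) for ws in index if ws.startswith(prefix))


index = build_index(SYNSETS)
print(candidate_senses(index, "chair"))          # [1, 2]
print(index["soul#2"] is index["person#1"])      # True: both belong to one synset
```

A term such as "chair" maps to several synsets, while each word sense maps to exactly one, which is why the synset rather than the label is unambiguous.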

Our main goal is to associate each concept of the ontology with the right synset of the thesaurus (figure 1). An ontology concept has a label, which is only one of its possible lexical forms. This lexical form is used to select the corresponding synsets in the thesaurus; since a term often has several meanings, the thesaurus provides several candidate synsets for each concept label. Among these candidates, we keep only the most relevant ones. To select the relevant synset, we check whether the context of the synset, given by the hypernym relationship in the thesaurus, is similar to the context of the concept, given by the "isa" relationship in the ontology. During this process, we measure the matching degree between a synset and a concept of the ontology. This degree is computed from the result of the matching process, namely the number of related concepts, the type of relationship, the depth of the different relationships, and so on. Our ontology is therefore a terminologically oriented ontology (Martin, 1999), which eases rapid and simple knowledge representation, management, and use.
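The idea of comparing the two contexts can be sketched as follows. This is a deliberately simplified stand-in, not the heuristics used in this work: it assumes the matching degree is the fraction of the concept's "isa" ancestors whose label also appears in the hypernym chain of a candidate synset, and it ignores relationship type and depth. The function names and the sample data are hypothetical.

```python
def matching_degree(concept_ancestors, hypernym_chain):
    """Fraction of the concept's "isa" ancestors found among the synset's hypernyms.

    Simplifying assumption: labels and hypernym terms are compared as lowercase strings.
    """
    if not concept_ancestors:
        return 0.0
    hypernym_terms = {term.lower() for synset in hypernym_chain for term in synset}
    hits = sum(1 for label in concept_ancestors if label.lower() in hypernym_terms)
    return hits / len(concept_ancestors)


def rank_senses(concept_ancestors, candidates):
    """Score each candidate sense, drop those with a zero degree, and sort best first."""
    scored = [(number, matching_degree(concept_ancestors, chain))
              for number, chain in candidates.items()]
    kept = [(number, degree) for number, degree in scored if degree > 0]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical data: a concept "Chair" placed under Faculty -> Employee -> Person.
ancestors = ["Faculty", "Employee", "Person"]
candidates = {
    1: [{"seat"}, {"furniture"}, {"artifact"}],                   # furniture sense
    2: [{"professor"}, {"employee"}, {"person", "individual"}],   # faculty sense
}
print(rank_senses(ancestors, candidates))
```

With this data only sense 2 shares ancestors with the concept, so it is the only candidate kept; when several senses obtain the same degree, the choice is left to the user.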

Figure 1. The disambiguation process

After this disambiguation process, we obtain an XML file. An extract of this file is shown in figure 2 (the matching degree is stored in the "convenience" attribute).
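To give an idea of how such a result could be serialized, the sketch below writes the retained senses of one concept with Python's standard `xml.etree.ElementTree` module. Only the "convenience" attribute name comes from this section; the element names `concept` and `sense` and the attributes `label` and `number` are assumptions that do not necessarily match the DTD of Appendix 1. The sample values are those of the "Dean" concept in Table 6.

```python
import xml.etree.ElementTree as ET


def concept_to_xml(label, senses):
    """Build a <concept> element with one <sense> child per retained sense.

    `senses` is a list of (sense number, matching degree) pairs.
    Element and attribute names other than "convenience" are hypothetical.
    """
    concept = ET.Element("concept", {"label": label})
    for number, degree in senses:
        ET.SubElement(concept, "sense",
                      {"number": str(number), "convenience": str(degree)})
    return concept


element = concept_to_xml("Dean", [(3, 0.75), (1, 0.75), (2, 0.75)])
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```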

Figure 2. An ontology after the automatic step of the disambiguation process

Remark: This entire process (and the final matching in section 5) is implemented using the Java Expert System Shell, called Jess (Friedman-Hill, 2000), which uses a CLIPS-like language. Jess is a portable, extensible, fast reasoning engine written in Sun Microsystems' Java language. It was developed at Sandia National Laboratories (USA) and is distributed free of charge for academic use. Jess has a large, worldwide user community. It is commonly used in agent research, as it provides a convenient way to integrate complex reasoning capabilities into Java-based software. Jess users have developed many software extensions, including ones for fuzzy logic, database access, blackboards, and language understanding. Jess has been used to deploy many real systems, including some based on a multi-agent paradigm.

3.4. Results

We applied this disambiguation process to two versions of the University ontology:

the SHOE University ontology (74 concepts, 195 senses, i.e. 2.26 senses per concept),
the Thoth's University ontology (79 concepts, 199 senses, i.e. 2.52 senses per concept); see Appendix 2.

To evaluate this process, we classify concepts according to their level of disambiguation. Table 5 shows the rate of concepts concerned by each level. Level 0 concerns concepts whose right label sense is not selected. Level 1 concerns concepts whose right label sense is selected. Level 2 concerns concepts whose right label sense is among the best ones with respect to the matching degree. Level 3 concerns concepts whose right label sense has the unique best matching degree. Finally, level 4 concerns concepts for which the right label sense is the only selected one. Levels 1 to 4 are nested: a concept counted at a given level is also counted at the lower ones.
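The level definitions can be restated as a short Python sketch. It assumes that, for each concept, we know the right sense number and the matching degree of every retained sense; the function names are hypothetical, and the sample values are taken from the SHOE column of Table 6.

```python
def disambiguation_level(right_sense, candidates):
    """Return the highest level (0 to 4) reached by a concept.

    `candidates` maps each retained sense number to its matching degree.
    """
    if right_sense not in candidates:
        return 0                      # the right sense was not selected
    if len(candidates) == 1:
        return 4                      # the right sense is the only selected one
    best = max(candidates.values())
    if candidates[right_sense] < best:
        return 1                      # selected, but another sense scores higher
    ties = sum(1 for degree in candidates.values() if degree == best)
    return 3 if ties == 1 else 2      # unique best, or tied with other senses


def level_rates(levels):
    """Percentages as in Table 5: level 0 alone, then concepts at level k or higher."""
    total = len(levels)
    rates = {0: 100 * levels.count(0) / total}
    for k in range(1, 5):
        rates[k] = 100 * sum(1 for level in levels if level >= k) / total
    return rates


# (right sense, {sense number: matching degree}) for five SHOE samples of Table 6.
samples = {
    "Agent": (1, {5: 0.3, 2: 0.3, 4: 0.3, 3: 0.25}),
    "Document": (1, {4: 0.25, 1: 0.17, 3: 0.25, 2: 0.17}),
    "Dean": (1, {3: 0.75, 1: 0.75, 2: 0.75}),
    "Work": (1, {3: 0.25, 1: 0.38, 5: 0.25, 6: 0.25, 2: 0.25}),
    "Publication": (1, {1: 0.38}),
}
levels = {label: disambiguation_level(right, cands)
          for label, (right, cands) in samples.items()}
print(levels)  # {'Agent': 0, 'Document': 1, 'Dean': 2, 'Work': 3, 'Publication': 4}
```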

Table 5. Results of the disambiguation process
Level	The right label sense...	SHOE's University ontology	Thoth's University ontology
0	...is not selected.	1.35%	0%
1	...is selected.	98.65%	100%
2	...belongs to the best ones.	97.30%	100%
3	...is the best one.	74.32%	96.20%
4	...is the only selected sense.	62.16%	67.09%

Given these results, even if the disambiguation process is not complete for a given concept, the user has a good chance of finding the right label sense first (or among the first ones). This chance increases if the ontology fits the thesaurus well. Indeed, the Thoth's ontology was built by substantially modifying the SHOE one according to WordNet. As a result, the disambiguation process is more effective on the Thoth's ontology than on the SHOE one.

Table 6 gives several sample concepts for each level. In this table, a concept is described by its label, its right label sense (after the '#' sign), the number of candidate label senses remaining at the end of the disambiguation process compared with the number of all its possible label senses, and a list giving each remaining label sense number with its matching degree.
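For readers who want to process such entries automatically, the following sketch parses one entry written in this notation, for example "Work #1 - 5/7 (3/0.25 1/0.38 5/0.25 6/0.25 2/0.25)". The regular expression and the function name `parse_entry` are assumptions made for the illustration; they are not part of the Jess implementation.

```python
import re

# label, "#right sense", "-", "remaining/total", then "(sense/degree ...)"
ENTRY = re.compile(
    r"(?P<label>\w+)\s*#(?P<right>\d+)\s*-\s*"
    r"(?P<kept>\d+)/(?P<total>\d+)\s*\((?P<pairs>[^)]*)\)"
)


def parse_entry(text):
    """Parse one concept entry of Table 6 into a dictionary."""
    match = ENTRY.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid entry: {text!r}")
    degrees = {}
    for pair in match["pairs"].split():
        number, degree = pair.split("/")
        degrees[int(number)] = float(degree)
    return {
        "label": match["label"],
        "right_sense": int(match["right"]),
        "kept": int(match["kept"]),
        "total": int(match["total"]),
        "degrees": degrees,
    }


entry = parse_entry("Work #1 - 5/7 (3/0.25 1/0.38 5/0.25 6/0.25 2/0.25)")
print(entry["label"], entry["right_sense"], entry["kept"], entry["total"])
print(entry["degrees"])
```

Checking that the number of parsed pairs equals the first number of the "remaining/total" field is a cheap consistency test on each entry.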

Table 6. Samples of concepts in each level of disambiguation
Level	SHOE's University ontology	Thoth's University ontology
0	Agent #1 - 4/5 (5/0.3 2/0.3 4/0.3 3/0.25)	Empty
1	Document #1 - 4/4 (4/0.25 1/0.17 3/0.25 2/0.17)	Thesis #2 - 2/2 (1/0.38 2/0.38) School #1 - 3/7 (6/0.33 4/0.95 1/0.95)
2	Dean #1 - 3/3 (3/0.75 1/0.75 2/0.75)	The same as level 1
3	Work #1 - 5/7 (3/0.25 1/0.38 5/0.25 6/0.25 2/0.25)	Magazine #2 - 5/6 (6/0.25 4/0.25 1/0.25 5/0.25 2/0.75) Information #1 - 2/5 (1/0.95 5/0.38)
4	Publication #1 - 1/3 (1/0.38) EmailAddress #1 - 1/1 (1/1.0)	Professor #1 - 1/1 (1/1.0) Address #2 - 1/7 (2/0.95)

This process also allows the user to verify and correct the ontology. Looking at the matching results helps us to correct the ontology according to the reference thesaurus. Table 5 shows that modifying the ontology can improve the disambiguation process. Such modifications include adding or removing concepts and changing concept labels or descriptions.