
The latest version of Yago, a leading knowledge base

Yago is one of the leading large general-purpose knowledge bases. It was developed by scientists at the Max Planck Institute for Informatics (Saarbrücken, Germany) and researchers at Télécom Paris, including the initiator of the project, Fabian Suchanek. The researchers have now unveiled a new version of Yago, which is based on Wikidata and schema.org. The data is logically consistent, which allows machines to reason over it.

Yago (Yet Another Great Ontology) is a knowledge base, in other words a collection of data about the real world that can be understood by a computer. The knowledge base contains entities, i.e., real-world objects such as cities, organizations, and scientists. Yago contains a total of 50 million entities and two billion facts about them. For instance, Yago knows where a city is located, or which country a politician is president of. This data takes the form of a graph in which the nodes are entities and the edges are the relationships between them.
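To make the graph structure concrete, here is a minimal Python sketch. It assumes that each fact is stored as a (subject, predicate, object) triple. The predicate names `locatedIn` and `worksAt` and the helper functions are invented for illustration and are not Yago's actual vocabulary or code.

```python
from collections import defaultdict

# Illustrative facts as (subject, predicate, object) triples.
# The predicate names are assumptions, not Yago's real vocabulary.
facts = [
    ("Paris", "locatedIn", "France"),
    ("Saarbrücken", "locatedIn", "Germany"),
    ("Fabian_Suchanek", "worksAt", "Télécom_Paris"),
]


def build_graph(triples):
    """Index triples by subject: each node maps to its outgoing labeled edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by an edge labeled `predicate`."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]


graph = build_graph(facts)
print(objects_of(graph, "Paris", "locatedIn"))
```

Each entity is a node, and each fact is a labeled edge leaving that node, so answering "where is this city located?" means following one edge.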

Effective data processing and categorization of information

Yago has always distinguished itself from other knowledge bases by its focus on precision and clarity. Yago uses semantic constraints to filter out any data item that is inconsistent with the other items. In addition, Yago classifies all entities into a taxonomy, i.e., a hierarchical structure of categories.
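The following Python sketch shows one way a taxonomy and a semantic constraint could work together to filter facts. It assumes a simple domain-and-range constraint, which is only one possible kind of semantic constraint. The class names, the `birthPlace` predicate, and the helper functions are invented for illustration and do not reproduce Yago's actual rules.

```python
# Hypothetical taxonomy: each class points to its parent class.
TAXONOMY = {"City": "Place", "Place": "Thing", "Person": "Thing", "Thing": None}

# Hypothetical class assignments for a few entities.
ENTITY_CLASS = {"Marie_Curie": "Person", "Warsaw": "City", "Paris": "City"}

# Hypothetical constraint: the subject of birthPlace must be a Person,
# and the object must be a Place.
CONSTRAINTS = {"birthPlace": ("Person", "Place")}


def is_a(cls, ancestor):
    """Walk up the taxonomy to check whether `cls` is `ancestor` or a subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = TAXONOMY.get(cls)
    return False


def satisfies_constraints(fact):
    """Accept a fact unless it violates the domain/range constraint of its predicate."""
    subject, predicate, obj = fact
    if predicate not in CONSTRAINTS:
        return True
    domain, range_ = CONSTRAINTS[predicate]
    return is_a(ENTITY_CLASS.get(subject), domain) and is_a(ENTITY_CLASS.get(obj), range_)


candidates = [
    ("Marie_Curie", "birthPlace", "Warsaw"),  # consistent: Person -> Place
    ("Paris", "birthPlace", "Marie_Curie"),   # inconsistent: City -> Person
]
clean_facts = [fact for fact in candidates if satisfies_constraints(fact)]
print(clean_facts)
```

Because a city sits below a place in the hierarchy, the first fact passes the check, while the second is discarded since a city cannot have a birthplace.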

Yago has been well received by the scientific community. In 2018, Yago’s co-designers, Fabian Suchanek (professor at Télécom Paris), Gjergji Kasneci (SCHUFA Holding AG), and Gerhard Weikum (Max Planck Institute for Informatics), received the Test of Time Award of The Web Conference, the most prestigious conference in this field. They also won the Prominent Paper Award of the Artificial Intelligence journal, the leading journal in AI. Yago's data is freely available on the Internet, and the code was made open source in 2018.

YAGO 4: far more information and a more detailed taxonomy

Yago’s new version is now based entirely on two new resources: Wikidata and schema.org. Wikidata is an excellent source of entities, and schema.org is one of the most widely recognized taxonomies on the Web. Yago 4 combines the best of both worlds by taking the entities from Wikidata and the taxonomy from schema.org. In addition, the semantic constraints of Yago make the data so clean that it can be used for automated reasoning. With the release of version 4, the knowledge base was transferred to a new website hosted by Télécom Paris. The accompanying scientific article was published at ESWC 2020, the foremost semantic web conference in Europe.
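The idea of taking entities from one source and categories from another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping table is hand-written for the example, the last entity is a made-up placeholder, and the function `assign_schema_class` is hypothetical rather than part of Yago's actual pipeline.

```python
# Illustrative hand-written mapping from Wikidata class identifiers
# to schema.org classes (an assumption for this example).
WIKIDATA_TO_SCHEMA = {
    "Q5": "schema:Person",  # Wikidata class "human"
    "Q515": "schema:City",  # Wikidata class "city"
}

# Entities as (Wikidata identifier, label, Wikidata class identifier).
entities = [
    ("Q42", "Douglas Adams", "Q5"),
    ("Q90", "Paris", "Q515"),
    ("Q-example", "Some entity", "Q-unmapped"),  # made-up placeholder
]


def assign_schema_class(entities, mapping, default="schema:Thing"):
    """Keep each entity and attach the schema.org class that its Wikidata class maps to."""
    return {
        qid: (label, mapping.get(wikidata_class, default))
        for qid, label, wikidata_class in entities
    }


typed_entities = assign_schema_class(entities, WIKIDATA_TO_SCHEMA)
print(typed_entities["Q90"])
```

In this sketch, entities whose class has no counterpart fall back to the top-level class `schema:Thing`. That fallback is a design choice made for the example, not a description of how Yago 4 handles such cases.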