DBPedia

From Freephile Wiki

The DBpedia project aims to interconnect the world's open data sets. (There are other similar projects, such as Freebase.)

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.

One main dataset that DBpedia "translates" into RDF is the Wikipedia data. Although the free-form content of Wikipedia is not structured data, the extensive use of 'infoboxes' gives structure (and visual styling) to the content. This content can then be mapped into actual ontologies, if it is not already semantically mapped in Wikipedia itself [1].

One such infobox is the one for software. It is used extensively on the English Wikipedia, for instance on the article for GIMP.

DBpedia has already mapped the Infobox_software template so that all this data is contained in DBpedia.
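As a rough illustration of what "asking queries against Wikipedia" can look like, the sketch below builds a SPARQL query and a request URL for DBpedia's public endpoint at https://dbpedia.org/sparql. It assumes that software articles are typed as `dbo:Software` and that the endpoint accepts a `format` parameter; the function names are hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def build_software_query(limit: int = 10) -> str:
    """Return a SPARQL query listing software resources with English labels.

    Assumption: resources extracted from Infobox_software are typed as
    dbo:Software in the DBpedia ontology.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?software ?label WHERE {{
  ?software a dbo:Software ;
            rdfs:label ?label .
  FILTER (lang(?label) = "en")
}}
LIMIT {int(limit)}
""".strip()


def build_query_url(query: str) -> str:
    """Encode the query as a GET URL (assumes the endpoint honors 'format')."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{DBPEDIA_SPARQL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; fetching it is left to the caller.
    print(build_query_url(build_software_query(5)))
```

The resulting URL can be opened in a browser or fetched with any HTTP client to inspect the JSON results.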

Goal

Make the Free Software Directory (FSD) part of the (semantically) linked web of open data: Linked Data for short, also called LOD or the "LOD cloud".

InfoBox Approach

The FSD could incorporate the Infobox_software template to gain the ability to link the FSD dataset into DBpedia. In other words, the FSD form should incorporate the Infobox_software template as a subset of the data points that go into an FSD listing.

Debian Packaging System

The Debian Package Tracking System produces RDF metadata and is already included in DBpedia. For example, here is a Turtle representation of the GIMP package: https://packages.qa.debian.org/g/gimp.ttl

If not all Debian packages are in the FSD, the missing ones could be added by consuming their RDF. If we incorporate their data systematically, then our data can easily be updated and synchronized by bot.
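A bot that synchronizes against Debian would first need to know which packages are missing and where their RDF lives. The following is a minimal sketch, not an existing tool: `pts_turtle_url` and `missing_from_fsd` are hypothetical helpers, the URL pattern is generalized from the GIMP example above, and the four-character prefix for `lib*` packages is an assumption based on Debian's usual directory layout.

```python
PTS_BASE = "https://packages.qa.debian.org"


def pts_turtle_url(source_package: str) -> str:
    """Build the Turtle URL for a Debian source package.

    Assumption: the path prefix is the first letter of the package name,
    or the first four characters for names starting with "lib".
    """
    if not source_package:
        raise ValueError("package name must not be empty")
    if source_package.startswith("lib"):
        prefix = source_package[:4]
    else:
        prefix = source_package[:1]
    return f"{PTS_BASE}/{prefix}/{source_package}.ttl"


def missing_from_fsd(debian_packages, fsd_entries):
    """Return Debian package names with no FSD entry (case-insensitive match)."""
    known = {name.lower() for name in fsd_entries}
    return sorted(p for p in debian_packages if p.lower() not in known)


if __name__ == "__main__":
    # Illustrative names only; a real bot would load both lists from live data.
    for package in missing_from_fsd(["gimp", "inkscape"], ["GIMP"]):
        print(package, pts_turtle_url(package))
```

Matching on lowercase names is a simplification; in practice package names and FSD entry titles will not always line up, so a real bot would need a mapping table or manual review.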

WikiData

Wikidata, a project of the Wikimedia Foundation, is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia.

The FSD should integrate with Wikidata, not just through reciprocal links but through genuinely compatible data sharing.

For example, here is the entry for GIMP: https://www.wikidata.org/wiki/Q8038. Notice that the Wikidata entry contains a property for the corresponding link in the FSD.

Also note that one of the ways Wikidata is composed and curated is through bots such as FLOSSbot. See https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/FLOSS. The FSD should likewise make use of bots to patrol and edit the directory. A plugin for FLOSSbot was used to browse the FSD and add several hundred entries to Wikidata.
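To show how a bot might read the FSD link from a Wikidata entry, here is a small sketch that extracts the values of one property from Wikidata's entity JSON. The property ID `P2537` ("Free Software Directory entry") is an assumption that should be verified on Wikidata, the function names are hypothetical, and the sample document is hand-written in the shape of the entity JSON rather than fetched from the site.

```python
# Assumption: P2537 is the Wikidata property for the FSD entry; verify before use.
FSD_PROPERTY = "P2537"


def entity_data_url(qid: str) -> str:
    """URL of the JSON export for a Wikidata item, e.g. Q8038 for GIMP."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fsd_entries(entity_json: dict, qid: str, prop: str = FSD_PROPERTY) -> list:
    """Collect the values of `prop` from an entity JSON document."""
    claims = entity_json.get("entities", {}).get(qid, {}).get("claims", {})
    values = []
    for statement in claims.get(prop, []):
        snak = statement.get("mainsnak", {})
        # Skip "novalue" and "somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            values.append(snak["datavalue"]["value"])
    return values


if __name__ == "__main__":
    # Hand-written sample in the shape of the entity JSON (not real fetched data).
    sample = {
        "entities": {
            "Q8038": {
                "claims": {
                    "P2537": [
                        {
                            "mainsnak": {
                                "snaktype": "value",
                                "property": "P2537",
                                "datavalue": {"value": "GIMP", "type": "string"},
                            }
                        }
                    ]
                }
            }
        }
    }
    print(entity_data_url("Q8038"))
    print(fsd_entries(sample, "Q8038"))
```

Passing the property ID as a parameter keeps the helper usable for other identifiers, such as a Debian package name, if the corresponding properties exist.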

Benefits

When the data is machine readable, there are many more ways to represent and consume it.

The FSD also becomes part of the LOD Cloud: https://en.wikipedia.org/wiki/File:LOD_Cloud_2014.svg

Being machine readable should make the directory more visible and useful.

It also puts emphasis on the quality of the data.

Status

What is the current status?

Questions

These are areas of ambiguity, or notes for action and follow-up.

Do we use any semantic markup today in the forms?
Yes (see Template:Entry). However, it appears to be a proprietary vocabulary, meaning it is not a standard schema.
Is this an initiative that the FSF supports?
Unknown

Vocabulary / Schema

Is there a currently adopted schema for describing software?
schema.org defines a SoftwareApplication type and several more specific variants.
Is that used in either Wikipedia's template, or in the final mapping at DBpedia?
Unknown
What is the schema used by Debian, and how does that map to our needs?
Short answer: they use ADMS.SW (Asset Description Metadata Schema for Software). See Olivier Berger's blog for more detail.
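For comparison with the FSD's own vocabulary, an entry described with the schema.org software type might look roughly like the JSON-LD produced below. This is only a sketch: the helper name `software_jsonld` and the choice of fields are hypothetical, the sample values are illustrative, and nothing here reflects how Template:Entry currently works.

```python
import json


def software_jsonld(name, url, license_url, category=None):
    """Serialize a minimal schema.org SoftwareApplication description as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "license": license_url,
    }
    # applicationCategory is optional, so include it only when provided.
    if category:
        doc["applicationCategory"] = category
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Illustrative values for a single entry.
    print(
        software_jsonld(
            name="GIMP",
            url="https://www.gimp.org/",
            license_url="https://www.gnu.org/licenses/gpl-3.0.html",
            category="MultimediaApplication",
        )
    )
```

A mapping to ADMS.SW would need different terms, so the field list would have to be reconciled with whatever vocabulary the FSD finally adopts.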

Requirements

The FSD would almost certainly have to be upgraded to take advantage of the improved semantic capabilities of MediaWiki: the currently installed version is 1.20, whereas the latest available version at the time of this writing is 1.29.0-wmf.7. See the wiki report.


References

  1. Page Forms is often used to make it easy for users to enter the data, which is then mapped into a semantic template.