Plenary debates of the European Parliament as Linked Open Data
The LinkedEP dataset

The Talk of Europe project curates Linked Open Data about the European Parliament (EP). The dataset covers all plenary debates held in the EP between July 1999 and January 2014, and biographical information about the members of parliament. The dataset includes: information on the monthly sessions of the EP, the agenda of debates, the spoken words and translations thereof in 23 languages; the speakers, their role and the country they represent; membership of national parties, European parties and commissions.


Links

LinkedEP contains links to GeoNames, DBpedia and the official RDF database of the Italian parliament. The European Union Data Portal provides links between Member of Parliament instances in LinkedEP and their named entity resource JRC-Names, available through their SPARQL endpoint.


Enrichments

In the second Talk of Europe Creative Camp, Adam Funk and Wim Peters (University of Sheffield) used their in-house text engineering infrastructure GATE to annotate the speeches with the concepts in them and their degree of occurrence across the proceedings. They also interconnected these concepts based on their semantic relationship. The resulting RDF (n-triples) is available for download here.


Origin of the data

To obtain data about the plenary debates, we generated RDF from the HTML pages published on the official website of the EP. We collaborated with the Political Mashup project by Maarten Marx at the University of Amsterdam, who provided scripts to scrape the HTML pages.


The biographical data about members of parliament come from the Automated Database of the European Parliament of the University of Oslo [Høyland et al., 2009]. We translated this database to RDF, linked it to the debate data, and made it available as Linked Data as part of the LinkedEP dataset.

UPDATES & CHANGES

28 January 2016: We had a film made about the Talk of Europe project! Available on YouTube (5 min.)

26 January 2016: The example SPARQL queries below are now clickable. Click to see the results in the YASGUI SPARQL editor.

23 June 2015: We have had to reset the server but all is up and running again.

15 April 2015: The data are now marked up with provenance information and other metadata using the PROV, VoID and OMV vocabularies.

2 March 2015: Problems with incorrect language tags fixed. On http://europarl.europa.eu/, speeches are sometimes displayed in other languages than the user-selected language. This happens when translations are not available. Until now, this problem persisted in LinkedEP. In the current version, we have fixed the majority of the incorrect language tags of speeches, although some remain.

18 February 2015: The dataset now covers the complete fifth, sixth, and seventh terms (1999-2014) of the European Parliament. Note that the declared prefixes have changed; see the updated model depiction and example queries below.


Access to the data

We provide access in several ways:
  1. For full-text search through the entire LinkedEP dataset, a search box is provided in the upper right corner of this page.
  2. Through a SPARQL endpoint at http://linkedpolitics.ops.few.vu.nl/sparql/
  3. Using the browse and search options of ClioPatria. ClioPatria is the semantic web server that we use to publish our data. The displayed menu bar at the top of this page is the ClioPatria interface. To browse the graphs, go to the Graphs option under the Places tab; to query, go to the Query tab and choose your preferred SPARQL query interface.
  4. By downloading the data in Turtle or RDF/XML, again through ClioPatria. The data are divided over the following graphs:

Alternatively, the data are available as Triple Pattern Fragments at http://data.linkeddatafragments.org/linkedpolitics (thanks to Ruben Verborgh).
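
For programmatic access to the SPARQL endpoint listed under option 2, the following minimal Python sketch may help. It assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests with a `query` parameter and can return results as `application/sparql-results+json`. The function names `build_request` and `run_query` are illustrative and not part of any LinkedEP tooling.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given in the access list above.
ENDPOINT = "http://linkedpolitics.ops.few.vu.nl/sparql/"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request for the given query string."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        endpoint + "?" + params,
        # Assumption: the server honours this standard result media type.
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON result (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 1")` should return a JSON document if the server is reachable and behaves as assumed; `build_request` on its own can be used to inspect the URL without contacting the server.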

The concepts of 'Linked Dataset' and 'named graph' are registered in the CLARIN Component Registry. The CMDI file describing these resources can be found here.


Data model

The schema of classes and properties used in the LinkedEP dataset is displayed in the figure below. For a description see here.

[Figure: RDF model of the Talk of Europe data]


Example queries

RDF can be queried using the SPARQL query language. This data portal implements the latest SPARQL version, 1.1. To search within a single graph (e.g. the German graph), use the GRAPH keyword:

SELECT *
WHERE {
GRAPH <http://purl.org/linkedpolitics/German>{
   ?s ?p ?o.
}} LIMIT 10
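
When the same pattern needs to be run against several graphs, the GRAPH wrapper can be generated rather than typed by hand. The Python sketch below is only an illustration: `graph_query` and `LIST_GRAPHS` are hypothetical names, and the only graph URI taken from this page is the German one. The graph-listing query is standard SPARQL 1.1 but may be slow on a large store.

```python
def graph_query(graph_uri, pattern="?s ?p ?o.", limit=10):
    """Wrap a triple pattern in a GRAPH clause restricted to one named graph."""
    lines = [
        "SELECT *",
        "WHERE {",
        "  GRAPH <" + graph_uri + "> {",
        "    " + pattern,
        "  }",
        "} LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


# Standard SPARQL 1.1 query to discover which named graphs contain triples.
LIST_GRAPHS = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"

print(graph_query("http://purl.org/linkedpolitics/German"))
```

The printed query is equivalent to the example above, with only the layout differing.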


Example query 1: Retrieve 10 texts in English translation from speeches held between 6 May 2009 and 6 May 2010, in chronological order.

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lp: <http://purl.org/linkedpolitics/>
PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
PREFIX xml: <http://www.w3.org/XML/1998/namespace>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?text
WHERE {
   ?sessionday dcterms:hasPart ?agendaitem.
   ?sessionday dc:date ?date.
   ?agendaitem dcterms:hasPart ?speech.
   ?agendaitem lpv:number ?agendaitemnr.
   ?speech lpv:number ?speechnr.
   ?speech lpv:text ?text.

   FILTER ( ?date >= "2009-05-06"^^xsd:date && ?date <= "2010-05-06"^^xsd:date )
   FILTER(langMatches(lang(?text), "en"))
} ORDER BY ?date ?agendaitemnr ?speechnr LIMIT 10
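
If the results of a SELECT query such as this one are requested in the standard SPARQL 1.1 JSON results format, they can be flattened into plain rows with a few lines of Python. This is a sketch under that assumption; `rows_from_results` is a hypothetical helper, and the sample document is hand-written with made-up speech text rather than taken from the dataset.

```python
def rows_from_results(results):
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None.
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


# Hand-written sample in the standard format; the text value is invented.
sample = {
    "head": {"vars": ["text"]},
    "results": {
        "bindings": [
            {"text": {"type": "literal", "xml:lang": "en", "value": "Example speech text."}}
        ]
    },
}

print(rows_from_results(sample))
```

Each row keeps only the lexical value; the language tag and datatype stay available in the original binding if needed.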


Example query 2: For a particular agenda item (the fourth item of 16 December 2010), find the frequency distribution of the speaking slots over the EU parties of the speakers involved.


SELECT ?partyname (COUNT(DISTINCT ?speech) AS ?speechno)
WHERE {
   <http://purl.org/linkedpolitics/eu/plenary/2010-12-16_AgendaItem_4> dcterms:hasPart ?speech.
   ?speech lpv:spokenAs ?function.
   ?function lpv:institution ?party.
   ?party rdf:type lpv:EUParty.
   ?party lpv:acronym ?partyname.
} GROUP BY ?partyname


Example query 3: Count the agenda items in which at least one MEP from France spoke.

SELECT (COUNT (DISTINCT ?ai) as ?count)
WHERE {
   ?ai rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/AgendaItem>.
   ?ai dcterms:hasPart ?speech.
   ?speech lpv:speaker ?speaker.
   ?speaker lpv:countryOfRepresentation ?country.
   ?country rdfs:label ?label.  
   filter(?label="France"@en)  
}


Example query 4:
Get the transcript of (an arbitrary selection of 10) speeches that contain the word "agriculture".

SELECT DISTINCT ?text
WHERE {
   ?speech lpv:text ?text.
   FILTER regex(str(?text), "agriculture").  
} LIMIT 10
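
The regex filter above is case-sensitive and treats the search term as a regular expression. As a variation, the Python sketch below builds a similar query for an arbitrary keyword using the SPARQL 1.1 functions CONTAINS and LCASE instead of regex, so the term is matched literally and without regard to case. The helper names `escape_literal` and `keyword_query` are hypothetical; the `lpv:text` property and prefix are taken from the examples on this page.

```python
def escape_literal(value):
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def keyword_query(word, limit=10):
    """Build a query for speech texts containing the word, ignoring case."""
    needle = escape_literal(word.lower())
    return "\n".join([
        "PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>",
        "SELECT DISTINCT ?text",
        "WHERE {",
        "   ?speech lpv:text ?text.",
        # Literal substring match on the lower-cased text, not a regex.
        '   FILTER CONTAINS(LCASE(STR(?text)), "' + needle + '")',
        "} LIMIT " + str(int(limit)),
    ])


print(keyword_query("Agriculture"))
```

Because the filter does not check language tags, the query may also return translations in which the same string happens to occur.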


Example query 5:
Get the number of speeches held in each language.

SELECT DISTINCT ?language (COUNT(DISTINCT ?speech) AS ?speechno)
WHERE {
   ?speech dc:language ?language.
} GROUP BY ?language


License & citations

The LinkedEP dataset is available under a CC BY 4.0 license. To acknowledge us, please cite us as: A.E. van Aggelen, L. Hollink. Plenary debates of the European Parliament as Linked Open Data. http://www.talkofeurope.eu/data/. Website accessed on [fill in date].

The choice to use a license was motivated as follows. Our dataset contains not just public data but also external data, notably the Automated Database of the European Parliament from the University of Oslo. Its makers provide no licensing information, and we do not wish to decide in their place to forgo a license. For us, the creators of the Linkedpolitics dataset, the citations allow us to justify the resources spent on this project.

References

Bjørn Høyland, Indraneel Sircar, Simon Hix. An Automated Database of the European Parliament. European Union Politics, 2009, Vol. 10, Issue 1, pp. 143-152.