Plenary debates of the European Parliament as Linked Open Data

The LinkedEP dataset

The Talk of Europe project curates Linked Open Data about the European Parliament (EP). The dataset covers all plenary debates held in the EP between July 1999 and July 2017, and biographical information about the members of parliament. The dataset includes: information on the monthly sessions of the EP, the agenda of debates, the spoken words and translations thereof in 23 languages; the speakers, their role and the country they represent; membership of national parties, European parties and commissions.


Links

LinkedEP contains links to GeoNames, DBpedia and the official RDF database of the Italian parliament. The European Union Data Portal provides links between Member of Parliament instances in LinkedEP and the JRC-Names named entity resource, available through the portal's SPARQL endpoint.


Enrichments

In the second Talk of Europe Creative Camp, Adam Funk and Wim Peters (University of Sheffield) used their in-house text engineering infrastructure GATE to annotate the speeches with the concepts in them and their degree of occurrence across the proceedings. They also interconnected these concepts based on their semantic relationship. The resulting RDF (n-triples) is available for download here.


Origin of the data

To obtain data about the plenary debates, we generated RDF from the HTML pages published on the official website of the EP. We collaborated with the Political Mashup project by Maarten Marx at the University of Amsterdam, who provided scripts to scrape the HTML pages.

The biographical data about members of parliament come from the Automated Database of the European Parliament of the University of Oslo [Høyland et al., 2009]. We translated this database to RDF, linked it to the debate data, and made it available as Linked Data as part of the LinkedEP dataset.

Updates & changes

July 2017: Major update. Added data up to July 2017, fixed many known and reported bugs. Note that all lpv:number triples have been removed as the numbering was found to be influenced by small changes in the source data.

6 December 2016: The paper about the Talk of Europe dataset has been published: Astrid van Aggelen, Laura Hollink, Max Kemman, Martijn Kleppe, Henri Beunders. The debates of the European Parliament as Linked Open Data. Semantic Web 8(2), pp. 271-281, 2017, IOS Press.

28 January 2016: We had a film made about the Talk of Europe project! Available on YouTube (5 min.)

26 January 2016: The example SPARQL queries below are now clickable. Click to see the results in the YASGUI SPARQL editor.

23 June 2015: We have had to reset the server but all is up and running again.

15 April 2015: The data are now marked up with provenance information and other metadata using the PROV, VoID and OMV vocabularies.

2 March 2015: Problems with incorrect language tags fixed. On http://europarl.europa.eu/, speeches are sometimes displayed in a language other than the one the user selected. This happens when translations are not available. Until now, this problem persisted in LinkedEP. In the current version, we have fixed the majority of the incorrect language tags of speeches, although some remain.

18 Feb 2015: The dataset now covers the complete fifth, sixth, and seventh term (1999-2014) of the European Parliament. Note that the declared prefixes have changed, see the updated model depiction and example queries below.


Access to the data

We provide access in several ways:
  1. Through HTTP-resolvable URIs, see void:exampleResource for some typical example entry points.
  2. For full-text search through the entire LinkedEP dataset, a search box is provided in the upper right corner of this page.
  3. Through a SPARQL endpoint at http://linkedpolitics.ops.few.vu.nl/sparql/
  4. Using the browse and search options of ClioPatria. ClioPatria is the semantic web server that we use to publish our data. The displayed menu bar at the top of the query page is the ClioPatria interface. To browse the graphs, go to the Graphs option under the Places tab; to query, go to the Query tab and choose your preferred SPARQL query interface.
  5. By downloading the data in Turtle (2.5 GB, gzipped tar file).
Alternatively, the data is available as Triple Pattern Fragments at http://data.linkeddatafragments.org/linkedpolitics (thanks to Ruben Verborgh).
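
As an illustration of access option 3, the sketch below builds a SPARQL 1.1 Protocol GET request for the endpoint listed above, using only the Python standard library. It constructs the request without sending it. The helper name build_sparql_request is hypothetical, and whether the server honours the requested JSON result format is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as given in the access list above.
ENDPOINT = "http://linkedpolitics.ops.few.vu.nl/sparql/"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Hypothetical helper for illustration; not part of the LinkedEP tooling.
    """
    # The protocol carries the query text in the URL-encoded `query` parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    # Assumption: the server honours this Accept header.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

To actually run the query, the request object can be passed to urllib.request.urlopen and the response body read as text.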

The concepts of 'Linked Dataset' and 'named graph' are registered in the CLARIN Component Registry. The CMDI file describing these resources can be found here.


Data model

The schema of classes and properties used in the LinkedEP dataset is displayed in the figure below. For a description see here.

[Figure: RDF model of the Talk of Europe dataset]


Example queries

RDF can be queried using the SPARQL query language. This data portal implements SPARQL version 1.1.
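
SPARQL 1.1 defines a standard JSON format for SELECT results, with the variable names under head.vars and one object per result row under results.bindings. The sketch below flattens such a document into plain Python dictionaries. The function name and the sample document are hypothetical and made up for illustration; the sample is not actual LinkedEP output.

```python
import json


def rows_from_sparql_json(document: str) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into plain dicts."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so skip them.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hypothetical response, invented for illustration only.
SAMPLE = """
{
  "head": {"vars": ["date", "speechnr"]},
  "results": {"bindings": [
    {"date": {"type": "literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#date",
              "value": "2009-05-06"},
     "speechnr": {"type": "literal", "value": "example-1"}}
  ]}
}
"""

print(rows_from_sparql_json(SAMPLE))
```

Only the lexical value of each binding is kept here; datatype and language-tag information is dropped, which is enough for quick inspection but not for typed processing.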

Example query 1:

Select at most 100 English spoken texts in a given date range, ordered by date, agenda item and speech. (click to run)
SELECT ?date ?speechnr ?text 
WHERE { 
   ?sessionday rdf:type lpv_eu:SessionDay .
   ?sessionday dcterms:date ?date.	
   ?sessionday dcterms:hasPart ?agendaitem.
   ?agendaitem dcterms:hasPart ?speech.
  
   ?speech lpv:docno ?speechnr.
   ?speech lpv:spokenText ?text.
   FILTER ( ?date >= "2009-05-06"^^xsd:date && ?date <= "2010-05-06"^^xsd:date ) 
   FILTER(langMatches(lang(?text), "en"))
  
  } ORDER BY ?date ?speechnr LIMIT 100
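
The langMatches filter in the query above compares the language tag of each text against the range "en" using basic filtering as described in RFC 4647, so tags with a regional subtag such as "en-GB" are matched as well as plain "en". The following sketch mirrors that matching rule in Python; the function name lang_matches is hypothetical and the code is an illustration of the rule, not the endpoint's implementation.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic language-range filtering, as used by SPARQL's langMatches."""
    # Language tags and ranges are compared case-insensitively.
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        # The wildcard matches any literal that has a language tag at all.
        return tag != ""
    # Either an exact match, or the range followed by a "-" subtag separator.
    return tag == lang_range or tag.startswith(lang_range + "-")


for tag in ["en", "en-GB", "eng", "fr", ""]:
    print(repr(tag), lang_matches(tag, "en"))
```

Note that the match is prefix-based on whole subtags: "eng" does not match the range "en", and a tag "en" does not match the narrower range "en-GB".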

Example query 2:

For a particular agenda item (the fourth item of 16 December 2010), find the frequency distribution of the speaking slots over the EU parties of the speakers involved. (click to run)
SELECT ?partyname (COUNT(DISTINCT ?speech) AS ?speechno)
WHERE {
   <http://purl.org/linkedpolitics/eu/plenary/2010-12-16_AgendaItem_4> dcterms:hasPart ?speech.
   ?speech lpv:spokenAs ?function.
   ?function lpv:institution ?party.
   ?party rdf:type lpv:EUParty.
   ?party lpv:acronym ?partyname.
} GROUP BY ?partyname

Example query 3:

Count the agenda items in which at least one MEP from France spoke out. (click to run)
SELECT (COUNT (DISTINCT ?ai) as ?count)
WHERE {
   ?ai rdf:type lpv_eu:AgendaItem .   # assumed class; the object of this triple is missing in the source
   ?ai dcterms:hasPart ?speech.
   ?speech lpv:speaker ?speaker.
   ?speaker lpv:countryOfRepresentation ?country.
   ?country rdfs:label ?label.
   filter(?label="France"@en)
}

Example query 4:

Get the transcript of (an arbitrary selection of 10) speeches that contain the word "agriculture". This query uses efficient indexed text search by deploying ClioPatria's Text Property Functions (tpf) SPARQL extensions. (click to run)
SELECT ?speech ?text
WHERE {
   ?speech tpf:match (lpv:text 'agriculture' ?text)
} LIMIT 10

Example query 5:

Get the number of speeches held in each language (counting only speeches for which the language was indicated). (click to run)
SELECT DISTINCT ?language (COUNT(DISTINCT ?speech) AS ?speechno)
WHERE {
   ?speech dcterms:language ?language .
   ?speech a lpv_eu:Speech .
} GROUP BY ?language

Example query 6:

Get the 10 most recent agenda items annotated with the Eurovoc SKOS concept "Syria". (click to run)
SELECT ?date ?agendaItem
WHERE {
  ?concept skos:prefLabel "Syria"@en .
  ?annot dcterms:subject ?concept .
  ?agendaItem lpv:topicAnnotation ?annot .
  ?agendaItem dcterms:date ?date .
} ORDER BY DESC(?date) LIMIT 10

License & citations

The LinkedEP dataset is available under a CC BY 4.0 license. To acknowledge us, please cite us as: A.E. van Aggelen, L. Hollink. Plenary debates of the European Parliament as Linked Open Data. http://www.talkofeurope.eu/data/. Website accessed on [fill in date]. We chose to apply a license for the following reasons. Our dataset contains not just public data but also external data, notably the Automated Database of the European Parliament from the University of Oslo. Its makers provide no licensing information, and we do not wish to decide on their behalf to waive a license. For us, the creators of the Linkedpolitics dataset, the resulting citations help justify the resources spent on this project.

References

Bjørn Høyland, Indraneel Sircar, Simon Hix. An Automated Database of the European Parliament. European Union Politics 10(1), pp. 143-152, 2009.