
Query (use) over resources

STATUS: READY FOR REVIEW

“Research is formalized curiosity. It is poking and prying with a purpose.” (Zora Neale Hurston)

Querying FAIR resources is the process of asking structured questions to retrieve data that is easy to find, access, and reuse. When resources follow FAIR principles, queries become more efficient because the data is well-organized and clearly described. In this step, we will explore the tools and platforms that enable effective querying of FAIR-compliant resources.

Short description

When machine-readable (meta)data is exposed (see Metroline step: Transform and Expose FAIR metadata), it becomes an accessible FAIR resource: a dataset or metadata collection that can be found, queried, and reused. Such resources are often hosted or described in catalogues and/or via FAIR Data Points, which expose (meta)data in a standardised way. This ability to discover and reuse data through its metadata is what makes FAIR so powerful: it turns isolated data into actionable knowledge for science.

These catalogues offer different levels of interaction:

Query results can be displayed in formats like HTML, JSON, XML or CSV, depending on the tool or user preference.

This Metroline page focuses on SPARQL as it is the standard query language for RDF-based resources, which are the foundation of the semantic web and linked data. These concepts aim to make data interoperable and machine-readable across domains, enabling powerful integration and reuse. SPARQL’s standardised syntax and ability to retrieve both metadata and data from diverse sources make it uniquely suited for querying structured web resources. While SPARQL is prominent in the semantic web domain, there are many other query languages tailored to different data models, research fields, and application needs (see table below).

Query Language | Purpose | Used In | Example Repositories
Structured Query Language (SQL) | Querying relational databases | Tabular data, metadata | Dryad, Dataverse, OpenAIRE
SPARQL (for RDF/Linked Data) | Querying semantic web data | Ontologies, linked datasets | UniProt, OpenPHACTS, ELIXIR, Bio2RDF
GraphQL | Flexible API queries | Nested data structures | EMBL-EBI
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) | Metadata harvesting | Repository interoperability | Zenodo, Figshare, institutional repositories
JSONPath / XPath | Extracting data from JSON/XML | API responses, metadata | Ensembl, NIH
Cypher | Querying graph databases | Networked biological data | Neo4j-based bioinformatics platforms

Why is this step important

Querying FAIR data is important because it is how you actually use the data. FAIR data is only valuable if it can be discovered, filtered, combined, and analysed, and querying is what makes this possible.

  • Find exactly what you need. General search and filtering allow you to locate datasets or specific information quickly, without manually checking every record.
  • Explore and understand data. Browsing and faceted search help you see what datasets exist, what they describe and how they are structured.
  • Combine and reuse information efficiently. Advanced queries (e.g. SPARQL) let you combine and analyse data from multiple sources without moving large datasets.

How to

This how-to gives information about querying FAIR resources, starting with simple browsing and filtering, moving to visual query tools, and advancing to federated multi-source querying with SPARQL.

Step 1 - Start with browsing and filtering

The easiest way to explore FAIR data is through a catalogue or FAIR Data Point interface, such as the National Health Data Portal, FAIRsharing.org, or a local FAIR Data Point (see Metroline Step: Transform and expose FAIR (meta)data to learn more about FAIR Data Points).

Here you can:

  • Browse datasets and read metadata (description, owner, access conditions).
  • Search by keywords, e.g. “muscular dystrophy” or “metabolomics”.
  • Filter results by categories such as data type, disease, measurement, or year.

This helps you discover what exists before performing any (complex) queries.

🧪 Example step 1. Browse Wikidata for information about Inflammatory bowel disease

To begin, we search for “Inflammatory bowel disease” in the search bar on Wikidata.org. This leads us to the item Q917447 which represents IBD in Wikidata. This item confirms that IBD is a recognised disease entity with structured metadata (such as classifications, related conditions, and identifiers) providing a solid starting point for further data exploration. We gained insight into what the catalogue contains, what metadata is available, and how we might formulate more specific queries to retrieve related information.
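The same keyword search can also be scripted. The sketch below builds a request URL for the `wbsearchentities` action of the Wikidata MediaWiki API and extracts the item IDs and labels from a decoded response. The helper names `build_search_url` and `extract_hits` are illustrative and not part of any library, and the code only constructs the URL rather than sending it.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Return a URL that searches Wikidata items by label or alias."""
    params = {
        "action": "wbsearchentities",  # entity search action of the API
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def extract_hits(response: dict) -> list[tuple[str, str]]:
    """Pull (id, label) pairs out of a decoded wbsearchentities response."""
    return [(hit["id"], hit.get("label", "")) for hit in response.get("search", [])]


# Build the search URL for the example term used in this step.
url = build_search_url("Inflammatory bowel disease")
print(url)
```

Opening the printed URL in a browser, or fetching it from a script, should return a JSON document whose `search` list can be passed to `extract_hits`.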

Step 2 - Use visual or guided tools to construct queries

Some linked-data portals offer visual query builders, such as SPARQL Query builder or Wikidata Query Builder, that help users construct SPARQL queries without needing to learn the syntax. These tools automatically translate your selections (such as ticking checkboxes or choosing from dropdown menus) into SPARQL and run the query in the background.

The results are typically displayed in a table or graph, making it easy to explore data without writing any code. This approach is ideal for users who want to go beyond simple browsing but aren’t yet ready to write SPARQL manually.

🧪 Example step 2. Wikidata query builder – finding diseases associated with IL23R gene

We want to continue our exploration of inflammatory bowel disease. In our first exploration of the Wikidata item, we saw that IBD has a genetic association with the gene IL23R. We now want to find out which other items are genetically associated with this gene. In the Wikidata Query Builder, we select “genetic association” as the Property, enter “IL23R” as the value, and run the query. This returns 6 results, shown in the table below.

item | itemLabel
wd:Q179945 | psoriasis
wd:Q917447 | inflammatory bowel diseases
wd:Q32144272 | inflammatory bowel disease 17
wd:Q52849 | ankylosing spondylitis
wd:Q1472 | Crohn’s disease
wd:Q1477 | ulcerative colitis
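Behind the scenes, a query builder turns a Property/value selection like this into a short SPARQL query. The sketch below shows roughly what such a generated query could look like. It assumes that “genetic association” is Wikidata property P2293 (worth verifying on wikidata.org), and the function name `build_association_query` is made up for illustration. The QID of the gene is left as a parameter because it is not given on this page.

```python
# Assumed Wikidata property ID for "genetic association"; verify before use.
GENETIC_ASSOCIATION = "P2293"


def build_association_query(gene_qid: str, language: str = "en") -> str:
    """Return a SPARQL query listing items genetically associated with a gene."""
    # Accept only identifiers shaped like Wikidata item IDs (Q + digits).
    if not (gene_qid.startswith("Q") and gene_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {gene_qid!r}")
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:{GENETIC_ASSOCIATION} wd:{gene_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}"""
```

Calling `build_association_query` with the QID shown on the IL23R item page produces a query that can be pasted into the Wikidata query service, where the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined.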

Step 3 - Access the SPARQL endpoint to write and refine SPARQL queries

Note: The following steps are meant specifically for querying catalogues and repositories that provide a SPARQL endpoint. If you are querying a catalogue that uses another approach (e.g. SQL), they may not be directly applicable.

When you need more flexibility, connect directly to the SPARQL endpoint. Depending on the catalog you can use:

Try simple queries first, such as listing datasets or retrieving specific metadata fields. As you become more comfortable, you can write more complex queries that join related information, apply filters, or aggregate data using SPARQL syntax.
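As a starting point, the sketch below shows one way to send a simple SELECT query to a SPARQL endpoint from Python using only the standard library. It follows the SPARQL 1.1 Protocol convention of passing the query in a `query` URL parameter and asking for JSON results. The helper names and the `User-Agent` string are illustrative assumptions, and the Wikidata endpoint is used only as an example target.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# A deliberately small first query: any five triples.
SIMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str,
                         user_agent: str = "fair-query-example/0.1") -> Request:
    """Build a GET request that asks the endpoint for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,  # illustrative value; identify your own script
    }
    return Request(url, headers=headers)


def run_select(endpoint: str, query: str, timeout: float = 60) -> list[dict]:
    """Send the query and return each result row as {variable: value}."""
    with urlopen(build_sparql_request(endpoint, query), timeout=timeout) as resp:
        payload = json.load(resp)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


# Building the request does not contact the endpoint; run_select does.
request = build_sparql_request(WIKIDATA_ENDPOINT, SIMPLE_QUERY)
print(request.full_url)
```

Calling `run_select(WIKIDATA_ENDPOINT, SIMPLE_QUERY)` requires network access. Other endpoints may expect different headers or authentication, so check the documentation of the catalogue you are querying.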

🧪 Example step 3. Use the Wikidata query service

We saw that the gene IL23R is associated with the disease psoriasis. Now, let’s take it a step further and run a more complex SPARQL query using the Wikidata query service to find which genes are associated with both Inflammatory Bowel Disease (IBD) and psoriasis.

See and run the query yourself at this link: https://w.wiki/FsqH

The results show all genes linked to both psoriasis and an IBD condition. For each gene, you can also see the specific IBD disease it is associated with (such as Crohn’s disease or ulcerative colitis) providing a richer context for analysis.
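For readers who want to see the shape of such a query, here is a hedged sketch. It is not necessarily the query behind the link above. It assumes that “genetic association” is Wikidata property P2293 and that associations are stated on the disease items, as the table in example step 2 suggests. The item IDs for psoriasis, IBD, Crohn’s disease and ulcerative colitis are taken from that table, and the function name `build_shared_gene_query` is made up for illustration.

```python
PSORIASIS = "Q179945"
# IBD and two specific IBD conditions, as listed in example step 2.
IBD_CONDITIONS = ("Q917447", "Q1472", "Q1477")


def build_shared_gene_query(disease_qid: str, other_qids) -> str:
    """Return a query for genes linked to one disease and any of the others."""
    values = " ".join(f"wd:{qid}" for qid in other_qids)
    # P2293 is assumed to be "genetic association"; verify on wikidata.org.
    return f"""SELECT ?gene ?geneLabel ?ibd ?ibdLabel WHERE {{
  VALUES ?ibd {{ {values} }}
  wd:{disease_qid} wdt:P2293 ?gene .
  ?ibd wdt:P2293 ?gene .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""


query = build_shared_gene_query(PSORIASIS, IBD_CONDITIONS)
print(query)
```

The `VALUES` clause restricts `?ibd` to the listed conditions, and because `?gene` appears in both triple patterns, only genes associated with psoriasis and with at least one IBD condition are returned.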

Step 4 - Combine multiple FAIR sources (federated queries)

When your question spans several data sources, use federated querying. This allows you to connect endpoints across registries, institutions, or countries, combining data without moving it.

In SPARQL, federated queries are implemented using the SERVICE keyword, which lets you call another SPARQL endpoint within your query. This enables seamless integration of data across different FAIR sources. See documentation on SPARQL federated querying here.
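To illustrate the SERVICE keyword, the sketch below assembles a federated query in which a local pattern is matched at the endpoint you send the query to, and a second pattern is delegated to a remote endpoint. The scenario is hypothetical: it assumes a local catalogue that describes datasets with DCAT and tags them with Wikidata disease URIs through `dcat:theme`, and it assumes P2293 is the Wikidata property for “genetic association”. The function name `build_federated_query` is illustrative.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

PREFIXES = """PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>"""


def build_federated_query(local_pattern: str, remote_endpoint: str,
                          remote_pattern: str, select_vars: list[str]) -> str:
    """Combine a local graph pattern with one evaluated at a remote endpoint."""
    projection = " ".join(f"?{name}" for name in select_vars)
    return f"""{PREFIXES}
SELECT {projection} WHERE {{
  {local_pattern}
  SERVICE <{remote_endpoint}> {{
    {remote_pattern}
  }}
}}"""


query = build_federated_query(
    # Hypothetical local data: datasets tagged with a Wikidata disease URI.
    local_pattern="?dataset a dcat:Dataset ; dcat:theme ?disease .",
    remote_endpoint=WIKIDATA_ENDPOINT,
    # Evaluated remotely: genes associated with that disease (assumed P2293).
    remote_pattern="?disease wdt:P2293 ?gene .",
    select_vars=["dataset", "disease", "gene"],
)
print(query)
```

The shared variable `?disease` joins the two sources. Whether a given endpoint permits outgoing SERVICE calls, and to which remote endpoints, depends on its configuration.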

Step 5 - Export and reuse query results

Query results can be downloaded in multiple formats (e.g. CSV, JSON, XML) for reuse in data analysis tools like Python, R, or Excel. Depending on the query language and platform, it may also be possible to integrate queries directly into your workflow (for example, by calling SPARQL endpoints from Python or R scripts) so that results flow into subsequent analysis steps without the need to download files manually.
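As a small illustration of reusing results in a script, the sketch below converts a response in the standard SPARQL JSON results format into CSV text using only the Python standard library. The function name `bindings_to_csv` is illustrative, and the sample payload is hand-written from two rows of the table in example step 2 rather than fetched from an endpoint.

```python
import csv
import io


def bindings_to_csv(payload: dict) -> str:
    """Convert a SPARQL JSON results document into CSV text."""
    columns = payload["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in payload["results"]["bindings"]:
        # Unbound variables are absent from a row; write them as empty cells.
        writer.writerow([row.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()


# Hand-written sample in the SPARQL JSON results layout.
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q179945"},
         "itemLabel": {"type": "literal", "xml:lang": "en", "value": "psoriasis"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1477"},
         "itemLabel": {"type": "literal", "xml:lang": "en", "value": "ulcerative colitis"}},
    ]},
}

print(bindings_to_csv(sample))
```

The resulting text can be written to a file and opened in Excel or R, or loaded into a data analysis library in Python.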

For human users, many catalogue interfaces also provide built-in visualisation options, allowing results to be displayed as tables, graphs, or maps directly in the browser without additional tools.

🧪 Example step 5. Visualisation of query results

In Wikidata, you can visualise query results in different ways by switching between result views. Try running the example query above (https://w.wiki/FsqH) and experiment with the various visualisation and export options.

Expertise requirements for this step

To successfully perform this step, you may need help from the following experts:

  • Researcher/domain expert. Uses domain knowledge to formulate queries and interpret results.
  • Data scientist. Executes queries, processes results and handles federated queries.
  • Semantic expert. Ensures correct use of metadata, vocabularies and ontologies for queries.

See Metroline Step: Build the Team for more information.

Practical examples from the community

SPHN Data Exploration and Analysis System (DEAS).
DEAS is a cross-hospital federated query tool developed by the Swiss Personalized Health Network (SPHN) to replace the previous Federated Query System. It enables researchers to securely query aggregated clinical data from multiple Swiss university hospitals without moving patient-level data.

Training

Suggestions

This page is under construction. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.