Storing and querying triples using Apache Rya


Apache Rya is a tool for storing and querying RDF triples at scale. It is not widely used, so it is poorly documented and can be difficult to get started with. This blog post is intended to give you the information you need to get started with Rya. To run Apache Rya you will need Accumulo, Hadoop, Zookeeper, and of course Rya itself. We will also be using Tomcat, though Rya can be run without it. I used accumulo-1.9.3, hadoop-3.1.2, zookeeper-3.4.14, rya-project-3.2.12-incubating, and tomcat-8.5.40 on Ubuntu 18.04. I don't know which other versions of Hadoop, Zookeeper, and Tomcat work with Rya, but accumulo-2.0.0-alpha-2 does NOT work with Rya. I will assume that you already have Hadoop, Zookeeper, and Tomcat installed.

Run Zookeeper and HDFS

Zookeeper and HDFS need to be running for Rya to work. To start Zookeeper, run the following command:

$ZOOKEEPER_HOME/bin/zkServer.sh start

To start HDFS, execute the following commands:

cd $HADOOP_HOME
bin/hdfs namenode -format
sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/$USER
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input

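If you have run HDFS before, you might only need to execute sbin/start-dfs.sh. Running bin/hdfs namenode -format again wipes the existing HDFS metadata, and with it any data already stored in HDFS.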
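Before moving on, you may want to confirm that both services are actually listening. The following Java sketch is a hypothetical helper (not part of Rya) that simply tries to open a TCP connection to each port. It assumes Zookeeper's default client port 2181 and the NameNode address 127.0.0.1:9000 used in the Accumulo configuration later in this post, so adjust both to match your zoo.cfg and core-site.xml.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP service is accepting connections. */
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Assumed addresses: Zookeeper's default client port and the NameNode RPC
        // address used in this post's accumulo-site.xml (hdfs://127.0.0.1:9000).
        String[][] services = {
            {"Zookeeper", "127.0.0.1", "2181"},
            {"HDFS NameNode", "127.0.0.1", "9000"},
        };
        for (String[] s : services) {
            boolean up = isListening(s[1], Integer.parseInt(s[2]), 2000);
            System.out.println(s[0] + " at " + s[1] + ":" + s[2]
                    + (up ? " is listening" : " is NOT reachable"));
        }
    }
}
```

A successful connection only shows that something is bound to the port; it does not prove the process behind it is healthy, but it catches the common case of a service that never started.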

Install Accumulo

You can get the installation instructions for Accumulo here, but I will go into more detail. Download the version 1.9.3 binary here.

Run the following:

cd <install_location>
tar xzf accumulo-1.9.3-bin.tar.gz
cd accumulo-1.9.3
./bin/build_native_library.sh
./bin/bootstrap_config.sh

You will then be asked some questions about your desired configuration. I chose 3) 3GB, 2) Native, and 5) Hadoop 3. Some additional settings have to be changed by hand. Change the first property in conf/accumulo-site.xml to:

<property>
   <name>instance.volumes</name>
   <value>hdfs://127.0.0.1:9000/accumulo</value>
   <description>comma separated list URIs for volumes. example: hdfs://localhost:9000/accumulo</description>
</property>

You might need to adjust the value for your setup: the host and port should match your HDFS NameNode address (fs.defaultFS in Hadoop's core-site.xml). Change the next property similarly:

<property>
   <name>instance.zookeeper.host</name>
   <value>127.0.0.1:2181</value>
   <description>comma separated list of zookeeper servers</description>
</property>

Change instance.secret, which is the next property, to a value of your choosing. Also change trace.token.property.password, which is farther down. Finally, add the following entry to the value of the last property, general.classpaths (entries in that list are comma-separated):

$HADOOP_PREFIX/share/hadoop/common/lib/[^.].*.jar
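To double-check these edits, the hypothetical helper below parses accumulo-site.xml with the JDK's DOM parser and prints the values Accumulo will read. It assumes the usual Hadoop-style layout of property elements containing name and value children, and takes the file path as its first argument (falling back to $ACCUMULO_HOME/conf/accumulo-site.xml). The warning reflects the fact that "DEFAULT" is Accumulo's built-in default for instance.secret.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: prints selected settings from accumulo-site.xml. */
final class AccumuloSiteCheck {

    /** Collects each property's name and value text into an ordered map. */
    static Map<String, String> readProperties(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                props.put(names.item(0).getTextContent().trim(), values.item(0).getTextContent().trim());
            }
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : System.getenv("ACCUMULO_HOME") + "/conf/accumulo-site.xml";
        Map<String, String> props;
        try (InputStream in = new FileInputStream(path)) {
            props = readProperties(in);
        }
        for (String key : new String[] {"instance.volumes", "instance.zookeeper.host", "general.classpaths"}) {
            System.out.println(key + " = " + props.get(key));
        }
        // "DEFAULT" is Accumulo's built-in default for instance.secret, which should be changed.
        if ("DEFAULT".equals(props.get("instance.secret"))) {
            System.out.println("WARNING: instance.secret is still DEFAULT");
        }
    }
}
```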

Set HADOOP_PREFIX, JAVA_HOME, and ZOOKEEPER_HOME in conf/accumulo-env.sh. I also set HADOOP_HOME, ZOOKEEPER_HOME, and ACCUMULO_HOME in my .bashrc. You should now be able to run the following command:

$ACCUMULO_HOME/bin/accumulo init

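It will ask you for an instance name and a password. Use any instance name and password you like, but remember them, because Rya needs them later. Next, start the Accumulo master, tserver, monitor, and gc. Each of these commands runs in the foreground, so run each one in its own terminal: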

$ACCUMULO_HOME/bin/accumulo master
$ACCUMULO_HOME/bin/accumulo tserver
$ACCUMULO_HOME/bin/accumulo monitor
$ACCUMULO_HOME/bin/accumulo gc
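Because the four processes take different amounts of time to start, a small polling loop can tell you when they are all accepting connections. This is a hypothetical sketch under assumptions: the ports are the Accumulo 1.x default client ports as I understand them (master 9999, tserver 9997, monitor 9995, gc 50091; see the *.port.client properties if you changed them), and Accumulo may bind to your machine's hostname rather than localhost, so pass the host as the first argument if the check fails while the processes are running.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: waits until the Accumulo server processes accept TCP connections. */
final class WaitForAccumulo {

    /** True if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Polls host:port every pollMs until it accepts a connection or maxWaitMs has elapsed. */
    static boolean waitFor(String host, int port, long maxWaitMs, int pollMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        while (true) {
            if (isListening(host, port, pollMs)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up after maxWaitMs
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumed Accumulo 1.x default client ports.
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("master", 9999);
        ports.put("tserver", 9997);
        ports.put("monitor", 9995);
        ports.put("gc", 50091);
        for (Map.Entry<String, Integer> e : ports.entrySet()) {
            boolean up = waitFor(host, e.getValue(), 60_000, 1_000);
            System.out.println(e.getKey() + " on " + host + ":" + e.getValue()
                    + (up ? " is up" : " did not come up within 60s"));
        }
    }
}
```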

Install Rya

You can find the quickstart for Rya at . Rya can be downloaded from here. To build and install Rya, run the following commands:

unzip rya-project-3.2.12-incubating-source-release.zip
cd rya-project-3.2.12-incubating
mvn clean install

If the build succeeds, there will be a war file at web/web.rya/target/web.rya.war. In what follows, $RYA_HOME refers to the rya-project-3.2.12-incubating directory. Copy the contents of $RYA_HOME/web/web.rya/target to Tomcat's webapps directory:

cp -r $RYA_HOME/web/web.rya/target/* $TOMCAT_HOME/webapps

You can also obtain openrdf-sesame.war and openrdf-workbench.war and put them in $TOMCAT_HOME/webapps, although the examples below only use web.rya. Next, create a file named environment.properties in $RYA_HOME with the following contents:

instance.name=<instance_name>
instance.zk=localhost:2181
instance.username=root
instance.password=<instance_password>
rya.tableprefix=rya_
rya.displayqueryplan=true
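Since this file is easy to get subtly wrong, here is a minimal hypothetical sketch that loads it with java.util.Properties and reports any key that is missing, empty, or still holds a <placeholder> like the ones above. The list of required keys is simply the set shown in this file; Rya itself may accept or require others, and the sketch does not check rya.displayqueryplan.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: sanity-checks Rya's environment.properties before Tomcat reads it. */
final class RyaEnvCheck {

    // The keys shown in this post's environment.properties.
    static final String[] REQUIRED = {
        "instance.name", "instance.zk", "instance.username", "instance.password", "rya.tableprefix"
    };

    /** Lists every required key that is missing, empty, or still a <placeholder>. */
    static List<String> check(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key, "").trim();
            if (value.isEmpty()) {
                problems.add(key + " is missing or empty");
            } else if (value.startsWith("<") && value.endsWith(">")) {
                problems.add(key + " still has the placeholder " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Path from the first argument, otherwise $RYA_HOME/environment.properties.
        String path = args.length > 0 ? args[0] : System.getenv("RYA_HOME") + "/environment.properties";
        try (Reader in = new FileReader(path)) {
            List<String> problems = check(in);
            if (problems.isEmpty()) {
                System.out.println(path + ": all required keys are set");
            } else {
                problems.forEach(System.out::println);
            }
        }
    }
}
```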

Replace <instance_name> and <instance_password> with the instance name and password you entered when you ran accumulo init. You might also need to change instance.zk if Zookeeper is not running on localhost:2181. You need to tell Tomcat where it can find this file: in $TOMCAT_HOME/conf/catalina.properties, set shared.loader="$RYA_HOME/environment.properties", writing out the full path, since Tomcat does not expand shell variables such as $RYA_HOME in that file. Now start Tomcat:

$TOMCAT_HOME/bin/startup.sh
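Once Tomcat is up, a quick way to see whether it is answering is to request a URL and look at the HTTP status code. The hypothetical sketch below assumes Tomcat's default port 8080 and the /web.rya context path used by the loading and query code in this post. A result of -1 means nothing answered at all; any HTTP status means Tomcat is running, and $TOMCAT_HOME/logs/catalina.out shows whether web.rya.war was deployed without errors. A 404 on the context root alone does not prove the deployment failed, since the application may simply not serve a page there.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: reports the HTTP status returned for a URL. */
final class RyaWebCheck {

    /** Returns the HTTP status for a GET request to url, or -1 if no connection could be made. */
    static int status(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Assumed URLs: Tomcat's default port and the /web.rya context path.
        String[] urls = {"http://localhost:8080/", "http://localhost:8080/web.rya/"};
        for (String url : urls) {
            System.out.println(url + " -> " + status(url));
        }
    }
}
```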

Load Triples

You can use the following code to load triples into Rya. It reads a file from the classpath and POSTs it to the web.rya loadrdf servlet:

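import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LoadDataServletRun {

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: LoadDataServletRun <resource on classpath> <format, e.g. N-triples>");
            System.exit(1);
        }
        String inputFile = args[0];
        String format = args[1];

        // The input file is read from the classpath (e.g. src/main/resources)
        try (InputStream resourceAsStream = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream(inputFile)) {
            if (resourceAsStream == null) {
                throw new IOException("Resource not found on classpath: " + inputFile);
            }

            URL url = new URL("http://localhost:8080/web.rya/loadrdf?format="
                    + URLEncoder.encode(format, "UTF-8"));
            HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
            urlConnection.setRequestMethod("POST");
            urlConnection.setRequestProperty("Content-Type", "text/plain");
            urlConnection.setDoOutput(true);

            // Send the file in chunks; closing the stream completes the request body
            try (OutputStream os = urlConnection.getOutputStream()) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = resourceAsStream.read(buffer)) != -1) {
                    os.write(buffer, 0, read);
                }
            }

            // Print the servlet's response
            try (BufferedReader rd = new BufferedReader(new InputStreamReader(
                    urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = rd.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}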

As an example, you could put the following contents in src/main/resources/ntriples.ntrips (the loader reads the file from the classpath, so it must live under src/main/resources):

<http://mynamespace/ProductType1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://mynamespace/ProductType> .
<http://mynamespace/ProductType1> <http://www.w3.org/2000/01/rdf-schema#label> "Thing" .
<http://mynamespace/ProductType1> <http://purl.org/dc/elements/1.1/publisher> <http://mynamespace/Publisher1> .
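If you want to generate a larger input file, each N-Triples line is just a subject, a predicate and an object followed by " .", with IRIs in angle brackets and literals in double quotes. The hypothetical helper below writes such lines and escapes backslashes, double quotes, newlines and carriage returns in plain literals, which N-Triples does not allow unescaped inside a quoted string. It does not validate IRIs, so it assumes they contain no spaces or angle brackets.

```java
/** Hypothetical helper: builds N-Triples lines like the ones in ntriples.ntrips. */
final class NTriplesWriter {

    /** Wraps an IRI in angle brackets (no validation is performed). */
    static String iri(String value) {
        return "<" + value + ">";
    }

    /** Quotes a plain literal, escaping backslash, double quote, LF and CR. */
    static String literal(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Joins subject, predicate and object into one N-Triples statement. */
    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    public static void main(String[] args) {
        String subject = iri("http://mynamespace/ProductType1");
        System.out.println(triple(subject, iri("http://www.w3.org/2000/01/rdf-schema#label"), literal("Thing")));
    }
}
```

Running its main method prints the rdfs:label triple from the example file above.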

You can then load the triples by running:

sbt "runMain LoadDataServletRun ntriples.ntrips N-triples"
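The second argument ends up as the format parameter of the loadrdf request. My assumption, not verified against the servlet source, is that web.rya resolves it by name to one of Sesame's RDFFormat constants, whose names include N-Triples, RDF/XML, Turtle, TriG and N-Quads; if a name is rejected, check the servlet code. The hypothetical helper below picks one of those names from a file extension, including the .ntrips extension used in this post.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a file extension to an RDF format name. */
final class RdfFormatNames {

    private static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        BY_EXTENSION.put("nt", "N-Triples");
        BY_EXTENSION.put("ntrips", "N-Triples"); // the extension used in this post
        BY_EXTENSION.put("rdf", "RDF/XML");
        BY_EXTENSION.put("owl", "RDF/XML");
        BY_EXTENSION.put("ttl", "Turtle");
        BY_EXTENSION.put("trig", "TriG");
        BY_EXTENSION.put("nq", "N-Quads");
    }

    /** Returns the format name for a file's extension, or null if the extension is unknown. */
    static String formatFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String file : args) {
            System.out.println(file + " -> " + formatFor(file));
        }
    }
}
```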

Query Triples in Rya

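You can use the following Java code to query triples in Rya. It reads a SPARQL query from a file, URL-encodes it, and sends it to the web.rya queryrdf servlet: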

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class QueryDataServletRun {

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: QueryDataServletRun <file containing a SPARQL query>");
            System.exit(1);
        }
        try {
            // Read the query as UTF-8 and URL-encode it for the query string
            String query = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
            String queryenc = URLEncoder.encode(query, "UTF-8");
            URL url = new URL("http://localhost:8080/web.rya/queryrdf?query.infer=true&query=" + queryenc);
            // A plain GET request: no body is sent, so setDoOutput is not needed
            HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();

            try (BufferedReader rd = new BufferedReader(new InputStreamReader(
                    urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = rd.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

As an example, you can put the following in a file named query.txt:

select * where {
   <http://mynamespace/ProductType1> ?p ?o .
}

and then run the following command:

sbt "runMain QueryDataServletRun query.txt"
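What the program prints depends on how web.rya serializes query results, which I have not verified. If it returns the standard W3C SPARQL Query Results XML Format (a sparql root element whose result elements contain one binding per variable, each holding a uri, literal or bnode element), a sketch like the following hypothetical helper turns the response into one map of variable name to value per row. The format itself is a W3C standard; that Rya uses it here is an assumption.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: parses SPARQL Query Results XML into rows. */
final class SparqlXmlResults {

    /** Returns one map (variable name -> value text) per result element. */
    static List<Map<String, String>> parse(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        List<Map<String, String>> rows = new ArrayList<>();
        NodeList results = doc.getElementsByTagNameNS("*", "result");
        for (int i = 0; i < results.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>();
            NodeList bindings = ((Element) results.item(i)).getElementsByTagNameNS("*", "binding");
            for (int j = 0; j < bindings.getLength(); j++) {
                Element binding = (Element) bindings.item(j);
                // The value is the binding's single child element: uri, literal or bnode
                for (Node child = binding.getFirstChild(); child != null; child = child.getNextSibling()) {
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        row.put(binding.getAttribute("name"), child.getTextContent());
                        break;
                    }
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

To use it, you could call SparqlXmlResults.parse(urlConnection.getInputStream()) in QueryDataServletRun instead of printing the response line by line.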

Hopefully you are now able to store and query triples using Rya. If not, please leave a comment with the error you encountered.


Discover more from Knoldus Blogs
