Import all of Wikipedia into a graph database (Neo4j)

How to import all of Wikipedia into the Neo4j graph database, and how I set up my Neo4j server on Linode.


date created: 20211129

last updated: 20220108

Here's how I set up my Neo4j server on Linode.

How-To: Run Neo4j in Docker - Developer Guides
You will learn how to create and run a Neo4j graph database in a Docker container. This tutorial is designed for you to follow along and step through the process.

TL;DR

SSH into the box

docker run -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j

NOTE: this may take a while

Keep the server running in the background with -d

docker rm -f neo4j
docker run -d --restart=always --name=neo4j -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j

Visit the web UI

yourservers-ip-address:7474
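
Since the container can take a while to start, it can help to poll the web UI port before opening the browser. This is only a minimal sketch, assuming curl is installed on the machine you run it from. The function name wait_for_http and the retry numbers are my own picks for illustration, not part of Neo4j or Docker.

```bash
#!/bin/sh
# wait_for_http URL [TRIES] [DELAY_SECONDS]   (hypothetical helper)
# Polls URL until it answers with a successful HTTP status, or gives up.
wait_for_http() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # --fail makes curl return non-zero on HTTP errors (4xx/5xx)
        if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (replace the address with your server's IP):
# wait_for_http http://yourservers-ip-address:7474 && echo "Neo4j web UI is up"
```

The function returns non-zero if the page never answers, so it can be chained with && in a setup script.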

Sign in with the credentials from the docker command

Errors and debugging

If you get an error when signing in, you probably didn't use the username and password from the docker command. Retry with those; in the default case they are:

username: neo4j

password: s3cr3t

Success!

Import some stuff!

Let's import all of Wikipedia!

7 GB file; link good as of 20220108

https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump

cd ~ # or in my case /mnt/hdd/import
wget https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump
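
The dump is about 7 GB, so it is worth checking free space first and using a resumable download. A rough sketch under my own assumptions: the helper name has_free_kb and the 8 GB threshold are made up for illustration. wget's --continue flag picks up a partially downloaded file instead of starting over.

```bash
#!/bin/sh
# has_free_kb DIR NEEDED_KB   (hypothetical helper)
# Succeeds when the filesystem holding DIR has at least NEEDED_KB kilobytes free.
has_free_kb() {
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the last line is the available space.
    avail=$(df -Pk "$1" | tail -n 1 | awk '{print $4}')
    [ "$avail" -ge "$2" ]
}

# Example: require roughly 8 GB (8 * 1024 * 1024 KB) in the current directory
# if has_free_kb . 8388608; then
#     wget --continue https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump
# else
#     echo "not enough free space" >&2
# fi
```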

The commands I used on my system


mkdir -p /mnt/hdd/import
mkdir -p /mnt/hdd/data
mkdir -p /mnt/hdd/logs

cd /mnt/hdd/import # or cd ~ for your home directory
wget https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump


docker run -d --restart=always --volume=/mnt/hdd/import:/var/lib/neo4j/import --env=NEO4J_dbms_allow__upgrade=true --volume=/mnt/hdd/logs:/var/lib/neo4j/logs --env=NEO4J_dbms_allow__format__migration=true --volume=/mnt/hdd/data:/var/lib/neo4j/data --name=neo4j -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j:3.5.30
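
The long docker run line above is hard to read, so here is a sketch of the same command as a small function that only prints it, with the base directory as a parameter. The name print_neo4j_run is mine. The setting names follow the Neo4j Docker convention for environment variables (dots become underscores, underscores are doubled), so treat them as my reading of that convention rather than something I tested on every version.

```bash
#!/bin/sh
# print_neo4j_run BASE_DIR   (hypothetical helper)
# Prints (does not run) the docker command, with import/logs/data under BASE_DIR.
print_neo4j_run() {
    base=$1
    echo docker run -d --restart=always --name=neo4j \
        "--volume=$base/import:/var/lib/neo4j/import" \
        "--volume=$base/logs:/var/lib/neo4j/logs" \
        "--volume=$base/data:/var/lib/neo4j/data" \
        --env=NEO4J_dbms_allow__upgrade=true \
        --env=NEO4J_dbms_allow__format__migration=true \
        -p7474:7474 -p7473:7473 -p7687:7687 \
        -e NEO4J_AUTH=neo4j/s3cr3t \
        neo4j:3.5.30
}

# Print the command for my layout; check it, then paste it into the shell.
print_neo4j_run /mnt/hdd
```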


# verify that the import directory is mounted and the dump is visible inside the container

docker exec --interactive --tty  neo4j bash
ls /var/lib/neo4j/import
exit
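
The same check can be scripted from the host, without opening an interactive shell in the container. A minimal sketch; check_dump is a name I made up, and the paths in the examples are the ones from my setup.

```bash
#!/bin/sh
# check_dump FILE   (hypothetical helper)
# Succeeds only if FILE exists and is not empty.
check_dump() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Host side:
# check_dump /mnt/hdd/import/wikipedia_nrc.dump
# Container side, non-interactive (same directory the volume is mounted on):
# docker exec neo4j ls -l /var/lib/neo4j/import/wikipedia_nrc.dump
```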

tmux
docker exec --interactive --tty \
    --user="$(id -u):$(id -g)" \
    neo4j \
    neo4j-admin load --force --from=/var/lib/neo4j/import/wikipedia_nrc.dump --database=wikipedia.db
    

Advanced: adding a mount to a running container

Attach a volume to a container while it is running

Take note of HOSTPATH and CONTPATH; these must be modified to match your setup.

nano add_mount.sh

add_mount.sh

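#!/bin/sh
# Bind-mount a host directory into an already-running container.
# Needs the docker-enter helper (from the jpetazzo/nsenter project) on the host.
set -e
CONTAINER=neo4j
HOSTPATH=/mnt/extrashit/
CONTPATH=/import

# Resolve symlinks, then find the mount point of the filesystem holding HOSTPATH.
REALPATH=$(readlink --canonicalize "$HOSTPATH")
FILESYS=$(df -P "$REALPATH" | tail -n 1 | awk '{print $6}')

# Find the block device backing that filesystem.
while read DEV MOUNT JUNK
do [ "$MOUNT" = "$FILESYS" ] && break
done </proc/mounts
[ "$MOUNT" = "$FILESYS" ] # Sanity check!

# Find the root of the mount within the filesystem (matters for bind mounts and subvolumes).
while read A B C SUBROOT MOUNT JUNK
do [ "$MOUNT" = "$FILESYS" ] && break
done < /proc/self/mountinfo
[ "$MOUNT" = "$FILESYS" ] # Moar sanity check!

# Path of HOSTPATH relative to the mount point, and the device's major/minor numbers in decimal.
SUBPATH=$(echo "$REALPATH" | sed "s,^$FILESYS,,")
DEVDEC=$(printf "%d %d" $(stat --format "0x%t 0x%T" "$DEV"))

# Inside the container: create the device node if it is missing, mount it on a
# temporary directory, bind-mount the wanted subdirectory onto CONTPATH, then clean up.
docker-enter $CONTAINER -- sh -c \
    "[ -b $DEV ] || mknod --mode 0600 $DEV b $DEVDEC"
docker-enter $CONTAINER -- mkdir /tmpmnt
docker-enter $CONTAINER -- mount $DEV /tmpmnt
docker-enter $CONTAINER -- mkdir -p $CONTPATH
docker-enter $CONTAINER -- mount -o bind /tmpmnt/$SUBROOT/$SUBPATH $CONTPATH
docker-enter $CONTAINER -- umount /tmpmnt
docker-enter $CONTAINER -- rmdir /tmpmnt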

run that shit

bash add_mount.sh
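
Two lines in add_mount.sh are easy to misread: SUBPATH strips the filesystem's mount point off the resolved host path, and DEVDEC turns the hex major/minor device numbers from stat into decimal for mknod. Here is a tiny sketch of both steps with made-up values. The helper names and the sample inputs are mine, not part of the original script.

```bash
#!/bin/sh
# strip_prefix PATH PREFIX   (hypothetical helper)
# Removes a leading PREFIX from PATH, the same way the SUBPATH line does.
strip_prefix() {
    echo "$1" | sed "s,^$2,,"
}

# hex_to_dec MAJOR_HEX MINOR_HEX   (hypothetical helper)
# Converts hex device numbers (as printed by stat %t and %T) to decimal.
hex_to_dec() {
    printf '%d %d\n' "0x$1" "0x$2"
}

strip_prefix /mnt/hdd/import /mnt/hdd   # prints /import
hex_to_dec 8 11                         # prints 8 17
```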


Author

by oran collins
github.com/wisehackermonkey

Variant of the run command that also sets the active database. Neo4j settings are passed as environment variables; docker run has no --dbms.active_database flag:

docker run -d --restart=always --env=NEO4J_dbms_active__database=new.db --volume=/mnt/hdd/import:/var/lib/neo4j/import --env=NEO4J_dbms_allow__upgrade=true --volume=/mnt/hdd/logs:/var/lib/neo4j/logs --env=NEO4J_dbms_allow__format__migration=true --volume=/mnt/hdd/data:/var/lib/neo4j/data --name=neo4j -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j:3.5.30
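
The Neo4j Docker image reads config settings from environment variables: prefix the setting with NEO4J_, turn dots into underscores, and write existing underscores twice. A small sketch of that conversion; the function name neo4j_env_name is something I made up, not a Neo4j tool.

```bash
#!/bin/sh
# neo4j_env_name SETTING   (hypothetical helper)
# Prints the environment variable name the Docker image expects for a neo4j.conf setting.
neo4j_env_name() {
    # Double the existing underscores first, then turn dots into single underscores.
    printf 'NEO4J_%s\n' "$(printf '%s' "$1" | sed -e 's/_/__/g' -e 's/\./_/g')"
}

neo4j_env_name dbms.active_database   # prints NEO4J_dbms_active__database
neo4j_env_name dbms.allow_upgrade     # prints NEO4J_dbms_allow__upgrade
```

Pass the result to docker run as --env=NAME=value.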

docker exec --interactive --tty \
    --user="neo4j:neo4j" \
    --env=NEO4J_dbms_allow__upgrade=true \
    --env=NEO4J_dbms_allow__format__migration=true \
    neo4j \
    neo4j-admin load --force --from=/var/lib/neo4j/import/wikipedia_nrc.dump --database=wikipedia.db
    

sudo -u neo4j  neo4j-admin load --force --from=/var/lib/neo4j/import/wikipedia_nrc.dump --database=wikipedia.db
echo "dbms.allow_upgrade=true" >> neo4j.conf

```
:use system
create database wikipedia
```

If you want to help me out with a donation, here's my Monero address: 432ZNGoNLjTXZHz7UCJ8HLQQsRGDHXRRVLJi5yoqu719Mp31x4EQWKaQ9DCQ5p2FvjQ8mJSQHbD9WVmFNhctJsjkLVHpDEZ

I use a tracker that is privacy focused, so if you block it, that's cool; I'm big on blocking stuff on my own machine. I'm doing it to see if anyone is actually reading my blog posts... :)