Import all of Wikipedia into a graph database (Neo4j)
How to import all of Wikipedia into the Neo4j graph database, and how I set up my Neo4j server on Linode.
date created: 20211129
last updated: 20220108
Here's how I set up my Neo4j server on Linode.
TL;DR
ssh to box
docker run -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j
NOTE: this may take a while
Keep the server running in the background with -d (and --restart=always so it comes back after a reboot)
docker rm -f neo4j
docker run -d --restart=always --name=neo4j -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j
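Since -d detaches the container, nothing tells you whether it actually stayed up. A minimal sketch of a check, assuming the container name neo4j from the command above; the function name `neo4j_is_running` is made up for this post.

```bash
# Succeed when the named container is running.
# The default name "neo4j" matches --name=neo4j in the docker run command.
neo4j_is_running() {
    # docker inspect prints "true" or "false" for .State.Running
    [ "$(docker inspect --format '{{.State.Running}}' "${1:-neo4j}" 2>/dev/null)" = "true" ]
}
```

Usage: `neo4j_is_running && echo "up" || docker logs --tail 20 neo4j`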
Visit the web UI
your-servers-ip-address:7474
Sign in with the credentials
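The server can take a while to start, so the web UI may not answer right away. Here is a rough sketch that polls the HTTP port until it responds. It assumes curl is installed; the function names, the 30 attempts and the 2 second pause are arbitrary choices of mine, not anything Neo4j prescribes.

```bash
# Build the browser URL for a host; 7474 is the HTTP port published by docker run.
neo4j_url() {
    printf 'http://%s:%s\n' "$1" "${2:-7474}"
}

# Poll the URL until it answers or the attempts run out.
# $1 = host, $2 = number of attempts (default 30, an arbitrary value).
wait_for_neo4j() {
    url=$(neo4j_url "$1")
    attempts=${2:-30}
    while [ "$attempts" -gt 0 ]; do
        # --fail makes curl return non-zero on HTTP errors
        curl --silent --fail --output /dev/null "$url" && return 0
        attempts=$((attempts - 1))
        sleep 2
    done
    return 1
}
```

Usage: `wait_for_neo4j your-servers-ip-address && echo "web UI is ready"`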
Errors and debugging
If you get a login error, you probably didn't use the username and password from the docker command. Retry with those; in the default case they are:
username: neo4j
Password: s3cr3t
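You can also test the credentials from the command line instead of the browser. This sketch splits a NEO4J_AUTH style value (user/password) and runs a trivial query through cypher-shell inside the container. It assumes the container is named neo4j; `check_neo4j_login` is a name I made up.

```bash
# Split "user/password" and run a one-line query with those credentials.
# $1 = NEO4J_AUTH value, defaulting to the one used in this post.
check_neo4j_login() {
    auth=${1:-neo4j/s3cr3t}
    user=${auth%%/*}      # everything before the first slash
    password=${auth#*/}   # everything after the first slash
    docker exec neo4j cypher-shell -u "$user" -p "$password" "RETURN 1;"
}
```

Usage: `check_neo4j_login neo4j/s3cr3t` should print a small result table if the login works, and an authentication error otherwise.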
Success!
Import some stuff!
Let's import all of Wikipedia!
7 GB file; link good as of 20220108
https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump
cd ~ # or in my case /mnt/hdd/import
wget https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump
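A 7 GB download over ssh tends to get interrupted, so it helps to resume instead of restarting and to check the size afterwards. A small sketch, where `fetch_dump` and `file_at_least` are hypothetical helper names and any size threshold you pass is your own rough guess, not a published checksum.

```bash
DUMP_URL=https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump

# Download with resume support; --continue picks up a partial file if one exists.
fetch_dump() {
    wget --continue "$DUMP_URL"
}

# Succeed when the file exists and holds at least the given number of bytes.
# $1 = path, $2 = minimum size in bytes.
file_at_least() {
    [ -f "$1" ] || return 1
    size=$(wc -c < "$1")
    [ "$((size))" -ge "$2" ]
}
```

Usage: `fetch_dump && file_at_least wikipedia_nrc.dump 6000000000 || echo "download looks incomplete"` (the 6000000000 here is only a loose lower bound based on the 7 GB figure).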
The commands I used on my system
mkdir /mnt/hdd/import
mkdir /mnt/hdd/data
mkdir /mnt/hdd/logs
cd /mnt/hdd/import # or cd ~ for your home directory
wget https://os.unil.cloud.switch.ch/swift/v1/lts2-wikipedia/wikipedia_nrc.dump
docker run -d --restart=always --volume=/mnt/hdd/import:/var/lib/neo4j/import --env=NEO4J_dbms_allow__upgrade=true --volume=/mnt/hdd/logs:/var/lib/neo4j/logs --env=NEO4J_dbms_allow__format__migration=true --volume=/mnt/hdd/data:/var/lib/neo4j/data --name=neo4j -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j:3.5.30
# verify if you did it right
docker exec --interactive --tty neo4j bash
ls /var/lib/neo4j/import
exit
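The same check can be done without opening an interactive shell, which is handy in a script. A sketch assuming the container name neo4j and the mount path used above; `dump_is_mounted` is a hypothetical name.

```bash
# Succeed when the dump file is visible inside the container at the mounted import path.
# $1 = container name (default "neo4j").
dump_is_mounted() {
    docker exec "${1:-neo4j}" test -f /var/lib/neo4j/import/wikipedia_nrc.dump
}
```

Usage: `dump_is_mounted || echo "dump not found in container, check the --volume paths"`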
tmux
docker exec --interactive --tty \
--user="$(id -u):$(id -g)" \
neo4j \
neo4j-admin load --force --from=/var/lib/neo4j/import/wikipedia_nrc.dump --database=wikipedia.db
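The docker run command above passes Neo4j settings such as dbms.allow_upgrade to the container as environment variables. As far as I understand the official image's naming rule, the setting name gets a NEO4J_ prefix, each underscore is doubled, and each dot becomes a single underscore. Here is a tiny sketch of that rule; `neo4j_env_name` is a made-up helper, and it is worth double-checking against the Docker docs for your Neo4j version.

```bash
# Convert a neo4j.conf setting name into the environment variable name
# the official Docker image expects:
#   1. "_" is written as "__"
#   2. "." is written as "_"
#   3. the result is prefixed with "NEO4J_"
neo4j_env_name() {
    printf 'NEO4J_%s\n' "$(printf '%s' "$1" | sed -e 's/_/__/g' -e 's/\./_/g')"
}
```

Usage: `neo4j_env_name dbms.allow_upgrade` prints `NEO4J_dbms_allow__upgrade`, which you can then pass as `--env=NEO4J_dbms_allow__upgrade=true`.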
Advanced way to add a mount to a running container
Take note of HOSTPATH and CONTPATH; these must be modified to match your setup
nano add_mount.sh
add_mount.sh
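#!/bin/sh
# Bind-mount a host directory into an already running container.
# Needs root and docker-enter, which is not part of Docker itself
# (it ships with the jpetazzo/nsenter project).
set -e
CONTAINER=neo4j
HOSTPATH=/mnt/extrashit/
CONTPATH=/import
# Resolve symlinks, then find the mount point of the filesystem holding the path.
REALPATH=$(readlink --canonicalize "$HOSTPATH")
FILESYS=$(df -P "$REALPATH" | tail -n 1 | awk '{print $6}')
# Find the block device backing that filesystem.
while read -r DEV MOUNT JUNK
do [ "$MOUNT" = "$FILESYS" ] && break
done </proc/mounts
[ "$MOUNT" = "$FILESYS" ] # Sanity check!
# Find the root of the mount within the filesystem (4th field of mountinfo).
while read -r A B C SUBROOT MOUNT JUNK
do [ "$MOUNT" = "$FILESYS" ] && break
done < /proc/self/mountinfo
[ "$MOUNT" = "$FILESYS" ] # Moar sanity check!
# Path of the directory relative to the mount point.
SUBPATH=$(echo "$REALPATH" | sed "s,^$FILESYS,,")
# Major and minor device numbers in decimal, as mknod expects (left unquoted on purpose below).
DEVDEC=$(printf "%d %d" $(stat --format "0x%t 0x%T" "$DEV"))
# Inside the container: create the device node if missing, mount it on a
# temporary directory, bind-mount the wanted subdirectory, then clean up.
docker-enter $CONTAINER -- sh -c \
"[ -b $DEV ] || mknod --mode 0600 $DEV b $DEVDEC"
docker-enter $CONTAINER -- mkdir /tmpmnt
docker-enter $CONTAINER -- mount $DEV /tmpmnt
docker-enter $CONTAINER -- mkdir -p $CONTPATH
docker-enter $CONTAINER -- mount -o bind /tmpmnt/$SUBROOT/$SUBPATH $CONTPATH
docker-enter $CONTAINER -- umount /tmpmnt
docker-enter $CONTAINER -- rmdir /tmpmnt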
run that shit
bash add_mount.sh
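The least obvious step in add_mount.sh is SUBPATH: the script strips the filesystem's mount point off the front of the resolved host path, so it knows which subdirectory to bind once the device is mounted inside the container. A standalone sketch of just that step; `subpath_of` is a hypothetical name, and the /mnt/hdd mount point in the usage line is only an example.

```bash
# Strip the mount point ($2) from the front of a resolved path ($1),
# mirroring the SUBPATH step in add_mount.sh.
subpath_of() {
    printf '%s\n' "${1#"$2"}"
}
```

Usage: `subpath_of /mnt/hdd/import /mnt/hdd` prints `/import`. If the directory is itself the mount point, the result is empty and the whole filesystem gets bound.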
Author
by oran collins
github.com/wisehackermonkey
docker run -d --env=NEO4J_dbms_active__database=new.db --restart=always --volume=/mnt/hdd/import:/var/lib/neo4j/import --env=NEO4J_dbms_allow__upgrade=true --volume=/mnt/hdd/logs:/var/lib/neo4j/logs --env=NEO4J_dbms_allow__format__migration=true --volume=/mnt/hdd/data:/var/lib/neo4j/data --name=neo4j -p7474:7474 -p7473:7473 -p7687:7687 -e NEO4J_AUTH=neo4j/s3cr3t neo4j:3.5.30
docker exec --interactive --tty \
--user="neo4j:neo4j" \
--env=NEO4J_dbms_allow__upgrade=true \
--env=NEO4J_dbms_allow__format__migration=true \
neo4j \
neo4j-admin load --force --from=/var/lib/neo4j/import/wikipedia_nrc.dump --database=wikipedia.db
sudo -u neo4j neo4j-admin load --force --from=/var/lib/neo4j/import/wikipedia_nrc.dump --database=wikipedia.db
echo "dbms.allow_upgrade=true" >> neo4j.conf
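Running that echo twice leaves the setting in neo4j.conf twice. A small sketch that only appends the line when it is missing; `ensure_line` is a made-up helper, and the path to neo4j.conf is whatever it is on your system.

```bash
# Append a line ($1) to a file ($2) only if that exact line is not already there.
ensure_line() {
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

Usage: `ensure_line "dbms.allow_upgrade=true" neo4j.conf`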
```bash
:use system
create database wikipedia
```