Rinnegan - A distributed tracer for blackbox systems
TLDR
Rinnegan is a tool I wrote to greatly reduce the time I spend understanding and reversing complex distributed systems. Source available at https://github.com/tunnelshade/rinnegan.
1. Background
Imagine a Chef setup with a server, message queues, a distributed database, a configuration system and many more processes running on each of the servers that make up the core Chef infrastructure. Finding bugs in Chef is a lot easier if you understand how it works. Sadly, Chef is a commercial product, so its inner workings are not publicly documented for us to read and understand.
So, I needed a way to visualize which components are running and what communication is happening between them. Enter Rinnegan, named after the most powerful eyes in the Naruto universe.
2. Idea
Rinnegan uses Grafana dashboards for visualization and InfluxDB for storing data. A collection of scripts helps deploy and manage a small agent on all appliances of interest. The agent collects traces and performs some basic tasks.
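To make the storage side concrete, here is a minimal Python sketch of how one trace sample could be serialised in InfluxDB's line protocol before being written. The measurement `syscalls` and the `host`, `pid`, `count` and `call` keys are hypothetical illustrations. They are not Rinnegan's actual schema or write path.

```python
# Hypothetical sketch: format one trace sample as an InfluxDB line-protocol point.
# The measurement, tag and field names are illustrative, NOT Rinnegan's real schema.


def _escape_key(text: str) -> str:
    """Escape tag keys, tag values and field keys (commas, equals signs, spaces)."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(text: str) -> str:
    """Measurement names only need commas and spaces escaped."""
    return text.replace(",", r"\,").replace(" ", r"\ ")


def _format_field(value) -> str:
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    """Build 'measurement,tag=v field=v [timestamp]' for a single point."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape_measurement(measurement)
    # Tags sorted by key, as InfluxDB recommends for write performance.
    for key in sorted(tags):
        head += f",{_escape_key(key)}={_escape_key(str(tags[key]))}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

For example, `to_line_protocol("syscalls", {"host": "nnx", "pid": 125}, {"count": 3, "call": "read"}, 1000)` returns `syscalls,host=nnx,pid=125 count=3i,call="read" 1000`. InfluxDB indexes tags but not fields, so values you want to filter on in Grafana, such as the host, are natural tag candidates.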
3. Walkthrough
Let us use Rinnegan to start reversing how HDFS works.
- To start using rinnegan, we need to start the rinnegan infrastructure (the InfluxDB database and the Grafana dashboard server). All required steps are handled by a Makefile. Currently, you need a GOPATH set up to cross-compile the agent binary.
$ go get github.com/tunnelshade/rinnegan
$ cd $GOPATH/src/github.com/tunnelshade/rinnegan/infrastructure
$ make start
$ docker ps --format="{{.Names}}"
grafana
influxdb
- Visit http://<localhostname>:3000 in your browser and log in with the default credentials admin:admin. Then navigate to the rinnegan dashboard.
- We need a test HDFS setup to play around with. I highly recommend using Docker containers, as they tend not to have noisy system processes and traffic. Let us use the flokkr runtime-compose setup here.
$ git clone https://github.com/flokkr/runtime-compose.git
$ cd runtime-compose/hdfs/viewfs
$ docker-compose up -d
$ docker ps --format="{{.Names}}" | grep viewfs
viewfs_datanodex_1
viewfs_nny_1
viewfs_nnx_1
viewfs_datanodey_1
- To perform any operation on target containers/hosts, use ./bin/rinnegan.sh. Before using it, we need to fix two files in the samples/ directory. The hosts file lists one target per line. The variables file sets some necessary environment variables, which you should adjust accordingly.
- In this example we are dealing with containers, so the hosts file should contain the names of all containers. Set the environment variable RINNEGAN_DOCKER to true in the variables file and source it.
$ docker ps --format="{{.Names}}" > ./samples/hosts
$ source ./samples/variables
- Once everything is set up successfully, running help should work.
$ ./bin/rinnegan.sh --help
Usage: rinnegan <host_regex> [agent|deploy|list|stop|wipe|exec]
<host_regex> grep regex that will be applied to filter hosts
agent Interact with agents deployed on targets
deploy Deploy agents on to targets
list List all active agents
stop Stop all active agents
wipe Remove any file leftovers on targets, run after stopping
exec Run commands on targets directly, nothing fancy
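The `<host_regex>` argument is described as a grep regex that filters the hosts list. Here is a minimal Python sketch of that selection, assuming the one-target-per-line samples/hosts format described above. `read_hosts` and `select_hosts` are hypothetical helpers, and Python's `re` only approximates grep's dialect.

```python
# Hypothetical sketch of <host_regex> target selection over samples/hosts.
# Python's re.search approximates grep's matching; grep's default (basic)
# regex dialect differs in details, e.g. how "+", "?" and "|" are escaped.
import re


def read_hosts(text: str) -> list[str]:
    """samples/hosts lists one target per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_hosts(hosts: list[str], host_regex: str) -> list[str]:
    """Keep every host whose name contains a match for host_regex, like grep."""
    pattern = re.compile(host_regex)
    return [host for host in hosts if pattern.search(host)]


if __name__ == "__main__":
    hosts = read_hosts("viewfs_datanodex_1\nviewfs_nny_1\nviewfs_nnx_1\nviewfs_datanodey_1\n")
    print(select_hosts(hosts, "nnx"))    # ['viewfs_nnx_1']
    print(select_hosts(hosts, "nodex"))  # ['viewfs_datanodex_1']
```

The `docker ps` command above writes every running container to samples/hosts, including grafana and influxdb. With grep-style filtering, "." therefore selects those two as well. A narrower regex such as "viewfs" restricts an operation to the HDFS containers.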
- Let us compile the agent to be deployed. As all containers in this example run Linux, just run:
$ make linux_agent
- Time to deploy our agents and check that they are running. Ignore any warnings about missing dependencies for modules.
$ ./bin/rinnegan.sh "." deploy
$ ./bin/rinnegan.sh "." list
- You often need to run some commands on all targets, which is easy with rinnegan's exec subcommand. Let us see how by installing procps on all containers:
$ ./bin/rinnegan.sh "." exec apk add procps
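Running one command on every selected target amounts to a fan-out. The Python sketch below assumes, as in this walkthrough, that targets are Docker containers reachable with `docker exec`. `exec_on_all` is a hypothetical helper, not how rinnegan.sh itself is implemented.

```python
# Hypothetical sketch: run the same command on many containers in parallel via
# `docker exec`. This is not how rinnegan.sh is implemented; the injectable
# `run` callable only exists so the fan-out logic can be tried without Docker.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def docker_exec_argv(container: str, command: list[str]) -> list[str]:
    """argv for running `command` inside `container`."""
    return ["docker", "exec", container, *command]


def _run_with_docker(argv: list[str]) -> int:
    """Default runner: execute argv and return its exit status."""
    return subprocess.run(argv, capture_output=True, text=True).returncode


def exec_on_all(containers, command, run=_run_with_docker, max_workers=8):
    """Run `command` in every container concurrently; map container -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {c: pool.submit(run, docker_exec_argv(c, command)) for c in containers}
        return {c: f.result() for c, f in futures.items()}
```

For example, `exec_on_all(["viewfs_nnx_1", "viewfs_nny_1"], ["apk", "add", "procps"])` returns each container's exit status, so a non-zero value points at the target where the command failed.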
- Let us see which processes run as part of an HDFS setup. Once the command has run, check the dashboard to see the data.
$ ./bin/rinnegan.sh "." agent module run ps
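A process listing like the one in the dashboard's process panel can be built without any external tool by reading /proc. The Python sketch below assumes Linux targets. It is an illustration and not necessarily how Rinnegan's ps module works.

```python
# Minimal sketch of a ps-like listing that reads /proc directly (Linux only).
# Illustrative; Rinnegan's ps module may gather process data differently.
import os


def parse_cmdline(raw: bytes) -> list[str]:
    """/proc/<pid>/cmdline holds NUL-separated arguments; drop empty pieces."""
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def list_processes(proc_root: str = "/proc") -> dict[int, list[str]]:
    """Map every numeric /proc entry (a PID) to its argument vector."""
    processes = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                processes[int(entry)] = parse_cmdline(f.read())
        except OSError:
            continue  # the process exited or its cmdline is unreadable
    return processes


if __name__ == "__main__" and os.path.isdir("/proc"):
    # HDFS daemons are JVM processes, so look for a java executable.
    for pid, argv in sorted(list_processes().items()):
        if argv and os.path.basename(argv[0]) == "java":
            print(pid, " ".join(argv[:3]))
```

Run as a script, it prints the PIDs whose executable is `java`. This is a quick way to spot the HDFS daemons, since they run on the JVM.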
- The namenodes (nnx & nny) seem to run their main process under pid 125. Let us trace its network calls. For this we need the strace module, so let us first install strace, only on nnx.
$ ./bin/rinnegan.sh "nnx" exec apk add strace
- Even after installing it, we do not see the strace module. Rinnegan is quite verbose in telling you what is missing. In this case it is a restrictive kernel.yama.ptrace_scope value. Let us fix that and start the strace module.
$ ./bin/rinnegan.sh "." exec sysctl -w kernel.yama.ptrace_scope=0
$ ./bin/rinnegan.sh "nnx" agent module run strace 125 trace=desc
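The strace module only appears once ptrace attachment is allowed, because `strace -p` attaches to a process that is not its child. Below is a small Python sketch that reads and explains the Yama setting. The descriptions summarise the kernel's Yama documentation, and the helper names are hypothetical.

```python
# Sketch: read and explain Yama's ptrace_scope, which decides whether strace may
# attach to an already-running process (strace -p PID). Descriptions summarise
# the kernel's Yama documentation.
YAMA_PTRACE_SCOPE = {
    0: "classic: any process with the same uid may be traced (normal permission checks)",
    1: "restricted: only descendants or processes allowed via PR_SET_PTRACER "
       "(CAP_SYS_PTRACE still overrides)",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled and the value cannot be changed",
}


def read_ptrace_scope(path: str = "/proc/sys/kernel/yama/ptrace_scope"):
    """Return the current scope as an int, or None if Yama is not available."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


def describe_ptrace_scope(scope) -> str:
    """Human-readable meaning of a ptrace_scope value."""
    if scope is None:
        return "Yama not present: only the classic ptrace permission checks apply"
    return YAMA_PTRACE_SCOPE.get(scope, f"unknown value {scope}")


if __name__ == "__main__":
    scope = read_ptrace_scope()
    print(scope, "-", describe_ptrace_scope(scope))
```

Note that kernel.yama.ptrace_scope is not namespaced. Setting it from inside a container, which normally requires a privileged container, changes it for the whole Docker host. A value of 3 cannot be lowered again without a reboot.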
- The dashboard should now show network traffic graphs and syscall traces in the Network panel.
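The syscall traces come from strace's text output. As a hypothetical illustration of how such lines can be turned into per-descriptor traffic numbers, here is a Python sketch. It assumes plain `strace -p PID -e trace=desc` output, optionally with `[pid N]` prefixes from `-f`. Rinnegan's strace module may parse its output differently.

```python
# Hypothetical sketch: turn strace lines (as printed by `strace -p PID -e trace=desc`,
# optionally with "[pid N]" prefixes from -f) into records and sum bytes per fd.
# Rinnegan's strace module may parse its output differently.
import re
from typing import NamedTuple, Optional


class Syscall(NamedTuple):
    pid: Optional[int]
    name: str
    fd: Optional[int]  # first argument, when it is a plain integer
    result: int


# name(args) = <decimal result>; lines with "= ?", hex results, "<unfinished ...>"
# or "<... resumed>" fragments are deliberately not matched.
_LINE = re.compile(
    r"^(?:\[pid\s+(?P<pid>\d+)\]\s+)?"
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)(?=\s|$)"
)

IO_CALLS = {"read", "write", "readv", "writev", "recvfrom", "sendto", "recvmsg", "sendmsg"}


def parse_strace_line(line: str) -> Optional[Syscall]:
    """Parse one completed syscall line, or return None if it does not match."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    first_arg = m.group("args").split(",", 1)[0].strip()
    return Syscall(
        pid=int(m.group("pid")) if m.group("pid") else None,
        name=m.group("name"),
        fd=int(first_arg) if first_arg.isdigit() else None,
        result=int(m.group("ret")),
    )


def bytes_per_fd(lines) -> dict[int, int]:
    """Sum successful read/write-style transfer sizes per file descriptor."""
    totals: dict[int, int] = {}
    for line in lines:
        call = parse_strace_line(line)
        if call and call.name in IO_CALLS and call.fd is not None and call.result > 0:
            totals[call.fd] = totals.get(call.fd, 0) + call.result
    return totals
```

`bytes_per_fd` only counts successful transfers (positive return values). A descriptor that shows a small, regular byte count over time is therefore a good heartbeat candidate.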
- The traffic looks like some kind of heartbeat, so let us stop this network tracer and find out which host is connecting to nnx.
$ ./bin/rinnegan.sh "nnx" agent module list
$ ./bin/rinnegan.sh "nnx" agent module stop strace_trace=desc_125
- Since this process seems to be a listening server, let us look for its ESTABLISHED connections using the netstat module.
$ ./bin/rinnegan.sh "nnx" agent module run netstat 125
- The dashboard should now show the connections. From the raddr (remote address) column we can deduce that host 064c7310222b is the one talking to nnx.
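netstat-style data ultimately comes from /proc. Below is a minimal Python sketch of finding a PID's ESTABLISHED connections. It maps the PID's socket inodes (from /proc/<pid>/fd) to entries in /proc/<pid>/net/tcp. It only handles IPv4, and it is an illustration rather than Rinnegan's netstat module.

```python
# Minimal sketch of a netstat-like view for one PID, read straight from /proc.
# IPv4 only (/proc/<pid>/net/tcp); JVMs often use IPv6 sockets, whose entries
# live in tcp6 with 32-hex-digit addresses and are not handled here.
import os
import socket
import struct
from typing import NamedTuple

TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


class Connection(NamedTuple):
    laddr: tuple[str, int]
    raddr: tuple[str, int]
    state: str
    inode: int


def _decode(addr: str) -> tuple[str, int]:
    # The kernel prints the address as a host-endian 32-bit hex number, so
    # packing it back in native byte order ("=I") restores network order.
    ip_hex, port_hex = addr.split(":")
    return socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16))), int(port_hex, 16)


def parse_proc_net_tcp(text: str) -> list[Connection]:
    """Parse the contents of a /proc/.../net/tcp file into Connection records."""
    conns = []
    for line in text.splitlines()[1:]:  # the first line is a column header
        fields = line.split()
        if len(fields) < 10:
            continue
        state = TCP_STATES.get(fields[3], fields[3])
        conns.append(Connection(_decode(fields[1]), _decode(fields[2]), state, int(fields[9])))
    return conns


def socket_inodes(pid: int, proc_root: str = "/proc") -> set[int]:
    """Inodes of sockets held open by pid: /proc/<pid>/fd/N -> 'socket:[inode]'."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    inodes = set()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed while we were iterating
        if target.startswith("socket:["):
            inodes.add(int(target[len("socket:["):-1]))
    return inodes


def established_for_pid(pid: int, proc_root: str = "/proc") -> list[Connection]:
    """ESTABLISHED IPv4 TCP connections whose socket belongs to pid."""
    inodes = socket_inodes(pid, proc_root)
    with open(os.path.join(proc_root, str(pid), "net", "tcp")) as f:
        conns = parse_proc_net_tcp(f.read())
    return [c for c in conns if c.state == "ESTABLISHED" and c.inode in inodes]
```

Run inside the nnx container, `established_for_pid(125)` would list entries whose `raddr` is the peer, which is the column used above. If the process only holds IPv6 sockets, this IPv4-only sketch returns an empty list.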
- Stop the netstat module and start network tracing on nnx (pid 125) and 064c7310222b (pid 68). PIDs can easily be obtained from the process panel. Note that a hostname is not always the same as the container name used in the targets list. Here, 064c7310222b is the datanodex container, which is why it is selected with the "nodex" regex.
$ ./bin/rinnegan.sh "nnx" agent module run strace 125 trace=desc
$ ./bin/rinnegan.sh "nodex" agent module run strace 68 trace=desc
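Unless a hostname is configured, Docker sets a container's hostname to the first 12 characters of its container ID. That is why the dashboard shows 064c7310222b rather than a container name. Below is a Python sketch for mapping hostnames back to container names with `docker inspect`. The helper names are hypothetical, and it assumes the docker CLI is available where you run rinnegan.sh.

```python
# Sketch: map the hostnames shown on the dashboard back to container names, so
# the right <host_regex> can be chosen. Uses `docker inspect` Go templates;
# assumes the docker CLI is available on the machine running rinnegan.sh.
import subprocess


def parse_inspect_output(text: str) -> dict[str, str]:
    """Parse '<.Name> <.Config.Hostname>' lines into {hostname: container_name}."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, hostname = parts
        mapping[hostname] = name.lstrip("/")  # .Name carries a leading "/"
    return mapping


def hostname_to_container() -> dict[str, str]:
    """Query all running containers and build the hostname -> name map."""
    ids = subprocess.run(["docker", "ps", "-q"], capture_output=True,
                         text=True, check=True).stdout.split()
    if not ids:
        return {}
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.Name}} {{.Config.Hostname}}", *ids],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_inspect_output(out)
```

In this walkthrough, `hostname_to_container().get("064c7310222b")` should point at the datanodex container, which is consistent with the "nodex" regex used above.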
- We can deduce that both hosts have a heartbeat-like interaction when idle. Filtering on hosts removes the graphs of the remaining hosts. The best part is that dragging a rectangle over two spikes on those graphs narrows the time range, so you only see the syscall traces from that period.
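Reading the period off the graph works well. If you would rather compute it, here is a hypothetical Python sketch that groups syscall timestamps into bursts and takes the median gap between burst starts. The timestamps are in seconds and can be obtained however you like, for example by querying InfluxDB directly. The `gap` threshold of 0.5 s is an arbitrary assumption.

```python
# Hypothetical sketch: estimate a heartbeat period from syscall timestamps
# (seconds). Events closer together than `gap` seconds count as one burst (spike).
from statistics import median


def burst_starts(timestamps, gap: float) -> list[float]:
    """Start time of each burst of events separated by more than `gap`."""
    starts = []
    previous = None
    for t in sorted(timestamps):
        if previous is None or t - previous > gap:
            starts.append(t)
        previous = t
    return starts


def heartbeat_period(timestamps, gap: float = 0.5):
    """Median time between burst starts, or None with fewer than two bursts."""
    starts = burst_starts(timestamps, gap)
    if len(starts) < 2:
        return None
    return median(b - a for a, b in zip(starts, starts[1:]))
```

For events at 0.0, 0.01, 0.02, 3.0, 3.01, 6.0 and 6.02 seconds, `heartbeat_period` returns 3.0. Using the median keeps one missed or delayed beat from skewing the estimate.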
- What next? Just enable the tracers and try writing a file to HDFS to see how file blocks are written.
- So, just pick any containerised blackbox distributed system and go about finding bugs by understanding its communications.
4. Capabilities
What else is rinnegan capable of doing?
- Use iptables to easily redirect traffic between components and tamper with it live (see agent iptables --help). A good HTTP reverse proxy for this is mitmproxy.
$ ./bin/rinnegan.sh "nnx" agent iptables --help
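Redirecting traffic to an intercepting proxy usually boils down to a NAT REDIRECT rule. The Python sketch below only builds the argv for such a rule using standard iptables options. It is not the syntax of Rinnegan's `agent iptables` command, and the port numbers in the usage note are placeholders.

```python
# Hypothetical sketch: build the iptables NAT rule that sends inbound TCP traffic
# for a service port to a local intercepting proxy (for example mitmproxy).
# This is generic iptables usage, not the syntax of Rinnegan's `agent iptables`.
VALID_ACTIONS = ("-A", "-I", "-D")        # append, insert, delete the same rule
VALID_CHAINS = ("PREROUTING", "OUTPUT")   # nat chains this sketch allows for REDIRECT


def redirect_rule(service_port: int, proxy_port: int,
                  action: str = "-A", chain: str = "PREROUTING") -> list[str]:
    """argv for redirecting TCP traffic aimed at service_port to proxy_port."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action {action!r}")
    if chain not in VALID_CHAINS:
        raise ValueError(f"chain must be one of {VALID_CHAINS}")
    return [
        "iptables", "-t", "nat", action, chain,
        "-p", "tcp", "--dport", str(service_port),
        "-j", "REDIRECT", "--to-ports", str(proxy_port),
    ]
```

For example, `subprocess.run(redirect_rule(8020, 8080), check=True)`, run as root, would send inbound connections for port 8020 to a proxy listening on 8080. `redirect_rule(8020, 8080, action="-D")` removes the rule again. PREROUTING only sees packets arriving from other hosts, so a proxy on the same machine can still reach the real service on its original port without being redirected back to itself. The OUTPUT chain would also catch that locally generated connection.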
- Use frida to run scripts like ssl-bypass for MITMing SSL traffic. Rinnegan comes with a certificate-check bypass script for OpenSSL. Frida scripts live in build/frida/. After adding a new script there, redeploy (or copy the script to the target), then refer to it by its name without the extension.
$ ./bin/rinnegan.sh "nnx" exec apk add py-pip
$ ./bin/rinnegan.sh "nnx" exec pip install frida-tools
$ ./bin/rinnegan.sh "nnx" agent module run frida 125 ssl-bypass
5. Last word
Rinnegan is very experimental software that gains features as and when I need them, but it has been super helpful in reversing some complex blackbox systems. It was built to solve my constant frustration of having to check processes, trace them, redirect traffic and tamper with it.
If something does not seem to be working:
- Wipe agent from particular target.
- Kill rinnegan related processes (HINT: Use exec).
- Redeploy agent and resume.