How to Run Chaos Experiments on Your Physical Machine
Author: Xiang Wang (Committer of Chaos Mesh, Engineer at PingCAP)
Transcreator: Yajing Wang; Editor: Tom Dewan
Chaos Mesh® is a cloud-native Chaos Engineering platform that orchestrates chaos in Kubernetes environments. With Chaos Mesh, you can simulate a variety of failures, and use Chaos Dashboard, a web UI, to manage chaos experiments directly. Since it was open-sourced, Chaos Mesh has been adopted by many companies to ensure their systems' resilience and robustness. But over the past year, we have frequently heard requests from the community asking how to run chaos experiments when the services are not deployed on Kubernetes.
To meet the growing need for chaos testing on physical machines, we are excited to present an enhanced toolkit called chaosd. You might find the name familiar. That's because it evolved from chaos-daemon, a key component in Chaos Mesh. At TiDB Hackathon 2020, we refactored chaosd to make it more than a command-line tool. Now with chaosd v1.0.1, you can simulate specific errors that target physical machines, and then undo the chaos experiments as if nothing had happened.
chaosd has the following advantages:
Easy-to-use: You can easily create and manage chaos experiments with chaosd commands.
Various fault types: You can inject faults into physical machines at different levels, including process faults, network faults, Java Virtual Machine (JVM) application faults, stress scenarios, disk faults, and host faults.
Multiple work modes: You can use chaosd as a command-line tool or as a service.
Without further ado, let's give it a try.
In this section, I will walk you through how to inject a network fault with chaosd. Your glibc version must be v2.17 or later.
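A quick way to check the 2.17 minimum from a shell is sketched below. It assumes the requirement applies to glibc as reported by getconf GNU_LIBC_VERSION, and that sort -V (version sort) is available. The helper name version_ge is hypothetical and not part of chaosd.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), which is an assumption about the local toolchain.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# getconf prints something like "glibc 2.31"; keep only the version field.
if glibc_version=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}') && [ -n "$glibc_version" ]; then
    if version_ge "$glibc_version" 2.17; then
        echo "glibc $glibc_version meets the 2.17 minimum"
    else
        echo "glibc $glibc_version is older than 2.17"
    fi
else
    echo "could not determine the glibc version on this system"
fi
```

On systems that do not use glibc, getconf does not report a version and the script only prints the fallback message.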
To download chaosd, run the following command:
curl -fsSL -o chaosd-v1.0.1-linux-amd64.tar.gz https://mirrors.chaos-mesh.org/chaosd-v1.0.1-linux-amd64.tar.gz
Extract the archive. It contains two folders:
chaosd contains the chaosd entry point.
tools contains the tools needed to perform the chaos experiment, including stress-ng (to simulate stress scenarios), Byteman (to simulate JVM application faults), and PortOccupyTool (to simulate network faults).
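The extraction step can be done with tar. The sketch below wraps it in a hypothetical helper, unpack_archive, that extracts into a chosen directory and lists the top two levels of the result so that the chaosd and tools folders are easy to spot. The name of the top-level directory inside the archive is not stated here, so the helper does not assume one.

```shell
# unpack_archive ARCHIVE [DEST]: extract a .tar.gz into DEST (default: current
# directory) and list the top two directory levels of DEST.
unpack_archive() {
    mkdir -p "${2:-.}" || return 1
    tar -xzf "$1" -C "${2:-.}" || return 1
    find "${2:-.}" -maxdepth 2 | sort
}

# Only runs when the archive downloaded in the previous step is present.
archive="chaosd-v1.0.1-linux-amd64.tar.gz"
if [ -f "$archive" ]; then
    unpack_archive "$archive" .
fi
```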
In this chaos experiment, the server will be unable to access chaos-mesh.org.
Run the following command:
sudo ./chaosd attack network loss --percent 100 --hostname chaos-mesh.org --device ens33
Attack network successfully, uid: c55a84c5-c181-426b-ae31-99c8d4615dbe
In this simulation, the ens33 network interface card cannot send network packets to or receive packets from chaos-mesh.org. You have to use sudo because the chaos experiment modifies network rules, which requires root privileges.
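The interface name ens33 comes from the machine used in this walkthrough, and yours may differ. One way to find a candidate for --device is to look at the default route. The sketch below is a hypothetical helper, default_device, that reads ip route output and prints the interface of the first default route. It assumes the usual "default via ... dev NAME ..." layout printed by iproute2.

```shell
# default_device: read `ip route` output on stdin and print the interface
# named after "dev" on the first line that starts with "default".
default_device() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Only runs when the ip command is installed.
if command -v ip >/dev/null 2>&1; then
    ip route show default | default_device
fi
```

The printed name can then be passed to --device in place of ens33.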
Also, don't forget to save the uid of the chaos experiment. You'll need it later as part of the recovery process.
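Rather than copying the uid by hand, a script can pull it out of the success message. The sketch below assumes the message keeps the "uid: <uuid>" form shown in the output above. The helper name extract_uid and the file name chaosd-experiment.uid are hypothetical.

```shell
# extract_uid TEXT: print the 36-character UUID that follows "uid: " in TEXT.
extract_uid() {
    printf '%s\n' "$1" | sed -n 's/.*uid: \([0-9a-fA-F-]\{36\}\).*/\1/p'
}

# Example with the message from this walkthrough.
message="Attack network successfully, uid: c55a84c5-c181-426b-ae31-99c8d4615dbe"
extract_uid "$message"

# In a real run, the message would come from the attack command, assuming it
# is written to standard output:
#   message=$(sudo ./chaosd attack network loss --percent 100 --hostname chaos-mesh.org --device ens33)
#   extract_uid "$message" > chaosd-experiment.uid
```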
Use the ping command to check whether the server can access chaos-mesh.org:
ping chaos-mesh.org
PING chaos-mesh.org (22.214.171.124) 56(84) bytes of data.
When you execute the command, the site will most likely not respond. Press Ctrl+C to stop the ping process. The statistics should show 100% packet loss.
2 packets transmitted, 0 received, 100% packet loss, time 1021ms
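For scripted checks, the same verification can be done without pressing any keys by bounding the number of pings and reading the loss percentage from the summary line. The sketch below assumes the Linux iputils summary format shown above. The helper name packet_loss is hypothetical.

```shell
# packet_loss: read ping output on stdin and print the loss percentage
# (without the % sign) taken from the "... N% packet loss ..." summary line.
packet_loss() {
    sed -n 's/.*[ ,]\([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Example with the summary line from this walkthrough.
printf '2 packets transmitted, 0 received, 100%% packet loss, time 1021ms\n' | packet_loss

# In a real run, send two probes and wait at most two seconds for each reply:
#   ping -c 2 -W 2 chaos-mesh.org | packet_loss
```

While the fault is active the helper should print 100, and after recovery it should print a lower value, typically 0.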
To recover the experiment, run the following command:
sudo ./chaosd recover c55a84c5-c181-426b-ae31-99c8d4615dbe
Recover c55a84c5-c181-426b-ae31-99c8d4615dbe successfully
In this step, you also need to use sudo because root privileges are required. When you finish recovering the experiment, ping chaos-mesh.org again to verify the connection.
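Because a forgotten recovery leaves the machine cut off from the target host, it can help to script the inject, check, and recover steps together. The sketch below is one possible wrapper, not a chaosd feature. The function name run_with_recovery and the CHAOSD variable are hypothetical, and the sketch assumes the attack command prints its "uid: ..." line to standard output as shown earlier.

```shell
# Command used to invoke chaosd. Overridable so the flow can be rehearsed with a stub.
CHAOSD="${CHAOSD:-sudo ./chaosd}"

# run_with_recovery HOST DEVICE CHECK...: inject 100% packet loss toward HOST
# on DEVICE, run the CHECK command while the fault is active, recover the
# experiment, and return the exit status of CHECK.
run_with_recovery() {
    rwr_host="$1"; rwr_device="$2"; shift 2
    rwr_out=$($CHAOSD attack network loss --percent 100 \
        --hostname "$rwr_host" --device "$rwr_device") || return 1
    # Assumes the success line contains "uid: <uuid>".
    rwr_uid=$(printf '%s\n' "$rwr_out" | sed -n 's/.*uid: \([^ ]*\).*/\1/p')
    [ -n "$rwr_uid" ] || return 1
    rwr_status=0
    "$@" || rwr_status=$?
    # Recover even when the check failed, so the fault never outlives the run.
    $CHAOSD recover "$rwr_uid" || return 1
    return "$rwr_status"
}

# Example (commented out because it needs root and a real chaosd binary):
#   run_with_recovery chaos-mesh.org ens33 ping -c 2 -W 2 chaos-mesh.org
```

In the commented example, ping is expected to exit with a non-zero status while the fault is active, so a non-zero result from run_with_recovery indicates that the packet loss took effect.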
As you can see, chaosd is fairly easy to use. But we can make it easier: a web dashboard for chaosd is currently under active development.
We will continue to improve its usability and add features, such as managing chaos experiments run with chaosd alongside those run with Chaos Mesh. This will provide a consistent and unified user experience for chaos testing on Kubernetes and physical machines. The architecture below is just a simple example:
Currently, chaosd provides six fault injection types. We plan to add more of the types that Chaos Mesh already supports, including HTTPChaos and IOChaos.
If you are interested in helping us improve chaosd, you are welcome to pick an issue and get started!
If you are interested in using chaosd and want to explore more, check out the documentation. If you come across an issue when you run chaosd, or if you have a feature request, feel free to create an issue. We would love to hear from you!