e1e0.net

sources for e1e0 website
git clone https://git.e1e0.net/e1e0.net.git

commit c8946f2d18f8e8f4966833b87eaafea3b25bceb3
parent 5e7159d83dfbc3ab2056fc5229f791f658225e04
Author: Paco Esteban <paco@onna.be>
Date:   Wed, 31 Jul 2019 14:06:49 +0200

finishing article

Diffstat:
M src/long-wireless-links-and-monitoring.md | 289 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 240 insertions(+), 49 deletions(-)

diff --git a/src/long-wireless-links-and-monitoring.md b/src/long-wireless-links-and-monitoring.md
@@ -1,5 +1,5 @@
 # Long Wireless links and monitoring.

-2019-07-19
+2019-07-31

 ## Intro
@@ -18,14 +18,14 @@ optimal solution. In the end it has been working for almost 3 years now.

 This is an attempt to document all the infrastructure and the bits and pieces
 used so I do not forget
-about it and maybe it can be of use to somebody else.
+about them and maybe it can be of use to somebody else.

 ## First steps and research

 As I said, I knew nothing about this before tackling the project. I have some
 solid knowledge about networking, but I knew little about long (for me)
-wireless links, antennas, propagation and a bunch of other stuff I did not know
-existed, so I had to do some research.
+wireless links, antennas, propagation and a bunch of other stuff I never heard
+of. So I had to do some research.

 If you want to do something like this, is better to plan ahead. See what the
 requisites are and start digging.
@@ -129,7 +129,7 @@ Specifically the location of the antennas and the clear line of sight.

 I have to admit that I did a sloppy job on the second link, because I did not
 know about the [Fresnel zone][2] back then, but there's some things you can do
-to mitigate is effects.
+to mitigate its effects.

 ### Calculate signal strength

@@ -146,11 +146,11 @@ emitterPower + emitterGain - signalLoss + receiverGain

 I say this is the simplified formula, because it does not take into account
 loses on cables and connectors, that's because I choose to use a _"all in one
-packet"_ type of antenna, so that makes no sense. This is a huge advantage for
-a beginner. Also, because I only take into account the free space loss and not
-any other kinds of loss, that would be a lot more difficult to calculate. That
-was sufficient for me anyway, as the conditions of line of sight are pretty
-good.
+packet"_ type of antenna, so that makes no sense in this case. This is a huge
+advantage for a beginner. Also, because I only take into account the free
+space loss and not any other kinds of loss, that would be a lot more difficult
+to calculate. That was sufficient for me anyway, as the conditions of line of
+sight are pretty good.

 To calculate signal loss, this is the formula:

@@ -173,30 +173,30 @@ So, as an example, let's say I choose channel `137` which is `5685 MHz`, and
 the 2 endpoints are 5.2km apart. That gives us a signal loss of `121.85 dB`.

 According to the antenna datasheet the transmission power is `5 dBm`, the gain
-of the antenna is `25 dBi` (that's on average I guess). So putting all that
-together I should get on the other end `-66.86 dBm`. This works both ways in
-this case, so now we have to check sensitivity. Again according to the
-datasheet, there's no problem in any modulation negotiation with this kind of
-signal strength (in theory, so to be on the safe side add at least `-3 dB` to
-your results).
+of the antenna is `25 dBi` (that's on average I guess across the whole range of
+channels). So putting all that together I should get on the other end `-66.86
+dBm`. This works both ways in this case, so now we have to check sensitivity.
+Again according to the datasheet, there's no problem in any modulation
+negotiation with this kind of signal strength (in theory, so to be on the safe
+side add at least `-3 dB` to your results).

 ### Physical setup and alignment

-With the theory calculations out of the way, knowing that is possible, the fun
-part starts, we have to get on the roof now and install the antennas.
+With the theory calculations out of the way, knowing that was possible, the fun
+part started, I had to get on the roof and install the antennas.

 Of course I won't be saying much about this, as this is different for every
 single installation. Suffice to say, I had a _"pretty fun time"_ up on ladders
 and climbing to places not meant to be climbed ...

-With the antenna installed, before attaching it securely to the pole, it has to
-be aligned the right way.
+Before securing the antenna to the pole in its final position it has to be
+aligned. I did this the best I could given the lack of specialised equipment.

-On the datasheet you'll find radiation plots for your model. The principle is
-simple, those are 2D representations of the radiation lobes of the antenna, and
-the loss referred to the total gain. So basically you want to point them to
-one another as perfectly as possible, specially for parabolic antennas, which
-have a very narrow beam.
+On the datasheet there are radiation plots for the chosen model. The principle
+is simple, those are 2D representations of the radiation lobes of the antenna,
+and the loss referred to the total gain. So basically you want to point them
+to one another as perfectly as possible, specially for parabolic antennas,
+which have a very narrow beam.

 Those radiation plots confused me at first as, in case of the PowerBeam there
 are 4 of them "Vertical Azimuth", "Vertical Elevation", "Horizontal Azimuth" and
@@ -206,24 +206,24 @@ drove me nuts. It turns out it refers to both polarisations of the signal that
 those devices create ... Once you understand that is easy, they are just the
 same measurement but times 2, one for each polarisation.

-Once you know how many angles you have before starting to loose signal, and
-with a bit of the good old trigonometry, you know your margin of error when
+Once I knew how much of an angle I had before starting to loose signal, and
+with a bit of the good old trigonometry, I knew my margin of error when
 pointing the antennas to each other.

 I did this standing behind the antenna and looking as if my line of sight was
 the beam. With some fiddling, that should be enough for the horizontal
-alignment. For the vertical one, it's easier, because the error margin is
+alignment. For the vertical one, it was easier, because the error margin is
 pretty big compared to the distance to the ground, even if you're on a tall
 building (again, trigonometry, that angle at 5km is some meters ...). Anyway
-with the help of some online tool you can calculate that easily to make it as
+with the help of some online tool I could calculate that easily to make it as
 precise as possible (search for "antenna downtilt calculator" on your favourite
 search engine).

 ### Network diagram and configuration

-With the antennas installed, it's time for some configuration.
+With the antennas installed, it was time for some configuration.

-This is a basic diagram of the network setup:
+This is a basic diagram of the network setup I came up with:

 ```
 192.168.1.6/24
@@ -260,9 +260,9 @@ maintain. That produces double NAT on my siblings', but that's a small price to
 pay for having a stable setup.

 Yes, I know that's a shitty thing to do for an ISP (they break your dhcp
-reservations and port forwarding), but most of the ISPs where I live are the
-biggest idiots and do the dumbest stuff you can imagine, so that's not even
-something for them.
+reservations and port forwarding too ...), but most of the ISPs where I live
+are the biggest idiots and do the dumbest stuff you can imagine, so that's not
+even something for them.

 The PowerBeams are configurable via a web interface that is pretty intuitive.
 They can also be configured via an SSH access and editing a text file + some
@@ -270,22 +270,21 @@ commands.

 Some things I did:

-* enable WDS (transparent bridge mode), so I see the MAC addresses of all the
-  chain from my monitoring station. That helps on debugging if something
-  network goes wrong.
+* Enable WDS (transparent bridge mode), so I could see the MAC addresses of all
+  the chain from my monitoring station. That helps on debugging if something
+  network goes wrong.
 * I enabled SNMP for monitoring, SSH server for access (with public keys) and
-  NTP so the antennas have the right time (good for logs).
-* All 4 antennas are set up on bridge mode.
-* The ones connected to the ISP router are set up as "Access Point" and the
-  other 2 as "Stations"
+  NTP so the antennas have the right time (good for logs).
+* All 4 antennas were set up on bridge mode.
+* The ones connected to the ISP router were set up as "Access Point" and the
+  other 2 as "Stations"
 * The antenna startup wizard asks you for country location. That's because
-  they apply the necessary regulation restrictions automatically. Do not
-  cheat here, you can have problems with your local authorities. Besides, if
-  you do not have good signal within the power output regulations chances are
-  you're doing something wrong or the conditions of line of sight, etc. are
-  not really good, so it won't matter and you'll be breaking the law for
-  nothing (and probably causing problems to other antennas and
-  installations).
+  they apply the necessary regulation restrictions automatically. Do not cheat
+  here, you can have problems with your local authorities. Besides, if you do
+  not have good signal within the power output regulations chances are you're
+  doing something wrong or the conditions of line of sight, etc. are not really
+  good, so it won't matter and you'll be breaking the law for nothing (and
+  probably causing problems to other antennas and installations).

 If you prefer the command line to configure the antennas, log into them via SSH
 and edit the file `/tmp/system.cfg`. Then save to `NVRAM` with the command
@@ -327,10 +326,202 @@ is still enough to watch online videos at `1080p` with today's compressions and
 is more than enough to do any kind of browsing, email and whatever ... so I
 guess is enough.

-### Monitoring
+### Monitoring and management
+
+For various reasons I wanted to monitor the whole thing. My brother had some
+network outages and I did not know why (I'm pretty sure they are related to
+some firmware bug introduced on a recent update, but I have no proof).
+
+My idea for this was to put a Raspberry PI on my parent's network that I could
+connect to and install all the necessary software for monitoring.
+
+As I said earlier, I have little control over the ISP router. Also, I did not
+want to setup a VPN at my house or something similar on a VPS ... So I ended
+up using [Zerotier][6] to create a _"local network"_ between one of my hosts at
+my home office and the PI at my parent's. This software creates an interface
+on the device with a private range, just like a VPN. The main difference in
+this case is that the _server_ part is managed (you can host it yourself too)
+and it uses some clever tricks to find the best path between to endpoints so
+latency is always the least possible. It falls back to relay servers if none
+of the direct strategies work. Besides, is quite easy to add or remove devices
+to/from a given virtual network.
+
+They have some [documentation][10] to make this process easy.
+
+Having the monitoring PI on a local network segment, I could now use it as
+a jump box to ssh into the antennas and routers (using `ProxyJump`), making
+management easier.
+
+In the end I decided to have some data collection and graphing and, after some
+consideration, I choose [influxdb][7] + [telegraf][8] + [grafana][9]. That gives
+me also alerts (more on that later).
+
+InfluxDB for the database backend, telegraf as the _"agent collector"_ and
+grafana for graphing tool.
+
+I choose influxdb because is really [easy to setup][11] on the PI. Check that
+the retention is enabled so you do not fill up the little SD card on the PI.
+Is also quite easy to [set up telegraf][12] and [grafana][13].
+
+With that running I set up the InfluxDB data source on Grafana. I used the
+database named _"telegraf"_, which was automatically created by the telegraf
+process as soon as it started collecting data.
+
+Then I configured telegraf to get snmp data from the "Access point" antennas
+and also from the routers at my siblings'.
+
+To do this I had to add a file to the configuration folder
+(something `/etc/telegraf/telegraf.d/snmp.conf`) with the snmp config
+parameters:
+
+```
+[[inputs.snmp]]
+  agents = [ "192.168.1.2", "192.168.1.3", "192.168.1.6", "192.168.1.7" ]
+  version = 1
+  community = "mycommunity"
+  interval = "60s"
+  timeout = "10s"
+  retries = 3
+
+  [[inputs.snmp.field]]
+    name = "hostname"
+    oid = "RFC1213-MIB::sysName.0"
+    is_tag = true
+
+  [[inputs.snmp.field]]
+    name = "uptime"
+    oid = "DISMAN-EXPRESSION-MIB::sysUpTimeInstance"
+
+  # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
+  [[inputs.snmp.table]]
+    name = "interface"
+    inherit_tags = [ "hostname" ]
+    oid = "IF-MIB::ifTable"
+
+    # Interface tag - used to identify interface in metrics database
+    [[inputs.snmp.table.field]]
+      name = "ifDescr"
+      oid = "IF-MIB::ifDescr"
+      is_tag = true
+```
+
+The info that comes from this is basically network traffic for all interfaces
+and uptime.
+
+I also set up telegraf to collect pings to the remote routers. That gives me
+info about the health of the link, and I based some alerts on that.
+
+The needed config was:
+
+```
+[[inputs.ping]]
+  ## List of urls to ping
+  urls = ["192.168.1.6", "192.168.1.7"]
+
+  ## Number of pings to send per collection (ping -c <COUNT>)
+  count = 3
+  ## Per-ping timeout, in s. 0 == no timeout (ping -W <TIMEOUT>)
+  timeout = 1.0
+```
+
+And finally, I wanted to have some info the devices provide, but only through
+some internal commands. For instance, the number of connected devices.
+
+There are 2 commands that run on those devices that provide some internal
+information (like signal strength, connected devices, and much more). They are
+`mca-status` and `wstalist`.
+
+It turns out telegraf can execute commands and store that as metrics data, no
+problem. The configuration looks like this:
+
+```
+[[inputs.exec]]
+  ## Commands array
+  commands = [ "/usr/local/bin/get_connected_devices.sh router1" ]
+  interval = "300s"
+
+  name_override = "conn_devices"
+  tag_keys = [ "hostname" ]
+  timeout = "5s"
+  data_format = "json"
+```
+
+The script is this:
+
+```
+#!/bin/sh
+
+set -eu
+
+device=${1:-router1}
+device_info=$(ssh "ubnt@$device" mca-status | tr -d "\r")
+connected_devices=$(echo "$device_info" |grep wlanConnections| cut -d'=' -f 2)
+
+printf '{"hostname": "%s", "devices": %d }' "$device" "$connected_devices"
+```
+
+It outputs some JSON that telegraf understands.
+
+After this it was just a matter of setting up some grafana dashboards to see
+what I wanted to see. I think there is enough information on the internet on
+how to do that, so I won't be explaining it here.
+
+As I mentioned my brother was having some outages that I still cannot explain.
+They are fixed rebooting the "access point" part of the link (I'm pretty sure
+they would go away simply kicking out the client, but I could not be bothered
+in looking how to do that programatically ...).
+
+So I thought on automating the reboot process as a mitigation for the
+inconveniences it produces. I set up an alert on grafana for the ping metric
+that, when it triggers calls a webhook.
+
+I did it that way because I wanted to be notified and also automatically take
+action based on those alerts. The setup I came up with may seem a bit
+complicated, but it works with simple tools and it has been on service for some
+months now.
+
+For the webhook, I found [this][14], which is meant to be a sort of gateway
+from webhook to XMPP. It only accepts grafana calls but it can be adapted
+pretty easily.
+
+I did [some modifications][15] to not only send an xmpp message, but also to write
+a flag file on disk on a specified folder if it gets an alert with a specific
+string on it. Then, there's a cron job running that checks for those flags
+and, if it finds any, executes the script of the same name and deletes the flag
+on success. All pretty simple to do with shell script.
+
+On the ping alert case, the shell scripts just connect to the "access point"
+antenna and perform a `reboot(8)`.
+
+With that done, outages do not last more than 5 minutes, and they are pretty
+rare anyway. So I think is a good solution until the day I take the time to
+dig into it (if I ever do it ...).
+
+I also created a custom handler with super simple payload, so I could use it
+from other scripts (not necessarily from this project) to just be notified via
+xmpp.
+
+## Conclusion
+
+And that's the whole setup. Without using anything too complicated or
+expensive I could connect those isolated flats, have some insight on what
+happens on the network, have alerts on the most interesting metrics and even
+automate responses if I need to.
+
+I hope this may serve as a source of ideas for similar projects.

 [1]: https://en.wikipedia.org/wiki/Point-to-point_(telecommunications)
 [2]: https://en.wikipedia.org/wiki/Fresnel_zone
 [3]: https://www.ui.com/airmax/powerbeam/
 [4]: https://en.wikipedia.org/wiki/Power_over_Ethernet
 [5]: https://www.konigelectronic.com/computer/networking/network-cable-reel-cat5e-futp-100-m-black-solid-55896639
+[6]: https://www.zerotier.com/
+[7]: https://www.influxdata.com/time-series-platform/
+[8]: https://www.influxdata.com/time-series-platform/telegraf/
+[9]: https://grafana.com/
+[10]: https://zerotier.atlassian.net/wiki/spaces/SD/pages/8454145/Getting+Started+with+ZeroTier
+[11]: https://docs.influxdata.com/influxdb/v1.7/introduction/installation/
+[12]: https://docs.influxdata.com/telegraf/v1.11/introduction/installation/
+[13]: https://grafana.com/docs/installation/debian/
+[14]: https://github.com/opthomas-prime/xmpp-webhook/
+[15]: https://git.onna.be/xmpp-webhook
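The article's link-budget arithmetic can be reproduced with a short shell sketch. The inputs are channel `137`, 5.2km, `5 dBm` transmit power and `25 dBi` per antenna.

The sketch rests on a few assumptions:

* the standard free-space path loss formula, with distance in km and frequency in MHz: 20*log10(d) + 20*log10(f) + 32.44;
* the usual 5 GHz channel mapping of `5000 + 5 * channel` MHz;
* for the Fresnel zone the article mentions, the textbook radius of the first zone at mid-path, `sqrt(lambda * d / 4)`.

The function names are made up for this example, and awk does the floating point maths.

```shell
#!/bin/sh
# Link planning sketch. Assumptions: standard free-space path loss (d in km,
# f in MHz) and the usual 5 GHz channel mapping. Function names are
# hypothetical; awk is used because sh has no floating point arithmetic.
set -eu

# 5 GHz Wi-Fi channel number -> centre frequency in MHz (5000 + 5 * channel).
chan_to_mhz() {
    echo $((5000 + 5 * $1))
}

# Free-space path loss in dB: 20*log10(d_km) + 20*log10(f_mhz) + 32.44
fspl_db() {
    awk -v d="$1" -v f="$2" 'BEGIN {
        printf "%.2f\n", 20 * log(d) / log(10) + 20 * log(f) / log(10) + 32.44
    }'
}

# Simplified received power from the article (dBm):
# emitterPower + emitterGain - signalLoss + receiverGain
rx_dbm() {
    awk -v p="$1" -v gt="$2" -v l="$3" -v gr="$4" 'BEGIN {
        printf "%.2f\n", p + gt - l + gr
    }'
}

# Radius in metres of the first Fresnel zone at mid-path:
# sqrt(lambda * d1 * d2 / d) with d1 = d2 = d / 2, i.e. sqrt(lambda * d / 4).
fresnel_mid_m() {
    awk -v d="$1" -v f="$2" 'BEGIN {
        lambda = 299792458 / (f * 1000000)   # wavelength in metres
        dm = d * 1000                        # path length in metres
        printf "%.1f\n", sqrt(lambda * dm / 4)
    }'
}

freq=$(chan_to_mhz 137)
loss=$(fspl_db 5.2 "$freq")
rx=$(rx_dbm 5 25 "$loss" 25)
echo "channel 137: ${freq} MHz, loss ${loss} dB, received ${rx} dBm"
echo "first Fresnel zone radius at mid-path: $(fresnel_mid_m 5.2 "$freq") m"
```

For these inputs the loss comes out at `121.85 dB`, the same figure as in the article. The received level comes out at `-66.85 dBm`, a hundredth of a dB away from the article's `-66.86 dBm`; the exact constant and rounding used there are not stated. The mid-path Fresnel radius is about 8.3 m. A commonly quoted rule of thumb is to keep at least 60% of it clear of obstacles.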
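The "good old trigonometry" used for pointing can be sketched the same way. One function computes how far off target the beam lands for a given angular error. The other computes the tilt needed between two mounting heights.

A few assumptions apply:

* It is a flat-earth approximation that ignores earth curvature and refraction. A dedicated downtilt calculator may account for them.
* The angular margin has to be read off the datasheet radiation plots.
* The 1 degree error and the 30 m / 10 m heights in the example are invented for illustration.

```shell
#!/bin/sh
# Pointing geometry sketch, flat-earth approximation (no curvature or
# refraction). Function names and example figures are hypothetical.
set -eu

# Sideways miss in metres at distance d_m metres for an angular error of
# a_deg degrees: d * tan(a). awk has no tan(), so sin/cos is used instead.
offset_m() {
    awk -v d="$1" -v a="$2" 'BEGIN {
        pi = atan2(0, -1)
        r = a * pi / 180
        printf "%.1f\n", d * sin(r) / cos(r)
    }'
}

# Tilt in degrees for an antenna at h_local metres aiming at one at
# h_remote metres, d_m metres away. Negative means tilt down.
tilt_deg() {
    awk -v hl="$1" -v hr="$2" -v d="$3" 'BEGIN {
        pi = atan2(0, -1)
        printf "%.2f\n", atan2(hr - hl, d) * 180 / pi
    }'
}

echo "1 degree off at 5.2 km misses by $(offset_m 5200 1) m"
echo "30 m to 10 m over 5.2 km needs $(tilt_deg 30 10 5200) degrees of tilt"
```

With those made-up numbers, a 1 degree error misses by roughly 91 m at 5.2km. A 20 m height difference needs only about 0.22 degrees of tilt. This fits the article's point that the vertical margin is generous while the horizontal pointing takes some fiddling.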
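The `get_connected_devices.sh` script in the commit does the SSH call and the parsing in one go, so the parsing cannot be tried without a device at hand. The minimal sketch below reads `mca-status` output on stdin instead.

It assumes, as the original `grep`/`cut` pipeline does, that the value sits on a `wlanConnections=<n>` line. It matches that key exactly and falls back to `0` explicitly when the key is missing. The function name `to_json` is hypothetical.

```shell
#!/bin/sh
# Parse mca-status style "key=value" lines from stdin and print the JSON that
# the telegraf exec input (data_format = "json") reads. Assumption: the count
# is on a line of the form "wlanConnections=<n>".
set -eu

to_json() {
    device=$1
    # Strip carriage returns, then print the value of the exact key only.
    count=$(tr -d '\r' | awk -F'=' '$1 == "wlanConnections" { print $2; exit }')
    # Default to 0 if the key was not found.
    printf '{"hostname": "%s", "devices": %d }\n' "$device" "${count:-0}"
}

# Example with canned input instead of a live device (made-up values).
printf 'uptime=123456\r\nwlanConnections=3\r\n' | to_json router1
```

On the PI this would be wired up as `ssh "ubnt@router1" mca-status | to_json router1`. From the home office, the same command should work through the PI using OpenSSH's `-J` option, the command-line form of `ProxyJump`. For example, `ssh -J pi@<pi-zerotier-address> ubnt@192.168.1.2 mca-status`, where the user name and address are placeholders.

The following command gathers the exec input once and prints the resulting metrics, which shows what telegraf actually records: `telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --input-filter exec --test`.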
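The flag-file mechanism in the article can be sketched in a few lines of shell. The webhook drops a flag, and a cron job runs the script of the same name and removes the flag on success. The directory paths, their environment-variable overrides and the function name are assumptions; the article does not give them.

```shell
#!/bin/sh
# Cron-driven flag processor sketch. Directory locations are hypothetical and
# can be overridden through the environment.
set -eu

FLAG_DIR=${FLAG_DIR:-/var/lib/alert-flags}
SCRIPT_DIR=${SCRIPT_DIR:-/usr/local/lib/alert-actions}

process_flags() {
    for flag in "$FLAG_DIR"/*; do
        [ -e "$flag" ] || continue   # the glob matched nothing
        name=$(basename "$flag")
        action="$SCRIPT_DIR/$name"
        # Run the action with the same name as the flag. Keep the flag if the
        # action is missing or fails, so the next cron run retries it.
        if [ -x "$action" ] && "$action"; then
            rm -f "$flag"
        fi
    done
}

process_flags
```

A crontab line such as `*/5 * * * * /usr/local/bin/process_flags.sh` would run it (path and interval hypothetical). The action behind the ping alert could be as small as `ssh ubnt@<access-point> reboot`.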