Title: Long Wireless links and monitoring.
Author: paco
Date: 2019-07-31
Type: article

_Update 2021-05-29: This setup is quite outdated now. One of the endpoints
does not exist anymore. Also, I replaced Zerotier with Wireguard, and the
monitoring part changed quite a bit too. All the research, materials, and
build sections may still be useful, so I left it all here. I may revisit this
document in the future and update it._

## Intro

Some time ago I built 2 [P-t-P][1] links between some family members' buildings.

The thing is that my brother and my sister live in an area with no coverage
from traditional ISPs, but quite close (5.5km in a straight line, with no
obstacles) to my parents' place, which has good coverage (even FTTH) and
plenty of providers to choose from.

This project has grown _organically_, so to speak, and the requirements kept
changing.

That, and my lack of experience on the subject, make all this far from an
optimal solution.

In the end it has been working for almost 3 years now. This is an attempt to
document all the infrastructure and the bits and pieces used, so I do not
forget about them, and maybe it can be of use to somebody else.

## First steps and research

As I said, I knew nothing about this before tackling the project. I have some
solid knowledge about networking, but I knew little about long (for me)
wireless links, antennas, propagation and a bunch of other stuff I had never
heard of. So I had to do some research.

If you want to do something like this, it is better to plan ahead. See what
the requirements are and start digging.

Some things to take into consideration are:

* Budget. This is an important one in this scenario, as this is for personal
  use only.
* Distance between the endpoints of the link.
  Modern hardware (more on my choice later) can easily cover 10km or maybe
  more, but read the manufacturer's datasheet and look for output power,
  antenna gain and sensitivity. And always take their numbers with a grain of
  salt, as they are usually tested in ideal conditions you won't encounter.
  Later on you'll find a way to calculate the ideal numbers to have an
  estimate.
* Obstacles. There has to be a perfectly clear line of sight between
  endpoints. Wireless communications, especially WiFi on either 2.4GHz or
  5GHz, are very sensitive to obstacles. Even partial cover can have a big
  impact on link quality. And clear vision does not mean _"I can see a single
  point in the distance"_: there's this thing called the [Fresnel zone][2],
  and if it is partially obstructed it will give you a lot of trouble under
  some atmospheric conditions or spectrum saturation.
* Materials. Don't be cheap. This will have to resist outdoor conditions for
  as long as possible.
* Neighbours and regulations. There's the legal part (RF regulations in your
  country and things like that) and the _"social"_ part. In this case my
  family does not live in detached houses but in apartments, so any building
  rules about this have to be taken into consideration.
* Infrastructure. By that I mean everything necessary to be able to install
  the antennas, route the cables, install connectors, etc. I'm not only
  talking about tools, but also about access to the best spots to put the
  antennas.
* Antenna location. As a rule of thumb, the higher the better. But this
  depends a lot on your particular situation. It deserves some thought.
* Spectrum saturation. WiFi is ubiquitous now. That may be a challenge for
  any installation, especially in urban areas. Ideally, you should check how
  _crowded_ the spectrum is, but this is usually pretty difficult for
  amateurs without special equipment. Some antennas have a built-in spectrum
  analyser, but it may perform badly.
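
To get a feeling for how much clearance the Fresnel zone asks for, here is a
minimal sketch in Python. It assumes the standard textbook formula for the
radius of the n-th Fresnel zone, `sqrt(n * λ * d1 * d2 / (d1 + d2))`. The
function name `fresnel_radius` and the "keep 60% of the first zone clear" rule
of thumb are assumptions for illustration and do not come from the datasheet
or from this setup.

```python
import math
from typing import Optional

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def fresnel_radius(distance_m: float, freq_hz: float,
                   position_m: Optional[float] = None, zone: int = 1) -> float:
    """Radius in metres of the n-th Fresnel zone at a point along the link.

    position_m is the distance from one endpoint; it defaults to the midpoint,
    where the zone is widest.
    """
    if position_m is None:
        position_m = distance_m / 2.0
    if distance_m <= 0 or not 0.0 <= position_m <= distance_m:
        raise ValueError("position_m must lie between the two endpoints")
    wavelength = SPEED_OF_LIGHT / freq_hz
    d1 = position_m               # distance to the first endpoint
    d2 = distance_m - position_m  # distance to the second endpoint
    return math.sqrt(zone * wavelength * d1 * d2 / distance_m)


# Example values from this article: a 5.5km link on channel 137 (5685 MHz).
radius = fresnel_radius(5500.0, 5685e6)
print(f"First Fresnel zone radius at midpoint: {radius:.1f} m")
print(f"60% of that (rule-of-thumb clearance): {0.6 * radius:.1f} m")
```

For those numbers the first zone is roughly 8.5 m in radius at the midpoint,
which is why _"I can see the other building"_ is not the same as a clear path.
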

## Materials

This is a list of the materials I chose and why I chose them. It is short, as
it is really an easy installation.

### Antennas

I ended up using [Ubiquiti PowerBeams][3] to create the 2 links. Four in
total, 2 for each link.

I was looking for a reputable manufacturer, trying to avoid problems in the
future. Also, I wanted something as simple as possible. This kind of antenna
has the _"emitter/receiver"_ and the antenna all in the same device. So no
special connectors to be crimped, virtually no losses on cables, just an easy
[PoE][4] setup from the house to the rooftop.

Also, this antenna has an easy to set up web interface _and_ an SSH server
that drops you into a busybox shell with some proprietary commands that are
pretty handy for automation and data collection.

There are newer models now and other manufacturers. Do your research, read
forums and all the usual stuff. I can say those work for this setup with minor
issues.

If you know something about this subject you may be wondering why I did not use
something with a wider angle on the _"access point"_ side and use just 3
antennas instead of 4. Truth is, I tried, but I had some problems with the 2nd
link giving poor performance. Not being an expert on this I can only guess
that the partial obstruction on the LOS (line of sight) path for the second
link was the cause of the poor performance, especially on bad weather days
(WiFi is pretty sensitive to heavy rain) and during episodes of spectrum
saturation.

Creating a separate link with a dedicated pair of antennas improved the
situation a lot.

### Cables

As the antennas only need a network connection, we only need Ethernet cable.
Be sure that it is CAT5e or better.

Always use cable rated for outdoor use. Regular network cable will not last
long exposed to rain and the sun's UV.
I went for [this one][5] because it was available at the time on Amazon.

### Connectors

Don't go extra cheap on this, but anything with reasonable quality will do
here. The antennas are built in a way that the connectors are never exposed,
so this part is not that critical.

### Antenna pole and other hardware

I cannot say much about this. What to buy here depends a lot on your
particular setup. Remember that the higher the better for the antennas, and
remember wind is a thing ... you do not want it to fly away like a plastic bag.

## Build steps

This is a list of the build steps I took. I started by checking the list
from the [First steps](#first-steps-and-research) section, specifically the
location of the antennas and the clear line of sight.

I have to admit that I did a sloppy job on the second link, because I did not
know about the [Fresnel zone][2] back then, but there are some things you can
do to mitigate its effects.

### Calculate signal strength

There's a simple way to calculate the signal strength you should see on the
other side of the link (in ideal conditions). This can be taken as a reference
to see if the setup is viable and what conditions and speed negotiation you can
expect between the 2 endpoints of the link.

The simplified formula to calculate the signal (power in dBm, gains in dBi,
loss in dB) is:

```
emitterPower + emitterGain - signalLoss + receiverGain
```

I say this is the simplified formula because it does not take into account
losses on cables and connectors. That's because I chose an _"all in one
package"_ type of antenna, so those losses do not apply in this case. This is
a huge advantage for a beginner. It is also simplified because I only take
into account the free space loss and not any other kinds of loss, which would
be a lot more difficult to calculate.
That was sufficient for me anyway, as the conditions of line of sight are
pretty good.

To calculate the signal loss (free space loss, in dB), this is the formula:

```
loss = 20*log10((4*π*d)/λ)
```

Where `d` is the distance between the 2 endpoints in meters and `λ` the
wavelength, also in meters. If you do not remember how to calculate the
wavelength from the frequency, it is just:

```
λ = C/f
```

Where `C` is the speed of light in meters per second and `f` the frequency in
Hertz.

So, as an example, let's say I choose channel `137` which is `5685 MHz`, and
the 2 endpoints are 5.2km apart. That gives us a wavelength of about
`0.0527 m` and a signal loss of about `121.86 dB`.

According to the antenna datasheet the transmission power is `5 dBm` and the
gain of the antenna is `25 dBi` (that's on average, I guess, across the whole
range of channels). So putting all that together (`5 + 25 - 121.86 + 25`)
I should get about `-66.86 dBm` on the other end. This works both ways in this
case, so now we have to check sensitivity. Again according to the datasheet,
there's no problem in any modulation negotiation with this kind of signal
strength (in theory, so to be on the safe side subtract at least `3 dB` from
your results).

### Physical setup and alignment

With the theory calculations out of the way, and knowing that it was possible,
the fun part started: I had to get on the roof and install the antennas.

Of course I won't be saying much about this, as this is different for every
single installation. Suffice to say, I had a _"pretty fun time"_ up on ladders
and climbing to places not meant to be climbed ...

Before securing the antenna to the pole in its final position it has to be
aligned. I did this the best I could given the lack of specialised equipment.

On the datasheet there are radiation plots for the chosen model.
The principle is simple: those are 2D representations of the radiation lobes
of the antenna, and the loss relative to the maximum gain. So basically you
want to point them at one another as perfectly as possible, especially for
parabolic antennas, which have a very narrow beam.

Those radiation plots confused me at first because, in the case of the
PowerBeam, there are 4 of them: "Vertical Azimuth", "Vertical Elevation",
"Horizontal Azimuth" and "Horizontal Elevation". This did not make any sense
to me in the beginning, as the azimuth is a horizontal angle and elevation is
a vertical one. It drove me nuts. It turns out "Vertical" and "Horizontal"
refer to the two polarisations of the signal that those devices create ...
Once you understand that, it is easy: they are just the same measurements
times 2, one set for each polarisation.

Once I knew how much of an angle I had before starting to lose signal, and
with a bit of good old trigonometry, I knew my margin of error when pointing
the antennas at each other.

I did this standing behind the antenna and looking as if my line of sight was
the beam. With some fiddling, that should be enough for the horizontal
alignment. The vertical one was easier, because the error margin is pretty big
compared to the distance to the ground, even if you're on a tall building
(again, trigonometry: that angle at 5km is some meters ...). Anyway, with the
help of an online tool I could calculate that easily to make it as precise as
possible (search for "antenna downtilt calculator" on your favourite search
engine).

### Network diagram and configuration

With the antennas installed, it was time for some configuration.

This is a basic diagram of the network setup I came up with:

```
                                                                    192.168.1.6/24
                                                                      +--------+
                                                                      | Bro.   |
                               192.168.1.2/24       192.168.1.4/24    | Router |
                                 +---------+         +----------+     +--------+
                                 | Antenna |         | Antenna  |   / 192.168.10.1/24
                             ----|   AP1   |+++++++++|   ST1    |---
          192.168.1.1/24 ---/    +---------+         +----------+
              +---------+
+---------+  -|   ISP   |
|Internet |-/ | Router  |
+---------+   +---------+
                 |       --\     +---------+         +-----------+
                  \         --\  | Antenna |         |  Antenna  |
                   \           --|   AP2   |+++++++++|    ST2    |-\
                   |             +---------+         +-----------+  -\ 192.168.1.7/24
                    \          192.168.1.3/24       192.168.1.5/24     +---------+
              +------------+                                           | Sis.    |
              |    Rpi     |                                           | Router  |
              | Monitoring |                                           +---------+
              +------------+                                         192.168.10.1/24
              192.168.1.10/24
```

All are cable connections except the `++++` ones, which are the 5km links.

On the routers/APs at the end of the chain I used the same network segment for
both, as they will be isolated and do NAT. I did this because I have little
control over the ISP router. It is _"reset to defaults"_ from time to time and
that caused me problems before, so setting static routes would be a pain to
maintain. That produces double NAT at my siblings', but that's a small price
to pay for having a stable setup.

Yes, I know that's a shitty thing for an ISP to do (they break your DHCP
reservations and port forwarding too ...), but most of the ISPs where I live
are the biggest idiots and do the dumbest stuff you can imagine, so this is
not even a big deal by their standards.

The PowerBeams are configurable via a web interface that is pretty intuitive.
They can also be configured over SSH by editing a text file and running some
commands.

Some things I did:

* I enabled WDS (transparent bridge mode), so I could see the MAC addresses of
  the whole chain from my monitoring station. That helps with debugging if
  something network-related goes wrong.
* I enabled SNMP for monitoring, the SSH server for access (with public keys)
  and NTP so the antennas have the right time (good for logs).
* All 4 antennas were set up in bridge mode.
* The ones connected to the ISP router were set up as "Access Point" and the
  other 2 as "Stations".
* The antenna startup wizard asks you for your country. That's because they
  apply the necessary regulation restrictions automatically. Do not cheat
  here, you can have problems with your local authorities. Besides, if you do
  not have good signal within the power output regulations, chances are you're
  doing something wrong or the conditions of line of sight, etc. are not
  really good, so it won't matter and you'll be breaking the law for nothing
  (and probably causing problems to other antennas and installations).

If you prefer the command line to configure the antennas, log into them via SSH
and edit the file `/tmp/system.cfg`. Then save to `NVRAM` with the command
`cfgmtd -w`. Then reset with `/usr/etc/rc.d/rc.softrestart force`.

I do not recommend that method at the beginning, until you get familiar with
all the options and configurations possible. You can make a pretty big mess.

As I said earlier, those antennas have a sort of spectrum analyser you can use
to determine which channel is less busy. It uses some Java applet (yes, I know
...) and it has been broken on 2 occasions by firmware updates. But it can be
of assistance if your spectrum is really busy.

### Performance tests

There are 2 ways to easily test the throughput of the links. The web interface
has a "speed test" built in. You have to enter the credentials of the other end
and it can test TX, RX or both.

The other way (that I like the most) is `iperf(1)`.
The antennas come with a basic implementation of that tool installed, so log
into the antenna on the other end and use `iperf(1)` either as server or
client to test both directions of the communication.

Play a bit with the channel width. A wider channel allows for faster transfer
rates, but a narrower channel increases stability.

I ended up using `20 MHz` for one of the links and `10 MHz` for the other.
That last one is the one with the less than ideal LOS situation. In the end
reducing the channel width and choosing the least busy channel did the trick
and I could get a stable link.

In the end, for the first link I get around `32Mbps` symmetrical. The second
link is a lot more variable depending on the conditions and the interference
from other stations. I get up to `17Mbps` symmetrical, and it is usually more
than `12Mbps`, but in the worst case it can get as low as `6Mbps`. Which is
still enough to watch online videos at `1080p` with today's compression and is
more than enough to do any kind of browsing, email and whatever ... so I guess
it is enough.

### Monitoring and management

For various reasons I wanted to monitor the whole thing. My brother had some
network outages and I did not know why (I'm pretty sure they are related to
some firmware bug introduced in a recent update, but I have no proof).

My idea for this was to put a Raspberry PI on my parents' network that I could
connect to, and install all the necessary software for monitoring on it.

As I said earlier, I have little control over the ISP router. Also, I did not
want to set up a VPN at my house or something similar on a VPS ... So I ended
up using [Zerotier][6] to create a _"local network"_ between one of my hosts at
my home office and the PI at my parents'. This software creates an interface
on the device with a private range, just like a VPN.
The main difference in this case is that the _server_ part is managed (you can
host it yourself too) and it uses some clever tricks to find the best path
between two endpoints so latency is as low as possible. It falls back to relay
servers if none of the direct strategies work. Besides, it is quite easy to
add or remove devices to/from a given virtual network.

They have some [documentation][10] to make this process easy.

Having the monitoring PI on a local network segment, I could now use it as
a jump box to ssh into the antennas and routers (using `ProxyJump`), making
management easier.

In the end I decided to have some data collection and graphing and, after some
consideration, I chose [influxdb][7] + [telegraf][8] + [grafana][9]. That also
gives me alerts (more on that later).

InfluxDB is the database backend, telegraf the _"agent collector"_ and grafana
the graphing tool.

I chose influxdb because it is really [easy to set up][11] on the PI. Check
that a retention policy is enabled so you do not fill up the little SD card on
the PI. It is also quite easy to [set up telegraf][12] and [grafana][13].

With that running I set up the InfluxDB data source on Grafana. I used the
database named _"telegraf"_, which was automatically created by the telegraf
process as soon as it started collecting data.

Then I configured telegraf to get snmp data from the "Access point" antennas
and also from the routers at my siblings'.

To do this I had to add a file to the configuration folder (something like
`/etc/telegraf/telegraf.d/snmp.conf`) with the snmp config parameters:

```
[[inputs.snmp]]
  agents = [ "192.168.1.2", "192.168.1.3", "192.168.1.6", "192.168.1.7" ]
  version = 1
  community = "mycommunity"
  interval = "60s"
  timeout = "10s"
  retries = 3

  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EXPRESSION-MIB::sysUpTimeInstance"

  # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true
```

The info that comes from this is basically network traffic for all interfaces
and uptime.

I also set up telegraf to ping the remote routers. That gives me info about
the health of the link, and I based some alerts on that.

The needed config was:

```
[[inputs.ping]]
  ## List of urls to ping
  urls = ["192.168.1.6", "192.168.1.7"]

  ## Number of pings to send per collection (ping -c <COUNT>)
  count = 3
  ## Per-ping timeout, in s. 0 == no timeout (ping -W <TIMEOUT>)
  timeout = 1.0
```

And finally, I wanted to have some info that the devices provide only through
some internal commands. For instance, the number of connected devices.

There are 2 commands that run on those devices that provide some internal
information (like signal strength, connected devices, and much more). They are
`mca-status` and `wstalist`.

It turns out telegraf can execute commands and store their output as metrics
data, no problem.
The configuration looks like this:

```
[[inputs.exec]]
  ## Commands array
  commands = [ "/usr/local/bin/get_connected_devices.sh router1" ]
  interval = "300s"

  name_override = "conn_devices"
  tag_keys = [ "hostname" ]
  timeout = "5s"
  data_format = "json"
```

The script is this:

```
#!/bin/sh
# Print the number of wireless clients connected to a device as JSON.

set -eu

# Device to query (first argument, defaults to router1)
device=${1:-router1}
# Run mca-status on the device over SSH and strip carriage returns
device_info=$(ssh "ubnt@$device" mca-status | tr -d "\r")
# Keep only the value of the wlanConnections line (key=value)
connected_devices=$(echo "$device_info" | grep wlanConnections | cut -d'=' -f 2)

printf '{"hostname": "%s", "devices": %d }' "$device" "$connected_devices"
```

It outputs some JSON that telegraf understands.

After this it was just a matter of setting up some grafana dashboards to see
what I wanted to see. I think there is enough information on the internet on
how to do that, so I won't be explaining it here.

As I mentioned, my brother was having some outages that I still cannot explain.
They are fixed by rebooting the "access point" part of the link (I'm pretty
sure they would go away by simply kicking out the client, but I could not be
bothered to look up how to do that programmatically ...).

So I thought of automating the reboot process as a mitigation for the
inconveniences it produces. I set up an alert on grafana for the ping metric
that, when it triggers, calls a webhook.

I did it that way because I wanted to be notified and also automatically take
action based on those alerts. The setup I came up with may seem a bit
complicated, but it works with simple tools and it has been in service for some
months now.

For the webhook, I found [this][14], which is meant to be a sort of gateway
from webhook to XMPP. It only accepts grafana calls but it can be adapted
pretty easily.
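
To give an idea of what such a gateway has to do with an incoming alert, here
is a minimal sketch in Python. It is not the code of the linked project, only
an illustration of the idea. The JSON field names (`state`, `ruleName`,
`message`), the helper `summarise_alert`, and the address and port are
assumptions; check what your Grafana version actually sends before relying on
them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarise_alert(body: bytes) -> str:
    """Turn a JSON alert payload into a one-line notification text."""
    alert = json.loads(body.decode("utf-8"))
    # Assumed field names; missing ones fall back to placeholders.
    state = alert.get("state", "unknown")
    rule = alert.get("ruleName", "unnamed rule")
    message = alert.get("message", "")
    text = f"[{state}] {rule}"
    if message:
        text += f": {message}"
    return text


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            text = summarise_alert(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not valid UTF-8 JSON, or not a JSON object
            self.send_response(400)
            self.end_headers()
            return
        print(text)  # a real gateway would send this over XMPP instead
        self.send_response(204)
        self.end_headers()


def run(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Serve forever on an illustrative address; not called automatically."""
    HTTPServer((host, port), AlertHandler).serve_forever()
```

Calling `run()` starts the listener. Keeping the parsing in a plain function
makes it easy to test without any network involved.
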

I made [some modifications][15] to not only send an xmpp message, but also to
write a flag file to a specified folder on disk if it gets an alert with a
specific string in it. Then, there's a cron job running that checks for those
flags and, if it finds any, executes the script of the same name and deletes
the flag on success. All pretty simple to do with shell script.

In the ping alert case, the shell script just connects to the "access point"
antenna and performs a `reboot(8)`.

With that done, outages do not last more than 5 minutes, and they are pretty
rare anyway. So I think it is a good solution until the day I take the time to
dig into it (if I ever do it ...).

I also created a custom handler with a super simple payload, so I could use it
from other scripts (not necessarily from this project) to just be notified via
xmpp.

## Conclusion

And that's the whole setup. Without using anything too complicated or
expensive I could connect those isolated flats, have some insight into what
happens on the network, have alerts on the most interesting metrics and even
automate responses if I need to.

I hope this may serve as a source of ideas for similar projects.

[1]: https://en.wikipedia.org/wiki/Point-to-point_(telecommunications)
[2]: https://en.wikipedia.org/wiki/Fresnel_zone
[3]: https://www.ui.com/airmax/powerbeam/
[4]: https://en.wikipedia.org/wiki/Power_over_Ethernet
[5]: https://www.konigelectronic.com/computer/networking/network-cable-reel-cat5e-futp-100-m-black-solid-55896639
[6]: https://www.zerotier.com/
[7]: https://www.influxdata.com/time-series-platform/
[8]: https://www.influxdata.com/time-series-platform/telegraf/
[9]: https://grafana.com/
[10]: https://zerotier.atlassian.net/wiki/spaces/SD/pages/8454145/Getting+Started+with+ZeroTier
[11]: https://docs.influxdata.com/influxdb/v1.7/introduction/installation/
[12]: https://docs.influxdata.com/telegraf/v1.11/introduction/installation/
[13]: https://grafana.com/docs/installation/debian/
[14]: https://github.com/opthomas-prime/xmpp-webhook/
[15]: https://git.e1e0.net/xmpp-webhook/log.html