sources for e1e0 website
git clone https://git.e1e0.net/e1e0.net.git
long-wireless-links-and-monitoring.md (24255B)

      1 Title: Long Wireless links and monitoring.
      2 Author: paco
      3 Date: 2019-07-31
      4 Type: article
      6 _Update 2021-05-29: This setup is quite outdated now.  One of the endpoints
      7 does not exist anymore.  Also, I replaced Zerotier with Wireguard, and the
      8 monitoring part changed quite a bit too.  All the research, materials, and
      9 build sections may still be useful, so I left it all here.  I may revisit this
     10 document in the future and update it._
     12 ## Intro
     14 Some time ago I built 2 [P-t-P][1] links between some family members' buildings.
     16 Thing is that my brother and my sister live in an area with no coverage from
     17 traditional ISPs, but that is quite close (5.5km in a straight line, with no
     18 obstacles) to my parents', who have good coverage (even FTTH) and plenty of
     19 providers to choose from.
     21 This project has grown _organically_ so to speak, and the requirements kept
     22 changing.
     24 That, and my lack of experience on the subject, make all this far from an
     25 optimal solution.
     27 In the end it has been working for almost 3 years now.  This is an attempt to
     28 document all the infrastructure and the bits and pieces used so I do not forget
     29 about them and maybe it can be of use to somebody else.
     31 ## First steps and research
     33 As I said, I knew nothing about this before tackling the project.  I have some
     34 solid knowledge about networking, but I knew little about long (for me)
     35 wireless links, antennas, propagation and a bunch of other stuff I never heard
     36 of.  So I had to do some research.
     38 If you want to do something like this, it is better to plan ahead.  See what the
     39 requirements are and start digging.
     41 Some things to take into consideration are:
     43 * Budget.  This is an important one in this scenario, as this is for personal
     44     use only.
     45 * Distance between the endpoints of the link.  Modern hardware (more on my
     46     choice later), can easily cover 10km or maybe more, but read the
     47     manufacturer's datasheet and look for output power, antenna gain and
     48     sensitivity.  And always take their numbers with a grain of salt, as they
     49     are usually tested under ideal conditions you won't encounter.  You'll find
     50     later a way to calculate the ideal numbers to have an estimate.
     51 * Obstacles.  There has to be a perfectly clear view between endpoints.  Wireless
     52     communications, especially WiFi either on 2.4GHz or 5GHz, are very
     53     sensitive to obstacles.  Even partial cover can have a big impact on link
     54     quality.  And a clear view does not mean _"I can see a single point in the
     55     distance"_: there's this thing called the [Fresnel zone][2], and if it is
     56     partially obstructed, some atmospheric conditions or spectrum saturation
     57     will give you a lot of trouble.
     58 * Materials.  Don't be cheap.  This will have to resist the outdoor conditions
     59     for as long as possible.
     60 * Neighbours and regulations.  There's the legal part (RF regulations in your
     61     country and things like that) and the _"social"_ part, in this case my
     62     family does not live in detached houses but in apartments, so that has to
     63     be taken into consideration if there are any rules about this.
     64 * Infrastructure.  And by that I mean everything necessary to be able to install
     65     the antennas, route the cables, install connectors, etc.  I'm not only
     66     talking about tools, but also access to the best spots to put the antennas,
     67     etc.
     68 * Antenna location.  As a rule of thumb, the higher the better.  But this
     69     depends a lot on your particular situation.  It deserves some thought.
     70 * Spectrum saturation.  WiFi is ubiquitous now.  That may be a challenge for
     71     any installation, especially in urban areas.  Ideally, you should check how
     72     _crowded_ the spectrum is, but this is usually pretty difficult for
     73     amateurs without special equipment.  Some antennas have a built-in spectrum
     74     analyser, but it may perform badly.
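To get a feel for how much room the Fresnel zone mentioned in the list above needs, here is a minimal sketch that computes the radius of the first Fresnel zone at the midpoint of a link, using the standard formula `r = sqrt(λ*d1*d2/(d1+d2))` with `d1 = d2`.  The function name and the 5.2km / 5685 MHz inputs are only illustrative assumptions; plug in your own distance and frequency.

```shell
#!/bin/sh
# First Fresnel zone radius (meters) at the midpoint of a link.
# $1 = link distance in meters, $2 = frequency in MHz.
# fresnel_mid_radius is a hypothetical helper name, not part of any tool.
fresnel_mid_radius() {
    awk -v d="$1" -v f="$2" 'BEGIN {
        c = 299792458                  # speed of light, m/s
        lambda = c / (f * 1000000)     # wavelength, m
        # r = sqrt(lambda * d1 * d2 / (d1 + d2)) with d1 = d2 = d / 2
        printf "%.2f\n", sqrt(lambda * d / 4)
    }'
}

# Example: a 5.2km link on 5685 MHz.
fresnel_mid_radius 5200 5685
```

For those inputs the result is a bit over 8 meters, which is the clearance around the direct line that should ideally stay free of obstacles halfway along the link.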
     76 ## Materials
     78 This is a list of the materials I chose and why I chose them.  It is short, as it
     79 is really an easy installation.
     81 ### Antennas
     83 I ended up using [Ubiquiti PowerBeams][3] to create the 2 links.  Four in
     84 total, 2 for each link.
     86 I was looking for some reputable manufacturer trying to avoid problems in the
     87 future.  Also, I wanted something as simple as possible.  This kind of antenna
     88 has the _"emitter/receiver"_ and the antenna all in the same device.  So no
     89 special connectors to be crimped, virtually no losses on cables, just an easy
     90 [PoE][4] setup from the house to the rooftop.
     92 Also, this antenna has an easy to set up web interface _and_ an SSH server that
     93 leaves you in a busybox with some proprietary commands that are pretty handy
     94 for automation and data collection.
     96 There are newer models now and other manufacturers.  Do your research, read on
     97 forums and all the usual stuff.  I can say those work for this setup with minor
     98 issues.
    100 If you know something about this subject you may be wondering why I did not use
    101 something with a wider angle on the _"access point"_ side and use just 3
    102 antennas instead of 4.  Truth is, I tried, but I had some problems with the 2nd
    103 link giving poor performance.  Not being an expert on this I can only guess
    104 that the partial obstruction on the LOS (line of sight) path for the second
    105 link was the cause of the poor performance, especially on bad weather days (WiFi
    106 is pretty sensitive to heavy rain) and episodes of spectrum saturation.
    108 Creating a separate link with a dedicated pair of antennas improved the
    109 situation a lot.
    111 ### Cables
    113 As the antennas only need a network connection, we only need Ethernet cable.
    114 Be sure that it is CAT5e or better.
    116 Always use cable rated for outdoor use.  Regular network cable will not last
    117 long exposed to rain and the sun's UV.  I went for [this one][5] because it was
    118 available at the time on Amazon.
    120 ### Connectors
    122 Don't go extra cheap on this, but anything with reasonable quality will do
    123 here.  The antennas are built in a way that the connectors are never exposed,
    124 so this part is not that critical.
    126 ### Antenna pole and other hardware
    128 I cannot say much about this.  What to buy here depends a lot on your
    129 particular setup.  Remember that the higher the better for the antennas, and
    130 remember wind is a thing ... you do not want it to fly away like a plastic bag.
    132 ## Build steps
    134 This is a list of the build steps I took.  I started by checking the list
    135 mentioned in the [First steps](#First steps and research) section,
    136 specifically the location of the antennas and the clear line of sight.
    138 I have to admit that I did a sloppy job on the second link, because I did not
    139 know about the [Fresnel zone][2] back then, but there are some things you can do
    140 to mitigate its effects.
    142 ### Calculate signal strength
    144 There's a simple way to calculate the signal strength you should see on the
    145 other side of the link (under ideal conditions).  This can be taken as a reference
    146 to see if the setup is viable and what conditions and speed negotiation you can
    147 expect between the 2 endpoints of the link.
    149 The simplified formula to calculate the signal is:
    151 ```
    152 emitterPower + emitterGain - signalLoss + receiverGain
    153 ```
    155 I say this is the simplified formula because it does not take into account
    156 losses on cables and connectors.  That's because I chose to use an _"all in one
    157 package"_ type of antenna, so those losses do not apply in this case.  This is a
    158 huge advantage for a beginner.  Also, I only take into account the free
    159 space loss and not any other kinds of loss, which would be a lot more difficult
    160 to calculate.  That was sufficient for me anyway, as the conditions of line of
    161 sight are pretty good.
    163 To calculate signal loss, this is the formula:
    165 ```
    166 loss = 20*log10((4*π*d)/λ)
    167 ```
    169 Here `d` is the distance between the 2 endpoints in meters and `λ` the
    170 wavelength, also in meters.  If you do not remember how to calculate the
    171 wavelength from the frequency, it is just:
    173 ```
    174 λ = C/f
    175 ```
    177 Here `C` is the speed of light in meters per second and `f` the frequency in
    178 Hertz.
    180 So, as an example, let's say I choose channel `137` which is `5685 MHz`, and
    181 the 2 endpoints are 5.2km apart.  That gives us a signal loss of `121.86 dB`.
    183 According to the antenna datasheet the transmission power is `5 dBm`, the gain
    184 of the antenna is `25 dBi` (that's on average I guess across the whole range of
    185 channels).  So putting all that together (`5 + 25 - 121.86 + 25`) I should get
    186 `-66.86 dBm` on the other end.  This works both ways in this case, so now we have
    187 to check sensitivity.  Again according to the datasheet, there's no problem in any
    188 modulation negotiation with this kind of signal strength (in theory, so to be on
    189 the safe side subtract at least `3 dB` from your results).
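The calculation above can be sketched with a few lines of shell and `awk`.  This is only an illustration under the same simplifications (free space loss only, no cable or connector losses); the function names are made up for this example, and the power and gain figures are the ones quoted from the datasheet above.

```shell
#!/bin/sh
# Free-space path loss in dB.  $1 = distance in meters, $2 = frequency in MHz.
fspl_db() {
    awk -v d="$1" -v f="$2" 'BEGIN {
        pi = atan2(0, -1)
        lambda = 299792458 / (f * 1000000)    # wavelength, m
        # awk log() is the natural logarithm, so divide by log(10)
        printf "%.2f\n", 20 * log(4 * pi * d / lambda) / log(10)
    }'
}

# Expected receive level in dBm.
# $1 = TX power (dBm), $2 = TX gain (dBi), $3 = path loss (dB), $4 = RX gain (dBi)
rx_level_dbm() {
    awk -v p="$1" -v gt="$2" -v l="$3" -v gr="$4" \
        'BEGIN { printf "%.2f\n", p + gt - l + gr }'
}

channel=137
freq_mhz=$((5000 + 5 * channel))    # 5 GHz channel numbering gives 5685 MHz
loss=$(fspl_db 5200 "$freq_mhz")
rx=$(rx_level_dbm 5 25 "$loss" 25)
echo "frequency: $freq_mhz MHz, path loss: $loss dB, expected signal: $rx dBm"
```

Compare the resulting signal level with the sensitivity table of your own hardware, leaving a few dB of margin.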
    191 ### Physical setup and alignment
    193 With the theory calculations out of the way, and knowing that the link was
    194 feasible, the fun part started: I had to get on the roof and install the antennas.
    196 Of course I won't be saying much about this, as this is different for every
    197 single installation.  Suffice to say, I had a _"pretty fun time"_ up on ladders
    198 and climbing to places not meant to be climbed ...
    200 Before securing the antenna to the pole in its final position it has to be
    201 aligned.  I did this the best I could given the lack of specialised equipment.
    203 On the datasheet there are radiation plots for the chosen model.  The principle
    204 is simple: those are 2D representations of the radiation lobes of the antenna,
    205 and the loss relative to the total gain.  So basically you want to point them
    206 at one another as perfectly as possible, especially for parabolic antennas,
    207 which have a very narrow beam.
    209 Those radiation plots confused me at first as, in the case of the PowerBeam, there
    210 are 4 of them: "Vertical Azimuth", "Vertical Elevation", "Horizontal Azimuth" and
    211 "Horizontal Elevation".  This did not make any sense to me in the beginning,
    212 as the azimuth is a horizontal angle and elevation is a vertical one.  It
    213 drove me nuts.  It turns out it refers to both polarisations of the signal that
    214 those devices create ... Once you understand that, it is easy: they are just the
    215 same measurement but times 2, one for each polarisation.
    217 Once I knew how much of an angle I had before starting to lose signal, and
    218 with a bit of the good old trigonometry, I knew my margin of error when
    219 pointing the antennas at each other.
    221 I did this standing behind the antenna and looking as if my line of sight was
    222 the beam.  With some fiddling, that should be enough for the horizontal
    223 alignment.  For the vertical one, it was easier, because the error margin is
    224 pretty big compared to the distance to the ground, even if you're on a tall
    225 building (again, trigonometry, that angle at 5km is some meters ...).  Anyway
    226 with the help of some online tool I could calculate that easily to make it as
    227 precise as possible (search for "antenna downtilt calculator" on your favourite
    228 search engine).
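The trigonometry mentioned above boils down to `width = 2 * d * tan(beamwidth / 2)`.  As a rough sketch, the snippet below computes how wide the beam is at the far end of the link.  The 7 degree beamwidth is a made-up placeholder, not a PowerBeam datasheet value, and the function name is hypothetical; read the real angle off the radiation plots of your model.

```shell
#!/bin/sh
# Width (meters) covered by a beam of a given full angle at a given distance.
# $1 = beamwidth in degrees, $2 = distance in meters.
beam_width_m() {
    awk -v bw="$1" -v d="$2" 'BEGIN {
        pi = atan2(0, -1)
        half = bw / 2 * pi / 180    # half angle in radians
        # awk has no tan(), so use sin/cos
        printf "%.0f\n", 2 * d * sin(half) / cos(half)
    }'
}

# Example: assumed 7 degree beam over a 5.2km link.
beam_width_m 7 5200
```

With those assumed numbers the beam is several hundred meters wide at the other end, which is why the vertical alignment is forgiving while small horizontal errors still cost signal.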
    230 ### Network diagram and configuration
    232 With the antennas installed, it was time for some configuration.
    234 This is a basic diagram of the network setup I came up with:
    236 ```
    238                                                                 +--------+
    239                                                                 | Bro.   |
    240                                                                 | Router |
    241                             +---------+         +----------+    +--------+
    242                             | Antenna |         | Antenna  |   /
    243                         ----| AP1     |+++++++++| ST1      |---
    244                     ---/    +---------+         +----------+
    245               +---------+
    246 +---------+  -|  ISP    |
    247 |Internet |-/ |  Router |
    248 +---------+   +---------+
    249                  |  --\     +---------+         +-----------+
    250                   \    --\  | Antenna |         |  Antenna  |
    251                    \      --| AP2     |+++++++++|  ST2      |-\
    252                    |        +---------+         +-----------+  -\
    253                     \                                           +---------+
    254              +------------+                                     | Sis.    |
    255              | Rpi        |                                     | Router  |
    256              | Monitoring |                                     +---------+
    257              +------------+
    259 ```
    261 All are cable connections but the `++++` ones, which are the 5km links.
    263 On the routers/APs at the end of the chain I used the same network segment for
    264 both, as they will be isolated and do NAT.  I did this because I have little
    265 control over the ISP router.  It is _"reset to defaults"_ from time to time and
    266 that caused me problems before.  So setting static routes would be a pain to
    267 maintain.  That produces double NAT at my siblings', but that's a small price
    268 to pay for having a stable setup.
    270 Yes, I know that's a shitty thing to do for an ISP (they break your dhcp
    271 reservations and port forwarding too ...), but most of the ISPs where I live
    272 are the biggest idiots and do the dumbest stuff you can imagine, so that's not
    273 even something for them.
    275 The PowerBeams are configurable via a web interface that is pretty intuitive.
    276 They can also be configured via an SSH access and editing a text file + some
    277 commands.
    279 Some things I did:
    281 * Enable WDS (transparent bridge mode), so I could see the MAC addresses of all
    282   the chain from my monitoring station.  That helps on debugging if something
    283   network-related goes wrong.
    284 * I enabled SNMP for monitoring, SSH server for access (with public keys) and
    285   NTP so the antennas have the right time (good for logs).
    286 * All 4 antennas were set up in bridge mode.
    287 * The ones connected to the ISP router were set up as "Access Point" and the
    288   other 2 as "Stations".
    289 * The antenna startup wizard asks you for country location.  That's because
    290   they apply the necessary regulation restrictions automatically.  Do not cheat
    291   here, you can have problems with your local authorities.  Besides, if you do
    292   not have good signal within the power output regulations chances are you're
    293   doing something wrong or the conditions of line of sight, etc. are not really
    294   good, so it won't matter and you'll be breaking the law for nothing (and
    295   probably causing problems to other antennas and installations).
    297 If you prefer the command line to configure the antennas, log into them via SSH
    298 and edit the file `/tmp/system.cfg`.  Then save to `NVRAM` with the command
    299 `cfgmtd -w`.  Then reset with `/usr/etc/rc.d/rc.softrestart force`.
    301 I do not recommend that method at the beginning, until you get familiar with
    302 all the options and configurations possible.  You can make a pretty big mess.
    304 As I said earlier, those antennas have a sort of spectrum analyser you can use
    305 to determine which channel is less busy.  It uses some Java applet (yes, I know
    306 ...) and it has been broken on 2 occasions by firmware updates.  But it
    307 can be of assistance if your spectrum is really busy.
    310 ### Performance tests
    312 There are 2 ways to easily test the throughput of the links.  The web interface
    313 has a "speed test" built in.  You have to enter the credentials of the other end
    314 and it can test TX, RX or both.
    316 The other way (that I like the most) is `iperf(1)`.  The antennas have
    317 a basic implementation of that tool installed, so log into the antenna on the
    318 other end, and use `iperf(1)` either as server or client to test both sides of the
    319 communication.
    321 Play a bit with the channel width.  A wider channel allows for faster
    322 transfer rates, but a narrower channel increases stability.
    324 I ended up using `20 MHz` for one of the links and `10 MHz` for the other.
    325 That last one is the one with a less than ideal LOS situation.  In the end
    326 reducing the channel width and choosing the least busy channel did the trick
    327 and I could get a stable link.
    329 In the end for the first link I get around `32Mbps` symmetrical.  The second
    330 link is a lot more variable depending on the conditions and the interference
    331 from other stations.  I get up to `17Mbps` symmetrical, and it is usually more
    332 than `12Mbps`, but in the worst case scenario it can get as low as `6Mbps`.  Which
    333 is still enough to watch online videos at `1080p` with today's compression and
    334 is more than enough to do any kind of browsing, email and whatever ...  so
    335 I guess it is enough.
    337 ### Monitoring and management
    339 For various reasons I wanted to monitor the whole thing.  My brother had some
    340 network outages and I did not know why (I'm pretty sure they are related to
    341 some firmware bug introduced on a recent update, but I have no proof).
    343 My idea for this was to put a Raspberry PI on my parents' network that I could
    344 connect to and install all the necessary software for monitoring.
    346 As I said earlier, I have little control over the ISP router.  Also, I did not
    347 want to set up a VPN at my house or something similar on a VPS ...  So I ended
    348 up using [Zerotier][6] to create a _"local network"_ between one of my hosts at
    349 my home office and the PI at my parents'.  This software creates an interface
    350 on the device with a private range, just like a VPN.  The main difference in
    351 this case is that the _server_ part is managed (you can host it yourself too)
    352 and it uses some clever tricks to find the best path between two endpoints so
    353 latency is always the least possible.  It falls back to relay servers if none
    354 of the direct strategies work.  Besides, it is quite easy to add or remove devices
    355 to/from a given virtual network.
    357 They have some [documentation][10] to make this process easy.
    359 Having the monitoring PI on a local network segment, I could now use it as
    360 a jump box to ssh into the antennas and routers (using `ProxyJump`), making
    361 management easier.
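As a small illustration of the jump box idea, the sketch below builds the `ssh` command line that reaches a device through the PI, using `-J`, which is the command line form of `ProxyJump` in OpenSSH.  The host names `monitor-pi` and `ap1` and the helper name are hypothetical placeholders; the function only prints the command so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical names: "monitor-pi" stands for the PI's address on the virtual
# network, "ap1" for one of the antennas on the remote LAN.
JUMP_HOST="pi@monitor-pi"

# Print the ssh command that reaches a device through the jump box.
# $1 = user@host of the target, remaining arguments = remote command.
via_jump() {
    target=$1
    shift
    echo ssh -J "$JUMP_HOST" "$target" "$@"
}

via_jump ubnt@ap1 mca-status
```

Dropping the `echo` runs the command for real.  The same effect can be made permanent with a `ProxyJump` line in `~/.ssh/config` for those hosts.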
    363 In the end I decided to have some data collection and graphing and, after some
    364 consideration, I chose [influxdb][7] + [telegraf][8] + [grafana][9].  That gives
    365 me also alerts (more on that later).
    367 InfluxDB for the database backend, telegraf as the _"agent collector"_ and
    368 grafana as the graphing tool.
    370 I chose influxdb because it is really [easy to set up][11] on the PI.  Check that
    371 a retention policy is enabled so you do not fill up the little SD card on the PI.
    372 It is also quite easy to [set up telegraf][12] and [grafana][13].
    374 With that running I set up the InfluxDB data source on Grafana.  I used the
    375 database named _"telegraf"_, which was automatically created by the telegraf
    376 process as soon as it started collecting data.
    378 Then I configured telegraf to get snmp data from the "Access point" antennas
    379 and also from the routers at my siblings'.
    381 To do this I had to add a file to the configuration folder
    382 (something like `/etc/telegraf/telegraf.d/snmp.conf`) with the snmp config
    383 parameters:
    385 ```
    386 [[inputs.snmp]]
    387   agents = [ "", "", "", "" ]
    388   version = 1
    389   community = "mycommunity"
    390   interval = "60s"
    391   timeout = "10s"
    392   retries = 3
    394   [[inputs.snmp.field]]
    395     name = "hostname"
    396     oid = "RFC1213-MIB::sysName.0"
    397     is_tag = true
    399   [[inputs.snmp.field]]
    400     name = "uptime"
    401     oid = "DISMAN-EXPRESSION-MIB::sysUpTimeInstance"
    403   # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
    404   [[inputs.snmp.table]]
    405     name = "interface"
    406     inherit_tags = [ "hostname" ]
    407     oid = "IF-MIB::ifTable"
    409     # Interface tag - used to identify interface in metrics database
    410     [[inputs.snmp.table.field]]
    411       name = "ifDescr"
    412       oid = "IF-MIB::ifDescr"
    413       is_tag = true
    414 ```
    416 The info that comes from this is basically network traffic for all interfaces
    417 and uptime.
    419 I also set up telegraf to collect pings to the remote routers.  That gives me
    420 info about the health of the link, and I based some alerts on that.
    422 The needed config was:
    424 ```
    425 [[inputs.ping]]
    426   ## List of urls to ping
    427   urls = ["", ""]
    429   ## Number of pings to send per collection (ping -c <COUNT>)
    430   count = 3
    431   ## Per-ping timeout, in s. 0 == no timeout (ping -W <TIMEOUT>)
    432   timeout = 1.0
    433 ```
    435 And finally, I wanted to have some info the devices provide, but only through
    436 some internal commands.  For instance, the number of connected devices.
    438 There are 2 commands that run on those devices that provide some internal
    439 information (like signal strength, connected devices, and much more).  They are
    440 `mca-status` and `wstalist`.
    442 It turns out telegraf can execute commands and store that as metrics data, no
    443 problem.  The configuration looks like this:
    445 ```
    446 [[inputs.exec]]
    447   ## Commands array
    448   commands = [ "/usr/local/bin/get_connected_devices.sh router1" ]
    449   interval = "300s"
    451   name_override = "conn_devices"
    452   tag_keys = [ "hostname" ]
    453   timeout = "5s"
    454   data_format = "json"
    455 ```
    457 The script is this:
    459 ```
    460 #!/bin/sh
    462 set -eu
    464 device=${1:-router1}
    465 device_info=$(ssh "ubnt@$device" mca-status | tr -d "\r")
    466 connected_devices=$(echo "$device_info" |grep wlanConnections| cut -d'=' -f 2)
    468 printf '{"hostname": "%s", "devices": %d }' "$device" "$connected_devices"
    469 ```
    471 It outputs some JSON that telegraf understands.
    473 After this it was just a matter of setting up some grafana dashboards to see
    474 what I wanted to see.  I think there is enough information on the internet on
    475 how to do that, so I won't be explaining it here.
    477 As I mentioned my brother was having some outages that I still cannot explain.
    478 They are fixed by rebooting the "access point" part of the link (I'm pretty sure
    479 they would go away by simply kicking out the client, but I could not be bothered
    480 to look up how to do that programmatically ...).
    482 So I thought about automating the reboot process as a mitigation for the
    483 inconvenience it produces.  I set up an alert on grafana for the ping metric
    484 that, when it triggers, calls a webhook.
    486 I did it that way because I wanted to be notified and also automatically take
    487 action based on those alerts.  The setup I came up with may seem a bit
    488 complicated, but it works with simple tools and it has been in service for some
    489 months now.
    491 For the webhook, I found [this][14], which is meant to be a sort of gateway
    492 from webhook to XMPP.  It only accepts grafana calls but it can be adapted
    493 pretty easily.
    495 I did [some modifications][15] to not only send an xmpp message, but also to write
    496 a flag file on disk in a specified folder if it gets an alert with a specific
    497 string in it.  Then, there's a cron job running that checks for those flags
    498 and, if it finds any, executes the script of the same name and deletes the flag
    499 on success.  All pretty simple to do with a shell script.
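The flag handling described above could look roughly like the following.  This is a minimal sketch and not the actual script from this setup: the function name, the directory layout (one folder with flag files, one with executable handlers of the same name) and the way it is called from cron are all assumptions.

```shell
#!/bin/sh
# Run the handler matching each flag file and clear the flag on success.
# $1 = directory containing flag files (assumed layout)
# $2 = directory containing executable handler scripts of the same name
process_flags() {
    flag_dir=$1
    script_dir=$2
    for flag in "$flag_dir"/*; do
        # Skips non-files and the unexpanded glob when the directory is empty
        [ -f "$flag" ] || continue
        name=$(basename "$flag")
        handler="$script_dir/$name"
        # Only remove the flag when the handler exists and exits with 0,
        # so a failed run is retried on the next cron pass
        if [ -x "$handler" ] && "$handler"; then
            rm -f "$flag"
        fi
    done
}

# When called with two arguments (for instance from cron), process them.
if [ "$#" -eq 2 ]; then
    process_flags "$1" "$2"
fi
```

Leaving the flag in place on failure means a handler that could not reach the antenna simply gets another chance a few minutes later.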
    501 In the ping alert case, the shell script just connects to the "access point"
    502 antenna and performs a `reboot(8)`.
    504 With that done, outages do not last more than 5 minutes, and they are pretty
    505 rare anyway.  So I think it is a good solution until the day I take the time to
    506 dig into it (if I ever do it ...).
    508 I also created a custom handler with a super simple payload, so I could use it
    509 from other scripts (not necessarily from this project) to just be notified via
    510 xmpp.
    512 ## Conclusion
    514 And that's the whole setup.  Without using anything too complicated or
    515 expensive I could connect those isolated flats, have some insight into what
    516 happens on the network, have alerts on the most interesting metrics and even
    517 automate responses if I need to.
    519 I hope this may serve as a source of ideas for similar projects.
    521 [1]: https://en.wikipedia.org/wiki/Point-to-point_(telecommunications)
    522 [2]: https://en.wikipedia.org/wiki/Fresnel_zone
    523 [3]: https://www.ui.com/airmax/powerbeam/
    524 [4]: https://en.wikipedia.org/wiki/Power_over_Ethernet
    525 [5]: https://www.konigelectronic.com/computer/networking/network-cable-reel-cat5e-futp-100-m-black-solid-55896639
    526 [6]: https://www.zerotier.com/
    527 [7]: https://www.influxdata.com/time-series-platform/
    528 [8]: https://www.influxdata.com/time-series-platform/telegraf/
    529 [9]: https://grafana.com/
    530 [10]: https://zerotier.atlassian.net/wiki/spaces/SD/pages/8454145/Getting+Started+with+ZeroTier
    531 [11]: https://docs.influxdata.com/influxdb/v1.7/introduction/installation/
    532 [12]: https://docs.influxdata.com/telegraf/v1.11/introduction/installation/
    533 [13]: https://grafana.com/docs/installation/debian/
    534 [14]: https://github.com/opthomas-prime/xmpp-webhook/
    535 [15]: https://git.e1e0.net/xmpp-webhook/log.html