e1e0.net

Title: Long Wireless links and monitoring.
Author: paco
Date: 2019-07-31
Type: article

_Update 2021-05-29: This setup is quite outdated now.  One of the endpoints
does not exist anymore.  Also, I replaced ZeroTier with WireGuard, and the
monitoring part changed quite a bit too.  The research, materials, and build
sections may still be useful, so I left them all here.  I may revisit this
document in the future and update it._

## Intro

Some time ago I built 2 [P-t-P][1] links between some family members' buildings.

The thing is, my brother and my sister live in an area with no coverage from
traditional ISPs.  But that area is quite close to my parents' place (5.5km in a
straight line, with no obstacles), which has good coverage (even FTTH) and
plenty of providers to choose from.

This project has grown _organically_, so to speak, and the requirements kept
changing.

That, and my lack of experience on the subject, make this far from an optimal
solution.

Still, it has been working for almost 3 years now.  This is an attempt to
document all the infrastructure and the bits and pieces used, so I do not
forget about them, and maybe it can be of use to somebody else.

## First steps and research

As I said, I knew nothing about this before tackling the project.  I have some
solid knowledge about networking, but I knew little about long (for me)
wireless links, antennas, propagation and a bunch of other stuff I had never
heard of.  So I had to do some research.

If you want to do something like this, it is better to plan ahead.  Work out
what the requirements are and start digging.

Some things to take into consideration are:

* Budget.  This is an important one in this scenario, as this is for personal
    use only.
* Distance between the endpoints of the link.  Modern hardware (more on my
    choice later) can easily cover 10km or more.  But read the manufacturer's
    datasheet and look for output power, antenna gain and receiver
    sensitivity.  And always take those numbers with a grain of salt, as they
    are usually measured in ideal conditions you won't encounter.  Later on
    you'll find a way to calculate the ideal numbers to get an estimate.
* Obstacles.  There has to be a perfectly clear line of sight between the
    endpoints.  Wireless communications, especially WiFi on either 2.4GHz or
    5GHz, are very sensitive to obstacles.  Even a partial obstruction can have
    a big impact on link quality.  And a clear line of sight does not mean _"I
    can see a single point in the distance"_.  There's this thing called the
    [Fresnel zone][2], and if it is partially blocked it will give you a lot of
    trouble under some atmospheric conditions or when the spectrum is
    saturated.
* Materials.  Don't be cheap.  This will have to withstand the outdoor
    conditions for as long as possible.
* Neighbours and regulations.  There's the legal part (RF regulations in your
    country and things like that) and the _"social"_ part.  In this case my
    family does not live in detached houses but in apartments.  Any rules the
    building may have about this have to be taken into consideration.
* Infrastructure.  And by that I mean everything needed to install the
    antennas, route the cables, install connectors, etc.  I'm not only talking
    about tools, but also about access to the best spots to put the antennas.
* Antenna location.  As a rule of thumb, the higher the better.  But this
    depends a lot on your particular situation.  It deserves some thought.
* Spectrum saturation.  WiFi is ubiquitous now.  That may be a challenge for
    any installation, especially in urban areas.  Ideally, you should check how
    _crowded_ the spectrum is, but this is usually pretty difficult for
    amateurs without special equipment.  Some antennas have a built-in
    spectrum analyser, but it may perform badly.
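
To put a number on the Fresnel zone, use this formula for the radius of the first Fresnel zone at a point on the path:

`sqrt(λ*d1*d2/(d1+d2))`

Here `d1` and `d2` are the distances from that point to each endpoint.  The zone is widest halfway along the link.  The function below is a minimal shell sketch of that formula, using `awk` for the floating point maths.  The function name is made up for the example, and the figures are the ones from the worked example further down (channel `137`, `5685 MHz`, `5.2km`).

```
#!/bin/sh
# Radius of the first Fresnel zone, in metres, at a point on the link.
# Usage: fresnel_radius_m FREQ_MHZ D1_M D2_M
#   D1_M and D2_M are the distances from that point to each endpoint.
# r = sqrt(lambda * d1 * d2 / (d1 + d2)), with lambda = c / f
fresnel_radius_m() {
    awk -v f="$1" -v d1="$2" -v d2="$3" 'BEGIN {
        c = 299792458                    # speed of light, m/s
        lambda = c / (f * 1000000)       # wavelength, m
        printf "%.2f\n", sqrt(lambda * d1 * d2 / (d1 + d2))
    }'
}
```

`fresnel_radius_m 5685 2600 2600` prints `8.28`, so at the midpoint of such a link the zone is more than 8 meters in radius.  A commonly quoted rule of thumb is to keep at least 60% of it free of obstacles.  That is one more reason antenna height matters so much.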

## Materials

This is a list of the materials I chose and why I chose them.  It is short, as
it really is an easy installation.

### Antennas

I ended up using [Ubiquiti PowerBeams][3] to create the 2 links.  Four in
total, 2 for each link.

I was looking for a reputable manufacturer, trying to avoid problems in the
future.  Also, I wanted something as simple as possible.  This kind of antenna
has the _"emitter/receiver"_ and the antenna all in the same device.  So there
are no special connectors to be crimped and virtually no cable losses, just an
easy [PoE][4] setup from the house to the rooftop.

Also, this antenna has an easy to use web interface _and_ an SSH server.  The
SSH server drops you into a BusyBox shell with some proprietary commands that
are pretty handy for automation and data collection.

There are newer models and other manufacturers now.  Do your research, read
forums and all the usual stuff.  I can say these work for this setup with
minor issues.

If you know something about this subject, you may be wondering why I did not
use something with a wider angle on the _"access point"_ side, and use just 3
antennas instead of 4.  Truth is, I tried, but the 2nd link gave poor
performance.  Not being an expert on this, I can only guess at the cause.  My
guess is the partial obstruction of the LOS (line of sight) path for the second
link.  It was worst on bad weather days (WiFi is pretty sensitive to heavy rain)
and during episodes of spectrum saturation.

Creating a separate link with a dedicated pair of antennas improved the
situation a lot.

### Cables

As the antennas only need a network connection, the only cable needed is
Ethernet.  Make sure it is CAT5e or better.

Always use cable rated for outdoor use.  Regular network cable will not last
long exposed to rain and the sun's UV.  I went for [this one][5] because it was
available on Amazon at the time.

### Connectors

Don't go extra cheap on this, but anything of reasonable quality will do
here.  The antennas are built in a way that the connectors are never exposed,
so this part is not that critical.

### Antenna pole and other hardware

I cannot say much about this.  What to buy here depends a lot on your
particular setup.  Remember that the higher the better for the antennas.  And
remember that wind is a thing ... you do not want it to fly away like a plastic
bag.

## Build steps

This is a list of the build steps I took.  I started by going through the
checklist in the [First steps](#first-steps-and-research) section, specifically
the location of the antennas and the clear line of sight.

I have to admit that I did a sloppy job on the second link, because I did not
know about the [Fresnel zone][2] back then.  But there are some things you can
do to mitigate the effects of a partially obstructed path.

### Calculate signal strength

There's a simple way to calculate the signal strength you should see on the
other side of the link (under ideal conditions).  This can be taken as a
reference to see if the setup is viable.  It also tells you what modulation and
speed the 2 endpoints of the link can be expected to negotiate.

The simplified formula to calculate the received signal is:

```
receivedSignal = emitterPower + emitterGain - signalLoss + receiverGain
```

I call this the simplified formula for two reasons.  First, it does not take
into account losses in cables and connectors.  I chose an _"all in one
package"_ type of antenna, so those do not apply in this case, which is a huge
advantage for a beginner.  Second, it only takes into account the free space
loss and not any other kinds of loss, which would be a lot more difficult to
calculate.  That was good enough for me anyway, as the line of sight conditions
are pretty good.

To calculate the signal loss (free space path loss, in dB), this is the formula:

```
loss = 20*log10((4*π*d)/λ)
```

Where `d` is the distance between the 2 endpoints in meters and `λ` is the
wavelength, also in meters.  If you do not remember how to calculate the
wavelength from the frequency, it is just:

```
λ = C/f
```

Where `C` is the speed of light in meters per second and `f` is the frequency
in Hertz.

So, as an example, let's say I choose channel `137`, which is `5685 MHz`, and
the 2 endpoints are 5.2km apart.  That gives us a signal loss of `121.86 dB`.

According to the antenna datasheet, the transmission power is `5 dBm` and the
gain of the antenna is `25 dBi` (I guess that's an average across the whole
range of channels).  Putting all that together (`5 + 25 - 121.86 + 25`), I
should get `-66.86 dBm` on the other end.  This works both ways in this case,
so now we have to check the receiver sensitivity.  Again according to the
datasheet, any modulation can be negotiated with this kind of signal strength.
That is in theory, so to be on the safe side subtract at least `3 dB` from your
results.
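
The arithmetic above is easy to script, which is handy for trying other channels or distances.  Below is a minimal sketch in shell, with `awk` doing the floating point maths.  Since `awk` only has the natural logarithm, the code divides by `log(10)`.  It also assumes the usual 5 GHz WiFi channel numbering, where the centre frequency in MHz is `5000 + 5 * channel`.  The function names are made up for this example.

```
#!/bin/sh
# Simplified link budget: free space loss only, no cable/connector losses.

# Centre frequency in MHz of a 5 GHz WiFi channel: 5000 + 5 * channel.
channel_to_mhz() {
    echo $((5000 + 5 * $1))
}

# Free space path loss in dB: 20 * log10(4 * pi * d / lambda).
# Usage: fspl_db DISTANCE_M FREQ_MHZ
fspl_db() {
    awk -v d="$1" -v f="$2" 'BEGIN {
        pi = atan2(0, -1)
        lambda = 299792458 / (f * 1000000)   # wavelength, m (C / f)
        # awk only has the natural log: log10(x) = log(x) / log(10)
        printf "%.2f\n", 20 * log(4 * pi * d / lambda) / log(10)
    }'
}

# emitterPower + emitterGain - signalLoss + receiverGain, in dBm.
# Usage: rx_dbm TX_DBM TX_GAIN_DBI LOSS_DB RX_GAIN_DBI
rx_dbm() {
    awk -v p="$1" -v gt="$2" -v loss="$3" -v gr="$4" \
        'BEGIN { printf "%.2f\n", p + gt - loss + gr }'
}
```

With the example figures, `fspl_db 5200 "$(channel_to_mhz 137)"` prints `121.86`, and `rx_dbm 5 25 121.86 25` prints `-66.86`.  Subtract the 3 dB safety margin from that before comparing it with the sensitivity table in the datasheet.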

### Physical setup and alignment

With the theoretical calculations out of the way, I knew the setup was
possible.  Then the fun part started: I had to get on the roof and install the
antennas.

Of course I won't be saying much about this, as it is different for every
single installation.  Suffice to say, I had a _"pretty fun time"_ up on ladders
and climbing to places not meant to be climbed ...

Before securing the antenna to the pole in its final position, it has to be
aligned.  I did this the best I could given the lack of specialised equipment.

The datasheet has radiation plots for the chosen model.  The principle is
simple: those are 2D representations of the radiation lobes of the antenna.
They show the loss relative to the maximum gain at each angle.  So basically you
want to point the antennas at one another as precisely as possible.  This is
especially true of parabolic antennas, which have a very narrow beam.

Those radiation plots confused me at first.  In the case of the PowerBeam there
are 4 of them: "Vertical Azimuth", "Vertical Elevation", "Horizontal Azimuth"
and "Horizontal Elevation".  This did not make any sense to me in the
beginning, as the azimuth is a horizontal angle and elevation is a vertical
one.  It drove me nuts.  It turns out "Vertical" and "Horizontal" refer to the
2 polarisations of the signal those devices create ... Once you understand
that, it is easy.  They are just the same measurements twice, one set for each
polarisation.

I worked out how much of an angle I had before starting to lose signal.  With
that and a bit of good old trigonometry, I knew my margin of error when
pointing the antennas at each other.

I did this standing behind the antenna and looking as if my line of sight was
the beam.  With some fiddling, that should be enough for the horizontal
alignment.  The vertical one was easier, because the error margin is pretty big
compared to the distance to the ground, even if you're on a tall building
(again, trigonometry: that angle at 5km is several meters ...).  Anyway, with
the help of an online tool I could calculate the tilt easily to make it as
precise as possible (search for "antenna downtilt calculator" on your favourite
search engine).
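
The trigonometry involved is simple enough to script.  Here is a minimal sketch, again in shell with `awk`.  `beam_offset_m` tells you how far sideways a given pointing error lands at the far end.  `downtilt_deg` gives the tilt from the height difference between the antennas.  Both names are illustrative.  The heights in the usage line are hypothetical.  Both functions assume a flat earth, which at this distance only changes the tilt by a couple of hundredths of a degree.

```
#!/bin/sh
# Alignment helpers (sketch).  Angles in degrees, distances in metres,
# flat-earth approximation.

# How far sideways the beam lands at distance D for a pointing error of A
# degrees: D * tan(A).  Usage: beam_offset_m D_M A_DEG
beam_offset_m() {
    awk -v d="$1" -v a="$2" 'BEGIN {
        r = a * atan2(0, -1) / 180              # degrees -> radians
        printf "%.2f\n", d * sin(r) / cos(r)    # awk has no tan()
    }'
}

# Tilt needed to point from an antenna at height H1 to one at height H2
# (same reference level), D metres apart.  Positive means tilt down.
# Usage: downtilt_deg H1_M H2_M D_M
downtilt_deg() {
    awk -v h1="$1" -v h2="$2" -v d="$3" 'BEGIN {
        printf "%.2f\n", atan2(h1 - h2, d) * 180 / atan2(0, -1)
    }'
}
```

For example, `beam_offset_m 5200 1` prints `90.77`, so a 1 degree error at 5.2km misses by about 90 meters.  `downtilt_deg 30 20 5200` (an antenna 30 meters up pointing at one 20 meters up) prints `0.11` degrees.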

### Network diagram and configuration

With the antennas installed, it was time for some configuration.

This is a basic diagram of the network setup I came up with:

```
                                                               192.168.1.6/24
                                                                +--------+
                                                                | Bro.   |
                            192.168.1.2/24      192.168.1.4/24  | Router |
                            +---------+         +----------+    +--------+
                            | Antenna |         | Antenna  |   / 192.168.10.1/24
                        ----| AP1     |+++++++++| ST1      |---
     192.168.1.1/24 ---/    +---------+         +----------+
              +---------+
+---------+  -|  ISP    |
|Internet |-/ |  Router |
+---------+   +---------+
                 |  --\     +---------+         +-----------+
                  \    --\  | Antenna |         |  Antenna  |
                   \      --| AP2     |+++++++++|  ST2      |-\
                   |        +---------+         +-----------+  -\ 192.168.1.7/24
                    \        192.168.1.3/24      192.168.1.5/24 +---------+
             +------------+                                     | Sis.    |
             | Rpi        |                                     | Router  |
             | Monitoring |                                     +---------+
             +------------+                                    192.168.10.1/24
             192.168.1.10/24
```

All connections are wired except the `++++` ones, which are the 5km links.

On the routers/APs at the end of the chain I used the same network segment for
both, as they will be isolated and do NAT.  I did this because I have little
control over the ISP router.  It is _"reset to defaults"_ from time to time and
that caused me problems before, so static routes would be a pain to maintain.
That produces double NAT at my siblings', but that's a small price to pay for
having a stable setup.

Yes, I know that's a shitty thing for an ISP to do (it breaks your DHCP
reservations and port forwarding too ...).  But most of the ISPs where I live
are the biggest idiots and do the dumbest stuff you can imagine, so that's
nothing unusual for them.

The PowerBeams are configurable via a web interface that is pretty intuitive.
They can also be configured over SSH by editing a text file and running some
commands.

Some things I did:

* Enable WDS (transparent bridge mode), so I could see the MAC addresses of the
  whole chain from my monitoring station.  That helps with debugging if
  something in the network goes wrong.
* I enabled SNMP for monitoring, the SSH server for access (with public keys)
  and NTP so the antennas have the right time (good for logs).
* All 4 antennas were set up in bridge mode.
* The ones connected to the ISP router were set up as "Access Point" and the
  other 2 as "Station".
* The antenna startup wizard asks you for your country.  That's because it
  applies the necessary regulatory restrictions automatically.  Do not cheat
  here, or you can have problems with your local authorities.  Besides, you may
  not get a good signal within the regulatory power output limits.  If so,
  chances are you're doing something wrong, or the line of sight conditions are
  not really good.  Extra power won't fix that, and you'll be breaking the law
  for nothing (and probably causing problems to other antennas and
  installations).

If you prefer the command line to configure the antennas, log into them via SSH
and edit the file `/tmp/system.cfg`.  Then save it to `NVRAM` with the command
`cfgmtd -w`.  Then restart with `/usr/etc/rc.d/rc.softrestart force`.

I do not recommend that method at the beginning, until you get familiar with
all the possible options and configurations.  You can make a pretty big mess.
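
If you do go the command line route, keep a copy of the working configuration on your workstation.  That makes it easier to roll back.  Here is a minimal sketch of that.  It assumes the `ubnt` SSH user that the monitoring script later in this article uses.  It also assumes the antenna's shell handles a plain `cat >` redirection.  The function names are hypothetical.  Only the file path and the two commands come from the steps above.

```
#!/bin/sh
# Hypothetical helpers around the manual steps above (sketch).

# Save the running config of antenna HOST to LOCAL_FILE.
# Usage: pull_cfg HOST LOCAL_FILE
pull_cfg() {
    ssh "ubnt@$1" cat /tmp/system.cfg > "$2"
}

# Upload LOCAL_FILE as /tmp/system.cfg, save it to NVRAM and restart.
# Usage: push_cfg HOST LOCAL_FILE
push_cfg() {
    ssh "ubnt@$1" 'cat > /tmp/system.cfg' < "$2" &&
        ssh "ubnt@$1" 'cfgmtd -w && /usr/etc/rc.d/rc.softrestart force'
}
```

A session could then look like this:

1. `pull_cfg 192.168.1.2 ap1.cfg`
2. Copy `ap1.cfg` aside.
3. Edit `ap1.cfg`.
4. `diff` it against the copy.
5. Only then run `push_cfg 192.168.1.2 ap1.cfg`.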

As I said earlier, those antennas have a sort of spectrum analyser you can use
to determine which channel is less busy.  It uses a Java applet (yes, I know
...) and it has been broken on 2 occasions by firmware updates.  But it can be
of assistance if your spectrum is really busy.

### Performance tests

There are 2 easy ways to test the throughput of the links.  The web interface
has a built-in "speed test".  You have to enter the credentials of the other
end and it can test TX, RX or both.

The other way (the one I like the most) is `iperf(1)`.  The antennas come with
a basic implementation of that tool.  Log into the antenna on the other end and
run `iperf(1)` either as server or client to test both directions of the
communication.

Play a bit with the channel width.  A wider channel allows for faster transfer
rates, but a narrower channel increases stability.

I ended up using `20 MHz` for one of the links and `10 MHz` for the other.
That last one is the one with the less than ideal LOS situation.  In the end,
reducing the channel width and choosing the least busy channel did the trick
and I could get a stable link.

All in all, for the first link I get around `32Mbps` symmetrical.  The second
link is a lot more variable depending on the conditions and the interference
from other stations.  I get up to `17Mbps` symmetrical, and it is usually more
than `12Mbps`, but in the worst case it can get as low as `6Mbps`.  That is
still enough to watch online videos at `1080p` with today's compression.  It is
more than enough for any kind of browsing, email and whatever ...  so I guess it
is enough.

### Monitoring and management

For various reasons I wanted to monitor the whole thing.  My brother had some
network outages and I did not know why (I'm pretty sure they are related to
some firmware bug introduced in a recent update, but I have no proof).

My idea was to put a Raspberry Pi on my parents' network that I could connect
to, and install all the necessary monitoring software on it.

As I said earlier, I have little control over the ISP router.  Also, I did not
want to set up a VPN at my house or something similar on a VPS ...  So I ended
up using [ZeroTier][6] to create a _"local network"_ between one of my hosts at
my home office and the Pi at my parents'.  This software creates an interface
on the device with a private address range, just like a VPN.  The main
difference in this case is that the _server_ part is managed (you can host it
yourself too).  It also uses some clever tricks to find the best path between
two endpoints, so latency is always as low as possible.  It falls back to relay
servers if none of the direct strategies work.  Besides, it is quite easy to
add or remove devices to/from a given virtual network.

They have some [documentation][10] to make this process easy.

With the monitoring Pi on the local network segment, I could now use it as a
jump box to SSH into the antennas and routers (using `ProxyJump`), making
management easier.
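
In `~/.ssh/config` on the workstation, that could look like the fragment below.  The `monpi`/`ap1` aliases, the `pi` user and the ZeroTier address of the Pi are hypothetical placeholders.  Use whatever address ZeroTier assigned to your Pi.  The antenna address comes from the diagram above, and the `ubnt` user is the one used elsewhere in this article.  `ProxyJump` needs OpenSSH 7.3 or newer.

```
Host monpi
    HostName 10.147.17.10
    User pi

Host ap1
    HostName 192.168.1.2
    User ubnt
    ProxyJump monpi
```

With that in place, `ssh ap1` goes through the Pi transparently.  One `Host` block per antenna or router keeps the rest of the management commands short.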

In the end I decided to have some data collection and graphing.  After some
consideration, I chose [influxdb][7] + [telegraf][8] + [grafana][9].  That also
gives me alerts (more on that later).

InfluxDB is the database backend, telegraf the _"agent collector"_ and grafana
the graphing tool.

I chose influxdb because it is really [easy to set up][11] on the Pi.  Check
that retention is enabled so you do not fill up the little SD card on the Pi.
It is also quite easy to [set up telegraf][12] and [grafana][13].

With that running, I set up the InfluxDB data source on Grafana.  I used the
database named _"telegraf"_, which was automatically created by the telegraf
process as soon as it started collecting data.
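
On InfluxDB 1.x (the version the installation link above points to), retention enforcement is controlled by `enabled` in the `[retention]` section of `influxdb.conf`.  It is on by default.  However, the default `autogen` retention policy keeps data forever, so the database keeps growing unless the policy gets a finite duration.  A sketch, assuming that 30 days of history is enough (pick whatever fits your SD card):

```
influx -execute 'ALTER RETENTION POLICY "autogen" ON "telegraf" DURATION 30d'
influx -execute 'SHOW RETENTION POLICIES ON "telegraf"'
```

Altering `autogen` rather than creating a new policy means telegraf keeps writing to the same place, and the existing data is trimmed too.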

Then I configured telegraf to get SNMP data from the "Access point" antennas
and also from the routers at my siblings'.

To do this I had to add a file to the configuration folder (something like
`/etc/telegraf/telegraf.d/snmp.conf`) with the SNMP configuration parameters:

```
[[inputs.snmp]]
  agents = [ "192.168.1.2", "192.168.1.3", "192.168.1.6", "192.168.1.7" ]
  version = 1
  community = "mycommunity"
  interval = "60s"
  timeout = "10s"
  retries = 3

  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EXPRESSION-MIB::sysUpTimeInstance"

  # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true
```

The info that comes from this is basically network traffic for all interfaces,
plus uptime.

I also set up telegraf to collect pings to the remote routers.  That gives me
info about the health of the link, and I based some alerts on that.

The needed config was:

```
[[inputs.ping]]
  ## List of urls to ping
  urls = ["192.168.1.6", "192.168.1.7"]

  ## Number of pings to send per collection (ping -c <COUNT>)
  count = 3
  ## Per-ping timeout, in s. 0 == no timeout (ping -W <TIMEOUT>)
  timeout = 1.0
```

And finally, I wanted some info the devices only provide through internal
commands, for instance the number of connected devices.

There are 2 commands on those devices that provide internal information (like
signal strength, connected devices, and much more): `mca-status` and
`wstalist`.

It turns out telegraf can execute commands and store their output as metrics
data, no problem.  The configuration looks like this:

```
[[inputs.exec]]
  ## Commands array
  commands = [ "/usr/local/bin/get_connected_devices.sh router1" ]
  interval = "300s"

  name_override = "conn_devices"
  tag_keys = [ "hostname" ]
  timeout = "5s"
  data_format = "json"
```

The script is this:

```
#!/bin/sh
# Print the number of WiFi clients connected to an airOS device as JSON,
# for telegraf's exec input.  Usage: get_connected_devices.sh [host]

set -eu

device=${1:-router1}
# BatchMode makes ssh fail instead of prompting when run non-interactively.
device_info=$(ssh -o BatchMode=yes "ubnt@$device" mca-status | tr -d "\r")
connected_devices=$(echo "$device_info" | grep wlanConnections | cut -d'=' -f 2)

# Fail loudly (telegraf logs the error) rather than report a bogus value.
if [ -z "$connected_devices" ]; then
    echo "wlanConnections not found in mca-status output of $device" >&2
    exit 1
fi

printf '{"hostname": "%s", "devices": %d}\n' "$device" "$connected_devices"
```

It outputs a small JSON object that telegraf understands.  Because of
`tag_keys`, the `hostname` field becomes a tag and `devices` is stored as a
numeric field.

After this it was just a matter of setting up some grafana dashboards to see
what I wanted to see.  I think there is enough information on the internet on
how to do that, so I won't explain it here.

As I mentioned, my brother was having some outages that I still cannot explain.
Rebooting the "access point" side of the link fixes them.  I'm pretty sure they
would go away simply by kicking out the client, but I could not be bothered to
look up how to do that programmatically ...

So I thought about automating the reboot process to mitigate the inconvenience.
I set up an alert on grafana for the ping metric that, when it triggers, calls
a webhook.

I did it that way because I wanted to be notified and also to automatically take
action based on those alerts.  The setup I came up with may seem a bit
complicated, but it works with simple tools and it has been in service for some
months now.

For the webhook, I found [this][14], which is meant to be a sort of gateway
from webhooks to XMPP.  It only accepts grafana calls, but it can be adapted
pretty easily.

I made [some modifications][15] so that it does not only send an XMPP message.
If it gets an alert with a specific string in it, it also writes a flag file to
a specified folder on disk.  Then there's a cron job that checks for those
flags.  If it finds any, it executes the script of the same name and deletes
the flag on success.  All pretty simple to do with shell scripts.
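
The cron side could look something like the sketch below.  The directory layout and paths are assumptions for illustration.  The modified webhook decides where the flags really land.  Every file in the flag directory names an executable script in the action directory.  The flag is removed only if that script succeeds, so a failed action is retried on the next run.

```
#!/bin/sh
# Run the actions requested through flag files (sketch, hypothetical layout):
#   FLAG_DIR/<name>   flag written by the webhook
#   ACTION_DIR/<name> executable script to run for that flag
run_flagged_actions() {
    flag_dir=$1
    action_dir=$2
    for flag in "$flag_dir"/*; do
        [ -e "$flag" ] || continue        # no flags: the glob did not match
        name=$(basename "$flag")
        action="$action_dir/$name"
        if [ ! -x "$action" ]; then
            echo "no executable action for flag '$name'" >&2
            continue
        fi
        # Remove the flag only on success, so a failed action is retried.
        if "$action"; then
            rm -f "$flag"
        else
            echo "action '$name' failed, keeping the flag" >&2
        fi
    done
}

run_flagged_actions "${FLAG_DIR:-/var/spool/webhook-flags}" \
    "${ACTION_DIR:-/usr/local/libexec/webhook-actions}"
```

A crontab line such as `*/5 * * * * /usr/local/bin/run-flagged-actions` would run the check periodically.  Both the path and the 5 minute interval are hypothetical.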

In the ping alert case, the script just connects to the "access point" antenna
and performs a `reboot(8)`.

With that done, outages do not last more than 5 minutes, and they are pretty
rare anyway.  So I think it is a good solution until the day I take the time to
dig into it (if I ever do ...).

I also created a custom handler with a super simple payload.  That way I can use
it from other scripts (not necessarily from this project) just to be notified
via XMPP.

## Conclusion

And that's the whole setup.  Without using anything too complicated or
expensive I could connect those isolated flats.  I have some insight into what
happens on the network, alerts on the most interesting metrics, and I can even
automate responses if I need to.

I hope this may serve as a source of ideas for similar projects.

[1]: https://en.wikipedia.org/wiki/Point-to-point_(telecommunications)
[2]: https://en.wikipedia.org/wiki/Fresnel_zone
[3]: https://www.ui.com/airmax/powerbeam/
[4]: https://en.wikipedia.org/wiki/Power_over_Ethernet
[5]: https://www.konigelectronic.com/computer/networking/network-cable-reel-cat5e-futp-100-m-black-solid-55896639
[6]: https://www.zerotier.com/
[7]: https://www.influxdata.com/time-series-platform/
[8]: https://www.influxdata.com/time-series-platform/telegraf/
[9]: https://grafana.com/
[10]: https://zerotier.atlassian.net/wiki/spaces/SD/pages/8454145/Getting+Started+with+ZeroTier
[11]: https://docs.influxdata.com/influxdb/v1.7/introduction/installation/
[12]: https://docs.influxdata.com/telegraf/v1.11/introduction/installation/
[13]: https://grafana.com/docs/installation/debian/
[14]: https://github.com/opthomas-prime/xmpp-webhook/
[15]: https://git.e1e0.net/xmpp-webhook/log.html