DJ5CW's Ham Radio Stuff

Contents

All kinds of things too precious not to share, in short articles. Constantly under construction or destruction. Note that code snippets are usually minimal working examples (often stripped of extra features). If you plan to use any of the code for anything serious, you may want to contact me. It's likely that I can provide you with some updates and good (?) advice. fabian@fkurz.net


Fabian Kurz > Ham Radio > Stuff


RBN activity charts

This article is a work in progress - it will describe the technical side of the Reverse Beacon Network (RBN) activity charts available at https://rbn.telegraphy.de/activity/DJ1YFK including source code and the rationale behind the design of the system.

Scope of RBN activity charts

The goal of the RBN activity charts is to give a very quick overview of the activity of a callsign, as reported by the RBN. This includes all modes supported by the RBN receivers. Currently this is mainly CW, with some Skimmers also supporting RTTY, PSK31 and other digital modes.

The desired representation includes a heat-map style calendar view reaching back at least one year, where the reported RBN activity per day is mapped to a color intensity, very similar to the commit activity maps on GitHub. Activity points are awarded per hour of the day: within each hour, one point for each band on which the RBN reported a spot, plus one point for each continent on which the station was heard during that hour. This gives a hopefully more meaningful picture of the activity than just counting the raw number of spots, which can (with a decent signal and suitable propagation) nowadays easily reach 50 for a single CQ.
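The point scheme can be sketched in a few lines of Python. This is only an illustration: the function names and the input layout are made up here, and adding up the hourly points to get the value for a day is my reading of the description, not something taken from the actual sources.

```python
def hour_points(bands, continents):
    """Points for one hour: one per distinct band plus one per distinct continent."""
    return len(set(bands)) + len(set(continents))


def day_points(hours):
    """Sum the hourly points of a day (assumed aggregation).

    `hours` maps an hour (0-23) to a (bands, continents) pair.
    """
    return sum(hour_points(bands, conts) for bands, conts in hours.values())


# Hypothetical day: active in two hours only.
example_day = {
    18: (["20m", "40m"], ["EU", "NA"]),  # 2 bands + 2 continents = 4 points
    19: (["40m"], ["EU"]),               # 1 band  + 1 continent  = 2 points
}
print(day_points(example_day))  # 6
```

Fifty spots from European skimmers on 20m within one hour would still count as only two points here, which is exactly the intended damping effect.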

Additionally, a band breakdown of the reported activity is desired, to get a quick idea of the activity focus of a certain callsign. To round it off, a graph showing the activity versus the hours of the day gives an interesting insight into typical operating hours, and with a little extra work, this chart can also include the bands on which the activity took place.

Finally, a summary of active hours per day, the total for the last year, and the number of hours in which the callsign was detected by a skimmer on each of the continents rounds off the information presented.

Examples

The finished "product" can be seen on the URL mentioned above. Just for illustration, you may find some of the following exemplary overviews interesting:

Raw data and how to process and store it efficiently

The raw aggregated data from the RBN is available at reversebeacon.net in CSV format, one file per day. After unzipping, typical file sizes range between about 10 MB (150k spots, weekdays) and over 150 MB (2.5M spots, contest weekend) for a single day, i.e. more than 1 GB per month, and consequently over 12 GB or something in the range of 50 million spots a year. This really is "big data".

The naive way to handle this kind of data would be to pay a lot of money for a high performance server or rent something at AWS, throw each line into a database with a fancy name, install all the frameworks that currently are trendy, and then for each query to generate an activity chart, search through millions of database rows and let the CPU melt away a bit. "Cloud developers" love this approach.

The smart way, however, is to process the data once, convert it into a suitable format in which the desired data is immediately available, and save it efficiently. That's the way I chose, because I love small and scalable solutions much more than throwing money and resources at problems until they disappear (which they often won't).

What do we really need?

For the type of activity report we want to generate, there's a simple data relationship, which is fundamentally different from the "table" view of a CSV file with all the spots in single rows: What we need is a key → value relationship between a callsign, and all data that is associated with this callsign.

Furthermore, in order to generate the kind of report described above, we can do away with a lot of redundancy that is contained in the RBN raw data. If we only want to know whether DJ1YFK was heard by the RBN on 20m in the 23:00 UTC hour on January 1st, 2015, we don't need to save 50 spots (each with 50 or so bytes) for that, because it's only a few bits of actual information. It doesn't matter which precise skimmers received the call; the continents are sufficient. The same is true for the precise frequency, SNR, CW speed or any of the other information that's available in a fine-grained form in the raw data. In the end, all the information that is saved for one hour in which a callsign was active is a bit field with 32 bits as tabulated below.

NB: It started out with 16 bits per hour, but later additional bands were added, so the bitfield was increased to 32 bits, which even leaves some room for future additions.

Bit    Meaning
0-8    RFU (reserved for future use)
9      2200m
10     630m
11     6m
12     4m
13     2m
14     70cm
15     23cm
16     EU
17     NA
18     AS
19     SA
20     AF
21     OC
22     10m
23     12m
24     15m
25     17m
26     20m
27     30m
28     40m
29     60m
30     80m
31     160m
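To make the table more tangible, here is a minimal Python sketch that packs band and continent labels into one 32-bit value and unpacks them again. The names `LABELS`, `BIT`, `encode` and `decode` are invented for this example, and it assumes that "bit n" in the table means the value 1 << n (bit 0 being the least significant bit). The real implementation may number the bits the other way round.

```python
# Labels in table order; positions 0-8 are reserved (RFU).
LABELS = [None] * 9 + [
    "2200m", "630m", "6m", "4m", "2m", "70cm", "23cm",
    "EU", "NA", "AS", "SA", "AF", "OC",
    "10m", "12m", "15m", "17m", "20m", "30m", "40m", "60m", "80m", "160m",
]

# Assumption: table bit number n is stored as 1 << n.
BIT = {label: pos for pos, label in enumerate(LABELS) if label is not None}


def encode(labels):
    """Combine band and continent labels into one 32-bit integer."""
    value = 0
    for label in labels:
        value |= 1 << BIT[label]
    return value


def decode(value):
    """Return the labels whose bits are set, in table order."""
    return [label for label, pos in BIT.items() if (value >> pos) & 1]


hour = encode(["20m", "EU"])
print(hex(hour), decode(hour))
```

However many spots arrive for the same band and continent within the hour, OR-ing them in leaves the value unchanged, which is where the redundancy of the raw data disappears.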

Data representation

The next question was how to save this bitfield in an efficient manner, not only for one single hour, but for any hour and date. It also needs to be saved in a way that makes it very easy not only to retrieve this data, but also to update any particular hour or day without much overhead and without moving data around.

After considering and rejecting formats like JSON, I found that the best solution was much simpler and faster: all data is stuffed into an array of bytes, where a fixed starting point (in this case January 1st, 2009) is byte 0, and four bytes are allocated for each hour from that point on. The appropriate bits in this array are then filled day by day with the new raw data.

The nice thing about this way of representing the data is that you can immediately access any hour without doing a lot of math (calculating the hours from a certain starting point is something that library functions do perfectly), and since all hours are pre-allocated, there is no shifting around when adding new data (possibly in a non-chronological way). A year of this data takes 365 * 24 * 4 = 35040 bytes.
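As an illustration of how little math is involved, the following Python sketch computes the byte offset of an hour and sets bits in place. It is not the actual implementation: the helper names are hypothetical, the starting point is assumed to be midnight UTC, and the four bytes are assumed to hold a little-endian unsigned integer.

```python
import struct
from datetime import datetime, timezone

EPOCH = datetime(2009, 1, 1, tzinfo=timezone.utc)  # byte 0 (UTC is an assumption)
BYTES_PER_HOUR = 4


def hour_offset(when):
    """Byte offset of the 4-byte slot for the hour containing `when` (timezone-aware)."""
    hours = int((when - EPOCH).total_seconds() // 3600)
    if hours < 0:
        raise ValueError("date is before the starting point")
    return hours * BYTES_PER_HOUR


def set_bits(buf, when, bits):
    """OR `bits` into the slot for that hour; nothing else in the array moves."""
    offset = hour_offset(when)
    (current,) = struct.unpack_from("<I", buf, offset)  # "<I": assumed byte order
    struct.pack_into("<I", buf, offset, current | bits)


def get_bits(buf, when):
    """Read the 32-bit value stored for that hour."""
    return struct.unpack_from("<I", buf, hour_offset(when))[0]


# Pre-allocate everything from the starting point up to the end of 2015.
buf = bytearray(hour_offset(datetime(2016, 1, 1, tzinfo=timezone.utc)))
when = datetime(2015, 1, 1, 23, 30, tzinfo=timezone.utc)
set_bits(buf, when, 0x04010000)
print(hour_offset(when), hex(get_bits(buf, when)))
```

For the 23:00 UTC hour on January 1st, 2015, the offset is (2191 days * 24 + 23) * 4 = 210428, since there are 2191 days between the two dates.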

With more than 500k callsigns appearing on the RBN within a year, this is a lot of data, more than 16 GB. If you want to save two consecutive years, it's twice that, and with ten years, it's ten times that. Not practical! Fortunately, the data typically contains a lot of zeroes, simply because the activity of most radio amateurs is not spread evenly over the hours of the day, and most hams are not active every single day. This means the data compresses very well. In my case, simply using gzip results in an average size of just above 200 bytes per callsign, a compression factor of about 160. The penalty in speed and additional CPU load introduced by gzip is a small price to pay for that.
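The effect is easy to reproduce. The sketch below builds one year of mostly empty data for a made-up operator who is active one hour per week, and compresses it with Python's gzip module. The activity pattern, the stored value and the helper name are assumptions for demonstration only, so the resulting size will differ from the averages quoted above.

```python
import gzip
import struct

YEAR_BYTES = 365 * 24 * 4  # 35040 bytes, four bytes per hour


def sparse_year(active_hours, bits=0x04010000):
    """Return one year of data with `bits` stored in each active hour (hypothetical helper)."""
    buf = bytearray(YEAR_BYTES)
    for hour in active_hours:
        struct.pack_into("<I", buf, hour * 4, bits)  # byte order is an assumption
    return bytes(buf)


# Made-up pattern: one active hour per week.
raw = sparse_year(range(18, 365 * 24, 24 * 7))
packed = gzip.compress(raw)

assert gzip.decompress(packed) == raw  # lossless round trip
print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```

Real activity is less regular than this pattern, so it compresses less extremely, but the long runs of zero bytes are what makes it work in both cases.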

Data storage

With the data representation solved, the next issue was how to store it. In a first version, this was done simply in the filesystem: the data for each callsign was saved in a single file. In order to avoid a single folder with half a million files, the files were spread out over a number of folders named after the first two characters of the MD5 hash of the call.
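Such a layout could look like the following Python sketch. The root folder name, the upper-casing of the call and the replacement of "/" in portable calls are assumptions made for this example and not taken from the original code.

```python
import hashlib
from pathlib import PurePosixPath


def shard_path(callsign, root="data"):
    """Return root/<first two hex digits of MD5>/<callsign> (hypothetical layout)."""
    call = callsign.upper()
    digest = hashlib.md5(call.encode("ascii")).hexdigest()
    filename = call.replace("/", "_")  # assumption: keep DJ1YFK/P inside one folder
    return PurePosixPath(root) / digest[:2] / filename


print(shard_path("DJ1YFK"))
print(shard_path("dj1yfk/p"))
```

Two hex digits give 256 folders, so half a million callsigns end up as roughly 2000 files per folder.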

The filesystem approach was very inefficient: every file, no matter how small it is, takes 4 kB on the file system, which makes the best compression useless. It's also a nightmare to back up so many small files.

So a "real" database was needed. While this sounds like a perfect use case for one of the many key-value store databases, I took a shortcut and used MySQL (because it was there already) with a very simple table that had two fields: the callsign (indexed) and a blob data field for the gzip-compressed array described above. Without benchmarking it against anything else, this solution "felt" absolutely fine, with very short query times and a total database size of less than 300 MB, so I stopped looking for anything better. No need to over-engineer things!
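The two-field table is simple enough to sketch. To keep the example self-contained, it uses SQLite from Python's standard library as a stand-in for MySQL. The table name, column names and helper functions are invented for illustration.

```python
import gzip
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the two-column table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS activity ("
        " callsign TEXT PRIMARY KEY,"  # the primary key doubles as the index
        " data BLOB NOT NULL)"         # gzip-compressed byte array
    )
    return db


def save(db, callsign, raw):
    """Compress the raw byte array and insert or overwrite the row."""
    db.execute(
        "INSERT OR REPLACE INTO activity (callsign, data) VALUES (?, ?)",
        (callsign, gzip.compress(raw)),
    )
    db.commit()


def load(db, callsign):
    """Return the decompressed byte array, or None for an unknown callsign."""
    row = db.execute(
        "SELECT data FROM activity WHERE callsign = ?", (callsign,)
    ).fetchone()
    return gzip.decompress(row[0]) if row else None


db = open_store()
save(db, "DJ1YFK", bytes(35040))
print(len(load(db, "DJ1YFK")))  # 35040
```

Generating a chart then needs exactly one indexed lookup and one decompression, regardless of how many spots were originally reported.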

Now everything was in place. The last thing that was needed was the actual software filling the database with the daily batch of RBN data. With all design questions answered, this was written as a small Perl script that is started by a cron job every night. The source files are listed below.
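The core of such a nightly run is to collapse all spots of a day into one bitfield per callsign and hour. The real script is written in Perl; the Python sketch below only illustrates the idea. The column names (`dx`, `band`, `de_cont`, `date`) and the timestamp format are assumptions about the raw CSV files, and the bit numbers follow the table above with bit n stored as 1 << n.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed bit numbers (see table above), bit n stored as 1 << n.
BIT = {"2200m": 9, "630m": 10, "6m": 11, "4m": 12, "2m": 13, "70cm": 14,
       "23cm": 15, "EU": 16, "NA": 17, "AS": 18, "SA": 19, "AF": 20,
       "OC": 21, "10m": 22, "12m": 23, "15m": 24, "17m": 25, "20m": 26,
       "30m": 27, "40m": 28, "60m": 29, "80m": 30, "160m": 31}


def aggregate(csv_file):
    """Return {(callsign, start of hour): bitfield} for all spots in the file."""
    hours = defaultdict(int)
    for row in csv.DictReader(csv_file):
        band, continent = row["band"], row["de_cont"]
        if band not in BIT or continent not in BIT:
            continue  # skip rows this sketch cannot map
        when = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S")
        hour = when.replace(minute=0, second=0)
        hours[(row["dx"], hour)] |= (1 << BIT[band]) | (1 << BIT[continent])
    return dict(hours)


# Made-up sample with only the columns used here.
SAMPLE = """dx,band,de_cont,date
DJ1YFK,20m,EU,2015-01-01 23:04:10
DJ1YFK,20m,NA,2015-01-01 23:04:12
DJ1YFK,40m,EU,2015-01-01 23:40:00
"""
result = aggregate(io.StringIO(SAMPLE))
for (call, hour), bits in result.items():
    print(call, hour, hex(bits))
```

The three sample spots collapse into a single 32-bit value. Each value would then be OR-ed into the stored array of the respective callsign.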

TODO: Add a description of the RBN live/real-time statistics here.

Web frontend

All data is useless if you cannot present it in a way that is easy to read and at least to some degree pleasant looking. It's also important to keep the site size small, use JavaScript carefully and keep the number of external resources that are included low (or avoid them completely if somehow possible). This philosophy is called common sense and is not very fashionable in modern web design any more.

Loading a typical RBN activity chart (it depends a little on the actual activity represented in it) will take less than 25 kB of HTML. The first time you load a chart, an overhead of 1 kB of CSS, regrettably about 200 kB of JavaScript, and 775 bytes of PNG images is required. All together you end up with about 75 kB of compressed data that has to be transferred over the wire when the browser supports gzip compression (and practically all browsers do).

For the heatmap calendar, against my habits, I used a third-party JavaScript library, cal-heatmap, which itself uses D3.js. The graphs below the heatmap are homemade with 1999 style tables, but this works well enough with various browsers and gives a decent user experience on mobile devices, too. Everything except the heatmap data is produced by a small PHP script. The heatmap script fetches a JSON file, generated by a separate PHP script, that contains the data in a form cal-heatmap can understand. I like clean and "speaking" URLs (/activity/CALLSIGN instead of index.php?action=activity&call=CALLSIGN), which are easy to do with Apache's mod_rewrite.
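For illustration, the JSON for the heatmap could be produced along the following lines. The real site does this in PHP; this Python sketch assumes the plain format accepted by cal-heatmap 3.x, a single JSON object that maps Unix timestamps in seconds to values. The function name and input layout are made up.

```python
import json
from datetime import date, datetime, timezone


def heatmap_json(daily_points):
    """Map {date: points} to a JSON object of {"unix timestamp": points}.

    Days without activity are left out to keep the file small.
    """
    out = {}
    for day, points in daily_points.items():
        if points:
            midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
            out[str(int(midnight.timestamp()))] = points
    return json.dumps(out)


print(heatmap_json({date(2015, 1, 1): 6, date(2015, 1, 2): 0}))
# {"1420070400": 6}
```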

Embeddable version

If you embed an RBN activity chart in an iframe on another website, for example in your profile on HamQTH.com or QRZ.com, this is detected by JavaScript and the layout is adjusted for embedding. The header size is reduced, buttons for band filtering are removed, and a small "banner" is added at the bottom.

Misc.

Performance

(Web interface render times: below 15 ms per request. Daily database update: ca. 2 minutes for an average day with 250k spots.)

Statistics

...

Sources

The RBN activity charts are part of the offering at rbn.telegraphy.de, and the sources can be found in the git repository https://git.fkurz.net/dj1yfk/cwclubspotter.

Added: 07-Jul-2016. Last modified: 27-Jul-2020.

