Software for the UKW-Tagung 2025 Programme

Andreas Krüger, DJ3EI, 2025-09-03, source.


This post describes how I produced my re-sorted version of the UKW-Tagung programme.

The basic approach is to write a script that does the following:

Download the two HTML pages from the UKW-Tagung website and cache them. I will run my script often and do not want to hit the server with a fresh HTTPS request every time.
Parse the HTML and extract the schedule information and the abstracts.
Not planned, but unfortunately necessary: reconcile the two sources. For 17 of the 36 programme items, the timetable and the abstracts do not quite match. This ranges from simple typos in names to “workshops” that have no abstract at all.
Then merge the two sources of information and copy the result into my blog as a Markdown file.
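The post does not say how the 17 mismatches were tracked down. One possible way to surface them, sketched here with the standard-library `difflib` module, is to list every timetable key that has no exact partner among the abstracts, together with the most similar candidates. The function name `suggest_matches` and the sample keys are hypothetical and only serve as illustration.

```python
from difflib import get_close_matches


def suggest_matches(plan_keys: list[str], abstract_keys: list[str], cutoff: float = 0.6) -> dict[str, list[str]]:
    """For every plan key without an exact partner, list similar abstract keys."""
    known = set(abstract_keys)
    suggestions: dict[str, list[str]] = {}
    for key in plan_keys:
        if key not in known:
            # Best candidates first; an empty list means "nothing similar found".
            suggestions[key] = get_close_matches(key, abstract_keys, n=3, cutoff=cutoff)
    return suggestions


# Made-up sample data: one exact match, one typo-level mismatch, one item without abstract.
plan_keys = [
    "Jane Doe, DL0XYZ Antennas for beginners",
    "John Roe, DL0ABC Geoinformatik für Funkamateure und Dxer",
    "Workshop without any abstract",
]
abstract_keys = [
    "Jane Doe, DL0XYZ Antennas for beginners",
    "John Roe, DL0ABC Geoinformatik für Funkamateure und DXer",
]

suggestions = suggest_matches(plan_keys, abstract_keys)
for key, candidates in suggestions.items():
    print(f"{key!r} -> {candidates}")
```

Typo-level differences show up with a near-identical candidate, while items without any abstract tend to come back with no candidate. The decision which spelling to keep still has to be made by hand.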

Since I do this sort of thing fairly often, I have written it up reasonably tidily here for once. I plan to use this page as a “quarry” for later “buildings”.

Software used

I use Python, currently version 3.11.2, and installed the following packages into a venv with pip:

requests
beautifulsoup4

Including dependencies, this resulted in the following installation on my machine as of today:

beautifulsoup4==4.13.5
certifi==2025.8.3
charset-normalizer==3.4.3
idna==3.10
requests==2.32.5
soupsieve==2.8
typing_extensions==4.15.0
urllib3==2.5.0

It is useful to keep the BeautifulSoup documentation at hand. For those who know the DOM: BeautifulSoup is a library that gives Python a DOM-like tree for navigating and searching parsed HTML.

Polish

The script already worked, but before publishing I polished it the way I more or less habitually polish Python programs.

For that I need this software:

pip install isort flake8 black types-requests mypy

Then I make sure the following commands run without errors:

isort sort_ukwtagung_programm
black -l 120 sort_ukwtagung_programm
flake8 --max-line-length=120 --ignore E203,E701,E704 --color=never sort_ukwtagung_programm
mypy --strict sort_ukwtagung_programm

I took the --ignore part of the flake8 command line from the black documentation. It switches off three checks that would otherwise make flake8 complain about certain formatting choices made by black.
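If typing the four commands becomes tedious, they can be wrapped in a small helper. This is only a sketch and not part of the original workflow: the names `build_commands` and `run_checks` are made up, and it assumes the four tools are installed in the active venv.

```python
import subprocess

TARGET = "sort_ukwtagung_programm"


def build_commands(target: str = TARGET) -> list[list[str]]:
    """Return the four check commands from the post as argument lists."""
    return [
        ["isort", target],
        ["black", "-l", "120", target],
        ["flake8", "--max-line-length=120", "--ignore", "E203,E701,E704", "--color=never", target],
        ["mypy", "--strict", target],
    ]


def run_checks(target: str = TARGET) -> None:
    """Run every check in order; raise CalledProcessError on the first failure."""
    for command in build_commands(target):
        print("Running:", " ".join(command))
        # check=True turns a non-zero exit status into an exception.
        subprocess.run(command, check=True)
```

Calling `run_checks()` from the directory that contains the script runs the tools in the same order as above and stops at the first one that reports a problem.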

The program

Here is the code of my program sort_ukwtagung_programm:

#!/usr/bin/env python

# This program is in the public domain,
# so usable by anyone for any purpose without restrictions,
# see https://creativecommons.org/publicdomain/zero/1.0/ .

import os
import re
from dataclasses import dataclass
from os.path import isfile
from typing import cast

import requests
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

DEFAULT_CACHE_DIR = "cache"


def get_html_page(
    session: requests.Session, uri: str, cache_filename: str, cache_dir: str = DEFAULT_CACHE_DIR
) -> BeautifulSoup:
    """Download an HTML page from the internet and extract the DOM tree.

    This caches on first access and never bothers to invalidate the cache,
    that is, assumes the cache is always fresh."""

    cache_fq_filename = f"{cache_dir}/{cache_filename}"
    if isfile(cache_fq_filename):
        # We downloaded the file earlier, so don't bother the HTTP-Server again:
        pass
    else:
        page_response = session.get(uri)
        page_response.raise_for_status()
        content_type = page_response.headers["content-type"].lower()
        if "text/html" in content_type:
            # Fix the encoding so that cache write and read agree regardless of the platform default.
            with open(cache_fq_filename, "w", encoding="utf-8") as cache_file:
                cache_file.write(page_response.text)
        else:
            raise RuntimeError(f"Expected text/html, found {content_type} for {uri}")

    with open(cache_fq_filename, "r", encoding="utf-8") as cache_file:
        return BeautifulSoup(cache_file, "html.parser")


@dataclass
class PlanItem:
    """Data class for items from the plan (timetable)."""

    time: str
    room: str
    op: str
    title: str


def grab_plan(session: requests.Session, uri: str, cache_filename: str) -> list[PlanItem]:
    """Grab plan from the internet and extract the plan."""
    plan_html = get_html_page(session, uri, cache_filename)
    table = plan_html.table
    if table is None:
        raise RuntimeError(f"No table in {uri} content")
    tbody = table.tbody
    if tbody is None:
        raise RuntimeError(f"No table.tbody in {uri} content")
    trs = tbody.find_all("tr")

    def parse_tr(tr: Tag) -> list[str]:
        """Split a <tr> into <td>s and grab the string content of each <td>."""
        tds = tr.find_all("td")
        # Consistency check, true for the table we want to parse:
        if len(tds) not in [5, 6]:
            raise RuntimeError(f"Expecting 5 or 6 columns in row {tr.prettify()}")
        result: list[str] = []
        for td in tds:
            # Remove all HTML markup and map every <td> content to a single-line string.
            result.append(" ".join(cast(Tag, td).strings).replace("\n", "").strip())
        return result

    # The table puts info that belongs on top of the table in the table header / <td>
    # and info that belongs in the table header in the initial row:
    initial_row = parse_tr(cast(Tag, trs[0]))

    result = []

    # Each logical table row is coded as two HTML table rows.
    for row_i in range(3, len(trs) - 2, 2):
        time_row = parse_tr(cast(Tag, trs[row_i]))
        # This row does not have the initial column.
        title_row = parse_tr(cast(Tag, trs[row_i + 1]))

        time = time_row[0]
        # Merge info from the initial row and the two data rows,
        # processing columnwise.
        for room, op, title in zip(initial_row[1:], time_row[1:], title_row):
            if op and title:
                # There is a lot of inconsistency between the table and the abstracts,
                # regarding op and title.
                # In cases where the abstracts info was considered better,
                # we replace that info here.
                if "Erich H. Franke" in op:
                    op = "Erich H. Franke, DK6II"
                    title = "Künstliche Intelligenz in der Elektronik-Entwicklung. Ernsthaftes Hilfsmittel oder Hype?"
                elif "Satelliten-Funk – quo vadis? 52 Jahre AMSAT DL E.V." in title:
                    title = "Satelliten-Funk – quo vadis? 52 Jahre AMSAT DL e.V."
                elif "Umweltsensordaten des Urban Weather Project im „Digitalen Zwilling“ der mrn" in title:
                    title = (
                        "Umweltsensordaten des Urban Weather Project im „Digitalen Zwilling“ "
                        "der Metropolregion Rhein-Neckar"
                    )
                elif "Hol mehr aus dem Si5351 heraus" in title:
                    op = "Pieter-Tjerk de Boer, PA3FWM"
                    title = "Hol mehr aus dem Si5351 heraus: höhere Frequenzauflösung, Messungen und Modulation"
                elif "Ein modularer Mehrkanal-VNA von 9 kHz bis (evt.) 26.5 GHz" in title:
                    op = "Paul Boven, PE1NUT"
                    title = "Ein modularer Mehrkanal-VNA von 9 kHz bis (evt.) 26.5 GHz: Erste Schritte"
                elif "DJ1NG" in op:
                    op = "Guido Liedtke, DJ1NG"
                    title = (
                        "Jedermannfunkgeräte für den Notfunk – welche u.U. auch als Amateurfunkgeräte interessant sind"
                    )
                elif "Paul Boven" in op:
                    op = "Wolfgang Herrmann, Paul Boven, PE1NUT"
                elif "Ein Streifzug durch die Geoinformatik für Funkamateure und Dxer" in title:
                    title = "Ein Streifzug durch die Geoinformatik für Funkamateure und DXer"
                result.append(PlanItem(time=time, room=room, op=op, title=title))

    return result


@dataclass
class Abstract:
    """Data class for items from the abstracts page."""

    op: str
    title: str
    abstract_lines: list[str]


def grab_abstracts(session: requests.Session, uri: str, cache_filename: str) -> list[Abstract]:
    """Grab abstracts from the internet and extract the individual abstracts."""

    def split_tag_in_lines(t: Tag) -> list[str]:
        """Helper that splits a tag's contents in individual lines."""
        result: list[str] = []
        for c in t.contents:
            if type(c) is Tag:
                for sub_line in split_tag_in_lines(c):
                    # Do this recursively.
                    # This hopes they didn't use <span> or <a> or similar inline stuff.
                    result.append(sub_line)
            elif type(c) is NavigableString:
                result.append(" ".join(c.strings))
            else:
                raise RuntimeError(f"Didn't expect type {type(c)} of {c}")
        return result

    abstracts_html = get_html_page(session, uri, cache_filename)

    # They put a two-digit number in front of the author that I want to remove:
    split_away_number = re.compile(r"\d\d\s+([^\s].+[^\s])\s*")

    result: list[Abstract] = []
    for h4_raw in abstracts_html.find_all("h4"):
        h4: Tag = cast(Tag, h4_raw)
        abstract_lines: list[str] = []
        num_and_op, _br, title = h4.contents
        if type(num_and_op) is NavigableString:
            num_and_op_s = num_and_op.string
        elif type(num_and_op) is Tag:
            num_and_op_maybe_s = num_and_op.string
            if num_and_op_maybe_s is None:
                raise RuntimeError(f"num_and_op not found in {h4.prettify()}")
            else:
                num_and_op_s = num_and_op_maybe_s
        else:
            raise RuntimeError(f"Unexpected type {type(num_and_op)} of {num_and_op}")
        if num_and_op_mo := split_away_number.fullmatch(num_and_op_s):
            op = num_and_op_mo.group(1)
        else:
            raise RuntimeError(f"Could not parse {num_and_op_s}")
        # Now harvest the lines of the abstract.
        # This is simply all the stuff that follows, until the next h4
        # or the end; all abstracts are in a <div>.
        sib = h4.next_sibling
        while True:
            while sib is not None and type(sib) is not Tag:
                if type(sib) is NavigableString:
                    if "" == str(sib).strip():
                        pass
                    else:
                        abstract_lines.append(str(sib))
                sib = sib.next_sibling
            if sib is None or "h4" == sib.name:
                break
            else:
                # <p> or something
                for line in split_tag_in_lines(sib):
                    abstract_lines.append(line)
                sib = sib.next_sibling

        title_s = cast(Tag, title).string
        if title_s is None:
            raise RuntimeError(f"No title for abstract of {op}")
        else:
            if "DJ3EI" in op:
                # Late fiddling of my own talk:
                abstract_lines.append(
                    "Folien und Tagungsbandbeitrag sind veröffentlicht unter "
                    "[https://dj3ei.famsik.de/2025-JS8/](https://dj3ei.famsik.de/2025-JS8/)."
                )
            result.append(Abstract(op=op, title=title_s, abstract_lines=abstract_lines))

    # We have a few things in the plan that don't have their own abstract:
    NO_COMMENT = ["(Keine weitere Beschreibung)"]
    result.append(
        Abstract(
            op="Charly Eichhorn, DK3ZL",
            title="Live-QSO mit der Neumayer III Südpolstation über QO-100",
            abstract_lines=NO_COMMENT,
        )
    )
    result.append(Abstract(op="Michael Dörr", title="Workshop NeoPixel", abstract_lines=["Siehe Vortrag 12:30-13:15"]))
    result.append(
        Abstract(
            op="Alex Knochel DK3HD",
            title="Vorbereitung eines Stratosphärenballons mit SSTV auf der Wiese der DBS",
            abstract_lines=NO_COMMENT,
        )
    )
    result.append(
        Abstract(
            op="Alex Knochel DK3HD",
            title="Start eines Stratosphärenballons mit SSTV auf der Wiese der DBS",
            abstract_lines=NO_COMMENT,
        )
    )

    return result


def main() -> None:
    # Create the cache dir on first run:
    os.makedirs(DEFAULT_CACHE_DIR, exist_ok=True)

    with requests.Session() as session:
        plan_items = grab_plan(session, "https://ukw-tagung.org/vortragsplan-ukw-tagung-2025/", "plan.html")
        abstracts = grab_abstracts(
            session, "https://ukw-tagung.org/abstracts-der-vortraege-der-70-ukw-tagung-2025/", "abstracts.html"
        )

    # Provide the abstracts in a dict with key a concatenation of op and title, with a " " intervening:
    op_title2abstract_lines: dict[str, list[str]] = {}
    for abstract in abstracts:
        # Fix inconsistencies.
        # Where we think the plan has the better representation of the op and/or the title,
        # we use that.
        if abstract.op == "Bernd Sierk":
            op_title = (
                "Bernd Sierk, EUMETSAT Die Erde vom Weltraum aus gesehen – "
                "was man von Satelliten alles messen kann (und wie)"
            )
        elif "Dopplerpeiler-Konzepts" in abstract.title:
            op_title = (
                "Michael Kugel, DC1PAA Realisierung des Relais / QRG-Monitors als "
                "ein Basis – Modul des Dopplerpeiler-Konzepts"
            )
        elif "Einstieg in CircuitPython mit dem Raspberry-Pi-Pico und NeoPixel-Matrizen" == abstract.title:
            op_title = "Michael Dörr Einstieg in CircuitPython mit dem Rasperry-Pico und NeoPixel-Matrizen"
        elif "DK5LV" in abstract.op:
            op_title = "Henning-Christof Weddig, DK5LV Mein erstes Funkgerät für das 2m Band"
        elif "WSPR – wie Amateurfunkverfahren Luft- und Raumfahrt in entlegenen Gebieten hilft" in abstract.title:
            op_title = (
                "Robert Westphal, DJ4FF WSPR – wie Amateurfunkverfahren Schiff- und Luftfahrt "
                "in entlegenen Gebieten hilft"
            )
        else:
            op_title = f"{abstract.op} {abstract.title}"
        op_title2abstract_lines[op_title] = abstract.abstract_lines

    # Now write the result as a Markdown file for my blog, to my blog:
    with open("../content/posts/2025/ukw_tagung_toc.de.md", "w", encoding="utf-8") as toc:
        # Head matter.
        toc.write(
            """title: Programm UKW-Tagung 2025
slug: ukw_tagung_toc
date: 2025-09-03 01:32:00 UTC+02:00
modified: 2025-09-12 06:12:00 UTC+02:00
type: text
special_copyright: <p>Die Rechte an den Abstracts gehören den jeweiligen Autoren.</p>

## Was ist das?

Ich wollte das Programm der diesjährigen [UKW-Tagung](https://ukw-tagung.org/) anders sortiert haben.

Weil: Wenn ein Vortrag zu Ende ist und ich überlege, wo ich als
nächstes hingehe, möchte ich aus den Vorträgen, die als nächstes
anfangen, einen aussuchen.  Dazu will ich die hintereinander weg sehen
können *einschließlich der Vortragszusammenfassungen.*

Das leistet weder der [Vortragsplan](https://ukw-tagung.org/vortragsplan-ukw-tagung-2025/)
(Abstracts fehlen) noch die
[Seite der Abstracts](https://ukw-tagung.org/abstracts-der-vortraege-der-70-ukw-tagung-2025/)
(liefert keine Raum- und Zeitinformation).

Also habe ich ein [Pythonskript gebaut](/de/posts/2025/ukw_tagung_toc_software/),
das diese beiden Seiten der UKW-Tagung einliest, parst,
17 nickelige Inkonsistenzen bereinigt und die
resultierenden Daten sortiert hier wieder ausgibt.

"""
        )

        last_time = None
        for plan_item in plan_items:
            if plan_item.time != last_time:
                if last_time is not None:
                    toc.write("------\n")
                toc.write(f"\n\n## {plan_item.time} Uhr\n\n")
                last_time = plan_item.time
            toc.write(f"\n### {plan_item.title}\n\n**{plan_item.room}**\n\n")
            toc.write(f"{plan_item.op}, {plan_item.time} Uhr\n\n")
            # Retrieve and output the abstract:
            op_title = f"{plan_item.op} {plan_item.title}"
            if op_title in op_title2abstract_lines:
                for line in op_title2abstract_lines[op_title]:
                    toc.write(f"{line}\n\n")
            else:
                raise RuntimeError(f'"{op_title}" not found in abstracts')
        toc.write(
            "_Wer diesen Blogbeitrag kommentieren will und\n"
            "einen Fediverse-Zugang hat, kann\n"
            "[https://mastodon.radio/@dj3ei/115140232578322877]"
            "(https://mastodon.radio/@dj3ei/115140232578322877) kommentieren._\n\n"
        )


if __name__ == "__main__":
    main()

In contrast to the rest of the text on this page, I release this code into the “public domain”: anyone may use it for anything. Naturally, I give no warranties whatsoever in connection with this software.
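The column-wise merge in `grab_plan` is perhaps easier to follow on toy data. The sketch below reproduces only the `zip` step. The rows stand in for the output of `parse_tr`, the room names and the talk are made up, and the real table has more columns than shown here.

```python
def merge_rows(header_row: list[str], time_row: list[str], title_row: list[str]) -> list[tuple[str, str, str, str]]:
    """Combine the header row and one two-row logical table row, column by column."""
    time = time_row[0]
    items: list[tuple[str, str, str, str]] = []
    # header_row and time_row carry a leading time column; title_row does not.
    for room, op, title in zip(header_row[1:], time_row[1:], title_row):
        if op and title:
            # Empty cells mean "nothing scheduled in this room" and are skipped.
            items.append((time, room, op, title))
    return items


# Made-up stand-ins for parse_tr() results.
header_row = ["Zeit", "Raum A", "Raum B"]
time_row = ["09:00", "Jane Doe, DL0XYZ", ""]
title_row = ["Antennas for beginners", ""]

items = merge_rows(header_row, time_row, title_row)
print(items)
```

Here only the first room yields an item, because the second room has neither a speaker nor a title in this time slot.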

Anyone who wants to comment on this blog post and has a Fediverse account can reply to https://mastodon.radio/@dj3ei/115140305869717657.