BTstack and RP2040: HCI (Host Controller Interface)

V. Hunter Adams



Purpose and intended audience for this document

This webpage attempts to offer a mental model for the HCI layer of BTstack, the implementation of the Bluetooth stack which is used for the Raspberry Pi Pico W. Please note that I did not contribute to BTstack (thank you, Matthias Ringwald!), nor did I contribute to porting it to the RP2040 (thank you Graham Sanderson and Peter Harper!). If anyone more informed than me reads this and notices an error or misunderstanding, please let me know (vha3@cornell.edu) and I'll correct it.

This is the first of what will be a handful of webpages. This webpage investigates what happens when we call cyw43_arch_init() on the RP2040. This investigation reveals how the device-agnostic BTstack code gets associated with the RP2040-specific code which ports it to this chip. It also reveals the mechanisms which underlie the high-level async library upon which the RP2040 port of BTstack is assembled. It reveals which hardware resources get used by BTstack (two DMA channels, two PIO state machines, one hardware alarm, and an ARM user interrupt) and the logical organization of the low-level HCI layer.

Upcoming webpages will move up the layers of abstraction to investigate the L2CAP layer, security manager, etc. Everything on this webpage comes from careful reading of the pico-examples/pico_w/bt/standalone example from the pico-examples repository. This example includes both a Bluetooth server and a client; we'll focus in particular on the server.

The intended audience includes students in ECE 4760 at Cornell.


Getting oriented in the directories

After you install the build environment and C SDK in your chosen directory on your machine (Windows, Mac, Linux), that directory will contain two subdirectories: pico-sdk and pico-examples. As the names suggest, the pico-sdk directory contains the C SDK for configuring and controlling the RP2040, and the pico-examples directory contains a large collection of example projects which utilize that SDK.

Each of these directories contains a ton of files, but we are going to direct our attention to a few locations in particular. One is pico-sdk/lib/btstack. This directory contains all of the device-agnostic C code which implements the Bluetooth stack. This Bluetooth stack implementation was developed by Blue Kitchen, and their documentation is available here. This documentation is one of the primary resources for this webpage.

We will also spend some time in pico-sdk/lib/cyw43-driver, which contains the device-agnostic C code which implements a low-level driver for the CYW43 Wi-Fi/Bluetooth chip which is included on the Pi Pico W.

This device-agnostic C code must be integrated with some RP2040-specific code to port it to this specific hardware. We will spend the rest of our time in the directories which contain that code. In particular, pico-sdk/src/rp2_common/pico_btstack contains the RP2040-specific code that integrates the device-agnostic library in pico-sdk/lib/btstack, and pico-sdk/src/rp2_common/pico_cyw43_driver contains the RP2040-specific code that integrates the device-agnostic CYW43 driver in pico-sdk/lib/cyw43-driver. Finally, we'll spend some time in pico-sdk/src/rp2_common/pico_cyw43_arch, which includes some more RP2040-specific code for integrating with the device-agnostic CYW43 driver.

It is often the case that material like this is clearest when presented by way of an example. The particular example which we will consider as a case study is pico-examples/pico_w/bt/standalone. As mentioned above, this example includes both a Bluetooth server and a client; we'll focus in particular on the server.


Walking through an example

We'll consider pico-examples/pico_w/bt/standalone/server.c in particular. We won't dive deeply into lines that aren't related to BTstack (e.g. configuring UART or SPI interfaces), but we'll study the BTstack-related calls to some depth.

We'll start with main(). After initializing stdio, the program then calls a function called cyw43_arch_init(). What happens when we call this function?

This function lives in pico-sdk/src/rp2_common/pico_cyw43_arch/cyw43_arch_threadsafe_background.c, and is copied below.

int cyw43_arch_init(void) {
    async_context_t *context = cyw43_arch_async_context();
    if (!context) {
        context = cyw43_arch_init_default_async_context();
        if (!context) return PICO_ERROR_GENERIC;
        cyw43_arch_set_async_context(context);
    }
    bool ok = cyw43_driver_init(context);
#if CYW43_LWIP
    ok &= lwip_nosys_init(context);
#endif
#if CYW43_ENABLE_BLUETOOTH
    ok &= btstack_cyw43_init(context);
#endif
    if (!ok) {
        cyw43_arch_deinit();
        return PICO_ERROR_GENERIC;
    } else {
        return 0;
    }
}

This function first sets the value of a local pointer to an object of type async_context_t (called "context") to the value returned from the function cyw43_arch_async_context(). All that cyw43_arch_async_context() does is return the value of a global pointer to an object of type async_context_t (see pico-sdk/src/rp2_common/pico_cyw43_arch/cyw43_arch.c). If that global pointer has not been initialized, then cyw43_arch_init() sets the value of the local variable context to the value returned from cyw43_arch_init_default_async_context(). This function (also in pico-sdk/src/rp2_common/pico_cyw43_arch/cyw43_arch_threadsafe_background.c) is copied below.

async_context_t *cyw43_arch_init_default_async_context(void) {
    async_context_threadsafe_background_config_t config = async_context_threadsafe_background_default_config();
    if (async_context_threadsafe_background_init(&cyw43_async_context_threadsafe_background, &config))
        return &cyw43_async_context_threadsafe_background.core;
    return NULL;
}

You can see that this function declares a local variable of type async_context_threadsafe_background_config_t called config, and sets the value of that local variable to the value returned from async_context_threadsafe_background_default_config(). This data type is defined in the high-level pico_async_context library of the C SDK. You can find the definition in pico-sdk/src/rp2_common/pico_async_context/include/pico/async_context_threadsafe_background.h and it is copied below. Note that this struct contains a uint8_t which specifies the interrupt handler priority level for this async context, and a pointer to an alarm pool. If that pointer is NULL, the async context uses the default alarm pool. The default alarm pool uses hardware alarm number 3, which attaches to interrupt TIMER_IRQ_3.

/**
 * \brief Configuration object for async_context_threadsafe_background instances.
 */
typedef struct async_context_threadsafe_background_config {
/**
 * the priority of the low priority IRQ
 */
    uint8_t low_priority_irq_handler_priority;
    /**
     * a specific alarm pool to use (or NULL to use ta default)
     *
     * \note this alarm pool MUST be on the same core as the async_context
     *
     * The default alarm pool used is the "default alarm pool" (see
     * \ref alarm_pool_get_default()) if available, and if that is on the same
     * core, otherwise a private alarm_pool instance created during
     * initialization.
     */
    alarm_pool_t *custom_alarm_pool;
} async_context_threadsafe_background_config_t;

async_context_threadsafe_background_default_config() is implemented in pico-sdk/src/rp2_common/pico_async_context/async_context_threadsafe_background.c and copied below. It sets the interrupt priority to the lowest possible priority (using a macro from the SDK), and it sets the custom_alarm_pool field to NULL. If we didn't want to use the default alarm pool, we could point this to a different alarm pool.

async_context_threadsafe_background_config_t async_context_threadsafe_background_default_config(void) {
    async_context_threadsafe_background_config_t config = {
            .low_priority_irq_handler_priority = ASYNC_CONTEXT_THREADSAFE_BACKGROUND_DEFAULT_LOW_PRIORITY_IRQ_HANDLER_PRIORITY,
            .custom_alarm_pool = NULL,
    };
    return config;
}
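
To make the NULL-means-default convention concrete, here is a minimal host-side sketch (plain C, no SDK required). It is not SDK code: every demo_ name, the stand-in pool type, and the 0xff priority value are assumptions made purely for illustration. It only shows the shape of the idea, namely that a config with a NULL pool pointer resolves to a default pool, and a caller can override that by filling in the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for alarm_pool_t (hypothetical; the real type belongs to the SDK). */
typedef struct demo_alarm_pool {
    int hardware_alarm_num;
} demo_alarm_pool_t;

/* Same shape as the config struct above, with hypothetical names. */
typedef struct demo_config {
    uint8_t low_priority_irq_handler_priority;
    demo_alarm_pool_t *custom_alarm_pool; /* NULL means "use the default" */
} demo_config_t;

/* Assumed value: on Cortex-M, a numerically larger priority is less urgent. */
#define DEMO_LOWEST_IRQ_PRIORITY 0xff

/* Stand-in for the default alarm pool, which uses hardware alarm 3. */
static demo_alarm_pool_t demo_default_pool = { .hardware_alarm_num = 3 };

/* Build a config with the lowest priority and no custom pool. */
static demo_config_t demo_default_config(void) {
    demo_config_t config = {
        .low_priority_irq_handler_priority = DEMO_LOWEST_IRQ_PRIORITY,
        .custom_alarm_pool = NULL,
    };
    return config;
}

/* Pick the pool: the caller's own if one was supplied, otherwise the default. */
static demo_alarm_pool_t *demo_resolve_alarm_pool(const demo_config_t *config) {
    assert(config != NULL);
    return config->custom_alarm_pool ? config->custom_alarm_pool : &demo_default_pool;
}
```

A caller that wants its own pool would start from the default config, set custom_alarm_pool, and pass the result on; leaving the field NULL selects the default.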

Popping back up to cyw43_arch_init_default_async_context(), the local variable config now specifies the lowest possible interrupt priority and the default alarm pool. This function then calls async_context_threadsafe_background_init(&cyw43_async_context_threadsafe_background, &config). As arguments, this function takes a pointer to a globally declared object of type async_context_threadsafe_background_t (a type defined in pico-sdk/src/rp2_common/pico_async_context/include/pico/async_context_threadsafe_background.h) and a pointer to the configuration that we just created. Let's look more carefully at that type async_context_threadsafe_background_t:

struct async_context_threadsafe_background {
    async_context_t core;
    alarm_pool_t *alarm_pool; // this must be on the same core as core_num
    absolute_time_t last_set_alarm_time;
    recursive_mutex_t lock_mutex;
    semaphore_t work_needed_sem;
    volatile alarm_id_t alarm_id;
#if ASYNC_CONTEXT_THREADSAFE_BACKGROUND_MULTI_CORE
    volatile alarm_id_t force_alarm_id;
    bool alarm_pool_owned;
#endif
    uint8_t low_priority_irq_num;
    volatile bool alarm_pending;
};

We see that this struct includes something of type async_context_t called core, a pointer to an alarm pool, an object of type absolute_time_t which records the time for which the alarm was last set, a recursive mutex, a semaphore to indicate work needs to be done, the ID number for the alarm that we're using, some stuff that is only relevant for multicore, a uint8_t to specify the irq number, and a boolean to indicate whether or not an alarm is pending. Let's look more closely at that type async_context_t (from pico-sdk/src/rp2_common/pico_async_context/include/pico/async_context.h):

struct async_context {
    const async_context_type_t *type;
    async_when_pending_worker_t *when_pending_list;
    async_at_time_worker_t *at_time_list;
    absolute_time_t next_time;
    uint16_t flags;
    uint8_t  core_num;
};

This contains a pointer to something of type async_context_type_t, a pointer to a list of async_when_pending_worker objects, a pointer to a list of async_at_time_worker objects, an object of type absolute_time_t called next_time, a 16-bit field called flags, and an 8-bit field that specifies the core number. Let's now pop down one more layer, and look more closely at the type async_context_type_t, also from pico-sdk/src/rp2_common/pico_async_context/include/pico/async_context.h:

typedef struct async_context_type {
    uint16_t type;
    // see wrapper functions for documentation
    void (*acquire_lock_blocking)(async_context_t *self);
    void (*release_lock)(async_context_t *self);
    void (*lock_check)(async_context_t *self);
    uint32_t (*execute_sync)(async_context_t *context, uint32_t (*func)(void *param), void *param);
    bool (*add_at_time_worker)(async_context_t *self, async_at_time_worker_t *worker);
    bool (*remove_at_time_worker)(async_context_t *self, async_at_time_worker_t *worker);
    bool (*add_when_pending_worker)(async_context_t *self, async_when_pending_worker_t *worker);
    bool (*remove_when_pending_worker)(async_context_t *self, async_when_pending_worker_t *worker);
    void (*set_work_pending)(async_context_t *self, async_when_pending_worker_t *worker);
    void (*poll)(async_context_t *self); // may be NULL
    void (*wait_until)(async_context_t *self, absolute_time_t until);
    void (*wait_for_work_until)(async_context_t *self, absolute_time_t until);
    void (*deinit)(async_context_t *self);
} async_context_type_t;

So, just to clarify the hierarchy, the core field of the async_context_threadsafe_background struct is of type async_context (the other fields contain information about mutexes, semaphores, and interrupts, the utility of which will become clear later). The type field of the async_context struct points to an object of type async_context_type. The other fields of the async_context struct include a pointer to a list of objects of type async_when_pending_worker, a pointer to a list of objects of type async_at_time_worker, the time, flags, and the core number. The lowest-level struct, async_context_type, contains pointers to a bunch of functions that facilitate adding/removing workers to and from the lists of its parent async_context, marking a worker as pending, polling workers, etc. More on these later.
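
One detail of this hierarchy is easy to miss: because core is the first member of the outer struct, a pointer to core and a pointer to the outer object hold the same address, so a function in the table can cast the async_context_t pointer it receives back to the outer type. The host-side sketch below illustrates that arrangement with cut-down types. All demo_ names are hypothetical and invented for this illustration; only the overall shape (outer struct, embedded core, table of function pointers) follows the SDK structs above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct demo_context demo_context_t;

/* Lowest level: a table of operations (one entry here instead of many). */
typedef struct demo_context_type {
    uint16_t type;
    void (*set_work_pending)(demo_context_t *self);
} demo_context_type_t;

/* Middle level: the generic context that callers hold a pointer to. */
struct demo_context {
    const demo_context_type_t *type;
    uint16_t flags;
    uint8_t core_num;
};

/* Outer level: one concrete implementation plus its private state. */
typedef struct demo_background {
    demo_context_t core; /* first member, so &bg->core and bg share an address */
    int wake_ups;
} demo_background_t;

static void demo_background_set_work_pending(demo_context_t *self_base) {
    /* Recover the outer object from the pointer to its first member. */
    demo_background_t *self = (demo_background_t *)self_base;
    self->wake_ups++;
}

static const demo_context_type_t demo_template = {
    .type = 1,
    .set_work_pending = demo_background_set_work_pending,
};

/* Fill in the outer object and hand back a pointer to its generic core. */
static demo_context_t *demo_background_init(demo_background_t *self) {
    self->core.type = &demo_template;
    self->core.flags = 0;
    self->core.core_num = 0;
    self->wake_ups = 0;
    return &self->core;
}

/* Generic wrapper: dispatches through the table without knowing the outer type. */
static void demo_set_work_pending(demo_context_t *context) {
    context->type->set_work_pending(context);
}
```

Code that only holds a demo_context_t pointer can call demo_set_work_pending() without knowing which implementation sits behind it, which is the role that the generic async_context functions play for the rest of the SDK.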

What happens when we call async_context_threadsafe_background_init(&cyw43_async_context_threadsafe_background, &config)? Let's take a look at that function, in pico-sdk/src/rp2_common/pico_async_context/async_context_threadsafe_background.c:

bool async_context_threadsafe_background_init(async_context_threadsafe_background_t *self, async_context_threadsafe_background_config_t *config) {
    memset(self, 0, sizeof(*self));
    self->core.type = &template;
    self->core.flags = ASYNC_CONTEXT_FLAG_CALLBACK_FROM_IRQ | ASYNC_CONTEXT_FLAG_CALLBACK_FROM_NON_IRQ;
    self->core.core_num = get_core_num();
    if (config->custom_alarm_pool) {
        self->alarm_pool = config->custom_alarm_pool;
    } else {
#if PICO_TIME_DEFAULT_ALARM_POOL_DISABLED
        self->alarm_pool = alarm_pool_create_with_unused_hardware_alarm(ASYNC_CONTEXT_THREADSAFE_BACKGROUND_ALARM_POOL_MAX_ALARMS);
        self->alarm_pool_owned = true;
#else
        self->alarm_pool = alarm_pool_get_default();
#if ASYNC_CONTEXT_THREADSAFE_BACKGROUND_MULTI_CORE
        if (self->core.core_num != alarm_pool_core_num(self->alarm_pool)) {
            self->alarm_pool = alarm_pool_create_with_unused_hardware_alarm(ASYNC_CONTEXT_THREADSAFE_BACKGROUND_ALARM_POOL_MAX_ALARMS);
            self->alarm_pool_owned = true;
        }
#endif
#endif
    }
    assert(self->core.core_num == alarm_pool_core_num(self->alarm_pool));
    sem_init(&self->work_needed_sem, 1, 1);
    recursive_mutex_init(&self->lock_mutex);
    bool ok = low_prio_irq_init(self, config->low_priority_irq_handler_priority);
    return ok;
}

This clears the section of memory which contains the global cyw43_async_context_threadsafe_background object of type async_context_threadsafe_background_t. Then, it sets the type field of the core field (which points to an object of type async_context_type_t) to point to a statically declared object of that type called template. It also fills in the flags and core_num fields of core. The template object is defined at the bottom of the same file, and is copied below. Its fields point to a collection of functions also defined in that file. More on these in a bit.

static const async_context_type_t template = {
        .type = ASYNC_CONTEXT_THREADSAFE_BACKGROUND,
        .acquire_lock_blocking = async_context_threadsafe_background_acquire_lock_blocking,
        .release_lock = async_context_threadsafe_background_release_lock,
        .lock_check = async_context_threadsafe_background_lock_check,
        .execute_sync = async_context_threadsafe_background_execute_sync,
        .add_at_time_worker = async_context_threadsafe_background_add_at_time_worker,
        .remove_at_time_worker = async_context_threadsafe_background_remove_at_time_worker,
        .add_when_pending_worker = async_context_threadsafe_background_add_when_pending_worker,
        .remove_when_pending_worker = async_context_threadsafe_background_when_pending_worker,
        .set_work_pending = async_context_threadsafe_background_set_work_pending,
        .poll = 0,
        .wait_until = async_context_threadsafe_background_wait_until,
        .wait_for_work_until = async_context_threadsafe_background_wait_for_work_until,
        .deinit = async_context_threadsafe_background_deinit,
};

The async_context_threadsafe_background_init() function also sets the alarm_pool field of the global cyw43_async_context_threadsafe_background to the default alarm pool (unless otherwise specified), initializes the work_needed semaphore (more on this in a bit), initializes a mutex, and sets up a low-priority interrupt by calling low_prio_irq_init(). This function is copied below:

static bool low_prio_irq_init(async_context_threadsafe_background_t  *self, uint8_t priority) {
    assert(get_core_num() == self->core.core_num);
    int irq = user_irq_claim_unused(false);
    if (irq < 0) return false;
    self->low_priority_irq_num = (uint8_t) irq;
    uint index = irq - FIRST_USER_IRQ;
    assert(index < count_of(async_contexts_by_user_irq));
    async_contexts_by_user_irq[index] = self;
    irq_set_exclusive_handler(self->low_priority_irq_num, low_priority_irq_handler);
    irq_set_enabled(self->low_priority_irq_num, true);
    irq_set_priority(self->low_priority_irq_num, priority);
    return true;
}

This function claims an unused user interrupt, records which async context owns it, installs a low-priority interrupt handler for it, enables it, and sets its priority. That handler is copied below.

// Low priority interrupt handler to perform background processing
static void low_priority_irq_handler(void) {
    uint index = __get_current_exception() - VTABLE_FIRST_IRQ - FIRST_USER_IRQ;
    assert(index < count_of(async_contexts_by_user_irq));
    async_context_threadsafe_background_t *self = async_contexts_by_user_irq[index];
    if (!self) return;
    assert(self->core.core_num == get_core_num());
    if (recursive_mutex_try_enter(&self->lock_mutex, NULL)) {
        // if the recurse count is not 1 then we have pre-empted something which held the lock on the same core,
        // so we cannot do processing here (however processing will be done when that lock is released)
        if (recursive_mutex_enter_count(&self->lock_mutex) == 1) {
            process_under_lock(self);
        }
        recursive_mutex_exit(&self->lock_mutex);
    }
}

This handler, which is shared by every async_context of this type, first works out which user interrupt caused the handler to be entered, and uses that to look up the associated context. The handler then tries to enter that context's recursive mutex. If it succeeds, and if the mutex was not already held by code which this interrupt pre-empted (that is, the enter count is exactly 1), it calls a function called process_under_lock(). What's this function do?

static void process_under_lock(async_context_threadsafe_background_t *self) {
#ifndef NDEBUG
    async_context_threadsafe_background_lock_check(&self->core);
    assert(self->core.core_num == get_core_num());
#endif
    do {
        absolute_time_t next_time = async_context_base_execute_once(&self->core);
        // if the next wakeup time is in the past then loop
        if (absolute_time_diff_us(get_absolute_time(), next_time) <= 0) continue;
        // if there is no next wakeup time, we're done
        if (is_at_the_end_of_time(next_time)) {
            // cancel the alarm early (we will have been called soon after an alarm wakeup), so that
            // we don't risk alarm_id collision.
            if (self->alarm_id > 0) {
                alarm_pool_cancel_alarm(self->alarm_pool, self->alarm_id);
                self->alarm_id = 0;
            }
            break;
        }
        // the following is an optimization; we are often called much more frequently than timeouts actually change,
        // and removing and re-adding the timers has some non-trivial overhead (10s of microseconds), we choose
        // to allow the existing timeout to run to completion, and then re-asses from there, unless the new wakeup
        // time is before the last one set.
        //
        // note that alarm_pending is not protected, however, if it is wrong, it is wrong in the sense that it is
        // false when it should be true, so if it is wrong due to a race, we will cancel and re-add the alarm which is safe.
        if (self->alarm_pending && absolute_time_diff_us(self->last_set_alarm_time, next_time) > 0) break;
        // cancel the existing alarm (it may no longer exist)
        if (self->alarm_id > 0) alarm_pool_cancel_alarm(self->alarm_pool, self->alarm_id);
        self->last_set_alarm_time = next_time;
        self->alarm_pending = true;
        self->alarm_id = alarm_pool_add_alarm_at(self->alarm_pool, next_time, alarm_handler, self, false);
        if (self->alarm_id > 0) break;
        self->alarm_pending = false;
    } while (true);
}

The first thing that this function does is set the value of next_time to the value returned from async_context_base_execute_once(&self->core):

absolute_time_t async_context_base_execute_once(async_context_t *self) {
    async_at_time_worker_t *at_time_worker;
    while (NULL != (at_time_worker = async_context_base_remove_ready_at_time_worker(self))) {
        at_time_worker->do_work(self, at_time_worker);
    }
    for(async_when_pending_worker_t *when_pending_worker = self->when_pending_list; when_pending_worker; when_pending_worker = when_pending_worker->next) {
        if (when_pending_worker->work_pending) {
            when_pending_worker->work_pending = false;
            when_pending_worker->do_work(self, when_pending_worker);
        }
    }
    async_context_base_refresh_next_timeout(self);
    return self->next_time;
}

This function runs the do_work function associated with each at_time_worker which is ready to run (removing that worker from the list as it does so), and then runs each when_pending_worker which has been marked as work_pending. This function then calls async_context_base_refresh_next_timeout, which looks through the at_time_workers and updates the next timeout to be that associated with the nearest at_time_worker.
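
The behaviour just described can be reproduced on a host machine with two small linked lists. The sketch below is a simplified model rather than the SDK implementation: the demo_ names are hypothetical, time is a plain microsecond counter passed in by the caller, and UINT64_MAX is an assumed stand-in for "no deadline".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NEVER UINT64_MAX /* assumed sentinel: no at-time worker is waiting */

typedef struct demo_at_time_worker {
    struct demo_at_time_worker *next;
    void (*do_work)(struct demo_at_time_worker *worker);
    uint64_t next_time_us;
} demo_at_time_worker_t;

typedef struct demo_when_pending_worker {
    struct demo_when_pending_worker *next;
    void (*do_work)(struct demo_when_pending_worker *worker);
    bool work_pending;
} demo_when_pending_worker_t;

typedef struct demo_context {
    demo_at_time_worker_t *at_time_list;
    demo_when_pending_worker_t *when_pending_list;
} demo_context_t;

/* Counters and trivial work functions so the effect can be observed. */
static int demo_timed_runs, demo_pending_runs;
static void demo_timed_work(demo_at_time_worker_t *w) { (void)w; demo_timed_runs++; }
static void demo_pending_work(demo_when_pending_worker_t *w) { (void)w; demo_pending_runs++; }

/* Unlink and return one at-time worker whose deadline has passed, or NULL. */
static demo_at_time_worker_t *demo_remove_ready(demo_context_t *ctx, uint64_t now_us) {
    for (demo_at_time_worker_t **link = &ctx->at_time_list; *link; link = &(*link)->next) {
        if ((*link)->next_time_us <= now_us) {
            demo_at_time_worker_t *ready = *link;
            *link = ready->next;
            ready->next = NULL;
            return ready;
        }
    }
    return NULL;
}

/* Run everything that is ready; return the earliest remaining deadline. */
static uint64_t demo_execute_once(demo_context_t *ctx, uint64_t now_us) {
    demo_at_time_worker_t *t;
    while ((t = demo_remove_ready(ctx, now_us)) != NULL) {
        t->do_work(t);
    }
    for (demo_when_pending_worker_t *w = ctx->when_pending_list; w; w = w->next) {
        if (w->work_pending) {
            w->work_pending = false; /* cleared before the call, as in the listing above */
            w->do_work(w);
        }
    }
    uint64_t earliest = DEMO_NEVER;
    for (demo_at_time_worker_t *r = ctx->at_time_list; r; r = r->next) {
        if (r->next_time_us < earliest) earliest = r->next_time_us;
    }
    return earliest;
}
```

With one worker due at 100 us and another at 500 us, calling demo_execute_once() at 200 us runs only the first and reports 500 us as the next time of interest.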

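Popping back up to process_under_lock, this time is then used to determine whether or not another alarm should be set. If the time is already in the past, the loop simply runs the workers again. If there is no next time at all, any existing alarm is cancelled and the loop ends. If an alarm is already pending and it will fire before the new time, that alarm is left alone. Otherwise, the function sets a new alarm with the updated timeout value and with alarm_handler as the callback function. Let's look at alarm_handler: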

static int64_t alarm_handler(__unused alarm_id_t id, void *user_data) {
    async_context_threadsafe_background_t *self = (async_context_threadsafe_background_t*)user_data;
#if ASYNC_CONTEXT_THREADSAFE_BACKGROUND_MULTI_CORE
    self->force_alarm_id = 0;
#endif
    self->alarm_pending = false;
    async_context_threadsafe_background_wake_up(&self->core);
    return 0;
}

This function simply de-asserts the alarm_pending field, and calls async_context_threadsafe_background_wake_up(&self->core). Let's look at that:

static void async_context_threadsafe_background_wake_up(async_context_t *self_base) {
    async_context_threadsafe_background_t *self = (async_context_threadsafe_background_t *)self_base;
#if ASYNC_CONTEXT_THREADSAFE_BACKGROUND_MULTI_CORE
    if (self_base->core_num == get_core_num()) {
        // on same core, can dispatch directly
        irq_set_pending(self->low_priority_irq_num);
    } else {
        // remove the existing alarm (it may have already fired) so we don't overflow the pool with repeats
        //
        // note that force_alarm_id is not protected here, however if we miss removing one, they will fire
        // almost immediately anyway (since they were set in the past)
        alarm_id_t force_alarm_id = self->force_alarm_id;
        if (force_alarm_id > 0) {
            alarm_pool_cancel_alarm(self->alarm_pool, force_alarm_id);
        }
        // we cause an early timeout (0 is always in the past) on the alarm_pool core
        // note that by the time this returns, the timer may already have fired, so we
        // may end up setting self->force_alarm_id to a stale timer id, but that is fine as we
        // will harmlessly cancel it again next time
        self->force_alarm_id = alarm_pool_add_alarm_at_force_in_context(self->alarm_pool, from_us_since_boot(0),
                                                                        alarm_handler, self);
    }
#else
    // on same core, can dispatch directly
    irq_set_pending(self->low_priority_irq_num);
#endif
    sem_release(&self->work_needed_sem);
}

If we're not doing multicore, all that this does is call irq_set_pending(self->low_priority_irq_num) and then release the work_needed semaphore. Setting the interrupt as pending sends us right back to low_priority_irq_handler! So we arrive at that handler anytime the context alarm expires, and anytime that we call async_context_threadsafe_background_wake_up (which, as we'll see, we do when we set a when_pending_worker as pending).
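
Now that the full cycle is in view, the decision that process_under_lock() makes on each pass of its loop can be condensed into a pure function. What follows is a host-side paraphrase of the branches in that loop, not SDK code: the demo_ names are hypothetical, plain microsecond counters replace absolute_time_t, and UINT64_MAX is an assumed stand-in for the SDK's end-of-time sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NEVER UINT64_MAX /* assumed sentinel: no at-time worker is waiting */

typedef enum {
    DEMO_RUN_WORKERS_AGAIN,     /* next time is already in the past: loop */
    DEMO_CANCEL_ALARM_AND_STOP, /* nothing scheduled: cancel any alarm, done */
    DEMO_KEEP_EXISTING_ALARM,   /* a pending alarm fires first: leave it alone */
    DEMO_SET_NEW_ALARM,         /* cancel the old alarm (if any) and re-arm */
} demo_alarm_action_t;

/* The checks are made in the same order as in process_under_lock(). */
static demo_alarm_action_t demo_alarm_decision(uint64_t now_us, uint64_t next_time_us,
                                               bool alarm_pending, uint64_t last_set_alarm_us) {
    if (next_time_us <= now_us) {
        return DEMO_RUN_WORKERS_AGAIN;
    }
    if (next_time_us == DEMO_NEVER) {
        return DEMO_CANCEL_ALARM_AND_STOP;
    }
    /* The optimization from the source comment: an alarm that is already
     * pending and fires before the new time is allowed to run to completion. */
    if (alarm_pending && next_time_us > last_set_alarm_us) {
        return DEMO_KEEP_EXISTING_ALARM;
    }
    return DEMO_SET_NEW_ALARM;
}
```

For example, with an alarm pending for 1500 us, a new wake-up time of 2000 us keeps the existing alarm, while a new wake-up time of 1200 us forces a re-arm because the existing alarm would fire too late.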

Popping all the way back up to cyw43_arch_init_default_async_context(void), this function then returns a pointer to the core field of the async_context_threadsafe_background object which we just initialized, called cyw43_async_context_threadsafe_background. So now that instantiation of the async_context_threadsafe_background_t object has been initialized, and we've passed a pointer to its core field back up to cyw43_arch_init() and used that pointer to set the value of context. See below:

int cyw43_arch_init(void) {
    async_context_t *context = cyw43_arch_async_context();
    if (!context) {
        context = cyw43_arch_init_default_async_context();
        if (!context) return PICO_ERROR_GENERIC;
        cyw43_arch_set_async_context(context);
    }
    bool ok = cyw43_driver_init(context);
#if CYW43_LWIP
    ok &= lwip_nosys_init(context);
#endif
#if CYW43_ENABLE_BLUETOOTH
    ok &= btstack_cyw43_init(context);
#endif
    if (!ok) {
        cyw43_arch_deinit();
        return PICO_ERROR_GENERIC;
    } else {
        return 0;
    }
}

After a quick error check, we then call cyw43_arch_set_async_context(context). Let's look at that function:

static async_context_t *async_context;

void cyw43_arch_set_async_context(async_context_t *context) {
    async_context = context;
}

All that this function does is set the value of a global pointer to an async_context_t called async_context to the value passed into it as an argument. In this case, we are setting the value of this global pointer such that it points to the core field of the (also global) async_context_threadsafe_background_t object called cyw43_async_context_threadsafe_background.
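
The get / create-if-missing / set sequence at the top of cyw43_arch_init() is a lazy-initialization pattern, and one consequence is that a context installed beforehand takes precedence over the default one. The host-side sketch below models just that control flow. The demo_ names and the trivial context type are hypothetical; nothing here is SDK code.

```c
#include <assert.h>
#include <stddef.h>

/* Trivial stand-in for async_context_t. */
typedef struct demo_context {
    int id;
} demo_context_t;

static demo_context_t demo_default_context = { .id = 1 };
static demo_context_t *demo_global_context; /* NULL until someone sets it */
static int demo_default_init_calls;

static demo_context_t *demo_get_context(void) {
    return demo_global_context;
}

static void demo_set_context(demo_context_t *context) {
    demo_global_context = context;
}

/* Stand-in for creating the default context; counts how often it runs. */
static demo_context_t *demo_init_default_context(void) {
    demo_default_init_calls++;
    return &demo_default_context;
}

/* Same control flow as the first lines of cyw43_arch_init(). */
static int demo_arch_init(void) {
    demo_context_t *context = demo_get_context();
    if (!context) {
        context = demo_init_default_context();
        if (!context) return -1;
        demo_set_context(context);
    }
    return 0;
}
```

Calling demo_arch_init() twice creates the default context only once, and calling demo_set_context() first means the default is never created at all.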

As yet, this context has no workers, and it doesn't yet know anything about how to communicate with the Bluetooth chip. That will happen next, but let's first pause to reflect.


Pausing to reflect

We haven't yet gotten past cyw43_arch_init(), which is essentially the first line of main(), but we've learned a lot about the infrastructure underlying BTstack on the RP2040.

We are instantiating a data structure which contains two linked lists of workers of two varieties. We have async_when_pending_workers and async_at_time_workers. These workers are themselves data structures which contain a pointer to a function, a pointer to the next worker in the list, and either a boolean to indicate whether or not the associated function should be run (in the case of the when_pending_workers) or a time at which the associated function should run (in the case of the at_time_workers).

The execution of all of these workers is initiated from a low-priority interrupt handler. That interrupt handler calls a function which goes through each list of workers and checks whether any of them are ready to run. In the event that one is ready to run, its boolean gets cleared (or, for an at_time_worker, it gets removed from its list) and its associated function gets called. We enter this interrupt by setting it as pending in software, and this occurs in one of two ways. If the application wants to mark one of the when_pending_workers as ready to run, it sets its associated boolean and sets the interrupt as pending. Alternatively, the at_time_workers have their timing controlled by an alarm pool attached to one of the low-level hardware alarms. The callback function for this alarm sets the low-priority interrupt as pending.
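
The two entry paths can be mimicked on a host machine with a few flags. This is a toy model under stated assumptions, not SDK code: the demo_ names are hypothetical, demo_irq_pending stands in for the interrupt controller's pending bit, demo_service_interrupts() stands in for the CPU taking the interrupt, and the alarm callback sets a "due" flag where the real code compares timestamps.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_irq_pending;  /* stands in for the interrupt's pending bit */
static bool demo_work_pending; /* a when-pending worker's flag */
static bool demo_timer_due;    /* an at-time worker whose deadline has arrived */
static int demo_handler_entries;
static int demo_pending_runs;
static int demo_timed_runs;

/* Path 1: the application marks a worker as pending, then pends the interrupt. */
static void demo_set_work_pending(void) {
    demo_work_pending = true;
    demo_irq_pending = true;
}

/* Path 2: the alarm callback does no work itself; it only pends the interrupt. */
static void demo_alarm_callback(void) {
    demo_timer_due = true;
    demo_irq_pending = true;
}

/* Both paths converge here, and this is where workers actually run. */
static void demo_low_priority_irq_handler(void) {
    demo_handler_entries++;
    if (demo_timer_due) {
        demo_timer_due = false;
        demo_timed_runs++;
    }
    if (demo_work_pending) {
        demo_work_pending = false;
        demo_pending_runs++;
    }
}

/* Hypothetical stand-in for the CPU servicing a pending interrupt. */
static void demo_service_interrupts(void) {
    while (demo_irq_pending) {
        demo_irq_pending = false;
        demo_low_priority_irq_handler();
    }
}
```

Whichever path sets the pending bit, the worker functions only ever run from inside the handler, which is what lets the handler's priority decide when background work happens.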

This data structure makes use of semaphores and mutexes to make certain that the execution of external user code does not happen concurrently with worker code.
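
The enter-count check in low_priority_irq_handler() is the heart of that guarantee on a single core, and it can be modelled in a few lines. In this host-side sketch the demo_ names are hypothetical, the "mutex" tracks nothing but its nesting depth, and an interrupt is simulated by calling the handler directly from the code it pre-empts.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-core model of a recursive mutex: only the nesting depth is tracked. */
typedef struct demo_rmutex {
    int enter_count;
} demo_rmutex_t;

static demo_rmutex_t demo_lock;
static int demo_process_calls;

static void demo_enter(demo_rmutex_t *m) {
    m->enter_count++;
}

static void demo_exit(demo_rmutex_t *m) {
    assert(m->enter_count > 0);
    m->enter_count--;
}

/* Same shape as low_priority_irq_handler(): enter, check depth, maybe process. */
static void demo_irq_handler(void) {
    demo_enter(&demo_lock);
    if (demo_lock.enter_count == 1) {
        demo_process_calls++; /* stands in for process_under_lock() */
    }
    /* A depth above 1 means the lock holder was pre-empted, so workers are skipped. */
    demo_exit(&demo_lock);
}

/* User code that holds the lock and is "interrupted" while doing so. */
static void demo_user_code_interrupted(void) {
    demo_enter(&demo_lock);
    demo_irq_handler(); /* simulate the interrupt firing here */
    demo_exit(&demo_lock);
}
```

When the handler runs on its own, the depth is 1 and the workers are processed. When it pre-empts user code that holds the lock, the depth is 2 and processing is skipped; per the comment in the SDK source, that processing is done when the lock is released.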


Initializing the CYW43 Driver

The next line of interest in cyw43_arch_init() is bool ok = cyw43_driver_init(context). Let's take a closer look at that:

bool cyw43_driver_init(async_context_t *context) {
    cyw43_init(&cyw43_state);
    cyw43_async_context = context;
    // we need the IRQ to be on the same core as the context, because we need to be able to enable/disable the IRQ
    // from there later
    async_context_execute_sync(context, cyw43_irq_init, NULL);
    async_context_add_when_pending_worker(context, &cyw43_poll_worker);
    return true;
}

The first thing that this function does is call cyw43_init(&cyw43_state);:

void cyw43_init(cyw43_t *self) {
    #ifdef CYW43_PIN_WL_HOST_WAKE
    cyw43_hal_pin_config(CYW43_PIN_WL_HOST_WAKE, CYW43_HAL_PIN_MODE_INPUT, CYW43_HAL_PIN_PULL_NONE, 0);
    #endif
    cyw43_hal_pin_config(CYW43_PIN_WL_REG_ON, CYW43_HAL_PIN_MODE_OUTPUT, CYW43_HAL_PIN_PULL_NONE, 0);
    cyw43_hal_pin_low(CYW43_PIN_WL_REG_ON);
    #ifdef CYW43_PIN_WL_RFSW_VDD
    cyw43_hal_pin_config(CYW43_PIN_WL_RFSW_VDD, CYW43_HAL_PIN_MODE_OUTPUT, CYW43_HAL_PIN_PULL_NONE, 0); // RF-switch power
    cyw43_hal_pin_low(CYW43_PIN_WL_RFSW_VDD);
    #endif

    cyw43_ll_init(&self->cyw43_ll, self);

    self->itf_state = 0;
    self->wifi_scan_state = 0;
    self->wifi_join_state = 0;
    self->pend_disassoc = false;
    self->pend_rejoin = false;
    self->pend_rejoin_wpa = false;
    self->ap_channel = 3;
    self->ap_ssid_len = 0;
    self->ap_key_len = 0;

    cyw43_poll = NULL;
    self->initted = true;

    #if CYW43_ENABLE_BLUETOOTH
    self->bt_loaded = false;
    #endif
}

This function sets the fields of the global cyw43_state object of type cyw43_t. It also configures a series of RP2040 GPIO pins as inputs/outputs for interfacing with several of the CYW43's inputs/outputs. Of particular note for us is that it sets the global function pointer cyw43_poll to NULL and it sets the initted field to true. These will be important details later on.

Next, cyw43_driver_init() executes cyw43_async_context = context;. This sets the value of another globally declared pointer to an async_context_t to the same address that is pointed to by context. As discussed previously, context points to the core field of the global async_context_threadsafe_background_t object. The consequence is that cyw43_async_context and async_context now both point to the same context object.

Next, cyw43_driver_init() calls async_context_execute_sync(context, cyw43_irq_init, NULL);. This function gets passed a pointer to a function and executes it synchronously with the context. In this case, we're passing it the function cyw43_irq_init. Let's look at that.

uint32_t cyw43_irq_init(__unused void *param) {
#ifndef NDEBUG
    assert(get_core_num() == async_context_core_num(cyw43_async_context));
#endif
    gpio_add_raw_irq_handler_with_order_priority(CYW43_PIN_WL_HOST_WAKE, cyw43_gpio_irq_handler, CYW43_GPIO_IRQ_HANDLER_PRIORITY);
    cyw43_set_irq_enabled(true);
    irq_set_enabled(IO_IRQ_BANK0, true);
    return 0;
}

This function associates an interrupt handler called cyw43_gpio_irq_handler with a particular GPIO pin (CYW43_PIN_WL_HOST_WAKE), and then enables that interrupt. Here's what that interrupt handler does:

// GPIO interrupt handler to tell us there's cyw43 has work to do
static void cyw43_gpio_irq_handler(void)
{
    uint32_t events = gpio_get_irq_event_mask(CYW43_PIN_WL_HOST_WAKE);
    if (events & GPIO_IRQ_LEVEL_HIGH) {
        // As we use a high level interrupt, it will go off forever until it's serviced
        // So disable the interrupt until this is done. It's re-enabled again by CYW43_POST_POLL_HOOK
        // which is called at the end of cyw43_poll_func
        cyw43_set_irq_enabled(false);
        async_context_set_work_pending(cyw43_async_context, &cyw43_poll_worker);
    }
}

This function confirms that a level-high event on the CYW43_PIN_WL_HOST_WAKE GPIO pin caused the interrupt. It then disables the interrupt (it will be re-enabled elsewhere) and sets the cyw43_poll_worker as pending. Note that we add this worker to the async_context in cyw43_driver_init, right after async_context_execute_sync(context, cyw43_irq_init, NULL);, when we call async_context_add_when_pending_worker(context, &cyw43_poll_worker);. What does this worker then do?
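
Before answering that, the mask-then-defer pattern in this handler is worth isolating, because it explains why a level-triggered interrupt has to be disabled inside its own handler. The sketch below is a host-side toy model, not SDK code: every demo_ name is hypothetical, and demo_step() is an assumed stand-in for the CPU choosing between taking the interrupt and running deferred work.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_pin_high;     /* level of the host-wake line: the chip wants service */
static bool demo_irq_enabled;  /* whether the GPIO level interrupt is unmasked */
static bool demo_poll_pending; /* the poll worker's work_pending flag */
static int demo_handler_entries;
static int demo_worker_runs;

/* Handler: mask the level interrupt first, then defer the real work. */
static void demo_gpio_irq_handler(void) {
    demo_handler_entries++;
    demo_irq_enabled = false;
    demo_poll_pending = true;
}

/* Deferred worker: service the chip (which drops the line), then unmask. */
static void demo_poll_worker(void) {
    demo_worker_runs++;
    demo_poll_pending = false;
    demo_pin_high = false;   /* servicing the chip clears its request */
    demo_irq_enabled = true; /* plays the role of the post-poll hook */
}

/* One step of a toy CPU: a level interrupt fires for as long as it is
 * unmasked and the line is high; otherwise deferred work gets a turn. */
static void demo_step(void) {
    if (demo_irq_enabled && demo_pin_high) {
        demo_gpio_irq_handler();
    } else if (demo_poll_pending) {
        demo_poll_worker();
    }
}
```

Without the line that clears demo_irq_enabled, the handler would be re-entered on every step and the worker would never get a turn. With that pattern in mind, the worker that the real handler marks as pending is declared as follows: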

static async_when_pending_worker_t cyw43_poll_worker = {
        .do_work = cyw43_do_poll
};

Its do_work field points to the function cyw43_do_poll:

static void cyw43_do_poll(async_context_t *context, __unused async_when_pending_worker_t *worker) {
#ifndef NDEBUG
    assert(get_core_num() == async_context_core_num(cyw43_async_context));
#endif
    if (cyw43_poll) {
        if (cyw43_sleep > 0) {
            cyw43_sleep--;
        }
        cyw43_poll();
        if (cyw43_sleep) {
            async_context_add_at_time_worker_in_ms(context, &sleep_timeout_worker, CYW43_SLEEP_CHECK_MS);
        } else {
            async_context_remove_at_time_worker(context, &sleep_timeout_worker);
        }
    }
}

This function calls the function pointed to by cyw43_poll. Elsewhere in the code (we'll come to it), cyw43_poll is set to point to cyw43_poll_func(), copied below:

static void cyw43_poll_func(void) {
    CYW43_THREAD_LOCK_CHECK;

    if (cyw43_poll == NULL) {
        // Poll scheduled during deinit, just ignore it
        return;
    }

    CYW43_STAT_INC(CYW43_RUN_COUNT);

    cyw43_t *self = &cyw43_state;

    #if CYW43_ENABLE_BLUETOOTH
    if (self->bt_loaded && cyw43_ll_bt_has_work(&self->cyw43_ll)) {
        cyw43_bluetooth_hci_process();
    }
    #endif

    if (cyw43_ll_has_work(&self->cyw43_ll)) {
        cyw43_ll_process_packets(&self->cyw43_ll);
    }

    if (self->pend_disassoc) {
        self->pend_disassoc = false;
        cyw43_ll_ioctl(&self->cyw43_ll, CYW43_IOCTL_SET_DISASSOC, 0, NULL, CYW43_ITF_STA);
    }

    if (self->pend_rejoin_wpa) {
        self->pend_rejoin_wpa = false;
        cyw43_ll_wifi_set_wpa_auth(&self->cyw43_ll);
    }

    if (self->pend_rejoin) {
        self->pend_rejoin = false;
        cyw43_ll_wifi_rejoin(&self->cyw43_ll);
        self->wifi_join_state = WIFI_JOIN_STATE_ACTIVE;
    }

    if (cyw43_sleep == 0) {
        cyw43_ll_bus_sleep(&self->cyw43_ll, true);
        #if !USE_SDIOIT && !CYW43_USE_SPI
        cyw43_sdio_deinit(); // save power while WLAN bus sleeps
        #endif
    }

    #if USE_SDIOIT
    cyw43_sdio_set_irq(true);
    #endif

    #ifdef CYW43_POST_POLL_HOOK
    CYW43_POST_POLL_HOOK
    #endif

}

In the case that we're communicating with Bluetooth, this checks that Bluetooth has been loaded and that there's work to be done, then calls cyw43_bluetooth_hci_process():

void cyw43_bluetooth_hci_process(void) {
    if (hci_transport_ready) {
        btstack_run_loop_poll_data_sources_from_irq();
    }
}

This function calls btstack_run_loop_poll_data_sources_from_irq(), which does the following:

void btstack_run_loop_poll_data_sources_from_irq(void){
    btstack_assert(the_run_loop != NULL);
    btstack_assert(the_run_loop->poll_data_sources_from_irq != NULL);
    the_run_loop->poll_data_sources_from_irq();
}

As we'll see, this is the trigger to loop through our data sources. So, we've built a mechanism by which the CYW43 can make the RP2040 run through its data sources to retrieve data by means of a GPIO interrupt that sets an async_when_pending worker as pending. To fill in the details about how this works, we need to talk about Bluetooth, which is the next line in cyw43_arch_init().

Note that the final macro in cyw43_poll_func, CYW43_POST_POLL_HOOK, is what re-enables the GPIO interrupt.
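To make the mask/defer/unmask pattern concrete, here is a minimal host-side sketch of it. Nothing in it is SDK code: every name prefixed with sim_ is a hypothetical stand-in, and the "hardware" is just a handful of flags. The point it tries to show is why the handler disables the interrupt: with a level-triggered interrupt, the handler would otherwise fire again and again until the device has been serviced.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-ins for the hardware and driver state (not SDK names).
static bool line_high;        // level of the host-wake pin
static bool irq_enabled;      // whether the GPIO interrupt is unmasked
static bool work_pending;     // the "pending" flag of the poll worker
static int  packets_waiting;  // work the simulated CYW43 has queued
static int  irq_count;        // how many times the handler actually ran

// Models cyw43_gpio_irq_handler: mask the interrupt and defer the work.
static void sim_gpio_irq_handler(void) {
    irq_count++;
    irq_enabled = false;
    work_pending = true;
}

// Models the interrupt controller: a level interrupt fires whenever the
// line is high and the interrupt is unmasked.
static void sim_hw_step(void) {
    if (line_high && irq_enabled) {
        sim_gpio_irq_handler();
    }
}

// Models the poll worker: service the device, then unmask the interrupt
// (the role played by CYW43_POST_POLL_HOOK).
static void sim_poll_worker(void) {
    if (!work_pending) return;
    work_pending = false;
    packets_waiting = 0;   // servicing the device...
    line_high = false;     // ...lets it drop the line
    irq_enabled = true;    // re-enable only after servicing
}

// Runs one full cycle and returns how many times the handler fired.
int sim_run_once(int packets) {
    irq_enabled = true;
    work_pending = false;
    irq_count = 0;
    packets_waiting = packets;
    line_high = true;                 // the device raises the line
    for (int i = 0; i < 5; i++) {     // the controller keeps re-checking the line
        sim_hw_step();
    }
    sim_poll_worker();
    sim_hw_step();                    // line is low now, so nothing fires
    assert(irq_enabled);
    return irq_count;
}
```

Calling sim_run_once(3) returns 1: the handler runs once, even though the line stays high for several checks, because it masks itself until the poll worker has finished.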


Pausing again to reflect

So at this point, we have a single object of type async_context_threadsafe_background. The core field of this object is an async_context. These async_context objects contain two lists of pointers to functions called when_pending_workers and at_time_workers. Using the functions pointed to by the type field of the core (which is an async_type object), we can add/remove workers from these lists, mark them as pending, etc.

Under the hood, marking a worker as pending for a particular async_context actually means setting a particular low-level interrupt as pending. The interrupt service routine associated with this interrupt calls a function which goes through each worker. In the event that a worker is ready to run, it runs the function pointed to by that worker's do_work field. The highest-level async_context_threadsafe_background object also has an associated alarm pool. The callback function associated with that alarm leads ultimately back to the same low-level interrupt service routine.

Right now, we have just one when_pending_worker in our async_context. This worker is called cyw43_poll_worker, and its do_work field points to cyw43_do_poll(), which in turn calls cyw43_poll_func() (both listed above). That function ends up calling btstack_run_loop_poll_data_sources_from_irq(), about which we'll learn more shortly. The worker gets marked as pending in a GPIO interrupt. So, this provides a mechanism by which the CYW43 can tell the RP2040 to run btstack_run_loop_poll_data_sources_from_irq() which, as we're going to learn, goes and gets data from the device.


Here comes Bluetooth

We are still dissecting cyw43_arch_init(), but now we're past the first few lines. The next thing that happens, in the example that we're considering, is that ok &= btstack_cyw43_init(context) gets called. This function takes context, a pointer to an async_context object, as an argument. It returns a boolean to communicate success or failure. Let's look more closely at this function.

bool btstack_cyw43_init(async_context_t *context) {
    // Initialise bluetooth
    btstack_memory_init();
    btstack_run_loop_init(btstack_run_loop_async_context_get_instance(context));

#if WANT_HCI_DUMP
#ifdef ENABLE_SEGGER_RTT
    hci_dump_init(hci_dump_segger_rtt_stdout_get_instance());
#else
    hci_dump_init(hci_dump_embedded_stdout_get_instance());
#endif
#endif

    hci_init(hci_transport_cyw43_instance(), NULL);

    // setup TLV storage
    setup_tlv();
    return true;
}

The first line of this function sets aside the amount of memory that we'll require for BTstack, per the configuration in btstack_config.h. The next line of this function is really interesting. We have nested function calls here, so let's start in the interior and work out.

We are calling btstack_run_loop_init() with the value returned from btstack_run_loop_async_context_get_instance(context). What is btstack_run_loop_async_context_get_instance(context) doing?

const btstack_run_loop_t *btstack_run_loop_async_context_get_instance(async_context_t *async_context)
{
    assert(!btstack_async_context || btstack_async_context == async_context);
    btstack_async_context = async_context;
    return &btstack_run_loop_async_context;
}

This function takes our local pointer to an async_context object called context as an argument. Remember though that we set the value of this local pointer such that it points to the core field of the global cyw43_async_context_threadsafe_background, and we configured this global object to its default configurations. So, we're passing a pointer to the core field of this global object into the function.

The function then sets btstack_async_context = async_context;. The right side of this expression points to the core field of that global async_context_threadsafe_background_t, and the left side is a global pointer to an async_context_t declared in the same file as the above function. This is an important link! The pointer to an async_context_t called btstack_async_context now points to the same place as the locally declared context variable that we passed as an argument to btstack_run_loop_async_context_get_instance(), and this local variable points to the core field of the globally declared async_context_threadsafe_background_t called cyw43_async_context_threadsafe_background. The other functions make reference to btstack_async_context, but now in doing so they will be manipulating the async_context_t object which is the core field of cyw43_async_context_threadsafe_background.

Note what this function returns! It returns the address of a global variable (created in this file) of type btstack_run_loop_t called btstack_run_loop_async_context. This data structure is copied below:

static const btstack_run_loop_t btstack_run_loop_async_context = {
    &btstack_run_loop_async_context_init,
    &btstack_run_loop_async_context_add_data_source,
    &btstack_run_loop_async_context_remove_data_source,
    &btstack_run_loop_async_context_enable_data_source_callbacks,
    &btstack_run_loop_async_context_disable_data_source_callbacks,
    &btstack_run_loop_async_context_set_timer,
    &btstack_run_loop_async_context_add_timer,
    &btstack_run_loop_async_context_remove_timer,
    &btstack_run_loop_async_context_execute,
    &btstack_run_loop_async_context_dump_timer,
    &btstack_run_loop_async_context_get_time_ms,
    &btstack_run_loop_async_context_poll_data_sources_from_irq,
    &btstack_run_loop_async_context_execute_on_main_thread,
    &btstack_run_loop_async_context_trigger_exit,
};

Note that each of the fields of btstack_run_loop_async_context points to one of the other functions declared in this file. These functions use the async infrastructure to add/remove objects of type btstack_data_source_t and btstack_timer_source_t (under the hood, these look just like the workers already discussed) to the linked lists which are referenced by the higher-level BTstack run loop.
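If the "table of function pointers" idea is new, the following sketch may help. It is a cut-down, hypothetical analogue of the arrangement, not BTstack code: sim_run_loop_t has only three fields, the port_ functions stand in for an RP2040-specific implementation, and the sim_ functions stand in for generic code that only ever calls through a pointer to the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// A cut-down, hypothetical analogue of btstack_run_loop_t.
typedef struct {
    void     (*init)(void);
    uint32_t (*get_time_ms)(void);
    void     (*poll_data_sources_from_irq)(void);
} sim_run_loop_t;

// ---- "port" side: a device-specific implementation of the table ----
static int      port_init_calls;
static int      port_poll_requests;
static uint32_t port_fake_time_ms = 1234;

static void     port_init(void)          { port_init_calls++; }
static uint32_t port_get_time_ms(void)   { return port_fake_time_ms; }
static void     port_poll_from_irq(void) { port_poll_requests++; }

static const sim_run_loop_t port_run_loop = {
    .init                       = port_init,
    .get_time_ms                = port_get_time_ms,
    .poll_data_sources_from_irq = port_poll_from_irq,
};

const sim_run_loop_t *port_get_instance(void) {
    return &port_run_loop;
}

// ---- "generic" side: knows nothing about the port except the table ----
static const sim_run_loop_t *sim_the_run_loop;

void sim_run_loop_init(const sim_run_loop_t *run_loop) {
    assert(run_loop != NULL);
    sim_the_run_loop = run_loop;   // attach generic code to the port
    sim_the_run_loop->init();
}

uint32_t sim_run_loop_get_time_ms(void) {
    assert(sim_the_run_loop != NULL);
    return sim_the_run_loop->get_time_ms();
}

void sim_run_loop_poll_data_sources_from_irq(void) {
    assert(sim_the_run_loop != NULL);
    assert(sim_the_run_loop->poll_data_sources_from_irq != NULL);
    sim_the_run_loop->poll_data_sources_from_irq();
}
```

After sim_run_loop_init(port_get_instance()), every call on the generic side lands in a port_ function. Swapping in a different port means passing a different table; the generic functions don't change.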

Popping back up to btstack_cyw43_init(async_context_t *context), the address of btstack_run_loop_async_context is then passed as an argument to btstack_run_loop_init(). Let's look at this function:

void btstack_run_loop_init(const btstack_run_loop_t * run_loop){
    btstack_assert(the_run_loop == NULL);
    the_run_loop = run_loop;
    the_run_loop->init();
}

This function takes a pointer to a btstack_run_loop_t object (we point to btstack_run_loop_async_context), and it sets the value of the variable the_run_loop to the value of this pointer. the_run_loop is a globally declared pointer to a btstack_run_loop_t, and the device-agnostic BTstack code uses this pointer. So, we've now attached this device-agnostic code to our device-specific RP2040 code!

This function then calls the init field of the_run_loop. Because the_run_loop now points to btstack_run_loop_async_context, this is actually calling the init field of this object. We see that the init field of this object points to the function btstack_run_loop_async_context_init. Let's look at that function.

static void btstack_run_loop_async_context_init(void) {
    btstack_run_loop_base_init();
    btstack_timeout_worker.do_work = btstack_timeout_reached;
    btstack_processing_worker.do_work = btstack_work_pending;
    async_context_add_when_pending_worker(btstack_async_context, &btstack_processing_worker);
}

This function first calls btstack_run_loop_base_init(), which does the following:

// private data (access only by run loop implementations)
btstack_linked_list_t  btstack_run_loop_base_timers;
btstack_linked_list_t  btstack_run_loop_base_data_sources;
btstack_linked_list_t  btstack_run_loop_base_callbacks;

void btstack_run_loop_base_init(void){
    btstack_run_loop_base_timers = NULL;
    btstack_run_loop_base_data_sources = NULL;
    btstack_run_loop_base_callbacks = NULL;
}

You can see that this simply clears three linked lists: one containing timers, another containing data sources, and a third containing callbacks. The next line of btstack_run_loop_async_context_init is btstack_timeout_worker.do_work = btstack_timeout_reached;. This sets the do_work field of the btstack_timeout_worker object (globally declared, of type async_at_time_worker_t) to point to a function called btstack_timeout_reached. Here's that function:

static void btstack_timeout_reached(__unused async_context_t *context, __unused async_at_time_worker_t *worker) {
    // simply wakeup worker
    async_context_set_work_pending(btstack_async_context, &btstack_processing_worker);
}

As you can see, this simply wakes up another worker, which we'll look at now. The next line of btstack_run_loop_async_context_init is btstack_processing_worker.do_work = btstack_work_pending;. This is setting the do_work field of an async_when_pending_worker_t object to point to the function btstack_work_pending. Here's what that function looks like:

static void btstack_work_pending(__unused async_context_t *context, __unused async_when_pending_worker_t *worker) {
    // poll data sources
    btstack_run_loop_base_poll_data_sources();

    // execute callbacks
    btstack_run_loop_base_execute_callbacks();

    uint32_t now = to_ms_since_boot(get_absolute_time());

    // process timers
    btstack_run_loop_base_process_timers(now);
    now = to_ms_since_boot(get_absolute_time());
    int ms = btstack_run_loop_base_get_time_until_timeout(now);
    if (ms == -1) {
        async_context_remove_at_time_worker(btstack_async_context, &btstack_timeout_worker);
    } else {
        async_context_add_at_time_worker_in_ms(btstack_async_context, &btstack_timeout_worker, ms);
    }
}

Here is where we go get the data from the data sources! The first line calls btstack_run_loop_base_poll_data_sources();. Here's what that does:

void btstack_run_loop_base_poll_data_sources(void){
    // poll data sources
    btstack_data_source_t *ds;
    btstack_data_source_t *next;
    for (ds = (btstack_data_source_t *) btstack_run_loop_base_data_sources; ds != NULL ; ds = next){
        next = (btstack_data_source_t *) ds->item.next; // cache pointer to next data_source to allow data source to remove itself
        if (ds->flags & DATA_SOURCE_CALLBACK_POLL){
            ds->process(ds, DATA_SOURCE_CALLBACK_POLL);
        }
    }
}

It goes through every data source in the btstack_run_loop_base_data_sources linked list, checks whether that data source is labeled as DATA_SOURCE_CALLBACK_POLL and, if so, executes the function pointed to by that data source's process field. As we'll see, it's this process field which implements the low-level data transaction with (in our case) the Bluetooth module. After having executed all necessary process functions, the btstack_work_pending worker then calls btstack_run_loop_base_execute_callbacks();. Here's that function:

void btstack_run_loop_base_execute_callbacks(void){
    while (1){
        btstack_context_callback_registration_t * callback_registration = (btstack_context_callback_registration_t *) btstack_linked_list_pop(&btstack_run_loop_base_callbacks);
        if (callback_registration == NULL){
            break;
        }
        (*callback_registration->callback)(callback_registration->context);
    }
}

It pops each callback registration off of btstack_run_loop_base_callbacks until the list is empty. For each registration, it calls the registered function with the user argument stored in the context field of the btstack_context_callback_registration_t object.

Next, this worker calls uint32_t now = to_ms_since_boot(get_absolute_time()); to look up the current time since boot and store it in the local variable now. This is passed as an argument to btstack_run_loop_base_process_timers(now);, which does the following:

void btstack_run_loop_base_process_timers(uint32_t now){
    // process timers, exit when timeout is in the future
    while (btstack_run_loop_base_timers) {
        btstack_timer_source_t * timer = (btstack_timer_source_t *) btstack_run_loop_base_timers;
        int32_t delta = btstack_time_delta(timer->timeout, now);
        if (delta > 0) break;
        btstack_run_loop_base_remove_timer(timer);
        timer->process(timer);
    }
}

This goes through btstack_run_loop_base_timers. It checks the current time since boot against the time at which the callback for the first timer in the list should run. In the event that we haven't reached that time yet, the loop just breaks. Otherwise, we remove the timer from btstack_run_loop_base_timers and execute the function pointed to by the process field of the btstack_timer_source_t object. btstack_work_pending then looks up the time since boot again, and uses that time to call int ms = btstack_run_loop_base_get_time_until_timeout(now);. This function returns the time until the next timeout. In the final conditional of this function, pasted below, we either remove the btstack_timeout_worker from the btstack_async_context in the event that there are no more timers, or alternatively we update the timeout time for the at_time_worker:

    if (ms == -1) {
        async_context_remove_at_time_worker(btstack_async_context, &btstack_timeout_worker);
    } else {
        async_context_add_at_time_worker_in_ms(btstack_async_context, &btstack_timeout_worker, ms);
    }
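The timer handling above can be sketched on a host machine as follows. This is a simplified model with hypothetical sim_ names, not the BTstack source. Two things in it are assumptions rather than facts taken from the listings above: that the timer list is kept sorted with the earliest timeout first (which is what makes breaking at the first timer that isn't due a valid shortcut), and that an overdue timer reports 0 ms remaining. The signed subtraction in sim_time_delta is there so that comparisons keep working when the 32-bit millisecond counter wraps around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct sim_timer {
    struct sim_timer *next;
    uint32_t timeout;                        // absolute time, in ms
    void (*process)(struct sim_timer *ts);
} sim_timer_t;

static sim_timer_t *sim_timers;   // assumed sorted, earliest timeout first

// Signed difference a - b. Relies on the usual two's-complement conversion,
// so it stays meaningful across wraparound while a and b are < 2^31 ms apart.
static int32_t sim_time_delta(uint32_t a, uint32_t b) {
    return (int32_t)(a - b);
}

// Insert in timeout order.
static void sim_add_timer(sim_timer_t *ts) {
    sim_timer_t **it = &sim_timers;
    while (*it != NULL && sim_time_delta(ts->timeout, (*it)->timeout) >= 0) {
        it = &(*it)->next;
    }
    ts->next = *it;
    *it = ts;
}

// Run every timer that is due; stop at the first one still in the future.
static void sim_process_timers(uint32_t now) {
    while (sim_timers != NULL) {
        sim_timer_t *ts = sim_timers;
        if (sim_time_delta(ts->timeout, now) > 0) break;
        sim_timers = ts->next;   // unlink first, so process() may re-add it
        ts->process(ts);
    }
}

// ms until the earliest timer, 0 if it is overdue, -1 if there are no timers.
static int32_t sim_time_until_timeout(uint32_t now) {
    if (sim_timers == NULL) return -1;
    int32_t delta = sim_time_delta(sim_timers->timeout, now);
    return delta < 0 ? 0 : delta;
}

static int sim_fired;
static void sim_count_fire(sim_timer_t *ts) { (void)ts; sim_fired++; }

// Two timers straddling the counter wrap; returns ms until the second one.
int32_t demo_timers(void) {
    static sim_timer_t a, b;
    sim_timers = NULL;
    sim_fired = 0;
    a.timeout = 0xFFFFFFF0u; a.process = sim_count_fire;  // just before the wrap
    b.timeout = 0x00000010u; b.process = sim_count_fire;  // just after the wrap
    sim_add_timer(&b);
    sim_add_timer(&a);
    sim_process_timers(0xFFFFFFF8u);   // only a is due
    return sim_time_until_timeout(0xFFFFFFF8u);
}
```

demo_timers() fires the first timer and returns 24, the number of milliseconds between 0xFFFFFFF8 and 0x00000010 once the counter has wrapped. A value like that is what gets handed to the at_time_worker as its next wake-up delay.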

Let's think about this. We've created/initialized a few things:

  • btstack_async_context: We've pointed this to async_context, which points to the core field of cyw43_async_context_threadsafe_background.
    • btstack_processing_worker is an async_when_pending_worker for this context. The do_work field of this worker points to btstack_work_pending, which polls the data sources, executes callbacks, processes timers, and updates the at_time_worker.
    • btstack_timeout_worker is an async_at_time_worker for this context. The do_work field of this worker points to btstack_timeout_reached, which in turn wakes up the btstack_processing_worker so that we end up back in btstack_work_pending.
  • btstack_run_loop_base_timers, btstack_run_loop_base_data_sources, and btstack_run_loop_base_callbacks are three linked lists which are manipulated/interacted with by btstack_work_pending.

We haven't yet added the Bluetooth module as a data source. That happens next.


More reflecting

Last time we paused to think, we had just one when_pending_worker in our async_context. This worker was called cyw43_poll_worker, and its do_work field points to cyw43_do_poll(), which calls cyw43_poll_func(), listed above. That function ends up calling btstack_run_loop_poll_data_sources_from_irq().

Now we've added another when_pending_worker to this async_context! This one is called btstack_processing_worker. The do_work field of this worker points to btstack_work_pending().

As we saw, this function goes through the linked list of data sources (btstack_run_loop_base_data_sources) and executes the function pointed to by the process field of each of those data sources (we haven't added any data sources yet). It then goes through and runs any registered callbacks in the linked list of callback functions (btstack_run_loop_base_callbacks), and then it runs the process function of any timers that are ready to run (in btstack_run_loop_base_timers). In the event that there are no more timers, it removes the btstack_timeout_worker from our async_context's list of at_time_workers. Otherwise, it updates the time for this at_time_worker to match the next item in btstack_run_loop_base_timers. Note that all the do_work function of this at_time_worker does is mark btstack_processing_worker as pending.

Remember how the do_work field of the cyw43_poll_worker ended up calling btstack_run_loop_poll_data_sources_from_irq()? Well, this ends up calling the function below:

static void btstack_run_loop_async_context_poll_data_sources_from_irq(void)
{
    async_context_set_work_pending(btstack_async_context, &btstack_processing_worker);
}

As you can see, all this function does is mark btstack_processing_worker as pending. So, all roads lead to the do_work function of this worker, which is btstack_work_pending(). Let's now add a data source.
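Here is a small host-side model of the two-worker arrangement we've just described. It is a sketch under simplifying assumptions, not the pico-sdk implementation: the sim_ types are hypothetical, there are no locks or real interrupts, and the low-priority interrupt is reduced to a flag that sim_service() drains. What it tries to show is the chaining: marking the poll worker as pending eventually causes the processing worker to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical, simplified analogue of an async_when_pending_worker_t.
typedef struct sim_worker {
    struct sim_worker *next;
    void (*do_work)(struct sim_worker *worker);
    bool work_pending;
} sim_worker_t;

// Hypothetical, simplified analogue of an async_context_t.
typedef struct {
    sim_worker_t *when_pending_list;
    bool irq_pending;   // stands in for the low-priority interrupt
} sim_context_t;

static void sim_add_when_pending_worker(sim_context_t *ctx, sim_worker_t *w) {
    w->work_pending = false;
    w->next = ctx->when_pending_list;
    ctx->when_pending_list = w;
}

// Mark a worker as pending and "raise" the low-priority interrupt.
static void sim_set_work_pending(sim_context_t *ctx, sim_worker_t *w) {
    w->work_pending = true;
    ctx->irq_pending = true;
}

// Body of the low-priority interrupt: run every pending worker, and go
// around again if a worker marked more work as pending while it ran.
static int sim_service(sim_context_t *ctx) {
    int ran = 0;
    while (ctx->irq_pending) {
        ctx->irq_pending = false;
        for (sim_worker_t *w = ctx->when_pending_list; w != NULL; w = w->next) {
            if (w->work_pending) {
                w->work_pending = false;
                w->do_work(w);
                ran++;
            }
        }
    }
    return ran;
}

static sim_context_t demo_ctx;
static sim_worker_t  demo_poll_worker;        // plays cyw43_poll_worker
static sim_worker_t  demo_processing_worker;  // plays btstack_processing_worker
static int           demo_processing_runs;

// The poll worker's only visible effect here: wake the processing worker.
static void demo_poll_work(sim_worker_t *w) {
    (void)w;
    sim_set_work_pending(&demo_ctx, &demo_processing_worker);
}

static void demo_processing_work(sim_worker_t *w) {
    (void)w;
    demo_processing_runs++;
}

// Simulates one GPIO interrupt; returns how many do_work calls resulted.
int demo_when_pending(void) {
    demo_ctx.when_pending_list = NULL;
    demo_ctx.irq_pending = false;
    demo_processing_runs = 0;
    demo_poll_worker.do_work = demo_poll_work;
    demo_processing_worker.do_work = demo_processing_work;
    sim_add_when_pending_worker(&demo_ctx, &demo_poll_worker);
    sim_add_when_pending_worker(&demo_ctx, &demo_processing_worker);
    sim_set_work_pending(&demo_ctx, &demo_poll_worker);  // the "GPIO interrupt"
    return sim_service(&demo_ctx);
}
```

demo_when_pending() returns 2: one run of the poll worker, followed by one run of the processing worker that it woke up.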


Bluetooth as a data source

Let's pop back up to btstack_cyw43_init(async_context_t *context). That function is copied again below:

bool btstack_cyw43_init(async_context_t *context) {
    // Initialise bluetooth
    btstack_memory_init();
    btstack_run_loop_init(btstack_run_loop_async_context_get_instance(context));

#if WANT_HCI_DUMP
#ifdef ENABLE_SEGGER_RTT
    hci_dump_init(hci_dump_segger_rtt_stdout_get_instance());
#else
    hci_dump_init(hci_dump_embedded_stdout_get_instance());
#endif
#endif

    hci_init(hci_transport_cyw43_instance(), NULL);

    // setup TLV storage
    setup_tlv();
    return true;
}

The next line that executes (assuming we don't want an HCI dump) is hci_init(hci_transport_cyw43_instance(), NULL);. Let's take a look at that inner function, hci_transport_cyw43_instance():

// configure and return hci transport singleton
static const hci_transport_t hci_transport_cyw43 = {
        /* const char * name; */                                        "CYW43",
        /* void   (*init) (const void *transport_config); */            &hci_transport_cyw43_init,
        /* int    (*open)(void); */                                     &hci_transport_cyw43_open,
        /* int    (*close)(void); */                                    &hci_transport_cyw43_close,
        /* void   (*register_packet_handler)(void (*handler)(...); */   &hci_transport_cyw43_register_packet_handler,
        /* int    (*can_send_packet_now)(uint8_t packet_type); */       &hci_transport_cyw43_can_send_now,
        /* int    (*send_packet)(...); */                               &hci_transport_cyw43_send_packet,
        /* int    (*set_baudrate)(uint32_t baudrate); */                NULL,
        /* void   (*reset_link)(void); */                               NULL,
        /* void   (*set_sco_config)(uint16_t voice_setting, int num_connections); */ NULL,
};

const hci_transport_t *hci_transport_cyw43_instance(void) {
    return &hci_transport_cyw43;
}

This function returns a pointer to the globally declared hci_transport_cyw43 object, which is of type hci_transport_t. This object points to a collection of functions which implement the RP2040-specific logic for doing transport between microcontroller and CYW43. We'll look at a few of these in more detail, but let's remain at this level of abstraction for the moment.

hci_init() takes a pointer to the above object (and NULL) as arguments. Let's see what it then does with those arguments:

void hci_init(const hci_transport_t *transport, const void *config){

#ifdef HAVE_MALLOC
    if (!hci_stack) {
        hci_stack = (hci_stack_t*) malloc(sizeof(hci_stack_t));
    }
#else
    hci_stack = &hci_stack_static;
#endif
    memset(hci_stack, 0, sizeof(hci_stack_t));

    // reference to use transport layer implementation
    hci_stack->hci_transport = transport;

    // reference to used config
    hci_stack->config = config;

    // setup pointer for outgoing packet buffer
    hci_stack->hci_packet_buffer = &hci_stack->hci_packet_buffer_data[HCI_OUTGOING_PRE_BUFFER_SIZE];

    // max acl payload size defined in config.h
    hci_stack->acl_data_packet_length = HCI_ACL_PAYLOAD_SIZE;

    // register packet handlers with transport
    transport->register_packet_handler(&packet_handler);

    hci_stack->state = HCI_STATE_OFF;

    // class of device
    hci_stack->class_of_device = 0x007a020c; // Smartphone 

    // bondable by default
    hci_stack->bondable = 1;

#ifdef ENABLE_CLASSIC
    // classic name
    hci_stack->local_name = default_classic_name;

    // Master slave policy
    hci_stack->master_slave_policy = 1;

    // Allow Role Switch
    hci_stack->allow_role_switch = 1;

    // Default / minimum security level = 2
    hci_stack->gap_security_level = LEVEL_2;

    // Default Security Mode 4
    hci_stack->gap_security_mode = GAP_SECURITY_MODE_4;

    // Errata-11838 mandates 7 bytes for GAP Security Level 1-3
    hci_stack->gap_required_encyrption_key_size = 7;

    // Link Supervision Timeout
    hci_stack->link_supervision_timeout = HCI_LINK_SUPERVISION_TIMEOUT_DEFAULT;

#endif

    // Secure Simple Pairing default: enable, no I/O capabilities, general bonding, mitm not required, auto accept 
    hci_stack->ssp_enable = 1;
    hci_stack->ssp_io_capability = SSP_IO_CAPABILITY_NO_INPUT_NO_OUTPUT;
    hci_stack->ssp_authentication_requirement = SSP_IO_AUTHREQ_MITM_PROTECTION_NOT_REQUIRED_GENERAL_BONDING;
    hci_stack->ssp_auto_accept = 1;

    // Secure Connections: enable (requires support from Controller)
    hci_stack->secure_connections_enable = true;

    // voice setting - signed 16 bit pcm data with CVSD over the air
    hci_stack->sco_voice_setting = 0x60;

#ifdef ENABLE_BLE
    hci_stack->le_connection_scan_interval = 0x0060;   //    60 ms
    hci_stack->le_connection_scan_window   = 0x0030;    //   30 ms
    hci_stack->le_connection_interval_min  = 0x0008;    //   10 ms
    hci_stack->le_connection_interval_max  = 0x0018;    //   30 ms
    hci_stack->le_connection_latency       =      4;    //    4
    hci_stack->le_supervision_timeout      = 0x0048;    //  720 ms
    hci_stack->le_minimum_ce_length        =      0;    //    0 ms
    hci_stack->le_maximum_ce_length        =      0;    //    0 ms
#endif

#ifdef ENABLE_LE_CENTRAL
    hci_stack->le_connection_phys          =   0x01;    // LE 1M PHY

    // default LE Scanning
    hci_stack->le_scan_type     =  0x01; // active
    hci_stack->le_scan_interval = 0x1e0; // 300 ms
    hci_stack->le_scan_window   =  0x30; //  30 ms
    hci_stack->le_scan_phys     =  0x01; // LE 1M PHY
#endif

#ifdef ENABLE_LE_PERIPHERAL
    hci_stack->le_max_number_peripheral_connections = 1; // only single connection as peripheral

    // default advertising parameters from Core v5.4 -- needed to use random address without prior adv setup
    hci_stack->le_advertisements_interval_min =                         0x0800;
    hci_stack->le_advertisements_interval_max =                         0x0800;
    hci_stack->le_advertisements_type =                                      0;
    hci_stack->le_own_addr_type =                       BD_ADDR_TYPE_LE_PUBLIC;
    hci_stack->le_advertisements_direct_address_type =  BD_ADDR_TYPE_LE_PUBLIC;
    hci_stack->le_advertisements_channel_map =                            0x07;
    hci_stack->le_advertisements_filter_policy =                             0;
#endif

    // connection parameter range used to answer connection parameter update requests in l2cap
    hci_stack->le_connection_parameter_range.le_conn_interval_min =          6; 
    hci_stack->le_connection_parameter_range.le_conn_interval_max =       3200;
    hci_stack->le_connection_parameter_range.le_conn_latency_min =           0;
    hci_stack->le_connection_parameter_range.le_conn_latency_max =         500;
    hci_stack->le_connection_parameter_range.le_supervision_timeout_min =   10;
    hci_stack->le_connection_parameter_range.le_supervision_timeout_max = 3200;

#ifdef ENABLE_LE_ISOCHRONOUS_STREAMS
    hci_stack->iso_packets_to_queue = 1;
#endif

#ifdef ENABLE_LE_PRIVACY_ADDRESS_RESOLUTION
    hci_stack->le_privacy_mode = LE_PRIVACY_MODE_DEVICE;
#endif

    hci_state_reset();
}

This is a lengthy function, but all that it's doing is setting the various fields of a globally declared object called hci_stack, which is of type hci_stack_t. We won't dissect each of these fields, but I'd like to point out a couple in particular.

Note that one of the fields is called hci_transport, which is a pointer to an hci_transport_t. We set the value of this field such that it points to our RP2040-specific hci_transport_cyw43, the pointer to which we passed in as an argument. The next notable line is:

// register packet handlers with transport
transport->register_packet_handler(&packet_handler);

In this line, we call the register_packet_handler field of transport (which points to hci_transport_cyw43) with a pointer to a function called packet_handler. Let's look at what that register_packet_handler function does. It points to the following:

static void hci_transport_cyw43_register_packet_handler(void (*handler)(uint8_t packet_type, uint8_t *packet, uint16_t size)) {
    hci_transport_cyw43_packet_handler = handler;
}

This function takes a pointer to a function, and sets a globally declared pointer to a function called hci_transport_cyw43_packet_handler such that it points to the same function. This is important, because some of the other functions pointed to by the fields of hci_transport_cyw43 call the function pointed to by hci_transport_cyw43_packet_handler. Now, when they do, they'll call the function packet_handler. This function is defined in the same file. It's extremely long, and for that reason it's not copied here, but it is a very large switch/case statement which implements the Bluetooth stack. It looks at the type of HCI_EVENT_PACKET received and takes action accordingly. Many of those actions involve calling the functions pointed to by the transport field of the HCI stack, which implement the low-level and RP2040-specific communication with the external CYW43.

One of the things that is important to note about packet_handler, however, is that it calls hci_emit_event(packet, size, 0);. This function is copied below.

static void hci_emit_event(uint8_t * event, uint16_t size, int dump){
    // dump packet
    if (dump) {
        hci_dump_packet( HCI_EVENT_PACKET, 1, event, size);
    } 

    // dispatch to all event handlers
    btstack_linked_list_iterator_t it;
    btstack_linked_list_iterator_init(&it, &hci_stack->event_handlers);
    while (btstack_linked_list_iterator_has_next(&it)){
        btstack_packet_callback_registration_t * entry = (btstack_packet_callback_registration_t*) btstack_linked_list_iterator_next(&it);
        entry->callback(HCI_EVENT_PACKET, 0, event, size);
    }
}

As you can see, this function goes through the linked list of pointers to functions in the hci_stack field event_handlers and runs their associated callback functions. This is how we notify the upper stack of the availability of an HCI packet.
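The dispatch pattern in hci_emit_event can be reduced to a few lines. In the sketch below, all sim_ and demo_ names are hypothetical and the linked list is hand-rolled rather than BTstack's btstack_linked_list_t. The value 0x04 for an event packet is the standard HCI packet-type indicator; the event bytes themselves are arbitrary test data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_EVENT_PACKET 0x04   // standard HCI packet-type indicator for events

typedef void (*sim_packet_handler_t)(uint8_t packet_type, uint16_t channel,
                                     uint8_t *packet, uint16_t size);

// Hypothetical analogue of btstack_packet_callback_registration_t.
typedef struct sim_registration {
    struct sim_registration *next;
    sim_packet_handler_t callback;
} sim_registration_t;

static sim_registration_t *sim_event_handlers;

// An upper layer registers interest in events by adding itself to the list.
static void sim_add_event_handler(sim_registration_t *reg) {
    assert(reg != NULL && reg->callback != NULL);
    reg->next = sim_event_handlers;
    sim_event_handlers = reg;
}

// Hand the same event to every registered handler, in list order.
static void sim_emit_event(uint8_t *event, uint16_t size) {
    for (sim_registration_t *r = sim_event_handlers; r != NULL; r = r->next) {
        r->callback(SIM_EVENT_PACKET, 0, event, size);
    }
}

static int     demo_calls;
static uint8_t demo_last_event_code;

// A listener that inspects the first byte of the event.
static void demo_listener_a(uint8_t type, uint16_t channel,
                            uint8_t *packet, uint16_t size) {
    (void)channel;
    if (type != SIM_EVENT_PACKET || size == 0) return;
    demo_calls++;
    demo_last_event_code = packet[0];
}

// A listener that only counts.
static void demo_listener_b(uint8_t type, uint16_t channel,
                            uint8_t *packet, uint16_t size) {
    (void)type; (void)channel; (void)packet; (void)size;
    demo_calls++;
}

// Emits one event to two listeners; returns the number of callbacks that ran.
int demo_emit(void) {
    static sim_registration_t reg_a = { NULL, demo_listener_a };
    static sim_registration_t reg_b = { NULL, demo_listener_b };
    static uint8_t event[] = { 0x0e, 0x01, 0x00 };   // arbitrary test bytes
    sim_event_handlers = NULL;
    demo_calls = 0;
    sim_add_event_handler(&reg_a);
    sim_add_event_handler(&reg_b);
    sim_emit_event(event, (uint16_t)sizeof(event));
    return demo_calls;
}
```

demo_emit() returns 2, since both listeners see the same event. The emitter never needs to know who is listening, which is what lets higher layers attach themselves later without the HCI layer changing.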

But we haven't yet added the hci_transport as a data source for our run loop! Where does this actually happen? Note that the hci_stack object above has a field called hci_stack->state which is initialized to HCI_STATE_OFF. One of the last lines of main() in the server example that we're considering is hci_power_control(HCI_POWER_ON);. This function checks hci_stack->state, finds that it is HCI_STATE_OFF, and so calls hci_power_control_state_off(HCI_POWER_ON). This function then notices that the argument is HCI_POWER_ON, and thus calls hci_power_control_on(). That function, among other things, calls hci_stack->hci_transport->open(). In other words, it calls the function pointed to by the open field of the hci_transport field of hci_stack, which points to hci_transport_cyw43. Here's the function that open points to:

static int hci_transport_cyw43_open(void) {
    int err = cyw43_bluetooth_hci_init();
    if (err != 0) {
        CYW43_PRINTF("Failed to open cyw43 hci controller: %d\n", err);
        return err;
    }

    // OTP should be set in which case BT gets an address of wifi mac + 1
    // If OTP is not set for some reason BT gets set to 43:43:A2:12:1F:AC.
    // So for safety, set the bluetooth device address here.
    bd_addr_t addr;
    cyw43_hal_get_mac(0, (uint8_t*)&addr);
    addr[BD_ADDR_LEN - 1]++;
    hci_set_chipset(btstack_chipset_cyw43_instance());
    hci_set_bd_addr(addr);

    btstack_run_loop_set_data_source_handler(&transport_data_source, &hci_transport_data_source_process);
    btstack_run_loop_enable_data_source_callbacks(&transport_data_source, DATA_SOURCE_CALLBACK_POLL);
    btstack_run_loop_add_data_source(&transport_data_source);
    hci_transport_ready = true;

    return 0;
}

At the bottom of this function, you'll see that we associate a data source handler with a global object of type btstack_data_source_t called transport_data_source. This data source handler points to a function called hci_transport_data_source_process, which we'll look at in a moment. The next line enables callbacks for this data source, and marks them as being of type DATA_SOURCE_CALLBACK_POLL. The line after that adds this data source to the linked list of data sources. This makes the HCI transport layer a data source for our btstack_run_loop.
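To tie this back to the polling loop we saw earlier, here is a host-side sketch of a list of data sources being polled. The sim_ names are hypothetical and the list is hand-rolled. It illustrates two details of that loop: only sources with the poll flag set get their process function called, and the next pointer is cached before calling process so that a data source may safely remove itself from the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_CALLBACK_POLL 0x01u

// Hypothetical, simplified analogue of btstack_data_source_t.
typedef struct sim_data_source {
    struct sim_data_source *next;
    uint16_t flags;
    void (*process)(struct sim_data_source *ds, uint16_t callback_type);
} sim_data_source_t;

static sim_data_source_t *sim_data_sources;

static void sim_add_data_source(sim_data_source_t *ds) {
    ds->next = sim_data_sources;
    sim_data_sources = ds;
}

static void sim_remove_data_source(sim_data_source_t *ds) {
    for (sim_data_source_t **it = &sim_data_sources; *it != NULL; it = &(*it)->next) {
        if (*it == ds) {
            *it = ds->next;
            ds->next = NULL;
            return;
        }
    }
}

void sim_poll_data_sources(void) {
    sim_data_source_t *next;
    for (sim_data_source_t *ds = sim_data_sources; ds != NULL; ds = next) {
        next = ds->next;   // cache first: process() may unlink ds
        if (ds->flags & SIM_CALLBACK_POLL) {
            ds->process(ds, SIM_CALLBACK_POLL);
        }
    }
}

static int demo_transport_polls, demo_one_shot_polls, demo_disabled_polls;

// Plays the role of the transport: polled every time.
static void demo_transport_process(sim_data_source_t *ds, uint16_t type) {
    (void)ds; (void)type;
    demo_transport_polls++;
}

// Removes itself the first time it is polled.
static void demo_one_shot_process(sim_data_source_t *ds, uint16_t type) {
    (void)type;
    demo_one_shot_polls++;
    sim_remove_data_source(ds);
}

// Never called, because its poll flag is not set.
static void demo_disabled_process(sim_data_source_t *ds, uint16_t type) {
    (void)ds; (void)type;
    demo_disabled_polls++;
}

// Polls twice; returns how many times the transport source was processed.
int demo_data_sources(void) {
    static sim_data_source_t transport, one_shot, disabled;
    sim_data_sources = NULL;
    demo_transport_polls = demo_one_shot_polls = demo_disabled_polls = 0;
    transport.process = demo_transport_process; transport.flags = SIM_CALLBACK_POLL;
    one_shot.process  = demo_one_shot_process;  one_shot.flags  = SIM_CALLBACK_POLL;
    disabled.process  = demo_disabled_process;  disabled.flags  = 0;
    sim_add_data_source(&transport);
    sim_add_data_source(&one_shot);
    sim_add_data_source(&disabled);
    sim_poll_data_sources();
    sim_poll_data_sources();
    return demo_transport_polls;
}
```

demo_data_sources() returns 2. The one-shot source sits ahead of the transport in the list and unlinks itself during the first pass; because the loop had already cached its next pointer, the transport still gets polled on that pass.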

Let's look more closely at that source handler, hci_transport_data_source_process:

static void hci_transport_data_source_process(btstack_data_source_t *ds, btstack_data_source_callback_type_t callback_type) {
    assert(callback_type == DATA_SOURCE_CALLBACK_POLL);
    assert(ds == &transport_data_source);
    (void)callback_type;
    (void)ds;
    hci_transport_cyw43_process();
}

This function makes a couple of sanity checks (with assert), then calls hci_transport_cyw43_process();. Let's look at that function:

// Called to perform bt work from a data source
static void hci_transport_cyw43_process(void) {
    CYW43_THREAD_LOCK_CHECK
    uint32_t len = 0;
    bool has_work;
    do {
        int err = cyw43_bluetooth_hci_read(hci_packet_with_pre_buffer, sizeof(hci_packet_with_pre_buffer), &len);
        BT_DEBUG("bt in len=%lu err=%d\n", len, err);
        if (err == 0 && len > 0) {
            hci_transport_cyw43_packet_handler(hci_packet_with_pre_buffer[3], hci_packet_with_pre_buffer + 4, len - 4);
            has_work = true;
        } else {
            has_work = false;
        }
    } while (has_work);
}

This function calls cyw43_bluetooth_hci_read, which executes an SPI transaction with the CYW43 and puts the received data into a buffer called hci_packet_with_pre_buffer. It then passes the packet type (byte 3 of the buffer) and the payload (everything from byte 4 onward) to hci_transport_cyw43_packet_handler, and repeats until a read fails or returns no data. But remember! Since hci_init() called transport->register_packet_handler(&packet_handler);, hci_transport_cyw43_packet_handler points to that extensive packet_handler() function that implements the Bluetooth stack. As we'll see, we can also add our own custom callbacks based on the contents of this packet, but we ultimately end up in packet_handler().
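The buffer indexing in that call is easy to misread, so here is a small sketch of the same split. The struct, the function name, and the length guard are my own hypothetical additions; the only thing taken from the code above is that byte 3 is used as the packet type and the payload starts at byte 4. The first three bytes are treated as opaque here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEADER_LEN 4u  /* bytes that precede the HCI payload, as in the call above */

/* Hypothetical view onto a received buffer (not a BTstack type). */
typedef struct {
    uint8_t packet_type;     /* taken from byte 3 of the buffer */
    const uint8_t *payload;  /* points at byte 4 of the buffer */
    uint32_t payload_len;    /* total length minus the 4 header bytes */
} sketch_hci_view_t;

/* Splits a received buffer the same way hci_transport_cyw43_process does.
   Returns 0 on success, -1 if the buffer is too short to hold the header.
   The length guard is an addition of this sketch. */
int sketch_split_hci_buffer(const uint8_t *buf, uint32_t len, sketch_hci_view_t *out) {
    if (buf == NULL || out == NULL || len < SKETCH_HEADER_LEN) {
        return -1;
    }
    out->packet_type = buf[3];
    out->payload = buf + SKETCH_HEADER_LEN;
    out->payload_len = len - SKETCH_HEADER_LEN;
    return 0;
}
```

For a 10-byte buffer whose byte 3 is 0x04 (the HCI event packet type), this yields packet_type 0x04 and a 6-byte payload starting at the fifth byte.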


Pausing again

Last time we paused, our async_context contained two when_pending_workers (cyw43_poll_worker and btstack_processing_worker). The do_work fields of both of these workers lead to btstack_work_pending. This function goes through the linked lists of data sources, callbacks, and timer objects and runs the function pointed to by each item's process field, if that item is ready to run. Previously, we didn't have any data sources, callbacks, or timers in any of those lists. Now we have a data source!

The data source that we've added is called transport_data_source, and the function pointed to by its process field is hci_transport_data_source_process. This function performs a low-level transaction with the CYW43 (an SPI transaction, as we'll see) and stores the received packet in a buffer called hci_packet_with_pre_buffer. It then uses that buffer as an argument to hci_transport_cyw43_packet_handler, which points to the packet_handler function in hci.c which implements the Bluetooth stack.

So now when either of our when_pending_workers gets marked as pending, we'll enter a low-priority interrupt service routine which executes the do_work field of whichever worker is marked as pending. The do_work field of each of our workers points ultimately to btstack_work_pending. This function will then go through each data source and run the function pointed to by its process field. We've now added the HCI transport layer as a data source, and its process function gathers a packet from the CYW43 over SPI, and then calls a function (packet_handler) which acts on that packet to implement the Bluetooth stack. One of the things that packet_handler does is call hci_emit_event, which then runs each of the callback functions in hci_stack->event_handlers.

As we'll see, we can also add our own custom callback functions to take actions based on the data that we find in these packets.
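Here is a rough sketch of how a list of event handlers like that can work. It is a self-contained model with hypothetical names (everything prefixed sketch_), not BTstack's code: an application registers a callback in a linked list, and an emit function walks the list and calls each callback with the event bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HCI_EVENT_PACKET 0x04  /* HCI event packet type */

typedef void (*sketch_packet_handler_t)(uint8_t packet_type, const uint8_t *packet, uint16_t size);

/* Hypothetical registration node: one per application callback. */
typedef struct sketch_registration {
    struct sketch_registration *next;
    sketch_packet_handler_t callback;
} sketch_registration_t;

static sketch_registration_t *sketch_event_handlers = NULL;  /* models hci_stack->event_handlers */

/* Adds an application callback to the front of the list. */
void sketch_add_event_handler(sketch_registration_t *reg) {
    reg->next = sketch_event_handlers;
    sketch_event_handlers = reg;
}

/* Models the fan-out step: run every registered callback on this event.
   Returns how many callbacks were called. */
int sketch_emit_event(const uint8_t *event, uint16_t size) {
    int called = 0;
    for (sketch_registration_t *it = sketch_event_handlers; it != NULL; it = it->next) {
        if (it->callback != NULL) {
            it->callback(SKETCH_HCI_EVENT_PACKET, event, size);
            called++;
        }
    }
    return called;
}

/* Example application callback: remembers the first byte (the event code). */
uint8_t sketch_last_event_code = 0;

void sketch_app_handler(uint8_t packet_type, const uint8_t *packet, uint16_t size) {
    if (packet_type != SKETCH_HCI_EVENT_PACKET || size == 0) {
        return;
    }
    sketch_last_event_code = packet[0];
}
```

The design point is that the stack never needs to know about the application: it only walks a list, and anything the application wants to do with an event lives in its own callback.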


Tidying up a loose end

Incidentally, let's take a closer look at cyw43_bluetooth_hci_read() (which is called in hci_transport_cyw43_process(), above). If you follow this all the way down, you end up at a function called cyw43_spi_transfer which looks like this:

int cyw43_spi_transfer(cyw43_int_t *self, const uint8_t *tx, size_t tx_length, uint8_t *rx,
                       size_t rx_length) {

    if ((tx == NULL) && (rx == NULL)) {
        return CYW43_FAIL_FAST_CHECK(-CYW43_EINVAL);
    }

    bus_data_t *bus_data = (bus_data_t *)self->bus_data;
    start_spi_comms(self);
    if (rx != NULL) {
        if (tx == NULL) {
            tx = rx;
            assert(tx_length && tx_length < rx_length);
        }
        DUMP_SPI_TRANSACTIONS(
                printf("[%lu] bus TX/RX %u bytes rx %u:", counter++, tx_length, rx_length);
                dump_bytes(tx, tx_length);
        )
        assert(!(tx_length & 3));
        assert(!(((uintptr_t)tx) & 3));
        assert(!(((uintptr_t)rx) & 3));
        assert(!(rx_length & 3));

        pio_sm_set_enabled(bus_data->pio, bus_data->pio_sm, false);
        pio_sm_set_wrap(bus_data->pio, bus_data->pio_sm, bus_data->pio_offset, bus_data->pio_offset + SPI_OFFSET_END - 1);
        pio_sm_clear_fifos(bus_data->pio, bus_data->pio_sm);
        pio_sm_set_pindirs_with_mask(bus_data->pio, bus_data->pio_sm, 1u << DATA_OUT_PIN, 1u << DATA_OUT_PIN);
        pio_sm_restart(bus_data->pio, bus_data->pio_sm);
        pio_sm_clkdiv_restart(bus_data->pio, bus_data->pio_sm);
        pio_sm_put(bus_data->pio, bus_data->pio_sm, tx_length * 8 - 1);
        pio_sm_exec(bus_data->pio, bus_data->pio_sm, pio_encode_out(pio_x, 32));
        pio_sm_put(bus_data->pio, bus_data->pio_sm, (rx_length - tx_length) * 8 - 1);
        pio_sm_exec(bus_data->pio, bus_data->pio_sm, pio_encode_out(pio_y, 32));
        pio_sm_exec(bus_data->pio, bus_data->pio_sm, pio_encode_jmp(bus_data->pio_offset));
        dma_channel_abort(bus_data->dma_out);
        dma_channel_abort(bus_data->dma_in);

        dma_channel_config out_config = dma_channel_get_default_config(bus_data->dma_out);
        channel_config_set_bswap(&out_config, true);
        channel_config_set_dreq(&out_config, pio_get_dreq(bus_data->pio, bus_data->pio_sm, true));

        dma_channel_configure(bus_data->dma_out, &out_config, &bus_data->pio->txf[bus_data->pio_sm], tx, tx_length / 4, true);

        dma_channel_config in_config = dma_channel_get_default_config(bus_data->dma_in);
        channel_config_set_bswap(&in_config, true);
        channel_config_set_dreq(&in_config, pio_get_dreq(bus_data->pio, bus_data->pio_sm, false));
        channel_config_set_write_increment(&in_config, true);
        channel_config_set_read_increment(&in_config, false);
        dma_channel_configure(bus_data->dma_in, &in_config, rx + tx_length, &bus_data->pio->rxf[bus_data->pio_sm], rx_length / 4 - tx_length / 4, true);

        pio_sm_set_enabled(bus_data->pio, bus_data->pio_sm, true);
        __compiler_memory_barrier();

        dma_channel_wait_for_finish_blocking(bus_data->dma_out);
        dma_channel_wait_for_finish_blocking(bus_data->dma_in);

        __compiler_memory_barrier();
        memset(rx, 0, tx_length); // make sure we don't have garbage in what would have been returned data if using real SPI
    } else if (tx != NULL) {
        DUMP_SPI_TRANSACTIONS(
                printf("[%lu] bus TX only %u bytes:", counter++, tx_length);
                dump_bytes(tx, tx_length);
        )
        assert(!(((uintptr_t)tx) & 3));
        assert(!(tx_length & 3));
        pio_sm_set_enabled(bus_data->pio, bus_data->pio_sm, false);
        pio_sm_set_wrap(bus_data->pio, bus_data->pio_sm, bus_data->pio_offset, bus_data->pio_offset + SPI_OFFSET_LP1_END - 1);
        pio_sm_clear_fifos(bus_data->pio, bus_data->pio_sm);
        pio_sm_set_pindirs_with_mask(bus_data->pio, bus_data->pio_sm, 1u << DATA_OUT_PIN, 1u << DATA_OUT_PIN);
        pio_sm_restart(bus_data->pio, bus_data->pio_sm);
        pio_sm_clkdiv_restart(bus_data->pio, bus_data->pio_sm);
        pio_sm_put(bus_data->pio, bus_data->pio_sm, tx_length * 8 - 1);
        pio_sm_exec(bus_data->pio, bus_data->pio_sm, pio_encode_out(pio_x, 32));
        pio_sm_put(bus_data->pio, bus_data->pio_sm, 0);
        pio_sm_exec(bus_data->pio, bus_data->pio_sm, pio_encode_out(pio_y, 32));
        pio_sm_exec(bus_data->pio, bus_data->pio_sm, pio_encode_jmp(bus_data->pio_offset));
        dma_channel_abort(bus_data->dma_out);

        dma_channel_config out_config = dma_channel_get_default_config(bus_data->dma_out);
        channel_config_set_bswap(&out_config, true);
        channel_config_set_dreq(&out_config, pio_get_dreq(bus_data->pio, bus_data->pio_sm, true));

        dma_channel_configure(bus_data->dma_out, &out_config, &bus_data->pio->txf[bus_data->pio_sm], tx, tx_length / 4, true);

        uint32_t fdebug_tx_stall = 1u << (PIO_FDEBUG_TXSTALL_LSB + bus_data->pio_sm);
        bus_data->pio->fdebug = fdebug_tx_stall;
        pio_sm_set_enabled(bus_data->pio, bus_data->pio_sm, true);
        while (!(bus_data->pio->fdebug & fdebug_tx_stall)) {
            tight_loop_contents(); // todo timeout
        }
        __compiler_memory_barrier();
        pio_sm_set_enabled(bus_data->pio, bus_data->pio_sm, false);
        pio_sm_set_consecutive_pindirs(bus_data->pio, bus_data->pio_sm, DATA_IN_PIN, 1, false);
    } else if (rx != NULL) { /* currently do one at a time */
        DUMP_SPI_TRANSACTIONS(
                printf("[%lu] bus TX %u bytes:", counter++, rx_length);
                dump_bytes(rx, rx_length);
        )
        panic_unsupported();
    }
    pio_sm_exec(bus_data->pio, bus_data->pio_sm, pio_encode_mov(pio_pins, pio_null)); // for next time we turn output on

    stop_spi_comms();
    DUMP_SPI_TRANSACTIONS(
            printf("RXed:");
            dump_bytes(rx, rx_length);
            printf("\n");
    )

    return 0;
}
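The length bookkeeping in the combined write/read branch above is worth checking by hand. The sketch below recomputes the values that get loaded into the PIO X and Y registers and the two DMA transfer counts. The struct and function names are hypothetical, and I've applied the tx_length < rx_length check unconditionally, whereas the original only asserts it when tx is NULL and the command shares the receive buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one write-then-read transfer (not an SDK type). */
typedef struct {
    uint32_t x_value;        /* loaded into PIO X: bits to send, minus one */
    uint32_t y_value;        /* loaded into PIO Y: bits to receive, minus one */
    uint32_t dma_out_words;  /* 32-bit words moved by the outgoing DMA channel */
    uint32_t dma_in_words;   /* 32-bit words moved by the incoming DMA channel */
} sketch_spi_plan_t;

/* Mirrors the arithmetic in the rx != NULL branch of cyw43_spi_transfer.
   Returns false where the original would trip an assert. */
bool sketch_plan_spi_read(size_t tx_length, size_t rx_length, sketch_spi_plan_t *plan) {
    if (plan == NULL || tx_length == 0 || tx_length >= rx_length) {
        return false;  /* need a non-empty command and room for a reply */
    }
    if ((tx_length & 3) || (rx_length & 3)) {
        return false;  /* both lengths must be multiples of 4 bytes */
    }
    plan->x_value = (uint32_t)(tx_length * 8 - 1);
    plan->y_value = (uint32_t)((rx_length - tx_length) * 8 - 1);
    plan->dma_out_words = (uint32_t)(tx_length / 4);
    plan->dma_in_words = (uint32_t)(rx_length / 4 - tx_length / 4);
    return true;
}
```

For a 4-byte command followed by an 8-byte reply (tx_length = 4, rx_length = 12), this gives X = 31, Y = 63, one outgoing word, and two incoming words. The incoming words land at rx + tx_length, which is why the first tx_length bytes of rx get zeroed afterward.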

All the way at the bottom, the interface makes use of two PIO state machines. The system will first try to put these state machines on PIO1, but if that's not available it will put them on PIO0. It uses the claim mechanism to grab two unused state machines. This interface also uses two DMA channels, which it grabs with the same claim mechanism. Be careful: these hardware claims can conflict with application code that tries to use a DMA channel or state machine which has already been claimed.
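If the claim mechanism is unfamiliar, it is essentially a shared bitmap of which resources are taken. The sketch below is a simplified model with hypothetical names, not the SDK's implementation: it hands out the lowest-numbered free channel (an assumption of this sketch) and reports failure when all 12 of the RP2040's DMA channels are taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_DMA_CHANNELS 12u  /* the RP2040 has 12 DMA channels */

static uint16_t sketch_claimed = 0;  /* bit n set means channel n is claimed */

/* Claims the lowest-numbered free channel.
   Returns the channel number, or -1 if every channel is already claimed. */
int sketch_claim_unused_channel(void) {
    for (unsigned ch = 0; ch < SKETCH_NUM_DMA_CHANNELS; ch++) {
        uint16_t bit = (uint16_t)(1u << ch);
        if (!(sketch_claimed & bit)) {
            sketch_claimed |= bit;
            return (int)ch;
        }
    }
    return -1;
}

/* Reports whether a given channel has been claimed. */
bool sketch_channel_is_claimed(unsigned ch) {
    return ch < SKETCH_NUM_DMA_CHANNELS && (sketch_claimed & (1u << ch)) != 0;
}

/* Releases a channel so that it can be claimed again. */
void sketch_unclaim_channel(unsigned ch) {
    if (ch < SKETCH_NUM_DMA_CHANNELS) {
        sketch_claimed &= (uint16_t)~(1u << ch);
    }
}
```

The practical takeaway for application code is to ask for a free resource (the SDK provides dma_claim_unused_channel and pio_claim_unused_sm for this) rather than hard-coding a channel or state machine number that the CYW43 driver may already hold.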

Let's pop back up to hci_transport_cyw43_open(). We thought about the last few lines of this function, but let's now look at the first few lines. The first line of the function calls cyw43_bluetooth_hci_init();. What does that do?

// Just load firmware
int cyw43_bluetooth_hci_init(void) {
    return cyw43_ensure_bt_up(&cyw43_state);
}

And what does cyw43_ensure_bt_up() do? (Its argument, cyw43_state, is a global variable of type cyw43_t.)

static int cyw43_ensure_bt_up(cyw43_t *self) {
    CYW43_THREAD_ENTER;
    int ret = cyw43_ensure_up(self);
    if (ret == 0 && !self->bt_loaded) {
        ret = cyw43_btbus_init(&self->cyw43_ll); // todo: Passing cyw43_ll is a bit naff
        if (ret == 0) {
            self->bt_loaded = true;
        }
    }
    CYW43_THREAD_EXIT;
    return ret;
}

One of the first things that it does is call cyw43_ensure_up(self). Let's look at that:

static int cyw43_ensure_up(cyw43_t *self) {
    CYW43_THREAD_LOCK_CHECK;

    #ifndef NDEBUG
    assert(cyw43_is_initialized(self)); // cyw43_init has not been called
    #endif
    if (cyw43_poll != NULL) {
        cyw43_ll_bus_sleep(&self->cyw43_ll, false);
        return 0;
    }

    // Disable the netif if it was previously up
    cyw43_cb_tcpip_deinit(self, CYW43_ITF_STA);
    cyw43_cb_tcpip_deinit(self, CYW43_ITF_AP);
    self->itf_state = 0;

    // Reset and power up the WL chip
    cyw43_hal_pin_low(CYW43_PIN_WL_REG_ON);
    cyw43_delay_ms(20);
    cyw43_hal_pin_high(CYW43_PIN_WL_REG_ON);
    cyw43_delay_ms(50);

    #if !CYW43_USE_SPI
    // Initialise SDIO bus
    // IRQ priority only needs to be higher than CYW43_THREAD_ENTER/EXIT protection (PENDSV)
    cyw43_sdio_init();
    #endif

    // Initialise the low-level driver
    #if !CYW43_USE_OTP_MAC
    cyw43_hal_get_mac(CYW43_HAL_MAC_WLAN0, self->mac);

    int ret = cyw43_ll_bus_init(&self->cyw43_ll, self->mac);
    #else
    // Not setting mac address. It should come from otp
    int ret = cyw43_ll_bus_init(&self->cyw43_ll, NULL);
    #endif

    if (ret != 0) {
        return ret;
    }

    #if CYW43_USE_OTP_MAC
    // Get our mac address cyw43_hal_get_mac can get this from cyw43_state.mac
    cyw43_ll_wifi_get_mac(&self->cyw43_ll, self->mac);
    #endif

    CYW43_DEBUG("cyw43 loaded ok, mac %02x:%02x:%02x:%02x:%02x:%02x\n",
        self->mac[0], self->mac[1], self->mac[2], self->mac[3], self->mac[4], self->mac[5]);

    // Enable async events from low-level driver
    cyw43_sleep = CYW43_SLEEP_MAX;
    cyw43_poll = cyw43_poll_func;
    #if USE_SDIOIT
    cyw43_sdio_set_irq(true);
    #elif !CYW43_USE_SPI
    // If CYW43_PIN_WL_HOST_WAKE has a falling edge, cyw43_poll (if it's not NULL) should be called.
    cyw43_hal_pin_config_irq_falling(CYW43_PIN_WL_HOST_WAKE, true);
    #endif

    // Kick things off
    cyw43_schedule_internal_poll_dispatch(cyw43_poll_func);

    return ret;
}

I'd like to point out the line cyw43_poll = cyw43_poll_func;. It is here that we set the pointer for cyw43_poll, which is the do_work field of the when_pending_worker called cyw43_poll_worker. Recall that this worker is set to pending in the GPIO interrupt service routine that can be triggered by the CYW43.
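Notice that the same pointer doubles as an "already up" flag: cyw43_ensure_up returns early when cyw43_poll is not NULL. Here is a stripped-down sketch of that pattern, with hypothetical names and a counter standing in for the real reset and bus initialization.

```c
#include <assert.h>
#include <stddef.h>

static void (*sketch_poll)(void) = NULL;  /* NULL means the driver is not up yet */
int sketch_bring_up_count = 0;            /* how many times the slow path ran */
int sketch_poll_calls = 0;                /* how many times the poll function ran */

/* Stand-in for cyw43_poll_func. */
static void sketch_poll_func(void) {
    sketch_poll_calls++;
}

/* Brings the driver up once; later calls return early via the pointer check. */
int sketch_ensure_up(void) {
    if (sketch_poll != NULL) {
        return 0;                    /* already up, nothing to do */
    }
    sketch_bring_up_count++;         /* stand-in for reset, power-up, and bus init */
    sketch_poll = sketch_poll_func;  /* from now on, polling does real work */
    return 0;
}

/* Stand-in for the worker's do_work: only polls once the pointer has been set. */
void sketch_do_poll(void) {
    if (sketch_poll != NULL) {
        sketch_poll();
    }
}
```

Before sketch_ensure_up() runs, sketch_do_poll() is a no-op; afterward each call reaches sketch_poll_func, and calling sketch_ensure_up() a second time does not repeat the bring-up.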

We can now pop all the way back up to main(). We've completed the analysis of cyw43_arch_init().