Writing a Bootloader for the ARM SAM3/SAM4 Microprocessors

A bootloader is an important feature for many embedded devices. It allows a manufacturer to upgrade firmware easily, adding features and fixing bugs after the product has been released. This can be done on-site by a field service technician or remotely via the Internet.

As an example, I will use software I wrote for a medical device that needed a bootloader. The system used multiple ARM processors, and I needed to write a custom bootloader for the Atmel (now Microchip) SAM3E processor to enable in-circuit firmware updates. We downloaded the new firmware through a CAN bus or a serial port, but the method described here works for other transport protocols as well.

Bootloaders can be tricky to write because they require intimate knowledge of the inner workings of the CPU. The bootloader needs to be able to create self-modifying code and to write to areas of memory that are normally protected. The compiler’s linker file needs to be modified to handle a memory map that differs from the one used in a normal application, and special compiler options need to be set up. I’ll describe how to write a bootloader, first for the SAM3E and then for the SAM4E. Hopefully, this will save you time and give you a head start on writing your own bootloader.

Flash Memory Partitioning

The SAM3 stores program code in non-volatile flash memory. For this medical device, it was essential that a working version of the code always be available after a failed download, so the instrument could continue to run. To achieve this, the flash needs to be partitioned into one area for the current application code and another area for downloading the new firmware.

However, the SAM3E has an additional hardware restriction: you cannot erase a section of flash while you are executing a program from that same flash block. A special feature of this processor allows the user to configure the flash as either one large block or two separate blocks. This allows the application to write to one block for downloads while executing the program from the other block.

Dual Bank Bootloader

The manufacturer provides an Application Note, “Atmel AT02333: Safe and Secure Bootloader Implementation for SAM3/4”. It is a general overview of the bootloader process and issues. It describes a Dual Banking method for the SAM3 which sounded perfect for this application. The flash memory is split into two banks.

While executing from one bank you can use the other bank to download new firmware. Once the code is verified, you can tell the processor to swap the addresses of the banks and reboot. Since the new code will be remapped to the correct starting address, it should be able to run without modification.

While the Application Note provides a good overview of the issues involved in creating a bootloader, it provides little useful information on how to actually write the code for one. And there was a much bigger problem: the method described in the Application Note does not work. The SAM3E has a bug in its implementation (not identified in the Application Note) that breaks this dual bank method.

SAM3E Bug

The SAM3E supports a reasonably large 512 KB of program memory. Due to a bug in the chip (later confirmed by Atmel), the Dual Bank method does not work with programs larger than 64 KB. The SAM3E uses the GPNVM2 signal to select which of the two banks is used to execute the application program. However, section 49.1.1.3 of the Errata, on page 1143 of the SAM3E datasheet, describes this bug in the chip:

“GPNVM2 enables to select if Flash0 or Flash1 is used for the boot. But when GPNVM2 is set, only the first 64 Kbytes of the Flash 1 are seen in the Boot Memory. The rest of the Boot memory corresponds to Flash 0 content.

Problem Fix/Workaround
No fix required if the code size is less than 64Kbytes. Otherwise the software needs to take into account this limitation if GPNVM2 is used.”

This means that due to a bug in the chip (which they have never fixed), a program using the Dual Bank method cannot be larger than 64 KB, which is only 25% of the 256 KB available in each bank. Apparently the designers did not properly map the entire address space: the GPNVM2 signal swaps only the first 64 KB of program space. If your application is any larger, only its first 64 KB is mapped into the program space, and everything after that will be old code or garbage data from a previous version of the application. This 64 KB maximum program size was unacceptable for my project, so I needed a different approach.
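One way to picture the erratum is as a small model of which bank supplies each offset in boot memory. The sketch below only illustrates the datasheet text quoted above. The names boot_source_bank and image_safe_for_bank_swap are hypothetical, and the 256 KB bank size assumes the 512 KB part split in half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BANK_SIZE       0x40000u  /* assumed: 256 KB per bank (512 KB / 2) */
#define REMAPPED_WINDOW 0x10000u  /* per the erratum: only 64 KB is swapped */

typedef enum { BANK_FLASH0 = 0, BANK_FLASH1 = 1 } flash_bank_t;

/* Model of the erratum: which bank supplies the byte seen at boot_offset
 * in boot memory, given the state of GPNVM2? */
flash_bank_t boot_source_bank(uint32_t boot_offset, bool gpnvm2_set)
{
    assert(boot_offset < BANK_SIZE);
    if (gpnvm2_set && boot_offset < REMAPPED_WINDOW) {
        return BANK_FLASH1;   /* the part of the swap that works */
    }
    return BANK_FLASH0;       /* everything else still comes from Flash 0 */
}

/* An image can rely on the bank swap only if it fits in the window. */
bool image_safe_for_bank_swap(uint32_t image_size)
{
    return image_size <= REMAPPED_WINDOW;
}
```

With GPNVM2 set, offset 0xFFFF is fetched from Flash 1 but offset 0x10000 is fetched from Flash 0, which is exactly the mixture of new and old code described above.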

A Single Bank Bootloader

The Application Note describes another architecture for a bootloader: Single Bank Flash Programming. Using this method, the entire flash memory is treated as one bank. The upper section of memory is reserved for downloading new firmware.

However, this method introduces additional complications. The first one is described as follows:

“Single banked Flash cannot be read and written at the same time since it is single plane. Therefore, a program executing from the Flash cannot perform a memory write. Since the bootloader is located inside the Flash, it must be copied to the RAM prior to execution”

You cannot write to a bank of flash while you are reading from it, so you have to write to flash from a routine that executes out of RAM. This makes the job a little more complicated since, by default, all routines execute from flash. The manufacturer provided some source code to show how to do this. However, the code provided does not compile correctly and does not work. I found that I could still write to flash from a RAM function by using some of the low-level ASF driver functions. The ASF library is a set of manufacturer-supplied functions that handle the low-level drivers that configure the ARM processor. There is very little documentation on how to use this library, but the source code is provided, so with a bit of digging you can find some very useful routines. Some of these allow you to write the flash from RAM.
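With GCC, one common way to make a routine execute from RAM is to place it in a dedicated linker section that the startup code copies into RAM along with the initialized data. The sketch below assumes a section named .ramfunc that your linker script places in RAM (check your own script). The RUN_FROM_RAM macro and the ram_copy_words helper are hypothetical names defined here for illustration and are not part of ASF. The flash controller command that would follow the copy on real hardware is deliberately left out. On a non-ARM host the attribute is compiled out, so the logic can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the linker script maps ".ramfunc" into RAM and the startup
 * code copies it there before main() runs. */
#if defined(__GNUC__) && defined(__arm__)
#define RUN_FROM_RAM __attribute__((section(".ramfunc"), noinline))
#else
#define RUN_FROM_RAM /* host build: ordinary function */
#endif

static_assert(sizeof(uint32_t) == 4, "copy is done in 32-bit words");

/* Copy whole 32-bit words to a destination such as a flash page buffer.
 * The function calls nothing else, so once it is running from RAM it
 * never fetches instructions from the flash being programmed. */
RUN_FROM_RAM
void ram_copy_words(volatile uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}
```

Keeping the routine free of calls into flash-resident code (including library helpers) is the important design point. The source data should also sit in RAM or in a flash area that is safe to read during the write.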

The Single Bank method has its complications. One of the tricky aspects of bootloaders is that you cannot just download a new program over the currently executing program. Both programs have the same starting address. You cannot erase the program you are currently executing and put new code in its place. And of course the SAM3E does not allow you to write to a block of flash while you are executing from that block. Therefore, you have to load your new code to a buffer area in flash.

Additionally, application programs are coded to execute from specific addresses in lower memory. The bootloader downloads code into a different area of flash. The code cannot execute from there. This means that we have to move the code to its intended location before the application can run.

Bootloader Memory Map

The SAM3E8E loads its flash program memory at location 0x80000 by default. If we use the upper half of flash as a download buffer, then the buffer starts at 0x80000 + 0x40000 (1/2 flash) = 0xC0000. When the download is complete, the new code will reside at 0xC0000. However, it cannot execute from there because the compiler created code that expects to be loaded at 0x80000. The addresses of fixed locations in memory will be wrong, and the interrupt vector table will point to addresses in the wrong section of flash.
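The address arithmetic above can be written down as a handful of constants. This is a minimal sketch using the addresses from the text; the macro and function names (FLASH_START, DOWNLOAD_BASE, run_address_of and so on) are hypothetical. The space reserved for the Loader at the top of flash is left out because its size is not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLASH_START    0x00080000u                     /* flash base from the text */
#define FLASH_SIZE     0x00080000u                     /* 512 KB total */
#define FLASH_END      (FLASH_START + FLASH_SIZE)      /* one past the last byte */
#define APP_BASE       FLASH_START                     /* where the application runs */
#define DOWNLOAD_BASE  (FLASH_START + FLASH_SIZE / 2u) /* upper half of flash */

static_assert(DOWNLOAD_BASE == 0x000C0000u, "download buffer starts at 0xC0000");

/* True if addr lies in the upper-half download buffer. */
bool in_download_buffer(uint32_t addr)
{
    return addr >= DOWNLOAD_BASE && addr < FLASH_END;
}

/* Map an address inside the download buffer to the address the same byte
 * will have once the image has been copied down to APP_BASE. */
uint32_t run_address_of(uint32_t buffer_addr)
{
    assert(in_download_buffer(buffer_addr));
    return buffer_addr - DOWNLOAD_BASE + APP_BASE;
}
```

For example, the first word of the downloaded image sits at 0xC0000 in the buffer but was linked to run at 0x80000, which is why the image has to be moved before it can execute.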

My custom bootloader has three code components. The first component is a section of code included as part of each application program. This code downloads firmware and places it into the flash buffer area.

The second component is the new firmware application. It now resides in a block of upper memory, the flash buffer. The current application needs to verify that new code has been downloaded correctly. It does this by calculating the CRC of the new code and comparing that CRC to the CRC that was stored by the compiler in the application code. Many compilers have an option to embed a CRC. If a particular compiler does not have that option, this can be added by post processing the compiled HEX file and adding a CRC. If the CRC does not match, then the error is flagged and the download can be retried.
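The CRC check can be sketched as follows. The text does not say which CRC or image layout was used, so this example assumes the common CRC-32 (reflected polynomial 0xEDB88320) stored little-endian in the last four bytes of the image. Both choices, and the names crc32_compute and image_crc_ok, are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR
 * of 0xFFFFFFFF). Slow but small, which suits a one-off image check. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Assumed layout: payload bytes followed by a 4-byte little-endian CRC
 * that covers the payload only. */
bool image_crc_ok(const uint8_t *image, size_t image_len)
{
    if (image == NULL || image_len < 4) {
        return false;
    }
    size_t payload_len = image_len - 4;
    uint32_t stored = (uint32_t)image[payload_len]
                    | ((uint32_t)image[payload_len + 1] << 8)
                    | ((uint32_t)image[payload_len + 2] << 16)
                    | ((uint32_t)image[payload_len + 3] << 24);
    return crc32_compute(image, payload_len) == stored;
}
```

On the target, image would point at the start of the download buffer. If image_crc_ok returns false, the application flags the error and the download can be retried, leaving the running code untouched.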

The third component is a Loader program that resides at the top of Flash memory. This is a separate standalone piece of code. Its sole purpose is to copy the code from the buffer to the lower memory where the program will execute. This Loader firmware is burned in once at the factory using a JTAG programmer and is never changed. This Loader is not field upgradeable.
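The Loader's job amounts to a page-by-page copy with a read-back check. The sketch below runs against ordinary memory so the logic can be tested on a PC. The 256-byte page size, the page_write_fn hook, and the function names are all assumptions for illustration. On the real part, the hook would be a RAM-resident routine that programs one flash page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_PAGE_SIZE 256u  /* assumed page size; check your datasheet */

/* Hypothetical hook that programs up to one page at dst from src. */
typedef bool (*page_write_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Host-side stand-in for the flash driver: a plain memory copy. */
bool sim_page_write(uint8_t *dst, const uint8_t *src, size_t len)
{
    memcpy(dst, src, len);
    return true;
}

/* Number of pages needed to hold image_len bytes, rounding up. */
size_t pages_needed(size_t image_len)
{
    return (image_len + FLASH_PAGE_SIZE - 1u) / FLASH_PAGE_SIZE;
}

/* Copy the verified image from the download buffer to the application
 * area one page at a time, verifying each page after it is written. */
bool loader_copy_image(uint8_t *app, const uint8_t *buffer,
                       size_t image_len, page_write_fn write_page)
{
    assert(app != NULL && buffer != NULL && write_page != NULL);
    for (size_t off = 0; off < image_len; off += FLASH_PAGE_SIZE) {
        size_t chunk = image_len - off;
        if (chunk > FLASH_PAGE_SIZE) {
            chunk = FLASH_PAGE_SIZE;
        }
        if (!write_page(app + off, buffer + off, chunk)) {
            return false;  /* driver reported a failure */
        }
        if (memcmp(app + off, buffer + off, chunk) != 0) {
            return false;  /* read-back mismatch */
        }
    }
    return true;
}
```

Because the download buffer is not modified by the copy, a failed or interrupted copy can simply be restarted from the first page on the next boot.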

This post describes the architecture for the bootloader. I’ve used a similar method to implement bootloaders for the SAM4 as well. I will describe how to implement the code in more detail in a future post.

Bob Weiman (TheOracle) is a Consultant designing Real-Time Embedded Systems for Medical, Commercial, and Robotics applications. He is the President of Oracle Engineering Inc. (www.AskTheOracle.com).