JACK WHITHAM PhD MEng

 
 
  

SMMU Downloads

 
smmu_kit_1.03.tar.bz2
SMMU version 1.03 for Microblaze. Includes VHDL, simulator, test programs and "jpeg-6b" case study. A Xilinx Platform Studio (XPS) project is included, suitable for EDK version 10.1 SP3 or 11.3. Released 29/3/10.
 
hardware / software / simulator
SMMU version 1.00 for Microblaze. Released 14/4/09.
 

SMMU Publications

 
YCS-2009-439
Technical report - "The Scratchpad Memory Management Unit for Microblaze: Implementation, Testing and Case Study".
 
EMSOFT '09 pp 265-274
Peer-reviewed conference paper - "Implementing Time-predictable Load and Store Operations".
 
RTAS '10 pp 205-214
(experiment software)
Peer-reviewed conference paper - "Studying the Applicability of the Scratchpad Memory Management Unit".
 
ECRTS '10 (to appear)
(experiment software)
Peer-reviewed conference paper - "Investigating average versus worst-case timing behaviour of data caches and data scratchpads".
 
 
More in the pipeline..
See also: my publications page
  
 

The Scratchpad Memory Management Unit

A data cache alternative for hard real-time systems


The scratchpad memory management unit (SMMU) is a hardware device that makes it easier to use a scratchpad RAM (SPM) from a conventional C program.

SPM can be used to speed up programs and reduce the energy they consume, both of which are important in mobile devices such as the Nintendo DS. However, the purpose of the SMMU is time predictability, another advantage of SPM. In a hard real-time system, it is important to know the upper bound on the execution time of a program, and that means knowing the upper bound on the execution time of each instruction in the program, at least in some sense. This is not always easy! Load and store instructions, which access the computer's memory, are particularly problematic, especially if a cache is used: the memory access times (latencies) depend on the contents of the cache, and the contents of the cache depend on earlier references to memory (the "reference string").

With the SMMU, the programmer or compiler can declare which data objects should be stored in SPM, guaranteeing low-latency accesses to those objects. This eliminates dependence on the reference string. It is similar to the well-known processes of locking objects into cache, or copying objects into SPM, but the SMMU eliminates the following important concerns:

The SMMU is designed to avoid these problems while remaining time-predictable and fast.

The benefit: dynamic data structures can be used within time-predictable software. The SMMU enables the use of object-oriented programming techniques in hard real-time code, along with advanced algorithms that are not suitable for conventional SPM and cache analysis techniques.

Address Transparency

The SMMU implements address transparency. An object can be relocated to SPM without changing its logical address, the address used by the program. Relocation does not invalidate any pointers and causes no issues related to pointer aliasing.

This feature is provided by the SMMU's remapping table. It takes addresses generated by the CPU, and compares them to the addresses of the objects currently loaded into SPM. If there is a match, the access is redirected to the SPM. The program notices no difference (except that the access is very fast). Otherwise, the access goes on to external memory.
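The lookup described above can be sketched as a small software model. This is only an illustration, not the SMMU's VHDL: the names remap_entry, remap_result and remap_lookup, the table size, and the sequential loop (standing in for a parallel comparison in hardware) are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REMAP_ENTRIES 4  /* assumed table size, not taken from the SMMU design */

/* One hypothetical remapping entry: a logical address range and its place in SPM. */
typedef struct {
    bool     valid;         /* entry currently describes an object in SPM */
    uint32_t logical_base;  /* address the program uses for the object */
    uint32_t size;          /* object size in bytes */
    uint32_t spm_base;      /* offset of the object's copy inside SPM */
} remap_entry;

typedef struct {
    bool     hit;      /* true: the access is redirected to SPM */
    uint32_t address;  /* SPM offset on a hit, unchanged logical address otherwise */
} remap_result;

/* Compare a CPU-generated address against every entry in the table. */
static remap_result remap_lookup(const remap_entry table[], size_t n, uint32_t addr)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        const remap_entry *e = &table[i];
        /* Match if addr lies inside [logical_base, logical_base + size). */
        if (e->valid && addr >= e->logical_base && addr - e->logical_base < e->size) {
            remap_result r = { true, e->spm_base + (addr - e->logical_base) };
            return r;
        }
    }
    /* No match: the access goes on to external memory at the same address. */
    remap_result r = { false, addr };
    return r;
}
```

In this model the program never sees spm_base: it keeps issuing the same logical address, and only the result of the lookup decides which memory serves the access.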

This makes it easy to store heap data (e.g. dynamic data structures) within SPM while retaining fully time-predictable memory accesses. The programmer or compiler need only state which objects should be in SPM. Objects may be moved between SPM and external memory at any time. There is no need to update any pointers to reflect such a transition.

Efficient use of SPM space

The remapping process can place objects at arbitrary SPM addresses. This means that SPM space can be used very efficiently. Objects do not need to take up an entire cache way: potentially, all of the SPM space can be used.

Time-predictability

Any "load" or "store" instruction has single-cycle latency if the object being accessed is in SPM. A local data flow analysis reveals whether a "load" or "store" uses a pointer to an object that is guaranteed to be in SPM. Given this information, worst case execution time (WCET) analysis can determine an estimate for the maximum execution time of the program (provided that the program is otherwise suitable for WCET analysis).
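As a rough illustration of how that information feeds a bound, the sketch below sums worst-case memory latencies along a single execution path. The single-cycle SPM latency comes from the text above; the function name memory_wcet_bound and the external-memory worst case (ext_worst_cycles) are assumptions, and a real WCET analysis considers all paths and the rest of the pipeline as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Upper bound on memory access cycles along one path.
 * in_spm[i] is true when data flow analysis has shown that access i
 * targets an object guaranteed to be in SPM (single-cycle latency).
 * ext_worst_cycles is an assumed worst-case external memory latency.
 */
static unsigned long memory_wcet_bound(const bool in_spm[], size_t n,
                                       unsigned ext_worst_cycles)
{
    assert(ext_worst_cycles >= 1);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Without a guarantee, the analysis must assume the slow case. */
        total += in_spm[i] ? 1u : ext_worst_cycles;
    }
    return total;
}
```

The point of the model is that the bound depends only on the per-access guarantees, not on the order or history of earlier accesses.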

In SMMU terminology, we say that an object in SPM is "OPEN". Programs OPEN the objects that are likely to be accessed repeatedly. This terminology has been chosen because of the similarity to opening a file on disk, e.g. with "fopen". If the file will be accessed repeatedly, a well-behaved program will typically keep it open, rather than repeatedly running the sequence fopen/update/fclose. This is because fopen and fclose incur a time overhead, which becomes very significant if the operations are used repeatedly. So, smart programmers keep files open whenever possible.

Similarly, programs should keep data on-chip whenever possible, to avoid the overhead of transferring it to and from external memory. A cache handles this job implicitly: easy to use, but not time-predictable. An SMMU places the task of OPENing and CLOSEing objects in the hands of the programmer or compiler: more difficult to use, but time-predictable.
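The cost argument can be made concrete with a toy model that counts the bytes moved between SPM and external memory. Everything here is hypothetical: model_open, model_close and model_increment are not the SMMU's programming interface, the write-back on CLOSE is an assumption, and the model accesses the SPM copy explicitly, whereas with the real SMMU the program would keep using the object's logical address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPM_BYTES 256  /* assumed scratchpad size for this model */

static unsigned char spm[SPM_BYTES];     /* stand-in for on-chip SPM */
static unsigned long bytes_transferred;  /* traffic between SPM and external memory */

typedef struct {
    void  *logical;     /* object's home in external memory */
    size_t size;        /* object size in bytes */
    size_t spm_offset;  /* where the copy lives in SPM */
} open_object;

/* Hypothetical OPEN: copy the object into SPM. */
static open_object model_open(void *obj, size_t size, size_t spm_offset)
{
    assert(spm_offset <= SPM_BYTES && size <= SPM_BYTES - spm_offset);
    memcpy(&spm[spm_offset], obj, size);
    bytes_transferred += size;
    open_object h = { obj, size, spm_offset };
    return h;
}

/* Hypothetical CLOSE: write the SPM copy back to external memory. */
static void model_close(open_object h)
{
    memcpy(h.logical, &spm[h.spm_offset], h.size);
    bytes_transferred += h.size;
}

/* Update one byte of an OPEN object; this touches SPM only. */
static void model_increment(open_object h, size_t index)
{
    assert(index < h.size);
    spm[h.spm_offset + index]++;
}

/* OPEN once, update many times, CLOSE once. Returns bytes transferred. */
static unsigned long update_keep_open(unsigned char *obj, size_t size, int updates)
{
    bytes_transferred = 0;
    open_object h = model_open(obj, size, 0);
    for (int i = 0; i < updates; i++)
        model_increment(h, 0);
    model_close(h);
    return bytes_transferred;
}

/* OPEN and CLOSE around every single update. Returns bytes transferred. */
static unsigned long update_reopen_each_time(unsigned char *obj, size_t size, int updates)
{
    bytes_transferred = 0;
    for (int i = 0; i < updates; i++) {
        open_object h = model_open(obj, size, 0);
        model_increment(h, 0);
        model_close(h);
    }
    return bytes_transferred;
}
```

In this model, keeping the object OPEN costs one transfer in and one out regardless of the number of updates, while reopening it each time multiplies that traffic by the update count.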

Hardware and Software Downloads

SMMU hardware designs are available under the GNU Lesser General Public License version 2.1.

The first release of the hardware and software was in April 2009.

The current version, released March 2010, is 1.03: smmu_kit_1.03.tar.bz2. This includes support for read-only objects. As with many aspects of the SMMU's design, the devil is in the details, and read-only object support proved to be quite tricky to implement correctly. A big issue with SMMU design is dealing with OPEN objects that overlap in memory, and this becomes even more difficult when some of the objects may be read-only. Writes to objects OPENed as "read-only" may occur in correct programs due to pointer aliasing, so it was very important to handle this situation.

The downloadable software and hardware should be enough to incorporate the SMMU into your own hardware designs and experiments. Hardware experiments will be particularly easy if you have access to the Xilinx EDK tools, since you will be able to use the SMMU with the Microblaze CPU and other Xilinx components. The designs have been tested with tool versions 10.1 (SP3) and 11.3.

Publications

To date, there are four SMMU publications other than this web page.

Criticisms of the SMMU

There are three major technical criticisms.

Roadmap

Research is ongoing. In future I will focus on integrating SPM/SMMU control into a compiler, most likely one based on LLVM. This will enable further research into the applicability and costs/benefits of the technology.

But I also want to encourage other people to think about using scratchpads for real-time and embedded systems, with and without the SMMU, and I welcome feedback, suggestions and criticisms. I'm also providing the source code and hardware designs, and you are welcome to make use of these for your own work.


       
  Copyright (C) Jack Whitham 1997-2010