Basic Concepts » History » Revision 27
Revision 26 (Alexander Kamkin, 03/07/2013 08:10 AM) → Revision 27/44 (Alexander Kamkin, 03/07/2013 08:14 AM)
h1. Basic Concepts of Reconfigurable Test Program Generation _~By Alexander Kamkin~_ {{toc}} h2. Introduction Nowadays, a large number of microprocessor devices are designed and produced all round the world. Many of them are based on standard _instruction set architectures_ (_ISA_), but there are also designs with the unique instruction sets and designs that significantly extend the existing standards. A frequently used approach to verify such kind of microprocessors is to develop testing tools from scratch without using the components developed previously. However, it does not work in the modern economy with the ever-shrinking time to market and the exponential growth in hardware complexity. It is well known that _test program generation_ (_TPG_) is the central means for verifying microprocessors and other programmable devices at the core or full-chip level. A _test program_ is a valid sequence of instructions that solves a specific verification task (in other words, creates a certain set of interactions within the target design and, possibly, checks the correctness of the design''s state at the end of execution). A tool that creates test programs for a given microprocessor architecture in an automated way is referred to as a _generator_ or _TPG tool_. Here, we describe a concept of a _reconfigurable test program generator_ "MicroTESK":http://forge.ispras.ru/projects/microtesk, which has a platform-independent generation core and is configured by modeling the target ISA and basic microarchitectural features. The idea is not new and has already been implemented in the industrial TPG tools such as *Genesys-Pro* (IBM Research Lab) and *RAVEN* (Obsidian Software/ARM). However, tuning the generators for concrete architectures (or even microarchitectures) is still a difficult task that is carried out by special staff only (namely, _modeling engineers_). Our goal is to reach a new level in the TPG configuration flexibility. In order to do it, we suggest applying so-called _architecture description languages_ (_ADL_), which are commonly used for developing microprocessor simulators. In the approach suggested, a microprocessor ISA is described in some ADL and then automatically translated into a _microprocessor model_ together with a _test coverage model_. Fine tuning of the generator is performed by the subsystem-specific configuration files. Such an approach is especially useful in the early design stages when the project is frequently modified and rapid generator reconfiguration is required. h2. Reconfigurable Test Program Generation The key idea of the reconfigurable TPG is that configuration of a generator for a concrete microprocessor architecture is implemented as easy as possible. It is obvious that to fulfill this requirement, a generator should be divided into two parts: (1) a platform-independent component (so-called _core_ or _engine_) and (2) a component describing all platform-specific features (_model_ or _configuration_). Such kinds of TPG tools are usually called _model-based test program generators_. This is the first and probably the most important step the generators have made during their evolution. Let us consider the model-based approach a bit more in detail using Genesys-Pro as an example (see Fig. 1). p=. !mbtpg.png! p=. *Figure 1. General structure of a model-based test program generator* A verification engineer provides a _test template_ of the scenario that should occur in the test program. A TPG tool generates a program by formulating and then solving constraints for each instruction of the test template or for groups of instructions. It also randomizes values of the unspecified parameters to produce different test program variations for the template given. The constraints are formulated in terms of the model, which consists of the _microprocessor model_ and _test coverage model_. The first of them defines syntax and semantics of the target design''s instructions, while the second one describes special conditions (known as _test situations_) to be achieved when verifying the design. After formulating the CSP, the generation core solves it by using external or built-in solvers and assigns the values to the instructions'' operands. Then, it interprets the instructions under processing (for example, by using a _microprocessor simulator_) and takes the next group of instructions. Having performance limitations, TPG tools generally do not process test templates as a whole. Instead, they split them up into small pieces and formulate several CSPs. It is clear that such a technique can fail to find an exact solution of the joint CSP even when one is available, but it works well for test programs with low-density dependencies between instructions. Coming back to the topic, what is the main distinction of the _reconfigurable_ TPG from the conventional model-based TPG? Tuning of a generator (in other words, creation of a microprocessor model together with a test coverage model) should be as much flexible and automated as it can be done by an ordinary verification engineer, not only by a trained modeling specialist. The important technological requirement for reconfigurable tools is usage of well-known or easy-to-learn languages, for example, ADLs (nML, ISDL, EXPRESSION, etc.) to represent a model. One of the possible solutions is implemented in a research prototype MA ^2^ TG (National University of Defense Technology, China). p=. !adltpg.png! p=. *Figure 2. General structure of an ADL-based test program generator* The main contributions of the scheme (see Fig. 2) are as follows. First, it eases microprocessor modeling by using ADL specifications. Second, it automates test coverage extraction via static analysis of the ADL code. Third, it allows constructing a microprocessor simulator automatically. Being rather flexible, the ADL-based approach is however not exactly what we mean by _reconfigurable_. Specification of microarchitecture by means of ADLs is rather awkward. There is often no need to create a microarchitectural description (especially a structure-oriented one). All information the generator should know can be expressed with a high-level configuration of microprocessor subsystems. A good example is a cache memory. As structural description of a typical cache unit consists of several thousand lines of code (LOC), its configuration can be set by five parameters only: (1) size, (2) associativity, (3) an index calculation function, (4) a tag calculation function, and (5) policy of data displacement. Moreover, if we are speaking about data displacement, only a few strategies are in general use (LRU, pseudo LRU, and some others). Thus, it is possible to create a library of typical configurations and choose an appropriate one when tuning the generator. So, the idea is to describe microarchitectural properties using subsystem-specific parameters. This concept is illustrated in Fig. 3. p=. !rtpg.png! p=. *Figure 3. General structure of a reconfigurable test program generator* Another problem being solved by the reconfigurable TPG is _automated test template generation_ (of course, the possibility to develop templates manually remains as before). There is an evident dilemma here. To be able to generate test templates providing high-quality verification, accurate models are needed, but their creation is labor-consuming and thereby contradicts the concept of easy configuration. The dilemma is usually solved as follows. In early design stages, simple models coupled with the light-weight test generation techniques are used. As soon as the project becomes more mature, one can apply more accurate models and more directed tests. Concluding the section, let us outline the basic requirements for the reconfigurable TPG. First, there should be a possibility to quickly change the microprocessor instruction set (add new instructions, modify the existing ones, and so on). Second, a user should be able to easily change the microarchitectural properties of the design (cache parameters, address translation algorithm, etc.). Third, a reconfigurable TPG tool should implements facilities for adding new test situations for single instructions and for groups of instructions. Finally, it should be able to generate test programs automatically according to high-level parameters. These problems have not been solved by the existing tools, and they are the main subject of our work. h2. Configuration of Microprocessor Subsystems In this section, we consider the suggested method for developing reconfigurable test program generators. A target design''s architecture is described in one of the supported ADLs (the present version of the tool supports only *nML/Sim-nML*; the list of the supported languages will be extended in the future). A microprocessor description focuses on general functionality of the instructions not considering the design''s microarchitecture. Sim-nML code specifying the integer addition instruction (@ADD@) from the MIPS ISA is shown below. <pre> op ADD(rd: GPR, rs: GPR, rt: GPR) action = { if(NotWordValue(rs) || NotWordValue(rt)) then UNPREDICTABLE(); endif; tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>; if(tmp_word<32..32> != tmp_word<31..31>) then SignalException("IntegerOverflow"); else rd = sign_extend(tmp_word<31..0>); endif; } syntax = format("add %s, %s, %s", rd.syntax, rs.syntax, rt.syntax) op ALU = ADD | SUB | ... } </pre> This description is rather compact and almost replicates the instruction''s specification in the MIPS manual. Several particular features should be emphasized. First, ADL specifications can use predefined functions, like @UNPREDICTABLE@, which indicates the situations, where the design''s behavior is undefined (such situations should not be created in test programs). Second, by analyzing control flow one can automatically extract test coverage models for individual instructions. In the example above, there are two basic situations for the @ADD@ instruction: <pre> IntegerOverflow: ... tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>; ASSERT(tmp_word<32..32> != tmp_word<31..31>); Default: ... tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>; ASSERT(tmp_word<32..32> == tmp_word<31..31>); </pre> Third, it is possible to examine potential dependencies via registers by determining which of the instructions use the registers of the same type. Fourth, instructions can be grouped into hierarchical classes providing a generator and a verification engineer with useful information to be utilized within test templates. Finally, basing on such specifications, a generator is able to predict the correct result of the test program execution. From the practical point of view, the things stated above mean that as soon as the microprocessor ISA is specified, without any additional development one can use test templates of the following kind (of course, they can be complicated by using the sequencing-control statements): <pre> ALU ?, ?, ? ADD R, ?, ? @ Default SUB ?, R, ? @ IntegerOverflow </pre> The first line of the template produces any instruction from the @ALU@ class (@ADD@, @SUB@, @MUL@, @DIV@ or some other). Since the test situation for the instruction is unspecified, it might be any valid situation. The second line produces an @ADD@ instruction being executed normally. The third one produces a @SUB@ instruction causing the integer overflow exception. There is also a register dependency between the second @ADD@ and the third @SUB@. In spite of shallow nature of the behavioral specifications, they might however be useful. There are three types of tests that can be generated fully automatically: # _tests for single instructions_ (covering all test situations for all instructions); # _combinatorial tests_ (enumerating short sequences of instructions, test situations for them and dependencies between them); # _random tests_ (producing random but valid sequences of instructions). These facilities suffice for many verification tasks. If more directed test programs are needed, one should provide an additional description of the microprocessor subsystems. To configure the generator, we suggest using _subsystem-specific languages_ instead of the universal structure-oriented ADLs. It significantly eases the generator tuning and simplifies analysis of specifications. h3. Memory Management Unit Configuration of a _memory management unit_ (_MMU_) touches upon such devices as a _cache memory_ (_L1_ and _L2_), _translation lookaside buffer_ (_TLB_), _main memory_ and, probably, some others. The main goal of the MMU configuration is to enable the generator to process memory-related events and complex data dependencies in test templates. These events include _buffers hits/misses_ and _memory exceptions_. The data dependencies take into account structural properties of the memory devices, for example, there can be such dependencies as _hit into the same cache set_ or the like. A typical memory buffer (cache or TLB) is modeled as an _array_ consisting of _sets_ of _lines_, where a line can be a structure comprising several bit vectors, so-called _fields_. For example, a TLB entry of a MIPS microprocessor consists of the following fields: @R@, @VPN2@, @MASK@, @G@, @ASID@, @PFN0@, @V0@, @C0@, @D0@, @PFN1@, @V1@, @C1@ and @D1@. A _fully-associative_ device contains a single set of lines; a _direct-mapped_ one is an array of single-line sets. Each device stores some _data_ which can be accessed (read from or written into) by their _address_. If a device contains a line with a given address, this situation is called a _hit_; the opposite situation is refered to as a _miss_. If a miss occurs, the device usually displaces one of the set''s lines with the line associated with the address given. All memory devices are configured with the following set of parameters: _sets_ (associativity, i.e. set size), _lines_ (number of sets), _line_ (optional description of a line''s fields), _index_ (function for index calculation), _match_ (predicate checking whether a line and an address match each other) and _policy_ (data displacement strategy). Here is an example: <pre> buffer L1 { sets = 4; lines = 128; line = (tag:30, data:256); index(addr:PA) = addr<9..8>; match(addr:PA) = line.tag == addr<39..10>; policy = lru; } </pre> The configuration above describes a 4-way set associative buffer (see the parameter @sets@) consisting of 128 sets (@lines@). Each line of the cache contains a 30-bit tag and 256-bit data (@line@). When accessing data, the cache determines a set by calculating a 2-bit index (@index@). If a miss occurs, i.e. the set does not contain a line with the tag equal to the 30 upper bits of the physical address (@match@), the cache displaces the _least-recently-used_ (_LRU_) line of the set (@policy@). When all the memory devices are configured, the general memory access algorithms should be defined. Let us consider the MIPS''s instruction @LD@, which loads a 64-bit double word from the main memory into the general-purpose register (GPR). Its Sim-nML specification written according to the ISA manual looks as follows: <pre> vAddr := sign_extend(offset) + base; if(vAddr<2..0> != 0) then SignalException("AddressError"); endif pAddr::CCA := AddressTranslation(vAddr, DATA, LOAD); memdoubleword := LoadMemory(CCA, DWORD, pAddr, vAddr, DATA); rt := memdoubleword; </pre> If the virtual address @vAddr@ is unaligned, then the address error exception is raised. Otherwise, the virtual address is translated into the physical one (@pAddr@), and cache coherence algorithm (@CCA@) is get. After that, a 64-bit double word (@memdoubleword@) is loaded from the main memory and then written into the general-purpose register @rt@. Translation of a virtual address is modeled as the invocation of the @AddressTranslation@ function. Reading from the main memory is described with the help of the @LoadMemory@ function. In terms of the example, we need to concretize these two functions. Special operators are proposed for this purpose: * @hit<Buffer>(Address) {[loaded:Data1[;]][storing:Data2]}@ states that an instruction accesses @Buffer@ using @Address@, and a hit occurs; @Data1@ are data loaded from the corresponding line of the device (read operation), and @Data2@ are data to be stored in the line (write operation); * @miss<Buffer>(Address){[replacing:Data]}$ states that an instruction accesses @Buffer@ using @Address@ and a miss occurs; @Data@ are data that replace the data stored in the corresponding line of the device. Analysing the specifications of the memory access algorithms, the generator automatically constructs the set of memory-related test situations. A bit simplified fragment of the MIPS address translation specification corresponding to a TLB hit and a L1 hit is given below: <pre> vAddr := sign_extend(offset) + base; if(hit<TLB>(vAddr){loaded:PFN}) then pAddr := PFN::vAddr<12..0>; if(hit<L1>(pAddr){loaded:memdoubleword}) then rt := memdoubleword; ... endif ... endif </pre> Here is an example demonstrating the basic facilities of the TPG tool on specifying memory-related test situations in test templates (@SD@ is an instruction that stores a 64-bit double word into the main memory). <pre> LD ?, ?(?) @ hit<TLB>{loaded:PFN1} && miss<L1> && hit<L2> ... SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2 && hit<L1> </pre> h3. Pipeline Control Unit Configuration of a _pipeline control unit_ (_PCU_) is targeted at automated generation of test templates covering pipeline hazards including _exceptions_, _data hazards_, _structural hazards_ and _control hazards_. The latter are described more in detail in the next section. Configuration describes a static structure of the microprocessor pipeline as well as its dynamic behavior. Description of the classical 5-stages RISC pipeline looks as follows. <pre> pipeline() { IF(I); // Fetch ID(I); // Decode EX(I); // Execute MEM(I); // Memory access WR(I); // Writeback } </pre> It starts with the scope @pipeline@ that contains top-level @units@ (so-called, _pipe stages_) participating in the instruction processing. Each unit takes a single input/output parameter, which is an instruction being processed. It should be said that a unit may include nested subunits and use the usual conditional statements (@if@, @switch@, @while@, etc.) to describe the pipeline control flow. A simplified example is given below. <pre> EX(I: Instruction) { switch(I.type) { case ALU: EX_ALU(I); case FPU: EX_FPU(I); ... }} } </pre> Two items should be mentioned here. First, each unit can specify a set of exceptions it may raise (nested units are allowed to extend the set by specific exceptions). Second, each calling sequence should contain at least one operator @DELAY@ used to specify how many cycles a certain operation (or microoperation) takes. Here is an example illustrating the above mentioned things. <pre> FPU(I: Instruction) throws Inexact, Overflow, Underflow { DELAY(10..15); } </pre> Exceptions and delays provides the TPG tool with information that can be used for automated generation of test templates aimed at covering _simultaneous exceptions_ (i.e. exceptions being raised at the same cycle or close to each other) and _structural hazards_ (i.e. situations when two or more instructions tries to access the same unit of the microprocessor). When analysing the pipeline configuration, the generator extracts the _basic templates_ covering all single hazards. Such templates are usually very simple ones; for example, a basic template for the structural hazard via the floating point unit (FPU) looks as follows. <pre> $PreInstructions FPU ?, ?, ? // The first FPU access $InnerInstructions FPU ?, ?, ? // The second FPU access $PostInstructions </pre> As soon as the basic templates are extracted, it is possible to generate tests programs covering all the single hazards. The generator creates all possible combinations of the conflicting instructions randomizing their environment (in the example above, @$PreInstructions@, @$InnerInstructions@ and @$PostInstructions@). The main restriction here is that an environment should not cancel the target hazard (for example, it should not access the unit shared by the conflicting instructions). A more advanced technique for test program generation uses _composition_ of the basic templates. There are several types of composition operators including _overlapping_ (@T1 || T2@), _shift_ (@T1 >> T2@), _nesting_ (@T1[T2]@) and _concatenation_ (@T1 . T2@). The generator applies these operators to the basic templates constructing various syntactical structures. The approach allows testing rather complex situations in the microprocessor behavior. Application of the composition operators to basic templates is shown in the example below (@T1[T2 >> T3] . T4@). <pre> // Template T1 (Memory Dependency) LD ?, ?(?) @ hit<TLB>{loaded:PFN1} // Template T2 (Register Dependency) ADD R, ?, ? // Template T3 (Structural Hazard) DIV.S ?, ?, ? SUB ?, R, ? DIV.D ?, ?, ? SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2 // Template T4 (Exception) FPU ?, ?, ? @ Overflow </pre> h3. Branch Processing Unit ADL-based instruction set specifications provide enough information for generating branching programs not getting caught in an endless loop. TPG is performed by enumeration (or random construction) of various _control flow graphs_ (_CFG_) and different _execution traces_ for those graphs. Enumerating execution traces for a given CFG is based on the bounded depth-first exploration of the branch execution tree. To guarantee a test to be executed according to a trace given, auxiliary instructions (so-called _control instructions_) are automatically inserted into the test program. Those instructions change the registers used by the branch instructions so as to fulfill the conditions specified in the trace. The template below contains a branch instruction @BEQ@ (branch if equal) which is executed two times (when it is executed for the first time, the branch condition is true; then, the condition becomes false, and test execution is finished). The control instruction (@ADDI@) increments one of the registers used by the branch instruction (the initial values of the registers are equal). <pre> INIT: ORI R1, R0, 2011 ORI R2, R0, 2011 // R1 = R2 ... J START NOP ... BACK: ADDI R1, R1, 1 // Control Instruction FPU ?, ?, ? START: BEQ R1, R2, BACK @ trace = {true, false} LD ?, ?(?) STOP: </pre> Speaking about configuration of a _branch processing unit_ (_BPU_), we recognize two levels (namely, simple configuration and advanced configuration). The first of them is simply setting a number of _branch delay slots_ (i.e. instructions located immediately after a branch instruction, which are executed even if the preceding branch is taken). Sometimes, this information can be derived from the specifications, but sometimes it is absent. Advanced configuration describes more sophisticated mechanisms, like _branch prediction_. _~Based on the paper Alexander Kamkin, Eugene Kornykhin A.Kamkin, E.Kornykhin and Dmitry Vorobyev. D.Vorobyev. "Reconfigurable Model-Based Test Program Generator for Microprocessors", A-MOST 2011.~_