Project

General

Profile

Basic Concepts » History » Version 7

Alexander Kamkin, 02/22/2013 10:56 AM

1 3 Alexander Kamkin
h1. Basic Concepts of Reconfigurable Test Program Generation
2 1 Alexander Kamkin
3 2 Alexander Kamkin
{{toc}}
4 1 Alexander Kamkin
5 2 Alexander Kamkin
h2. Introduction
6 1 Alexander Kamkin
7 2 Alexander Kamkin
Nowadays, a large number of microprocessor devices are designed and produced all round the world. Many of them are based on standard _instruction set architectures_ (_ISA_), but there are also designs with the unique instruction sets and designs that significantly extend the existing standards. A frequently used approach to verify such kind of microprocessors is to develop testing tools from scratch without using the components developed previously. However, it does not work in the modern economy with the ever-shrinking time to market and the exponential growth in hardware complexity.
8 1 Alexander Kamkin
9 2 Alexander Kamkin
It is well known that _test program generation_ (_TPG_) is the central means for verifying microprocessors and other programmable devices at the core or full-chip level. A _test program_ is a valid sequence of instructions that solves a specific verification task (in other words, creates a certain set of interactions within the target design and, possibly, checks the correctness of the design''s state at the end of execution). A tool that creates test programs for a given microprocessor architecture in an automated way is referred to as a _generator_ or _TPG tool_.
10 1 Alexander Kamkin
11 2 Alexander Kamkin
Here, we describe a concept of a _reconfigurable test program generator_ "MicroTESK":http://forge.ispras.ru/projects/microtesk, which has a platform-independent generation core and is configured by modeling the target ISA and basic microarchitectural features. The idea is not new and has already been implemented in the industrial TPG tools such as *Genesys-Pro* (IBM Research Lab) and *RAVEN* (Obsidian Software/ARM). However, tuning the generators for concrete architectures (or even microarchitectures) is still a difficult task that is carried out by special staff only (namely, _modeling engineers_).
12
13
Our goal is to reach a new level in the TPG configuration flexibility. In order to do it, we suggest applying so-called _architecture description languages_ (_ADL_), which are commonly used for developing microprocessor simulators. In the approach suggested, a microprocessor ISA is described in some ADL and then automatically translated into a _microprocessor model_ together with a _test coverage model_. Fine tuning of the generator is performed by the subsystem-specific configuration files. Such an approach is especially useful in the early design stages when the project is frequently modified and rapid generator reconfiguration is required.
14
15
h2. Reconfigurable Test Program Generation
16
17
The key idea of the reconfigurable TPG is that configuration of a generator for a concrete microprocessor architecture is implemented as easy as possible. It is obvious that to fulfill this requirement, a generator should be divided into two parts: (1) a platform-independent component (so-called _core_ or _engine_) and (2) a component describing all platform-specific features (_model_ or _configuration_). Such kinds of TPG tools are usually called _model-based test program generators_. This is the first and probably the most important step the generators have made during their evolution. Let us consider the model-based approach a bit more in detail using Genesys-Pro as an example (see Fig. 1).
18
19
>> Figure 1. General structure of a model-based test program generator
20
21
A verification engineer provides a _test template_ of the scenario that should occur in the test program. A TPG tool generates a program by formulating and then solving constraints for each instruction of the test template or for groups of instructions. It also randomizes values of the unspecified parameters to produce different test program variations for the template given. The constraints are formulated in terms of the model, which consists of the _microprocessor model_ and _test coverage model_. The first of them defines syntax and semantics of the target design''s instructions, while the second one describes special conditions (known as _test situations_) to be achieved when verifying the design.
22
23
After formulating the CSP, the generation core solves it by using external or built-in solvers and assigns the values to the instructions'' operands. Then, it interprets the instructions under processing (for example, by using a _microprocessor simulator_) and takes the next group of instructions. Having performance limitations, TPG tools generally do not process test templates as a whole. Instead, they split them up into small pieces and formulate several CSPs. It is clear that such a technique can fail to find an exact solution of the joint CSP even when one is available, but it works well for test programs with low-density dependencies between instructions.
24
25
Coming back to the topic, what is the main distinction of the _reconfigurable_ TPG from the conventional model-based TPG? Tuning of a generator (in other words, creation of a microprocessor model together with a test coverage model) should be as much flexible and automated as it can be done by an ordinary verification engineer, not only by a trained modeling specialist. The important technological requirement for reconfigurable tools is usage of well-known or easy-to-learn languages, for example, ADLs (nML, ISDL, EXPRESSION, etc.) to represent a model. One of the possible solutions is implemented in a research prototype MA2TG (National University of Defense Technology, China).
26
27
>> Figure 2. General structure of an ADL-based test program generator
28
29
The main contributions of the scheme (see Fig. 2) are as follows. First, it eases microprocessor modeling by using ADL specifications. Second, it automates test coverage extraction via static analysis of the ADL code. Third, it allows constructing a microprocessor simulator automatically. Being rather flexible, the ADL-based approach is however not exactly what we mean by _reconfigurable_. Specification of microarchitecture by means of ADLs is rather awkward. There is often no need to create a microarchitectural description (especially a structure-oriented one). All information the generator should know can be expressed with a high-level configuration of microprocessor subsystems.
30
31
A good example is a cache memory. As structural description of a typical cache unit consists of several thousand lines of code (LOC), its configuration can be set by five parameters only: (1) size, (2) associativity, (3) an index calculation function, (4) a tag calculation function, and (5) policy of data displacement. Moreover, if we are speaking about data displacement, only a few strategies are in general use (LRU, pseudo LRU, and some others). Thus, it is possible to create a library of typical configurations and choose an appropriate one when tuning the generator. So, the idea is to describe microarchitectural properties using subsystem-specific parameters. This concept is illustrated in Fig. 3.
32
33
>> Figure 3. General structure of a reconfigurable test program generator
34
35
Another problem being solved by the reconfigurable TPG is _automated test template generation_ (of course, the possibility to develop templates manually remains as before). There is an evident dilemma here. To be able to generate test templates providing high-quality verification, accurate models are needed, but their creation is labor-consuming and thereby contradicts the concept of easy configuration. The dilemma is usually solved as follows. In early design stages, simple models coupled with the light-weight test generation techniques are used. As soon as the project becomes more mature, one can apply more accurate models and more directed tests.
36
37
Concluding the section, let us outline the basic requirements for the reconfigurable TPG. First, there should be a possibility to quickly change the microprocessor instruction set (add new instructions, modify the existing ones, and so on). Second, a user should be able to easily change the microarchitectural properties of the design (cache parameters, address translation algorithm, etc.). Third, a reconfigurable TPG tool should implements facilities for adding new test situations for single instructions and for groups of instructions. Finally, it should be able to generate test programs automatically according to high-level parameters. These problems have not been solved by the existing tools, and they are the main subject of our work.
38
39
h2. Configuration of Microprocessor Subsystems
40
41
In this section, we consider the suggested method for developing reconfigurable test program generators. A target design''s architecture is described in one of the supported ADLs (the present version of the tool supports only *nML/Sim-nML*; the list of the supported languages will be extended in the future). A microprocessor description focuses on general functionality of the instructions not considering the design''s microarchitecture. Sim-nML code specifying the integer addition instruction (@ADD@) from the MIPS ISA is shown below.
42
43
<pre>
44
     op ADD(rd: GPR, rs: GPR, rt: GPR)
45
     action = {
46
         if(NotWordValue(rs) || NotWordValue(rt))
47
         then
48
             UNPREDICTABLE();
49
         endif;
50 6 Alexander Kamkin
51
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
52
53 2 Alexander Kamkin
         if(tmp_word<32..32> != tmp_word<31..31>)
54
         then
55
             SignalException("IntegerOverflow");
56
         else
57
             rd = sign_extend(tmp_word<31..0>);
58
         endif;
59
     }
60
     
61 7 Alexander Kamkin
     syntax = format("add %s, %s, %s", rd.syntax, rs.syntax, rt.syntax)
62 2 Alexander Kamkin
     
63
     op ALU = ADD | SUB | ...
64
}
65
</pre>
66
67
This description is rather compact and almost replicates the instruction''s specification in the *MIPS* manual. Several particular features should be emphasized. First, ADL specifications can use predefined functions, like @UNPREDICTABLE@, which indicates the situations, where the design''s behavior is undefined (such situations should not be created in test programs). Second, by analyzing control flow one can automatically extract test coverage models for individual instructions. In the example above, there are two basic situations for the @ADD@ instruction:
68
69
<pre>
70
     IntegerOverflow:
71
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
72
         ASSERT(tmp_word<32..32> != tmp_word<31..31>);
73
74
     Default:
75
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
76
         ASSERT(tmp_word<32..32> == tmp_word<31..31>);
77
</pre>
78
79
Third, it is possible to examine potential dependencies via registers by determining which of the instructions use the registers of the same type. Fourth, instructions can be grouped into hierarchical classes providing a generator and a verification engineer with useful information to be utilized within test templates. Finally, basing on such specifications, a generator is able to predict the correct result of the test program execution. From the practical point of view, the things stated above mean that as soon as the microprocessor ISA is specified, without any additional development one can use test templates of the following kind (of course, they can be complicated by using the sequencing-control statements):
80
81
<pre>
82
         ALU ?, ?, ?
83
         ADD R, ?, ? @ Default
84
         SUB ?, R, ? @ IntegerOverflow
85
</pre>
86
87
The first line of the template produces any instruction from the @ALU@ class (@ADD@, @SUB@, @MUL@, @DIV@ or some other). Since the test situation for the instruction is unspecified, it might be any valid situation. The second line produces an @ADD@ instruction being executed normally. The third one produces a @SUB@ instruction causing the integer overflow exception. There is also a register dependency between the second @ADD@ and the third @SUB@. In spite of shallow nature of the behavioral specifications, they might however be useful. There are three types of tests that can be generated fully automatically:
88
89
# _tests for single instructions_ (covering all test situations for all instructions);
90
# _combinatorial tests_ (enumerating short sequences of instructions, test situations for them and dependencies between them);
91
# _random tests_ (producing random but valid sequences of instructions).
92
93
These facilities suffice for many verification tasks. If more directed test programs are needed, one should provide an additional description of the microprocessor subsystems. To configure the generator, we suggest using _subsystem-specific languages_ instead of the universal structure-oriented ADLs. It significantly eases the generator tuning and simplifies analysis of specifications.
94
95
h3. Memory Management Unit
96
97
Configuration of a _memory management unit_ (_MMU_) touches upon such devices as a _cache memory_ (_L1_ and _L2_), _translation lookaside buffer_ (_TLB_), _main memory_ and, probably, some others. The main goal of the MMU configuration is to enable the generator to process memory-related events and complex data dependencies in test templates. These events include _buffers hits/misses_ and _memory exceptions_. The data dependencies take into account structural properties of the memory devices, for example, there can be such dependencies as _hit into the same cache set_ or the like.
98
99
A typical memory buffer (cache or TLB) is modeled as an _array_ consisting of _sets_ of _lines_, where a line can be a structure comprising several bit vectors, so-called _fields_. For example, a TLB entry of a MIPS microprocessor consists of the following fields: @R@, @VPN2@, @MASK@, @G@, @ASID@, @PFN0@, @V0@, @C0@, @D0@, @PFN1@, @V1@, @C1@ and @D1@. A _fully-associative_ device contains a single set of lines; a _direct-mapped_ one is an array of single-line sets. Each device stores some _data_ which can be accessed (read from or written into) by their _address_. If a device contains a line with a given address, this situation is called a _hit_; the opposite situation is refered to as a _miss_. If a miss occurs, the device usually displaces one of the set''s lines with the line associated with the address given.
100
101
All memory devices are configured with the following set of parameters: _set_ (associativity, i.e. set size), _length_ (number of sets), _line_ (optional description of a line''s fields), _index_ (function for index calculation), _match_ (predicate checking whether a line and an address match each other) and _policy_ (data displacement strategy). Here is an example:
102
103
<pre>
104
     buffer L1 {
105
         set = 4;
106
         length = 128;
107
         policy = LRU;
108
         line(tag:27, data:32);
109
         index(addr:36) = (addr[8:2]);
110
         match(addr:36) = (addr[35:9] == tag);
111
     }
112
</pre>
113
114
The configuration above describes a 4-way set associative buffer (see the parameter @set@) consisting of 128 sets (@length@). Each line of the cache contains a 27-bit tag and 32-bit data (@line@). When accessing data, the cache determines a set by calculating a 7-bit index (@index@). If a miss occurs, i.e. the set does not contain a line with the tag equal to the 27 upper bits of the physical address (@match@), the cache displaces the _least-recently-used_ (_LRU_) line of the set (@policy@).
115
116
When all the memory devices are configured, the general memory access algorithms should be defined. Let us consider the MIPS''s instruction @LD@, which loads a 64-bit double word from the main memory into the general-purpose register (GPR). Its Sim-nML specification written according to the ISA manual looks as follows:
117
118
<pre>
119
     vAddr := sign_extend(offset) + base;
120
     
121
     if(vAddr<2..0> != 0)
122
     then
123
         SignalException("AddressError");
124
     endif
125
126
     pAddr::CCA := AddressTranslation(vAddr, DATA, LOAD);
127
     memdoubleword := LoadMemory(CCA, DWORD, pAddr, vAddr, DATA);
128
129
     rt := memdoubleword;
130
</pre>
131
132
If the virtual address @vAddr@ is unaligned, then the address error exception is raised. Otherwise, the virtual address is translated into the physical one (@pAddr@), and cache coherence algorithm (@CCA@) is get. After that, a 64-bit double word (@memdoubleword@) is loaded from the main memory and then written into the general-purpose register @rt@. Translation of a virtual address is modeled as the invocation of the @AddressTranslation@ function. Reading from the main memory is described with the help of the @LoadMemory@ function.
133
134
In terms of the example, we need to concretize these two functions. Special operators are proposed for this purpose: 
135
136
* @hit<Buffer>(Address) {[loaded:Data1[;]][storing:Data2]}@
137
states that an instruction accesses @Buffer@ using @Address@, and a hit occurs; @Data1@ are data loaded from the corresponding line of the device (read operation), and @Data2@ are data to be stored in the line (write operation);
138
* @miss<Buffer>(Address){[replacing:Data]}$
139
states that an instruction accesses @Buffer@ using @Address@ and a miss occurs; @Data@ are data that replace the data stored in the corresponding line of the device.
140
141
Analysing the specifications of the memory access algorithms, the generator automatically constructs the set of memory-related test situations. A bit simplified fragment of the MIPS address translation specification corresponding to a TLB hit and a L1 hit is given below:
142
143
<pre>
144
     vAddr := sign_extend(offset) + base;
145
     if(hit<TLB>(vAddr){loaded:PFN})
146
     then
147
         pAddr := PFN::vAddr<12..0>;
148
         if(hit<L1>(pAddr){loaded:memdoubleword})
149
         then
150
             rt := memdoubleword;
151
             ...
152
         endif
153
         ...
154
     endif
155
</pre>
156
157
Here is an example demonstrating the basic facilities of the TPG tool on specifying memory-related test situations in test templates (@SD@ is an instruction that stores a 64-bit double word into the main memory).
158
159
<pre>
160
     LD ?, ?(?) @ hit<TLB>{loaded:PFN1} && miss<L1> && hit<L2>
161
     ...
162
     SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2 && hit<L1>
163
</pre>
164
165
h3. Pipeline Control Unit
166
167
Configuration of a _pipeline control unit_ (_PCU_) is targeted at automated generation of test templates covering pipeline hazards including _exceptions_, _data hazards_, _structural hazards_ and _control hazards_. The latter are described more in detail in the next section. Configuration describes a static structure of the microprocessor pipeline as well as its dynamic behavior. Description of the classical 5-stages RISC pipeline looks as follows.
168
169
<pre>
170
         pipeline() {
171
             IF(I);  // Fetch
172
             ID(I);  // Decode
173
             EX(I);  // Execute
174
             MEM(I); // Memory access
175
             WR(I);  // Writeback
176
         }
177
</pre>
178
179
It starts with the scope @pipeline@ that contains top-level @units@ (so-called, _pipe stages_) participating in the instruction processing. Each unit takes a single input/output parameter, which is an instruction being processed. It should be said that a unit may include nested subunits and use the usual conditional statements (@if@, @switch@, @while@, etc.) to describe the pipeline control flow. A simplified example is given below.
180
181
<pre>
182
         EX(I: Instruction) {
183
             switch(I.type) {
184
             case ALU: EX_ALU(I);
185
             case FPU: EX_FPU(I);
186
             ...
187
         }}
188
}
189
</pre>
190
191
Two items should be mentioned here. First, each unit can specify a set of exceptions it may raise (nested units are allowed to extend the set by specific exceptions). Second, each calling sequence should contain at least one operator @DELAY@ used to specify how many cycles a certain operation (or microoperation) takes. Here is an example illustrating the above mentioned things.
192
193
<pre>
194
         FPU(I: Instruction) throws
195
             Inexact, Overflow, Underflow {
196
             DELAY(10..15);
197
         }
198
</pre>
199
200
Exceptions and delays provides the TPG tool with information that can be used for automated generation of test templates aimed at covering _simultaneous exceptions_ (i.e. exceptions being raised at the same cycle or close to each other) and _structural hazards_ (i.e. situations when two or more instructions tries to access the same unit of the microprocessor). When analysing the pipeline configuration, the generator extracts the _basic templates_ covering all single hazards. Such templates are usually very simple ones; for example, a basic template for the structural hazard via the floating point unit (FPU) looks as follows.
201
202
<pre>
203
         $PreInstructions
204
         FPU ?, ?, ? // The first FPU access
205
         $InnerInstructions
206
         FPU ?, ?, ? // The second FPU access
207
         $PostInstructions
208
</pre>
209
210
As soon as the basic templates are extracted, it is possible to generate tests programs covering all the single hazards. The generator creates all possible combinations of the conflicting instructions randomizing their environment (in the example above, @$PreInstructions@, @$InnerInstructions@ and @$PostInstructions@). The main restriction here is that an environment should not cancel the target hazard (for example, it should not access the unit shared by the conflicting instructions).
211
212
A more advanced technique for test program generation uses _composition_ of the basic templates. There are several types of composition operators including _overlapping_ (@T1 || T2@), _shift_ (@T1 >> T2@), _nesting_ (@T1[T2]@) and _concatenation_ (@T1 . T2@). The generator applies these operators to the basic templates constructing various syntactical structures. The approach allows testing rather complex situations in the microprocessor behavior. Application of the composition operators to basic templates is shown in the example below (@T1[T2 >> T3] . T4@).
213
214
<pre>
215
         // Template T1 (Memory Dependency)
216
         LD ?, ?(?) @ hit<TLB>{loaded:PFN1}
217
                 // Template T2 (Register Dependency)
218
                 ADD R, ?, ?
219
                         // Template T3 (Structural Hazard)
220
                         DIV.S ?, ?, ?
221
                 SUB ?, R, ?
222
                         DIV.D ?, ?, ?
223
         SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2
224
         // Template T4 (Exception)
225
         FPU ?, ?, ? @ Overflow
226
</pre>
227
228
h3. Branch Processing Unit
229
230
ADL-based instruction set specifications provide enough information for generating branching programs not getting caught in an endless loop. TPG is performed by enumeration (or random construction) of various _control flow graphs_ (_CFG_) and different _execution traces_ for those graphs. Enumerating execution traces for a given CFG is based on the bounded depth-first exploration of the branch execution tree. To guarantee a test to be executed according to a trace given, auxiliary instructions (so-called _control instructions_) are automatically inserted into the test program. Those instructions change the registers used by the branch instructions so as to fulfill the conditions specified in the trace.
231
232
The template below contains a branch instruction @BEQ@ (branch if equal) which is executed two times (when it is executed for the first time, the branch condition is true; then, the condition becomes false, and test execution is finished). The control instruction (@ADDI@) increments one of the registers used by the branch instruction (the initial values of the registers are equal).
233
234
<pre>
235
     INIT:
236
         ORI R1, R0, 2011
237
         ORI R2, R0, 2011 // R1 = R2
238
         ...
239
         J START
240
         NOP
241
         ...
242
     BACK:
243
         ADDI R1, R1, 1 // Control Instruction
244
         FPU  ?, ?, ?
245
     START:
246
         BEQ R1, R2, BACK @ trace = {true, false}
247
         LD ?, ?(?)
248
     STOP:
249
</pre>
250
251
Speaking about configuration of a _branch processing unit_ (_BPU_), we recognize two levels (namely, simple configuration and advanced configuration). The first of them is simply setting a number of _branch delay slots_ (i.e. instructions located immediately after a branch instruction, which are executed even if the preceding branch is taken). Sometimes, this information can be derived from the specifications, but sometimes it is absent. Advanced configuration describes more sophisticated mechanisms, like _branch prediction_.
252
253 5 Alexander Kamkin
_Based on the paper A.Kamkin, E.Kornykhin and D.Vorobyev. "Reconfigurable Model-Based Test Program Generator for Microprocessors", A-MOST 2011._