Project

General

Profile

Basic Concepts » History » Version 20

Alexander Kamkin, 03/07/2013 08:04 AM

1 10 Alexander Kamkin
h1. Basic Concepts of Reconfigurable Test Program Generation
2 1 Alexander Kamkin
3 2 Alexander Kamkin
{{toc}}
4 1 Alexander Kamkin
5 2 Alexander Kamkin
h2. Introduction
6 1 Alexander Kamkin
7 2 Alexander Kamkin
Nowadays, a large number of microprocessor devices are designed and produced all round the world. Many of them are based on standard _instruction set architectures_ (_ISA_), but there are also designs with the unique instruction sets and designs that significantly extend the existing standards. A frequently used approach to verify such kind of microprocessors is to develop testing tools from scratch without using the components developed previously. However, it does not work in the modern economy with the ever-shrinking time to market and the exponential growth in hardware complexity.
8 1 Alexander Kamkin
9 2 Alexander Kamkin
It is well known that _test program generation_ (_TPG_) is the central means for verifying microprocessors and other programmable devices at the core or full-chip level. A _test program_ is a valid sequence of instructions that solves a specific verification task (in other words, creates a certain set of interactions within the target design and, possibly, checks the correctness of the design''s state at the end of execution). A tool that creates test programs for a given microprocessor architecture in an automated way is referred to as a _generator_ or _TPG tool_.
10 1 Alexander Kamkin
11 2 Alexander Kamkin
Here, we describe a concept of a _reconfigurable test program generator_ "MicroTESK":http://forge.ispras.ru/projects/microtesk, which has a platform-independent generation core and is configured by modeling the target ISA and basic microarchitectural features. The idea is not new and has already been implemented in the industrial TPG tools such as *Genesys-Pro* (IBM Research Lab) and *RAVEN* (Obsidian Software/ARM). However, tuning the generators for concrete architectures (or even microarchitectures) is still a difficult task that is carried out by special staff only (namely, _modeling engineers_).
12
13
Our goal is to reach a new level in the TPG configuration flexibility. In order to do it, we suggest applying so-called _architecture description languages_ (_ADL_), which are commonly used for developing microprocessor simulators. In the approach suggested, a microprocessor ISA is described in some ADL and then automatically translated into a _microprocessor model_ together with a _test coverage model_. Fine tuning of the generator is performed by the subsystem-specific configuration files. Such an approach is especially useful in the early design stages when the project is frequently modified and rapid generator reconfiguration is required.
14
15
h2. Reconfigurable Test Program Generation
16
17
The key idea of the reconfigurable TPG is that configuration of a generator for a concrete microprocessor architecture is implemented as easy as possible. It is obvious that to fulfill this requirement, a generator should be divided into two parts: (1) a platform-independent component (so-called _core_ or _engine_) and (2) a component describing all platform-specific features (_model_ or _configuration_). Such kinds of TPG tools are usually called _model-based test program generators_. This is the first and probably the most important step the generators have made during their evolution. Let us consider the model-based approach a bit more in detail using Genesys-Pro as an example (see Fig. 1).
18
19 14 Alexander Kamkin
p=. !mbtpg.png!
20 13 Alexander Kamkin
21 17 Alexander Kamkin
p=. *Figure 1. General structure of a model-based test program generator*
22 2 Alexander Kamkin
23
A verification engineer provides a _test template_ of the scenario that should occur in the test program. A TPG tool generates a program by formulating and then solving constraints for each instruction of the test template or for groups of instructions. It also randomizes values of the unspecified parameters to produce different test program variations for the template given. The constraints are formulated in terms of the model, which consists of the _microprocessor model_ and _test coverage model_. The first of them defines syntax and semantics of the target design''s instructions, while the second one describes special conditions (known as _test situations_) to be achieved when verifying the design.
24
25
After formulating the CSP, the generation core solves it by using external or built-in solvers and assigns the values to the instructions'' operands. Then, it interprets the instructions under processing (for example, by using a _microprocessor simulator_) and takes the next group of instructions. Having performance limitations, TPG tools generally do not process test templates as a whole. Instead, they split them up into small pieces and formulate several CSPs. It is clear that such a technique can fail to find an exact solution of the joint CSP even when one is available, but it works well for test programs with low-density dependencies between instructions.
26
27 16 Alexander Kamkin
Coming back to the topic, what is the main distinction of the _reconfigurable_ TPG from the conventional model-based TPG? Tuning of a generator (in other words, creation of a microprocessor model together with a test coverage model) should be as much flexible and automated as it can be done by an ordinary verification engineer, not only by a trained modeling specialist. The important technological requirement for reconfigurable tools is usage of well-known or easy-to-learn languages, for example, ADLs (nML, ISDL, EXPRESSION, etc.) to represent a model. One of the possible solutions is implemented in a research prototype MA ^2^ TG (National University of Defense Technology, China).
28 2 Alexander Kamkin
29 15 Alexander Kamkin
p=. !adltpg.png!
30 1 Alexander Kamkin
31 17 Alexander Kamkin
p=. *Figure 2. General structure of an ADL-based test program generator*
32 15 Alexander Kamkin
33 2 Alexander Kamkin
The main contributions of the scheme (see Fig. 2) are as follows. First, it eases microprocessor modeling by using ADL specifications. Second, it automates test coverage extraction via static analysis of the ADL code. Third, it allows constructing a microprocessor simulator automatically. Being rather flexible, the ADL-based approach is however not exactly what we mean by _reconfigurable_. Specification of microarchitecture by means of ADLs is rather awkward. There is often no need to create a microarchitectural description (especially a structure-oriented one). All information the generator should know can be expressed with a high-level configuration of microprocessor subsystems.
34 1 Alexander Kamkin
35
A good example is a cache memory. As structural description of a typical cache unit consists of several thousand lines of code (LOC), its configuration can be set by five parameters only: (1) size, (2) associativity, (3) an index calculation function, (4) a tag calculation function, and (5) policy of data displacement. Moreover, if we are speaking about data displacement, only a few strategies are in general use (LRU, pseudo LRU, and some others). Thus, it is possible to create a library of typical configurations and choose an appropriate one when tuning the generator. So, the idea is to describe microarchitectural properties using subsystem-specific parameters. This concept is illustrated in Fig. 3.
36
37 15 Alexander Kamkin
p=. !rtpg.png!
38
39 17 Alexander Kamkin
p=. *Figure 3. General structure of a reconfigurable test program generator*
40 2 Alexander Kamkin
41
Another problem being solved by the reconfigurable TPG is _automated test template generation_ (of course, the possibility to develop templates manually remains as before). There is an evident dilemma here. To be able to generate test templates providing high-quality verification, accurate models are needed, but their creation is labor-consuming and thereby contradicts the concept of easy configuration. The dilemma is usually solved as follows. In early design stages, simple models coupled with the light-weight test generation techniques are used. As soon as the project becomes more mature, one can apply more accurate models and more directed tests.
42
43
Concluding the section, let us outline the basic requirements for the reconfigurable TPG. First, there should be a possibility to quickly change the microprocessor instruction set (add new instructions, modify the existing ones, and so on). Second, a user should be able to easily change the microarchitectural properties of the design (cache parameters, address translation algorithm, etc.). Third, a reconfigurable TPG tool should implements facilities for adding new test situations for single instructions and for groups of instructions. Finally, it should be able to generate test programs automatically according to high-level parameters. These problems have not been solved by the existing tools, and they are the main subject of our work.
44
45
h2. Configuration of Microprocessor Subsystems
46
47
In this section, we consider the suggested method for developing reconfigurable test program generators. A target design''s architecture is described in one of the supported ADLs (the present version of the tool supports only *nML/Sim-nML*; the list of the supported languages will be extended in the future). A microprocessor description focuses on general functionality of the instructions not considering the design''s microarchitecture. Sim-nML code specifying the integer addition instruction (@ADD@) from the MIPS ISA is shown below.
48
49
<pre>
50
     op ADD(rd: GPR, rs: GPR, rt: GPR)
51
     action = {
52
         if(NotWordValue(rs) || NotWordValue(rt))
53
         then
54
             UNPREDICTABLE();
55
         endif;
56 6 Alexander Kamkin
57
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
58
59 2 Alexander Kamkin
         if(tmp_word<32..32> != tmp_word<31..31>)
60
         then
61
             SignalException("IntegerOverflow");
62
         else
63
             rd = sign_extend(tmp_word<31..0>);
64
         endif;
65
     }
66
     
67 7 Alexander Kamkin
     syntax = format("add %s, %s, %s", rd.syntax, rs.syntax, rt.syntax)
68 2 Alexander Kamkin
     
69
     op ALU = ADD | SUB | ...
70
}
71
</pre>
72
73 19 Alexander Kamkin
This description is rather compact and almost replicates the instruction''s specification in the MIPS manual. Several particular features should be emphasized. First, ADL specifications can use predefined functions, like @UNPREDICTABLE@, which indicates the situations, where the design''s behavior is undefined (such situations should not be created in test programs). Second, by analyzing control flow one can automatically extract test coverage models for individual instructions. In the example above, there are two basic situations for the @ADD@ instruction:
74 2 Alexander Kamkin
75
<pre>
76
     IntegerOverflow:
77 8 Alexander Kamkin
         ...
78 2 Alexander Kamkin
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
79
         ASSERT(tmp_word<32..32> != tmp_word<31..31>);
80
81
     Default:
82 8 Alexander Kamkin
         ...
83 2 Alexander Kamkin
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
84
         ASSERT(tmp_word<32..32> == tmp_word<31..31>);
85
</pre>
86
87
Third, it is possible to examine potential dependencies via registers by determining which of the instructions use the registers of the same type. Fourth, instructions can be grouped into hierarchical classes providing a generator and a verification engineer with useful information to be utilized within test templates. Finally, basing on such specifications, a generator is able to predict the correct result of the test program execution. From the practical point of view, the things stated above mean that as soon as the microprocessor ISA is specified, without any additional development one can use test templates of the following kind (of course, they can be complicated by using the sequencing-control statements):
88
89
<pre>
90
         ALU ?, ?, ?
91
         ADD R, ?, ? @ Default
92
         SUB ?, R, ? @ IntegerOverflow
93
</pre>
94
95
The first line of the template produces any instruction from the @ALU@ class (@ADD@, @SUB@, @MUL@, @DIV@ or some other). Since the test situation for the instruction is unspecified, it might be any valid situation. The second line produces an @ADD@ instruction being executed normally. The third one produces a @SUB@ instruction causing the integer overflow exception. There is also a register dependency between the second @ADD@ and the third @SUB@. In spite of shallow nature of the behavioral specifications, they might however be useful. There are three types of tests that can be generated fully automatically:
96
97
# _tests for single instructions_ (covering all test situations for all instructions);
98
# _combinatorial tests_ (enumerating short sequences of instructions, test situations for them and dependencies between them);
99
# _random tests_ (producing random but valid sequences of instructions).
100
101
These facilities suffice for many verification tasks. If more directed test programs are needed, one should provide an additional description of the microprocessor subsystems. To configure the generator, we suggest using _subsystem-specific languages_ instead of the universal structure-oriented ADLs. It significantly eases the generator tuning and simplifies analysis of specifications.
102
103
h3. Memory Management Unit
104
105
Configuration of a _memory management unit_ (_MMU_) touches upon such devices as a _cache memory_ (_L1_ and _L2_), _translation lookaside buffer_ (_TLB_), _main memory_ and, probably, some others. The main goal of the MMU configuration is to enable the generator to process memory-related events and complex data dependencies in test templates. These events include _buffers hits/misses_ and _memory exceptions_. The data dependencies take into account structural properties of the memory devices, for example, there can be such dependencies as _hit into the same cache set_ or the like.
106
107
A typical memory buffer (cache or TLB) is modeled as an _array_ consisting of _sets_ of _lines_, where a line can be a structure comprising several bit vectors, so-called _fields_. For example, a TLB entry of a MIPS microprocessor consists of the following fields: @R@, @VPN2@, @MASK@, @G@, @ASID@, @PFN0@, @V0@, @C0@, @D0@, @PFN1@, @V1@, @C1@ and @D1@. A _fully-associative_ device contains a single set of lines; a _direct-mapped_ one is an array of single-line sets. Each device stores some _data_ which can be accessed (read from or written into) by their _address_. If a device contains a line with a given address, this situation is called a _hit_; the opposite situation is refered to as a _miss_. If a miss occurs, the device usually displaces one of the set''s lines with the line associated with the address given.
108
109 11 Taya Sergeeva
All memory devices are configured with the following set of parameters: _sets_ (associativity, i.e. set size), _lines_ (number of sets), _line_ (optional description of a line''s fields), _index_ (function for index calculation), _match_ (predicate checking whether a line and an address match each other) and _policy_ (data displacement strategy). Here is an example:
110 2 Alexander Kamkin
111
<pre>
112
     buffer L1 {
113 12 Taya Sergeeva
         sets = 4;
114
         lines = 128;
115
         line = (tag:30, data:256);
116
         index(addr:PA) = addr<9..8>; 
117
         match(addr:PA) = line.tag == addr<39..10>;
118
         policy = lru;
119 2 Alexander Kamkin
     }
120
</pre>
121
122 11 Taya Sergeeva
The configuration above describes a 4-way set associative buffer (see the parameter @sets@) consisting of 128 sets (@lines@). Each line of the cache contains a 30-bit tag and 256-bit data (@line@). When accessing data, the cache determines a set by calculating a 2-bit index (@index@). If a miss occurs, i.e. the set does not contain a line with the tag equal to the 30 upper bits of the physical address (@match@), the cache displaces the _least-recently-used_ (_LRU_) line of the set (@policy@).
123 2 Alexander Kamkin
124
When all the memory devices are configured, the general memory access algorithms should be defined. Let us consider the MIPS''s instruction @LD@, which loads a 64-bit double word from the main memory into the general-purpose register (GPR). Its Sim-nML specification written according to the ISA manual looks as follows:
125
126
<pre>
127
     vAddr := sign_extend(offset) + base;
128
     
129
     if(vAddr<2..0> != 0)
130
     then
131
         SignalException("AddressError");
132
     endif
133
134
     pAddr::CCA := AddressTranslation(vAddr, DATA, LOAD);
135
     memdoubleword := LoadMemory(CCA, DWORD, pAddr, vAddr, DATA);
136
137
     rt := memdoubleword;
138
</pre>
139
140
If the virtual address @vAddr@ is unaligned, then the address error exception is raised. Otherwise, the virtual address is translated into the physical one (@pAddr@), and cache coherence algorithm (@CCA@) is get. After that, a 64-bit double word (@memdoubleword@) is loaded from the main memory and then written into the general-purpose register @rt@. Translation of a virtual address is modeled as the invocation of the @AddressTranslation@ function. Reading from the main memory is described with the help of the @LoadMemory@ function.
141
142
In terms of the example, we need to concretize these two functions. Special operators are proposed for this purpose: 
143
144
* @hit<Buffer>(Address) {[loaded:Data1[;]][storing:Data2]}@
145
states that an instruction accesses @Buffer@ using @Address@, and a hit occurs; @Data1@ are data loaded from the corresponding line of the device (read operation), and @Data2@ are data to be stored in the line (write operation);
146
* @miss<Buffer>(Address){[replacing:Data]}$
147
states that an instruction accesses @Buffer@ using @Address@ and a miss occurs; @Data@ are data that replace the data stored in the corresponding line of the device.
148
149
Analysing the specifications of the memory access algorithms, the generator automatically constructs the set of memory-related test situations. A bit simplified fragment of the MIPS address translation specification corresponding to a TLB hit and a L1 hit is given below:
150
151
<pre>
152
     vAddr := sign_extend(offset) + base;
153
     if(hit<TLB>(vAddr){loaded:PFN})
154
     then
155
         pAddr := PFN::vAddr<12..0>;
156
         if(hit<L1>(pAddr){loaded:memdoubleword})
157
         then
158
             rt := memdoubleword;
159
             ...
160
         endif
161
         ...
162
     endif
163
</pre>
164
165
Here is an example demonstrating the basic facilities of the TPG tool on specifying memory-related test situations in test templates (@SD@ is an instruction that stores a 64-bit double word into the main memory).
166
167
<pre>
168
     LD ?, ?(?) @ hit<TLB>{loaded:PFN1} && miss<L1> && hit<L2>
169
     ...
170
     SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2 && hit<L1>
171
</pre>
172
173
h3. Pipeline Control Unit
174
175
Configuration of a _pipeline control unit_ (_PCU_) is targeted at automated generation of test templates covering pipeline hazards including _exceptions_, _data hazards_, _structural hazards_ and _control hazards_. The latter are described more in detail in the next section. Configuration describes a static structure of the microprocessor pipeline as well as its dynamic behavior. Description of the classical 5-stages RISC pipeline looks as follows.
176
177
<pre>
178
         pipeline() {
179
             IF(I);  // Fetch
180
             ID(I);  // Decode
181
             EX(I);  // Execute
182
             MEM(I); // Memory access
183
             WR(I);  // Writeback
184
         }
185
</pre>
186
187
It starts with the scope @pipeline@ that contains top-level @units@ (so-called, _pipe stages_) participating in the instruction processing. Each unit takes a single input/output parameter, which is an instruction being processed. It should be said that a unit may include nested subunits and use the usual conditional statements (@if@, @switch@, @while@, etc.) to describe the pipeline control flow. A simplified example is given below.
188
189
<pre>
190
         EX(I: Instruction) {
191
             switch(I.type) {
192
             case ALU: EX_ALU(I);
193
             case FPU: EX_FPU(I);
194
             ...
195
         }}
196
}
197
</pre>
198
199
Two items should be mentioned here. First, each unit can specify a set of exceptions it may raise (nested units are allowed to extend the set by specific exceptions). Second, each calling sequence should contain at least one operator @DELAY@ used to specify how many cycles a certain operation (or microoperation) takes. Here is an example illustrating the above mentioned things.
200
201
<pre>
202
         FPU(I: Instruction) throws
203
             Inexact, Overflow, Underflow {
204
             DELAY(10..15);
205
         }
206
</pre>
207
208
Exceptions and delays provides the TPG tool with information that can be used for automated generation of test templates aimed at covering _simultaneous exceptions_ (i.e. exceptions being raised at the same cycle or close to each other) and _structural hazards_ (i.e. situations when two or more instructions tries to access the same unit of the microprocessor). When analysing the pipeline configuration, the generator extracts the _basic templates_ covering all single hazards. Such templates are usually very simple ones; for example, a basic template for the structural hazard via the floating point unit (FPU) looks as follows.
209
210
<pre>
211
         $PreInstructions
212
         FPU ?, ?, ? // The first FPU access
213
         $InnerInstructions
214
         FPU ?, ?, ? // The second FPU access
215
         $PostInstructions
216
</pre>
217
218
As soon as the basic templates are extracted, it is possible to generate tests programs covering all the single hazards. The generator creates all possible combinations of the conflicting instructions randomizing their environment (in the example above, @$PreInstructions@, @$InnerInstructions@ and @$PostInstructions@). The main restriction here is that an environment should not cancel the target hazard (for example, it should not access the unit shared by the conflicting instructions).
219
220
A more advanced technique for test program generation uses _composition_ of the basic templates. There are several types of composition operators including _overlapping_ (@T1 || T2@), _shift_ (@T1 >> T2@), _nesting_ (@T1[T2]@) and _concatenation_ (@T1 . T2@). The generator applies these operators to the basic templates constructing various syntactical structures. The approach allows testing rather complex situations in the microprocessor behavior. Application of the composition operators to basic templates is shown in the example below (@T1[T2 >> T3] . T4@).
221
222
<pre>
223
         // Template T1 (Memory Dependency)
224
         LD ?, ?(?) @ hit<TLB>{loaded:PFN1}
225
                 // Template T2 (Register Dependency)
226
                 ADD R, ?, ?
227
                         // Template T3 (Structural Hazard)
228
                         DIV.S ?, ?, ?
229
                 SUB ?, R, ?
230
                         DIV.D ?, ?, ?
231
         SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2
232
         // Template T4 (Exception)
233
         FPU ?, ?, ? @ Overflow
234
</pre>
235
236
h3. Branch Processing Unit
237
238
ADL-based instruction set specifications provide enough information for generating branching programs not getting caught in an endless loop. TPG is performed by enumeration (or random construction) of various _control flow graphs_ (_CFG_) and different _execution traces_ for those graphs. Enumerating execution traces for a given CFG is based on the bounded depth-first exploration of the branch execution tree. To guarantee a test to be executed according to a trace given, auxiliary instructions (so-called _control instructions_) are automatically inserted into the test program. Those instructions change the registers used by the branch instructions so as to fulfill the conditions specified in the trace.
239
240
The template below contains a branch instruction @BEQ@ (branch if equal) which is executed two times (when it is executed for the first time, the branch condition is true; then, the condition becomes false, and test execution is finished). The control instruction (@ADDI@) increments one of the registers used by the branch instruction (the initial values of the registers are equal).
241
242
<pre>
243
     INIT:
244
         ORI R1, R0, 2011
245
         ORI R2, R0, 2011 // R1 = R2
246
         ...
247
         J START
248
         NOP
249
         ...
250
     BACK:
251
         ADDI R1, R1, 1 // Control Instruction
252
         FPU  ?, ?, ?
253
     START:
254
         BEQ R1, R2, BACK @ trace = {true, false}
255
         LD ?, ?(?)
256
     STOP:
257
</pre>
258
259
Speaking about configuration of a _branch processing unit_ (_BPU_), we recognize two levels (namely, simple configuration and advanced configuration). The first of them is simply setting a number of _branch delay slots_ (i.e. instructions located immediately after a branch instruction, which are executed even if the preceding branch is taken). Sometimes, this information can be derived from the specifications, but sometimes it is absent. Advanced configuration describes more sophisticated mechanisms, like _branch prediction_.
260
261 20 Alexander Kamkin
_~by Alexander Kamkin~_
262 18 Alexander Kamkin
_~Based on the paper A.Kamkin, E.Kornykhin and D.Vorobyev. "Reconfigurable Model-Based Test Program Generator for Microprocessors", A-MOST 2011.~_