Project

General

Profile

Basic Concepts » History » Version 38

Alexander Kamkin, 03/19/2015 07:43 AM

1 10 Alexander Kamkin
h1. Basic Concepts of Reconfigurable Test Program Generation
2 1 Alexander Kamkin
3 30 Alexander Kamkin
_~By Alexander Kamkin~_
4 28 Alexander Kamkin
5 2 Alexander Kamkin
{{toc}}
6 1 Alexander Kamkin
7 2 Alexander Kamkin
h2. Introduction
8 1 Alexander Kamkin
9 2 Alexander Kamkin
Nowadays, a large number of microprocessor devices are designed and produced all round the world. Many of them are based on standard _instruction set architectures_ (_ISA_), but there are also designs with the unique instruction sets and designs that significantly extend the existing standards. A frequently used approach to verify such kind of microprocessors is to develop testing tools from scratch without using the components developed previously. However, it does not work in the modern economy with the ever-shrinking time to market and the exponential growth in hardware complexity.
10 1 Alexander Kamkin
11 38 Alexander Kamkin
It is well known that _test program generation_ (_TPG_), or _instruction stream generation_ (_ISG_), is the central means for verifying microprocessors and other programmable devices at the core or full-chip level. A _test program_ is a valid sequence of instructions that solves a specific verification task (in other words, creates a certain set of interactions within the target design and, possibly, checks the correctness of the design''s state at the end of execution). A tool that creates test programs for a given microprocessor architecture in an automated way is referred to as a _generator_ or _TPG tool_.
12 1 Alexander Kamkin
13 2 Alexander Kamkin
Here, we describe a concept of a _reconfigurable test program generator_ "MicroTESK":http://forge.ispras.ru/projects/microtesk, which has a platform-independent generation core and is configured by modeling the target ISA and basic microarchitectural features. The idea is not new and has already been implemented in the industrial TPG tools such as *Genesys-Pro* (IBM Research Lab) and *RAVEN* (Obsidian Software/ARM). However, tuning the generators for concrete architectures (or even microarchitectures) is still a difficult task that is carried out by special staff only (namely, _modeling engineers_).
14
15
Our goal is to reach a new level in the TPG configuration flexibility. In order to do it, we suggest applying so-called _architecture description languages_ (_ADL_), which are commonly used for developing microprocessor simulators. In the approach suggested, a microprocessor ISA is described in some ADL and then automatically translated into a _microprocessor model_ together with a _test coverage model_. Fine tuning of the generator is performed by the subsystem-specific configuration files. Such an approach is especially useful in the early design stages when the project is frequently modified and rapid generator reconfiguration is required.
16
17
h2. Reconfigurable Test Program Generation
18
19
The key idea of the reconfigurable TPG is that configuration of a generator for a concrete microprocessor architecture is implemented as easy as possible. It is obvious that to fulfill this requirement, a generator should be divided into two parts: (1) a platform-independent component (so-called _core_ or _engine_) and (2) a component describing all platform-specific features (_model_ or _configuration_). Such kinds of TPG tools are usually called _model-based test program generators_. This is the first and probably the most important step the generators have made during their evolution. Let us consider the model-based approach a bit more in detail using Genesys-Pro as an example (see Fig. 1).
20
21 14 Alexander Kamkin
p=. !mbtpg.png!
22 13 Alexander Kamkin
23 17 Alexander Kamkin
p=. *Figure 1. General structure of a model-based test program generator*
24 2 Alexander Kamkin
25
A verification engineer provides a _test template_ of the scenario that should occur in the test program. A TPG tool generates a program by formulating and then solving constraints for each instruction of the test template or for groups of instructions. It also randomizes values of the unspecified parameters to produce different test program variations for the template given. The constraints are formulated in terms of the model, which consists of the _microprocessor model_ and _test coverage model_. The first of them defines syntax and semantics of the target design''s instructions, while the second one describes special conditions (known as _test situations_) to be achieved when verifying the design.
26
27
After formulating the CSP, the generation core solves it by using external or built-in solvers and assigns the values to the instructions'' operands. Then, it interprets the instructions under processing (for example, by using a _microprocessor simulator_) and takes the next group of instructions. Having performance limitations, TPG tools generally do not process test templates as a whole. Instead, they split them up into small pieces and formulate several CSPs. It is clear that such a technique can fail to find an exact solution of the joint CSP even when one is available, but it works well for test programs with low-density dependencies between instructions.
28
29 16 Alexander Kamkin
Coming back to the topic, what is the main distinction of the _reconfigurable_ TPG from the conventional model-based TPG? Tuning of a generator (in other words, creation of a microprocessor model together with a test coverage model) should be as much flexible and automated as it can be done by an ordinary verification engineer, not only by a trained modeling specialist. The important technological requirement for reconfigurable tools is usage of well-known or easy-to-learn languages, for example, ADLs (nML, ISDL, EXPRESSION, etc.) to represent a model. One of the possible solutions is implemented in a research prototype MA ^2^ TG (National University of Defense Technology, China).
30 2 Alexander Kamkin
31 15 Alexander Kamkin
p=. !adltpg.png!
32 1 Alexander Kamkin
33 17 Alexander Kamkin
p=. *Figure 2. General structure of an ADL-based test program generator*
34 15 Alexander Kamkin
35 2 Alexander Kamkin
The main contributions of the scheme (see Fig. 2) are as follows. First, it eases microprocessor modeling by using ADL specifications. Second, it automates test coverage extraction via static analysis of the ADL code. Third, it allows constructing a microprocessor simulator automatically. Being rather flexible, the ADL-based approach is however not exactly what we mean by _reconfigurable_. Specification of microarchitecture by means of ADLs is rather awkward. There is often no need to create a microarchitectural description (especially a structure-oriented one). All information the generator should know can be expressed with a high-level configuration of microprocessor subsystems.
36 1 Alexander Kamkin
37
A good example is a cache memory. As structural description of a typical cache unit consists of several thousand lines of code (LOC), its configuration can be set by five parameters only: (1) size, (2) associativity, (3) an index calculation function, (4) a tag calculation function, and (5) policy of data displacement. Moreover, if we are speaking about data displacement, only a few strategies are in general use (LRU, pseudo LRU, and some others). Thus, it is possible to create a library of typical configurations and choose an appropriate one when tuning the generator. So, the idea is to describe microarchitectural properties using subsystem-specific parameters. This concept is illustrated in Fig. 3.
38
39 15 Alexander Kamkin
p=. !rtpg.png!
40
41 17 Alexander Kamkin
p=. *Figure 3. General structure of a reconfigurable test program generator*
42 2 Alexander Kamkin
43
Another problem being solved by the reconfigurable TPG is _automated test template generation_ (of course, the possibility to develop templates manually remains as before). There is an evident dilemma here. To be able to generate test templates providing high-quality verification, accurate models are needed, but their creation is labor-consuming and thereby contradicts the concept of easy configuration. The dilemma is usually solved as follows. In early design stages, simple models coupled with the light-weight test generation techniques are used. As soon as the project becomes more mature, one can apply more accurate models and more directed tests.
44
45
Concluding the section, let us outline the basic requirements for the reconfigurable TPG. First, there should be a possibility to quickly change the microprocessor instruction set (add new instructions, modify the existing ones, and so on). Second, a user should be able to easily change the microarchitectural properties of the design (cache parameters, address translation algorithm, etc.). Third, a reconfigurable TPG tool should implements facilities for adding new test situations for single instructions and for groups of instructions. Finally, it should be able to generate test programs automatically according to high-level parameters. These problems have not been solved by the existing tools, and they are the main subject of our work.
46
47
h2. Configuration of Microprocessor Subsystems
48
49
In this section, we consider the suggested method for developing reconfigurable test program generators. A target design''s architecture is described in one of the supported ADLs (the present version of the tool supports only *nML/Sim-nML*; the list of the supported languages will be extended in the future). A microprocessor description focuses on general functionality of the instructions not considering the design''s microarchitecture. Sim-nML code specifying the integer addition instruction (@ADD@) from the MIPS ISA is shown below.
50
51
<pre>
52 34 Alexander Kamkin
     op ADD(rd: GPR, rs: GPR, rt: GPR) {
53
         action = {
54
             if(NotWordValue(rs) || NotWordValue(rt))
55
             then
56
                 UNPREDICTABLE();
57
             endif;
58 6 Alexander Kamkin
59 34 Alexander Kamkin
             tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
60 6 Alexander Kamkin
61 34 Alexander Kamkin
             if(tmp_word<32..32> != tmp_word<31..31>)
62
             then
63
                 SignalException("IntegerOverflow");
64
             else
65
                 rd = sign_extend(tmp_word<31..0>);
66
             endif;
67
         }
68
     
69
         syntax = format("add %s, %s, %s", rd.syntax, rs.syntax, rt.syntax)
70 2 Alexander Kamkin
     }
71
     
72
     op ALU = ADD | SUB | ...
73
</pre>
74
75 19 Alexander Kamkin
This description is rather compact and almost replicates the instruction''s specification in the MIPS manual. Several particular features should be emphasized. First, ADL specifications can use predefined functions, like @UNPREDICTABLE@, which indicates the situations, where the design''s behavior is undefined (such situations should not be created in test programs). Second, by analyzing control flow one can automatically extract test coverage models for individual instructions. In the example above, there are two basic situations for the @ADD@ instruction:
76 2 Alexander Kamkin
77
<pre>
78
     IntegerOverflow:
79 8 Alexander Kamkin
         ...
80 2 Alexander Kamkin
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
81
         ASSERT(tmp_word<32..32> != tmp_word<31..31>);
82
83
     Default:
84 8 Alexander Kamkin
         ...
85 2 Alexander Kamkin
         tmp_word = rs<31..31>::rs<31..0> + rs<31..31>::rt<31..0>;
86
         ASSERT(tmp_word<32..32> == tmp_word<31..31>);
87
</pre>
88
89
Third, it is possible to examine potential dependencies via registers by determining which of the instructions use the registers of the same type. Fourth, instructions can be grouped into hierarchical classes providing a generator and a verification engineer with useful information to be utilized within test templates. Finally, basing on such specifications, a generator is able to predict the correct result of the test program execution. From the practical point of view, the things stated above mean that as soon as the microprocessor ISA is specified, without any additional development one can use test templates of the following kind (of course, they can be complicated by using the sequencing-control statements):
90
91
<pre>
92
         ALU ?, ?, ?
93
         ADD R, ?, ? @ Default
94
         SUB ?, R, ? @ IntegerOverflow
95
</pre>
96
97
The first line of the template produces any instruction from the @ALU@ class (@ADD@, @SUB@, @MUL@, @DIV@ or some other). Since the test situation for the instruction is unspecified, it might be any valid situation. The second line produces an @ADD@ instruction being executed normally. The third one produces a @SUB@ instruction causing the integer overflow exception. There is also a register dependency between the second @ADD@ and the third @SUB@. In spite of shallow nature of the behavioral specifications, they might however be useful. There are three types of tests that can be generated fully automatically:
98
99
# _tests for single instructions_ (covering all test situations for all instructions);
100
# _combinatorial tests_ (enumerating short sequences of instructions, test situations for them and dependencies between them);
101
# _random tests_ (producing random but valid sequences of instructions).
102
103
These facilities suffice for many verification tasks. If more directed test programs are needed, one should provide an additional description of the microprocessor subsystems. To configure the generator, we suggest using _subsystem-specific languages_ instead of the universal structure-oriented ADLs. It significantly eases the generator tuning and simplifies analysis of specifications.
104
105
h3. Memory Management Unit
106
107
Configuration of a _memory management unit_ (_MMU_) touches upon such devices as a _cache memory_ (_L1_ and _L2_), _translation lookaside buffer_ (_TLB_), _main memory_ and, probably, some others. The main goal of the MMU configuration is to enable the generator to process memory-related events and complex data dependencies in test templates. These events include _buffers hits/misses_ and _memory exceptions_. The data dependencies take into account structural properties of the memory devices, for example, there can be such dependencies as _hit into the same cache set_ or the like.
108
109
A typical memory buffer (cache or TLB) is modeled as an _array_ consisting of _sets_ of _lines_, where a line can be a structure comprising several bit vectors, so-called _fields_. For example, a TLB entry of a MIPS microprocessor consists of the following fields: @R@, @VPN2@, @MASK@, @G@, @ASID@, @PFN0@, @V0@, @C0@, @D0@, @PFN1@, @V1@, @C1@ and @D1@. A _fully-associative_ device contains a single set of lines; a _direct-mapped_ one is an array of single-line sets. Each device stores some _data_ which can be accessed (read from or written into) by their _address_. If a device contains a line with a given address, this situation is called a _hit_; the opposite situation is refered to as a _miss_. If a miss occurs, the device usually displaces one of the set''s lines with the line associated with the address given.
110
111 32 Taya Sergeeva
All memory devices are configured with the following set of parameters: _associativity_ (associativity, i.e. set size), _sets_ (number of sets), _line_ (optional description of a line''s fields), _index_ (function for index calculation), _match_ (predicate checking whether a line and an address match each other) and _policy_ (data displacement strategy). Here is an example:
112 2 Alexander Kamkin
113
<pre>
114
     buffer L1 {
115 31 Taya Sergeeva
         associativity = 4;
116
         sets = 128;
117 12 Taya Sergeeva
         line = (tag:30, data:256);
118
         index(addr:PA) = addr<9..8>; 
119
         match(addr:PA) = line.tag == addr<39..10>;
120
         policy = lru;
121 2 Alexander Kamkin
     }
122
</pre>
123
124 33 Taya Sergeeva
The configuration above describes a 4-way set associative buffer (see the parameter @associativity@) consisting of 128 sets (@sets@). Each line of the cache contains a 30-bit tag and 256-bit data (@line@). When accessing data, the cache determines a set by calculating a 2-bit index (@index@). If a miss occurs, i.e. the set does not contain a line with the tag equal to the 30 upper bits of the physical address (@match@), the cache displaces the _least-recently-used_ (_LRU_) line of the set (@policy@).
125 2 Alexander Kamkin
126
When all the memory devices are configured, the general memory access algorithms should be defined. Let us consider the MIPS''s instruction @LD@, which loads a 64-bit double word from the main memory into the general-purpose register (GPR). Its Sim-nML specification written according to the ISA manual looks as follows:
127
128
<pre>
129
     vAddr := sign_extend(offset) + base;
130
     
131
     if(vAddr<2..0> != 0)
132
     then
133
         SignalException("AddressError");
134
     endif
135
136
     pAddr::CCA := AddressTranslation(vAddr, DATA, LOAD);
137
     memdoubleword := LoadMemory(CCA, DWORD, pAddr, vAddr, DATA);
138
139
     rt := memdoubleword;
140
</pre>
141
142
If the virtual address @vAddr@ is unaligned, then the address error exception is raised. Otherwise, the virtual address is translated into the physical one (@pAddr@), and cache coherence algorithm (@CCA@) is get. After that, a 64-bit double word (@memdoubleword@) is loaded from the main memory and then written into the general-purpose register @rt@. Translation of a virtual address is modeled as the invocation of the @AddressTranslation@ function. Reading from the main memory is described with the help of the @LoadMemory@ function.
143
144
In terms of the example, we need to concretize these two functions. Special operators are proposed for this purpose: 
145
146
* @hit<Buffer>(Address) {[loaded:Data1[;]][storing:Data2]}@
147
states that an instruction accesses @Buffer@ using @Address@, and a hit occurs; @Data1@ are data loaded from the corresponding line of the device (read operation), and @Data2@ are data to be stored in the line (write operation);
148
* @miss<Buffer>(Address){[replacing:Data]}$
149
states that an instruction accesses @Buffer@ using @Address@ and a miss occurs; @Data@ are data that replace the data stored in the corresponding line of the device.
150
151
Analysing the specifications of the memory access algorithms, the generator automatically constructs the set of memory-related test situations. A bit simplified fragment of the MIPS address translation specification corresponding to a TLB hit and a L1 hit is given below:
152
153
<pre>
154
     vAddr := sign_extend(offset) + base;
155
     if(hit<TLB>(vAddr){loaded:PFN})
156
     then
157
         pAddr := PFN::vAddr<12..0>;
158
         if(hit<L1>(pAddr){loaded:memdoubleword})
159
         then
160
             rt := memdoubleword;
161
             ...
162
         endif
163
         ...
164
     endif
165
</pre>
166
167
Here is an example demonstrating the basic facilities of the TPG tool on specifying memory-related test situations in test templates (@SD@ is an instruction that stores a 64-bit double word into the main memory).
168
169
<pre>
170
     LD ?, ?(?) @ hit<TLB>{loaded:PFN1} && miss<L1> && hit<L2>
171
     ...
172
     SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2 && hit<L1>
173
</pre>
174
175
h3. Pipeline Control Unit
176
177
Configuration of a _pipeline control unit_ (_PCU_) is targeted at automated generation of test templates covering pipeline hazards including _exceptions_, _data hazards_, _structural hazards_ and _control hazards_. The latter are described more in detail in the next section. Configuration describes a static structure of the microprocessor pipeline as well as its dynamic behavior. Description of the classical 5-stages RISC pipeline looks as follows.
178
179
<pre>
180
         pipeline() {
181
             IF(I);  // Fetch
182
             ID(I);  // Decode
183
             EX(I);  // Execute
184
             MEM(I); // Memory access
185
             WR(I);  // Writeback
186
         }
187
</pre>
188
189
It starts with the scope @pipeline@ that contains top-level @units@ (so-called, _pipe stages_) participating in the instruction processing. Each unit takes a single input/output parameter, which is an instruction being processed. It should be said that a unit may include nested subunits and use the usual conditional statements (@if@, @switch@, @while@, etc.) to describe the pipeline control flow. A simplified example is given below.
190
191
<pre>
192
         EX(I: Instruction) {
193
             switch(I.type) {
194
             case ALU: EX_ALU(I);
195
             case FPU: EX_FPU(I);
196
             ...
197
         }}
198
}
199
</pre>
200
201
Two items should be mentioned here. First, each unit can specify a set of exceptions it may raise (nested units are allowed to extend the set by specific exceptions). Second, each calling sequence should contain at least one operator @DELAY@ used to specify how many cycles a certain operation (or microoperation) takes. Here is an example illustrating the above mentioned things.
202
203
<pre>
204
         FPU(I: Instruction) throws
205
             Inexact, Overflow, Underflow {
206
             DELAY(10..15);
207
         }
208
</pre>
209
210
Exceptions and delays provides the TPG tool with information that can be used for automated generation of test templates aimed at covering _simultaneous exceptions_ (i.e. exceptions being raised at the same cycle or close to each other) and _structural hazards_ (i.e. situations when two or more instructions tries to access the same unit of the microprocessor). When analysing the pipeline configuration, the generator extracts the _basic templates_ covering all single hazards. Such templates are usually very simple ones; for example, a basic template for the structural hazard via the floating point unit (FPU) looks as follows.
211
212
<pre>
213
         $PreInstructions
214
         FPU ?, ?, ? // The first FPU access
215
         $InnerInstructions
216
         FPU ?, ?, ? // The second FPU access
217
         $PostInstructions
218
</pre>
219
220
As soon as the basic templates are extracted, it is possible to generate tests programs covering all the single hazards. The generator creates all possible combinations of the conflicting instructions randomizing their environment (in the example above, @$PreInstructions@, @$InnerInstructions@ and @$PostInstructions@). The main restriction here is that an environment should not cancel the target hazard (for example, it should not access the unit shared by the conflicting instructions).
221
222
A more advanced technique for test program generation uses _composition_ of the basic templates. There are several types of composition operators including _overlapping_ (@T1 || T2@), _shift_ (@T1 >> T2@), _nesting_ (@T1[T2]@) and _concatenation_ (@T1 . T2@). The generator applies these operators to the basic templates constructing various syntactical structures. The approach allows testing rather complex situations in the microprocessor behavior. Application of the composition operators to basic templates is shown in the example below (@T1[T2 >> T3] . T4@).
223
224
<pre>
225
         // Template T1 (Memory Dependency)
226
         LD ?, ?(?) @ hit<TLB>{loaded:PFN1}
227
                 // Template T2 (Register Dependency)
228
                 ADD R, ?, ?
229
                         // Template T3 (Structural Hazard)
230
                         DIV.S ?, ?, ?
231
                 SUB ?, R, ?
232
                         DIV.D ?, ?, ?
233
         SD ?, ?(?) @ hit<TLB>{loaded:PFN2} && PFN1 = PFN2
234
         // Template T4 (Exception)
235
         FPU ?, ?, ? @ Overflow
236
</pre>
237
238
h3. Branch Processing Unit
239
240
ADL-based instruction set specifications provide enough information for generating branching programs not getting caught in an endless loop. TPG is performed by enumeration (or random construction) of various _control flow graphs_ (_CFG_) and different _execution traces_ for those graphs. Enumerating execution traces for a given CFG is based on the bounded depth-first exploration of the branch execution tree. To guarantee a test to be executed according to a trace given, auxiliary instructions (so-called _control instructions_) are automatically inserted into the test program. Those instructions change the registers used by the branch instructions so as to fulfill the conditions specified in the trace.
241
242
The template below contains a branch instruction @BEQ@ (branch if equal) which is executed two times (when it is executed for the first time, the branch condition is true; then, the condition becomes false, and test execution is finished). The control instruction (@ADDI@) increments one of the registers used by the branch instruction (the initial values of the registers are equal).
243
244
<pre>
245
     INIT:
246
         ORI R1, R0, 2011
247
         ORI R2, R0, 2011 // R1 = R2
248
         ...
249
         J START
250
         NOP
251
         ...
252
     BACK:
253
         ADDI R1, R1, 1 // Control Instruction
254
         FPU  ?, ?, ?
255
     START:
256
         BEQ R1, R2, BACK @ trace = {true, false}
257
         LD ?, ?(?)
258
     STOP:
259
</pre>
260
261
Speaking about configuration of a _branch processing unit_ (_BPU_), we recognize two levels (namely, simple configuration and advanced configuration). The first of them is simply setting a number of _branch delay slots_ (i.e. instructions located immediately after a branch instruction, which are executed even if the preceding branch is taken). Sometimes, this information can be derived from the specifications, but sometimes it is absent. Advanced configuration describes more sophisticated mechanisms, like _branch prediction_.
262 20 Alexander Kamkin
263 35 Alexander Kamkin
_~Based on the paper ''Alexander Kamkin, Eugene Kornykhin and Dmitry Vorobyev. "Reconfigurable Model-Based Test Program Generator for Microprocessors", A-MOST, 2011.''~_