Template Description Language

By Artemiy Utekhin and Andrei Tatarnikov

UNDER CONSTRUCTION

Introduction

MicroTESK generates test programs on the basis of test templates that provide an abstract description of scenarios to be reproduced by the generated programs. Test templates are created using the test template description language. It is a Ruby-based domain-specific language that provides facilities to describe test cases using symbolic names (that refer to a set of data satisfying certain conditions) instead of concrete input data and to manage the structure of the generated test programs. The language is implemented as a library that includes functionality for describing test templates and for further processing these test templates to produce a test program. MicroTESK uses the JRuby interpreter to process Ruby files. This allows Ruby libraries to interact with other components of MicroTESK written in Java.

How It Works

A test template in Ruby describes a test program in terms of the model of the target microprocessor ISA. The structure of the test program is described using built-in features of Ruby (conditions, loops, etc.) and facilities provided by MicroTESK libraries (blocks that help organize instruction sequences). To provide access to elements of the model such as instructions and their addressing modes, corresponding Ruby methods are created at runtime on the basis of the meta-information provided by the model. The test template subsystem interacts with the model and the testing library of MicroTESK to create a symbolic test program, simulate it on the model, and generate its textual representation. Generally speaking, processing of a test template is performed in the following steps:

  • The model of the microprocessor is loaded;
  • Runtime methods to access architecture-specific elements are created on the basis of the model's meta-information;
  • The code of the test template is executed to build the internal representation of the template described as a hierarchy of code blocks;
  • Blocks are processed bottom-up to produce sequences of abstract instruction calls (at this step, their arguments can be described as a set of conditions instead of being assigned concrete values);
  • A symbolic test program is built on the basis of the produced abstract instruction call sequences by applying corresponding algorithms to find values satisfying the specified conditions;
  • The symbolic test program is simulated on the microprocessor model;
  • The code of the test program is generated and saved to the output file.
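
The second step above (creating runtime methods from the model's meta-information) can be pictured with plain Ruby metaprogramming. The sketch below is not MicroTESK code: the META hash, the TemplateSketch class and the format of the recorded calls are hypothetical stand-ins, used only to show how define_method can turn a description of instructions and addressing modes into callable methods.

```ruby
# Hypothetical meta-information: addressing modes with their parameter names
# and a list of instruction names. A real model supplies far more detail.
META = {
  modes:        { reg: [:i], imm: [:i] },
  instructions: [:mov, :add]
}.freeze

class TemplateSketch
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Creates one method per addressing mode and one per instruction.
  def self.install(meta)
    meta[:modes].each do |name, params|
      define_method(name) do |*values|
        { mode: name, args: params.zip(values).to_h }
      end
    end

    meta[:instructions].each do |name|
      define_method(name) do |*operands|
        @calls << { instruction: name, operands: operands }
      end
    end
  end
end

TemplateSketch.install(META)

sketch = TemplateSketch.new
sketch.mov sketch.reg(0), sketch.imm(0xFF)
```

After the last line, sketch.calls holds one abstract call whose operands are the two addressing-mode descriptions.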

Configuration

Global settings for the test template subsystem are specified in the config.rb file. These settings are related to the package structure and dependencies of the subsystem. They are predefined and rarely need to be modified. Also, there are local settings that control processing of individual test templates. They are specified as member variables of the Template class. Test templates can override them to customize the behavior of the subsystem. The settings will be discussed in more detail in the "Writing Test Templates" section.

Running Test Program Generation

To start test program generation, a user needs to run the generate.sh script (Unix, Linux, OS X) or the generate.bat script (Windows) located in the bin folder. The script launches a Ruby program that processes the specified test template and produces a test program. The command to run the script has the following format:

generate <model name> <template file.rb> [<output file.asm>]

There are three parameters: (1) the name of the microprocessor model (generated by the Sim-nML Translator on the basis of Sim-nML specifications), (2) the name of the test template file to be processed and (3) the name of the test program file to be generated (optional; if it is omitted, the program is printed to the console). For example, the following command processes the example.rb test template and saves the generated test program to the test.asm file:

sh bin/generate.sh cpu arch/demo/cpu/templates/example.rb test.asm

Writing Test Templates

Test Template Structure

A test template is implemented as a class inherited from the Template library class that provides access to all features of the library. Information on the location of the Template class is stored in the TEMPLATE environment variable. So, the definition of a test template class looks like this:

require ENV['TEMPLATE']

class MyTemplate < Template

Test template classes should contain implementations of the following methods:

  1. initialize (optional) - specifies settings for the given test template;
  2. pre (optional) - specifies the initialization code for the test program;
  3. post (optional) - specifies the finalization code for the test program;
  4. run - specifies the main code of the test program (test cases).

The definitions of optional methods can be skipped. In this case, the default implementations provided by the parent class will be used. The default implementation of the initialize method initializes the settings with default values. The default implementations of the pre and post methods do nothing.

The full interface of a test template looks as follows:

require ENV['TEMPLATE']

class MyTemplate < Template

  def initialize
    super
    # Initialize settings here 
  end

  def pre
    # Place your initialization code here
  end

  def post
    # Place your finalization code here
  end

  def run
    # Place your test problem description here
  end

end
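
To see how the optional and required methods fit together, here is a minimal self-contained sketch. TemplateStub is a hypothetical stand-in for the library's Template class, and its process method is an assumed driver that calls the initialization code, the main code and the finalization code in that order; the real library may organize this differently.

```ruby
# Stand-in for the library's Template class (hypothetical, for illustration).
class TemplateStub
  attr_reader :log

  def initialize
    @log = []
  end

  # Default implementations of the optional methods do nothing.
  def pre; end
  def post; end

  # The main code has no default: a template is expected to define it.
  def run
    raise NotImplementedError, 'run must be defined by the template'
  end

  # Hypothetical driver: initialization, main code, finalization.
  def process
    pre
    run
    post
    @log
  end
end

class MyTemplateSketch < TemplateStub
  # pre is not defined here, so the default (empty) version is used.
  def run
    @log << 'main code'
  end

  def post
    @log << 'finalization code'
  end
end

result = MyTemplateSketch.new.process
puts result.inspect
```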

Reusing Test Templates

It is possible to reuse code of existing test templates in other test templates. To do this, you need to subclass the template you want to reuse instead of the Template class. For example, the MyTemplate class below reuses code from the MyPrepost class that provides initialization and finalization code for similar test templates.

require ENV['TEMPLATE']
require_relative 'MyPrepost'

class MyTemplate < MyPrepost

  def run
  ... 
  end

end

Test Template Settings

Test templates use the following settings:

  1. Starting characters for single-line comments in the test program;
  2. Starting characters for multi-line comments in the test program;
  3. Terminating characters for multi-line comments in the test program;
  4. Indentation token;
  5. Token used in separator lines.

Here is how these settings are initialized with default values in the Template class:

@sl_comment_starts_with = "//" 
@ml_comment_starts_with = "/*" 
@ml_comment_ends_with   = "*/" 

@indent_token    = "\t" 
@separator_token = "=" 

The settings can be overridden in the initialize method of a test template. For example:

class MyTemplate < Template

  def initialize
    super
    @sl_comment_starts_with = ";" 
    @ml_comment_starts_with = "/=" 
    @ml_comment_ends_with   = "=/" 

    @indent_token    = "  " 
    @separator_token = "*" 
  end
  ...
end
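
The following sketch illustrates how such settings could shape the generated text. It is an assumption-laden illustration: the SettingsSketch class and its comment and separator helpers are hypothetical and are not part of the library; only the instance variable names come from the settings listed above.

```ruby
class SettingsSketch
  def initialize
    @sl_comment_starts_with = '//'
    @indent_token           = "\t"
    @separator_token        = '='
  end

  # Hypothetical helper: an indented single-line comment.
  def comment(text)
    "#{@indent_token}#{@sl_comment_starts_with} #{text}"
  end

  # Hypothetical helper: a separator line of the given width.
  def separator(width = 20)
    @sl_comment_starts_with + @separator_token * width
  end
end

class CustomSettingsSketch < SettingsSketch
  def initialize
    super
    # Override the defaults after the parent has set them.
    @sl_comment_starts_with = ';'
    @indent_token           = '  '
    @separator_token        = '*'
  end
end

custom = CustomSettingsSketch.new
puts custom.separator(10)      # ;**********
puts custom.comment('Test 1')  #   ; Test 1
```

Calling super first matters: it lets the parent assign the defaults, which the subclass then selectively replaces.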

Data Definitions

Describing data requires the use of assembler-specific directives. Information on these directives is not included in ISA specifications and should be provided in test templates. It includes the textual format of data directives and the mappings between nML and assembler data types used by these directives. Configuration information on data directives is specified in the data_config block, which is usually placed in the pre method. Only one such block per template is allowed. Here is an example:

data_config(:text => '.data', :target => 'M', :addressableSize => 8) {
  define_type :id => :byte, :text => '.byte', :type => type('card', 8)
  define_type :id => :half, :text => '.half', :type => type('card', 16)
  define_type :id => :word, :text => '.word', :type => type('card', 32)

  define_space :id => :space, :text => '.space', :fillWith => 0
  define_ascii_string :id => :ascii, :text => '.ascii', :zeroTerm => false
  define_ascii_string :id => :asciiz, :text => '.asciiz', :zeroTerm => true
}

The block takes the following parameters (compulsory):

  1. text - specifies the keyword that marks the beginning of the data section of the generated test program;
  2. target - specifies the memory array defined in the nML specification to which data will be placed during simulation;
  3. addressableSize - specifies the size (in bits) of addressable memory locations.

To set up particular directives, the language provides special methods that must be called inside the block. All the methods share two common parameters: id and text. The first specifies the keyword to be used in a test template to address the directive and the second specifies how it will be printed in the test program. The current version of MicroTESK provides the following methods:

  1. define_type - defines a directive to allocate memory for a data element of an nML data type specified by the type parameter;
  2. define_space - defines a directive to allocate memory (one or more addressable locations) filled with a default value specified by the fillWith parameter;
  3. define_ascii_string - defines a directive to allocate memory for an ASCII string terminated or not terminated with zero depending on the zeroTerm parameter.

The above example defines the directives byte, half, word, space, ascii (non-zero-terminated string) and asciiz (zero-terminated string) that place data in the memory array M (specified in nML using the mem keyword). The size of an addressable memory location is 8 bits (or 1 byte).

After all data directives are configured, data can be defined using the data block:

data {
  label :data1
  byte 1, 2, 3, 4

  label :data2
  half 0xDEAD, 0xBEEF

  label :data3
  word 0xDEADBEEF

  label :hello
  ascii  'Hello'

  label :world
  asciiz 'World'

  space 6
}

In this example, data is placed into memory. Data items are aligned by their size (1 byte, 2 bytes, 4 bytes). Strings are allocated at the byte border (addressable unit). For simplicity, in the current version of MicroTESK, memory is allocated starting from the address 0 (in the memory array of the executable model).
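
As a worked example of the alignment rule, the sketch below computes where each directive of the data block above would start if allocation begins at address 0, typed items are aligned to their own size, and strings and space are placed at the next byte. The layout and align functions are hypothetical helpers written for this illustration; whether the real allocator pads in exactly this way is an assumption.

```ruby
# Sizes of the typed directives in bytes (8-bit addressable units).
SIZES = { byte: 1, half: 2, word: 4 }.freeze

# Rounds address up to the nearest multiple of alignment.
def align(address, alignment)
  (address + alignment - 1) / alignment * alignment
end

# Returns [directive, start address, size in bytes] for every directive.
def layout(directives)
  address = 0
  directives.map do |kind, payload|
    case kind
    when :ascii  then size = payload.bytesize
    when :asciiz then size = payload.bytesize + 1 # terminating zero
    when :space  then size = payload
    else
      address = align(address, SIZES.fetch(kind))
      size    = SIZES.fetch(kind) * payload.size
    end
    start = address
    address += size
    [kind, start, size]
  end
end

rows = layout([
  [:byte,   [1, 2, 3, 4]],
  [:half,   [0xDEAD, 0xBEEF]],
  [:word,   [0xDEADBEEF]],
  [:ascii,  'Hello'],
  [:asciiz, 'World'],
  [:space,  6]
])
rows.each { |kind, start, size| puts format('%-6s at %2d, %d bytes', kind, start, size) }
```

Under these assumptions the directives start at addresses 0, 4, 8, 12, 17 and 23. No padding is needed in this particular example because each typed item already falls on a multiple of its size.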


Instruction Calls

The pre, post and run methods of a test template class contain specifications of instruction call sequences. Instruction calls are specified using the instruction and addressing mode abstractions. Instructions are self-explanatory: they simply represent target assembler instructions. Every instruction argument is a parameterized addressing mode that explains the meaning of the provided values. For example, an addressing mode can refer to a register, a memory location or hold an immediate value. In other words, an instruction call is an instruction that uses appropriate addressing modes initialized with appropriate values. The format of an instruction call description looks like this:

instruction addr_mode1(:arg1_1 => value1_1, :arg1_2 => value1_2, ...), addr_mode2(:arg2_1 => value2_1, ...), ...

This format implies that addressing modes are parameterized with hash tables where the key is the name of the addressing mode parameter and the value is the value to be assigned to this parameter. Also, there is a shorter format based on methods with a variable number of arguments. In this case, values are expected to come in the same order as the corresponding parameter definitions. The shorter format looks like this:

instruction addr_mode1(value1_1, value1_2, ...), addr_mode2(value2_1, ...), ...

The code below demonstrates both approaches:

mov reg(:i => 0), imm(:i => 0xFF) # The use of hash maps
mov reg(0), imm(0xFF)             # The use of variable numbers of arguments
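
One way a single Ruby method can accept both formats is sketched below. This is an illustration only: reg_sketch and REG_PARAMS are hypothetical names, and the returned hash is an assumed internal representation rather than the one MicroTESK uses.

```ruby
# Hypothetical parameter list of the addressing mode, in definition order.
REG_PARAMS = [:i].freeze

# Accepts either one hash (named values) or positional values.
def reg_sketch(*args)
  values =
    if args.size == 1 && args.first.is_a?(Hash)
      args.first
    else
      unless args.size == REG_PARAMS.size
        raise ArgumentError, "expected #{REG_PARAMS.size} value(s)"
      end
      # Pair positional values with parameter names in definition order.
      REG_PARAMS.zip(args).to_h
    end
  { mode: :reg, args: values }
end

named      = reg_sketch(:i => 0)
positional = reg_sketch(0)
puts named == positional
```

Both calls yield the same description, which is why the two formats are interchangeable as long as the order of positional values matches the parameter definitions.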

Instruction Call Blocks

TODO: REWRITE

Basic features

The two core abstractions used by the MicroTESK parser/simulator and Ruby-TDL are the instruction and the addressing mode. An instruction is self-explanatory: it simply represents a target assembler instruction. Every argument of an instruction is a parametrized addressing mode that explains the meaning of the provided values to the simulator. A mode can point to a register, for instance, or to a specific memory location. It can also denote an immediate value, e.g. a simple integer or a string. Thus, a basic template is effectively a sequence of instructions with parametrized addressing modes as their arguments.

Each template is a class that inherits a basic Template class that provides most of the core Ruby-TDL functionality. So, to write a template you need to subclass Template first:

require_relative "_path-to-the-rubymt-library_/mtruby" 

class MyTemplate < Template

While processing a template, Ruby-TDL calls its pre, run and post methods, which loosely correspond to the pre-conditions, the main body and the post-conditions. The pre method is mostly useful for setup common to many templates; the post method will become more important once sequential testing is introduced. Most of the template code is expected to be in the run method, so a template needs to override one or more of these methods, most commonly run.

The most common way to use pre and post is to define them in a special non-executable class and then subclass it with the actual templates:

require_relative "_path-to-the-rubymt-library_/mtruby" 

class MyPrepost < Template
  def initialize
    super
    # Ruby has no yes/no literals; booleans are true and false
    @is_executable = false
  end

  def pre
    # Your 'startup' code goes here
  end

  def post
    # Your 'cleanup' code goes here
  end
end

require_relative "_path-to-the-rubymt-library_/mtruby"

class MyTemplate < MyPrepost
  def initialize
    super
    @is_executable = true
  end

  def run
    # Your template code goes here
  end
end

These methods essentially contain the instructions. The general instruction format is slightly more intimidating than the native assembler and looks like this:

instruction_name addr_mode1(:arg1_1 => value, :arg1_2 => value, ...), addr_mode2(:arg2_1 => value, ...), ...

So, for instance, if the simulator has an ADD instruction that takes a MEM operand followed by a MEM|IMM operand, a call would look like:

add mem(:i => 42), imm(:i => 128)

Thankfully, there are shortcuts. If there's only one argument expected in the addressing mode, you can simply write its value and never have to worry about the argument name. And, by convention, the immediate values are always denoted in the simulator as the IMM addressing mode, so the template parser automatically accepts numbers and strings as such. Thus, in this case, the instruction can be simplified to:

add mem(42), 128

As a matter of fact, if you're sure about the order of addressing mode arguments, you can omit the names altogether and simply provide the values:

instruction_name addr_mode1(value1, value2, ...) ...

If the name of the instruction conflicts with an already existing Ruby method, the instruction will be available with an op_ prefix before its name.
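
A name conflict of this kind can be detected with Ruby's reflection methods. The sketch below is a hypothetical illustration of the idea, not the library's implementation: PrefixSketch and exposed_name are invented names.

```ruby
class PrefixSketch
  # Hypothetical: picks the method name under which an instruction is exposed.
  # Both public and private methods are checked, because Kernel methods such
  # as puts are private but can still be called without a receiver.
  def self.exposed_name(instruction)
    if method_defined?(instruction) || private_method_defined?(instruction)
      :"op_#{instruction}"
    else
      instruction
    end
  end
end

puts PrefixSketch.exposed_name(:add).inspect   # :add
puts PrefixSketch.exposed_name(:send).inspect  # :op_send
puts PrefixSketch.exposed_name(:puts).inspect  # :op_puts
```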

Test situations

This section is to be taken with a grain of salt: the logic and the interface behind test situations are not yet finalized and are mostly missing from the templates, so they should not be used yet.

Big TODO: define what is a test situation

To denote a test situation, attach a Ruby block that describes the situation to an instruction. This will loosely look as follows (likely similar to the way addressing modes are denoted):

sub mem(42), mem(21) do overflow(:op1 => 123, :op2 => 456) end

Instruction blocks

Sometimes a test situation should influence more than one instruction. In that case, you can pass the instructions in an atomic block that optionally accepts a Proc of situations as its argument (a Ruby method can take only one block, and passing a Hash of Proc objects is hardly comfortable).

p = lambda { overflow(:op1 => 123, :op2 => 456) }

atomic(p) {
  mov mem(25), mem(26)
  add mem(27), 28
  sub mem(29), 30
}
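
The general Ruby pattern behind this (one Proc passed as an ordinary argument plus one block) is sketched below. The atomic_sketch method and the returned hash are hypothetical and only demonstrate the calling convention. Note that the Proc must be passed in parentheses: without them, Ruby would attach the braces to the argument rather than to the method being called.

```ruby
# Hypothetical: takes an optional Proc of situations and a block of instructions.
def atomic_sketch(situations = nil, &body)
  {
    situation:    situations && situations.call,
    instructions: body.call
  }
end

overflow = lambda { [:overflow, { op1: 123, op2: 456 }] }

with_situation    = atomic_sketch(overflow) { [:mov, :add, :sub] }
without_situation = atomic_sketch { [:mov] }

puts with_situation.inspect
puts without_situation.inspect
```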

Groups and random selections (N.B. REMOVED in r1923. The implementation does not work in the current build and, therefore, was removed. The described features must be reviewed and reimplemented if required.)

From source code comments:

# VERY UNTESTED leftovers from the previous version ("V2", this is V3)
# Should work with the applied fixes but I'd be very careful to use these

# As things stand this is just a little discrete probability utility that
# may or may not find its way into the potential ruby part of the test engine

There are certain ways to group together or randomize addressing modes and instructions.

To group several addressing modes together (this only works if they have similar arguments) create a mode group like this:

mode_group "my_group", [:mem, :imm]

You can also set weights to each of the modes in the group like this:

mode_group "my_group", {:mem => 1.5, :imm => 2.5}

The name of the group is converted into a method in the Template class. To select a random mode from a group, use sample on this generated method:

add mem(42), my_group.sample(21)

TODO: sampling already parametrized modes

The first method of grouping instructions works in a similar manner with the same restrictions on arguments:

group "i_group", [:add, :sub]
group "i_group", {:add => 0.3, :sub => 0.7}
i_group.sample mem(42), 21

You can also run all of the instructions in a group at once by using the all method:

i_group.all mem(42), 21

The second one allows you to create a normal block of instructions, setting their arguments separately.

block_group "b_group" do
  mov mem(25), mem(26)
  add mem(27), 28
  sub mem(29), 30
end

In this case to set weights you should call a prob method before every instruction:

block_group "b_group" do
  prob 0.1
  mov mem(25), mem(26)
  prob 0.7
  add mem(27), 28
  prob 0.4
  sub mem(29), 30
end

The usage is almost identical, but without providing the arguments as they are already set:

b_group.sample
b_group.all

It is unclear how this works inside atomic blocks when the group is defined outside them; this needs more consideration.

TODO: Permutations

Any normal Ruby code is allowed inside the blocks as well as the run-type methods, letting you write more complex or inter-dependent templates.

TODO: Labels

To set a label write:

label :label_name

To use a label in an instruction that accepts one (under the hood, a label is just an immediate #IMM value whose value is not known until the label is actually defined):

b greaterThan, :label_name
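
Because a label may be used before it is defined, its value cannot be known while the template is first being read. A common way to handle this is a two-pass scheme, sketched below under stated assumptions: the program array and its [:label, ...], [:branch, ...] and [:instr, ...] entries are a hypothetical representation, and label values are taken to be instruction indices.

```ruby
# Hypothetical program: labels and instructions in template order.
program = [
  [:label,  :start],
  [:instr,  :add],
  [:branch, :done],   # refers to a label defined later
  [:instr,  :sub],
  [:label,  :done],
  [:instr,  :mov]
]

# Pass 1: map each label to the index of the instruction that follows it.
labels = {}
index  = 0
program.each do |kind, name|
  if kind == :label
    labels[name] = index
  else
    index += 1
  end
end

# Pass 2: drop the label markers and substitute resolved values in branches.
resolved = program.reject { |kind, _| kind == :label }.map do |kind, name|
  kind == :branch ? [:branch, labels.fetch(name)] : [kind, name]
end

puts labels.inspect
puts resolved.inspect
```

Using fetch in the second pass makes a reference to an undefined label fail with a KeyError instead of silently producing nil.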

TODO: Debug

To get a value from registers use:

get_reg_value("register_name", index)

Right now, pre-processing and execution of instructions are separated due to ambiguous logic regarding labels and various blocks and atomics. This may change later, so these special debugging blocks might become unnecessary. By default, code written in the template runs during pre-processing, so you have to use special blocks if you want to run Ruby code (most likely debugging code) during the execution stage.

To print debug output to the console during the execution of instructions, use the exec_debug block:

exec_debug {
  puts "R0: " + get_reg_value("GPR", 0).to_s + ", R1: " + get_reg_value("GPR", 1).to_s
}

To save something that depends on the current state of the simulator to the resulting assembler code, use exec_output, whose block should return a string:

exec_output {
  "// The result should be " + self.get_reg_value("GPR", 0).to_s
}
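
The difference between the pre-processing stage and the execution stage can be illustrated with deferred blocks. In the sketch below, DeferredSketch, set_reg and execute are hypothetical; only the exec_output and get_reg_value names are borrowed from the text above, and their behavior here is an assumption made for the example.

```ruby
class DeferredSketch
  attr_reader :output

  def initialize
    @steps  = []           # filled during pre-processing
    @regs   = Hash.new(0)  # stand-in for the simulator state
    @output = []
  end

  # Hypothetical instruction: records a state change to be applied later.
  def set_reg(index, value)
    @steps << -> { @regs[index] = value }
  end

  def get_reg_value(_name, index)
    @regs[index]
  end

  # The block is stored now and evaluated only at the execution stage.
  def exec_output(&block)
    @steps << -> { @output << instance_eval(&block) }
  end

  # Execution stage: runs the recorded steps in order.
  def execute
    @steps.each(&:call)
    @output
  end
end

sketch = DeferredSketch.new
sketch.set_reg(0, 42)
sketch.exec_output { '// The result should be ' + get_reg_value('GPR', 0).to_s }
sketch.set_reg(0, 7)
lines = sketch.execute
puts lines
```

The output line reports 42 rather than 7 because the block is evaluated at its own position in the execution order, after the first assignment and before the second.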