hoshi-lang dev
Yet another programming language
The Hoshi-lang Intermediate Representation (IR) Handbook

Version 1.1

1. Introduction

1.1. Purpose

The Hoshi-lang Intermediate Representation (IR) is a high-level, stack-based representation of a Hoshi-lang program. It is the middle layer of the compilation pipeline, sitting between the Abstract Syntax Tree (AST) produced by the parser and the machine code or LLVM IR produced by the backend.

Its primary goals are:

  • Decoupling: To separate the language's frontend (parsing, semantic analysis) from its backend (code generation), allowing different backends to be targeted in the future.
  • Simplicity: To represent program logic in a simple, linear, and easy-to-analyze format.
  • High-Level Abstraction: To retain high-level Hoshi-lang concepts such as objects, methods, and reference counting, making the IR easier to reason about and optimize than a low-level IR.

1.2. Execution Model

Yoi IR is designed to be interpreted by a stack-based virtual machine. Each function has its own evaluation stack, referred to as the temporary value stack.

  • Instructions pop their operands from the top of this stack.
  • After execution, they push their results back onto the top of the stack.
  • All values on the stack are pointers to heap-allocated, reference-counted objects, as defined by the language's memory model.
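
The three rules above can be illustrated with a minimal Python sketch. The `Box` class and the `run` function are hypothetical stand-ins invented for this illustration; they are not part of the Hoshi-lang sources, and only two opcodes are modeled.

```python
class Box:
    """Hypothetical stand-in for a heap-allocated, reference-counted object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1  # the creator holds the first reference


def run(instructions):
    """Execute (opcode, operand) pairs against a fresh temporary value stack."""
    stack = []
    for op, arg in instructions:
        if op == "push_integer":
            stack.append(Box(arg))      # results go on top of the stack
        elif op == "add":
            rhs = stack.pop()           # operands come off the top
            lhs = stack.pop()
            stack.append(Box(lhs.value + rhs.value))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


result = run([("push_integer", 2), ("push_integer", 3), ("add", None)])
print(result[-1].value)  # 5
```

Every entry on `stack` is a reference to a `Box`, never a raw number, which mirrors the rule that the stack only holds pointers to heap objects.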

2. Core Concepts

2.1. IRModule

An IRModule is the top-level container for a compiled Hoshi-lang source file. It contains:

  • functionTable, structTable, interfaceTable, interfaceImplementationTable
  • globalVariables, stringLiteralPool, externTable

2.2. IRValueType

The IR has its own type system to represent Hoshi-lang types. All of these (except Raw types) represent heap-allocated objects.

| Type Enum | Description |
|---|---|
| integerObject | A reference-counted integer object. |
| decimalObject | A reference-counted decimal object. |
| booleanObject | A reference-counted boolean object. |
| stringObject | A reference-counted string object. |
| characterObject | A reference-counted character object. |
| structObject | An instance of a user-defined struct. |
| interfaceObject | An instance of an interface, containing a this pointer and a v-table. |
| none | The special singleton none object. |
| null | A null pointer type (for language-level null). |
| pointerObject | A generic void*-like pointer for internal use. |
| foreign*Type | Types for FFI interoperability (e.g., foreignInt32Type). |
| *_Raw | Internal types representing unboxed literal values. |

3. Instruction Set Reference

The notation ..., in1, in2 -> ..., out describes the effect of an instruction on the temporary value stack.

3.1. Stack and Memory Operations

**push_integer <value> | push_decimal <value> | push_boolean <value> | push_string <idx> | push_null**

  • Stack: ... -> ..., new_object
  • Creates a new object of the corresponding type and pushes a pointer to it onto the stack.

**pop**

  • Stack: ..., value -> ...
  • Pops a value from the stack and decreases its reference count, effectively discarding it.

**load_local <var_index> | load_global <var_index>**

  • Stack: ... -> ..., value
  • Loads the object pointer from the specified variable, increases its reference count, and pushes the pointer onto the stack.

**store_local <var_index> | store_global <var_index>**

  • Stack: ..., value -> ...
  • Pops an object pointer, value, then performs the full ARC (automatic reference counting) release/retain cycle for the target variable before storing value into it. The stack's reference to value is consumed.

**load_member <member_index>**

  • Stack: ..., object -> ..., member_value
  • Pops an object, accesses the member at <member_index>, loads its value, increases the member's reference count, and pushes it onto the stack. The original object's reference is consumed.

**store_member <member_index>**

  • Stack: ..., object, value -> ...
  • Pops an object and a value. It performs the full ARC release/retain cycle for the member at <member_index> before storing value. Both object and value stack references are consumed.

3.2. Arithmetic and Logical Operations

These instructions pop one or two values, unbox them, perform the operation, and push a new object containing the result. The operand objects are consumed.

**add | sub | mul | div | mod**

  • Stack: ..., lhs, rhs -> ..., result

**negate**

  • Stack: ..., value -> ..., result

**equal | not_equal | less_than | less_equal | greater_than | greater_equal**

  • Stack: ..., lhs, rhs -> ..., booleanObject

3.3. Control Flow

**jump <block_index>**

  • Unconditionally transfers execution to the target block.

**jump_if_true <block_index> | jump_if_false <block_index>**

  • Stack: ..., condition -> ...
  • Pops a booleanObject, consumes it, and jumps if the condition is met.

**ret**

  • Stack: ..., value -> [exit]
  • Performs cleanup and returns value to the caller, transferring ownership.

**ret_none**

  • Stack: ... -> [exit]
  • Performs cleanup and returns the global singleton noneObject.

3.4. Object and Array Lifecycle

**new_struct <struct_index>**

  • Stack: ... -> ..., structObject
  • Allocates a new struct instance.

**new_interface <interface_index>**

  • Stack: ... -> ..., interfaceObject
  • Allocates an uninitialized interface shell.

**construct_interface_impl <impl_index>**

  • Stack: ..., interface_shell, struct_instance -> ..., constructed_interface
  • Populates the interface shell with the struct_instance's data and v-table, pushing the completed interface back.

**new_array_* <dims...>**

  • Stack: ..., elem1, elem2, ... -> ..., arrayObject
  • Pops N elements (where N is the total size from <dims...>) and creates a new fixed-size array object containing them.
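
A minimal Python sketch of the fixed-size array constructor follows. It assumes that "total size" means the product of the dimensions and that elements are stored flat in push order; `Box`, `ArrayObject`, and `new_array` are hypothetical names for this illustration.

```python
from math import prod


class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


class ArrayObject:
    """Hypothetical fixed-size array: a flat element list plus its dimensions."""
    def __init__(self, elements, dims):
        self.elements = elements
        self.dims = dims
        self.refs = 1


def new_array(stack, dims):
    count = prod(dims)                     # assumed meaning of "total size"
    elements = [stack.pop() for _ in range(count)]
    elements.reverse()                     # elem1 was pushed first, so it pops last
    stack.append(ArrayObject(elements, dims))


stack = [Box(n) for n in range(1, 7)]      # elem1..elem6 already pushed
new_array(stack, (2, 3))
array = stack[-1]
print([e.value for e in array.elements])   # [1, 2, 3, 4, 5, 6]
```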

**new_dynamic_array_* <initializer_size>**

  • Stack: ..., size, elem1, ... -> ..., arrayObject
  • Pops a size integer object, then pops <initializer_size> elements to create a new dynamic array.

**array_length**

  • Stack: ..., array -> ..., integerObject
  • Pops an array object, consumes it, and pushes a new integer object with the array's length.

**load_element**

  • Stack: ..., array, index -> ..., element
  • Pops an array and an index, consumes them, loads the element at that index, increases the element's ref-count, and pushes it to the stack.

**store_element**

  • Stack: ..., array, index, value -> ...
  • Pops an array, index, and value. Performs the full ARC cycle on the element at the target index before storing value there. All three stack references are consumed.

3.5. Type Operations

**basic_cast_***

  • Stack: ..., value -> ..., casted_value
  • Pops an object, consumes it, and attempts to convert its raw value to a new type (e.g., int to deci), pushing a new object with the result.

**pointer_cast**

  • Stack: ..., value -> ..., pointerObject
  • Pops an object, consumes it, and converts its pointer to a generic pointerObject.

**typeid_* <module_idx>, <type_idx>**

  • Stack: ... -> ..., integerObject
  • Pushes a new integer object containing the unique runtime ID for the specified type.

**dyn_cast_* <module_idx>, <type_idx>**

  • Stack: ..., interface_object -> ..., struct_object_or_null
  • Pops an interface_object and consumes it, then checks whether the underlying concrete type matches the target struct type. If it matches, pushes a pointer to the concrete struct (with its ref-count increased). Otherwise, pushes null.
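
The check can be sketched in Python as follows. Identifying a type by a `(module_idx, type_idx)` pair and representing null as `None` are assumptions of this sketch, as are the `Struct` and `Interface` classes.

```python
class Struct:
    """Hypothetical struct instance tagged with its type identity."""
    def __init__(self, type_id):
        self.type_id = type_id   # (module_idx, type_idx), an assumed encoding
        self.refs = 1


class Interface:
    """Hypothetical interface object holding only its this pointer."""
    def __init__(self, this):
        self.this = this
        self.refs = 1


def dyn_cast(stack, module_idx, type_idx):
    interface = stack.pop()
    concrete = interface.this
    if concrete.type_id == (module_idx, type_idx):
        concrete.refs += 1       # the stack gains a reference to the struct
        result = concrete
    else:
        result = None            # stands in for the IR's null
    interface.refs -= 1          # the interface operand is consumed
    stack.append(result)


circle = Struct((0, 4))
stack = [Interface(circle)]
dyn_cast(stack, 0, 4)            # matching type
hit = stack.pop()
stack.append(Interface(circle))
dyn_cast(stack, 0, 7)            # different type
miss = stack.pop()
print(hit is circle, circle.refs, miss)   # True 2 None
```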

**interfaceof**

  • Stack: ..., object, typeid -> ..., booleanObject
  • Pops an object and a typeid (as an integer object). Consumes them and checks if the object's type matches the ID. Pushes a new boolean object with the result.

3.6. Function and Method Calls

**invoke <func_index>, <arg_count>**

  • Stack: ..., argN, ..., arg1 -> ..., result_or_none
  • Static function call. Transfers ownership of arguments to the callee.

**invoke_virtual <vtable_index>, <arg_count>**

  • Stack: ..., argN, ..., arg1, interface_instance -> ..., result_or_none
  • Virtual method call. Dispatches through the interface's v-table.

**invoke_imported <lib_idx>, <func_idx>, <arg_count>**

  • Stack: ..., argN, ..., arg1 -> ..., result_or_none
  • Calls a function imported from an external library via FFI.

2. Core Concepts

2.1. IRModule

An IRModule is the top-level container for a compiled hoshi-lang source file. It contains all the necessary information to represent that module:

  • functionTable: A table of all functions defined within the module.
  • structTable: Definitions for all struct types.
  • interfaceTable: Definitions for all interface types.
  • interfaceImplementationTable: Definitions linking a struct to an interface it implements.
  • globalVariables: A table of all global variables.
  • stringLiteralPool: A pool of all string constants used in the module.
  • externTable: A table of symbols imported from other modules.
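
As a rough picture of this container, the sketch below models the seven tables as plain Python lists. The field names come from the list above; the `add_string` helper and its de-duplication of literals are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IRModule:
    functionTable: list = field(default_factory=list)
    structTable: list = field(default_factory=list)
    interfaceTable: list = field(default_factory=list)
    interfaceImplementationTable: list = field(default_factory=list)
    globalVariables: list = field(default_factory=list)
    stringLiteralPool: list = field(default_factory=list)
    externTable: list = field(default_factory=list)

    def add_string(self, text):
        """Hypothetical helper: return the pool index for text, reusing an entry if present."""
        if text in self.stringLiteralPool:
            return self.stringLiteralPool.index(text)
        self.stringLiteralPool.append(text)
        return len(self.stringLiteralPool) - 1


module = IRModule()
hello = module.add_string("hello")
world = module.add_string("world")
again = module.add_string("hello")
print(hello, world, again)   # 0 1 0
```

Instructions such as push_string would then carry only the returned index as their operand.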

2.2. IRFunctionDefinition

This structure defines a single function, containing:

  • name: The unique, mangled name of the function.
  • argumentTypes: A list of types for its parameters.
  • returnType: The function's return type.
  • variableTable: A symbol table for all local variables and parameters.
  • codeBlock: A vector of IRCodeBlocks that make up the function's body.

2.3. IRCodeBlock

A code block is a sequence of IR instructions that are executed linearly. It is analogous to a "basic block" in other compilers. All control flow is achieved by jumping between these blocks.
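
To make the relationship between a function and its blocks concrete, here is a small Python sketch. The field names follow section 2.2; the `instructions` field, the `(opcode, operand)` tuple encoding, the `jump_targets` helper, and the sample function `abs_int` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IRCodeBlock:
    instructions: list = field(default_factory=list)   # (opcode, operand) pairs


@dataclass
class IRFunctionDefinition:
    name: str
    argumentTypes: list
    returnType: str
    variableTable: list = field(default_factory=list)
    codeBlock: list = field(default_factory=list)       # list of IRCodeBlock


def jump_targets(function):
    """Collect every block index referenced by a jump instruction."""
    targets = set()
    for block in function.codeBlock:
        for opcode, operand in block.instructions:
            if opcode in ("jump", "jump_if_true", "jump_if_false"):
                targets.add(operand)
    return targets


abs_int = IRFunctionDefinition(
    name="abs_int", argumentTypes=["int"], returnType="int",
    variableTable=["x"],
    codeBlock=[
        IRCodeBlock([("load_local", 0), ("push_integer", 0), ("less_than", None),
                     ("jump_if_true", 1), ("jump", 2)]),
        IRCodeBlock([("load_local", 0), ("negate", None), ("ret", None)]),
        IRCodeBlock([("load_local", 0), ("ret", None)]),
    ],
)
print(sorted(jump_targets(abs_int)))   # [1, 2]
```

Because control flow only moves between blocks, a check like `jump_targets` is enough to confirm that every jump lands on an existing block index.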

2.4. IRValueType

The IR has its own type system to represent hoshi-lang types. All of these (except Raw types) represent heap-allocated objects.

| Type Enum | Description |
|---|---|
| integerObject | A reference-counted integer object. |
| decimalObject | A reference-counted decimal object. |
| booleanObject | A reference-counted boolean object. |
| stringObject | A reference-counted string object. |
| characterObject | A reference-counted character object. |
| structObject | An instance of a user-defined struct. |
| interfaceObject | An instance of an interface, containing a this pointer and a v-table. |
| none | The special singleton none object. |
| null | A null pointer type (for language-level null). |
| virtualMethod | A placeholder type for a method within an interface implementation. |
| *_Raw | Internal types representing unboxed literal values. |

2.5. IROperand

Operands are the arguments to an IR instruction.

| Operand Type | Description |
|---|---|
| integer | An immediate 64-bit integer value. |
| decimal | An immediate 64-bit floating-point value. |
| boolean | An immediate boolean value. |
| stringLiteral | An index into the module's stringLiteralPool. |
| codeBlock | The index of a target IRCodeBlock for jump instructions. |
| index | A generic index used for various tables (functions, structs, etc.). |
| localVar | The index of a local variable in the current function's variableTable. |
| globalVar | The index of a global variable in the module's globalVariables table. |
| externVar | The index of an external symbol in the module's externTable. |
3. Instruction Set Reference

The notation ..., in1, in2 -> ..., out describes the effect of an instruction on the temporary value stack.

3.1. Stack and Memory Operations

These instructions move data between the stack, local variables, global variables, and object members.

**push_integer <value>**

  • Stack: ... -> ..., integerObject
  • Allocates a new integer object on the heap, initializes it with <value>, sets its reference count to 1, and pushes the pointer to it onto the stack.

**push_decimal <value>**

  • Stack: ... -> ..., decimalObject
  • Similar to push_integer, but for decimal values.

**push_boolean <value>**

  • Stack: ... -> ..., booleanObject
  • Similar to push_integer, but for boolean values.

**push_string <string_index>**

  • Stack: ... -> ..., stringObject
  • Creates a new string object from the string literal at <string_index> in the string pool and pushes a pointer to it onto the stack.

**load_local <var_index>**

  • Stack: ... -> ..., value
  • Loads the object pointer from local variable <var_index>, increases its reference count, and pushes the pointer onto the stack.

**store_local <var_index>**

  • Stack: ..., value -> ...
  • Pops an object pointer value from the stack. It then:
    1. Increases the reference count of value.
    2. Loads the old object pointer from local variable <var_index> and decreases its reference count.
    3. Stores value into the local variable.
    4. Decreases the reference count of value (balancing the initial stack reference).
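
The four steps can be traced with a small Python sketch. `Box`, `retain`, and `release` are hypothetical helpers; the `None` check for an empty variable slot is an assumption, and a real runtime would free an object when its count reaches zero.

```python
class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


def retain(obj):
    obj.refs += 1


def release(obj):
    obj.refs -= 1   # a real runtime would free the object at zero


def store_local(stack, local_vars, index):
    value = stack.pop()
    retain(value)                  # 1. reference held by the variable
    old = local_vars[index]
    if old is not None:            # assumed: an unset slot holds nothing
        release(old)               # 2. the variable drops its old object
    local_vars[index] = value      # 3. store the new pointer
    release(value)                 # 4. balance the stack's reference


old, new = Box(1), Box(2)
local_vars = [old]
stack = [new]
store_local(stack, local_vars, 0)
print(old.refs, new.refs)   # 0 1
```

Retaining value before releasing the old object keeps the object alive when the variable already holds the same object being stored.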

**load_global <var_index>**

  • Stack: ... -> ..., value
  • Same as load_local, but for global variables.

**store_global <var_index>**

  • Stack: ..., value -> ...
  • Same as store_local, but for global variables.

**load_member <member_index>**

  • Stack: ..., object -> ..., member_value
  • Pops an object pointer (object), accesses the member at <member_index>, loads its value, increases the member's reference count, and pushes it onto the stack. The original object's reference count is decreased.

**store_member <member_index>**

  • Stack: ..., object, value -> ...
  • Pops an object pointer (object) and a value pointer (value). It performs the full ARC release/retain cycle for the member at <member_index> in object before storing value. Both the object and value stack references are consumed (ref-count decreased).
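
The reference-count bookkeeping of both member instructions can be sketched in Python. `Box` and `Struct` are hypothetical, and the release/retain cycle in `store_member` is assumed to mirror the sequence documented for store_local.

```python
class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


class Struct:
    """Hypothetical struct instance with an indexed member list."""
    def __init__(self, members):
        self.members = members
        self.refs = 1


def load_member(stack, member_index):
    obj = stack.pop()
    member = obj.members[member_index]
    member.refs += 1                       # the stack now references the member
    obj.refs -= 1                          # the stack's reference to object is consumed
    stack.append(member)


def store_member(stack, member_index):
    value = stack.pop()
    obj = stack.pop()
    value.refs += 1                        # retain for the member slot
    obj.members[member_index].refs -= 1    # release the old member
    obj.members[member_index] = value
    value.refs -= 1                        # consume the stack's reference to value
    obj.refs -= 1                          # consume the stack's reference to object


x, y = Box(10), Box(20)
point = Struct([x])

point.refs += 1                            # as load_local would before pushing
stack = [point]
load_member(stack, 0)
print(stack.pop().value, x.refs, point.refs)    # 10 2 1

point.refs += 1
stack = [point, y]
store_member(stack, 0)
print(point.members[0].value, x.refs, y.refs)   # 20 1 1
```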

3.2. Arithmetic and Logical Operations

These instructions pop one or two values, unbox them, perform the operation, and push a new object containing the result. The operand objects are consumed (ref-count decreased).

**add | sub | mul | div | mod**

  • Stack: ..., lhs, rhs -> ..., result
  • Pops two numeric objects, performs the operation on their raw values, and pushes a new object with the result.

**negate**

  • Stack: ..., value -> ..., result
  • Pops a numeric object, negates its raw value, and pushes a new object with the result.

**equal | not_equal | less_than | less_equal | greater_than | greater_equal**

  • Stack: ..., lhs, rhs -> ..., booleanObject
  • Pops two objects, compares their raw values, and pushes a new boolean object with the result (true or false).
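
The unbox, operate, and re-box pattern shared by these instructions can be sketched in Python. `Box`, `RAW_OPS`, `binary`, and `negate` are hypothetical names; div and mod are left out because the text does not specify their rounding behavior.

```python
import operator


class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


RAW_OPS = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
    "equal": operator.eq, "not_equal": operator.ne,
    "less_than": operator.lt, "less_equal": operator.le,
    "greater_than": operator.gt, "greater_equal": operator.ge,
}


def binary(stack, opcode):
    rhs = stack.pop()                      # rhs is on top, lhs beneath it
    lhs = stack.pop()
    result = Box(RAW_OPS[opcode](lhs.value, rhs.value))
    lhs.refs -= 1                          # both operands are consumed
    rhs.refs -= 1
    stack.append(result)


def negate(stack):
    value = stack.pop()
    result = Box(-value.value)
    value.refs -= 1
    stack.append(result)


stack = [Box(7), Box(5)]
binary(stack, "sub")         # 7 - 5
negate(stack)                # -(2)
stack.append(Box(0))
binary(stack, "less_than")   # -2 < 0
print(stack[-1].value)       # True
```

The subtraction shows why operand order matters: the value pushed first is lhs, even though it is popped second.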

3.3. Control Flow

**jump <block_index>**

  • Stack: ... -> ...
  • Unconditionally transfers execution to the code block at <block_index>.

**jump_if_true <block_index> | jump_if_false <block_index>**

  • Stack: ..., condition -> ...
  • Pops a booleanObject, consumes it, and checks its raw value. If the condition matches the instruction, it jumps to <block_index>. Otherwise, execution continues to the next instruction.
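
A minimal Python sketch of block-to-block control flow follows. The `run` loop, the `(opcode, operand)` encoding, and the returned list of visited blocks are assumptions for illustration; ret_none is used only to end the run.

```python
class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


def run(blocks):
    """Run from block 0 and return the indices of the blocks entered."""
    stack, visited = [], [0]
    block, pc = 0, 0
    while True:
        op, arg = blocks[block][pc]
        pc += 1                                     # default: fall through
        if op == "push_boolean":
            stack.append(Box(arg))
        elif op in ("jump_if_true", "jump_if_false"):
            cond = stack.pop()
            cond.refs -= 1                          # the condition is consumed
            if cond.value == (op == "jump_if_true"):
                block, pc = arg, 0
                visited.append(block)
        elif op == "jump":
            block, pc = arg, 0
            visited.append(block)
        elif op == "ret_none":
            return visited


def program(flag):
    return [
        [("push_boolean", flag), ("jump_if_true", 1), ("jump", 2)],
        [("ret_none", None)],
        [("ret_none", None)],
    ]


print(run(program(False)))   # [0, 2]
print(run(program(True)))    # [0, 1]
```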

**ret**

  • Stack: ..., value -> [exit]
  • Performs cleanup (releases all local variables). Then, it pops value and returns it to the caller. The caller assumes ownership of the returned object's reference.

**ret_none**

  • Stack: ... -> [exit]
  • Performs cleanup (releases all local variables) and returns the global singleton noneObject.
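
Both return instructions can be sketched in Python. `Box`, `NONE_OBJECT`, and `release_locals` are hypothetical; the example simulates `return a`, where load_local has already raised the count of a before ret runs.

```python
class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


NONE_OBJECT = Box(None)   # hypothetical stand-in for the global singleton


def release_locals(local_vars):
    for obj in local_vars:
        if obj is not None:   # assumed: an unset slot holds nothing
            obj.refs -= 1


def ret(stack, local_vars):
    release_locals(local_vars)    # cleanup first
    return stack.pop()            # the stack's reference becomes the caller's


def ret_none(stack, local_vars):
    release_locals(local_vars)
    return NONE_OBJECT


a = Box(1)
a.refs += 1                       # effect of load_local before ret
returned = ret([a], [a])
print(returned is a, a.refs)      # True 1
```

The returned object survives the cleanup because the reference taken by load_local is the one handed to the caller.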

3.4. Object Lifecycle

**new_struct <struct_index>**

  • Stack: ... -> ..., structObject
  • Allocates memory for a struct instance of the type at <struct_index>, sets its ref-count to 1, and pushes the pointer onto the stack.

**new_interface <interface_index>**

  • Stack: ... -> ..., interfaceObject
  • Allocates memory for an interface "shell" of the type at <interface_index>, sets its ref-count to 1, and pushes the pointer onto the stack. This shell is uninitialized and must be configured with construct_interface_impl.

**construct_interface_impl <impl_index>**

  • Stack: ..., interface_shell, struct_instance -> ..., constructed_interface
  • Pops an interface_shell and a struct_instance. It populates the interface shell:
    1. Stores a pointer to struct_instance inside the interface (this pointer).
    2. Increases the ref-count of struct_instance.
    3. Populates the interface's v-table with function pointers from the implementation definition at <impl_index>.
    4. Pushes the now-initialized interfaceObject back onto the stack.
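
The four steps can be followed in this Python sketch. `Struct`, `Interface`, `impl_table`, and `square_name` are hypothetical, and v-table entries are modeled as plain Python functions. The text does not say what happens to the stack's own reference to struct_instance, so the sketch does not model it.

```python
class Struct:
    """Hypothetical struct instance."""
    def __init__(self, name):
        self.name = name
        self.refs = 1


class Interface:
    """Hypothetical interface shell: empty until constructed."""
    def __init__(self):
        self.this = None
        self.vtable = []
        self.refs = 1


def construct_interface_impl(stack, impl_table, impl_index):
    struct_instance = stack.pop()
    shell = stack.pop()
    shell.this = struct_instance                  # 1. store the this pointer
    struct_instance.refs += 1                     # 2. retain the struct
    shell.vtable = list(impl_table[impl_index])   # 3. fill the v-table
    stack.append(shell)                           # 4. push the finished interface


def square_name(this):
    return "square:" + this.name


impl_table = [[square_name]]      # one implementation with one method
square = Struct("s1")
stack = [Interface(), square]     # shell pushed first, struct on top
construct_interface_impl(stack, impl_table, 0)
shape = stack[-1]
print(shape.vtable[0](shape.this), square.refs)   # square:s1 2
```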

3.5. Function and Method Calls

**invoke <func_index>, <arg_count>**

  • Stack: ..., argN, ..., arg1 -> ..., result_or_none
  • Pops <arg_count> arguments from the stack. Calls the function at <func_index> with these arguments. Ownership of the arguments is transferred to the callee. The return value (which may be the noneObject) is pushed onto the stack.
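
A static call can be sketched in Python as follows. `Box`, `NONE_OBJECT`, the `function_table` list of Python callables, and the sample callees are hypothetical. The sketch assumes arg1 sits on top of the stack, as the stack notation suggests.

```python
class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


NONE_OBJECT = Box(None)   # hypothetical stand-in for the global singleton


def invoke(stack, function_table, func_index, arg_count):
    # arg1 is on top, so popping yields arg1, arg2, ..., argN in order.
    args = [stack.pop() for _ in range(arg_count)]
    # No ref-count change here: the stack's references now belong to the callee.
    result = function_table[func_index](*args)
    stack.append(result)


def subtract(a, b):
    result = Box(a.value - b.value)
    a.refs -= 1               # the callee owns its arguments and releases them
    b.refs -= 1
    return result


def do_nothing():
    return NONE_OBJECT


function_table = [subtract, do_nothing]
stack = [Box(3), Box(10)]     # arg2 = 3 pushed first, arg1 = 10 on top
invoke(stack, function_table, 0, 2)
invoke(stack, function_table, 1, 0)
print(stack[0].value, stack[1] is NONE_OBJECT)   # 7 True
```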

**invoke_virtual <vtable_index>, <arg_count>**

  • Stack: ..., argN, ..., arg1, interface_instance -> ..., result_or_none
  • Pops <arg_count-1> user arguments and the interface_instance.
  • Loads the concrete this pointer from the interface instance.
  • Loads the target function pointer from the v-table at <vtable_index>.
  • Calls the function pointer with the concrete this pointer as the first argument, followed by the user arguments.
  • Pushes the result onto the stack.
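
The dispatch sequence above can be sketched in Python. The classes and the `scale_then_offset` method are hypothetical. Following the "<arg_count-1> user arguments" wording, the sketch assumes arg_count includes the this pointer.

```python
class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


class Struct:
    """Hypothetical concrete struct behind the interface."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


class Interface:
    """Hypothetical constructed interface: this pointer plus v-table."""
    def __init__(self, this, vtable):
        self.this = this
        self.vtable = vtable
        self.refs = 1


def invoke_virtual(stack, vtable_index, arg_count):
    interface = stack.pop()                                   # on top of the stack
    user_args = [stack.pop() for _ in range(arg_count - 1)]   # arg1 pops first
    target = interface.vtable[vtable_index]                   # v-table lookup
    result = target(interface.this, *user_args)               # this goes first
    stack.append(result)


def scale_then_offset(this, factor, offset):
    return Box(this.value * factor.value + offset.value)


counter = Struct(10)
iface = Interface(counter, [scale_then_offset])
stack = [Box(1), Box(3), iface]    # offset (arg2), factor (arg1), then the interface
invoke_virtual(stack, 0, 3)
print(stack[-1].value)             # 31
```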

4. Full Example

hoshi-lang Code:

func add(a: int, b: int) : int {
    return a + b
}

Generated Yoi IR:

func add(int, int) : int {
    Variables {
        #0 a(scope#0) : int
        #1 b(scope#0) : int
    }
    block#0:
        load_local #0    // Stack: ..., pA
        load_local #1    // Stack: ..., pA, pB
        add              // Stack: ..., pResult
        ret              // Returns pResult
}

Explanation:

  1. load_local #0: The pointer to the integerObject for parameter a is loaded from local variable slot 0 and pushed to the stack. Its ref-count is incremented.
  2. load_local #1: The pointer for b is loaded from slot 1 and pushed. Its ref-count is incremented.
  3. add: Pops pA and pB. It unboxes their raw integer values, adds them, and creates a new integerObject, pResult, with the sum. pA and pB are released. The pointer pResult is pushed onto the stack.
  4. ret: Cleans up the function scope (releasing a and b). It then pops pResult and returns it to the caller, transferring ownership.
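
The walkthrough can be replayed with a small Python sketch that tracks reference counts through the four instructions. `Box` and `call_add` are hypothetical. The sketch assumes the callee already owns one reference to each argument, as described for invoke.

```python
class Box:
    """Hypothetical reference-counted heap object."""
    def __init__(self, value):
        self.value = value
        self.refs = 1


def call_add(a, b):
    """Interpret the body of add; a and b arrive owned by the callee."""
    local_vars = [a, b]
    stack = []
    code = [("load_local", 0), ("load_local", 1), ("add", None), ("ret", None)]
    for op, arg in code:
        if op == "load_local":
            obj = local_vars[arg]
            obj.refs += 1                 # the stack takes its own reference
            stack.append(obj)
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            result = Box(lhs.value + rhs.value)
            lhs.refs -= 1                 # pA and pB are released
            rhs.refs -= 1
            stack.append(result)
        elif op == "ret":
            for obj in local_vars:        # release a and b
                obj.refs -= 1
            return stack.pop()            # ownership moves to the caller


a, b = Box(2), Box(3)
result = call_add(a, b)
print(result.value, result.refs, a.refs, b.refs)   # 5 1 0 0
```

After the call, only pResult is still referenced. Both parameters have dropped to zero, the point at which a real runtime would free them.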