hoshi-lang dev
Yet another programming language
The Hoshi-lang Intermediate Representation (IR) is a high-level, stack-based representation of a Hoshi-lang program. It serves as the crucial middle layer in the compilation pipeline, sitting between the Abstract Syntax Tree (AST) generated by the parser and the final machine code or LLVM IR generated by the backend.
Yoi IR is designed to be interpreted by a stack-based virtual machine. Each function has its own evaluation stack, referred to as the temporary value stack.
An `IRModule` is the top-level container for a compiled Hoshi-lang source file. It contains:
- `functionTable`, `structTable`, `interfaceTable`, `interfaceImplementationTable`
- `globalVariables`, `stringLiteralPool`, `externTable`

The IR has its own type system to represent Hoshi-lang types. All of these (except `*_Raw` types) represent heap-allocated objects.
| Type Enum | Description |
|---|---|
| `integerObject` | A reference-counted integer object. |
| `decimalObject` | A reference-counted decimal object. |
| `booleanObject` | A reference-counted boolean object. |
| `stringObject` | A reference-counted string object. |
| `characterObject` | A reference-counted character object. |
| `structObject` | An instance of a user-defined struct. |
| `interfaceObject` | An instance of an interface, containing a `this` pointer and a v-table. |
| `none` | The special singleton `none` object. |
| `null` | A null pointer type (for language-level `null`). |
| `pointerObject` | A generic `void*`-like pointer for internal use. |
| `foreign*Type` | Types for FFI interoperability (e.g., `foreignInt32Type`). |
| `*_Raw` | Internal types representing unboxed literal values. |
The notation `..., in1, in2 -> ..., out` describes the effect of an instruction on the temporary value stack.
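As a reading aid for this notation, the following minimal Python sketch models the temporary value stack as a list whose last element is the top of the stack. The helper `apply_effect` is hypothetical and exists only for illustration, and treating the rightmost name in the notation as the top of the stack is an assumption that the text does not state explicitly.

```python
def apply_effect(stack, n_inputs, produce):
    """Apply a stack effect `..., in1, ..., inN -> ..., out`.

    Pops `n_inputs` values (the rightmost name in the notation is assumed
    to be the top of the stack), passes them to `produce` in left-to-right
    order, and pushes whatever `produce` returns (None means no output).
    """
    if len(stack) < n_inputs:
        raise IndexError("stack underflow")
    split = len(stack) - n_inputs
    inputs = stack[split:]
    del stack[split:]
    out = produce(*inputs)
    if out is not None:
        stack.append(out)
    return stack


# `..., lhs, rhs -> ..., result`: the `...` part ("rest") is left untouched.
stack = ["rest", 7, 5]
apply_effect(stack, 2, lambda lhs, rhs: lhs - rhs)
print(stack)  # ['rest', 2]

# `..., value -> ...` (a pop): one input, no output.
apply_effect(stack, 1, lambda value: None)
print(stack)  # ['rest']
```

The `...` on both sides of the arrow stands for everything below the instruction's inputs, which is why `"rest"` survives both calls.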
**push_integer <value> | push_decimal <value> | push_boolean <value> | push_string <idx> | push_null**
`... -> ..., new_object`

**pop**
`..., value -> ...`

**load_local <var_index> | load_global <var_index>**
`... -> ..., value`

**store_local <var_index> | store_global <var_index>**
`..., value -> ...`
Pops `value`. It then performs the full ARC release/retain cycle for the target variable before storing `value` into it. The stack's reference to `value` is consumed.

**load_member <member_index>**
`..., object -> ..., member_value`
Pops `object`, accesses the member at `<member_index>`, loads its value, increases the member's reference count, and pushes it onto the stack. The original object's reference is consumed.

**store_member <member_index>**
`..., object, value -> ...`
Pops an `object` and a `value`. It performs the full ARC release/retain cycle for the member at `<member_index>` before storing `value`. Both `object` and `value` stack references are consumed.

The arithmetic and comparison instructions pop one or two values, unbox them, perform the operation, and push a new object containing the result. The operand objects are consumed.
**add | sub | mul | div | mod**
`..., lhs, rhs -> ..., result`

**negate**
`..., value -> ..., result`

**equal | not_equal | less_than | less_equal | greater_than | greater_equal**
`..., lhs, rhs -> ..., booleanObject`

**jump <block_index>**

**jump_if_true <block_index> / jump_if_false <block_index>**
`..., condition -> ...`
Pops a `booleanObject`, consumes it, and jumps if the condition is met.

**ret**
`..., value -> [exit]`
Returns `value` to the caller, transferring ownership.

**ret_none**
`... -> [exit]`
Returns the `noneObject`.

**new_struct <struct_index>**
`... -> ..., structObject`
Creates a new `struct` instance.

**new_interface <interface_index>**
`... -> ..., interfaceObject`
Creates a new `interface` shell.

**construct_interface_impl <impl_index>**
`..., interface_shell, struct_instance -> ..., constructed_interface`
Populates the interface shell with `struct_instance`'s data and v-table, pushing the completed interface back.

**new_array_* <dims...>**
`..., elem1, elem2, ... -> ..., arrayObject`
Pops `N` elements (where `N` is the total size from `<dims...>`) and creates a new fixed-size array object containing them.

**new_dynamic_array_* <initializer_size>**
`..., size, elem1, ... -> ..., arrayObject`
Pops a `size` integer object, then pops `<initializer_size>` elements to create a new dynamic array.

**array_length**
`..., array -> ..., integerObject`

**load_element**
`..., array, index -> ..., element`
Pops an `array` and an `index`, consumes them, loads the element at that index, increases the element's ref-count, and pushes it to the stack.

**store_element**
`..., array, index, value -> ...`
Pops `array`, `index`, and `value`. Performs the full ARC cycle on the element at the target index before storing `value` there. All three stack references are consumed.

**basic_cast_***
`..., value -> ..., casted_value`
Converts a value between basic types (e.g., `int` to `deci`), pushing a new object with the result.

**pointer_cast**
`..., value -> ..., pointerObject`
Casts the value to a `pointerObject`.

**typeid_* <module_idx>, <type_idx>**
`... -> ..., integerObject`

**dyn_cast_* <module_idx>, <type_idx>**
`..., interface_object -> ..., struct_object_or_null`
Pops an `interface_object` and consumes it. Checks whether the underlying concrete type matches the target struct type. If it matches, it returns a pointer to the concrete struct (with ref-count increased). Otherwise, it returns null.

**interfaceof**
`..., object, typeid -> ..., booleanObject`
Pops an `object` and a `typeid` (as an integer object). Consumes them and checks if the object's type matches the ID. Pushes a new boolean object with the result.

**invoke <func_index>, <arg_count>**
`..., argN, ..., arg1 -> ..., result_or_none`

**invoke_virtual <vtable_index>, <arg_count>**
`..., argN, ..., arg1, interface_instance -> ..., result_or_none`

**invoke_imported <lib_idx>, <func_idx>, <arg_count>**
`..., argN, ..., arg1 -> ..., result_or_none`

The primary goals of the IR are:
- Retain high-level concepts of hoshi-lang like objects, methods, and reference counting, making the IR easier to reason about and optimize than a low-level IR.

Yoi IR is designed to be interpreted by a stack-based virtual machine. Each function has its own evaluation stack, referred to as the temporary value stack.

An `IRModule` is the top-level container for a compiled hoshi-lang source file. It contains all the necessary information to represent that module:
- `functionTable`: A table of all functions defined within the module.
- `structTable`: Definitions for all struct types.
- `interfaceTable`: Definitions for all interface types.
- `interfaceImplementationTable`: Definitions linking a struct to an interface it implements.
- `globalVariables`: A table of all global variables.
- `stringLiteralPool`: A pool of all string constants used in the module.
- `externTable`: A table of symbols imported from other modules.

Each entry in `functionTable` is a structure that defines a single function, containing:
- `name`: The unique, mangled name of the function.
- `argumentTypes`: A list of types for its parameters.
- `returnType`: The function's return type.
- `variableTable`: A symbol table for all local variables and parameters.
- `codeBlock`: A vector of `IRCodeBlock`s that make up the function's body.

A code block is a sequence of IR instructions that are executed linearly. It is analogous to a "basic block" in other compilers. All control flow is achieved by jumping between these blocks.
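The function and code-block layout described above can be sketched in Python as follows. The field names on `IRFunction` come from the list above. The `Instruction` class, the `run_blocks` helper, the use of plain Python booleans instead of boxed objects, and the rule that every block must end in a jump or return are assumptions made only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    opcode: str
    operands: tuple = ()


@dataclass
class IRCodeBlock:
    instructions: list = field(default_factory=list)


@dataclass
class IRFunction:
    name: str
    argumentTypes: list
    returnType: str
    variableTable: list
    codeBlock: list  # list of IRCodeBlock, indexed by jump targets


def run_blocks(function):
    """Execute blocks linearly, switching blocks only on jump instructions.

    Returns the list of block indices visited. Only a handful of opcodes
    are modelled, and values are plain Python bools (no boxing or ARC).
    """
    stack, visited, block_index = [], [], 0
    while True:
        visited.append(block_index)
        next_block = None
        for ins in function.codeBlock[block_index].instructions:
            if ins.opcode == "push_boolean":
                stack.append(ins.operands[0])
            elif ins.opcode == "jump":
                next_block = ins.operands[0]
                break
            elif ins.opcode == "jump_if_false":
                if stack.pop() is False:
                    next_block = ins.operands[0]
                    break
            elif ins.opcode == "ret_none":
                return visited
            else:
                raise ValueError(f"unmodelled opcode: {ins.opcode}")
        if next_block is None:
            raise RuntimeError("block ended without a jump or return")
        block_index = next_block


# Block 0 jumps to block 2 when the condition is false; otherwise it
# continues to the next instruction, an explicit jump to block 1.
fn = IRFunction(
    name="demo", argumentTypes=[], returnType="none", variableTable=[],
    codeBlock=[
        IRCodeBlock([Instruction("push_boolean", (False,)),
                     Instruction("jump_if_false", (2,)),
                     Instruction("jump", (1,))]),
        IRCodeBlock([Instruction("ret_none")]),
        IRCodeBlock([Instruction("jump", (1,))]),
    ],
)
print(run_blocks(fn))  # [0, 2, 1]
```

Because jump operands are block indices rather than instruction addresses, the interpreter loop never needs to look inside a block to find a target.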
The IR has its own type system to represent hoshi-lang types. All of these (except Raw types) represent heap-allocated objects.
| Type Enum | Description |
|---|---|
| `integerObject` | A reference-counted integer object. |
| `decimalObject` | A reference-counted decimal object. |
| `booleanObject` | A reference-counted boolean object. |
| `stringObject` | A reference-counted string object. |
| `characterObject` | A reference-counted character object. |
| `structObject` | An instance of a user-defined struct. |
| `interfaceObject` | An instance of an interface, containing a `this` pointer and a v-table. |
| `none` | The special singleton `none` object. |
| `null` | A null pointer type (for language-level `null`). |
| `virtualMethod` | A placeholder type for a method within an interface implementation. |
| `*_Raw` | Internal types representing unboxed literal values. |
Operands are the arguments to an IR instruction.
| Operand Type | Description |
|---|---|
| `integer` | An immediate 64-bit integer value. |
| `decimal` | An immediate 64-bit floating-point value. |
| `boolean` | An immediate boolean value. |
| `stringLiteral` | An index into the module's `stringLiteralPool`. |
| `codeBlock` | The index of a target `IRCodeBlock` for jump instructions. |
| `index` | A generic index used for various tables (functions, structs, etc.). |
| `localVar` | The index of a local variable in the current function's `variableTable`. |
| `globalVar` | The index of a global variable in the module's `globalVariables` table. |
| `externVar` | The index of an external symbol in the module's `externTable`. |
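One way to picture the operand kinds in the table is as a small tagged value, sketched below in Python. The enum members mirror the table. The `Operand` class, the `check_operand` helper, and the treatment of the 64-bit integer as signed are assumptions for illustration, not part of the IR definition.

```python
from dataclasses import dataclass
from enum import Enum, auto


class OperandType(Enum):
    integer = auto()
    decimal = auto()
    boolean = auto()
    stringLiteral = auto()
    codeBlock = auto()
    index = auto()
    localVar = auto()
    globalVar = auto()
    externVar = auto()


@dataclass(frozen=True)
class Operand:
    type: OperandType
    value: object


# Operand kinds whose value is a position in some table rather than
# an immediate value.
INDEX_KINDS = {
    OperandType.stringLiteral, OperandType.codeBlock, OperandType.index,
    OperandType.localVar, OperandType.globalVar, OperandType.externVar,
}


def check_operand(operand, table_size=None):
    """Validate one operand; index-like kinds may be checked against a table size."""
    if operand.type in INDEX_KINDS:
        if not isinstance(operand.value, int) or operand.value < 0:
            raise ValueError("index operands must be non-negative integers")
        if table_size is not None and operand.value >= table_size:
            raise IndexError(f"{operand.type.name} {operand.value} out of range")
    elif operand.type is OperandType.integer:
        # Assumption: the immediate 64-bit integer is signed.
        if not -(2**63) <= operand.value < 2**63:
            raise OverflowError("integer operand does not fit in 64 bits")
    return operand


string_pool = ["hello", "world"]
op = check_operand(Operand(OperandType.stringLiteral, 1), len(string_pool))
print(string_pool[op.value])  # world
```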
The notation `..., in1, in2 -> ..., out` describes the effect of an instruction on the temporary value stack.
These instructions move data between the stack, local variables, global variables, and object members.
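To make the reference-counting side of these moves concrete, here is a minimal Python sketch of how `load_local` and `store_local` might adjust reference counts when a value travels between a variable slot and the stack. The `Box` class and the `retain`/`release` helpers are hypothetical names, and marking an object as `freed` stands in for real deallocation.

```python
class Box:
    """A heap object with a manual reference count (illustrative only)."""

    def __init__(self, raw):
        self.raw = raw
        self.refcount = 1  # a freshly created object starts with one reference
        self.freed = False


def retain(obj):
    obj.refcount += 1


def release(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True  # stand-in for actually freeing the object


def load_local(stack, local_vars, var_index):
    """`... -> ..., value`: the stack gains its own reference."""
    value = local_vars[var_index]
    retain(value)
    stack.append(value)


def store_local(stack, local_vars, var_index):
    """`..., value -> ...`: retain new, release old, store, drop stack ref."""
    value = stack.pop()
    retain(value)                  # the variable slot will hold a reference
    old = local_vars[var_index]
    if old is not None:
        release(old)               # the slot gives up its previous object
    local_vars[var_index] = value
    release(value)                 # balances the reference the stack held


old, new = Box(1), Box(2)
local_vars = [old]       # slot 0 owns `old`
stack = [new]            # the stack owns `new`
store_local(stack, local_vars, 0)
print(old.freed, new.refcount)  # True 1
```

Retaining the new value before releasing the old one keeps the sketch safe when the slot already holds the object being stored: its count never drops to zero in between.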
**push_integer <value>**
`... -> ..., integerObject`
Creates a new integer object holding `<value>`, sets its reference count to 1, and pushes the pointer to it onto the stack.

**push_decimal <value>**
`... -> ..., decimalObject`
Same as `push_integer`, but for decimal values.

**push_boolean <value>**
`... -> ..., booleanObject`
Same as `push_integer`, but for boolean values.

**push_string <string_index>**
`... -> ..., stringObject`
Looks up the string at `<string_index>` in the string pool and pushes a pointer to it onto the stack.

**load_local <var_index>**
`... -> ..., value`
Loads the object pointer from the local variable at `<var_index>`, increases its reference count, and pushes the pointer onto the stack.

**store_local <var_index>**
`..., value -> ...`
Pops `value` from the stack. It then:
1. Increases the reference count of `value`.
2. Loads the old value at `<var_index>` and decreases its reference count.
3. Stores `value` into the local variable.
4. Decreases the reference count of `value` (balancing the initial stack reference).

**load_global <var_index>**
`... -> ..., value`
Same as `load_local`, but for global variables.

**store_global <var_index>**
`..., value -> ...`
Same as `store_local`, but for global variables.

**load_member <member_index>**
`..., object -> ..., member_value`
Pops `object`. Accesses the member at `<member_index>`, loads its value, increases the member's reference count, and pushes it onto the stack. The original object's reference count is decreased.

**store_member <member_index>**
`..., object, value -> ...`
Pops an `object` and a value pointer `value`. It performs the full ARC release/retain cycle for the member at `<member_index>` in `object` before storing `value`. Both `object` and `value` stack references are consumed (ref-count decreased).

The arithmetic and comparison instructions pop one or two values, unbox them, perform the operation, and push a new object containing the result. The operand objects are consumed (ref-count decreased).
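The pop, unbox, compute, box, release sequence described above can be sketched in Python as follows. The `IntegerObject` class and the helper names are hypothetical. Only `add`, `sub`, `mul`, and `negate` are modelled, because the text does not specify the rounding behaviour of `div` and `mod`.

```python
import operator
from dataclasses import dataclass


@dataclass
class IntegerObject:
    raw: int
    refcount: int = 1


BINARY_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def release(obj):
    obj.refcount -= 1


def binary_arith(stack, opcode):
    """`..., lhs, rhs -> ..., result` for the opcodes in BINARY_OPS."""
    rhs = stack.pop()          # rhs is on top of the stack
    lhs = stack.pop()
    # Unbox both operands, compute, and box the result in a new object.
    result = IntegerObject(BINARY_OPS[opcode](lhs.raw, rhs.raw))
    release(lhs)               # the stack's operand references are consumed
    release(rhs)
    stack.append(result)


def negate(stack):
    """`..., value -> ..., result`."""
    value = stack.pop()
    result = IntegerObject(-value.raw)
    release(value)
    stack.append(result)


lhs, rhs = IntegerObject(7), IntegerObject(5)
stack = [lhs, rhs]
binary_arith(stack, "sub")
negate(stack)
print(stack[-1].raw, lhs.refcount, rhs.refcount)  # -2 0 0
```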
**add | sub | mul | div | mod**
`..., lhs, rhs -> ..., result`

**negate**
`..., value -> ..., result`

**equal | not_equal | less_than | less_equal | greater_than | greater_equal**
`..., lhs, rhs -> ..., booleanObject`
Compares the operands and pushes a new `booleanObject` holding the result (`true` or `false`).

**jump <block_index>**
`... -> ...`
Unconditionally jumps to the code block at `<block_index>`.

**jump_if_true <block_index> / jump_if_false <block_index>**
`..., condition -> ...`
Pops a `booleanObject`, consumes it, and checks its raw value. If the condition matches the instruction, it jumps to `<block_index>`. Otherwise, execution continues to the next instruction.

**ret**
`..., value -> [exit]`
Pops `value` and returns it to the caller. The caller assumes ownership of the returned object's reference.

**ret_none**
`... -> [exit]`
Returns the `noneObject`.

**new_struct <struct_index>**
`... -> ..., structObject`
Creates a new `struct` instance of the type at `<struct_index>`, sets its ref-count to 1, and pushes the pointer onto the stack.

**new_interface <interface_index>**
`... -> ..., interfaceObject`
Creates a new `interface` "shell" of the type at `<interface_index>`, sets its ref-count to 1, and pushes the pointer onto the stack. This shell is uninitialized and must be configured with `construct_interface_impl`.

**construct_interface_impl <impl_index>**
`..., interface_shell, struct_instance -> ..., constructed_interface`
Pops an `interface_shell` and a `struct_instance`. It populates the interface shell: it stores `struct_instance` inside the interface (the `this` pointer), sets up the v-table from the implementation at `<impl_index>`, and pushes the constructed `interfaceObject` back onto the stack.

**invoke <func_index>, <arg_count>**
`..., argN, ..., arg1 -> ..., result_or_none`
Pops `<arg_count>` arguments from the stack. Calls the function at `<func_index>` with these arguments. Ownership of the arguments is transferred to the callee. The return value (which may be the `noneObject`) is pushed onto the stack.

**invoke_virtual <vtable_index>, <arg_count>**
`..., argN, ..., arg1, interface_instance -> ..., result_or_none`
1. Pops `<arg_count-1>` user arguments and the `interface_instance`.
2. Extracts the `this` pointer from the interface instance.
3. Looks up the target method in the v-table at `<vtable_index>`.
4. Calls it with the `this` pointer as the first argument, followed by the user arguments.

hoshi-lang Code:
Generated Yoi IR: `load_local #0`, `load_local #1`, `add`, `ret`
Explanation:
1. `load_local #0`: The pointer to the `integerObject` for parameter `a` is loaded from local variable slot 0 and pushed to the stack. Its ref-count is incremented.
2. `load_local #1`: The pointer for `b` is loaded from slot 1 and pushed. Its ref-count is incremented.
3. `add`: Pops `pA` and `pB` (the two pointers pushed by the loads). It unboxes their raw integer values, adds them, and creates a new `integerObject`, `pResult`, with the sum. `pA` and `pB` are released. The pointer `pResult` is pushed onto the stack.
4. `ret`: Cleans up the function scope (releasing `a` and `b`). It then pops `pResult` and returns it to the caller, transferring ownership.
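The walkthrough above can be replayed with a small Python interpreter that tracks reference counts step by step. This is a sketch under stated assumptions: `IntegerObject` and `run_add_function` are hypothetical names, the instruction sequence is taken from the explanation, and the callee is assumed to own one reference to each argument on entry.

```python
from dataclasses import dataclass


@dataclass
class IntegerObject:
    raw: int
    refcount: int = 1


def run_add_function(a, b):
    """Interpret `load_local #0; load_local #1; add; ret` for locals a, b.

    The callee is assumed to own one reference to each argument on entry.
    """
    local_vars = [a, b]
    stack = []
    program = [("load_local", 0), ("load_local", 1), ("add",), ("ret",)]
    for opcode, *operands in program:
        if opcode == "load_local":
            value = local_vars[operands[0]]
            value.refcount += 1            # the stack takes its own reference
            stack.append(value)
        elif opcode == "add":
            p_b = stack.pop()
            p_a = stack.pop()
            p_result = IntegerObject(p_a.raw + p_b.raw)
            p_a.refcount -= 1              # release the stack's references
            p_b.refcount -= 1
            stack.append(p_result)
        elif opcode == "ret":
            for var in local_vars:         # scope cleanup releases a and b
                var.refcount -= 1
            return stack.pop()             # ownership passes to the caller
    raise RuntimeError("function ended without ret")


a, b = IntegerObject(2), IntegerObject(3)
result = run_add_function(a, b)
print(result.raw, result.refcount, a.refcount, b.refcount)  # 5 1 0 0
```

After the call, the only live reference is the caller's reference to the result, which matches the ownership transfer described for `ret`.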