Start C-Scripting today!
 
 
 
 

C Scripting Introduction

"C Scripting" is a technique I developed to solve a problem in C programming. The problem doesn't have a well known name, and I think many programmers only experience this problem as an intuitively felt limitation, and not as a problem. To introduce this idea properly I will have to spell out the problem, which I will get to shortly. I named my answer to this problem "C Scripting" because it feels like a step towards the kind of expressive freedom that one gets by choosing, or hybridizing with, a scripting language. In this introduction I name and describe the problem then I describe the basic idea of a C Scripting system. Past the introduction, the repository is filled with example code, further discussions of design tradeoffs, and possibilities for future improvements of the technique.

Introduction: The Code-Data Binding Problem

The problem I am talking about is the problem of creating and maintaining a structure of code-data relationships. It'll be easier to see what I mean after some examples.

Think of what you do to set up a new program with just a command line interface. You have a set of flags to implement, each needs a name and/or an abbreviation, maybe some expected argument type, some help menu text, and of course some way of binding information from the interface into the "main computation".

CLIFlag = (name:String & abbr:String & arg_type:ArgType & desc:String & ???:???)

The first few pieces are easy enough to understand, but what is that last piece that does the "binding into the main computation"? Let's answer that by looking at how systems like these tend to be done:

Immediate Mode CLI

{
  if (cli_flag(clictx, "fast", "F", ArgType_None, "go as fast as possible")){
    // do some init path stuff
  }
  if (cli_flag(clictx, "start_time", "S", ArgType_TimeStamp,
               "set the time where the output stream begins (>= 0:00) (default 0:00)")){
    TimeStamp time_stamp = cli_read_arg_TimeStamp(clictx);
    // do some init path stuff
  }
}

Here we put the data literals in as parameters to a function, and use the block of an if statement to attach some binding into the rest of the software. The hard part here will be using these expressions to run the help menu. There are a couple of ways to do it, but it has to run a little "sideways" through the system.
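One of those "sideways" routes can be sketched as a second pass over the same flag expressions. This is only an illustration under assumptions: the `CLIMode` enum, the layout of `CLICtx`, and the `run_flags` wrapper are hypothetical names invented here, not part of any real library. In the help pass `cli_flag` records the flag's text and returns 0, so none of the binding blocks run.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ArgType_None, ArgType_TimeStamp } ArgType;
typedef enum { CLIMode_Parse, CLIMode_Help } CLIMode;  // hypothetical

typedef struct CLICtx {
  CLIMode mode;
  int argc;
  char **argv;
  char help[1024];   // filled in only during the help pass
  size_t help_len;
} CLICtx;

// Parse pass: returns 1 when the flag appears on the command line.
// Help pass: appends one help line and returns 0.
static int cli_flag(CLICtx *ctx, const char *name, const char *abbr,
                    ArgType type, const char *desc){
  assert(ctx != NULL);
  if (ctx->mode == CLIMode_Help){
    size_t room = sizeof(ctx->help) - ctx->help_len;
    int n = snprintf(ctx->help + ctx->help_len, room, "--%s, -%s%s: %s\n",
                     name, abbr, (type == ArgType_None) ? "" : " <arg>", desc);
    if (n > 0 && (size_t)n < room){ ctx->help_len += (size_t)n; }
    else { ctx->help[ctx->help_len] = 0; }  // drop a line that did not fit
    return 0;
  }
  for (int i = 1; i < ctx->argc; i += 1){
    const char *arg = ctx->argv[i];
    if (arg[0] == '-' && arg[1] == '-' && strcmp(arg + 2, name) == 0) return 1;
    if (arg[0] == '-' && arg[1] != '-' && strcmp(arg + 1, abbr) == 0) return 1;
  }
  return 0;
}

// The same expressions serve both passes; only ctx->mode differs.
int run_flags(CLICtx *ctx){
  int fast = 0;
  if (cli_flag(ctx, "fast", "F", ArgType_None, "go as fast as possible")){
    fast = 1;
  }
  return fast;
}
```

The cost is visible in the sketch: the help menu only exists after the whole flag block has been executed once in a mode where it does nothing else.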

Table Driven CLI

void cli_p_fast(CLICtx *ctx, void *arg){ /*...*/ }
void cli_p_start_time(CLICtx *ctx, void *arg){ /*...*/ }

CLIFlag flags[] = {
  { "fast",       "F", ArgType_None, "go as fast as possible", cli_p_fast },
  { "start_time", "S", ArgType_TimeStamp,
    "set the time where the output stream begins (>= 0:00) (default 0:00)", cli_p_start_time },
};

Here we put the data literals together with some "hooks" (function pointers) and can specify the whole thing as a global data table. Except the "binding into the main computation" is through this function pointer now, and the function body is unhappily far away from the declaration, and since the ArgType couples to the function body, you've got a new long-range matching problem to add to your maintenance burden. The nice thing here of course is your help menu is going to be trivial to write, and other comprehension tasks like checking for duplicate abbreviations will be easy too.
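To make the "trivial help menu" and "easy duplicate check" claims concrete, here is a minimal sketch over such a table. The field names of `CLIFlag` and the two helper functions are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ArgType_None, ArgType_TimeStamp } ArgType;
typedef struct CLICtx CLICtx;
typedef void CLIHook(CLICtx *ctx, void *arg);

typedef struct CLIFlag {   // field names are hypothetical
  const char *name;
  const char *abbr;
  ArgType arg_type;
  const char *desc;
  CLIHook *hook;
} CLIFlag;

static void cli_p_fast(CLICtx *ctx, void *arg){ (void)ctx; (void)arg; }
static void cli_p_start_time(CLICtx *ctx, void *arg){ (void)ctx; (void)arg; }

CLIFlag flags[] = {
  { "fast",       "F", ArgType_None,      "go as fast as possible", cli_p_fast },
  { "start_time", "S", ArgType_TimeStamp, "set the time where the output stream begins", cli_p_start_time },
};
#define FLAG_COUNT (sizeof(flags)/sizeof(flags[0]))

// The help menu is a plain loop over the table.
void cli_print_help(FILE *out, const CLIFlag *table, size_t count){
  assert(out != NULL);
  for (size_t i = 0; i < count; i += 1){
    fprintf(out, "  --%s, -%s%s\n      %s\n", table[i].name, table[i].abbr,
            (table[i].arg_type == ArgType_None) ? "" : " <arg>", table[i].desc);
  }
}

// Returns the index of the first flag whose abbreviation repeats an
// earlier one, or -1 when all abbreviations are unique.
int cli_first_duplicate_abbr(const CLIFlag *table, size_t count){
  for (size_t i = 1; i < count; i += 1){
    for (size_t j = 0; j < i; j += 1){
      if (strcmp(table[i].abbr, table[j].abbr) == 0) return (int)i;
    }
  }
  return -1;
}
```

Neither helper needs to know anything about the hook bodies, which is exactly the property the immediate mode version lacks.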

Metaprogramming

CLI_FLAG(fast, "F", ArgType_None, "go as fast as possible"){
  // ...
}
CLI_FLAG(start_time, "S", ArgType_TimeStamp,
         "set the time where the output stream begins (>= 0:00) (default 0:00)"){
  // ...
}

This is the sort of thing I did in the 4coder command system. You get the best of both worlds from the previous two systems, but you have to adopt an entire metaprogram into your build. The metaprogram scans the code for CLI_FLAG and parses out the details to generate a data table like in the second example. Then you build that data table into your final program. In the final program, the macro expands into a hook function with the name given by the first part of the macro, and the rest discarded. The downside now is that a whole new metaprogram is a heavy thing, and it looks like you'll need to expand that metaprogram for each one of these you use. With some refinement it might be pretty reusable -- but even one metaprogram is a downside.
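As a rough illustration of the final-program half of that arrangement, the macro can expand to nothing but a hook signature. This is a sketch, not the 4coder implementation: the `cli_p_` prefix is borrowed from the table example, and `CLIFlagEntry`, `generated_flags`, and `cli_run_flag` are hypothetical stand-ins, with the table written by hand where a metaprogram would emit it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CLICtx { int fast; } CLICtx;
typedef void CLIHook(CLICtx *ctx, void *arg);

// In the final program only `name` survives the expansion. The other
// arguments were already consumed by the metaprogram's scan of the source.
#define CLI_FLAG(name, abbr, type, desc) \
  void cli_p_##name(CLICtx *ctx, void *arg)

CLI_FLAG(fast, "F", ArgType_None, "go as fast as possible"){
  (void)arg;
  ctx->fast = 1;
}

// Stand-in for the generated table (hand-written here, layout hypothetical).
typedef struct CLIFlagEntry {
  const char *name;
  const char *abbr;
  const char *desc;
  CLIHook *hook;
} CLIFlagEntry;

static const CLIFlagEntry generated_flags[] = {
  { "fast", "F", "go as fast as possible", cli_p_fast },
};

// Dispatch through the table, the way the rest of the program would.
void cli_run_flag(CLICtx *ctx, size_t index){
  assert(index < sizeof(generated_flags)/sizeof(generated_flags[0]));
  generated_flags[index].hook(ctx, NULL);
}
```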

There are many other solutions. You can bind with an enum and switch instead of hooks. You can bind with pointers to data output slots in the main context, and leave all the computation downstream. You can declare the flags in a table without code binding, then after the parse use extractors that read out the parsed data to affect the main context. The funny thing is, in all my years with C, none of them have felt totally satisfying to work with; you're always trading something off.
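For one of these alternatives, binding through output slots, a minimal sketch might look like the following. Everything in it is an assumption made for illustration: the `MainCtx` fields, the `ArgType_Seconds` type, and the use of `offsetof` in place of raw pointers so the table can stay constant and work with any context instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct MainCtx { int fast; double start_seconds; } MainCtx;
typedef enum { ArgType_None, ArgType_Seconds } ArgType;

typedef struct CLISlot {
  const char *name;
  ArgType type;
  size_t offset;   // where in MainCtx the parsed value is stored
} CLISlot;

static const CLISlot slots[] = {
  { "fast",       ArgType_None,    offsetof(MainCtx, fast) },
  { "start_time", ArgType_Seconds, offsetof(MainCtx, start_seconds) },
};

// The parser only writes values into slots. All computation on those
// values happens later, wherever the main context is consumed.
void cli_fill_slots(MainCtx *ctx, int argc, char **argv){
  assert(ctx != NULL);
  for (int i = 1; i < argc; i += 1){
    if (strncmp(argv[i], "--", 2) != 0) continue;
    for (size_t s = 0; s < sizeof(slots)/sizeof(slots[0]); s += 1){
      if (strcmp(argv[i] + 2, slots[s].name) != 0) continue;
      void *slot = (char*)ctx + slots[s].offset;
      if (slots[s].type == ArgType_None){
        *(int*)slot = 1;
      }
      else if (i + 1 < argc){
        *(double*)slot = strtod(argv[i + 1], NULL);
        i += 1;   // the value argument has been consumed
      }
      break;
    }
  }
}
```

The tradeoff shows up in the slot types: the `ArgType` in the table and the field type in `MainCtx` have to agree, and nothing in the language checks that they do.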

This comes up in a lot of cases too. In a plugin system, you usually want a plugin to register some new operations in one or multiple categories. How do you bind the new operations with their interface information in the plugin and process that in the core? In a game or simulation you may have systems like event handlers, entity behavior hooks, procedural generation nodes, or context specific rules.

Introduction: A C Scripting System

A C Scripting system allows for code written like this:

MY_THING_SCRIPT(foo, FooBarGroup, "Generates a foo and increments the foo counter"){
  out->name = str8_lit("foo");
  ctx->foo += 1;
}

And in a true C Scripting system, there are two more requirements. One, it should work without any code generation. And two, there should be some kind of automated "registration" process that builds a data structure that lets me "comprehend" the entire set of these things. In other words there should be no manually written (or generated) list that looks like this:

{
  register_my_thing(mythings, &foo);
  register_my_thing(mythings, &bar);
}

There's no way to achieve this in standard C. However with one non-standard trick we can get there in a fairly satisfying way. And I've tested out clang and cl for compilers, clang and link for linkers, and various combinations of optimizations and link time optimizations, and found that this non-standard trick continues to behave as expected across the board. I have yet to test this out on a Linux or Mac machine, or on a machine with an architecture other than an ARM64, so the research on this technique isn't finished but it's looking very good.

Data Sections

The plan is to set aside a new data section in the final executable that is populated only by instances of a particular struct. This is the key non-standardized compiler feature that I need. This data section then acts as an array of this struct, with one element in the array for each global you defined this way:

SECTION(".mything") MyThing foo = {
  "foo", MY_THING_GROUP(FooBarGroup), "Generates a foo and increments the foo counter", foo_hook
};
SECTION(".mything") MyThing bar = {
  "bar", MY_THING_GROUP(FooBarGroup), "Generates a bar and decrements the foo counter", bar_hook
};

That's the plan - but that alone isn't enough. This way I would still have to put the hook in by name, and leave it defined somewhere else. But the part that I want to emphasize here is that the data section .mything now contains an array of these MyThing structs. If we ever want to do work over all the MyThings we just have to locate that data section in our program -- and yes that is something we can do!

The other cool thing here is that these are still also global variables! They can be referenced by their name from anywhere, and what you get is like a hardcoded reference to a slot in that special array.

We can achieve this much with a macro like this:

#if COMPILER_CLANG || COMPILER_GCC
# define SECTION(N) __attribute__((__section__(N)))
#elif COMPILER_CL
# define SECTION(N) __declspec(allocate(N))
#endif

In the CL compiler you also need to insist that the data section exists by dropping this somewhere:

#pragma section(<section>,read,write)

Accessing the Data

That's how we'll declare the data, bind it to the code, and organize it into a comprehendible data structure. To access the data structure we'll need to be able to parse the data structure that describes the module's binary image and find the data section we're looking for. There are two steps to this:

  1. Locate the module's binary image - this involves a conversation with the operating system.
  2. Parse the image to find the data section - this involves knowledge of the binary file format.

Despite the fact that this sounds like asking you to dabble in some dark arts -- each of these is relatively easily achieved and abstracted. The final API can be as simple as: RangePtr selfimg_get_section(String8 name);
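To take some of the mystery out of step 2, here is a minimal sketch of a section lookup for the PE format used by Windows executables. It is not the implementation behind `selfimg_get_section`; the names `SectionRange` and `pe_find_section` are hypothetical, and raw header offsets from the PE format are used instead of the `winnt.h` structs so that the parsing logic compiles anywhere. For step 1, the assumption is that on Windows the module handle returned by `GetModuleHandleW(NULL)` is the base address of the executable's loaded image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct SectionRange {
  int found;
  uint32_t rva;    // offset of the section from the image base
  uint32_t size;   // VirtualSize of the section in bytes
} SectionRange;

// PE headers are little-endian; read byte by byte to stay portable.
static uint32_t read_u16(const uint8_t *p){
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}
static uint32_t read_u32(const uint8_t *p){
  return read_u16(p) | (read_u16(p + 2) << 16);
}

// `image` points at the first byte of a loaded PE image (the DOS header).
SectionRange pe_find_section(const uint8_t *image, const char *name){
  SectionRange result = { 0, 0, 0 };
  assert(image != NULL && name != NULL);
  size_t name_len = strlen(name);
  if (name_len > 8) return result;   // the inline name field holds 8 bytes

  const uint8_t *pe = image + read_u32(image + 0x3C);   // e_lfanew
  if (memcmp(pe, "PE\0\0", 4) != 0) return result;

  const uint8_t *coff = pe + 4;                    // 20-byte COFF file header
  uint32_t section_count = read_u16(coff + 2);     // NumberOfSections
  uint32_t optional_size = read_u16(coff + 16);    // SizeOfOptionalHeader
  const uint8_t *sec = coff + 20 + optional_size;  // first section header

  // Names are zero padded and NOT terminated when exactly 8 bytes long,
  // which is the case for ".mything".
  char padded[8] = { 0 };
  memcpy(padded, name, name_len);
  for (uint32_t i = 0; i < section_count; i += 1, sec += 40){
    if (memcmp(sec, padded, 8) == 0){
      result.found = 1;
      result.size = read_u32(sec + 8);    // VirtualSize
      result.rva  = read_u32(sec + 12);   // VirtualAddress
      break;
    }
  }
  return result;
}
```

Under those assumptions the range for the array would be `base + rva` for the first element and `base + rva + size` for one past the last.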

Personally I've pushed my own system to the point of automating even this part by sticking it in a BEFORE_MAIN when I set up the data section, so that my interface just looks like a typed global pointer and count that is initialized to point at the array I need. This is an additional trick - and isn't necessary. But I will discuss how and why I did it this way in discussions of the design tradeoffs and future plans.

Often, but not always, there's another step of processing that I want to run on the data in the array. Sometimes they are incomplete and the hook is actually a constructor that finishes the job. Sometimes they act as extensions to another array or at least relate to one another in ways that I want to index or construct in a higher data structure.

{
  RangePtr range = selfimg_get_section(str8_lit(".mything"));
  MyThing *my_thing = (MyThing*)range.first;
  // opl = "one past last", so the element count is opl - first
  U64 my_thing_count = (U64)((MyThing*)range.opl - (MyThing*)range.first);
}
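The extra processing step mentioned above could be as small as threading the array into per-group lists once at startup. The sketch below is hypothetical throughout: the `MyThingGroup` enum, the `next_in_group` field, and `my_thing_build_index` are invented for this example. It relies on the section being writable, so that fields left empty in the declarations can be filled in after the array is found.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  MyThingGroup_FooBar,
  MyThingGroup_Other,
  MyThingGroup_COUNT
} MyThingGroup;

typedef struct MyThing {
  const char *name;
  MyThingGroup group;
  const char *desc;
  void (*hook)(void);
  struct MyThing *next_in_group;   // left NULL in declarations, linked here
} MyThing;

typedef struct MyThingIndex {
  MyThing *first[MyThingGroup_COUNT];
  size_t count[MyThingGroup_COUNT];
} MyThingIndex;

// Threads every element onto the list for its group. Walking backwards
// and pushing to the front keeps each list in array order.
MyThingIndex my_thing_build_index(MyThing *things, size_t count){
  MyThingIndex result = { { 0 }, { 0 } };
  for (size_t i = count; i > 0; i -= 1){
    MyThing *t = &things[i - 1];
    assert((int)t->group >= 0 && (int)t->group < (int)MyThingGroup_COUNT);
    t->next_in_group = result.first[t->group];
    result.first[t->group] = t;
    result.count[t->group] += 1;
  }
  return result;
}
```

With the snippet above, the call would be `my_thing_build_index(my_thing, my_thing_count)`.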

Wrapper Macro With Fused Hook Syntax

Next what we do is wrap the data section name and type together into a single macro that does the declaration. With preprocessor stringification and pasting, we can make your syntax more terse now:

// desc is already a string literal, so only name needs stringification
#define MY_THING_DEF(name,group,desc,hook) \
 SECTION(".mything") MyThing name = { #name, MY_THING_GROUP(group), desc, hook }

MY_THING_DEF(foo, FooBarGroup, "Generates a foo and increments the foo counter", foo_hook);
MY_THING_DEF(bar, FooBarGroup, "Generates a bar and decrements the foo counter", bar_hook);

Then I want to fuse the hook to the macro. To do this the last thing the macro expands to has to be the function signature without a semicolon or brace, and I need to be sure the hook is declared before the global variable in the special data section so that it can reference the function.

#define MY_THING_DEF(name,group,desc,hook) \
 void hook(MyThingCtx *ctx); \
 SECTION(".mything") MyThing name = { #name, MY_THING_GROUP(group), desc, hook }; \
 void hook(MyThingCtx *ctx)
 
MY_THING_DEF(foo, FooBarGroup, "Generates a foo and increments the foo counter", foo_hook){
  // ...
}

MY_THING_DEF(bar, FooBarGroup, "Generates a bar and decrements the foo counter", bar_hook){
  // ...
}

And I like to take advantage of preprocessor pasting to free myself from the chore of naming the hook:

#define MY_THING_DEF(name,group,desc) \
 void my_thing_hook__##name(MyThingCtx *ctx); \
 SECTION(".mything") MyThing name = { #name, MY_THING_GROUP(group), desc, my_thing_hook__##name }; \
 void my_thing_hook__##name(MyThingCtx *ctx)
 
MY_THING_DEF(foo, FooBarGroup, "Generates a foo and increments the foo counter"){
  // ...
}

MY_THING_DEF(bar, FooBarGroup, "Generates a bar and decrements the foo counter"){
  // ...
}
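For readers who want to try the fused macro end to end on Linux, which the text above notes has not been tested, here is a self-contained sketch for GCC or Clang on an ELF target. It swaps in a different lookup mechanism from the image parsing described earlier: for a section whose name is a valid C identifier, GNU ld and lld define `__start_<name>` and `__stop_<name>` symbols. That is why the section is called `mything` without the leading dot. The struct layout, the string-valued group field, and the two helper functions are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct MyThingCtx { int foo; } MyThingCtx;
typedef void MyThingHook(MyThingCtx *ctx);

typedef struct MyThing {   // hypothetical layout
  const char *name;
  const char *group;
  const char *desc;
  MyThingHook *hook;
} MyThing;

// Provided by the linker on ELF targets (assumption: GNU ld or lld).
extern MyThing __start_mything[];
extern MyThing __stop_mything[];

// `used` keeps unreferenced entries from being discarded. The explicit
// alignment stops the compiler from over-aligning individual entries,
// which would leave gaps between array elements.
#define MY_THING_DEF(name, group, desc) \
  static void my_thing_hook__##name(MyThingCtx *ctx); \
  __attribute__((used, section("mything"), aligned(__alignof__(MyThing)))) \
  MyThing name = { #name, #group, desc, my_thing_hook__##name }; \
  static void my_thing_hook__##name(MyThingCtx *ctx)

MY_THING_DEF(foo, FooBarGroup, "Generates a foo and increments the foo counter"){
  ctx->foo += 1;
}

MY_THING_DEF(bar, FooBarGroup, "Generates a bar and decrements the foo counter"){
  ctx->foo -= 1;
}

size_t my_thing_count(void){
  return (size_t)(__stop_mything - __start_mything);
}

// Linear search over the whole set. No registration list exists anywhere.
MyThing *my_thing_find(const char *name){
  assert(name != NULL);
  for (MyThing *t = __start_mything; t < __stop_mything; t += 1){
    if (strcmp(t->name, name) == 0) return t;
  }
  return NULL;
}
```

The order of entries inside the section is up to the compiler and linker, so the sketch looks entries up by name instead of by position.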

Reusable C Scripting System

The last challenge is to take this idea and make it possible to create and maintain many of these special arrays as needed. It turns out most of the setup and wrangling work can be offloaded to a generic set of macros that work the same way as this special case above works. I call one of these "special arrays" a "Symbol Set" and thus I call the headers that introduce this system my "Symbol Set" system. If you can see this, you are currently a member of mr4th.com, and so you should be able to see my implementation as I have it in my codebase at:

https://git.mr4th.com/mr4th-members/mr4th/src/branch/main/src/mr4th_symbol_set.h
https://git.mr4th.com/mr4th-members/mr4th/src/branch/main/src/mr4th_symbol_set.define.h

There are so many subtle design choices to make in this system, and it so deeply integrates with base layer concerns, that I suggest learning how to build your own. That's what the rest of this repository is for!

-- Allen Webster. April 17th, 2025