mr4th-dynamic-linking/xlist/DETAILS.txt

In this example I link everything through run-time linking. The upsides to this
are that everything dealing with the linking is in my code, so I can tweak it
or debug it directly, and that I don't have a mix of load-time and run-time
linking, which makes the wrangling of keyword abstraction and linker options
simpler.
There are some downsides too, and in this example I show how I can mitigate
these downsides pretty well.
The two big downsides I address in this example are:
1. Symbol declaration and binding gets more difficult in C
2. Each binary requires some dynamic initialization
Finally, after going over these problems in detail, I will present some details
of the solution I use in this example.
Like the *_linking examples, the expected output if the plugin loads
successfully is:
```
x = 0
provided by plugin: {
x = 1
x = 2
}
x = 3
```
And the expected output if the plugin is not found is:
```
x = 0
x = 1
```
############################# Symbol Declaration ##############################
The symbol declaration problem requires a bit of setup to fully appreciate.
Normally in C you think of your program code as header & implementation, or
declaration & definition.
So we might have a header file like:
`layer.h`
```
void* layer_a(int x);
void layer_b(void *a, void *m);
int layer_c(void *a);
```
And then an implementation file like:
`layer.c`
```
void* layer_a(int x){
// ...
}
void layer_b(void *a, void *m){
// ...
}
int layer_c(void *a){
// ...
}
```
Whether we are linking these statically (unity build) or across object files
(classic build), it's pretty easy: the header just gets included as-is at all
usage and implementation sites.
When we transition to linking across binaries, we hit a problem: with run-time
linking we need to direct our calls through function pointers instead of
through functions.
## Solution: reroute from function to function pointer ##
One way to handle this is to keep the header and provide a different
implementation file:
`layer.dynamic.c`
```
void* layer_a(int x){
return(layer_funcs->layer_a(x));
}
void layer_b(void *a, void *m){
layer_funcs->layer_b(a, m);
}
int layer_c(void *a){
return(layer_funcs->layer_c(a));
}
```
Then on the side that defines the symbols, we still use `layer.c` but in any
binary that wants to load the symbols, we would use `layer.dynamic.c` which
reroutes the normal function calls to function pointers.
This `*.dynamic.c` file is pretty fatty though - requiring several lines of
pattern duplication for each function in the layer.
A metaprogramming system can help here if you want to go that route, but
putting another build program in the mix isn't exactly lightweight either.
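To make the rerouting concrete, here is a minimal sketch of the pieces that `layer.dynamic.c` takes for granted. The `Layer_Funcs` type, the `impl_*` stand-ins, and `layer_link` are names invented for illustration. In a real build the table would come from the binary that defines the layer, not from the same file.

```c
#include <stdlib.h>

/* Hypothetical table type: one function pointer per layer function. */
typedef struct Layer_Funcs{
  void* (*layer_a)(int x);
  void  (*layer_b)(void *a, void *m);
  int   (*layer_c)(void *a);
} Layer_Funcs;

/* Must be set by the run-time linking code before any wrapper is called. */
static Layer_Funcs *layer_funcs = 0;

/* Stand-in implementations, playing the role of the defining binary. */
static void* impl_a(int x){
  int *p = (int*)malloc(sizeof(int));
  if (p != 0){ *p = x; }
  return(p);
}
static void impl_b(void *a, void *m){
  *(int*)a += *(int*)m;
}
static int impl_c(void *a){
  return(*(int*)a);
}
static Layer_Funcs impl_table = { impl_a, impl_b, impl_c };

/* Stand-in for the linking step (assumption: normally an OS symbol lookup). */
static void layer_link(void){
  layer_funcs = &impl_table;
}

/* layer.dynamic.c: same signatures as layer.h, each body forwards to the table. */
void* layer_a(int x){ return(layer_funcs->layer_a(x)); }
void layer_b(void *a, void *m){ layer_funcs->layer_b(a, m); }
int layer_c(void *a){ return(layer_funcs->layer_c(a)); }
```

Callers keep writing `layer_a(3)` as before, but every wrapper dereferences `layer_funcs`, so calling one before the link step has run is a null pointer dereference.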
## Solution: call through function pointer table ##
Another option is to say that the place where the issue will pop out is at
all the usage sites. Any code written as a user of the layer will switch from
calling the layer like this `layer_foo( ... )` to `layer->foo( ... )`.
In theory this eliminates maintenance work, but oh boy are we in for it the
first time we realize we have a helper that wants to work in both the context
of the layer's user AND the layer's definer. At that point we're either
duplicating the helper, which leads us to minimize the richness of helpers we
develop around the layer, or we put them in some kind of unifying
wrapper, which is the problem we were trying to solve in the first place.
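The helper problem can be sketched in a few lines. Everything here is hypothetical (a one-function layer and a helper that applies it twice). The point is only that the same helper has to be spelled two ways.

```c
/* Hypothetical one-function layer. */
typedef struct Layer_Funcs{
  int (*c)(int x);
} Layer_Funcs;

/* Definer side: the function exists as a normal symbol. */
static int layer_c(int x){ return(x + 1); }

/* User side: only the table pointer is visible. */
static Layer_Funcs layer_table = { layer_c };
static Layer_Funcs *layer = &layer_table;

/* The helper as it must be written inside the definer. */
static int helper_twice_definer(int x){
  return(layer_c(layer_c(x)));
}

/* The same helper as it must be written inside a user. */
static int helper_twice_user(int x){
  return(layer->c(layer->c(x)));
}
```

Both helpers compute the same thing, but neither body compiles unchanged in the other context.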
## Solution: global function pointers ##
A third way to handle this is to define a new version of the header:
`layer.dynamic.h`
```
void* (*layer_a)(int x) = 0;
void (*layer_b)(void *a, void *m) = 0;
int (*layer_c)(void *a) = 0;
```
In this version each function symbol that the user wants to see gets replaced
with a global function pointer.
Now each usage site can look like a function usage site. We still have some
added maintenance burden, as with `*.dynamic.c`, but not as much. What's
really nice about this version is we can actually generate this from an xlist.
We can't so easily use an xlist in the other case because the pattern expansion
is a little too heterogeneous.
This is essentially what I do in this example. I generate the function pointers
from an xlist. I don't have a separate `base.dynamic.h` though, I just
put both versions of the function symbols in `base.h` and use the preprocessor
to select one or the other. This way they can easily have shared type
definitions and constants.
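A minimal sketch of that idea, using the `layer_*` functions from earlier. The `LAYER_XLIST` and `LAYER_STATIC` macro names, the `impl_*` stand-ins, and `layer_bind` are assumptions made for illustration, not the names used in this example's source.

```c
/* Hypothetical xlist: X(return type, name, parameter list). */
#define LAYER_XLIST(X) \
  X(void*, layer_a, (int x)) \
  X(void,  layer_b, (void *a, void *m)) \
  X(int,   layer_c, (void *a))

#if defined(LAYER_STATIC)

/* Normal flavor: ordinary function declarations. */
#define X(R,N,P) R N P;
LAYER_XLIST(X)
#undef X

#else

/* Dynamic flavor: one global function pointer per entry, initially null. */
#define X(R,N,P) R (*N) P = 0;
LAYER_XLIST(X)
#undef X

/* Stand-in implementations, playing the role of the defining binary. */
static int impl_storage = 0;
static void* impl_layer_a(int x){ impl_storage = x; return(&impl_storage); }
static void impl_layer_b(void *a, void *m){ *(int*)a += *(int*)m; }
static int impl_layer_c(void *a){ return(*(int*)a); }

/* Stand-in for the run-time linker; the same xlist generates the binding. */
static void layer_bind(void){
#define X(R,N,P) N = impl_##N;
  LAYER_XLIST(X)
#undef X
}

#endif
```

Adding a function to the layer now means adding one line to the xlist. The declaration, the pointer, and the binding line all follow from it.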
## Conclusion: Symbol Declaration ##
So the symbol declaration problem is really about deciding how to provide the
declarations that allow us to refer to symbols that get resolved dynamically.
This is only a problem for run-time linking because with load-time linking we
can just create regular function symbols with some special markup. It would
be nice if C had anticipated this and given us a better way to define these
run-time resolved symbols with the same basic syntax we use for regular
function symbols. But alas, that's not how it is.
########################### Dynamic Initialization ############################
The dynamic initialization problem is about how to maintain the run-time
linking code.
Imagine we have a 'base' layer like this:
`base.h`
```
void base_a(void);
void base_b(void);
void base_c(void);
```
If we just maintain the run-time linking with brute force, our run-time linking
code would look *something* like this:
```
void base_init(void){
Library *library = library_open("base");
GET_PROC(base_a, library, "base_a");
GET_PROC(base_b, library, "base_b");
GET_PROC(base_c, library, "base_c");
}
```
The exact details depend on how you've solved the symbol declaration problem
and on the operating system APIs for loading and linking binaries.
We can easily clean up the maintenance burden of this part with an xlist, or
by using a single GET_PROC which then passes through a function pointer table
with the rest of the layer's functions.
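Here is a rough sketch of the xlist variant. To keep it runnable without any OS loader, the symbol lookup is faked with a name table. `library_get_proc`, `Export`, and the `impl_*` functions are hypothetical stand-ins for whatever the platform API (such as `dlsym` or `GetProcAddress`) and the real 'base' binary would provide.

```c
#include <string.h>

/* Hypothetical xlist of the 'base' functions, all of type void(void). */
#define BASE_XLIST(X) X(base_a) X(base_b) X(base_c)

/* One function pointer per entry, null until base_init runs. */
#define X(N) static void (*N)(void) = 0;
BASE_XLIST(X)
#undef X

/* Stand-ins for the functions living in the other binary. */
static int call_count = 0;
static void impl_base_a(void){ call_count += 1; }
static void impl_base_b(void){ call_count += 10; }
static void impl_base_c(void){ call_count += 100; }

/* Fake export table, also generated from the xlist. */
typedef void Void_Func(void);
typedef struct Export{ const char *name; Void_Func *func; } Export;
static Export exports[] = {
#define X(N) { #N, impl_##N },
  BASE_XLIST(X)
#undef X
};

/* Fake symbol lookup by name; returns null when the name is unknown. */
static Void_Func* library_get_proc(const char *name){
  for (size_t i = 0; i < sizeof(exports)/sizeof(exports[0]); i += 1){
    if (strcmp(exports[i].name, name) == 0){ return(exports[i].func); }
  }
  return(0);
}

/* One lookup per xlist entry; returns 1 only if every symbol resolved. */
static int base_init(void){
  int ok = 1;
#define X(N) N = library_get_proc(#N); if (N == 0){ ok = 0; }
  BASE_XLIST(X)
#undef X
  return(ok);
}
```

The body of `base_init` no longer grows with the layer. The xlist is the only place that has to be edited when a function is added.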
The other part of the dynamic initialization problem is deciding how we will
ensure the initialization actually gets done.
For instance, let's look at the layers used in this example: 'base', 'plugin',
and 'main'. Both 'plugin' and 'main' are users of 'base'. 'main' is responsible
for loading 'plugin' if it wants to, and for proceeding gracefully if 'plugin'
is missing.
We need to ensure `base_init` gets called in each binary.
## Solution: manual initialization ##
For 'main' we would just say it's the responsibility of the entry point to call
`base_init`.
For 'plugin' we have two options. The first option is that when 'main' loads
the 'plugin' layer it is responsible for reaching into the module, finding its
`base_init` function and calling it. The second option is that the 'plugin'
module has an on-load entry point that calls `base_init`.
None of these options are "broken", but they do require some extra
handshaking and protocol design between all these binaries.
## Solution: automatic initialization ##
Another option is to have the 'base' layer itself provide the code that does
all of the initialization automatically for the users of the 'base' layer. In
order to do this, 'base' will need to be able to write something like an
"on-load hook": a function that gets called automatically when the binary
loads. The code that defines the 'base' layer will only insert this hook into
binaries that are trying to run-time link to the 'base' layer definitions.
In the *_before_main examples I show how it is actually possible to do this in
C, although the details are admittedly sketchy in the case of Windows with the
CL compiler.
This basically lets us emulate the automatic linking provided by load-time
linking, but as a downside, it means we have less flexibility about how the
layer gets loaded.
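As a rough illustration of an on-load hook, here is a sketch that assumes GCC or Clang, where `__attribute__((constructor))` marks a function to run before `main` (or when a shared library is loaded). This is not necessarily the mechanism the *_before_main examples use, and the CL compiler needs a different approach. `base_on_load`, `base_linked`, and `base_is_linked` are hypothetical names.

```c
/* Set to 1 once the (stand-in) run-time linking has been done. */
static int base_linked = 0;

/* Stand-in for the real linking code. */
static void base_init(void){
  base_linked = 1;
}

/* GCC/Clang only: runs automatically when this binary is loaded. */
__attribute__((constructor))
static void base_on_load(void){
  base_init();
}

/* Lets the rest of the binary check that the hook already ran. */
int base_is_linked(void){
  return(base_linked);
}
```

With this in place, code running from `main` onward can assume `base_init` has already been called, without any entry point having to call it explicitly.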
################################## Solution ###################################
The big idea of my solution is to use an xlist to minimize maintenance burden
without bringing in a full-blown code generator.
I put all the 'base' layer functions that will be run-time linked into an
xlist file `base.xlist.h`.
I also put the normal style of function declaration list in `base.h`. Technically
I don't need this, I could just generate it from the xlist, but then I don't
have any "normal" looking version of the function declarations. The xlist is
highly reusable, suitable for almost every purpose, but it is not very
readable. Users of my code should be able to just skim some natural looking
C code with comments and formatting to understand the code they are using.
I set up a function pointer table `BASE_Funcs` so that I only have to export
one symbol from the 'base' layer implementor. That symbol fills and exposes
the function pointer table for run-time linking.
When the 'base' layer is included in a binary that is not the implementor, it
generates a before-main hook to load the 'base' layer and perform the run-time
linking.
Thanks to this design, neither 'main' nor 'plugin' has to do anything to
start using the 'base' layer except to include it.
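The table-plus-single-export shape can be sketched roughly as follows. Only the `BASE_Funcs` name comes from the text above. The xlist contents, `base_get_funcs`, `base_link`, and the `impl_*` functions are assumptions for illustration, and both sides are squeezed into one file so the sketch stays runnable.

```c
/* Hypothetical xlist: X(return type, name, parameter list). */
#define BASE_XLIST(X) \
  X(int, base_add, (int a, int b)) \
  X(int, base_neg, (int a))

/* Function pointer table generated from the xlist. */
typedef struct BASE_Funcs{
#define X(R,N,P) R (*N) P;
  BASE_XLIST(X)
#undef X
} BASE_Funcs;

/* Implementor side: the real function bodies. */
static int impl_base_add(int a, int b){ return(a + b); }
static int impl_base_neg(int a){ return(-a); }

/* Implementor side: the one exported symbol (hypothetical name). */
void base_get_funcs(BASE_Funcs *out){
#define X(R,N,P) out->N = impl_##N;
  BASE_XLIST(X)
#undef X
}

/* User side: global function pointers, so call sites look like functions. */
#define X(R,N,P) static R (*N) P = 0;
BASE_XLIST(X)
#undef X

/* User side: in a real build this is reached through one symbol lookup. */
static void base_link(void){
  BASE_Funcs funcs;
  base_get_funcs(&funcs);
#define X(R,N,P) N = funcs.N;
  BASE_XLIST(X)
#undef X
}
```

In this sketch `base_link` is what a before-main hook would call, after which `base_add(2, 3)` reads like any ordinary function call.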