write DETAILS files for each major example

2024-04-24 12:04:52 -07:00 · 2024-04-24 12:04:52 -07:00 · 0b6adfde1e
parent 5c44c148cd
commit 0b6adfde1e
13 changed files with 566 additions and 18 deletions
--- a/GUIDE.txt
+++ b/GUIDE.txt
@ -1,11 +1,42 @@
+#################################### Hello ####################################
+
 The main goal of this investigation is to organize shared data and code across 
 multiple binary files. This is especially important for something like a base
-layer that will be used in a program that supports hot-reloading or plugins.
+layer that is used in a program that supports hot-reloading and/or plugins.

-Each isolated example in this repository explores a way to set up the base
-layer, plugin, and main program.
+This repository contains examples of source and build details needed to achieve
+dynamic linking (shared data and code) on Windows, and Linux, with the cl, gcc,
+and clang compilers.

-The examples:
+To start exploring an example navigate into the folder of the example and run
+the build script from your command line. It should generate an executable in
+a build/ folder. Each example also has a DETAILS.txt with info about the
+details that go into the example's structure and the expected output of the
+example program.
+
+
+
+################################### Topics ####################################
+
+Looking for a specific topic? This index tells you which example to jump to.
+
+exporting symbols             ->   win32_linking, linux_linking
+load-time import symbols      ->   win32_linking, linux_linking
+run-time linking symbols      ->   win32_linking, linux_linking
+building a .dll file          ->   win32_linking
+building a .so file           ->   linux_linking
+module initialization         ->   win32_before_main, linux_before_main
+cl build line options         ->   win32_linking
+gcc build line options        ->   linux_linking
+linux load-time search paths  ->   linux_linking
+clang build line options      ->   clang
+abstracted base layer         ->   xlist
+
+
+
+################################ The Examples #################################
+
+An explanation of the main ideas in each example.

 *_linking - Concrete examples for each operating system showing how to 
            setup and use various types of dynamic linking. In these
--- a/clang/build_win32_before_main.bat
+++ b/clang/build_win32_before_main.bat
@ -11,4 +11,4 @@ cd build


 REM: Build
-clang %src%\win32_before_main.c
+clang %src%\win32_before_main.c -o win32_before_main.exe
--- a/clang/build_win32_before_main_v2.bat
+++ b/clang/build_win32_before_main_v2.bat
@ -0,0 +1,18 @@
+@echo off
+
+REM: It turns out the gcc syntax __attribute__((constructor)) works
+REM: on clang even for windows builds. You can run this script to
+REM: see for yourself.
+
+REM: Path setup
+
+cd ..\linux_before_main
+SET src=%cd%
+cd ..
+if not exist "build\" mkdir build
+cd build
+
+
+
+REM: Build
+clang %src%\linux_before_main.c -o win32_before_main_v2.exe
--- a/linux_before_main/DETAILS.txt
+++ b/linux_before_main/DETAILS.txt
@ -0,0 +1,9 @@
+gcc makes this really easy with a straightforward compiler extension. All we
+have to do is write a regular `void f(void)` function and mark it with:
+
+__attribute__((constructor))
+
+
+
+Literally that's it. Someone tell Microsoft how cool this is!
+
--- a/linux_before_main/linux_before_main.c
+++ b/linux_before_main/linux_before_main.c
@ -12,3 +12,16 @@ int main(){
  printf("x = %d\n", x);
  return(0);
 }
+
+
+#if 0
+// wrapped in a macro:
+
+#define BEFORE_MAIN(n) \
+__attribute__((constructor)) static void n(void)
+
+BEFORE_MAIN(before_main_rule){
+  // do work here
+}
+
+#endif
--- a/linux_linking/DETAILS.txt
+++ b/linux_linking/DETAILS.txt
@ -0,0 +1,117 @@
+linux_main.c defines the executable linux_main.exe
+ it depends on load-time linking with linux_base.so
+ it tries to perform run-time linking with linux_plugin.so
+
+linux_base.c defines the binary linux_base.so
+
+linux_plugin.c defines the binary linux_plugin.so
+ it depends on load-time linking with linux_base.so
+
+The build script has to build linux_base.c first because it needs the
+ results of that build to setup the load-time linking in the other builds.
+ The linux_base.so is used  to resolve load-time imported symbols.
+
+The program has a shared data structure 'int x' in the 'base' layer that is
+ read and modified from both the 'main' layer and 'plugin' layer.
+
+The expected output if the plugin loads successfully is:
+
+```
+x = 0
+provided by plugin: {
+ x = 1
+ x = 2
+}
+x = 3
+```
+
+The expected output if the plugin is not found is:
+
+```
+x = 0
+x = 1
+```
+
+The default Linux search paths for loading binaries do not include the
+current directory of the process, or the path to the executable binary file.
+It is possible to get it to behave like Windows binary loading, but some
+extra steps need to be taken. (Load-Time Search Paths) (Run-Time Search Paths)
+
+
+########################### Load-Time Search Paths ############################
+
+For load-time binary dependencies, we can actually bake extra search paths
+into a binary. GCC's options do not cover this, but there is a backdoor in
+GCC for talking right to the underlying linker (ld).
+
+The backdoor is the option -Wl (lowercase L). The syntax of this option is a
+little unusual. As soon as a space occurs the backdoor is closed, so the
+entire option has to be specified without any spaces. Since we need spaces,
+the backdoor lets us use commas. It will remove the commas and replace them
+with spaces before passing the command on to ld. It looks something like this:
+
+gcc ... -Wl,-option,value ...
+
+The specific option we want to pass through this way is -rpath. This option
+tells the linker to bake a path into the search paths of the binary, so the
+syntax for specifying a path this way with the backdoor syntax is:
+
+gcc ... -Wl,-rpath,loadpath ...
+
+In order to get the same behavior as we have on Windows, we want the path to
+be relative to the binary itself. This can be done using the special syntax
+'$ORIGIN/' as the path. But there is a problem here too. The dollar sign
+already has a meaning in the shell, so to actually pass a raw dollar sign we
+actually have to escape it with a backslash. Putting it all together the option
+looks like this:
+
+gcc ... -Wl,-rpath,\$ORIGIN/ ...
+
+If that seems like a lot, that's because it is A LOT.
+
+It's also pretty atypical to do things this way on Linux where the system tries
+to have a specific place to put all the different pieces of executables. In
+particular you might try to put the executable in a 'bin/' folder and the
+shared object binaries in a 'lib/' folder. Then you would use the binary
+relative path like this:
+
+gcc ... -Wl,-rpath,\$ORIGIN/../lib ...
+
+
+############################ Run-Time Search Paths ############################
+
+The search paths for dlopen only include the system binary paths by default.
+This matters if we are calling dlopen like this:
+
+dlopen("mylayer.so", flags)
+
+In this case the search paths will not include any binary relative rules or
+current directory relative rules.
+
+
+We can use $ORIGIN like we did in the load-time case to specify paths relative
+to the calling binary:
+
+dlopen("$ORIGIN/mylayer.so", flags)
+
+Since this version is not just a base file name, the system search paths are
+ignored and the path indicated by $ORIGIN is inspected directly.
+
+
+We can also use . to specify the current directory like we do on the command
+line when we call a script or run an executable in the current directory:
+
+dlopen("./mylayer.so", flags)
+
+In this case the system will look exactly in the current directory.
+
+
+Finally we can use full paths to specify a directory without ambiguity:
+
+dlopen("/home/username/pluginproject/mylayer.so", flags)
+
+The nice thing about this option is that it means we can perform our own
+search on the file system, and assemble a full path to get exactly what we want
+if we have to.
+
+
--- a/linux_linking/linux_main.c
+++ b/linux_linking/linux_main.c
@ -21,7 +21,7 @@ int main(){
  
  
  // to call a function with run-time linking, we must manually load and link it
-  void *module = dlopen("./linux_plugin.so", RTLD_NOW);
+  void *module = dlopen("$ORIGIN/linux_plugin.so", RTLD_NOW);
  if (module != 0){
    GET_PROC(plugin_func, module, "plugin_func");
  }
--- a/win32_before_main/DETAILS.txt
+++ b/win32_before_main/DETAILS.txt
@ -0,0 +1,39 @@
+Being able to run code 'before main' isn't just a magic trick. In a situation
+where there may be more than one layer with dynamic linking and more than one
+.dll (plugin system for instance), the maintenance burden of setting up each
+layer in a central DllMain or main is significant enough to be a burden.
+
+It is possible to get this effect on Windows through the CL compiler, but it
+would be a stretch to say that it is "supported". The way I show here works
+by relying on the fact that a special section does exist that contains function
+pointers that run before 'main' or 'DllMain'. We can use CL's compiler
+extensions to add a function pointer to that section just by declaring it as a
+global variable and marking it up with:
+
+__declspec(allocate(".CRT$XCU"))
+
+If you look up this method on the internet, you will find claims that under
+certain types of whole-program optimization, this won't work. In particular
+this happens if you use the option /GL in CL.
+
+This happens because the global variable appears to be unused from the
+perspective of the compiler & linker. Since it is never directly referenced,
+there is no C-level semantical reason to think this global variable is doing
+anything.
+
+However, in this example I show how we can still make it work. We have to
+make sure the linker won't eliminate the global variable that we are trying to
+place into the ".CRT$XCU" section. I achieve this by marking it as an export
+symbol. Export symbols can't be eliminated even if they aren't used locally.
+
+From what I've seen in testing, this works as desired, even with the /GL option.
+
+IMPORTANT RESTRICTION: Because this creates an export symbol, each time we use
+this within a binary it must have a unique name. Generally I would recommend
+naming before-main symbol by scoping it to the layer where it exists.
+
+
+CLANG NOTE: Interestingly, clang can build this, but it can also use the
+__attribute__((constructor)) extension on Windows, which is a lot closer to
+"supporting" this feature. I suspect that When I am building with clang I will
+prefer to go with this option most of the time.
--- a/win32_before_main/win32_before_main.c
+++ b/win32_before_main/win32_before_main.c
@ -7,7 +7,7 @@ static void run_before_main_func(void);

 // set the before-main execution function pointer
 __declspec(allocate(".CRT$XCU"))
-__pragma(comment(linker, "/INCLUDE:run_before_main_ptr"))
+__declspec(dllexport)
 void (*run_before_main_ptr)(void) = run_before_main_func;

 // define the "before main" function
@ -22,3 +22,19 @@ int main(){
  printf("x = %d\n", x);
  return(0);
 }
+
+
+#if 0
+// wrapped in a macro:
+
+#define BEFORE_MAIN(n) static void n(void); \
+__declspec(allocate(".CRT$XCU"))           \
+__declspec(dllexport)                      \
+void (*n##__)(void) = n;                   \
+static void n(void)
+
+BEFORE_MAIN(before_main_rule){
+  // do work here
+}
+
+#endif
--- a/win32_linking/DETAILS.txt
+++ b/win32_linking/DETAILS.txt
@ -0,0 +1,41 @@
+win32_main.c defines the executable win32_main.exe
+ it depends on load-time linking with win32_base.dll
+ it tries to perform run-time linking with win32_plugin.dll
+
+win32_base.c defines the binary win32_base.dll
+
+win32_plugin.c defines the binary win32_plugin.dll
+ it depends on load-time linking with win32_base.dll
+
+The build script has to build win32_base.c first because it needs the
+ results of that build to setup the load-time linking in the other builds.
+ The win32_base.lib that is generated along with win32_base.dll is used
+ to resolve load-time imported symbols.
+
+The program has a shared data structure 'int x' in the 'base' layer that is
+ read and modified from both the 'main' layer and 'plugin' layer.
+
+The expected output if the plugin loads successfully is:
+
+```
+x = 0
+provided by plugin: {
+ x = 1
+ x = 2
+}
+x = 3
+```
+
+The expected output if the plugin is not found is:
+
+```
+x = 0
+x = 1
+```
+
+You should be able to relocate the plugin and use a full path to it and still
+get the first result. As long as the load-time dependency win32_base.dll is
+with the executable, it will load. It can be found in some other paths, but
+you cannot specify the search paths manually, so keeping it with the executable
+is the simplest option.
+
--- a/xlist/DETAILS.txt
+++ b/xlist/DETAILS.txt
@ -0,0 +1,266 @@
+In this example I link everything through run-time linking. The upsides to this
+are that everything dealing with the linking is in my code so I can tweak it 
+or debug it directly, and I don't have a mix of load-time and run-time linking
+making the wranling of keyword abstraction and linker options simpler.
+
+There are some downsides too, and in this example I show how I can mitigate
+these downsides pretty well.
+
+The two big downsides I address in this example are:
+ 1. Symbol declaration and binding gets more difficult in C
+ 2. Each binary requires some dynamic initialization
+
+Finally after going over these problems in detail, I will present some details
+of the solution I use in this example.
+
+Like the *_linking examples, the expected output if the plugin loads
+successfully is:
+
+```
+x = 0
+provided by plugin: {
+ x = 1
+ x = 2
+}
+x = 3
+```
+
+And the expected output if the plugin is not found is:
+
+```
+x = 0
+x = 1
+```
+
+
+
+############################# Symbol Declaration ##############################
+
+The symbol declaration problem requires a bit of setup to fully appreciate.
+
+Normally in C you think of your program code as header & implementation, or
+declaration & definition.
+
+So we might have a header file like:
+
+`layer.h`
+
+```
+void* layer_a(int x);
+void  layer_b(void *a, void *m);
+int   layer_c(void *a);
+```
+
+And then an implementation file like:
+
+`layer.c`
+
+```
+void* layer_a(int x){
+ // ...
+}
+
+void  layer_b(void *a, void *m){
+ // ...
+}
+
+int   layer_c(void *a){
+ // ...
+}
+```
+
+Whether we are linking these statically (unity build) or across object files
+(classic build) it's pretty easy, the header just gets included at all usage
+and implementation sites as it is.
+
+When we transition to linking across binaries, we hit a problem. The reason we
+hit a problem is  that with run-time linking we need to be directing our
+calls through function pointers instead of through functions.
+
+
+## Solution: reroute from function to function pointer ##
+
+One way to handle this is to keep header and provide a different implementation
+file:
+
+`layer.dynamic.c`
+
+```
+void* layer_a(int x){
+ return(layer_funcs->layer_a(x));
+}
+
+void  layer_b(void *a, void *m){
+ layer_funcs->layer_b(a, m);
+}
+
+int   layer_c(void *a){
+ return(layer_funcs->layer_c(a));
+}
+```
+
+Then on the side that defines the symbols, we still use `layer.c` but in any
+binary that wants to load the symbols, we would use `layer.dynamic.c` which
+reroutes the normal function calls to function pointers.
+
+This `*.dynamic.c` file is pretty fatty though - requiring several lines of
+pattern duplication for each function in the layer.
+
+A metaprogramming system can help here if you want to go that route, but
+putting another build program in the mix isn't exactly light weight either.
+
+
+## Solution: call through function pointer table ##
+
+Another option is to say that the place where the issue will pop-out is at
+all the usage sites. Any code written as a user of the layer will switch from
+calling the layer like this `layer_foo( ... )` to `layer->foo( ... )`.
+
+In theory this eliminates maintenance work, but oh boy are we in for it the
+first time we realize we have a helper that wants to work in both the context
+of the layer's user AND the layer's definer. At that point we're either
+duplicating the helper, which leads us to minimize the richness of helpers we
+develop around the layer, or we put them in some kind of unifying
+wrapper, which is the problem we were trying to solve in the first place.
+
+
+## Solution: global function pointers ##
+
+A third way to handle this is to define a new version of the header:
+
+`layer.dynamic.h`
+
+```
+void* (*layer_a)(int x) = 0;
+void  (*layer_b)(void *a, void *m) = 0;
+int   (*layer_c)(void *a) = 0;
+```
+
+In this version each function symbol that the user wants to see gets replaced
+with a global function pointer.
+
+Now each usage site can looks like a function usage site. We still have some
+maintenance burden increase like in the `*.dynamic.c` but not as much. What's
+really nice about this version is we can actually generate this from an xlist.
+We can't so easily use an xlist in the other case because the pattern expansion
+is a little too heterogenous.
+
+This is essentially what I do in this example. I generate the function pointers
+from an xlist. I don't have a separate `base.dynamic.h` though, I just
+put both versions of the function symbols in `base.h` and use the preprocessor
+to select one or the other. This way they can easily have shared type
+definitions and constants.
+
+
+## Conclusion: Symbol Declaration ##
+
+So the symbol declaration problem is really about deciding how to provide the
+declarations that allow us to refer to symbols that get resolved dynamically.
+
+This is only a problem for run-time linking because with load-time linking we
+can just create regular function symbols with some special mark up. It would
+be nice if C had anticipated this and gave us a better way to define these
+run-time resolved symbols with the same basic syntax we use for regular
+function symbols. But alas, that's not how it is.
+
+
+########################### Dynamic Initialization ############################
+
+The dynamic initialization problem is about how to maintain the run-time
+linking code.
+
+Imagine we have a 'base' layer like this:
+
+`base.h`
+
+```
+void base_a(void);
+void base_b(void);
+void base_c(void);
+```
+
+If we just maintain the run-time linking with brute force our run-time linking
+code would look *something* like this:
+
+```
+void base_init(void){
+ Library *library = library_open("base");
+ GET_PROC(base_a, library, "base_a");
+ GET_PROC(base_b, library, "base_b");
+ GET_PROC(base_c, library, "base_c");
+}
+```
+
+The exact details depend on how you've solved the symbol declaration problem
+and on the operating system APIs for loading and linking binaries.
+
+We can easily clean up the maintenance burden of this part with an xlist, or
+by using a single GET_PROC which then passes through a function pointer table
+with the rest of the layer's functions.
+
+The other part of dynamic initialization problem is deciding how we will ensure
+the initialization actually gets done.
+
+For instance, let's look at the layers used in this example 'base' 'plugin' and
+'main'. Both 'plugin' and 'main' are users of 'base'. 'main' is responsible for
+loading 'plugin' if it wants to, and for proceeding gracefully if the 'plugin'
+is missing.
+
+We need to ensure `base_init` gets called in each binary.
+
+## Solution: manual initialization ##
+
+For 'main' we would just say it's the responsibility of the entry point to call
+`base_init`.
+
+For 'plugin' we have two options. The first option is that when 'main' loads
+the 'plugin' layer it is responsible for reaching into the module, finding its
+`base_init` function and calling it. The second option is that the 'plugin'
+module has an on-load entry point that calls `base_init`.
+
+None of these options are "broken" but they do require some extra
+hand shaking and protocol designing between all these binaries.
+
+## Solution: automatic initialization ##
+
+Another option is to have the 'base' layer itself provide the code that does
+all of the initialization automatically for the users of the 'base' layer. In
+order to do this 'base' will need to be able to write something like an
+"on-load hook". A function that gets called automatically when the binary
+loads. The code that defines the 'base' layer will only insert this hook into
+binaries that are trying to run-time link to the 'base' layer definitions.
+
+In the *_before_main examples I show how it is actually possible to do this in
+C, although the details are admittedly sketchy in the case of Windows with the
+CL compiler.
+
+This basically lets us emulate the automatic linking provided by load-time
+linking, but as a downside, it means we have less flexibility about how the
+layer gets loaded.
+
+
+################################## Solution ###################################
+
+The big idea of my solution is to use an xlist to minimize maintenance burden
+without bringing in a whole cloth code generator.
+
+I put all the 'base' layer functions that will be run-time linked into an
+xlist file `base.xlist.h`.
+
+I also put the normal style of symbol definition list in `base.h`. Technically
+I don't need this, I could just generate it from the xlist, but then I don't
+have any "normal" looking version of the function declarations. The xlist is
+highly reusable, suitable for almost every purpose, but it is not very
+readable. Users of my code should be able to just skim some natural looking
+C code with comments and formatting to understand the code they are using.
+
+I setup a function pointer table `BASE_Funcs` so that I only have to export
+one symbol from the 'base' layer implementor. That symbol fills and exposes
+the function pointer table for run-time linking.
+
+When the base layer is included in a binary that is not the implementor the
+'base' layer generates a before-main hook to load the 'base' layer and
+perform the run-time linking.
+
+Thanks to this design neither 'main' nor 'plugin' have to do anything to
+start using the 'base' layer except to include it.
--- a/xlist/base.c
+++ b/xlist/base.c
@ -53,7 +53,7 @@ BEFORE_MAIN(base_dynamic_user_init){
    }
  }
 #elif OS_LINUX
-  void *module = dlopen("./base.so", RTLD_NOW);
+  void *module = dlopen("$ORIGIN/base.so", RTLD_NOW);
  if (module != 0){
    BASE_ExportFuncs *base_export_functions = (BASE_ExportFuncs*)dlsym(module, "base_export_functions");
    if (base_export_functions != 0){
--- a/xlist/base.h
+++ b/xlist/base.h
@ -27,13 +27,15 @@
 #endif


+// before-main abstraction
+
 #if OS_WINDOWS

 # pragma section(".CRT$XCU", read)

 # define BEFORE_MAIN(n) static void n(void); \
 __declspec(allocate(".CRT$XCU"))            \
-__pragma(comment(linker, "/INCLUDE:" #n "__")) \
+__declspec(dllexport)                       \
 void (*n##__)(void) = n;                    \
 static void n(void)

@ -46,10 +48,6 @@ __attribute__((constructor)) static void n(void)
 # error BEFORE_MAIN missing for this OS
 #endif

-// base layer types
-
-typedef void BASE_Library;
-

 // base symbols shared