SharedCLibrary

Background

The SharedCLibrary is a relatively complicated component. The module provides the C run time system for most of the C applications on RISC OS. The module needs to cope with being shared between the clients, reentrant, with separate static data for every client, able to run legacy application code, module code and of course applications. The module needs to be able to cope with being executed from within USR mode and from SVC mode, and be able to retain the distinction of how the module was initially invoked. Areas of the static workspace are fixed, because they will be directly accessed by applications, and so their layout must remain the same between versions of the module.

The SharedCLibrary itself is just a particular collection of the libraries that make up the C run time system. The other major collection is the ANSILib library, which includes many of the same functions, but works differently because it is not shared, and knows the calling conventions. Additionally, SharedCLibrary may be based in ROM or RAM, which changes how it is constructed.

The legacy interface that the SharedCLibrary supported was APCS-A. This is used by some older binaries, and has a slightly different set of register bindings. That means that all the calls to the SharedCLibrary from such clients must be translated to the APCS-R register bindings before the C library routines can be executed - and vice-versa on return. Similarly, any of the routines that have callback entry points need to have a translation inserted into the call as well. The SharedCLibrary inserts a small veneer into the entry points for those clients. Fortunately there are only a few routines that have callbacks, so the number of internal veneer calls is limited.

But wait, it gets better. If an abort (or assertion, etc) happens, the SharedCLibrary run time system deals with it. In order to work out where the problem was and its backtrace, the module has to know which calling convention was in place at that time. And it has to cope with such a problem occurring whilst the SharedCLibrary was in one of its transition veneers.

Amusingly, whatever you might do, the order of the registers in the 'jmpbuf' structure is fixed as it was in APCS-A, so any client which needs to directly manipulate those registers has to use the APCS-A layout. The registers would go to the right place for the calling convention in use, but they are not laid out in memory in the order that you might expect.

Additionally, such aborts, assertions or whatever could happen whilst in SVC mode, so the correct details need to be obtained in those cases, where the USR mode stack isn't in the abort registers. Or an assertion might happen in SVC mode where the client is a module and doesn't have any environment handlers, and therefore has to trigger a slightly different code path. Or the abort may be a floating point problem from the FPEmulator. Or it might be an error which was raised by the FPEmulator (invalid operation, division by zero, overflow, underflow, inexact operation).

Just to make things more fun (if the above description hasn't already scared you), when built for ROM the SharedCLibrary includes RISC_OSLib, and exports symbols which ROM applications can be linked against to save space. The RAM version of SharedCLibrary never includes the RISC_OSLib library, fortunately.

The behaviour of the module has to remain the same, even if it was loaded on an earlier system. Well, that is something of a difficult thing to say. It should remain the same, but because of the complexity involved in the SharedCLibrary, loading a replacement version was strongly discouraged. The ROM version of the library would remain resident if you loaded a second copy of the SharedCLibrary, which meant that anything linked against it would work just fine. However, if you then loaded a third copy, the second copy (in RAM) would be removed and anything linked against it would break soon after, when the memory it occupied was reused. There are ways around that - you could leave the module resident but unlinked, but that is a memory leak, makes applications difficult to debug and means that you are forced to execute code in an anonymous block of memory (it isn't in a module, and was never allocated by any module).

Together with the SharedCLibrary and ANSILib, there were also the Stubs. The Stubs provided the necessary code to link against SharedCLibrary with veneers to point to the shared code. The Stubs were quite simple really - they registered with the module, and provided a number of entry points to handle certain operations. The Stubs had to work differently depending on whether the code was being invoked as part of a module or as part of an application, but that wasn't too complex as the invoked symbols differed depending on how the code was linked.

StubsG

Pace released a 32bit version of the SharedCLibrary, to support their release of the 32bit development tools. There had been repeated statements by people that you could not develop 32bit code without 32bit tools. This was completely correct. The important point that seemed to pass most developers by was that if they had bought the version 5 compiler, they already had a 32bit compiler. It was trivial to specify the correct APCS options to build 32bit code - and these were described within the documentation.

However, you did need a special version of the C library to work with 32bit code, and that C library would need to work on both 26bit and 32bit systems. And if it ran on a 26bit system it needed to support all the applications which expected the API to be 26bit as well. Pace's SharedCLibrary (later, the one that Castle released) made an attempt at this.

I have been quite vocally opposed to having a SharedCLibrary which is replaced on a system. This might seem at odds with my equally strong views on modularity and that stacks should be replaceable, but it is actually because of those views that I don't feel it is a great plan. As touched upon earlier, loading a replacement module will cause problems when done multiple times. The point of the modularity argument is that it encourages you to trust that restarting parts of the system is safe - and it inherently is not for the SharedCLibrary because of how it works.

Retaining the implication that it is safe to load replacements is merely inviting problems for the user. So, I believe it should be discouraged. There is the argument that if you need to load multiple replacements it will not matter because you reboot regularly, and that is a completely false argument because the safety of the system should never be predicated on the requirement to restart often. Usually the need to restart often is presented because of unsafe practices, so it is not particularly sensible to rely on restarts in order to remain safe.

There is the distribution issue with using replaceable SharedCLibrary modules. If you are reliant on features of a later version of the module, then by implication you have to distribute that module with your module or application. You could rely on the user having the right version, but that places an additional onus on the user. You are also distributing a version of a component that you do not support, and have to ensure that your users do not load multiple versions of that module, and thus cause instability in their system. And there is the blindingly obvious fact that if you are distributing the Shared C Library with your release, you have just lost all its benefits.

Anyhow, the SharedCLibrary provides all the functionality clients will require for most applications which use the standard interfaces. The fact that they expect a different calling convention is not insurmountable, and can be addressed relatively easily through the use of veneers. My alternative solution for the problem of producing 32bit applications was to create the generic stubs 'StubsG'. The 'StubsG' library was similar to the main Stubs, but was intended for use with applications and modules built to be 32bit safe.

When run on a system which had a 26bit SharedCLibrary, it would install adapter veneers to address the calling convention differences. When run on a system which had a 32bit SharedCLibrary, it would just register normally. Because some functions could not easily have veneers installed between them, these were provided statically. I seem to recall that '*printf' and the 'qsort'/'bsearch' operations were statically linked because of differences. There might have been others, but I forget.
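The choice between the two cases can be pictured roughly as below. This is only a minimal sketch in C of the decision, not the actual StubsG code: a real veneer would be assembler that fixes up the calling convention around the call, and every name here (clib_kind, entry_fn, library_entry, veneer_entry, bind_entry) is hypothetical.

```c
#include <assert.h>

/* Hypothetical: which kind of SharedCLibrary was found at registration. */
typedef enum { CLIB_26BIT, CLIB_32BIT } clib_kind;

/* Hypothetical entry point signature, for illustration only. */
typedef int (*entry_fn)(int);

/* Stand-in for a routine that lives inside the shared library. */
static int library_entry(int x)
{
    return x + 1;
}

/* Counts adapter calls, purely so the effect is observable. */
static int veneer_calls = 0;

/* Stand-in for an adapter veneer. A real one would be assembler that
   fixes up the calling convention differences around the call. */
static int veneer_entry(int x)
{
    veneer_calls++;
    return library_entry(x);
}

/* Choose what a stub entry should point at for the library found. */
static entry_fn bind_entry(clib_kind kind)
{
    assert(kind == CLIB_26BIT || kind == CLIB_32BIT);
    return (kind == CLIB_26BIT) ? veneer_entry : library_entry;
}
```

With a 32bit library the stub entry points straight at the library routine; with a 26bit library every call goes through the adapter first, but the caller sees the same result either way.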

There were a couple of slight differences in the use of functions in the release. In their wisdom, Castle had modified the 'assert' macro to call a different function, which took extra parameters. Useful though that is, the standard 'assert' entry does not support the extra parameters, so in order to not have to duplicate a complicated section of the exception handling (which I couldn't see how to do reliably on all the SharedCLibrary versions that were available) the call was reduced to its simpler form, without the extra parameters.

The StubsG distribution included 32bit versions of all of the libraries and headers which had previously been distributed (RISC_OSLib, Toolbox, Internet libraries and a few others), along with updated documentation. Because the library supports everything that the released SharedCLibrary provides, it can be used as a drop in replacement, supporting all the C99 features that had been added to the later version of the library (and which I will talk about more in a section or two).

The advantage of the StubsG library - other than the fact that it does not encourage unsafe operations - was that your application worked without anything extra being required to run. You didn't need to tell the user to install additional modules, or to worry about checking which particular version of the C library you were running with.

I believed this to be a better and more reliable way to encourage the upgrading of applications. A few people agreed, but in general it wasn't a success. There were some quite vocal individuals who did not care that there was a greater onus being placed on developers, that they were encouraging a practice which was unsafe or even the simple issue that the distributions of applications became larger. That is their right, I guess, and I can't help it if they are wrong <laugh>.

Separation of functionality

Originally the SharedCLibrary, the C library, the run time system, ANSILib, Stubs, and RISC_OSLib were all lumped into one complicated distribution that produced different targets depending on how it was invoked. This was not ideal (that is, it was a really stupid way to work) and made it very difficult to maintain, especially since the inclusion of the completely unrelated RISC_OSLib meant that changes were required because of updates to !Draw or !Paint.

I separated out the components into distinct directories, each with their own build and exports. This meant that, for example, the C library could export itself just like any other library. ANSILib became just a simple section of header that pulled in parts of the exported C library and run time system. The SharedCLibrary did a similar thing, to bring in the C library, run time system (and RISC_OSLib if built for ROM).

The separation also meant that the 'ARMProf' library (which was part of the profiling which the Norcroft compiler could build in to the application) was able to become a component in its own right, rather than being dealt with as part of the ANSILib. The library was not as useful as it might be - it counted the number of calls that functions, or blocks of code, had during an execution - but it worked pretty well and was useful in some circumstances if you didn't know as much about how your code was used.

Similarly, the C++ library could be grouped together with these libraries and exported with them. The library was only used for CFront based builds, but was necessary for any C++ code built with it.

There was a separation of the components that went into the SharedCLibrary. The lowest level was the 'RTSK' - the Run Time System Kernel. This provided the basic initialisation of the environment (because the SharedCLibrary has to set up environment handlers and the like) and managed the language entry points. The 'RTSK' was intended to be able to handle multiple different languages. For example, a Pascal compiler had a different language library to that used by C, but would still use the same 'RTSK' to manage the environment, and would be able to be used with C code in the same program. It was all a pretty good idea, and didn't really get used.

Above this, there was the C library itself, which used the RTSK for many of its operations. RTSK symbols were compiler/library based, so began with a single underscore '_', whereas the C library symbols were not prefixed. Calling conventions for the C library were heavily based around APCS, whereas the calling conventions for some of the RTSK routines were not as tightly controlled.

Above the C library would be RISC_OSLib, when SharedCLibrary was built into a ROM module - but most of the time you could ignore it. The distinction of the layers matters only when you are thinking about what relies on what - you cannot call from the RTSK into the core C library, but the opposite is quite acceptable.

As the components had been separated, and the version of (for example) the SharedCLibrary module would not indicate the underlying version of the components, I introduced new system variables ('SharedCLibrary$Version$library') to indicate the version of the components within the library. This should allow applications to detect the different versions of the internal components of the library as necessary.
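Reading one of these variables from C might look something like the sketch below. It assumes that the trailing 'library' in the variable name stands for a component name (which is passed in rather than guessed here), and that getenv reads system variables, as the RISC OS C library is understood to do. The helper names version_variable_name and component_version are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "SharedCLibrary$Version$<component>" into 'out'.
   Returns 1 if the whole name fitted, 0 otherwise. */
static int version_variable_name(const char *component, char *out, size_t outsize)
{
    int written;

    assert(component != NULL && out != NULL);
    written = snprintf(out, outsize, "SharedCLibrary$Version$%s", component);
    return written >= 0 && (size_t)written < outsize;
}

/* Look up the version of one component.
   Returns NULL if the name is too long or the variable is not set. */
static const char *component_version(const char *component)
{
    char name[64];

    if (!version_variable_name(component, name, sizeof name))
        return NULL;
    return getenv(name);
}
```

A NULL result would simply mean the variable is not set, for example because the library in use does not publish component versions this way.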

Similarly, a program could itself read the versions of the library through the '_clib_version' entry point. This was extended slightly to provide the versions of each of the components, separated by newlines. The format was explicitly defined in the documentation, so it would be parsable in the future.
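Picking one entry out of such a newline-separated string is straightforward. The sketch below only does the splitting; it makes no assumption about what each line contains, since the documented format is not reproduced here, and the helper name version_line is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the index'th newline-separated line of 'text' into 'out'.
   Returns 1 on success, 0 if there is no such line or it does not fit. */
static int version_line(const char *text, size_t index, char *out, size_t outsize)
{
    const char *start = text;
    size_t len;

    assert(text != NULL && out != NULL);

    /* Step over 'index' newlines to reach the start of the wanted line. */
    while (index > 0) {
        const char *newline = strchr(start, '\n');
        if (newline == NULL)
            return 0;
        start = newline + 1;
        index--;
    }

    /* The line runs up to the next newline or the end of the string. */
    len = strcspn(start, "\n");
    if (len + 1 > outsize)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Line 0 would be the familiar library version line, and later indices would give the component entries, in whatever order the documentation defines.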

Because Pace had changed the behaviour of the 'jmpbuf', I had a choice about whether to do so as well, or not. Since making that change would break every single application that manipulated the structure, it was an easy choice to not do so - despite this making the module vary from that which Pace had produced. Simple choice - break all existing applications to match how they had broken things, or keep things working and ensure that there is a way in which to distinguish the behaviour.

The !Oregano browser had been updated to match the behaviour of the Pace implementation, which meant that it broke when run with the legacy implementation. There was another choice; change the use of 'jmpbuf' depending on whether the code was invoked in 32bit or 26bit modes. This option would have fixed the run time problems with !Oregano but would have meant that recompiling perfectly working plain C code from 26bit to 32bit would no longer work - which didn't seem acceptable.

My decision there was instead to find a way of indicating that the behaviour had changed. The '_clib_version' string gave a good way to do so. In the earlier versions, the string would be something like 'Shared C Library vsn 4.67/R [Mar 17 1997]'. The '/R' indicated that the module supported the APCS-R register bindings. This had (I believe) been dropped by the Pace C library, which made it useful as an indication of the support. So - confusingly (as if it wasn't already confusing enough) the inclusion of the '/R' could be taken to mean that the 'jmpbuf' used the APCS-A register order - and when absent, the order was that of the APCS-R registers. Of course, if Pace had just reverted the change so that everything worked the same it would have made things a lot easier.
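A client that pokes around inside the 'jmpbuf' could apply that rule with a check along these lines. This is a minimal sketch under the assumption that looking for '/R' anywhere on the first line of the version string is good enough; the function name is hypothetical, and how the string is obtained from '_clib_version' is left to the caller.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the first line of the version string carries the "/R"
   marker, which by the rule above means the 'jmpbuf' uses the APCS-A
   register order; returns 0 if the marker is absent. */
static int jmpbuf_is_apcs_a_order(const char *version)
{
    size_t first_line;
    const char *marker;

    assert(version != NULL);

    first_line = strcspn(version, "\n");   /* length of the first line */
    marker = strstr(version, "/R");        /* earliest "/R" anywhere   */

    /* Only a marker that starts within the first line counts. */
    return marker != NULL && (size_t)(marker - version) < first_line;
}
```

Given the example string 'Shared C Library vsn 4.67/R [Mar 17 1997]' the check reports the APCS-A order; a string with no '/R' on its first line reports the APCS-R order.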

As usual, I passed these details, and the details of the system variables to Castle so that their implementation could be aligned.

C99

Before we get into the need for the 32bit libraries, there is a secondary issue with the later versions of the SharedCLibrary which Pace made available - it supports the C99 specification, which means that there are more functions and some of the operations are slightly different. Starting with the extra functionality provided by C99, there are two different layers to them. There are the functions that the C library uses to implement the fundamental requirements of the standard, and the functions that are specific to C.

The former are, logically, reusable by other environments, not specific to how the C compiler works. The latter only make sense within the C language. As such, I labelled them 'C99Low' and 'C99'. I did wonder whether it might have been better to use a name like 'Kernel99', as they extended the RTSK routines which had been labelled as 'Kernel', but didn't really worry that much about it.

Anyhow, the former provides functions for dynamic stack extension, and operations on 64bit integers. The latter provides many of the new arithmetic functions, string operations, and a variety of miscellaneous calls that were applicable only to C99. The separation of these functions was made by Pace, so I just had to follow their lead - I might not have put things like 64bit division in the higher level library, but I couldn't make that call.

The calling conventions for these routines were sometimes slightly different to that of APCS, so when the APIs were documented they needed to state what the conventions were. I am pretty sure that Pace hadn't actually documented the API calls at all, and the conventions mattered a lot if you were producing the SharedCLibrary which had to write veneers to match the correct API.

Obviously the functions for C99 operations had to be implemented. In many cases they were quite simple functions, and having the Pace reference to work against helped a little. It also showed up a number of bugs in their implementation - which were reported (to Castle at that time), and which were on occasion fixed. I am pretty sure that at least '_Exit', the 'jmpbuf' handling which was broken, and the terrifying problems with 26bit modules running with their 32bit module were never addressed, but my knowledge is now over 6 years old, so I imagine that they could have addressed those issues. I can imagine some pretty outlandish things if I try.

The issues that I had found whilst comparing the implementations prompted me to institute checks to ensure that the replacement module was not loaded. In retrospect it was a big mistake, but I guess we have to learn the hard way. Initially the module was prevented from running, but this was a bad move. Eventually, I settled on the solution of just raising an error box (out of line, so that it didn't prevent the operation) to warn users when a dangerous version of the SharedCLibrary was loaded. I doubt anyone took it that seriously because of the previous behaviour to disable the module - and my continued statements that loading the module was unwise. Oh well.

In any case, there were functions like the printf family which had changed to allow a number of new types. These had to be extended so that we could (for example) print the 64bit integers and the like. In particular the 64bit integer support was optimised so that it would only actually take place in 64bit if there was a need to. As calculations on these larger values were more complex, they could take longer to perform so if they could just use the 32bit variants they would. I don't know if it was a worthwhile improvement, but it certainly made me feel a lot better about the code.
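The idea of dropping down to 32bit arithmetic can be sketched as below. This is not the library's code, just an illustration of the technique for unsigned decimal output: if the value fits in 32 bits, the digit loop runs on a 32bit quantity, and only otherwise on the full 64bit one. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write decimal digits of 'value' backwards from 'end'; return first digit. */
static char *digits32(uint32_t value, char *end)
{
    do {
        *--end = (char)('0' + (int)(value % 10u));
        value /= 10u;
    } while (value != 0);
    return end;
}

/* Same loop, but every division is a (slower) 64bit operation. */
static char *digits64(uint64_t value, char *end)
{
    do {
        *--end = (char)('0' + (int)(value % 10u));
        value /= 10u;
    } while (value != 0);
    return end;
}

/* Format 'value' as decimal into 'out' (at least 21 bytes).
   Returns the number of digits written, excluding the terminator. */
static size_t format_u64(uint64_t value, char *out)
{
    char tmp[20];                      /* 2^64 - 1 has 20 decimal digits */
    char *end = tmp + sizeof tmp;
    char *start;
    size_t len;

    assert(out != NULL);

    /* Take the cheap path whenever the top 32 bits are clear. */
    if (value <= UINT32_MAX)
        start = digits32((uint32_t)value, end);
    else
        start = digits64(value, end);

    len = (size_t)(end - start);
    memcpy(out, start, len);
    out[len] = '\0';
    return len;
}
```

On a 32bit ARM, where a 64bit divide is a call into a helper routine rather than a single instruction, avoiding it for the common small values is where any saving would come from.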

32 bit

Actually making the code 32bit safe was a lot of fun, and involved a lot of testing in many different environments. When built for a 26bit environment, the module supports 26bit (application and module), 32bit (application and module) and APCS-A clients. When built for a 32bit environment, the module supports only 32bit clients. In all cases (except APCS-A) the module has to work in both USR and SVC mode. So, in the 26bit environment, the module generates veneers for use with the 32bit clients so that the clients operate correctly despite the calling conventions differing - this is similar to the extant interfaces for the APCS-A handling.

This is made more fun by the fact that some of the newer C99 functions had different calling conventions, and the generated veneers had to be special cased so that they were generated properly. The '32bit on 26bit' case is the most complicated to implement. The number of different combinations multiplies up to make testing very difficult in those cases.

Many of the lower level operations in the Kernel handling had to be updated to make them work correctly on 32bit, where flag preservation is a little more fiddly - you have to explicitly preserve flags on entry, rather than their being implicitly preserved in the program counter.

I remember also having fun when I overzealously converted some code wrongly whilst working on some of the routines. Assuming that I had made a flag preservation error, or maybe unbalanced a stack, it took an hour or so to spot that I'd incorrectly read an instruction that used 'r1,v1,#PSRBits' as 'r1,r1,#PSRBits' - notice the use of 'v1' in the former, that being a register binding for 'r4'.

Rationalisation

Once the SharedCLibrary was working in both 32bit and 26bit modes, and with 32bit components in 26bit modes, and in USR and SVC modes, and in ROM and RAM, there were some other things to do to make the library sane. The testing for the above cases was really not all that fun and the amount of time I spent in the abort handlers in each of the modes, trying to check that everything still worked, is not worth recounting. But there were other things to do. Other exciting things.

Like removing the support for the BBC and Brazil from the library - there was still support in the module for the legacy systems which wasn't actually doing much harm but... well, it was 2006, and maybe time to let go of the BBC. If I were to build the C library for the BBC I think I might have other issues to contend with. Similarly, the clock was updated to use the SWI OS_ReadMonotonicTime call, rather than the BBC interval timer. Just thinking about it now, I am amazed it survived that long!

Error handling

Error handling was improved so that it understood more about the desktop. If the C library could detect that we were in a safe desktop state, it would use a Wimp error box to report errors, rather than just printing the message to a text window. The error box added a 'Postmortem' button so that the extra information could be obtained. Although the idea came from a similar use that Pace's SharedCLibrary had, the way it was implemented was significantly different.

The library would try to place the system into a safe state, disabling any print jobs, restoring output to the screen and disabling any special redirection. These were significant improvements for cases that could cause hangs, or crashes when an application reported an error.

The postmortem itself was also updated. The formatting was tidied up significantly, and the output of pointers made much clearer. Services are issued to indicate that a postmortem is required, so that other modules can provide more information if they know how. The idea was that a debugger might be triggered at that point, a more clever backtrace could be produced, or a crash dump could even be emailed to the author. The service was used by the DiagnosticDump module to produce a dump file which you could manually send on (see the earlier rambles about Debugging).

The backtrace dump would be more careful about what memory it read. As new calls (SWI OS_Memory 24) had been added to check memory accesses with more care than the old call (SWI OS_ValidateAddress), we could be safer about the writing of the backtrace. Any addresses that pointed at memory that existed but was memory mapped registers would no longer be used as part of the backtrace dump.
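The shape of such a guarded walk is roughly as follows. This is a sketch only: the frame layout is deliberately simplified and is not the real APCS frame, and the check is passed in as a callback where the library would ask the OS (through SWI OS_Memory 24, as described above). All of the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stack frame: a link to the caller and a return address. */
typedef struct frame {
    const struct frame *caller;
    uintptr_t return_address;
} frame;

/* Callback which says whether 'length' bytes at 'address' are safe to read. */
typedef int (*readable_fn)(const void *address, size_t length, void *context);

/* Example checker: only memory inside one known region is readable. */
typedef struct {
    uintptr_t base;
    size_t size;
} region;

static int in_region(const void *address, size_t length, void *context)
{
    const region *r = context;
    uintptr_t a = (uintptr_t)address;

    return a >= r->base && length <= r->size &&
           (a - r->base) <= r->size - length;
}

/* Walk the frame chain, checking each frame before it is read.
   Stops at the first frame that fails the check. Returns the number
   of return addresses stored in 'out'. */
static size_t walk_frames(const frame *fp, readable_fn readable, void *context,
                          uintptr_t *out, size_t max)
{
    size_t count = 0;

    assert(readable != NULL && out != NULL);

    while (fp != NULL && count < max) {
        if (!readable(fp, sizeof *fp, context))
            break;                      /* never touch unchecked memory */
        out[count++] = fp->return_address;
        fp = fp->caller;
    }
    return count;
}
```

The important property is that a corrupt frame pointer ends the backtrace early rather than causing a second abort, or a read from a hardware register, in the middle of reporting the first problem.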

As well as the backtrace, I also added a register dump for the point at which the failure occurred, which might help isolate problems. It seemed like a bit of an oversight in the original implementation. The internal stack unwind code was a little amusing in that it would never have unwound the stack properly - it still believed that the registers were using APCS-A bindings, so didn't return useful registers whilst performing the unwind. It's a little amusing that it wasn't spotted as a bug long before, but that was fixed along the way as well.

The function names that are printed during the backtrace got special treatment through a service call. The CFrontDemangler module picked up the service and would decode any C++ mangled function names into their more readable forms. That made the backtraces from CFront compiled code a little easier to understand at least. When !Flash crashed, you could vaguely understand what method it was in <smile>.

Two of the fatal errors that would have caused the SharedCLibrary to exit without any useful information were also improved. When a duplicate free() call was made, and this was explicitly detected as such, a regular error and backtrace would be produced - in general such cases were safe to report with a backtrace, so I decided that it would be more useful to produce the extra detail there. If there was some other corruption, the system would catch it during the handling, and that had also been tested quite a bit.

The other fatal error was stack exhaustion - which would previously result in a simple message saying that the stack overflowed, and no further information. By keeping a special stack exhaustion area available, the library ensured that it could always handle such a case and produce a useful backtrace. Invariably such backtraces were huge, because there was a large recursion in them, but at the very least the information would be presented and might aid in debugging. Obviously, if there had been no backtrace it would have not been so helpful. The size of the stack that was used was based on the initial stack size that the application requested, which meant that it could end up being a lot larger than you might expect, but it seemed like a reasonable choice.

The change in behaviour for stack exhaustion meant that you could actually write SIGSTAK handlers, where previously they would be very unreliable, and you would have to use longjmp almost immediately in order to be safe - an operation that would most likely destroy any hope of recovering the backtrace.

Sub-task invocation

As mentioned in the earlier Program Environment ramble, the sub-task invocation using system() was completely rewritten. The old way of handling the shifts in memory was so unreliable that it wasn't safe to close TaskWindows whilst programs were running. Similarly, pressing Escape whilst the shift was in place was made a lot safer as well. There was a lot of messing around testing the different cases for the code to move memory around.

Significantly, during the new operations, interrupts were always left enabled, which meant that just invoking another program wouldn't cause the system to stall for a moment or two whilst memory was shifted around. Really applications shouldn't need to mess with the interrupt state - they should be actively prevented from doing so - so this was a lot better.

A lot of the debug involved inserting instructions that would cause the code to break at critical points, and then checking that the application and system recovered successfully. I am pretty certain that I covered all the cases, whether triggered by aborts, interrupts, Escape or a rogue SWI OS_Exit (as would be used by the WindowManager if you were to terminate the application).