Monday, November 29, 2010

AppDomains and Shadow Copy


In .NET, when a user (or service) starts an application, the file system locks the corresponding assemblies on disk (since they’re loaded into memory from those locations).  For that application to load or start additional applications or plugins without locking their assemblies, it needs to use a .NET feature called “Shadow Copying”.  Individual assemblies cannot be unloaded from memory while an application is running, so these additional applications or plugins need to be hosted in their own AppDomains: any AppDomain can be unloaded, together with the assemblies loaded into it, with the exception of the primary AppDomain (which only unloads when the application exits).

When an application is started by a user (or service), the CLR automatically creates the default AppDomain and loads the application’s assemblies into that AppDomain.  When the application exits, the main AppDomain is torn down, along with any additional AppDomains that it may have created.

The main application can create additional AppDomains to run other applications or plugins, but great care must be taken not to inadvertently load unwanted assemblies into the main AppDomain.  If code in the main AppDomain references, or obtains an instance of, a Type defined in an assembly that is loaded only in a secondary AppDomain, the CLR will automatically attempt to load that Type’s containing assembly into the main AppDomain.  To avoid this, the type must be created in the secondary AppDomain by name (represented as a string), and the main AppDomain may only reference the resulting remote object through a transparent proxy typed as a base class or interface already known to the main AppDomain (i.e., defined in an assembly already loaded in the main AppDomain).  Additionally, any object marshaled across AppDomain boundaries must be one of two kinds.  It can be serializable (such as System.String and other types marked as [Serializable]), in which case the object’s data is copied and a new instance containing equivalent data (an exact copy) is created in the receiving AppDomain.  Or it can inherit from MarshalByRefObject (directly or indirectly), in which case the actual object remains in its original AppDomain and a serializable ObjRef (object reference) is copied to the receiving AppDomain, from which the .NET Remoting infrastructure creates a transparent proxy.  Note that the main AppDomain must also have access to the metadata of the types from which it creates transparent proxies and deserialized copies; it is not enough to simply mark a class as [Serializable], implement ISerializable, or inherit from MarshalByRefObject.
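
To make the two marshaling behaviors concrete, here is a minimal sketch for the .NET Framework.  All of the type names (IGreeter, DomainInfo, RemoteGreeter) are hypothetical, and for brevity everything lives in one executable, so the "plugin" assembly is the host assembly itself; a real host would keep the implementation in a separate assembly.

```csharp
using System;
using System.Reflection;
using System.Runtime.Remoting;

// Hypothetical contract, known to the main AppDomain.
public interface IGreeter
{
    string WhereAmI();
    DomainInfo Describe();
}

// Marshaled by value: callers in another AppDomain receive a copy.
[Serializable]
public class DomainInfo
{
    public string DomainName;
}

// Marshaled by reference: the instance stays in the AppDomain that created it.
public class RemoteGreeter : MarshalByRefObject, IGreeter
{
    public string WhereAmI()
    {
        return AppDomain.CurrentDomain.FriendlyName;
    }

    public DomainInfo Describe()
    {
        DomainInfo info = new DomainInfo();
        info.DomainName = AppDomain.CurrentDomain.FriendlyName;
        return info;
    }
}

public static class Program
{
    public static void Main()
    {
        AppDomain secondary = AppDomain.CreateDomain("Secondary");
        try
        {
            // Create the object by name (strings) and talk to it only
            // through the interface.
            IGreeter greeter = (IGreeter)secondary.CreateInstanceAndUnwrap(
                Assembly.GetExecutingAssembly().FullName, "RemoteGreeter");

            // The main AppDomain holds a transparent proxy, and the call
            // executes inside the secondary AppDomain.
            Console.WriteLine(RemotingServices.IsTransparentProxy(greeter));
            Console.WriteLine(greeter.WhereAmI());

            // The serializable result is a local copy, not a proxy.
            DomainInfo copy = greeter.Describe();
            Console.WriteLine(RemotingServices.IsTransparentProxy(copy));
            Console.WriteLine(copy.DomainName);
        }
        finally
        {
            AppDomain.Unload(secondary);
        }
    }
}
```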

The practical upshot of this is that at least one assembly, let’s call it “Common”, must be shared between (and loaded by) both the main AppDomain and any other AppDomains that it creates.  Defined in the Common assembly are the interfaces, abstract MarshalByRefObjects, and serializable types that are used to communicate with the secondary AppDomains hosting additional applications and plugins.  The functionality provided by these additional AppDomains is exposed by implementing interfaces (while also inheriting from MarshalByRefObject) or deriving from the abstract MarshalByRefObjects that are defined in Common.  The “abstractness” of the MarshalByRefObjects defined in Common is not strictly necessary, but if the main application provides the concrete functionality in the first place, there’s no benefit (and a number of drawbacks) to hosting it in a secondary AppDomain.
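
As a rough illustration, a Common assembly along these lines might contain little more than the following.  The names (IPlugin, PluginBase, PluginInfo) are made up for this sketch and are not taken from any real library.

```csharp
using System;

namespace Common
{
    // Data passed across the AppDomain boundary by value (copied).
    [Serializable]
    public sealed class PluginInfo
    {
        public string Name { get; set; }
        public string Version { get; set; }
    }

    // Contract the main AppDomain programs against.
    public interface IPlugin
    {
        PluginInfo Describe();
        void Run(string[] args);
    }

    // Optional convenience base class: plugin authors derive from it and
    // inherit MarshalByRefObject, so instances are reached through a proxy.
    public abstract class PluginBase : MarshalByRefObject, IPlugin
    {
        public abstract PluginInfo Describe();
        public abstract void Run(string[] args);
    }
}
```

A plugin assembly would then reference Common and derive from PluginBase (or implement IPlugin while inheriting from MarshalByRefObject), and the host would only ever cast the unwrapped proxy to IPlugin or PluginBase.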

There are basically three methods by which the Common assembly can be shared between the main AppDomain and any secondary AppDomains.

The first method, used by the Framework itself (specifically System.Web and ASP.NET), is to install the Common assembly to the Global Assembly Cache (GAC).  This method allows both the main AppDomain and its secondary AppDomains to load the same types and access the same assembly metadata from a common location known to the Framework and CLR, and eliminates the need to copy the assembly into each secondary AppDomain’s ApplicationBase (the directory from which the secondary application or plugin loads its own assemblies).

The second method, as mentioned, is to copy the Common assembly to the ApplicationBase of each of the secondary AppDomains, so that the CLR’s class loader can locate the assembly when one of the types defined there is loaded into the secondary AppDomain.

The third method is to define the secondary AppDomain’s ApplicationBase to be the same as the ApplicationBase of the main AppDomain, but provide a (relative, child) PrivateBinPath when creating the secondary AppDomain pointing to the directory in which the secondary AppDomain’s assemblies can be found.  Note that the PrivateBinPath must be relative to the main ApplicationBase and must reference a child of the main application’s directory (it cannot be something like “..\Plugins”).  ASP.NET uses the PrivateBinPath in this manner for the AppDomains that it creates: the ApplicationBase references the directory containing the web.config file, the PrivateBinPath references the “bin” directory beneath it, and the Common information (System.Web) is loaded from the GAC.  This kind of AppDomainSetup allows the CLR to load types defined in assemblies from the AppDomain’s ApplicationBase and PrivateBinPath as well as the GAC.
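
A minimal sketch of the third method might look like this.  The class name and the "Plugins" directory in the usage note are hypothetical; the AppDomainSetup properties and the AppDomain.CreateDomain overload are the standard .NET Framework ones.

```csharp
using System;

public static class PluginDomainFactory
{
    // relativePluginDir must name a child of the host's ApplicationBase,
    // for example "Plugins" (never something like "..\Plugins").
    public static AppDomain Create(string friendlyName, string relativePluginDir)
    {
        AppDomainSetup setup = new AppDomainSetup();

        // Same base directory as the main AppDomain, so the Common assembly
        // sitting next to the host executable resolves in both AppDomains.
        setup.ApplicationBase = AppDomain.CurrentDomain.BaseDirectory;

        // Additional probing directory, relative to ApplicationBase.
        setup.PrivateBinPath = relativePluginDir;

        // No explicit security evidence is supplied in this sketch.
        return AppDomain.CreateDomain(friendlyName, null, setup);
    }
}
```

Calling PluginDomainFactory.Create("PluginDomain", "Plugins") would give an AppDomain that probes both the host's directory and its Plugins subdirectory.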

But what about shadow copying?

To get a secondary AppDomain’s assemblies to be shadow copied, and thus prevent the file system from locking those files (at least in their original location) while the application is running, simply set the ShadowCopyFiles property to the string value “true” on the AppDomainSetup object used to create the AppDomain.  This, by itself, is enough to enable shadow copying for the AppDomain created from that AppDomainSetup instance, but there are some additional properties that affect how shadow copying works, which directories are affected, and where shadow-copied assemblies go before they’re loaded.

Assemblies that are shadow copied, by default, go to the CLR’s download cache (some subdirectory of %LOCALAPPDATA%\assembly).  This behavior can be changed and you can specify where you want the assemblies to be copied to by setting the AppDomainSetup.CachePath property.  Setting this property only has an effect if the AppDomainSetup.ApplicationName is also set (the ApplicationName is used to create a subdirectory at CachePath where assemblies will be copied).  The default is to copy all private assemblies available to the application, including those in the ApplicationBase and PrivateBinPath.  Assemblies loaded from the GAC are not copied since they’re available to all applications.  To control which directories are shadow copied (for example, to shadow copy only plugin assemblies when using the third method described above), set the AppDomainSetup.ShadowCopyDirectories to the same value as PrivateBinPath so that only the assemblies in that directory are subject to shadow copying, while the assemblies at ApplicationBase are loaded from their original location (and locked by the file system).
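
Putting these properties together, a shadow-copying setup for the third method could be sketched as follows.  The class and parameter names are hypothetical.  One assumption to flag: this sketch passes the full path of the plugin directory to ShadowCopyDirectories, on the belief that the CLR expects rooted directory names there; if the relative PrivateBinPath value works in your environment, it can be used directly as described above.

```csharp
using System;
using System.IO;

public static class ShadowCopySetup
{
    public static AppDomainSetup Build(
        string pluginDirName, string cacheRoot, string applicationName)
    {
        string baseDir = AppDomain.CurrentDomain.BaseDirectory;

        AppDomainSetup setup = new AppDomainSetup();
        setup.ApplicationBase = baseDir;
        setup.PrivateBinPath = pluginDirName;

        // Note that this property is a string, not a bool.
        setup.ShadowCopyFiles = "true";

        // CachePath only takes effect when ApplicationName is also set;
        // copies land in a subdirectory derived from the ApplicationName.
        setup.CachePath = cacheRoot;
        setup.ApplicationName = applicationName;

        // Restrict shadow copying to the plugin directory, so assemblies in
        // ApplicationBase load (and lock) from their original location.
        // Assumption: a full path is used here rather than the relative name.
        setup.ShadowCopyDirectories = Path.Combine(baseDir, pluginDirName);

        return setup;
    }
}
```

The resulting AppDomainSetup is then passed to AppDomain.CreateDomain in the usual way.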

The CLR has no built-in facilities to recycle an AppDomain when the original assemblies have changed on disk.  To achieve an effect similar to ASP.NET’s automatic application recycling, you will need to create the necessary behavior using a FileSystemWatcher watching the original directory or a Timer for periodic application recycling.
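
One way this could be approached is sketched below.  The RecyclingHost class is hypothetical, and the one-second delay is an arbitrary choice of mine to collapse the burst of events that copying several files produces.  The sketch creates the replacement AppDomain before unloading the old one, but it makes no attempt to let in-flight work finish first.

```csharp
using System;
using System.IO;
using System.Threading;

public sealed class RecyclingHost : IDisposable
{
    private readonly object _sync = new object();
    private readonly Func<AppDomain> _createDomain;
    private readonly FileSystemWatcher _watcher;
    private readonly Timer _debounce;
    private AppDomain _current;

    // createDomain is supplied by the caller and builds a configured AppDomain.
    public RecyclingHost(string watchedDirectory, Func<AppDomain> createDomain)
    {
        _createDomain = createDomain;
        _current = createDomain();

        // The timer starts disabled; file events (re)arm it.
        _debounce = new Timer(state => Recycle(), null,
                              Timeout.Infinite, Timeout.Infinite);

        _watcher = new FileSystemWatcher(watchedDirectory, "*.dll");
        _watcher.Changed += OnFileEvent;
        _watcher.Created += OnFileEvent;
        _watcher.Deleted += OnFileEvent;
        _watcher.EnableRaisingEvents = true;
    }

    private void OnFileEvent(object sender, FileSystemEventArgs e)
    {
        // Restart the countdown so a batch of changes causes one recycle.
        _debounce.Change(1000, Timeout.Infinite);
    }

    private void Recycle()
    {
        lock (_sync)
        {
            if (_current == null) return; // already disposed

            AppDomain old = _current;
            _current = _createDomain();   // bring up the replacement first
            AppDomain.Unload(old);        // then tear down the old AppDomain
        }
    }

    public void Dispose()
    {
        _watcher.Dispose();
        _debounce.Dispose();
        lock (_sync)
        {
            if (_current != null)
            {
                AppDomain.Unload(_current);
                _current = null;
            }
        }
    }
}
```

The watched directory should be the original plugin directory, not the shadow copy cache, since it is the originals that get replaced on disk.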

Open Question:

Suppose you have a running AppDomain and wish to create a second AppDomain when the original files have changed on disk (before unloading the first AppDomain) so that you can begin servicing new requests in the new AppDomain while any existing requests are still being completed by the old AppDomain before it’s torn down (similar to IIS’s Application Pool recycling behavior).  If you give both AppDomains the same ApplicationName and both of them originate from the same location (ApplicationBase and/or PrivateBinPath), will there be a shadow copying conflict when the second AppDomain is created?

Saturday, November 13, 2010

Accessing Subversion over SSL

In a previous post, I was trying to get communication with my Subversion repository secured (somehow...either over SSL or using SVN+SSH).  Well, I finally did that...nearly a year ago...and forgot how.

Now, I have to do it again.  Lesson learned: If you ever have to research how to do something, blog about it so that others can benefit and so you can repeat it.

Instead of using the binaries available on Tigris, this time I'm using the CollabNet package (version 1.4.6).  They distribute a very nice and convenient installer that also includes and configures Apache for you (I de-selected the SVNSERVE option during installation because I don't intend to serve the repository through that channel).

This is a test installation that I'm preparing for a demo (still trying to convince my company to dump VSS, and now TFS, in favor of a system that works for all of the technologies, languages, and IDEs that we use).  I instructed the installer to put Apache on "localhost" at port 8080; after all, I'm a Windows developer who works with Web applications, so I need to keep IIS available on port 80.

When the installer completes, open the Services console and start the Apache daemon.  Browse to http://localhost:8080 as a sanity test just to make sure that Apache is, in fact, running and configured correctly (at this stage, at least - we still need to configure a repository and tell Apache where it is).

Next, create a test repository with the following command.  The repository name "test1" isn't important, since it will be deleted once the setup is verified, and the repository path C:\Repositories is the repository root I gave to the CollabNet installer:
svnadmin create C:\Repositories\test1

Now I should be able to access the repository at http://localhost:8080/svn/test1, and "WOO-HOO!!!", I can.  The browser shows me "Revision 0: /", just as it should.

The information found at http://www.neilstuff.com/apache/apache2-ssl-windows.htm (Create Self-Signed Certificate and Enable SSL in Apache 2.0.X) helped me get the SSL working (later I found this as well).  I found that I didn't need to install OpenSSL since the CollabNet distribution came with it (it is in the httpd subdirectory of the installation directory), but I did need an OpenSSL.cnf configuration file (here and here).  I also didn't need to install Apache since that was done by the CollabNet installation.

So now that I am able to access the repository through either http://localhost:8080/svn/test1 or https://localhost/svn/test1, I want to force access through SSL and disallow any access on port 8080.  There's no use serving an encrypted version if the same content will also be served in plaintext.

Edit the httpd/conf/httpd.conf file (relative to the CollabNet installation directory) to contain the following lines:
<Location />
  SSLRequireSSL
</Location>

Now http://localhost:8080 returns Forbidden, but https://localhost gives the Apache installation page.  Great, so everything is encrypted over the wire, but we're still allowing everyone (anonymous) access.  Moving along...

This page gives instructions on setting up the Apache server to use SSPI (Windows) or Basic (password file) authentication, or a combination of both of them working together (e.g., to support internal development staff as well as external contributors for whom you might not want to create Windows accounts).  Note that there is an error in the "Multiple Authentication Sources" section.  The advice given says that both AuthAuthoritative and SSPIAuthoritative should be Off, although the example shown states "SSPIAuthoritative On".  This should, in fact, be "SSPIAuthoritative Off".