Tuesday, December 6, 2016

fork() on Windows?

See also:
   http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
   http://www.evanjones.ca/fork-is-dangerous.html

 Perhaps I need not say anything more.

 People often ask me about fork() on Windows.


 It has always been there.


 It is documented in "Inside Windows NT" that the NT kernel developers
  were faced with providing a kernel that could support OS/2, Posix, Win16, and Win32.
  Rather than implement four things, or various special cases, they implemented
  essentially the union of what they needed. [citation needed].

  (For example, where Posix wanted case sensitivity and the others did not, case sensitivity is a flag when opening objects. Another example, which I lack details on, is unusual OS/2 "mutex" properties, which led to the NT kernel "mutex" being called a "mutant", since it supports OS/2's requirements.)

 Therefore, fork has been there all along.

 The Posix subsystem used it.

 The later Interix/SFU/SUA product used it. (I ported some code both to Cygwin and Interix, and fast/reasonable fork() made the Interix port much more pleasant to use.)

 No doubt the current WSL (Windows Subsystem for Linux) uses it.

 So, great, it is there. I can just use it?

 Not so fast.

 There are problems with using it.
 So Windows isn't very good or something?
 People are quick to jump to strong negative conclusions.
 Just what are the problems?

 There are at least two problems.
 The first is that it isn't exposed via a public API.

 But that isn't the larger problem.

 The larger problem is that, in my opinion, to a large number of reasonable programmers
 and a large amount of reasonable code, the semantics of fork() are weird, and things won't work.

 Again, we wonder: Bad programmers? Bad assumptions? Bad code?
 I don't think so.


 Backing up for a short time.
 Consider that most uses of fork are fork + exec.
 This is also known as spawn or posix_spawn or CreateProcess.
 This works fine -- just jump ahead to spawn/CreateProcess.
 If you call fork, and you are going to call exec, the system
 can't know whether you are going to call exec or not, so it has to
 set up a complete forked child either way.
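
 For the fork + exec case, here is a minimal sketch of the spawn route on a Posix system.
 The helper name run_command is made up for illustration, and it assumes /bin/sh exists;
 posix_spawn and waitpid are the real calls.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run `cmd` through /bin/sh without an explicit fork().
   Returns the child's exit status, or -1 if it could not be started or
   did not exit normally. */
int run_command(const char *cmd)
{
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid;
    int status;

    /* posix_spawn returns 0 on success and an error number otherwise. */
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 The caller states up front that it wants a new program, so nothing here depends on
 what a half-copied process looks like between fork and exec.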


 This leaves us with, what is fork without exec for?
 It is for "forking servers", that could be multi-threaded servers,
 but might want to trade performance for separation/reliability.

 It is for threads prior to the existence of pthreads. Use pthreads.


 It is, I suspect, for creating unused children to diagnose the parent.
 For example, dumping what files it has open.
 "Reliable walk of handle table of live process is racy. Reliable walk of handle table of non-executing forked child is not."

 So, I claim, fork without exec does have some use cases, but
 the overwhelming use is fork + exec, which you can do otherwise.
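
 As a rough sketch of the diagnostic use, assuming a Posix system: fork a child that
 does nothing but inspect its own copy of the descriptor table and report back.
 The name count_open_fds_via_fork is invented here, and the 0..254 limit exists only
 because the count is returned through the 8-bit exit status.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: count the caller's open descriptors by examining a
   forked, otherwise idle copy of the process. Only descriptors 0..254 are
   checked. Returns -1 on failure. */
int count_open_fds_via_fork(void)
{
    pid_t child = fork();
    int status;

    if (child < 0)
        return -1;
    if (child == 0) {
        /* The child's descriptor table was copied at fork time, and nothing
           else runs in the child to change it while we walk it. */
        int count = 0;
        for (int fd = 0; fd < 255; fd++)
            if (fcntl(fd, F_GETFD) != -1)
                count++;
        _exit(count);
    }
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 The parent keeps running while the child walks its frozen copy, which is the point of
 the quote above.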


 Ok, going back to what is wrong with fork.


 Let's imagine -- this isn't true, but let's imagine -- that getpid
 or GetCurrentProcessId() is slow. In reality it is approximately two instructions
 on Windows -- disassemble kernel32!GetCurrentProcessId. But let's assume
 the opposite. This is a valid thought exercise, as it leads us into
 real world patterns.



 So, we want to mitigate the assumed slowness of GetCurrentProcessId().


 We write:

   #include <windows.h>

   static DWORD fast_pid; // 0 means "not cached yet"

   DWORD FastGetCurrentProcessId(void)
   {
      DWORD pid = fast_pid;
      if (pid == 0)
         fast_pid = pid = GetCurrentProcessId(); // incorrectly assumed slow
      return pid;
   }

 Note: This is correct and thread safe and everything.
 It assumes a pid of 0 isn't valid, which is incorrect, but it works anyway, just slower, for pid == 0.
 In a race condition where FastGetCurrentProcessId is called at about the same
 time on multiple threads, that is ok, they all do the same thing, just all "slowly".


 Now, enter fork().

 In the face of fork(), this function is incorrect.
 Its cache of the pid will survive fork() and be incorrect.
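
 Here is a sketch of the same pattern on a Posix system, where fork() is available to
 demonstrate the failure. The names cached_getpid and child_sees_stale_pid are made up
 for this illustration; the child reports what it saw through its exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t cached_pid; /* 0 means "not cached yet" */

/* Same compute-once-and-cache pattern as above, using getpid(). */
pid_t cached_getpid(void)
{
    pid_t pid = cached_pid;
    if (pid == 0)
        cached_pid = pid = getpid();
    return pid;
}

/* Returns 1 if a forked child gets the parent's pid back from the cache,
   0 if it does not, and -1 on error. */
int child_sees_stale_pid(void)
{
    pid_t parent = cached_getpid(); /* warm the cache in the parent */
    pid_t child = fork();
    int status;

    if (child < 0)
        return -1;
    if (child == 0) {
        /* The cache was copied along with the rest of the globals. */
        int stale = (cached_getpid() == parent) && (getpid() != parent);
        _exit(stale ? 1 : 0);
    }
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 Nothing in cached_getpid is wrong by itself. It only becomes wrong once something
 else in the program calls fork().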


 Now, while this function is ridiculous, in that it optimizes something that isn't slow, it is representative of a reasonable pattern:
   Compute something once; cache it; the result might be process-dependent.
 (And by the way the result might be thread or other-dependent, and still the cache
 will be used across later calls -- be aware and be careful.)


The more general thing is the assumption that a "process"'s globals
are "initialized" "as expected" at the start of the "process" (or when a dll loads, rather).
For some definition of "process".

This assumption, which is a reasonable and useful idea to bake into your
mental model, permeating much of your code, is violated by fork().

But, it gets worse.

If you read various man pages and standards, you discover more problems that fork()
brings to the unsuspecting coder. [citation needed]

Apparently at some point, a question was raised as to how fork() interacts with threads created by pthread_create.
Does the new process contain just one thread, the one that called fork(), or all the threads?

The decision was apparently so unclear that Solaris has fork1 and forkall, so both options
are available to the explicit caller, and fork()'s behavior has varied between them
depending on operating system version (changed in 5.10 to be fork1()).


The Posix standard mandates "fork1".
[citation needed -- try "Open Group Posix standard fork", and for that matter,
read their documentation of pthread_atfork and see the references to forkall]


But wait, there is more.


These days, we have threads, we use them.
Arbitrary libraries loaded into our process might create worker threads on their own behalf.
This is a fine thing for them to do. It is a buried implementation detail.


And then you come along and fork. What happens?


It depends. What were the worker threads in the middle of doing?
Did they hold any locks?


Lock holding may or may not be inherited by the new process, but only one thread
will be inherited. So inheriting held locks leads typically to deadlock.
The thread holding the lock is gone and will never release it.
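
The situation can be sketched as follows, assuming Linux with pthreads and unnamed
Posix semaphores. The function name child_inherits_held_lock is invented for this
example. The child uses pthread_mutex_trylock rather than pthread_mutex_lock so that
it reports the problem instead of hanging.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;  /* posted by the worker once it holds the lock */
static sem_t may_release; /* posted by the caller when the worker may unlock */

/* Stands in for a library's hidden worker thread that is mid-operation. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    sem_post(&lock_taken);
    sem_wait(&may_release);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 1 if the forked child finds the mutex already locked,
   0 if it does not, and -1 on error. */
int child_inherits_held_lock(void)
{
    pthread_t tid;
    pid_t child;
    int status = 0, result = -1;

    sem_init(&lock_taken, 0, 0);
    sem_init(&may_release, 0, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&lock_taken); /* the worker now holds the lock */

    child = fork();
    if (child == 0) {
        /* Only the forking thread exists here. The worker that owns the
           lock was not copied, so nobody in this process will unlock it. */
        _exit(pthread_mutex_trylock(&lock) == EBUSY ? 1 : 0);
    }
    if (child > 0 && waitpid(child, &status, 0) == child && WIFEXITED(status))
        result = WEXITSTATUS(status);

    sem_post(&may_release); /* let the parent's worker finish */
    pthread_join(tid, NULL);
    return result;
}
```

Replace the trylock with a plain pthread_mutex_lock and the child would wait forever.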


Apple's documentation covers this problem, in a fashion, by saying
that if you fork without exec, and use almost any of their libraries,
it is unsupported. [citation needed]


Posix covers this also in a fashion.
They provide the function pthread_atfork. [citation needed]
This takes three callback functions -- ParentBeforeFork, ParentAfterFork, ChildAfterFork (Posix names the parameters prepare, parent, and child).


The idea is, in the parent before the fork, acquire all locks (blocking).
In the parent and child after the fork, release all locks.
And in the child, reinitialize globals such as "fast_pid" to zero?
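
A minimal sketch of that idea, assuming a Posix system with pthreads. Only
pthread_atfork itself is the real API; the handler names, state_lock, cached_pid and
the two helper functions are invented for illustration.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pid_t cached_pid; /* 0 means "not cached yet" */

/* Parent, before the fork: take the lock so no other thread holds it. */
static void before_fork(void) { pthread_mutex_lock(&state_lock); }

/* Parent, after the fork: release the lock again. */
static void after_fork_parent(void) { pthread_mutex_unlock(&state_lock); }

/* Child, after the fork: drop process-dependent state, then release. */
static void after_fork_child(void)
{
    cached_pid = 0;
    pthread_mutex_unlock(&state_lock);
}

int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

pid_t cached_getpid(void)
{
    pid_t pid;
    pthread_mutex_lock(&state_lock);
    if (cached_pid == 0)
        cached_pid = getpid();
    pid = cached_pid;
    pthread_mutex_unlock(&state_lock);
    return pid;
}

/* Returns 1 if a forked child gets its own pid from the cache,
   0 if it gets a stale one, and -1 on error. */
int child_cache_is_fresh(void)
{
    pid_t child;
    int status;

    (void)cached_getpid(); /* warm the cache in the parent */
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(cached_getpid() == getpid() ? 1 : 0);
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Call install_fork_handlers() once, early. Note that this only covers state whose
owner registered handlers; every library in the process has to do the same for its
own locks and caches.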


So apparently the problem is solved, but Apple did not bother to use the solution.
Who else does not?


So, in answer to the question: Yes, Windows has fork. You can't really use it,
and you don't really want to. On systems that leave it for you to use, be careful.