Terminology and Concepts

In this chapter we attempt to establish a common terminology to define a solid ground for communicating about concurrent, distributed systems which Akka.NET targets. Please note that, for many of these terms, there is no single agreed definition. We simply seek to give working definitions that will be used in the scope of the Akka.NET documentation.

Concurrency vs. Parallelism

Concurrency and parallelism are related concepts, but there are small differences. Concurrency means that two or more tasks are making progress even though they might not be executing simultaneously. This can, for example, be realized with time slicing, where parts of tasks are executed sequentially and mixed with parts of other tasks. Parallelism, on the other hand, arises when the execution can be truly simultaneous.
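
To make time slicing concrete, here is a minimal, language-neutral sketch in Python (not Akka.NET code). The names task and run_time_sliced are hypothetical. Two tasks make progress on a single thread by taking turns, so they are concurrent without being parallel.

```python
from typing import Iterator, List


def task(name: str, steps: int) -> Iterator[str]:
    """A task that hands control back to the scheduler after every step."""
    for i in range(1, steps + 1):
        yield f"{name}{i}"


def run_time_sliced(tasks: List[Iterator[str]]) -> List[str]:
    """Round-robin scheduler: run one step of each task in turn on one thread."""
    trace: List[str] = []
    pending = list(tasks)
    while pending:
        current = pending.pop(0)
        try:
            trace.append(next(current))
            pending.append(current)  # not finished yet: back of the queue
        except StopIteration:
            pass  # finished: drop the task
    return trace


if __name__ == "__main__":
    print(run_time_sliced([task("A", 3), task("B", 2)]))
    # prints ['A1', 'B1', 'A2', 'B2', 'A3']
```

Both tasks advance although only one step executes at any moment. Parallel execution would require the steps to run on separate cores at the same time.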

Asynchronous vs. Synchronous

A method call is considered synchronous if the caller cannot make progress until the method returns a value or throws an exception. On the other hand, an asynchronous call allows the caller to progress after a finite number of steps, and the completion of the method may be signalled via some additional mechanism (it might be a registered callback, a Future, or a message).

A synchronous API may use blocking to implement synchrony, but this is not a necessity. A very CPU-intensive task might behave much like a blocking one. In general, it is preferred to use asynchronous APIs, as they guarantee that the system is able to progress. Actors are asynchronous by nature: an actor can progress after a message send without waiting for the actual delivery to happen.
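
The difference can be sketched with Python's standard concurrent.futures module. This is an illustration of the concept only, not of how Akka.NET sends messages, and the names square and demo are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def square(x: int) -> int:
    """Stands in for any operation whose result the caller needs."""
    return x * x


def demo() -> List[int]:
    completed: List[int] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Asynchronous call: submit() returns a Future right away.
        future = pool.submit(square, 7)
        # Completion is signalled later through a registered callback.
        future.add_done_callback(lambda f: completed.append(f.result()))
        # Synchronous call: the caller cannot proceed until it returns.
        direct = square(3)
    # Leaving the with-block waits for all submitted work to finish.
    return [direct] + completed


if __name__ == "__main__":
    print(demo())  # prints [9, 49]
```

The caller is free to do other work between submitting the asynchronous call and receiving its result. The synchronous call holds the caller until the value is available.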

Non-blocking vs. Blocking

We talk about blocking if the delay of one thread can indefinitely delay some of the other threads. A good example is a resource which can be used exclusively by one thread using mutual exclusion. If a thread holds on to the resource indefinitely (for example by accidentally running an infinite loop), other threads waiting on the resource cannot progress. In contrast, non-blocking means that no thread is able to indefinitely delay others.

Non-blocking operations are preferred to blocking ones, as the overall progress of the system is not trivially guaranteed when it contains blocking operations.

Deadlock vs. Starvation vs. Live-lock

Deadlock arises when several participants are waiting on each other to reach a specific state to be able to progress. As none of them can progress until some other participant reaches a certain state (a "Catch-22" problem), all affected subsystems stall. Deadlock is closely related to blocking, as it is necessary that a participant thread be able to delay the progression of other threads indefinitely.

In the case of deadlock, no participants can make progress. Starvation, in contrast, happens when there are participants that can make progress, but there might be one or more that cannot. A typical scenario is a naive scheduling algorithm that always selects high-priority tasks over low-priority ones. If the number of incoming high-priority tasks is constantly high enough, no low-priority ones will ever be finished.
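
The scheduling scenario can be simulated in a few lines of Python. This is a hypothetical sketch: run_priority_scheduler and the task names are invented for illustration. One high-priority task arrives on every tick, and the scheduler always runs the highest-priority task available.

```python
import heapq
from typing import List, Tuple


def run_priority_scheduler(ticks: int) -> List[str]:
    """Each tick one high-priority task arrives and exactly one task runs.

    Priority 0 is high and 1 is low; the scheduler always picks the smallest.
    """
    queue: List[Tuple[int, int, str]] = []
    heapq.heappush(queue, (1, 0, "low-priority job"))  # waiting from the start
    executed: List[str] = []
    for tick in range(1, ticks + 1):
        heapq.heappush(queue, (0, tick, f"high-{tick}"))
        _, _, name = heapq.heappop(queue)
        executed.append(name)
    return executed


if __name__ == "__main__":
    print(run_priority_scheduler(3))  # prints ['high-1', 'high-2', 'high-3']
```

However long the simulation runs, the low-priority job is never selected. The system as a whole keeps making progress, which is what distinguishes starvation from deadlock.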

Livelock is similar to deadlock in that none of the participants make progress. The difference, though, is that instead of being frozen in a state of waiting for others to progress, the participants continuously change their state. An example scenario is when two participants have two identical resources available. They each try to get a resource, but they also check whether the other needs that resource, too. If the resource is requested by the other participant, they try to get the other instance of the resource. In the unfortunate case it might happen that the two participants "bounce" between the two resources, never acquiring either, but always yielding to the other.

Race Condition

We call it a race condition when an assumption about the ordering of a set of events might be violated by external non-deterministic effects. Race conditions often arise when multiple threads have a shared mutable state, and the operations of the threads on the state might be interleaved, causing unexpected behavior. While this is a common case, shared state is not necessary to have race conditions. One example could be a client sending unordered packets (e.g. UDP datagrams) P1, P2 to a server. As the packets might potentially travel via different network routes, it is possible that the server receives P2 first and P1 afterwards. If the messages contain no information about their sending order, the server cannot determine that they arrived in a different order than they were sent. Depending on the meaning of the packets, this can cause race conditions.
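
A small Python sketch of the packet example follows. The Packet type and its seq field are assumptions added for illustration and are not part of any real protocol. It shows that the server can restore the sending order only if the sender includes ordering information.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    seq: int      # sending order, stamped by the client (hypothetical field)
    payload: str


def in_arrival_order(packets: List[Packet]) -> List[str]:
    """What a server sees if it trusts the order of arrival."""
    return [p.payload for p in packets]


def in_send_order(packets: List[Packet]) -> List[str]:
    """What a server can reconstruct when packets carry a sequence number."""
    return [p.payload for p in sorted(packets, key=lambda p: p.seq)]


if __name__ == "__main__":
    # P2 overtakes P1 on the network.
    arrived = [Packet(2, "P2"), Packet(1, "P1")]
    print(in_arrival_order(arrived))  # prints ['P2', 'P1']
    print(in_send_order(arrived))     # prints ['P1', 'P2']
```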

Note

The only guarantee that Akka.NET provides about messages sent between a given pair of actors is that their order is always preserved. See Message Delivery Reliability.

Non-blocking Guarantees (Progress Conditions)

As discussed in the previous sections, blocking is undesirable for several reasons, including the dangers of deadlocks and reduced throughput in the system. In the following sections we discuss various non-blocking properties of different strengths.

Wait-freedom

A method is wait-free if every call is guaranteed to finish in a finite number of steps. If a method is bounded wait-free then the number of steps has a finite upper bound.

From this definition it follows that wait-free methods are never blocking, therefore deadlock cannot happen. Additionally, as each participant can progress after a finite number of steps (when the call finishes), wait-free methods are free of starvation.

Lock-freedom

Lock-freedom is a weaker property than wait-freedom. In the case of lock-free calls, infinitely often some method finishes in a finite number of steps. This definition implies that no deadlock is possible for lock-free calls. On the other hand, the guarantee that some call finishes in a finite number of steps is not enough to guarantee that all of them eventually finish. In other words, lock-freedom is not enough to guarantee the lack of starvation.

Obstruction-freedom

Obstruction-freedom is the weakest non-blocking guarantee discussed here. A method is called obstruction-free if there is a point in time after which, provided it executes in isolation (other threads make no steps, e.g. because they are suspended), it finishes in a bounded number of steps. All lock-free objects are obstruction-free, but the opposite is generally not true.

Optimistic concurrency control (OCC) methods are usually obstruction-free. The OCC approach is that every participant tries to execute its operation on the shared object, but if a participant detects conflicts from others, it rolls back the modifications, and tries again according to some schedule. If there is a point in time, where one of the participants is the only one trying, the operation will succeed.
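
The OCC retry loop can be sketched as follows. This is a single-threaded Python simulation with hypothetical names (VersionedCell, optimistic_update, and an interfere hook that plays the role of a competing participant). A real implementation would need try_commit to be atomic, which this sketch does not attempt.

```python
from typing import Callable, Optional, Tuple


class VersionedCell:
    """A shared value guarded by a version counter instead of a lock."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.version = 0

    def read(self) -> Tuple[int, int]:
        return self.value, self.version

    def try_commit(self, new_value: int, expected_version: int) -> bool:
        if self.version != expected_version:
            return False  # conflict: somebody else committed first
        self.value = new_value
        self.version += 1
        return True


def optimistic_update(
    cell: VersionedCell,
    fn: Callable[[int], int],
    interfere: Optional[Callable[[VersionedCell, int], None]] = None,
    max_attempts: int = 10,
) -> int:
    """Apply fn to the cell, retrying on conflict. Returns the attempts used."""
    for attempt in range(1, max_attempts + 1):
        value, version = cell.read()
        new_value = fn(value)
        if interfere is not None:
            interfere(cell, attempt)  # a rival may act between read and commit
        if cell.try_commit(new_value, version):
            return attempt
    raise RuntimeError("gave up after repeated conflicts")


def demo() -> Tuple[int, int]:
    cell = VersionedCell(10)

    def rival(shared: VersionedCell, attempt: int) -> None:
        # A second participant that commits first during attempts 1 and 2.
        if attempt <= 2:
            value, version = shared.read()
            shared.try_commit(value + 100, version)

    attempts = optimistic_update(cell, lambda v: v + 1, interfere=rival)
    return attempts, cell.value


if __name__ == "__main__":
    print(demo())  # prints (3, 211)
```

The update succeeds on the third attempt, the first one during which the rival stays quiet. A rival that never stopped would keep the update from ever committing, which is why the guarantee is only obstruction-freedom.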

Actor Systems

Actors are objects which encapsulate state and behavior; they communicate exclusively by exchanging messages which are placed into the recipient’s mailbox. In a sense, actors are the most stringent form of object-oriented programming, but it serves better to view them as persons: while modeling a solution with actors, envision a group of people and assign sub-tasks to them, arrange their functions into an organizational structure and think about how to escalate failure (all with the benefit of not actually dealing with people, which means that we need not concern ourselves with their emotional state or moral issues). The result can then serve as a mental scaffolding for building the software implementation.

Note An ActorSystem is a heavyweight structure that will allocate 1...N Threads, so create one per logical application.

Hierarchical Structure

Like in an economic organization, actors naturally form hierarchies. One actor, which is to oversee a certain function in the program, might want to split up its task into smaller, more manageable pieces. For this purpose it starts child actors which it supervises. While the details of supervision are explained in the Supervision section, we shall concentrate on the underlying concepts in this section. The only prerequisite is to know that each actor has exactly one supervisor, which is the actor that created it.

The quintessential feature of actor systems is that tasks are split up and delegated until they become small enough to be handled in one piece. In doing so, not only is the task itself clearly structured, but the resulting actors can be reasoned about in terms of which messages they should process, how they should react normally and how failure should be handled. If one actor does not have the means for dealing with a certain situation, it sends a corresponding failure message to its supervisor, asking for help. The recursive structure then allows failure to be handled at the right level.

Compare this to layered software design which easily devolves into defensive programming with the aim of not leaking any failure out: if the problem is communicated to the right person, a better solution can be found than if trying to keep everything “under the carpet”.

Now, the difficulty in designing such a system is how to decide who should supervise what. There is of course no single best solution, but there are a few guidelines which might be helpful:

  • If one actor manages the work another actor is doing, e.g. by passing on sub-tasks, then the manager should supervise the child. The reason is that the manager knows which kind of failures are expected and how to handle them.
  • If one actor carries very important data (i.e. its state shall not be lost if avoidable), this actor should source out any possibly dangerous sub-tasks to children it supervises and handle failures of these children as appropriate. Depending on the nature of the requests, it may be best to create a new child for each request, which simplifies state management for collecting the replies. This is known as the “Error Kernel Pattern” from Erlang.
  • If one actor depends on another actor for carrying out its duty, it should watch that other actor’s liveness and act upon receiving a termination notice. This is different from supervision, as the watching party has no influence on the supervisor strategy, and it should be noted that a functional dependency alone is not a criterion for deciding where to place a certain child actor in the hierarchy.

There are of course always exceptions to these rules, but no matter whether you follow the rules or break them, you should always have a reason.

Configuration Container

The actor system as a collaborating ensemble of actors is the natural unit for managing shared facilities like scheduling services, configuration, logging, etc. Several actor systems with different configurations may co-exist within the same runtime without problems, because there is no global shared state within Akka.NET itself. Couple this with the transparent communication between actor systems (within one node or across a network connection) to see that actor systems themselves can be used as building blocks in a functional hierarchy.

Actor Best Practices

  1. Actors should be like nice co-workers: do their job efficiently without bothering everyone else needlessly and avoid hogging resources. Translated to programming this means to process events and generate responses (or more requests) in an event-driven manner. Actors should not block (i.e. passively wait while occupying a Thread) on some external entity—which might be a lock, a network socket, etc.—unless it is unavoidable; in the latter case see below.
  2. Do not pass mutable objects between actors. In order to ensure that, prefer immutable messages. If the encapsulation of actors is broken by exposing their mutable state to the outside, you are back in normal .NET concurrency land with all the drawbacks.
  3. Actors are made to be containers for behavior and state; embracing this means not routinely sending behavior within messages. One of the risks is accidentally sharing mutable state between actors, and this violation of the actor model unfortunately breaks all the properties which make programming in actors such a nice experience.
  4. Top-level actors are the innermost part of your Error Kernel, so create them sparingly and prefer truly hierarchical systems. This has benefits with respect to fault-handling (both considering the granularity of configuration and the performance) and it also reduces the strain on the guardian actor, which is a single point of contention if over-used.

Blocking Needs Careful Management

In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottlenecks or run out of memory or threads when the application runs under increased load.

The non-exhaustive list of adequate solutions to the “blocking problem” includes the following suggestions:

  • Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
  • Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
  • Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
  • Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.

The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware.
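
The idea of bounding blocking work to N workers can be sketched with a plain thread pool. This is a Python illustration under stated assumptions and not an Akka.NET router: run_queries and blocking_query are hypothetical, and a short sleep stands in for the blocking database call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_queries(n_queries: int, pool_size: int) -> Tuple[List[int], int]:
    """Run blocking 'queries' on a bounded pool.

    Returns the results and the peak number of queries in flight at once.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def blocking_query(i: int) -> int:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stands in for blocking I/O on one connection
        with lock:
            active -= 1
        return i

    # pool_size plays the role of N: at most N blocking calls run at a time,
    # the remaining requests wait in the pool's queue.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(blocking_query, range(n_queries)))
    return results, peak


if __name__ == "__main__":
    results, peak = run_queries(n_queries=12, pool_size=3)
    print(len(results), peak <= 3)  # prints 12 True
```

No matter how many requests are submitted, the number of threads blocked at once never exceeds the pool size. The right value of N still has to be found by measurement.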

Note Configuring thread pools is a task best delegated to Akka.NET: simply configure them in the application.conf and instantiate through an ActorSystem.

What you should not concern yourself with

An actor system manages the resources it is configured to use in order to run the actors which it contains. There may be millions of actors within one such system; after all, the mantra is to view them as abundant, and they weigh in at an overhead of only roughly 300 bytes per instance. Naturally, the exact order in which messages are processed in large systems is not controllable by the application author, but this is also not intended. Take a step back and relax while Akka.NET does the heavy lifting under the hood.

Actors

The previous section about Actor Systems explained how actors form hierarchies and are the smallest unit when building an application. This section looks at one such actor in isolation, explaining the concepts you encounter while implementing it. For a more in depth reference with all the details please refer to F# API or C# API.

An actor is a container for State, Behavior, a Mailbox, Children and a Supervisor Strategy. All of this is encapsulated behind an Actor Reference (ActorRef).

Actor Reference

As detailed below, an actor object needs to be shielded from the outside in order to benefit from the actor model. Therefore, actors are represented to the outside using actor references, which are objects that can be passed around freely and without restriction. This split into inner and outer object enables transparency for all the desired operations: restarting an actor without needing to update references elsewhere, placing the actual actor object on remote hosts, sending messages to actors in completely different applications. But the most important aspect is that it is not possible to look inside an actor and get hold of its state from the outside, unless the actor unwisely publishes this information itself.

State

Actor objects will typically contain some variables which reflect possible states the actor may be in. This can be an explicit state machine (e.g. using the FSM module), or it could be a counter, set of listeners, pending requests, etc. These data are what make an actor valuable, and they must be protected from corruption by other actors. The good news is that Akka.NET actors conceptually each have their own light-weight thread, which is completely shielded from the rest of the system. This means that instead of having to synchronize access using locks you can just write your actor code without worrying about concurrency at all.

Behind the scenes Akka.NET will run sets of actors on sets of real threads, where typically many actors share one thread, and subsequent invocations of one actor may end up being processed on different threads. Akka.NET ensures that this implementation detail does not affect the single-threadedness of handling the actor’s state.

Because the internal state is vital to an actor’s operations, having inconsistent state is fatal. Thus, when the actor fails and is restarted by its supervisor, the state will be created from scratch, like upon first creating the actor. This enables the system to heal itself.

Optionally, an actor's state can be automatically recovered to the state before a restart by persisting received messages and replaying them after restart (see Persistence).

Behavior

Every time a message is processed, it is matched against the current behavior of the actor. Behavior means a function which defines the actions to be taken in reaction to the message at that point in time, say forward a request if the client is authorized, deny it otherwise. This behavior may change over time, e.g. because different clients obtain authorization over time, or because the actor may go into an “out-of-service” mode and later come back. These changes are achieved by either encoding them in state variables which are read from the behavior logic, or the function itself may be swapped out at runtime, see the become and unbecome operations. However, the initial behavior defined during construction of the actor object is special in the sense that a restart of the actor will reset its behavior to this initial one.
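
The notion of a swappable behavior can be sketched in Python as a stack of handler functions. GateActor and its methods are hypothetical and do not reproduce the signatures or exact semantics of the Akka.NET become and unbecome operations.

```python
from typing import Callable, List


class GateActor:
    """Hypothetical actor whose current behavior is a replaceable function."""

    def __init__(self) -> None:
        # The initial behavior sits at the bottom of the stack.
        self._behaviors: List[Callable[[str], None]] = [self._in_service]
        self.log: List[str] = []

    def become(self, behavior: Callable[[str], None]) -> None:
        self._behaviors.append(behavior)

    def unbecome(self) -> None:
        if len(self._behaviors) > 1:  # never discard the initial behavior
            self._behaviors.pop()

    def receive(self, message: str) -> None:
        # Every message is matched against the current behavior only.
        self._behaviors[-1](message)

    def _in_service(self, message: str) -> None:
        if message == "maintenance":
            self.become(self._out_of_service)
        else:
            self.log.append(f"forwarded {message}")

    def _out_of_service(self, message: str) -> None:
        if message == "resume":
            self.unbecome()
        else:
            self.log.append(f"denied {message}")


def demo() -> List[str]:
    actor = GateActor()
    for message in ["req-1", "maintenance", "req-2", "resume", "req-3"]:
        actor.receive(message)
    return actor.log


if __name__ == "__main__":
    print(demo())  # prints ['forwarded req-1', 'denied req-2', 'forwarded req-3']
```

The same request is forwarded or denied depending on which behavior is installed when it is processed. A restart, as described above, would correspond to constructing a fresh GateActor with only the initial behavior.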

Mailbox

An actor’s purpose is the processing of messages, and these messages were sent to the actor from other actors (or from outside the actor system). The piece which connects sender and receiver is the actor’s mailbox: each actor has exactly one mailbox to which all senders enqueue their messages. Enqueuing happens in the time-order of send operations, which means that messages sent from different actors may not have a defined order at runtime due to the apparent randomness of distributing actors across threads. Sending multiple messages to the same target from the same actor, on the other hand, will enqueue them in the same order.

There are different mailbox implementations to choose from, the default being a FIFO: the order of the messages processed by the actor matches the order in which they were enqueued. This is usually a good default, but applications may need to prioritize some messages over others. In this case, a priority mailbox will enqueue not always at the end but at a position as given by the message priority, which might even be at the front. While using such a queue, the order of messages processed will naturally be defined by the queue’s algorithm and in general not be FIFO.
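
The difference between the two queueing disciplines can be sketched in Python. FifoMailbox and PriorityMailbox are hypothetical classes written for this illustration. They are not the Akka.NET mailbox types, and the convention that a lower number means higher priority is an assumption.

```python
import heapq
from collections import deque
from itertools import count
from typing import Any, List, Tuple


class FifoMailbox:
    def __init__(self) -> None:
        self._queue: deque = deque()

    def enqueue(self, message: Any, priority: int = 0) -> None:
        self._queue.append(message)  # priority is ignored: always at the end

    def dequeue(self) -> Any:
        return self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


class PriorityMailbox:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._order = count()  # tie-breaker keeps equal priorities FIFO

    def enqueue(self, message: Any, priority: int = 0) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def dequeue(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def drain(mailbox, sends: List[Tuple[str, int]]) -> List[str]:
    """Enqueue all messages, then return them in processing order."""
    for message, priority in sends:
        mailbox.enqueue(message, priority)
    return [mailbox.dequeue() for _ in range(len(mailbox))]


if __name__ == "__main__":
    sends = [("work-1", 5), ("work-2", 5), ("shutdown", 0)]
    print(drain(FifoMailbox(), sends))      # prints ['work-1', 'work-2', 'shutdown']
    print(drain(PriorityMailbox(), sends))  # prints ['shutdown', 'work-1', 'work-2']
```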

An important feature in which Akka.NET differs from some other actor model implementations is that the current behavior must always handle the next dequeued message, there is no scanning the mailbox for the next matching one. Failure to handle a message will typically be treated as a failure, unless this behavior is overridden.

Children

Each actor is potentially a supervisor: if it creates children for delegating sub-tasks, it will automatically supervise them. The list of children is maintained within the actor’s context and the actor has access to it. Modifications to the list are done by creating (Context.ActorOf(...)) or stopping (Context.Stop(child)) children and these actions are reflected immediately. The actual creation and termination actions happen behind the scenes in an asynchronous way, so they do not “block” their supervisor.

Supervisor Strategy

The final piece of an actor is its strategy for handling faults of its children. Fault handling is then done transparently by Akka, applying one of the strategies described in Supervision and Monitoring for each incoming failure. As this strategy is fundamental to how an actor system is structured, it cannot be changed once an actor has been created.

Considering that there is only one such strategy for each actor, this means that if different strategies apply to the various children of an actor, the children should be grouped beneath intermediate supervisors with matching strategies, preferring once more the structuring of actor systems according to the splitting of tasks into sub-tasks.

When an Actor Terminates

Once an actor terminates, i.e. fails in a way which is not handled by a restart, stops itself or is stopped by its supervisor, it will free up its resources, draining all remaining messages from its mailbox into the system’s “dead letter mailbox” which will forward them to the EventStream as DeadLetters. The mailbox is then replaced within the actor reference with a system mailbox, redirecting all new messages to the EventStream as DeadLetters. This is done on a best effort basis, though, so do not rely on it in order to construct “guaranteed delivery”.

The reason for not just silently dumping the messages was inspired by our tests: we register the TestEventListener on the event bus to which the dead letters are forwarded, and that will log a warning for every dead letter received—this has been very helpful for deciphering test failures more quickly. It is conceivable that this feature may also be of use for other purposes.

Messages

One of the most fundamental concepts to the Actor model is the notion of "message-driven systems," as defined by the Reactive Manifesto:

A message is an item of data that is sent to a specific destination. An event is a signal emitted by a component upon reaching a given state. In a message-driven system addressable recipients await the arrival of messages and react to them, otherwise lying dormant.

Message-passing is how Akka.NET actors communicate with each other.

Messages are immutable

One major design constraint that you, the Akka.NET user, must enforce throughout your code is guaranteeing that your message classes are immutable objects.

Quoted from Beyond HTTP: "What is an Actor?"

So what's an "immutable" object?

An immutable object is an object whose state (i.e. the contents of its memory) cannot be modified once it's been created.

If you're a .NET developer, you've used the string class. Did you know that in .NET string is an immutable object?
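
What an immutable message type looks like can be sketched with a frozen dataclass in Python. OrderPlaced and try_to_mutate are hypothetical names, and the example only illustrates the concept rather than a .NET message class.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class OrderPlaced:
    """A message whose fields cannot be reassigned after construction."""
    order_id: int
    items: Tuple[str, ...]  # a tuple rather than a list, so contents are fixed too


def try_to_mutate(message: OrderPlaced) -> bool:
    """Return True if the message could be changed, False if it refused."""
    try:
        message.order_id = 99  # type: ignore[misc]
    except FrozenInstanceError:
        return False
    return True


if __name__ == "__main__":
    message = OrderPlaced(order_id=1, items=("book", "pen"))
    print(try_to_mutate(message), message.order_id)  # prints False 1
```

Because neither the fields nor the contained collection can change, the same message object can safely be handed to any number of receivers.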

Supervision

This document outlines the concept behind supervision and what that means for your Akka.NET actors at run-time.

What Supervision Means

As described in Actor Systems, supervision describes a dependency relationship between actors: the supervisor delegates tasks to subordinates and therefore must respond to their failures. When a subordinate detects a failure (i.e. throws an exception), it suspends itself and all its subordinates and sends a message to its supervisor, signaling failure. Depending on the nature of the work to be supervised and the nature of the failure, the supervisor has a choice of the following four options:

  • Resume the subordinate, keeping its accumulated internal state
  • Restart the subordinate, clearing out its accumulated internal state
  • Stop the subordinate permanently
  • Escalate the failure to the next parent in the hierarchy, thereby failing itself

It is important to always view an actor as part of a supervision hierarchy, which explains the existence of the fourth choice (as a supervisor also is subordinate to another supervisor higher up) and has implications on the first three: resuming an actor resumes all its subordinates, restarting an actor entails restarting all its subordinates (but see below for more details), similarly terminating an actor will also terminate all its subordinates. It should be noted that the default behavior of the PreRestart hook of the Actor class is to terminate all its children before restarting, but this hook can be overridden; the recursive restart applies to all children left after this hook has been executed.

Each supervisor is configured with a function translating all possible failure causes (i.e. exceptions) into one of the four choices given above; notably, this function does not take the failed actor’s identity as an input. It is quite easy to come up with examples of structures where this might not seem flexible enough, e.g. wishing for different strategies to be applied to different subordinates. At this point it is vital to understand that supervision is about forming a recursive fault handling structure. If you try to do too much at one level, it will become hard to reason about, hence the recommended way in this case is to add a level of supervision.
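
Such a function from failure cause to choice can be sketched in Python. The Directive enum and the particular exception-to-directive mapping are hypothetical examples and are not Akka.NET defaults.

```python
from enum import Enum


class Directive(Enum):
    RESUME = "resume"
    RESTART = "restart"
    STOP = "stop"
    ESCALATE = "escalate"


def decide(failure: Exception) -> Directive:
    """Translate a failure cause into one of the four choices.

    Note that the failed actor's identity is not an input.
    """
    if isinstance(failure, ArithmeticError):
        return Directive.RESUME    # the state is still fine, carry on
    if isinstance(failure, ValueError):
        return Directive.RESTART   # the state may be corrupt, start afresh
    if isinstance(failure, NotImplementedError):
        return Directive.STOP      # retrying cannot help
    return Directive.ESCALATE      # unknown cause: let the parent decide


if __name__ == "__main__":
    print(decide(ZeroDivisionError()).value)  # prints resume
    print(decide(KeyError("k")).value)        # prints escalate
```

Since the function sees only the exception, two children that need different treatment for the same exception type have to be placed under different supervisors.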

Akka.NET implements a specific form called “parental supervision”. Actors can only be created by other actors—where the top-level actor is provided by the library—and each created actor is supervised by its parent. This restriction makes the formation of actor supervision hierarchies implicit and encourages sound design decisions. It should be noted that this also guarantees that actors cannot be orphaned or attached to supervisors from the outside, which might otherwise catch them unawares. In addition, this yields a natural and clean shutdown procedure for (sub-trees of) actor applications.

Warning Supervision-related parent-child communication happens via special system messages that have their own mailboxes, separate from user messages. This implies that supervision-related events are not deterministically ordered relative to ordinary messages. In general, the user cannot influence the order of normal messages and failure notifications. For details and an example see the Discussion: Message Ordering section.

The Top-Level Supervisors

An actor system will during its creation start at least three actors, shown in the image above. For more information about the consequences for actor paths see Top-Level Scopes for Actor Paths.

/user: The Guardian Actor

The actor which is probably most interacted with is the parent of all user-created actors, the guardian named "/user". Actors created using system.ActorOf() are children of this actor. This means that when this guardian terminates, all normal actors in the system will be shut down, too. It also means that this guardian’s supervisor strategy determines how the top-level normal actors are supervised. Since Akka.NET 1.0 it is possible to configure this using the setting akka.actor.guardian-supervisor-strategy, which takes the fully-qualified class-name of a SupervisorStrategyConfigurator. When the guardian escalates a failure, the root guardian’s response will be to terminate the guardian, which in effect will shut down the whole actor system.

/system: The System Guardian

This special guardian has been introduced in order to achieve an orderly shut-down sequence where logging remains active while all normal actors terminate, even though logging itself is implemented using actors. This is realized by having the system guardian watch the user guardian and initiate its own shut-down upon reception of the Terminated message. The top-level system actors are supervised using a strategy which will restart indefinitely upon all types of Exception except for ActorInitializationException and ActorKilledException, which will terminate the child in question. All other exceptions are escalated, which will shut down the whole actor system.

/: The Root Guardian

The root guardian is the grand-parent of all so-called “top-level” actors and supervises all the special actors mentioned in Top-Level Scopes for Actor Paths using the SupervisorStrategy.StoppingStrategy, whose purpose is to terminate the child upon any type of Exception. All other throwables will be escalated … but to whom? Since every real actor has a supervisor, the supervisor of the root guardian cannot be a real actor. And because this means that it is “outside of the bubble”, it is called the “bubble-walker”. This is a synthetic ActorRef which in effect stops its child upon the first sign of trouble and sets the actor system’s isTerminated status to true as soon as the root guardian is fully terminated (all children recursively stopped).

What Restarting Means

When presented with an actor which failed while processing a certain message, causes for the failure fall into three categories:

  • Systematic (i.e. programming) error for the specific message received
  • (Transient) failure of some external resource used during processing the message
  • Corrupt internal state of the actor

Unless the failure is specifically recognizable, the third cause cannot be ruled out, which leads to the conclusion that the internal state needs to be cleared out. If the supervisor decides that its other children or itself is not affected by the corruption—e.g. because of conscious application of the error kernel pattern—it is therefore best to restart the child. This is carried out by creating a new instance of the underlying Actor class and replacing the failed instance with the fresh one inside the child’s ActorRef; the ability to do this is one of the reasons for encapsulating actors within special references. The new actor then resumes processing its mailbox, meaning that the restart is not visible outside of the actor itself with the notable exception that the message during which the failure occurred is not re-processed.

The precise sequence of events during a restart is the following:

  1. Suspend the actor (which means that it will not process normal messages until resumed), and recursively suspend all children.
  2. Call the old instance’s PreRestart hook (defaults to sending termination requests to all children and calling PostStop)
  3. Wait for all children which were requested to terminate (using Context.Stop()) during PreRestart to actually terminate; this, like all actor operations, is non-blocking: the termination notice from the last killed child will effect the progression to the next step.
  4. Create new actor instance by invoking the originally provided factory again.
  5. Invoke PostRestart on the new instance (which by default also calls PreStart)
  6. Send restart request to all children which were not killed in step 3; restarted children will follow the same process recursively, from step 2
  7. Resume the actor.
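
The order of hook invocations in this sequence can be traced with a small Python sketch. Worker, restart and the snake_case hook names are hypothetical stand-ins for the hooks named in the text. Children are left out, so steps 3 and 6 are omitted.

```python
from typing import Callable, List


class Worker:
    """Hypothetical actor instance that records its lifecycle hooks."""

    def __init__(self, events: List[str]) -> None:
        self.events = events
        self.events.append("create instance")

    def pre_start(self) -> None:
        self.events.append("pre_start")

    def post_stop(self) -> None:
        self.events.append("post_stop")

    def pre_restart(self) -> None:
        # Default described in the text: stop children (omitted), then post_stop.
        self.events.append("pre_restart")
        self.post_stop()

    def post_restart(self) -> None:
        # Default described in the text: also calls pre_start.
        self.events.append("post_restart")
        self.pre_start()


def restart(old: Worker, factory: Callable[[], Worker], events: List[str]) -> Worker:
    events.append("suspend")   # step 1
    old.pre_restart()          # step 2 (step 3, waiting for children, omitted)
    fresh = factory()          # step 4
    fresh.post_restart()       # step 5 (step 6, restarting children, omitted)
    events.append("resume")    # step 7
    return fresh


def demo() -> List[str]:
    events: List[str] = []
    factory = lambda: Worker(events)
    actor = factory()
    actor.pre_start()
    restart(actor, factory, events)
    return events


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The trace shows the old instance being shut down through its own hooks before the factory is invoked again. State held in the old instance is therefore not carried over to the new one.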

What Lifecycle Monitoring Means

Note Lifecycle Monitoring in Akka.NET is usually referred to as DeathWatch.

In contrast to the special relationship between parent and child described above, each actor may monitor any other actor. Since actors emerge from creation fully alive and restarts are not visible outside of the affected supervisors, the only state change available for monitoring is the transition from alive to dead. Monitoring is thus used to tie one actor to another so that it may react to the other actor’s termination, in contrast to supervision which reacts to failure.

Lifecycle monitoring is implemented using a Terminated message to be received by the monitoring actor, where the default behavior is to throw a special DeathPactException if not otherwise handled. In order to start listening for Terminated messages, invoke ActorContext.Watch(targetActorRef). To stop listening, invoke ActorContext.Unwatch(targetActorRef). One important property is that the message will be delivered irrespective of the order in which the monitoring request and target’s termination occur, i.e. you still get the message even if at the time of registration the target is already dead.
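
As an illustration of watching another actor, here is a minimal sketch that assumes the Akka NuGet package is referenced. The Target and Watcher classes and the TaskCompletionSource used to surface the result are hypothetical helpers, not part of Akka.NET.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

var system = ActorSystem.Create("demo");
var target = system.ActorOf(Props.Create(() => new Target()), "target");

var done = new TaskCompletionSource<IActorRef>();
system.ActorOf(Props.Create(() => new Watcher(target, done)), "watcher");

// Stop the target. The watcher is notified even if the stop happens
// before its Watch registration has been processed.
system.Stop(target);

var terminated = await done.Task;
Console.WriteLine($"Terminated received for {terminated.Path}");
await system.Terminate();

// Hypothetical actor that simply ignores every message.
public class Target : ReceiveActor
{
    public Target()
    {
        ReceiveAny(_ => { });
    }
}

// Hypothetical monitoring actor.
public class Watcher : ReceiveActor
{
    public Watcher(IActorRef target, TaskCompletionSource<IActorRef> done)
    {
        Context.Watch(target); // start listening for Terminated

        // Handling Terminated explicitly avoids the default DeathPactException.
        Receive<Terminated>(t => { done.TrySetResult(t.ActorRef); });
    }
}
```

Removing the Receive<Terminated> handler would let the default behavior apply, so the watcher itself would fail with a DeathPactException.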

Monitoring is particularly useful if a supervisor cannot simply restart its children and has to terminate them, e.g. in case of errors during actor initialization. In that case it should monitor those children and re-create them or schedule itself to retry this at a later time.

Another common use case is that an actor needs to fail in the absence of an external resource, which may also be one of its own children. If a third party terminates a child by way of the system.Stop(child) method or by sending a PoisonPill, the supervisor might well be affected.

One-For-One Strategy vs. All-For-One Strategy

There are two classes of supervision strategies which come with Akka: OneForOneStrategy and AllForOneStrategy. Both are configured with a mapping from exception type to supervision directive (see above) and limits on how often a child is allowed to fail before terminating it. The difference between them is that the former applies the obtained directive only to the failed child, whereas the latter applies it to all siblings as well. Normally, you should use the OneForOneStrategy, which also is the default if none is specified explicitly.
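
A strategy of this kind might be declared as in the following sketch, which assumes the Akka NuGet package is referenced. The Supervisor and Child classes, the chosen exception types and the limit of 10 failures per minute are illustrative assumptions, not recommendations.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

var system = ActorSystem.Create("demo");
var supervisor = system.ActorOf(Props.Create(() => new Supervisor()), "supervisor");

// Ask the supervisor to create a child, then make that child fail.
var child = await supervisor.Ask<IActorRef>(
    Props.Create(() => new Child()), TimeSpan.FromSeconds(3));
child.Tell(new ArgumentException("bad input"));         // mapped to Resume
child.Tell(new InvalidOperationException("corrupted")); // mapped to Restart

await Task.Delay(TimeSpan.FromSeconds(1));
await system.Terminate();

public class Supervisor : UntypedActor
{
    protected override SupervisorStrategy SupervisorStrategy()
    {
        // Mapping from exception type to directive, plus a failure limit:
        // a child that fails more than 10 times within one minute is stopped.
        return new OneForOneStrategy(10, TimeSpan.FromMinutes(1), ex =>
        {
            switch (ex)
            {
                case ArgumentException _: return Directive.Resume;
                case NotSupportedException _: return Directive.Stop;
                case InvalidOperationException _: return Directive.Restart;
                default: return Directive.Escalate;
            }
        });
    }

    protected override void OnReceive(object message)
    {
        // Create a child from the received Props and reply with its reference.
        if (message is Props props)
            Sender.Tell(Context.ActorOf(props, "child"));
    }
}

// Hypothetical child that fails with whatever exception it is sent.
public class Child : UntypedActor
{
    protected override void OnReceive(object message)
    {
        if (message is Exception ex) throw ex;
    }
}
```

Only the failed child is affected here. Returning an AllForOneStrategy from the same method would apply the chosen directive to all siblings as well.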

The AllForOneStrategy is applicable in cases where the ensemble of children has such tight dependencies among them, that a failure of one child affects the function of the others, i.e. they are inextricably linked. Since a restart does not clear out the mailbox, it often is best to terminate the children upon failure and re-create them explicitly from the supervisor (by watching the children’s lifecycle); otherwise you have to make sure that it is no problem for any of the actors to receive a message which was queued before the restart but processed afterwards.

Normally stopping a child (i.e. not in response to a failure) will not automatically terminate the other children in an all-for-one strategy; this can easily be done by watching their lifecycle: if the Terminated message is not handled by the supervisor, it will throw a DeathPactException which (depending on its supervisor) will restart it, and the default PreRestart action will terminate all children. Of course this can be handled explicitly as well.

Please note that creating one-off actors from an all-for-one supervisor entails that failures escalated by the temporary actor will affect all the permanent ones. If this is not desired, install an intermediate supervisor; this can very easily be done by declaring a router of size 1 for the worker, see Routing.

Actor References, Paths and Addresses

This chapter describes how actors are identified and located within a possibly distributed actor system. It ties into the central idea that Actor Systems form intrinsic supervision hierarchies as well as that communication between actors is transparent with respect to their placement across multiple network nodes.

The above image displays the relationship between the most important entities within an actor system; please read on for the details.

What is an Actor Reference?

An actor reference is a subtype of ActorRef, whose foremost purpose is to support sending messages to the actor it represents. Each actor has access to its canonical (local) reference through the Self property; this reference is also included as sender reference by default for all messages sent to other actors. Conversely, during message processing the actor has access to a reference representing the sender of the current message through the Sender property.

There are several different types of actor references that are supported depending on the configuration of the actor system:

  • Purely local actor references are used by actor systems which are not configured to support networking functions. These actor references will not function if sent across a network connection to a remote CLR.
  • Local actor references when remoting is enabled are used by actor systems which support networking functions for those references which represent actors within the same CLR. In order to also be reachable when sent to other network nodes, these references include protocol and remote addressing information.
  • There is a subtype of local actor references which is used for routers. Its logical structure is the same as for the aforementioned local references, but sending a message to them dispatches to one of their children directly instead.
  • Remote actor references represent actors which are reachable using remote communication, i.e. sending messages to them will serialize the messages transparently and send them to the remote CLR.
  • There are several special types of actor references which behave like local actor references for all practical purposes:
      • PromiseActorRef is the special representation of a Promise for the purpose of being completed by the response from an actor. ICanTell.Ask creates this actor reference.
      • DeadLetterActorRef is the default implementation of the dead letters service to which Akka routes all messages whose destinations are shut down or non-existent.
      • EmptyLocalActorRef is what Akka returns when looking up a non-existent local actor path: it is equivalent to a DeadLetterActorRef, but it retains its path so that Akka can send it over the network and compare it to other existing actor references for that path, some of which might have been obtained before the actor died.
  • And then there are some one-off internal implementations which you should never really see:
      • There is an actor reference which does not represent an actor but acts only as a pseudo-supervisor for the root guardian, we call it “the one who walks the bubbles of space-time”.
      • The first logging service started before actually firing up actor creation facilities is a fake actor reference which accepts log events and prints them directly to standard output; it is Logging.StandardOutLogger.

What is an Actor Path?

Since actors are created in a strictly hierarchical fashion, there exists a unique sequence of actor names given by recursively following the supervision links between child and parent down towards the root of the actor system. This sequence can be seen as enclosing folders in a file system, hence we adopted the name “path” to refer to it. As in some real file-systems there also are “symbolic links”, i.e. one actor may be reachable using more than one path, where all but one involve some translation which decouples part of the path from the actor’s actual supervision ancestor line; these specialities are described in the sub-sections to follow.

An actor path consists of an anchor, which identifies the actor system, followed by the concatenation of the path elements, from root guardian to the designated actor; the path elements are the names of the traversed actors and are separated by slashes.

What is the Difference Between Actor Reference and Path?

An actor reference designates a single actor and the life-cycle of the reference matches that actor’s life-cycle; an actor path represents a name which may or may not be inhabited by an actor and the path itself does not have a life-cycle, it never becomes invalid. You can create an actor path without creating an actor, but you cannot create an actor reference without creating a corresponding actor.

Note: That definition does not hold for actorFor, which is one of the reasons why actorFor is deprecated in favor of actorSelection.

You can create an actor, terminate it, and then create a new actor with the same actor path. The newly created actor is a new incarnation of the actor. It is not the same actor. An actor reference to the old incarnation is not valid for the new incarnation. Messages sent to the old actor reference will not be delivered to the new incarnation even though they have the same path.

Actor Path Anchors

Each actor path has an address component, describing the protocol and location by which the corresponding actor is reachable, followed by the names of the actors in the hierarchy from the root up. Examples are:

"akka://my-sys/user/service-a/worker1"                   // purely local

"akka.tcp://my-sys@host.example.com:5678/user/service-b" // remote

Here, akka.tcp is the default remote transport for the 1.0 release; other transports are pluggable. A remote host using UDP would be accessible by using akka.udp. The interpretation of the host and port part (i.e. host.example.com:5678 in the example) depends on the transport mechanism used, but it must abide by the URI structural rules.
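
To see how such a path string decomposes into its address component and path elements, the remote example above can be parsed with ActorPath.Parse. This is a small sketch, assuming the Akka NuGet package is referenced; no actor system needs to be running for it.

```csharp
using System;
using Akka.Actor;

// Parse the remote example path into its components.
var path = ActorPath.Parse("akka.tcp://my-sys@host.example.com:5678/user/service-b");

Console.WriteLine(path.Address.Protocol); // akka.tcp
Console.WriteLine(path.Address.System);   // my-sys
Console.WriteLine(path.Address.Host);     // host.example.com
Console.WriteLine(path.Address.Port);     // 5678
Console.WriteLine(path.Name);             // service-b

// The path elements run from just below the root guardian to the actor itself.
Console.WriteLine(string.Join("/", path.Elements));
```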

Logical Actor Paths

The unique path obtained by following the parental supervision links towards the root guardian is called the logical actor path. This path matches exactly the creation ancestry of an actor, so it is completely deterministic as soon as the actor system’s remoting configuration (and with it the address component of the path) is set.

Physical Actor Paths

While the logical actor path describes the functional location within one actor system, configuration-based remote deployment means that an actor may be created on a different network host than its parent, i.e. within a different actor system. In this case, following the actor path from the root guardian up entails traversing the network, which is a costly operation. Therefore, each actor also has a physical path, starting at the root guardian of the actor system where the actual actor object resides. Using this path as sender reference when querying other actors will let them reply directly to this actor, minimizing delays incurred by routing.

One important aspect is that a physical actor path never spans multiple actor systems or CLRs. This means that the logical path (supervision hierarchy) and the physical path (actor deployment) of an actor may diverge if one of its ancestors is remotely supervised.

How are Actor References obtained?

There are two general categories to how actor references may be obtained: by creating actors or by looking them up, where the latter functionality comes in the two flavours of creating actor references from concrete actor paths and querying the logical actor hierarchy.

Creating Actors

An actor system is typically started by creating actors beneath the guardian actor using the ActorSystem.ActorOf method and then using ActorContext.ActorOf from within the created actors to spawn the actor tree. These methods return a reference to the newly created actor. Each actor has direct access (through its ActorContext) to references for its parent, itself and its children. These references may be sent within messages to other actors, enabling those to reply directly.
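
A minimal sketch of both creation routes follows, assuming the Akka NuGet package is referenced. The Service and Worker classes and the "spawn" command are hypothetical; the actor names mirror the path examples used earlier in this chapter.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

var system = ActorSystem.Create("my-sys");

// Top-level actor: created beneath the /user guardian.
var service = system.ActorOf(Props.Create(() => new Service()), "service-a");

// The service spawns a child through its own context and replies
// with the child's reference.
var worker = await service.Ask<IActorRef>("spawn", TimeSpan.FromSeconds(3));

Console.WriteLine(service.Path); // akka://my-sys/user/service-a
Console.WriteLine(worker.Path);  // akka://my-sys/user/service-a/worker1

await system.Terminate();

public class Service : ReceiveActor
{
    public Service()
    {
        Receive<string>(command =>
        {
            if (command == "spawn")
            {
                // Context.ActorOf creates a direct child of this actor.
                var child = Context.ActorOf(Props.Create(() => new Worker()), "worker1");
                Sender.Tell(child);
            }
        });
    }
}

// Hypothetical worker that ignores every message.
public class Worker : ReceiveActor
{
    public Worker()
    {
        ReceiveAny(_ => { });
    }
}
```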

Looking up Actors by Concrete Path

In addition, actor references may be looked up using the ActorSystem.ActorSelection method. The selection can be used for communicating with said actor and the actor corresponding to the selection is looked up when delivering each message.

To acquire an ActorRef that is bound to the life-cycle of a specific actor you need to send a message, such as the built-in Identify message, to the actor and use the .Sender reference of a reply from the actor.
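
The Identify round trip might look like the following sketch, which assumes the Akka NuGet package is referenced. The Resolver and Service classes and the TaskCompletionSource used to hand the result back are hypothetical helpers. The reply to Identify is an ActorIdentity message whose Subject is the resolved reference, or null if no actor lives at the path.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

var system = ActorSystem.Create("demo");
var service = system.ActorOf(Props.Create(() => new Service()), "service-a");

var result = new TaskCompletionSource<IActorRef>();
system.ActorOf(Props.Create(() => new Resolver("/user/service-a", result)), "resolver");

var resolved = await result.Task;
Console.WriteLine(object.Equals(resolved, service)); // True

await system.Terminate();

// Hypothetical actor that resolves a path into a concrete actor reference.
public class Resolver : ReceiveActor
{
    public Resolver(string path, TaskCompletionSource<IActorRef> result)
    {
        // The path doubles as the correlation id of the request.
        Context.ActorSelection(path).Tell(new Identify(path), Self);

        Receive<ActorIdentity>(identity =>
        {
            if (Equals(identity.MessageId, path))
                result.TrySetResult(identity.Subject);
        });
    }
}

// Hypothetical target actor; Identify is answered automatically,
// so no handler for it is needed here.
public class Service : ReceiveActor
{
    public Service()
    {
        ReceiveAny(_ => { });
    }
}
```

A reference obtained this way is bound to one incarnation of the actor, so it can be passed to Context.Watch, which a selection cannot.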

Absolute vs. Relative Paths

In addition to ActorSystem.ActorSelection there is also ActorContext.ActorSelection, which is available inside any actor as Context.ActorSelection. This yields an actor selection much like its twin on ActorSystem, but instead of looking up the path starting from the root of the actor tree it starts out on the current actor. Path elements consisting of two dots ("..") may be used to access the parent actor. You can for example send a message to a specific sibling:

Context.ActorSelection("../brother").Tell(msg);

Absolute paths may of course also be looked up on context in the usual way, i.e.

Context.ActorSelection("/user/serviceA").Tell(msg);

will work as expected.

Querying the Logical Actor Hierarchy

Since the actor system forms a file-system like hierarchy, matching on paths is possible in the same way as supported by Unix shells: you may replace (parts of) path element names with wildcards ("*" and "?") to formulate a selection which may match zero or more actual actors. Because the result is not a single actor reference, it has a different type, ActorSelection, and does not support the full set of operations an ActorRef does. Selections may be formulated using the ActorSystem.ActorSelection and IActorContext.ActorSelection methods and do support sending messages:

Context.ActorSelection("../*").Tell(msg);

will send msg to all siblings including the current actor. As for references obtained using actorFor, a traversal of the supervision hierarchy is done in order to perform the message send. As the exact set of actors which match a selection may change even while a message is making its way to the recipients, it is not possible to watch a selection for liveliness changes. In order to do that, resolve the uncertainty by sending a request and gathering all answers, extracting the sender references, and then watch all discovered concrete actors. This scheme of resolving a selection may be improved upon in a future release.

Summary: ActorOf vs. ActorSelection

Note: What the above sections described in some detail can be summarized and memorized easily as follows:

  • ActorOf only ever creates a new actor, and it creates it as a direct child of the context on which this method is invoked (which may be any actor or actor system).
  • ActorSelection only ever looks up existing actors when messages are delivered, i.e. does not create actors, or verify existence of actors when the selection is created.

Actor Reference and Path Equality

Equality of ActorRef matches the intention that an ActorRef corresponds to the target actor incarnation. Two actor references are compared equal when they have the same path and point to the same actor incarnation. A reference pointing to a terminated actor does not compare equal to a reference pointing to another (re-created) actor with the same path. Note that a restart of an actor caused by a failure still means that it is the same actor incarnation, i.e. a restart is not visible for the consumer of the ActorRef.

If you need to keep track of actor references in a collection and do not care about the exact actor incarnation you can use the ActorPath as key, because the identifier of the target actor is not taken into account when comparing actor paths.

Reusing Actor Paths

When an actor is terminated, its reference will point to the dead letter mailbox, DeathWatch will publish its final transition and in general it is not expected to come back to life again (since the actor life cycle does not allow this). While it is possible to create an actor at a later time with an identical path—simply due to it being impossible to enforce the opposite without keeping the set of all actors ever created available—this is not good practice.

It may be the right thing to do in very specific circumstances, but make sure to confine the handling of this precisely to the actor’s supervisor, because that is the only actor which can reliably detect proper deregistration of the name, before which creation of the new child will fail.

It may also be required during testing, when the test subject depends on being instantiated at a specific path. In that case it is best to mock its supervisor so that it will forward the Terminated message to the appropriate point in the test procedure, enabling the latter to await proper deregistration of the name.

The Interplay with Remote Deployment

When an actor creates a child, the actor system’s deployer will decide whether the new actor resides in the same CLR or on another node. In the second case, creation of the actor will be triggered via a network connection to happen in a different CLR and consequently within a different actor system. The remote system will place the new actor below a special path reserved for this purpose and the supervisor of the new actor will be a remote actor reference (representing that actor which triggered its creation). In this case, context.parent (the supervisor reference) and context.path.parent (the parent node in the actor’s path) do not represent the same actor. However, looking up the child’s name within the supervisor will find it on the remote node, preserving logical structure e.g. when sending to an unresolved actor reference.

What is the Address part used for?

When sending an actor reference across the network, it is represented by its path. Hence, the path must fully encode all information necessary to send messages to the underlying actor. This is achieved by encoding protocol, host and port in the address part of the path string. When an actor system receives an actor path from a remote node, it checks whether that path’s address matches the address of this actor system, in which case it will be resolved to the actor’s local reference. Otherwise, it will be represented by a remote actor reference.

Top-Level Scopes for Actor Paths

At the root of the path hierarchy resides the root guardian above which all other actors are found; its name is "/". The next level consists of the following:

  • "/user" is the guardian actor for all user-created top-level actors; actors created using ActorSystem.ActorOf are found below this one.
  • "/system" is the guardian actor for all system-created top-level actors, e.g. logging listeners or actors automatically deployed by configuration at the start of the actor system.
  • "/deadLetters" is the dead letter actor, which is where all messages sent to stopped or non-existing actors are re-routed (on a best-effort basis: messages may be lost even within the local CLR).
  • "/temp" is the guardian for all short-lived system-created actors, e.g. those which are used in the implementation of ActorRef.ask.
  • "/remote" is an artificial path below which all actors reside whose supervisors are remote actor references.

The need to structure the name space for actors like this arises from a central and very simple design goal: everything in the hierarchy is an actor, and all actors function in the same way. Hence you can not only look up the actors you created, you can also look up the system guardian and send it a message (which it will dutifully discard in this case). This powerful principle means that there are no quirks to remember, it makes the whole system more uniform and consistent.

If you want to read more about the top-level structure of an actor system, have a look at The Top-Level Supervisors.

Location Transparency

You might have noticed looking at some of the Akka.NET documentation that when you create an instance of an actor you defined, you get an IActorRef instance back instead of an instance of your actor type (FooActor or whatever).

The reason for this is simple - you never send a message to an actor instance directly. You do it through an "actor reference," implemented via the IActorRef interface.

This is because actor references have the ability to add location transparency to your actors - an important concept that enables Akka.NET applications to be easily distributed over a network of computers.

Real-world examples of location transparency

Location transparency is a concept that you already use constantly in your everyday life. Here are some examples:

  • Phone numbers
  • Email addresses
  • URLs

Quoted from Beyond HTTP: "What is an Actor?"

What location transparency means is that whenever you send a message to an actor, you don't need to know where it is within an actor system, which might span hundreds of computers. You just have to know that actor's address.

Think of it like calling someone's cell phone number - you don't need to know that your friend Bob is in Seattle, Washington, USA in order to place a call to them. You just need to dial Bob's cell phone number and your cellular network provider will take care of the rest.

Actor references and actor addresses

Quoted from Beyond HTTP: "What is an Actor?"

Actors work just the same way - in Akka.NET every actor has an address that contains the following parts:

  • Protocol - just like how you can have HTTP and HTTPS on the web, Akka.NET supports multiple transport protocols for inter-process communication. The default protocol for single-process actor systems is just akka://. If you're using remoting or clustering, you'll typically use a socket transport like akka.tcp:// or akka.udp:// to communicate between nodes.
  • ActorSystem - every ActorSystem instance in Akka.NET has to be given a name upon startup, and that name can be shared by multiple processes or machines that are all participating in a distributed ActorSystem.
  • Address - if you're not using remoting, then the address portion of an ActorPath can be omitted. But this is used to convey specific IP address / domain name and port information used for remote communication between actor systems.
  • Path - this is the path to a specific actor at an address. It's structured just like a URL path for a website, with all user-defined actors stemming off of the /user/ root actor.

However, this detail of the actor's address is made transparent to you by the actor reference. An IActorRef belonging to a remote system looks exactly the same to you as an IActorRef created inside the current process.

Therefore your Akka.NET application code doesn't have to differentiate between local and remote actors - that's handled transparently for you by the Akka.NET framework. This is what we mean by "location transparency."

Message Delivery Reliability

Akka.NET helps you build reliable applications which make use of multiple processor cores in one machine (“scaling up”) or distributed across a computer network (“scaling out”). The key abstraction to make this work is that all interactions between your code units—actors—happen via message passing, which is why the precise semantics of how messages are passed between actors deserve their own chapter.

In order to give some context to the discussion below, consider an application which spans multiple network hosts. The basic mechanism for communication is the same whether sending to an actor on the local application or to a remote actor, but of course there will be observable differences in the latency of delivery (possibly also depending on the bandwidth of the network link and the message size) and the reliability. In case of a remote message send there are obviously more steps involved which means that more can go wrong. Another aspect is that local sending will just pass a reference to the message inside the same application, without any restrictions on the underlying object which is sent, whereas a remote transport will place a limit on the message size.

Writing your actors such that every interaction could possibly be remote is the safe, pessimistic bet. It means to only rely on those properties which are always guaranteed and which are discussed in detail below. This has of course some overhead in the actor’s implementation. If you are willing to sacrifice full location transparency—for example in case of a group of closely collaborating actors—you can place them always on the same local application and enjoy stricter guarantees on message delivery. The details of this trade-off are discussed further below.

As a supplementary part we give a few pointers on how to build stronger reliability guarantees on top of the built-in ones. The chapter closes by discussing the role of the “Dead Letter Office”.

The General Rules

These are the rules for message sends (i.e. the Tell method, which also underlies the Ask pattern):

  • at-most-once delivery, i.e. no guaranteed delivery
  • message ordering per sender–receiver pair

The first rule is typically found also in other actor implementations while the second is specific to Akka.

Discussion: What does “at-most-once” mean?

When it comes to describing the semantics of a delivery mechanism, there are three basic categories:

  • at-most-once delivery means that for each message handed to the mechanism, that message is delivered zero or one times; in more casual terms it means that messages may be lost.
  • at-least-once delivery means that for each message handed to the mechanism potentially multiple attempts are made at delivering it, such that at least one succeeds; again, in more casual terms this means that messages may be duplicated but not lost.
  • exactly-once delivery means that for each message handed to the mechanism exactly one delivery is made to the recipient; the message can neither be lost nor duplicated.

The first one is the cheapest—highest performance, least implementation overhead—because it can be done in a fire-and-forget fashion without keeping state at the sending end or in the transport mechanism. The second one requires retries to counter transport losses, which means keeping state at the sending end and having an acknowledgement mechanism at the receiving end. The third is most expensive—and has consequently worst performance—because in addition to the second it requires state to be kept at the receiving end in order to filter out duplicate deliveries.
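
To make the extra state concrete, the sketch below layers retries, acknowledgements and duplicate filtering on top of plain Tell. It is an illustration under assumptions, not a mechanism built into Akka.NET: the Envelope, Ack and Retry messages, the three actor classes and the one-second retry interval are all hypothetical, and the duplicate filter assumes a single sender and grows without bound.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Akka.Actor;

var system = ActorSystem.Create("demo");
var first = new TaskCompletionSource<string>();
var consumer = system.ActorOf(Props.Create(() => new Collector(first)), "consumer");
var receiver = system.ActorOf(Props.Create(() => new DeduplicatingReceiver(consumer)), "receiver");
var sender = system.ActorOf(Props.Create(() => new ReliableSender(receiver)), "sender");

sender.Tell("order-42");
Console.WriteLine(await first.Task); // order-42
await system.Terminate();

public sealed record Envelope(long Id, string Payload);
public sealed record Ack(long Id);
public sealed record Retry(long Id);

// At-least-once: keeps unacknowledged messages and resends them.
public class ReliableSender : ReceiveActor
{
    private readonly Dictionary<long, Envelope> _pending = new Dictionary<long, Envelope>();
    private long _nextId;

    public ReliableSender(IActorRef destination)
    {
        Receive<string>(payload =>
        {
            var envelope = new Envelope(_nextId++, payload);
            _pending[envelope.Id] = envelope; // state kept at the sending end
            Deliver(destination, envelope);
        });
        Receive<Ack>(ack => { _pending.Remove(ack.Id); });
        Receive<Retry>(retry =>
        {
            // Resend only if no acknowledgement has arrived in the meantime.
            if (_pending.TryGetValue(retry.Id, out var envelope))
                Deliver(destination, envelope);
        });
    }

    private void Deliver(IActorRef destination, Envelope envelope)
    {
        destination.Tell(envelope, Self);
        Context.System.Scheduler.ScheduleTellOnce(
            TimeSpan.FromSeconds(1), Self, new Retry(envelope.Id), Self);
    }
}

// Duplicate filtering: state kept at the receiving end.
public class DeduplicatingReceiver : ReceiveActor
{
    private readonly HashSet<long> _seen = new HashSet<long>();

    public DeduplicatingReceiver(IActorRef consumer)
    {
        Receive<Envelope>(envelope =>
        {
            Sender.Tell(new Ack(envelope.Id), Self); // acknowledge duplicates too
            if (_seen.Add(envelope.Id))
                consumer.Tell(envelope.Payload, Self);
        });
    }
}

// Hypothetical consumer that surfaces the first payload it sees.
public class Collector : ReceiveActor
{
    public Collector(TaskCompletionSource<string> first)
    {
        Receive<string>(payload => { first.TrySetResult(payload); });
    }
}
```

Even this sketch only filters duplicates while the receiver's state survives; if the receiver is restarted and loses its set of seen ids, duplicates get through again, which hints at why the third category is the most expensive.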

Discussion: Why No Guaranteed Delivery?

At the core of the problem lies the question what exactly this guarantee shall mean:

  1. The message is sent out on the network?
  2. The message is received by the other host?
  3. The message is put into the target actor's mailbox?
  4. The message is starting to be processed by the target actor?
  5. The message is processed successfully by the target actor?

Each one of these has different challenges and costs, and it is obvious that there are conditions under which any message passing library would be unable to comply; think for example about configurable mailbox types and how a bounded mailbox would interact with the third point, or even what it would mean to decide upon the “successfully” part of point five.

Along those same lines goes the reasoning in Nobody Needs Reliable Messaging. The only meaningful way for a sender to know whether an interaction was successful is by receiving a business-level acknowledgement message, which is not something Akka.NET could make up on its own (neither are we writing a “do what I mean” framework nor would you want us to).

Akka.NET embraces distributed computing and makes the fallibility of communication explicit through message passing, therefore it does not try to lie and emulate a leaky abstraction. This is a model that has been used with great success in Erlang and requires the users to design their applications around it. You can read more about this approach in the Erlang documentation (sections 10.9 and 10.10); Akka.NET follows it closely.

Another angle on this issue is that by providing only basic guarantees those use cases which do not need stronger reliability do not pay the cost of their implementation; it is always possible to add stronger reliability on top of basic ones, but it is not possible to retro-actively remove reliability in order to gain more performance.

Discussion: Message Ordering

The rule more specifically is that for a given pair of actors, messages sent from the first to the second will not be received out-of-order. This is illustrated in the following:

Actor A1 sends messages M1, M2, M3 to A2

Actor A3 sends messages M4, M5, M6 to A2

This means that:

  1. If M1 is delivered it must be delivered before M2 and M3
  2. If M2 is delivered it must be delivered before M3
  3. If M4 is delivered it must be delivered before M5 and M6
  4. If M5 is delivered it must be delivered before M6
  5. A2 can see messages from A1 interleaved with messages from A3
  6. Since there is no guaranteed delivery, any of the messages may be dropped, i.e. not arrive at A2

Note: Akka’s guarantee applies to the order in which messages are enqueued into the recipient’s mailbox. If the mailbox implementation does not respect FIFO order (e.g. a PriorityMailbox), then the order of processing by the actor can deviate from the enqueueing order.

Please note that this rule is not transitive:

Actor A sends message M1 to actor C

Actor A then sends message M2 to actor B

Actor B forwards message M2 to actor C

Actor C may receive M1 and M2 in any order

Causal transitive ordering would imply that M2 is never received before M1 at actor C (though any of them might be lost). This ordering can be violated due to different message delivery latencies when A, B and C reside on different network hosts, see more below.

Note: Actor creation is treated as a message sent from the parent to the child, with the same semantics as discussed above. Sending a message to an actor in a way which could be reordered with this initial creation message means that the message might not arrive because the actor does not exist yet. An example where the message might arrive too early would be to create a remote-deployed actor R1, send its reference to another remote actor R2 and have R2 send a message to R1. An example of well-defined ordering is a parent which creates an actor and immediately sends a message to it.

Communication of failure

Please note that the ordering guarantees discussed above only hold for user messages between actors. Failure of a child of an actor is communicated by special system messages that are not ordered relative to ordinary user messages. In particular:

Child actor C sends message M to its parent P

Child actor fails with failure F

Parent actor P might receive the two events either in order M, F or F, M

The reason for this is that internal system messages have their own mailboxes; therefore the ordering of enqueue calls of a user and system message cannot guarantee the ordering of their dequeue times.

The Rules for In-App (Local) Message Sends

Be careful what you do with this section!

Relying on the stronger reliability in this section is not recommended since it will bind your application to local-only deployment: an application may have to be designed differently (as opposed to just employing some message exchange patterns local to some actors) in order to be fit for running on a cluster of machines. Our credo is “design once, deploy any way you wish”, and to achieve this you should only rely on The General Rules.

Reliability of Local Message Sends

The Akka.NET test suite relies on not losing messages in the local context (and for non-error condition tests also for remote deployment), meaning that we actually do apply the best effort to keep our tests stable. A local Tell operation can however fail for the same reasons as a normal method call can on the CLR:

  • StackOverflowException
  • OutOfMemoryException
  • other SystemException

In addition, local sends can fail in Akka-specific ways:

  • if the mailbox does not accept the message (e.g. full BoundedMailbox)
  • if the receiving actor fails while processing the message or is already terminated

While the first is clearly a matter of configuration the second deserves some thought: the sender of a message does not get feedback if there was an exception while processing, that notification goes to the supervisor instead. This is in general not distinguishable from a lost message for an outside observer.

Ordering of Local Message Sends

Assuming strict FIFO mailboxes, the above-mentioned caveat of non-transitivity of the message ordering guarantee is eliminated under certain conditions. As you will note, these are quite subtle as it stands, and it is even possible that future performance optimizations will invalidate this whole paragraph. The possibly non-exhaustive list of counter-indications is:

  • Before receiving the first reply from a top-level actor, there is a lock which protects an internal interim queue, and this lock is not fair; the implication is that enqueue requests from different senders which arrive during the actor’s construction (figuratively, the details are more involved) may be reordered depending on low-level thread scheduling. Since completely fair locks do not exist on the CLR this is unfixable.
  • The same mechanism is used during the construction of a Router, more precisely the routed ActorRef, hence the same problem exists for actors deployed with Routers.
  • As mentioned above, the problem occurs anywhere a lock is involved during enqueueing, which may also apply to custom mailboxes.

This list has been compiled carefully, but other problematic scenarios may have escaped our analysis.

How does Local Ordering relate to Network Ordering

As explained in the previous paragraph, local message sends obey transitive causal ordering under certain conditions. If the remote message transport respected this ordering as well, that would translate to transitive causal ordering across one network link, i.e. if exactly two network hosts are involved. When multiple links are involved, e.g. the three actors on three different nodes mentioned above, no guarantees can be made.

The current remote transport does not support this (again this is caused by non-FIFO wake-up order of a lock, this time serializing connection establishment).

As a speculative view into the future it might be possible to support this ordering guarantee by re-implementing the remote transport layer based completely on actors; at the same time we are looking into providing other low-level transport protocols like UDP or SCTP which would enable higher throughput or lower latency by removing this guarantee again, which would mean that choosing between different implementations would allow trading guarantees versus performance.

Higher-level abstractions

Based on a small and consistent tool set in Akka.NET's core, Akka.NET also provides powerful, higher-level abstractions on top of it.

Messaging Patterns

As discussed above a straight-forward answer to the requirement of reliable delivery is an explicit ACK–RETRY protocol. In its simplest form this requires

  • a way to identify individual messages to correlate message with acknowledgement
  • a retry mechanism which will resend messages if not acknowledged in time
  • a way for the receiver to detect and discard duplicates

The third becomes necessary by virtue of the acknowledgements not being guaranteed to arrive either. An ACK-RETRY protocol with business-level acknowledgements is supported by [[At least once delivery]] of the Akka.NET Persistence module. Duplicates can be detected by tracking the identifiers of messages sent via [[At least once delivery]]. Another way of implementing the third part would be to make processing the messages idempotent on the level of the business logic.
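The three requirements above can be sketched in a few lines. This is a language-neutral toy in Python, not the Akka.NET Persistence API: the names Receiver, LossyLink and send_reliably are hypothetical, timeouts are modeled as loop iterations, and the lossy link follows a fixed drop pattern so the behavior is reproducible.

```python
import itertools


class Receiver:
    """Processes each message id at most once and always acknowledges."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def deliver(self, msg_id, payload):
        if msg_id not in self.seen:        # requirement 3: discard duplicates
            self.seen.add(msg_id)
            self.processed.append(payload)
        return msg_id                       # the ACK carries the message id


class LossyLink:
    """Hypothetical test double: True in the pattern means 'drop this one'."""

    def __init__(self, drop_pattern):
        self._drops = itertools.cycle(drop_pattern)

    def passes(self):
        return not next(self._drops)


def send_reliably(payload, msg_id, receiver, link, max_attempts=10):
    """Resend until an ACK for msg_id comes back; returns the attempts used."""
    for attempt in range(1, max_attempts + 1):   # requirement 2: retry
        if not link.passes():                    # message lost on the way
            continue
        ack = receiver.deliver(msg_id, payload)
        if not link.passes():                    # ACK lost on the way back
            continue
        if ack == msg_id:                        # requirement 1: correlate
            return attempt
    raise TimeoutError(f"no ACK for message {msg_id}")
```

Because the ACK itself can be lost, the receiver may see the same message more than once; the `seen` set is what keeps the payload from being processed twice.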

Another example of implementing all three requirements is the reliable proxy pattern (which is now superseded by [[At least once delivery]]).

Event Sourcing

Event sourcing (and sharding) is what makes large websites scale to billions of users, and the idea is quite simple: when a component (think actor) processes a command it will generate a list of events representing the effect of the command. These events are stored in addition to being applied to the component’s state. The nice thing about this scheme is that events are only ever appended to the storage; nothing is ever mutated. This enables perfect replication and scaling of consumers of this event stream (i.e. other components may consume the event stream as a means to replicate the component’s state on a different continent or to react to changes). If the component’s state is lost, whether due to a machine failure or by being pushed out of a cache, it can easily be reconstructed by replaying the event stream (usually employing snapshots to speed up the process). Event sourcing is supported by Akka.NET Persistence.
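As a rough illustration of the command, event, append, apply and replay cycle described here, consider the following Python sketch. It is a toy model only; the Account, Deposited and Withdrawn names are hypothetical, the journal is an in-memory list, and snapshots are left out. It does not use or mirror the Akka.NET Persistence API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


@dataclass
class Account:
    balance: int = 0
    journal: list = field(default_factory=list)   # append-only event log

    def apply(self, event):
        """Update state from a single event; used live and during replay."""
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount

    def handle(self, command, amount):
        """Turn a command into events, append them, then apply them."""
        if command == "deposit":
            events = [Deposited(amount)]
        elif command == "withdraw" and amount <= self.balance:
            events = [Withdrawn(amount)]
        else:
            events = []                            # rejected: no events
        for event in events:
            self.journal.append(event)
            self.apply(event)
        return events


def replay(journal):
    """Rebuild the state from nothing but the stored events."""
    account = Account(journal=list(journal))
    for event in journal:
        account.apply(event)
    return account
```

Note that only events, never commands, are stored: a rejected withdrawal leaves no trace in the journal, so replaying always reproduces the same state.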

Mailbox with Explicit Acknowledgement

By implementing a custom mailbox type it is possible to retry message processing at the receiving actor’s end in order to handle temporary failures. This pattern is mostly useful in the local communication context where delivery guarantees are otherwise sufficient to fulfill the application’s requirements.

Please note that the caveats for The Rules for In-App (Local) Message Sends do apply.

An example implementation of this pattern is shown at :ref:mailbox-acking.

Dead Letters

Messages which cannot be delivered (and for which this can be ascertained) will be delivered to a synthetic actor called /deadLetters. This delivery happens on a best-effort basis; it may fail even within a single application in the local machine (e.g. during actor termination). Messages sent via unreliable network transports will be lost without turning up as dead letters.

What Should I Use Dead Letters For?

The main use of this facility is for debugging, especially if an actor send does not arrive consistently (where usually inspecting the dead letters will tell you that the sender or recipient was set wrong somewhere along the way). In order to be useful for this purpose it is good practice to avoid sending to deadLetters where possible, i.e. run your application with a suitable dead letter logger (see more below) from time to time and clean up the log output. This exercise, like all else, requires judicious application of common sense: it may well be that avoiding sends to a terminated actor complicates the sender’s code more than is gained in debug output clarity.

The dead letter service follows the same rules with respect to delivery guarantees as all other message sends, hence it cannot be used to implement guaranteed delivery.

How do I Receive Dead Letters?

An actor can subscribe to class Akka.Actor.DeadLetter on the event stream; see [[event stream]] for how to do that. The subscribed actor will then receive all dead letters published in the (local) system from that point onwards. Dead letters are not propagated over the network; if you want to collect them in one place, you will have to subscribe one actor per network node and forward them manually. Also consider that dead letters are generated at the node which can determine that a send operation has failed, which for a remote send can be the local system (if no network connection can be established) or the remote one (if the actor you are sending to does not exist at that point in time).

Dead Letters Which are (Usually) not Worrisome

Every time an actor does not terminate by its own decision, there is a chance that some messages which it sends to itself are lost. There is one which happens quite easily in complex shutdown scenarios that is usually benign: seeing an Akka.Dispatch.Terminate message dropped means that two termination requests were given, but of course only one can succeed. In the same vein, you might see Akka.Actor.Terminated messages from children while stopping a hierarchy of actors turning up in dead letters if the parent is still watching the child when the parent terminates.

Erlang documentation: http://www.erlang.org/faq/academic.html
Nobody Needs Reliable Messaging: http://www.infoq.com/articles/no-reliable-messaging

Akka.NET Configuration

Quoted from Akka.NET Bootcamp: Unit 2, Lesson 1 - "Using HOCON Configuration to Configure Akka.NET"

Akka.NET leverages a configuration format, called HOCON, to allow you to configure your Akka.NET applications with whatever level of granularity you want.

What is HOCON?

HOCON (Human-Optimized Config Object Notation) is a flexible and extensible configuration format. It will allow you to configure everything from Akka.NET's IActorRefProvider implementation, logging, network transports, and more commonly - how individual actors are deployed.

Values returned by HOCON are strongly typed (i.e. you can fetch out an int, a TimeSpan, etc).

What can I do with HOCON?

HOCON allows you to embed easily-readable configuration inside of the otherwise hard-to-read XML in App.config and Web.config. HOCON also lets you query configs by their section paths, and those sections are exposed as strongly typed and parsed values you can use inside your applications.

HOCON also lets you nest and/or chain sections of configuration, creating layers of granularity and providing you a semantically namespaced config.

What is HOCON usually used for?

HOCON is commonly used for tuning logging settings, enabling special modules (such as Akka.Remote), or configuring deployments such as the Dispatcher or Router used for a particular actor.

For example, let's configure an ActorSystem with HOCON:

var config = ConfigurationFactory.ParseString(@"akka.remote.helios.tcp {
        transport-class = ""Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote""
        transport-protocol = tcp
        port = 8091
        hostname = ""127.0.0.1""
    }");
var system = ActorSystem.Create("MyActorSystem", config);

As you can see in that example, a HOCON Config object can be parsed from a string using the ConfigurationFactory.ParseString method. Once you have a Config object, you can then pass it to your ActorSystem inside the ActorSystem.Create method.

"Deployment"? What's that?

Deployment is a vague concept, but it's closely tied to HOCON. An actor is "deployed" when it is instantiated and put into service within the ActorSystem somewhere.

When an actor is instantiated within the ActorSystem it can be deployed in one of two places: inside the local process or in another process (this is what Akka.Remote does.)

When an actor is deployed by the ActorSystem, it has a range of configuration settings. These settings control a wide range of behavior options for the actor, such as: is this actor going to be a router? What Dispatcher will it use? What type of mailbox will it have? (More on these concepts in later lessons.)

We haven't gone over what all these options mean, but the key thing to know for now is that the settings used by the ActorSystem to deploy an actor into service can be set within HOCON.

This also means that you can change the behavior of actors dramatically (by changing these settings) without having to actually touch the actor code itself.

Flexible config FTW!

HOCON can be used inside App.config and Web.config

Parsing HOCON from a string is handy for small configuration sections, but what if you want to be able to take advantage of Configuration Transforms for App.config and Web.config and all of the other nice tools we have in the System.Configuration namespace?

As it turns out, you can use HOCON inside these configuration files too!

Here's an example of using HOCON inside App.config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="akka"
             type="Akka.Configuration.Hocon.AkkaConfigurationSection, Akka" />
  </configSections>
  <akka>
    <hocon>
      <![CDATA[
          akka {
            # here we are configuring log levels
            log-config-on-start = off
            stdout-loglevel = INFO
            loglevel = ERROR
            # this config section will be referenced as akka.actor
            actor {
              provider = "Akka.Remote.RemoteActorRefProvider, Akka.Remote"
              debug {
                  receive = on
                  autoreceive = on
                  lifecycle = on
                  event-stream = on
                  unhandled = on
              }
            }
            # here we're configuring the Akka.Remote module
            remote {
              helios.tcp {
                  transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
                  #applied-adapters = []
                  transport-protocol = tcp
                  port = 8091
                  hostname = "127.0.0.1"
              }
              log-remote-lifecycle-events = INFO
            }
          }
      ]]>
    </hocon>
  </akka>
</configuration>

And then we can load this configuration section into our ActorSystem via the following code:

var system = ActorSystem.Create("Mysystem"); //automatically loads App/Web.config

HOCON Configuration Supports Fallbacks

Although this isn't a concept we leverage explicitly in Unit 2, it's a powerful trait of the Config class that comes in handy in lots of production use cases.

HOCON supports the concept of "fallback" configurations - it's easiest to explain this concept visually.

To create something that looks like the diagram above, we have to create a Config object that has three fallbacks chained behind it using syntax like this:

var f0 = ConfigurationFactory.ParseString("a = bar");
var f1 = ConfigurationFactory.ParseString("b = biz");
var f2 = ConfigurationFactory.ParseString("c = baz");
var f3 = ConfigurationFactory.ParseString("a = foo");

var yourConfig = f0.WithFallback(f1)
                   .WithFallback(f2)
                   .WithFallback(f3);

If we request a value for a HOCON object with key "a", using the following code:

var a = yourConfig.GetString("a");

Then the internal HOCON engine will match the first HOCON file that contains a definition for key a. In this case, that is f0, which returns the value "bar".

Why wasn't "foo" returned as the value for "a"?

The reason is that HOCON only searches through fallback Config objects if a match is NOT found earlier in the Config chain. If the top-level Config object has a match for a, then the fallbacks won't be searched. In this case, a match for a was found in f0, so the a=foo in f3 was never reached.

What happens when there is a HOCON key miss?

What happens if we run the following code, given that c isn't defined in f0 or f1?

var c = yourConfig.GetString("c");

In this case yourConfig will fall back twice to f2 and return "baz" as the value for key c.
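The lookup order described in this section can be modeled in a few lines. The following Python sketch is only an illustration of the search order; the Config class here is a hypothetical stand-in with flat keys, not the Akka.NET Config type, and returning None on a complete miss is a choice made for this toy.

```python
class Config:
    """Toy config: a flat dict of values plus an optional fallback."""

    def __init__(self, values, fallback=None):
        self._values = dict(values)
        self._fallback = fallback

    def with_fallback(self, other):
        """Return a new Config that consults `other` at the end of the chain."""
        if self._fallback is None:
            return Config(self._values, other)
        return Config(self._values, self._fallback.with_fallback(other))

    def get_string(self, key):
        if key in self._values:           # first match wins
            return self._values[key]
        if self._fallback is not None:    # otherwise ask the next layer
            return self._fallback.get_string(key)
        return None                       # toy behavior for a complete miss


f0 = Config({"a": "bar"})
f1 = Config({"b": "biz"})
f2 = Config({"c": "baz"})
f3 = Config({"a": "foo"})
your_config = f0.with_fallback(f1).with_fallback(f2).with_fallback(f3)
```

With this chain, looking up "a" stops at f0, while looking up "c" passes through f0 and f1 before f2 answers.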

HOCON (Human-Optimized Config Object Notation)

This is an informal spec, but hopefully it's clear.

Goals / Background

The primary goal is: keep the semantics (tree structure; set of types; encoding/escaping) from JSON (JavaScript Object Notation), but make it more convenient as a human-editable config file format.

The following features are desirable, to support human usage:

  • less noisy / less pedantic syntax
  • ability to refer to another part of the configuration (set a value to another value)
  • import/include another configuration file into the current file
  • a mapping to a flat properties list such as Java's system properties
  • ability to get values from environment variables
  • ability to write comments

Implementation-wise, the format should have these properties:

  • a JSON superset, that is, all valid JSON should be valid and should result in the same in-memory data that a JSON parser would have produced.
  • be deterministic; the format is flexible, but it is not heuristic. It should be clear what's invalid and invalid files should generate errors.
  • require minimal look-ahead; should be able to tokenize the file by looking at only the next three characters. (right now, the only reason to look at three is to find "//" comments; otherwise you can parse looking at two.)

HOCON is significantly harder to specify and to parse than JSON. Think of it as moving the work from the person maintaining the config file to the computer program.

Definitions

  • key is a string JSON would have to the left of : and a value is anything JSON would have to the right of :. i.e. the two halves of an object field.
  • value is any "value" as defined in the JSON spec, plus unquoted strings and substitutions as defined in this spec.
  • simple value is any value excluding an object or array value.
  • field is a key, any separator such as ':', and a value.
  • references to a file ("the file being parsed") can be understood to mean any byte stream being parsed, not just literal files in a filesystem.

Syntax

Much of this is defined with reference to JSON; you can find the JSON spec at http://json.org/ of course.

Unchanged from JSON

  • files must be valid UTF-8
  • quoted strings are in the same format as JSON strings
  • values have possible types: string, number, object, array, boolean, null
  • allowed number formats match JSON; as in JSON, some possible floating-point values are not represented, such as NaN

Comments

Anything between // or # and the next newline is considered a comment and ignored, unless the // or # is inside a quoted string.
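A minimal sketch of this rule for a single line, written in Python. The function name strip_comment is hypothetical, and the sketch assumes JSON-style quoted strings only; triple-quoted multi-line strings are not handled.

```python
def strip_comment(line):
    """Return `line` without a trailing // or # comment.

    Comment markers inside a double-quoted string are left alone.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2              # skip the escaped character
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#" or line.startswith("//", i):
            return line[:i]         # comment starts here
        i += 1
    return line
```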

Omit root braces

JSON documents must have an array or object at the root. Empty files are invalid documents, as are files containing only a non-array non-object value such as a string.

In HOCON, if the file does not begin with a square bracket or curly brace, it is parsed as if it were enclosed with {} curly braces.

A HOCON file is invalid if it omits the opening { but still has a closing }; the curly braces must be balanced.

Key-value separator

The = character can be used anywhere JSON allows :, i.e. to separate keys from values.

If a key is followed by {, the : or = may be omitted. So "foo" {} means "foo" : {}

Commas

Values in arrays, and fields in objects, need not have a comma between them as long as they have at least one ASCII newline (\n, decimal value 10) between them.

The last element in an array or last field in an object may be followed by a single comma. This extra comma is ignored.

  • [1,2,3,] and [1,2,3] are the same array.
  • [1\n2\n3] and [1,2,3] are the same array.
  • [1,2,3,,] is invalid because it has two trailing commas.
  • [,1,2,3] is invalid because it has an initial comma.
  • [1,,2,3] is invalid because it has two commas in a row.
  • these same comma rules apply to fields in objects.

Whitespace

The JSON spec simply says "whitespace"; in HOCON whitespace is defined as follows:

  • any Unicode space separator (Zs category), line separator (Zl category), or paragraph separator (Zp category), including nonbreaking spaces (such as 0x00A0, 0x2007, and 0x202F). The BOM (0xFEFF) must also be treated as whitespace.
  • tab ('\t' 0x0009), newline ('\n' 0x000A), vertical tab ('\v' 0x000B), form feed ('\f' 0x000C), carriage return ('\r' 0x000D), file separator (0x001C), group separator (0x001D), record separator (0x001E), unit separator (0x001F).

In Java, the isWhitespace() method covers these characters with the exception of nonbreaking spaces and the BOM.

While all Unicode separators should be treated as whitespace, in this spec "newline" refers only and specifically to ASCII newline 0x000A.

Duplicate keys and object merging

The JSON spec does not clarify how duplicate keys in the same object should be handled. In HOCON, duplicate keys that appear later override those that appear earlier, unless both values are objects. If both values are objects, then the objects are merged.

Note: this would make HOCON a non-superset of JSON if you assume that JSON requires duplicate keys to have a behavior. The assumption here is that duplicate keys are invalid JSON.

To merge objects:

  • add fields present in only one of the two objects to the merged object.
  • for non-object-valued fields present in both objects, the field found in the second object must be used.
  • for object-valued fields present in both objects, the object values should be recursively merged according to these same rules.

Object merge can be prevented by setting the key to another value first. This is because merging is always done two values at a time; if you set a key to an object, a non-object, then an object, first the non-object falls back to the object (non-object always wins), and then the object falls back to the non-object (no merging, object is the new value). So the two objects never see each other.

These two are equivalent:

{
   "foo" : { "a" : 42 },
   "foo" : { "b" : 43 }
}

{
   "foo" : { "a" : 42, "b" : 43 }
}

And these two are equivalent:

{
   "foo" : { "a" : 42 },
   "foo" : null,
   "foo" : { "b" : 43 }
}

{
   "foo" : { "b" : 43 }
}

The intermediate setting of "foo" to null prevents the object merge.
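The merge rules and both equivalences can be checked with a short Python sketch. The names merge and build_object are hypothetical; objects are modeled as dicts, null as None, and the fields of one object are fed in as (key, value) pairs in file order.

```python
def merge(first, second):
    """Merge two objects; `second` wins for non-object fields."""
    merged = dict(first)
    for key, value in second.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # both sides are objects: merge recursively
            merged[key] = merge(merged[key], value)
        else:
            # otherwise the later value simply replaces the earlier one
            merged[key] = value
    return merged


def build_object(fields):
    """Apply (key, value) fields in file order, honoring duplicate keys."""
    result = {}
    for key, value in fields:
        result = merge(result, {key: value})
    return result
```

Since merging always happens two values at a time, the null in the second example replaces the first object before the last object arrives, so the two objects never meet.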

Unquoted strings

A sequence of characters outside of a quoted string is a string value if:

  • it does not contain "forbidden characters": "$", '"', '{', '}', '[', ']', ':', '=', ',', '+', '#', '`', '^', '?', '!', '@', '*', '&', '\' (backslash), or whitespace.
  • it does not contain the two-character string "//" (which starts a comment)
  • its initial characters do not parse as true, false, null, or a number.

Unquoted strings are used literally, they do not support any kind of escaping. Quoted strings may always be used as an alternative when you need to write a character that is not permitted in an unquoted string.

truefoo parses as the boolean token true followed by the unquoted string foo. However, footrue parses as the unquoted string footrue. Similarly, 10.0bar is the number 10.0 then the unquoted string bar but bar10.0 is the unquoted string bar10.0. (In practice, this distinction doesn't matter much because of value concatenation; see later section.)

In general, once an unquoted string begins, it continues until a forbidden character or the two-character string "//" is encountered. Embedded (non-initial) booleans, nulls, and numbers are not recognized as such, they are part of the string.

An unquoted string may not begin with the digits 0-9 or with a hyphen (-, 0x002D) because those are valid characters to begin a JSON number. The initial number character, plus any valid-in-JSON number characters that follow it, must be parsed as a number value. Again, these characters are not special inside an unquoted string; they only trigger number parsing if they appear initially.

Note that quoted JSON strings may not contain control characters (control characters include some whitespace characters, such as newline). This rule is from the JSON spec. However, unquoted strings have no restriction on control characters, other than the ones listed as "forbidden characters" above.

Some of the "forbidden characters" are forbidden because they already have meaning in JSON or HOCON, others are essentially reserved keywords to allow future extensions to this spec.

Multi-line strings

Multi-line strings are similar to Python or Scala, using triple quotes. If the three-character sequence """ appears, then all Unicode characters until a closing """ sequence are used unmodified to create a string value. Newlines and whitespace receive no special treatment. Unlike Scala, and unlike JSON quoted strings, Unicode escapes are not interpreted in triple-quoted strings.

In Python, """foo"""" is a syntax error (a triple-quoted string followed by a dangling unbalanced quote). In Scala, it is a four-character string foo". HOCON works like Scala; any sequence of at least three quotes ends the multi-line string, and any "extra" quotes are part of the string.

Value concatenation

The value of an object field or array element may consist of multiple values which are combined. There are three kinds of value concatenation:

  • if all the values are simple values (neither objects nor arrays), they are concatenated into a string.
  • if all the values are arrays, they are concatenated into one array.
  • if all the values are objects, they are merged (as with duplicate keys) into one object.

String value concatenation is allowed in field keys, in addition to field values and array elements. Objects and arrays do not make sense as field keys.

Note: Akka 2.0 (and thus Play 2.0) contains an embedded implementation of the config lib which does not support array and object value concatenation; it only supports string value concatenation.

String value concatenation

String value concatenation is the trick that makes unquoted strings work; it also supports substitutions (${foo} syntax) in strings.

Only simple values participate in string value concatenation. Recall that a simple value is any value other than arrays and objects.

As long as simple values are separated only by non-newline whitespace, the whitespace between them is preserved and the values, along with the whitespace, are concatenated into a string.

String value concatenations never span a newline, or a character that is not part of a simple value.

A string value concatenation may appear in any place that a string may appear, including object keys, object values, and array elements.

Whenever a value would appear in JSON, a HOCON parser instead collects multiple values (including the whitespace between them) and concatenates those values into a string.

Whitespace before the first and after the last simple value must be discarded. Only whitespace between simple values must be preserved.

So for example foo bar baz parses as three unquoted strings, and the three are value-concatenated into one string. The inner whitespace is kept and the leading and trailing whitespace is trimmed. The equivalent string, written in quoted form, would be "foo bar baz".

Value concatenating foo bar (two unquoted strings with whitespace) and quoted string "foo bar" would result in the same in-memory representation, seven characters.

For purposes of string value concatenation, non-string values are converted to strings as follows (strings shown as quoted strings):

  • true and false become the strings "true" and "false".
  • null becomes the string "null".
  • quoted and unquoted strings are themselves.
  • numbers should be kept as they were originally written in the file. For example, if you parse 1e5 then you might render it alternatively as 1E5 with capital E, or just 100000. For purposes of value concatenation, it should be rendered as it was written in the file.
  • a substitution is replaced with its value which is then converted to a string as above.
  • it is invalid for arrays or objects to appear in a string value concatenation.

A single value is never converted to a string. That is, it would be wrong to value concatenate true by itself; that should be parsed as a boolean-typed value. Only true foo (true with another simple value on the same line) should be parsed as a value concatenation and converted to a string.
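The difference between a single value and a concatenation can be sketched as follows in Python. This is a simplified model, not a HOCON parser: the names parse_single and parse_value are hypothetical, and the input is assumed to hold only unquoted tokens on one line (no quoted strings, substitutions, arrays or objects).

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction/exponent
_NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")


def parse_single(token):
    """A lone simple value keeps its type."""
    if token == "true":
        return True
    if token == "false":
        return False
    if token == "null":
        return None
    match = _NUMBER.fullmatch(token)
    if match:
        is_integer = match.group(2) is None and match.group(3) is None
        return int(token) if is_integer else float(token)
    return token


def parse_value(text):
    """Trim outer whitespace; concatenate several simple values as written."""
    tokens = re.split(r"([ \t]+)", text.strip(" \t"))
    if len(tokens) == 1:
        return parse_single(tokens[0])
    # more than one value: keep inner whitespace and the original spelling
    return "".join(tokens)
```

Joining the raw tokens is what keeps a number such as 1e5 rendered exactly as it was written once it takes part in a concatenation.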

Array and object concatenation

Arrays can be concatenated with arrays, and objects with objects, but it is an error if they are mixed.

For purposes of concatenation, "array" also means "substitution that resolves to an array" and "object" also means "substitution that resolves to an object."

Within a field value or array element, if only non-newline whitespace separates the end of a first array or object or substitution from the start of a second array or object or substitution, the two values are concatenated. Newlines may occur within the array or object, but not between them. Newlines between prevent concatenation.

For objects, "concatenation" means "merging", so the second object overrides the first.

Arrays and objects cannot be field keys, whether concatenation is involved or not.

Here are several ways to define a to the same object value:

// one object
a : { b : 1, c : 2 }

// two objects that are merged via concatenation rules
a : { b : 1 } { c : 2 }

// two fields that are merged
a : { b : 1 }
a : { c : 2 }

Here are several ways to define a to the same array value:

// one array
a : [ 1, 2, 3, 4 ]

// two arrays that are concatenated
a : [ 1, 2 ] [ 3, 4 ]

// a later definition referring to an earlier
// (see "self-referential substitutions" below)
a : [ 1, 2 ]
a : ${a} [ 3, 4 ]

A common use of object concatenation is "inheritance":

data-center-generic = { cluster-size = 6 }

data-center-east = ${data-center-generic} { name = "east" }

A common use of array concatenation is to add to paths:

path = [ /bin ]

path = ${path} [ /usr/bin ]

Note: Arrays without commas or newlines

Arrays allow you to use newlines instead of commas, but not whitespace instead of commas. Non-newline whitespace will produce concatenation rather than separate elements.

// this is an array with one element, the string "1 2 3 4"
[ 1 2 3 4 ]

// this is an array of four integers
[ 1
  2
  3
  4 ]

// an array of one element, the array [ 1, 2, 3, 4 ]
[ [ 1, 2 ] [ 3, 4 ] ]

// an array of two arrays
[ [ 1, 2 ]
  [ 3, 4 ] ]

If this gets confusing, just use commas. The concatenation behavior is useful rather than surprising in cases like:

[ This is an unquoted string my name is ${name}, Hello ${world} ]

[ ${a} ${b}, ${x} ${y} ]

Non-newline whitespace is never an element or field separator.

Path expressions

Path expressions are used to write out a path through the object graph. They appear in two places; in substitutions, like ${foo.bar}, and as the keys in objects like { foo.bar : 42 }.

Path expressions are syntactically identical to a value concatenation, except that they may not contain substitutions. This means that you can't nest substitutions inside other substitutions, and you can't have substitutions in keys.

When concatenating the path expression, any . characters outside quoted strings are understood as path separators, while inside quoted strings . has no special meaning. So foo.bar."hello.world" would be a path with three elements, looking up key foo, key bar, then key hello.world.

The main tricky point is that . characters in numbers do count as a path separator. When dealing with a number as part of a path expression, it's essential to retain the original string representation of the number as it appeared in the file (rather than converting it back to a string with a generic number-to-string library function).

  • 10.0foo is a number then unquoted string foo and should be the two-element path with 10 and 0foo as the elements.
  • foo10.0 is an unquoted string with a . in it, so this would be a two-element path with foo10 and 0 as the elements.
  • foo"10.0" is an unquoted then a quoted string which are concatenated, so this is a single-element path.
  • 1.2.3 is the three-element path with 1,2,3

Unlike value concatenations, path expressions are always converted to a string, even if they are just a single value.

If you have an array or element value consisting of the single value true, it's a value concatenation and retains its character as a boolean value.

If you have a path expression (in a key or substitution) then it must always be converted to a string, so true becomes the string that would be quoted as "true".

If a path element is an empty string, it must always be quoted. That is, a."".b is a valid path with three elements, and the middle element is an empty string. But a..b is invalid and should generate an error. Following the same rule, a path that starts or ends with a . is invalid and should generate an error.
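The splitting rules for path expressions can be illustrated with a small Python sketch. The function name split_path is hypothetical, and the sketch assumes the path is given as its raw text with no whitespace and no escape sequences inside quoted parts.

```python
def split_path(path):
    """Split a path expression on '.' characters outside quoted strings."""
    elements = []
    current = []
    quoted_part = False   # element contained a quoted section (may be empty)
    in_quotes = False
    for ch in path:
        if ch == '"':
            in_quotes = not in_quotes
            quoted_part = True
        elif ch == "." and not in_quotes:
            if not current and not quoted_part:
                raise ValueError(f"empty path element in {path!r}")
            elements.append("".join(current))
            current, quoted_part = [], False
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError(f"unterminated quote in {path!r}")
    if not current and not quoted_part:
        raise ValueError(f"empty path element in {path!r}")
    elements.append("".join(current))
    return elements
```

Working on the raw text is what makes 10.0foo split into 10 and 0foo: the number is never converted and re-rendered, so its . is seen as an ordinary separator.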

Paths as keys

If a key is a path expression with multiple elements, it is expanded to create an object for each path element other than the last. The last path element, combined with the value, becomes a field in the most-nested object.

In other words:

foo.bar : 42

is equivalent to:

foo { bar : 42 }

and:

foo.bar.baz : 42

is equivalent to:

foo { bar { baz : 42 } }

and so on. These values are merged in the usual way; which implies that:

a.x : 42, a.y : 43

is equivalent to:

a { x : 42, y : 43 }

Because path expressions work like value concatenations, you can have whitespace in keys:

a b c : 42

is equivalent to:

"a b c" : 42

Because path expressions are always converted to strings, even single values that would normally have another type become strings.

  • true : 42 is "true" : 42
  • 3 : 42 is "3" : 42
  • 3.14 : 42 is "3" : { "14" : 42 }
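The expansion of path keys into nested objects, followed by the usual merge, can be sketched in Python as follows. The names expand_key and build_from_paths are hypothetical, and keys are assumed to contain no quoted elements, so a plain split on . is enough for this illustration.

```python
def merge(first, second):
    """Merge two objects; `second` wins for non-object fields."""
    merged = dict(first)
    for key, value in second.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def expand_key(key, value):
    """Turn the key 'a.b.c' with value v into {'a': {'b': {'c': v}}}."""
    nested = value
    for element in reversed(key.split(".")):   # path elements are strings
        nested = {element: nested}
    return nested


def build_from_paths(fields):
    """Expand each (key, value) field and merge the results in file order."""
    result = {}
    for key, value in fields:
        result = merge(result, expand_key(key, value))
    return result
```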

As a special rule, the unquoted string include may not begin a path expression in a key, because it has a special interpretation (see below).

Substitutions

Substitutions are a way of referring to other parts of the configuration tree.

The syntax is ${pathexpression} or ${?pathexpression} where the pathexpression is a path expression as described above. This path expression has the same syntax that you could use for an object key.

The ? in ${?pathexpression} must not have whitespace before it; the three characters ${? must be exactly like that, grouped together.

For substitutions which are not found in the configuration tree, implementations may try to resolve them by looking at system environment variables or other external sources of configuration. (More detail on environment variables in a later section.)

Substitutions are not parsed inside quoted strings. To get a string containing a substitution, you must use value concatenation with the substitution in the unquoted portion:

key : ${animal.favorite} is my favorite animal

Or you could quote the non-substitution portion:

key : ${animal.favorite}" is my favorite animal"

Substitutions are resolved by looking up the path in the configuration. The path begins with the root configuration object, i.e. it is "absolute" rather than "relative."

Substitution processing is performed as the last parsing step, so a substitution can look forward in the configuration. If a configuration consists of multiple files, it may even end up retrieving a value from another file.

If a key has been specified more than once, the substitution will always evaluate to its latest-assigned value (that is, it will evaluate to the merged object, or the last non-object value that was set, in the entire document being parsed including all included files).

If a configuration sets a value to null then it should not be looked up in the external source. Unfortunately there is no way to "undo" this in a later configuration file; if you have { "HOME" : null } in a root object, then ${HOME} will never look at the environment variable. In other words, there is no equivalent to JavaScript's delete operation.

If a substitution does not match any value present in the configuration and is not resolved by an external source, then it is undefined. An undefined substitution with the ${foo} syntax is invalid and should generate an error.
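The lookup order described above (configuration first, then an external source, with an explicit null blocking the external lookup) might be sketched as follows. This is an illustrative Python model, not an actual implementation: UNDEFINED, lookup and resolve are hypothetical names, HOCON null is modelled as None, and looking up the external source by the dot-joined path is an assumption of the sketch.

```python
UNDEFINED = object()  # sentinel: the path has no value anywhere


def lookup(root, elements):
    """Walk an absolute path from the root object; UNDEFINED if a step is missing."""
    node = root
    for name in elements:
        if not isinstance(node, dict) or name not in node:
            return UNDEFINED
        node = node[name]
    return node


def resolve(root, elements, external, optional=False):
    """Resolve ${path} (optional=False) or ${?path} (optional=True).

    external is any mapping standing in for environment variables
    or another external source.
    """
    value = lookup(root, elements)
    if value is not UNDEFINED:
        # Includes None (HOCON null): an explicit null is a value,
        # so the external source is never consulted.
        return value
    name = ".".join(elements)  # assumption: external keys use the joined path
    if name in external:
        return external[name]
    if optional:
        return UNDEFINED
    raise LookupError("undefined substitution ${" + name + "}")
```

For example, resolve({"HOME": None}, ["HOME"], {"HOME": "/root"}) returns None, because the null in the configuration hides the external value.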

If a substitution with the ${?foo} syntax is undefined:

  • if it is the value of an object field then the field should not be created. If the field would have overridden a previously-set value for the same field, then the previous value remains.
  • if it is an array element then the element should not be added.
  • if it is part of a value concatenation with another string then it should become an empty string; if part of a value concatenation with an object or array it should become an empty object or array.
  • foo : ${?bar} would avoid creating field foo if bar is undefined. foo : ${?bar}${?baz} would also avoid creating the field if both bar and baz are undefined.

Substitutions are only allowed in field values and array elements (value concatenations); they are not allowed in keys or nested inside other substitutions (path expressions).

A substitution is replaced with any value type (number, object, string, array, true, false, null). If the substitution is the only part of a value, then the type is preserved. Otherwise, it is value-concatenated to form a string.

Self-Referential Substitutions

The big picture:

  • substitutions normally "look forward" and use the final value for their path expression
  • when this would create a cycle, when possible the cycle must be broken by looking backward only (thus removing one of the substitutions that's a link in the cycle)

The idea is to allow a new value for a field to be based on the older value:

path : "a:b:c"

path : ${path}":d"

A self-referential field is one which:

  • has a substitution, or value concatenation containing a substitution, as its value
  • where this field value refers to the field being defined, either directly or by referring to one or more other substitutions which eventually point back to the field being defined

Examples of self-referential fields:

  • a : ${a}
  • a : ${a}bc
  • path : ${path} [ /usr/bin ]

Note that an object or array with a substitution inside it is not considered self-referential for this purpose. The self-referential rules do not apply to:

  • a : { b : ${a} }
  • a : [${a}]

These cases are unbreakable cycles that generate an error. (If "looking backward" were allowed for these, something like a={ x : 42, y : ${a.x} } would look backward for a nonexistent a while resolving ${a.x}.)

A possible implementation is:

  • substitutions are resolved by looking up paths in a document. Cycles only arise when the lookup document is an ancestor node of the substitution node.
  • while resolving a potentially self-referential field (any substitution or value concatenation that contains a substitution), remove that field and all fields which override it from the lookup document.

The simplest form of this implementation will report a circular reference as missing; in a : ${a} you would remove a : ${a} while resolving ${a}, leaving an empty document to look up ${a} in. You can give a more helpful error message if, rather than simply removing the field, you leave a marker value describing the cycle. Then generate an error if you return to that marker value during resolution.

Cycles should be treated the same as a missing value when resolving an optional substitution (i.e. the ${?foo} syntax). If ${?foo} refers to itself then it's as if it referred to a nonexistent value.

The += field separator

Fields may have += as a separator rather than : or =. A field with += transforms into a self-referential array concatenation, like this:

a += b

becomes:

a = ${?a} [b]

+= appends an element to a previous array. If the previous value was not an array, an error will result just as it would in the long form a = ${?a} [b]. Note that the previous value is optional (${?a} not ${a}), which allows a += b to be the first mention of a in the file (it is not necessary to have a = [] first).
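A small Python sketch of this desugaring, with objects modelled as dicts and arrays as lists. The function name append_field is hypothetical and chosen only for illustration.

```python
def append_field(obj, key, element):
    """Apply `key += element`, modelled as `key = ${?key} [element]`."""
    # ${?key}: an undefined previous value simply contributes nothing.
    previous = obj.get(key, [])
    if not isinstance(previous, list):
        # Same failure as concatenating a non-array with [element].
        raise TypeError("cannot append to non-array value of %r" % key)
    result = dict(obj)
    result[key] = previous + [element]
    return result


config = {}
config = append_field(config, "a", "b")  # first mention of a
config = append_field(config, "a", "c")
print(config)  # {'a': ['b', 'c']}
```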

Note: Akka 2.0 (and thus Play 2.0) contains an embedded implementation of the config lib which does not support +=.

Examples of Self-Referential Substitutions

In isolation (with no merges involved), a self-referential field is an error because the substitution cannot be resolved:

foo : ${foo} // an error

When foo : ${foo} is merged with an earlier value for foo, however, the substitution can be resolved to that earlier value. When merging two objects, the self-reference in the overriding field refers to the overridden field. Say you have:

foo : { a : 1 }

and then:

foo : ${foo}

Then ${foo} resolves to { a : 1 }, the value of the overridden field.

It would be an error if these two fields were reversed, so first:

foo : ${foo}

and then second:

foo : { a : 1 }

Here the ${foo} self-reference comes before foo has a value, so it is undefined, exactly as if the substitution referenced a path not found in the document.

Because foo : ${foo} conceptually looks to previous definitions of foo for a value, the error should be treated as "undefined" rather than "intractable cycle"; as a result, the optional substitution syntax ${?foo} does not create a cycle:

foo : ${?foo} // this field just disappears silently

If a substitution is hidden by a value that could not be merged with it (by a non-object value) then it is never evaluated and no error will be reported. So for example:

foo : ${does-not-exist}

foo : 42

In this case, no matter what ${does-not-exist} resolves to, we know foo is 42, so ${does-not-exist} is never evaluated and there is no error. The same is true for cycles like foo : ${foo}, foo : 42, where the initial self-reference must simply be ignored.
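The "look back" behaviour in these examples can be modelled with a short Python sketch. It is a simplification under stated assumptions: a field is represented as the list of values assigned to it in file order, the marker SELF stands for a value that is exactly a self-reference such as ${foo}, and objects are merged shallowly. The names SELF and resolve_field are hypothetical.

```python
SELF = object()  # stands for `foo : ${foo}` among the definitions of foo


def resolve_field(definitions, count=None):
    """Value of a field after its first `count` definitions (all by default).

    Definitions are evaluated lazily from the latest one backward, so a
    self-reference hidden by a later non-object value is never evaluated.
    """
    n = len(definitions) if count is None else count
    if n == 0:
        raise LookupError("self-reference has no earlier value: undefined")
    latest = definitions[n - 1]
    if latest is SELF:
        # The self-reference resolves to the overridden ("below") value.
        return resolve_field(definitions, n - 1)
    if isinstance(latest, dict) and n > 1:
        # An object merges with what it overrides, which must be resolved.
        below = resolve_field(definitions, n - 1)
        if isinstance(below, dict):
            return {**below, **latest}  # shallow merge, enough for this sketch
    return latest


print(resolve_field([{"a": 1}, SELF]))  # {'a': 1}
print(resolve_field([SELF, 42]))        # 42
```

In the same model, resolve_field([SELF, {"a": 1}]) raises an error, because the object has to be merged with an earlier self-reference that has no value to look back to.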

A self-reference resolves to the value "below" even if it's part of a path expression. So for example:

foo : { a : { c : 1 } }

foo : ${foo.a}

foo : { a : 2 }

Here, ${foo.a} would refer to { c : 1 } rather than 2 and so the final merge would be { a : 2, c : 1 }.

Recall that for a field to be self-referential, it must have a substitution or value concatenation as its value. If a field has an object or array value, for example, then it is not self-referential even if there is a reference to the field itself inside that object or array.

Implementations must be careful to allow objects to refer to paths within themselves, for example:

bar : { foo : 42,
       baz : ${bar.foo}
     }

Here, if an implementation resolved all substitutions in bar as part of resolving the substitution ${bar.foo}, there would be a cycle. The implementation must only resolve the foo field in bar, rather than recursing the entire bar object.

Because there is no inherent cycle here, the substitution must "look forward" (including looking at the field currently being defined). To make this clearer, bar.baz would be 43 in:

bar : { foo : 42,
       baz : ${bar.foo}
     }

bar : { foo : 43 }

Mutually-referring objects should also work, and are not self-referential (so they look forward):

// bar.a should end up as 4

bar : { a : ${foo.d}, b : 1 }

bar.b = 3

// foo.c should end up as 3

foo : { c : ${bar.b}, d : 2 }

foo.d = 4

Another tricky case is an optional self-reference in a value concatenation. In this example, a should be foo, not foofoo, because the self-reference has to "look back" to an undefined a:

a = ${?a}foo

In general, in resolving a substitution the implementation must:

  • lazy-evaluate the substitution target so there's no "circularity by side effect"
  • "look forward" and use the final value for the path specified in the substitution
  • if a cycle results, the implementation must "look back" in the merge stack to try to resolve the cycle
  • if neither lazy evaluation nor "looking only backward" resolves a cycle, the substitution is missing, which is an error unless the ${?foo} optional-substitution syntax was used.

For example, this is not possible to resolve:

bar : ${foo}

foo : ${bar}

A multi-step loop like this should also be detected as invalid:

a : ${b}

b : ${c}

c : ${a}
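One way to detect such loops is to track which fields are currently being resolved. The Python sketch below is a deliberately reduced model: every field is a top-level key defined once, and its value is either a plain value or a reference to another key. The names Ref and resolve_all are hypothetical.

```python
class Ref:
    """A value that is exactly a substitution such as ${name}."""

    def __init__(self, name):
        self.name = name


def resolve_all(fields):
    """Resolve every reference in a flat mapping, rejecting cycles."""
    resolved = {}

    def resolve(name, in_progress):
        if name in resolved:
            return resolved[name]
        if name in in_progress:
            # We came back to a field we are still resolving: a cycle.
            chain = " -> ".join(in_progress + [name])
            raise ValueError("substitution cycle: " + chain)
        if name not in fields:
            raise LookupError("undefined substitution: " + name)
        value = fields[name]
        if isinstance(value, Ref):
            value = resolve(value.name, in_progress + [name])
        resolved[name] = value
        return value

    return {name: resolve(name, []) for name in fields}
```

Calling resolve_all({"a": Ref("b"), "b": Ref("c"), "c": Ref("a")}) raises a ValueError describing the loop, while a chain that ends in a plain value resolves normally.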

Some cases have undefined behavior because the behavior depends on the order in which two fields are resolved, and that order is not defined. For example:

a : 1

b : 2

a : ${b}

b : ${a}

Implementations are allowed to handle this by setting both a and b to 1, setting both to 2, or generating an error. Ideally this situation would generate an error, but that may be difficult to implement. Making the behavior defined would require always working with ordered maps rather than unordered maps, which is too constraining. Implementations only have to track order for duplicate instances of the same field (i.e. merges).

MIME Type

Use "application/hocon" for Content-Type.

API Recommendations

Implementations of HOCON ideally follow certain conventions and work in a predictable way.

Automatic type conversions

If an application asks for a value with a particular type, the implementation should attempt to convert types as follows:

  • number to string: convert the number into a string representation that would be a valid number in JSON.
  • boolean to string: should become the string "true" or "false"
  • string to number: parse the number with the JSON rules
  • string to boolean: the strings "true", "yes", "on", "false", "no", "off" should be converted to boolean values. It's tempting to support a long list of other ways to write a boolean, but for interoperability and keeping it simple, it's recommended to stick to these six.
  • string to null: the string "null" should be converted to a null value if the application specifically asks for a null value, though there's probably no reason an app would do this.
  • numerically-indexed object to array: see the section "Conversion of numerically-indexed objects to arrays" above

The following type conversions should NOT be performed:

  • null to anything: If the application asks for a specific type and finds null instead, that should usually result in an error.
  • object to anything
  • array to anything
  • anything to object
  • anything to array, with the exception of numerically-indexed object to array

Converting objects and arrays to and from strings is tempting, but in practical situations raises thorny issues of quoting and double-escaping.
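The scalar conversions listed above could look roughly like this in Python. This is a hedged sketch rather than any library's API: as_boolean, as_number and as_string are hypothetical accessor names, and matching the six boolean strings case-sensitively is an assumption of the sketch.

```python
import json

_TRUE_STRINGS = {"true", "yes", "on"}
_FALSE_STRINGS = {"false", "no", "off"}


def _reject_constant(name):
    # json.loads would otherwise accept NaN and Infinity, which are not JSON.
    raise ValueError("not a JSON number: " + name)


def as_boolean(value):
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        if value in _TRUE_STRINGS:
            return True
        if value in _FALSE_STRINGS:
            return False
    raise TypeError("cannot convert %r to boolean" % (value,))


def as_number(value):
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value, parse_constant=_reject_constant)
        except ValueError:
            parsed = None
        if isinstance(parsed, (int, float)) and not isinstance(parsed, bool):
            return parsed
    raise TypeError("cannot convert %r to number" % (value,))


def as_string(value):
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return json.dumps(value, allow_nan=False)  # a valid JSON number
    if isinstance(value, str):
        return value
    # null, objects and arrays are deliberately not converted.
    raise TypeError("cannot convert %r to string" % (value,))
```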

Units format

Implementations may wish to support interpreting a value with some family of units, such as time units or memory size units: 10ms or 512K. HOCON does not have an extensible type system and there is no way to add a "duration" type. However, for example, if an application asks for milliseconds, the implementation can try to interpret a value as a milliseconds value.

If an API supports this, for each family of units it should define a default unit in the family. For example, the family of duration units might default to milliseconds (see below for details on durations). The implementation should then interpret values as follows:

  • if the value is a number, it is taken to be a number in the default unit.
  • if the value is a string, it is taken to be this sequence:
    • optional whitespace
    • a number
    • optional whitespace
    • an optional unit name consisting only of letters (letters are the Unicode L* categories, Java isLetter())
    • optional whitespace

If a string value has no unit name, then it should be interpreted with the default unit, as if it were a number. If a string value has a unit name, that name of course specifies the value's interpretation.

Duration format

Implementations may wish to support a getMilliseconds() method (and similar methods for other time units).

This can use the general "units format" described above; bare numbers are taken to be in milliseconds already, while strings are parsed as a number plus an optional unit string.

The supported unit strings for duration are case sensitive and must be lowercase. Exactly these strings are supported:

  • ns, nanosecond, nanoseconds
  • us, microsecond, microseconds
  • ms, millisecond, milliseconds
  • s, second, seconds
  • m, minute, minutes
  • h, hour, hours
  • d, day, days
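A minimal Python sketch of such a parser follows. The name parse_duration_ms is hypothetical, and the sketch assumes a plain decimal number and restricts unit names to ASCII letters, which is sufficient for the unit strings listed above.

```python
import re

# Nanoseconds per unit, for exactly the unit strings listed above.
_UNIT_NS = {}
for _names, _ns in [
    (("ns", "nanosecond", "nanoseconds"), 1),
    (("us", "microsecond", "microseconds"), 10**3),
    (("ms", "millisecond", "milliseconds"), 10**6),
    (("s", "second", "seconds"), 10**9),
    (("m", "minute", "minutes"), 60 * 10**9),
    (("h", "hour", "hours"), 3600 * 10**9),
    (("d", "day", "days"), 86400 * 10**9),
]:
    for _name in _names:
        _UNIT_NS[_name] = _ns

# optional whitespace, number, optional whitespace, optional unit, optional whitespace
_PATTERN = re.compile(r"\s*(-?[0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]*)\s*")


def parse_duration_ms(value):
    """Interpret a number or string as a duration in milliseconds."""
    if isinstance(value, bool):
        raise TypeError("a boolean is not a duration")
    if isinstance(value, (int, float)):
        return float(value)  # bare numbers are already milliseconds
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("not a duration: %r" % value)
    number, unit = match.groups()
    if unit == "":
        return float(number)  # no unit name: default unit
    if unit not in _UNIT_NS:  # case sensitive, so "MS" is rejected
        raise ValueError("unknown duration unit: %r" % unit)
    return float(number) * _UNIT_NS[unit] / 10**6
```

For example, parse_duration_ms("10ms") and parse_duration_ms(10) both give 10.0, and parse_duration_ms("2 s") gives 2000.0.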

Size in bytes format

Implementations may wish to support a getBytes() returning a size in bytes.

This can use the general "units format" described above; bare numbers are taken to be in bytes already, while strings are parsed as a number plus an optional unit string.

The one-letter unit strings may be uppercase (note: duration units are always lowercase, so this convention is specific to size units).

There is an unfortunate nightmare with size-in-bytes units: they may be in powers of two or powers of ten. The approach defined by standards bodies appears to differ from common usage, such that following the standard leads to people being confused. Worse, common usage varies based on whether people are talking about RAM or disk sizes, and various existing operating systems and apps do all kinds of different things. See http://en.wikipedia.org/wiki/Binary_prefix#Deviation_between_powers_of_1024_and_powers_of_1000 for examples. It appears impossible to sort this out without causing confusion for someone sometime.

For single bytes, exactly these strings are supported:

  • B, b, byte, bytes

For powers of ten, exactly these strings are supported:

  • kB, kilobyte, kilobytes
  • MB, megabyte, megabytes
  • GB, gigabyte, gigabytes
  • TB, terabyte, terabytes
  • PB, petabyte, petabytes
  • EB, exabyte, exabytes
  • ZB, zettabyte, zettabytes
  • YB, yottabyte, yottabytes

For powers of two, exactly these strings are supported:

  • K, k, Ki, KiB, kibibyte, kibibytes
  • M, m, Mi, MiB, mebibyte, mebibytes
  • G, g, Gi, GiB, gibibyte, gibibytes
  • T, t, Ti, TiB, tebibyte, tebibytes
  • P, p, Pi, PiB, pebibyte, pebibytes
  • E, e, Ei, EiB, exbibyte, exbibytes
  • Z, z, Zi, ZiB, zebibyte, zebibytes
  • Y, y, Yi, YiB, yobibyte, yobibytes

It's very unclear which units the single-character abbreviations ("128K") should go with; some precedents such as java -Xmx 2G and the GNU tools such as ls map these to powers of two, so this spec copies that. You can certainly find examples of mapping these to powers of ten, though. If you don't like ambiguity, don't use the single-letter abbreviations.
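The unit tables above can be generated rather than written out by hand. The following Python sketch does so; parse_size_bytes is a hypothetical name, and the sketch assumes whole-number values and ASCII unit names only.

```python
import re

# (single letter, powers-of-ten name, powers-of-two name), by increasing magnitude
_PREFIXES = [
    ("K", "kilo", "kibi"), ("M", "mega", "mebi"), ("G", "giga", "gibi"),
    ("T", "tera", "tebi"), ("P", "peta", "pebi"), ("E", "exa", "exbi"),
    ("Z", "zetta", "zebi"), ("Y", "yotta", "yobi"),
]

_UNIT_BYTES = {unit: 1 for unit in ("B", "b", "byte", "bytes")}
for _power, (_letter, _dec, _bin) in enumerate(_PREFIXES, start=1):
    # Powers of ten: kB (lowercase k), MB, GB, ... plus the long names.
    _short = "kB" if _letter == "K" else _letter + "B"
    for _unit in (_short, _dec + "byte", _dec + "bytes"):
        _UNIT_BYTES[_unit] = 1000 ** _power
    # Powers of two: K, k, Ki, KiB, kibibyte, kibibytes, and so on.
    for _unit in (_letter, _letter.lower(), _letter + "i", _letter + "iB",
                  _bin + "byte", _bin + "bytes"):
        _UNIT_BYTES[_unit] = 1024 ** _power

_PATTERN = re.compile(r"\s*([0-9]+)\s*([A-Za-z]*)\s*")


def parse_size_bytes(value):
    """Interpret a whole number or a string as a size in bytes."""
    if isinstance(value, bool):
        raise TypeError("a boolean is not a size")
    if isinstance(value, int):
        return value  # bare numbers are already bytes
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("not a size: %r" % value)
    number, unit = match.groups()
    if unit == "":
        return int(number)
    if unit not in _UNIT_BYTES:
        raise ValueError("unknown size unit: %r" % unit)
    return int(number) * _UNIT_BYTES[unit]
```

In this sketch "512K" is 524288 bytes (a power of two), while "128kB" is 128000 bytes (a power of ten).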

Config object merging and file merging

It may be useful to offer a method to merge two objects. If such a method is provided, it should work as if the two objects were duplicate values for the same key in the same file. (See the section earlier on duplicate key handling.)

As with duplicate keys, an intermediate non-object value "hides" earlier object values. So say you merge three objects in this order:

  • { a : { x : 1 } } (first priority)
  • { a : 42 } (fallback)
  • { a : { y : 2 } } (another fallback)

The result would be { a : { x : 1 } }. The two objects are not merged because they are not "adjacent"; the merging is done in pairs, and when 42 is paired with { y : 2 }, 42 simply wins and loses all information about what it overrode.

But if you re-ordered like this:

  • { a : { x : 1 } } (first priority)
  • { a : { y : 2 } } (fallback)
  • { a : 42 } (another fallback)

Now the result would be { a : { x : 1, y : 2 } } because the two objects are adjacent.
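The pairwise behaviour can be reproduced with a short Python sketch in which objects are dicts. The names with_fallback and merge_all are hypothetical and chosen for illustration; folding the list from the lowest-priority end upward is an assumption of the sketch that yields the results described above.

```python
def with_fallback(primary, fallback):
    """Merge two values; primary wins unless both are objects."""
    if isinstance(primary, dict) and isinstance(fallback, dict):
        merged = dict(fallback)
        for key, value in primary.items():
            if key in fallback:
                merged[key] = with_fallback(value, fallback[key])
            else:
                merged[key] = value
        return merged
    # A non-object on either side: primary simply hides the fallback.
    return primary


def merge_all(objects):
    """Merge objects listed from highest to lowest priority, in pairs."""
    result = objects[-1]
    for obj in reversed(objects[:-1]):
        result = with_fallback(obj, result)
    return result


first = merge_all([{"a": {"x": 1}}, {"a": 42}, {"a": {"y": 2}}])
second = merge_all([{"a": {"x": 1}}, {"a": {"y": 2}}, {"a": 42}])
print(first)   # {'a': {'x': 1}}
print(second == {"a": {"x": 1, "y": 2}})  # True
```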

This rule for merging objects loaded from different files is exactly the same behavior as for merging duplicate fields in the same file. All merging works the same way.

Needless to say, normally it's well-defined whether a config setting is supposed to be a number or an object. This kind of weird pathology where the two are mixed should not be happening.

The one place where it matters, though, is that it allows you to "clear" an object and start over by setting it to null and then setting it back to a new object. So this behavior gives people a way to get rid of default fallback values they don't want.

hyphen-sep