Lua Lua Lua Lua Lua

Lua Lanes - multithreading in Lua

Description · Supported systems · Installing · Using

Creation · Communications · Results · Cancelling · Grouping

Misc · Substate · Other

Module makers · Comparisons


Copyright © 2007 Asko Kauppi. All rights reserved.
Lua Lanes is published under the same MIT license as Lua 5.1.

This document was revised on 1-Jun-07, and applies to version 20070601.
Section on self was added 30-Oct-07.


Description

Lua Lanes is a portable, message passing multithreading library providing the possibility to run multiple Lua states in parallel. It is intended to be used for optimizing performance on multicore CPU's s.a. Intel Core Duo™ series.

Lanes is a Lua module, included into your software by the regular require "lanes" method. No C side programming is needed to use Lanes; all APIs are Lua side, and most(1 existing extension modules should work seamlessly together with the multiple lanes (they need to be require'd separately into each lane, though).

Features:

Limitations:


Supported systems

Lua Lanes supports the following operating systems: (3

The underlying threading code can be compiled either towards Win32 API, or PThreads. Unfortunately, thread prioritation under PThreads is a JOKE, requiring OS specific tweaks and guessing undocumented behaviour. Other features should be portable to any platform supporting either Win32 or PThreads.


Installing [TBD]

Lua Lanes is intended to be installed via the Lua Rocks initiative.

  > luarocks search lanes
    ... output listing Lua Lanes is there ...
  > luarocks install lanes
    ... output ...

While Lua Rocks is still "evolving", see README, Makefile and source code for clues on how to install...


Using

Steps to using Lua Lanes:

  1. require "lanes"
    - initializes the Lanes module

  2. preparation
    - gives a function, set of standard libs, globals and options for a lane
    - returns a handle, calling which launches a new thread

  3. launching
    - each launch may have different arguments to that lane's function
    - the code starts executing directly at the launch, on the background

  4. controlling (optional)
    - sub and master can send messages to each other using send() and receive()
    - master can cancel a thread it created at any time

  5. reading results
    - results are provided as the calling handle's :results() method, or alternatively as the [1]..[N] table indices. Results can be read either pending (wait==true), peeking (returns nil if lane is still executing) or with a timeout.

NOTE: the globals given in lane preparation are _not_ common between the lanes. Each lane has its own copy, and cannot see or modify other lanes' globals.

Sample without communications:

  require "lanes"

  local f= lanes.prepare( function(n) return 2*n end )
  local l1= f(1)
  local l2= f(2)

  print( l1[1], l2[1] )

Sample with communications:

  require "lanes"

  local function lane()
    while true do
        local a,b= receive()
        send( b,a )
    end     -- never ends
  end

  local f= lanes.prepare( lane )
  local l1= f()
  local l2= f()

  l1:send( 'a','b' )
  l2:send( 10, 50 )
  print( l1:receive() )     -- b    a
  print( l2:receive() )     -- 50   10

Creation

func= lanes.prepare [libs_str | opt_tbl [...]] ( lane_func )

lane_h= func( ... )

Prepares lane_func to be the main chunk for new lanes. Such function must NOT USE UPVALUES, global functions or anything not contained within the function itself. If needed, one can group multiple functions into an enclosing function, so they become an enclosing chunk.

The 'modifiers' provided between the prepare name and the lane_func are either string, or table. There can be any number of them.

libs_str defines the standard libraries made available to the new Lua state:

nil no standard libraries
"", "base" only base library
"math,os" math.*, os.* and base
"*" all standard libraries

If multiple string modifiers are used, their effect is appended (i.e. prepare "math" "io" being same as prepare "math,io"). You can use this effect to make your own modified prepare-variant, which i.e. has your favourite set of default libraries, and/or options.

The base library (print, assert etc. global names) is always in use if the libs_str parameter is used.

NOTE: Not all root level names are part of the base package: require needs stdlib "package" to be included. On the other hand, coroutine.* is part of base (at least in Lua 5.1).

opt_tbl is a collection of options to control the way lanes are run:

.prio : -2..+2 The priority of the lane(s). -2 is lowest, +2 is highest.

Note that implementation and dependability of priorities may vary by platform. Especially Linux by default is not supporting priorities, due to shortcomings of its kernel 2.6 pthread implementation.

.cancelstep : N / true / false (default) By default, lanes are only cancellable when they explicitly call receive(). With this option, one can set cancellation check to occur every N Lua statements. The value true uses a default value (currently 100).

If you want to cancel your subthreads, either place receive() commands in them, or set a cancelstep. The smaller the value, the faster substates will react, but they'll use more CPU time testing for it. Your needs may vary.

.group : group_h Ties the lanes created into a certain group, allowing them to be waited upon as a batch (see grouping).
.G : globals_tbl Sets the globals table for the launched threads. This can be used for giving them constants, and/or storing a reconcilable state in connection with 'cancel'.

NOTE: The global values of various threads are in no manner connected; modifying one will only affect the particular state.

The function returned by lanes.prepare() is a "generator" for launching any number of lanes. They will share code, options, initial globals, but the particular arguments may vary. Only calling the generator function actually launches a new thread, and provides a handle for controlling it.


Communications

lane_h:send(...)

Sends the substate a message, consisting of any number of streamable parameters. Message sending is asynchronous, and the messages are queued for the receiver. If you want to be sure a certain message has arrived, the substate should send a confirmation message back to the master.

[...]= lane_h:receive( [wait_secs=-1] )

Receives (and optionally waits for) a message from the substate.

wait : timeout in seconds (default: infinite).

See grouping for waiting on multiple lanes.


Results

A lane can be waited upon by reading its results; this will also make sure the lane has stopped execution.

There are two ways to read results, either by the :results() method, or by reading lane handle's [1..N] indices. In addition, also lane groups provide a :results() method to wait for multiple lanes to finish.

[...]= lane_h:results( [wait_secs=-1] )

Returns nil if the lane hasn't finished before timeout. Otherwise pends until the lane finishes, and returns the values.

Results can be read for a cancelled lane, too. This allows getting the last contents of a special state table from the lane, to be put aside for consequtive resurrection of the thread (maybe on some other machine, even!).

[val]= lane_h[int]

This is a secondary way of reading lane return values. Any access to integer indices of the lane handle will act as :results(true) call, making sure the lane has finished and setting return values to lane_h[1..n]. The one requested is returned.


Cancelling

lane_h:cancel( [return_state_bool] )

Cancels the substate execution, asyncronously. Cancellation will take effect the next time substate calls receive() or after executing each N statements (see cancelstep).

If return_state is true, the lane's return value will be nil, tbl. Normally, a cancelled lane gives all nil as results.


Lane grouping

Grouping is the way to implement 1-to-N or N-to-1 communications between lanes. It basically allows treating a group of lanes as a single entity.

group_h= lanes.group()

This call provides an object that can be used in lanes.prepare() to group certain lanes together. There is no limit to the group size, at all.

for lane_obj [, ...] in grp:receive( [wait_secs=-1] ) do end
for lane_obj [, ...] in grp:results( [wait_secs=-1] ) do end

Almost the same :receive() and :results() methods work for the group object, as for individual lanes. Only here they shall be within a for construct, and the first return value identifies the lane.

The wait_secs timeout is applied anew for each loop. In other words, looping exits once no events have happened within wait_secs seconds.

For eternal receive() loops (no timeout), the only exit reason is that all the lanes in the group have finished (and all the messages have been received).

Just to fetch the messages sent so far (or results of already finished lanes), loop with wait == 0.

grp:send( [...] )
grp:sendall( [...] )

Group's :send() method sends the message to one thread of the group. Any one. It prefers idle threads (in "waiting" status) if there is any. Otherwise, it tries to distribute the load evenly.

The :sendall() method sends the exact same message to all unfinished threads of the group.

grp:cancel()

Cancels all threads in the group.


Misc

str= h.state

The current execution state of a substate can be read explicitly, providing one of the following strings:

"pending"substate not started, yet
"running"substate running
"waiting"substate waiting at receive()
"done"substate finished executing (results are ready)
"cancelled"substate received cancellation and finished itself (state is ready)

Substate functions

[...]= receive( [wait_secs=-1] )
= send( ... )

Each substate gets the receive and send functions preset into its globals. These can be used to communicate with the master state.

If the "package" standard library is enabled (see creation), require can be used to link into any Lua extensions modules the master state might already be using. Note that linkage to the modules does not inherit, it needs to be separately set up by each state.

HINT: Since linkage takes time, it is a good idea to use message-passing worker lanes if they would require a lot of modules to run. This way, the initialization only happens once.

lane_h= self( ... )

The self closure exists on every substate created with Lanes, and allows parallel recursive problem solving. Calling self essentially launches a new sublane, with the exact same function chunk and other startup parameters (priority, standard libraries, ...) as the current one, but with different dataset. This programming style is motivated by the Intel Threading Building Blocks C++ library, which uses it extensively.

The self feature is regarded EXPERIMENTAL and remains to be properly tested, and performance monitored.
-AKa 30-Oct-2007


Other functions

tube_h= tube( name_str )
tube_h:send( [...] )
[...]= tube_h:receive( [wait_secs=-1] )
tube_h:detach()

Tubes allow two separate Lua states (even ones not created through Lanes) to communicate 1-on-1 with each other. This can be used i.e. for passing information from a high-priority state run by a system library, into normal Lua.

Initialization of a tube is automatic and asynchronous; it does not matter, which party attaches first. Sent messages will be queued until the second party joins in to read them.

A third party trying to attach to the same tube causes an error, but tubes can be detached to make them available for others.

The tubes system is not an integral part of Lua Lanes, but it can be useful in certain situations (we know of at least one!).

str= serialize( [...] )
[...]= deserialize( str )

The serialization/deserialization functions used in Lua Lanes are exposed to application use, as they may prove to be useful. Their implementation may change in the future, but this can be guaranteed:


Required of module makers

Most Lua extension modules should work unaltered with Lanes.

If the module simply ties C side features to Lua, everything is fine without alterations. The lua_open() entry point will be called separately for each lane, where the module is require'd.

If it, however, also does one-time C side initializations, these should be covered into a one-time-only construct such as below. Note that Lanes serializes require calls, so no two module initializations will happen at the same time (this is why the usage of a static instead of a proper OS side lock is enough).

 int luaopen_module( lua_State *L )
 {
    static char been_here;  /* 0 by ANSI C */
    
    /* Calls to 'require' serialized by Lanes; this is safe.  
    */
    if (!been_here) {
        been_here= 1;
        ... one time initializations ...
    }
    
    ... binding to Lua ...
 }


Comparisons to other threading kits

A short comparison of Lanes with other existing Lua multithreading kits.

(used to be comparison.txt among the sources; once text is precise & accurate, it shall be HTMLified, maybe into a separate page) [TBD]

==================
  Lua coroutines    (by Lua authors)
==================

http://www.lua.org/manual/5.1/manual.html#2.11
http://lua-users.org/wiki/CoroutinesTutorial

Lua coroutines is an integral part of Lua 5 itself. It is listed here, since
it should be the _first_ multitasking mechanism to consider. It can also be
used together with Lanes, or the other mechanisms listed below.

Coroutines are very usable in provider/consumer situations, allowing you to
queue Lua functions on an as-needed dataflow basis with each other.
 
Pros:
    - works with plain Lua (no extensions)
    - works on any platform
    - lightweight (no OS level threading or locking involved)

Cons:
    - co-operative, meaning your code will need to decide, who gets to run
    - does not utilize multiple CPUs/cores

Sample:

    ..TBD: sample code doing the "child" "parent" output here (see below)..


=============
  LuaThread     (by Diego Nehab)
=============

http://www.cs.princeton.edu/~diego/professional/luathread/

LuaThread provides thread creation, mutexes, condition variables, and inter-
thread queues to the Lua scripts. It takes a C-kind of approach, where Lua
globals are shared by the threads running, and need therefore to be guarded
against multithreading conflicts. 

Whether this is exactly what you want, or whether a more loosely implemented
multithreading (s.a. Lanes) would be better, is up to you. One can argue that
a loose implementation is easier for the developer, since no application level
lockings need to be considered.

Pros:
    - no marshalling (data serialization) overhead, since threads share the
      same Lua state

Cons:
    - requires a modified Lua core (not available as a module)
    - application level locking required
    - Lua 5.0 and "before release of 5.1" mentioned (docs not updated for 5.1)

Sample:
<<
  local function flood(output, word)
    while 1 do 
        output:lock()
        io.write(word, ", ")
        output:unlock()
    end
  end

  local output = thread.newmutex()
  thread.newthread(flood, {output, "child"})
  flood(output, "parent")
<<


=============
  Lua Rings     (by Roberto Ierusalimschy & Tomás Guisasola)
=============

http://www.keplerproject.org/rings/

".. library which provides a way to create new Lua states from within Lua. 
It also offers a simple way to communicate between the creator (master) and 
the created (slave) states."

".. master can execute code in any of its slaves but each slave only has
access to its master (or its own slaves)."

Rings offers separate Lua states, but no multithreading. This makes it simple,
but it won't use more than one CPU core. Other differences include:

    - opens all Lua standard libraries for subthreads
      (Lanes opens the needed ones)

    - marshalls numbers, strings, booleans, userdata
      (Lanes marshalls also tables, but not userdata)

    - "remotedostring" allows executing code in the master state
      (Lanes does _not_ allow subthreads to trouble/modify master automatically,
      to allow effective sandboxing. The same can be achieved by sending code 
      over (send/receive) between the threads, but master needs to explicitly
      do this)

    - offers "Stable, a very simple API to manage a shared table at the master
      state"
      (Lanes does not offer any shared data mechanisms, by design. They can be
      implemented using queues (send/receive) and a coroutine at the master
      end)

Pros:
    - "offers Stable, a very simple API to manage a shared table at the master 
    state"

Cons:
    - thread contents defined as strings, not Lua source as such; does not
      give syntax check at file parsing, does not allow syntax highlight
    - Lua 5.0 mentioned (will be upcoming for 5.1 - right?)

Sample:
<<
  require"rings"
  S = rings.new ()

  data = { 12, 13, 14, }
  print (S:dostring ([[
  aux = {}
  for i, v in ipairs (arg) do
	table.insert (aux, 1, v)
  end
  return unpack (aux)]], unpack (data))) -- true, 14, 13, 12

  S:close ()
<<


==========================
  Helper Threads Toolkit    (by Javier Guerra G.)
==========================

http://luaforge.net/projects/helper-threads/

"Provides a consistent framework to write non-blocking C libraries, with a Lua
interface for starting tasks and managing the Futures, Queues and Threads."

This seems like a companion of the "occasional threading" model (see below);
Lua side is kept clear of multithreading, while C side can be "spawn" off to
do things on the background.

Pros:
    - provides an "update" mechanism, allowing the (C) thread and controlling
      Lua to interact during execution of the thread
    - ...

Cons:
    - thread is "only for C code and it can't touch or access the Lua state",
      in other words there is no Lua-side multithreading concept (by design)


========================
  Occasional threading      (by Russ Cox)
========================

http://lua-users.org/lists/lua-l/2006-11/msg00368.html

".. able to have certain C calls run in parallel with the [Lua] VM, but
otherwise keep the VM single-threaded."

That pretty much says it all.

Pros:
    - simple, yet providing for the "occasional" need to run really multithreaded
    - can be made into a loadable module (says the message)

Cons:
    - only for occasional usage, the programming paradigm is still essentially
      singlethreaded (by definition)
    - not a real project, just a message on the Lua list (but a good one!)


=============
  Lua Lanes
=============

Lua Lanes takes a loose approach, keeping each created Lua state completely
separate from the others. Parameters, return values and globals (optionally)
are passed transparently marshalled (serialized) between the threads.

Pros:
    - regular Lua 5.1 module
    - completely separate Lua states, one per thread
    - very easy programming model, no application level locking
    - scales well, up to 1000's of threads (and any number of CPU cores)
    - provides priorities (-2..+2) for launched threads
    - asynchronous queues between M->S and S->M
    - threads are cancellable (with complete cleanup)
    - timeouts on all pending operations
    - thread contents are given as regular (Master side) Lua functions;
      syntax checked early on, syntax highlighting works (thread contents
      can also be given in separate source files)
    - standard libraries opened to the subthreads can be granually decided
    - automatic storage of a cancelled thread's globals, for "resurrectable"
      threads
    - serializes calls to 'require', allowing wide compatibility with existing
      modules (and all, with minor changes)

Cons:
    - marshalling data exchange may be a bottleneck for huge amounts, in
      which case use of filesystem for the bulk data may be better
    - userdata and functions not marshallable in send/receive
    - no shared access to data (can be implemented on top of the provided features)
    - ...

For feedback, questions and suggestions:

1) Extension modules get loaded only once, but they will get initialized multiple times, once per each Lua state. If the module C code uses global and/or static variables during its initialization, it may not work with Lanes. The solution is to use Thread Local Storage, or avoid the globals some other way.

2) You can use a separate message passing module s.a. D-BUS, or tubes for that.

3) More information about supported systems -> README, NOTES and lanes.c