SMF is the Solaris/OpenSolaris/Illumos Service Management Facility. It starts, monitors and restarts your services. It does roughly what init, upstart and systemd do, but it is very clearly focused on starting processes, keeping them up and stopping them. It comes with quite a few small tools that you need at least a basic understanding of. Since it took me quite some time to build my own start scripts, I feel it's my duty to explain the basic concepts.
Let’s start with the bad news: your service definitions are XML. Phew, that sucks. But bear with me, it's worth learning. In the service definition you specify:
- what your service depends on: other services, files, system initialization levels (e.g. multi-user, network)
- how your service is started and stopped (as a daemon, in the foreground, or as a one-off)
- optionally, your service's configuration
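How a service is started is mostly controlled by the “startd/duration” property, which the full manifest below also sets. As far as I know the valid values are “contract” (the default, for daemons), “child” (a foreground process that svc.startd waits on) and “transient” (run once, nothing left to monitor). A minimal sketch of a one-off service might look like this; the script path is a hypothetical placeholder, and “:true” is the SMF method token for a method that does nothing.

```xml
<!-- Sketch of a one-off (transient) service.
     /opt/my_service/bin/setup.sh is a hypothetical script. -->
<exec_method type="method" name="start"
             exec="/opt/my_service/bin/setup.sh" timeout_seconds="60"/>
<!-- Nothing keeps running, so stopping has nothing to do -->
<exec_method type="method" name="stop" exec=":true" timeout_seconds="60"/>
<property_group name="startd" type="framework">
  <!-- contract:  daemon, tracked via its process contract (default)
       child:     foreground process, svc.startd waits for it to exit
       transient: start method runs once, no process is monitored -->
  <propval name="duration" type="astring" value="transient"/>
</property_group>
```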
Services are identified by their unique FMRI, the Fault Management Resource Identifier. In the example below, this FMRI is svc:/application/foo/my_service. In the manifest the svc:/ prefix is implicit; it marks the resource as a service, but other kinds of entities are described by FMRIs too, mainly your hardware. On a compatible server, your CPUs, RAM, disks and even complete mainboards are hot-swappable and managed by a Solaris service.
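A running service is actually an instance, and its FMRI carries the instance name after a colon. The “create_default_instance” element in the manifest below creates an instance called “default”, so the full FMRI becomes svc:/application/foo/my_service:default. As a sketch, the same instance can be declared explicitly with an “instance” element; use one form or the other, not both. All other elements of the service are omitted here.

```xml
<service name="application/foo/my_service" type="service" version="1">
  <!-- Same effect as <create_default_instance enabled="false"/>:
       creates svc:/application/foo/my_service:default, initially disabled.
       (Dependencies, methods and property groups omitted for brevity.) -->
  <instance name="default" enabled="false"/>
</service>
```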
```xml
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="my_service">
  <service name="application/foo/my_service" type="service" version="1">
    <!-- Creates the instance "default", disabled until svcadm enable -->
    <create_default_instance enabled="false"/>
    <single_instance/>

    <!-- Start only once the network and local filesystems are up -->
    <dependency name="network" grouping="require_all" restart_on="error" type="service">
      <service_fmri value="svc:/milestone/network:default"/>
    </dependency>
    <dependency name="filesystem" grouping="require_all" restart_on="error" type="service">
      <service_fmri value="svc:/system/filesystem/local"/>
    </dependency>

    <method_context>
    </method_context>

    <!-- gunicorn daemonizes itself (-D) -->
    <exec_method type="method" name="start"
                 exec="/opt/local/bin/gunicorn -w 4 -b 127.0.0.1:4000 app:app -D"
                 timeout_seconds="60">
      <method_context working_directory='/opt/my_service/'></method_context>
    </exec_method>
    <!-- :kill signals all processes in the service's contract -->
    <exec_method type="method" name="stop" exec=":kill" timeout_seconds="60"/>
    <exec_method type="method" name="refresh" exec=":kill -HUP" timeout_seconds="60" />

    <property_group name="startd" type="framework">
      <propval name="duration" type="astring" value="contract"/>
    </property_group>
    <property_group name="application" type="application"></property_group>

    <stability value="Evolving"/>
    <template>
      <common_name>
        <loctext xml:lang="C"> My awesome service </loctext>
      </common_name>
    </template>
  </service>
</service_bundle>
```
This is an example configuration for a gunicorn-powered Python application. Because the manifest creates its default instance with enabled="false", it won’t start until you enable it with “svcadm enable my_service”. From then on, SMF starts, restarts and shuts it down automatically. As you can see, we also set one parameter for the start method: its working directory.
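The “application” property group in the manifest above is empty. As a sketch of how it could hold the service configuration, the gunicorn settings can be moved into properties and pulled into the start command with the “%{group/property}” expansion that, as I understand smf_method(5), SMF performs on method strings. The property names “workers” and “bind” are hypothetical; their values are the ones hard-coded in the manifest.

```xml
<!-- Hypothetical properties, expanded when the start method runs -->
<exec_method type="method" name="start"
             exec="/opt/local/bin/gunicorn -w %{application/workers} -b %{application/bind} app:app -D"
             timeout_seconds="60">
  <method_context working_directory='/opt/my_service/'></method_context>
</exec_method>

<property_group name="application" type="application">
  <propval name="workers" type="count" value="4"/>
  <propval name="bind" type="astring" value="127.0.0.1:4000"/>
</property_group>
```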
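We declared two dependencies, the network milestone and the local filesystem, but you could also list other services here, identified by their FMRI. With grouping="require_all", both must be online before our service starts, and restart_on="error" tells SMF to restart our service if one of them stops because of an error.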
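Dependencies are not limited to system milestones. As a sketch, the fragment below adds a dependency on another service and a “path” dependency on a file, which covers the “files” case from the list at the top. Both the database FMRI and the config file path are hypothetical; other grouping values besides “require_all” are “require_any”, “optional_all” and “exclude_all”.

```xml
<!-- Hypothetical: only start once a local database service is online,
     and restart us whenever that service restarts -->
<dependency name="database" grouping="require_all" restart_on="restart" type="service">
  <service_fmri value="svc:/application/database/postgresql"/>
</dependency>
<!-- Hypothetical: require the application's config file to exist -->
<dependency name="config_file" grouping="require_all" restart_on="none" type="path">
  <service_fmri value="file://localhost/opt/my_service/config.py"/>
</dependency>
```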
Since gunicorn reloads its configuration and gracefully replaces its workers on SIGHUP, we map the refresh method to sending exactly that signal. The stability value and the common_name are only there for the administrator’s reference.
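Because the template is purely informational, it can carry a longer description next to the common_name. A minimal sketch, with description text that is only an illustration:

```xml
<template>
  <common_name>
    <loctext xml:lang="C"> My awesome service </loctext>
  </common_name>
  <!-- Free-text description for administrators; has no effect on behaviour -->
  <description>
    <loctext xml:lang="C"> Gunicorn serving app:app on 127.0.0.1:4000 </loctext>
  </description>
</template>
```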
In the “startd” property group we set the duration to “contract”. When you first read the docs, the “child” option looks like the better fit, but that impression is misleading. “Child” does start the process as a child, but it treats startup errors as if they never happened: it just keeps restarting the process, which our monitoring can’t detect. We rely on the output of “svcs -xv” to alert us about service problems, and the “contract” type works for that. The process has to daemonize (which almost all servers can do), and its PID is then tracked automatically by the contract system. You can observe it working by running