Bug
Resolution: Fixed
Low
HCS 1.4.3
None
Severity 2 - Major
Summary
On new and upgraded deployments of v1.4.3, the runsv curler service gets stuck in a limbo state. This causes a few intermittent issues; most notably, message notifications that rely on API v1 (PagerDuty, Jenkins, etc.) do not fire.
This may also prevent email notifications and push notifications from firing.
Environment
HipChat Server v1.4.x -> v1.4.3 (upgraded instances)
HipChat Server v1.4.3 (new deployments)
Steps to Reproduce
Upgrade to HipChat Server v1.4.3 from a v1.4.x instance.
Alternatively, spin up a new v1.4.3 instance.
Actual Results
Log in to the HipChat Server command line and check whether the curler service is running. The quickest way to do this is to grep the process list for curler.pid:
ps aux | grep curler.pid
There should be at least one result (sometimes two) that looks similar to this:
hipchat 21321 0.0 0.2 63776 17516 ? S Aug22 0:00 /hipchat-scm/curler/vendor/virtualenv/bin/python /hipchat/curler/current/vendor/virtualenv/bin/twistd --pidfile=/var/run/hipchat/curler.pid --syslog --facility=168 --prefix=curler --nodaemon curler --base-urls=http://localhost:8080/_jobs --job-queue=*curler* --gearmand-server=localhost:4730 --num-workers=5
If there isn't, then curler isn't fully running.
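The check above can be scripted. The following is a minimal sketch, assuming the `ps aux` output format shown in the sample line; the function name `count_curler_workers` is hypothetical and not part of HipChat Server.

```shell
#!/bin/sh
# Count curler worker processes in `ps aux`-style output read from stdin.
# A line counts when it carries the --pidfile argument from the sample
# output above. The slashes are escaped in the pattern, so this awk
# command's own entry in the process list does not match.
count_curler_workers() {
    awk '/--pidfile=\/var\/run\/hipchat\/curler\.pid/ { n++ } END { print n + 0 }'
}
```

Run it as `ps aux | count_curler_workers`. A result of 0 corresponds to the "curler isn't fully running" case described above.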
Notes
- There is also a part of curler called curler-export.
- The actual issue may lie with the runsv curler service, because restarting curler on its own does not resolve the problem. You will see this error:
runsv curler: fatal: unable to lock supervise/lock: temporary failure
runsv curler-export: fatal: unable to lock supervise/lock: temporary failure
If so, please run through the workaround below.
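To confirm which services report the lock error, the captured output of the restart attempt can be filtered. This is a rough sketch that assumes the output has been saved or piped as text; the function name `lock_error_services` is hypothetical, and it relies on `grep -o`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Read captured output on stdin and print the name of each runsv service
# that reported the "unable to lock supervise/lock" error, one per line.
# grep -o emits each match separately, so two messages on one line
# are both reported.
lock_error_services() {
    grep -oE 'runsv [A-Za-z0-9_-]+: fatal: unable to lock supervise/lock' |
        awk '{ sub(/:$/, "", $2); print $2 }'
}
```

If the function prints curler or curler-export, the workaround applies.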
Workaround
Please be aware that once the curler service is restarted, all queued jobs (push notifications, email notifications) will be fired off at once, which may result in a flood of notifications. These can safely be ignored.
- Log into the HipChat Server command line.
- Gain root access:
sudo dont-blame-hipchat
- Next, stop the curler service:
/etc/init.d/curler stop
- Check whether any leftover (zombie) curler processes still exist:
ps aux | grep curler
- If any remain (other than the grep command itself), they will need to be killed:
kill -9 curler_PID
Where "curler_PID" is the PID of each remaining curler process.
- Next, kill the runsv curler and runsv curler-export services:
kill -9 runsv_curler_PID
Where "runsv_curler_PID" is the PID of the runsv curler process found in step 4 (the ps aux | grep curler check).
kill -9 runsv_curler-export_PID
Where "runsv_curler-export_PID" is the PID of the runsv curler-export process found in step 4.
- Start curler:
/etc/init.d/curler start
- Verify curler is up:
ps aux | grep curler
If the service is shown as up, send a test notification from your integration. If the service is not up, please reach out to HipChat Server support at support.atlassian.com and attach the log output generated with hipchat diagnostics -b to the support ticket.
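For the step that kills the runsv curler and runsv curler-export services, the PIDs can be pulled out of the process list rather than read by eye. This is a minimal sketch, assuming procps-style `ps aux` columns where the PID is field 2 and the command starts at field 11; the function name `runsv_curler_pids` is hypothetical.

```shell
#!/bin/sh
# Read `ps aux`-style output on stdin and print the PIDs of the
# "runsv curler" and "runsv curler-export" supervisor processes.
# Assumes field 2 is the PID and the command begins at field 11.
runsv_curler_pids() {
    awk '$11 == "runsv" && ($12 == "curler" || $12 == "curler-export") { print $2 }'
}
```

Run it as `ps aux | runsv_curler_pids` and review the printed PIDs before passing any of them to kill -9. The sketch only lists PIDs and deliberately does not kill anything itself.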