lundi 14 novembre 2016

Laravel 5.1: supervisor queue fatal exiting but jobs run with queue:listen just fine

I've been using supervisor to manage my laravel queue for ages with no problems, but suddenly on 5th November it stopped working and was constantly exiting. I can't find any error messages anywhere. I've checked the log files for supervisor, laravel and the exceptions that I log to bugsnag but there's no real helpful info. Usually i'd assume it was an issue in my laravel app causing an exit, but i'm finding that all the jobs will process fine if I just use:

php artisan queue:listen

But with supervisor it will run for a bit and then exit, or sometimes exit on spawn.

This is my program conf file

[program:laravel_queue]
command=php artisan queue:work --tries=3
process_name=%(program_name)s_%(process_num)02d
directory=/path/to/my/app
autostart=true
autorestart=true
user=myusername
numprocs=1
stderr_logfile=/path/to/my/logs/supervisor_queue_stderr.log
stdout_logfile=/path/to/my/logs/supervisor_queue_stdout.log

This is my supervisor conf file

; supervisor config file

[unix_http_server]
file=/var/run/supervisor.sock   ; (the path to the socket file)
chmod=0700                       ; sockef file mode (default 0700)

[supervisord]
#logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile=/path/to/logs/supervisord.log
pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
childlogdir=/var/log/supervisor            ; ('AUTO' child log dir, default $TEMP)
loglevel=debug

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL  for a unix socket

; The [include] section can just contain the "files" setting.  This
; setting can list multiple files (separated by whitespace or
; newlines).  It can also contain wildcards.  The filenames are
; interpreted as relative to this file.  Included files *cannot*
; include files themselves.

[include]
files = /path/to/my/program/confs/*.conf

Now I'm seeing two main errors.

The first sometimes appears straight after restarting the processes:

laravel_queue:laravel_queue_00: ERROR (spawn error)

It doesn't appear everytime, instead sometimes it says

laravel_queue:laravel_queue_00: started

But then it exits.

If I do sudo supervisorctl status it says:

laravel_queue:laravel_queue_00 FATAL Exited too quickly (process log may have details)

The stdout_logfile logfile for the program just shows this all the time:

Processed: Illuminate\Queue\CallQueuedHandler@call
Processed: Illuminate\Queue\CallQueuedHandler@call
Processed: Illuminate\Queue\CallQueuedHandler@call
Processed: Illuminate\Queue\CallQueuedHandler@call
Processed: Illuminate\Queue\CallQueuedHandler@call

Which isn't helpful.

I changed loglevel to debug and this is what I get in the main supervisord log file:

2016-11-14 15:51:30,255 INFO spawned: 'laravel_queue_00' with pid 13969
2016-11-14 15:51:30,445 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:30,445 DEBG fd 11 closed, stopped monitoring <POutputDispatcher at 140364068662736 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:30,446 DEBG fd 13 closed, stopped monitoring <POutputDispatcher at 140364060359496 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:30,454 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:30,454 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:31,457 INFO spawned: 'laravel_queue_00' with pid 13973
2016-11-14 15:51:31,636 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:31,637 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060359496 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:31,637 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060578304 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:31,645 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:31,645 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:33,649 INFO spawned: 'laravel_queue_00' with pid 13977
2016-11-14 15:51:33,852 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:33,853 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060359496 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:33,853 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364068662736 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:33,861 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:33,861 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:36,864 INFO spawned: 'laravel_queue_00' with pid 13981
2016-11-14 15:51:37,655 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call
2016-11-14 15:51:37,656 DEBG 'laravel_queue_00' stdout output:


2016-11-14 15:51:37,658 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364069069168 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:37,659 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364068662736 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:37,670 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:37,670 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:37,992 INFO gave up: laravel_queue_00 entered FATAL state, too many start retries too quickly
2016-11-14 15:51:47,713 INFO spawned: 'laravel_queue_00' with pid 13993
2016-11-14 15:51:47,902 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:47,903 DEBG fd 11 closed, stopped monitoring <POutputDispatcher at 140364060709376 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:47,903 DEBG fd 13 closed, stopped monitoring <POutputDispatcher at 140364060709952 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:47,911 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:47,911 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:48,914 INFO spawned: 'laravel_queue_00' with pid 13997
2016-11-14 15:51:49,092 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:49,093 DEBG fd 11 closed, stopped monitoring <POutputDispatcher at 140364060709952 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:49,093 DEBG fd 13 closed, stopped monitoring <POutputDispatcher at 140364060710528 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:49,101 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:49,101 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:51,106 INFO spawned: 'laravel_queue_00' with pid 14001
2016-11-14 15:51:51,284 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:51,284 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364059718216 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:51,284 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060709952 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:51,293 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:51,293 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:54,298 INFO spawned: 'laravel_queue_00' with pid 14005
2016-11-14 15:51:54,572 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:54,573 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364059718216 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:54,573 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060710456 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:54,582 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:54,583 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:55,584 INFO gave up: laravel_queue_00 entered FATAL state, too many start retries too quickly
2016-11-14 15:51:58,199 INFO spawned: 'laravel_queue_00' with pid 14013
2016-11-14 15:51:58,394 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:51:58,394 DEBG fd 11 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:51:58,395 DEBG fd 13 closed, stopped monitoring <POutputDispatcher at 140364060034472 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:51:58,403 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:51:58,404 DEBG received SIGCLD indicating a child quit
2016-11-14 15:51:59,407 INFO spawned: 'laravel_queue_00' with pid 14017
2016-11-14 15:52:00,407 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:52:00,407 INFO success: laravel_queue_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-11-14 15:52:00,409 DEBG fd 11 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state RUNNING> (stdout)>
2016-11-14 15:52:00,410 DEBG fd 13 closed, stopped monitoring <POutputDispatcher at 140364060034472 for <Subprocess at 140364059961176 with name laravel_queue_00 in state RUNNING> (stderr)>
2016-11-14 15:52:00,422 INFO exited: laravel_queue_00 (exit status 0; expected)
2016-11-14 15:52:00,422 DEBG received SIGCLD indicating a child quit
2016-11-14 15:52:01,426 INFO spawned: 'laravel_queue_00' with pid 14025
2016-11-14 15:52:01,620 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:52:01,620 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:52:01,620 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060034472 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:52:01,630 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:52:01,630 DEBG received SIGCLD indicating a child quit
2016-11-14 15:52:02,633 INFO spawned: 'laravel_queue_00' with pid 14059
2016-11-14 15:52:03,650 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:52:03,650 INFO success: laravel_queue_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-11-14 15:52:03,652 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state RUNNING> (stdout)>
2016-11-14 15:52:03,652 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060034472 for <Subprocess at 140364059961176 with name laravel_queue_00 in state RUNNING> (stderr)>
2016-11-14 15:52:03,663 INFO exited: laravel_queue_00 (exit status 0; expected)
2016-11-14 15:52:03,663 DEBG received SIGCLD indicating a child quit
2016-11-14 15:52:04,666 INFO spawned: 'laravel_queue_00' with pid 14068
2016-11-14 15:52:05,411 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:52:05,413 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:52:05,414 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060034472 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:52:05,424 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:52:05,424 DEBG received SIGCLD indicating a child quit
2016-11-14 15:52:06,427 INFO spawned: 'laravel_queue_00' with pid 14073
2016-11-14 15:52:06,612 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:52:06,613 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:52:06,613 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060036704 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:52:06,621 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:52:06,622 DEBG received SIGCLD indicating a child quit
2016-11-14 15:52:08,626 INFO spawned: 'laravel_queue_00' with pid 14079
2016-11-14 15:52:09,458 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:52:09,460 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:52:09,460 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060035120 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:52:09,472 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:52:09,473 DEBG received SIGCLD indicating a child quit
2016-11-14 15:52:12,478 INFO spawned: 'laravel_queue_00' with pid 14084
2016-11-14 15:52:13,325 DEBG 'laravel_queue_00' stdout output:
Processed: Illuminate\Queue\CallQueuedHandler@call

2016-11-14 15:52:13,327 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 140364060138888 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stdout)>
2016-11-14 15:52:13,328 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 140364060033968 for <Subprocess at 140364059961176 with name laravel_queue_00 in state STARTING> (stderr)>
2016-11-14 15:52:13,339 INFO exited: laravel_queue_00 (exit status 0; not expected)
2016-11-14 15:52:13,339 DEBG received SIGCLD indicating a child quit
2016-11-14 15:52:14,340 INFO gave up: laravel_queue_00 entered FATAL state, too many start retries too quickly

There's a few strange things in there, such as sometimes it says:

INFO exited: laravel_queue_00 (exit status 0; not expected)

And at other times

INFO exited: laravel_queue_00 (exit status 0; expected)

With no indication as to why it's sometimes expected and sometimes not.

It may have something to do with the "startretries" config option. I can see some pattern in the logs. The first two times there are 4 instances of "exit code 0; not expected" and 4 "processed" outputs before it says it entered FATAL state. But then the third time it says it entered fatal state it took 8 exit codes. However within that time there were two instances of "exit status 0; expected" and the final FATAL error happens after a further 4 instances of "exit status 0; not expected" so perhaps it is something to do with having 4 failures. But then the default startretries is 3 and I haven't changed that config option, so surely it should be failing after 3? Or maybe it starts 3 times and doesn't start the 4th time, so 4 failures is actually correct. If so it could possibly be something to do with startretries.

But that's not too helpful because I still have no idea why it would be failing anyway. It looks like the jobs are being processed because of the output saying it has and I can see activity in the database in line with the jobs being processed. So what's the error that's causing supervisor to stop running?

I have no idea and am feeling pretty lost after a whole day of Googling. Any help or pointers would be greatly appreciated.



via Chebli Mohamed

Aucun commentaire:

Enregistrer un commentaire