[Erp5-dev] activities are not being fired at all
Jacek Medrzycki
2007-10-31 11:36:10 UTC
Hi.

We have noticed that on our site (at revision 27264) activities are not
being invoked. All activities are in state -1. When invoked by hand,
however, they execute with no errors.

Timer service IS enabled in zope.conf and there is a "ZServer Timer server
started" message in event.log.

We have several (older) sites on this machine and those sites have no
problems with activities.

Any suggestions would be welcome.

Jacek
Pelletier Vincent
2007-10-31 13:21:49 UTC
Post by Jacek Medrzycki
We have noticed that on our site (at revision 27264) activities are not
being invoked. All activities are in state -1. When invoked by hand,
however, they execute with no errors.
Those activities might be waiting for a tag to disappear from the activity
list: for example, an activity waiting for a reindex to finish on a given
object.

Activity dependencies (i.e., waiting for a tag or a method_id) are computed at
distribution time, which would explain the presence of activities with -1 as
processing_node. Invoking them by hand bypasses the dependency check.
Previously, the check was done at execution time, so activity processing
nodes spent time checking dependencies on the same messages multiple times
instead of actually executing activities. Now, the extra work is done on the
distributing node, which has "nothing else" to do.

This implementation has been measured as much more efficient in cases like
reindexing a complete site, but Your Mileage May Vary depending on how
activities are created.

Sometimes, when activities fail and other activities are waiting for them,
the result looks like a complete activity freeze. This _is_ a feature.

What you might encounter, though, is a distribution inefficiency when there
are many activities waiting on others. In pathological cases, the distributing
node can take tens of seconds to find an eligible activity, distribute it, and
search for the next one to distribute... Sadly, I see no easy way out of this.
One possibility is to control activity generation to avoid creating too many
activities at once, by creating an activity bundle only when the previous one
has finished processing (or when the number of remaining activities drops
below a certain amount, to avoid activity engine starvation... hint:
countMessage).
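
The throttling idea above could be sketched roughly as follows. This is only
an illustration, not ERP5 code: count_pending is assumed to wrap something
like portal_activities.countMessage (its exact arguments are not shown in
this thread), and submit_batch, batch_size, low_water_mark and poll_interval
are hypothetical names introduced for the example.

```python
import time
from itertools import islice


def generate_throttled(items, submit_batch, count_pending,
                       batch_size=100, low_water_mark=20,
                       poll_interval=1.0, sleep=time.sleep):
    """Submit items in bundles, letting the queue drain between bundles.

    count_pending: callable returning the number of activities still queued
                   (assumed to wrap something like countMessage).
    submit_batch:  callable creating one activity per item of the bundle.
    Returns the total number of items submitted.
    """
    iterator = iter(items)
    submitted = 0
    while True:
        bundle = list(islice(iterator, batch_size))
        if not bundle:
            return submitted
        # Wait until few enough activities remain. Keeping the mark above
        # zero avoids starving the processing nodes between two bundles.
        while count_pending() > low_water_mark:
            sleep(poll_interval)
        submit_batch(bundle)
        submitted += len(bundle)
```

On a real site the blocking loop would more likely be replaced by something
that re-schedules itself, but the control flow stays the same: check the
count, and only create the next bundle once it is low enough.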
--
Vincent Pelletier
Jacek Medrzycki
2007-10-31 13:42:11 UTC
Thank You for the reply.
However, I don't think that is the case here. Our site is a devel site and doesn't
have many activities. But, as far as I see, all activities are in -1
state and there is no activity they might be waiting for (such activity
would have to be in state 1 or 2, wouldn't it?).

It fails in very simple cases - the activity list is empty, then an
organisation is added, which generates some activities, and they all are
in -1 state. There is no way to make them execute (neither clearing nor
a zope restart helps) other than invoking them manually. All those
activities are system-created (reindexing after object creation), so I
think it's not a dependency issue (BTW, how can I check if an activity
waits for another to complete?).

It looks to me like the timer service is not working properly - but I don't
know why. There are no suspicious log entries (log level is ALL).
What else can I check?

I've read about the gethostbyname issue on the wiki, but there are more sites
on this machine and all of them work.

I wonder, however, if there is a change in the timer service (or related
sites) which prevents newer revisions from working with an old timer server?

How can I track this problem?


Regards, J.
Pelletier Vincent
2007-10-31 14:06:00 UTC
Post by Jacek Medrzycki
But, as far as I see, all activities are in -1
state and there is no activity they might be waiting for (such activity
would have to be in state 1 or 2, wouldn't it?).
processing_node meaning:
1..x : distributed to that node
0 : invalid value (nothing ever sets any activity to this value)
-1 : not yet distributed
-2 : activity executed & failed multiple times for whatever reason (exception
was thrown by the activity itself)
-3 : activity failed for a more basic reason (activity engine was not able to
call the desired method on the desired object... for example if the object was
removed but the activity remained)
-4 and below : invalid values (nothing ever sets any activity to these values)
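
When reading rows from the activity tables, the list above can be turned into
a small lookup helper. This is a minimal sketch based only on the meanings
given here; describe_processing_node is a hypothetical name and not part of
CMFActivity.

```python
def describe_processing_node(value):
    """Return a human-readable meaning for a processing_node value."""
    if value >= 1:
        return "distributed to node %d" % value
    if value == -1:
        return "not yet distributed"
    if value == -2:
        return "executed and failed multiple times (exception in the activity)"
    if value == -3:
        return "could not be invoked (e.g. target object no longer exists)"
    # 0 and anything at or below -4 are never set by the activity engine.
    return "invalid value"
```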

IIRC, dependencies are checked ignoring the processing_node value.

Also, I don't think there is any protection against circular activity
dependencies currently.
Post by Jacek Medrzycki
It fails in very simple cases - the activity list is empty, then an
organisation is added, which generates some activities, and they all are
in -1 state. There is no way to make them execute (neither clearing nor
a zope restart helps) other than invoking them manually. All those
activities are system-created (reindexing after object creation), so I
think it's not a dependency issue
Then it would mean that there is no distributing node defined. You can check
that on portal_activities/manageLoadBalancing.

You can also check that portal_activities is registered to the timer server on
the same page.
Post by Jacek Medrzycki
(BTW, how can I check if an activity waits for another to complete?).
There is no quick way. The first things you can see directly from the MySQL
tables are the values in the "tag" and "method_id" columns. But to know which
tags are being waited on, you must decode the pickled message stored in the
activity tables (since a single message can depend on multiple values, it's
not available as a separate column).
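
As a first look at those columns, a grouping query along the following lines
may help. This is a sketch under assumptions: the table name "message" and the
reduced column set are guesses made for illustration, the sample rows are made
up, and an in-memory sqlite3 database stands in for MySQL so that the example
runs on its own.

```python
import sqlite3

# Stand-in for the activity table; table name and columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (uid INTEGER, method_id TEXT, tag TEXT,"
    " processing_node INTEGER)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?, ?)",
    [
        (1, "immediateReindexObject", "", -1),
        (2, "immediateReindexObject", "", -1),
        (3, "someMethod", "some_tag", -2),
    ],
)


def summarize_pending(connection):
    """Count queued messages per (processing_node, method_id, tag)."""
    cursor = connection.execute(
        "SELECT processing_node, method_id, tag, COUNT(*)"
        " FROM message"
        " GROUP BY processing_node, method_id, tag"
        " ORDER BY processing_node, method_id, tag"
    )
    return cursor.fetchall()


summary = summarize_pending(conn)
```

This only shows which tags and method_ids are present in the queue. Which of
them a given message is waiting on still has to come from the pickled
message.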
Post by Jacek Medrzycki
It looks to me like the timer service is not working properly - but I don't
know why. There are no suspicious log entries (log level is ALL).
What else can I check?
I once saw a weird timerserver behaviour when non-existing objects were still
registered to it though they had been removed. You can check that in zope
configuration panel, timerserver section.
Post by Jacek Medrzycki
I've read about the gethostbyname issue on the wiki, but there are more sites
on this machine and all of them work.
That problem should only appear on a multi-node setup (ZEO).
Post by Jacek Medrzycki
I wonder, however, if there is a change in the timer service (or related
sites) which prevents newer revisions from working with an old timer server?
You can try to add a log in CMFActivity/ActivityTool.py:process_timer to see
if it's invoked (it's the method invoked by timerserver).
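
One way to add such a log without touching the method body is a small logging
wrapper. This is a generic sketch, not the actual CMFActivity code: it uses
Python's standard logging module rather than whatever logging the product
itself uses, FakeActivityTool is a stand-in class so the example runs on its
own, and the real argument list of process_timer is deliberately not
reproduced.

```python
import functools
import logging

logger = logging.getLogger("activity_debug")  # hypothetical logger name


def log_calls(func):
    """Log every invocation of func before delegating to it."""
    @functools.wraps(func)
    def wrapper(*args, **kw):
        # args[0] is the instance; log only the remaining arguments.
        logger.info("%s called with args=%r kw=%r",
                    func.__name__, args[1:], kw)
        return func(*args, **kw)
    return wrapper


class FakeActivityTool(object):
    """Stand-in for the real tool so the sketch is self-contained."""
    calls = 0

    @log_calls
    def process_timer(self, *args, **kw):
        # The real method's arguments and body are not reproduced here.
        self.calls += 1
        return self.calls
```

If no log line ever shows up after applying the same wrapper to the real
method, the timer server is not calling the tool at all, which points at
registration rather than at activity dependencies.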
--
Vincent Pelletier
Jacek Medrzycki
2007-10-31 16:00:50 UTC
Thank You for the response.

Before You answered, I decided to delete the site (as it was only a
devel site without valuable data) and create it from scratch.
Activities work now. So I think it was an upgrade issue (the site was
r17102 before and was upgraded to r17264). As I deleted the site, I
cannot check if one of Your suggestions would have fixed the problem. :(


Regards, Jacek

Łukasz Nowak
2007-10-31 13:44:32 UTC
Hello,

On 2007-10-31, 12:36:10
Post by Jacek Medrzycki
Hi.
We have noticed that on our site (at revision 27264) activities are
not being invoked. All activities are in state -1. When invoked by
hand, however, they execute with no errors.
You mean revision 17264? I'm working on that revision right now, on
ubuntu/feisty, and I have no problems with activities at all. Couldn't it be
some kind of configuration or copy&paste problem?

Regards,
Luke
--
Łukasz Nowak R&D Ventis http://www.ventis.com.pl/
tel: +48 32 768 16 85 fax: +48 32 392 10 61
``Use the Source, Luke...'' I am only craftsman.