Discussion:
[Erp5-dev] performance of reindex object
Bartek Gorny
2010-02-12 08:45:13 UTC
Hello,

I'm running a production instance of ERP5, and I have a performance
problem: reindexing some documents consumes a lot of CPU power.
Sometimes reindexing four or five docs takes more than a minute, with
mysql consuming up to 200% CPU and python processes eating up another
50% (this is a virtual machine running on three CPU cores, using ZEO,
with three processing nodes). Something is definitely wrong, and my
question is where I should begin to look for the problem. I read
"performance crimes", and I don't seem to have committed any of those
(at least not outright). Any advice on how to trace this, and on where
the problem may arise, would be most welcome.

The database is not very big. The row counts per table are:

catalog: 380K
category: 950K
delivery: 4K
movement: 130K
predicate: 160K
predicate_category: 160K
roles_and_users: 2K
stock: 40K

So, is there a problem, have I done something wrong, or is it just too much?

Bartek
--
"Software is largely a service industry operating under the persistent
but unfounded delusion that it is a manufacturing industry."
Eric S.Raymond, "The Magic Cauldron"
Jean-Paul Smets
2010-02-12 09:40:04 UTC
Hi,

Some hints:
- we have sites with > 10,000 K lines in various tables and this
does not happen
- reindexing speed is tested by unit test, fluctuating, but under
control

Questions
- what are those documents for which "sometimes reindexing four or
five docs takes more than a minute"?
- are there any extensions to the catalog? (e.g. many columns, or scripts
in the catalog which parse objects recursively)
- are you using MySQL?

Reindexing speed should be between 10 and 30 simple documents / second /
core. If your document is complex, made for example of 100 subdocuments,
it will take 3 to 10 seconds for reindexing the root document, which is
normal, since you are actually reindexing 100 documents. If your root
document is made of 1000 subdocuments, changing the way to recursively
reindex subdocuments could be considered. If your root document is made
of 10,000 subdocuments, changing the way to recursively reindex
subdocuments is required.
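
The arithmetic behind those figures can be sketched as follows. This is only a back-of-the-envelope estimate using the 10 to 30 documents / second / core range quoted above; the function name and its parameters are made up for illustration and are not part of ERP5.

```python
def estimate_reindex_seconds(n_documents, docs_per_second_low=10.0,
                             docs_per_second_high=30.0, cores=1):
    """Return (best_case, worst_case) seconds needed to reindex n_documents.

    The default rates are the 10 to 30 simple documents / second / core
    range quoted in this thread; real figures depend on the site.
    """
    best = n_documents / (docs_per_second_high * cores)
    worst = n_documents / (docs_per_second_low * cores)
    return best, worst


# A root document counts as itself plus all of its subdocuments.
for n in (100, 1000, 10000):
    best, worst = estimate_reindex_seconds(n)
    print("%6d subdocuments: about %.0f to %.0f seconds" % (n, best, worst))
```

With 100 subdocuments this gives roughly 3 to 10 seconds on one core, which matches the figure above; with 10,000 it gives several minutes, which is why the recursive approach has to change at that size.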

Another possibility for slow reindexing is overuse of indices in MySQL (or
any other DB). The more indices you add, the slower INSERT becomes. In large
sites, we usually remove some indices and add others, but this really
depends on the application and the nature of data, so there are no
universal rules here besides "optimize your indices in MySQL based on
your data".

Another possibility is locking problems. One process of indexing is
waiting for another to finish. You must study what happens in MySQL to
track that (there are many tools for that purpose).
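
One simple way to look for such waits is to run SHOW FULL PROCESSLIST in MySQL while reindexing is slow and check which queries are stuck on a lock. The sketch below only filters rows that have already been fetched; the sample rows are invented for illustration, and the exact State strings differ between MySQL versions, so treat the "lock" substring match as an assumption rather than a rule.

```python
# Column order of the rows returned by MySQL's SHOW FULL PROCESSLIST.
COLUMNS = ("Id", "User", "Host", "db", "Command", "Time", "State", "Info")


def find_waiting_queries(rows, min_seconds=1):
    """Return the rows (as dicts) whose State mentions a lock and whose
    Time is at least min_seconds."""
    waiting = []
    for row in rows:
        entry = dict(zip(COLUMNS, row))
        state = (entry["State"] or "").lower()
        if "lock" in state and entry["Time"] >= min_seconds:
            waiting.append(entry)
    return waiting


# Hypothetical sample rows, NOT taken from the site discussed here.
SAMPLE_ROWS = [
    (11, "erp5", "localhost", "erp5", "Query", 0, "", "SHOW FULL PROCESSLIST"),
    (12, "erp5", "localhost", "erp5", "Query", 14, "Locked", "REPLACE INTO catalog (sample)"),
    (13, "erp5", "localhost", "erp5", "Sleep", 30, None, None),
    (14, "erp5", "localhost", "erp5", "Query", 2, "Sending data", "SELECT (sample)"),
]

for entry in find_waiting_queries(SAMPLE_ROWS):
    print(entry["Id"], entry["Time"], entry["State"], entry["Info"])
```

If the same statement shows up as waiting on a lock across several snapshots, it is worth checking which other connection holds that lock.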

Anyway, optimizing "pure" reindexing speed is not so easy because this
is very often an issue of optimizing python method calls and the way
data is accessed. We are for example currently improving the speed of
catalog by caching some values related to the filters. This will provide
a few % improvement.

Regards,

JPS.
--
Jean-Paul Smets-Solanes, Nexedi CEO - Tel. +33(0)6 29 02 44 25
ERP5 Enterprise: Open Source ERP/CRM for Mission Critical Applications
http://www.erp5.com
TioLive SaaS: run your business online, with more freedom
http://www.tiolive.com
Nexedi: Consulting and Development of Free / Open Source Software
http://www.nexedi.com
Bartek Gorny
2010-02-15 11:17:56 UTC
Yes, the performance results you mention are more or less what I would
expect, so there must be some reason it is so slow. The portal type is based
on Order, there are no extensions to the catalog, no additional indices,
and I'm using MySQL.

I'm using a non-standard role for security (Reviewer) - can this be
the reason? This is the only unusual thing about those documents I can
think of...

Bartek
_______________________________________________
Erp5-dev mailing list
Erp5-dev at erp5.org
http://mail.nexedi.com/mailman/listinfo/erp5-dev
Bartek Gorny
2010-02-16 14:53:05 UTC
Ok, problem solved. The cause was .getPrice - this method is called
every time a movement or order is reindexed. Some of my documents do
not define a price, so the default implementation was used, and for
some reason it spent a few seconds on mysql-heavy operations
before finally returning None.
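
For anyone chasing a similar hot spot: one generic way to find which call dominates is Python's standard cProfile module. The sketch below is written for Python 3 and is not ERP5 code; slow_get_price and reindex_document are hypothetical stand-ins for the real methods, and the sleep simply imitates an expensive lookup.

```python
import cProfile
import io
import pstats
import time


def slow_get_price():
    # Hypothetical stand-in for an expensive price lookup.
    time.sleep(0.05)
    return None


def reindex_document():
    # Hypothetical stand-in for reindexing one document.
    slow_get_price()
    return "indexed"


def profile_call(func):
    """Run func under cProfile and return the report as a string,
    sorted by cumulative time (most expensive call chains first)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


report = profile_call(reindex_document)
print(report)
```

In the report, a function with a large cumulative time but little time of its own usually points at something it calls, which is how a price lookup hidden behind reindexing would show up.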

I solved it (or worked around it?) by placing a few
[PortalType]_getPriceCalculationOperandDict scripts which either
return None or use simple custom logic to retrieve the default price.
After that, the reindexing speed went up to about 5 docs per second,
which is fine for me.
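
The idea of the workaround, modelled in plain Python: a per-type script, looked up by name, answers before the expensive default is reached. This is a toy model and not the ERP5 implementation; the lookup table, the "CustomOrder" portal type, the "default_price" field and the "price" key in the returned dict are all assumptions made for illustration.

```python
EXPENSIVE_CALLS = []  # ids of documents that fell through to the slow default


def default_operand_dict(document):
    """Stand-in for the slow default path; only records that it was reached."""
    EXPENSIVE_CALLS.append(document["id"])
    return None


def custom_order_operand_dict(document):
    """Cheap override: use a price stored on the document, or return None."""
    price = document.get("default_price")
    if price is None:
        return None
    return {"price": price}


# Overrides keyed by "<PortalType>_getPriceCalculationOperandDict".
TYPE_BASED_SCRIPTS = {
    "CustomOrder_getPriceCalculationOperandDict": custom_order_operand_dict,
}


def get_price(document):
    """Return the price via the per-type script if one exists, else the default."""
    script_name = "%s_getPriceCalculationOperandDict" % document["portal_type"]
    script = TYPE_BASED_SCRIPTS.get(script_name, default_operand_dict)
    operand_dict = script(document)
    if operand_dict is None:
        return None
    return operand_dict["price"]


print(get_price({"id": "a", "portal_type": "CustomOrder", "default_price": 12.5}))
print(get_price({"id": "b", "portal_type": "CustomOrder"}))
print(get_price({"id": "c", "portal_type": "OtherType"}))
print(EXPENSIVE_CALLS)
```

Only the document whose type has no override reaches the slow path, so documents without a price no longer pay for the default lookup.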

Bartek