Discussion:
[Erp5-dev] how indexing works
Yoshinori Okuji
2005-09-21 20:17:45 UTC
At Sébastien's request, I briefly describe how indexing works in the current
implementation.

In the past, ERP5 cataloged objects one by one. For each object,
portal_catalog called Z SQL Methods to insert rows into tables. This was slow
for two reasons: MySQL had to run its SQL query interpreter and rebuild its
indices for every single object, and the ZODB cache was used inefficiently.

Now ERP5 groups multiple objects for indexing, using new functionality in
CMFActivity. The SQLDict activity implements support for group methods and
expand methods. First, I explain group methods.

When we create an active object, the call looks like this:

obj.activate().immediateReindexObject()

CMFActivity can be extended arbitrarily by passing optional parameters to
activate:

obj.activate(group_method_id='portal_catalog/catalogObjectList').immediateReindexObject()
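To make the calling pattern concrete, here is a toy model of how obj.activate(**options).method() can turn a method call into a queued message that carries the extra options. This is only a sketch under assumptions, not CMFActivity's code: the classes Message, ActiveProxy and Document and the path "/erp5/person_module/1" are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Message:
    """One deferred call: target path, method name, call arguments, activate() options."""
    path: str
    method_id: str
    args: tuple
    options: dict


class ActiveProxy:
    """Hypothetical stand-in for what activate() returns: calls become queued Messages."""

    def __init__(self, queue: list, path: str, options: dict) -> None:
        self._queue = queue
        self._path = path
        self._options = options

    def __getattr__(self, method_id: str):
        # Only reached for names not found normally, i.e. the method to defer.
        def defer(*args: Any) -> None:
            self._queue.append(Message(self._path, method_id, args, dict(self._options)))
        return defer


class Document:
    """Hypothetical persistent object living at `path`."""

    def __init__(self, queue: list, path: str) -> None:
        self.queue = queue
        self.path = path

    def activate(self, **options: Any) -> ActiveProxy:
        # Every keyword is accepted; each activity later decides which ones it understands.
        return ActiveProxy(self.queue, self.path, options)


queue: list = []
doc = Document(queue, "/erp5/person_module/1")
doc.activate().immediateReindexObject()
doc.activate(group_method_id="portal_catalog/catalogObjectList").immediateReindexObject()
for m in queue:
    print(m.method_id, m.options)
# immediateReindexObject {}
# immediateReindexObject {'group_method_id': 'portal_catalog/catalogObjectList'}
```

Storing the options untouched on the message is what lets an activity that does not know group_method_id simply ignore it.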

This parameter "group_method_id" is simply ignored by any activity that does
not recognize it. SQLDict, however, recognizes it and handles this active
object specially. In this example, SQLDict tries to gather active objects
that have the same group method id. In the current setting, SQLDict collects
up to 100 objects at a time and validates each active object (e.g. checking
an after method id). Then SQLDict loads those objects from ZODB and calls the
group method once with the list of objects. So, in SQLDict,
immediateReindexObject is no longer called at all, while compatibility with
other activities is kept.
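The grouping step described above can be sketched as follows. This is a simplified model, not SQLDict's implementation: the process() function, the load and is_valid callbacks, and the group_methods mapping are hypothetical; only the batch size of 100 and the idea of calling the group method with a list of objects come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable

MAX_GROUP_SIZE = 100  # SQLDict currently collects up to 100 objects at a time


@dataclass
class Message:
    path: str
    method_id: str
    options: dict = field(default_factory=dict)


def process(messages: list,
            load: Callable[[str], Any],
            group_methods: dict,
            is_valid: Callable[[Message], bool] = lambda m: True) -> None:
    """Hypothetical dispatcher modelled on the description of SQLDict."""
    grouped: dict = defaultdict(list)
    for m in messages:
        if not is_valid(m):          # e.g. an after_method_id dependency is still pending
            continue
        gid = m.options.get("group_method_id")
        if gid is None:
            getattr(load(m.path), m.method_id)()   # old behaviour: one call per object
        else:
            grouped[gid].append(m)
    for gid, group in grouped.items():
        method = group_methods[gid]
        for start in range(0, len(group), MAX_GROUP_SIZE):
            batch = group[start:start + MAX_GROUP_SIZE]
            # The queued method (e.g. immediateReindexObject) is never called here;
            # the group method receives the loaded objects instead.
            method([load(m.path) for m in batch])


calls: list = []
process([Message(f"/obj/{i}", "immediateReindexObject",
                 {"group_method_id": "portal_catalog/catalogObjectList"})
         for i in range(250)],
        load=lambda path: path,
        group_methods={"portal_catalog/catalogObjectList": lambda objs: calls.append(len(objs))})
print(calls)  # [100, 100, 50]
```

250 queued reindex requests therefore become three group-method calls instead of 250 separate immediateReindexObject calls.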

The method "catalogObjectList" in portal_catalog calls Z SQL Methods with the
list of objects (after filtering). This significantly reduces the number of
SQL queries sent to MySQL, and so performs better. Also, if the objects are
related (which is true in most cases), they are more likely to be found in
the ZODB cache, so this also reduces the load on Zope.
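A rough sketch of what a list-based group method does: filter once, then run each SQL method a single time for the whole list. The function catalog_object_list, the is_indexable filter and the second method name "hypothetical_z_second_table" are assumptions for illustration; z_catalog_object_list is the name mentioned in this mail, but here it is only a recording stub.

```python
from typing import Any, Callable, Iterable

SqlMethod = Callable[[list], None]


def catalog_object_list(objects: Iterable[Any],
                        sql_methods: list,
                        is_indexable: Callable[[Any], bool]) -> None:
    """Hypothetical stand-in for portal_catalog.catalogObjectList."""
    to_index = [o for o in objects if is_indexable(o)]   # "after filtering"
    if not to_index:
        return
    for sql_method in sql_methods:      # one query per SQL method, not per object
        sql_method(to_index)


queries: list = []


def make_sql_method(name: str) -> SqlMethod:
    def run(objs: list) -> None:
        queries.append((name, len(objs)))   # record one query covering len(objs) rows
    return run


methods = [make_sql_method(n) for n in ("z_catalog_object_list", "hypothetical_z_second_table")]
catalog_object_list(range(100), methods, is_indexable=lambda o: o % 10 != 0)
print(queries)  # [('z_catalog_object_list', 90), ('hypothetical_z_second_table', 90)]
```

Indexing the same 90 objects one by one with two SQL methods would need 180 queries; the list-based version sends 2.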

Now, about expand methods. There are several ways to implement
recursiveReindexObject. In the past, recursiveReindexObject called
immediateReindexObject on each recursively traversed object. So one option
was to call catalogObjectList with the list of traversed objects. However,
this does not allow grouping a recursiveReindexObject call with another
recursiveReindexObject call or with reindexObject. So I decided to add a new
parameter to SQLDict: expand_method_id.

As you can see in ERP5Type/Document/Folder.py, recursiveReindexObject looks
like this:

obj.activate(group_method_id='portal_catalog/catalogObjectList',
expand_method_id='getIndexableChildValueList').recursiveImmediateReindexObject()

As explained above, an activity that does not recognize group_method_id or
expand_method_id simply calls recursiveImmediateReindexObject as before.
SQLDict, however, handles this differently. Because this call uses the same
group method as reindexObject, it is grouped with reindexObject. Then SQLDict
finds the expand method "getIndexableChildValueList" and calls it on the
object. The result is a list containing the object itself and all of its
indexable child objects. This list is passed to the group method, and the
rest is the same as for reindexObject.
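The expansion step can be modelled like this. It is a sketch under assumptions: the Node class, its indexable flag and the objects_for_group helper are hypothetical, and whether a non-indexable folder's children should still be visited is an assumption (here they are).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical document/folder; the `indexable` flag is an assumption."""
    id: str
    indexable: bool = True
    children: list = field(default_factory=list)

    def getIndexableChildValueList(self) -> list:
        # This object (if indexable) plus every indexable descendant, depth first.
        result = [self] if self.indexable else []
        for child in self.children:
            result.extend(child.getIndexableChildValueList())
        return result


def objects_for_group(messages: list) -> list:
    """Build the object list handed to the group method, expanding where requested."""
    objects: list = []
    for obj, options in messages:
        expand_id = options.get("expand_method_id")
        if expand_id:
            objects.extend(getattr(obj, expand_id)())
        else:
            objects.append(obj)
    return objects


order = Node("order_1", children=[Node("line_1"), Node("tmp", indexable=False)])
person = Node("person_1")
batch = objects_for_group([
    (order, {"group_method_id": "portal_catalog/catalogObjectList",
             "expand_method_id": "getIndexableChildValueList"}),
    (person, {"group_method_id": "portal_catalog/catalogObjectList"}),
])
print([n.id for n in batch])  # ['order_1', 'line_1', 'person_1']
```

Because both messages share the same group_method_id, the recursive request and the plain reindex end up in one call to the group method.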

Due to this change, portal_catalog no longer uses Z SQL Methods that handle a
single object, such as z_catalog_category. Instead, it uses methods that
handle multiple objects, such as z_catalog_object_list. These methods make
use of MySQL's extended inserts, which insert multiple rows in a single
query. Although this is specific to MySQL, we can do a similar optimization
for PostgreSQL as well (e.g. dropping indices, inserting rows, and rebuilding
indices).
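For readers unfamiliar with extended inserts, this sketch builds one multi-row INSERT with DB-API "%s" placeholders. It is a plain Python analogue, not the DTML used inside the Z SQL Methods; the function extended_insert and the table and column names are illustrative assumptions.

```python
from typing import Any, Sequence


def extended_insert(table: str, columns: Sequence[str],
                    rows: Sequence[Sequence[Any]]) -> tuple:
    """Build a single multi-row INSERT with DB-API %s placeholders."""
    if not rows:
        raise ValueError("need at least one row")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row needs exactly one value per column")
    one_row = "(" + ", ".join(["%s"] * width) + ")"
    column_list = ", ".join(columns)
    values = ", ".join([one_row] * len(rows))
    sql = f"INSERT INTO {table} ({column_list}) VALUES {values}"
    # Flatten row values in order so they line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = extended_insert("category", ["uid", "category_uid"], [(1, 10), (1, 20), (2, 10)])
print(sql)
# INSERT INTO category (uid, category_uid) VALUES (%s, %s), (%s, %s), (%s, %s)
print(params)
# [1, 10, 1, 20, 2, 10]
```

With a driver using the "format" paramstyle (such as MySQLdb), the pair would be passed as cursor.execute(sql, params). Table and column names are interpolated directly, so they must come from trusted code, never from user input.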

Is this enough?

YO
--
Yoshinori Okuji, Nexedi Research Director
Nexedi: Consulting and Development of Free / Open Source Software
http://www.nexedi.com
ERP5: Free / Open Source ERP Software for small and medium companies
http://www.erp5.org
Storever: OpenBrick, WiFi infrastructure, notebooks and servers
http://www.storever.com
Sebastien Robin
2005-09-22 07:09:37 UTC
Thank you very much Yoshinori.

Also, this new way of reindexing objects sometimes requires changing the
MySQL configuration. By default, the value of "max_allowed_packet" in the
my.cnf file can be too small, because reindexing many objects at once
generates big packets. For example, a value of 1 MB is too small.
Personally, I use:

[mysqld]
max_allowed_packet = 64M

This value is already set correctly on the live CD.
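Raising max_allowed_packet is the simple fix. As an illustration of why the limit matters, here is a hypothetical client-side alternative that splits rendered VALUES tuples so each statement stays under a byte budget. This is not what ERP5 does; chunk_rows, its size estimate (statement prefix plus ", " separators) and the example sizes are assumptions.

```python
from typing import Iterator, Sequence


def chunk_rows(fragments: Sequence[bytes], max_packet: int,
               prefix_len: int) -> Iterator[list]:
    """Group rendered VALUES tuples so each INSERT stays within max_packet bytes.

    prefix_len approximates the "INSERT INTO t (cols) VALUES " part; ", " joins tuples.
    """
    batch: list = []
    size = prefix_len
    for frag in fragments:
        if prefix_len + len(frag) > max_packet:
            raise ValueError("a single row is larger than max_packet")
        extra = len(frag) + (2 if batch else 0)
        if batch and size + extra > max_packet:
            yield batch                      # this statement is full; start a new one
            batch, size = [], prefix_len
            extra = len(frag)                # no separator before the first tuple
        batch.append(frag)
        size += extra
    if batch:
        yield batch


rows = [b"x" * 10] * 10                      # ten 10-byte tuples
print([len(b) for b in chunk_rows(rows, max_packet=50, prefix_len=10)])  # [3, 3, 3, 1]
```

With a real budget such as 64 * 1024 * 1024 bytes the same loop would rarely split anything, which is why simply raising the server setting is usually enough.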

Seb.
--
Sebastien Robin, Nexedi Technical Director