Indexing in Livelink Search
The Extractor:
“Indexupdate”, the Extractor process, calls the index.update request handler in the llserver process. It is the llserver process that performs the actual extraction.
Indexupdate.exe is run by the Admin process every minute.
The Extractor receives metadata from the database and content from file storage, packs them into an IPool and sends it to the Document Conversion process.
Once thread logging is enabled, the index.update function logs the objects being extracted.
There are two modes of extraction:
Full extraction mode: builds the index for the first time. The Extractor traverses the DTree table from the highest dataid to the lowest, creating IPool messages. After DTree, it uses DTreeNotify and WIndexNotify so that the most recent data is indexed first.
Incremental extraction mode: indexes new items as they are added and runs every minute. In this mode, the DTreeNotify and WIndexNotify tables are used for new updates.
Extraction can be captured by enabling the thread log and connect log on the Admin server hosting the Extractor process.
Fun fact:
If the following query returns “-1”, the Extractor is in Incremental Extraction mode:
SELECT IniValue FROM KIni WHERE IniSection='OTIndex';
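As a minimal sketch of how this check could be scripted, the snippet below runs the same query against an in-memory SQLite table standing in for KIni. The SQLite stand-in, the sample row and the helper name is_incremental_mode are assumptions for illustration only; a real Livelink database would be reached through its own driver, and only the IniSection and IniValue columns are taken from the query above.

```python
import sqlite3

def is_incremental_mode(conn):
    """Return True if the OTIndex value in KIni is '-1' (incremental mode)."""
    row = conn.execute(
        "SELECT IniValue FROM KIni WHERE IniSection = 'OTIndex'"
    ).fetchone()
    # No row at all is treated as "not incremental" (assumption).
    return row is not None and str(row[0]).strip() == "-1"

# Stand-in database (assumption): only the two columns used by the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE KIni (IniSection TEXT, IniValue TEXT)")
conn.execute("INSERT INTO KIni VALUES ('OTIndex', '-1')")
print(is_incremental_mode(conn))
```

Any value other than “-1” is treated here simply as "not incremental"; what such a value means in detail is not covered by this article.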
Logging Extractor:
Extraction is captured in the logs by enabling Livelink thread logs and Connect logs on the Livelink Admin server hosting the Extractor process.
Enter the following in the opentext.ini file:
Debug=2 /* for thread logs */
wantLogs=TRUE /* for connect logs */
wantTimings=TRUE
wantVerbose=TRUE
wantDebugSearch=TRUE
Restart the services.
(Note: Connect logs increase in size very rapidly. Use them only when necessary and disable them after use.)
Document Conversion Services:
DCS is a set of processes and services responsible for preparing data prior to indexing. DCS performs tasks such as managing the data flows and IPools during ingestion and extracting text and metadata from content. It retrieves the indexable content from the objects in an IPool message and passes it on to the Update Distributor. It is a scheduled process that starts every minute.
The otdoccnv process is run by the Admin server every minute. Two instances of the otdoccnv process run for every data source, a master and a slave.
Master: reads documents from IPool#1 and sends them to the slave for conversion, then reads the result from the slave and writes it to IPool#2.
Slave: converts the content to HTML format.
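The division of work between master and slave can be sketched as follows. This is only an illustration under stated assumptions: the two IPools are modelled as in-memory queues, and slave_convert is a hypothetical stand-in that wraps plain text in HTML; the real otdoccnv processes communicate and convert in their own way.

```python
import html
from collections import deque

def slave_convert(content):
    """Stand-in for the slave (assumption): wrap plain text in minimal HTML."""
    return "<html><body>" + html.escape(content) + "</body></html>"

def master_run(ipool_in, ipool_out, convert=slave_convert):
    """Master loop: drain IPool#1, hand each document to the slave,
    and write the converted result to IPool#2."""
    while ipool_in:
        doc_id, content = ipool_in.popleft()
        ipool_out.append((doc_id, convert(content)))

ipool1 = deque([(10, "a < b"), (11, "plain text")])
ipool2 = deque()
master_run(ipool1, ipool2)
print(list(ipool2))
```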
Logging DCS:
Add the following line to the [FilterEngine] section of the opentext.ini file:
logfile=\livelinkInstall\logs\otdoccnv.log
This will generate two log files:
· otdoccnv.log.master
· otdoccnv.log.slave
Update Distributor:
The process involved is “otupdatedistributor”. As the name suggests, it distributes each new index update task to the appropriate index engine. The Update Distributor monitors the IPool directory for new indexing requests, determines which index engine should receive each update request, and sends the request to that engine.
To select an index engine, the Update Distributor sends a message containing the object key to all index engines to check whether any of them already holds the object. The engine that confirms it has the object is given that IPool for writing, updating or deleting. If the object is not in any index engine, the Update Distributor gives the IPool to one of them using the round-robin method.
During allocation, it also checks the mode of each partition, such as Read/Write, Read-only and Update-only (we'll deal with partition modes in detail in the next pages). It also considers parameters such as the memory available to the partition.
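The selection logic described above can be sketched roughly as follows. The class and method names are hypothetical, the memory check is left out, and the rule that only Read/Write partitions receive brand-new objects is an assumption made for this sketch, since partition modes are only covered later.

```python
from itertools import cycle

class IndexEngine:
    def __init__(self, name, mode="read-write"):
        self.name = name
        self.mode = mode      # "read-write", "update-only" or "read-only"
        self.keys = set()     # keys of objects already held by this partition

    def has_key(self, key):
        return key in self.keys

class UpdateDistributor:
    def __init__(self, engines):
        self.engines = engines
        # Assumption: only read-write partitions accept brand-new objects.
        self._new_targets = cycle(
            [e for e in engines if e.mode == "read-write"]
        )

    def select_engine(self, key):
        # Step 1: ask every engine whether it already holds the object.
        for engine in self.engines:
            if engine.has_key(key):
                return engine
        # Step 2: new object, so hand it out round-robin.
        engine = next(self._new_targets, None)
        if engine is None:
            raise RuntimeError("no partition accepts new objects")
        engine.keys.add(key)
        return engine

a, b = IndexEngine("a"), IndexEngine("b")
c = IndexEngine("c", mode="update-only")
c.keys.add("k0")
distributor = UpdateDistributor([a, b, c])
print([distributor.select_engine(k).name for k in ("k0", "k1", "k2", "k3", "k1")])
```

In this sketch an object that already exists always goes back to the engine that holds it, which keeps a single object from being split across partitions.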
The Update Distributor works in the following sequence:
1. Read the LLHome\config\search.ini file
2. Contact RMI Registry Server
3. Contact the index engine
4. Start to process IPool messages
The Update Distributor is responsible for rolling back the transaction if indexing of any IPool fails.
Logging Update Distributor:
Update Distributor logging can be configured from: System Object Volume -> Enterprise Data Source -> Enterprise Data Flow Manager -> Functions menu for Enterprise Update Distributor -> Properties -> Advanced Settings
Index Engine:
The Index Engine (otindexengine) adds, updates and deletes objects in the search index. It accepts requests from the Update Distributor. Each partition has at least one index engine.
The ipupdate process is run by the Admin server. This process separates content from metadata: metadata is saved in a *._d_ temp file, while content is stored in a *.urn temp file.
For example, 10._d_ holds the metadata and 10.urn holds the content.
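The naming scheme of the two temp files can be illustrated with a short sketch. The function name split_message, the sample metadata and the file contents are hypothetical; only the *._d_ and *.urn suffixes come from the description above, and the real internal format of these files is not shown here.

```python
import tempfile
from pathlib import Path

def split_message(work_dir, msg_id, metadata, content):
    """Write metadata to <id>._d_ and content to <id>.urn in work_dir."""
    work_dir = Path(work_dir)
    meta_path = work_dir / f"{msg_id}._d_"
    content_path = work_dir / f"{msg_id}.urn"
    meta_path.write_text(metadata, encoding="utf-8")   # metadata as text (assumption)
    content_path.write_bytes(content)                  # content as raw bytes (assumption)
    return meta_path, content_path

with tempfile.TemporaryDirectory() as tmp:
    meta, body = split_message(tmp, 10, "name=report.doc", b"raw document bytes")
    print(meta.name, body.name)
```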
Logging Index Engine:
Index engine logs can be enabled from: Administration Page -> System Object Volume -> Enterprise Data Source -> Enterprise Data Flow Manager -> Functions menu for Enterprise Update Distributor -> Properties -> Index Engine
Ipupdate logs can be viewed at this location: Enterprise\index\ipupdate.log
IPool Structure:
An IPool (Interchange Pool) is used to encapsulate data for indexing. In its simplest form, an IPool consists of two directories. The Enterprise Data Flow (EDF) process is responsible for writing to and reading from the IPool directory.
It can be viewed from the Enterprise Data Flow Manager page.