Wednesday, April 2, 2014

Search Indexing Process




OpenText Livelink Content Server Search components and architecture
Indexing in Livelink Search



The Extractor:


“Indexupdate”, the Extractor process, calls the index.update request handler in the llserver process. It is llserver that performs the actual extraction.
Indexupdate.exe is run by the Admin process every minute.
The Extractor receives metadata from the database and content from file storage, packs them into an IPool and sends it to the Document Conversion process.
Once thread logging is enabled, the index.update function shows the objects being extracted.

There are two modes of extraction:

Full Extraction mode- Builds the index for the first time. The Extractor traverses the DTree table from the highest dataid to the lowest, creating IPool messages. After DTree, it uses DTreeNotify and WIndexNotify to get the most recent data indexed first.
Incremental Extraction mode- Picks up new items as they are added and runs every minute. In this mode, the DTreeNotify and WIndexNotify tables are used for new updates.
Extraction can be captured by enabling the thread log and connect log on the Admin server hosting the Extractor process.
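To make the two traversal orders concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is only an illustration: the table layouts are simplified, the single-column DTreeNotify table and the sample rows are assumptions, and the queries are not the ones the Extractor actually runs.

```python
import sqlite3

def build_demo_db():
    """Create a tiny stand-in for DTree and DTreeNotify (simplified, hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DTree (DataID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE DTreeNotify (DataID INTEGER)")
    conn.executemany(
        "INSERT INTO DTree VALUES (?, ?)",
        [(2000, "Enterprise"), (2105, "Report.doc"), (2050, "Specs")],
    )
    # One object is waiting to be picked up by the next incremental run.
    conn.execute("INSERT INTO DTreeNotify VALUES (2105)")
    return conn

def full_extraction_order(conn):
    """Full mode: visit DTree from the highest dataid to the lowest."""
    rows = conn.execute("SELECT DataID FROM DTree ORDER BY DataID DESC")
    return [row[0] for row in rows]

def incremental_candidates(conn):
    """Incremental mode: only look at what the notify table lists."""
    rows = conn.execute("SELECT DataID FROM DTreeNotify ORDER BY DataID")
    return [row[0] for row in rows]

if __name__ == "__main__":
    demo = build_demo_db()
    print("full:", full_extraction_order(demo))
    print("incremental:", incremental_candidates(demo))
```

The point of the sketch is the contrast: full mode walks the whole object table, while incremental mode only touches the rows queued in the notify table.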


Fun fact:
If the following query returns “-1”, the Extractor is in Incremental Extraction mode:
SELECT IniValue FROM KIni WHERE IniSection='OTIndex';
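If you want to run this check from a script, a small helper along these lines could work. It is a sketch under assumptions: it accepts any Python DB-API connection whose execute() returns rows (sqlite3 is used here only as a stand-in for the real Content Server database), the function name is made up, and it treats "any returned row equals -1" as the incremental case.

```python
import sqlite3

# Same query as above, with plain single quotes.
QUERY = "SELECT IniValue FROM KIni WHERE IniSection = 'OTIndex'"

def is_incremental(conn):
    """Return True if the query yields a value of -1 (assumed to mean incremental mode)."""
    values = [str(row[0]).strip() for row in conn.execute(QUERY)]
    return "-1" in values

if __name__ == "__main__":
    # Stand-in database with a simplified two-column KIni table (hypothetical layout).
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE KIni (IniSection TEXT, IniValue TEXT)")
    demo.execute("INSERT INTO KIni VALUES ('OTIndex', '-1')")
    print("incremental" if is_incremental(demo) else "not incremental")
```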

Logging Extractor:
Extraction is captured in the logs by enabling Livelink thread logs and connect logs on the Livelink Admin server hosting the Extractor process.
Enter the following in the opentext.ini file:

Debug=2                                  /* for thread logs*/
wantLogs=TRUE                       /* for connect logs*/
wantTimings=TRUE
wantVerbose=TRUE
wantDebugSearch=TRUE

Restart the services.
(Note: Connect logs increase in size very rapidly. Use them only when necessary and disable them after use.)
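Since these settings are easy to forget about, a quick script can report which of them are still switched on. The sketch below is an assumption-laden helper, not an OpenText tool: it scans the text of opentext.ini for the five keys listed above regardless of section, ignores trailing /* ... */ annotations, and assumes that FALSE or 0 means "off".

```python
LOGGING_KEYS = {"debug", "wantlogs", "wanttimings", "wantverbose", "wantdebugsearch"}
OFF_VALUES = {"FALSE", "0", ""}  # assumption: these values mean the setting is disabled

def active_logging_flags(ini_text):
    """Return {key: value} for the logging settings above that are switched on."""
    active = {}
    for raw in ini_text.splitlines():
        line = raw.split("/*", 1)[0].strip()  # drop trailing /* ... */ annotations
        if not line or line.startswith(("[", ";", "#")) or "=" not in line:
            continue  # skip blanks, section headers, comments and non-settings
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() in LOGGING_KEYS and value.upper() not in OFF_VALUES:
            active[key] = value
    return active

if __name__ == "__main__":
    sample = "Debug=2   /* for thread logs*/\nwantLogs=TRUE\nwantTimings=FALSE\n"
    for key, value in active_logging_flags(sample).items():
        print(f"still enabled: {key}={value}")
```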


Document Conversion Services:


DCS is a set of processes and services responsible for preparing data prior to indexing. It manages the data flows and IPools during ingestion and extracts text and metadata from content. It retrieves the indexable content from the objects in an IPool message and passes it on to the Update Distributor. It is a scheduled process that starts every minute.
The otdoccnv process is run by the Admin server every minute. Two instances of otdoccnv run for every data source: a master and a slave.
Master- reads documents from IPool#1 and sends them to slave for conversion, reads the result from slave and writes to IPool#2.
Slave- converts the content to HTML format
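The master/slave hand-off can be pictured with a toy model like the one below. Everything in it is hypothetical: the two IPools are plain in-memory queues, the "conversion" just wraps text in minimal HTML, and the real otdoccnv processes communicate in their own way.

```python
from collections import deque
from html import escape

def slave_convert(content):
    """Stand-in for the slave: wrap plain text in minimal HTML."""
    return "<html><body>" + escape(content) + "</body></html>"

def master_run(ipool_in, ipool_out, convert=slave_convert):
    """Stand-in for the master: drain IPool#1, have each document converted,
    and write the results to IPool#2. Returns the number of documents handled."""
    handled = 0
    while ipool_in:
        doc_id, content = ipool_in.popleft()
        ipool_out.append((doc_id, convert(content)))
        handled += 1
    return handled

if __name__ == "__main__":
    ipool_1 = deque([(10, "Quarterly report"), (11, "a < b")])
    ipool_2 = deque()
    print(master_run(ipool_1, ipool_2), "documents converted")
```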
Logging DCS:
Add the following line to the [FilterEngine] section of the opentext.ini file:
logfile=\livelinkInstall\logs\otdoccnv.log
This will generate two log files:
· otdoccnv.log.master
· otdoccnv.log.slave


Update Distributor:


The process involved is “otupdatedistributor”. As the name suggests, it distributes each new index update task to the appropriate index engine. The Update Distributor monitors the IPool directory for new indexing requests, determines which index engine should handle each request and sends the request to that engine.
To select an index engine, the Update Distributor sends a message containing the object's key to all index engines to check whether any of them already holds that object. The engine that confirms it holds the object is given that IPool for writing, updating or deleting. If the object is not in any index engine, the IPool is given to one of them using the round-robin method. During allocation, the Update Distributor also checks the mode of each partition, such as Read/Write, Read-only and Update-only (we'll deal with partition modes in detail in next pages), and considers parameters such as the memory available to the partition.
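The selection logic can be sketched roughly as follows. This is a simplified model, not OpenText code: the Engine and Distributor classes are invented for illustration, the memory check is left out, and the rule that only Read/Write partitions receive brand-new objects is an assumption on my part.

```python
from dataclasses import dataclass, field

@dataclass
class Engine:
    name: str
    mode: str = "read-write"  # "read-write", "update-only" or "read-only"
    keys: set = field(default_factory=set)  # keys of objects this engine already holds

class Distributor:
    def __init__(self, engines):
        self.engines = engines
        self._next = 0  # round-robin counter for new objects

    def select(self, key):
        """Pick the engine that should receive the update for this key."""
        # Step 1: ask every engine whether it already holds the object.
        for engine in self.engines:
            if key in engine.keys:
                return engine
        # Step 2: new object, so rotate over partitions assumed to accept new data.
        writable = [e for e in self.engines if e.mode == "read-write"]
        if not writable:
            raise RuntimeError("no partition accepts new objects")
        engine = writable[self._next % len(writable)]
        self._next += 1
        engine.keys.add(key)  # simulate the object being indexed there
        return engine

if __name__ == "__main__":
    engines = [Engine("A", keys={"doc-1"}), Engine("B"), Engine("C", mode="read-only")]
    dist = Distributor(engines)
    for key in ["doc-1", "doc-2", "doc-3", "doc-2"]:
        print(key, "->", dist.select(key).name)
```

Checking for an existing copy first is what keeps updates and deletes for an object on the partition that already holds it; round-robin only comes into play for objects nobody has seen yet.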

Update Distributor works in the following sequence:
1. Read from LLHome\config\search.ini file
2. Contact RMI Registry Server
3. Contact the index engine
4. Start to process IPool messages

The Update Distributor is responsible for rolling back the transaction if indexing of any IPool fails.
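The all-or-nothing behaviour can be illustrated with a generic snapshot-and-restore sketch. This is not how the Update Distributor is actually implemented; the in-memory dict standing in for the index, the message tuples and the function name are all hypothetical.

```python
def apply_ipool(index, messages):
    """Apply every message in an IPool, or none of them.

    index    -- dict standing in for the search index (key -> content)
    messages -- list of (operation, key, value) tuples
    Returns True on success, False if the batch was rolled back.
    """
    snapshot = dict(index)  # remember the state before the batch
    try:
        for op, key, value in messages:
            if op in ("add", "update"):
                index[key] = value
            elif op == "delete":
                del index[key]  # raises KeyError if the object is unknown
            else:
                raise ValueError(f"unknown operation: {op}")
    except (KeyError, ValueError):
        index.clear()
        index.update(snapshot)  # roll back to the state before the batch
        return False
    return True

if __name__ == "__main__":
    index = {"10": "old text"}
    ok = apply_ipool(index, [("add", "11", "new"), ("delete", "99", None)])
    print("applied" if ok else "rolled back", index)
```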

Logging Update Distributor:
System Object Volume -> Enterprise Data Source -> Enterprise Data Flow Manager -> Functions menu for Enterprise Update Distributor -> Properties -> Advanced Settings

Index Engine:

Index Engine (otindexengine) adds, updates and deletes objects in the search index. It accepts requests from the Update Distributor. Each partition has at least one index engine.
The ipupdate process is run by the Admin server. This process separates content from metadata: metadata is saved in a *._d_ temp file, while content is stored in a *.urn temp file. For example, 10._d_ holds the metadata and 10.urn holds the content.
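When looking through a directory of these temp files, it helps to pair each metadata file with its content file. The helper below is a small convenience sketch that relies only on the naming pattern described above; the function name is made up and the sample file list is invented.

```python
def pair_temp_files(names):
    """Group *._d_ (metadata) and *.urn (content) file names by their shared base name."""
    pairs = {}
    for name in names:
        if name.endswith("._d_"):
            base, kind = name[: -len("._d_")], "metadata"
        elif name.endswith(".urn"):
            base, kind = name[: -len(".urn")], "content"
        else:
            continue  # not an ipupdate temp file
        pairs.setdefault(base, {})[kind] = name
    return pairs

if __name__ == "__main__":
    listing = ["10._d_", "10.urn", "11._d_", "ipupdate.log"]
    for base, files in pair_temp_files(listing).items():
        print(base, files)
```

A base name that shows up with only one of the two kinds is a file whose partner is missing from the listing.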

Logging Index Engine:
Index engine logs can be enabled from: Administration Page -> System Object Volume -> Enterprise Data Source -> Enterprise Data Flow Manager -> Functions menu for Enterprise Update Distributor -> Properties -> Index Engine
Ipupdate logs can be viewed at this location: Enterprise\index\ipupdate.log

IPool Structure:


IPool (Interchange Pool) is used to encapsulate data for indexing. In its simplest form, an IPool consists of two directories. The Enterprise Data Flow (EDF) process is responsible for writing to and reading from the IPool directory.
The IPool can be viewed from the Enterprise Data Flow Manager page.



1 comment:

  1. It would be great if you posted how a document is stored and indexed in Content Server. Thank you
