wesley tanaka

How the Drupal 6 search indexer works

Bugs Filed

Bug 312385: {search_total} can get out of sync

Search Tables

The search module stores its data in four tables:

  • search_dataset is a queue of items to be indexed
  • search_index is the main search index which gives a score to each distinct word in every node.
  • search_total is a materialized view on {search_index}
    • Equivalent to SELECT word, LOG10(1 + 1/GREATEST(1, SUM(score))) AS count FROM {search_index} GROUP BY word
    • Written in search_update_totals(), whose sole function is to maintain the view
    • Read in do_search(), where it is INNER JOINed onto the overall search query
  • search_node_links tracks links to nodes found in indexed items (such as nodes); it is used to improve the search scores of nodes that are frequently linked to.
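As a rough illustration of what the materialized view holds, here is a minimal Python sketch using an in-memory SQLite database. The trimmed-down table definitions, the sample rows, and the registered LOG10 function are assumptions made for the example, not Drupal's actual schema. In SQLite, MAX() with two arguments is a scalar function, so it plays the role of PHP's max() in the normalization.

```python
import math
import sqlite3

# Hypothetical, trimmed-down tables: only the columns the view needs.
conn = sqlite3.connect(":memory:")
# Register LOG10 in case this SQLite build lacks the built-in math functions.
conn.create_function("LOG10", 1, math.log10)
conn.executescript("""
    CREATE TABLE search_index (word TEXT, sid INTEGER, type TEXT, score REAL);
    CREATE TABLE search_total (word TEXT PRIMARY KEY, count REAL);
""")
conn.executemany(
    "INSERT INTO search_index VALUES (?, ?, ?, ?)",
    [("drupal", 1, "node", 3.0), ("drupal", 2, "node", 2.0), ("cron", 2, "node", 0.5)],
)

# Rebuild the "view" from scratch. With two arguments, SQLite's MAX() is a
# scalar function, so MAX(1, SUM(score)) clamps small totals up to 1.
conn.execute("DELETE FROM search_total")
conn.execute("""
    INSERT INTO search_total (word, count)
    SELECT word, LOG10(1 + 1.0 / MAX(1, SUM(score)))
    FROM search_index GROUP BY word
""")
totals = dict(conn.execute("SELECT word, count FROM search_total"))
print(totals)
```

Rarer words end up with larger weights: "cron" (total score 0.5, clamped to 1) gets log10(2), about 0.301, while "drupal" (total score 5) gets log10(1.2), about 0.079.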

Search Indexer Steps

Base case, where node.module is the only module that implements hook_update_index():
  1. node_update_index()
    1. The number of items (default 100) to index is fetched from a variable (search_cron_limit)
    2. The maximum number of comments per thread is fetched from the DB and stored in the variable node_cron_comments_scale
    3. The value of the variable node_cron_views_scale is recalculated
    4. Up to that many node IDs (100 by default) are fetched for nodes that need to be (re)indexed
    5. _node_index_node() is called for each of those node IDs
      1. Loads the node from the database
      2. Sets the variable node_cron_last
      3. The node is rendered into HTML
      4. hook_nodeapi() is invoked with $op 'update index' to collect any extra, normally invisible keywords
        1. comment.module adds all comments
        2. search.module adds any anchor text for incoming links using the search_node_links table
        3. taxonomy.module adds any tags for the node
      5. search_index() is called
        1. ... stuff happens
        2. search_wipe($sid, $type, TRUE) is called
          1. DELETEs corresponding row(s) from {search_dataset}, {search_index}, and {search_node_links}
        3. Row is inserted into {search_dataset} with new data column and reindex set to 0
        4. Every word in $results[0] gets its {search_index} entry updated and is added to search_dirty()
        5. ... stuff happens
  2. search_update_totals() is called, which keeps the materialized view {search_total} up to date
    1. Processes additions and updates. For each word that has been added to search_dirty():
      1. Get the sum of scores from {search_index} for that word, normalize it, and store it in {search_total}
    2. Delete rows in {search_total} which do not correspond to a row in {search_index}
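The two cron steps above can be simulated end to end. What follows is a minimal Python sketch against an in-memory SQLite database, not Drupal's PHP code. The table definitions are trimmed down, {search_node_links} and the hook_nodeapi() keyword collection are left out, the tokenizer and the one-point-per-occurrence scoring are hypothetical simplifications (Drupal weights words by the HTML tags around them), and flagging a node for reindexing by setting reindex to 1 is an assumption for the demo. The function names mirror the Drupal functions described above, but their bodies are only illustrations.

```python
import math
import re
import sqlite3

# Trimmed-down, hypothetical versions of the Drupal 6 tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE search_dataset (sid INTEGER, type TEXT, data TEXT, reindex INTEGER);
    CREATE TABLE search_index (word TEXT, sid INTEGER, type TEXT, score REAL);
    CREATE TABLE search_total (word TEXT PRIMARY KEY, count REAL);
""")

dirty = set()  # stands in for the word list accumulated by search_dirty()


def search_wipe(sid, type_):
    # Remove the item's old rows ({search_node_links} is omitted here).
    conn.execute("DELETE FROM search_dataset WHERE sid = ? AND type = ?", (sid, type_))
    conn.execute("DELETE FROM search_index WHERE sid = ? AND type = ?", (sid, type_))


def search_index(sid, type_, text):
    search_wipe(sid, type_)
    # Store the indexed text with the reindex flag cleared.
    conn.execute("INSERT INTO search_dataset VALUES (?, ?, ?, 0)", (sid, type_, text))
    # Hypothetical scoring: one point per occurrence of each word.
    scores = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        scores[word] = scores.get(word, 0) + 1
    for word, score in scores.items():
        conn.execute("INSERT INTO search_index VALUES (?, ?, ?, ?)", (word, sid, type_, score))
        dirty.add(word)


def search_update_totals():
    # Recompute the normalized total for every word touched since the last run.
    for word in dirty:
        (total,) = conn.execute(
            "SELECT SUM(score) FROM search_index WHERE word = ?", (word,)).fetchone()
        count = math.log10(1 + 1 / max(1, total or 0))
        conn.execute("INSERT OR REPLACE INTO search_total VALUES (?, ?)", (word, count))
    dirty.clear()
    # Drop totals for words that no longer appear anywhere in search_index.
    conn.execute("""
        DELETE FROM search_total WHERE NOT EXISTS (
            SELECT 1 FROM search_index i WHERE i.word = search_total.word)
    """)


def node_update_index(limit=100):
    # Due: nodes with no dataset row yet, or whose row is flagged for reindexing.
    rows = conn.execute("""
        SELECT n.nid, n.body FROM node n
        LEFT JOIN search_dataset d ON d.type = 'node' AND d.sid = n.nid
        WHERE d.sid IS NULL OR d.reindex <> 0
        ORDER BY n.nid LIMIT ?
    """, (limit,)).fetchall()
    for nid, body in rows:
        search_index(nid, "node", body)


def cron():
    node_update_index()
    search_update_totals()


# Usage: index two nodes, then edit one so the word "search" disappears.
conn.executemany("INSERT INTO node VALUES (?, ?)", [(1, "Drupal cron"), (2, "Drupal search")])
cron()
conn.execute("UPDATE node SET body = 'Drupal' WHERE nid = 2")
conn.execute("UPDATE search_dataset SET reindex = 1 WHERE sid = 2 AND type = 'node'")
cron()
words = sorted(w for (w,) in conn.execute("SELECT word FROM search_total"))
print(words)  # ['cron', 'drupal']
```

Words that disappear from an item are never added to the dirty set, so only the final cleanup pass removes "search" from {search_total}. After that pass, the incrementally maintained totals match a full recompute of the GROUP BY query from the table list.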


Can you help me fix my Drupal search indexer? My links are not being indexed.

drupal.org

Hi John, you probably want to be asking on the drupal.org forums, since there are a lot of people there, and many of them know much more than I do.

How does Drupal know which node it is from the database tables?

Hello, thanks for your clear explanation.

I have been learning about search, but I can't find how Drupal knows from the sid which nid it has to present.

Thanks :)


Oops! I am toooo tired... it is not tid, it is sid. And it is the same as nid.

But why did they change the name? Why didn't they call it the same?

I am sure there is a reason, but now I am toooo tired to find it :))

Thanks for your clear explanation.

by Wesley Tanaka