Understanding the node processor plugin for the Feeds module

In Drupal 6, the Feeds module has three types of plugins: fetchers, parsers and processors. The processor plugins supplied with the Feeds module handle data, nodes, taxonomy and users. This article looks at the node processor.

If a feed is imported repeatedly so that the information in the feed keeps the Drupal site up to date, there are three options under the node processor settings:

Do not update existing nodes
Replace existing nodes
Update existing nodes (slower than replacing them)

The unique ID specified in the mapping determines whether the item being imported is new or corresponds to an existing node. But what does the standard node processor actually do for each of the three update options? For Replace existing nodes you might think that it would always replace the existing node. You would be wrong. For Update existing nodes you might think that it would compare the feed with the node and update the node to reflect the new information in the feed. Again you would be wrong. Reality is a bit more complicated although, in practice, these options mostly do what they say. Let us look at the code in FeedsNodeProcessor.inc.

while ($item = $batch->shiftItem()) {
  // Create/update if item does not exist or update existing is enabled.
  if (!($nid = $this->existingItemId($batch, $source)) || ($this->config['update_existing'] != FEEDS_SKIP_EXISTING)) {
    // Only proceed if item has actually changed.
    $hash = $this->hash($item);
    if (!empty($nid) && $hash == $this->getHash($nid)) {
      continue;
    }

$item is the information from the feed corresponding to a node. The first thing to notice is that when Update existing nodes is set to Do not update existing nodes, the node processor skips any item that matches an existing node; items with no matching node are still created. With either of the other two options, a hash of the feed item is computed. If the item matches an existing node, this hash is compared with the hash stored the last time the item was processed, and if the two match the item is skipped. So the Replace existing nodes option does not replace nodes whose feed items have not changed. It does seem sensible not to bother replacing something if it has not changed. The Update existing nodes option also skips items that have not changed. Again, this seems sensible.
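The branching described above can be condensed into a small standalone sketch. It does not call the Feeds API: the function name feeds_sketch_decide() and its string return values are hypothetical, and the constants are defined locally with values that are only assumed to mirror those in Feeds.

```php
<?php
// Standalone sketch; not part of the Feeds module.
// The constant values are an assumption; define them only if Feeds has not.
if (!defined('FEEDS_SKIP_EXISTING')) {
  define('FEEDS_SKIP_EXISTING', 0);
  define('FEEDS_REPLACE_EXISTING', 1);
  define('FEEDS_UPDATE_EXISTING', 2);
}

/**
 * Hypothetical helper: report what the processor would do with one item.
 *
 * @param int|false $nid        Existing node ID, or FALSE if the unique ID matched nothing.
 * @param int $update_existing  One of the FEEDS_*_EXISTING settings.
 * @param string $new_hash      Hash of the incoming feed item.
 * @param string $stored_hash   Hash saved at the previous import ('' if none).
 *
 * @return string  A label naming the outcome.
 */
function feeds_sketch_decide($nid, $update_existing, $new_hash, $stored_hash) {
  // No matching node: the item is always imported as a new node.
  if (empty($nid)) {
    return 'create';
  }
  // A matching node exists, but updates are switched off.
  if ($update_existing == FEEDS_SKIP_EXISTING) {
    return 'skip: updates disabled';
  }
  // Updates are on, but the feed item is identical to last time.
  if ($new_hash === $stored_hash) {
    return 'skip: hash unchanged';
  }
  // The feed item has changed, so the chosen option finally matters.
  return ($update_existing == FEEDS_REPLACE_EXISTING) ? 'replace' : 'update';
}
```

For example, feeds_sketch_decide(12, FEEDS_REPLACE_EXISTING, 'abc', 'abc') returns 'skip: hash unchanged', which is the case where Replace existing nodes replaces nothing.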

But what about the case where a node is edited on the website? The node processor never really looks at the existing node but relies on the stored hash. So a feed can be imported with information that differs from what is stored in the node, and yet the node will not be replaced or updated because the feed item itself has not changed since the last import. Users may get very irate that the feed is not being imported properly, and debugging such a problem can be very confusing.

While the standard node processor will usually do what you want, sometimes the results are unexpected. There are also cases where the standard node processor that comes with the Feeds module is not going to do what is needed for a particular website. For instance, you may want to check whether a particular field has changed, in addition to checking the hash, to avoid angry calls from users. In this case, it is fairly easy to copy the node processor plugin to a custom module, make the appropriate changes and add a function to the custom module that flushes the plugin cache so the new plugin is recognized. But it all starts with understanding how the standard node processor plugin that comes with the Feeds module really works.
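The stale-hash case described above, and the kind of extra field check a custom processor might add, can be simulated without Drupal. This is only a sketch under stated assumptions: every function name below is hypothetical, nodes and feed items are modelled as plain arrays, and the hash is assumed to be an MD5 of the serialized item.

```php
<?php
// Standalone simulation; none of these names come from the Feeds module.

// Hypothetical hash helper: an MD5 of the serialized feed item (an assumption).
function sketch_item_hash(array $item) {
  return md5(serialize($item));
}

// Hash-only check: the existing node itself is never inspected.
function sketch_needs_update_hash_only(array $item, $stored_hash) {
  return sketch_item_hash($item) !== $stored_hash;
}

// Hypothetical stricter check: also compare one field against the node.
function sketch_needs_update_with_field(array $item, $stored_hash, array $node, $field) {
  if (sketch_item_hash($item) !== $stored_hash) {
    return TRUE;
  }
  // The hash is unchanged, so look at the node to catch edits made on the site.
  return !isset($node[$field], $item[$field]) || $node[$field] !== $item[$field];
}

// First import: the node is built from the item and the item's hash is stored.
$item = array('guid' => 'item-1', 'title' => 'Original title');
$node = array('title' => $item['title']);
$stored_hash = sketch_item_hash($item);

// An editor changes the node on the site; the feed item stays the same.
$node['title'] = 'Edited on the site';

// Second import of the unchanged feed.
$hash_only = sketch_needs_update_hash_only($item, $stored_hash);
$with_field = sketch_needs_update_with_field($item, $stored_hash, $node, 'title');
```

Here $hash_only is FALSE, so the edited node would be left alone, while $with_field is TRUE, so a processor using the stricter check would bring the node back in line with the feed. Whether overwriting a manual edit is desirable depends on the site.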