Created by Eric.London on 2012-04-10
 
In this article I'll show how to set up a Rails project with faceted Solr search. The code uses the sunspot gem for Solr integration and acts-as-taggable-on for tagging and search facets.

RVM/Rails Setup

$ mkdir solrfacets

# create rvm gemset
$ echo "rvm use --create ruby-1.9.2@solrfacets" > solrfacets/.rvmrc

$ cd solrfacets

# install rails
$ gem install rails

# create new rails project
$ rails new .

# version control
$ git init
$ git add .
$ git commit -am "new rails project"


Add gems

# file: Gemfile, added:
gem 'acts-as-taggable-on'
gem 'sunspot_rails'
gem 'sunspot_solr', :groups => [:development, :test]

# installing gems
$ bundle


Create default scaffolding for a Post model

$ rails generate scaffold Post title:string content:text


Add tags property to Post model

# file: app/models/post.rb

 class Post < ActiveRecord::Base
   attr_accessible :content, :title
+  acts_as_taggable_on :tags
 end


Run acts-as-taggable-on migration

$ rails generate acts_as_taggable_on:migration


Setup/create database

$ rake db:migrate


Part 2, Random Data

The model is now set up to create Posts with a title, content, and a list of tags. For demonstration purposes, I created a rake task to populate the content attribute with lorem ipsum text, and the tags with random words from /usr/share/dict/words.

Modified the Post model to make :tag_list mass-assignable

# file: app/models/post.rb

 class Post < ActiveRecord::Base
-  attr_accessible :content, :title
+  attr_accessible :content, :title, :tag_list
   acts_as_taggable_on :tags
 end


Added lorem gem

# file: Gemfile
gem 'lorem', :groups => [:development]

# installing
$ bundle


Created a ruby rake script to create 20 Posts with 20 random tag words

# file: lib/tasks/create_random_posts_and_tags.rake

namespace :db do
  desc "Create random posts and tags."
  task :create_random_posts_and_tags => :environment do
    
    # count the number of lines in the dictionary
    dict_word_count = `wc -l /usr/share/dict/words | awk '{print $1}'`.to_i
    
    # get 100 random words for the facets
    # (sed line numbers start at 1, so shift the 0-based random number by one)
    facet_words = 100.times.map{ `sed "#{Random.rand(dict_word_count) + 1}q;d" /usr/share/dict/words`.strip }
    
    # create 20 random posts
    (1..20).each do |i|

      post = Post.create!({
        :title => "Post #{i}",
        :content => Lorem::Base.new('paragraphs', 1).output,
        :tag_list => 20.times.map{ facet_words[rand(facet_words.size)] },
      })
      
    end
    
  end
end
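
As an alternative to shelling out to wc and sed, the random word selection can be done in plain Ruby. This is a minimal sketch, not part of the original rake task: the helper name pick_random_words is hypothetical, and unlike the sed approach it never returns the same word twice.

```ruby
# Hypothetical helper (not in the original task): pick `count` random words
# from a word list that has already been read into memory.
def pick_random_words(words, count)
  # Drop trailing newlines and skip blank lines.
  cleaned = words.map(&:strip).reject(&:empty?)
  # Array#sample(n) returns up to n distinct elements chosen at random.
  cleaned.sample(count)
end

# Possible usage inside the rake task:
#   facet_words = pick_random_words(File.readlines('/usr/share/dict/words'), 100)
```

Reading the whole dictionary into memory is fine for a one-off seeding task, but it is a trade-off compared with the line-at-a-time sed call.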


Executed rake task to create posts

$ rake db:create_random_posts_and_tags


Part 3, Solr Sunspot

Generate default configuration

$ rails generate sunspot_rails:install


Add code to index Post data. In this code, I added ":stored => true" to each field for two reasons: 1. to avoid querying Active Record on the search results page; and 2. to enable highlighting of matches.

# file: app/models/post.rb

 class Post < ActiveRecord::Base
   attr_accessible :content, :title, :tag_list
   acts_as_taggable_on :tags
+
+  searchable :auto_index => true, :auto_remove => true do
+    string :title, :stored => true
+    text :content, :stored => true
+    string :tag_list, :multiple => true, :stored => true
+  end
+
 end


Setup Solr development server via Jetty

# start solr
$ rake sunspot:solr:start 

# index data
$ rake sunspot:solr:reindex


At this point, you should be able to browse to Solr and query it directly to verify the structure of the indexed data. Example URL: http://localhost:8982/solr/select/?q=*:*
[Screenshot: querying Solr directly]

Add a new Search controller

$ rails generate controller Search search


Revised the search route to be a named route

# file: config/routes.rb

-  get "search/search" 
+  get 'search' => 'search#search', :as => 'search'


Define the search controller action. The controller passes two instance variables to the view: @search and @hits. @hits contains the stored values, so the view can render results straight from Solr instead of loading records through Active Record.

# file: app/controllers/search_controller.rb

class SearchController < ApplicationController
  def search

    # only search if keyword has been entered
    if params[:keywords].nil? || params[:keywords].empty?
      @hits = []
    else
      @search = Post.search do
        fulltext params[:keywords] do
          highlight :content
        end
        facet :tag_list
        paginate :per_page => 10
        
        # tags, AND'd
        # (Array() also accepts a single ?tag=foo value, not just tag[]=foo)
        if params[:tag].present?
          all_of do
            Array(params[:tag]).each do |tag|
              with(:tag_list, tag)
            end
          end
        end
        
      end
      @hits = @search.hits
      
    end    
  end
end
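
The tag filter works on a list of tag names taken from the query string. A small sketch of how that parameter could be cleaned up before it reaches the search block; normalize_tags is a hypothetical helper, not something the controller above defines.

```ruby
# Hypothetical helper: coerce a raw `tag` parameter into a clean array of
# tag names, whether it arrived as nil, a single string, or an array.
def normalize_tags(raw)
  Array(raw)                      # nil => [], "ruby" => ["ruby"], arrays pass through
    .map { |tag| tag.to_s.strip } # tolerate stray whitespace
    .reject(&:empty?)             # ignore blank values such as tag[]=
    .uniq                         # the same tag twice adds nothing to an AND filter
end
```

In the controller this could be called once, for example tags = normalize_tags(params[:tag]), with the result used both for the search and for the view.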


Define the search view. This code contains the following sections: search form, search results (@hits with matches highlighting), and facets generation. I set the facets as an array, to allow the user to select multiple.

# file: app/views/search/search.html.erb

<h1>Search#search</h1>

<!-- FORM: -->
<%= form_tag search_path, :method => :get do %>
  <%= text_field_tag :keywords, params[:keywords] %>
  <%= submit_tag "Search", :name => nil %>
<% end %>

<!-- SEARCH RESULTS: -->
<% if @hits.any? %>
  <h2>Search Results</h2>
  <ul>
    <% @hits.each do |hit| %>
      <li>
        <%= link_to hit.stored(:title), post_path(hit.primary_key) %><br/>
        <% hit.highlights(:content).each do |highlight| %>          
          <%= highlight.format { |word| "*#{word}*" } %>
        <% end %>
      </li>
    <% end %>  
  </ul>
<% end %>

<!-- FACETS HTML: -->
<%
facets_html = ''
if not @search.nil?
  
  # check for existing tags in query string
  existing_tag_facets = []
  if params[:tag].present?
    existing_tag_facets = params[:tag]
  end

  facet_links_off = ''
  facet_links_on = ''

  @search.facet(:tag_list).rows.each_with_index do |facet, index|
    break if index == 10
    
    # check if facet is selected
    if (params[:tag].kind_of?(Array) and params[:tag].include? facet.value)
      tag_facets = existing_tag_facets - [facet.value]      
      facet_links_on << "<li>#{link_to facet.value, :keywords => params[:keywords], :tag => tag_facets} (-)</li>"
    elsif @hits.size > 1
      tag_facets = existing_tag_facets + [facet.value]
      facet_links_off << "<li>#{link_to facet.value, :keywords => params[:keywords], :tag => tag_facets} (#{facet.count})</li>"
    end

  end

  facets_html << "<strong>Filter by tags</strong>"
  if facet_links_on.size > 0
    facets_html << "<ul class='search_facets_on'>#{facet_links_on}</ul>"
  end
  if facet_links_off.size > 0
    facets_html << "<ul class='search_facets_off'>#{facet_links_off}</ul>"
  end

end
%>
<%= raw facets_html %>
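
The facet links in the view boil down to one rule: clicking a selected tag removes it from the list, clicking an unselected tag adds it. A minimal sketch of that rule as a plain Ruby method; toggle_tag is a hypothetical name and could live in a view helper.

```ruby
# Hypothetical helper: return the tag list a facet link should point to.
# A selected tag is removed (the "(-)" links), an unselected one is added.
def toggle_tag(selected, tag)
  if selected.include?(tag)
    selected - [tag]
  else
    selected + [tag]
  end
end
```

Both branches return a new array, so the list taken from params is never modified while the links are being built.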


Browsing to http://localhost:3000/search now shows the search form. I entered "lorem" to get the following result. Note the asterisks around keyword "lorem" in the results. The tag facets are shown below with their associated result count.
[Screenshot: Solr search results with facets]

By clicking on two tags, the facet counts and associated results decrease. The facet links can also be unselected. Great.
[Screenshot: Solr search results with facets selected]

In this article, I'll walk through a basic Rails (3.2.x) setup for creating a nested resource for two models. Nested resources work well when you want to build out URL structure between two related models, and still maintain a RESTful convention. This code assumes you are running RVM to manage Ruby/Gem versions, and Git for version control.

Creating a new Rails project

$ mkdir family

# create rvm gemset
$ echo "rvm use --create ruby-1.9.2@family" > family/.rvmrc

$ cd family

# install rails
$ gem install rails

# create new rails project
$ rails new .

# version control
$ git init
$ git add .
$ git commit -am "new rails project"


Create two models (Parent & Child)

# Parent model
$ rails generate scaffold Parent name:string
$ git add .
$ git commit -am "rails generate scaffold Parent name:string"

# Child model
$ rails generate scaffold Child name:string parent_id:integer
$ git add .
$ git commit -am "rails generate scaffold Child name:string parent_id:integer"

# Create db (defaults to SQLite3)
$ rake db:migrate

# version control
$ git add db/schema.rb
$ git commit db/schema.rb -m "created database schema"


Review un-nested routes

$ rake routes
   children GET    /children(.:format)          children#index
            POST   /children(.:format)          children#create
  new_child GET    /children/new(.:format)      children#new
 edit_child GET    /children/:id/edit(.:format) children#edit
      child GET    /children/:id(.:format)      children#show
            PUT    /children/:id(.:format)      children#update
            DELETE /children/:id(.:format)      children#destroy
    parents GET    /parents(.:format)           parents#index
            POST   /parents(.:format)           parents#create
 new_parent GET    /parents/new(.:format)       parents#new
edit_parent GET    /parents/:id/edit(.:format)  parents#edit
     parent GET    /parents/:id(.:format)       parents#show
            PUT    /parents/:id(.:format)       parents#update
            DELETE /parents/:id(.:format)       parents#destroy


Adding model relationships

# file: app/models/parent.rb
class Parent < ActiveRecord::Base
  attr_accessible :name
  has_many :children
end

# file: app/models/child.rb
class Child < ActiveRecord::Base
  attr_accessible :name, :parent_id
  belongs_to :parent
end

# version control
$ git commit app/models -m "added relationships to models"


Nesting the routes

# file: config/routes.rb
-  resources :children
-  resources :parents
+  resources :parents do
+    resources :children
+  end

# version control
$ git commit config/routes.rb -m "nested resources in routes file"


Reviewing changes to routes

$ rake routes
  parent_children GET    /parents/:parent_id/children(.:format)          children#index
                  POST   /parents/:parent_id/children(.:format)          children#create
 new_parent_child GET    /parents/:parent_id/children/new(.:format)      children#new
edit_parent_child GET    /parents/:parent_id/children/:id/edit(.:format) children#edit
     parent_child GET    /parents/:parent_id/children/:id(.:format)      children#show
                  PUT    /parents/:parent_id/children/:id(.:format)      children#update
                  DELETE /parents/:parent_id/children/:id(.:format)      children#destroy
          parents GET    /parents(.:format)                              parents#index
                  POST   /parents(.:format)                              parents#create
       new_parent GET    /parents/new(.:format)                          parents#new
      edit_parent GET    /parents/:id/edit(.:format)                     parents#edit
           parent GET    /parents/:id(.:format)                          parents#show
                  PUT    /parents/:id(.:format)                          parents#update
                  DELETE /parents/:id(.:format)                          parents#destroy


Adding test data via Rails console

$ rails c

> dad = Parent.new(:name => 'Paul')
 => #<Parent id: nil, name: "Paul", created_at: nil, updated_at: nil> 

> dad.save
   (0.1ms)  begin transaction
  SQL (20.0ms)  INSERT INTO "parents" ("created_at", "name", "updated_at") VALUES (?, ?, ?)  [["created_at", Fri, 06 Apr 2012 16:13:17 UTC +00:00], ["name", "Paul"], ["updated_at", Fri, 06 Apr 2012 16:13:17 UTC +00:00]]
   (2.4ms)  commit transaction
 => true 

> son = dad.children.new(:name => 'Eric')
 => #<Child id: nil, name: "Eric", parent_id: 1, created_at: nil, updated_at: nil> 

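> daughter = dad.children.new(:name => 'Mara')
 => #<Child id: nil, name: "Mara", parent_id: 1, created_at: nil, updated_at: nil> 

# note: children.new only builds the records in memory; call son.save and
# daughter.save (or use dad.children.create) to persist them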

> exit


Adding a private controller method to load the Parent object before each action

# file: app/controllers/children_controller.rb
@@ -1,4 +1,7 @@
 class ChildrenController < ApplicationController
+
+  before_filter :load_parent
+
   # GET /children
   # GET /children.json
   def index
@@ -80,4 +83,11 @@ class ChildrenController < ApplicationController
       format.json { head :no_content }
     end
   end
+
+  private
+
+    def load_parent
+      @parent = Parent.find(params[:parent_id])
+    end
+
 end


At this point, each controller action and view for the Child model needs to be adjusted (links, redirection, form, etc.)

Method: children#index

# file: app/controllers/children_controller.rb

   def index
-    @children = Child.all
+    @children = @parent.children.all



# file: app/views/children/index.html.erb

-    <td><%= link_to 'Show', child %></td>
-    <td><%= link_to 'Edit', edit_child_path(child) %></td>
-    <td><%= link_to 'Destroy', child, confirm: 'Are you sure?', method: :delete %></td>
+    <td><%= link_to 'Show', parent_child_path(@parent, child) %></td>
+    <td><%= link_to 'Edit', edit_parent_child_path(@parent, child) %></td>
+    <td><%= link_to 'Destroy', [@parent, child], confirm: 'Are you sure?', method: :delete %></td>
 
-<%= link_to 'New Child', new_child_path %>
+<%= link_to 'New Child', new_parent_child_path(@parent) %>


Method: children#new

# file: app/controllers/children_controller.rb

   def new
-    @child = Child.new
+    @child = @parent.children.new



# file: app/views/children/_form.html.erb

-<%= form_for(@child) do |f| %>
+<%= form_for([@parent, @child]) do |f| %>




# file: app/views/children/new.html.erb

-<%= link_to 'Back', children_path %>
+<%= link_to 'Back', parent_children_path(@parent) %>


Method: children#create

# file: app/controllers/children_controller.rb

   def create
-    @child = Child.new(params[:child])
+    @child = @parent.children.new(params[:child])
 
     respond_to do |format|
       if @child.save
-        format.html { redirect_to @child, notice: 'Child was successfully created.' }
+        format.html { redirect_to [@parent, @child], notice: 'Child was successfully created.' }


Method: children#show

# file: app/controllers/children_controller.rb

   def show
-    @child = Child.find(params[:id])
+    @child = @parent.children.find(params[:id])



# file: app/views/children/show.html.erb

-<%= link_to 'Edit', edit_child_path(@child) %> |
-<%= link_to 'Back', children_path %>
+<%= link_to 'Edit', edit_parent_child_path(@parent, @child) %> |
+<%= link_to 'Back', parent_children_path(@parent) %>


Method: children#edit

# file: app/controllers/children_controller.rb

   def edit
-    @child = Child.find(params[:id])
+    @child = @parent.children.find(params[:id])



# file: app/views/children/edit.html.erb

-<%= link_to 'Show', @child %> |
-<%= link_to 'Back', children_path %>
+<%= link_to 'Show', parent_child_path(@parent, @child) %> |
+<%= link_to 'Back', parent_children_path(@parent) %>


Method: children#update

# file: app/controllers/children_controller.rb

   def update
-    @child = Child.find(params[:id])
+    @child = @parent.children.find(params[:id])
 
     respond_to do |format|
       if @child.update_attributes(params[:child])
-        format.html { redirect_to @child, notice: 'Child was successfully updated.' }
+        format.html { redirect_to [@parent, @child], notice: 'Child was successfully updated.' }


Method: children#destroy

# file: app/controllers/children_controller.rb

   def destroy
-    @child = Child.find(params[:id])
+    @child = @parent.children.find(params[:id])
     @child.destroy
 
     respond_to do |format|
-      format.html { redirect_to children_url }
+      format.html { redirect_to parent_children_path(@parent) }


At this point, the default scaffolding's links and redirection have been updated to work with the nested routes.
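
The reason for replacing Child.find with @parent.children.find is that the lookup is then limited to the children of the parent named in the URL. The sketch below models that idea in plain Ruby, without Rails; ChildRecord and find_child_for are hypothetical names used only for illustration.

```ruby
# Plain-Ruby stand-in for a Child row (hypothetical, for illustration only).
ChildRecord = Struct.new(:id, :parent_id, :name)

# Look a child up only among the records that belong to the given parent,
# roughly what @parent.children.find(params[:id]) does. Rails raises
# ActiveRecord::RecordNotFound in the failure case; KeyError is used here.
def find_child_for(parent_id, child_id, records)
  child = records.find { |c| c.parent_id == parent_id && c.id == child_id }
  raise KeyError, "child #{child_id} not found for parent #{parent_id}" if child.nil?
  child
end
```

With this scoping, a URL such as /parents/2/children/1 does not show a child that actually belongs to parent 1.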
In this article I'll explain how I recently set up a web server to host both 1. Ruby on Rails via Phusion Passenger (mod_rails), and 2. PHP via Apache (mod_php). Nginx will sit in front and proxy requests (by hostname) to Apache, or serve them directly via Passenger. Here's a rough diagram:

[Diagram: nginx in front of Passenger and Apache]

I started with a fresh (minimal) installation of Ubuntu 10.04 LTS. Here we go..


# update installed packages
$ sudo apt-get update
$ sudo apt-get upgrade

# install SSH server
$ sudo apt-get install openssh-server -y


Part 1, Apache/PHP


# install PHP & Apache
$ sudo apt-get install php5 php5-cli php5-common php5-curl php5-gd php5-mysql php-pear -y


Set Apache to listen on port 8000.
Note: nginx will listen on 80 and proxy requests to Apache.


# edit file: /etc/apache2/ports.conf

# replace:
NameVirtualHost *:80
Listen 80

# with:
NameVirtualHost *:8000
Listen 8000


For the sake of this tutorial, I created a simple PHP script.

$ mkdir /var/www/php.eric.vm
$ echo '<?php echo "hello php world";' >> /var/www/php.eric.vm/index.php


And created an Apache vhost for the above script:

# created new/example file: /etc/apache2/sites-available/php.eric.vm

<VirtualHost *:8000>

  ServerName php.eric.vm
  ServerAdmin webmaster@localhost
  DocumentRoot /var/www/php.eric.vm

  <Directory /var/www/php.eric.vm >
    AllowOverride All
  </Directory>

  ErrorLog /var/log/apache2/php.eric.vm-error.log

  LogLevel warn

  CustomLog /var/log/apache2/php.eric.vm-access.log combined

</VirtualHost>



# Enabling the new conf file by adding a symlink:
$ cd /etc/apache2/sites-enabled
$ sudo ln -s ../sites-available/php.eric.vm

# removed the existing default vhost:
$ sudo rm 000-default

# restarted Apache
$ sudo service apache2 restart


At this point, I was able to reach my php script by browsing to http://php.eric.vm:8000

Part 2, RVM/Ruby/Passenger


# Install CURL, to start the RVM installation
$ sudo apt-get install curl -y

# Install RVM (multi-user installation)
$ sudo bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer )

# add user to RVM group
$ sudo usermod -a -G rvm eric

# Install Ruby/RVM dependencies
# NOTE: you can run "rvm requirements" to get this list:
$ sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion -y

# install nodejs, to provide a JavaScript runtime
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:chris-lea/node.js
$ sudo apt-get update
$ sudo apt-get install nodejs

# install ruby 1.9.2
$ rvmsudo rvm install 1.9.2

# set default version of ruby
$ rvm use 1.9.2 --default

# install rails
$ rvmsudo gem install rails --version 3.2.1

# install mysql server
# note: rails defaults to sqlite3, choose whatever you want
$ sudo apt-get install mysql-server -y

# install passenger
$ rvmsudo gem install passenger

# install nginx/passenger requirements
$ sudo apt-get install libcurl4-openssl-dev -y

# install passenger nginx module
$ rvmsudo passenger-install-nginx-module
# Install options I chose:
# 1. Yes: download, compile and install Nginx for me. (recommended)
# Please specify a prefix directory [/opt/nginx]: 


Part 3, Test Rails App

For this tutorial, I created a (very) simple Rails app.


# create new rails app
$ cd /var/www
$ rails new railsdemo
$ cd railsdemo

# integrate with git version control
$ git init
$ git add .
$ git commit -am "initial rails repo"

# remove default placeholder index page
$ rm public/index.html

# create sample home controller
$ rails generate controller home index

# add route in file: config/routes.rb
root :to => "home#index"

# version control
$ git add .
$ git commit -am "added home controller and route"


(as usual) if I had made changes to my models, I would have run:

$ rake db:migrate


To test my rails development environment:

$ rails s


At this point, I was able to browse to my rails app at: http://rails.eric.vm:3000
The generic controller message was shown:
Home#index
Find me in app/views/home/index.html.erb

To run the rails app in production mode, I edited the file: config/environments/production.rb, and made this change:

config.assets.compile = true


And as necessary, migrate production database:

$ rake db:migrate RAILS_ENV=production


At this point, the rails app should be able to run in production mode using:

$ rails s -e production

(if not, check log/production.log for errors)

Part 4, Nginx

Although nginx is now installed, you'll need an init script. I simply copied the one listed here: http://techoctave.com/c7/posts/16-how-to-host-a-rails-app-with-phusion-passenger-for-nginx, and pasted it into: /etc/init.d/nginx

# set file permissions
$ sudo chmod +x /etc/init.d/nginx

# add init script run levels
$ sudo /usr/sbin/update-rc.d -f nginx defaults


The last part of this tutorial involves making changes to the nginx conf file: /opt/nginx/conf/nginx.conf

For my server, I set nginx to run as the same user/group as Apache, and increased the number of worker processes (one per CPU core):

# replaced: 
user  nobody;
worker_processes  1;

# with: 
user   www-data www-data;
worker_processes  4;


Within the http directive, I added a server directive for my rails app:

http {
  # ...snip...
  server {
    listen 80;
    server_name rails.eric.vm;
    root /var/www/railsdemo/public;
    passenger_enabled on;
  }
  # ...snip...
}


And, an upstream and server directive for Apache:

http {
  # ...snip...
  upstream apache {
    server 127.0.0.1:8000 weight=5;
  }

  server {
    listen 80;
    server_name php.eric.vm;

    location / {
      proxy_pass http://apache;
    }
  }
  # ...snip...
}


The above configuration changes allow nginx to listen on port 80 and, based on hostname, either: 1. serve the Rails app (nginx > passenger > rails); or 2. proxy the request to Apache (nginx > apache > php).
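
The hostname-based dispatch can be pictured as a simple lookup table. The Ruby sketch below only models the decision nginx makes with the two server blocks; BACKENDS and backend_for are hypothetical names, and nothing here talks to nginx.

```ruby
# Model of the two server blocks: hostname => where the request ends up.
BACKENDS = {
  'rails.eric.vm' => :passenger,              # served by nginx + Passenger
  'php.eric.vm'   => 'http://127.0.0.1:8000'  # proxied to Apache
}.freeze

# Pick a backend from the Host header, ignoring case and any :port suffix.
# Returns nil for unknown hosts; nginx itself would fall back to its
# default server for that port in this case.
def backend_for(host_header)
  host = host_header.to_s.downcase.split(':').first
  BACKENDS[host]
end
```

For example, backend_for('php.eric.vm') gives the Apache upstream address, while backend_for('rails.eric.vm') gives :passenger.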

* Special thanks to Dan Vine (my Rails partner in crime)!
Created by Eric.London on 2012-03-10
 
In this article, I'll share a drush script I wrote to export data from a Drupal site to JSON format. Scripts like this will require customization, but hopefully it will be helpful as a kick start for some. I used it to export users, nodes, comments, taxonomy, and files from a blog site.

<?php

// define date format
define('EXPORT_DATE_FORMAT', 'Y-m-d H:i:s');

// fetch desired node data
$sql = "select nid from {node} order by nid asc";
$resource = db_query($sql);
$nodes = array();
while($row = db_fetch_object($resource)) {
  $node = node_load($row->nid);
  if (is_object($node)) {
    $nodes[] = $node;
  }
}

// create a container to store all data
$data = new StdClass();

// loop through node objects, collect desired data
$data->nodes = new StdClass();
foreach ($nodes as $nid => $node) {
  
  $n = new StdClass();
  
  // basic properties
  $n->nid = $node->nid;
  $n->type = $node->type;
  $n->uid = $node->uid;
  $n->user_name = $node->name;
  $n->status = $node->status;
  $n->created = date(EXPORT_DATE_FORMAT, $node->created);
  $n->changed = date(EXPORT_DATE_FORMAT, $node->changed);
  $n->title = $node->title;
  $n->body = $node->body;
  $n->path = $node->path;

  // cck field [simple example, single value]
  if (!empty($node->field_example_single[0]['value'])) {
    $n->field_example_single = $node->field_example_single[0]['value'];
  }

  // cck field [simple example, multi-value]
  if (!empty($node->field_example_multi)) {
    $n->field_example_multi = array();
    foreach ($node->field_example_multi as $field_data) {
      $n->field_example_multi[] = $field_data['value'];
    }
  }

  // taxonomy
  if (!empty($node->taxonomy)) {
    $n->taxonomy = array();
    foreach ($node->taxonomy as $tid => $object) {
      $n->taxonomy[] = $object->name;
    }
  }

  // files
  if (!empty($node->files)) {
    $n->files = array();
    foreach ($node->files as $fid => $object) {
      $f = new StdClass();
      $f->fid = $fid;
      $f->filename = $object->filename;
      $f->filepath = $object->filepath;
      $f->filemime = $object->filemime;
      $f->filesize = $object->filesize;
      $f->timestamp = date(EXPORT_DATE_FORMAT, $object->timestamp);
      $n->files[] = $f;
    }

  }

  // comments (recursive)
  if ($node->comment_count) {
    $n->comments = get_node_comments_recursive($n->nid);
  }
  
  // process node type
  if (!isset($data->nodes->{$n->type})) {
    $data->nodes->{$n->type} = array();
  }
  $data->nodes->{$n->type}[$n->nid] = $n;
  
}

// fetch user object list
$sql = "select uid from {users} order by uid asc";
$resource = db_query($sql);
$users = array();
while($row = db_fetch_object($resource)) {
  $user = user_load($row->uid);
  if (is_object($user)) {
    $users[] = $user;
  }
}

// loop through user objects, collect desired data
$data->users = array();
foreach($users as $user) {

  $u = new StdClass();
  
  $u->uid = $user->uid;
  $u->name = $user->name;
  $u->pass = $user->pass;
  $u->email = $user->mail;
  $u->created = date(EXPORT_DATE_FORMAT, $user->created);
  $u->status = $user->status;
  $u->picture = $user->picture;
  $u->roles = array_values($user->roles);
  
  $data->users[$u->uid] = $u;

}

$json = json_encode($data);

file_put_contents('/non/docroot/path/drupal_export.json', $json);

// FUNCTIONS

// recursively fetch comments data
function get_node_comments_recursive($nid, $pid = 0) {
 
  $sql = "
    select *
    from {comments}
    where nid = %d and pid = %d
    order by thread asc
  ";
  $resource = db_query($sql, $nid, $pid);

  $comments = array();
  while ($row = db_fetch_object($resource)) {
    
    $c = new StdClass();
    $c->cid = $row->cid;
    $c->pid = $row->pid;
    $c->nid = $row->nid;
    $c->uid = $row->uid;
    $c->subject = $row->subject;
    $c->comment = $row->comment;
    $c->hostname = $row->hostname;
    $c->timestamp = date(EXPORT_DATE_FORMAT, $row->timestamp);
    $c->status = $row->status;
    $c->thread = $row->thread;
    $c->user_name = $row->name;
    
    $comments[$row->cid] = $c;
  }
  if (empty($comments)) {
    return array();
  }
  
  foreach ($comments as $key => $value) {
    $children = get_node_comments_recursive($nid, $value->cid);
    if (!empty($children)) {
      $comments[$key]->children = $children;
    }
  }
  
  return $comments;
}
?>


I put this script outside my Drupal docroot in a scripts directory. I called it via drush like this:

$ cd drupal_docroot
$ drush scr ../scripts/drupal_export.php


In this tutorial, I'll share the notes and code I used to set up geospatial Apache Solr searching in Drupal 7 using the Search API module. For this tutorial I created a minimal Ubuntu server virtual machine. All the commands should be executed as a user with permission to modify files, or prefixed with "sudo".

The first thing I do with a fresh virtual machine is check for package upgrades.

$ apt-get update
$ apt-get upgrade


I find it cumbersome to type in a virtual machine window, so I install the OpenSSH server and SSH in from my Mac. If you plan to do the same, you'll need to find your virtual machine's IP address using ifconfig. For this tutorial I added a local DNS entry (/etc/hosts) to point "drupal7.vm" to my VM's IP.

$ apt-get install openssh-server


Install the LAMP stack. The following packages will install Apache httpd as a dependency.

$ apt-get install php5 php5-cli php5-common php5-curl php5-gd php5-mysql php-pear mysql-server


At this point, browsing to your VM/server's IP address will give you the standard Apache welcome message:
It works!
This is the default web page for this server.
The web server software is running but no content has been added, yet.

Install version control.

$ apt-get install git-core


Create a mysql database for Drupal 7.

$ mysql -u youruser -p
mysql> create database drupal7;
mysql> grant all privileges on drupal7.* to 'drupal7'@'localhost' identified by 'somepassword';
mysql> exit


Install drush via Pear.

$ pear upgrade-all
$ pear channel-discover pear.drush.org
$ pear install drush/drush


Verifying drush is installed.

$ which drush
/usr/bin/drush
$ drush --version
drush version 4.5


Create an Apache vhost directory

$ mkdir -p /var/www/vhosts


Download drupal via drush

$ cd /var/www/vhosts
$ drush dl drupal
# rename folder (as necessary)
$ mv drupal-7.10 drupal7


Integrate drupal file system with git

$ cd drupal7
$ git init
$ git add .
$ git commit -am "initial commit of drupal7"


Install drupal via drush

# use the database name, user, and password created above
$ drush site-install standard --db-url=mysql://drupal7:somepassword@localhost/drupal7


Add Apache2 vhost

$ cd /etc/apache2/sites-available
# create new file, called "drupal7" with contents:
<VirtualHost *:80>
  ServerName drupal7.vm
  DocumentRoot /var/www/vhosts/drupal7
  ErrorLog /var/log/apache2/drupal7-error_log
  CustomLog /var/log/apache2/drupal7-access_log combined
  <Directory /var/www/vhosts/drupal7>
    AllowOverride All
  </Directory>
</VirtualHost>

# create symlink
$ cd ../sites-enabled
$ ln -s ../sites-available/drupal7 001-drupal7.conf

# enable apache2 mod_rewrite module
$ a2enmod rewrite

# restart apache2
$ /etc/init.d/apache2 restart


At this point, browsing to your VM/server's hostname should show a Drupal installation.

Part 2, Tomcat/Solr

Installing java jdk and tomcat6

$ apt-get install openjdk-6-jdk tomcat6 tomcat6-admin tomcat6-common tomcat6-user


Browsing to your VM/server's hostname on port 8080 (ex: http://drupal7.vm:8080) will show the generic Tomcat welcome message:
It works !
If you're seeing this page via a web browser, it means you've setup Tomcat successfully. Congratulations!

Installing Solr in Tomcat

$ mkdir ~/downloads
$ cd ~/downloads
# Download the latest stable version of Apache Solr from:
# http://www.apache.org/dyn/closer.cgi/lucene/solr/
# example:
$ wget http://www.motorlogy.com/apache//lucene/solr/3.5.0/apache-solr-3.5.0.tgz
$ tar -xzf apache-solr-3.5.0.tgz


Copy/rename java war file into Tomcat webapps directory

$ cp ~/downloads/apache-solr-3.5.0/dist/apache-solr-3.5.0.war /var/lib/tomcat6/webapps/solr.war


Note: copying the java war file into the Tomcat webapps folder will create this directory automatically:

/var/lib/tomcat6/webapps/solr


Copy solr files

$ cp -r ~/downloads/apache-solr-3.5.0/example/solr/ /var/lib/tomcat6/solr/


Create Catalina config file to link war file to solr directory

$ cd /etc/tomcat6/Catalina/localhost
# create new file: "solr.xml", with the contents:
<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="/var/lib/tomcat6/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/var/lib/tomcat6/solr" override="true" />
</Context>


Setup Tomcat admin user(s)

# edit file: /etc/tomcat6/tomcat-users.xml, ensure similar contents exist:
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
<role rolename="admin"/>
<role rolename="manager"/>
<user username="eric" password="supersecretpassword" roles="admin,manager"/>
</tomcat-users>


Update webapps WEB-INF/web.xml file

# edit file: /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml, update "solr/home" section to reflect solr path:
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/var/lib/tomcat6/solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>


Download search api drupal modules that contain solr xml configuration files, and copy into solr conf directory

$ mkdir -p /var/www/vhosts/drupal7/sites/all/modules/contrib
$ cd /var/www/vhosts/drupal7/sites/all/modules/contrib
$ drush dl search_api search_api_solr
$ cp /var/www/vhosts/drupal7/sites/all/modules/contrib/search_api_solr/solrconfig.xml /var/lib/tomcat6/solr/conf/
$ cp /var/www/vhosts/drupal7/sites/all/modules/contrib/search_api_solr/schema.xml /var/lib/tomcat6/solr/conf/


Reset tomcat permissions, and restart tomcat

$ cd /var/lib
$ chown -R tomcat6:tomcat6 tomcat6
$ /etc/init.d/tomcat6 restart


You should now be able to browse to the solr admin java page.
Example: http://drupal7.vm:8080/solr/admin/
[Screenshot: Solr admin page]

If things aren't working well at this point, check the Tomcat logs and look for SEVERE log entries

/var/log/tomcat6/catalina.out
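
A quick way to pull those entries out of a long log is grep. This is a small sketch, assuming the log path used in this article; "recent_severe" is a made-up function name.

```shell
# Show the most recent SEVERE entries from a Tomcat log, with line numbers.
# Arguments: log file path (defaults to the path used in this article)
# and the number of matches to show (defaults to 20).
recent_severe() {
  log_file="${1:-/var/log/tomcat6/catalina.out}"
  count="${2:-20}"
  grep -n 'SEVERE' "$log_file" | tail -n "$count"
}
```

Usage: recent_severe /var/log/tomcat6/catalina.out 5
The line numbers make it easier to open the log at the right spot and read the surrounding stack trace.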


In addition, the Solr web application should be listed in the Tomcat Web Application Manager
Ex URL: http://drupal7.vm:8080/manager/html

Part 3, Drupal code

Getting the solr-php-client library from code.google.com

$ mkdir -p /var/www/vhosts/drupal7/sites/all/libraries
$ cd /var/www/vhosts/drupal7/sites/all/libraries

# URL: http://code.google.com/p/solr-php-client/downloads/list
# File: SolrPhpClient.r60.2011-05-04.tgz
$ wget http://solr-php-client.googlecode.com/files/SolrPhpClient.r60.2011-05-04.tgz
$ tar -xzf SolrPhpClient.r60.2011-05-04.tgz


Downloading and installing contrib drupal modules

$ cd /var/www/vhosts/drupal7
$ drush dl entity views ctools facetapi
$ drush en search_api search_api_views search_api_solr search_api_facetapi entity views views_ui ctools facetapi


(Optional) I also install devel and admin_menu, and disable overlay and toolbar

$ drush dl devel admin_menu
$ drush en devel admin_menu
$ drush dis overlay toolbar


Add the tomcat/solr server to Search API configuration.
- URL: /admin/config/search/search_api
- click on "+ Add Server"
- server name: Solr 3.5.0
- Service class: Solr service
- Solr host: localhost
- Solr port: 8080
- Solr path: /solr
- click Create Server

You should receive some confirmation messages:
The server was successfully created.
The Solr server could be reached (latency: # ms).
If not, ensure tomcat/solr is reachable at the url you specified and the tomcat service is running.
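
One way to check reachability from the command line is to request the Solr admin page and look at the HTTP status code. This is a rough sketch, assuming curl is installed; both function names are made up for illustration, and the host, port, and path are the values entered in the form above.

```shell
# Build the Solr base URL from host, port, and path.
solr_url() {
  printf 'http://%s:%s%s' "$1" "$2" "$3"
}

# Print only the HTTP status code returned by the Solr admin page.
# -s silences progress output, -o /dev/null discards the response body,
# and -w '%{http_code}' prints the status code.
solr_http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$(solr_url "$1" "$2" "$3")/admin/"
}
```

Usage: solr_http_status localhost 8080 /solr
A 200 suggests Solr is answering; anything else points back at the Tomcat log.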

At this point Solr is ready to send/receive data and index content, but there is nothing to index. For this tutorial, I decided to build off of user profiles and store latitude and longitude using the geolocation field module.

$ drush dl geolocation
$ drush en geolocation


Adding some user profile fields:
- URL: /admin/config/people/accounts/fields
- First Name | field_name_first | Text
- Last Name | field_name_last | Text
- Geolocation | field_geolocation | Geolocation | Latitude/Longitude

I then added a bunch of users with latitude/longitude coordinates.
- URL: /admin/people/create
- note: I used Google Geocoding API to fetch the coordinates: http://code.google.com/apis/maps/documentation/geocoding/

Adding the search api index.
- URL: /admin/config/search/search_api
- click "+ Add index"
- Index name: People
- Item type: User
- Server: Solr 3.5.0
- click: Create Index

On the next admin page, you can select which fields to index. For this tutorial, I chose: User ID, Name, Email, URL, First Name, and Last Name. Unfortunately, at the time of writing this, the geolocation lat/lng fields are not exposed to the Entity API. I assume this is a temporary problem, and there are numerous patches in the geolocation issue queue.
@see (for example):
Property Info callback for Entity API - http://drupal.org/node/1366642
Fix for Search API not picking up the entity to index it's fields - http://drupal.org/node/1320564

I copied code directly from the issue queue, made some modifications, and created a custom module to expose the geolocation field data to the Entity API module. In addition, I added a new property "lat_lon" that concatenates lat and lng with a comma, which is the "lat,lon" format used by Solr's spatial field type. @see: http://wiki.apache.org/solr/SpatialSearch

<?php
/**
 * Implements hook_field_info_alter()
 */
function MYMODULE_field_info_alter(&$info) {
  if (isset($info['geolocation_latlng'])) {
    $info['geolocation_latlng']['property_type'] = 'geolocation';
    $info['geolocation_latlng']['property_callbacks'] = array('geolocation_property_info_callback');
  }
}

function geolocation_property_info_callback(&$info, $entity_type, $field, $instance, $field_type) {
  $name = $field['field_name'];
  $property = &$info[$entity_type]['bundles'][$instance['bundle']]['properties'][$name];

  $property['type'] = ($field['cardinality'] != 1) ? 'list<geolocation>' : 'geolocation';
  $property['getter callback'] = 'entity_metadata_field_verbatim_get';
  $property['setter callback'] = 'entity_metadata_field_verbatim_set';
  $property['auto creation'] = 'geolocation_default_values';
  $property['property info'] = geolocation_data_property_info();

  unset($property['query callback']);
}

function geolocation_default_values() {

  return array(
    'lat' => '',
    'lng' => '',
    'lat_sin' => '',
    'lat_cos' => '',
    'lat_rad' => '',
    'lat_lon' => '',
  );

}

function geolocation_data_property_info($name = NULL) {

  // Build an array of basic property information for the geolocation field.
  $properties = array(
    'lat' => array(
      'label' => t('Latitude'),
    ),
    'lng' => array(
      'label' => t('Longitude'),
    ),
    'lat_sin' => array(
      'label' => t('Sine of Latitude'),
    ),
    'lat_cos' => array(
      'label' => t('Cosine of Latitude'),
    ),
    'lat_rad' => array(
      'label' => t('Radian Latitude'),
    ),
    'lat_lon' => array(
      'label' => t('Latitude,Longitude'),
    ),
  );

  // Add the default values for each of the geolocation field properties.
  foreach ($properties as $key => &$value) {
    
    switch ($key) {
    
      case 'lat_lon':
        $value += array(
          'description' => !empty($name) ? t('!label of field %name', array('!label' => $value['label'], '%name' => $name)) : '',
          'type' => 'text',
          'getter callback' => '_MYMODULE_geolocation_entity_property_verbatim_get',
          'setter callback' => '_MYMODULE_geolocation_entity_property_verbatim_set',
        );
        break;
    
      default:
        $value += array(
          'description' => !empty($name) ? t('!label of field %name', array('!label' => $value['label'], '%name' => $name)) : '',
          'type' => 'text',
          'getter callback' => 'entity_property_verbatim_get',
          'setter callback' => 'entity_property_verbatim_set',
        );
        break;
    
    }

  }

  return $properties;
}


function _MYMODULE_geolocation_entity_property_verbatim_get($data, array $options, $name, $type, $info) {
  if (is_array($data) && isset($data['lat']) && isset($data['lng'])) {
    return $data['lat'] . ',' . $data['lng'];
  }
  return '';
}

function _MYMODULE_geolocation_entity_property_verbatim_set(&$data, $name, $value, $langcode, $type, $info) {
  // TODO
  return;
}
?>


I added this code to a custom module, renamed the functions as necessary, and enabled the module. Next, update the Search API index to add the new fields.
- URL: /admin/config/search/search_api/index/people/fields
- Expand "Add Related Fields"
- Choose Geolocation, click Add fields
This exposes the following fields to the index:
- Geolocation » Latitude
- Geolocation » Longitude
- Geolocation » Sine of Latitude
- Geolocation » Cosine of Latitude
- Geolocation » Radian Latitude
- Geolocation » Latitude,Longitude
Enable "Geolocation » Latitude,Longitude" and save changes.

Index the content.
- URL: /admin/config/search/search_api/index/people/status
- Click: Index now
- note: if you had already indexed the content, you'll probably need to clear it first
In my environment, I got the following confirmation message:
Successfully indexed 7 items.

I find it very helpful to verify the XML response from Solr directly after making changes to the index or schema.
The following URL structure will query solr for all results and return all fields:

Ex URL: http://drupal7.vm:8080/solr/select/?q=&fl=*


A sample XML document response.

<doc>
  <str name="f_ss_search_api_language"/>
  <str name="f_ss_url">http://drupal7.vm/user/3</str>
  <str name="id">people-3</str>
  <str name="index_id">people</str>
  <long name="is_uid">3</long>
  <str name="item_id">3</str>
  <arr name="spell">
    <str>nashua</str>
    <str>nashua@example.com</str>
    <str>nashua</str>
    <str>nashua</str>
    <str>42.933692,-72.278141</str>
  </arr>
  <str name="ss_search_api_id">3</str>
  <str name="ss_search_api_language"/>
  <str name="ss_url">http://drupal7.vm/user/3</str>
  <arr name="t_field_geolocation:lat_lon">
    <str>42.933692,-72.278141</str>
  </arr>
  <arr name="t_field_name_first">
    <str>nashua</str>
  </arr>
  <arr name="t_field_name_last">
    <str>nashua</str>
  </arr>
  <arr name="t_mail">
    <str>nashua@example.com</str>
  </arr>
  <arr name="t_name">
    <str>nashua</str>
  </arr>
</doc>


Take note of the field name in the following XML; it is used in the next file edit.

<arr name="t_field_geolocation:lat_lon">
  <str>42.933692,-72.278141</str>
</arr>
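
When a response contains many documents, listing the distinct field names can be quicker than reading the XML by eye. This is a small sketch, assuming a grep that supports the -o option (GNU and BSD grep both do); "solr_field_names" is a hypothetical name.

```shell
# Read a Solr XML response on stdin and print each distinct value of a
# name="..." attribute, one per line, sorted.
solr_field_names() {
  grep -o 'name="[^"]*"' | sed 's/^name="//; s/"$//' | sort -u
}
```

Usage: curl -s 'http://drupal7.vm:8080/solr/select/?q=&fl=*' | solr_field_names
The output also includes attribute names from the response header, so expect a few entries that are not document fields.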


Update the solr schema.xml configuration and add the geospatial fieldType and field data.

# Edit file: /var/lib/tomcat6/solr/conf/schema.xml
# Just prior to the closing "</types>" tag, I inserted: (around line 287)
    <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
    <fieldtype name="geohash" class="solr.GeoHashField"/>

# And, just after the opening "<fields>" tag, I inserted: 
    <field name="t_field_geolocation:lat_lon" type="location" indexed="true" stored="true"/>
    <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>


Restart Tomcat

$ /etc/init.d/tomcat6 restart


Since the schema and solr data types have been updated, the content will have to be re-indexed.
- URL: /admin/config/search/search_api/index/people/status
- click: Clear index
- click: Index now

Returning to the solr query above will now show updated xml: (note: no longer an array)

<str name="t_field_geolocation:lat_lon">42.933692,-72.278141</str>


Verify the native solr geospatial searching is working using the following query syntax:
URL: http://drupal7.vm:8080/solr/select/?q=&fl=*&fq={!geofilt sfield=t_field_geolocation:lat_lon pt=42.933692,-72.278141 d=100}
By putting a distance parameter of 100 (kilometers) and Nashua NH coordinates, I get 2 results: Nashua and Portsmouth, awesome.
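
A browser will encode the spaces and braces in that URL for you, but command-line tools generally will not. Here is a sketch of the same query with curl, assuming the example host from this article; "geofilt_fq" and "geofilt_search" are hypothetical helper names.

```shell
# Build a Solr geofilt filter query string.
# Arguments: spatial field name, latitude, longitude, distance in km.
geofilt_fq() {
  printf '{!geofilt sfield=%s pt=%s,%s d=%s}' "$1" "$2" "$3" "$4"
}

# Run the filtered query. With -G, curl sends each --data-urlencode pair
# as part of a URL-encoded GET query string instead of a POST body.
# The base URL is this article's example host (adjust for your setup).
geofilt_search() {
  curl -s -G 'http://drupal7.vm:8080/solr/select/' \
    --data-urlencode 'q=' \
    --data-urlencode 'fl=*' \
    --data-urlencode "fq=$(geofilt_fq "$@")"
}
```

Usage: geofilt_search t_field_geolocation:lat_lon 42.933692 -72.278141 100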

Create a solr integrated view.
- URL: /admin/structure/views/add
- View name: People
- Show: People
- Create a Page [checked]
- Path: people
- Continue & edit
Note: at this point, you have free rein over the view configuration. For this tutorial, I set the format to Grid, and added some fields:
- Geolocation: Latitude,Longitude (indexed)
- Indexed User: Email
- Indexed User: First Name
- Indexed User: Last Name
- Indexed User: Name
Save the view when edits are complete.

Browsing to the view will show something like this:
Ex URL: http://drupal7.vm/people
People View

The next chunk of custom code modifies the solr query executed and adds geospatial filtering.
@see: hook_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query)
<?php
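/**
 * Implements hook_search_api_solr_query_alter().
 */
function MYMODULE_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {

  // Hardcoded search center and radius (in kilometers) for demonstration.
  $lat = 42.933692;
  $lng = -72.278141;
  $distance = 100;

  // Append a geofilt filter query on the indexed lat_lon field.
  $call_args['params']['fq'][] = "{!geofilt sfield=t_field_geolocation:lat_lon pt={$lat},{$lng} d={$distance}}";

}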
?>


The above code will limit the view's results using the hardcoded coordinates.
People View 2
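
To sanity-check which users a given radius should keep, the great-circle distance between two coordinates can be computed by hand. This is a sketch using the haversine formula, assuming a mean Earth radius of 6371 km; Solr's own constant and implementation may differ slightly, and "haversine_km" is a made-up name.

```shell
# Great-circle distance in kilometers between two points
# (haversine formula, assumed mean Earth radius of 6371 km).
# Arguments: lat1 lon1 lat2 lon2, all in decimal degrees.
haversine_km() {
  awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    pi = atan2(0, -1); r = 6371
    p1 = lat1 * pi / 180; p2 = lat2 * pi / 180
    dp = p2 - p1; dl = (lon2 - lon1) * pi / 180
    a = sin(dp / 2) ^ 2 + cos(p1) * cos(p2) * sin(dl / 2) ^ 2
    printf "%.1f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}
```

For example, "haversine_km 0 0 1 0" prints 111.2, the length of one degree of latitude under that radius. Comparing the result for a user's stored coordinates against the d value shows whether that user should appear in the filtered view.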

Clearly it works, but there are loose ends to tie up:
- automatically fetch a user's coordinates to store in the geolocation field
- add a search form to the people view page to allow the user to search for a location (instead of hard coded coordinates, blah)
- translate the user's location search input to coordinates using an API
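
Whatever ends up producing the coordinates for the search form, it is worth validating them before they are interpolated into the fq string. Here is a minimal sketch of that check, written in shell for illustration (inside the module the same check would be done in PHP); "valid_latlng" is a hypothetical name.

```shell
# Succeed only if the argument looks like "lat,lng" with a latitude in
# [-90, 90] and a longitude in [-180, 180]. Anything else, including
# extra text after the numbers, is rejected.
valid_latlng() {
  printf '%s\n' "$1" | awk -F, '
    NF != 2 { exit 1 }
    $1 !~ /^-?[0-9]+(\.[0-9]+)?$/ || $2 !~ /^-?[0-9]+(\.[0-9]+)?$/ { exit 1 }
    $1 < -90 || $1 > 90 || $2 < -180 || $2 > 180 { exit 1 }
  '
}
```

Usage: valid_latlng '42.933692,-72.278141' && echo "looks fine"
Rejecting anything that is not two plain numbers keeps stray braces or extra parameters out of the filter query.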

Hopefully, I can find more time to elaborate on this tutorial in the near future! Cheers.