Wednesday, August 23, 2006

Drupal According to a Newbie

[UPDATE: This piece is almost 4 years old. I plan on doing a large scale update soon relevant for Drupal 6]

I have been tasked with figuring out Drupal, this great but painfully cryptic CMS. What do I think? What have I found? Read on...

Forgive this breathless overview of Drupal. I have been wracking my brains to figure out how Drupal does its thing. At first glance it looks like a regular CMS like PostNuke. Then you scratch beneath the surface and Drupal almost looks like one of those scenes out of a sci-fi movie: the hero picks up a clear crystal and inserts it into a control panel to magically make the spaceship go into hyperspace. What I'm saying is that the deft behind-the-scenes work that Drupal does really amazed me. I've been working up notes for my fellow developers in a Drupal project and I wanted to share now what I have uncovered. There is much more depth to come and much more information to be found elsewhere on this site. If you're like me (a PHP coder who has seen his share of other CMSes and wants to develop new modules, themes and engines), here is what I have found in my wading into Drupal.


Real Basics

index.php loads bootstrap.inc. bootstrap.inc loads a bunch of definitions. The function "conf_path()" searches for config files (see line 82 of bootstrap.inc for more information).


/**
 * Find the appropriate configuration directory.
 *
 * Try finding a matching configuration directory by stripping the website's
 * hostname from left to right and pathname from right to left. The first
 * configuration file found will be used, the remaining will ignored. If no
 * configuration file is found, return a default value '$confdir/default'.
 *
 * Example for a fictitious site installed at
 * http://www.drupal.org:8080/mysite/test/ the 'settings.php' is searched in
 * the following directories:
 *
 *  1. $confdir/8080.www.drupal.org.mysite.test
 *  2. $confdir/www.drupal.org.mysite.test
 *  3. $confdir/drupal.org.mysite.test
 *  4. $confdir/org.mysite.test
 *
 *  5. $confdir/8080.www.drupal.org.mysite
 *  6. $confdir/www.drupal.org.mysite
 *  7. $confdir/drupal.org.mysite
 *  8. $confdir/org.mysite
 *
 *  9. $confdir/8080.www.drupal.org
 * 10. $confdir/www.drupal.org
 * 11. $confdir/drupal.org
 * 12. $confdir/org
 *
 * 13. $confdir/default
 */
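To convince myself of that search order, here is a minimal sketch that only builds the list of candidate directories. The function name sketch_conf_candidates() and the 'sites' value for $confdir are my own assumptions for illustration; the real conf_path() also checks each directory for a settings.php, which this sketch leaves out.

```php
<?php
// Hypothetical helper (not part of Drupal): build the candidate directories
// in the order the comment above describes. It does not touch the filesystem.
function sketch_conf_candidates($http_host, $script_path, $confdir = 'sites') {
  // '/mysite/test/index.php' -> array('', 'mysite', 'test', 'index.php')
  $uri = explode('/', $script_path);
  // 'www.drupal.org:8080' -> array('8080', 'www', 'drupal', 'org')
  $server = explode('.', implode('.', array_reverse(explode(':', rtrim($http_host, '.')))));

  $candidates = array();
  // Outer loop: strip path pieces from right to left.
  for ($i = count($uri) - 1; $i > 0; $i--) {
    // Inner loop: strip host pieces from left to right.
    for ($j = count($server); $j > 0; $j--) {
      $dir = implode('.', array_slice($server, -$j)) . implode('.', array_slice($uri, 0, $i));
      $candidates[] = "$confdir/$dir";
    }
  }
  // Fallback when nothing more specific matches.
  $candidates[] = "$confdir/default";
  return $candidates;
}

$dirs = sketch_conf_candidates('www.drupal.org:8080', '/mysite/test/index.php');
print implode("\n", $dirs) . "\n";
```

Run against the fictitious site from the comment, this prints the same thirteen directories, most specific first.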



A way to make the site sensitive to its sections, and open the door for branding, is to add code ca. line 125: split the URL as normal but also read the top-level directory (e.g. www.site.com/Africa/Aids would yield "Africa"). This could be used to choose the theme for a section. But you don't need to meddle with the core to achieve this. Because of how modules behave, they can listen for arguments in the URL and querystring and alter the presented HTML accordingly.
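As a rough sketch of the "read the top-level directory" idea, the snippet below pulls the first path segment and maps it to a theme name. Both function names, the section-to-theme map and the theme names are made up for illustration; nothing here is Drupal's own API.

```php
<?php
// Hypothetical helper: return the first segment of a path such as
// 'Africa/Aids', or NULL when the path is empty.
function sketch_section_from_path($q) {
  $parts = explode('/', trim($q, '/'));
  return $parts[0] !== '' ? $parts[0] : NULL;
}

// Hypothetical section-to-theme map; the theme names are invented.
function sketch_theme_for_section($section, $default = 'default_theme') {
  $map = array('Africa' => 'africa_theme', 'Asia' => 'asia_theme');
  return ($section !== NULL && isset($map[$section])) ? $map[$section] : $default;
}

$section = sketch_section_from_path('Africa/Aids');
$theme = sketch_theme_for_section($section);
print "$section -> $theme\n";
```

A module could do this kind of lookup on each request and leave the core files alone.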


The [directory]/settings.php gets loaded by the conf_init() function.


Processing Basics
-----------------
load index.php
it loads the supporting bootstrap.inc
that fires "drupal_bootstrap()" which fires:
_drupal_bootstrap("DRUPAL_BOOTSTRAP_DATABASE")
_drupal_bootstrap("DRUPAL_BOOTSTRAP_SESSION")
_drupal_bootstrap("DRUPAL_BOOTSTRAP_PAGE_CACHE")
_drupal_bootstrap("DRUPAL_BOOTSTRAP_PATH")
_drupal_bootstrap("DRUPAL_BOOTSTRAP_FULL")
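Here is a minimal sketch of how I understand the phase stepping: phases run in order, each one only once, and the loop stops as soon as the requested phase has run. The function sketch_bootstrap() and the lower-case phase names are stand-ins of my own, and recording a phase in $ran stands in for the real _drupal_bootstrap() work.

```php
<?php
// Hypothetical stand-in for drupal_bootstrap(): run phases in order up to
// and including the requested one, and never run a phase twice.
function sketch_bootstrap($phase) {
  // Static, so phases already shifted off are gone on later calls.
  static $phases = array('database', 'session', 'page_cache', 'path', 'full');
  $ran = array();
  while (!is_null($current = array_shift($phases))) {
    $ran[] = $current; // stand-in for _drupal_bootstrap($current)
    if ($phase == $current) {
      break;
    }
  }
  return $ran;
}

$first = sketch_bootstrap('page_cache'); // runs database, session, page_cache
$second = sketch_bootstrap('full');      // picks up at path, then full
$third = sketch_bootstrap('full');       // nothing left to run
print implode(',', $first) . ' | ' . implode(',', $second) . ' | ' . implode(',', $third) . "\n";
```

Because the phase list is static, a nested call that asks for an earlier or equal phase finds nothing left to do and returns right away.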


"drupal_bootstrap()" does a lot of back-and-forth calling: functions called by drupal_bootstrap() may themselves call drupal_bootstrap(). When the phase passed into drupal_bootstrap() is equal to the phase it has just run, it exits and returns.

drupal_get_filename() loads the file references from the "system" database table. If a file reference isn't there, it isn't considered.


Among these is the execution of "DRUPAL_BOOTSTRAP_PATH". It loads './includes/path.inc' and then calls "drupal_init_path()" (which lives in path.inc) to work on what was passed in as $_GET['q']. This gets you off to the races: it declares what you do next. It treats the $_GET array like a global and loads it with the content of the path.
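A minimal sketch of that path step, assuming an alias lookup and a front-page fallback. The function names, the $aliases array and the 'node' default are my own placeholders; this version also returns a copy of the array rather than writing into the real $_GET.

```php
<?php
// Hypothetical alias lookup: swap a friendly path for its internal one.
function sketch_normal_path($path, $aliases) {
  return isset($aliases[$path]) ? $aliases[$path] : $path;
}

// Hypothetical stand-in for the path step: tidy up 'q', or fall back to an
// assumed front page when no path was requested.
function sketch_init_path($get, $aliases, $frontpage = 'node') {
  if (!empty($get['q'])) {
    $get['q'] = sketch_normal_path(trim($get['q'], '/'), $aliases);
  }
  else {
    $get['q'] = sketch_normal_path($frontpage, $aliases);
  }
  return $get;
}

$aliases = array('about' => 'node/12');
$a = sketch_init_path(array('q' => 'about/'), $aliases);
$b = sketch_init_path(array(), $aliases);
print $a['q'] . "\n" . $b['q'] . "\n";
```

Everything later in the request can then read 'q' and trust that it holds the path to act on.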


_drupal_bootstrap_full() is in './includes/common.inc'

The function loads these files:

require_once './includes/theme.inc';
require_once './includes/pager.inc';
require_once './includes/menu.inc';
require_once './includes/tablesort.inc';
require_once './includes/file.inc';
require_once './includes/unicode.inc';
require_once './includes/image.inc';
require_once './includes/form.inc';



Something added after "form.inc" could supply new functions, but it could not take precedence over a pre-existing one: there is no programmatic way in PHP to wipe out an already user-defined function (declaring the same name a second time is a fatal error), so any function to be replaced would have to be replaced before it was loaded here.


After all of the preparation comes the "DRUPAL_BOOTSTRAP_FULL" call, again. A big part of this is the "module_init" function from module.inc, ca. line 14:


function module_init() {
  // Load all the modules that have been enabled in the system table.
  foreach (module_list(TRUE, FALSE) as $module) {
    drupal_load('module', $module);
  }
  module_invoke_all('init');
}


The foreach call to the module_list function loads a list of modules to execute.
Then drupal_load fires to include the module named by the $module argument (line 477 bootstrap.inc).


  • It looks at the type and the name
  • If it's already been loaded, it doesn't try again
  • Then it gets the file name to load and confirms its existence
  • Then it includes the module. Anything in the module that is "auto fire" will be executed
  • Anything that works as a function becomes available for usage.
  • The reference is stored in $files[$type][$name].

function drupal_load($type, $name) {
  // Remember what has already been included during this request.
  static $files = array();

  if (isset($files[$type][$name])) {
    return TRUE;
  }

  $filename = drupal_get_filename($type, $name);

  if ($filename) {
    include_once "./$filename";
    $files[$type][$name] = TRUE;

    return TRUE;
  }

  return FALSE;
}



module_invoke() and module_invoke_all() are important. If these were in a movie script, they would read something like "Insert Battle of Gettysburg here":


  • it uses the PHP function "func_get_args()" to get the arguments provided through the call. Arrays, objects, variables-- you name it. They all pipe through this call and become available.
  • the first value in $args is the module name. It is shifted out of the array and transferred into the $module variable.
  • the second value is the hook-- the function in a module that is called.
    for example:
    $values = serialize($_GET);
    module_invoke('sassy','add',$values);

goes into module_invoke and becomes:
a call to 'sassy_add($values)'

sassy_add can then unpack $values through unserialize and have everything that came through $_GET.
call_user_func_array is a PHP function that turns the function name and the arguments into a single function call.


function module_invoke() {
  $args = func_get_args();
  $module = array_shift($args);
  $hook = array_shift($args);
  $function = $module .'_'. $hook;
  if (module_hook($module, $hook)) {
    return call_user_func_array($function, $args);
  }
}
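To watch the 'sassy' example actually run outside of Drupal, here is a self-contained sketch. The sketch_ functions are my own stand-ins (sketch_module_hook() simply assumes the check boils down to function_exists()), and 'sassy' is the made-up module from the example above.

```php
<?php
// Stand-in for module_hook(): does the function [module]_[hook] exist?
// That this is a plain function_exists() check is an assumption.
function sketch_module_hook($module, $hook) {
  return function_exists($module . '_' . $hook);
}

// Same shape as module_invoke() above, using the stand-in check.
function sketch_module_invoke() {
  $args = func_get_args();
  $module = array_shift($args); // first argument: module name
  $hook = array_shift($args);   // second argument: hook name
  $function = $module . '_' . $hook;
  if (sketch_module_hook($module, $hook)) {
    return call_user_func_array($function, $args);
  }
}

// The made-up 'sassy' module implements an 'add' hook.
function sassy_add($values) {
  $get = unserialize($values);
  return 'added ' . $get['q'];
}

$values = serialize(array('q' => 'node/1'));
$found = sketch_module_invoke('sassy', 'add', $values);      // calls sassy_add()
$missing = sketch_module_invoke('sassy', 'remove', $values); // no such function
var_dump($found, $missing);
```

The second call quietly returns NULL because there is no sassy_remove() to call.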



module_invoke_all() is similar but more complex. It will execute a stream of functions in succession before module_invoke_all is satisfied and exited.


  • it loads the values into $args.
  • it shifts the hook off the front of the arguments. What is left in $args is passed along to every module's hook function.

  • it spins through a foreach over module_implements($hook), which supplies each $module value in turn
  • module_implements needs $hook but has an optional variable of $sort
  • the static array check, isset($implementations[$hook]), asks whether or not a particular hook has already been looked up so that the lookup isn't repeatedly fired
  • module_list loads $list. The list starts out empty on the first call, then becomes populated.
  • If the $list is empty, it will execute one of two queries

if you need to bootstrap-- the first call-- then this is used


$result = db_query("SELECT name, filename, throttle, bootstrap FROM {system} WHERE type = 'module' AND status = 1 AND bootstrap = 1 ORDER BY weight ASC, filename ASC");
otherwise, you get this list


$result = db_query("SELECT name, filename, throttle, bootstrap FROM {system} WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC");


  • the module names get populated into $list to use outside of this function.
  • if the sort value is true, it will return the sorted list; otherwise, it will return the regular list
  • the list returns to module_implements, where it is used to call the reliant functions.
  • each module's hook function is called with $args. The result is handled one of two ways:
  • if $result is set and is_array-- merge the $result into $return;
  • else if $result exists (but isn't an array), it's added as another cell of the $return array.

  • when all of the module calls have occurred, $return will be returned.

function module_invoke_all() {
  $args = func_get_args();
  $hook = array_shift($args);
  $return = array();
  foreach (module_implements($hook) as $module) {
    $function = $module .'_'. $hook;
    $result = call_user_func_array($function, $args);
    if (isset($result) && is_array($result)) {
      // Array results are merged into the combined return value.
      $return = array_merge($return, $result);
    }
    else if (isset($result)) {
      // Scalar results are appended as a new cell.
      $return[] = $result;
    }
  }

  return $return;
}
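The merging behaviour is easier to see with a few toy modules. This is a sketch under my own assumptions: the module list is passed in as an array instead of coming from the system table, and the alpha, beta, gamma and delta "modules" are invented.

```php
<?php
// Stand-in for module_implements(): keep only the modules that define the hook.
function sketch_module_implements($hook, $modules) {
  $found = array();
  foreach ($modules as $module) {
    if (function_exists($module . '_' . $hook)) {
      $found[] = $module;
    }
  }
  return $found;
}

// Same merging logic as module_invoke_all() above, with an explicit module list.
function sketch_module_invoke_all($modules, $hook) {
  $args = array_slice(func_get_args(), 2); // anything after the hook name
  $return = array();
  foreach (sketch_module_implements($hook, $modules) as $module) {
    $function = $module . '_' . $hook;
    $result = call_user_func_array($function, $args);
    if (isset($result) && is_array($result)) {
      $return = array_merge($return, $result);
    }
    else if (isset($result)) {
      $return[] = $result;
    }
  }
  return $return;
}

// Invented modules: one returns an array, one a string, one nothing at all.
function alpha_info() { return array('a1', 'a2'); }
function beta_info() { return 'b'; }
function gamma_info() { }
// 'delta' does not implement the hook.

$all = sketch_module_invoke_all(array('alpha', 'beta', 'gamma', 'delta'), 'info');
print implode(',', $all) . "\n";
```

The array result is merged in, the string is appended, and the module that returns nothing and the module without the hook both drop out.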



Drupal's bootstrapping is largely completed through the execution of the "module_invoke_all('init');" call. Here "init" is the hook, the first argument pulled off by module_invoke_all(); there is no module argument, so every enabled module that defines a [name]_init function will have it fired on load. If the function "[name]_init()" is not present in [name].module, nothing happens for that module upon page load: module_implements() only lists modules for which module_hook() finds the function, so a missing hook is skipped quietly rather than raising an error. A direct call to a missing function would be a critical error and script failure; because the hook functions are looked up by name and called through call_user_func_array, optional elements can be missing and the Drupal code can continue to function. This is something I described a while back as "functionality slipping beneath waves." Most of the modules lack an "init" hook, the exceptions being the organizational modules like "category" and "views."



All of this recursion creates a large sum of information that is held on the index.php page in the array called $return. That is passed to theme() for processing, organization and presentation.


In a similar way "module_invoke_all('exit');" called by the function "drupal_page_footer()" will close off all modules that are "open" and have a [name]_exit function available.



Form Processing and Validation



The "_form_validate()" function in form.inc is loaded with all of the form elements. A "form_id" is optional (I feel this can be used to add in subforms, but that has to be researched more); for the most part it's used to identify which module the data is destined for. A field named "op" can be added to command how the form submission behaves.


The form validator will go through the elements that have been passed as arguments. Each form type has the opportunity for validation rules. It is possible to programmatically set additional validation rules, or even unset them by unsetting the parent form.


These are some of the qualities contained by a form element-- some come in from the type, some are set for the form invocation, some are the values set by the users. There may be an exploit in all this if a user can add these elements to a form and submit the form.


#validated
#needs_validation
#required
#title
#options -- for arrays
#DANGEROUS_SKIP_CHECK
#type
#default_value
#maxlength
#description
#value - user supplied
#error - the error message populated if there was an error (because of validation)


Forms hold their data inside of edit[name] form names.
When submitted, these elements go into a single array called "$edit". This is a holdover from the way form input used to be introduced: rather than coming in from $_GET or $_POST, the material would come straight into the script.


Here's an example of what you call to populate a form element:


$form['title'] = array('#type' => 'textfield',
  '#title' => t('Title'),
  '#default_value' => $edit['title'],
  '#maxlength' => 64,
  '#description' => t('The name of the feed; typically the name of the web site you syndicate content from.'),
  '#required' => TRUE,
);
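To get a feel for how properties like #required and #maxlength could drive validation, here is a deliberately tiny sketch. sketch_validate_elements() is a hypothetical function of mine that checks only those two properties; it is not how form.inc is actually written.

```php
<?php
// Hypothetical validator: walk the elements and check only #required and
// #maxlength against the submitted $edit values.
function sketch_validate_elements($form, $edit) {
  $errors = array();
  foreach ($form as $name => $element) {
    $value = isset($edit[$name]) ? $edit[$name] : '';
    $title = isset($element['#title']) ? $element['#title'] : $name;
    if (!empty($element['#required']) && $value === '') {
      $errors[$name] = "$title field is required.";
    }
    elseif (isset($element['#maxlength']) && strlen($value) > $element['#maxlength']) {
      $errors[$name] = "$title is too long.";
    }
  }
  return $errors;
}

$form = array(
  'title' => array(
    '#type' => 'textfield',
    '#title' => 'Title',
    '#maxlength' => 64,
    '#required' => TRUE,
  ),
);

$empty = sketch_validate_elements($form, array('title' => ''));
$long = sketch_validate_elements($form, array('title' => str_repeat('x', 65)));
$ok = sketch_validate_elements($form, array('title' => 'My feed'));
print_r($empty);
```

The error text plays the role of the #error property in the list above.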



When you list the $form elements in a form build statement (e.g. aggregator's function aggregator_form_feed($edit = array()) ), the order in which they are entered is the order in which they appear on the HTML form by default.


When you make a request for a URL, whether it be the $_GET['q'] or the real path, it will get hooked by the [module]_menu.


In the case of the aggregator, it gives a value to a function callback, "aggregator_form_feed", so that when you come to site.com?q=admin/aggregator/add/feed it reads "admin/aggregator/add/feed" as the path and chooses the appropriate case statement.
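Stripped right down, the path-to-callback idea looks something like the sketch below. Everything in it is hypothetical: the sketch_ function names, the flat path => callback array, and the "walk up to the nearest registered parent" rule are my simplifications, not the real menu system.

```php
<?php
// Hypothetical menu table: path => callback function name.
function sketch_menu() {
  return array(
    'admin/aggregator/add/feed' => 'sketch_form_feed',
    'admin/aggregator' => 'sketch_admin_overview',
  );
}

function sketch_form_feed() { return 'feed form'; }
function sketch_admin_overview() { return 'overview'; }

// Find the callback for a path, walking up one segment at a time until a
// registered entry is found.
function sketch_dispatch($q) {
  $menu = sketch_menu();
  $path = trim($q, '/');
  while ($path !== '') {
    if (isset($menu[$path])) {
      return call_user_func($menu[$path]);
    }
    $pos = strrpos($path, '/');
    $path = ($pos === FALSE) ? '' : substr($path, 0, $pos);
  }
  return NULL; // nothing registered for this path
}

print sketch_dispatch('admin/aggregator/add/feed') . "\n";
print sketch_dispatch('admin/aggregator/list') . "\n";
```

The first request hits the exact entry; the second falls back to the nearest parent that has a callback.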



How Themes Work


At the top of the stack: theme('page',$return) loads the page.



theme() uses theme_get_function to get the function to call to load the page.


If a theme hasn't been set, init_theme() is called to find and load the theme references.

There are three potential hooks for theme-based functions. You pass in the function name and it looks in this order:

  1. [theme]_[function]

  2. [theme engine]_[function]

  3. theme_[function]



If you wanted something that overrides "theme_date", you could put it in the theme to be theme specific, or the engine to be engine specific. The actual function doesn't need to be moved into form.inc or theme.inc or similar; it can reside in your own files as long as it does not share a function name with a pre-established function. For example, "theme_date" is taken, but if you named it AI_date it would take precedence and be used almost every time a date function relating to theme was used while the "AI" theme was in play. Every time a theme function is called, it is called via theme_get_function() to instill this order of precedence. When you want to call a function, you *could* call:

call_user_func_array(theme_get_function([function hook name]),$args);

but it's good form to call:

theme($function_name, $other_arguments)
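The order of precedence can be sketched in a few lines. Here the theme and engine names are passed in as plain arguments, which is my own simplification, and the sketch_ functions, theme_box() and the sample dates are invented for the demonstration; AI_date is the example override from above.

```php
<?php
// Stand-in for theme_get_function(): try the theme's own function first,
// then the theme engine's, then the theme_ default.
function sketch_theme_get_function($function, $theme, $theme_engine) {
  foreach (array($theme, $theme_engine, 'theme') as $prefix) {
    if ($prefix != '' && function_exists($prefix . '_' . $function)) {
      return $prefix . '_' . $function;
    }
  }
  return FALSE;
}

// Stand-in for theme(): look the function up, then call it with the rest
// of the arguments.
function sketch_theme($function, $theme, $theme_engine) {
  $args = array_slice(func_get_args(), 3);
  $callback = sketch_theme_get_function($function, $theme, $theme_engine);
  return $callback ? call_user_func_array($callback, $args) : NULL;
}

// Default implementations, plus one override supplied by the 'AI' theme.
function theme_date($d) { return "core:$d"; }
function theme_box($t) { return "box:$t"; }
function AI_date($d) { return "AI:$d"; }

print sketch_theme('date', 'AI', 'phptemplate', '2006-08-23') . "\n";
print sketch_theme('box', 'AI', 'phptemplate', 'hello') . "\n";
print sketch_theme('date', 'other', 'phptemplate', '2006-08-23') . "\n";
```

With the "AI" theme in play the date call lands on AI_date; the box call and the date call under a different theme fall through to the theme_ defaults.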

If I were to invent our own phpBB based theming engine, I would have to look at the hooks in an established templating engine and match those in our own engine.

Update: I was happy that I posted this here on my blog. When I posted this on the Drupal.org site, this piece got tossed without explanation. Go figure. It's myopic of the Drupal community to not know that this is a topic that daunts A LOT of people. If it stymies someone, they will move on to another CMS application.

Wednesday, August 16, 2006

I Cannot Put "Louis Vuitton" into the Text of my Blog

I got this today from "ebay.ca". It strikes me as odd. Warning me off from some keywords? When I quizzed ebay.ca, they said, "I believe that these two companies believe its an infringement of their trademark." Idiots. Don't trademark McDonalds. What are you? McDonalds? Wendy's? One of the Safeway family?

It doesn't really matter much. I thought CJ-- Commission Junction-- was kind of a dud. Only one person I know of has made cash off them by a carefully executed/semi-legal scam.

What do you think?

Dear Mike DeWolfe,

eBay would like to ask, if you are involved in any pay-per-click advertising, that you refrain from purchasing and/or using any of the following keywords or derivations thereof immediately:

* Louis Vuitton (Vuitton) or Christian Dior (Dior)

This also means that you may not purchase any derivatives or misspellings of these keywords. (For example, you may not purchase Louis Vuitton, LouisVuitton, Vuitton, Louuis Vuitton, Louis Viuitton, LOUS VUITTON, Louis Vuiton, LOUIS VUTTON, Louis Vitton, LOUIIS VUITTON, Loui Vuitton, LOUIS VUTTON, LouuisVuitton, LouisViuitton, LOUSVUITTON, LouisVuiton, LOUISVUTTON, LouisVitton, LOUIISVUITTON, LouiVuitton, LOUISVUTTON, Christian Dior, CHRISTIAN DIOR, Chrstian Dior, CHRESTIAN DIOR, Chrstian Dior, chistian dior, Chistian Dior, CHRISTION DIOR, CHISTIAN DIOR, Chistian Dior, Christian Diior, Chritian Dior, CHRITIAN DIOR, ChristianDior, CHRISTIANDIOR, ChrstianDior, CHRESTIANDIOR, ChrstianDior, chistiandior, ChistianDior, CHRISTIONDIOR, CHISTIANDIOR, ChistianDior, ChristianDiior, ChritianDior, CHRITIANDIOR, Dior, DIOR, DOIR, Doir, Diior, DIIOR, among other spellings or combinations). Please note that this list is not exhaustive and is provided to assist you in determining the types of keywords that cannot be purchased.

Please add each of the above types of terms to the Negative Match Suppression List within your Google AdWords account immediately. They must also be added to the negative match suppression list for every other paid search advertising campaign you may be running in connection with the eBay Affiliate Program.

There are also changes you must make with respect to your ad copy. It is important that no ad copy include any of the keywords above in the:

a) Title of the advertisement, b) display URL, or c) body of the advertisement.

Publishers who disregard this request will be in violation of the eBay Supplemental Terms & Conditions as well as the Publisher Service Agreement. Violations of these agreements may result in immediate termination from the eBay program and/or permanent deactivation from the

Thank you for your prompt attention to this matter.

Sincerely,

The eBay.ca Affiliate Team


---------------------------------------
This message was sent by an advertiser in the Commission Junction network
based on the mail settings selected in your account. Commission Junction
does not send messages to individuals outside of its network and guards
the privacy of all information received. To unsubscribe from receiving......
Commission Junction Network.

Sunday, August 13, 2006

Does Adsense Make No Sense?

Adsense makes a big deal about disallowing the posting of details: how many hits produce how many clickthroughs and how much they make. They don't allow you to keep out some advertisers (e.g. talk about birth control and a scroll of viagra ads could run down your right nav bar). They pick on little Adsense users who generate less than 1000 hits per day-- those who create a lot of work per click-- while big time scam artists who create massive click circles get the red carpet treatment.

Then a piece from ProductWiki came to my attention:

We are in an interesting position at ProductWiki as we generate our revenue from two different sources of pay-per-click (PPC) advertising: Google AdSense and Shopping.com Merchant Listings. These ads show up on all of our product pages (never at the same time) and each type of ad gets approximately a 50% share of our page views across a broad spectrum of products.

I've compiled data contrasting the performance of Shopping.com and Google AdSense on ProductWiki taken from a one week period at the end of last month.


                             Shopping.com   AdSense
Clickthrough rate (%)        29%            6.5%
Revenue per click ($/click)  $0.21          $0.19
eCPM ($/1000 impressions)    $59            $13

Taking a look at the most significant of these figures (eCPM), Shopping.com outperforms AdSense by a factor of 4.6!
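As a quick arithmetic check on that table (my own aside, not part of the ProductWiki piece): eCPM is revenue per thousand impressions, so it should come out close to clickthrough rate times revenue per click times 1000. The function name sketch_ecpm() is made up, and the inputs are the rounded figures from the table, so the results only roughly match the reported $59 and $13.

```php
<?php
// eCPM = (clicks per impression) x (revenue per click) x 1000 impressions.
function sketch_ecpm($ctr_percent, $revenue_per_click) {
  return ($ctr_percent / 100) * $revenue_per_click * 1000;
}

$shopping = sketch_ecpm(29, 0.21); // about 60.9 from the rounded inputs
$adsense = sketch_ecpm(6.5, 0.19); // about 12.35 from the rounded inputs
printf("%.2f %.2f\n", $shopping, $adsense);
```

Both land within a couple of dollars of the published eCPM figures, which is about what you would expect from rounded inputs.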

Here’s a brief explanation of how the ads work on ProductWiki.

Google AdSense

AdSense is very simple to implement. When a product page is requested by a user, simultaneously a request is made to Google Ad Services to deliver a banner ad containing two links from their ad database. The database is populated by advertisers. Google has already analyzed the product page and attempts to deliver the most relevant ads. We have neither control on how the ads are formatted (with the exception of some minor font and colour tweaking), nor which ads are chosen.






You would expect to see very relevant results since the theme of our site is consumer products, but it’s often not the case. In the above example, we see an ad for “creative products” and for a Zen Vision case, not exactly what we were going for.

Shopping.com Merchant Listings

Wherever possible we display links to online merchants who sell the product in question. This is handled by our software interacting with a Shopping.com XML Web Service that pulls a specific set of ads for each of our products. Unlike AdSense, we are able to control both what ads are displayed and how they look.





Toshiba Gigabeat MEG-F40 with Shopping.com


Why do Shopping.com Ads do so much better?

1. Relevancy, relevancy, relevancy
By far the most significant reason for such a huge gap in performance is relevancy. In our previous AdSense example, the user is presented with two vaguely relevant ads that have only little to do with the Creative Zen Vision:M. Whereas, in our Shopping.com example with the Toshiba Gigabeat, you see links to three reputable merchants who sell the exact mp3 player you’re looking at.

2. Style
Our integration with Shopping.com allows us to style the ads in almost any way we choose, and thus we are able to maintain a consistent theme throughout the page. As far as AdSense is concerned, since we have very little say in how they are displayed, we struggle in creating a consistent theme.

3. The “ignore” factor
AdSense is omnipresent on the Web and, like a boring TV ad, people are getting better at tuning them out. (Take a look at this post by Seth Godin). That portion of our page might as well be invisible.

---

Google does have something that Shopping.com does not – breadth. Their database of ads is much more extensive than anything Shopping.com has to offer beyond products; they have ads for blogs, publishers, services, etc.

So how do you get the best of both worlds: the relevancy and style of Shopping.com ads coupled with the breadth of Google AdSense? Simple. Google needs to allow publishers to control what ads are displayed and how they are styled. Another possible improvement is to allow advertisers to classify their ads across broad categories (product, service, blog, etc.) and then publishers could exclusively select ads from those categories.

In the meantime, don’t merely be satisfied with Google Adsense without exploring other alternatives. While our situation is specific to consumer products, I believe there is likely something out there that will be more prosperous for the theme of your website.

Update: Eric Giguere has written an excellent piece on his blog providing detailed suggestions on what we can do to improve our AdSense performance. We plan to implement these changes and examine how much of an effect they have on revenue.
Google is the 400 lb. gorilla. Everyone wants to play in the biggest ballpark and knock one out of the park. Take a lesson from Google and hop into the time machine. Whiz back to the year 1997. Yahoo, Altavista, Webcrawler and Northern Light were the big players in the big game on the web: search engines. Then a plucky little player started showing up in my web logs: "googlebot". I checked out the related website. Google. I thought it was a test site for someone's project: there were no ads, no clutter.
Google went into Yahoo's ballpark and won the World Series. Google Adsense is a great idea. But I think it's the equivalent of Yahoo. If a company can address these concerns, it will do to Google what Google did to Yahoo:
  • Allow the blockage of advertisers. Open the list and allow the blockage of segments of the advertising community. This way abortion ads don't show up next to a pro-life article.
  • Reveal ad rates: show warts and all. Adsense's mandate of secrecy is hiding what everyone knows: 1% of the people are making 99% of the money.
  • Put ads on videos. This is being done now in a limited way. Youtube could throw ads onto their Flash delivered in a heartbeat. Software can detect scene breaks: at the scene break of a large piece, throw in a contextual ad.
  • Send ads as videos. There are plenty of Flash ads on the web. Whoever gets into the game shouldn't be afraid of using this medium. No? Check out the ads on TVGuide.com. They knocked me on my ass.
  • Allow site sponsors: Adbrite does this and it's a good idea. Many hits or few, there should be a way for someone to sponsor your site. Maybe they don't even need to run ads-- just throw you some cash.
  • Toss bad actors but leave those with bad attention to detail: I have to wonder how much more likely it is that Google Adsense clips an account when it's at $99 vs. $1.
Starting an Adsense competitor isn't hard. Here's what you need:
  • A webserver and a domain. Any memorable name would do-- even if you started with a husk like a dud domain. The webserver would have to handle 1GB/mo. traffic in its first six months; 10GB/mo. in its second six months; 15GB/mo. + 20% of the prior month's traffic thereafter.
  • A way to create distributable ads via a script. The application has to be small, smart and robust. In other words: no 500 Errors.
  • Fraud detection. Do a search for "Liposuction" on Google and check out what the top three hits look like. They're awash with Google Adsense ads. Tell me that isn't fraud. Tell me Google isn't complicit in all of this.
  • A way to read pages for their true context. With AJAX you can add your message; or remove contextually beneficial material via the client. Google Adsense cannot sense that so there is a huge way to hoodwink Adsense. The Adsense replacement has to be smarter than that.
  • A way to promote the new creation. A good idea with bad promotion doesn't get as far as a bad idea with good promotion (Vonage, anyone?)
  • A good accounting system. You need to know when to pay people, how to pay them and when fraud is afoot. I have three Adbrite cheques sitting on my bulletin board: $0.07, $0.45, $0.33. Adsense holds back until you hit $100. How about for amounts less than $10, you have to pay $5 handling to get a cheque, but above that, you get more cheques, even if they're smaller.
The person who comes up with this will maybe spend $10,000 on the application development; $10,000 in the first year to run the server; and an unguessed amount on promotion. Done right, this could rob a 400 lb. gorilla of its bananas.