I have been tasked with figuring out Drupal, this great but painfully cryptic CMS. What do I think? What have I found? Read on...
Forgive this breathless overview of Drupal. I have been racking my brains to figure out how Drupal does its thing. At first glance it looks like a regular CMS like PostNuke. Then you scratch beneath the surface and Drupal almost looks like it comes out of one of those scenes in a sci-fi movie: the hero picks up a clear crystal and inserts it into a control panel to magically make the spaceship go into hyperspace. What I'm saying is that the deft behind-the-scenes work that Drupal does really amazed me. I've been working up notes for my fellow developers in a Drupal project and I wanted to share now what I have uncovered. There is much more depth to come and much more information to be found elsewhere on this site. If you're like me (a PHP coder who has seen his share of other CMSes and wants to develop new modules, themes and engines), here is what I have found in my wading into Drupal.
Real Basics
index.php loads bootstrap.inc, and bootstrap.inc loads a bunch of definitions. The function "conf_path()" searches for config files (see line 82 of bootstrap.inc for more information):
/**
 * Find the appropriate configuration directory.
 *
 * Try finding a matching configuration directory by stripping the website's
 * hostname from left to right and pathname from right to left. The first
 * configuration file found will be used, the remaining will ignored. If no
 * configuration file is found, return a default value '$confdir/default'.
 *
 * Example for a fictitious site installed at
 * http://www.drupal.org:8080/mysite/test/ the 'settings.php' is searched in
 * the following directories:
 *
 * 1. $confdir/8080.www.drupal.org.mysite.test
 * 2. $confdir/www.drupal.org.mysite.test
 * 3. $confdir/drupal.org.mysite.test
 * 4. $confdir/org.mysite.test
 *
 * 5. $confdir/8080.www.drupal.org.mysite
 * 6. $confdir/www.drupal.org.mysite
 * 7. $confdir/drupal.org.mysite
 * 8. $confdir/org.mysite
 *
 * 9. $confdir/8080.www.drupal.org
 * 10. $confdir/www.drupal.org
 * 11. $confdir/drupal.org
 * 12. $confdir/org
 *
 * 13. $confdir/default
 */
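To convince myself of the search order in that comment, I wrote a minimal sketch that only builds the list of candidate directory names. It is my own reconstruction under assumptions, not the code from bootstrap.inc: the function name candidate_conf_dirs() is made up, it takes the host and script path as parameters instead of reading them from the server, and it never checks whether a settings.php actually exists.

```php
<?php
// Hypothetical helper: returns candidate config directory names in the
// order described by the conf_path() comment. It does not touch the disk.
function candidate_conf_dirs($http_host, $script_path) {
  // '/mysite/test/index.php' becomes array('', 'mysite', 'test', 'index.php').
  $uri = explode('/', $script_path);
  // 'www.drupal.org:8080' becomes array('8080', 'www', 'drupal', 'org').
  $server = explode('.', implode('.', array_reverse(explode(':', rtrim($http_host, '.')))));

  $candidates = array();
  // Outer loop: strip the pathname from right to left.
  for ($i = count($uri) - 1; $i > 0; $i--) {
    // Inner loop: strip the hostname from left to right.
    for ($j = count($server); $j > 0; $j--) {
      $candidates[] = implode('.', array_slice($server, -$j))
        . implode('.', array_slice($uri, 0, $i));
    }
  }
  // Fallback when nothing more specific matches.
  $candidates[] = 'default';
  return $candidates;
}

$dirs = candidate_conf_dirs('www.drupal.org:8080', '/mysite/test/index.php');
```

Each entry in $dirs would then be tried as $confdir/[entry]/settings.php, stopping at the first one found.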
A way to make the site sensitive to which section is being viewed, and to open the door for branding, is to add code ca. line 125: split the URL as normal but also read the top-level directory (e.g. www.site.com/Africa/Aids would yield "Africa"). This could be used to factor in the theme used for a section. But you don't need to meddle with the core to achieve this. Because of how modules behave, they can listen for arguments in the URL and querystring and alter the presented HTML accordingly.
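Here is a rough sketch of the "read the top level directory" idea, done outside the core. Everything in it is hypothetical: the two function names, the section-to-theme map and the theme names are placeholders of mine, and how a module would actually switch the theme is not shown.

```php
<?php
// Hypothetical helper: first segment of a Drupal-style path, '' if there is none.
function sketch_section_from_path($q) {
  $parts = explode('/', trim($q, '/'));
  return $parts[0];
}

// Hypothetical helper: choose a theme name for a section, with a fallback.
function sketch_theme_for_section($section, array $map, $default) {
  return isset($map[$section]) ? $map[$section] : $default;
}

$section = sketch_section_from_path('Africa/Aids');
$theme = sketch_theme_for_section($section, array('Africa' => 'africa_theme'), 'default_theme');
```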
The [directory]/settings.php gets loaded by the conf_init() function.
Processing Basics
-----------------
load index.php
it loads the supporting bootstrap.inc
that fires "drupal_bootstrap()" which fires:
_drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE)
_drupal_bootstrap(DRUPAL_BOOTSTRAP_SESSION)
_drupal_bootstrap(DRUPAL_BOOTSTRAP_PAGE_CACHE)
_drupal_bootstrap(DRUPAL_BOOTSTRAP_PATH)
_drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL)
"drupal_bootstrap()" does a lot of recursive calling between the functions it calls and the functions that call for drupal_bootstrap(). When the phase passed into drupal_bootstrap() is equal to the phase that has just run, it exits and returns.
drupal_get_filename() loads the file references from the "system" database table. If a file reference isn't there, it isn't considered.
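The way drupal_bootstrap() walks through its phases and stops at the requested one can be pictured with a small stand-alone sketch. This is my simplification, not the core code: the phase names are plain strings standing in for the DRUPAL_BOOTSTRAP_* constants, sketch_bootstrap() is a made-up name, and writing to $log stands in for the real work done per phase.

```php
<?php
// Hypothetical stand-in for drupal_bootstrap(): runs phases in order and
// stops once the requested phase has run.
function sketch_bootstrap($phase, array &$log) {
  // Static, so phases that already ran are not run again by a later call.
  static $remaining = array('database', 'session', 'page_cache', 'path', 'full');

  while (($current = array_shift($remaining)) !== NULL) {
    $log[] = $current; // stand-in for _drupal_bootstrap($current)
    if ($current === $phase) {
      return;
    }
  }
}

$log = array();
sketch_bootstrap('path', $log); // runs database, session, page_cache, path
sketch_bootstrap('full', $log); // runs only what is left: full
```

Keeping the list static is what lets a later call pick up where an earlier one stopped, instead of starting over.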
Among these is the execution of "DRUPAL_BOOTSTRAP_PATH." It loads './includes/path.inc' and works from what is passed in as $_GET['q']. Then it does a "drupal_init_path()" to get you off to the races, that is, to declare what you do next. It treats $_GET as a global array and loads it with the content of the path.
drupal_init_path() is in path.inc.
_drupal_bootstrap_full() is in './includes/common.inc'
The function loads these:
require_once './includes/theme.inc';
require_once './includes/pager.inc';
require_once './includes/menu.inc';
require_once './includes/tablesort.inc';
require_once './includes/file.inc';
require_once './includes/unicode.inc';
require_once './includes/image.inc';
require_once './includes/form.inc';
Something added after "form.inc" could take precedence over a pre-existing function. There is no programmatic way to wipe out an already user-defined function, so the functions would have to be replaced before they were called here.
After all of the preparation comes the "DRUPAL_BOOTSTRAP_FULL" call, again. A big part of this is the "module_init" function from module.inc, ca. line 14:
function module_init() {
  // Load all the modules that have been enabled in the system table.
  foreach (module_list(TRUE, FALSE) as $module) {
    drupal_load('module', $module);
  }
  module_invoke_all('init');
}
The foreach call to the module_list function loads a list of modules to execute.
Then drupal_load fires to load the module named by the $module argument (line 477 bootstrap.inc).
- It looks at the type and the name
- If it's already been loaded, it doesn't try again
- Then it gets the file name to load and confirms its existence
- Then it includes the module. Anything in the module that is "auto fire" will be executed
- Anything that works as a function becomes available for usage.
- The reference is stored in $files[$type][$name].
function drupal_load($type, $name) {
  static $files = array();
  if (isset($files[$type][$name])) {
    return TRUE;
  }
  $filename = drupal_get_filename($type, $name);
  if ($filename) {
    include_once "./$filename";
    $files[$type][$name] = TRUE;
    return TRUE;
  }
  return FALSE;
}
module_invoke() and module_invoke_all() are important. If these were in a movie script, they would read something like "Insert Battle of Gettysburg here":
- it uses the PHP function "func_get_args()" to get the arguments provided through the call. Arrays, objects, variables, you name it: they all pipe through this call and become available.
- the first value in $args is the module name. It is shifted out of the array and transferred into the $module variable.
- the second value is the hook, the function in a module that is called.
for example:
$values = serialize($_GET);
module_invoke('sassy', 'add', $values);
goes into module_invoke and becomes:
a call to 'sassy_add($values)'
sassy_add can then unpack $values through unserialize and have everything that came through $_GET.
call_user_func_array is a PHP function that turns the function name and the arguments into a single function call.
function module_invoke() {
  $args = func_get_args();
  $module = array_shift($args);
  $hook = array_shift($args);
  $function = $module .'_'. $hook;
  if (module_hook($module, $hook)) {
    return call_user_func_array($function, $args);
  }
}
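To see the 'sassy' example run end to end outside of Drupal, here is a self-contained sketch. The sketch_ functions are my stand-ins: sketch_module_hook() assumes that checking for a hook amounts to a function_exists() test, and the "sassy" module with its "add" hook is purely hypothetical.

```php
<?php
// Stand-in for module_hook(): assumed to be a simple existence check.
function sketch_module_hook($module, $hook) {
  return function_exists($module . '_' . $hook);
}

// Stand-in for module_invoke(): same shape as the listing above.
function sketch_module_invoke() {
  $args = func_get_args();
  $module = array_shift($args); // first argument: module name
  $hook = array_shift($args);   // second argument: hook name
  $function = $module . '_' . $hook;
  if (sketch_module_hook($module, $hook)) {
    return call_user_func_array($function, $args);
  }
}

// Hypothetical module "sassy" implementing a hypothetical "add" hook.
function sassy_add($values) {
  return unserialize($values);
}

$result = sketch_module_invoke('sassy', 'add', serialize(array('q' => 'node/1')));
$missing = sketch_module_invoke('sassy', 'remove'); // no sassy_remove() exists
```

The second call shows the quiet failure mode: with no sassy_remove() defined, nothing is called and nothing comes back.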
module_invoke_all() is similar but more complex. It will execute a stream of functions in succession before module_invoke_all is satisfied and exits.
- it loads the values into $args.
- it shifts the hook off the front of the arguments. The hook is just a name (such as 'init'); whatever remains in $args is passed along to each module.
- it spins through a foreach over the list that module_implements($hook) returns; each entry supplies the $module value
- module_implements needs $hook but has an optional variable of $sort
- the static array check, isset($implementations[$hook]), asks whether or not the list for a particular hook has already been built so that it isn't repeatedly rebuilt
- module_list loads $list. This is held over between calls: it starts out empty on the first call, then becomes populated.
- If the $list is empty, it will execute one of two queries:
  if you need to bootstrap (the first call) then this is used
  $result = db_query("SELECT name, filename, throttle, bootstrap FROM {system} WHERE type = 'module' AND status = 1 AND bootstrap = 1 ORDER BY weight ASC, filename ASC");
  otherwise, you get this list
  $result = db_query("SELECT name, filename, throttle, bootstrap FROM {system} WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC");
- the module names get populated into $list to use outside of this function.
- if the sort value is true, it will return the sorted list; otherwise, it will return the regular list
- the list returns to module_implements, which uses it to find the modules whose hook functions will be called.
- the hook function of each module is called with $args. The result is handled in one of two ways:
  - if $result is set and is_array, merge the $result into $return;
  - else if $result exists (but isn't an array), it's added as another cell of the $return array.
- when all of the module calls have occurred, $return will be returned.
function module_invoke_all() {
  $args = func_get_args();
  $hook = array_shift($args);
  $return = array();
  foreach (module_implements($hook) as $module) {
    $function = $module .'_'. $hook;
    $result = call_user_func_array($function, $args);
    if (isset($result) && is_array($result)) {
      $return = array_merge($return, $result);
    }
    else if (isset($result)) {
      $return[] = $result;
    }
  }
  return $return;
}
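The merging behaviour is easier to see with toy modules. The sketch below is mine and runs without Drupal: the module names alpha, beta and gamma and the "info" hook are invented, sketch_module_list() returns a hard-coded list where Drupal would query the {system} table, and I am assuming the implementing modules are found with a function_exists() check.

```php
<?php
// Hypothetical enabled-module list; Drupal would read this from {system}.
function sketch_module_list() {
  return array('alpha', 'beta', 'gamma');
}

// Stand-in for module_implements(): builds the list once per hook, then reuses it.
function sketch_module_implements($hook) {
  static $implementations = array();
  if (!isset($implementations[$hook])) {
    $implementations[$hook] = array();
    foreach (sketch_module_list() as $module) {
      if (function_exists($module . '_' . $hook)) {
        $implementations[$hook][] = $module;
      }
    }
  }
  return $implementations[$hook];
}

// Stand-in for module_invoke_all(): same merging rules as the listing above.
function sketch_module_invoke_all() {
  $args = func_get_args();
  $hook = array_shift($args);
  $return = array();
  foreach (sketch_module_implements($hook) as $module) {
    $result = call_user_func_array($module . '_' . $hook, $args);
    if (isset($result) && is_array($result)) {
      $return = array_merge($return, $result); // arrays are merged in
    }
    elseif (isset($result)) {
      $return[] = $result; // scalars become one more cell
    }
  }
  return $return;
}

// alpha returns an array, beta a scalar, gamma has no "info" hook at all.
function alpha_info() { return array('a1', 'a2'); }
function beta_info() { return 'b'; }

$collected = sketch_module_invoke_all('info');
```

Note that gamma simply never shows up: it is filtered out before any call is attempted.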
Drupal's bootstrapping is largely completed through the execution of the "module_invoke_all('init');" call. The "init" is the hook, the first argument pulled by module_invoke_all(). There is no further argument, so all registered modules will fire off their [name]_init on load. If the function "[name]_init()" is not present in the code in [name].module, nothing will happen for that module upon page load: module_hook() (seen guarding the call in module_invoke() above) checks for the function first, and as far as I can tell module_implements() filters its list the same way, so a missing hook is skipped quietly rather than raising an error. Even without that guard, calling a missing function through call_user_func_array generated only a warning in the PHP of the day (and that warning may be suppressed depending on the server settings), while a direct call to a missing function generates a critical error/script failure. Either way it means that optional elements can be missing and the Drupal code can continue to function. This is something I described a while back as "functionality slipping beneath waves." Most of the modules lack an "init" hook, the exceptions being the organizational modules like "category" and "views."
All of this recursion creates a large sum of information that is held on the index.php page in the array called $return. That is passed to theme() for processing, organization and presentation.
In a similar way "module_invoke_all('exit');" called by the function "drupal_page_footer()" will close off all modules that are "open" and have a [name]_exit function available.
Form Processing and Validation
The "_form_validate()" function in form.inc is passed all of the form elements. The "form_id" argument is optional (I feel this can be used to add in subforms, but that has to be researched more), and for the most part it's used to identify what module the data is destined for. It works together with the field named "op", which commands how the form submission behaves.
The form validator will go through the elements that have been passed as arguments. Each form type has the opportunity for validation rules. It is possible to programmatically set additional validation rules or even unset them by unsetting the parent form.
These are some of the qualities contained by a form element-- some come in from the type, some are set for the form invocation, some are the values set by the users. There may be an exploit in all this if a user can add these elements to a form and submit the form.
#validated
#needs_validation
#required
#title
#options -- for arrays
#DANGEROUS_SKIP_CHECK
#type
#default_value
#maxlength
#description
#value - user supplied
#error - the error message populated if there was an error (because of validation)
Forms hold their data inside of edit[name] form field names.
When submitted, these elements go into a single array called "$edit". This is a holdover from the way form input used to be introduced: rather than coming in from $_GET or $_POST, the material would come straight into the script.
Here's an example of what you call to populate a form element:
$form['title'] = array(
  '#type' => 'textfield',
  '#title' => t('Title'),
  '#default_value' => $edit['title'],
  '#maxlength' => 64,
  '#description' => t('The name of the feed; typically the name of the web site you syndicate content from.'),
  '#required' => TRUE,
);
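To make the idea of per-element validation rules concrete, here is a toy validator that looks only at the #required and #maxlength properties of an element like the one above. It is a sketch under assumptions, not what _form_validate() does: the function name, the error wording and the plain-string title (no t()) are all mine.

```php
<?php
// Hypothetical validator: checks one element against the submitted $edit array.
function sketch_validate_element($name, array $element, array $edit) {
  $errors = array();
  $value = isset($edit[$name]) ? $edit[$name] : '';
  $title = isset($element['#title']) ? $element['#title'] : $name;

  // #required: an empty value is an error.
  if (!empty($element['#required']) && $value === '') {
    $errors[] = "$title field is required.";
  }
  // #maxlength: the value may not exceed the stated length.
  if (isset($element['#maxlength']) && strlen($value) > $element['#maxlength']) {
    $errors[] = "$title cannot be longer than {$element['#maxlength']} characters.";
  }
  return $errors;
}

$element = array(
  '#type' => 'textfield',
  '#title' => 'Title',
  '#maxlength' => 64,
  '#required' => TRUE,
);

$missing = sketch_validate_element('title', $element, array());
$too_long = sketch_validate_element('title', $element, array('title' => str_repeat('x', 65)));
$ok = sketch_validate_element('title', $element, array('title' => 'Drupal news'));
```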
When you list the $form elements in a form build statement (e.g. aggregator's function aggregator_form_feed($edit = array()) ), the order they are entered is the order they appear on the HTML form by default.
When you make a request for a URL, whether it be the $_GET['q'] or the real path, it will get hooked by the [module]_menu.
In the case of the aggregator, it gives a value to a function callback, "aggregator_form_feed", so that when you come to site.com?q=admin/aggregator/add/feed it reads "admin/aggregator/add/feed" as the path and chooses the appropriate case statement.
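The path-to-callback idea can be sketched without Drupal. Treat this as an illustration only: sketch_menu() and sketch_dispatch() are invented names, the 'path' and 'callback' keys are my assumption about what a menu item carries, and the real menu system does far more (access checks, titles, path arguments).

```php
<?php
// Hypothetical menu hook in the spirit of [module]_menu: maps paths to callbacks.
function sketch_menu() {
  return array(
    array('path' => 'admin/aggregator/add/feed', 'callback' => 'sketch_form_feed'),
    array('path' => 'admin/aggregator/add/category', 'callback' => 'sketch_form_category'),
  );
}

// Hypothetical page callbacks.
function sketch_form_feed() { return 'feed form'; }
function sketch_form_category() { return 'category form'; }

// Hypothetical dispatcher: find the item whose path matches q and call its callback.
function sketch_dispatch($q) {
  foreach (sketch_menu() as $item) {
    if ($item['path'] === $q && function_exists($item['callback'])) {
      return call_user_func($item['callback']);
    }
  }
  return NULL; // no matching path
}

$page = sketch_dispatch('admin/aggregator/add/feed');
```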
How Themes Work
At the top of the stack: theme('page', $return) loads the page.
theme() uses theme_get_function to get the function to call to load the page.
If a theme hasn't been set, init_theme() is called to find and load the theme references.
There are three potential hooks for theme-based functions. You pass in the function name and it looks in this order:
- [theme]_[function]
- [theme engine]_[function]
- theme_[function]
If you wanted something that overrides "theme_date", you could put it in the theme to be theme specific, or the engine to be engine specific. The actual function doesn't need to be moved into form.inc or theme.inc or similar; it can reside in your own files as long as it does not share a function name with a pre-established function. For example, "theme_date" is taken, but if you named it AI_date it would take precedence and be used almost every time a date function relating to theme was used while the "AI" theme was in play. Every time a theme function is called, it is called via theme_get_function() to instill this order of precedence. When you want to call a function, you *could* call:
call_user_func_array(theme_get_function([function hook name]), $args);
but it's good form to call:
theme([function hook name], [other arguments]);
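The order of precedence can be tried out in a few lines. This is a sketch of the lookup as I understand it, not the code from theme.inc: the sketch_ functions are my own, the theme name "AI" comes from the example above, and "phptemplate" is just a placeholder engine name passed in by hand.

```php
<?php
// Hypothetical lookup: theme first, then engine, then the theme_ default.
function sketch_theme_get_function($hook, $theme, $engine) {
  foreach (array($theme, $engine, 'theme') as $prefix) {
    if (function_exists($prefix . '_' . $hook)) {
      return $prefix . '_' . $hook;
    }
  }
  return FALSE;
}

// Hypothetical wrapper in the spirit of theme(): first argument is the hook.
function sketch_theme() {
  $args = func_get_args();
  $hook = array_shift($args);
  $function = sketch_theme_get_function($hook, 'AI', 'phptemplate');
  return $function ? call_user_func_array($function, $args) : NULL;
}

// A default implementation and a theme-specific override of the same hook.
function theme_date($stamp) { return 'default:' . $stamp; }
function AI_date($stamp) { return 'AI:' . $stamp; }
// A hook that only has the default implementation.
function theme_footer() { return 'default footer'; }

$date = sketch_theme('date', 1);   // AI_date wins over theme_date
$footer = sketch_theme('footer');  // falls through to theme_footer
```

Going through the wrapper every time is what keeps the override order consistent, which is the point of calling theme() rather than the function directly.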
If I were to invent our own phpBB based theming engine, I would have to look at the hooks in an established templating engine and match those in our own engine.
Update: I was happy that I posted this here on my blog. When I posted this on the Drupal.org site, this piece got tossed without explanation. Go figure. It's myopic of the Drupal community to not know that this is a topic that daunts A LOT of people. If it stymies someone, they will move on to another CMS application.