Saturday, April 24, 2010

Sitting In Drupalcon's Cheap Seats

A colleague does a lot of Drupal module development. He's a cool and knowledgeable guy, and he's in the Drupal Association. He spoke of the legend of Drupalcon. In the last six years, Drupalcon attendance numbers have doubled with each conference. What started with a bunch of coders in a pub basement has now grown to 3000+ people. Massive rooms capable of holding 800 people were too small to contain the crowds. Birds of a Feather sessions-- ad hoc sessions of like-minded people-- were packed with 30-60 people apiece. The scale of this event was massive. It speaks to the growth of the Drupal content management system.
Has it grown to be too big?

The State Of Drupal

According to Dries Buytaert, the originator of Drupal, Drupal powers 1% of the websites out there. Given the market fragmentation and how many people roll their own designs, having one CMS power 1% of them is massive. That said, Wordpress is three times more popular than Drupal. They are different products: Wordpress is for blogging; Drupal is for anything.

Weakness Is Not Strength

Drupal has some weaknesses. It's comparatively solid for security and good for internationalization, but it's weak for scalability. There was a great talk (2) from Khalid Baheyeldin about how to extend the CMS for a massive amount of processing. The basics: use fewer modules, build your server specifically for Drupal and shun some crawlers. If I could do this at my day job, I'd be able to take up golf and not spend day after day in terror as 158 modules pop like popcorn vs. 100,000 page views from Google each day. Many of the talks were about getting Drupal to behave better in a high traffic environment. I think that says it all: Drupal has performance problems; otherwise people wouldn't be trying to fix them. You can win big from its flexibility, but that flexibility comes at a cost.
I think some concepts need to be considered:
- One ideal module made to accomplish your site's work. Start with core and expand by almost nothing at all. Go back to coding and sink the time into the development work that happens with most sites.
- Not everything needs a module. Google Analytics? Google AdSense? Google and other sites make their tools so portable that they are wholly driven by the client side. Why are there so many modules in Drupal that do what should happen at the bottom of your theme? I will populate a block with the Google AdSense code. Why do it some other way?
- Hard wiring is not a sin (nor is hacking core, but I'll go to Drupal Hell if I suggest that). You get free rein over your theming. Why would a fixture that is on every page in your theme go into something programmatically derived? On one site I've done, the blocks start AFTER a number of divs where the elements are hardwired into place. This cuts down on the number of function calls and the database work.
- More code is not better. Drupal 7 is promised to come with more code. One Twitterer fired up Drupal 7 to see it face plant. Bug reports say that Drupal 7 busts a seam when it tries to do an update. None of this is surprising: Drupal 7 is still under development.
- Put your busy-work into the client side as much as possible. Lean on Ajax, good design and smart usage of CSS. For example, Google will de-list you if you present Google-friendly code that is different from what you produce for non-Google users. But you can present the same code to everyone and rely on the idea that, for now, Google will not be able to parse the code entirely the way a browser does-- then give those capable of using the code something better than those that cannot (dynamic loading, dynamically sourced forms, etc.).
- Be agnostic. Drupal has some set-aside directories (like modules, includes and themes). That leaves many options for sub-directory names. You can install Wordpress inside a sub-directory of a Drupal site. Don't be afraid to use multiple technologies. You wouldn't move your apartment using your Honda Civic when the Dodge pick-up is available. Don't forget that some applications can satisfy niche roles better than Drupal can.

The Forest for the Trees Problem

I went to the PHP Conference in Vancouver in 2007 and the OpenWeb Conference in 2009. Those conferences were large but manageable. I was able to network with a number of people and have some good conversations. Drupalcon's 3000 attendees meant that I met many people once and only once. There were so many Birds of a Feather sessions that I couldn't zero in on a breadth of topics; I could only strafe what I could. The saving grace: many of the Drupalcon events are available online. I could have watched these from home. With the use of the IRC channels, I could have communicated with the community. With this number of attendees, I needed some specialization. I sat with module contributors, librarians and entrepreneurs. I was hard pressed to find someone in my boat.
There are many categories of people at Drupalcon:
Industry: Public Sector, Private Sector, Start-ups and Hobbyists
Specialty: Core coding, Coding, Module Development, Theming, Architecture, Entrepreneurialism.
Ideally, Drupal needs to split along something like these 24 segments (I'm not going to presume I have the recipe with these categories). I could get juxtaposition from a module developer bent on satisfying the needs of libraries; but I really needed to talk to someone who is getting a lot out of their install. If dating technology could have been applied to the attendee dynamic, then people could have met people who were one step up the skills ladder.
I wish there were more days. There were-- the unconference before and the core developer summit afterwards-- but they felt as though they were not open to everyone. Worse still: I had to choose between training on the Sunday and the Unconference.
I could have learned what I did via a tutorial tree-- here's what you know, there's what you don't-- so that you can spend the time on discoveries and skill building.

Sharing The Piss But Not The Recipe

There was another, more annoying occurrence at this conference. I went to a talk by people from Four Kitchens. In short, Four Kitchens ROCKS. Their sites look good. Their Pressflow distribution is a great way to ruggedize Drupal. When they offered a talk on taking a Photoshop mock-up and making it into a theme, I leapt at the chance to attend. They described what a Drupal theme was (thanks-- I spent 8 hrs. in a room on Sunday re-learning that). Then they opened up the floor to questions. They glossed over how to start with a PSD and end up with a Drupal theme. That was the whole point of the session. Lots of people are doing it-- I know they are. I've cut up Fireworks designs. What I didn't know was how to take a Photoshop mock-up and convert it. I still don't know, but I know that Four Kitchens does it and they do it well.
I cannot singly fault them. This is a big topic. I hit this problem multiple times. Hearing "it's easy" but getting no details is like those affiliate marketers who say "I make $20,000 a week!" but don't back it up with real details-- SHARE.
I really wish the talks could be vetted before they are approved. I went through a number of sessions that were VERY basic, so much so that I didn't know how someone could come to a conference specialized for Drupal and not know some of these basics.
The conference needed to be split into basic, intermediate and advanced and really stick to it. 20 APIs Every Drupal Developer Should Know is an example of a session that was ranked for "Basic, Intermediate"-- really it was "Basic Developer" (too low for me) and "Intermediate Architect" (about right for my wife, who is a site admin; she doesn't code, but she should know what's out there). When it said "Intermediate" I thought it was in my league. When it said "APIs" I thought it was about APIs-- Daylife, Flickr, Google, etc. No: it was about 20 types of Drupal modules. My bad.

Were I to run Drupalcon, I would have done it differently:
- Split the conference into lots of micro conferences. Hold them nearly in parallel with some shifting so that the entrepreneurs, coders and themers don't need to jockey for the same rooms.
- Hold training, but not at the conference. The problem there: some places run diploma-mill training that results in their students washing windows (really). Lullabot, Zivtech and Lynda are great and should be used more. What I would like: a question-answer of what there is to know vs. what you don't know. At my theming training I learned about (theme)_preprocess_(specialty). I wish I could have jumped to that then off to something else new. Group training sessions don't work like that.
- Match people to spark conversations. Let people code themselves and look for people up and down the skills tree. There were 3000 people at Drupalcon. I liked speaking with everyone I spoke with; but I would have liked to speak with the 30 people who could fill in my blanks-- only I didn't know who they were.
- Open the cookbooks. People should share their Drupal recipes as much as their bosses will allow. I was able to share my biography and session schedule, but I could not share how I built propertypast.com or thosedewolfes.com.

The Good Stuff

Beyond learning and immersing myself in Drupal-land, I came up with some good ideas and a better understanding of Drupal. I also learned how much I have managed to wring out of Drupal and Apache. Others are pulling off the same result by lopping out 120 modules, or doing the same as what we're doing but with four servers and not one. I left one person speechless when I told them that the site had 36 themes. Yep: I know we have about 30 themes too many.
I have an idea for a module-- well, a framework concept as well as a migration synchronization module.
I have an idea for a site to support Drupal people. Ideally, it should go on Drupal.org. Realistically, I'll put it on prefabsite.net.
I have an idea for a Photoshop plug-in. First, I have to become an expert Photoshop plug-in developer. It's like Columbus wanting corn: first he has to travel to the New World.

Monday, April 19, 2010

Fat Men and Luggage

I'm really fat-- there's no way to sugar coat it (if you did, maybe you'd expect me to eat it). I'm at the SF Drupalcon, which means I am carting my digital essentials: a laptop, a power strip and an SLR digital camera, all wrapped in a bike bag and accompanied by some hard-won swag (oh, hosting? tell me more! [reaches for t-shirt]...).
As a fat man the last thing I need is luggage-- it is literally more to lug around. If anything, I need less-- like no laptop bag and some subspace generator that weighs me in at a feathery 160 lbs.-- so that I can move around best.
I came to this epiphany today and then, an hour later, Dries Buytaert gave his inspiring keynote address. In looking ahead to Drupal 7, he said that it will have more code (read: poorer performance) but be more scalable. That's great... er, sort of. I remember Steve Ballmer talking about how his staff would boast about kloc-- blocks of 1000 lines of code. The more klocs you wrote, the better you were at programming. This is why Windows went from 3.1 humming on a 486 with 64MB of RAM all the way to needing 1GB+ on a screaming machine to run Vista. More lines may be the only solution to some problems, but sometimes the bulk happens for its own sake.
If the code is too bulky it brings in too much overhead before you get to the real source of overhead: content and data relationships.
Drupal needs to make a left turn in its bulking trend. Drizzle is an example of how to go. Drizzle is a fork of MySQL made to be more modular and to strip out extraneous code. Most coders pride themselves on how much work they do, as reflected in their volume of code. Drizzle prides itself on how much code it has taken out. They have removed thousands of lines of code. There are data types that no one uses, so Drizzle doesn't have them. Pressflow is a distribution of Drupal 6 that gets rid of the fringe code that is meant to satisfy PHP4 peccadilloes. It has less code, which makes it faster and more scalable. It does backport advantages of Drupal 7 and shares its own advantages with Drupal 7.
How do you do complex stuff in a potentially free form way without requiring a lot of code? That's the devil in the details. Some ideas:
Context: Factor in context to simplify the data set. I hit the url path functions and was amazed that it searched for all matches and never brought arg(0) into the mix as a way to pre-filter the results.
Make Your Function Respond Like a Good Witness: Have you watched those courtroom dramas where the witness volunteers information and suddenly gets implicated in a murder? Less is more. When you look at the node object in Drupal 5 and 6, it brings in the same data many times over, repeated across several trees. That's a lot of bulk to move around and traverse. If the output is just what you expect, then you don't have excess to trim, ignore or suffer with. Functions need to be able to give what is required and have some capacity for extension.
Lean on the Language and the Environment: Waaay back in Drupal 4.7 we had a glitch in CCK. I leaned on a PHP function to mend the problem-- an easy problem to solve with an easy and solid function. Like a good Drupaller, I suggested the change as a patch. I was scolded by the maintainer that I should not use a PHP function to resolve this problem. WTF? I can't use PHP in Drupal? I understand the logic and the strength of abstracting mysql_fetch_row(), but some functions do not need abstraction-- they work fine. In some cases, there is a road that does not need to be re-paved. Adding more functions to repeat the work of what is natively available in PHP or MySQL bulks up the Drupal code and will impact performance. It will give the fat man too much luggage.
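The "Context" idea above can be sketched in a few lines. This is only an illustration in JavaScript, not Drupal's actual path code: the alias table, its field names, and the functions indexByFirstArg() and lookupAlias() are all made up for the example. The point is that grouping rows by the first path segment (what arg(0) would give you) lets a lookup scan one small bucket instead of every row.

```javascript
// Hypothetical alias table; the shape and values are illustrative only,
// not Drupal's real url_alias schema.
const aliases = [
  { source: "node/10", alias: "victoria/728-pembroke" },
  { source: "node/11", alias: "victoria/1170-tattersall" },
  { source: "taxonomy/term/3", alias: "streets/pembroke" },
  { source: "user/1", alias: "people/mike" }
];

// Build the index once: first path segment -> rows that start with it.
function indexByFirstArg(rows) {
  const index = new Map();
  for (const row of rows) {
    const first = row.source.split("/")[0];
    if (!index.has(first)) {
      index.set(first, []);
    }
    index.get(first).push(row);
  }
  return index;
}

// Scan only the bucket that shares the path's first segment.
// Falls back to the original path when no alias exists.
function lookupAlias(index, path) {
  const bucket = index.get(path.split("/")[0]) || [];
  const hit = bucket.find((row) => row.source === path);
  return hit ? hit.alias : path;
}

const aliasIndex = indexByFirstArg(aliases);
console.log(lookupAlias(aliasIndex, "node/10"));
console.log(lookupAlias(aliasIndex, "user/99"));
```

With four rows the saving is nothing; with tens of thousands of aliases, cutting the candidate set by the first segment before matching is where the context pays off.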
Looking at Drupal 7's projected big red bar to indicate its code size, I am concerned about how much luggage is going to get carried around in the next version.

Thursday, April 08, 2010

JQuery block loading

It's great to have lots of content. Sometimes you even need content from offsite. If you link to an offsite source, that can drop your page rank by feeding page rank to the source (the "Articlebase effect," wherein a site full of mediocre articles gets all of the traffic). You cannot serve two versions of your site (one for Google to gain favour; one for real people); Google will de-index your site in response. Google is an 800-lb. gorilla: on one site I work with, it is responsible for about 50% of the page views. The real users make up a small minority of the page views. You need to cushion your processing to accommodate this huge amount of non-human usage. If you could serve less content, that could mean less database access-- less work to gather the page contents. But you need to give users as much content as possible to hook them, and you need to serve the same page to all. You can, however, capitalize on the technical limitations of spiders. For a while (e.g. ignore this article if you're reading it in the Spring of 2011), search engine spiders are not Javascript/JQuery sensitive. So you can deliver the same page to all, but use JQuery and Ajax loading to bring in supplemental content. If the content points to offsite links, it may send users away from your site, but it will not contribute to lowering your page rank.

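<script type="text/javascript">
// Wait for the full page load so that JQuery (included at the bottom of the page) is available.
window.onload = function() {
  // Load the same-site page into the "ajax_window" DIV.
  $("#ajax_window").load("http://mike.dewolfe.bc.ca/cooking").fadeIn("slow").slideDown("slow");
};
</script>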


<div id="ajax_window"></div>

The above code is an example of what I may use on my mike.dewolfe.bc.ca site. It leans on the existence of JQuery. The page it calls loads into the "ajax_window" DIV. I do have to call a local resource, for the sake of simplicity, because of the cross-site (same-origin) safeguards in my browser. Rather than use JQuery's $(document).ready(), the classic way of gauging a page load, I use window.onload. This is because I subscribe to the Steve Souders tips in High Performance Web Sites: Essential Knowledge for Front-End Engineers and put my script calls at the bottom of the page. The problem with that is that the JQuery library loads late on a page, and any function calls earlier in the page that reference it will choke and fail. window.onload is a basic browser event handler, not part of JQuery, so it will always be recognized. By the time the page has loaded, the JQuery functions are present, and the window.onload handler can call them.
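One caveat with assigning window.onload directly: it replaces any handler another script has already put there. Here is a minimal sketch of a more polite version. The helper name runAfterLoad() is my own invention for this example, and it takes the window object as a parameter only so it can be exercised outside a browser.

```javascript
// Run `callback` once the page's load event has fired, without
// clobbering a handler that some other script already registered.
// `win` stands in for the browser's `window` object.
function runAfterLoad(win, callback) {
  if (win.document && win.document.readyState === "complete") {
    // The page has already finished loading; run right away.
    callback();
  } else if (typeof win.addEventListener === "function") {
    // Standards route: add a listener alongside any others.
    win.addEventListener("load", callback, false);
  } else {
    // Fallback: chain onto whatever onload handler is already there.
    const previous = win.onload;
    win.onload = function () {
      if (typeof previous === "function") {
        previous.call(win);
      }
      callback();
    };
  }
}

// In the page itself (assumes JQuery is included at the bottom of the page):
// runAfterLoad(window, function () {
//   $("#ajax_window").load("/cooking");
// });
```

The effect is the same as the snippet above: nothing touches JQuery until the load event, by which point the library at the bottom of the page has been parsed.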

Monday, April 05, 2010

What am I trying to do with Property Past

I had a tense dim-sum with my sister a few weeks ago. She announced that she was going to move because their house was infested with mold. How many grow-ops, crack shacks and murder houses are out there that get a lick of paint and then get put out on the market? Our own 97-year-old house has been through so many renovations and expansions that I don't know what it began as (I think I'm sleeping in the original kitchen). Properties have pasts. If they are not sterling, a realtor will not mention them. I used to live at 1170 Tattersall. On its bright side, it was built by the McGills of local McGill and Orme fame. It was a nice house. On its dark side, it was used by a local thief: he had so much booty stuffed into the basement suite, he could barely shut the door (this according to police).
Enter www.propertypast.com. It's a site built with Drupal 6 and using the nice Color Paper theme (I like that theme). What I did that was a little different: I built it with four vocabularies (street numbers, streets, cities, regions). These vocabularies are what organize and describe the content. There are many Elm Streets, but only one Elm Street per city-- so the Elm Street term gets recycled, and addresses need three or four vocabularies of terms to make a match. I altered the node to show the vocabularies, control their output and append associated terms. For example, with this link (http://propertypast.com/node/10), we have 728 Pembroke, Victoria, British Columbia. The 728 includes all of the terms. The Pembroke includes Pembroke, Victoria and British Columbia. Victoria is Victoria and British Columbia. As you get out from the exact match, the links can take you to a broader match.
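The broadening-match idea can be sketched as follows. This is an illustration in JavaScript, not the code behind the site: the LEVELS list, the address object and the function name broadeningTerms() are assumptions made for the example. Each part of the address carries its own term plus every broader term after it.

```javascript
// The four vocabularies, ordered from most specific to broadest.
// The key names are assumptions for this sketch.
const LEVELS = ["number", "street", "city", "region"];

// For each part of an address, list the terms its link should match:
// the part itself plus every broader part after it.
function broadeningTerms(address) {
  return LEVELS.map((level, i) => ({
    label: address[level],
    terms: LEVELS.slice(i).map((broader) => address[broader])
  }));
}

const links = broadeningTerms({
  number: "728",
  street: "Pembroke",
  city: "Victoria",
  region: "British Columbia"
});

for (const link of links) {
  console.log(link.label + " -> " + link.terms.join(", "));
}
```

Here "728" matches on all four terms, "Pembroke" on three, and so on out to the region, which is the step from the exact match to a broader one.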

The trick from here: getting the data. I would LOVE it if people would contribute the data. I would also like to find a way to mass import pre-existing data: MLS sale prices, BC Assessment prices, crime reports, etc. That may be the Achilles' heel of this process. Here's my call to you: if you know of a home with a colorful past, add it here at Property Past. Perhaps I will list this on my growing list of web projects...