Archive for the 'technobabble' Category

A thousand games for the iPhone

I used to love playing those old (now retro) style games. You know, those point-and-click adventure games like Day of the Tentacle, Larry, Monkey Island, Police Quest, King’s Quest, Loom, Freddy Pharkas, Full Throttle and lots more. The graphics were good (I really love the cartoonish style), the music was nice and the gameplay excellent. Next to the adventure games there were really good level-based games like The Lost Vikings (this game was really terrific), but OK, I was more into the adventures.
Yesterday I was checking the Apple App Store and saw the kind of games that are being sold for the iPhone today. They’re just like (or actually not as good as) the games I mentioned above. Although game engines (like LucasArts’ SCUMM) are available for the iPhone (but unfortunately not in the App Store), almost all of the above games are still licence restricted, so technically you’re not allowed to play them unless you bought an original copy back in the early 90’s. This is a request to all the Sierra- and LucasArts-like companies out there: either sell your games online for a dollar so we can play them on our iPhone (which is a brilliant platform for this kind of game) OR drop the licence and contribute to the world!
The same goes of course for all the brilliant games (and more importantly, gameplay) created for the C64 and MSX systems in the 80’s.

AWS seems down!

It seems the most important of Amazon’s web services are down. EC2 (the Elastic Compute Cloud), S3 (storage) and SQS (the queueing service) are all reporting red on CloudStatus. Hope they come back up soon; I’m very curious about the cause.


Website migration SEO and Performance Checklist

Everybody has seen it happen before: just after a new site (or a new version of your site) goes live, the whole thing has to be rolled back or quick fixes need to be applied. The cause is usually a drop in visitors (somehow your ranking dropped?) or a new version that won’t perform at all. There is a way to prevent all this! This checklist, made by Joost de Valk (Onetomarket) and Eelco van Beek (IC&S), is meant to help companies and individuals that are planning to launch a new site.

Redirect old URLs
In most cases a new website comes with new URLs. This should not be a problem, but since a lot of people, weblogs and search engines have linked to the old URLs, a lot of the content won’t be found anymore. All those old URLs should therefore be 301 redirected to the new URLs. If you don’t do this, all value built up in the old URLs will be lost. There are a lot of examples of websites that lost 70-80% of their traffic due to missing or wrong redirects after a migration.
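
To make the redirect step a bit more concrete, here is a minimal sketch in Python of a lookup that answers old paths with a 301. The paths in `REDIRECTS` and the function name `resolve` are made up for illustration; in practice the mapping usually lives in the web server or framework configuration.

```python
from typing import Optional, Tuple

# Hypothetical mapping from old paths to their new locations.
REDIRECTS = {
    "/products.php?id=12": "/products/blue-widget",
    "/about.html": "/about",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a requested path.

    Old paths get a permanent redirect (301) to the new location;
    everything else is passed through for normal handling (200 here).
    """
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 200, None
```

Building the mapping from the old site's access logs or analytics, rather than by hand, makes it less likely that a heavily linked URL gets forgotten.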

Rankings on important keywords
A new site usually means a new style and new texts, and in some cases even new product names. By changing texts and renaming pages you could easily lose your search engine rankings on terms that brought in very relevant traffic. Always check your analytics before you change your product names and make sure that you’re not losing any rankings.

Hosting in the right country
It is becoming more and more important for search engines that a domain is hosted in the country it is meant for. This also delivers the best experience for the users. So don’t move your hosting to another country based on price when migrating to a new site. What seems inexpensive might turn out very expensive.

Don’t change your whois
When your site and hosting are completely changed, it is not a good idea to also update the whois information (the ownership information) on your domain. Search engines have the awkward habit of marking a site on which everything changes at once as being sold. When that happens, search engines feel the need to rebuild your site’s rankings completely, deleting all of the former ranking information. This is something you really want to prevent.

Measuring is making sure
It’s essential to measure site capacity before going live. There are a lot of tools (like JMeter) that can simulate users on your site. By running a lot of those simulations at once while monitoring the platform and measuring all kinds of load, the maximum number of users the site can handle can be determined.
This information can be used to set up a scaling plan, so you can scale up when 70 or 80% of the maximum capacity is reached. Watch out! Often, mostly on big sites, the traffic generated by search engine spider bots is neglected in this capacity measurement. When deploying a new site version this traffic can reach thousands of spidered pages per hour. To stay in control of this process it is generally a good idea to (A) include this traffic in the capacity testing and (B) spread the load of the indexing search engines by setting a crawl delay.
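
As a small illustration of the crawl delay, the sketch below uses Python's standard `urllib.robotparser` to read a `Crawl-delay` from a robots.txt and work out how many pages per hour a well-behaved bot would fetch at most. The robots.txt content and the 10 second value are assumptions for the example, and not every crawler honours this directive.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the 10 second delay is an illustrative value.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the crawl delay (in seconds) that applies to a user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def max_requests_per_hour(delay_seconds: int) -> int:
    """Upper bound on requests per hour for a bot that respects the delay."""
    return 3600 // delay_seconds
```

The resulting number per bot can be added to the simulated user load in the capacity test.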

Are users visible in the site performance stats? Scale up!
It’s always a good idea to actively measure the user experience (in terms of effective speed) of your site. This way critical delays (DNS resolving, bandwidth shortage, network latency, connection count or HTML) can be identified. For example: a certain site shows different response times on working days and weekends. This site is known to have more users on working days. The number of users directly influences the site’s response times, so it is time to scale up the site. As long as the maximum number of users is not reached, the number of users should never influence the response times of a site. An example of such a performance measurement can be found here (we do these for our customers at IC&S).
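
The working day versus weekend comparison can be sketched in a few lines of Python. The function name and the 20% tolerance are assumptions for illustration; real measurements would come from your monitoring.

```python
from statistics import mean


def needs_scale_up(weekday_ms, weekend_ms, tolerance=1.2):
    """Flag a site whose busy-day response times are clearly worse.

    weekday_ms / weekend_ms are lists of measured response times in
    milliseconds; tolerance is the ratio we still accept as noise.
    """
    return mean(weekday_ms) > tolerance * mean(weekend_ms)
```

For example, `needs_scale_up([400, 420, 450], [200, 210, 190])` flags the site, because the busy days are roughly twice as slow as the quiet ones.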

There is always something cacheable
Caching is storing (pieces of) websites or pages in an intermediate infrastructure. When a piece of content that has passed through the cache is requested a second time, and the content is known not to have changed in the meantime, the cache will return it instead of the actual application layer. This way the application layer is offloaded, which results in more capacity. Cacheable items can be found in all websites and web applications. This also goes for dynamic content which is specifically generated for a user. Take for example a forum. A forum is static until messages or comments are added, changed or deleted. When that happens, the cache can be instructed to delete that specific part of the content so that it will be reloaded the next time it is requested. Caches can be instructed to cache all kinds of data, for example based on the content type (images, video, static texts) or certain headers. For most of the platforms we support we cache about 70-90% of all content!
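
The forum example can be sketched as a tiny in-memory cache with explicit invalidation. This is only an illustration in Python; the class `PageCache` and its `render` callback are hypothetical, and a real setup would more likely use a reverse proxy or a dedicated cache server.

```python
class PageCache:
    """Serve rendered pages from memory until they are invalidated."""

    def __init__(self, render):
        self._render = render   # function that builds a page for a key
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Not cached yet: ask the application layer once and remember it.
        self.misses += 1
        page = self._render(key)
        self._store[key] = page
        return page

    def invalidate(self, key):
        """Drop one page, e.g. after a message was added or edited."""
        self._store.pop(key, None)
```

The application calls `invalidate("thread-42")` whenever somebody posts in that thread; all other threads keep being served from the cache.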

Put your servers as close as possible to your users
In the Netherlands there’s a popular data exchange point called the AMS-IX. This exchange connects internet content providers and access providers (the users) to put them (in network terms) as close to each other as possible. The golden rule is that the shorter the path between the supplier and the receiver of data (measured by the actual distance and the number of nodes in between), the faster the actual transfer. Sites in the Netherlands which also have their users in the Netherlands should therefore use the AMS-IX. Global sites should think of using CDNs, Content Delivery Networks. A Content Delivery Network is located in many countries and is able to deliver your content at a much higher speed. They do this by logically chopping up your site into cacheable parts or locally processable parts and putting those on their edges (which are servers placed in different countries). They also do not use the regular internet infrastructure to connect to these edges but instead use a direct (faster) network (their own backbone).

There is always something cacheable, also on the backend
Many websites use centralized data storage, for example a database. Nowadays almost all databases support query caching: when the same query is executed a couple of times and the dataset hasn’t changed, the query won’t really be executed. Instead the result set will be retrieved from a cache. This speeds things up a lot; the database engine gets fewer queries and information is retrieved blazingly fast. When using a database, another thing to take care of is connection pooling between the database and the front end (PHP, ASP, Java etc.). Creating a connection is a very expensive process (in terms of load and time). Connections should be re-used through a connection pool.
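
A connection pool can be sketched with nothing more than a queue of open connections. The example below uses Python's built-in `sqlite3` purely so it runs anywhere; the class name `ConnectionPool` is made up, and most database drivers and frameworks ship their own pooling that you would use instead.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Open a fixed number of connections once and hand them out on demand."""

    def __init__(self, database, size=4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # The expensive part (connecting) happens only here.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)     # return it to the pool instead of closing
```

Usage looks like `with pool.connection() as conn: conn.execute("SELECT 1")`; the connection goes back into the pool when the block ends.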

HTML or HMTL
The order of your HTML source code is important. Is JavaScript used in the site? Be aware that JavaScript will block the loading of other components in the HTML until the components in the JavaScript have loaded. Tools like Pagetest, and also YSlow for Firebug, provide (among a lot of other interesting information) detailed information about page loading and blocking. The blocking issue in JavaScript can be handled by using certain arguments. The problem, though, is that these arguments are browser specific, so you might need to implement browser-specific JavaScript.

Video? Not too fast
A video should be played at a certain number of bits per second. This is called the bitrate. If the video is not delivered at this speed it will break up and won’t show correctly. This speed depends on the format, codec and quality of the video. Make sure that when a video is requested the server returns the video at a speed which is a little above the video bitrate. Otherwise the video will be downloaded at the speed of the requester (which with DSL is already 20 Mbit/sec). An example: we’ve got a server which provides videos. The server is connected at 100 Mbit/sec. This means that 10 users with a 10 Mbit/sec DSL connection fill up the connection completely. The video, however, has a bitrate of 256 kbit/sec. By sending the video at 350 kbit/sec we can serve about 285 users instead of the 10 just mentioned. Be aware: these users will use up the connection for a longer period.
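
The arithmetic behind this example is easy to check in Python. The sketch assumes decimal units (100 Mbit/sec = 100,000 kbit/sec); the function names are made up for illustration.

```python
def concurrent_viewers(link_kbit_s: int, per_viewer_kbit_s: int) -> int:
    """How many viewers fit on the link at a given per-viewer send rate."""
    return link_kbit_s // per_viewer_kbit_s


def transfer_seconds(duration_s: float, bitrate_kbit_s: float,
                     send_rate_kbit_s: float) -> float:
    """How long a viewer occupies the link for a video of duration_s seconds."""
    return duration_s * bitrate_kbit_s / send_rate_kbit_s


LINK_KBIT_S = 100_000  # the 100 Mbit/sec server connection

unthrottled = concurrent_viewers(LINK_KBIT_S, 10_000)  # viewers at full DSL speed: 10
throttled = concurrent_viewers(LINK_KBIT_S, 350)       # viewers at 350 kbit/sec: 285
```

The trade-off is visible in `transfer_seconds`: a ten minute video (600 seconds at 256 kbit/sec) keeps its slice of the link busy for roughly 439 seconds at 350 kbit/sec, where an unthrottled download would be done much sooner.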

Is the site cloud-ready?
The newest hype in internet infrastructure land is cloud computing. A cloud is a large number of connected computers on which virtual machines can be created on the fly. You pay as you use (bandwidth and CPU time). The big advantage of this approach is that you don’t need to invest a lot of money in a complete infrastructure which, in capacity terms, has to be designed to handle the peak load of the platform. With proper configuration and application tuning a site can run on one virtual machine but, when traffic increases, is able to spawn new instances of the same virtual machine to cope with the load (autoscaling). This is a very interesting development for internet sites which are not sure how popular they will be: investments are low and, with a good revenue model, your earnings grow along with your costs (because costs only increase with the number of users). Animoto is based upon such a platform and runs on the Amazon EC2 cloud.
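
The core of an autoscaling decision is a small calculation. The sketch below is an assumption about how one could do it, not how any particular cloud does it: the per-instance capacity would come from your own load tests, and the 75% target is an illustrative value.

```python
import math


def desired_instances(active_users: int, capacity_per_instance: int,
                      target_utilisation: float = 0.75,
                      minimum: int = 1) -> int:
    """Number of identical virtual machines needed for the current load.

    Each instance is only filled up to target_utilisation so there is
    headroom left while new instances are still starting.
    """
    usable = capacity_per_instance * target_utilisation
    return max(minimum, math.ceil(active_users / usable))
```

With a capacity of 1,000 users per instance, 100 users need one instance and 3,000 users need four.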

Transferring my digital life to Google

I should actually be ashamed.. a long long time ago I designed and created my own e-mail system called DBmail. The general idea was that e-mail and e-mail provisioning is structured data and structured data should be stored in a database. This way maintenance, searching, scalability, backup and data consistency should be better covered than using a filesystem for storage (all depending on the database of course, but DBmail supports many). This, combined with the fact that I have easy access to my own online servers should have actually resulted in me using DBmail for all my e-mail needs. Well, since a couple of days, this is no longer the case.

A few months ago I talked my dad into transferring his e-mail (and domain) to Google by subscribing to Google Apps. He seems very happy since then (I’m measuring by the number of help requests my brothers and I are receiving). Since I was doing my own e-mail I was also doing my own spam fighting, which started to take a lot of my time. FYI, my domain eelco.com is receiving about 20,000 unwanted messages per day. My wife’s domain marloes.info does about the same, so I need to block about 40,000 messages per day just for my wife and me. Then I’m also hosting e-mail for a couple of domains for friends, so add another 20,000 to that. If I’m not blocking those messages they (the friends and family) start telling me I’m letting too much spam through. An effective remedy is called greylisting. Greylisting uses a feature of the SMTP protocol to request a resend when a message is delivered for the first time by a sender that has not been seen before. If the sender is using a regular, fully SMTP compliant mail server the mail will be resent. If the sender is a spammer he probably uses a bulk sender which does not fully comply with SMTP and therefore will not send the message again. So the spam message is blocked. The problem with this approach is that a lot of messages that are sent for the first time will have an arbitrary delay, which kind of sucks when you’re waiting for a certain subscription message to come through.
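
For the curious, the greylisting decision itself fits in a few lines of Python. This is a simplified sketch, not what any particular mail server does: it assumes the sender is identified by the triplet of client IP, envelope sender and recipient, and the five minute minimum delay is an illustrative value.

```python
import time


class Greylist:
    """Temporarily reject first-time senders; accept them when they retry."""

    def __init__(self, min_delay=300, clock=time.time):
        self._first_seen = {}        # (ip, sender, recipient) -> first attempt
        self._min_delay = min_delay  # seconds a new sender has to wait
        self._clock = clock

    def check(self, client_ip, sender, recipient):
        """Return the SMTP reply for a delivery attempt."""
        key = (client_ip, sender, recipient)
        now = self._clock()
        first = self._first_seen.setdefault(key, now)
        if now - first >= self._min_delay:
            return "250 OK"
        # A temporary (4xx) failure asks a compliant server to try again later.
        return "451 4.7.1 Greylisted, please try again later"
```

A compliant mail server queues the message and retries after a while, at which point `check` answers `250 OK`; a fire-and-forget bulk sender never comes back.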

The spam fighting is actually the biggest reason I switched to Google Apps for my e-mail. Google uses its huge Gmail user base to identify spam; every time you click the report spam button the Gmail system learns about your spam message and prevents it from being delivered in the future, also for other Gmail users. This works great, I’m getting even less spam than on my own server.

Another really nice feature of the Google mail platform is the search feature. It’s extremely quick and uses Google search technology to index your e-mail messages. So with this feature in mind I wanted to import my complete digital life into the Google mail database. Google has a nice option which uses IMAP to import from other mail accounts. The problem is that my e-mail dates back to 1991 (still got all of it in backup) and I really wanted to be able to import that as well. First I tried a dirty hack by ‘resending’ my old e-mail to an intermediate account which I could then import using the Google IMAP import. That didn’t work and quite a few people received really strange bounces, sorry about that folks :-)

Then I discovered the Google E-mail API method for importing e-mail. This is an XML-based system which accepts standard RFC 822 messages in an XML envelope, accompanied by one or more labels, for inclusion into the Google e-mail database. Since a lot of my old e-mail is in the Cyrus DB or Maildir format I needed to create a little script that recurses through those storage formats, identifies e-mails and sends them through the API into the Google db. I wanted to freshen up (actually re-learn) my Python skills so I did the whole thing in a Python script.
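
To give an idea of the recursing part, here is a minimal sketch using only the Python standard library. It is not the actual GoogleMailPy code: the function name `iter_messages` and the "has a From or Date header" test are assumptions for illustration, and the upload through the Google API is left out entirely.

```python
import os
from email import message_from_binary_file


def iter_messages(root):
    """Walk a mail directory tree and yield (label, message) pairs.

    The label is the folder path relative to root, which for Maildir and
    Cyrus layouts roughly matches the folder the mail was stored in.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                message = message_from_binary_file(handle)
            # Crude check: skip index files and other non-mail content.
            if message["From"] is None and message["Date"] is None:
                continue
            yield os.path.relpath(dirpath, root), message
```

Each yielded message could then be serialised again with `message.as_bytes()` and handed to whatever does the upload.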

You can download the script (GoogleMailPy) here. It’s quite dirty but it works and it supports the Google response messages (Google doesn’t like it when a lot of e-mail is being pushed within a certain timeframe. The script takes this into account and uses the Google Double Back strategy to Play Nice™). By default the script adds two labels: GoogleMailPy (to identify all mail processed by the script) and the directory path in which the original e-mail message was found (with Cyrus and Maildir this is also the folder path in your e-mail, so it kind of makes sense to use this as a temporary label until you find a better one).
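
Assuming the Double Back strategy means doubling the wait after every rejected request (an exponential back-off), the idea can be sketched like this. The function and its parameters are hypothetical and not taken from the script; `send` stands for any function that returns True when the message was accepted.

```python
import time


def send_with_backoff(send, message, initial_delay=1.0, max_attempts=6,
                      sleep=time.sleep):
    """Try to send a message, doubling the pause after every refusal."""
    delay = initial_delay
    for attempt in range(max_attempts):
        if send(message):
            return True
        if attempt < max_attempts - 1:
            sleep(delay)   # give the server some air before trying again
            delay *= 2     # 1s, 2s, 4s, 8s, ...
    return False
```

Passing `sleep` in as a parameter makes it easy to test the function without actually waiting.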

GoogleMailPy uses no external Python libs so it should work out of the box. I’ve put some comments in the source in case you’d like to add a feature of your own. I’ve just imported about 32,000 messages and everything seems to be working OK. Using the script is of course totally at your own risk and it’s probably stuffed with bugs :-)

Usage is easy:

1. First set the right user credentials in the script. Check for the SETTING comments (there are three: one for turning certain labeling features on or off, one for the user credentials and one for the right user path).
2. Just call the script with a Maildir or Cyrus DB directory path.

Please let me know about your experiences in the comments of this post.
I’m now checking out the other Google Apps (especially the word processor and spreadsheet). We might connect them to our new client portal at the office to generate and share reports. More on that later.

Nerds in an airplane

So, what do you do if you’re a nerd, stuck in an airplane but not close enough to your nerd friend to have a nice chat? Well, easy. You just set up an ad-hoc network and start iChatting.

The end of Velocity, beginning of the San Francisco trip

Update: I’ve redone the clip on Vimeo, which is 30 times better than YouTube.

Of course I’ll write some more about Velocity later on (we had some more nice sessions to write about). However, I just created a little clip of Bart and me renting a Ford Focus (see below). I’m now going to get some sleep. Tomorrow I’ll change hotels to one close to Fisherman’s Wharf.

Velocity 2008, day two, a short recap so far

In short: today is already much better than yesterday! First we had a great talk from Steve Souders about HTML / JS optimization. The thing I didn’t know is that the order and methods used in creating HTML and JS based sites largely determine the actual speed of your site in a browser. For example: if you have an image load after a JavaScript section, the browser will (by default) not preload the image (the JS causes a blocking situation). Steve mentioned a lot of methods to get around this. The thing is, for me it still feels like working around a bigger problem. The fact that browsers still have different interpretations of the same type of source is irritating, to say the least. I also don’t understand why the above situation occurs. Why not start a massive pre-fetch operation after parsing the HTML? I can understand that such an operation is useless when followed by a script that decides whether or not it actually needs to be loaded. However, this too can be detected by a parser.

Steve was followed by Adam Bechtel from Yahoo. He compared major infrastructures with plumbing. An extremely good presentation. Not specifically because it was new information, but more because of the way of thinking while designing an infrastructure. Since at IC&S we do this often for our customers, this was very refreshing.

David Ulevitch (everydns.net, opendns.com) had a good story about a method called anycast, which basically uses some loopholes in BGP to create real network load balancing and automatic failover. By assigning a /24 subnet to a single server or a cluster of servers, that /24 can be given its own BGP route. This route will be announced all the way to your connection provider. If this is done at multiple locations, traffic will be routed to the location with the lowest network path cost. So automatic balancing occurs. When one of those locations drops out, its route is deleted and all traffic will ‘fail over’ to one of the alternate locations. Why is this kind of dirty? Well, flapping can occur, in which case the BGP routing entry will be created and deleted in routers all over the world multiple times (this can lead to damping, in which case the uplink providers will suppress the route). Also, the smallest subnet that can be routed with BGP on the internet is a /24, so your server will use up this complete subnet of IP addresses. This problem can of course be fixed by using multiple hosts in this cluster and duplicating it completely somewhere else.
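
The routing behaviour can be mimicked with a toy model in Python. This is only a sketch of the idea as seen from one client: real BGP path selection has many more rules, and the class, site names and costs below are made up (the prefix is taken from the range reserved for documentation).

```python
import ipaddress

# A /24 from the documentation range, standing in for the anycast prefix.
PREFIX = ipaddress.ip_network("192.0.2.0/24")


class AnycastPrefix:
    """Toy model: the same prefix announced from several locations."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._announced = {}   # site -> path cost as seen by one client

    def announce(self, site, path_cost):
        self._announced[site] = path_cost

    def withdraw(self, site):
        """A site drops out, so its route disappears."""
        self._announced.pop(site, None)

    def best_site(self):
        """The site this client's traffic ends up at, or None if unreachable."""
        if not self._announced:
            return None
        return min(self._announced, key=self._announced.get)
```

Announce the prefix from "amsterdam" (cost 2) and "sanfrancisco" (cost 9) and the client lands in Amsterdam; withdraw Amsterdam and the same address is suddenly served from San Francisco. `PREFIX.num_addresses` shows the price: 256 addresses for what may be a single service.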

After this dark scheme of BGP hacking Adam Jacob came on stage to tell us about building automated infrastructures. This was a very cool talk. He basically explained what tools to use to create a completely automated infrastructure (hence the title of his talk). What was so nice about it is how easy this actually is. Of course, we already do stuff like that at IC&S, but we learned a lot of new stuff. The part about how to deploy a platform on EC2 and extend it when needed with just a few clicks was very interesting. Parts of it we’ll most certainly use for our Jitscale service.

Currently I’m listening to Peter Zaitsev talk about MySQL scaling and performance. Since his English is quite hard to understand and he is, as he’s telling us himself, not providing a silver bullet for performance solutions on MySQL, the talk is not really interesting. Let’s see what’s next!

Gooood morning!

Another day at Velocity has started! We’re now looking at how Faceball should be done correctly. This is the howto video. Everybody gets their thanks (as if the conference is over already!). More later!

Velocity: Ignite session, the jetlag is kickin in bigtime

So, we’re still alive and kicking here at Velocity, although the jetlag is getting to us right now and people around us have to start talking slowly for us to be able to understand :-)

As a roundup: it was a nice day, with a few good speakers and some not so good. I think I’m a bit spoiled by TED, at which every speaker was marvelous and there were no dull moments to be found. Anyway, I’m looking forward to tomorrow which, looking at the schedule, has some nice sessions. At this time we’re waiting for the Ignite sessions to start. These sessions feature startups that can present their ideas to an expert audience (us). Tomorrow morning one of the ignitees will be the winner. No idea what they’ll be winning though.

Below is a little clip I shot on our way from Velocity to Beni Hana’s. Bart and Justin conclude the first day of Velocity in this clip. Have fun!

http://youtube.com/watch?v=W2--SiPSIy8

A new approach to incident management : Incident Command System

This was a great talk by Brent Chapman! He suggests using the Incident Command System approach, a method created by the US government in the ’70s for emergency situations, for IT incidents as well. The way a public incident is handled has a lot of overlap with how an IT incident should be handled. While Brent was talking I made a little clip of him explaining an example. This was after he had explained the whole ICS concept, so it might not be completely understandable (for questions, leave comments). (Parts of) this method will be very applicable to the handling of incidents at IC&S.

I’ll talk about this later on.
