Everybody has seen it happen: just after a new site (or a new version of your site) goes live, the whole thing has to be rolled back or quick fixes need to be applied. The cause is usually a drop in visitors (somehow your ranking dropped?) or a new version that won’t perform at all. There is a way to prevent all this! This checklist, made by Joost de Valk (Onetomarket) and Eelco van Beek (IC&S), is meant to help companies and individuals that are planning to launch a new site.
Redirect old URLs
In most cases a new website comes with new URLs. This should not be a problem in itself, but since a lot of people, weblogs and search engines have linked to the old URLs, a lot of the content won’t be found anymore. All those old URLs should therefore be 301 redirected to the new URLs. If you don’t do this, all value built up in the old URLs will be lost. There are a lot of examples of websites that lost 70-80% of their traffic due to missing or incorrect redirects after a migration.
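As a rough illustration of the redirect step, here is a minimal Python sketch of a lookup that answers every known old URL with a 301 and the new location. The paths in `REDIRECTS` and the `resolve` helper are made up for the example; a real migration would generate the mapping from the old and new site structures and configure it in the web server or application.

```python
from urllib.parse import urlsplit

# Hypothetical mapping of old paths to new paths (illustration only).
REDIRECTS = {
    "/products.php?id=12": "/products/blue-widget/",
    "/about.html": "/about/",
}


def resolve(old_url):
    """Return (status, location) for a request to an old URL."""
    parts = urlsplit(old_url)
    # Keep the query string, since old dynamic URLs often differ only there.
    key = parts.path + ("?" + parts.query if parts.query else "")
    target = REDIRECTS.get(key)
    if target is None:
        return 404, None   # unknown URL: nothing to redirect to
    return 301, target     # permanent redirect to the new URL


status, location = resolve("http://example.com/about.html")
```

Running every URL from the old sitemap through a check like this before launch is a cheap way to spot pages that would otherwise end in a 404.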
Rankings on important keywords
A new site usually means a new style and new texts, and in some cases even new product names. By changing texts and renaming pages you could easily lose your search engine rankings on terms that brought in very relevant traffic. Always check your analytics before you change your product names, and make sure that you’re not losing any rankings.
Hosting in the right country
Search engines attach more and more importance to a domain being hosted in the country it is meant for. Hosting close to your audience also delivers the best experience for the users. So don’t move your hosting to another country based on price alone when migrating to a new site. What seems inexpensive might turn out to be very expensive.
Don’t change your whois
When your site and hosting are completely changed, it is not a good idea to also update the whois information (the ownership information) on your domain. Search engines have the awkward habit of marking a site on which everything changes at once as having been sold. When that happens, search engines feel the need to rebuild your site’s rankings completely, deleting all of the former ranking information. This is something you really want to prevent.
Measuring is making sure
It’s essential to measure site capacity before going live. There are a lot of tools (like JMeter) that can simulate users on your site. By running a lot of those simulations at once, while monitoring the platform and measuring all kinds of loads, you can determine the maximum number of users the site can handle.
This information can be used to set up a scaling plan, so that you can scale up when 70 or 80% of the maximum capacity is reached. Watch out! Often, mostly on big sites, the traffic generated by search engine spider bots is neglected in this capacity measurement. When a new site version is deployed, this traffic can reach thousands of spidered pages per hour. To stay in control of this process it is generally a good idea to (a) include this traffic in the capacity testing and (b) spread the load from the indexing search engines by setting a crawl delay.
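The two numbers in this step can be sketched in a few lines of Python: a check against the 70% threshold, and a crawl delay derived from the number of pages per hour you are willing to let a spider fetch. The function names and the figures are assumptions for illustration. The `Crawl-delay` line in robots.txt is a non-standard directive and support differs per search engine, so check each engine's documentation before relying on it.

```python
import math


def scale_up_needed(current_users, max_users, threshold=0.7):
    """True once the load reaches the chosen share of measured capacity."""
    return current_users >= threshold * max_users


def crawl_delay_seconds(max_pages_per_hour):
    """Whole seconds between requests that keep one crawler within budget."""
    return math.ceil(3600 / max_pages_per_hour)


def robots_txt(delay):
    """Build a robots.txt body that asks all crawlers to wait `delay` seconds."""
    return "User-agent: *\nCrawl-delay: {:d}\n".format(delay)


# Assumed figures: load test showed 1000 users max, 1800 crawled pages/hour allowed.
delay = crawl_delay_seconds(1800)
print(robots_txt(delay))
print(scale_up_needed(750, 1000))
```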
Are users visible in the site performance stats? Scale up!
It’s always a good idea to actively measure the user experience (in terms of effective speed) of your site. This way critical delays (DNS resolving, bandwidth shortage, network latency, connection count or HTML) can be identified. For example: a certain site shows a different response time on working days than on weekends. This site is known to have more users on working days. The number of users directly influences the site’s response times, so it is time to scale up the site. As long as the maximum number of users is not reached, the number of users should never influence the response times of a site. An example of such a performance measurement can be found here (we do these for our customers at IC&S).
There is always something cacheable
Caching is storing (pieces of) websites or pages in an intermediate infrastructure. When a piece of content that has passed through the cache is requested for a second time, and the content is known not to have changed in the meantime, the cache will return it instead of the actual application layer. This way the application layer is offloaded, which results in more capacity. Cacheable items can be found in all websites and web applications. This also goes for dynamic content which is specifically generated for a user. Take for example a forum. A forum is static until messages or comments are added, changed or deleted. When that happens, the cache can be instructed to delete that specific piece of content so that it will be reloaded the next time it is requested. Caches can be instructed to cache all kinds of data, for example based on the content type (images, video, static texts) or certain headers. For most of the platforms we support we cache about 70-90% of all content!
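The forum example can be sketched in Python as a small in-memory cache with explicit invalidation. This is only an illustration of the idea, not how any particular caching product works; `PageCache`, `render_thread` and the `posts` data are invented for the example.

```python
class PageCache:
    """Tiny in-memory cache sitting in front of an expensive render function."""

    def __init__(self, render):
        self._render = render   # the application-layer call we want to offload
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        page = self._render(key)
        self._store[key] = page
        return page

    def invalidate(self, key):
        """Drop one cached item so the next request re-renders it."""
        self._store.pop(key, None)


# Hypothetical forum data and renderer.
posts = {"thread-1": ["first post"]}


def render_thread(thread_id):
    items = "".join("<li>%s</li>" % p for p in posts[thread_id])
    return "<ul>" + items + "</ul>"


cache = PageCache(render_thread)
cache.get("thread-1")             # miss: rendered by the application
cache.get("thread-1")             # hit: served from the cache
posts["thread-1"].append("reply")
cache.invalidate("thread-1")      # content changed, so purge this item only
page = cache.get("thread-1")      # miss: re-rendered, now including the reply
```

Only the thread that changed is purged; every other cached page keeps being served without touching the application layer.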
Put your servers as close as possible to your users
In the Netherlands there’s a popular internet exchange point called the AMS-IX. This exchange connects internet content providers and access providers (the users) to put them, in network terms, as close to each other as possible. The golden rule is that the shorter the path between the supplier of data and the receiver of data (measured by the actual distance and the number of nodes in between), the faster the actual transfer. Sites in the Netherlands whose users are also in the Netherlands should therefore use the AMS-IX. Global sites should think of using CDNs, Content Delivery Networks. A Content Delivery Network is located in many countries and is able to deliver your content at a much higher speed. CDNs do this by logically chopping up your site into cacheable or locally processable parts and putting those on their edges (which are servers placed in different countries). They also do not use the regular internet infrastructure to connect to these edges, but instead use a direct, faster network (their own backbone).
There is always something cacheable, also on the backend
Many websites use centralized data storage, for example a database. Nowadays almost all databases support query caching: when the same query is executed a couple of times and the dataset hasn’t changed, the query won’t really be executed. Instead the result set will be retrieved from a cache. This speeds things up a lot: the database engine gets fewer queries and information is retrieved blazingly fast. Another thing to take care of when using a database is connection pooling between the database and the front end (PHP, ASP, Java etc.). Creating a connection is a very expensive process in terms of load and time, so connections should be re-used through a connection pool.
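To make the pooling idea concrete, here is a minimal Python sketch of a fixed-size connection pool built on the standard library. It uses `sqlite3` only so the example runs anywhere; the `ConnectionPool` class is an assumption for illustration, and in practice you would normally use the pooling that your database driver or framework already provides.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Open a fixed number of connections once and hand them out for re-use."""

    def __init__(self, database, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # The expensive step happens here, once per connection.
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._pool.get()     # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._pool.put(conn)    # return it to the pool instead of closing it


# Note: with ":memory:" every connection gets its own private database,
# which is fine for this demo but not for real shared data.
pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
```

Each request borrows a connection and gives it back, so the cost of connecting is paid twice in total here rather than once per request.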
HTML or HMTL
The order of your HTML source code is important. Is JavaScript used in the site? Be aware that a script blocks the loading of the other components in the HTML until the script itself and the components it loads have finished loading. Tools like Pagetest, and also YSlow for Firebug, provide (among a lot of other interesting information) detailed information about page loading and blocking. The blocking issue in JavaScript can be handled by using certain arguments. The problem, though, is that these arguments are browser specific, so you might need to implement browser-specific JavaScript.
Video? Not too fast
A video should be played at a certain number of bits per second. This is called the bitrate. If the video is not delivered at this speed it will break up and won’t show correctly. The bitrate depends on the format, codec and quality of the video. Make sure that when a video is requested, the server returns the video at a speed which is a little above the video bitrate. Otherwise the video will be downloaded at the full speed of the requester’s connection (which with DSL is already 20 Mbit/s). An example: we’ve got a server which provides videos. The server is connected with a 100 Mbit/s connection. This means that 10 users with a 10 Mbit/s DSL connection fill up the connection completely. The video, however, has a bitrate of 256 kbit/s. By sending the video at a speed of 350 kbit/s we can serve about 290 users instead of the 10 just mentioned. Be aware that these users will use up the connection for a longer period.
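The arithmetic in this example can be checked with a short Python sketch. It assumes decimal units (1 Mbit = 1000 kbit) and ignores protocol overhead, which is why the result lands slightly below the rounded figure of about 290 users. The 10-minute video length is an assumption added only to show the longer transfer time.

```python
LINK_KBIT = 100 * 1000     # server uplink: 100 Mbit/s (decimal units assumed)
DSL_KBIT = 10 * 1000       # unthrottled user: 10 Mbit/s DSL
BITRATE_KBIT = 256         # bitrate of the video
THROTTLE_KBIT = 350        # send speed, a little above the bitrate


def concurrent_users(link_kbit, per_user_kbit):
    """How many users fit on the link at a given per-user speed."""
    return link_kbit // per_user_kbit


def transfer_seconds(video_seconds, bitrate_kbit, send_kbit):
    """How long one user occupies the link for a video of the given length."""
    return video_seconds * bitrate_kbit / send_kbit


unthrottled_users = concurrent_users(LINK_KBIT, DSL_KBIT)
throttled_users = concurrent_users(LINK_KBIT, THROTTLE_KBIT)

# Assumed 10-minute video: compare how long each user holds the connection.
fast = transfer_seconds(600, BITRATE_KBIT, DSL_KBIT)
slow = transfer_seconds(600, BITRATE_KBIT, THROTTLE_KBIT)

print(unthrottled_users, throttled_users)
print(round(fast, 1), round(slow, 1))
```

The throttled transfer still finishes before the video does, because the send speed stays above the bitrate, but each user holds a slot on the link far longer than with an unthrottled download.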
Is the site cloud-ready?
The newest hype in internet infrastructure land is cloud computing. A cloud is a large number of connected computers on which virtual machines can be created on the fly. You pay as you use (bandwidth and CPU time). The big advantage of this approach is that you don’t need to invest a lot of money in a complete infrastructure which, in capacity terms, has to be designed to handle the peak load of the platform. With proper configuration and application tuning a site can run on one virtual machine and, when traffic increases, spawn new instances of the same virtual machine to cope with the load (autoscaling). This is a very interesting development for internet sites which are not sure how popular they will be: investments are low and, with a good revenue model, your earnings grow along with your costs (because costs only increase as the number of users increases). Animoto is based upon such a platform and runs on the Amazon EC2 cloud.
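The autoscaling decision itself is simple arithmetic, sketched here in Python. The capacity of 500 users per instance is an assumed figure that would come from your own load tests, and `instances_needed` is a hypothetical helper; actually starting and stopping virtual machines goes through your cloud provider's own tooling and is not shown.

```python
import math


def instances_needed(current_users, users_per_instance, min_instances=1):
    """Number of identical virtual machines required for the current load."""
    wanted = math.ceil(current_users / users_per_instance)
    return max(min_instances, wanted)   # never scale below the minimum


# Assumed capacity: one instance handles 500 concurrent users.
for users in (0, 500, 1200):
    print(users, instances_needed(users, 500))
```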