Sourcefabric Manuals


Live Blog 2.0 for Journalists

Making your live blog search engine friendly

Search engines like Google and Bing create their vast indexes of the Web by regularly ‘crawling’ every page and site they can find. Traditionally, these ‘crawlers’ were designed to index static content such as HTML, plain text, and office document formats. As long as content remained static, and was rendered by the web browser exactly as it was received from the server, that was fine.

Web 2.0 brought major improvements to the user experience, thanks in large part to the development of the ‘AJAX’ group of technologies, which allow data to be continuously exchanged between browser and server in the background, without the need to reload the page. Live Blog is an example of this type of dynamically generated web application, in which a substantial amount of code runs in the user’s web browser rather than on the web server.

In response, some search engine crawlers, notably Googlebot (Google’s crawler), have been developing their ability to index dynamically generated content by emulating what happens in the web browser when a user visits and interacts with a page of dynamic content. Progress has been good, but it’s still incomplete, and other search engine crawlers may not be as advanced as Googlebot. So developers still have to do extra search engine optimization (SEO) to ensure that content generated by their applications will get indexed.

Live Blog's new search engine optimized publishing method 

With its 2.0 release, Live Blog now provides an alternative method of publishing your blog in a way which ensures that its content is indexable by search engines. This works by periodically generating a static HTML version of each blog in the background on the server. As a content publisher, you can set your content management system (CMS) to request the latest static HTML version via a REST API and insert it into the article page at the position where you want the blog to appear.

When readers or search engine crawlers visit the page, they will immediately see the latest posts from the embedded live blog, supplied in static HTML by the server. As you and your colleagues publish new posts to the timeline, these posts will appear in readers' timelines automatically in the normal way, via AJAX requests, without the need to reload the page.

This new SEO publishing feature is enabled and controlled in the blog configuration. For details, see the chapter Configuring your live blog.

The remainder of this chapter provides a technical outline of the SEO publishing method, which will chiefly be of interest to developers and system administrators.

How the SEO publishing feature has been implemented

Version 2.0 introduced the following changes to Live Blog in order to generate HTML on the server side and make blog content indexable by search engines:

  • The embedding of Live Blog has been refactored with Backbone.js to make it compatible with Node.js for server-side HTML generation.

  • The static HTML of the blog content is generated using Node.js on Sourcefabric’s server.

  • Backend API services have been built to make this generated HTML accessible to the user's CMS.

  • A new section has been added to the blog configuration to allow users to configure the number of posts initially contained in the HTML generated on the server, and the refresh rate for HTML on the server.

CMS integration

The publisher's CMS can request the following data from the Live Blog REST API:

  • HTML for any given blog

  • The time when that blog was last updated

  • The time when the server-side HTML was last generated

There are two ways of integrating with the Live Blog REST API:

  1. The CMS can make regular requests to the Live Blog API to retrieve the latest version of the HTML content.

  2. In the Live Blog admin interface, the system administrator can specify an optional callback URL. If this URL is specified, it will be called every time there's an update to the HTML of the blog.
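As an illustration of the second option, a CMS could expose a small endpoint that Live Blog calls whenever the blog HTML changes. The sketch below is only an assumption about how such a receiver might look: this chapter does not say which HTTP method or payload the callback uses, so the handler accepts both GET and POST, ignores any request body, and simply sets a flag that the CMS can check before re-fetching the HTML. The names `CallbackHandler`, `blog_updated` and `serve`, and the port number, are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set whenever Live Blog calls the callback URL; the CMS clears it after
# it has fetched the new HTML. (Hypothetical name, for illustration only.)
blog_updated = threading.Event()


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts the update notification without assuming a payload format."""

    def _acknowledge(self):
        # Drain any request body; its format is not documented here.
        length = int(self.headers.get("Content-Length") or 0)
        if length:
            self.rfile.read(length)
        blog_updated.set()
        self.send_response(204)  # "No Content": nothing to send back
        self.end_headers()

    # The HTTP method used by the callback is an assumption, so accept both.
    do_GET = _acknowledge
    do_POST = _acknowledge

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(port=8099):
    """Run the receiver; the port number is an arbitrary example."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

With a receiver like this, the CMS only needs to request the HTML again when `blog_updated.is_set()` is true, rather than polling at a fixed interval as in the first option.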

Once the HTML has been requested from the Live Blog API, it has to be included in an article page in the CMS at the desired position.
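A minimal sketch of this step, assuming the CMS article template contains a placeholder comment where the blog should appear. The placeholder text and the function names are hypothetical and not part of Live Blog; the URL passed to `refresh_article` would be the blog's static HTML URL described below.

```python
import urllib.request

# Hypothetical marker in the CMS article template showing where the blog goes.
PLACEHOLDER = "<!-- liveblog -->"


def fetch_html(url, timeout=10.0):
    """Download the static HTML version of a blog as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def embed_blog(article_template, blog_html):
    """Insert the blog HTML at the placeholder position in the article."""
    if PLACEHOLDER not in article_template:
        raise ValueError("article template has no live blog placeholder")
    return article_template.replace(PLACEHOLDER, blog_html, 1)


def refresh_article(article_template, url, fetch=fetch_html):
    """Fetch the latest blog HTML and return the finished article page."""
    return embed_blog(article_template, fetch(url))
```

The `fetch` parameter lets the CMS swap in its own HTTP client or a cached copy, so the same function works whether the HTML is refreshed by regular polling or in response to a callback.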

The HTML for each blog can be retrieved from a URL that is structured as follows:

Generic: http://<server>:<port>/content/seo/<blogId>.html

Example: http://public.sd-demo.sourcefabric.org/content/seo/2.html

Alternatively, you can generate the required URL in the blog configuration, as explained in the chapter Configuring your live blog.
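If the CMS needs to build these URLs itself, the generic pattern above can be assembled with a small helper. This is a sketch only; the function name `seo_html_url` and its parameters are illustrative, and the port is left out when it is not given, as in the example URL.

```python
from typing import Optional


def seo_html_url(server: str, blog_id: int, port: Optional[int] = None,
                 scheme: str = "http") -> str:
    """Build the static HTML URL for a blog from the generic pattern."""
    host = f"{server}:{port}" if port is not None else server
    return f"{scheme}://{host}/content/seo/{blog_id}.html"


print(seo_html_url("public.sd-demo.sourcefabric.org", 2))
# prints: http://public.sd-demo.sourcefabric.org/content/seo/2.html
```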

For more information, please see the Live Blog API documentation.
