Wednesday 9 May 2012

Playing with the WWW::Mechanize Perl module

WWW::Mechanize, if you're not aware of it already, is a Perl module that acts as a basic programmable web browser.  It builds on and adds features to another Perl module called LWP::UserAgent.

The WWW::Mechanize module can be instructed to GET from and POST to web servers.  The retrieved content can be searched for links to follow, allowing click-through to other pages, and for forms to fill in and submit.  The content itself can also be searched, saved or used as data for further processing.
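To see link and form discovery without touching a live site, the sketch below points Mechanize at a small local HTML file through a file:// URL. The page, its link texts and the form name are made up for illustration (they mirror the login example in this post). links, find_link and forms are WWW::Mechanize methods; attr and inputs come from HTML::Form.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# A made-up page standing in for a real site: two links and one login form.
my $html = <<'HTML';
<html><body>
<a href="notices.html">Noticeboard</a>
<a href="logout.html">Sign out</a>
<form name="Form1" action="login.cgi" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" name="Proceed1" value="Sign in">
</form>
</body></html>
HTML

# Write the page to a temporary directory so it can be fetched as file://
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/index.html";
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} $html;
close $out;

my $mech = WWW::Mechanize->new(noproxy => 1, autocheck => 1);
$mech->get(URI::file->new_abs($path));

# Every link on the page, as WWW::Mechanize::Link objects.
my @link_texts = map { $_->text } $mech->links;
print "Links: @link_texts\n";

# Look up a single link by its text, the same way follow_link does.
my $notices = $mech->find_link(text => 'Noticeboard');
print "Noticeboard points at: ", $notices->url, "\n";

# Every form on the page, as HTML::Form objects, and the named fields of each.
for my $form ($mech->forms) {
    my @names = grep { defined } map { $_->name } $form->inputs;
    print "Form '", ($form->attr('name') // ''), "' has fields: @names\n";
}
```

The same calls work unchanged on pages fetched over HTTP; the local file just keeps the example self-contained.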

Usage is pretty straightforward.  The code below logs in to a website, follows a link with the text 'Noticeboard' and prints that page as text to standard output, follows a link on that page with the text 'next' and prints that page too, and finally follows a link with the text 'Sign out' to sign out of the site.

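#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize;

my $url      = 'http://www.mysite.com';
my $username = 'myusername';
my $password = 'mypassword';

# noproxy => 1 stops Mechanize picking up proxy settings from the environment;
# autocheck => 1 makes any failed request die instead of failing silently.
my $mech = WWW::Mechanize->new(noproxy => 1, autocheck => 1);

# Fetch the login page.
$mech->get($url);

# Fill in and submit the login form, which is named 'Form1' on this site.
$mech->submit_form(
  form_name => 'Form1',
  fields    => {
    username => $username,
    password => $password,
    Proceed1 => 'Sign in'
  }
);

# Follow the 'Noticeboard' link and print the page as plain text.
$mech->follow_link(text => 'Noticeboard');
print $mech->content(format => 'text');

# Follow the 'next' link and print that page too.
$mech->follow_link(text => 'next');
print $mech->content(format => 'text');

# Sign out.
$mech->follow_link(text => 'Sign out');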

One issue I found with the above code is that when the WWW::Mechanize browser is redirected, it doesn't update the Referer header.  This can be a problem if, for example, you are redirected to another page while logging in to a site that checks that the referrer is a known site.  This issue has been raised here.  The developers are aware of it and I am hopeful that a fix will be implemented soon.

If you run into problems with WWW::Mechanize, the following two lines can prove useful for debugging:

$mech->add_handler("request_send", sub { shift->dump; return });
$mech->add_handler("response_done", sub { shift->dump; return });

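These make the module print each HTTP request it sends to the web server and each response it receives, including the headers and the first part of any body.  The bare return matters in the request_send handler: a handler in that phase that returns a response object stops LWP from sending the real request.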
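dump prints some of the body as well as the headers. If only the headers are wanted, a pair of handlers can print them directly. The sketch below is one way to do that; log_headers is a hypothetical helper name, and it is exercised against a throwaway local file:// page so it runs without a network connection.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw(tempdir);
use URI::file;
use WWW::Mechanize;

# Hypothetical helper: log the request line, status line and headers
# (but not the body) of every exchange to the given filehandle.
sub log_headers {
    my ($mech, $fh) = @_;

    $mech->add_handler(request_send => sub {
        my ($request) = @_;
        print {$fh} '>> ', $request->method, ' ', $request->uri, "\n",
                    $request->headers->as_string, "\n";
        return;    # returning a response here would skip the real request
    });

    $mech->add_handler(response_done => sub {
        my ($response) = @_;
        print {$fh} '<< ', $response->status_line, "\n",
                    $response->headers->as_string, "\n";
        return;
    });
}

# Try it against a throwaway local page so no network is needed.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/page.html";
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "<html><body><p>Hello</p></body></html>\n";
close $out;

# Collect the log in a string; pass \*STDERR instead to see it live.
my $log = '';
open my $log_fh, '>', \$log or die "Cannot open in-memory log: $!";

my $mech = WWW::Mechanize->new(noproxy => 1);
log_headers($mech, $log_fh);
$mech->get(URI::file->new_abs($path));
close $log_fh;

print $log;
```

Against a real site, call log_headers($mech, \*STDERR) right after creating the Mechanize object, in place of the two dump handlers.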

You can find more information on WWW::Mechanize on CPAN.  The Mechanize.pm page lists all the methods by type for WWW::Mechanize.

Wednesday 4 April 2012

YouTube video buffering issue on IPv6 enabled networks

First, some background.  I have IPv6 tunnels with Hurricane Electric on my home network and on my VPS.  The VPS uses the default 1480 byte MTU on its tunnel (1500 bytes minus the 20 byte IPv4 header), as it is directly connected to my provider's network with a 1500 byte MTU on the interface.  The home network was initially configured with the same default 1480 byte tunnel MTU, which seemed to work at first (I'm not sure why) but later stopped working for large packets.

I set the MTU on my ADSL router to 1488 bytes, as this is exactly the payload of 31 ATM cells (48 bytes each), which seems to be the most efficient MTU for ADSL connections that use ATM.  PPPoA with VC/MUX encapsulation, one of the more common configurations in the UK, adds a 2 byte header and an 8 byte trailer to the data, leaving 1478 bytes for the packet headers and payload.

The IPv6 tunnel carries IPv6 packets encapsulated in IPv4, and an IPv4 header is 20 bytes.  The maximum MTU for IPv6 packets is therefore 1478 - 20 = 1458 bytes.
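The arithmetic in the last two paragraphs can be checked with a few lines of Perl. This is a minimal sketch; the constant names and the atm_cells helper are my own labels for the overheads described above (2 byte PPPoA VC/MUX header, 8 byte AAL5 trailer, 20 byte IPv4 header).

```perl
#!/usr/bin/perl

use strict;
use warnings;
use POSIX qw(ceil);

use constant {
    ATM_CELL_PAYLOAD => 48,   # payload bytes in each 53 byte ATM cell
    PPPOA_VCMUX_HDR  => 2,    # PPP protocol field added by PPPoA VC/MUX
    AAL5_TRAILER     => 8,    # AAL5 trailer
    IPV4_HDR         => 20,   # outer IPv4 header added by the 6in4 tunnel
};

# Number of ATM cells needed to carry one IP packet of $len bytes over PPPoA VC/MUX.
sub atm_cells {
    my ($len) = @_;
    return ceil(($len + PPPOA_VCMUX_HDR + AAL5_TRAILER) / ATM_CELL_PAYLOAD);
}

my $atm_payload = 31 * ATM_CELL_PAYLOAD;                           # 1488
my $ip_mtu      = $atm_payload - PPPOA_VCMUX_HDR - AAL5_TRAILER;   # 1478
my $tunnel_mtu  = $ip_mtu - IPV4_HDR;                              # 1458

printf "31 cells carry %d bytes; largest IP packet %d; largest tunnelled IPv6 packet %d\n",
    $atm_payload, $ip_mtu, $tunnel_mtu;

# One byte more than the IP MTU spills into a 32nd, mostly padded, cell.
printf "%d byte packet: %d cells; %d byte packet: %d cells\n",
    $ip_mtu, atm_cells($ip_mtu), $ip_mtu + 1, atm_cells($ip_mtu + 1);
```

The second line of output shows why 1478 is the sweet spot: a 1478 byte packet fills exactly 31 cells, while a 1479 byte packet needs a 32nd cell that carries a single byte of data.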

Hurricane Electric allow some configuration of the tunnel MTU on their website, but unfortunately not an MTU of exactly 1458 bytes.  The nearest option below it is 1452 bytes.  After configuring the MTU at both ends and checking with ping6 that path MTU discovery correctly reported the 1452 byte MTU, I was happy that everything was working as it should.

Shortly after this, I noticed that YouTube videos would no longer buffer correctly.  A video would play for a few seconds, then pause for about 30 seconds while buffering, then play a few more seconds.  At first I put it down to a glitch, but as time went on I realised that YouTube worked fine over IPv4 and not over IPv6.  YouTube's website is IPv4 only, but the video streams come from CDN servers that are dual stack enabled.

I checked and rechecked my IPv6 tunnel using ping6 and http://test-ipv6.com/ to make sure it was working correctly.  As far as I can tell, it is.

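It seems (to me) that YouTube's CDN servers aren't doing path MTU discovery correctly or, at least, that path MTU discovery is failing, possibly because ICMPv6 is blocked somewhere along the network path.  IPv6 routers never fragment packets in transit, so path MTU discovery depends on ICMPv6 Packet Too Big messages getting back to the sender.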

A Google search turns up a number of people experiencing this buffering issue when connected to an IPv6 enabled network.  Very few posts give any clear information on how it might be fixed; the closest they get is a mention of changing the MTU value.

I have managed to fix the buffering issue now, I hope!  The answer is to use TCP MSS clamping.

All traffic on my home network passes through my Linux router, which does NAT for IPv4 traffic and pass-through for IPv6 traffic.  Adding the following to the firewall rules seems to fix the YouTube buffering issue:

ip6tables -A FORWARD -p tcp -t mangle --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1392

This clamps the MSS (Maximum Segment Size) option in TCP SYN and SYN/ACK packets forwarded through the router to 1392 bytes.  The MSS is the TCP payload size and is calculated by subtracting the 40 byte IPv6 header and the 20 byte TCP header (60 bytes in total) from the MTU, i.e. 1452 - 60 = 1392 bytes.
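The same calculation as a small Perl function, a sketch that assumes a plain IPv6 header with no extension headers and a TCP header with no options. It is evaluated for the configured 1452 byte tunnel MTU, the ideal 1458 bytes worked out earlier, and Hurricane Electric's 1480 byte default.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use constant {
    IPV6_HDR => 40,   # fixed IPv6 header, no extension headers
    TCP_HDR  => 20,   # TCP header without options
};

# MSS that lets a full-sized TCP segment fit in an IPv6 packet of $mtu bytes.
sub ipv6_mss {
    my ($mtu) = @_;
    return $mtu - IPV6_HDR - TCP_HDR;
}

for my $mtu (1452, 1458, 1480) {
    printf "tunnel MTU %d -> MSS %d\n", $mtu, ipv6_mss($mtu);
}
```

For the 1452 byte tunnel this gives the 1392 used in the ip6tables rule; had Hurricane Electric offered 1458, the value would have been 1398.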

It's unfortunate that this has to be done, but on the plus side, it should also work around path MTU discovery problems on TCP connections to other sites, not just YouTube.