The GeoIP database used for finding the location of listeners in the stats has been updated. This should reduce the number of ‘unknown’ listener locations in the stats area of the control panel.
We are still experiencing issues on one of our UK networks. This will be affecting uk1-pn (SHOUTcast) and possibly uk1-adj (AutoDJ). We do apologise for the problems over the last day or two, but the network providers are working very hard to get this resolved; they have been working on it all through the night since about 11pm last night.
I’ll post any updates here.
UPDATE 13:08: I can confirm that the issue is NOT affecting uk1-adj – only uk1-pn and one dedicated server customer who will be contacted shortly.
UPDATE 14:26: Annoyingly, I’ve just found out from a third party source that there will be another 90 minutes of downtime on the whole network, so this will affect uk1-adj as well. It’s not good that they are doing it at this time in the day, but it must have been the only option.
We have just received this message from one of our suppliers. This only affects customers on uk1-pn, uk1-adj and a small number of dedicated server customers.
We need to perform some emergency network maintenance as soon as possible. This has been scheduled for tonight, which we appreciate is very short notice. We need to reboot a router to install new software, and this reboot will take up to 45 minutes. We will do everything we can to speed up the process and reduce the maintenance time.
Date: 24/09/2009
Window: 23:00 for 2 hours
Duration: < 45 minutes.
The maintenance is to perform an emergency upgrade of Cisco software. We are using a Cisco VSS-1440 as part of our network core, and we have been experiencing some reduced performance with it today. This is not caused by anything in our network configuration or setup, and it started to have a detrimental effect on some clients today. We escalated this to the Cisco TAC team, who have diagnosed a fault with the software on the router in the form of a memory leak. Cisco has supplied us with a new version of the software for the router which will fix the memory leak and slow performance.
The nature of this problem is that it will escalate as time goes on, which is why we want to apply the fix as soon as core business hours finish today. Please accept our apologies for the short notice; we hope our clients appreciate that this problem was out of our control, caused by Cisco software, and that we are working as best we can to resolve it quickly.
We apologise for any inconvenience this may cause. Please do not hesitate to contact us if you have any queries or questions regarding this maintenance window.
UPDATE 01:56: It looks like the network is coming back up now. It took longer than expected, so I’ll post any updates in the morning.
We are looking into the issue affecting uk1-pn, which has just gone down. Updates will be posted here.
UPDATE 12:51: The datacentre staff are aware of an issue, so it sounds like a network problem. They are working to get this fixed asap.
UPDATE 12:54: Server is back online, downtime lasted about 10 minutes.
I’m not 100% certain, but there may be an issue with the USA premium network: a customer has reported issues with AutoDJ and I’ve heard a little buffering on another stream. There’s no packet loss and there isn’t any lag, but something is amiss.
I’ll contact the network people and keep you updated here.
UPDATE: I have a ticket open with the datacentre, who will look into this shortly. Please don’t open a support ticket, as it may not be answered right away if it is regarding this issue.
According to the Radiotoolbox stream test, the streams are perfect http://www.radiotoolbox.com/images/sbin/stream_test_graph.png?id=13119 — this does suggest that only certain routes are affected.
UPDATE3 22:09: Appears to be fixed
The issue seems to have cleared up at the moment; I’ll post here with any info as I get it.
UPDATE4 22:19:
The issue was with one of the carriers, XO – it seems to have calmed down now and they are looking into the problem. The problem was outside the control of us and the datacentre, as it was on an international link outside the network.
We will be performing maintenance which will affect your service. This will last about 30 seconds, and during this time, your server will be unreachable via the network.
Date: 23/09/2009
Window: 23:00 for 2 hours
Duration: < 2 minutes.
In order to minimise the time needed to move the servers to our new facility, the VLAN this subnet is configured on has been spanned across the two locations. The maintenance is to move your subnet onto the router at our new facility.
The subnet affected is: 87.117.208.0/24
Servers affected: uk1-pn.mixstream.net
Due to a problem with the server resetting a few times over the last week, we are scheduling an engineer to power down the server and check out the hardware. The scheduled downtime will be:
2:00AM, Tuesday 22nd September 2009.
10:00PM, Monday 21st September 2009.
We expect the downtime to be around 30 minutes, and once it is powered up, all AutoDJ streams on the server will restart automatically.
There was a network connectivity outage on 09/09/2009, which started at approximately 23:15, following the network maintenance window. The outage lasted approximately 6 minutes. According to the network admins, the issue appears to have resurfaced, although it doesn’t seem to be affecting any of our streams.
Date: 09/09/09
Time: 23:15
Duration: <10 minutes
Date: 10/09/09
Time: 12:45
Duration: Ongoing
We are currently experiencing problems on this network; this is being looked into.
UPDATE: All servers are back online now; downtime lasted about 20 minutes. If your stream is still offline then please try restarting it, and open a support ticket if that doesn’t work.
UPDATE2: Here’s a message from the provider that was affected.
“There was about 15 minutes of network downtime for servers in our Teb2 location. We currently have two datacenters located in the same building (Teb1 and Teb2). In our network today a provider dropped and caused a BGP flap. Our Teb1 datacenter had no issues re-building routes from this. However the routers in Teb2 did not handle this as quickly as Teb1. We will be adding in another router in Teb2 to ensure something like this can not happen again.”