Saturday, April 16, 2011

Looking into 3G - and why my skype failed

I spent a frustrating morning trying to upload/restore a friend's backup to his server and Skype my cousin at the same time. I am currently using the Safaricom 3G service as my primary connection but had to fall back to an alternate provider's broadband for this upload (I required a consistent, uninterrupted service for it). 3G has always served me well since I moved my lab - more like sold some of mine to use the work lab...

In the meantime, I started messing around with some tools to figure out this 3G 'issue' and the effect of large buffers, more out of curiosity than anything else. It (3G) really serves me well when it's working; that, and I was bored...

A few things to note:
- Today is a Saturday, so I expect more contention since the sites around here serve residential/home users. That means that with my large files, TCP is wreaking havoc as usual.

Buffers on all the network elements are shared and distributed among all clients, the radio controllers are shared and obviously we share the internet backhaul networks. That initial connection to the Radio is what I was curious about.

We have gone through cycles of high capacity at the edge, then at the core, then at the edge again. It used to be that dialup users in Kenya rarely, even cumulatively, filled an ISP's capacity. Newer technologies like DSL, frame relay and PPP multilink saved the consumer but moved the bottleneck to the core.

The internet has essentially a single method of signalling congestion: dropping packets. This is the only way a sender notices that 'hey, that packet never arrived' and does something about it. Windowing (TCP) is built around this mechanism. The other, far less widely used, mechanism is Explicit Congestion Notification (ECN). It's like a friend driving in the opposite direction telling you on your way to work, 'Hey, the road is flooded back there, use another route or don't go at all.'
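To make the "drop a packet, sender backs off" idea concrete, here is a minimal sketch of additive-increase/multiplicative-decrease, the rule TCP's congestion avoidance is built on. It is deliberately simplified (no slow start, no timeouts), and the function name, the round in which the loss occurs and the increase/decrease factors are my own illustrative assumptions, not anything measured on the 3G link.

```python
def aimd(rounds, loss_rounds, cwnd=1.0, increase=1.0, decrease=0.5):
    """Return the congestion window (in segments) after each round trip.

    Simplified model: one decision per round trip, no slow start.
    """
    history = []
    for r in range(rounds):
        if r in loss_rounds:
            # A drop was detected: back off multiplicatively.
            cwnd = max(1.0, cwnd * decrease)
        else:
            # No drop: probe for a little more capacity.
            cwnd += increase
        history.append(cwnd)
    return history


# Hypothetical run: a single packet drop in round 4.
print(aimd(8, loss_rounds={4}))
```

The window climbs steadily, halves when the drop is seen, then climbs again. If a big buffer swallows the packets instead of dropping them, that halving never happens and the sender keeps pushing.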

The best solution is always more capacity; however, you can only get so much with 3G/EDGE/GPRS. What most computers and home routers have nowadays is huge buffers. Buffers increase delay because you hold the packet longer, which means some packets get to their destination pretty much useless. It's like being stuck in a traffic jam past your doctor's appointment time: getting there late is useless. So the very solution you build in (a longer jam controlled by a traffic cop) tends to break the network more.

Remember, the internet and our networks rely on packets dropping to deal with congestion. Excessive buffering breaks that.
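A rough way to put numbers on "buffers increase delay": a packet arriving behind a full FIFO buffer waits about buffer size divided by link rate. The sketch below only does that arithmetic; the 64 KiB buffer and the three link rates are assumptions picked for illustration.

```python
def queue_delay_ms(buffer_bytes, link_bits_per_sec):
    """Worst-case wait (ms) for a packet that arrives behind a full FIFO buffer."""
    return buffer_bytes * 8 * 1000 / link_bits_per_sec


# Hypothetical: the same 64 KiB buffer in front of links of different speeds.
for rate in (128_000, 1_000_000, 10_000_000):
    print(f"{rate:>10} bit/s -> {queue_delay_ms(64 * 1024, rate):8.1f} ms")
```

The same buffer that is harmless on a fast link adds seconds of delay on a slow uplink, which is exactly the kind of delay an interactive call cannot tolerate.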

So back to 3G. Please note that most of what powers 3G and EDGE (actually, let's focus on 3G) was designed at a time when telecommunication networks didn't care much about data. So transmitting 1500 bytes as a single packet is pretty much impossible (i.e. the MTU on most of those systems is much, much lower). This calls for a lot of what TCP is known for - fragment, transmit, reorder and... buffering.

Unfortunately I decided on this article at a time when the 3G network seems to be okay. At least the RTTs are not as bad as earlier in the day.
C:\Documents and Settings\jgitau>ping 196.201.208.2

Pinging 196.201.208.2 with 32 bytes of data:

Reply from 196.201.208.2: bytes=32 time=84ms TTL=56
Reply from 196.201.208.2: bytes=32 time=104ms TTL=56
Reply from 196.201.208.2: bytes=32 time=83ms TTL=56
Reply from 196.201.208.2: bytes=32 time=111ms TTL=56

Ping statistics for 196.201.208.2:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 83ms, Maximum = 111ms, Average = 95ms


So what could possibly have been happening when things were not working out for me? When you are served by a busy RNC, you have to wait for some time to retransmit the damaged packets, or for the RNC to retransmit them back to you (TCP 101). Most of these are buffered awaiting completion (remember each packet is fragmented, then put together for onward transmission). Also remember TCP is end to end; however, on a 3G network the said 'end points' are actually multiple endpoints. You probably use up about 8 - 10 IP addresses for each connection - RNC to the core, SGSN to GGSN, GGSN to the internet etc. - and each of those elements has to bring up a session for you to transmit....

When congestion is not signalled, the buffers fill up because the endpoints never back off. The buffers stay full until the load lessens. Suddenly all of you 'clients' are suffering and complaining, but the RNC can't really do much for you now, can it?... So is buffering bad?

This whole thing becomes worse when you try tuning stuff and realize that the bandwidth for 3G is variable. I say pick an amount, let's say a conservative 128K, and tune your system with it if you are so inclined.
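If you do go down the tuning route, the usual starting point is the bandwidth-delay product: how many bytes must be in flight to keep the link busy. A small sketch, assuming "128K" means 128 kbit/s and taking the 95 ms average RTT from the ping output above; both are rough inputs, and the real figures on 3G move around a lot.

```python
def bdp_bytes(link_bits_per_sec, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return link_bits_per_sec / 8 * rtt_ms / 1000


# Assumed 128 kbit/s link, 95 ms average RTT from the ping above.
print(bdp_bytes(128_000, 95))
```

At these numbers the result is about one full-size packet, so a send window or buffer many times larger than that mostly buys extra queueing delay rather than extra throughput.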

I have no point here today other than to say that 3G networks are not easy to predict. The RNC is the first bit that actually deals with your packet and, more often than not, is going to be the first culprit when congestion occurs. Everything else from there on is able to handle larger packet sizes. Ehh, no wait, there's an SGSN just after that:-).... End-to-end QoS could help but I know of no one implementing it... I do however look forward to LTE and maybe a technology like HSUPA - what that does is reduce the number of buffers you have to deal with.

Sooo, tools I use frequently or would like to use more of - I put some of them here just so I remember where to find them....:-)
tstat
M-Lab has a set of tools
xplot
tcptrace
netalyzr and a sample output from my 3g connection

sample output:
Network buffer measurements (?): Uplink 3500 ms, Downlink 430 ms
We estimate your uplink as having 3500 msec of buffering. This is quite high, and you may experience substantial disruption to your network performance when performing interactive tasks such as web-surfing while simultaneously conducting large uploads. With such a buffer, real-time applications such as games or audio chat can work quite poorly when conducting large uploads at the same time.
We estimate your downlink as having 430 msec of buffering. This level may serve well for maximizing speed while minimizing the impact of large transfers on other traffic. 
 

Note the uplink buffer above. So obviously my Skype suffered whenever I uploaded the 'huge' files on one computer while Skyping on another.

Wednesday, April 13, 2011

why designing networks is cool!

When companies engage a network designer, be it in-house or a consultant, one of the most beautiful things is that post-implementation feeling; there's always a change, mostly for the better. The value is visible, the ROI immediate - well, almost.

Quite a number of design recommendations are left out and compromises are made; it's very engaging. Also, as with the last major design work I undertook, some companies do actually get into the design process with a clear understanding of the role they must play, what is required, the support they must accord and a willingness to let their networks be transformed by it (the process). I'll also add that having the right engineering team to push good decisions that look unnecessary to management is sometimes necessary.

There has to be a management solidly behind innovation, new technologies and new techniques of doing things. For instance, the choice between EIGRP/OSPF or IS-IS should really not start a debate with management; ditto anycast vs load balancers for some services. Let the guys decide and justify their design.

Good design also happens to be a single element in the overall system. It has to be supported by the business, it has to influence the business, it has to be fed by the business, and it has to fit into its culture and support its products.

In the end it's fun watching a good design get implemented; it's even better watching others work on it, change things, enhance it, grow it. It's very satisfying.

Tuesday, April 12, 2011

More exams!!

Several things are slowly taking shape. One is annoying:

I still can't find a credible Cisco learning partner to work with towards getting a CCSI.


It could however prove to be an interesting opportunity too...

Monday, April 11, 2011

8 June, 2011 - World IPv6 Day

I hate that this blog hasn't focused a lot more on IPv6. I take solace in the fact that mobile networks are not going to IPv6 soon (mainly out of ignorance, if you ask me). In fact I suspect they will have to be forced to use it, since no one will be thinking about it if the decision is left to the guys I see making current decisions in the telco space (imagine if Apple released an IPv6-only iPhone).

Mobile operators stand to benefit the most from IPv6, mainly from M2M applications/communications. Unfortunately, people so afraid of change are the ones in charge of moving us forward (from the regulator to the operators). Focus on mobile number portability has wasted lots of time. A few people saw it as the dead end it seems to be.

It's a clear case of the blind leading the sighted:-) I see it in the whole industry; there's a lot of talk in mailing lists about 'issues' but no action *Please read disclaimer below if you're about to rant*. Politics doesn't get work done.

It will be a consultant's field day:-) when IPv6 gets forced on the networks. Closer to home, we have some internet peering but don't have a single service on IPv6 (2c0f:fe38::/32): from the Cable and Wireless looking glass you'll find us represented:-) I would really like to have some IPv6 PDP contexts activated and an IPv6 DMZ, to test end-to-end mobile IPv6.

inet6.0: 5546 destinations, 31745 routes (5535 active, 0 holddown, 14 hidden)
+ = Active Route, - = Last Active, * = Both

2c0f:fe38::/32     *[BGP/170] 2w3d 09:10:07, MED 0, localpref 80
                      AS path: 6453 33771 I
                    > to 2001:5002:100:4::2 via ae0.1404

* So yes, our network is IPv6 ready and we can definitely provide IPv6 connectivity, but again we haven't really tested any service - yet - and you won't have many places to 'go' to that are IPv6 enabled. I however wish you'd begin testing. Believe me, you'll save money in the near future.

We haven't progressed the IPv6 initiative as much as we should have in Kenya either, though the network guys seem ready. The local exchange point has a bunch of us peering over IPv6, but as yet we have no applications running on it - apart from DNS and, hmm, I wonder if the Google Global Cache reachable through KIXP is IPv6 enabled.


Tracing to ipv6.google.com uses our international link, so I guess not, or I used the wrong FQDN.

Primary#traceroute ipv6 ipv6.google.com
Type escape sequence to abort.
Tracing the route to 2A00:1450:8002::93

  1 2001:5A0:C00:100::35 [AS 6453] 292 msec
    2001:5A0:C00:100::15 224 msec
    2001:5A0:C00:100::35 248 msec
  2 2001:5A0:2A00:100::1 [AS 6453] 180 msec 180 msec 180 msec
  3 2001:5A0:2000:400::2 [AS 6453] 188 msec 188 msec 184 msec
  4 2A01:3E0:FFF0:400::D [AS 6453] 188 msec 188 msec 188 msec
  5 2A01:3E0:FF80:100::9 [AS 6453] 200 msec 196 msec 196 msec
  6 2A01:3E0:FF20::3A [AS 6453] 196 msec 220 msec 196 msec
  7 2001:7F8::3B41:0:1 [AS 6453] 200 msec 228 msec 200 msec
  8 2001:4860::1:0:10 [AS 6453] 228 msec 200 msec 200 msec
  9 2001:4860::1:0:8 [AS 6453] 208 msec 208 msec 204 msec
 10 2001:4860::8:0:2AC3 [AS 6453] 212 msec 212 msec 212 msec
 11 2001:4860::2:0:87D [AS 6453] 212 msec 208 msec 220 msec
 12 2001:4860:0:1::25 [AS 6453] 216 msec
    2001:4860:0:1::23 212 msec
    2001:4860:0:1::25 220 msec
 13 2A00:1450:8002::93 [AS 6453] 208 msec 212 msec 208 msec


I hope and wish to have a full IPv6 DMZ (DNS, SMTP, NTP, POP, WWW, WAP, looking glass etc.) by IPv6 Day.

So... scoot over to the ISC. It's important to note here that, whether we like it or not, Facebook, Google, Yahoo, Cisco, Akamai Technologies, Limelight Networks, W3C, Bing (Microsoft), Tom's Hardware, Rackspace, Verizon and Juniper, among others, have committed to participating in the experiment (Wikipedia). We will all participate if our users visit sites affiliated with the networks above, so we might as well do something about our infrastructure.

What are you doing about it?

I am not directly responsible for this infrastructure at work anymore but I'll definitely make a concerted effort to ensure our customers don't get caught off guard. And now I'm sleepy:-)

Sunday, April 10, 2011

Software-Defined Networking (SDN) and other things I'm catching up on

Sundays tend to find me at home just hanging out with friends. Today was extra great: I did just that, with a bonus. I've met someone new (to me) who might very well join my 'circle of trust'.

We (who all happened to be CCIEs) - note Kenya has 7 or 8 CCIEs, so getting more than 3 together is always quite interesting - basically threw ideas around: the current networking trends, opportunities, where we are, what we are, who we are, how things are done here vs how they happen elsewhere, whether there's opportunity to do better than others, etc.... Well, obviously this paragraph has nothing to do with SDN...

SDN (software-defined networking) is a change in the way networks are run and managed, and it is being promoted by a non-profit, the Open Networking Foundation.

It's based on OpenFlow, a relatively new protocol, and it's supported by some of the biggest users and buyers of networking equipment. Looking at the list of members this evening tells me that this will be a definite game changer in the future.

Soon I hope to get to test the protocol. Indigo have a list of supported hardware. The opengear sounds like something I might just have. If I get at least two, we'll give it a test drive. Either way, the idea of commoditizing networking gear is very appealing.

Anyhow, here's a list of places to check on OpenFlow:
  1. this podcast here is a good start
  2. the OpenFlow networking website
  3. Ivan's analysis of the same
  4. on Network World
  5. a company actually making, and hoping to (and I believe it will) sell, the switches
  6. and another one
Also, there's a Linux Software Reference System which lets you run OpenFlow on a Linux PC with multiple NICs. Expect something on OpenFlow here at some point in the future. When working with SMEs, I expect cheap networking gear like this to feature prominently. Mikrotik is so far my favorite; we'll see how OpenFlow and SDN fare.

*Other areas I'm trying to catch up on:-
  • IOS-XR - on CRS-1's
  • NX-OS - this one will be tricky. Rumor has it that our new data center (an area I'm weak in) will be running a couple of Nexus. I might have to make new alliances to get a hold of some switches running NX-OS. I am totally clueless on this and can't wait to just power one up.
  • LTE - I just ordered three books on LTE (Safari doesn't have much on this). So in a month's time I'll be focusing on it. I might very well move to the section dealing with LTE at work, if only to get a grasp of what the vendors are doing. The base-level knowledge will have to be read up on, though.

Thursday, April 7, 2011

cisco CSG/GGSN + more DNS applications

So once upon a short time back I worked on the mobile core (I still insist that's the best team to work with - ever). The interface between the SGSN and the GGSN is called the Gn interface. (I promise to do more posts on the mobile core.)

Now our GGSNs were actually a couple of Cisco MWAM blades hosting several GGSNs each.

We used DNS for load balancing traffic from the SGSN to the GGSN. Here's a brief on how it works:

During session setup, also known as PDP context activation, the SGSN is supposed to set up a tunnel (GTP tunnel) with a GGSN for each session. The user uses an 'APN' like 'safaricom', which defines the service a customer is allowed to access. Each GGSN is configured to allow a specific APN access; the SGSN looks at the APN, queries a DNS server, resolves the APN to a GGSN's IP address and creates the tunnel.

DNS is therefore used to decide which GGSN to channel the request to. Having the APN resolve to multiple GGSN addresses means you get to load balance the traffic among the GGSNs - round robin. It was not perfect, but it was cheaper than trying to get load balancers in.
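The round-robin idea can be sketched in a few lines of Python. This is a toy model, not how any SGSN or DNS server is actually implemented: the class name, the APN and the GGSN addresses (taken from the 192.0.2.0/24 documentation range) are all made up for illustration, and a real DNS server returns a rotated list of records rather than a single address.

```python
from itertools import cycle


class RoundRobinResolver:
    """Toy stand-in for a DNS server that rotates through the records for a name."""

    def __init__(self, records):
        # One endless rotation per APN name.
        self._rotations = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, apn):
        """Return the next GGSN address for this APN."""
        return next(self._rotations[apn])


# Hypothetical APN mapped to three hypothetical GGSN addresses.
dns = RoundRobinResolver({"safaricom": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})

# Six PDP context activations in a row: each one tunnels to the next GGSN.
tunnels = [dns.resolve("safaricom") for _ in range(6)]
print(tunnels)
```

The weakness is visible in the sketch too: the rotation knows nothing about how loaded a GGSN is or whether it is even up, which is part of why it "was not perfect".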

If you'd like to know more about how the mobile core works and how it's all put together, drop a comment.... The design considerations are definitely way more than trying to slap together a document for a pure IP core....

The Cisco toolbar, bad strategies?? - good riddance

It was a really happy happy happy surprise, and it will be a happy happy happy happy haaapppy day come 15th April if indeed the damn Cisco toolbar jingmathingie goes away. I hate it: hate the way it takes up my notebook's screen space, hate the way it's rendered (I use a Linux desktop), hate the way it moves around as I try to focus on a sentence or a word my brain can't process... And just in case you think I'm alone, just have a look at the number of bloggers waiting with bated breath:
http://blogs.cisco.com/webexperience/cisco-com-toolbar-update/
http://etherealmind.com/cisco-website-sucks-part-2/

While at it, they removed something else that really bothered me:
http://blogs.cisco.com/webexperience/death-ofby-toolbar/

Now if only they could sort out the fact that a simulator for educational purposes (especially XR) works for them, not against them... Chambers already accepted here that their strategy was wrong before.... Not letting guys have access to IOS for educational reasons is a bit medieval; so is hiding simulators in San Jose....

Wednesday, April 6, 2011

Cisco 7600-ES+20G3C

I'm not sure how to word this post.

The Cisco 7600-ES+20G3C modules running on c7600rsp72043-advipservicesk9-mz.122-33.SRD4.bin have been misbehaving on me. Here's a short list of issues I've had:

One started spewing out the following:
Feb 10 13:37:11.536: %C7600_ES-DFC9-5-BRIDGE_ASIC_INTR: The Bridge-ASIC-AR[0] interrupt asserted. Addr[0x0200]=0x00000004
Feb 10 13:37:11.544: %C7600_ES-DFC9-5-BRIDGE_ASIC_INTR: The Bridge-ASIC-AR[0] interrupt asserted. Addr[0x0200]=0x00000004
Feb 10 13:37:13.524: %C7600_ES-DFC9-5-BRIDGE_ASIC_INTR: The Bridge-ASIC-AR[0] interrupt asserted. Addr[0x0200]=0x00000004
Feb 10 13:37:13.532: %C7600_ES-DFC9-5-BRIDGE_ASIC_INTR: The Bridge-ASIC-AR[0] interrupt asserted. Addr[0x0200]=0x00000004
Feb 10 13:37:13.544: %C7600_ES-DFC9-5-BRIDGE_ASIC_INTR: The Bridge-ASIC-AR[0] interrupt asserted. Addr[0x0200]=0x00000004
Feb 10 13:37:13.552: %C7600_ES-DFC9-5-BRIDGE_ASIC_INTR: The Bridge-ASIC-AR[0] interrupt asserted. Addr[0x0200]=0x00000004
I'm talking a gagomoth of informational lines. The alarms were not service affecting, but our syslog server was obviously not amused. Note this only happened on one 7609-S (out of more than 20). We ended up swapping the module, which cooled things off, figuring out along the way that it was due to a bug - CSCtc16746 (that only affected that node:-)) - weird.

- One 20-port module somehow lost functionality on half the ports. I assume there's a chip that controls that half that just conked out.
- Another one just 'died', dead dead dead.. no light, nothing... It had been working fine; unfortunately that was just before we installed an external syslog server, so I'm clueless on what happened.

(all modules were replaced by cisco in time and we keep spares).

I'm just trying to figure out if I'm the only one going through some wacky do's with the ES+20 modules.

Other than that, some of the QoS features we have implemented would probably never be possible on other modules... so I still love them:).... Moral of the story, if any: pay for support, specifically support that replaces modules for you within the shortest time possible.

*PS if you have SAMI blades running STP or GGSN or CSG, let me know how that's working out for you too...

Live Packet Capture in Wireshark With GNS3


This is cool for the guys using GNS3 who need to have a look at traces....

More on Anycast and DNS

I'll take you back some, to 2002. I'd just got my very first job, a semblance of freedom (I'm not too sure it wasn't slavery of sorts now:-)), a chance to spread my 'wings'. Someone had trusted me with their customers as a technical analyst/systems admin/power dude/network admin/billing admin and some other things in between. I was ready to conquer the world..... I was young, full of energy and fresh from ditching campus for part money, part frustration, part defiance, some bzzzzz word abbreviated as ADD and some excitement..... Oh, those were happy days.


Elsewhere on the internet:
On October 21, 2002, something a bit bigger, targeted at a larger audience, happened:
An attack was launched at all 13 root servers, aiming at disabling the internet itself. The closest we had got to a catastrophe before that was in April 1997, when 7 root servers went offline for technical reasons.

The role of anycast addressing in all this cannot be overstated. Anycast ensured that a total outage never occurred. It continues to do so for DNS, and it can do the same for your organization's services.

Anycast is simply the use of routing/addressing policy so that several geographically dispersed targets "listen" for a service on the same address, and a given source reaches just one member of that receiver group.

So the same IP address space is used to address each of the listeners. Layer 3 routing dynamically handles the calculation and transmission of packets from our source (in our case a DNS client) to its most appropriate target (DNS server). So if I try to resolve cnn.com, the root server at KIXP, as opposed to the one in New York, will respond, which essentially means an outage at the one in New York will not affect us.
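A toy model of that selection, in Python. Real routers pick a path with BGP/IGP rules that are far richer than a single number, so treat the "cost" field, the site names and the function name as illustrative assumptions only.

```python
def best_site(routes):
    """Pick the announcement with the lowest routing cost, as a router would."""
    return min(routes, key=lambda r: r["cost"])["site"]


# Hypothetical view from a Nairobi client: the same prefix is announced from two sites.
routes = [
    {"site": "KIXP", "cost": 2},
    {"site": "New York", "cost": 9},
]
print(best_site(routes))  # the nearby instance answers

# If the nearby instance withdraws its route, traffic simply moves to the next best one.
routes = [r for r in routes if r["site"] != "KIXP"]
print(best_site(routes))
```

The client never changes the address it talks to; only the routing table changes underneath it.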

One of the other significant uses for anycast in the IPv6 arena is the Anycast Prefix for 6to4 Relay Routers.

It has a simple operational model:
6to4 Assigns a block of IPv6 address space to any host or network that has a global IPv4 address.
6to4 Encapsulates IPv6 packets inside IPv4 packets for transmission over an IPv4 network using 6in4.
then
6to4 Routes traffic between 6to4 and "native" IPv6 networks.

It's supposed to be a transitional mechanism. I haven't tested it, but there is a list of relay routers that's constantly updated. Today everyone using 6to4 should set their default router to 2002:c058:6301::, which is a special magic anycast address for the nearest (in BGP terms) relay router.
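The "magic" address is less magic once you see how 6to4 builds prefixes: the 32 bits of the IPv4 address are placed right after 2002::/16, giving a /48. The relay anycast IPv4 address defined in RFC 3068 is 192.88.99.1, which maps to 2002:c058:6301::. A short sketch using Python's standard ipaddress module (the function name is mine; 192.0.2.4 is just a documentation address used as a second example):

```python
import ipaddress


def sixto4_prefix(ipv4):
    """Return the 2002:V4ADDR::/48 prefix that 6to4 derives from an IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 2002 in the top 16 bits, then the 32-bit IPv4 address, then zeros.
    base = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((base, 48))


print(sixto4_prefix("192.88.99.1"))  # the relay anycast address
print(sixto4_prefix("192.0.2.4"))    # a documentation address, for comparison
```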

So now working on the same premise, that anycast can help you distribute a service, a network designer can use anycast by either using IANA reserved addresses or apportioning a part of his/her address space for anycast addressing and distributing applications. Include it in your 'toolbox' for your next design.

Practical uses for anycast in our environment:

1: Depending on how a bank has done their network, geographically distributed ATMs could use it. Well, that means banks with newer networks that are open to new ideas; old banks have really rigid old 'unchangeable' systems (because of policy, of course). Working on a bank's network is both fun and annoying.
- Healthcare networks are more open, and something like patient records can be distributed using anycast; the same goes for student records in schools.

2: We have a lot of customers using 3G/EDGE/UMTS. They can be configured to send whatever data they collect to the closest server.
- For instance, each Nakumatt or shoe store, or a fleet manager tracking his trucks, can post truck data, sales, collections or whatever data over a distributed network to a local server that synchronizes with the main database at the back end.
- What this ensures is that customers, especially in large retail stores, are always served without suffering WAN delays (that time you wait as the guy scans your goods, waits for the price to pop up, tallies it up, grr).
 - This obviously depends on the operator's network, and in extremely good/lucky cases, if direct tunnel is employed, you can get services like DNS, WAP gateways and WWW servers as close to the GGSNs as possible.

3: Static databases for corporate use can be installed on all POPs, saving you a lot of WAN capacity and improving user experience. Same as above, but directed at where I work:-)

4: Print servers, mail (SMTP) servers, SMPP, WAP gateways, WWW servers etc. You can basically have a distributed DMZ, give the same IP or DNS name to each customer, and they would never face some of the issues I've seen around. A service like Skiza will be greatly enhanced by this (remember one of the advantages is load balancing, and this negates the need for load balancers).

5: In a telco, you can have the Ga/Gy/Gi interfaces on the GGSN/SGSN as close to the users as possible. Actually, so would the Gr interface. You can shorten the hop count for signalling and save some milliseconds, which count for a lot in a mobile environment. Services like GRX pretty much use a similar model.

Drawbacks:
- Complexity
- Expensive
- Difficult to manage and troubleshoot
- Monitoring it is a pain

Benefits:
- It works. DNS is a good example; Akamai, Google and a bunch of other large networks use it.
- Reliable
- Load balances your traffic/internet traffic. Google's installation of a caching server at the Kenyan exchange point will save us all a lot of expensive bandwidth.
- Localizes DDoS and other issues, i.e. only a small user base gets affected.
- Clients only configure one IP regardless of where they are. Technically you can use a single WAP gateway, SMTP or DNS address and anycast takes care of the rest for you.
- Obviously you get increased availability.

http://www.youtube.com/watch?v=14zDAcOY2VM
http://www.cs.berkeley.edu/~karthik/research/papers/oasis/
http://www.faqs.org/rfcs/rfc3068.html
http://www.lacnic.net/documentos/lacnicxi/presentaciones/Google-LACNIC-final-short.pdf
google for anycast + bgp+dns or any other keyword....

Tuesday, April 5, 2011

Network management - Technologies that make it bearable

IP Anycast: Widely known for its role in DNS. Arguably some other technique would have come up to scale DNS to the levels it has reached. Anycast however has proved so resilient that it never made sense to change it.

DNS: Imagine having a network of several hundred nodes. You need DNS to give meaning to the IP addresses. For instance, in the middle of the night, just before you start pulling your hair out, a traceroute showing the interfaces, nodes and buildings traversed is more meaningful than a bunch of IPs showing you a path to a blackhole.

SMTP: for sending email from systems.

SMPP: for sending SMS from systems, in the same manner SMTP does for email.
Perl: for making sense of log files.
SNMP: we all use it, so I won't spend much time on it. I will however, in the end, show (while trying to understand it myself) how the MIBs and OIDs come together, the different versions etc.

The coming posts (next two weeks) will focus a bit on anycast, its other applications, a few analogies etc... Then we'll get on to DNS, a service so key in network management that seems to work for some but not for others: why people really don't use it when they should be using it, sample ways to automate zonefile generation etc.... and how it fits in with IPv6....

Friday, April 1, 2011

CCSI... I'm coming....

The kind of knowledge/education required to get to the higher levels of expertise I aspire to takes years to acquire. Sometimes even experience doesn't cut it. I finally figured/decided that adding training/teaching to regular work will probably work best - for me...!

How I think training works well:
A) I'll get to read so deeply and critically into the literature of the topics involved and their sub-topics that I become an 'expert', and I get to know I am one directly from the reaction of trainees. (Yes, more than during a CCIE, because now you get questions from more people for more than a cumulative 8-hour period:-))

B) Being able to approach any and all vendor provided literature, RFC's and other technical material with the ability to simultaneously maintain the thought that "this is such a load of crap" and "this is the best paper/rfc ever written!" with my mind all the while coming up with ideas to justify each position.

Now if only there was a CLP this side of the Sahara to sort out some pesky learning partner requirements for a CCSI!!!! Seriously, I expected it to be easier than I am finding it.... So if you know a Cisco learning partner in Kenya (hmm, does it actually matter?).. leave the name and number in the comments.... (yes, I obviously haven't looked around much)