kodgehopper's OSPF ECMP mini-HOWTO

rev. 1.2 edited by Pilot

Hi
Here's a mini-HOWTO on setting up load balancing on Zebra and OSPF. A little thank you to everyone who helped me get things working, even when I asked and did amazingly stupid things (and for not laughing too). So here goes:


This message documents the steps I've taken to get load balancing working with OSPF on a very simple network. The steps below are meant for people who, like myself, are relatively new to routing. Much of the information might seem unnecessary to more experienced users, so to any of you who complain, my official response is "bleh :P" :)) I'll try to make things as thorough as possible. Also, to anyone who knows what they're doing: please correct any omissions/errors :))

1. Getting Information

Most of the information I used to set up my load balancing network was obtained from the following sources:

2. What You'll Need

This would probably be more aptly titled "What I've Used". Load balancing can be done with slightly different setups, but my setup is known to work with the following:

The kernel needs to be compiled with multipath support. That can be found under "Networking Options" -> "TCP/IP Networking" -> "IP: advanced router" -> "IP: equal cost multipath".
Zebra needs to be compiled with the --enable-multipath=[number] option. If [number] == 0, then zebra will support as many paths as you have. If [number] is greater than or equal to 1, then zebra will support only that number of paths, so --enable-multipath=2 means zebra supports at most 2 paths.
iproute2 is a more advanced set of networking tools than the old ifconfig and route tools. It's needed here primarily because it lets you see when you have 2 routes to the same destination set over different interfaces. The basic "route" tool doesn't do this, so even when things are working, you might think they're not. The advanced routing howto covers the iproute basics nicely.
iptables is not necessary to get things working. I use it to check that my setup is working. Adding a few simple rules and monitoring the throughput on each interface lets me do this. The same can be done with ipchains. If you have some other way of testing your setup, you don't need this option.
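
The kernel requirement can be checked before anything is rebuilt. The sketch below looks for the multipath option in a kernel config file. It assumes the option is named CONFIG_IP_ROUTE_MULTIPATH and that you know where your kernel config lives; neither detail comes from this HOWTO, so treat both as assumptions to verify on your own box.

```shell
# Return 0 if the given kernel config file enables equal cost multipath.
# Assumption: the "IP: equal cost multipath" menu entry corresponds to
# CONFIG_IP_ROUTE_MULTIPATH. The config file path is supplied by the caller.
has_multipath() {
    config_file=$1
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 2
    fi
    grep -q '^CONFIG_IP_ROUTE_MULTIPATH=y' "$config_file"
}

# Example (the path is a guess, adjust it for your system):
#   has_multipath /usr/src/linux/.config && echo "kernel multipath: yes"
```

For zebra, the matching step is the configure option already mentioned above, for example ./configure --enable-multipath=2.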

3. Network Setup

My test network looks like the diagram below:

+----------+     +------+       +------+      +----------+
|          |     |      |-------|      |      |          |
| subnet 1 |-----| rt 1 |       | rt 2 |------| subnet 2 |
|          |     |      |-------|      |      |          |
+----------+     +------+       +------+      +----------+

All IP addresses are in the 146.141.0.0/16 range.

3.1 Interfaces And IP Addresses

The addresses below are the ones used in the configuration files in section 5:

Router   Interface   IP address        Connected to
rt1      eth0        146.141.27.1/24   subnet 1 (local network)
rt1      eth1        146.141.15.1/24   first shared net
rt1      eth2        146.141.16.1/24   second shared net
rt2      eth0        146.141.28.1/24   subnet 2 (local network)
rt2      eth1        146.141.15.2/24   first shared net
rt2      eth2        146.141.16.2/24   second shared net

4. Before We Get Into the Dirty Stuff

Linux does per-flow routing, not per-packet routing. So you'll need to start more than one flow (a flow is any traffic between one source IP and one destination IP) to see both links being used. Load balancing is done on outgoing traffic, that is, traffic that originates from your local subnet. Traffic that enters your subnet through the router is not load balanced. The setup works best when load balancing is used on both routers.
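
To make the per-flow idea concrete, here is a toy model in shell. It is NOT the algorithm the kernel uses; it only illustrates the property that matters here: the same source/destination pair always lands on the same path, so a single transfer can never be spread over both links. The function name and the checksum-based choice are made up for this illustration.

```shell
# Toy model of per-flow path selection. This is NOT how the kernel picks
# a next hop; it only shows that one (source, destination) pair always
# maps to the same path index.
# Usage: pick_path SRC_IP DST_IP NUMBER_OF_PATHS
pick_path() {
    # cksum prints "<crc> <byte count>" for data read from stdin.
    crc=$(printf '%s %s' "$1" "$2" | cksum | cut -d' ' -f1)
    echo $(( crc % $3 ))
}

# Example with hypothetical hosts on the two subnets:
#   pick_path 146.141.27.10 146.141.28.20 2
#   pick_path 146.141.27.11 146.141.28.20 2
```

Running the first example twice gives the same index both times. A different source host may or may not get the other index, which is also why two flows are not guaranteed to split evenly.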

5. Configuration Files

You'll need just 2 configuration files on each router, one for zebra and another for ospfd. Both my config files for the network setup above are listed in the following sections. These config files use no authentication and are meant to only get the basic network up and running. You will most likely want to add fancier options to your setup. NOTE: don't forget to create the /var/log/zebra directory if you don't have it yet.
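
Creating the log directory is a one-liner, sketched here as a small function so it can be run repeatedly without complaining. The optional argument is an addition for this example (handy for trying it somewhere harmless); without it the function uses the /var/log/zebra path from the note above.

```shell
# Create the directory that the "log file" lines in the configs point at.
# With no argument it uses /var/log/zebra; the optional argument is only
# there so the function can be tried on a scratch directory first.
ensure_log_dir() {
    dir=${1:-/var/log/zebra}
    [ -d "$dir" ] || mkdir -p "$dir"
}

# Example (needs root for the default location):
#   ensure_log_dir
```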

5.1 zebra.conf

5.1.1 zebra.conf on router 1

! -*- zebra -*-
hostname rt1
password zebra
enable password zebra

debug zebra events
debug zebra kernel

interface lo
	description loopback

interface eth0
	description local network .27.0/24
	ip address 146.141.27.1/24

interface eth1
	description interface 1 on shared net
	ip address 146.141.15.1/24

interface eth2
	description interface 2 on shared net
	ip address 146.141.16.1/24

log file /var/log/zebra/zebra.log

5.1.2 zebra.conf on router 2

! -*- zebra -*-
hostname rt2
password zebra
enable password zebra

! extra debugging info is always fun!

debug zebra events
debug zebra kernel

interface lo
	description loopback

interface eth0
	description local network .28.0/24
	ip address 146.141.28.1/24

interface eth1
	description interface 1 on shared net 
	ip address 146.141.15.2/24

interface eth2
	description interface 2 on shared net 
	ip address 146.141.16.2/24

log file /var/log/zebra/zebra.log
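
Addressing typos in zebra.conf are easy to make and annoying to find later. As a rough cross-check, the sketch below prints every interface/address pair from a zebra.conf-style file. The function name is made up, and it assumes the simple layout used in the listings above (an "interface" line followed by indented "ip address" lines).

```shell
# Print "interface address" for every "ip address" line in a
# zebra.conf-style file. Assumes each address line follows the
# "interface" line it belongs to, as in the listings above.
list_iface_addrs() {
    awk '
        $1 == "interface"             { iface = $2 }
        $1 == "ip" && $2 == "address" { print iface, $3 }
    ' "$1"
}

# Example (the path depends on where your zebra.conf lives):
#   list_iface_addrs zebra.conf
```

Run it against both routers' files: each shared network should show up once per router, with different host addresses.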

5.2 ospfd.conf

5.2.1 ospfd.conf on router 1

! -*- ospf -*-
hostname rt1-ospf
password zebra
enable password zebra

! no authentication used below. change this in your final setup!!!

interface eth0
	no ip ospf authentication-key

! set the costs to be the same for equal balancing (these are just sample nos.)

interface eth1
	ip ospf cost 20
	no ip ospf authentication-key

interface eth2
	ip ospf cost 20
	no ip ospf authentication-key

router ospf
	ospf router-id 0.0.0.1
	network 146.141.27.0/24 area 0
! set network info for each of the shared nets
	network 146.141.15.0/24 area 0
	network 146.141.16.0/24 area 0

log file /var/log/zebra/ospfd.log

5.2.2 ospfd.conf on router 2

! -*- ospf -*-
hostname rt2-ospf
password zebra
enable password zebra 

! we're not using authentication below. change this on your routers!!!

interface eth0
	no ip ospf authentication-key

! we set equal costs for eth1 and eth2. (these are just sample numbers)

interface eth1
	ip ospf cost 20
	no ip ospf authentication-key
interface eth2
	ip ospf cost 20
	no ip ospf authentication-key

router ospf
	ospf router-id 0.0.0.2 
! first the configuration for our local lan
	network 146.141.28.0/24 area 0
! info for the first shared network/subnet
	network 146.141.15.0/24 area 0
! info for the second shared network/subnet
	network 146.141.16.0/24 area 0

! duh
log file /root/zebraOSPFLog 

5.3 Quick Explanation

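The above configuration files shouldn't be too difficult to understand. The important bits are, firstly, that there's no authentication used in the above setup. In your final setup, you'll definitely want to change that. Secondly, eth1 and eth2 are given the same OSPF cost, so the two paths between the routers are equal and both can be used.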
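
Since the whole setup depends on the two shared links having the same cost, it can be worth checking the ospfd.conf files mechanically. The sketch below lists the cost configured for each interface and reports whether they all match. The function names are made up for this example, and it only understands the simple layout used in the listings above.

```shell
# Print "interface cost" for every "ip ospf cost" line in an
# ospfd.conf-style file. Lines starting with "!" are comments.
ospf_costs() {
    awk '
        $1 ~ /^!/ { next }
        $1 == "interface" { iface = $2 }
        $1 == "ip" && $2 == "ospf" && $3 == "cost" { print iface, $4 }
    ' "$1"
}

# Return 0 only if at least one cost is configured and all are equal.
costs_equal() {
    ospf_costs "$1" | awk '
        NR == 1  { c = $2 }
        $2 != c  { bad = 1 }
        END      { exit (NR == 0 || bad) }
    '
}

# Example:
#   ospf_costs ospfd.conf
#   costs_equal ospfd.conf && echo "costs match"
```

An interface with no cost line is simply not listed, so if eth1 or eth2 is missing from the output, that is worth investigating too.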

6. The Fun Stuff

Ok, now zebra and ospfd need to be started. First, we'll start zebra ONLY and make sure that things are as they should be. If they aren't, there's probably little point in continuing, in which case you need to go figure out why. There are lots of possible reasons, and I don't know most of them. On the other hand, if everything works as it should, we'll start ospfd and keep our fingers crossed.

6.1 Making A Fresh Start

To avoid any unnecessary problems, I like flushing my routing tables and IP addresses from my network interfaces. That way I make a clean start. This is not a necessary step, but I like doing it when trying something new. Obviously, this only needs to be done on the routers. Zebra initializes all the interfaces when we start it up, so there's no problem with getting rid of this information.
First flush your routing table:

# ip route flush scope global type unicast
If you run "ip route", there won't be anything to report now except directly connected routes. Next, clean up the addresses on your network interfaces. The following commands do that:
# ip addr flush eth0 
# ip addr flush eth1 
# ip addr flush eth2 
Your interface table should now look similar to the following:
# ip address

1: lo:  mtu 16436 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/32 scope global lo
2: eth0:  mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:04:75:9d:45:45 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc teql0 qlen 100
    link/ether 00:60:08:2d:94:a6 brd ff:ff:ff:ff:ff:ff
4: eth2:  mtu 1500 qdisc teql0 qlen 100
    link/ether 00:02:b3:5b:60:a2 brd ff:ff:ff:ff:ff:ff
Notice there are no IP addresses associated with interfaces eth0, eth1 and eth2. We're now ready to start.
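
The flush steps from this section can be wrapped in a small script. The sketch below only prints the commands by default so you can look them over first; setting RUN=1 makes it actually execute them. The RUN switch and the function names are inventions for this example, and running it for real needs root and will drop your connection if you're logged in over one of these interfaces.

```shell
# Print a command, or execute it when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Flush the routing table, then the addresses on each named interface.
# Usage: fresh_start eth0 eth1 eth2
fresh_start() {
    run ip route flush scope global type unicast
    for dev in "$@"; do
        run ip addr flush dev "$dev"
    done
}

# Dry run:   fresh_start eth0 eth1 eth2
# For real:  RUN=1 fresh_start eth0 eth1 eth2
```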

6.2 Starting zebra

Start zebra with:

# zebra -dl
Your interface table will now look similar to the list below. I've taken this information from router 1 in my setup above. Router 2's interfaces will obviously contain IP information relevant to it and the .28.0/24 subnet.
# ip addr

1: lo:  mtu 16436 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/32 scope global lo
2: eth0:  mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:04:75:9d:45:45 brd ff:ff:ff:ff:ff:ff
    inet 146.141.27.1/24 brd 146.141.27.255 scope global eth0
3: eth1:  mtu 1500 qdisc teql0 qlen 100
    link/ether 00:60:08:2d:94:a6 brd ff:ff:ff:ff:ff:ff
    inet 146.141.15.1/24 brd 146.141.15.255 scope global eth1
4: eth2:  mtu 1500 qdisc teql0 qlen 100
    link/ether 00:02:b3:5b:60:a2 brd ff:ff:ff:ff:ff:ff
    inet 146.141.16.1/24 brd 146.141.16.255 scope global eth2
Notice that zebra has configured IPs for all the interfaces. Now let's look at the routing table. Again, this is the output from router 1. Router 2 will have similar, but slightly different information.
# ip route ls
146.141.16.0/24 dev eth2  proto kernel  scope link  src 146.141.16.1 
146.141.15.0/24 dev eth1  proto kernel  scope link  src 146.141.15.1 
146.141.27.0/24 dev eth0  proto kernel  scope link  src 146.141.27.1 

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
146.141.16.0    0.0.0.0         255.255.255.0   U     0      0        0 eth2
146.141.15.0    0.0.0.0         255.255.255.0   U     0      0        0 eth1
146.141.27.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
Now that we've gotten this far, we can start ospfd.
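
Before moving on, the "did zebra give every interface an address?" check can be scripted instead of eyeballed. The sketch below reads "ip addr" output on standard input and prints any interface that has no inet line. The function name is made up, and it assumes the output layout shown in the listings above.

```shell
# Read "ip addr" output on stdin and print the name of every interface
# that has no "inet" line. No output means every interface has an address.
ifaces_without_inet() {
    awk '
        /^[0-9]+: / {
            # A numbered line starts a new interface; report the previous one.
            if (name != "" && !has_inet) print name
            name = $2
            sub(/:$/, "", name)
            has_inet = 0
            next
        }
        $1 == "inet" { has_inet = 1 }
        END { if (name != "" && !has_inet) print name }
    '
}

# Example:
#   ip addr | ifaces_without_inet
```

After the flush in section 6.1 it should list eth0, eth1 and eth2; after starting zebra it should print nothing.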

6.3 Starting ospfd

This part is less exciting. Once you're certain that everything's working well on both routers after starting just zebra, start ospfd with the following command:

# ospfd -d
Again, monitor your routing table. It'll remain the same for the first few seconds, but will then change to resemble the output below. Again, this output is from router 1, which now reflects router 2's routes. Router 2 will similarly contain router 1's routes.
# ip route
146.141.16.0/24 dev eth2  proto kernel  scope link  src 146.141.16.1 
146.141.15.0/24 dev eth1  proto kernel  scope link  src 146.141.15.1 
146.141.28.0/24  proto zebra  metric 30 
	nexthop via 146.141.15.2  dev eth1 weight 1
	nexthop via 146.141.16.2  dev eth2 weight 1
146.141.27.0/24 dev eth0  proto kernel  scope link  src 146.141.27.1 
Take note of 2 important things here.
  1. There are 2 next hops for 146.141.28.0/24, 1 through each interface on the shared networks/subnets.
  2. iproute2 explicitly recognizes the zebra protocol.
Contrast this with the output of "route -n" shown below.
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
146.141.16.0    0.0.0.0         255.255.255.0   U     0      0        0 eth2
146.141.15.0    0.0.0.0         255.255.255.0   U     0      0        0 eth1
146.141.27.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
146.141.28.0    146.141.15.2    255.255.255.0   U     0      0        0 eth1
Notice that the old "route" tool only shows 1 next hop for 146.141.28.0/24, even though there are really 2. This is why iproute2 is needed.

If you've gotten to this point, congratulations. If "ip route" is still showing only one next hop for 146.141.28.0/24, don't bother going any further since things won't work anyway. Possible fixes: double check that you have multipath support in the kernel and in zebra. Also, a problem I had was that, for some strange reason, even after recompiling the kernel and zebra with multipath support, one router worked fine but the other showed only one route instead of 2. This happened on the Red Hat 7.3 box. I don't know why it happened, and I don't have a proper fix for it. But I copied the zebra binary from the router running Red Hat 7.2 and everything worked fine after that.

Again, notice that router 1 now has access to the .28.0/24 subnet which router 2 is connected to. In other words, the routers are sharing routes, which means OSPF is working. Also very importantly, notice that the .28.0/24 subnet is accessible through BOTH the shared networks/subnets. The following lines (taken from the table above) show this:

146.141.28.0/24  proto zebra  metric 30 
	nexthop via 146.141.15.2  dev eth1 weight 1
	nexthop via 146.141.16.2  dev eth2 weight 1
If you've got this, then again, congratulations. You should now have a load balancing OSPF setup. Another reminder that this type of information should appear on both your routers.
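
The "are there really 2 next hops?" check is the one that matters most, so here is a sketch that counts them. It reads "ip route" output on standard input and prints how many next hops the given prefix has. The function name is made up, and it assumes the output layout shown above (an unindented route line, followed by indented "nexthop" lines for a multipath route).

```shell
# Read "ip route" output on stdin and print the number of next hops
# for the given prefix. Usage: ip route | count_nexthops 146.141.28.0/24
count_nexthops() {
    awk -v prefix="$1" '
        NF == 0 { next }
        {
            first = substr($0, 1, 1)
            if (first != " " && first != "\t") {
                # An unindented line starts a new route entry.
                in_route = ($1 == prefix)
                # A single-path route carries its gateway on this line.
                if (in_route && $2 == "via") n++
                next
            }
        }
        in_route && $1 == "nexthop" { n++ }
        END { print n + 0 }
    '
}
```

For the router 1 table shown above it should print 2. If it prints 1, go back to the possible fixes in section 6.3.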

7. Testing Your Setup

Ping packets just don't make for the best tests. Sure they're useful, I'm not denying that, but sometimes you just want to see something a little more meaty :)) This is where iptables fits in. On both routers, we're going to put in 2 rules that allow us to monitor the traffic on the eth1 and eth2 interfaces. So do the following:
First clean out all rules in all your tables. I just do it for 2 tables below.

# iptables -F
# iptables -t nat -F
Next, set up dummy rules that effectively act as nothing more than counters:
# iptables -A FORWARD -i eth1 -j ACCEPT
# iptables -A FORWARD -i eth2 -j ACCEPT
Now, on routers 1 and 2, you can constantly monitor the traffic passing through both the eth1 and eth2 interfaces with this command:
# watch -n 1 iptables -L -v
Look for the 2 rules just added, and on the same line you'll see the amount of bandwidth passed through the eth1 and eth2 interfaces.
Next, sit at some host on subnet 2 and transfer a file (preferably a large file that will take a while to download) from some machine on subnet 1. You should see traffic on one of the interfaces, but not the other. That's because you only have one active flow (a connection between 1 source IP and 1 destination IP). While the first file transfer is taking place, start a second transfer from a different machine on subnet 1 to the same machine on subnet 2. You'll see traffic on the second interface!!! That's your per-flow load balancing in action.

Also, if you kill one of the interfaces on the router with a command like:

# ip link set eth1 down
traffic will automatically resume on the other link after a little while!!! If you bring the link up again, however, it isn't used until another flow is started. I'm not sure if there's a workaround for this. The iptables counters can be reset with the "iptables -Z" command. And that's about it.
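
If you'd rather log the counters than stare at watch, the sketch below pulls the per-interface byte counts out of the iptables listing. It reads the output of "iptables -L FORWARD -v -n -x" on standard input; the function name is made up, and the column positions (bytes in column 2, target in column 3, input interface in column 6) are what that listing format is assumed to look like.

```shell
# Read "iptables -L FORWARD -v -n -x" output on stdin and print
# "in-interface bytes" for each ACCEPT rule that names an input interface.
forward_bytes() {
    awk '$1 ~ /^[0-9]+$/ && $3 == "ACCEPT" && $6 != "*" { print $6, $2 }'
}

# Example:
#   iptables -L FORWARD -v -n -x | forward_bytes
```

With one flow running, only one of the two counters should grow between runs; with two flows from different machines, both may.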

Hope this is useful to someone out there. If it is, let me know :))
later

kodgehopper@irc.freenode.net#zebra