epairs and duplicate address (DAD) warnings.

After the initial step of determining the set of commands needed to bring up a VIMAGE/jail network (described here), I started reorganizing them to more closely mimic the order in which these commands would be called by a Mininet script. The typical order of operations and their corresponding commands are roughly:

  1. Instantiate a topology (Topo) object: (no corresponding step)
  2. Add Switches and Hosts to Topo object: Start up some jails with bridges, if they are switches, and shells, if they’re hosts
  3. Interconnect the Switches and Hosts by adding Links to the Topo object: Create epairs, and move the interfaces to the jails
  4. Initialize the network with the Topo object as its topology: Bring the interfaces up, and if the jails represent switches, add the interface to the bridge

But while trying to recreate the same linear,2 network, I noticed the following messages in dmesg:

epair2b: DAD detected duplicate IPv6 address fe80:2::ff:70ff:fe00:40b: NS in/out/loopback=4/1/0, NA in=0
epair2b: DAD complete for fe80:2::ff:70ff:fe00:40b - duplicate found
epair2b: manual intervention required
epair2b: possible hardware address duplication detected, disable IPv6
epair3b: DAD detected duplicate IPv6 address fe80:2::ff:70ff:fe00:40b: NS in/out/loopback=4/1/0, NA in=0
epair3b: DAD complete for fe80:2::ff:70ff:fe00:40b - duplicate found
epair3b: manual intervention required
epair3b: possible hardware address duplication detected, disable IPv6

And a bit above, the following:

epair2a: Ethernet address: 02:ff:20:00:03:0a
epair2b: Ethernet address: 02:ff:70:00:04:0b
epair2a: link state changed to UP
epair2b: link state changed to UP
epair3a: Ethernet address: 02:ff:20:00:03:0a
epair3b: Ethernet address: 02:ff:70:00:04:0b
epair3a: link state changed to UP
epair3b: link state changed to UP

As the DAD (Duplicate Address Detection) warnings suggested, the same MAC (hardware) addresses were indeed being reused for the epair* interfaces being created.

Searching for the warnings eventually brought me to a thread describing the exact mechanics behind the issue – I had interleaved the steps for creating and moving the interfaces. The MAC addresses for epair* interfaces are generated from a globally tracked if_index counter. This value increases by one for each interface created (so +2 for each ifconfig epair create), and decreases by one for each interface destroyed or moved to a VIMAGE jail. The problem arises when creating epairs is interleaved with moving them to jails. In that case, the if_index:

  1. increases by two for the first epairs created
  2. drops back down by two when they are moved, and
  3. takes on the same values as for the first epairs when the next epairs are created

In fact, since the value of if_index is used directly in the 4th and 5th byte of the MAC address (first and last are hard-coded and the rest, set by other means), we can see in the dmesg output that the index values 3 and 4 are being reused repeatedly.
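The effect can be reproduced with a small model. This is only a sketch of the behavior described above, not the kernel's code: the counter is treated as a plain integer, the starting value of 2 is a hypothetical stand-in for whatever interfaces already exist on the host, and the second and third MAC bytes are simply copied from the dmesg output rather than derived.

```python
# Simplified model of if_index-based epair MACs (not the kernel implementation).
# Bytes 2-3 and the last byte per side are taken from the dmesg output above.
SIDE_BYTES = {"a": (0xFF, 0x20, 0x0A), "b": (0xFF, 0x70, 0x0B)}


def mac_from_index(if_index, side):
    """Place if_index in the 4th and 5th bytes of the address."""
    second, third, last = SIDE_BYTES[side]
    octets = (0x02, second, third, (if_index >> 8) & 0xFF, if_index & 0xFF, last)
    return ":".join("%02x" % o for o in octets)


def simulate(ops, start_index=2):
    """Replay 'create' / 'move' operations; return the (a, b) MACs per epair.

    'create' adds two interfaces (counter +2); 'move' sends both ends of
    one epair into jails (counter -2). start_index is an assumption.
    """
    index = start_index
    macs = []
    for op in ops:
        if op == "create":
            macs.append((mac_from_index(index + 1, "a"),
                         mac_from_index(index + 2, "b")))
            index += 2
        elif op == "move":
            index -= 2
        else:
            raise ValueError("unknown operation: %r" % op)
    return macs


# Create one epair, move it, create the next: the addresses repeat.
interleaved = simulate(["create", "move", "create", "move"])
# Create every epair first, then move them: the addresses stay distinct.
batched = simulate(["create", "create", "move", "move"])

for label, result in (("interleaved", interleaved), ("batched", batched)):
    print(label)
    for a_mac, b_mac in result:
        print("  a=%s b=%s" % (a_mac, b_mac))
```

In the interleaved run both epairs come out with the index values 3 and 4, as in the dmesg output, while the batched run moves on to 5 and 6 for the second epair.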

It also explains why I didn’t see this issue initially: I was creating all of the epairs at once and then moving them later on, which kept the changes in if_index monotonic. While I could reorganize the commands so that they follow Mininet’s conventions without making if_index fluctuate, I manually assigned unique addresses to each epair for the time being:

# ifconfig epair1a ether 02:ff:00:00:01:14    #from s1 (jid 1) to h1 (jid 4)
# ifconfig epair1b ether 02:ff:00:00:01:41    #from h1 (jid 4) to s1 (jid 1)
...

Of course, I’ll use a less manual approach for generating a unique MAC address in Mininet.

[update]: if_index is not guaranteed to monotonically increase, but the number in the interface name (the ‘n’ in epair[n]) does, so I decided to use that as a base for my unique MACs.
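A minimal sketch of that idea follows. The helper name epair_mac is hypothetical, and the byte layout (a 02:ff:00 prefix, the unit number in the 4th and 5th bytes, 0a or 0b for the side) is an assumption borrowed from the addresses shown above, not necessarily the scheme that ends up in Mininet.

```python
import re


def epair_mac(ifname):
    """Derive a locally administered MAC from an epair interface name.

    Hypothetical layout: 02:ff:00:<unit high>:<unit low>:<0a|0b>.
    """
    match = re.fullmatch(r"epair(\d+)([ab])", ifname)
    if match is None:
        raise ValueError("not an epair interface name: %r" % ifname)
    unit = int(match.group(1))
    if unit > 0xFFFF:
        raise ValueError("unit number does not fit in two bytes: %d" % unit)
    last = 0x0A if match.group(2) == "a" else 0x0B
    octets = (0x02, 0xFF, 0x00, unit >> 8, unit & 0xFF, last)
    return ":".join("%02x" % o for o in octets)


# Print the ifconfig commands that would assign the addresses.
for name in ("epair1a", "epair1b", "epair2a"):
    print("ifconfig %s ether %s" % (name, epair_mac(name)))
```

Because the address depends only on the interface name, it stays the same no matter when the epair is created or moved.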



Creating networks with VIMAGE jails and epairs.

This is part of a series of notes on the experimental process of getting Mininet to run on FreeBSD.

The first step is to identify the components and commands that are required to implement the basic features. For an emulator like Mininet, this would be 1) the ability to build custom network topologies, and 2) the ability to interact with the topology by sending traffic across it, and monitoring the traffic flowing through the network.

Custom topologies

Mininet allows users to build custom network topologies by interconnecting node and link Mininet objects. Here, jails with VIMAGE replace the mount and network namespaces used to implement the nodes, and epairs replace the veth virtual Ethernet pairs implementing the links.

This link provides clear instructions for getting VIMAGE up and running for a simple topology, making it a good place to start. At the time this post was written, VIMAGE isn’t enabled by default in the stable release (10.2), and requires a custom kernel.

Since the initial (and primary) focus at this time is on building custom topologies, the jails aren’t given their own directory trees, and their paths are set to /.

Handling traffic

In addition to being able to build out topologies, Mininet also allows users to interact with their networks with tools such as ping, traceroute, and tcpdump, which require creating raw sockets from within the jails. This can be enabled by setting security.jail.allow_raw_sockets to 1 on the host, or by passing allow.raw_sockets as a parameter to the jail utility when creating the jails.
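As a rough sketch of the second option, the parameter can be appended when the jail command line is assembled. The helper name jail_create_argv is hypothetical; it only builds the argument list and does not run anything.

```python
def jail_create_argv(name, raw_sockets=True):
    """Build (but do not run) a `jail -c` command line for a VIMAGE jail."""
    argv = ["jail", "-c", "vnet", "name=%s" % name, "path=/", "persist"]
    if raw_sockets:
        # Per-jail alternative to the host-wide sysctl.
        argv.append("allow.raw_sockets")
    return argv


print(" ".join(jail_create_argv("h1")))
print(" ".join(jail_create_argv("s1", raw_sockets=False)))
```

Setting it per jail keeps the choice next to the jail definition, whereas the sysctl has to be applied to the host beforehand.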

Finally, the jails that represent network nodes (e.g. switches and routers, as opposed to end hosts) need some mechanism to move traffic. In Mininet, this would typically be an OpenFlow-programmable software switch such as Open vSwitch or the CPqD software switch. Although the former is available in the ports collection, to reduce the number of moving parts, the if_bridge network device will be used for the time being to narrow down to the core set of commands needed to bring up a topology capable of carrying traffic.

Manual topology construction

The following steps and commands manually construct what Mininet calls a linear,2 topology:

s1---s2
|    |
h1   h2

where h1 and h2 represent hosts on the network, and s1 and s2, the network nodes (switches).

  1. Prepare the host. After enabling VIMAGE in the kernel:
    # kldload if_bridge
    # sysctl security.jail.allow_raw_sockets=1
  2. Create jails. Since allow_raw_sockets was set on the host, there is no need to pass allow.raw_sockets to jail.
    # jail -c vnet name=s1 jid=1 path=/ persist
    # jail -c vnet name=s2 jid=2 path=/ persist
    # jail -c vnet name=h1 jid=3 path=/ persist
    # jail -c vnet name=h2 jid=4 path=/ persist
    

    jls should now show your jails (jls -v will show you more, including the assigned names):

    # jls
       JID  IP Address      Hostname                      Path
         1  -                                             /
         2  -                                             /
         3  -                                             /
         4  -                                             /
  3. Create bridges in the ‘network node’ jails (JIDs 1 and 2)
    # jexec s1 ifconfig bridge1 create up
    # jexec s2 ifconfig bridge2 create up
  4. Create virtual Ethernet links (epairs) and interconnect the jails
    # ifconfig epair1 create      # s1 <-> h1
    # ifconfig epair2 create      # s2 <-> h2
    # ifconfig epair3 create      # s1 <-> s2
    # ifconfig epair1a vnet s1
    # ifconfig epair1b vnet h1
    # ifconfig epair2a vnet s2
    # ifconfig epair2b vnet h2
    # ifconfig epair3a vnet s1
    # ifconfig epair3b vnet s2
  5. Add epair interfaces to each bridge and bring them up
    # jexec s1 ifconfig bridge1 addm epair1a addm epair3a
    # jexec s1 ifconfig epair1a up
    # jexec s1 ifconfig epair3a up
    # jexec s2 ifconfig bridge2 addm epair2a addm epair3b
    # jexec s2 ifconfig epair2a up
    # jexec s2 ifconfig epair3b up
  6. Configure IP addresses for ‘host’ jail interfaces
    # jexec h1 ifconfig epair1b 10.0.0.1 up
    # jexec h2 ifconfig epair2b 10.0.0.2 up
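The steps above generalize to a linear topology of any length. The following is a sketch only: linear_commands is a hypothetical helper that returns the commands as strings rather than running them, it creates all epairs before moving any of them, and it adds bridge members one at a time instead of chaining addm.

```python
def linear_commands(n):
    """Return the shell commands for a linear,n topology as a list of strings."""
    switches = ["s%d" % i for i in range(1, n + 1)]
    hosts = ["h%d" % i for i in range(1, n + 1)]
    cmds = ["kldload if_bridge", "sysctl security.jail.allow_raw_sockets=1"]

    # Jails: switches first, then hosts, with sequential JIDs.
    for jid, name in enumerate(switches + hosts, 1):
        cmds.append("jail -c vnet name=%s jid=%d path=/ persist" % (name, jid))

    # One bridge per switch jail.
    for i, sw in enumerate(switches, 1):
        cmds.append("jexec %s ifconfig bridge%d create up" % (sw, i))

    # Links as (a-end jail, b-end jail): switch-host first, then switch-switch.
    links = list(zip(switches, hosts)) + list(zip(switches, switches[1:]))

    # Create every epair before moving any of them into a jail.
    for unit in range(1, len(links) + 1):
        cmds.append("ifconfig epair%d create" % unit)
    for unit, (a_jail, b_jail) in enumerate(links, 1):
        cmds.append("ifconfig epair%da vnet %s" % (unit, a_jail))
        cmds.append("ifconfig epair%db vnet %s" % (unit, b_jail))

    # Switch ends join the bridge; host ends get an address.
    for unit, ends in enumerate(links, 1):
        for jail, side in zip(ends, "ab"):
            ifname = "epair%d%s" % (unit, side)
            if jail in switches:
                bridge = "bridge%d" % (switches.index(jail) + 1)
                cmds.append("jexec %s ifconfig %s addm %s" % (jail, bridge, ifname))
                cmds.append("jexec %s ifconfig %s up" % (jail, ifname))
            else:
                addr = "10.0.0.%d" % (hosts.index(jail) + 1)
                cmds.append("jexec %s ifconfig %s %s up" % (jail, ifname, addr))
    return cmds


linear2 = linear_commands(2)
print("\n".join(linear2))
```

For n = 2 this yields the same jails, bridges, epairs and addresses as the manual steps, which makes it easy to compare the output against them line by line.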

Sanity-checking the topology

It should now be possible to ping from one host to another:

# jexec h1 ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2): 56 data bytes
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=0.046 ms
...
^C
--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.046/0.052/0.055/0.004 ms

It should also be possible to monitor the traffic passing through a network node (e.g. by running tcpdump) while the hosts are pinging one another.

Teardown

Once a topology is no longer needed, it should be torn down and the virtual links and jails destroyed.

  1. Remove epairs from jails and destroy them (destroying one end of an epair destroys both endpoints)
    # ifconfig epair1a -vnet s1
    # ifconfig epair2a -vnet s2
    # ifconfig epair3a -vnet s1
    # ifconfig epair1a destroy
    # ifconfig epair2a destroy
    # ifconfig epair3a destroy
  2. Destroy the bridges
    # jexec s1 ifconfig bridge1 destroy
    # jexec s2 ifconfig bridge2 destroy
  3. Destroy jails
    # jail -r s1
    # jail -r s2
    # jail -r h1
    # jail -r h2
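The teardown order can be sketched in the same style. teardown_commands is a hypothetical helper that only returns command strings. The example call covers all three epairs created during construction, with the 'a' end of each reclaimed from the jail it was moved into.

```python
def teardown_commands(a_end_jails, bridges, jails):
    """Return teardown commands as strings.

    a_end_jails: dict mapping epair unit number -> jail holding the 'a' end
    bridges:     list of (jail, bridge interface) pairs
    jails:       list of jail names to remove
    """
    cmds = []
    # Reclaim the 'a' end of each epair from its jail...
    for unit in sorted(a_end_jails):
        cmds.append("ifconfig epair%da -vnet %s" % (unit, a_end_jails[unit]))
    # ...then destroy it; the 'b' end goes away with it.
    for unit in sorted(a_end_jails):
        cmds.append("ifconfig epair%da destroy" % unit)
    for jail, bridge in bridges:
        cmds.append("jexec %s ifconfig %s destroy" % (jail, bridge))
    for jail in jails:
        cmds.append("jail -r %s" % jail)
    return cmds


teardown = teardown_commands(
    {1: "s1", 2: "s2", 3: "s1"},
    [("s1", "bridge1"), ("s2", "bridge2")],
    ["s1", "s2", "h1", "h2"],
)
print("\n".join(teardown))
```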

The idea is that the commands (and procedures) that have been identified here can be retrofitted into Mininet.

[to be continued]

A generic intro to switchwork.

This post introduces switchwork in a very general way; specifically, it’s aimed at people attempting to configure/administer their first switches. The takeaway from this post, hopefully, is a modus operandi for switch configuration, and a handful of keywords that might come in handy when searching for more information.

Background

I was asked the following question: “Given very little to begin with, how do I go about configuring/administering a switch? What is actually involved in switch work, and what are some keywords to look for?”

The scenario is this: You have just a basic knowledge of networking (e.g. what OSI layers are), and the Linux command line. The switch staring back at you is the first of the (many makes and models of) “serious” network devices you’re about to mess with. It is a layer 2/3 switch with a CLI, serial management interface, and at least a dozen ports. The switch has a wide range of features from STP to port mirroring and even a DHCP server, provided that you know the features exist and how to enable them. Which brings us to…

The Problem

You’ve found a manual, but it’s written under the assumption that the reader already knows a thing or two about administering networks, knows what needs to be done, and is just looking for the “how”. Manuals can be pretty useless until you have a knowledge base of what the options are and why they are there. Even when you have a high-level idea of what needs to be done, searching a manual can still be a pain when you aren’t aware of certain keywords or general procedures.

Bridging (some) gaps

After thinking a bit, I came up with this list of steps/notes on how to begin configuring a typical switch. Note, this answer was put together with the idea of exposing the various aspects of switchwork without focusing on just one make or model.

  1. Many switches don’t come with networked services enabled. You’ll likely have to connect to the switch using a serial (RS-232) cable and a client program supporting serial communication, such as HyperTerminal, Minicom, or C-Kermit. There is usually a labeled console port (RJ-45 or DB-9) on the front faceplate of a switch that you can hook a cable to.

  2. To actually connect via serial, expect to configure some settings on your client. Switches seem to favor:
    • baud rate: either 9600 or 115200 bps (e.g. if one garbles output, try the other). 9600 seems to be more popular.
    • carrier detection off
    • no parity bits
    • 8 data bits, 1 stop bit (8N1)

    You’ll likely have to specify a serial port, usually something like /dev/ttyS0 on Linux or /dev/cuau0 on FreeBSD. Using C-Kermit, the commands might amount to the following (for a Quanta LB9A), with the first line typed at your terminal prompt and the rest at the Kermit prompt:

        kermit -l /dev/ttyS0
        set carrier-watch off
        set baud 115200
        connect

  3. Some switches (e.g. some Netgear models) may not use a serial interface, but rather auto-configure themselves with a static IP. In this case, you can hook an Ethernet cable to your switch and, most likely, point a browser to its GUI.

  4. Once you’re connected, you’ll either have to enter a default username/password to get to, or be dropped straight into, the CLI. Sadly, little can be said about the CLI that is generic:
    • There are usually various modes for various administration tasks. The default is a non-privileged read-only mode that lets you see a limited set of system status and configurations. An ‘enabled’ mode with more privilege (similar to becoming root) may exist to allow you to configure a switch. From this enabled mode you can enter a configuration mode that actually lets you change things, such as enabling non-serial access methods like SSH or Telnet.
    • Considering the dozens, if not hundreds, of ports that these things can have, many CLIs (but not all) have a range syntax for configuring multiple ports at once. An example is the range keyword used by Cisco IOS and the firmware for NEC’s IP8800 series L2/L3 switches.
    • Often, configuration changes have to be explicitly saved with a command before exiting “enabled” mode. Some CLIs will give you a visual hint (like an exclamation point) indicating there are changes that need saving.

    For example, on the IP8800/S3640, logging in, turning multiple ports into trunk ports, saving the settings, and logging off involves the following steps:

        login: operator

        > enable
        # configure
        (config)# interface range gi 0/45-48
        (config-if-range)# switchport mode trunk
        !(config-if-range)# save
        (config-if-range)# exit
        (config)# exit
        # disable
        > exit

  5. Management-wise, many switches and routers can be thought of as a *nix box with a bunch of network interfaces. For example, Quanta’s LB9A, NEC’s IP8800-series, and Juniper’s MX-80 run firmware based on, or incorporating, Linux, NetBSD, and FreeBSD, respectively. Some may even give you the option to drop into a shell from the CLI or at boot up, or execute UNIX-flavored commands (the IP8800 even came with ed, a minimal text editor!).

  6. A part of configuration/maintenance will include updating or installing new firmware on (“flashing”) the switch. The procedures vary wildly with make and model. Some switches may be updated from the CLI (e.g. the IP8800, with its ‘ppupdate’ command), but some will require more invasive measures such as updating and configuring the boot loader. For example, enabling OpenFlow on the LB9A involves copying the new firmware to its CF card (via FTP, from its Linux shell) and pointing its boot loader (in this case, U-Boot) to the image location manually upon reboot.

  7. An improperly flashed/updated switch may potentially be rendered nonfunctional (“bricked”), so it is important to keep interruption of this process to an absolute minimum.
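For readers who would rather script the serial settings from step 2 than type them into a terminal program, they map onto POSIX termios flags roughly as follows. This is a sketch using Python’s standard termios module: apply_8n1 is a hypothetical helper that only transforms an attribute list, and the commented-out lines show one possible way to apply it to a real port, with /dev/ttyS0 as an assumed device path.

```python
import termios


def apply_8n1(attrs, baud=termios.B9600):
    """Return a copy of a termios attribute list set to 8N1, carrier ignored.

    attrs has the layout returned by termios.tcgetattr():
    [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    """
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    # Clear character size, parity and the two-stop-bit flag...
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    # ...then select 8 data bits, ignore carrier detect, enable the receiver.
    cflag |= termios.CS8 | termios.CLOCAL | termios.CREAD
    return [iflag, oflag, cflag, lflag, baud, baud, list(cc)]


# Applying it to a real port might look like this (not run here):
#   import os
#   fd = os.open("/dev/ttyS0", os.O_RDWR | os.O_NOCTTY)
#   new = apply_8n1(termios.tcgetattr(fd), termios.B115200)
#   termios.tcsetattr(fd, termios.TCSANOW, new)

# Example on a synthetic attribute list: 7 data bits, parity, two stop bits.
before = [0, 0, termios.CS7 | termios.PARENB | termios.CSTOPB, 0,
          termios.B9600, termios.B9600, []]
after = apply_8n1(before, termios.B115200)
print("8 data bits:", after[2] & termios.CSIZE == termios.CS8)
print("parity off:", not after[2] & termios.PARENB)
```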