Intel Packet of Death

Testing:
As described in my blog post, I experienced an issue with certain Intel ethernet controllers.  Here's how to see if your controllers are affected.

For this simplified test you'll need two machines (one to replay the packet and one to receive it), and both need to be on the same ethernet segment.  No routers or VLAN-aware switches should be in the mix (but dumb switches/hubs should be fine).
  1. On the replay machine install tcpreplay.
  2. Connect the receiving machine to the network and bring the interface up (IP address doesn't matter).
  3. Replay one (or all) of the packets attached to this post from the replay machine:

sudo tcpreplay -v -i [transmitting interface] [pcap name]

Example:

sudo tcpreplay -v -i eth1 pod-icmp-ping.pcap

If your controllers are affected, the ethernet interface will lose link.  In many circumstances the only way to get the controller working again is to physically power off the machine and power it back on.
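
One way to confirm the loss of link on the receiving machine is to read the kernel's carrier flag before and after the replay. This is a minimal sketch, assuming a Linux receiver; the helper name link_state, the SYSFS_NET override, and the interface name eth1 are illustrative and not part of the original test.

```shell
# Hypothetical helper: print "up" or "down" for an interface by reading the
# kernel's carrier flag from sysfs. SYSFS_NET may be overridden for testing.
link_state() {
    # The carrier file holds 1 when link is present and 0 when it is not.
    # Reading it fails if the interface is administratively down or missing,
    # in which case this helper also reports "down".
    if [ "$(cat "${SYSFS_NET:-/sys/class/net}/$1/carrier" 2>/dev/null)" = "1" ]; then
        echo up
    else
        echo down
    fi
}

# Example, run on the receiving machine (eth1 is an example name):
#   link_state eth1
# "ethtool eth1" reports the same state on its "Link detected:" line.
```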

NOTE: These packets are sent to the ethernet broadcast address (to simplify testing).  If you are affected by this issue, a single replay will take down every affected ethernet interface on the connected network.  If that is a concern, use tcpreplay-edit to set a specific destination ethernet address:

sudo tcpreplay-edit --enet-dmac=00:11:22:33:44:55 -v -i eth1 pod-icmp-ping.pcap

Where "00:11:22:33:44:55" is the MAC address of the machine you'd like to test.
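
On a Linux target the MAC address can be read from sysfs, and it is worth checking the value for typos before handing it to --enet-dmac. A minimal sketch follows; the helper name is_mac is hypothetical (it is not part of tcpreplay) and eth1 is an example interface name.

```shell
# Hypothetical helper: succeed only if $1 looks like a MAC address
# (six colon-separated pairs of hex digits).
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# On the machine under test, print the interface's MAC address:
#   cat /sys/class/net/eth1/address
# On the replay machine, check the value before using it:
#   is_mac 00:11:22:33:44:55 && echo "looks like a MAC address"
```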

Finding other examples (findpod):

Various people have reported similar (if not identical) behavior with other ethernet controllers and traffic types. If you're experiencing sporadic failures of your ethernet controller and you think they may be related to the network traffic you're receiving, I've created a tool called "findpod" that can help you narrow your search.  The script is "findpod.sh" and there is a download link below. If you're using a Debian-based system you can install it like so:

sudo bash ./findpod.sh install

The installer pulls in three software dependencies: ifplugd, screen, and tcpdump.  Run findpod like this:

sudo findpod <interface> start

Example:

sudo findpod eth1 start

This will start the ifplugd daemon.  Once link is detected on the provided interface, it will start an automatically rotating packet capture of up to 100MB in size (the limit can be changed in the script).  When the interface loses link, it will stop the packet capture and rename it to a meaningful file name.  You can then review this packet capture and find the last packets sent or received on the suspect interface.  Suggestions and comments are welcome!
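
Since the packets of interest are the ones closest to the loss of link, it can help to print only the tail of the capture. This is a rough sketch using tcpdump, which findpod already installs; the helper name last_packets and the file name capture.pcap are placeholders, since the actual file name is chosen by the script.

```shell
# Hypothetical helper: print the last N packets (default 20) of a capture.
# -nn skips name and port lookups, -e shows the ethernet header of each
# packet, and -r reads packets from a file instead of an interface.
last_packets() {
    tcpdump -nn -e -r "$1" 2>/dev/null | tail -n "${2:-20}"
}

# Example (capture.pcap is a placeholder file name):
#   last_packets capture.pcap 5
# Add -X to the tcpdump options to also dump each packet's contents in hex.
```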

Fixing:

As news of this issue spreads, reports are coming in that some controllers are affected and some aren't. That's more or less what I expected. Here's what I know about fixing this.

It has been my understanding that Intel provides at least two EEPROM versions for this chip: one with BMC enabled and one without. My controllers do not have BMC enabled, so my fix only applies to non-BMC-enabled controllers. This is unfortunate because the BMC-enabled controllers seem to be much more widely used. Even so, other than the very basics (MAC address and checksum), I don't know the meaning of these values. That's another reason not to reprogram the EEPROM on your NIC based on what some guy on the internet told you.

With that being said here is a diff between an affected EEPROM and a good EEPROM:

Offset Values

-0x0010: ff ff ff ff 6b 02 00 00 86 80 d3 10 ff ff 5a c0
+0x0010: 01 01 ff ff 6b 02 d3 10 d9 15 d3 10 ff ff 58 85
               
-0x0030: c9 6c 50 31 3e 07 0b 46 84 2d 40 01 00 f0 06 07
+0x0030: c9 6c 50 21 3e 07 0b 46 84 2d 40 01 00 f0 06 07

-0x0060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+0x0060: 20 01 00 40 16 13 ff ff ff ff ff ff ff ff ff ff

The "-" lines are from the bad (affected) EEPROM and the "+" lines are from the good EEPROM.

Under Linux you can view these values with ethtool:

sudo ethtool -e [interface]
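
To compare against the diff above without reading the whole dump, the output can be filtered down to the three offsets that differ. This is a minimal sketch, assuming the usual ethtool layout of one "0xNNNN:" offset per row; the helper name eeprom_rows and the interface name eth1 are illustrative. A matching row is only a hint, not proof that a controller is or isn't affected.

```shell
# Hypothetical helper: read "ethtool -e" output on stdin and print only the
# rows at offsets 0x0010, 0x0030 and 0x0060 (the rows shown in the diff).
eeprom_rows() {
    awk '$1 ~ /^0x00(10|30|60):$/'
}

# Example (eth1 is an example interface name):
#   sudo ethtool -e eth1 | eeprom_rows
```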

Attachments:

findpod.sh (2k), Kristian Kielhofner, Feb 20, 2013, 9:30 AM
pod-http-post.pcap (1k), Kristian Kielhofner, Feb 6, 2013, 8:46 AM
pod-icmp-ping.pcap (1k), Kristian Kielhofner, Feb 6, 2013, 8:46 AM