OS X 10.9 & 10.10 Bugs & Network ARP Issues
Update 2016: We wrote a guide explaining how to patch OS X for users colocating Mac mini servers running OS X 10.9 or 10.10.
Update 11/13/14: We have deployed an ARP Patch for OS X 10.9 and 10.10.
Several weeks ago, we obtained a working patch that corrects the ARP condition for 10.9.x and 10.10.x. It is our understanding that this patch will eventually be incorporated into a future version of OS X 10.10.x Yosemite. We have now deployed this patch hundreds of times with 100% success.
As a MacStadium subscriber, you can automatically apply this patch to your servers here at MacStadium by logging into your MacStadium Dashboard, clicking the Configuration tab for your server, and then clicking ARP Patch. The utility will validate your current version of OS X, connect to your server remotely, and apply the patch for you.
If you are currently running 10.7 or 10.8, you should apply this patch before attempting to upgrade to 10.9 or 10.10. After the patch is installed, reboot your server, and then start the upgrade to a newer version of OS X via the App Store.
Please ensure that, after you complete the upgrade, you re-enable the auto-power-on settings in Preferences > Energy within OS X. All new dedicated servers at MacStadium will now be provisioned with OS X 10.10 Yosemite by default unless otherwise requested. The patch will be preloaded as part of our default provisioning process.
As always, we do not recommend jumping on the 'latest OS X version' bandwagon, because of the issues that can come up during the upgrade process and the bugs in a new OS X version that you may fall victim to. If you decide to upgrade, we highly recommend first performing a Time Machine or similar backup of your current server onto a USB hard drive (which you can subscribe to from us). When it comes to OS X upgrades in a server environment, we like to suggest that the "if it ain't broke, then don't fix it" mentality is the safest route.
Update 7/1/14: OS X Mavericks 10.9.4 has been released; however, the ARP issue still exists.
Some users are reporting that it has gotten worse after the upgrade to 10.9.4 from 10.9.3.
Some good news - we have found another workaround that so far proves to be 100% effective. When 10.9 users are switched to an Apple USB 100Mb Ethernet adapter in place of the onboard Ethernet, the packet loss issue goes away.
It should be noted that the Apple Thunderbolt Gigabit adapters do not correct the problem as they appear to be the same chipset as the onboard Gigabit Ethernet port.
For Hosting Elite (Gigabit) subscribers, the 100Mb USB Ethernet adapters create a significant throughput bottleneck in comparison with the onboard Gigabit adapters. However, we are still seeing full 100Mb speeds for Hosting Pro customers.
To date, we have deployed dozens of these adapters for 10.9 users, with 100% successful results.
Update 6/15/14: There have been some updates out there related to the root cause of the OS X 10.9.x Mavericks Network ARP issues.
As tracked in further detail on an Apple Discussion board, Cisco has officially created a case surrounding this issue in recent weeks (many thanks to others out there fighting this issue with us). The official Cisco bug ID is CSCeg05955. Here are some snippets...
- Symptom: A client device may send unicast ARP requests to GLBP members, but doesn't get a response. The result may be an intermittent - perhaps periodic - inability of that client to transmit IP traffic.
- Conditions: GLBP is being used in load balancing mode for clients' default gateway.
- More Info: Unicast ARP is increasingly utilized by various clients, especially wireless ones, to determine whether they have reconnected to the same subnet. See for example RFC 4436 for a use case.
- Result: This bug will not be fixed by Cisco, as it is viewed as a client problem. Client devices should not invalidate their ARP entry before using the unicast ARP.
So the bottom line, as eloquently stated by others: "it's apparently Cisco vs. the world on whether a host's ARP entry should be invalidated before a Unicast ARP to determine if it's still connected to the last network it was connected to."
We can hope for better news in 10.9.4... or 10.10.x...
Update 5/15/14: 10.9.3 was released today, and unfortunately, there are no notable fixes to the network ARP issue.
Sorry - no good news here... Things are mostly quiet on the 10.9 front as long as users can deal with an occasional few packets being dropped for no good reason. It's pretty safe to say that most users who request an upgrade from 10.8 to 10.9 regret it shortly thereafter, and subsequently ask us to downgrade them. The downgrade is super simple if you have a USB HDD as an option on your Dedicated Server and are running Time Machine for hourly snapshots of your server - something we recommend for all users. As always, the upgrade and the downgrade are free at MacStadium because our remote hands services are free and all-inclusive 24x7.
Update 2/25/14: 10.9.2 was released today, and it looks like the network ARP issue may be corrected?
The 10.9.2 release initially looked very promising: the severity of the problem dropped from roughly 50-80% packet loss to occasional packet loss of roughly 1-3%. On the bright side, this makes a hosted Mac running 10.9.x mostly usable as long as you are not running mission-critical applications. It is still far from an acceptable solution for anybody who wants 100% uptime and 0% packet loss, which users have rightfully come to expect.
We will be moving forward with officially offering 10.9.x in our website checkout for users who require the Mavericks OS level for Xcode development or some other revision-specific reason, but we recommend that everybody else stay away from it. The bottom line is that 10.9 offers very little in terms of new features, and creates a ton of headaches and lost hours of productivity related to migration.
MacStadium is now also stocking HDMI GPU Enablers from Fit Headless. Anybody who is using an OS X 10.9 Mavericks Mac Dedicated Server here can request one of these instead of the DisplayPort adapter, which we have been using for years with OS X 10.6, 10.7, and 10.8. The HDMI adapter provides 1080p and 720p digital resolutions, vs. the plethora of analog resolutions provided by the DisplayPort adapter.
Update 12/20/13: 10.9.1 did not fix the network ARP issues noted below in the 10.9.0 release.
The rumor mill states that the 10.9.2 release (still in development) will fix the issue.
Update 11/15/13: MacStadium is offering a temporary fix for the 10.9.0 ARP issues.
We have allocated a dedicated IP network which bypasses the network issues in 10.9.0 noted below. Subscribers can request to have their existing IPs migrated to one of these new IPs by opening a ticket with our support group. There is no charge.
#1: The Network ARP Issue:
The Symptoms: The issue presents itself as significant packet loss that comes and goes every 30-60 seconds. This packet loss was not present on 10.7 or 10.8 running on the exact same Mac host. It can be seen by running a ping against your IP address, similar to this:
>ping 208.52.190.X
.....
Request timed out.
Request timed out.
Request timed out.
Request timed out.
...
Reply from 208.52.190.X: bytes=32 time=6ms TTL=63
Reply from 208.52.190.X: bytes=32 time=1ms TTL=63
Reply from 208.52.190.X: bytes=32 time=1ms TTL=63
Reply from 208.52.190.X: bytes=32 time=1ms TTL=63
....
Request timed out.
Request timed out.
Request timed out.
Request timed out.
...
Reply from 208.52.190.X: bytes=32 time=1ms TTL=63
Reply from 208.52.190.X: bytes=32 time=1ms TTL=63
Reply from 208.52.190.X: bytes=32 time=1ms TTL=63
Reply from 208.52.190.X: bytes=32 time=1ms TTL=63
....
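To put a number on the loss instead of eyeballing it, a saved ping log can be summarized with a small shell function. This is only a rough sketch: the function name summarize_ping_log and the file name ping.log are our own placeholders, and it assumes the log uses the same "Reply from" / "Request timed out" wording shown above.

```shell
# Summarize a saved ping log read from stdin.
# Assumes each result line starts with "Reply from" or "Request timed out".
# Prints: replies=<n> timeouts=<n> loss=<pct>%
summarize_ping_log() {
  awk '
    /^Reply from/        { replies++ }
    /^Request timed out/ { timeouts++ }
    END {
      total = replies + timeouts
      # Integer percentage of probes that timed out (0 if the log is empty)
      pct = (total > 0) ? int(timeouts * 100 / total) : 0
      printf "replies=%d timeouts=%d loss=%d%%\n", replies, timeouts, pct
    }'
}
```

Run it as summarize_ping_log < ping.log after saving the ping output to a file. A healthy host should report loss=0%.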
In the latest version of OS X Mavericks, it appears that Apple has implemented section 2.3.2.1 of RFC 1122, "ARP Cache Validation." In this scenario, it appears that Mavericks sends unicast ARP requests for the gateway and, if it does not receive a corresponding response, times out the gateway's ARP entry on the host, assuming the entry is no longer valid. Based on preliminary evidence, after 5 unsuccessful attempts at unicast ARP, OS X reverts to standard broadcast ARP, which succeeds. The result is a symptom that resembles a short period of unresponsiveness or packet loss.
In layman's terms, what is happening is that Mavericks is confused by the redundant network environment that your Mac host communicates through in order to access the Internet. This confusion happens because the IP gateway does not reside directly on Core Router #1 or #2; instead, it is virtualized across the two physical core routers.
This type of network infrastructure is implemented to protect hosts against the failure of a gateway router, or its removal for maintenance: if one router goes down, all hosts connected to the router cluster remain online. On Cisco equipment, this is performed using a protocol known as HSRP (Hot Standby Router Protocol).
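The unicast-then-broadcast behavior described above can be pictured with a toy model. This is not Apple's code. It is a sketch under our own assumptions: that the host tries a fixed number of unicast probes before falling back to broadcast, and that the fallback always succeeds. The function name arp_probe_sequence and both parameters are hypothetical.

```shell
# Toy model only (assumption, not OS X internals): print the probes a host
# would send before its gateway ARP entry is refreshed.
#   $1 = unicast_limit: number of unicast probes tried before falling back
#   $2 = gateway_answers_unicast: "yes" or "no"
arp_probe_sequence() {
  unicast_limit=$1
  gateway_answers_unicast=$2
  attempt=1
  while [ "$attempt" -le "$unicast_limit" ]; do
    if [ "$gateway_answers_unicast" = "yes" ]; then
      echo "unicast#$attempt:answered"
      return 0
    fi
    # Each unanswered unicast probe is time during which traffic stalls
    echo "unicast#$attempt:timeout"
    attempt=$((attempt + 1))
  done
  # Assumed fallback: the broadcast ARP request is answered
  echo "broadcast:answered"
}
```

In this model, arp_probe_sequence 5 no prints five timeouts before the broadcast succeeds, which matches the short outage we observe. With a limit of 0, the model goes straight to broadcast.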
Please note that we are not seeing 100% success with this fix/solution. We are still getting a few reports of Mavericks machines 'going away' intermittently on the network even after the patch is applied. We would, however, say that this is a 99% fix:
Permanently change the unicast ARP setting to 0 by running these commands in Terminal in OS X:
sudo su
touch /etc/sysctl.conf
echo net.link.ether.inet.arp_unicast_lim=0 >> /etc/sysctl.conf
chown root:wheel /etc/sysctl.conf
chmod 0644 /etc/sysctl.conf
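Note that the echo ... >> step above appends a new line every time it is run. If the commands may be run more than once on the same machine, a small helper like the following can avoid duplicate entries. This is a sketch of one possible approach, not part of our patch utility, and the function name ensure_line is our own.

```shell
# Append a line to a config file only if that exact line is not already present.
#   $1 = path to the file (created if missing)
#   $2 = the full line to ensure, e.g. net.link.ether.inet.arp_unicast_lim=0
ensure_line() {
  file=$1
  line=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a literal string, -q is silent
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

From a root shell, ensure_line /etc/sysctl.conf net.link.ether.inet.arp_unicast_lim=0 would take the place of the touch and echo steps. The chown and chmod steps still apply.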
REBOOT YOUR MINI, and then TEST to confirm that the change survived the reboot by running this command in Terminal in OS X:
sudo sysctl -a | grep net.link.ether.inet.arp_unicast_lim
OS X should then respond with a line showing net.link.ether.inet.arp_unicast_lim with a ZERO at the end. If it shows a 5, the setting was not applied; repeat the steps above.
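If you manage several minis, the check can be scripted instead of read by eye. The sketch below is an assumption-laden helper, not an official tool: the function name arp_unicast_lim_is_zero is our own, and because we are not certain whether sysctl prints the value as "name: value" or "name = value", it accepts either layout.

```shell
# Return success (exit 0) if a line of sysctl output shows the patched value 0.
#   $1 = one line of output for net.link.ether.inet.arp_unicast_lim
# Accepts both "name: value" and "name = value" layouts (layout is an assumption).
arp_unicast_lim_is_zero() {
  value=$(printf '%s\n' "$1" |
    sed -n 's/^net\.link\.ether\.inet\.arp_unicast_lim[ :=]*\([0-9][0-9]*\)$/\1/p')
  [ "$value" = "0" ]
}
```

On the mini itself, this could be used as arp_unicast_lim_is_zero "$(sysctl net.link.ether.inet.arp_unicast_lim)" && echo patched.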
#2: The Energy Management Issue:
A lot of subscribers have called in because their Mac mini host is not reachable, and we discover that it is powered down. In almost all cases, the reason is that Mavericks reset the Energy management settings during the upgrade from 10.8 to 10.9. This setting is critical in the data center environment because your ability to remotely power your mini on and off via your MacStadium dashboard requires this setting to be enabled as follows:
#3: The Display and GPU Enabler Issue:
Almost all of the minis here at MacStadium have an external GPU Enabler on them to drastically boost video performance on your remote desktop. Most of us have gotten spoiled by this 'hack', which works flawlessly in 10.8.x. Unfortunately, at this time, the GPU Enablers have no effect in 10.9, and in many cases we have to remove them to get Mavericks to operate correctly. Some users see a grey screen until we remove the GPU Enabler dongle and reboot the mini. The problem is evident from the lack of screen resolutions in the Scaled setting of Display Preferences. The net effect is that your mini will feel like it is running much slower in 10.9. Users have asked us to revert back to 10.8 for this reason alone.
#4: The Mission Control Displays Issue:
We have had several reports of the Mavericks remote desktop screen not fitting into the remote access window, or of a black or grey screen in remote desktop. These are symptoms of issues in the new Mavericks multi-display features, and turning off this setting seems to help.