Mieze

New Driver for Realtek RTL8111

1,363 posts in this topic


The main reason I'm choosing SMB over all other protocols (Netatalk/NFS, etc.) is that it works on all devices in my (home) network.

It is also the protocol I already know how to work with. (That doesn't mean I don't want to learn about others.)

 

I'm still thinking of adding other sharing-protocols to my NAS, but I don't want the network to be 'loaded' with unneeded traffic/broadcasts.

I also don't know what impact it has on my NAS, because.. well it is not the most powerful machine.

 

There is no need to worry. Most SoC-based NAS boxes are even less powerful and they run Samba and Netatalk simultaneously. Two years ago, when I started experimenting with Netatalk, I was using a WD MyBook Live (800MHz single-core PPC, 256MB RAM). As the box runs Debian, I began to customize it. Eventually I managed to get Netatalk, Samba and OpenLDAP working without any performance issues (~90MB/sec read/write throughput), and with more RAM I could have added even more features.

 

Mieze


Hello everyone,

 

I know there has been talk about poor performance with SMB, but has anyone noticed poor refresh rates using this driver with Apple Remote Desktop (ARD) as a server?

 

I haven't noticed any errors in the log messages, nor have I tried the debug version. I still need to do more testing and will post my results once I have completed it.

 

I just wanted to get someone else's take or experience with Apple Remote Desktop and this driver.

 

Thanks,

 

Robert (mrengles)


 

I just wanted to get someone else's take or experience with Apple Remote Desktop and this driver.

 

 

FWIW, I use ARD almost daily from my Z77X-UP5 (Intel NIC) to my Z68MX (Realtek with this driver) and it works very well. There are no refresh issues, and ARD performance is not noticeably different from when I was using the lnx2mac driver. In this case the Realtek machine is sharing its desktop to my Intel machine over the LAN, if that matters. I rarely go in the opposite direction. The only time I had refresh issues on my LAN was when I discovered that ARD had decided to connect to my Mac mini server over IPv6, which I think was actually going out over the internet and back.

 

Hope that helps,

g\


Hello mrengles!

 

I use the same configuration for my homeserver (10.8.3 Server) which usually runs without display connected. On the login screen the screen refresh is sometimes slow and incomplete. When I'm logged in the refresh is fast but I get artifacts (black frames or rectangles) from time to time, but when a display is connected to the machine (HD4000/DVI), ARD works flawlessly. There are no artifacts or slow refreshes.

 

All machines are connected via Gigabit Ethernet.

 

The problem is not related to any particular NIC or driver. I saw this with the lnx2mac driver I used when I set up the machine last year. Later I added an Intel 82574L card using Apple's driver and disabled the onboard LAN. The problem persisted. Two months ago I switched over to the Realtek NIC using my driver, but nothing has changed.

 

Mieze


I just wanted to let you know about the results of my latest tests with regard to the SMB performance issue.

  • SMB throughput when communicating with another Mac via SMB has been significantly improved so that it is on a par with Apple's Broadcom driver in both directions.
  • When communicating with Win7 machines performance is also good.
  • With Windows Server 2008 R2 (64bit) performance is even better than with Win7 in both directions.
  • Communication with WinXP hosts hasn't improved at all and is still lousy.

The strange thing is that Apple's Broadcom driver shows the same weakness when exchanging data with WinXP machines. The performance is as bad as with my driver. It looks like certain Windows versions trigger the issue?

 

Mieze

Edited by Mieze



No changes to achieve this?? FYI: My WHS2011 is basically a Windows 2008 R2...


No changes to achieve this?? FYI: My WHS2011 is basically a Windows 2008 R2...

 

I've changed interrupt mitigate to 0xaf54 because listing large directory trees using ls -alR with AFP was a little bit slower with 0xaf83. I haven't checked the previous value 0xaf83 with Server 2008 R2 because it's clear that 0xaf54 is closer to the optimum.

 

By the way I added a config option to Info.plist to set the interrupt mitigate value without rebuild. It's quite straightforward, you'll only have to convert the hex number to a decimal and put it in.
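As a worked example of the hex-to-decimal conversion mentioned above, here is a minimal sketch. The helper name `hexToDecimal` is hypothetical and not part of the driver; it only shows the arithmetic, and it makes no assumption about the name of the Info.plist key.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the driver): parse a hex string such as
// "0xaf54" and return the decimal value to type into Info.plist.
std::uint32_t hexToDecimal(const std::string &hex)
{
    // Base 16; std::stoul accepts an optional "0x" prefix in this mode.
    return static_cast<std::uint32_t>(std::stoul(hex, nullptr, 16));
}
```

With the two values discussed in this post, 0xaf54 converts to 44884 and 0xaf83 converts to 44931.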

 

Let's summarize the changes since the last official release, version 1.0.4:

  • Support for TCP/IPv6 and UDP/IPv6 checksum offload added (can be disabled in Info.plist).
  • Maximum size of the scatter-gather-list has been increased from 24 to 40 segments to resolve performance issues with TSO4 when offloading large packets which are highly fragmented.
  • TSO4 can be disabled in Info.plist without rebuild.
  • Statistics gathering has been improved to deliver more detailed information (resource shortages, transmitter resets, transmitter interrupt count).
  • The interrupt mitigate setting has been changed to improve performance with SMB and to reduce CPU load.
  • Configuration option added to allow for user defined interrupt mitigate settings without rebuild (see above).

You are encouraged to test this release candidate thoroughly, in particular with IPv6. As I don't have an IPv6-enabled internet connection, my tests are limited to the LAN, but so far I have no evidence of any problems with TCP/IPv6 and UDP/IPv6 checksum offload. This version has been running perfectly on my home server for 2 days now, and if there are no unexpected problems I'm planning to make it the next stable release, version 1.1.0.

 

Known issues:

  • There are still performance problems with regard to SMB in certain configurations. My tests indicate that Apple's Broadcom driver shows the same behavior with those configurations. Obviously it's a more general problem that is not limited to my driver.
  • RTL8111C: WoL does not work.

Mieze

 

 

@nozyczek: This version uses 40 for kMaxSegs. Please test it in order to see if this is sufficient.

RealtekRTL8111-V1.1.0-RC1.zip

Edited by Mieze


Hello Mieze

 

I'm French and my English is bad.

 

My Mobo is Gigabyte GA-H55M-S2 with LGA1156 and Core I3, ethernet Realtek 8111E

 

After trying a lot of versions from http://lnx2mac.blogs...osx-driver.html with 10.8.3, I had no WOL and no WOD.

 

So

- cleaned up my DSDT = nothing

- cleared the cache = nothing

- tried different versions = nothing

 

I tried your driver and EVERYTHING WORKS with my 10.8.3.

 

Very Very Very Very Very good work

 

Thank you, Thank you, Thank you, Thank you

 

:thumbsup_anim::thanks_speechbubble:


Are you sure that the permissions are set to root:wheel 755? I managed to load the driver even from desktop provided the permissions are correct.

 

Mieze

Yes, I did that. Same issue. System profiler shows no Ethernet adapter while booting from USB flash drive.

I created the USB drive using this guide, then moved your kext into /Extra/Extension. All other kexts (FakeSMC, NullCPUPowerManagement, VoodooPS2Controller) work.


Yes, I did that. Same issue. System profiler shows no Ethernet adapter while booting from USB flash drive.

I created the USB drive using this guide, then moved your kext into /Extra/Extension. All other kexts (FakeSMC, NullCPUPowerManagement, VoodooPS2Controller) work.

 

You probably have another driver in /Extra/Extensions that is conflicting. A lot of USB tools install a rollback IONetworkFamily.kext that includes a lot of network drivers in the Contents/PlugIns directory, for example.



OK... results with this version: My reads from the server have now improved to 4-5MB/sec with net.inet.tcp.delayed_ack=3 (the default). If I change that to net.inet.tcp.delayed_ack=0 I can get an average of 20MB/sec with peaks of up to 30MB/sec (which is better than I've ever seen with this driver). Writes, however, have now slowed down to 2-3MB/sec (with either setting for delayed_ack), but as I stated before, the 'better' write performance seemed random, so maybe I just haven't been lucky lately.

 

Hopefully, some day I'll have more time to chase this down more fully or another machine to test with, but for now that's all I have...


OK... results with this version: My reads from server have now improved to 4-5MB/sec with net.inet.tcp.delayed_ack=3 (the default). If I change that to net.inet.tcp.delayed_ack=0 I can get average 20MB/sec with peaks into 30MB/sec (which is better than I've ever seen with this driver). Writes, however, are now slowed down to 2-3MB/sec (either setting for delayed_ack), but as I stated before the 'better' write performance seemed random, so maybe I just haven't been lucky lately.

 

Hopefully, some day I'll have more time to chase this down more fully or another machine to test with, but for now that's all I have...

 

Last night I had the idea to play with the LAN connection settings on the XP machine (MacBook Pro late 2006, Marvell Yukon NIC, as client) in order to improve SMB performance. Although I didn't have the time for extensive experiments, only 20 minutes, the results are promising. Disabling the QoS Packet Scheduler for the connection boosted read throughput so that I got decent reads for the first time. I was able to copy a 2GB file from the server to the XP client in less than a minute. Unfortunately write speed seems to be unaffected by this change.

 

I also tried varying the NIC interrupt mitigate settings but didn't get any conclusive results, except for the fact that there is an influence on SMB performance.

 

I know that you can't apply these results directly to WHS 2011, but I think it might be worth giving it a try.

 

Mieze


@nozyczek: This version uses 40 for kMaxSegs. Please test it in order to see if this is sufficient.

 

Mieze,

RealtekRTL8111-V1.1.0-RC1 looks great! iperf shows stable ~941 both ways. Impressive!

Awesome job!
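For context on the ~941 figure, here is a rough sketch of the best-case TCP goodput on gigabit Ethernet. The post does not state units or settings, so the following are assumptions: the figure is in Mbit/s, the MTU is 1500 bytes, IPv4 is used, and the TCP timestamp option is enabled. All names are illustrative.

```cpp
// Assumptions (not stated in the post): Mbit/s, 1500-byte MTU, IPv4,
// TCP timestamps enabled.
constexpr double kLinkRateMbit = 1000.0;
constexpr int kMtu = 1500;
// Per-frame bytes on the wire besides the MTU payload:
// Ethernet header (14), FCS (4), preamble + SFD (8), inter-frame gap (12).
constexpr int kEthernetOverhead = 14 + 4 + 8 + 12;
// IPv4 header (20), TCP header (20), TCP timestamp option (12).
constexpr int kIpTcpHeaders = 20 + 20 + 12;

// Best-case TCP payload rate in Mbit/s under the assumptions above.
double maxTcpGoodputMbit()
{
    const int payload = kMtu - kIpTcpHeaders;        // 1448 bytes of data
    const int wireBytes = kMtu + kEthernetOverhead;  // 1538 bytes on the wire
    return kLinkRateMbit * payload / wireBytes;
}
```

Under these assumptions the result is a little over 941 Mbit/s, so a stable ~941 would be consistent with a link running at essentially wire speed.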



I'm making some progress. I have fixed the slow receive/read problem. By looking at slice's code and the original Linux driver code and a *lot* of experimentation, study, code review, etc, I have boiled the receive problem down to differences in your version of interruptOccurred.

 

Here is the new version:


void RTL8111::interruptOccurred(OSObject *client, IOInterruptEventSource *src, int count)
{
/* Loop counter lives outside the loop so it can be checked afterwards,
   and is named so that it does not shadow the 'count' parameter. */
int loops;

WriteReg16(IntrMask, 0x0000);

for (loops = 20; loops > 0; loops--) {

/* Read interrupt status to determine work */
UInt16 status = ReadReg16(IntrStatus);
status &= (intrMask | TxDescUnavail);
/* Clear interrupt status with work done this iteration */
WriteReg16(IntrStatus, status);

/* hotplug/major error/no more work/shared irq */
if ((status == 0xFFFF) || !(status & intrMask))
break;

if (status & SYSErr) {
pciErrorInterrupt();
break;
}

/* Seems redundant, but it's in the 8168 code... */
if ((status & TxOK) && (status & TxDescUnavail)) {
WriteReg8(TxPoll, NPQ); /* set polling bit */
}

/* Rx interrupt */
if (status & (RxOK | RxDescUnavail | RxFIFOOver))
rxInterrupt();

/* Tx interrupt */
if (status & (TxOK | TxErr /*| TxDescUnavail*/))
txInterrupt();

if (status & LinkChg)
checkLinkStatus();

/* Check if a statistics dump has been completed. */
if (needsUpdate && !(ReadReg32(CounterAddrLow) & CounterDump))
updateStatitics();
}

/* Only true when all 20 passes were used without running out of work. */
if (0 == loops) {
IOLog("Ethernet [RealtekRTL8111]: max count reached in interrupt service.\n");
}

/* Write clean mask */
WriteReg16(IntrMask, intrMask);
}

 

Now I get about ~51MB/sec with a Finder copy from my SMB server to the SSD on the laptop. I think the critical change is that the IntrStatus is written closer (in time) to the read. Probably the loop is helpful too...
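To illustrate the "write IntrStatus right after reading it" idea, here is a toy model of a write-1-to-clear status register. That IntrStatus behaves this way is an assumption on my part and is not confirmed anywhere in this thread; the struct, the function and the bit values are all illustrative and not taken from the driver.

```cpp
#include <cstdint>

// Toy model of a write-1-to-clear (W1C) status register. Assumed behaviour,
// not taken from a Realtek datasheet.
struct FakeStatusReg {
    std::uint16_t bits = 0;
    // The "hardware" sets bits when events occur.
    void raise(std::uint16_t events) { bits |= events; }
    std::uint16_t read() const { return bits; }
    // Writing a 1 clears that bit; bits written as 0 are left alone.
    void writeToClear(std::uint16_t ack) { bits &= static_cast<std::uint16_t>(~ack); }
};

constexpr std::uint16_t kRxOK = 0x0001;  // illustrative bit values
constexpr std::uint16_t kTxOK = 0x0004;

// Acknowledge exactly what was read, right away, and return it so the caller
// can do the work. Events raised while that work runs stay pending and are
// seen on the next pass.
std::uint16_t readAndAck(FakeStatusReg &reg)
{
    const std::uint16_t status = reg.read();
    reg.writeToClear(status);
    return status;
}
```

In this model an event that arrives after the acknowledge is never cleared by accident, because only the bits that were actually read are written back.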

 

File copy to the server is about 7 to 10MB/sec which is better too. And now I can reproduce the effect of doing copies simultaneously increasing the write speed. If during the copy of a large file to the server (the write case here), I simultaneously start copying a large file from the server to the laptop (the read case above) throughput on the copy to the server jumps to ~49MB/sec (read speed remains stable at ~51MB/sec). If I stop the copy from the server to the laptop, the copy to the server slows back down.

 

This could be affected by the interrupt mitigate value, so I'll play with that too. With a lot of received packets happening, it is more likely more interrupts are generated, and since all types of interrupts are processed each interrupt...

 

I'm going to experiment with the code some more on the transmit side. Since slice's version doesn't have this issue, I think I'll experiment copying the output packets to memory as one descriptor. This is the only major difference I can see between your driver and slice's version... You send output packets as potentially fragmented/chained dma descriptors, whereas slice's driver always sends a packet as a single dma descriptor.

 

BTW, I have fixed all the bugs in slice's version as far as not negotiating speed 1000, and other weird happenings when the cable is unplugged (there was a garbage local being used). So at this point, I'm looking to fix your driver instead of trying to add checksum offload to slice's driver.

 

I'll update status here as I figure out more...

 

P.S. Sorry about the poor indenting in the code above. It should look ok when you paste it with xcode. This site is really brain dead when it comes to stripping leading spaces from code blocks.


Now I get about ~51MB/sec with a Finder copy from my SMB server to the SSD on the laptop. I think the critical change is that the IntrStatus is written closer (in time) to the read. Probably the loop is helpful too...

 

If clearing the interrupt status register earlier in the loop helps, it would be no problem for me to change this, provided it doesn't cause any unwanted side effects. Whether the loop has any effect could easily be determined by adding a statistics variable that records the maximum value count has reached. However, as the txInterrupt() and rxInterrupt() functions handle as many finished descriptors as are available, i.e. all received / transmitted packets, I doubt that count will go higher than 1 or 2, although of course this depends on your exact system configuration. In case the driver's thread gets preempted in between, there might be more runs.
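A minimal sketch of such a statistics variable follows. `IsrLoopStats` and `serviceLoop` are hypothetical names, and the `hasWork` callback stands in for reading and checking the interrupt status register; none of this is taken from the driver.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical statistics holder; the name is illustrative only.
struct IsrLoopStats {
    std::uint32_t maxPasses = 0;
    // Remember the largest number of passes seen in any one handler call.
    void record(std::uint32_t passes) { maxPasses = std::max(maxPasses, passes); }
};

// Runs up to `limit` passes while `hasWork(pass)` reports pending work,
// then records how many passes this call needed.
template <typename HasWork>
std::uint32_t serviceLoop(std::uint32_t limit, HasWork hasWork, IsrLoopStats &stats)
{
    std::uint32_t passes = 0;
    while (passes < limit && hasWork(passes)) {
        ++passes;
    }
    stats.record(passes);
    return passes;
}
```

If the recorded maximum stays at 1 or 2 under load, the extra loop iterations are probably not what improves throughput.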

 

File copy to the server is about 7 to 10MB/sec which is better too. And now I can reproduce the effect of doing copies simultaneously increasing the write speed. If during the copy of a large file to the server (the write case here), I simultaneously start copying a large file from the server to the laptop (the read case above) throughput on the copy to the server jumps to ~49MB/sec (read speed remains stable at ~51MB/sec). If I stop the copy from the server to the laptop, the copy to the server slows back down.

 

Have you played with the settings on the Windows machine? I'm pretty sure that the slow writes are triggered by it because my test results with different setups show that this is the only consistent factor.

 

This could be affected by the interrupt mitigate value, so I'll play with that too. With a lot of received packets happening, it is more likely more interrupts are generated, and since all types of interrupts are processed each interrupt...

 

Keep an eye on system load in general and on smbd in particular. top is a good helper to find out what the machine is doing. A high load might be the result of too many interrupts but in case it's idling most of the time during a write operation it might be waiting for an answer from the other endpoint that isn't coming. The latest release counts transmitter interrupts and puts the value into the ethernet statistics so that you can check it in IORegistryExplorer and after uncommenting the last line in rxInterrupt() you'll also get the number of receiver interrupts.

 

I'm going to experiment with the code some more on the transmit side. Since slice's version doesn't have this issue, I think I'll experiment copying the output packets to memory as one descriptor. This is the only major difference I can see between your driver and slice's version... You send output packets as potentially fragmented/chained dma descriptors, whereas slice's driver always sends a packet as a single dma descriptor.

No, the most important difference is concurrency which widely affects timing. I let the NIC calculate checksums and segment large TCP packets so that the network stack is less involved in defining the exact timing because there is a lot of work still going on after outputPacket() returned. TSO acts on packets of up to 64KB which means that one call of outputPacket() could result in the transmission of more than 40 ethernet packets.
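The "more than 40 ethernet packets" figure can be checked with a small calculation. This sketch assumes an MSS of 1460 bytes (1500-byte MTU, IPv4, no TCP options) and a maximum-size IPv4 packet of 65535 bytes including 40 bytes of IP and TCP headers; the function name is illustrative.

```cpp
#include <cstdint>

// Number of wire frames needed for one TSO send: ceil(payload / mss).
constexpr std::uint32_t tsoSegments(std::uint32_t payloadBytes, std::uint32_t mss)
{
    return (payloadBytes + mss - 1) / mss;
}
```

Under these assumptions the payload is 65495 bytes, and `tsoSegments(65495, 1460)` gives 45 frames for a single outputPacket() call.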

 

On the receiver side packets come in with checksum verification already done which means that they will be handled much faster. Also keep in mind the side effects of checksum calculation by the CPU. It's not limited to consumption of cycles but it also churns up the cache affecting other tasks too. Microsoft has done excellent research on that topic (see NDIS docs).

 

Mieze

 

Mieze,

RealtekRTL8111-V1.1.0-RC1 looks great! iperf shows stable ~941 both ways. Impressive!

Awesome job!

 

Thank you very much for the tests! I will push the latest version to github next week and update the binaries.

 

Mieze


If clearing the interrupt status register earlier in the loop helps, it would be no problem for me to change this, provided it doesn't cause any unwanted side effects. Whether the loop has any effect could easily be determined by adding a statistics variable that records the maximum value count has reached. However, as the txInterrupt() and rxInterrupt() functions handle as many finished descriptors as are available, i.e. all received / transmitted packets, I doubt that count will go higher than 1 or 2, although of course this depends on your exact system configuration. In case the driver's thread gets preempted in between, there might be more runs.

 

I'll do a test w/ only one loop just to see if the loop is helping. Both slice's driver and the Linux driver have this loop and that's why I added it.

 

Have you played with the settings on the Windows machine? I'm pretty sure that the slow writes are triggered by it because my test results with different setups show that this is the only consistent factor.

 

I did but they didn't make a difference. On top of that, this performance problem is only present with your driver, which makes me think it is something in the driver (like it is [was] with the receive side...)

 

Keep an eye on system load in general and on smbd in particular. top is a good helper to find out what the machine is doing. A high load might be the result of too many interrupts but in case it's idling most of the time during a write operation it might be waiting for an answer from the other endpoint that isn't coming. The latest release counts transmitter interrupts and puts the value into the ethernet statistics so that you can check it in IORegistryExplorer and after uncommenting the last line in rxInterrupt() you'll also get the number of receiver interrupts.

 

Thanks... I'll check it out. Last time I looked for 'smbd' I could not find it.

 

No, the most important difference is concurrency which widely affects timing. I let the NIC calculate checksums and segment large TCP packets so that the network stack is less involved in defining the exact timing because there is a lot of work still going on after outputPacket() returned. TSO acts on packets of up to 64KB which means that one call of outputPacket() could result in the transmission of more than 40 ethernet packets.

 

At this point there are a lot of differences, and I don't know yet what the difference causing my issue is, but I hope to figure it out. I think I'll also add some code to track how many segments might be in the mbuf passed to outputPacket and how large they are. Slice's code assumes there is 1608 or less bytes in each mbuf_t passed to outputPacket (that's how much memory is allocated for each tx dma descriptor buffer, and the code doesn't check for overflow). Is your driver's outputPacket treated differently for some reason where you must deal with larger mbuf_t packets?

 

On the receiver side packets come in with checksum verification already done which means that they will be handled much faster. Also keep in mind the side effects of checksum calculation by the CPU. It's not limited to consumption of cycles but it also churns up the cache affecting other tasks too. Microsoft has done excellent research on that topic (see NDIS docs).

 

I agree that checksum offload is a great feature to have and I can see where providing dma pointers directly into the network stack buffers (mbuf_t) should be of great advantage. But not when I get only 10MB/sec on a gig connection.

 

I appreciate the feedback... I'm learning as I go...


I'll do a test w/ only one loop just to see if the loop is helping. Both slice's driver and the Linux driver have this loop and that's why I added it.

With Linux it makes sense because the interrupt handler runs at interrupt level, but as I already stated earlier in this thread, OS X is different in that respect.

 

I did but they didn't make a difference. On top of that, this performance problem is only present with your driver, which makes me think it is something in the driver (like it is [was] with the receive side...)

 

No, the problem also exists with Apple's Broadcom driver. Check out the reports about bad SMB performance of Apple users.

 

At this point there are a lot of differences, and I don't know yet what the difference causing my issue is, but I hope to figure it out. I think I'll also add some code to track how many segments might be in the mbuf passed to outputPacket and how large they are. Slice's code assumes there is 1608 or less bytes in each mbuf_t passed to outputPacket (that's how much memory is allocated for each tx dma descriptor buffer, and the code doesn't check for overflow). Is your driver's outputPacket treated differently for some reason where you must deal with larger mbuf_t packets?

First, Slice's driver doesn't have to handle physical segments because it copies every packet to/from a physically contiguous DMA buffer. Second, as long as you don't use TSO or jumbo frames there won't be any packets (mbufs) larger than 1518 bytes. Third, you won't receive any multisegment mbufs unless the driver tells the network stack in getFeatures() that it can handle them.

 

As of now I haven't seen much debug data from you so that I barely know what is going on.

 

Edit: Are you aware of the fact that this could trigger a feedback loop? Letting the NIC poll the transmitter descriptor ring will cause the TxDescUnavail bit in the interrupt status register to be set again when all descriptors have been finished and I don't see TxDescUnavail to be cleared at any point.

 

/* Seems redundant, but it's in the 8168 code... */
if ((status & TxOK) && (status & TxDescUnavail)) {
WriteReg8(TxPoll, NPQ); /* set polling bit */
}

 

Mieze

Edited by Mieze


With Linux it makes sense because the interrupt handler runs at interrupt level, but as I already stated earlier in this thread, OS X is different in that respect.

 

Yes, and I thought that too about setting the IntrMask to zero and re-enabling it at the end -- shouldn't be necessary with a workloop based interrupt, right?. Since this OS X interrupt handler is not a "real" interrupt handler (it executes in a kernel thread, part of workloop, after the actual interrupt has been handled; real interrupt handler just triggers the thread/workloop). But I tried removing it and it caused all kinds of problems... and I still don't understand why.

 

No, the problem also exists with Apple's Broadcom driver. Check out the reports about bad SMB performance of Apple users.

 

Doesn't happen w/ lnx2mac or slice's, so it is something I want to keep looking at before I blame Apple completely.

 

First, Slice's driver doesn't have to handle physical segments because it copies every packet to/from a physically contiguous DMA buffer. Second, as long as you don't use TSO or jumbo frames there won't be any packets (mbufs) larger than 1518 bytes. Third, you won't receive any multisegment mbufs unless the driver tells the network stack in getFeatures() that it can handle them.

 

Sounds like something more to play with. I notice slice's driver doesn't implement getFeatures so it must be getting base class implementation.

 

As of now I haven't seen much debug data from you so that I barely know what is going on.

 

There really isn't anything to see. I've run the debug version w/ DebugLog modified so the output from the driver can be easily identified, and there is almost nothing of interest.

 

Edit: Are you aware of the fact that this could trigger a feedback loop? Letting the NIC poll the transmitter descriptor ring will cause the TxDescUnavail bit in the interrupt status register to be set again when all descriptors have been finished and I don't see TxDescUnavail to be cleared at any point.

 

/* Seems redundant, but it's in the 8168 code... */
if ((status & TxOK) && (status & TxDescUnavail)) {
WriteReg8(TxPoll, NPQ); /* set polling bit */
}

 

Thanks for the heads up, but I don't think it will because TxDescUnavail is not set in IntrMask. And, actually, I think it is cleared by this code:

/* Read interrupt status to determine work */
UInt16 status = ReadReg16(IntrStatus);
status &= (intrMask | TxDescUnavail);
/* Clear interrupt status with work done this iteration */
WriteReg16(IntrStatus, status);

 

But this code is not really necessary for the 'receive fix'. I'm working on fixing the transmit side too, and accidentally left it in there before I posted the code for you. It was something in slice's code, so I thought it was worth a try (given that it is xmit related).

 

I think just moving the write to IntrStatus closer to the read helps me on the receive side. Maybe you could test it on your side to see if it causes any issues with your devices. I'll keep working on the xmit problem.


Yes, and I thought that too about setting the IntrMask to zero and re-enabling it at the end -- shouldn't be necessary with a workloop based interrupt, right?. Since this OS X interrupt handler is not a "real" interrupt handler (it executes in a kernel thread, part of workloop, after the actual interrupt has been handled; real interrupt handler just triggers the thread/workloop). But I tried removing it and it caused all kinds of problems... and I still don't understand why.

 

Interrupt mask has to be cleared in order to clear bits in interrupt status properly. I already figured that out during my tests a long time ago.

 

There really isn't anything to see. I've run the debug version w/ DebugLog modified so the output from the driver can be easily identified, and there is almost nothing of interest.

 

Maybe there isn't anything, but the missing hint could be in there. In the meantime it would be interesting to find out what the machine is doing during send operations. What does top say? By the way, the watchdog timer routine is very useful for collecting statistics every second. How many interrupts are there? Have you created a packet dump with Wireshark?
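As a rough illustration of the per-second statistics idea, the following stand-alone C++ sketch turns running counters into rates by taking the difference between two samples. `IntrStats`, `Rates` and `sampleRates` are hypothetical names for this example, and the one-second watchdog period is an assumption; the driver keeps its own counters.

```cpp
#include <cstdint>

// Hypothetical running counters; not the driver's actual fields.
struct IntrStats {
    uint64_t interrupts = 0;   // incremented in the interrupt handler
    uint64_t txPackets  = 0;   // incremented when a packet is queued

    uint64_t lastInterrupts = 0;   // values seen at the previous sample
    uint64_t lastTxPackets  = 0;
};

struct Rates {
    uint64_t interruptsPerSec;
    uint64_t txPacketsPerSec;
};

// Meant to be called once per second, e.g. from a watchdog timer routine
// (assumed 1 s period). Returns the change since the previous call.
Rates sampleRates(IntrStats &s) {
    Rates r{ s.interrupts - s.lastInterrupts,
             s.txPackets  - s.lastTxPackets };
    s.lastInterrupts = s.interrupts;
    s.lastTxPackets  = s.txPackets;
    return r;
}
```

Logging the returned rates each second would show whether the interrupt rate collapses during SMB writes or stays high while throughput drops.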

 

Thanks for the heads up, but I don't think it will because TxDescUnavail is not set in IntrMask. And, actually, I think it is cleared by this code:

/* Read interrupt status to determine work */
UInt16 status = ReadReg16(IntrStatus);
status &= (intrMask | TxDescUnavail);
/* Clear interrupt status with work done this iteration */
WriteReg16(IntrStatus, status);

 

But this code is not really necessary for the 'receive fix'. I'm working on fixing the transmit side too, and accidentally left it in there before I posted the code for you. It was something in slice's code, so I thought it was worth a try (given that it is xmit related).

 

I think just moving the write to IntrStatus closer to the read helps me on the receive side. Maybe you could test it on your side to see if it causes any issues with your devices. I'll keep working on the xmit problem.

 

Bits in the interrupt status register might get set even if the corresponding bit in the interrupt mask register is cleared. Although they don't cause an interrupt when they are masked, they prevent your loop from exiting when the work is done, which means that your loop boils down to busy waiting, and that could be achieved more easily.

 

Mieze


Interrupt mask has to be cleared in order to clear bits in interrupt status properly. I already figured that out during my tests a long time ago.

 

It is a little strange, as you would think the mask and status would be independent. But I'm sure Realtek expected these tasks to be performed at actual interrupt time instead of later... Probably this is documented by Realtek, but probably only under NDA...

 

Bits in the interrupt status register might get set even if the corresponding bit in the interrupt mask register is cleared. Although they don't cause an interrupt when they are masked, they prevent your loop from exiting when the work is done, which means that your loop boils down to busy waiting, and that could be achieved more easily.

 

Yes, they may/will get set, but they won't generate interrupts and the code will not stay in that loop for bits that aren't in the mask. See:

/* hotplug/major error/no more work/shared irq */
if ((status == 0xFFFF) || !(status & intrMask))
break;
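The exit test quoted above can be checked in isolation. This small stand-alone C++ sketch copies the condition into a function; the bit values and the name `noMoreWork` are assumptions made for illustration only.

```cpp
#include <cstdint>

// Hypothetical bit values, chosen only for this illustration.
constexpr uint16_t RxOK          = 0x0001;
constexpr uint16_t TxOK          = 0x0004;
constexpr uint16_t TxDescUnavail = 0x0080;

// Mirrors the break condition: stop on an all-ones read (hotplug or
// major error) or when no bit that is part of the mask is pending.
bool noMoreWork(uint16_t status, uint16_t intrMask) {
    return (status == 0xFFFF) || !(status & intrMask);
}
```

With a mask of `RxOK | TxOK`, a status containing only TxDescUnavail makes `noMoreWork` return true, so a bit outside the mask does not keep the loop running on its own.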

 

But all this is neither here nor there. This is just experimental code to try and determine the root cause of the problem.

 

Here's an interesting experiment I did:

void RTL8111::interruptOccurred(OSObject *client, IOInterruptEventSource *src, int count)
{
    WriteReg16(IntrMask, 0x0000);

    for (int count = 1/*kMaxInterruptWork*/; count > 0; count--) {

        /* Read interrupt status to determine work */
        UInt16 status = ReadReg16(IntrStatus);
        status &= (intrMask | TxDescUnavail);

        /* hotplug/major error/no more work/shared irq */
        if ((status == 0xFFFF) || !(status & intrMask))
            break;

        if (status & SYSErr) {
            pciErrorInterrupt();
            break;
        }

        /* Seems redundant, but it's in the 8168 code... */
        ////if ((status & TxOK) && (status & TxDescUnavail)) {
        ////    WriteReg8(TxPoll, NPQ); /* set polling bit */
        ////}

        /* Tx interrupt */
        if (status & (TxOK | TxErr | TxDescUnavail)) {
            txInterrupt();
            /* !!!!! EXPERIMENTAL !!!!! */
            if (kNumTxDesc != txNumFreeDesc)
                status &= ~(TxOK | TxErr | TxDescUnavail);
        }

        /* Clear interrupt status with work done this iteration */
        WriteReg16(IntrStatus, status);

        /* Rx interrupt */
        if (status & (RxOK | RxDescUnavail | RxFIFOOver))
            rxInterrupt();

        if (status & LinkChg)
            checkLinkStatus();

        /* Check if a statistics dump has been completed. */
        if (needsUpdate && !(ReadReg32(CounterAddrLow) & CounterDump))
            updateStatitics();
    }

    /* Note: 'count' here is the function parameter; the loop counter above shadows it. */
    if (0 == count) {
        IOLog("Ethernet [RealtekRTL8111]: max count reached in interrupt service.\n");
    }

    /* Write clean mask */
    WriteReg16(IntrMask, intrMask);
}

 

Basically, just as an experiment, I left the xmit related status bits uncleared if there was still work pending in the xmit descriptors... With this, I get ~40MB/sec writes from laptop to server. Excessive CPU usage along with it, of course, but it also 'fixes' the write performance problem. Perhaps that gives us some clues. It certainly gives me some further ideas to try.

 

It kind of says to me "too much interrupt mitigation on the receive side." SMB stack may be waiting on acks that are late to arrive?

 

Also, I'm kind of wondering if the chip doesn't appreciate having the interrupt status query/clear so late (well after asserting IRQ). I may experiment with installing a real interrupt handler to clear status earlier in the process (saving the status [cumulative bitwise-or] for later inspection by the workloop based interrupt handler, of course).
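The "cumulative bitwise-or" idea can be sketched as follows. This is plain C++ using `std::atomic` rather than kernel code, and `pendingStatus`, `primaryHandler` and `takePendingStatus` are hypothetical names; in a kext one would use the kernel's own atomic primitives and a primary (filter) interrupt routine instead.

```cpp
#include <atomic>
#include <cstdint>

// Status bits accumulated by the primary handler, consumed by the work loop.
std::atomic<uint16_t> pendingStatus{0};

// Would run at primary interrupt time: after reading and acknowledging the
// hardware status, OR the bits into the shared word so that none are lost.
void primaryHandler(uint16_t hwStatus) {
    pendingStatus.fetch_or(hwStatus, std::memory_order_release);
}

// Would run later on the work loop: atomically take everything accumulated
// so far and reset the shared word to zero.
uint16_t takePendingStatus() {
    return pendingStatus.exchange(0, std::memory_order_acquire);
}
```

Because the exchange is atomic, a status bit that arrives while the work loop is running is either picked up by the current call or left for the next one, but it is never dropped.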

 

I won't have time to work on this for the next couple of days, but will resume when I can...


It is a little strange, as you would think the mask and status would be independent. But I'm sure Realtek expected these tasks to be performed at actual interrupt time instead of later... Probably this is documented by Realtek, but probably only under NDA...

 

There are 25 versions of the RTL8111, each with its own bugs and quirks. As Realtek's products are targeted at the mass market, leaving design errors unfixed might also be a matter of cost.

 

Basically, just as an experiment, I left the xmit related status bits uncleared if there was still work pending in the xmit descriptors... With this, I get ~40MB/sec writes from laptop to server. Excessive CPU usage along with it, of course, but it also 'fixes' the write performance problem. Perhaps that gives us some clues. It certainly gives me some further ideas to try.

 

We know now conclusively that busy waiting resolves the issue.

 

It kind of says to me "too much interrupt mitigation on the receive side." SMB stack may be waiting on acks that are late to arrive?

 

... so that network statistics and Wireshark should show a huge number of retransmitted packets during SMB writes while the CPU would be idling most of the time. Running ping for a minute should give you a rough estimate of the packet round-trip time.

 

Also, I'm kind of wondering if the chip doesn't appreciate having the interrupt status query/clear so late (well after asserting IRQ). I may experiment with installing a real interrupt handler to clear status earlier in the process (saving the status [cumulative bitwise-or] for later inspection by the workloop based interrupt handler, of course).

 

This would have an impact on any protocol, but so far we only have an SMB performance issue.

 

Mieze


I've pushed version 1.1.0 to GitHub and updated the prebuilt binaries. There have been no changes since RC1. I decided to put the binaries into the download section of this site. Please see: http://www.insanelymac.com/forum/files/category/5-lan-and-wireless/

 

Mieze


Mieze,

I just ran 1.1.0 under 10.9 dp1. Everything seems to be working OK. I will do performance testing, WOL etc when I find a moment.

nozyczek


Mieze,

I just ran 1.1.0 under 10.9 dp1. Everything seems to be working OK. I will do performance testing, WOL etc when I find a moment.

nozyczek

Thanks for the test! By the way Realtek has just updated the Linux sources the driver is based on. I'll merge in that new code, version 8.036.00, as soon as possible. My plans for the future also include:

 

  • Try to find a solution for the WoL issue with the RTL8111C.
  • Add support for TCP/IPv6 segmentation offload (TSO6). After reverse engineering the Win7 driver I found out how it has to be done, but I still haven't found the time to test my theory.

Mieze
