NXP SJA1105 switch driver

Overview

The NXP SJA1105 is a family of 10 SPI-managed automotive switches:

  • SJA1105E: First generation, no TTEthernet

  • SJA1105T: First generation, TTEthernet

  • SJA1105P: Second generation, no TTEthernet, no SGMII

  • SJA1105Q: Second generation, TTEthernet, no SGMII

  • SJA1105R: Second generation, no TTEthernet, SGMII

  • SJA1105S: Second generation, TTEthernet, SGMII

  • SJA1110A: Third generation, TTEthernet, SGMII, integrated 100base-T1 and 100base-TX PHYs

  • SJA1110B: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX

  • SJA1110C: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX

  • SJA1110D: Third generation, TTEthernet, SGMII, 100base-T1

Being automotive parts, their configuration interface is geared towards set-and-forget use, with minimal dynamic interaction at runtime. They require a static configuration to be composed by software and packed with CRC and table headers, and sent over SPI.

The static configuration is composed of several configuration tables. Each table takes a number of entries. Some configuration tables can be (partially) reconfigured at runtime, some not. Some tables are mandatory, some not:

============================= ================== =============================
Table                         Mandatory          Reconfigurable
============================= ================== =============================
Schedule                      no                 no
Schedule entry points         if Scheduling      no
VL Lookup                     no                 no
VL Policing                   if VL Lookup       no
VL Forwarding                 if VL Lookup       no
L2 Lookup                     no                 no
L2 Policing                   yes                no
VLAN Lookup                   yes                yes
L2 Forwarding                 yes                partially (fully on P/Q/R/S)
MAC Config                    yes                partially (fully on P/Q/R/S)
Schedule Params               if Scheduling      no
Schedule Entry Points Params  if Scheduling      no
VL Forwarding Params          if VL Forwarding   no
L2 Lookup Params              no                 partially (fully on P/Q/R/S)
L2 Forwarding Params          yes                no
Clock Sync Params             no                 no
AVB Params                    no                 no
General Params                yes                partially
Retagging                     no                 yes
xMII Params                   yes                no
SGMII                         no                 yes
============================= ================== =============================

Also, the configuration is write-only (software cannot read it back from the switch, with very few exceptions).

The driver creates a static configuration at probe time, and keeps it at all times in memory, as a shadow for the hardware state. When required to change a hardware setting, the static configuration is also updated. If that changed setting can be transmitted to the switch through the dynamic reconfiguration interface, it is; otherwise the switch is reset and reprogrammed with the updated static configuration.

Switching features

The driver supports the configuration of L2 forwarding rules in hardware for port bridging. The forwarding, broadcast and flooding domain between ports can be restricted through two methods: either at the L2 forwarding level (isolate one bridge’s ports from another’s) or at the VLAN port membership level (isolate ports within the same bridge). The final forwarding decision taken by the hardware is a logical AND of these two sets of rules.
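The logical AND of the two rule sets can be pictured with port bitmasks. The sketch below is purely illustrative: the mask values and the reachable_ports helper are hypothetical and do not reflect the actual table layout of the hardware.

```shell
#!/bin/sh
# Illustrative only: bit N of a mask stands for switch port N.
# reachable_ports is a hypothetical helper, not a driver or hardware API.
reachable_ports() {
        local l2_fwd_mask="$1"   # ports allowed by the L2 forwarding rules
        local vlan_mask="$2"     # ports that are members of the frame's VLAN

        # The final decision is the logical AND of both rule sets.
        printf "0x%02x\n" $(( l2_fwd_mask & vlan_mask ))
}

# Ports 0-3 share a bridge (0x0f); the VLAN contains ports 1, 2 and 4 (0x16).
# Only ports 1 and 2 satisfy both rule sets.
reachable_ports 0x0f 0x16
```

A port outside the bridge never receives the frame, even if it is a member of the VLAN, and a bridge port outside the VLAN is excluded as well.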

The hardware tags all traffic internally with a port-based VLAN (pvid), or it decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification is not possible. Once attributed a VLAN tag, frames are checked against the port’s membership rules and dropped at ingress if they don’t match any VLAN. This behavior is available when switch ports join a bridge with vlan_filtering 1.

Normally the hardware is not configurable with respect to VLAN awareness, but by changing what TPID the switch searches 802.1Q tags for, the semantics of a bridge with vlan_filtering 0 can be kept (accept all traffic, tagged or untagged), and therefore this mode is also supported.

Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but all bridges should have the same level of VLAN awareness (either both have vlan_filtering 0, or both 1).
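As a sketch of the 2 + 2 split, the helper below prints (but does not run) iproute2 commands that create two bridges with a common vlan_filtering value. The bridge names br0/br1, the port names swp2 to swp5 and the print_bridge_setup helper are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of executing them.
# Bridge and port names are hypothetical.
print_bridge_setup() {
        local vlan_filtering="$1"

        case "${vlan_filtering}" in
        0|1) ;;
        *) echo "vlan_filtering must be 0 or 1" >&2; return 1 ;;
        esac

        # Both bridges get the same level of VLAN awareness.
        echo "ip link add name br0 type bridge vlan_filtering ${vlan_filtering}"
        echo "ip link add name br1 type bridge vlan_filtering ${vlan_filtering}"
        echo "ip link set dev swp2 master br0"
        echo "ip link set dev swp3 master br0"
        echo "ip link set dev swp4 master br1"
        echo "ip link set dev swp5 master br1"
}

print_bridge_setup 1
```

Taking a single vlan_filtering argument for both bridges makes a mixed configuration impossible to express by construction.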

Topology and loop detection through STP is supported.

Offloads

Time-aware scheduling

The switch supports a variation of the enhancements for scheduled traffic specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to ensure deterministic latency for priority traffic that is sent in-band with its gate-open event in the network schedule.

This capability can be managed through the tc-taprio offload (‘flags 2’). The difference compared to the software implementation of taprio is that the latter would only be able to shape traffic originated from the CPU, but not autonomously forwarded flows.

The device has 8 traffic classes, and maps incoming frames to one of them based on the VLAN PCP bits (if no VLAN is present, the port-based default is used). As described in the previous sections, depending on the value of vlan_filtering, the EtherType recognized by the switch as being VLAN can either be the typical 0x8100 or a custom value used internally by the driver for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone or bridge mode with vlan_filtering=0, as it will not recognize the 0x8100 EtherType. In these modes, injecting into a particular TX queue can only be done by the DSA net devices, which populate the PCP field of the tagging header on egress. Using vlan_filtering=1, the behavior is the other way around: offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA net devices are no longer able to do that. To inject frames into a hardware TX queue with VLAN awareness active, it is necessary to create a VLAN sub-interface on the DSA conduit port, and send normal (0x8100) VLAN-tagged frames towards the switch, with the VLAN PCP bits set appropriately.
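For the VLAN-aware case, a minimal sketch of such a sub-interface follows. It only prints the command; the conduit name eth0, the VLAN ID 100 and the print_conduit_vlan helper are hypothetical. The egress-qos-map option of the iproute2 vlan link type maps the Linux packet priority to the PCP value written into the VLAN tag, and an identity map is assumed here.

```shell
#!/bin/sh
# Dry-run sketch: prints the command rather than running it.
# Conduit name and VLAN ID are hypothetical placeholders.
print_conduit_vlan() {
        local conduit="$1"
        local vid="$2"
        local qos_map=""
        local prio=0

        # Identity map: packet priority N -> VLAN PCP N, for N in 0..7.
        while [ "${prio}" -le 7 ]; do
                qos_map="${qos_map} ${prio}:${prio}"
                prio=$((prio + 1))
        done

        echo "ip link add link ${conduit} name ${conduit}.${vid} type vlan id ${vid} egress-qos-map${qos_map}"
}

print_conduit_vlan eth0 100
```

With such a map in place, the application selects the PCP (and therefore the hardware TX queue) through the priority it assigns to its packets.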

Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the notable exception: the switch always treats it with a fixed priority and disregards any VLAN PCP bits even if present. The traffic class for management traffic has a value of 7 (highest priority) at the moment, which is not configurable in the driver.
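To check whether a given destination MAC falls into one of the two management ranges, something like the following could be used. The is_mgmt_dmac helper is hypothetical and only mirrors the two prefixes named above; it says nothing about how the hardware performs the match.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the destination MAC is in one of the two
# management ranges (01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx).
is_mgmt_dmac() {
        local mac

        # Normalize to lowercase, colon-separated notation.
        mac=$(printf "%s" "$1" | tr 'A-F-' 'a-f:')

        case "${mac}" in
        01:80:c2:??:??:??|01:19:1b:??:??:??) return 0 ;;
        *) return 1 ;;
        esac
}

for dmac in 01:80:C2:00:00:0E 01-19-1B-00-00-00 42:be:24:9b:76:20; do
        if is_mgmt_dmac "${dmac}"; then
                echo "${dmac}: management traffic, traffic class 7"
        else
                echo "${dmac}: regular traffic"
        fi
done
```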

Below is an example of configuring a 500 us cyclic schedule on egress port swp5. The traffic class gate for management traffic (7) is open for 100 us, and the gates for all other traffic classes are open for 400 us:

    #!/bin/bash

    set -e -u -o pipefail

    NSEC_PER_SEC="1000000000"

    # Print the gate mask (hex) with one bit set per listed traffic class.
    gatemask() {
            local tc_list="$1"
            local mask=0

            for tc in ${tc_list}; do
                    mask=$((${mask} | (1 << ${tc})))
            done

            printf "%02x" ${mask}
    }

    if ! systemctl is-active --quiet ptp4l; then
            echo "Please start the ptp4l service"
            exit 1
    fi

    now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
    # Phase-align the base time to the start of the next second.
    sec=$(echo "${now}" | gawk -F. '{ print $1; }')
    base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

    # One traffic class per priority and one queue per traffic class.
    tc qdisc add dev swp5 parent root handle 100 taprio \
            num_tc 8 \
            map 0 1 2 3 4 5 6 7 \
            queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
            base-time ${base_time} \
            sched-entry S $(gatemask 7) 100000 \
            sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
            flags 2
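As a quick sanity check of this schedule, the sketch below re-declares the same gatemask helper and prints the two gate masks together with the resulting cycle time. It needs neither tc nor a PTP clock; the variable names mgmt_mask, other_mask and cycle_ns are assumptions.

```shell
#!/bin/sh
# Same helper as in the example: one bit set per listed traffic class.
gatemask() {
        local tc_list="$1"
        local mask=0
        local tc

        for tc in ${tc_list}; do
                mask=$(( mask | (1 << tc) ))
        done

        printf "%02x" "${mask}"
}

mgmt_mask=$(gatemask 7)                   # only bit 7 set
other_mask=$(gatemask "0 1 2 3 4 5 6")    # bits 0 to 6 set
cycle_ns=$(( 100000 + 400000 ))           # sum of both sched-entry intervals

echo "management gate mask: ${mgmt_mask}"
echo "other gates mask:     ${other_mask}"
echo "cycle time:           $(( cycle_ns / 1000 )) us"
```

The two masks are complementary (80 and 7f), so at any point in the 500 us cycle either the management gate or all other gates are open, never both.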

It is possible to apply the tc-taprio offload on multiple egress ports. There are hardware restrictions related to the fact that no gate event may trigger simultaneously on two ports. The driver checks the consistency of the schedules against this restriction and errors out when appropriate. Schedule analysis is needed to avoid this, which is outside the scope of the document.

Routing actions (redirect, trap, drop)

The switch is able to offload flow-based redirection of packets to a set of destination ports specified by the user. Internally, this is implemented by making use of Virtual Links, a TTEthernet concept.

The driver supports 2 types of keys for Virtual Links:

  • VLAN-aware virtual links: these match on destination MAC address, VLAN ID and VLAN PCP.

  • VLAN-unaware virtual links: these match on destination MAC address only.

The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while there are virtual link rules installed.

Composing multiple actions inside the same rule is supported. When only routing actions are requested, the driver creates a “non-critical” virtual link. When the action list also contains tc-gate (more details below), the virtual link becomes “time-critical” (draws frame buffers from a reserved memory partition, etc).

The 3 routing actions that are supported are “trap”, “drop” and “redirect”.

Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the CPU and to swp3. This type of key (DA only) is used when the port’s VLAN awareness state is off:

    tc qdisc add dev swp2 clsact
    tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
            action mirred egress redirect dev swp3 \
            action trap

Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VIDof 100 and a PCP of 0:

    tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
            dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop

Time-based ingress policing

The TTEthernet hardware abilities of the switch can be constrained to act similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform tight timing-based admission control for up to 1024 flows (identified by a tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which are received outside their expected reception window are dropped.

This capability can be managed through the offload of the tc-gate action. As routing actions are intrinsic to virtual links in TTEthernet (which performs explicit routing of time-critical traffic and does not leave that in the hands of the FDB, flooding etc), the tc-gate action may never appear alone when asking sja1105 to offload it. One (or more) redirect or trap actions must also follow along.

Example: create a tc-taprio schedule that is phase-aligned with a tc-gate schedule (the clocks must be synchronized by a 1588 application stack, which is outside the scope of this document). No packet delivered by the sender will be dropped. Note that the reception window is larger than the transmission window (and much more so, in this example) to compensate for the packet propagation delay of the link (which can be determined by the 1588 application stack).

Receiver (sja1105):

    tc qdisc add dev swp2 clsact
    now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
            sec=$(echo $now | awk -F. '{print $1}') && \
            base_time="$(((sec + 2) * 1000000000))" && \
            echo "base time ${base_time}"
    tc filter add dev swp2 ingress flower skip_sw \
            dst_mac 42:be:24:9b:76:20 \
            action gate base-time ${base_time} \
            sched-entry OPEN  60000 -1 -1 \
            sched-entry CLOSE 40000 -1 -1 \
            action trap

Sender:

    now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
            sec=$(echo $now | awk -F. '{print $1}') && \
            base_time="$(((sec + 2) * 1000000000))" && \
            echo "base time ${base_time}"
    tc qdisc add dev eno0 parent root taprio \
            num_tc 8 \
            map 0 1 2 3 4 5 6 7 \
            queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
            base-time ${base_time} \
            sched-entry S 01  50000 \
            sched-entry S 00  50000 \
            flags 2
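The base time arithmetic shared by the receiver and the sender can be tried in isolation. The sketch below uses a hypothetical next_base_time helper and a made-up clock reading in place of the phc_ctl output; it also prints how much wider the reception window is than the transmission window in this example.

```shell
#!/bin/sh
# Hypothetical helper: take a PHC reading ("seconds.nanoseconds"), move a
# given number of whole seconds ahead and print the result in nanoseconds.
next_base_time() {
        local now="$1"
        local ahead_sec="$2"
        local sec="${now%%.*}"   # drop the fractional part

        echo $(( (sec + ahead_sec) * 1000000000 ))
}

# Made-up clock reading; on a real system this would come from phc_ctl.
next_base_time 1600000000.123456789 2

# Both schedules share the same 100 us cycle; the receiver's gate stays
# open longer than the sender's to absorb the propagation delay.
rx_open_ns=60000
tx_open_ns=50000
echo "reception window margin: $(( rx_open_ns - tx_open_ns )) ns"
```

If both clocks are synchronized and both commands run within the same second, the two ends compute the same base time, which is what phase-aligns the schedules.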

The engine used to schedule the ingress gate operations is the same as the one used for the tc-taprio offload. Therefore, the restrictions regarding the fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at the same time (during the same 200 ns slot) still apply.
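A rough way to reason about this restriction before installing two schedules on the same switch is sketched below. This is not the driver's actual algorithm: it is a simplified check that assumes both schedules have the same cycle time, and the gate_event_slots and schedules_conflict helpers are hypothetical. Each schedule is given as its start offset within the cycle followed by its entry durations, all in nanoseconds.

```shell
#!/bin/sh
SLOT_NS=200

# Print the 200 ns slot index of every gate event within one cycle.
# $1: offset of the schedule start within the cycle, in ns
# remaining arguments: entry durations, in ns
gate_event_slots() {
        local t="$1"
        local cycle=0
        local d

        shift
        for d in "$@"; do
                cycle=$(( cycle + d ))
        done
        for d in "$@"; do
                echo $(( (t % cycle) / SLOT_NS ))
                t=$(( t + d ))
        done
}

# Succeeds if the two schedules have a gate event in the same slot.
# Simplification: both schedules are assumed to have the same cycle time.
schedules_conflict() {
        local a b

        # $1 and $2 are left unquoted on purpose, to split them into words.
        for a in $(gate_event_slots $1); do
                for b in $(gate_event_slots $2); do
                        [ "${a}" -eq "${b}" ] && return 0
                done
        done
        return 1
}

# Two schedules starting at the same instant collide on their first event.
if schedules_conflict "0 60000 40000" "0 50000 50000"; then
        echo "conflict: shift one of the base times"
fi

# Shifting the second schedule by 1000 ns (5 slots) removes the collision.
schedules_conflict "0 60000 40000" "1000 50000 50000" || echo "no conflict"
```

Schedules with different cycle times would have to be compared over a common multiple of their cycles, which this sketch does not attempt.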

Conveniently, it is possible to share time-triggered virtual links across more than 1 ingress port, via flow blocks. In this case, the restriction of firing at the same time does not apply because there is a single schedule in the system, that of the shared virtual link:

    tc qdisc add dev swp2 ingress_block 1 clsact
    tc qdisc add dev swp3 ingress_block 1 clsact
    tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
            action gate index 2 \
            base-time 0 \
            sched-entry OPEN 50000000 -1 -1 \
            sched-entry CLOSE 50000000 -1 -1 \
            action trap

Hardware statistics for each flow are also available (“pkts” counts the number of dropped frames, which is a sum of frames dropped due to timing violations, lack of destination ports and MTU enforcement checks). Byte-level counters are not available.

Limitations

The SJA1105 switch family always performs VLAN processing. When configured as VLAN-unaware, frames carry a different VLAN tag internally, depending on whether the port is standalone or under a VLAN-unaware bridge.

The virtual link keys are always fixed at {MAC DA, VLAN ID, VLAN PCP}, but the driver only asks for the VLAN ID and VLAN PCP when the port is under a VLAN-aware bridge. Otherwise, it fills in the VLAN ID and PCP automatically, based on whether the port is standalone or in a VLAN-unaware bridge, and accepts only “VLAN-unaware” tc-flower keys (MAC DA).

The existing tc-flower keys that are offloaded using virtual links are no longer operational after one of the following happens:

  • port was standalone and joins a bridge (VLAN-aware or VLAN-unaware)

  • port is part of a bridge whose VLAN awareness state changes

  • port was part of a bridge and becomes standalone

  • port was standalone, but another port joins a VLAN-aware bridge and this changes the global VLAN awareness state of the bridge

The driver cannot veto all these operations, and it cannot update/remove the existing tc-flower filters either. So for proper operation, the tc-flower filters should be installed only after the forwarding configuration of the port has been made, and removed by user space before making any changes to it.
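The recommended ordering can be sketched as a dry run that only prints the commands. The port name swp2, the bridge name br0, the example filter and the print_safe_reconfig helper are all hypothetical; the bridge is assumed to be VLAN-unaware, so that a DA-only key is accepted.

```shell
#!/bin/sh
# Dry-run sketch: prints the suggested order of operations.
# Port, bridge and filter values are hypothetical.
print_safe_reconfig() {
        local port="$1"
        local bridge="$2"

        # 1. Remove the offloaded filters while they are still operational.
        echo "tc filter del dev ${port} ingress"
        # 2. Change the forwarding configuration of the port.
        echo "ip link set dev ${port} master ${bridge}"
        # 3. Reinstall the filters against the new configuration.
        echo "tc filter add dev ${port} ingress flower skip_sw dst_mac 42:be:24:9b:76:20 action trap"
}

print_safe_reconfig swp2 br0
```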

Device Tree bindings and board design

This section references Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml and aims to showcase some potential switch caveats.

RMII PHY role and out-of-band signaling

In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by an external oscillator (but not by the PHY). But the spec is rather loose and devices go outside it in several ways. Some PHYs go against the spec and may provide an output pin where they source the 50 MHz clock themselves, in an attempt to be helpful.

On the other hand, the SJA1105 is only binary configurable: when in the RMII MAC role it will also attempt to drive the clock signal. To prevent this from happening it must be put in RMII PHY role. But doing so has some unintended consequences.

In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0]. These are practically some extra code words (/J/ and /K/) sent prior to the preamble of each frame. The MAC does not have this out-of-band signaling mechanism defined by the RMII spec. So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105 emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to frame preambles, which the real PHY is not expected to understand. So the PHY simply encodes the extra symbols received from the SJA1105-as-PHY onto the 100Base-Tx wire.

On the other side of the wire, some link partners might discard these extra symbols, while others might choke on them and discard the entire Ethernet frames that follow along. This looks like packet loss with some link partners but not with others.

The take-away is that in RMII mode, the SJA1105 must be allowed to drive the reference clock if connected to a PHY.

RGMII fixed-link and internal delays

As mentioned in the bindings document, the second generation of devices has tunable delay lines as part of the MAC, which can be used to establish the correct RGMII timing budget. When powered up, these can shift the Rx and Tx clocks with a phase difference between 73.8 and 101.7 degrees.

The catch is that the delay lines need to lock onto a clock signal with a stable frequency. This means that there must be at least 2 microseconds of silence between the clock at the old vs at the new frequency. Otherwise the lock is lost and the delay lines must be reset (powered down and back up). In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25 MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the AN process.

In the situation where the switch port is connected through an RGMII fixed-link to a link partner whose link state life cycle is outside the control of Linux (such as a different SoC), then the delay lines would remain unlocked (and inactive) until there is manual intervention (ifdown/ifup on the switch port).

The take-away is that in RGMII mode, the switch’s internal delays are only reliable if the link partner never changes link speeds, or if it does, it does so in a way that is coordinated with the switch port (practically, both ends of the fixed-link are under control of the same Linux system).

As to why a fixed-link interface would ever change link speeds: there are Ethernet controllers out there which come out of reset in 100 Mbps mode, and their driver inevitably needs to change the speed and clock frequency if it’s required to work at gigabit.
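To put the phase figures in time units, the sketch below converts the 73.8 and 101.7 degree limits into picoseconds at the RGMII clock frequency of a given link speed. The rgmii_clk_khz and phase_to_ps helpers are hypothetical, and the conversion is plain arithmetic (delay = phase / 360 × clock period), not something taken from the driver.

```shell
#!/bin/sh
# RGMII clock frequency in kHz for a given link speed in Mbps.
rgmii_clk_khz() {
        case "$1" in
        1000) echo 125000 ;;
        100)  echo 25000 ;;
        10)   echo 2500 ;;
        *)    echo "unsupported speed: $1" >&2; return 1 ;;
        esac
}

# Convert a phase shift, given in tenths of a degree, to picoseconds at the
# given clock frequency. Integer arithmetic only; the result is truncated.
phase_to_ps() {
        local tenths_deg="$1"
        local khz="$2"
        local period_ps=$(( 1000000000 / khz ))

        echo $(( tenths_deg * period_ps / 3600 ))
}

khz=$(rgmii_clk_khz 1000)
echo "delay range at 1000 Mbps: $(phase_to_ps 738 "${khz}") to $(phase_to_ps 1017 "${khz}") ps"
```

Because the clock period grows as the link speed drops, the same phase shift corresponds to a much longer delay at 100 Mbps or 10 Mbps.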

MDIO bus and PHY management

The SJA1105 does not have an MDIO bus and does not perform in-band AN either. Therefore there is no link state notification coming from the switch device. A board would need to hook up the PHYs connected to the switch to any other MDIO bus available to Linux within the system (e.g. to the DSA conduit’s MDIO bus). Link state management then works by the driver manually keeping in sync (over SPI commands) the MAC link speed with the settings negotiated by the PHY.

By comparison, the SJA1110 supports an MDIO slave access point over which its internal 100base-T1 PHYs can be accessed from the host. This is, however, not used by the driver. Instead, the internal 100base-T1 and 100base-TX PHYs are accessed through SPI commands, modeled in Linux as virtual MDIO buses.

The microcontroller attached to the SJA1110 port 0 also has an MDIO controller operating in master mode. However, the driver does not support this either, since the microcontroller gets disabled when the Linux driver operates. Discrete PHYs connected to the switch ports should have their MDIO interface attached to an MDIO controller from the host system and not to the switch, similar to SJA1105.

Port compatibility matrix

The SJA1105 port compatibility matrix is:

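===== ============== ============== ==============
Port  SJA1105E/T     SJA1105P/Q     SJA1105R/S
===== ============== ============== ==============
0     xMII           xMII           xMII
1     xMII           xMII           xMII
2     xMII           xMII           xMII
3     xMII           xMII           xMII
4     xMII           xMII           SGMII
===== ============== ============== ==============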

The SJA1110 port compatibility matrix is:

===== ============== ============== ============== ==============
Port  SJA1110A       SJA1110B       SJA1110C       SJA1110D
===== ============== ============== ============== ==============
0     RevMII (uC)    RevMII (uC)    RevMII (uC)    RevMII (uC)
1     100base-TX     100base-TX     100base-TX     SGMII
      or SGMII
2     xMII           xMII           xMII           xMII
      or SGMII                                     or SGMII
3     xMII           xMII           xMII           SGMII
      or SGMII       or SGMII                      or 2500base-X
      or 2500base-X  or 2500base-X
4     SGMII          SGMII          SGMII          SGMII
      or 2500base-X  or 2500base-X  or 2500base-X  or 2500base-X
5     100base-T1     100base-T1     100base-T1     100base-T1
6     100base-T1     100base-T1     100base-T1     100base-T1
7     100base-T1     100base-T1     100base-T1     100base-T1
8     100base-T1     100base-T1     n/a            n/a
9     100base-T1     100base-T1     n/a            n/a
10    100base-T1     n/a            n/a            n/a
===== ============== ============== ============== ==============