Linux Networking and Network Devices APIs

Linux Networking

Networking Base Types

enum sock_type

Socket types

Constants

SOCK_STREAM

stream (connection) socket

SOCK_DGRAM

datagram (connectionless) socket

SOCK_RAW

raw socket

SOCK_RDM

reliably-delivered message

SOCK_SEQPACKET

sequential packet socket

SOCK_DCCP

Datagram Congestion Control Protocol socket

SOCK_PACKET

Linux-specific way of getting packets at the device level, for writing RARP and other similar things at user level.

Description

When adding a new socket type, please grep ARCH_HAS_SOCKET_TYPE include/asm-*/socket.h; at least MIPS overrides this enum for binary compatibility reasons.

enum sock_shutdown_cmd

Shutdown types

Constants

SHUT_RD

shutdown receptions

SHUT_WR

shutdown transmissions

SHUT_RDWR

shutdown receptions/transmissions
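The socket type and shutdown constants are also visible to user space through `<sys/socket.h>`. The following is a minimal userspace sketch, not kernel code: it opens a connected pair of SOCK_STREAM sockets and shuts down transmissions on one end with SHUT_WR. The helper name `half_close_demo` is made up for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * half_close_demo() is a hypothetical helper, not a kernel or libc API.
 * Returns 0 if the peer reads the payload and then end-of-file, -1 otherwise.
 */
int half_close_demo(void)
{
    int sv[2];
    char buf[16];
    int ret = -1;

    /* A connected pair of stream (connection) sockets. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], "ping", 4) != 4)
        goto out;

    /* Shut down transmissions only; sv[0] could still receive. */
    if (shutdown(sv[0], SHUT_WR) < 0)
        goto out;

    /* Data queued before the shutdown is still delivered... */
    if (read(sv[1], buf, sizeof(buf)) != 4 || memcmp(buf, "ping", 4) != 0)
        goto out;

    /* ...and the next read reports end-of-file (0 bytes). */
    if (read(sv[1], buf, sizeof(buf)) != 0)
        goto out;

    ret = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Calling `half_close_demo()` from a test program should return 0 on a system where AF_UNIX socket pairs are available.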

struct socket

general BSD socket

Definition:

struct socket {
    socket_state state;
    short type;
    unsigned long flags;
    struct file *file;
    struct sock *sk;
    const struct proto_ops *ops;
    struct socket_wq wq;
};

Members

state

socket state (SS_CONNECTED, etc)

type

socket type (SOCK_STREAM, etc)

flags

socket flags (SOCK_NOSPACE, etc)

file

File back pointer for gc

sk

internal networking protocol agnostic socket representation

ops

protocol specific socket operations

wq

wait queue for several uses

Socket Buffer Functions

unsigned int skb_frag_size(const skb_frag_t *frag)

Returns the size of a skb fragment

Parameters

const skb_frag_t *frag

skb fragment

void skb_frag_size_set(skb_frag_t *frag, unsigned int size)

Sets the size of a skb fragment

Parameters

skb_frag_t *frag

skb fragment

unsigned int size

size of fragment

void skb_frag_size_add(skb_frag_t *frag, int delta)

Increments the size of a skb fragment by delta

Parameters

skb_frag_t *frag

skb fragment

int delta

value to add

void skb_frag_size_sub(skb_frag_t *frag, int delta)

Decrements the size of a skb fragment by delta

Parameters

skb_frag_t *frag

skb fragment

int delta

value to subtract

bool skb_frag_must_loop(struct page *p)

Test if p is a high memory page

Parameters

struct page *p

fragment’s page

skb_frag_foreach_page

skb_frag_foreach_page(f, f_off, f_len, p, p_off, p_len, copied)

loop over pages in a fragment

Parameters

f

skb frag to operate on

f_off

offset from start of f->netmem

f_len

length from f_off to loop over

p

(temp var) current page

p_off

(temp var) offset from start of current page, non-zero only on first page.

p_len

(temp var) length in current page, < PAGE_SIZE only on first and last page.

copied

(temp var) length so far, excluding current p_len.

Description

A fragment can hold a compound page, in which case per-page operations, notably kmap_atomic, must be called for each regular page.
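The per-page walk can be pictured with a small userspace model. This is only a sketch of the arithmetic behind the p_off, p_len and copied temporaries, not the kernel macro: the names `MODEL_PAGE_SIZE`, `struct page_span` and `split_into_pages` are invented here, a 4 KiB page size is assumed, and the model always splits per page, whereas the macro only needs to do so when skb_frag_must_loop() says so.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* One step of the walk; the fields mirror the macro's temporaries. */
struct page_span {
    size_t page_idx;  /* index of the current page (stands in for p) */
    size_t p_off;     /* offset inside that page, non-zero only on the first */
    size_t p_len;     /* bytes to process in that page */
    size_t copied;    /* bytes handled before this page */
};

/*
 * Hypothetical helper: splits [f_off, f_off + f_len) into per-page spans.
 * Returns the number of spans written (at most max).
 */
size_t split_into_pages(size_t f_off, size_t f_len,
                        struct page_span *out, size_t max)
{
    size_t n = 0, copied = 0;
    size_t page_idx = f_off / MODEL_PAGE_SIZE;
    size_t p_off = f_off % MODEL_PAGE_SIZE;

    while (copied < f_len && n < max) {
        size_t room = MODEL_PAGE_SIZE - p_off;
        size_t left = f_len - copied;
        size_t p_len = left < room ? left : room;

        out[n].page_idx = page_idx;
        out[n].p_off = p_off;
        out[n].p_len = p_len;
        out[n].copied = copied;
        n++;

        copied += p_len;
        page_idx++;
        p_off = 0;      /* every page after the first starts at offset 0 */
    }
    return n;
}
```

With f_off = 4000 and f_len = 5000 the model yields three spans of 96, 4096 and 808 bytes, which matches the rule that only the first and last page see a length below the page size.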

struct skb_shared_hwtstamps

hardware time stamps

Definition:

struct skb_shared_hwtstamps {
    union {
        ktime_t hwtstamp;
        void *netdev_data;
    };
};

Members

{unnamed_union}

anonymous

hwtstamp

hardware time stamp transformed into duration since arbitrary point in time

netdev_data

address/cookie of network device driver used as reference to actual hardware time stamp

Description

Software time stamps generated by ktime_get_real() are stored in skb->tstamp.

hwtstamps can only be compared against other hwtstamps from the same device.

This structure is attached to packets as part of the skb_shared_info. Use skb_hwtstamps() to get a pointer.

struct sk_buff

socket buffer

Definition:

struct sk_buff {
    union {
        struct {
            struct sk_buff *next;
            struct sk_buff *prev;
            union {
                struct net_device *dev;
                unsigned long dev_scratch;
            };
        };
        struct rb_node rbnode;
        struct list_head list;
        struct llist_node ll_node;
    };
    struct sock *sk;
    union {
        ktime_t tstamp;
        u64 skb_mstamp_ns;
    };
    char cb[48];
    union {
        struct {
            unsigned long _skb_refdst;
            void (*destructor)(struct sk_buff *skb);
        };
        struct list_head tcp_tsorted_anchor;
#ifdef CONFIG_NET_SOCK_MSG
        unsigned long _sk_redir;
#endif
    };
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
    unsigned long _nfct;
#endif
    unsigned int len, data_len;
    __u16 mac_len, hdr_len;
    __u16 queue_mapping;
#ifdef __BIG_ENDIAN_BITFIELD
#define CLONED_MASK     (1 << 7)
#else
#define CLONED_MASK     1
#endif
#define CLONED_OFFSET   offsetof(struct sk_buff, __cloned_offset)
    __u8 cloned:1, nohdr:1, fclone:2, peeked:1, head_frag:1, pfmemalloc:1, pp_recycle:1;
#ifdef CONFIG_SKB_EXTENSIONS
    __u8 active_extensions;
#endif
    __u8 pkt_type:3;
    __u8 ignore_df:1;
    __u8 dst_pending_confirm:1;
    __u8 ip_summed:2;
    __u8 ooo_okay:1;
    __u8 tstamp_type:2;
#ifdef CONFIG_NET_XGRESS
    __u8 tc_at_ingress:1;
    __u8 tc_skip_classify:1;
#endif
    __u8 remcsum_offload:1;
    __u8 csum_complete_sw:1;
    __u8 csum_level:2;
    __u8 inner_protocol_type:1;
    __u8 l4_hash:1;
    __u8 sw_hash:1;
#ifdef CONFIG_WIRELESS
    __u8 wifi_acked_valid:1;
    __u8 wifi_acked:1;
#endif
    __u8 no_fcs:1;
    __u8 encapsulation:1;
    __u8 encap_hdr_csum:1;
    __u8 csum_valid:1;
#ifdef CONFIG_IPV6_NDISC_NODETYPE
    __u8 ndisc_nodetype:2;
#endif
#if IS_ENABLED(CONFIG_IP_VS)
    __u8 ipvs_property:1;
#endif
#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE) || IS_ENABLED(CONFIG_NF_TABLES)
    __u8 nf_trace:1;
#endif
#ifdef CONFIG_NET_SWITCHDEV
    __u8 offload_fwd_mark:1;
    __u8 offload_l3_fwd_mark:1;
#endif
    __u8 redirected:1;
#ifdef CONFIG_NET_REDIRECT
    __u8 from_ingress:1;
#endif
#ifdef CONFIG_NETFILTER_SKIP_EGRESS
    __u8 nf_skip_egress:1;
#endif
#ifdef CONFIG_SKB_DECRYPTED
    __u8 decrypted:1;
#endif
    __u8 slow_gro:1;
#if IS_ENABLED(CONFIG_IP_SCTP)
    __u8 csum_not_inet:1;
#endif
    __u8 unreadable:1;
#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
    __u16 tc_index;
#endif
    u16 alloc_cpu;
    union {
        __wsum csum;
        struct {
            __u16 csum_start;
            __u16 csum_offset;
        };
    };
    __u32 priority;
    int skb_iif;
    __u32 hash;
    union {
        u32 vlan_all;
        struct {
            __be16 vlan_proto;
            __u16 vlan_tci;
        };
    };
#if defined(CONFIG_NET_RX_BUSY_POLL) || defined(CONFIG_XPS)
    union {
        unsigned int napi_id;
        unsigned int sender_cpu;
    };
#endif
#ifdef CONFIG_NETWORK_SECMARK
    __u32 secmark;
#endif
    union {
        __u32 mark;
        __u32 reserved_tailroom;
    };
    union {
        __be16 inner_protocol;
        __u8 inner_ipproto;
    };
    __u16 inner_transport_header;
    __u16 inner_network_header;
    __u16 inner_mac_header;
    __be16 protocol;
    __u16 transport_header;
    __u16 network_header;
    __u16 mac_header;
#ifdef CONFIG_KCOV
    u64 kcov_handle;
#endif
    sk_buff_data_t tail;
    sk_buff_data_t end;
    unsigned char *head, *data;
    unsigned int truesize;
    refcount_t users;
#ifdef CONFIG_SKB_EXTENSIONS
    struct skb_ext *extensions;
#endif
};

Members

{unnamed_union}

anonymous

{unnamed_struct}

anonymous

next

Next buffer in list

prev

Previous buffer in list

{unnamed_union}

anonymous

dev

Device we arrived on/are leaving by

dev_scratch

(aka dev) alternate use of dev when dev would be NULL

rbnode

RB tree node, alternative to next/prev for netem/tcp

list

queue head

ll_node

anchor in an llist (e.g. socket defer_list)

sk

Socket we are owned by

{unnamed_union}

anonymous

tstamp

Time we arrived/left

skb_mstamp_ns

(aka tstamp) earliest departure time; start point for retransmit timer

cb

Control buffer. Free for use by every layer. Put private vars here

{unnamed_union}

anonymous

{unnamed_struct}

anonymous

_skb_refdst

destination entry (with norefcount bit)

destructor

Destruct function

tcp_tsorted_anchor

list structure for TCP (tp->tsorted_sent_queue)

_sk_redir

socket redirection information for skmsg

_nfct

Associated connection, if any (with nfctinfo bits)

len

Length of actual data

data_len

Data length

mac_len

Length of link layer header

hdr_len

writable header length of cloned skb

queue_mapping

Queue mapping for multiqueue devices

cloned

Head may be cloned (check refcnt to be sure)

nohdr

Payload reference only, must not modify header

fclone

skbuff clone status

peeked

this packet has been seen already, so stats have been done for it, don’t do them again

head_frag

skb was allocated from page fragments, not allocated by kmalloc() or vmalloc().

pfmemalloc

skbuff was allocated from PFMEMALLOC reserves

pp_recycle

mark the packet for recycling instead of freeing (implies page_pool support on driver)

active_extensions

active extensions (skb_ext_id types)

pkt_type

Packet class

ignore_df

allow local fragmentation

dst_pending_confirm

need to confirm neighbour

ip_summed

Driver fed us an IP checksum

ooo_okay

allow the mapping of a socket to a queue to be changed

tstamp_type

When set, skb->tstamp has the delivery_time clock base of skb->tstamp.

tc_at_ingress

used within tc_classify to distinguish in/egress

tc_skip_classify

do not classify packet. set by IFB device

remcsum_offload

remote checksum offload is enabled

csum_complete_sw

checksum was completed by software

csum_level

indicates the number of consecutive checksums found in the packet minus one that have been verified as CHECKSUM_UNNECESSARY (max 3)

inner_protocol_type

whether the inner protocol is ENCAP_TYPE_ETHER or ENCAP_TYPE_IPPROTO

l4_hash

indicate hash is a canonical 4-tuple hash over transport ports.

sw_hash

indicates hash was computed in software stack

wifi_acked_valid

wifi_acked was set

wifi_acked

whether frame was acked on wifi or not

no_fcs

Request NIC to treat last 4 bytes as Ethernet FCS

encapsulation

indicates the inner headers in the skbuff are valid

encap_hdr_csum

software checksum is needed

csum_valid

checksum is already valid

ndisc_nodetype

router type (from link layer)

ipvs_property

skbuff is owned by ipvs

nf_trace

netfilter packet trace flag

offload_fwd_mark

Packet was L2-forwarded in hardware

offload_l3_fwd_mark

Packet was L3-forwarded in hardware

redirected

packet was redirected by packet classifier

from_ingress

packet was redirected from the ingress path

nf_skip_egress

packet shall skip nf egress - see netfilter_netdev.h

decrypted

Decrypted SKB

slow_gro

state present at GRO time, slower prepare step required

csum_not_inet

use CRC32c to resolve CHECKSUM_PARTIAL

unreadable

indicates that at least 1 of the fragments in this skb is unreadable.

tc_index

Traffic control index

alloc_cpu

CPU which did the skb allocation.

{unnamed_union}

anonymous

csum

Checksum (must include start/offset pair)

{unnamed_struct}

anonymous

csum_start

Offset from skb->head where checksumming should start

csum_offset

Offset from csum_start where checksum should be stored

priority

Packet queueing priority

skb_iif

ifindex of device we arrived on

hash

the packet hash

{unnamed_union}

anonymous

vlan_all

vlan fields (proto & tci)

{unnamed_struct}

anonymous

vlan_proto

vlan encapsulation protocol

vlan_tci

vlan tag control information

{unnamed_union}

anonymous

napi_id

id of the NAPI struct this skb came from

sender_cpu

(aka napi_id) source CPU in XPS

secmark

security marking

{unnamed_union}

anonymous

mark

Generic packet mark

reserved_tailroom

(aka mark) number of bytes of free space available at the tail of an sk_buff

{unnamed_union}

anonymous

inner_protocol

Protocol (encapsulation)

inner_ipproto

(aka inner_protocol) stores ipproto when skb->inner_protocol_type == ENCAP_TYPE_IPPROTO;

inner_transport_header

Inner transport layer header (encapsulation)

inner_network_header

Network layer header (encapsulation)

inner_mac_header

Link layer header (encapsulation)

protocol

Packet protocol from driver

transport_header

Transport layer header

network_header

Network layer header

mac_header

Link layer header

kcov_handle

KCOV remote handle for remote coverage collection

tail

Tail pointer

end

End pointer

head

Head of buffer

data

Data head pointer

truesize

Buffer size

users

User count - see {datagram,tcp}.c

extensions

allocated extensions, valid if active_extensions is nonzero

bool skb_pfmemalloc(const struct sk_buff *skb)

Test if the skb was allocated from PFMEMALLOC reserves

Parameters

const struct sk_buff *skb

buffer

struct dst_entry *skb_dst(const struct sk_buff *skb)

returns skb dst_entry

Parameters

const struct sk_buff *skb

buffer

Return

skb dst_entry, regardless of reference taken or not.

unsigned long skb_dstref_steal(struct sk_buff *skb)

return current dst_entry value and clear it

Parameters

struct sk_buff *skb

buffer

Description

Resets skb dst_entry without adjusting its reference count. Useful in cases where dst_entry needs to be temporarily reset and restored. Note that the returned value cannot be used directly because it might contain SKB_DST_NOREF bit.

When in doubt, prefer skb_dst_drop() over skb_dstref_steal() to correctly handle dst_entry reference counting.

Return

original skb dst_entry.

void skb_dstref_restore(struct sk_buff *skb, unsigned long refdst)

restore skb dst_entry removed via skb_dstref_steal()

Parameters

struct sk_buff *skb

buffer

unsigned long refdst

dst entry from a call to skb_dstref_steal()

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

sets skb dst

Parameters

struct sk_buff *skb

buffer

struct dst_entry *dst

dst entry

Description

Sets skb dst, assuming a reference was taken on dst and should be released by skb_dst_drop()

void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst)

sets skb dst, hopefully, without taking reference

Parameters

struct sk_buff *skb

buffer

struct dst_entry *dst

dst entry

Description

Sets skb dst, assuming a reference was not taken on dst. If dst entry is cached, we do not take reference and dst_release will be avoided by refdst_drop. If dst entry is not cached, we take reference, so that last dst_release can destroy the dst immediately.

bool skb_dst_is_noref(const struct sk_buff *skb)

Test if skb dst isn’t refcounted

Parameters

const struct sk_buff *skb

buffer
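The warning that a stolen value "cannot be used directly" is easier to see with a toy model in which the pointer and a no-reference flag share one word. The sketch below is userspace C under stated assumptions: every `model_*` and `MODEL_*` name is invented for illustration, and the choice of bit 0 as the flag is an assumption of this model rather than something this document specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DST_NOREF   ((uintptr_t)1)      /* assumed flag position: bit 0 */
#define MODEL_DST_PTRMASK (~MODEL_DST_NOREF)

struct model_dst { int id; };                 /* aligned, so bit 0 of its address is free */
struct model_skb { uintptr_t refdst; };       /* pointer and flag packed together */

/* Store dst without "taking a reference": remember that via the flag bit. */
void model_dst_set_noref(struct model_skb *skb, struct model_dst *dst)
{
    skb->refdst = (uintptr_t)dst | MODEL_DST_NOREF;
}

/* Return the dst pointer with the flag bit masked off. */
struct model_dst *model_dst(const struct model_skb *skb)
{
    return (struct model_dst *)(skb->refdst & MODEL_DST_PTRMASK);
}

bool model_dst_is_noref(const struct model_skb *skb)
{
    return (skb->refdst & MODEL_DST_NOREF) && model_dst(skb) != NULL;
}

/* Hand back the raw word (flag included) and clear the field. */
uintptr_t model_dstref_steal(struct model_skb *skb)
{
    uintptr_t refdst = skb->refdst;

    skb->refdst = 0;
    return refdst;
}

/* Put back a word previously obtained from model_dstref_steal(). */
void model_dstref_restore(struct model_skb *skb, uintptr_t refdst)
{
    skb->refdst = refdst;
}
```

In this model the stolen word differs from the real pointer whenever the flag is set, so it is only meaningful as an opaque token to pass back to the restore helper.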

unsigned int skb_napi_id(const struct sk_buff *skb)

Returns the skb’s NAPI id

Parameters

const struct sk_buff *skb

buffer

bool skb_unref(struct sk_buff *skb)

decrement the skb’s reference count

Parameters

struct sk_buff *skb

buffer

Return

true if we can free the skb.

void kfree_skb(struct sk_buff *skb)

free an sk_buff with ‘NOT_SPECIFIED’ reason

Parameters

struct sk_buff *skb

buffer to free

struct sk_buff *alloc_skb(unsigned int size, gfp_t priority)

allocate a network buffer

Parameters

unsigned int size

size to allocate

gfp_t priority

allocation mask

Description

This function is a convenient wrapper around __alloc_skb().

bool skb_fclone_busy(const struct sock *sk, const struct sk_buff *skb)

check if fclone is busy

Parameters

const struct sock *sk

socket

const struct sk_buff *skb

buffer

Return

true if skb is a fast clone, and its clone is not freed. Some drivers call skb_orphan() in their ndo_start_xmit(), so we also check that didn’t happen.

struct sk_buff *alloc_skb_fclone(unsigned int size, gfp_t priority)

allocate a network buffer from fclone cache

Parameters

unsigned int size

size to allocate

gfp_t priority

allocation mask

Description

This function is a convenient wrapper around __alloc_skb().

int skb_pad(struct sk_buff *skb, int pad)

zero pad the tail of an skb

Parameters

struct sk_buff *skb

buffer to pad

int pad

space to pad

Description

Ensure that a buffer is followed by a padding area that is zero filled. Used by network drivers which may DMA or transfer data beyond the buffer end onto the wire.

May return error in out of memory cases. The skb is freed on error.

int skb_queue_empty(const struct sk_buff_head *list)

check if a queue is empty

Parameters

const struct sk_buff_head *list

queue head

Description

Returns true if the queue is empty, false otherwise.

bool skb_queue_empty_lockless(const struct sk_buff_head *list)

check if a queue is empty

Parameters

const struct sk_buff_head *list

queue head

Description

Returns true if the queue is empty, false otherwise. This variant can be used in lockless contexts.

bool skb_queue_is_last(const struct sk_buff_head *list, const struct sk_buff *skb)

check if skb is the last entry in the queue

Parameters

const struct sk_buff_head *list

queue head

const struct sk_buff *skb

buffer

Description

Returns true if skb is the last buffer on the list.

bool skb_queue_is_first(const struct sk_buff_head *list, const struct sk_buff *skb)

check if skb is the first entry in the queue

Parameters

const struct sk_buff_head *list

queue head

const struct sk_buff *skb

buffer

Description

Returns true if skb is the first buffer on the list.

struct sk_buff *skb_queue_next(const struct sk_buff_head *list, const struct sk_buff *skb)

return the next packet in the queue

Parameters

const struct sk_buff_head *list

queue head

const struct sk_buff *skb

current buffer

Description

Return the next packet in list after skb. It is only valid to call this if skb_queue_is_last() evaluates to false.

struct sk_buff *skb_queue_prev(const struct sk_buff_head *list, const struct sk_buff *skb)

return the prev packet in the queue

Parameters

const struct sk_buff_head *list

queue head

const struct sk_buff *skb

current buffer

Description

Return the prev packet in list before skb. It is only valid to call this if skb_queue_is_first() evaluates to false.
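The precondition on the next/prev helpers follows from the queue being circular: the neighbour of the last buffer is the queue head itself, not a buffer. The sketch below is a userspace model of that layout, not kernel code. All `model_*` names and `struct node` are invented here, and the shared first member stands in for whatever layout the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *next, *prev; };

/* Buffers and the queue head both start with a node, so the head
 * can serve as the sentinel of a circular doubly linked list. */
struct model_skb  { struct node node; int id; };
struct model_head { struct node node; unsigned int qlen; };

void model_queue_init(struct model_head *list)
{
    list->node.next = list->node.prev = &list->node;
    list->qlen = 0;
}

void model_queue_tail(struct model_head *list, struct model_skb *skb)
{
    struct node *last = list->node.prev;

    skb->node.prev = last;
    skb->node.next = &list->node;
    last->next = &skb->node;
    list->node.prev = &skb->node;
    list->qlen++;
}

bool model_queue_is_last(const struct model_head *list, const struct model_skb *skb)
{
    return skb->node.next == &list->node;
}

bool model_queue_is_first(const struct model_head *list, const struct model_skb *skb)
{
    return skb->node.prev == &list->node;
}

/* Only valid when skb is not the last buffer: otherwise "next" is the head. */
struct model_skb *model_queue_next(const struct model_head *list, const struct model_skb *skb)
{
    assert(!model_queue_is_last(list, skb));
    return (struct model_skb *)skb->node.next;
}

/* Walk the queue front to back, honouring the precondition above. */
int model_sum_ids(const struct model_head *list)
{
    const struct model_skb *skb;
    int sum = 0;

    if (list->node.next == &list->node)
        return 0;                       /* empty queue */

    skb = (const struct model_skb *)list->node.next;
    for (;;) {
        sum += skb->id;
        if (model_queue_is_last(list, skb))
            break;
        skb = model_queue_next(list, skb);
    }
    return sum;
}
```

The walk checks the is-last condition before every step, which is the usage pattern the description asks for.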

struct sk_buff *skb_get(struct sk_buff *skb)

reference buffer

Parameters

struct sk_buff *skb

buffer to reference

Description

Makes another reference to a socket buffer and returns a pointer to the buffer.

int skb_cloned(const struct sk_buff *skb)

is the buffer a clone

Parameters

const struct sk_buff *skb

buffer to check

Description

Returns true if the buffer was generated with skb_clone() and is one of multiple shared copies of the buffer. Cloned buffers are shared data so must not be written to under normal circumstances.

int skb_header_cloned(const struct sk_buff *skb)

is the header a clone

Parameters

const struct sk_buff *skb

buffer to check

Description

Returns true if modifying the header part of the buffer requires the data to be copied.

void __skb_header_release(struct sk_buff *skb)

allow clones to use the headroom

Parameters

struct sk_buff *skb

buffer to operate on

Description

See “DOC: dataref and headerless skbs”.

int skb_shared(const struct sk_buff *skb)

is the buffer shared

Parameters

const struct sk_buff *skb

buffer to check

Description

Returns true if more than one user holds a reference to this buffer.
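The relationship between taking a reference, testing for sharing and dropping a reference can be sketched with a plain atomic counter. This is a userspace model under stated assumptions, not the kernel implementation: the `model_*` names are invented, and C11 atomics stand in for whatever reference-count type the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_skb { atomic_int users; };   /* models the users count */

/* A freshly allocated buffer starts with a single reference. */
void model_skb_init(struct model_skb *skb)
{
    atomic_init(&skb->users, 1);
}

/* Take another reference and return the same buffer. */
struct model_skb *model_skb_get(struct model_skb *skb)
{
    atomic_fetch_add(&skb->users, 1);
    return skb;
}

/* Shared means: someone besides the caller holds a reference. */
bool model_skb_shared(struct model_skb *skb)
{
    return atomic_load(&skb->users) != 1;
}

/* Drop one reference; true means it was the last and the buffer can be freed. */
bool model_skb_unref(struct model_skb *skb)
{
    return atomic_fetch_sub(&skb->users, 1) == 1;
}
```

Only the caller that sees `true` from the unref helper may free the buffer; every other caller must treat the buffer as still in use by someone else.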

struct sk_buff *skb_share_check(struct sk_buff *skb, gfp_t pri)

check if buffer is shared and if so clone it

Parameters

struct sk_buff *skb

buffer to check

gfp_t pri

priority for memory allocation

Description

If the buffer is shared the buffer is cloned and the old copy drops a reference. A new clone with a single reference is returned. If the buffer is not shared the original buffer is returned. When called from interrupt context or with spinlocks held, pri must be GFP_ATOMIC.

NULL is returned on a memory allocation failure.
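The "clone if shared, otherwise pass through" contract can be modelled in a few lines of userspace C. This is a sketch, not the kernel function: `struct model_skb` and `model_share_check` are invented names, malloc() stands in for the kernel allocator, and dropping the caller's reference even when the allocation fails is a choice made by this model.

```c
#include <assert.h>
#include <stdlib.h>

struct model_skb {
    int users;     /* reference count */
    int payload;   /* stands in for the buffer contents */
};

/*
 * Hypothetical model: returns a buffer the caller holds the only
 * reference to, or NULL if a needed clone could not be allocated.
 */
struct model_skb *model_share_check(struct model_skb *skb)
{
    struct model_skb *clone;

    if (skb->users == 1)
        return skb;            /* not shared: the original is returned */

    clone = malloc(sizeof(*clone));
    skb->users--;              /* the caller's reference on the old copy is dropped */
    if (!clone)
        return NULL;

    clone->users = 1;          /* the new clone has a single reference */
    clone->payload = skb->payload;
    return clone;
}
```

A caller would typically overwrite its own pointer with the return value and bail out on NULL, since the original may no longer be its to use.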

struct sk_buff *skb_unshare(struct sk_buff *skb, gfp_t pri)

make a copy of a shared buffer

Parameters

struct sk_buff *skb

buffer to check

gfp_t pri

priority for memory allocation

Description

If the socket buffer is a clone then this function creates a new copy of the data, drops a reference count on the old copy and returns the new copy with the reference count at 1. If the buffer is not a clone the original buffer is returned. When called with a spinlock held or from interrupt context, pri must be GFP_ATOMIC.

NULL is returned on a memory allocation failure.

struct sk_buff *skb_peek(const struct sk_buff_head *list_)

peek at the head of an sk_buff_head

Parameters

const struct sk_buff_head *list_

list to peek at

Description

Peek an sk_buff. Unlike most other operations you _MUST_ be careful with this one. A peek leaves the buffer on the list and someone else may run off with it. You must hold the appropriate locks or have a private queue to do this.

Returns NULL for an empty list or a pointer to the head element. The reference count is not incremented and the reference is therefore volatile. Use with caution.

struct sk_buff *__skb_peek(const struct sk_buff_head *list_)

peek at the head of a non-empty sk_buff_head

Parameters

const struct sk_buff_head *list_

list to peek at

Description

Like skb_peek(), but the caller knows that the list is not empty.

struct sk_buff *skb_peek_next(struct sk_buff *skb, const struct sk_buff_head *list_)

peek skb following the given one from a queue

Parameters

struct sk_buff *skb

skb to start from

const struct sk_buff_head *list_

list to peek at

Description

Returns NULL when the end of the list is met or a pointer to the next element. The reference count is not incremented and the reference is therefore volatile. Use with caution.

struct sk_buff *skb_peek_tail(const struct sk_buff_head *list_)

peek at the tail of an sk_buff_head

Parameters

const struct sk_buff_head *list_

list to peek at

Description

Peek an sk_buff. Unlike most other operations you _MUST_ be careful with this one. A peek leaves the buffer on the list and someone else may run off with it. You must hold the appropriate locks or have a private queue to do this.

Returns NULL for an empty list or a pointer to the tail element. The reference count is not incremented and the reference is therefore volatile. Use with caution.

__u32 skb_queue_len(const struct sk_buff_head *list_)

get queue length

Parameters

const struct sk_buff_head *list_

list to measure

Description

Return the length of an sk_buff queue.

__u32 skb_queue_len_lockless(const struct sk_buff_head *list_)

get queue length

Parameters

const struct sk_buff_head *list_

list to measure

Description

Return the length of an sk_buff queue. This variant can be used in lockless contexts.

void __skb_queue_head_init(struct sk_buff_head *list)

initialize non-spinlock portions of sk_buff_head

Parameters

struct sk_buff_head *list

queue to initialize

Description

This initializes only the list and queue length aspects of an sk_buff_head object. This makes it possible to initialize the list aspects of an sk_buff_head without reinitializing things like the spinlock. It can also be used for on-stack sk_buff_head objects where the spinlock is known to not be used.

void skb_queue_splice(const struct sk_buff_head *list, struct sk_buff_head *head)

join two skb lists, this is designed for stacks

Parameters

const struct sk_buff_head *list

the new list to add

struct sk_buff_head *head

the place to add it in the first list

void skb_queue_splice_init(struct sk_buff_head *list, struct sk_buff_head *head)

join two skb lists and reinitialise the emptied list

Parameters

struct sk_buff_head *list

the new list to add

struct sk_buff_head *head

the place to add it in the first list

Description

The list at list is reinitialised

void skb_queue_splice_tail(const struct sk_buff_head *list, struct sk_buff_head *head)

join two skb lists, each list being a queue

Parameters

const struct sk_buff_head *list

the new list to add

struct sk_buff_head *head

the place to add it in the first list

void skb_queue_splice_tail_init(struct sk_buff_head *list, struct sk_buff_head *head)

join two skb lists and reinitialise the emptied list

Parameters

struct sk_buff_head *list

the new list to add

struct sk_buff_head *head

the place to add it in the first list

Description

Each of the lists is a queue. The list at list is reinitialised
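What "join at the tail and reinitialise the emptied list" means for the link pointers can be shown with a small userspace model of a circular doubly linked list. The code is a sketch, not the kernel implementation, and `struct node`, `struct model_head` and the `model_*` functions are names invented for it.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next, *prev; };
struct model_head { struct node node; unsigned int qlen; };

void model_head_init(struct model_head *h)
{
    h->node.next = h->node.prev = &h->node;
    h->qlen = 0;
}

void model_add_tail(struct model_head *h, struct node *n)
{
    struct node *last = h->node.prev;

    n->prev = last;
    n->next = &h->node;
    last->next = n;
    h->node.prev = n;
    h->qlen++;
}

/* Append every element of list to the tail of head, then empty list. */
void model_splice_tail_init(struct model_head *list, struct model_head *head)
{
    struct node *first, *last, *tail;

    if (list->qlen == 0)
        return;

    first = list->node.next;
    last = list->node.prev;
    tail = head->node.prev;

    /* Hook the donor chain between head's old tail and head itself. */
    first->prev = tail;
    tail->next = first;
    last->next = &head->node;
    head->node.prev = last;

    head->qlen += list->qlen;
    model_head_init(list);      /* the donor list is left empty but usable */
}
```

Only four pointers and one counter change regardless of how many buffers move, which is why splicing is preferable to dequeuing and requeuing each buffer.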

void __skb_queue_after(struct sk_buff_head *list, struct sk_buff *prev, struct sk_buff *newsk)

queue a buffer after a given buffer in the list

Parameters

struct sk_buff_head *list

list to use

struct sk_buff *prev

place after this buffer

struct sk_buff *newsk

buffer to queue

Description

Queue a buffer in the middle of a list. This function takes no locks and you must therefore hold required locks before calling it.

A buffer cannot be placed on two lists at the same time.

void __skb_queue_head(struct sk_buff_head *list, struct sk_buff *newsk)

queue a buffer at the list head

Parameters

struct sk_buff_head *list

list to use

struct sk_buff *newsk

buffer to queue

Description

Queue a buffer at the start of a list. This function takes no locks and you must therefore hold required locks before calling it.

A buffer cannot be placed on two lists at the same time.

void __skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk)

queue a buffer at the list tail

Parameters

struct sk_buff_head *list

list to use

struct sk_buff *newsk

buffer to queue

Description

Queue a buffer at the end of a list. This function takes no locks and you must therefore hold required locks before calling it.

A buffer cannot be placed on two lists at the same time.

struct sk_buff *__skb_dequeue(struct sk_buff_head *list)

remove from the head of the queue

Parameters

struct sk_buff_head *list

list to dequeue from

Description

Remove the head of the list. This function does not take any locks so must be used with appropriate locks held only. The head item is returned or NULL if the list is empty.

struct sk_buff *__skb_dequeue_tail(struct sk_buff_head *list)

remove from the tail of the queue

Parameters

struct sk_buff_head *list

list to dequeue from

Description

Remove the tail of the list. This function does not take any locks so must be used with appropriate locks held only. The tail item is returned or NULL if the list is empty.
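The lock-free queue primitives can be modelled in userspace to show how head and tail insertion and removal interact. This is a sketch under stated assumptions: the `model_*` names and `struct node` are invented, no locking is modelled at all, and the only claim is that the model follows the behaviour described above (insert at either end, remove from either end, NULL on an empty list).

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next, *prev; };
struct model_skb  { struct node node; int id; };
struct model_head { struct node node; unsigned int qlen; };

void model_queue_head_init(struct model_head *list)
{
    list->node.next = list->node.prev = &list->node;
    list->qlen = 0;
}

/* Link newn directly after prev; the building block for both ends. */
static void model_insert_after(struct model_head *list, struct node *prev,
                               struct node *newn)
{
    struct node *next = prev->next;

    newn->prev = prev;
    newn->next = next;
    next->prev = newn;
    prev->next = newn;
    list->qlen++;
}

void model_queue_head(struct model_head *list, struct model_skb *skb)
{
    model_insert_after(list, &list->node, &skb->node);
}

void model_queue_tail(struct model_head *list, struct model_skb *skb)
{
    model_insert_after(list, list->node.prev, &skb->node);
}

/* Unlink n unless it is the head sentinel, which means the list is empty. */
static struct model_skb *model_unlink(struct model_head *list, struct node *n)
{
    if (n == &list->node)
        return NULL;

    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;   /* the buffer is on no list any more */
    list->qlen--;
    return (struct model_skb *)n;
}

struct model_skb *model_dequeue(struct model_head *list)
{
    return model_unlink(list, list->node.next);
}

struct model_skb *model_dequeue_tail(struct model_head *list)
{
    return model_unlink(list, list->node.prev);
}
```

Because each buffer carries exactly one pair of link pointers, it can sit on only one list at a time, which matches the restriction stated for the queueing functions.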

void skb_len_add(struct sk_buff *skb, int delta)

adds a number to len fields of skb

Parameters

struct sk_buff *skb

buffer to add len to

int delta

number of bytes to add

void __skb_fill_netmem_desc(struct sk_buff *skb, int i, netmem_ref netmem, int off, int size)

initialise a fragment in an skb

Parameters

struct sk_buff *skb

buffer containing fragment to be initialised

int i

fragment index to initialise

netmem_ref netmem

the netmem to use for this fragment

int off

the offset to the data within netmem

int size

the length of the data

Description

Initialises the i’th fragment of skb to point to size bytes at offset off within netmem.

Does not take any additional reference on the fragment.

void skb_fill_page_desc(struct sk_buff *skb, int i, struct page *page, int off, int size)

initialise a paged fragment in an skb

Parameters

struct sk_buff *skb

buffer containing fragment to be initialised

int i

paged fragment index to initialise

struct page *page

the page to use for this fragment

int off

the offset to the data within page

int size

the length of the data

Description

As per __skb_fill_page_desc() -- initialises the i’th fragment of skb to point to size bytes at offset off within page. In addition updates skb such that i is the last fragment.

Does not take any additional reference on the fragment.

void skb_fill_page_desc_noacc(struct sk_buff *skb, int i, struct page *page, int off, int size)

initialise a paged fragment in an skb

Parameters

struct sk_buff *skb

buffer containing fragment to be initialised

int i

paged fragment index to initialise

struct page *page

the page to use for this fragment

int off

the offset to the data within page

int size

the length of the data

Description

Variant of skb_fill_page_desc() which does not deal with pfmemalloc, if page is not owned by us.

unsigned int skb_headroom(const struct sk_buff *skb)

bytes at buffer head

Parameters

const struct sk_buff *skb

buffer to check

Description

Return the number of bytes of free space at the head of an sk_buff.

int skb_tailroom(const struct sk_buff *skb)

bytes at buffer end

Parameters

const struct sk_buff *skb

buffer to check

Description

Return the number of bytes of free space at the tail of an sk_buff

int skb_availroom(const struct sk_buff *skb)

bytes at buffer end

Parameters

const struct sk_buff *skb

buffer to check

Description

Return the number of bytes of free space at the tail of an sk_buff allocated by sk_stream_alloc()

void skb_reserve(struct sk_buff *skb, int len)

adjust headroom

Parameters

struct sk_buff *skb

buffer to alter

int len

bytes to move

Description

Increase the headroom of an empty sk_buff by reducing the tail room. This is only allowed for an empty buffer.
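Headroom, tailroom and reserving are easiest to follow with the four buffer pointers side by side. The following is a simplified userspace model, not the kernel structure: `struct model_skb` and the `model_*` functions are invented for this sketch, all four fields are plain pointers, and only a linear buffer is considered.

```c
#include <assert.h>
#include <stddef.h>

struct model_skb {
    unsigned char *head;  /* start of the allocation */
    unsigned char *data;  /* start of valid data */
    unsigned char *tail;  /* end of valid data */
    unsigned char *end;   /* end of the allocation */
};

/* An empty buffer: no headroom, everything is tailroom. */
void model_skb_init(struct model_skb *skb, unsigned char *buf, size_t size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
}

/* Free bytes in front of the data. */
size_t model_headroom(const struct model_skb *skb)
{
    return (size_t)(skb->data - skb->head);
}

/* Free bytes behind the data. */
size_t model_tailroom(const struct model_skb *skb)
{
    return (size_t)(skb->end - skb->tail);
}

/* Trade tailroom for headroom by sliding data and tail forward together. */
void model_reserve(struct model_skb *skb, size_t len)
{
    assert(skb->data == skb->tail);        /* only allowed on an empty buffer */
    assert(len <= model_tailroom(skb));
    skb->data += len;
    skb->tail += len;
}
```

Reserving 16 bytes in a 128-byte model buffer leaves 16 bytes of headroom and 112 bytes of tailroom; the total stays constant because no data has been added yet.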

void skb_tailroom_reserve(struct sk_buff *skb, unsigned int mtu, unsigned int needed_tailroom)

adjust reserved_tailroom

Parameters

struct sk_buff *skb

buffer to alter

unsigned int mtu

maximum amount of headlen permitted

unsigned int needed_tailroom

minimum amount of reserved_tailroom

Description

Set reserved_tailroom so that headlen can be as large as possible but not larger than mtu and tailroom cannot be smaller than needed_tailroom. The required headroom should already have been reserved before using this function.
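The rule stated above amounts to a small piece of arithmetic, sketched here as a pure userspace function. It is one reading of the description, not the kernel code: `model_reserved_tailroom` is a hypothetical name, and it takes the current tailroom as an explicit argument instead of reading it from a buffer.

```c
#include <assert.h>

/*
 * Hypothetical helper: given the current tailroom, pick the amount to
 * reserve so that the usable space is as large as possible, at most mtu,
 * while at least needed_tailroom bytes stay reserved.
 */
unsigned int model_reserved_tailroom(unsigned int tailroom, unsigned int mtu,
                                     unsigned int needed_tailroom)
{
    assert(needed_tailroom <= tailroom);

    if (mtu < tailroom - needed_tailroom)
        return tailroom - mtu;      /* plenty of space: cap usable bytes at mtu */

    return needed_tailroom;         /* tight on space: keep only the minimum reserve */
}
```

For example, with 2000 bytes of tailroom, an mtu of 1500 and 32 bytes needed, the model reserves 500 bytes and leaves exactly 1500 usable; with only 1000 bytes of tailroom it reserves the minimum of 32.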

bool skb_reset_transport_header_careful(struct sk_buff *skb)

conditionally reset transport header

Parameters

struct sk_buff *skb

buffer to alter

Description

Hardened version of skb_reset_transport_header().

Return

true if the operation was a success.

void pskb_trim_unique(struct sk_buff *skb, unsigned int len)

remove end from a paged unique (not cloned) buffer

Parameters

struct sk_buff *skb

buffer to alter

unsigned int len

new length

Description

This is identical to pskb_trim except that the caller knows that the skb is not cloned so we should never get an error due to out-of-memory.

void skb_orphan(struct sk_buff *skb)

orphan a buffer

Parameters

struct sk_buff *skb

buffer to orphan

Description

If a buffer currently has an owner then we call the owner’s destructor function and make the skb unowned. The buffer continues to exist but is no longer charged to its former owner.

int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)

orphan the frags contained in a buffer

Parameters

struct sk_buff *skb

buffer to orphan frags from

gfp_t gfp_mask

allocation mask for replacement pages

Description

For each frag in the SKB which needs a destructor (i.e. has an owner) create a copy of that frag and release the original page by calling the destructor.

void __skb_queue_purge_reason(struct sk_buff_head *list, enum skb_drop_reason reason)

empty a list

Parameters

struct sk_buff_head *list

list to empty

enum skb_drop_reason reason

drop reason

Description

Delete all buffers on an sk_buff list. Each buffer is removed from the list and one reference dropped. This function does not take the list lock and the caller must hold the relevant locks to use it.

void *netdev_alloc_frag(unsigned int fragsz)

allocate a page fragment

Parameters

unsigned int fragsz

fragment size

Description

Allocates a frag from a page for receive buffer. Uses GFP_ATOMIC allocations.

struct sk_buff *netdev_alloc_skb(struct net_device *dev, unsigned int length)

allocate an skbuff for rx on a specific device

Parameters

struct net_device *dev

network device to receive on

unsigned int length

length to allocate

Description

Allocate a new sk_buff and assign it a usage count of one. The buffer has unspecified headroom built in. Users should allocate the headroom they think they need without accounting for the built in space. The built in space is used for optimisations.

NULL is returned if there is no free memory. Although this function allocates memory it can be called from an interrupt.

struct page *__dev_alloc_pages(gfp_t gfp_mask, unsigned int order)

allocate page for network Rx

Parameters

gfp_t gfp_mask

allocation priority. Set __GFP_NOMEMALLOC if not for network Rx

unsigned int order

size of the allocation

Description

Allocate a new page.

NULL is returned if there is no free memory.

struct page *__dev_alloc_page(gfp_t gfp_mask)

allocate a page for network Rx

Parameters

gfp_t gfp_mask

allocation priority. Set __GFP_NOMEMALLOC if not for network Rx

Description

Allocate a new page.

NULL is returned if there is no free memory.

bool dev_page_is_reusable(const struct page *page)

check whether a page can be reused for network Rx

Parameters

const struct page *page

the page to test

Description

A page shouldn’t be considered for reusing/recycling if it was allocated under memory pressure or at a distant memory node.

Return

false if this page should be returned to page allocator, true otherwise.

void skb_propagate_pfmemalloc(const struct page *page, struct sk_buff *skb)

Propagate pfmemalloc if skb is allocated after RX page

Parameters

const struct page *page

The page that was allocated from skb_alloc_page

struct sk_buff *skb

The skb that may need pfmemalloc set

unsigned int skb_frag_off(const skb_frag_t *frag)

Returns the offset of a skb fragment

Parameters

const skb_frag_t *frag

the paged fragment

void skb_frag_off_add(skb_frag_t *frag, int delta)

Increments the offset of a skb fragment by delta

Parameters

skb_frag_t *frag

skb fragment

int delta

value to add

void skb_frag_off_set(skb_frag_t *frag, unsigned int offset)

Sets the offset of a skb fragment

Parameters

skb_frag_t *frag

skb fragment

unsigned int offset

offset of fragment

void skb_frag_off_copy(skb_frag_t *fragto, const skb_frag_t *fragfrom)

Sets the offset of a skb fragment from another fragment

Parameters

skb_frag_t *fragto

skb fragment where offset is set

const skb_frag_t *fragfrom

skb fragment offset is copied from

struct net_iov *skb_frag_net_iov(const skb_frag_t *frag)

retrieve the net_iov referred to by fragment

Parameters

const skb_frag_t *frag

the fragment

Return

the struct net_iov associated with frag. Returns NULL if this frag has no associated net_iov.

struct page *skb_frag_page(const skb_frag_t *frag)

retrieve the page referred to by a paged fragment

Parameters

const skb_frag_t *frag

the paged fragment

Return

the struct page associated with frag. Returns NULL if this frag has no associated page.

netmem_ref skb_frag_netmem(const skb_frag_t *frag)

retrieve the netmem referred to by a fragment

Parameters

const skb_frag_t *frag

the fragment

Return

the netmem_ref associated with frag.

void *skb_frag_address(const skb_frag_t *frag)

gets the address of the data contained in a paged fragment

Parameters

const skb_frag_t *frag

the paged fragment buffer

Return

the address of the data within frag. The page must already be mapped.

void *skb_frag_address_safe(const skb_frag_t *frag)

gets the address of the data contained in a paged fragment

Parameters

const skb_frag_t *frag

the paged fragment buffer

Return

the address of the data within frag. Checks that the page is mapped and returns NULL otherwise.

void skb_frag_page_copy(skb_frag_t *fragto, const skb_frag_t *fragfrom)

sets the page in a fragment from another fragment

Parameters

skb_frag_t *fragto

skb fragment where page is set

const skb_frag_t *fragfrom

skb fragment page is copied from

dma_addr_t __skb_frag_dma_map(struct device *dev, const skb_frag_t *frag, size_t offset, size_t size, enum dma_data_direction dir)

maps a paged fragment via the DMA API

Parameters

struct device *dev

the device to map the fragment to

const skb_frag_t *frag

the paged fragment to map

size_t offset

the offset within the fragment (starting at the fragment’s own offset)

size_t size

the number of bytes to map

enum dma_data_direction dir

the direction of the mapping (PCI_DMA_*)

Description

Maps the page associated with frag to device.

int skb_clone_writable(const struct sk_buff *skb, unsigned int len)

is the header of a clone writable

Parameters

const struct sk_buff *skb

buffer to check

unsigned int len

length up to which to write

Description

Returns true if modifying the header part of the cloned buffer does not require the data to be copied.

int skb_cow(struct sk_buff *skb, unsigned int headroom)

copy header of skb when it is required

Parameters

struct sk_buff *skb

buffer to cow

unsigned int headroom

needed headroom

Description

If the skb passed lacks sufficient headroom or its data part is shared, data is reallocated. If reallocation fails, an error is returned and original skb is not changed.

The result is skb with writable area skb->head...skb->tail and at least headroom of space at head.
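The condition under which the description says data is reallocated can be written as a small predicate. This is a user-space sketch under stated assumptions: struct demo_cow_state and its fields are hypothetical and only model "available headroom" and "data part is shared"; they are not kernel structures.

```c
#include <stdbool.h>

/* Hypothetical model of the two properties the description mentions. */
struct demo_cow_state {
    unsigned int headroom;  /* bytes currently free in front of the data */
    bool cloned;            /* data part is shared with another buffer */
};

/* True when the buffer would have to be reallocated before writing:
 * either the requested headroom is not available or the data is shared. */
static bool demo_cow_needs_copy(const struct demo_cow_state *skb,
                                unsigned int headroom)
{
    return headroom > skb->headroom || skb->cloned;
}
```

When the predicate is false, the buffer is already private and roomy enough, so the copy-on-write step is a no-op.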

int skb_cow_head(struct sk_buff *skb, unsigned int headroom)

skb_cow but only making the head writable

Parameters

struct sk_buff *skb

buffer to cow

unsigned int headroom

needed headroom

Description

This function is identical to skb_cow except that we replace the skb_cloned check by skb_header_cloned. It should be used when you only need to push on some header and do not need to modify the data.

int skb_padto(struct sk_buff *skb, unsigned int len)

pad an skbuff up to a minimal size

Parameters

struct sk_buff *skb

buffer to pad

unsigned int len

minimal length

Description

Pads up a buffer to ensure the trailing bytes exist and are blanked. If the buffer already contains sufficient data it is untouched. Otherwise it is extended. Returns zero on success. The skb is freed on error.

int __skb_put_padto(struct sk_buff *skb, unsigned int len, bool free_on_error)

increase size and pad an skbuff up to a minimal size

Parameters

struct sk_buff *skb

buffer to pad

unsigned int len

minimal length

bool free_on_error

free buffer on error

Description

Pads up a buffer to ensure the trailing bytes exist and are blanked. If the buffer already contains sufficient data it is untouched. Otherwise it is extended. Returns zero on success. The skb is freed on error if free_on_error is true.

int skb_put_padto(struct sk_buff *skb, unsigned int len)

increase size and pad an skbuff up to a minimal size

Parameters

struct sk_buff *skb

buffer to pad

unsigned int len

minimal length

Description

Pads up a buffer to ensure the trailing bytes exist and are blanked. If the buffer already contains sufficient data it is untouched. Otherwise it is extended. Returns zero on success. The skb is freed on error.
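The padding contract shared by the helpers above can be modelled on a plain heap buffer. This is only a sketch: struct demo_buf and demo_pad() are hypothetical names, and the put flag encodes the assumption, suggested by the one-line summaries, that the "put" variants also increase the length while plain padding only guarantees that zeroed trailing bytes exist.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for a linear buffer; not an sk_buff. */
struct demo_buf {
    unsigned char *data;
    size_t len;  /* bytes currently counted as packet data */
    size_t cap;  /* bytes allocated behind data */
};

/*
 * Ensure at least min_len bytes exist and that the bytes past len are zero.
 * With put != 0 the length is also raised to min_len.
 * Returns 0 on success; on allocation failure the buffer is freed and -1
 * is returned, mirroring the "freed on error" contract.
 */
static int demo_pad(struct demo_buf *b, size_t min_len, int put)
{
    if (b->len >= min_len)
        return 0;                       /* already long enough: untouched */

    if (b->cap < min_len) {
        unsigned char *p = realloc(b->data, min_len);
        if (!p) {
            free(b->data);
            b->data = NULL;
            b->len = b->cap = 0;
            return -1;
        }
        b->data = p;
        b->cap = min_len;
    }
    memset(b->data + b->len, 0, min_len - b->len);  /* blank the tail */
    if (put)
        b->len = min_len;
    return 0;
}
```

Because the buffer is freed on failure, a caller must not touch it after a nonzero return; that matches the common driver idiom of returning straight away when padding fails.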

int skb_linearize(struct sk_buff *skb)

convert paged skb to linear one

Parameters

struct sk_buff *skb

buffer to linearize

Description

If there is no free memory -ENOMEM is returned, otherwise zero is returned and the old skb data released.

bool skb_has_shared_frag(const struct sk_buff *skb)

can any frag be overwritten

Parameters

const struct sk_buff *skb

buffer to test

Return

true if the skb has at least one frag that might be modified by an external entity (as in vmsplice()/sendfile())

int skb_linearize_cow(struct sk_buff *skb)

make sure skb is linear and writable

Parameters

struct sk_buff *skb

buffer to process

Description

If there is no free memory -ENOMEM is returned, otherwise zero is returned and the old skb data released.

void skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len)

update checksum for received skb after pull

Parameters

struct sk_buff *skb

buffer to update

const void *start

start of data before pull

unsigned int len

length of data pulled

Description

After doing a pull on a received packet, you need to call this to update the CHECKSUM_COMPLETE checksum, or set ip_summed to CHECKSUM_NONE so that it can be recomputed from scratch.
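Why a pull requires a checksum update can be shown with plain ones'-complement arithmetic: a full-packet sum still includes the pulled bytes, so their contribution has to be subtracted. The sketch below works on a byte array in user space; the demo_* helpers are hypothetical, and the even-length restriction is an assumption made only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of big-endian 16-bit words, kept folded to 16 bits.
 * len is assumed to be even to keep the sketch short. */
static uint32_t demo_csum_partial(const uint8_t *p, size_t len, uint32_t sum)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
        sum = (sum & 0xffff) + (sum >> 16);     /* end-around carry */
    }
    return sum;
}

/* Ones'-complement subtraction: a - b is computed as a + ~b. */
static uint32_t demo_csum_sub(uint32_t a, uint32_t b)
{
    uint32_t sum = a + (~b & 0xffff);
    return (sum & 0xffff) + (sum >> 16);
}

/* Remove the contribution of the len bytes at start from a full-packet sum. */
static uint32_t demo_postpull_csum(uint32_t csum, const uint8_t *start, size_t len)
{
    return demo_csum_sub(csum, demo_csum_partial(start, len, 0));
}
```

After the subtraction, the stored sum covers exactly the bytes that remain after the pull, which is the invariant a complete checksum is expected to keep.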

void skb_postpush_rcsum(struct sk_buff *skb, const void *start, unsigned int len)

update checksum for received skb after push

Parameters

struct sk_buff *skb

buffer to update

const void *start

start of data after push

unsigned int len

length of data pushed

Description

After doing a push on a received packet, you need to call this to update the CHECKSUM_COMPLETE checksum.

void *skb_push_rcsum(struct sk_buff *skb, unsigned int len)

push skb and update receive checksum

Parameters

struct sk_buff *skb

buffer to update

unsigned int len

length of data pushed

Description

This function performs an skb_push on the packet and updates the CHECKSUM_COMPLETE checksum. It should be used on receive path processing instead of skb_push unless you know that the checksum difference is zero (e.g., a valid IP header) or you are setting ip_summed to CHECKSUM_NONE.

int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len)

trim received skb and update checksum

Parameters

struct sk_buff *skb

buffer to trim

unsigned int len

new length

Description

This is exactly the same as pskb_trim except that it ensures the checksum of received packets is still valid after the operation. It can change skb pointers.

bool skb_needs_linearize(struct sk_buff *skb, netdev_features_t features)

check if we need to linearize a given skb depending on the given device features.

Parameters

struct sk_buff *skb

socket buffer to check

netdev_features_t features

net device features

Description

Returns true if either:

1. skb has frag_list and the device doesn’t support FRAGLIST, or

2. skb is fragmented and the device does not support SG.
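The two conditions above translate directly into a boolean expression. In this user-space sketch the DEMO_F_* bits and struct demo_skb_shape are hypothetical stand-ins for the device feature flags and for the buffer layout; they are not the kernel definitions.

```c
#include <stdbool.h>

#define DEMO_F_SG        (1u << 0)  /* stand-in for scatter-gather support */
#define DEMO_F_FRAGLIST  (1u << 1)  /* stand-in for frag_list support */

/* Hypothetical summary of how a buffer's data is laid out. */
struct demo_skb_shape {
    bool has_frag_list;     /* chained buffers hang off this one */
    unsigned int nr_frags;  /* number of paged fragments */
};

/* True when the device cannot consume the buffer's layout as is. */
static bool demo_needs_linearize(const struct demo_skb_shape *skb,
                                 unsigned int features)
{
    return (skb->has_frag_list && !(features & DEMO_F_FRAGLIST)) ||
           (skb->nr_frags > 0 && !(features & DEMO_F_SG));
}
```

A fully linear buffer never needs linearizing, whatever the feature set, because both conditions are false for it.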

void skb_get_timestamp(const struct sk_buff *skb, struct __kernel_old_timeval *stamp)

get timestamp from a skb

Parameters

const struct sk_buff *skb

skb to get stamp from

struct __kernel_old_timeval *stamp

pointer to struct __kernel_old_timeval to store stamp in

Description

Timestamps are stored in the skb as offsets to a base timestamp. This function converts the offset back to a struct timeval and stores it in stamp.

void skb_data_move(struct sk_buff *skb, const int len, const unsigned int n)

Move packet data and metadata after skb_push() or skb_pull().

Parameters

struct sk_buff *skb

packet to operate on

const int len

number of bytes pushed or pulled from sk_buff->data

const unsigned int n

number of bytes to memmove() from pre-push/pull sk_buff->data

Description

Moves n bytes of packet data, can be zero, and all bytes of skb metadata.

Assumes metadata is located immediately before sk_buff->data prior to the push/pull, and that sufficient headroom exists to hold it after an skb_push(). Otherwise, metadata is cleared and a one-time warning is issued.

Prefer skb_postpull_data_move() or skb_postpush_data_move() to calling this helper directly.

void skb_postpull_data_move(struct sk_buff *skb, const unsigned int len, const unsigned int n)

Move packet data and metadata after skb_pull().

Parameters

struct sk_buff *skb

packet to operate on

const unsigned int len

number of bytes pulled from sk_buff->data

const unsigned int n

number of bytes to memmove() from pre-pull sk_buff->data

Description

See skb_data_move() for details.

void skb_postpush_data_move(struct sk_buff *skb, const unsigned int len, const unsigned int n)

Move packet data and metadata after skb_push().

Parameters

struct sk_buff *skb

packet to operate on

const unsigned int len

number of bytes pushed onto sk_buff->data

const unsigned int n

number of bytes to memmove() from pre-push sk_buff->data

Description

See skb_data_move() for details.

void skb_complete_tx_timestamp(struct sk_buff *skb, struct skb_shared_hwtstamps *hwtstamps)

deliver cloned skb with tx timestamps

Parameters

struct sk_buff *skb

clone of the original outgoing packet

struct skb_shared_hwtstamps *hwtstamps

hardware time stamps

Description

PHY drivers may accept clones of transmitted packets for timestamping via their phy_driver.txtstamp method. These drivers must call this function to return the skb back to the stack with a timestamp.

void skb_tstamp_tx(struct sk_buff *orig_skb, struct skb_shared_hwtstamps *hwtstamps)

queue clone of skb with send time stamps

Parameters

struct sk_buff *orig_skb

the original outgoing packet

struct skb_shared_hwtstamps *hwtstamps

hardware time stamps, may be NULL if not available

Description

If the skb has a socket associated, then this function clones the skb (thus sharing the actual data and optional structures), stores the optional hardware time stamping information (if non NULL) or generates a software time stamp (otherwise), then queues the clone to the error queue of the socket. Errors are silently ignored.

void skb_tx_timestamp(struct sk_buff *skb)

Driver hook for transmit timestamping

Parameters

struct sk_buff *skb

A socket buffer.

Description

Ethernet MAC Drivers should call this function in their hard_xmit() function immediately before giving the sk_buff to the MAC hardware.

Specifically, one should make absolutely sure that this function is called before TX completion of this packet can trigger. Otherwise the packet could potentially already be freed.

void skb_complete_wifi_ack(struct sk_buff *skb, bool acked)

deliver skb with wifi status

Parameters

struct sk_buff *skb

the original outgoing packet

bool acked

ack status

__sum16 skb_checksum_complete(struct sk_buff *skb)

Calculate checksum of an entire packet

Parameters

struct sk_buff *skb

packet to process

Description

This function calculates the checksum over the entire packet plus the value of skb->csum. The latter can be used to supply the checksum of a pseudo header as used by TCP/UDP. It returns the checksum.

For protocols that contain complete checksums such as ICMP/TCP/UDP, this function can be used to verify that checksum on received packets. In that case the function should return zero if the checksum is correct. In particular, this function will return zero if skb->ip_summed is CHECKSUM_UNNECESSARY which indicates that the hardware has already verified the correctness of the checksum.
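The "returns zero if the checksum is correct" property follows from how the Internet checksum is defined: summing a message that already contains its own checksum field yields all ones, whose complement is zero. The following user-space sketch demonstrates this on a byte array; demo_checksum() is a hypothetical helper and does not model skb->csum or hardware offload states.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum: complement of the ones'-complement sum of the data,
 * read as big-endian 16-bit words. */
static uint16_t demo_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;     /* odd trailing byte, zero padded */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

To use it as a verifier, compute the checksum with the checksum field zeroed, store the result in that field, and run the function again over the whole message: an intact message gives zero, and a flipped bit gives a nonzero result.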

struct skb_ext

sk_buff extensions

Definition:

struct skb_ext {
    refcount_t refcnt;
    u8 offset[SKB_EXT_NUM];
    u8 chunks;
    char data[];
};

Members

refcnt

1 on allocation, deallocated on 0

offset

offset to add to data to obtain extension address

chunks

size currently allocated, stored in SKB_EXT_ALIGN_SHIFT units

data

start of extension data, variable sized

Note

offsets/lengths are stored in chunks of 8 bytes, this allows to use ‘u8’ types while allowing up to 2kb worth of extension data.
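The arithmetic behind the note is short: a u8 holds at most 255, and 255 chunks of 8 bytes is 2040 bytes, which is just under 2 KiB. A minimal sketch of the conversion follows; the DEMO_* constants and helper names are hypothetical, and the shift of 3 is taken from the "chunks of 8 bytes" statement above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EXT_CHUNK_SHIFT 3u   /* assumption: 1 chunk = 8 bytes */
#define DEMO_EXT_CHUNK_SIZE  (1u << DEMO_EXT_CHUNK_SHIFT)

/* Convert a chunk count stored in a u8 back to bytes. */
static unsigned int demo_chunks_to_bytes(uint8_t chunks)
{
    return (unsigned int)chunks << DEMO_EXT_CHUNK_SHIFT;
}

/* Convert a byte count to chunks, rounding up to a whole chunk. */
static uint8_t demo_bytes_to_chunks(unsigned int bytes)
{
    unsigned int chunks =
        (bytes + DEMO_EXT_CHUNK_SIZE - 1) >> DEMO_EXT_CHUNK_SHIFT;

    assert(chunks <= UINT8_MAX);  /* must still fit in a u8 */
    return (uint8_t)chunks;
}
```

Storing sizes and offsets in chunk units is what lets each of them fit in a single byte of the header.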

void skb_checksum_none_assert(const struct sk_buff *skb)

make sure skb ip_summed is CHECKSUM_NONE

Parameters

const struct sk_buff *skb

skb to check

Description

Fresh skbs have their ip_summed set to CHECKSUM_NONE. Instead of forcing ip_summed to CHECKSUM_NONE, we can use this helper to document places where we make this assertion.

bool skb_head_is_locked(const struct sk_buff *skb)

Determine if the skb->head is locked down

Parameters

const struct sk_buff *skb

skb to check

Description

The head on skbs built around a head frag can be removed if they are not cloned. This function returns true if the skb head is locked down due to either being allocated via kmalloc, or by being a clone with multiple references to the head.

struct sock_common

minimal network layer representation of sockets

Definition:

struct sock_common {
    union {
        __addrpair skc_addrpair;
        struct {
            __be32 skc_daddr;
            __be32 skc_rcv_saddr;
        };
    };
    union {
        unsigned int skc_hash;
        __u16 skc_u16hashes[2];
    };
    union {
        __portpair skc_portpair;
        struct {
            __be16 skc_dport;
            __u16 skc_num;
        };
    };
    unsigned short skc_family;
    volatile unsigned char skc_state;
    unsigned char skc_reuse:4;
    unsigned char skc_reuseport:1;
    unsigned char skc_ipv6only:1;
    unsigned char skc_net_refcnt:1;
    unsigned char skc_bypass_prot_mem:1;
    int skc_bound_dev_if;
    union {
        struct hlist_node skc_bind_node;
        struct hlist_node skc_portaddr_node;
    };
    struct proto *skc_prot;
    possible_net_t skc_net;
#if IS_ENABLED(CONFIG_IPV6)
    struct in6_addr skc_v6_daddr;
    struct in6_addr skc_v6_rcv_saddr;
#endif
    atomic64_t skc_cookie;
    union {
        unsigned long skc_flags;
        struct sock *skc_listener;
        struct inet_timewait_death_row *skc_tw_dr;
    };
    union {
        struct hlist_node skc_node;
        struct hlist_nulls_node skc_nulls_node;
    };
    unsigned short skc_tx_queue_mapping;
#ifdef CONFIG_SOCK_RX_QUEUE_MAPPING
    unsigned short skc_rx_queue_mapping;
#endif
    union {
        int skc_incoming_cpu;
        u32 skc_rcv_wnd;
        u32 skc_tw_rcv_nxt;
    };
    refcount_t skc_refcnt;
};

Members

{unnamed_union}

anonymous

skc_addrpair

8-byte-aligned __u64 union of skc_daddr & skc_rcv_saddr

{unnamed_struct}

anonymous

skc_daddr

Foreign IPv4 addr

skc_rcv_saddr

Bound local IPv4 addr

{unnamed_union}

anonymous

skc_hash

hash value used with various protocol lookup tables

skc_u16hashes

two u16 hash values used by UDP lookup tables

{unnamed_union}

anonymous

skc_portpair

__u32 union of skc_dport & skc_num

{unnamed_struct}

anonymous

skc_dport

placeholder for inet_dport/tw_dport

skc_num

placeholder for inet_num/tw_num

skc_family

network address family

skc_state

Connection state

skc_reuse

SO_REUSEADDR setting

skc_reuseport

SO_REUSEPORT setting

skc_ipv6only

socket is IPV6 only

skc_net_refcnt

socket is using net ref counting

skc_bypass_prot_mem

bypass the per-protocol memory accounting for skb

skc_bound_dev_if

bound device index if != 0

{unnamed_union}

anonymous

skc_bind_node

bind hash linkage for various protocol lookup tables

skc_portaddr_node

second hash linkage for UDP/UDP-Lite protocol

skc_prot

protocol handlers inside a network family

skc_net

reference to the network namespace of this socket

skc_v6_daddr

IPV6 destination address

skc_v6_rcv_saddr

IPV6 source address

skc_cookie

socket’s cookie value

{unnamed_union}

anonymous

skc_flags

placeholder for sk_flags: SO_LINGER (l_onoff), SO_BROADCAST, SO_KEEPALIVE, SO_OOBINLINE settings, SO_TIMESTAMPING settings

skc_listener

connection request listener socket (aka rsk_listener) [union with skc_flags]

skc_tw_dr

(aka tw_dr) ptr to struct inet_timewait_death_row [union with skc_flags]

{unnamed_union}

anonymous

skc_node

main hash linkage for various protocol lookup tables

skc_nulls_node

main hash linkage for TCP/UDP/UDP-Lite protocol

skc_tx_queue_mapping

tx queue number for this connection

skc_rx_queue_mapping

rx queue number for this connection

{unnamed_union}

anonymous

skc_incoming_cpu

record/match cpu processing incoming packets

skc_rcv_wnd

(aka rsk_rcv_wnd) TCP receive window size (possibly scaled) [union with skc_incoming_cpu]

skc_tw_rcv_nxt

(aka tw_rcv_nxt) TCP window next expected seq number [union with skc_incoming_cpu]

skc_refcnt

reference count

Description

This is the minimal network layer representation of sockets, the header for struct sock and struct inet_timewait_sock.

struct sock

network layer representation of sockets

Definition:

struct sock {
    struct sock_common __sk_common;
#define sk_node __sk_common.skc_node
#define sk_nulls_node __sk_common.skc_nulls_node
#define sk_refcnt __sk_common.skc_refcnt
#define sk_tx_queue_mapping __sk_common.skc_tx_queue_mapping
#ifdef CONFIG_SOCK_RX_QUEUE_MAPPING
#define sk_rx_queue_mapping __sk_common.skc_rx_queue_mapping
#endif
#define sk_dontcopy_begin __sk_common.skc_dontcopy_begin
#define sk_dontcopy_end __sk_common.skc_dontcopy_end
#define sk_hash __sk_common.skc_hash
#define sk_portpair __sk_common.skc_portpair
#define sk_num __sk_common.skc_num
#define sk_dport __sk_common.skc_dport
#define sk_addrpair __sk_common.skc_addrpair
#define sk_daddr __sk_common.skc_daddr
#define sk_rcv_saddr __sk_common.skc_rcv_saddr
#define sk_family __sk_common.skc_family
#define sk_state __sk_common.skc_state
#define sk_reuse __sk_common.skc_reuse
#define sk_reuseport __sk_common.skc_reuseport
#define sk_ipv6only __sk_common.skc_ipv6only
#define sk_net_refcnt __sk_common.skc_net_refcnt
#define sk_bypass_prot_mem __sk_common.skc_bypass_prot_mem
#define sk_bound_dev_if __sk_common.skc_bound_dev_if
#define sk_bind_node __sk_common.skc_bind_node
#define sk_prot __sk_common.skc_prot
#define sk_net __sk_common.skc_net
#define sk_v6_daddr __sk_common.skc_v6_daddr
#define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr
#define sk_cookie __sk_common.skc_cookie
#define sk_incoming_cpu __sk_common.skc_incoming_cpu
#define sk_flags __sk_common.skc_flags
#define sk_rxhash __sk_common.skc_rxhash
    atomic_t sk_drops;
    __s32 sk_peek_off;
    struct sk_buff_head sk_error_queue;
    struct sk_buff_head sk_receive_queue;
    struct {
        atomic_t rmem_alloc;
        int len;
        struct sk_buff *head;
        struct sk_buff *tail;
    } sk_backlog;
#define sk_rmem_alloc sk_backlog.rmem_alloc
    struct dst_entry *sk_rx_dst;
    int sk_rx_dst_ifindex;
    u32 sk_rx_dst_cookie;
#ifdef CONFIG_NET_RX_BUSY_POLL
    unsigned int sk_ll_usec;
    unsigned int sk_napi_id;
    u16 sk_busy_poll_budget;
    u8 sk_prefer_busy_poll;
#endif
    u8 sk_userlocks;
    int sk_rcvbuf;
    struct sk_filter *sk_filter;
    union {
        struct socket_wq *sk_wq;
    };
    void (*sk_data_ready)(struct sock *sk);
    long sk_rcvtimeo;
    int sk_rcvlowat;
    int sk_err;
    struct socket *sk_socket;
#ifdef CONFIG_MEMCG
    struct mem_cgroup *sk_memcg;
#endif
#ifdef CONFIG_XFRM
    struct xfrm_policy *sk_policy[2];
#endif
#if IS_ENABLED(CONFIG_INET_PSP)
    struct psp_assoc *psp_assoc;
#endif
    socket_lock_t sk_lock;
    u32 sk_reserved_mem;
    int sk_forward_alloc;
    u32 sk_tsflags;
    int sk_write_pending;
    atomic_t sk_omem_alloc;
    int sk_err_soft;
    int sk_wmem_queued;
    refcount_t sk_wmem_alloc;
    unsigned long sk_tsq_flags;
    union {
        struct sk_buff *sk_send_head;
        struct rb_root tcp_rtx_queue;
    };
    struct sk_buff_head sk_write_queue;
    struct page_frag sk_frag;
    union {
        struct timer_list sk_timer;
        struct timer_list tcp_retransmit_timer;
        struct timer_list mptcp_retransmit_timer;
    };
    unsigned long sk_pacing_rate;
    atomic_t sk_zckey;
    atomic_t sk_tskey;
    unsigned long sk_tx_queue_mapping_jiffies;
    u32 sk_dst_pending_confirm;
    u32 sk_pacing_status;
    unsigned long sk_max_pacing_rate;
    long sk_sndtimeo;
    u32 sk_priority;
    u32 sk_mark;
    kuid_t sk_uid;
    u16 sk_protocol;
    u16 sk_type;
    struct dst_entry *sk_dst_cache;
    netdev_features_t sk_route_caps;
#ifdef CONFIG_SOCK_VALIDATE_XMIT
    struct sk_buff *(*sk_validate_xmit_skb)(struct sock *sk, struct net_device *dev, struct sk_buff *skb);
#endif
    u16 sk_gso_type;
    u16 sk_gso_max_segs;
    unsigned int sk_gso_max_size;
    gfp_t sk_allocation;
    u32 sk_txhash;
    int sk_sndbuf;
    u8 sk_pacing_shift;
    bool sk_use_task_frag;
    u8 sk_gso_disabled : 1, sk_kern_sock : 1, sk_no_check_tx : 1, sk_no_check_rx : 1;
    u8 sk_shutdown;
    unsigned long sk_lingertime;
    struct proto *sk_prot_creator;
    rwlock_t sk_callback_lock;
    u32 sk_ack_backlog;
    u32 sk_max_ack_backlog;
    unsigned long sk_ino;
    spinlock_t sk_peer_lock;
    int sk_bind_phc;
    struct pid *sk_peer_pid;
    const struct cred *sk_peer_cred;
    ktime_t sk_stamp;
#if BITS_PER_LONG==32
    seqlock_t sk_stamp_seq;
#endif
    int sk_disconnects;
    union {
        u8 sk_txrehash;
        u8 sk_scm_recv_flags;
        struct {
            u8 sk_scm_credentials : 1, sk_scm_security : 1, sk_scm_pidfd : 1, sk_scm_rights : 1, sk_scm_unused : 4;
        };
    };
    u8 sk_clockid;
    u8 sk_txtime_deadline_mode : 1, sk_txtime_report_errors : 1, sk_txtime_unused : 6;
#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
    u8 sk_bpf_cb_flags;
    void *sk_user_data;
#ifdef CONFIG_SECURITY
    void *sk_security;
#endif
    struct sock_cgroup_data sk_cgrp_data;
    void (*sk_state_change)(struct sock *sk);
    void (*sk_write_space)(struct sock *sk);
    void (*sk_error_report)(struct sock *sk);
    int (*sk_backlog_rcv)(struct sock *sk, struct sk_buff *skb);
    void (*sk_destruct)(struct sock *sk);
    struct sock_reuseport *sk_reuseport_cb;
#ifdef CONFIG_BPF_SYSCALL
    struct bpf_local_storage *sk_bpf_storage;
#endif
    struct numa_drop_counters *sk_drop_counters;
    struct rcu_head sk_rcu;
    netns_tracker ns_tracker;
    struct xarray sk_user_frags;
#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
    struct module *sk_owner;
#endif
};

Members

__sk_common

shared layout with inet_timewait_sock

sk_drops

raw/udp drops counter

sk_peek_off

current peek_offset value

sk_error_queue

rarely used

sk_receive_queue

incoming packets

sk_backlog

always used with the per-socket spinlock held

sk_rx_dst

receive input route used by early demux

sk_rx_dst_ifindex

ifindex for sk_rx_dst

sk_rx_dst_cookie

cookie for sk_rx_dst

sk_ll_usec

usecs to busypoll when there is no data

sk_napi_id

id of the last napi context to receive data for sk

sk_busy_poll_budget

napi processing budget when busypolling

sk_prefer_busy_poll

prefer busypolling over softirq processing

sk_userlocks

SO_SNDBUF and SO_RCVBUF settings

sk_rcvbuf

size of receive buffer in bytes

sk_filter

socket filtering instructions

{unnamed_union}

anonymous

sk_wq

sock wait queue and async head

sk_data_ready

callback to indicate there is data to be processed

sk_rcvtimeo

SO_RCVTIMEO setting

sk_rcvlowat

SO_RCVLOWAT setting

sk_err

last error

sk_socket

Identd and reporting IO signals

sk_memcg

this socket’s memory cgroup association

sk_policy

flow policy

psp_assoc

PSP association, if socket is PSP-secured

sk_lock

synchronizer

sk_reserved_mem

space reserved and non-reclaimable for the socket

sk_forward_alloc

space allocated forward

sk_tsflags

SO_TIMESTAMPING flags

sk_write_pending

a write to stream socket waits to start

sk_omem_alloc

“o” is “option” or “other”

sk_err_soft

errors that don’t cause failure but are the cause of a persistent failure not just ‘timed out’

sk_wmem_queued

persistent queue size

sk_wmem_alloc

transmit queue bytes committed

sk_tsq_flags

TCP Small Queues flags

{unnamed_union}

anonymous

sk_send_head

front of stuff to transmit

tcp_rtx_queue

TCP re-transmit queue [union with sk_send_head]

sk_write_queue

Packet sending queue

sk_frag

cached page frag

{unnamed_union}

anonymous

sk_timer

sock cleanup timer

tcp_retransmit_timer

tcp retransmit timer

mptcp_retransmit_timer

mptcp retransmit timer

sk_pacing_rate

Pacing rate (if supported by transport/packet scheduler)

sk_zckey

counter to order MSG_ZEROCOPY notifications

sk_tskey

counter to disambiguate concurrent tstamp requests

sk_tx_queue_mapping_jiffies

time in jiffies of last sk_tx_queue_mapping refresh.

sk_dst_pending_confirm

need to confirm neighbour

sk_pacing_status

Pacing status (requested, handled by sch_fq)

sk_max_pacing_rate

Maximum pacing rate (SO_MAX_PACING_RATE)

sk_sndtimeo

SO_SNDTIMEO setting

sk_priority

SO_PRIORITY setting

sk_mark

generic packet mark

sk_uid

user id of owner

sk_protocol

which protocol this socket belongs in this network family

sk_type

socket type (SOCK_STREAM, etc)

sk_dst_cache

destination cache

sk_route_caps

route capabilities (e.g. NETIF_F_TSO)

sk_validate_xmit_skb

ptr to an optional validate function

sk_gso_type

GSO type (e.g. SKB_GSO_TCPV4)

sk_gso_max_segs

Maximum number of GSO segments

sk_gso_max_size

Maximum GSO segment size to build

sk_allocation

allocation mode

sk_txhash

computed flow hash for use on transmit

sk_sndbuf

size of send buffer in bytes

sk_pacing_shift

scaling factor for TCP Small Queues

sk_use_task_frag

allow sk_page_frag() to use current->task_frag. Sockets that can be used under memory reclaim should set this to false.

sk_gso_disabled

if set, NETIF_F_GSO_MASK is forbidden.

sk_kern_sock

True if sock is using kernel lock classes

sk_no_check_tx

SO_NO_CHECK setting, set checksum in TX packets

sk_no_check_rx

allow zero checksum in RX packets

sk_shutdown

mask of SEND_SHUTDOWN and/or RCV_SHUTDOWN

sk_lingertime

SO_LINGER l_linger setting

sk_prot_creator

sk_prot of original sock creator (see ipv6_setsockopt, IPV6_ADDRFORM for instance)

sk_callback_lock

used with the callbacks in the end of this struct

sk_ack_backlog

current listen backlog

sk_max_ack_backlog

listen backlog set in listen()

sk_ino

inode number (zero if orphaned)

sk_peer_lock

lock protecting sk_peer_pid and sk_peer_cred

sk_bind_phc

SO_TIMESTAMPING bind PHC index of PTP virtual clock for timestamping

sk_peer_pid

struct pid for this socket’s peer

sk_peer_cred

SO_PEERCRED setting

sk_stamp

time stamp of last packet received

sk_stamp_seq

lock for accessing sk_stamp on 32 bit architectures only

sk_disconnects

number of disconnect operations performed on this sock

{unnamed_union}

anonymous

sk_txrehash

enable TX hash rethink

sk_scm_recv_flags

all flags used by scm_recv()

{unnamed_struct}

anonymous

sk_scm_credentials

flagged by SO_PASSCRED to recv SCM_CREDENTIALS

sk_scm_security

flagged by SO_PASSSEC to recv SCM_SECURITY

sk_scm_pidfd

flagged by SO_PASSPIDFD to recv SCM_PIDFD

sk_scm_rights

flagged by SO_PASSRIGHTS to recv SCM_RIGHTS

sk_scm_unused

unused flags for scm_recv()

sk_clockid

clockid used by time-based scheduling (SO_TXTIME)

sk_txtime_deadline_mode

set deadline mode for SO_TXTIME

sk_txtime_report_errors

set report errors mode for SO_TXTIME

sk_txtime_unused

unused txtime flags

sk_bpf_cb_flags

used in bpf_setsockopt()

sk_user_data

RPC layer private data. Write-protected by sk_callback_lock.

sk_security

used by security modules

sk_cgrp_data

cgroup data for this cgroup

sk_state_change

callback to indicate change in the state of the sock

sk_write_space

callback to indicate there is buffer space available for sending

sk_error_report

callback to indicate errors (e.g. MSG_ERRQUEUE)

sk_backlog_rcv

callback to process the backlog

sk_destruct

called at sock freeing time, i.e. when all refcnt == 0

sk_reuseport_cb

reuseport group container

sk_bpf_storage

ptr to cache and control for bpf_sk_storage

sk_drop_counters

optional pointer to numa_drop_counters

sk_rcu

used during RCU grace period

ns_tracker

tracker for netns reference

sk_user_frags

xarray of pages the user is holding a reference on.

sk_owner

reference to the real owner of the socket that calls sock_lock_init_class_and_name().

bool sk_user_data_is_nocopy(const struct sock *sk)

Test if sk_user_data pointer must not be copied

Parameters

const struct sock *sk

socket

void *__locked_read_sk_user_data_with_flags(const struct sock *sk, uintptr_t flags)

return the pointer only if all of the bits in the flags argument are set in sk_user_data. Otherwise return NULL

Parameters

const struct sock *sk

socket

uintptr_t flags

flag bits

Description

The caller must be holding sk->sk_callback_lock.
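The "pointer plus flag bits" idea behind this helper can be sketched with ordinary pointer tagging. Everything here is an assumption made for illustration: the DEMO_FLAG_* values, the two-bit mask, and the helper names are hypothetical and do not reproduce the kernel's flag layout or its locking.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_FLAG_A    ((uintptr_t)1)
#define DEMO_FLAG_B    ((uintptr_t)2)
#define DEMO_FLAG_MASK ((uintptr_t)3)   /* low bits reserved for flags */

/* Store flags in the low bits of a suitably aligned pointer. */
static uintptr_t demo_tag(void *ptr, uintptr_t flags)
{
    return (uintptr_t)ptr | (flags & DEMO_FLAG_MASK);
}

/* Return the untagged pointer only if every requested flag is set. */
static void *demo_read_with_flags(uintptr_t tagged, uintptr_t flags)
{
    if ((tagged & flags) == flags)
        return (void *)(tagged & ~DEMO_FLAG_MASK);
    return NULL;
}
```

This only works because the pointed-to object is aligned to at least four bytes, which leaves the two low bits of its address free to carry flags.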

void *__rcu_dereference_sk_user_data_with_flags(const struct sock *sk, uintptr_t flags)

return the pointer only if all of the bits in the flags argument are set in sk_user_data. Otherwise return NULL

Parameters

const struct sock *sk

socket

uintptr_t flags

flag bits

sk_for_each_entry_offset_rcu

sk_for_each_entry_offset_rcu(tpos, pos, head, offset)

iterate over a list at a given struct offset

Parameters

tpos

the type * to use as a loop cursor.

pos

the struct hlist_node to use as a loop cursor.

head

the head for your list.

offset

offset of hlist_node within the struct.

SOCK_CONNECT_BIND

SOCK_CONNECT_BIND

sock->sk_userlocks flag for auto-bind at connect() time

bool lock_sock_fast(struct sock *sk)

fast version of lock_sock

Parameters

struct sock *sk

socket

Description

This version should be used for very small sections, where the process won’t block.

Returns false if the fast path is taken:

sk_lock.slock locked, owned = 0, BH disabled

Returns true if the slow path is taken:

sk_lock.slock unlocked, owned = 1, BH enabled

void unlock_sock_fast(struct sock *sk, bool slow)

complement of lock_sock_fast

Parameters

struct sock *sk

socket

bool slow

slow mode

Description

Fast unlock socket for user context. If slow mode is on, we call regular release_sock().

int sk_wmem_alloc_get(const struct sock *sk)

returns write allocations

Parameters

const struct sock *sk

socket

Return

sk_wmem_alloc minus initial offset of one

int sk_rmem_alloc_get(const struct sock *sk)

returns read allocations

Parameters

const struct sock *sk

socket

Return

sk_rmem_alloc

bool sk_has_allocations(const struct sock *sk)

check if allocations are outstanding

Parameters

const struct sock *sk

socket

Return

true if socket has write or read allocations

bool skwq_has_sleeper(struct socket_wq *wq)

check if there are any waiting processes

Parameters

struct socket_wq *wq

struct socket_wq

Return

true if socket_wq has waiting processes

Description

The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memorybarrier call. They were added due to the race found within the tcp code.

Consider the following TCP code paths:

    CPU1                  CPU2
    sys_select            receive packet
    ...                   ...
    __add_wait_queue      update tp->rcv_nxt
    ...                   ...
    tp->rcv_nxt check     sock_def_readable
    ...                   {
    schedule                  rcu_read_lock();
                              wq = rcu_dereference(sk->sk_wq);
                              if (wq && waitqueue_active(&wq->wait))
                                  wake_up_interruptible(&wq->wait)
                              ...
                          }

The race for TCP fires when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on the CPU2 side. CPU1 could then end up calling schedule and sleeping forever if there is no more data on the socket.

voidsock_poll_wait(structfile*filp,structsocket*sock,poll_table*p)

wrapper for the poll_wait call.

Parameters

structfile*filp

file

structsocket*sock

socket to wait on

poll_table*p

poll_table

Description

See the comments in the wq_has_sleeper function.

structpage_frag*sk_page_frag(structsock*sk)

return an appropriate page_frag

Parameters

structsock*sk

socket

Description

Use the per task page_frag instead of the per socket one for optimization when we know that we’re in process context and own everything that’s associated with current.

Both direct reclaim and page faults can nest inside other socket operations and end up recursing into sk_page_frag() while it’s already in use: explicitly avoid task page_frag when users disable sk_use_task_frag.

Return

a per task page_frag if context allows that, otherwise a per socket one.

void_sock_tx_timestamp(structsock*sk,conststructsockcm_cookie*sockc,__u8*tx_flags,__u32*tskey)

checks whether the outgoing packet is to be time stamped

Parameters

structsock*sk

socket sending this packet

conststructsockcm_cookie*sockc

pointer to socket cmsg cookie to get timestamping info

__u8*tx_flags

completed with instructions for time stamping

__u32*tskey

filled in with next sk_tskey (not for TCP, which uses seqno)

Note

callers should take care of the initial *tx_flags value (usually 0)

voidsk_eat_skb(structsock*sk,structsk_buff*skb)

Release a skb if it is no longer needed

Parameters

structsock*sk

socket to eat this skb from

structsk_buff*skb

socket buffer to eat

Description

This routine must be called with interrupts disabled or with the socket locked so that the sk_buff queue operation is ok.

structfile*sock_alloc_file(structsocket*sock,intflags,constchar*dname)

Bind a socket to a file

Parameters

structsocket*sock

socket

intflags

file status flags

constchar*dname

protocol name

Description

Returns the file bound with sock, implicitly storing it in sock->file. If dname is NULL, it is set to “”.

On failure sock is released, and an ERR pointer is returned.

This function uses GFP_KERNEL internally.

structsocket*sock_from_file(structfile*file)

Return the socket bound to file.

Parameters

structfile*file

file

Description

On failure returns NULL.

structsocket*sockfd_lookup(intfd,int*err)

Go from a file number to its socket slot

Parameters

intfd

file handle

int*err

pointer to an error code return

Description

The file handle passed in is locked and the socket it is bound to is returned. If an error occurs the err pointer is overwritten with a negative errno code and NULL is returned. The function checks for both invalid handles and passing a handle which is not a socket.

On a success the socket object pointer is returned.

structsocket*sock_alloc(void)

allocate a socket

Parameters

void

no arguments

Description

Allocate a new inode and socket object. The two are bound together and initialised. The socket is then returned. If we are out of inodes, NULL is returned. This function uses GFP_KERNEL internally.

voidsock_release(structsocket*sock)

close a socket

Parameters

structsocket*sock

socket to close

Description

The socket is released from the protocol stack if it has a release callback, and the inode is then released if the socket is bound to an inode not a file.

intsock_sendmsg(structsocket*sock,structmsghdr*msg)

send a message through sock

Parameters

structsocket*sock

socket

structmsghdr*msg

message to send

Description

Sends msg through sock, passing through LSM. Returns the number of bytes sent, or an error code.

intkernel_sendmsg(structsocket*sock,structmsghdr*msg,structkvec*vec,size_tnum,size_tsize)

send a message through sock (kernel-space)

Parameters

structsocket*sock

socket

structmsghdr*msg

message header

structkvec*vec

kernel vec

size_tnum

vec array length

size_tsize

total message data size

Description

Builds the message data with vec and sends it through sock. Returns the number of bytes sent, or an error code.

intsock_recvmsg(structsocket*sock,structmsghdr*msg,intflags)

receive a message from sock

Parameters

structsocket*sock

socket

structmsghdr*msg

message to receive

intflags

message flags

Description

Receives msg from sock, passing through LSM. Returns the total number of bytes received, or an error.

intkernel_recvmsg(structsocket*sock,structmsghdr*msg,structkvec*vec,size_tnum,size_tsize,intflags)

Receive a message from a socket (kernel space)

Parameters

structsocket*sock

The socket to receive the message from

structmsghdr*msg

Received message

structkvec*vec

Input s/g array for message data

size_tnum

Size of input s/g array

size_tsize

Number of bytes to read

intflags

Message flags (MSG_DONTWAIT, etc...)

Description

On return the msg structure contains the scatter/gather array passed in the vec argument. The array is modified so that it consists of the unfilled portion of the original array.

The returned value is the total number of bytes received, or an error.

intsock_create_lite(intfamily,inttype,intprotocol,structsocket**res)

creates a socket

Parameters

intfamily

protocol family (AF_INET, ...)

inttype

communication type (SOCK_STREAM, ...)

intprotocol

protocol (0, ...)

structsocket**res

new socket

Description

Creates a new socket and assigns it to res, passing through LSM. The new socket initialization is not complete, see kernel_accept(). Returns 0 or an error. On failure res is set to NULL. This function internally uses GFP_KERNEL.

int__sock_create(structnet*net,intfamily,inttype,intprotocol,structsocket**res,intkern)

creates a socket

Parameters

structnet*net

net namespace

intfamily

protocol family (AF_INET, ...)

inttype

communication type (SOCK_STREAM, ...)

intprotocol

protocol (0, ...)

structsocket**res

new socket

intkern

boolean for kernel space sockets

Description

Creates a new socket and assigns it to res, passing through LSM. Returns 0 or an error. On failure res is set to NULL. kern must be set to true if the socket resides in kernel space. This function internally uses GFP_KERNEL.

intsock_create(intfamily,inttype,intprotocol,structsocket**res)

creates a socket

Parameters

intfamily

protocol family (AF_INET, ...)

inttype

communication type (SOCK_STREAM, ...)

intprotocol

protocol (0, ...)

structsocket**res

new socket

Description

A wrapper around __sock_create(). Returns 0 or an error. This function internally uses GFP_KERNEL.

intsock_create_kern(structnet*net,intfamily,inttype,intprotocol,structsocket**res)

creates a socket (kernel space)

Parameters

structnet*net

net namespace

intfamily

protocol family (AF_INET, ...)

inttype

communication type (SOCK_STREAM, ...)

intprotocol

protocol (0, ...)

structsocket**res

new socket

Description

A wrapper around __sock_create(). Returns 0 or an error. This function internally uses GFP_KERNEL.

intsock_register(conststructnet_proto_family*ops)

add a socket protocol handler

Parameters

conststructnet_proto_family*ops

description of protocol

Description

This function is called by a protocol handler that wants to advertise its address family, and have it linked into the socket interface. The value ops->family corresponds to the socket system call protocol family.

voidsock_unregister(intfamily)

remove a protocol handler

Parameters

intfamily

protocol family to remove

Description

This function is called by a protocol handler that wants to remove its address family, and have it unlinked from the new socket creation.

If the protocol handler is a module, then it can use module reference counts to protect against new references. If the protocol handler is not a module then it needs to provide its own protection in the ops->create routine.

intkernel_bind(structsocket*sock,structsockaddr_unsized*addr,intaddrlen)

bind an address to a socket (kernel space)

Parameters

structsocket*sock

socket

structsockaddr_unsized*addr

address

intaddrlen

length of address

Description

Returns 0 or an error.

intkernel_listen(structsocket*sock,intbacklog)

move socket to listening state (kernel space)

Parameters

structsocket*sock

socket

intbacklog

pending connections queue size

Description

Returns 0 or an error.

intkernel_accept(structsocket*sock,structsocket**newsock,intflags)

accept a connection (kernel space)

Parameters

structsocket*sock

listening socket

structsocket**newsock

new connected socket

intflags

flags

Description

flags must be SOCK_CLOEXEC, SOCK_NONBLOCK or 0. If it fails, newsock is guaranteed to be NULL. Returns 0 or an error.

intkernel_connect(structsocket*sock,structsockaddr_unsized*addr,intaddrlen,intflags)

connect a socket (kernel space)

Parameters

structsocket*sock

socket

structsockaddr_unsized*addr

address

intaddrlen

address length

intflags

flags (O_NONBLOCK, ...)

Description

For datagram sockets, addr is the address to which datagrams are sent by default, and the only address from which datagrams are received. For stream sockets, attempts to connect to addr. Returns 0 or an error code.

intkernel_getsockname(structsocket*sock,structsockaddr*addr)

get the address to which the socket is bound (kernel space)

Parameters

structsocket*sock

socket

structsockaddr*addr

address holder

Description

Fills the addr pointer with the address to which the socket is bound. Returns the length of the address in bytes or an error code.

intkernel_getpeername(structsocket*sock,structsockaddr*addr)

get the address to which the socket is connected (kernel space)

Parameters

structsocket*sock

socket

structsockaddr*addr

address holder

Description

Fills the addr pointer with the address to which the socket is connected. Returns the length of the address in bytes or an error code.

intkernel_sock_shutdown(structsocket*sock,enumsock_shutdown_cmdhow)

shut down part of a full-duplex connection (kernel space)

Parameters

structsocket*sock

socket

enumsock_shutdown_cmdhow

connection part

Description

Returns 0 or an error.

u32kernel_sock_ip_overhead(structsock*sk)

returns the IP overhead imposed by a socket

Parameters

structsock*sk

socket

Description

This routine returns the IP overhead imposed by a socket, i.e. the length of the underlying IP header, depending on whether this is an IPv4 or IPv6 socket, plus the length of the IP options turned on at the socket. Assumes that the caller has a lock on the socket.

voiddrop_reasons_register_subsys(enumskb_drop_reason_subsyssubsys,conststructdrop_reason_list*list)

register another drop reason subsystem

Parameters

enumskb_drop_reason_subsyssubsys

the subsystem to register, must not be the core

conststructdrop_reason_list*list

the list of drop reasons within the subsystem, must point to a statically initialized list

voiddrop_reasons_unregister_subsys(enumskb_drop_reason_subsyssubsys)

unregister a drop reason subsystem

Parameters

enumskb_drop_reason_subsyssubsys

the subsystem to remove, must not be the core

Note

This will call synchronize_rcu() to ensure there are no users when it returns.

u32napi_skb_cache_get_bulk(void**skbs,u32n)

obtain a number of zeroed skb heads from the cache

Parameters

void**skbs

pointer to an at least n-sized array to fill with skb pointers

u32n

number of entries to provide

Description

Tries to obtain n sk_buff entries from the NAPI percpu cache and writes the pointers into the provided array skbs. If there are fewer entries available, tries to replenish the cache and bulk-allocates the diff from the MM layer if needed. The heads are being zeroed with either memset() or __GFP_ZERO, so they are ready for {,__}build_skb_around() and don’t have any data buffers attached. Must be called only from the BH context.

Return

number of successfully allocated skbs (n if no actual allocation was needed or kmem_cache_alloc_bulk() didn’t fail).

structsk_buff*build_skb_around(structsk_buff*skb,void*data,unsignedintfrag_size)

build a network buffer around provided skb

Parameters

structsk_buff*skb

sk_buff provided by caller, must be memset cleared

void*data

data buffer provided by caller

unsignedintfrag_size

size of data

structsk_buff*napi_build_skb(void*data,unsignedintfrag_size)

build a network buffer

Parameters

void*data

data buffer provided by caller

unsignedintfrag_size

size of data

Description

Version of __napi_build_skb() that takes care of skb->head_frag and skb->pfmemalloc when the data is a page or page fragment.

Returns a new sk_buff on success, NULL on allocation failure.

structsk_buff*__alloc_skb(unsignedintsize,gfp_tgfp_mask,intflags,intnode)

allocate a network buffer

Parameters

unsignedintsize

size to allocate

gfp_tgfp_mask

allocation mask

intflags

If SKB_ALLOC_FCLONE is set, allocate from fclone cache instead of head cache and allocate a cloned (child) skb. If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for allocations in case the data is required for writeback.

intnode

numa node to allocate memory on

Description

Allocate a new sk_buff. The returned buffer has no headroom and a tail room of at least size bytes. The object has a reference count of one. The return is the buffer. On a failure the return is NULL.

Buffers may only be allocated from interrupts using a gfp_mask of GFP_ATOMIC.

structsk_buff*__netdev_alloc_skb(structnet_device*dev,unsignedintlen,gfp_tgfp_mask)

allocate an skbuff for rx on a specific device

Parameters

structnet_device*dev

network device to receive on

unsignedintlen

length to allocate

gfp_tgfp_mask

get_free_pages mask, passed to alloc_skb

Description

Allocate a new sk_buff and assign it a usage count of one. The buffer has NET_SKB_PAD headroom built in. Users should allocate the headroom they think they need without accounting for the built in space. The built in space is used for optimisations.

NULL is returned if there is no free memory.

structsk_buff*napi_alloc_skb(structnapi_struct*napi,unsignedintlen)

allocate skbuff for rx in a specific NAPI instance

Parameters

structnapi_struct*napi

napi instance this buffer was allocated for

unsignedintlen

length to allocate

Description

Allocate a new sk_buff for use in NAPI receive. This buffer will attempt to allocate the head from a special reserved region used only for NAPI Rx allocation. By doing this we can save several CPU cycles by avoiding having to disable and re-enable IRQs.

NULL is returned if there is no free memory.

void__kfree_skb(structsk_buff*skb)

private function

Parameters

structsk_buff*skb

buffer

Description

Free an sk_buff. Release anything attached to the buffer. Clean the state. This is an internal helper function. Users should always call kfree_skb.

void__fix_addresssk_skb_reason_drop(structsock*sk,structsk_buff*skb,enumskb_drop_reasonreason)

free an sk_buff with special reason

Parameters

structsock*sk

the socket to receive skb, or NULL if not applicable

structsk_buff*skb

buffer to free

enumskb_drop_reasonreason

reason why this skb is dropped

Description

Drop a reference to the buffer and free it if the usage count has hit zero. Meanwhile, pass the receiving socket and drop reason to the ‘kfree_skb’ tracepoint.

voidskb_tx_error(structsk_buff*skb)

report an sk_buff xmit error

Parameters

structsk_buff*skb

buffer that triggered an error

Description

Report xmit error if a device callback is tracking this skb. skb must be freed afterwards.

voidconsume_skb(structsk_buff*skb)

free an skbuff

Parameters

structsk_buff*skb

buffer to free

Description

Drop a ref to the buffer and free it if the usage count has hit zero. Functions identically to kfree_skb, but kfree_skb assumes that the frame is being dropped after a failure and notes that.
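The difference between the two free paths is only in what gets recorded when the last reference goes away. A userspace sketch of that idea follows; toy_buf, toy_kfree, toy_consume and the two counters are hypothetical names, and the counters merely stand in for whatever accounting the real tracepoints feed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy buffer with a reference count that starts at one. */
struct toy_buf {
    int users;
};

static int toy_drops;     /* buffers freed via the "drop" path */
static int toy_consumed;  /* buffers freed via the "consume" path */

static struct toy_buf *toy_alloc(void)
{
    struct toy_buf *b = malloc(sizeof(*b));

    if (b)
        b->users = 1;
    return b;
}

/* Take an additional reference. */
static void toy_get(struct toy_buf *b)
{
    b->users++;
}

/* Drop one reference; return 1 if it was the last and the buffer was freed. */
static int toy_unref(struct toy_buf *b)
{
    assert(b->users > 0);
    if (--b->users > 0)
        return 0;
    free(b);
    return 1;
}

/* Same release logic, but the free is accounted as a drop. */
static void toy_kfree(struct toy_buf *b)
{
    if (b && toy_unref(b))
        toy_drops++;
}

/* Same release logic, but the free is accounted as normal consumption. */
static void toy_consume(struct toy_buf *b)
{
    if (b && toy_unref(b))
        toy_consumed++;
}
```

In this model a buffer with two references survives the first toy_consume() call and is only counted once, when the second call frees it.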

structsk_buff*alloc_skb_for_msg(structsk_buff*first)

allocate sk_buff to wrap frag list forming a msg

Parameters

structsk_buff*first

first sk_buff of the msg

structsk_buff*skb_morph(structsk_buff*dst,structsk_buff*src)

morph one skb into another

Parameters

structsk_buff*dst

the skb to receive the contents

structsk_buff*src

the skb to supply the contents

Description

This is identical to skb_clone except that the target skb is supplied by the user.

The target skb is returned upon exit.

intskb_copy_ubufs(structsk_buff*skb,gfp_tgfp_mask)

copy userspace skb frags buffers to kernel

Parameters

structsk_buff*skb

the skb to modify

gfp_tgfp_mask

allocation priority

Description

This must be called on skb with SKBFL_ZEROCOPY_ENABLE. It will copy all frags into kernel and drop the reference to userspace pages.

If this function is called from an interrupt, gfp_mask must be GFP_ATOMIC.

Returns 0 on success or a negative error code on failure to allocate kernel memory to copy to.

structsk_buff*skb_clone(structsk_buff*skb,gfp_tgfp_mask)

duplicate an sk_buff

Parameters

structsk_buff*skb

buffer to clone

gfp_tgfp_mask

allocation priority

Description

Duplicate an sk_buff. The new one is not owned by a socket. Both copies share the same packet data but not structure. The new buffer has a reference count of 1. If the allocation fails the function returns NULL, otherwise the new buffer is returned.

If this function is called from an interrupt, gfp_mask must be GFP_ATOMIC.

structsk_buff*skb_copy(conststructsk_buff*skb,gfp_tgfp_mask)

create private copy of an sk_buff

Parameters

conststructsk_buff*skb

buffer to copy

gfp_tgfp_mask

allocation priority

Description

Make a copy of both an sk_buff and its data. This is used when the caller wishes to modify the data and needs a private copy of the data to alter. Returns NULL on failure or the pointer to the buffer on success. The returned buffer has a reference count of 1.

As a by-product this function converts a non-linear sk_buff to a linear one, so that the sk_buff becomes completely private and the caller is allowed to modify all the data of the returned buffer. This means that this function is not recommended for use in circumstances when only the header is going to be modified. Use pskb_copy() instead.
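The contrast between cloning (shared data, separate descriptor) and copying (private data) can be modeled in userspace. This is only a sketch: toy_skb, toy_data, toy_clone and toy_copy are hypothetical names, and the model ignores fragments, headroom and allocation flags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_CAP 64

/* Packet bytes, shared between clones through a reference count. */
struct toy_data {
    int refs;
    size_t len;
    unsigned char bytes[TOY_CAP];
};

/* Descriptor; each clone or copy gets its own. */
struct toy_skb {
    struct toy_data *data;
};

static struct toy_skb *toy_new(const void *src, size_t len)
{
    struct toy_skb *skb;
    struct toy_data *d;

    if (len > TOY_CAP)
        return NULL;
    skb = malloc(sizeof(*skb));
    d = malloc(sizeof(*d));
    if (!skb || !d) {
        free(skb);
        free(d);
        return NULL;
    }
    d->refs = 1;
    d->len = len;
    memcpy(d->bytes, src, len);
    skb->data = d;
    return skb;
}

/* Clone: new descriptor, same data block (reference count bumped). */
static struct toy_skb *toy_clone(struct toy_skb *skb)
{
    struct toy_skb *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->data = skb->data;
    c->data->refs++;
    return c;
}

/* Copy: new descriptor and a private duplicate of the data. */
static struct toy_skb *toy_copy(const struct toy_skb *skb)
{
    return toy_new(skb->data->bytes, skb->data->len);
}

/* Release a descriptor; the data goes away with its last user. */
static void toy_free(struct toy_skb *skb)
{
    if (!skb)
        return;
    assert(skb->data->refs > 0);
    if (--skb->data->refs == 0)
        free(skb->data);
    free(skb);
}
```

Writing through a clone is visible to the original in this model, while writing through a copy is not, which is why a private copy is needed before modifying data.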

structsk_buff*__pskb_copy_fclone(structsk_buff*skb,intheadroom,gfp_tgfp_mask,boolfclone)

create copy of an sk_buff with private head.

Parameters

structsk_buff*skb

buffer to copy

intheadroom

headroom of new skb

gfp_tgfp_mask

allocation priority

boolfclone

if true allocate the copy of the skb from the fclone cache instead of the head cache; it is recommended to set this to true for the cases where the copy will likely be cloned

Description

Make a copy of both an sk_buff and part of its data, located in the header. Fragmented data remain shared. This is used when the caller wishes to modify only the header of an sk_buff and needs a private copy of the header to alter. Returns NULL on failure or the pointer to the buffer on success. The returned buffer has a reference count of 1.

intpskb_expand_head(structsk_buff*skb,intnhead,intntail,gfp_tgfp_mask)

reallocate header of sk_buff

Parameters

structsk_buff*skb

buffer to reallocate

intnhead

room to add at head

intntail

room to add at tail

gfp_tgfp_mask

allocation priority

Description

Expands (or creates an identical copy, if nhead and ntail are zero) the header of skb. The sk_buff itself is not changed. The sk_buff MUST have a reference count of 1. Returns zero on success, or an error if expansion failed. In the latter case, the sk_buff is not changed.

All the pointers pointing into the skb header may change and must be reloaded after a call to this function.

Note

If you skb_push() the start of the buffer after reallocating the header, call skb_postpush_data_move() first to move the metadata out of the way before writing to sk_buff->data.

structsk_buff*skb_expand_head(structsk_buff*skb,unsignedintheadroom)

reallocate header of sk_buff

Parameters

structsk_buff*skb

buffer to reallocate

unsignedintheadroom

needed headroom

Description

Unlike skb_realloc_headroom, this one does not allocate a new skb if possible; copies skb->sk to the new skb as needed and frees the original skb in case of failures.

It expects increased headroom and generates a warning otherwise.

structsk_buff*skb_copy_expand(conststructsk_buff*skb,intnewheadroom,intnewtailroom,gfp_tgfp_mask)

copy and expand sk_buff

Parameters

conststructsk_buff*skb

buffer to copy

intnewheadroom

new free bytes at head

intnewtailroom

new free bytes at tail

gfp_tgfp_mask

allocation priority

Description

Make a copy of both an sk_buff and its data and while doing so allocate additional space.

This is used when the caller wishes to modify the data and needs a private copy of the data to alter as well as more space for new fields. Returns NULL on failure or the pointer to the buffer on success. The returned buffer has a reference count of 1.

You must pass GFP_ATOMIC as the allocation priority if this function is called from an interrupt.

int__skb_pad(structsk_buff*skb,intpad,boolfree_on_error)

zero pad the tail of an skb

Parameters

structsk_buff*skb

buffer to pad

intpad

space to pad

boolfree_on_error

free buffer on error

Description

Ensure that a buffer is followed by a padding area that is zero filled. Used by network drivers which may DMA or transfer data beyond the buffer end onto the wire.

May return error in out of memory cases. The skb is freed on error if free_on_error is true.
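A rough userspace model of "make sure zeroed padding follows the data" is sketched below. It is an illustration under assumptions, not the kernel routine: toy_buf and toy_pad are hypothetical names, growth is done with realloc(), and the sketch leaves the recorded data length untouched while zeroing the bytes after it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy heap-backed buffer. */
struct toy_buf {
    unsigned char *bytes;  /* storage */
    size_t len;            /* bytes in use */
    size_t cap;            /* bytes allocated */
};

/*
 * Ensure 'pad' zero bytes follow the used area. Grows the storage when
 * the tail room is too small. On allocation failure returns -ENOMEM and,
 * if free_on_error is set, releases the buffer.
 */
static int toy_pad(struct toy_buf *b, size_t pad, int free_on_error)
{
    assert(b->len <= b->cap);

    if (b->cap - b->len < pad) {
        unsigned char *grown = realloc(b->bytes, b->len + pad);

        if (!grown) {
            if (free_on_error) {
                free(b->bytes);
                b->bytes = NULL;
                b->len = b->cap = 0;
            }
            return -ENOMEM;
        }
        b->bytes = grown;
        b->cap = b->len + pad;
    }
    memset(b->bytes + b->len, 0, pad);
    return 0;
}
```

A driver-like caller would pad a short frame up to its minimum length and then transmit len plus pad bytes, knowing the extra bytes are zero.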

void*pskb_put(structsk_buff*skb,structsk_buff*tail,intlen)

add data to the tail of a potentially fragmented buffer

Parameters

structsk_buff*skb

start of the buffer to use

structsk_buff*tail

tail fragment of the buffer to use

intlen

amount of data to add

Description

This function extends the used data area of the potentially fragmented buffer. tail must be the last fragment of skb -- or skb itself. If this would exceed the total buffer size the kernel will panic. A pointer to the first byte of the extra data is returned.

void*skb_put(structsk_buff*skb,unsignedintlen)

add data to a buffer

Parameters

structsk_buff*skb

buffer to use

unsignedintlen

amount of data to add

Description

This function extends the used data area of the buffer. If this would exceed the total buffer size the kernel will panic. A pointer to the first byte of the extra data is returned.

void*skb_push(structsk_buff*skb,unsignedintlen)

add data to the start of a buffer

Parameters

structsk_buff*skb

buffer to use

unsignedintlen

amount of data to add

Description

This function extends the used data area of the buffer at the buffer start. If this would exceed the total buffer headroom the kernel will panic. A pointer to the first byte of the extra data is returned.

void*skb_pull(structsk_buff*skb,unsignedintlen)

remove data from the start of a buffer

Parameters

structsk_buff*skb

buffer to use

unsignedintlen

amount of data to remove

Description

This function removes data from the start of a buffer, returning the memory to the headroom. A pointer to the next data in the buffer is returned. Once the data has been pulled, future pushes will overwrite the old data.
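The pointer movements behind skb_put(), skb_push() and skb_pull() can be pictured with a toy linear buffer. The sketch below is a userspace model with hypothetical names (toy_skb, toy_put, toy_push, toy_pull); unlike the kernel helpers, the put and push models return NULL on overflow instead of panicking.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUF_SIZE 64

/* head..end is the whole buffer; data..tail is the used area. */
struct toy_skb {
    unsigned char buf[TOY_BUF_SIZE];
    unsigned char *head, *data, *tail, *end;
};

/* Start with an empty used area placed after 'headroom' reserved bytes. */
static void toy_init(struct toy_skb *skb, size_t headroom)
{
    assert(headroom <= TOY_BUF_SIZE);
    skb->head = skb->buf;
    skb->end = skb->buf + TOY_BUF_SIZE;
    skb->data = skb->tail = skb->head + headroom;
}

static size_t toy_len(const struct toy_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Extend the used area at the tail; returns the first new byte. */
static void *toy_put(struct toy_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    if (len > (size_t)(skb->end - skb->tail))
        return NULL;            /* the model refuses instead of panicking */
    skb->tail += len;
    return old_tail;
}

/* Extend the used area at the start, eating into the headroom. */
static void *toy_push(struct toy_skb *skb, size_t len)
{
    if (len > (size_t)(skb->data - skb->head))
        return NULL;            /* the model refuses instead of panicking */
    skb->data -= len;
    return skb->data;
}

/* Remove bytes from the start; returns the next data in the buffer. */
static void *toy_pull(struct toy_skb *skb, size_t len)
{
    if (len > toy_len(skb))
        return NULL;
    skb->data += len;
    return skb->data;
}
```

A typical sequence in this model is: reserve headroom, put the payload, push a header in front of it on transmit, and pull the header off again on receive.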

void*skb_pull_data(structsk_buff*skb,size_tlen)

remove data from the start of a buffer returning its original position.

Parameters

structsk_buff*skb

buffer to use

size_tlen

amount of data to remove

Description

This function removes data from the start of a buffer, returning the memory to the headroom. A pointer to the original data in the buffer is returned after checking if there is enough data to pull. Once the data has been pulled, future pushes will overwrite the old data.
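The only difference from a plain pull is which pointer comes back: the start of the bytes just removed rather than the data that follows them. A minimal userspace sketch, with hypothetical names toy_view, toy_pull and toy_pull_data:

```c
#include <assert.h>
#include <stddef.h>

/* Read-only window over packet bytes. */
struct toy_view {
    const unsigned char *data;
    size_t len;
};

/* Plain pull: returns the data that follows the removed bytes. */
static const void *toy_pull(struct toy_view *v, size_t len)
{
    if (len > v->len)
        return NULL;
    v->data += len;
    v->len -= len;
    return v->data;
}

/* Pull-data: returns the removed bytes themselves (original position). */
static const void *toy_pull_data(struct toy_view *v, size_t len)
{
    const unsigned char *old = v->data;

    if (len > v->len)
        return NULL;            /* not enough data: nothing is pulled */
    v->data += len;
    v->len -= len;
    return old;
}
```

This makes the pull-data form convenient for parsing: one call both checks that a header is fully present and hands back a pointer to it.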

voidskb_trim(structsk_buff*skb,unsignedintlen)

remove end from a buffer

Parameters

structsk_buff*skb

buffer to alter

unsignedintlen

new length

Description

Cut the length of a buffer down by removing data from the tail. If the buffer is already under the length specified it is not modified. The skb must be linear.
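The "only ever shrink" rule can be shown in a few lines of userspace C. The names toy_buf and toy_trim are hypothetical and the model tracks nothing but a length.

```c
#include <assert.h>
#include <stddef.h>

/* Toy linear buffer that only tracks how many bytes are in use. */
struct toy_buf {
    unsigned char bytes[32];
    size_t len;
};

/* Shrink to 'len' bytes; a buffer that is already shorter is left alone. */
static void toy_trim(struct toy_buf *b, size_t len)
{
    assert(b->len <= sizeof(b->bytes));
    if (b->len > len)
        b->len = len;
}
```

Trimming a 10-byte buffer to 4 leaves 4 bytes; trimming it to 8 afterwards changes nothing.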

void*__pskb_pull_tail(structsk_buff*skb,intdelta)

advance tail of skb header

Parameters

structsk_buff*skb

buffer to reallocate

intdelta

number of bytes to advance tail

Description

The function makes sense only on a fragmented sk_buff; it expands the header, moving its tail forward and copying the necessary data from the fragmented part.

The sk_buff MUST have a reference count of 1.

Returns NULL (and the sk_buff does not change) if the pull failed, or the value of the new tail of the skb on success.

All the pointers pointing into the skb header may change and must be reloaded after a call to this function.

intskb_copy_bits(conststructsk_buff*skb,intoffset,void*to,intlen)

copy bits from skb to kernel buffer

Parameters

conststructsk_buff*skb

source skb

intoffset

offset in source

void*to

destination buffer

intlen

number of bytes to copy

Description

Copy the specified number of bytes from the source skb to the destination buffer.

CAUTION ! :

If its prototype is ever changed, check arch/{*}/net/{*}.S files, since it is called from BPF assembly code.

intskb_store_bits(structsk_buff*skb,intoffset,constvoid*from,intlen)

store bits from kernel buffer to skb

Parameters

structsk_buff*skb

destination buffer

intoffset

offset in destination

constvoid*from

source buffer

intlen

number of bytes to copy

Description

Copy the specified number of bytes from the source buffer to the destination skb. This function handles all the messy bits of traversing fragment lists and such.

intskb_zerocopy(structsk_buff*to,structsk_buff*from,intlen,inthlen)

Zero copy skb to skb

Parameters

structsk_buff*to

destination buffer

structsk_buff*from

source buffer

intlen

number of bytes to copy from source buffer

inthlen

size of linear headroom in destination buffer

Description

Copies up to len bytes from from to to by creating references to the frags in the source buffer.

The hlen as calculated by skb_zerocopy_headlen() specifies the headroom in the to buffer.

Return value:

0: everything is OK

-ENOMEM: couldn’t orphan frags of from due to lack of memory

-EFAULT: skb_copy_bits() found some problem with skb geometry

structsk_buff*skb_dequeue(structsk_buff_head*list)

remove from the head of the queue

Parameters

structsk_buff_head*list

list to dequeue from

Description

Remove the head of the list. The list lock is taken so the function may be used safely with other locking list functions. The head item is returned or NULL if the list is empty.

structsk_buff*skb_dequeue_tail(structsk_buff_head*list)

remove from the tail of the queue

Parameters

structsk_buff_head*list

list to dequeue from

Description

Remove the tail of the list. The list lock is taken so the function may be used safely with other locking list functions. The tail item is returned or NULL if the list is empty.

voidskb_queue_purge_reason(structsk_buff_head*list,enumskb_drop_reasonreason)

empty a list

Parameters

structsk_buff_head*list

list to empty

enumskb_drop_reasonreason

drop reason

Description

Delete all buffers on an sk_buff list. Each buffer is removed from the list and one reference dropped. This function takes the list lock and is atomic with respect to other list locking functions.

voidskb_queue_head(structsk_buff_head*list,structsk_buff*newsk)

queue a buffer at the list head

Parameters

structsk_buff_head*list

list to use

structsk_buff*newsk

buffer to queue

Description

Queue a buffer at the start of the list. This function takes the list lock and can be used safely with other locking sk_buff functions.

A buffer cannot be placed on two lists at the same time.

voidskb_queue_tail(structsk_buff_head*list,structsk_buff*newsk)

queue a buffer at the list tail

Parameters

structsk_buff_head*list

list to use

structsk_buff*newsk

buffer to queue

Description

Queue a buffer at the tail of the list. This function takes the list lock and can be used safely with other locking sk_buff functions.

A buffer cannot be placed on two lists at the same time.
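The queue operations above behave like a circular doubly linked list with a sentinel head and a length counter. The following userspace sketch models that shape only; toy_skb, toy_head and the toy_* functions are hypothetical names, and the list lock that the real functions take is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
    struct toy_skb *next, *prev;
    int id;
};

/* Queue head: a sentinel node plus the number of queued buffers. */
struct toy_head {
    struct toy_skb sentinel;
    unsigned int qlen;
};

static void toy_head_init(struct toy_head *list)
{
    list->sentinel.next = list->sentinel.prev = &list->sentinel;
    list->qlen = 0;
}

/* Link newsk between prev and next. */
static void toy_insert(struct toy_skb *newsk, struct toy_skb *prev,
                       struct toy_skb *next, struct toy_head *list)
{
    /* A buffer cannot be placed on two lists at the same time. */
    assert(newsk->next == NULL && newsk->prev == NULL);
    newsk->next = next;
    newsk->prev = prev;
    next->prev = newsk;
    prev->next = newsk;
    list->qlen++;
}

static void toy_queue_head(struct toy_head *list, struct toy_skb *newsk)
{
    toy_insert(newsk, &list->sentinel, list->sentinel.next, list);
}

static void toy_queue_tail(struct toy_head *list, struct toy_skb *newsk)
{
    toy_insert(newsk, list->sentinel.prev, &list->sentinel, list);
}

/* Unlink skb from list; the sentinel itself means "list is empty". */
static struct toy_skb *toy_unlink(struct toy_skb *skb, struct toy_head *list)
{
    if (skb == &list->sentinel)
        return NULL;
    skb->prev->next = skb->next;
    skb->next->prev = skb->prev;
    skb->next = skb->prev = NULL;
    list->qlen--;
    return skb;
}

static struct toy_skb *toy_dequeue(struct toy_head *list)
{
    return toy_unlink(list->sentinel.next, list);
}

static struct toy_skb *toy_dequeue_tail(struct toy_head *list)
{
    return toy_unlink(list->sentinel.prev, list);
}
```

In a concurrent setting every one of these operations would run with the list lock held, which is what makes the real functions safe to mix.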

voidskb_unlink(structsk_buff*skb,structsk_buff_head*list)

remove a buffer from a list

Parameters

structsk_buff*skb

buffer to remove

structsk_buff_head*list

list to use

Description

Remove a packet from a list. The list locks are taken and this function is atomic with respect to other list locked calls.

You must know what list the SKB is on.

voidskb_append(structsk_buff*old,structsk_buff*newsk,structsk_buff_head*list)

append a buffer

Parameters

structsk_buff*old

buffer to insert after

structsk_buff*newsk

buffer to insert

structsk_buff_head*list

list to use

Description

Place a packet after a given packet in a list. The list locks are taken and this function is atomic with respect to other list locked calls. A buffer cannot be placed on two lists at the same time.

voidskb_split(structsk_buff*skb,structsk_buff*skb1,constu32len)

Split fragmented skb to two parts at length len.

Parameters

structsk_buff*skb

the buffer to split

structsk_buff*skb1

the buffer to receive the second part

constu32len

new length for skb

voidskb_prepare_seq_read(structsk_buff*skb,unsignedintfrom,unsignedintto,structskb_seq_state*st)

Prepare a sequential read of skb data

Parameters

structsk_buff*skb

the buffer to read

unsignedintfrom

lower offset of data to be read

unsignedintto

upper offset of data to be read

structskb_seq_state*st

state variable

Description

Initializes the specified state variable. Must be called before invoking skb_seq_read() for the first time.

unsignedintskb_seq_read(unsignedintconsumed,constu8**data,structskb_seq_state*st)

Sequentially read skb data

Parameters

unsignedintconsumed

number of bytes consumed by the caller so far

constu8**data

destination pointer for data to be returned

structskb_seq_state*st

state variable

Description

Reads a block of skb data at consumed relative to the lower offset specified to skb_prepare_seq_read(). Assigns the head of the data block to data and returns the length of the block or 0 if the end of the skb data or the upper offset has been reached.

The caller is not required to consume all of the data returned, i.e. consumed is typically set to the number of bytes already consumed and the next call to skb_seq_read() will return the remaining part of the block.

Note 1: The size of each block of data returned can be arbitrary; this limitation is the cost for zerocopy sequential reads of potentially non linear data.

Note 2: Fragment lists within fragments are not implemented at the moment; state->root_skb could be replaced with a stack for this purpose.
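The calling convention (prepare once, then call the reader repeatedly with the running consumed count until it returns 0) can be tried out on a toy fragmented buffer. This is a userspace sketch with hypothetical names (toy_frag, toy_seq_state, toy_prepare_seq_read, toy_seq_read, toy_copy_range). It assumes consumed never decreases, and, as a simplification of this model, clamps each returned block to the upper offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One contiguous piece of a non-linear toy buffer. */
struct toy_frag {
    const unsigned char *ptr;
    unsigned int len;
};

struct toy_seq_state {
    const struct toy_frag *frags;
    unsigned int nr_frags;
    unsigned int lower, upper;  /* offsets delimiting the read */
    unsigned int cur;           /* index of the current fragment */
    unsigned int stepped;       /* bytes held by fragments before 'cur' */
};

static void toy_prepare_seq_read(const struct toy_frag *frags,
                                 unsigned int nr_frags, unsigned int from,
                                 unsigned int to, struct toy_seq_state *st)
{
    st->frags = frags;
    st->nr_frags = nr_frags;
    st->lower = from;
    st->upper = to;
    st->cur = 0;
    st->stepped = 0;
}

/* Return the next block at 'consumed' (relative to lower), or 0 at the end. */
static unsigned int toy_seq_read(unsigned int consumed,
                                 const unsigned char **data,
                                 struct toy_seq_state *st)
{
    unsigned int pos = consumed + st->lower;

    if (pos >= st->upper)
        return 0;
    assert(pos >= st->stepped);     /* consumed must not go backwards */

    while (st->cur < st->nr_frags) {
        const struct toy_frag *f = &st->frags[st->cur];
        unsigned int limit = st->stepped + f->len;

        if (pos < limit) {
            if (limit > st->upper)
                limit = st->upper;  /* model choice: clamp to upper offset */
            *data = f->ptr + (pos - st->stepped);
            return limit - pos;
        }
        st->stepped += f->len;      /* step past this fragment */
        st->cur++;
    }
    return 0;                       /* ran out of data before upper offset */
}

/* Typical caller loop: gather bytes [from, to) into a flat buffer. */
static unsigned int toy_copy_range(const struct toy_frag *frags,
                                   unsigned int nr_frags, unsigned int from,
                                   unsigned int to, unsigned char *out)
{
    struct toy_seq_state st;
    const unsigned char *data;
    unsigned int consumed = 0, len;

    toy_prepare_seq_read(frags, nr_frags, from, to, &st);
    while ((len = toy_seq_read(consumed, &data, &st)) != 0) {
        memcpy(out + consumed, data, len);
        consumed += len;
    }
    return consumed;
}
```

Block sizes follow the fragment boundaries, so the loop in toy_copy_range() never assumes a particular block length, matching Note 1 above.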

voidskb_abort_seq_read(structskb_seq_state*st)

Abort a sequential read of skb data

Parameters

structskb_seq_state*st

state variable

Description

Must be called if skb_seq_read() was not called until it returned 0.

intskb_copy_seq_read(structskb_seq_state*st,intoffset,void*to,intlen)

copy from a skb_seq_state to a buffer

Parameters

structskb_seq_state*st

source skb_seq_state

intoffset

offset in source

void*to

destination buffer

intlen

number of bytes to copy

Description

Copy len bytes from offset bytes into the source st to the destination buffer to. offset should increase (or be unchanged) with each subsequent call to this function. If offset needs to decrease from the previous use, st should be reset first.

Return

0 on success or -EINVAL if the copy ended early

unsignedintskb_find_text(structsk_buff*skb,unsignedintfrom,unsignedintto,structts_config*config)

Find a text pattern in skb data

Parameters

structsk_buff*skb

the buffer to look in

unsignedintfrom

search offset

unsignedintto

search limit

structts_config*config

textsearch configuration

Description

Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found.

void *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)

pull skb and update receive checksum

Parameters

struct sk_buff *skb

buffer to update

unsigned int len

length of data pulled

Description

This function performs an skb_pull on the packet and updates the CHECKSUM_COMPLETE checksum. It should be used on receive path processing instead of skb_pull unless you know that the checksum difference is zero (e.g., a valid IP header) or you are setting ip_summed to CHECKSUM_NONE.
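
Why the stored checksum has to change when bytes are pulled can be shown with a userspace model of the ones' complement arithmetic. The sketch below is an illustration under simplifying assumptions (even lengths only, big-endian 16-bit words, hypothetical names struct pkt, csum_bytes(), csum_sub16() and pull_rcsum()); it does not reproduce the kernel's csum helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum of len bytes read as big-endian 16-bit words.
 * len must be even in this simplified model. */
static uint16_t csum_bytes(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	return csum_fold16(sum);
}

/* Ones' complement subtraction: a - b is computed as a + ~b. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
	return csum_fold16((uint32_t)a + (uint16_t)~b);
}

struct pkt {
	const uint8_t *data;
	size_t len;
	uint16_t csum;	/* sum over data[0..len), like CHECKSUM_COMPLETE */
};

/* Advance data by len bytes and remove those bytes from the stored sum. */
static const uint8_t *pull_rcsum(struct pkt *p, size_t len)
{
	assert(len <= p->len && len % 2 == 0);
	p->csum = csum_sub16(p->csum, csum_bytes(p->data, len));
	p->data += len;
	p->len -= len;
	return p->data;
}
```

After the pull, the stored sum equals the sum computed directly over the remaining bytes, which is the invariant a CHECKSUM_COMPLETE value is expected to keep.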

struct sk_buff *skb_segment(struct sk_buff *head_skb, netdev_features_t features)

Perform protocol segmentation on skb.

Parameters

struct sk_buff *head_skb

buffer to segment

netdev_features_t features

features for the output path (see dev->features)

Description

This function performs segmentation on the given skb. It returns a pointer to the first in a list of new skbs for the segments. In case of error it returns ERR_PTR(err).

int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)

Fill a scatter-gather list from a socket buffer

Parameters

struct sk_buff *skb

Socket buffer containing the buffers to be mapped

struct scatterlist *sg

The scatter-gather list to map into

int offset

The offset into the buffer’s contents to start mapping

int len

Length of buffer space to be mapped

Description

Fill the specified scatter-gather list with mappings/pointers into a region of the buffer space attached to a socket buffer. Returns either the number of scatterlist items used, or -EMSGSIZE if the contents could not fit.

int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)

Check that a socket buffer’s data buffers are writable

Parameters

struct sk_buff *skb

The socket buffer to check.

int tailbits

Amount of trailing space to be added

struct sk_buff **trailer

Returned pointer to the skb where the tailbits space begins

Description

Make sure that the data buffers attached to a socket buffer are writable. If they are not, private copies are made of the data buffers and the socket buffer is set to use these instead.

If tailbits is given, make sure that there is space to write tailbits bytes of data beyond current end of socket buffer. trailer will be set to point to the skb in which this space begins.

The number of scatterlist elements required to completely map the COW’d and extended socket buffer will be returned.

struct sk_buff *skb_clone_sk(struct sk_buff *skb)

create clone of skb, and take reference to socket

Parameters

struct sk_buff *skb

the skb to clone

Description

This function creates a clone of a buffer that holds a reference on sk_refcnt. Buffers created via this function are meant to be returned using sock_queue_err_skb, or freed via kfree_skb.

When passing buffers allocated with this function to sock_queue_err_skb it is necessary to wrap the call with sock_hold/sock_put in order to prevent the socket from being released prior to being enqueued on the sk_error_queue.

bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off)

set up and verify partial csum values for packet

Parameters

struct sk_buff *skb

the skb to set

u16 start

the number of bytes after skb->data to start checksumming.

u16 off

the offset from start to place the checksum.

Description

For untrusted partially-checksummed packets, we need to make sure the values for skb->csum_start and skb->csum_offset are valid so we don’t oops.

This function checks and sets those values and skb->ip_summed: if this returns false you should drop the packet.
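
The kind of validation described above can be sketched in userspace. The model below is an assumption about what "valid" means (the 16-bit checksum field must lie inside the linear data, and the start offset must fit in 16 bits); struct pkt_meta and partial_csum_set() are hypothetical names and this is not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pkt_meta {
	uint32_t headroom;	/* bytes between buffer start and data */
	uint32_t headlen;	/* bytes of linear data starting at data */
	uint16_t csum_start;	/* offset from buffer start, set on success */
	uint16_t csum_offset;	/* offset from csum_start, set on success */
	bool partial;		/* stands in for ip_summed == CHECKSUM_PARTIAL */
};

/* Accept start/off only if the checksum field fits in the linear data. */
static bool partial_csum_set(struct pkt_meta *m, uint16_t start, uint16_t off)
{
	/* 2 is the size of the 16-bit checksum field itself. */
	uint32_t csum_end = (uint32_t)start + off + 2;
	uint32_t csum_start = m->headroom + start;

	if (csum_start >= UINT16_MAX || csum_end > m->headlen)
		return false;	/* caller is expected to drop the packet */

	m->csum_start = (uint16_t)csum_start;
	m->csum_offset = off;
	m->partial = true;
	return true;
}
```

Doing the arithmetic in 32 bits before comparing avoids wrap-around when start and off are both close to the 16-bit limit, which matters because the values come from an untrusted source.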

int skb_checksum_setup(struct sk_buff *skb, bool recalculate)

set up partial checksum offset

Parameters

struct sk_buff *skb

the skb to set up

bool recalculate

if true the pseudo-header checksum will be recalculated

struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, unsigned int transport_len, __sum16 (*skb_chkf)(struct sk_buff *skb))

validate checksum of an skb

Parameters

struct sk_buff *skb

the skb to check

unsigned int transport_len

the data length beyond the network header

__sum16 (*skb_chkf)(struct sk_buff *skb)

checksum function to use

Description

Applies the given checksum function skb_chkf to the provided skb. Returns a checked and maybe trimmed skb. Returns NULL on error.

If the skb has data beyond the given transport length, then a trimmed & cloned skb is checked and returned.

Caller needs to set the skb transport header and free any returned skb if it differs from the provided skb.

bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, bool *fragstolen, int *delta_truesize)

try to merge skb to prior one

Parameters

struct sk_buff *to

prior buffer

struct sk_buff *from

buffer to add

bool *fragstolen

pointer to boolean

int *delta_truesize

how much more was allocated than was requested

void skb_scrub_packet(struct sk_buff *skb, bool xnet)

scrub an skb

Parameters

struct sk_buff *skb

buffer to clean

bool xnet

packet is crossing netns

Description

skb_scrub_packet can be used after encapsulating or decapsulating a packet into/from a tunnel. Some information has to be cleared during these operations. skb_scrub_packet can also be used to clean a skb before injecting it in another namespace (xnet == true). We have to clear all information in the skb that could impact namespace isolation.

int skb_eth_pop(struct sk_buff *skb)

Drop the Ethernet header at the head of a packet

Parameters

struct sk_buff *skb

Socket buffer to modify

Description

Drop the Ethernet header of skb.

Expects that skb->data points to the mac header and that no VLAN tags are present.

Returns 0 on success, -errno otherwise.

int skb_eth_push(struct sk_buff *skb, const unsigned char *dst, const unsigned char *src)

Add a new Ethernet header at the head of a packet

Parameters

struct sk_buff *skb

Socket buffer to modify

const unsigned char *dst

Destination MAC address of the new header

const unsigned char *src

Source MAC address of the new header

Description

Prepend skb with a new Ethernet header.

Expects that skb->data points to the mac header, which must be empty.

Returns 0 on success, -errno otherwise.

int skb_mpls_push(struct sk_buff *skb, __be32 mpls_lse, __be16 mpls_proto, int mac_len, bool ethernet)

push a new MPLS header after mac_len bytes from start of the packet

Parameters

struct sk_buff *skb

buffer

__be32 mpls_lse

MPLS label stack entry to push

__be16 mpls_proto

ethertype of the new MPLS header (expects 0x8847 or 0x8848)

int mac_len

length of the MAC header

bool ethernet

flag to indicate if the resulting packet after skb_mpls_push is ethernet

Description

Expects skb->data at mac header.

Returns 0 on success, -errno otherwise.
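
For readers unfamiliar with the mpls_lse argument, the following userspace sketch shows how a 32-bit label stack entry could be assembled in network byte order, using the field layout from RFC 3032 (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL). The helper names mpls_lse_build(), mpls_lse_label() and mpls_lse_ttl() are hypothetical and are not kernel functions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define LSE_LABEL_SHIFT	12	/* label occupies bits 31..12 */
#define LSE_TC_SHIFT	9	/* traffic class occupies bits 11..9 */
#define LSE_S_SHIFT	8	/* bottom-of-stack flag is bit 8 */

/* Build a label stack entry and return it in network byte order. */
static uint32_t mpls_lse_build(uint32_t label, uint8_t tc, int bos, uint8_t ttl)
{
	uint32_t host = (label & 0xfffff) << LSE_LABEL_SHIFT |
			(uint32_t)(tc & 0x7) << LSE_TC_SHIFT |
			(uint32_t)(bos ? 1 : 0) << LSE_S_SHIFT |
			ttl;

	return htonl(host);
}

/* Extract the label from a network byte order entry. */
static uint32_t mpls_lse_label(uint32_t lse_be)
{
	return ntohl(lse_be) >> LSE_LABEL_SHIFT;
}

/* Extract the TTL from a network byte order entry. */
static uint8_t mpls_lse_ttl(uint32_t lse_be)
{
	return (uint8_t)(ntohl(lse_be) & 0xff);
}
```

A value built this way has the byte order that a __be32 parameter implies; the ethertype passed alongside it would be 0x8847 or 0x8848 as stated in the parameter list above.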

int skb_mpls_pop(struct sk_buff *skb, __be16 next_proto, int mac_len, bool ethernet)

pop the outermost MPLS header

Parameters

struct sk_buff *skb

buffer

__be16 next_proto

ethertype of header after popped MPLS header

int mac_len

length of the MAC header

bool ethernet

flag to indicate if the packet is ethernet

Description

Expects skb->data at mac header.

Returns 0 on success, -errno otherwise.

int skb_mpls_update_lse(struct sk_buff *skb, __be32 mpls_lse)

modify outermost MPLS header and update csum

Parameters

struct sk_buff *skb

buffer

__be32 mpls_lse

new MPLS label stack entry to update to

Description

Expects skb->data at mac header.

Returns 0 on success, -errno otherwise.

int skb_mpls_dec_ttl(struct sk_buff *skb)

decrement the TTL of the outermost MPLS header

Parameters

struct sk_buff *skb

buffer

Description

Expects skb->data at mac header.

Returns 0 on success, -errno otherwise.

struct sk_buff *alloc_skb_with_frags(unsigned long header_len, unsigned long data_len, int order, int *errcode, gfp_t gfp_mask)

allocate skb with page frags

Parameters

unsigned long header_len

size of linear part

unsigned long data_len

needed length in frags

int order

max page order desired.

int *errcode

pointer to error code if any

gfp_t gfp_mask

allocation mask

Description

This can be used to allocate a paged skb, given a maximal order for frags.

void skb_condense(struct sk_buff *skb)

try to get rid of fragments/frag_list if possible

Parameters

struct sk_buff *skb

buffer

Description

Can be used to save memory before skb is added to a busy queue. If packet has bytes in frags and enough tail room in skb->head, pull all of them, so that we can free the frags right now and adjust truesize.

Notes

We do not reallocate skb->head thus can not fail. Caller must re-evaluate skb->truesize if needed.

void *__skb_ext_set(struct sk_buff *skb, enum skb_ext_id id, struct skb_ext *ext)

attach the specified extension storage to this skb

Parameters

struct sk_buff *skb

buffer

enum skb_ext_id id

extension id

struct skb_ext *ext

extension storage previously allocated via __skb_ext_alloc()

Description

Existing extensions, if any, are cleared.

Returns the pointer to the extension.

void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)

allocate space for given extension, COW if needed

Parameters

struct sk_buff *skb

buffer

enum skb_ext_id id

extension to allocate space for

Description

Allocates enough space for the given extension. If the extension is already present, a pointer to that extension is returned.

If the skb was cloned, COW applies and the returned memory can be modified without changing the extension space of cloned buffers.

Returns pointer to the extension or NULL on allocation failure.

ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter, ssize_t maxsize)

Splice (or copy) pages to skbuff

Parameters

struct sk_buff *skb

The buffer to add pages to

struct iov_iter *iter

Iterator representing the pages to be added

ssize_t maxsize

Maximum amount of pages to be added

Description

This is a common helper function for supporting MSG_SPLICE_PAGES. It extracts pages from an iterator and adds them to the socket buffer if possible, copying them to fragments if not possible (such as if they’re slab pages).

Returns the amount of data spliced/copied or -EMSGSIZE if there’s insufficient space in the buffer to transfer anything.

bool sk_ns_capable(const struct sock *sk, struct user_namespace *user_ns, int cap)

General socket capability test

Parameters

const struct sock *sk

Socket to use a capability on or through

struct user_namespace *user_ns

The user namespace of the capability to use

int cap

The capability to use

Description

Test to see if the opener of the socket had, when the socket was created, and the current process has, the capability cap in the user namespace user_ns.

bool sk_capable(const struct sock *sk, int cap)

Socket global capability test

Parameters

const struct sock *sk

Socket to use a capability on or through

int cap

The global capability to use

Description

Test to see if the opener of the socket had, when the socket was created, and the current process has, the capability cap in all user namespaces.

bool sk_net_capable(const struct sock *sk, int cap)

Network namespace socket capability test

Parameters

const struct sock *sk

Socket to use a capability on or through

int cap

The capability to use

Description

Test to see if the opener of the socket had, when the socket was created, and the current process has, the capability cap over the network namespace the socket is a member of.

void sk_set_memalloc(struct sock *sk)

sets SOCK_MEMALLOC

Parameters

struct sock *sk

socket to set it on

Description

Set SOCK_MEMALLOC on a socket for access to emergency reserves. It’s the responsibility of the admin to adjust min_free_kbytes to meet the requirements.

struct sock *sk_alloc(struct net *net, int family, gfp_t priority, struct proto *prot, int kern)

All socket objects are allocated here

Parameters

struct net *net

the applicable net namespace

int family

protocol family

gfp_t priority

for allocation (GFP_KERNEL, GFP_ATOMIC, etc)

struct proto *prot

struct proto associated with this new sock instance

int kern

is this to be a kernel socket?

struct sock *sk_clone(const struct sock *sk, const gfp_t priority, bool lock)

clone a socket

Parameters

const struct sock *sk

the socket to clone

const gfp_t priority

for allocation (GFP_KERNEL, GFP_ATOMIC, etc)

bool lock

if true, lock the cloned sk

Description

If lock is true, the clone is locked by bh_lock_sock(), and the caller must unlock the socket, even in the error path, with bh_unlock_sock().

bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)

check that a page_frag contains enough room

Parameters

unsigned int sz

minimum size of the fragment we want to get

struct page_frag *pfrag

pointer to page_frag

gfp_t gfp

priority for memory allocation

Note

While this allocator tries to use high order pages, there is no guarantee that allocations succeed. Therefore, sz MUST be less than or equal to PAGE_SIZE.

int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb)

wait for data to arrive at sk_receive_queue

Parameters

struct sock *sk

sock to wait on

long *timeo

for how long

const struct sk_buff *skb

last skb seen on sk_receive_queue

Description

Now socket state including sk->sk_err is changed only under lock, hence we may omit checks after joining wait queue. We check receive queue before schedule() only as optimization; it is very likely that release_sock() added new data.

int __sk_mem_schedule(struct sock *sk, int size, int kind)

increase sk_forward_alloc and memory_allocated

Parameters

struct sock *sk

socket

int size

memory size to allocate

int kind

allocation type

Description

If kind is SK_MEM_SEND, it means wmem allocation. Otherwise it means rmem allocation. This function assumes that protocols which have memory_pressure use sk_wmem_queued as write buffer accounting.

void __sk_mem_reclaim(struct sock *sk, int amount)

reclaim sk_forward_alloc and memory_allocated

Parameters

struct sock *sk

socket

int amount

number of bytes (rounded down to a PAGE_SIZE multiple)

struct sk_buff *__skb_try_recv_datagram(struct sock *sk, struct sk_buff_head *queue, unsigned int flags, int *off, int *err, struct sk_buff **last)

Receive a datagram skbuff

Parameters

struct sock *sk

socket

struct sk_buff_head *queue

socket queue from which to receive

unsigned int flags

MSG_ flags

int *off

an offset in bytes to peek skb from. Returns an offset within an skb where data actually starts

int *err

error code returned

struct sk_buff **last

set to last peeked message to inform the wait function what to look for when peeking

Description

Get a datagram skbuff, understands the peeking, nonblocking wakeups and possible races. This replaces identical code in packet, raw and udp, as well as the IPX, AX.25 and Appletalk. It also finally fixes the long standing peek and read race for datagram sockets. If you alter this routine remember it must be re-entrant.

This function will lock the socket if a skb is returned, so the caller needs to unlock the socket in that case (usually by calling skb_free_datagram). Returns NULL with err set to -EAGAIN if no data was available or to some other value if an error was detected.

It does not lock socket since today. This function is free of race conditions. This measure should/can improve significantly datagram socket latencies at high loads, when data copying to user space takes lots of time. (BTW I’ve just killed the last cli() in IP/IPv6/core/netlink/packet 8) Great win.) --ANK (980729)

The order of the tests when we find no data waiting are specified quite explicitly by POSIX 1003.1g, don’t change them without having the standard around please.

int skb_kill_datagram(struct sock *sk, struct sk_buff *skb, unsigned int flags)

Free a datagram skbuff forcibly

Parameters

struct sock *sk

socket

struct sk_buff *skb

datagram skbuff

unsigned int flags

MSG_ flags

Description

This function frees a datagram skbuff that was received by skb_recv_datagram. The flags argument must match the one used for skb_recv_datagram.

If the MSG_PEEK flag is set, and the packet is still on the receive queue of the socket, it will be taken off the queue before it is freed.

This function currently only disables BH when acquiring the sk_receive_queue lock. Therefore it must not be used in a context where that lock is acquired in an IRQ context.

It returns 0 if the packet was removed by us.

int skb_copy_and_crc32c_datagram_iter(const struct sk_buff *skb, int offset, struct iov_iter *to, int len, u32 *crcp)

Copy datagram to an iovec iterator and update a CRC32C value.

Parameters

const struct sk_buff *skb

buffer to copy

int offset

offset in the buffer to start copying from

struct iov_iter *to

iovec iterator to copy to

int len

amount of data to copy from buffer to iovec

u32 *crcp

pointer to CRC32C value to update

Return

0 on success, -EFAULT if there was a fault during copy.

int skb_copy_datagram_iter(const struct sk_buff *skb, int offset, struct iov_iter *to, int len)

Copy a datagram to an iovec iterator.

Parameters

const struct sk_buff *skb

buffer to copy

int offset

offset in the buffer to start copying from

struct iov_iter *to

iovec iterator to copy to

int len

amount of data to copy from buffer to iovec

int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset, struct iov_iter *from, int len)

Copy a datagram from an iov_iter.

Parameters

struct sk_buff *skb

buffer to copy

int offset

offset in the buffer to start copying to

struct iov_iter *from

the copy source

int len

amount of data to copy to buffer from iovec

Description

Returns 0 or -EFAULT.

int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from)

Build a zerocopy datagram from an iov_iter

Parameters

struct sk_buff *skb

buffer to copy

struct iov_iter *from

the source to copy from

Description

The function will first copy up to headlen, and then pin the userspace pages and build frags through them.

Returns 0, -EFAULT or -EMSGSIZE.

int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, int hlen, struct msghdr *msg)

Copy and checksum skb to user iovec.

Parameters

struct sk_buff *skb

skbuff

int hlen

hardware length

struct msghdr *msg

destination

Description

Caller _must_ check that skb will fit to this iovec.

Return

0 on success, -EINVAL on checksum failure, -EFAULT on a fault during copy.

__poll_t datagram_poll_queue(struct file *file, struct socket *sock, poll_table *wait, struct sk_buff_head *rcv_queue)

same as datagram_poll, but on a specific receive queue

Parameters

struct file *file

file struct

struct socket *sock

socket

poll_table *wait

poll table

struct sk_buff_head *rcv_queue

receive queue to poll

Description

Performs polling on the given receive queue, handling shutdown, error, and connection state. This is useful for protocols that deliver userspace-bound packets through a custom queue instead of sk->sk_receive_queue.

Return

poll bitmask indicating the socket’s current state

__poll_t datagram_poll(struct file *file, struct socket *sock, poll_table *wait)

generic datagram poll

Parameters

struct file *file

file struct

struct socket *sock

socket

poll_table *wait

poll table

Description

Datagram poll: Again totally generic. This also handles sequenced packet sockets providing the socket receive queue is only ever holding data ready to receive.

Note

when you don’t use this routine for this protocol, and you use a different write policy from sock_writeable(), then please supply your own write_space callback.

Return

poll bitmask indicating the socket’s current state

int sk_stream_wait_connect(struct sock *sk, long *timeo_p)

Wait for a socket to get into the connected state

Parameters

struct sock *sk

sock to wait on

long *timeo_p

for how long to wait

Description

Must be called with the socket locked.

int sk_stream_wait_memory(struct sock *sk, long *timeo_p)

Wait for more memory for a socket

Parameters

struct sock *sk

socket to wait for memory

long *timeo_p

for how long

Socket Filter

int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap, enum skb_drop_reason *reason)

run a packet through a socket filter

Parameters

struct sock *sk

sock associated with sk_buff

struct sk_buff *skb

buffer to filter

unsigned int cap

limit on how short the eBPF program may trim the packet

enum skb_drop_reason *reason

record drop reason on errors (negative return value)

Description

Run the eBPF program and then cut skb->data to correct size returned by the program. If pkt_len is 0 we toss packet. If skb->len is smaller than pkt_len we keep whole skb->data. This is the socket level wrapper to bpf_prog_run. It returns 0 if the packet should be accepted or -EPERM if the packet should be tossed.
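
The three outcomes described above (drop, trim, keep whole) can be written down as a small decision function. This is a userspace sketch of the rule as stated in the text, with the interaction between cap and pkt_len taken from the cap parameter description; filter_trim_model() is a hypothetical name, and pkt_len stands for whatever value the filter program returned.

```c
#include <assert.h>
#include <errno.h>

/* Apply a filter verdict to a packet length.
 * *len     - current packet length, updated when the packet is trimmed
 * cap      - shortest length the filter is allowed to trim down to
 * pkt_len  - value returned by the (hypothetical) filter program
 * Returns 0 to accept or -EPERM to drop. */
static int filter_trim_model(unsigned int *len, unsigned int cap,
			     unsigned int pkt_len)
{
	unsigned int keep;

	if (pkt_len == 0)
		return -EPERM;			/* filter asked to toss the packet */

	keep = pkt_len > cap ? pkt_len : cap;	/* never trim below cap */
	if (keep < *len)
		*len = keep;			/* trim; shorter packets stay whole */
	return 0;
}
```

With a 100-byte packet, a verdict of 60 and a cap of 80 leaves 80 bytes, while the same verdict on a 40-byte packet leaves it untouched.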

int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog)

create an unattached filter

Parameters

struct bpf_prog **pfp

the unattached filter that is created

struct sock_fprog_kern *fprog

the filter program

Description

Create a filter independent of any socket. We first run some sanity checks on it to make sure it does not explode on us later. If an error occurs or there is insufficient memory for the filter, a negative errno code is returned. On success the return is zero.

int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog, bpf_aux_classic_check_t trans, bool save_orig)

create an unattached filter from user buffer

Parameters

struct bpf_prog **pfp

the unattached filter that is created

struct sock_fprog *fprog

the filter program

bpf_aux_classic_check_t trans

post-classic verifier transformation handler

bool save_orig

save classic BPF program

Description

This function effectively does the same as bpf_prog_create(), only that it builds up its insns buffer from user space provided buffer. It also allows for passing a bpf_aux_classic_check_t handler.

int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)

attach a socket filter

Parameters

struct sock_fprog *fprog

the filter program

struct sock *sk

the socket to use

Description

Attach the user’s filter code. We first run some sanity checks on it to make sure it does not explode on us later. If an error occurs or there is insufficient memory for the filter, a negative errno code is returned. On success the return is zero.

Generic Network Statistics

struct gnet_stats_basic

byte/packet throughput statistics

Definition:

struct gnet_stats_basic {
    __u64 bytes;
    __u32 packets;
};

Members

bytes

number of seen bytes

packets

number of seen packets

struct gnet_stats_rate_est

rate estimator

Definition:

struct gnet_stats_rate_est {
    __u32 bps;
    __u32 pps;
};

Members

bps

current byte rate

pps

current packet rate

struct gnet_stats_rate_est64

rate estimator

Definition:

struct gnet_stats_rate_est64 {
    __u64 bps;
    __u64 pps;
};

Members

bps

current byte rate

pps

current packet rate

struct gnet_stats_queue

queuing statistics

Definition:

struct gnet_stats_queue {
    __u32 qlen;
    __u32 backlog;
    __u32 drops;
    __u32 requeues;
    __u32 overlimits;
};

Members

qlen

queue length

backlog

backlog size of queue

drops

number of dropped packets

requeues

number of requeues

overlimits

number of enqueues over the limit

struct gnet_estimator

rate estimator configuration

Definition:

struct gnet_estimator {
    signed char     interval;
    unsigned char   ewma_log;
};

Members

interval

sampling period

ewma_log

the log of measurement window weight

int gnet_stats_start_copy_compat(struct sk_buff *skb, int type, int tc_stats_type, int xstats_type, spinlock_t *lock, struct gnet_dump *d, int padattr)

start dumping procedure in compatibility mode

Parameters

struct sk_buff *skb

socket buffer to put statistics TLVs into

int type

TLV type for top level statistic TLV

int tc_stats_type

TLV type for backward compatibility struct tc_stats TLV

int xstats_type

TLV type for backward compatibility xstats TLV

spinlock_t *lock

statistics lock

struct gnet_dump *d

dumping handle

int padattr

padding attribute

Description

Initializes the dumping handle, grabs the statistic lock and appends an empty TLV header to the socket buffer for use as a container for all other statistic TLVs.

The dumping handle is marked to be in backward compatibility mode telling all gnet_stats_copy_XXX() functions to fill a local copy of struct tc_stats.

Returns 0 on success or -1 if the room in the socket buffer was not sufficient.

int gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock, struct gnet_dump *d, int padattr)

start dumping procedure

Parameters

struct sk_buff *skb

socket buffer to put statistics TLVs into

int type

TLV type for top level statistic TLV

spinlock_t *lock

statistics lock

struct gnet_dump *d

dumping handle

int padattr

padding attribute

Description

Initializes the dumping handle, grabs the statistic lock and appends an empty TLV header to the socket buffer for use as a container for all other statistic TLVs.

Returns 0 on success or -1 if the room in the socket buffer was not sufficient.

int gnet_stats_copy_basic(struct gnet_dump *d, struct gnet_stats_basic_sync __percpu *cpu, struct gnet_stats_basic_sync *b, bool running)

copy basic statistics into statistic TLV

Parameters

struct gnet_dump *d

dumping handle

struct gnet_stats_basic_sync __percpu *cpu

copy statistic per cpu

struct gnet_stats_basic_sync *b

basic statistics

bool running

true if b represents a running qdisc, thus b’s internal values might change during basic reads. Only used if cpu is NULL

Context

task; must not be run from IRQ or BH contexts

Description

Appends the basic statistics to the top level TLV created by gnet_stats_start_copy().

Returns 0 on success or -1 with the statistic lock released if the room in the socket buffer was not sufficient.

int gnet_stats_copy_basic_hw(struct gnet_dump *d, struct gnet_stats_basic_sync __percpu *cpu, struct gnet_stats_basic_sync *b, bool running)

copy basic hw statistics into statistic TLV

Parameters

struct gnet_dump *d

dumping handle

struct gnet_stats_basic_sync __percpu *cpu

copy statistic per cpu

struct gnet_stats_basic_sync *b

basic statistics

bool running

true if b represents a running qdisc, thus b’s internal values might change during basic reads. Only used if cpu is NULL

Context

task; must not be run from IRQ or BH contexts

Description

Appends the basic statistics to the top level TLV created by gnet_stats_start_copy().

Returns 0 on success or -1 with the statistic lock released if the room in the socket buffer was not sufficient.

int gnet_stats_copy_rate_est(struct gnet_dump *d, struct net_rate_estimator __rcu **rate_est)

copy rate estimator statistics into statistics TLV

Parameters

struct gnet_dump *d

dumping handle

struct net_rate_estimator __rcu **rate_est

rate estimator

Description

Appends the rate estimator statistics to the top level TLV created by gnet_stats_start_copy().

Returns 0 on success or -1 with the statistic lock released if the room in the socket buffer was not sufficient.

int gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue __percpu *cpu_q, struct gnet_stats_queue *q, __u32 qlen)

copy queue statistics into statistics TLV

Parameters

struct gnet_dump *d

dumping handle

struct gnet_stats_queue __percpu *cpu_q

per cpu queue statistics

struct gnet_stats_queue *q

queue statistics

__u32 qlen

queue length statistics

Description

Appends the queue statistics to the top level TLV created by gnet_stats_start_copy(). Uses per cpu queue statistics if they are available.

Returns 0 on success or -1 with the statistic lock released if the room in the socket buffer was not sufficient.

int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len)

copy application specific statistics into statistics TLV

Parameters

struct gnet_dump *d

dumping handle

void *st

application specific statistics data

int len

length of data

Description

Appends the application specific statistics to the top level TLV created by gnet_stats_start_copy() and remembers the data for XSTATS if the dumping handle is in backward compatibility mode.

Returns 0 on success or -1 with the statistic lock released if the room in the socket buffer was not sufficient.

int gnet_stats_finish_copy(struct gnet_dump *d)

finish dumping procedure

Parameters

struct gnet_dump *d

dumping handle

Description

Corrects the length of the top level TLV to include all TLVs added by gnet_stats_copy_XXX() calls. Adds the backward compatibility TLVs if gnet_stats_start_copy_compat() was used and releases the statistics lock.

Returns 0 on success or -1 with the statistic lock released if the room in the socket buffer was not sufficient.
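
The start / copy / finish sequence amounts to opening a container TLV with a placeholder length, appending nested TLVs, and fixing up the container length at the end. The userspace sketch below models only that bookkeeping. Everything in it is hypothetical: struct tlv_hdr is a made-up 4-byte header (real netlink attributes also involve alignment and padding), there is no locking, and dump_start(), dump_put() and dump_finish() are illustration names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header used only by this model. */
struct tlv_hdr {
	uint16_t len;	/* header plus payload, in bytes */
	uint16_t type;
};

struct dump {
	unsigned char *buf;
	size_t size, used;
	size_t top;	/* offset of the open top level TLV header */
};

/* Append one TLV; return -1 if the buffer has insufficient room. */
static int dump_put(struct dump *d, uint16_t type, const void *data, uint16_t len)
{
	struct tlv_hdr h = { (uint16_t)(sizeof(h) + len), type };

	if (d->size - d->used < sizeof(h) + len)
		return -1;
	memcpy(d->buf + d->used, &h, sizeof(h));
	if (len)
		memcpy(d->buf + d->used + sizeof(h), data, len);
	d->used += sizeof(h) + len;
	return 0;
}

/* Open the container: an empty TLV whose length is corrected later. */
static int dump_start(struct dump *d, unsigned char *buf, size_t size, uint16_t type)
{
	d->buf = buf;
	d->size = size;
	d->used = 0;
	d->top = 0;
	return dump_put(d, type, NULL, 0);
}

/* Close the container: rewrite its length to cover every nested TLV. */
static int dump_finish(struct dump *d)
{
	struct tlv_hdr h;

	memcpy(&h, d->buf + d->top, sizeof(h));
	h.len = (uint16_t)(d->used - d->top);
	memcpy(d->buf + d->top, &h, sizeof(h));
	return 0;
}
```

A caller would invoke dump_start() once, dump_put() for each statistics block, and dump_finish() last; a -1 from any step corresponds to the "room in the socket buffer was not sufficient" case in the descriptions above.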

int gen_new_estimator(struct gnet_stats_basic_sync *bstats, struct gnet_stats_basic_sync __percpu *cpu_bstats, struct net_rate_estimator __rcu **rate_est, spinlock_t *lock, bool running, struct nlattr *opt)

create a new rate estimator

Parameters

struct gnet_stats_basic_sync *bstats

basic statistics

struct gnet_stats_basic_sync __percpu *cpu_bstats

bstats per cpu

struct net_rate_estimator __rcu **rate_est

rate estimator statistics

spinlock_t *lock

lock for statistics and control path

bool running

true if bstats represents a running qdisc, thus bstats’ internal values might change during basic reads. Only used if cpu_bstats is NULL

struct nlattr *opt

rate estimator configuration TLV

Description

Creates a new rate estimator with bstats as source and rate_est as destination. A new timer with the interval specified in the configuration TLV is created. Upon each interval, the latest statistics will be read from bstats and the estimated rate will be stored in rate_est with the statistics lock grabbed during this period.

Returns 0 on success or a negative error code.
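
The periodic "read the counters, update the estimated rate" step can be pictured as an exponentially weighted moving average whose weight is 1 / 2^ewma_log. The sketch below is a plain userspace EWMA written for illustration; it is not the kernel's fixed-point arithmetic, the whole-second interval is a simplifying assumption, and struct rate_est_model and rate_est_tick() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct rate_est_model {
	uint64_t last_bytes;		/* counter value at the previous sample */
	int64_t avg_bps;		/* smoothed byte rate */
	unsigned int ewma_log;		/* log2 of the averaging weight */
	unsigned int interval_s;	/* sampling period in seconds (assumed) */
};

/* One timer tick: sample the byte counter and fold the new rate into
 * the running average with weight 1 / 2^ewma_log. */
static void rate_est_tick(struct rate_est_model *e, uint64_t bytes_now)
{
	int64_t sample = (int64_t)((bytes_now - e->last_bytes) / e->interval_s);

	e->last_bytes = bytes_now;
	e->avg_bps += (sample - e->avg_bps) / ((int64_t)1 << e->ewma_log);
}
```

A larger ewma_log gives each new sample less weight, so the estimate reacts more slowly but fluctuates less; a shorter interval does the opposite for responsiveness.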

void gen_kill_estimator(struct net_rate_estimator __rcu **rate_est)

remove a rate estimator

Parameters

struct net_rate_estimator __rcu **rate_est

rate estimator

Description

Removes the rate estimator.

int gen_replace_estimator(struct gnet_stats_basic_sync *bstats, struct gnet_stats_basic_sync __percpu *cpu_bstats, struct net_rate_estimator __rcu **rate_est, spinlock_t *lock, bool running, struct nlattr *opt)

replace rate estimator configuration

Parameters

struct gnet_stats_basic_sync *bstats

basic statistics

struct gnet_stats_basic_sync __percpu *cpu_bstats

bstats per cpu

struct net_rate_estimator __rcu **rate_est

rate estimator statistics

spinlock_t *lock

lock for statistics and control path

bool running

true if bstats represents a running qdisc, thus bstats’ internal values might change during basic reads. Only used if cpu_bstats is NULL

struct nlattr *opt

rate estimator configuration TLV

Description

Replaces the configuration of a rate estimator by calling gen_kill_estimator() and gen_new_estimator().

Returns 0 on success or a negative error code.

bool gen_estimator_active(struct net_rate_estimator __rcu **rate_est)

test if estimator is currently in use

Parameters

struct net_rate_estimator __rcu **rate_est

rate estimator

Description

Returns true if estimator is active, and false if not.

SUN RPC subsystem

__be32 *xdr_encode_opaque_fixed(__be32 *p, const void *ptr, unsigned int nbytes)

Encode fixed length opaque data

Parameters

__be32 *p

pointer to current position in XDR buffer.

const void *ptr

pointer to data to encode (or NULL)

unsigned int nbytes

size of data.

Description

Copy the array of data of length nbytes at ptr to the XDR buffer at position p, then align to the next 32-bit boundary by padding with zero bytes (see RFC1832).

Note

if ptr is NULL, only the padding is performed.

Returns the updated current XDR buffer position

__be32 *xdr_encode_opaque(__be32 *p, const void *ptr, unsigned int nbytes)

Encode variable length opaque data

Parameters

__be32 *p

pointer to current position in XDR buffer.

const void *ptr

pointer to data to encode (or NULL)

unsigned int nbytes

size of data.

Description

Returns the updated current XDR buffer position
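
The copy-then-pad behaviour of the two opaque encoders can be modelled in userspace. In the sketch below uint32_t stands in for __be32 and htonl() for the kernel's byte-order helper; encode_opaque_fixed(), encode_opaque() and xdr_quadlen() are illustration names, and the assumption that the variable-length form is a 32-bit length word followed by the fixed-length encoding comes from the XDR format in RFC 1832 rather than from the text above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of 32-bit words needed to hold nbytes. */
static unsigned int xdr_quadlen(unsigned int nbytes)
{
	return (nbytes + 3) >> 2;
}

/* Copy nbytes from ptr (if not NULL), zero-pad to a 32-bit boundary and
 * return the position just past the encoded data. */
static uint32_t *encode_opaque_fixed(uint32_t *p, const void *ptr,
				     unsigned int nbytes)
{
	if (nbytes != 0) {
		unsigned int quadlen = xdr_quadlen(nbytes);
		unsigned int padding = (quadlen << 2) - nbytes;

		if (ptr != NULL)
			memcpy(p, ptr, nbytes);
		if (padding != 0)
			memset((char *)p + nbytes, 0, padding);
		p += quadlen;
	}
	return p;
}

/* Variable length form: a big-endian length word, then the padded data. */
static uint32_t *encode_opaque(uint32_t *p, const void *ptr,
			       unsigned int nbytes)
{
	*p++ = htonl(nbytes);
	return encode_opaque_fixed(p, ptr, nbytes);
}
```

Encoding the 5-byte string "hello" this way consumes three words: one for the length and two for the data plus three zero bytes of padding.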

void xdr_terminate_string(const struct xdr_buf *buf, const u32 len)

‘\0’-terminate a string residing in an xdr_buf

Parameters

const struct xdr_buf *buf

XDR buffer where string resides

const u32 len

length of string, in bytes

unsigned int xdr_buf_to_bvec(struct bio_vec *bvec, unsigned int bvec_size, const struct xdr_buf *xdr)

Copy components of an xdr_buf into a bio_vec array

Parameters

struct bio_vec *bvec

bio_vec array to populate

unsigned int bvec_size

element count of bio_vec

const struct xdr_buf *xdr

xdr_buf to be copied

Description

Returns the number of entries consumed in bvec.

voidxdr_inline_pages(structxdr_buf*xdr,unsignedintoffset,structpage**pages,unsignedintbase,unsignedintlen)

Prepare receive buffer for a large reply

Parameters

structxdr_buf*xdr

xdr_buf into which reply will be placed

unsignedintoffset

expected offset where data payload will start, in bytes

structpage**pages

vector of struct page pointers

unsignedintbase

offset in first page where receive should start, in bytes

unsignedintlen

expected size of the upper layer data payload, in bytes

void_copy_from_pages(char*p,structpage**pages,size_tpgbase,size_tlen)

Parameters

char*p

pointer to destination

structpage**pages

array of pages

size_tpgbase

offset of source data

size_tlen

length

Description

Copies data into an arbitrary memory location from an array of pages. The copy is assumed to be non-overlapping.

unsignedintxdr_stream_pos(conststructxdr_stream*xdr)

Return the current offset from the start of the xdr_stream

Parameters

conststructxdr_stream*xdr

pointer to struct xdr_stream

unsignedintxdr_page_pos(conststructxdr_stream*xdr)

Return the current offset from the start of the xdr pages

Parameters

conststructxdr_stream*xdr

pointer to struct xdr_stream

voidxdr_init_encode(structxdr_stream*xdr,structxdr_buf*buf,__be32*p,structrpc_rqst*rqst)

Initialize a struct xdr_stream for sending data.

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

structxdr_buf*buf

pointer to XDR buffer in which to encode data

__be32*p

current pointer inside XDR buffer

structrpc_rqst*rqst

pointer to controlling rpc_rqst, for debugging

Note

At the moment the RPC client only passes the length of our scratch buffer in the xdr_buf’s header kvec. Previously this meant we needed to call xdr_adjust_iovec() after encoding the data. With the new scheme, the xdr_stream manages the details of the buffer length, and takes care of adjusting the kvec length for us.

voidxdr_init_encode_pages(structxdr_stream*xdr,structxdr_buf*buf)

Initialize an xdr_stream for encoding into pages

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

structxdr_buf*buf

pointer to XDR buffer into which to encode data

void__xdr_commit_encode(structxdr_stream*xdr)

Ensure all data is written to buffer

Parameters

structxdr_stream*xdr

pointer to xdr_stream

Description

We handle encoding across page boundaries by giving the caller a temporary location to write to, then later copying the data into place; xdr_commit_encode does that copying.

Normally the caller doesn’t need to call this directly, as the following xdr_reserve_space will do it. But an explicit call may be required at the end of encoding, or any other time when the xdr_buf data might be read.

__be32*xdr_reserve_space(structxdr_stream*xdr,size_tnbytes)

Reserve buffer space for sending

Parameters

structxdr_stream*xdr

pointer to xdr_stream

size_tnbytes

number of bytes to reserve

Description

Checks that we have enough buffer space to encode ‘nbytes’ more bytes of data. If so, update the total xdr_buf length, and adjust the length of the current kvec.

The returned pointer is valid only until the next call to xdr_reserve_space() or xdr_commit_encode() on xdr. The current implementation of this API guarantees that space reserved for a four-byte data item remains valid until xdr is destroyed, but that might not always be true in the future.
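The check-then-advance behaviour can be sketched with a flattened model. The names below are hypothetical, the stream is a single contiguous buffer rather than the kernel's head/pages/tail layout, and rounding the request up to a 4-byte multiple is an assumption based on XDR alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened encode stream: one contiguous buffer. */
struct sketch_stream {
    uint8_t *base;   /* start of the buffer */
    size_t len;      /* bytes encoded so far */
    size_t buflen;   /* total capacity */
};

/* Reserve nbytes (rounded up to 4); returns NULL if it does not fit. */
static void *sketch_reserve_space(struct sketch_stream *xdr, size_t nbytes)
{
    size_t rounded = (nbytes + 3) & ~(size_t)3;
    void *p;

    assert(xdr != NULL && xdr->len <= xdr->buflen);
    if (rounded < nbytes || rounded > xdr->buflen - xdr->len)
        return NULL;             /* overflow or not enough space left */
    p = xdr->base + xdr->len;    /* caller writes its data here */
    xdr->len += rounded;         /* account for the reserved bytes */
    return p;
}
```

A failed reservation leaves the stream length unchanged, so the caller can report the error without rolling anything back.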

intxdr_reserve_space_vec(structxdr_stream*xdr,size_tnbytes)

Reserves a large amount of buffer space for sending

Parameters

structxdr_stream*xdr

pointer to xdr_stream

size_tnbytes

number of bytes to reserve

Description

The size argument passed to xdr_reserve_space() is determined based on the number of bytes remaining in the current page to avoid invalidating iov_base pointers when xdr_commit_encode() is called.

Return values:

  • 0: success

  • -EMSGSIZE: not enough space is available in xdr

voidxdr_truncate_encode(structxdr_stream*xdr,size_tlen)

truncate an encode buffer

Parameters

structxdr_stream*xdr

pointer to xdr_stream

size_tlen

new length of buffer

Description

Truncates the xdr stream, so that xdr->buf->len == len, and xdr->p points at offset len from the start of the buffer, and head, tail, and page lengths are adjusted to correspond.

If this means moving xdr->p to a different buffer, we assume that the end pointer should be set to the end of the current page, except in the case of the head buffer when we assume the head buffer’s current length represents the end of the available buffer.

This is not safe to use on a buffer that already has inlined page cache pages (as in a zero-copy server read reply), except for the simple case of truncating from one position in the tail to another.

voidxdr_truncate_decode(structxdr_stream*xdr,size_tlen)

Truncate a decoding stream

Parameters

structxdr_stream*xdr

pointer to struct xdr_stream

size_tlen

Number of bytes to remove

intxdr_restrict_buflen(structxdr_stream*xdr,intnewbuflen)

decrease available buffer space

Parameters

structxdr_stream*xdr

pointer to xdr_stream

intnewbuflen

new maximum number of bytes available

Description

Adjust our idea of how much space is available in the buffer. If we’ve already used too much space in the buffer, returns -1. If the available space is already smaller than newbuflen, returns 0 and does nothing. Otherwise, adjusts xdr->buf->buflen to newbuflen and ensures xdr->end is set at most offset newbuflen from the start of the buffer.
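The three outcomes described above can be modelled with plain offsets. This is a hedged sketch: struct sketch_enc and its fields are hypothetical, and offsets replace the kernel's pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offset-based model of an encoding stream. */
struct sketch_enc {
    int len;     /* bytes already encoded */
    int end;     /* offset where the current segment ends */
    int buflen;  /* maximum number of bytes available */
};

static int sketch_restrict_buflen(struct sketch_enc *x, int newbuflen)
{
    assert(x != NULL);
    if (newbuflen < 0 || newbuflen < x->len)
        return -1;           /* already used more than the new limit */
    if (newbuflen > x->buflen)
        return 0;            /* available space is already smaller */
    if (newbuflen < x->end)
        x->end = newbuflen;  /* keep end within the new limit */
    x->buflen = newbuflen;
    return 0;
}
```

Only the last path changes state; the first two leave the stream untouched.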

voidxdr_write_pages(structxdr_stream*xdr,structpage**pages,unsignedintbase,unsignedintlen)

Insert a list of pages into an XDR buffer for sending

Parameters

structxdr_stream*xdr

pointer to xdr_stream

structpage**pages

array of pages to insert

unsignedintbase

starting offset of first data byte in pages

unsignedintlen

number of data bytes in pages to insert

Description

After the pages are added, the tail iovec is instantiated pointing to the end of the head buffer, and the stream is set up to encode subsequent items into the tail.

voidxdr_init_decode(structxdr_stream*xdr,structxdr_buf*buf,__be32*p,structrpc_rqst*rqst)

Initialize an xdr_stream for decoding data.

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

structxdr_buf*buf

pointer to XDR buffer from which to decode data

__be32*p

current pointer inside XDR buffer

structrpc_rqst*rqst

pointer to controlling rpc_rqst, for debugging

voidxdr_init_decode_pages(structxdr_stream*xdr,structxdr_buf*buf,structpage**pages,unsignedintlen)

Initialize an xdr_stream for decoding into pages

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

structxdr_buf*buf

pointer to XDR buffer from which to decode data

structpage**pages

list of pages to decode into

unsignedintlen

length in bytes of buffer in pages

voidxdr_finish_decode(structxdr_stream*xdr)

Clean up the xdr_stream after decoding data.

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

__be32*xdr_inline_decode(structxdr_stream*xdr,size_tnbytes)

Retrieve XDR data to decode

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

size_tnbytes

number of bytes of data to decode

Description

Check if the input buffer is long enough to enable us to decode ‘nbytes’ more bytes of data starting at the current position. If so return the current pointer, then update the current pointer position.
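A userspace sketch of this bounds-checked advance follows. The struct and function names are hypothetical, the input is a single flat buffer, and rounding nbytes up to a 4-byte multiple is an assumption based on XDR alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat decode stream. */
struct sketch_dec {
    const uint8_t *p;    /* current position */
    const uint8_t *end;  /* one past the last readable byte */
};

/* Returns the current position and advances, or NULL if too short. */
static const void *sketch_inline_decode(struct sketch_dec *xdr, size_t nbytes)
{
    size_t rounded = (nbytes + 3) & ~(size_t)3;
    const uint8_t *cur = xdr->p;

    assert(xdr->p <= xdr->end);
    if (rounded < nbytes || rounded > (size_t)(xdr->end - cur))
        return NULL;          /* not enough data left to decode */
    xdr->p = cur + rounded;   /* move past the item and its padding */
    return cur;
}
```

A caller would check for NULL after every call before reading through the returned pointer.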

unsignedintxdr_read_pages(structxdr_stream*xdr,unsignedintlen)

align page-based XDR data to current pointer position

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

unsignedintlen

number of bytes of page data

Description

Moves data beyond the current pointer position from the XDR head[] buffer into the page list. Any data that lies beyond current position + len bytes is moved into the XDR tail[]. The xdr_stream current position is then advanced past that data to align to the next XDR object in the tail.

Returns the number of XDR encoded bytes now contained in the pages

voidxdr_set_pagelen(structxdr_stream*xdr,unsignedintlen)

Sets the length of the XDR pages

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

unsignedintlen

new length of the XDR page data

Description

Either grows or shrinks the length of the xdr pages by setting pagelen to len bytes. When shrinking, any extra data is moved into buf->tail, whereas when growing any data beyond the current pointer is moved into the tail.

Returns True if the operation was successful, and False otherwise.

voidxdr_enter_page(structxdr_stream*xdr,unsignedintlen)

decode data from the XDR page

Parameters

structxdr_stream*xdr

pointer to xdr_stream struct

unsignedintlen

number of bytes of page data

Description

Moves data beyond the current pointer position from the XDR head[] buffer into the page list. Any data that lies beyond current position + “len” bytes is moved into the XDR tail[]. The current pointer is then repositioned at the beginning of the first XDR page.

intxdr_buf_subsegment(conststructxdr_buf*buf,structxdr_buf*subbuf,unsignedintbase,unsignedintlen)

set subbuf to a portion of buf

Parameters

conststructxdr_buf*buf

an xdr buffer

structxdr_buf*subbuf

the result buffer

unsignedintbase

beginning of range in bytes

unsignedintlen

length of range in bytes

Description

Sets subbuf to an xdr buffer representing the portion of buf of length len starting at offset base.

buf and subbuf may be pointers to the same struct xdr_buf.

Returns -1 if base or length are out of bounds.
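How a byte range maps onto the head, page, and tail parts can be sketched by tracking lengths only. The types and helpers below are hypothetical and carry no data pointers; unlike a real buffer, the result is written only on success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-only view of a three-part buffer. */
struct sketch_buf {
    unsigned int head_len;
    unsigned int page_len;
    unsigned int tail_len;
};

/* Bytes of the range [*base, *base + *len) that fall in one segment;
 * updates base and len so they are relative to the next segment. */
static unsigned int sketch_take(unsigned int seg_len, unsigned int *base,
                                unsigned int *len)
{
    unsigned int n;

    if (*base >= seg_len) {   /* range starts after this segment */
        *base -= seg_len;
        return 0;
    }
    n = seg_len - *base;
    if (n > *len)
        n = *len;
    *len -= n;
    *base = 0;
    return n;
}

static int sketch_buf_subsegment(const struct sketch_buf *buf,
                                 struct sketch_buf *subbuf,
                                 unsigned int base, unsigned int len)
{
    struct sketch_buf out;    /* local copy so buf and subbuf may alias */

    assert(buf != NULL && subbuf != NULL);
    out.head_len = sketch_take(buf->head_len, &base, &len);
    out.page_len = sketch_take(buf->page_len, &base, &len);
    out.tail_len = sketch_take(buf->tail_len, &base, &len);
    if (base || len)
        return -1;            /* base or length out of bounds */
    *subbuf = out;
    return 0;
}
```

With parts of 8, 16, and 4 bytes, the range starting at offset 4 with length 22 covers 4 head bytes, all 16 page bytes, and 2 tail bytes.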

boolxdr_stream_subsegment(structxdr_stream*xdr,structxdr_buf*subbuf,unsignedintnbytes)

set subbuf to a portion of xdr

Parameters

structxdr_stream*xdr

an xdr_stream set up for decoding

structxdr_buf*subbuf

the result buffer

unsignedintnbytes

length of xdr to extract, in bytes

Description

Sets up subbuf to represent a portion of xdr. The portion starts at the current offset in xdr, and extends for a length of nbytes. If this is successful, xdr is advanced to the next XDR data item following that portion.

Return values:

  • true: subbuf has been initialized, and xdr has been advanced.

  • false: a bounds error has occurred

unsignedintxdr_stream_move_subsegment(structxdr_stream*xdr,unsignedintoffset,unsignedinttarget,unsignedintlength)

Move part of a stream to another position

Parameters

structxdr_stream*xdr

the source xdr_stream

unsignedintoffset

the source offset of the segment

unsignedinttarget

the target offset of the segment

unsignedintlength

the number of bytes to move

Description

Moves length bytes from offset to target in the xdr_stream, overwriting anything in its space. Returns the number of bytes in the segment.

unsignedintxdr_stream_zero(structxdr_stream*xdr,unsignedintoffset,unsignedintlength)

zero out a portion of an xdr_stream

Parameters

structxdr_stream*xdr

an xdr_stream to zero out

unsignedintoffset

the starting point in the stream

unsignedintlength

the number of bytes to zero

voidxdr_buf_trim(structxdr_buf*buf,unsignedintlen)

lop at most “len” bytes off the end of “buf”

Parameters

structxdr_buf*buf

buf to be trimmed

unsignedintlen

number of bytes to reduce “buf” by

Description

Trim an xdr_buf by the given number of bytes by fixing up the lengths. Note that it’s possible that we’ll trim less than that amount if the xdr_buf is too small, or if (for instance) it’s all in the head and the parser has already read too far into it.

ssize_txdr_stream_decode_string_dup(structxdr_stream*xdr,char**str,size_tmaxlen,gfp_tgfp_flags)

Decode and duplicate variable length string

Parameters

structxdr_stream*xdr

pointer to xdr_stream

char**str

location to store pointer to string

size_tmaxlen

maximum acceptable string length

gfp_tgfp_flags

GFP mask to use

Description

Return values:

  • On success, returns length of NUL-terminated string stored in *str

  • -EBADMSG on XDR buffer overflow

  • -EMSGSIZE if the size of the string would exceed maxlen

  • -ENOMEM on memory allocation failure
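The return values listed above can be exercised with a userspace sketch. Everything here is an assumption for illustration: the function name is hypothetical, a pointer pair replaces the xdr_stream, malloc() replaces the GFP-based allocation, and the string is taken to be a 4-byte big-endian length followed by padded data.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

static ssize_t sketch_decode_string_dup(const uint8_t **pos,
                                        const uint8_t *end,
                                        char **str, size_t maxlen)
{
    const uint8_t *p = *pos;
    uint32_t len;
    uint64_t padded;
    char *s;

    assert(p <= end);
    *str = NULL;
    if ((size_t)(end - p) < 4)
        return -EBADMSG;                 /* no room for the length word */
    len = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    p += 4;
    padded = ((uint64_t)len + 3) & ~(uint64_t)3;
    if (padded > (uint64_t)(end - p))
        return -EBADMSG;                 /* data runs past the buffer */
    if (len > maxlen)
        return -EMSGSIZE;                /* longer than the caller allows */
    s = malloc((size_t)len + 1);
    if (s == NULL)
        return -ENOMEM;
    memcpy(s, p, len);
    s[len] = '\0';                       /* NUL-terminate the copy */
    *str = s;
    *pos = p + padded;                   /* skip the data and its padding */
    return (ssize_t)len;
}
```

On success the caller owns the returned string and releases it with free().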

ssize_txdr_stream_decode_opaque_auth(structxdr_stream*xdr,u32*flavor,void**body,unsignedint*body_len)

Decode struct opaque_auth (RFC 5531 S8.2)

Parameters

structxdr_stream*xdr

pointer to xdr_stream

u32*flavor

location to store decoded flavor

void**body

location to store decoded body

unsignedint*body_len

location to store length of decoded body

Description

Return values:

  • On success, returns the number of buffer bytes consumed

  • -EBADMSG on XDR buffer overflow

  • -EMSGSIZE if the decoded size of the body field exceeds 400 octets

ssize_txdr_stream_encode_opaque_auth(structxdr_stream*xdr,u32flavor,void*body,unsignedintbody_len)

Encode struct opaque_auth (RFC 5531 S8.2)

Parameters

structxdr_stream*xdr

pointer to xdr_stream

u32flavor

verifier flavor to encode

void*body

content of body to encode

unsignedintbody_len

length of body to encode

Description

Return values:

  • On success, returns length in bytes of XDR buffer consumed

  • -EBADMSG on XDR buffer overflow

  • -EMSGSIZE if the size of body exceeds 400 octets

intsvc_reg_xprt_class(structsvc_xprt_class*xcl)

Register a server-side RPC transport class

Parameters

structsvc_xprt_class*xcl

New transport class to be registered

Description

Returns zero on success; otherwise a negative errno is returned.

voidsvc_unreg_xprt_class(structsvc_xprt_class*xcl)

Unregister a server-side RPC transport class

Parameters

structsvc_xprt_class*xcl

Transport class to be unregistered

voidsvc_xprt_deferred_close(structsvc_xprt*xprt)

Close a transport

Parameters

structsvc_xprt*xprt

transport instance

Description

Used in contexts that need to defer the work of shutting down the transport to an nfsd thread.

voidsvc_xprt_received(structsvc_xprt*xprt)

start next receiver thread

Parameters

structsvc_xprt*xprt

controlling transport

Description

The caller must hold the XPT_BUSY bit and must not thereafter touch transport data.

Note

XPT_DATA only gets cleared when a read-attempt finds no (or insufficient) data.

intsvc_xprt_create_from_sa(structsvc_serv*serv,constchar*xprt_name,structnet*net,structsockaddr*sap,intflags,conststructcred*cred)

Add a new listener to serv from socket address

Parameters

structsvc_serv*serv

target RPC service

constchar*xprt_name

transport class name

structnet*net

network namespace

structsockaddr*sap

socket address pointer

intflags

SVC_SOCK flags

conststructcred*cred

credential to bind to this transport

Description

Return local xprt port on success or -EPROTONOSUPPORT on failure

intsvc_xprt_create(structsvc_serv*serv,constchar*xprt_name,structnet*net,constintfamily,constunsignedshortport,intflags,conststructcred*cred)

Add a new listener to serv

Parameters

structsvc_serv*serv

target RPC service

constchar*xprt_name

transport class name

structnet*net

network namespace

constintfamily

network address family

constunsignedshortport

listener port

intflags

SVC_SOCK flags

conststructcred*cred

credential to bind to this transport

Description

Return local xprt port on success or -EPROTONOSUPPORT on failure

char*svc_print_addr(structsvc_rqst*rqstp,char*buf,size_tlen)

Format rq_addr field for printing

Parameters

structsvc_rqst*rqstp

svc_rqst struct containing address to print

char*buf

target buffer for formatted address

size_tlen

length of target buffer

voidsvc_xprt_enqueue(structsvc_xprt*xprt)

Queue a transport on an idle nfsd thread

Parameters

structsvc_xprt*xprt

transport with data pending

voidsvc_reserve(structsvc_rqst*rqstp,intspace)

change the space reserved for the reply to a request.

Parameters

structsvc_rqst*rqstp

The request in question

intspace

new max space to reserve

Description

Each request reserves some space on the output queue of the transport to make sure the reply fits. This function reduces that reserved space to be the amount of space used already, plus space.

voidsvc_wake_up(structsvc_serv*serv)

Wake up a service thread for non-transport work

Parameters

structsvc_serv*serv

RPC service

Description

Some svc_serv’s will have occasional work to do, even when a xprt is not waiting to be serviced. This function is there to “kick” a task in one of those services so that it can wake up and do that work. Note that we only bother with pool 0 as we don’t need to wake up more than one thread for this purpose.

voidsvc_recv(structsvc_rqst*rqstp)

Receive and process the next request on any transport

Parameters

structsvc_rqst*rqstp

an idle RPC service thread

Description

This code is carefully organised not to touch any cachelines in the shared svc_serv structure, only cachelines in the local svc_pool.

voidsvc_xprt_close(structsvc_xprt*xprt)

Close a client connection

Parameters

structsvc_xprt*xprt

transport to disconnect

voidsvc_xprt_destroy_all(structsvc_serv*serv,structnet*net,boolunregister)

Destroy transports associated with serv

Parameters

structsvc_serv*serv

RPC service to be shut down

structnet*net

target network namespace

boolunregister

true if it is OK to unregister the destroyed xprts

Description

Server threads may still be running (especially in the case where the service is still running in other network namespaces).

So we shut down sockets the same way we would on a running server, by setting XPT_CLOSE, enqueuing, and letting a thread pick it up to do the close. In the case where there are no such other threads running, svc_clean_up_xprts() does a simple version of a server’s main event loop, and in the case where there are other threads, we may need to wait a little while and then check again to see if they’re done.

structsvc_xprt*svc_find_listener(structsvc_serv*serv,constchar*xcl_name,structnet*net,conststructsockaddr*sa)

find an RPC transport instance

Parameters

structsvc_serv*serv

pointer to svc_serv to search

constchar*xcl_name

C string containing transport’s class name

structnet*net

owner net pointer

conststructsockaddr*sa

sockaddr containing address

Description

Return the transport instance pointer for the endpoint accepting connections/peer traffic from the specified transport class, and matching sockaddr.

structsvc_xprt*svc_find_xprt(structsvc_serv*serv,constchar*xcl_name,structnet*net,constsa_family_taf,constunsignedshortport)

find an RPC transport instance

Parameters

structsvc_serv*serv

pointer to svc_serv to search

constchar*xcl_name

C string containing transport’s class name

structnet*net

owner net pointer

constsa_family_taf

Address family of transport’s local address

constunsignedshortport

transport’s IP port number

Description

Return the transport instance pointer for the endpoint accepting connections/peer traffic from the specified transport class, address family and port.

Specifying 0 for the address family or port is effectively a wild-card, and will result in matching the first transport in the service’s list that has a matching class name.
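The wild-card rule can be sketched as a linear search. The struct and function below are hypothetical stand-ins that ignore the network namespace and locking; only the matching logic described above is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified transport record. */
struct sketch_xprt {
    const char *class_name;
    int family;            /* any nonzero address family value */
    unsigned short port;
};

/* First entry whose class name matches; af == 0 or port == 0 match any. */
static const struct sketch_xprt *
sketch_find_xprt(const struct sketch_xprt *list, size_t n,
                 const char *xcl_name, int af, unsigned short port)
{
    size_t i;

    assert(xcl_name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(list[i].class_name, xcl_name) != 0)
            continue;
        if (af != 0 && list[i].family != af)
            continue;
        if (port != 0 && list[i].port != port)
            continue;
        return &list[i];
    }
    return NULL;           /* no transport matched */
}
```

Passing 0 for both family and port returns the first entry of the requested class, whatever its address.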

intsvc_xprt_names(structsvc_serv*serv,char*buf,constintbuflen)

format a buffer with a list of transport names

Parameters

structsvc_serv*serv

pointer to an RPC service

char*buf

pointer to a buffer to be filled in

constintbuflen

length of buffer to be filled in

Description

Fills in buf with a string containing a list of transport names, each name terminated with ‘\n’.

Returns positive length of the filled-in string on success; otherwise a negative errno value is returned if an error occurs.
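The buffer-filling contract can be sketched in userspace as follows. The function name, the array-of-strings input, and the choice of -ENAMETOOLONG for an undersized buffer are assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: join names, each terminated with '\n'. */
static int sketch_xprt_names(const char *const *names, size_t n,
                             char *buf, int buflen)
{
    int total = 0;
    size_t i;

    assert(buf != NULL);
    if (buflen <= 0)
        return -ENAMETOOLONG;        /* no room even for the terminator */
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int room = buflen - total;
        int len = snprintf(buf + total, (size_t)room, "%s\n", names[i]);

        if (len < 0 || len >= room) {
            buf[0] = '\0';           /* do not leave a partial list */
            return -ENAMETOOLONG;
        }
        total += len;
    }
    return total;                    /* length of the filled-in string */
}
```

The names "tcp" and "udp" produce the 8-character string "tcp\nudp\n"; a 6-byte buffer is too small for both and yields the error.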

intxprt_register_transport(structxprt_class*transport)

register a transport implementation

Parameters

structxprt_class*transport

transport to register

Description

If a transport implementation is loaded as a kernel module, it can call this interface to make itself known to the RPC client.

Return

  • 0: transport successfully registered

  • -EEXIST: transport already registered

  • -EINVAL: transport module being unloaded

intxprt_unregister_transport(structxprt_class*transport)

unregister a transport implementation

Parameters

structxprt_class*transport

transport to unregister

Return

  • 0: transport successfully unregistered

  • -ENOENT: transport never registered
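The register and unregister return codes can be modelled with a small table. This sketch is an assumption throughout: the fixed-size array, the names, duplicate detection by identifier, and the -ENOSPC case exist only in this example, and the module-unload case (-EINVAL) is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_CLASSES 8

/* Hypothetical transport class record. */
struct sketch_class {
    int ident;
    const char *name;
};

static const struct sketch_class *sketch_registry[SKETCH_MAX_CLASSES];

static int sketch_register_transport(const struct sketch_class *t)
{
    size_t i, free_slot = SKETCH_MAX_CLASSES;

    assert(t != NULL);
    for (i = 0; i < SKETCH_MAX_CLASSES; i++) {
        if (sketch_registry[i] == NULL) {
            if (free_slot == SKETCH_MAX_CLASSES)
                free_slot = i;       /* remember the first empty slot */
        } else if (sketch_registry[i]->ident == t->ident) {
            return -EEXIST;          /* identifier already registered */
        }
    }
    if (free_slot == SKETCH_MAX_CLASSES)
        return -ENOSPC;              /* limit of this sketch only */
    sketch_registry[free_slot] = t;
    return 0;
}

static int sketch_unregister_transport(const struct sketch_class *t)
{
    size_t i;

    for (i = 0; i < SKETCH_MAX_CLASSES; i++) {
        if (sketch_registry[i] == t) {
            sketch_registry[i] = NULL;
            return 0;
        }
    }
    return -ENOENT;                  /* never registered */
}
```

Registering a second class with the same identifier fails with -EEXIST, and unregistering a class twice fails with -ENOENT the second time.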

intxprt_find_transport_ident(constchar*netid)

convert a netid into a transport identifier

Parameters

constchar*netid

transport to load

Return

  • > 0: transport identifier

  • -ENOENT: transport module not available

intxprt_reserve_xprt(structrpc_xprt*xprt,structrpc_task*task)

serialize write access to transports

Parameters

structrpc_xprt*xprt

pointer to the target transport

structrpc_task*task

task that is requesting access to the transport

Description

This prevents mixing the payload of separate requests, and prevents transport connects from colliding with writes. No congestion control is provided.

voidxprt_release_xprt(structrpc_xprt*xprt,structrpc_task*task)

allow other requests to use a transport

Parameters

structrpc_xprt*xprt

transport with other tasks potentially waiting

structrpc_task*task

task that is releasing access to the transport

Description

Note that “task” can be NULL. No congestion control is provided.

voidxprt_release_xprt_cong(structrpc_xprt*xprt,structrpc_task*task)

allow other requests to use a transport

Parameters

structrpc_xprt*xprt

transport with other tasks potentially waiting

structrpc_task*task

task that is releasing access to the transport

Description

Note that “task” can be NULL. Another task is awoken to use the transport if the transport’s congestion window allows it.

boolxprt_request_get_cong(structrpc_xprt*xprt,structrpc_rqst*req)

Request congestion control credits

Parameters

structrpc_xprt*xprt

pointer to transport

structrpc_rqst*req

pointer to RPC request

Description

Useful for transports that require congestion control.

voidxprt_release_rqst_cong(structrpc_task*task)

housekeeping when request is complete

Parameters

structrpc_task*task

RPC request that recently completed

Description

Useful for transports that require congestion control.

voidxprt_adjust_cwnd(structrpc_xprt*xprt,structrpc_task*task,intresult)

adjust transport congestion window

Parameters

structrpc_xprt*xprt

pointer to xprt

structrpc_task*task

recently completed RPC request used to adjust window

intresult

result code of completed RPC request

Description

The transport code maintains an estimate of the maximum number of outstanding RPC requests, using a smoothed version of the congestion avoidance implemented in 4.4BSD. This is basically the Van Jacobson congestion algorithm: If a retransmit occurs, the congestion window is halved; otherwise, it is incremented by 1/cwnd when

  • a reply is received and

  • a full number of requests are outstanding and

  • the congestion window hasn’t been updated recently.
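The halve-on-retransmit, grow-by-1/cwnd rule can be sketched with fixed-point arithmetic. The names, the scale factor of 256 per request, and the use of -ETIMEDOUT to signal a retransmit are assumptions of this example; the "not updated recently" condition is not modelled.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_CWNDSHIFT 8U
#define SKETCH_CWNDSCALE (1UL << SKETCH_CWNDSHIFT)   /* one request */

/* Hypothetical congestion state, all values scaled by CWNDSCALE. */
struct sketch_cong {
    unsigned long cwnd;      /* congestion window */
    unsigned long cong;      /* requests currently outstanding */
    unsigned long max_cwnd;  /* upper bound for the window */
};

static void sketch_adjust_cwnd(struct sketch_cong *x, int result)
{
    unsigned long cwnd = x->cwnd;

    assert(cwnd >= SKETCH_CWNDSCALE);
    if (result >= 0 && cwnd <= x->cong) {
        /* Reply received with a full window outstanding: grow by about
         * 1/cwnd of a request; the (cwnd >> 1) term rounds the result. */
        cwnd += (SKETCH_CWNDSCALE * SKETCH_CWNDSCALE + (cwnd >> 1)) / cwnd;
        if (cwnd > x->max_cwnd)
            cwnd = x->max_cwnd;
    } else if (result == -ETIMEDOUT) {
        cwnd >>= 1;                      /* retransmit: halve the window */
        if (cwnd < SKETCH_CWNDSCALE)
            cwnd = SKETCH_CWNDSCALE;     /* never below one request */
    }
    x->cwnd = cwnd;
}
```

Starting from a window of one request with one request outstanding, a successful reply doubles the window to two requests; a later timeout halves it back to one, and further timeouts keep it there.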

voidxprt_wake_pending_tasks(structrpc_xprt*xprt,intstatus)

wake all tasks on a transport’s pending queue

Parameters

structrpc_xprt*xprt

transport with waiting tasks

intstatus

result code to plant in each task before waking it

voidxprt_wait_for_buffer_space(structrpc_xprt*xprt)

wait for transport output buffer to clear

Parameters

structrpc_xprt*xprt

transport

Description

Note that we only set the timer for the case of RPC_IS_SOFT(), since we don’t in general want to force a socket disconnection due to an incomplete RPC call transmission.

boolxprt_write_space(structrpc_xprt*xprt)

wake the task waiting for transport output buffer space

Parameters

structrpc_xprt*xprt

transport with waiting tasks

Description

Can be called in a soft IRQ context, so xprt_write_space never sleeps.

voidxprt_disconnect_done(structrpc_xprt*xprt)

mark a transport as disconnected

Parameters

structrpc_xprt*xprt

transport to flag for disconnect

voidxprt_force_disconnect(structrpc_xprt*xprt)

force a transport to disconnect

Parameters

structrpc_xprt*xprt

transport to disconnect

unsignedlongxprt_reconnect_delay(conststructrpc_xprt*xprt)

compute the wait before scheduling a connect

Parameters

conststructrpc_xprt*xprt

transport instance

voidxprt_reconnect_backoff(structrpc_xprt*xprt,unsignedlonginit_to)

compute the new re-establish timeout

Parameters

structrpc_xprt*xprt

transport instance

unsignedlonginit_to

initial reestablish timeout

structrpc_rqst*xprt_lookup_rqst(structrpc_xprt*xprt,__be32xid)

find an RPC request corresponding to an XID

Parameters

structrpc_xprt*xprt

transport on which the original request was transmitted

__be32xid

RPC XID of incoming reply

Description

Caller holds xprt->queue_lock.

voidxprt_pin_rqst(structrpc_rqst*req)

Pin a request on the transport receive list

Parameters

structrpc_rqst*req

Request to pin

Description

Caller must ensure this is atomic with the call to xprt_lookup_rqst() so should be holding xprt->queue_lock.

voidxprt_unpin_rqst(structrpc_rqst*req)

Unpin a request on the transport receive list

Parameters

structrpc_rqst*req

Request to unpin

Description

Caller should be holding xprt->queue_lock.

voidxprt_update_rtt(structrpc_task*task)

Update RPC RTT statistics

Parameters

structrpc_task*task

RPC request that recently completed

Description

Caller holds xprt->queue_lock.

voidxprt_complete_rqst(structrpc_task*task,intcopied)

called when reply processing is complete

Parameters

structrpc_task*task

RPC request that recently completed

intcopied

actual number of bytes received from the transport

Description

Caller holds xprt->queue_lock.

voidxprt_wait_for_reply_request_def(structrpc_task*task)

wait for reply

Parameters

structrpc_task*task

pointer to rpc_task

Description

Set a request’s retransmit timeout based on the transport’s default timeout parameters. Used by transports that don’t adjust the retransmit timeout based on round-trip time estimation, and put the task to sleep on the pending queue.

voidxprt_wait_for_reply_request_rtt(structrpc_task*task)

wait for reply using RTT estimator

Parameters

structrpc_task*task

pointer to rpc_task

Description

Set a request’s retransmit timeout using the RTT estimator, and put the task to sleep on the pending queue.

structrpc_xprt*xprt_get(structrpc_xprt*xprt)

return a reference to an RPC transport.

Parameters

structrpc_xprt*xprt

pointer to the transport

voidxprt_put(structrpc_xprt*xprt)

release a reference to an RPC transport.

Parameters

structrpc_xprt*xprt

pointer to the transport

voidrpc_wake_up(structrpc_wait_queue*queue)

wake up all rpc_tasks

Parameters

structrpc_wait_queue*queue

rpc_wait_queue on which the tasks are sleeping

Description

Grabs queue->lock

voidrpc_wake_up_status(structrpc_wait_queue*queue,intstatus)

wake up all rpc_tasks and set their status value.

Parameters

structrpc_wait_queue*queue

rpc_wait_queue on which the tasks are sleeping

intstatus

status value to set

Description

Grabs queue->lock

structrpc_iostats*rpc_alloc_iostats(structrpc_clnt*clnt)

allocate an rpc_iostats structure

Parameters

structrpc_clnt*clnt

RPC program, version, and xprt

voidrpc_free_iostats(structrpc_iostats*stats)

release an rpc_iostats structure

Parameters

structrpc_iostats*stats

doomed rpc_iostats structure

voidrpc_count_iostats_metrics(conststructrpc_task*task,structrpc_iostats*op_metrics)

tally up per-task stats

Parameters

conststructrpc_task*task

completed rpc_task

structrpc_iostats*op_metrics

stat structure for OP that will accumulate stats from task

voidrpc_count_iostats(conststructrpc_task*task,structrpc_iostats*stats)

tally up per-task stats

Parameters

conststructrpc_task*task

completed rpc_task

structrpc_iostats*stats

array of stat structures

Description

Uses the statidx from task

intrpc_queue_upcall(structrpc_pipe*pipe,structrpc_pipe_msg*msg)

queue an upcall message to userspace

Parameters

structrpc_pipe*pipe

upcall pipe on which to queue given message

structrpc_pipe_msg*msg

message to queue

Description

Call with an inode created by rpc_mkpipe() to queue an upcall. A userspace process may then later read the upcall by performing a read on an open file for this inode. It is up to the caller to initialize the fields of msg (other than msg->list) appropriately.

intrpc_mkpipe_dentry(structdentry*parent,constchar*name,void*private,structrpc_pipe*pipe)

make an rpc_pipefs file for kernel<->userspace communication

Parameters

structdentry*parent

dentry of directory to create new “pipe” in

constchar*name

name of pipe

void*private

private data to associate with the pipe, for the caller’s use

structrpc_pipe*pipe

rpc_pipe containing input parameters

Description

Data is made available for userspace to read by calls to rpc_queue_upcall(). The actual reads will result in calls to ops->upcall, which will be called with the file pointer, message, and userspace buffer to copy to.

Writes can come at any time, and do not necessarily have to be responses to upcalls. They will result in calls to msg->downcall.

The private argument passed here will be available to all these methods from the file pointer, via RPC_I(file_inode(file))->private.

voidrpc_unlink(structrpc_pipe*pipe)

remove a pipe

Parameters

structrpc_pipe*pipe

the pipe to be removed

Description

After this call, lookups will no longer find the pipe, and any attempts to read or write using preexisting opens of the pipe will return -EPIPE.

voidrpc_init_pipe_dir_head(structrpc_pipe_dir_head*pdh)

initialise a struct rpc_pipe_dir_head

Parameters

structrpc_pipe_dir_head*pdh

pointer to struct rpc_pipe_dir_head

voidrpc_init_pipe_dir_object(structrpc_pipe_dir_object*pdo,conststructrpc_pipe_dir_object_ops*pdo_ops,void*pdo_data)

initialise a struct rpc_pipe_dir_object

Parameters

structrpc_pipe_dir_object*pdo

pointer to struct rpc_pipe_dir_object

conststructrpc_pipe_dir_object_ops*pdo_ops

pointer to const struct rpc_pipe_dir_object_ops

void*pdo_data

pointer to caller-defined data

intrpc_add_pipe_dir_object(structnet*net,structrpc_pipe_dir_head*pdh,structrpc_pipe_dir_object*pdo)

associate an rpc_pipe_dir_object with a directory

Parameters

structnet*net

pointer to struct net

structrpc_pipe_dir_head*pdh

pointer to struct rpc_pipe_dir_head

structrpc_pipe_dir_object*pdo

pointer to struct rpc_pipe_dir_object

voidrpc_remove_pipe_dir_object(structnet*net,structrpc_pipe_dir_head*pdh,structrpc_pipe_dir_object*pdo)

remove an rpc_pipe_dir_object from a directory

Parameters

structnet*net

pointer to struct net

structrpc_pipe_dir_head*pdh

pointer to struct rpc_pipe_dir_head

structrpc_pipe_dir_object*pdo

pointer to struct rpc_pipe_dir_object

structrpc_pipe_dir_object*rpc_find_or_alloc_pipe_dir_object(structnet*net,structrpc_pipe_dir_head*pdh,int(*match)(structrpc_pipe_dir_object*,void*),structrpc_pipe_dir_object*(*alloc)(void*),void*data)

Parameters

structnet*net

pointer to struct net

structrpc_pipe_dir_head*pdh

pointer to struct rpc_pipe_dir_head

int(*match)(structrpc_pipe_dir_object*,void*)

match struct rpc_pipe_dir_object to data

structrpc_pipe_dir_object*(*alloc)(void*)

allocate a new struct rpc_pipe_dir_object

void*data

user-defined data for match() and alloc()

voidrpcb_getport_async(structrpc_task*task)

obtain the port for a given RPC service on a given host

Parameters

structrpc_task*task

task that is waiting for portmapper request

Description

This one can be called for an ongoing RPC request, and can be used in an async (rpciod) context.

structrpc_clnt*rpc_create(structrpc_create_args*args)

create an RPC client and transport with one call

Parameters

structrpc_create_args*args

rpc_clnt create argument structure

Description

Creates and initializes an RPC transport and an RPC client.

It can ping the server in order to determine if it is up, and to see if it supports this program and version. RPC_CLNT_CREATE_NOPING disables this behavior so asynchronous tasks can also use rpc_create.

structrpc_clnt*rpc_clone_client(structrpc_clnt*clnt)

Clone an RPC client structure

Parameters

structrpc_clnt*clnt

RPC client whose parameters are copied

Description

Returns a fresh RPC client or an ERR_PTR.

structrpc_clnt*rpc_clone_client_set_auth(structrpc_clnt*clnt,rpc_authflavor_tflavor)

Clone an RPC client structure and set its auth

Parameters

structrpc_clnt*clnt

RPC client whose parameters are copied

rpc_authflavor_tflavor

security flavor for new client

Description

Returns a fresh RPC client or an ERR_PTR.

intrpc_switch_client_transport(structrpc_clnt*clnt,structxprt_create*args,conststructrpc_timeout*timeout)

switch the RPC transport on the fly

Parameters

structrpc_clnt*clnt

pointer to a struct rpc_clnt

structxprt_create*args

pointer to the new transport arguments

conststructrpc_timeout*timeout

pointer to the new timeout parameters

Description

This function allows the caller to switch the RPC transport for the rpc_clnt structure ‘clnt’ to allow it to connect to a mirrored NFS server, for instance. It assumes that the caller has ensured that there are no active RPC tasks by using some form of locking.

Returns zero if “clnt” is now using the new xprt. Otherwise a negative errno is returned, and “clnt” continues to use the old xprt.

intrpc_clnt_iterate_for_each_xprt(structrpc_clnt*clnt,int(*fn)(structrpc_clnt*,structrpc_xprt*,void*),void*data)

Apply a function to all transports

Parameters

structrpc_clnt*clnt

pointer to client

int(*fn)(structrpc_clnt*,structrpc_xprt*,void*)

function to apply

void*data

void pointer to function data

Description

Iterates through the list of RPC transports currently attached to the client and applies the function fn(clnt, xprt, data).

On error, the iteration stops, and the function returns the error value.
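The stop-on-error contract can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct model_xprt`, `model_iterate_xprts()` and `model_check_xprt()` are hypothetical stand-ins, and the sketch assumes that a negative callback return is what counts as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rpc_xprt; not the kernel type. */
struct model_xprt {
    int id;
    int healthy;
};

typedef int (*model_xprt_fn)(struct model_xprt *xprt, void *data);

/*
 * Apply fn to each transport in order. A negative return from fn stops
 * the walk and is handed back to the caller; 0 means that every
 * transport was visited.
 */
static int model_iterate_xprts(struct model_xprt *xprts, size_t n,
                               model_xprt_fn fn, void *data)
{
    assert(fn != NULL);

    for (size_t i = 0; i < n; i++) {
        int ret = fn(&xprts[i], data);

        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Example callback: count visits and fail with -5 on an unhealthy transport. */
static int model_check_xprt(struct model_xprt *xprt, void *data)
{
    int *visited = data;

    (*visited)++;
    return xprt->healthy ? 0 : -5;
}
```

With three transports where the second one is unhealthy, the callback runs twice and the walk returns -5 without touching the third.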

unsignedlongrpc_cancel_tasks(structrpc_clnt*clnt,interror,bool(*fnmatch)(conststructrpc_task*,constvoid*),constvoid*data)

try to cancel a set of RPC tasks

Parameters

structrpc_clnt*clnt

Pointer to RPC client

interror

RPC task error value to set

bool(*fnmatch)(conststructrpc_task*,constvoid*)

Pointer to selector function

constvoid*data

User data

Description

Uses fnmatch to define a set of RPC tasks that are to be cancelled. The argument error must be a negative error value.

structrpc_clnt*rpc_bind_new_program(structrpc_clnt*old,conststructrpc_program*program,u32vers)

bind a new RPC program to an existing client

Parameters

structrpc_clnt*old

old rpc_client

conststructrpc_program*program

rpc program to set

u32vers

rpc program version

Description

Clones the rpc client and sets up a new RPC program. This is mainly of use for enabling different RPC programs to share the same transport. The Sun NFSv2/v3 ACL protocol can do this.

structrpc_task*rpc_run_task(conststructrpc_task_setup*task_setup_data)

Allocate a new RPC task, then run rpc_execute against it

Parameters

conststructrpc_task_setup*task_setup_data

pointer to task initialisation data

intrpc_call_sync(structrpc_clnt*clnt,conststructrpc_message*msg,intflags)

Perform a synchronous RPC call

Parameters

structrpc_clnt*clnt

pointer to RPC client

conststructrpc_message*msg

RPC call parameters

intflags

RPC call flags

intrpc_call_async(structrpc_clnt*clnt,conststructrpc_message*msg,intflags,conststructrpc_call_ops*tk_ops,void*data)

Perform an asynchronous RPC call

Parameters

structrpc_clnt*clnt

pointer to RPC client

conststructrpc_message*msg

RPC call parameters

intflags

RPC call flags

conststructrpc_call_ops*tk_ops

RPC call ops

void*data

user call data

voidrpc_prepare_reply_pages(structrpc_rqst*req,structpage**pages,unsignedintbase,unsignedintlen,unsignedinthdrsize)

Prepare to receive a reply data payload into pages

Parameters

structrpc_rqst*req

RPC request to prepare

structpage**pages

vector of struct page pointers

unsignedintbase

offset in first page where receive should start, in bytes

unsignedintlen

expected size of the upper layer data payload, in bytes

unsignedinthdrsize

expected size of upper layer reply header, in XDR words

size_trpc_peeraddr(structrpc_clnt*clnt,structsockaddr*buf,size_tbufsize)

extract remote peer address from clnt’s xprt

Parameters

structrpc_clnt*clnt

RPC client structure

structsockaddr*buf

target buffer

size_tbufsize

length of target buffer

Description

Returns the number of bytes that are actually in the stored address.

constchar*rpc_peeraddr2str(structrpc_clnt*clnt,enumrpc_display_format_tformat)

return remote peer address in printable format

Parameters

structrpc_clnt*clnt

RPC client structure

enumrpc_display_format_tformat

address format

Description

NB: the lifetime of the memory referenced by the returned pointer is the same as the rpc_xprt itself. As long as the caller uses this pointer, it must hold the RCU read lock.

intrpc_localaddr(structrpc_clnt*clnt,structsockaddr*buf,size_tbuflen)

discover local endpoint address for an RPC client

Parameters

structrpc_clnt*clnt

RPC client structure

structsockaddr*buf

target buffer

size_tbuflen

size of target buffer, in bytes

Description

Returns zero and fills in “buf” and “buflen” if successful; otherwise, a negative errno is returned.

This works even if the underlying transport is not currently connected, or if the upper layer never previously provided a source address.

The result of this function call is transient: multiple calls in succession may give different results, depending on how local networking configuration changes over time.

structnet*rpc_net_ns(structrpc_clnt*clnt)

Get the network namespace for this RPC client

Parameters

structrpc_clnt*clnt

RPC client to query

size_trpc_max_payload(structrpc_clnt*clnt)

Get maximum payload size for a transport, in bytes

Parameters

structrpc_clnt*clnt

RPC client to query

Description

For stream transports, this is one RPC record fragment (see RFC 1831), as we don’t support multi-record requests yet. For datagram transports, this is the size of an IP packet minus the IP, UDP, and RPC header sizes.
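To make the stream-transport limit concrete, the sketch below encodes and decodes the 4-byte record mark that RFC 1831 places in front of each fragment: the high-order bit flags the last fragment of a record, and the remaining 31 bits carry the fragment length in bytes. The helper names `rm_encode()` and `rm_decode()` are hypothetical and are not taken from the SUNRPC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RM_LAST_FRAGMENT 0x80000000u /* high bit: last fragment of the record */
#define RM_MAX_FRAGMENT  0x7fffffffu /* largest length that fits in 31 bits */

/*
 * Build the record mark in network byte order.
 * Returns false if len does not fit in the 31-bit length field.
 */
static bool rm_encode(uint32_t len, bool last, uint8_t out[4])
{
    uint32_t mark;

    if (len > RM_MAX_FRAGMENT)
        return false;

    mark = len | (last ? RM_LAST_FRAGMENT : 0u);
    out[0] = (uint8_t)(mark >> 24);
    out[1] = (uint8_t)(mark >> 16);
    out[2] = (uint8_t)(mark >> 8);
    out[3] = (uint8_t)mark;
    return true;
}

/* Parse a record mark; returns the fragment length and sets *last. */
static uint32_t rm_decode(const uint8_t in[4], bool *last)
{
    uint32_t mark = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                    ((uint32_t)in[2] << 8) | (uint32_t)in[3];

    assert(last != NULL);
    *last = (mark & RM_LAST_FRAGMENT) != 0;
    return mark & RM_MAX_FRAGMENT;
}
```

A single fragment can therefore describe at most 2^31 - 1 bytes, which is the upper bound any single-fragment request is subject to.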

size_trpc_max_bc_payload(structrpc_clnt*clnt)

Get maximum backchannel payload size, in bytes

Parameters

structrpc_clnt*clnt

RPC client to query

voidrpc_force_rebind(structrpc_clnt*clnt)

force transport to check that remote port is unchanged

Parameters

structrpc_clnt*clnt

client to rebind

intrpc_clnt_test_and_add_xprt(structrpc_clnt*clnt,structrpc_xprt_switch*xps,structrpc_xprt*xprt,void*in_max_connect)

Test and add a new transport to a rpc_clnt

Parameters

structrpc_clnt*clnt

pointer to struct rpc_clnt

structrpc_xprt_switch*xps

pointer to struct rpc_xprt_switch

structrpc_xprt*xprt

pointer to struct rpc_xprt

void*in_max_connect

pointer to the max_connect value for the passed in xprt transport

intrpc_clnt_setup_test_and_add_xprt(structrpc_clnt*clnt,structrpc_xprt_switch*xps,structrpc_xprt*xprt,void*data)

Parameters

structrpc_clnt*clnt

structrpc_clnt to get the new transport

structrpc_xprt_switch*xps

the rpc_xprt_switch to hold the new transport

structrpc_xprt*xprt

the rpc_xprt to test

void*data

a struct rpc_add_xprt_test pointer that holds the test function and test function call data

Description

This is an rpc_clnt_add_xprt setup() function which returns 1 so:

1) caller of the test function must dereference the rpc_xprt_switch and the rpc_xprt.

2) test function must call rpc_xprt_switch_add_xprt, usually in the rpc_call_done routine.

Upon success (return of 1), the test function adds the new transport to the rpc_clnt xprt switch.

intrpc_clnt_add_xprt(structrpc_clnt*clnt,structxprt_create*xprtargs,int(*setup)(structrpc_clnt*,structrpc_xprt_switch*,structrpc_xprt*,void*),void*data)

Add a new transport to a rpc_clnt

Parameters

structrpc_clnt*clnt

pointer to struct rpc_clnt

structxprt_create*xprtargs

pointer to struct xprt_create

int(*setup)(structrpc_clnt*,structrpc_xprt_switch*,structrpc_xprt*,void*)

callback to test and/or set up the connection

void*data

pointer to setup function data

Description

Creates a new transport using the parameters set in xprtargs and adds it to clnt. If ping is set, then test that connectivity succeeds before adding the new transport.

Network device support

Driver Support

voiddev_add_pack(structpacket_type*pt)

add packet handler

Parameters

structpacket_type*pt

packet type declaration

Description

Add a protocol handler to the networking stack. The passed packet_type is linked into kernel lists and may not be freed until it has been removed from the kernel lists.

This call does not sleep, therefore it cannot guarantee that all CPUs that are in the middle of receiving packets will see the new packet type (until the next received packet).

void__dev_remove_pack(structpacket_type*pt)

remove packet handler

Parameters

structpacket_type*pt

packet type declaration

Description

Remove a protocol handler that was previously added to the kernel protocol handlers by dev_add_pack(). The passed packet_type is removed from the kernel lists and can be freed or reused once this function returns.

The packet type might still be in use by receivers and must not be freed until after all the CPUs have gone through a quiescent state.

voiddev_remove_pack(structpacket_type*pt)

remove packet handler

Parameters

structpacket_type*pt

packet type declaration

Description

Remove a protocol handler that was previously added to the kernel protocol handlers by dev_add_pack(). The passed packet_type is removed from the kernel lists and can be freed or reused once this function returns.

This call sleeps to guarantee that no CPU is looking at the packet type after return.

intdev_get_iflink(conststructnet_device*dev)

get ‘iflink’ value of an interface

Parameters

conststructnet_device*dev

targeted interface

Description

Indicates the ifindex the interface is linked to. Physical interfaces have the same ‘ifindex’ and ‘iflink’ values.

intdev_fill_metadata_dst(structnet_device*dev,structsk_buff*skb)

Retrieve tunnel egress information.

Parameters

structnet_device*dev

targeted interface

structsk_buff*skb

The packet.

Description

For better visibility of tunnel traffic, OVS needs to retrieve egress tunnel information for a packet. The following API allows the user to get this info.

structnet_device*__dev_get_by_name(structnet*net,constchar*name)

find a device by its name

Parameters

structnet*net

the applicable net namespace

constchar*name

name to find

Description

Find an interface by name. Must be called under RTNL semaphore. If the name is found a pointer to the device is returned. If the name is not found then NULL is returned. The reference counters are not incremented so the caller must be careful with locks.

structnet_device*dev_get_by_name_rcu(structnet*net,constchar*name)

find a device by its name

Parameters

structnet*net

the applicable net namespace

constchar*name

name to find

Description

Find an interface by name. If the name is found a pointer to the device is returned. If the name is not found then NULL is returned. The reference counters are not incremented so the caller must be careful with locks. The caller must hold RCU lock.

structnet_device*netdev_get_by_name(structnet*net,constchar*name,netdevice_tracker*tracker,gfp_tgfp)

find a device by its name

Parameters

structnet*net

the applicable net namespace

constchar*name

name to find

netdevice_tracker*tracker

tracking object for the acquired reference

gfp_tgfp

allocation flags for the tracker

Description

Find an interface by name. This can be called from any context and does its own locking. The returned handle has the usage count incremented and the caller must use netdev_put() to release it when it is no longer needed. NULL is returned if no matching device is found.

structnet_device*__dev_get_by_index(structnet*net,intifindex)

find a device by its ifindex

Parameters

structnet*net

the applicable net namespace

intifindex

index of device

Description

Search for an interface by index. Returns NULL if the device is not found or a pointer to the device. The device has not had its reference counter increased so the caller must be careful about locking. The caller must hold the RTNL semaphore.

structnet_device*dev_get_by_index_rcu(structnet*net,intifindex)

find a device by its ifindex

Parameters

structnet*net

the applicable net namespace

intifindex

index of device

Description

Search for an interface by index. Returns NULL if the device is not found or a pointer to the device. The device has not had its reference counter increased so the caller must be careful about locking. The caller must hold RCU lock.

structnet_device*netdev_get_by_index(structnet*net,intifindex,netdevice_tracker*tracker,gfp_tgfp)

find a device by its ifindex

Parameters

structnet*net

the applicable net namespace

intifindex

index of device

netdevice_tracker*tracker

tracking object for the acquired reference

gfp_tgfp

allocation flags for the tracker

Description

Search for an interface by index. Returns NULL if the device is not found or a pointer to the device. The device returned has had a reference added and the pointer is safe until the user calls netdev_put() to indicate they have finished with it.

structnet_device*dev_getbyhwaddr_rcu(structnet*net,unsignedshorttype,constchar*ha)

find a device by its hardware address

Parameters

structnet*net

the applicable net namespace

unsignedshorttype

media type of device

constchar*ha

hardware address

Description

Search for an interface by MAC address. Returns NULL if the device is not found or a pointer to the device. The caller must hold RCU. The returned device has not had its ref count increased and the caller must therefore be careful about locking.

structnet_device*dev_getbyhwaddr(structnet*net,unsignedshorttype,constchar*ha)

find a device by its hardware address

Parameters

structnet*net

the applicable net namespace

unsignedshorttype

media type of device

constchar*ha

hardware address

Description

Similar to dev_getbyhwaddr_rcu(), but the owner needs to hold rtnl_lock.

Context

rtnl_lock() must be held.

Return

pointer to the net_device, or NULL if not found

booldev_valid_name(constchar*name)

check if name is okay for network device

Parameters

constchar*name

name string

Description

Network device names need to be valid file names to allow sysfs to work. We also disallow any kind of whitespace.

intdev_alloc_name(structnet_device*dev,constchar*name)

allocate a name for a device

Parameters

structnet_device*dev

device

constchar*name

name format string

Description

Passed a format string, e.g. “lt%d”, it will try to find a suitable id. It scans the list of devices to build up a free map, then chooses the first empty slot. The caller must hold the dev_base or rtnl lock while allocating the name and adding the device in order to avoid duplicates. Limited to bits_per_byte * page size devices (i.e. 32K on most platforms). Returns the number of the unit assigned or a negative errno code.
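The two-pass scheme described above (build a map of used unit numbers, then take the first empty slot) can be sketched in user space as follows. This is a simplified model, not the kernel implementation: `model_alloc_unit()` and `MODEL_MAX_UNITS` are hypothetical, the device list is a plain array of names, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The documented bound is bits-per-byte * page size; 64 keeps the sketch small. */
#define MODEL_MAX_UNITS 64

/*
 * Return the lowest unit number whose name, built from fmt (which must
 * contain exactly one "%d"), is not present in names[], or -1 when all
 * MODEL_MAX_UNITS slots are taken.
 */
static int model_alloc_unit(const char *fmt, const char *const names[],
                            int n_names)
{
    bool in_use[MODEL_MAX_UNITS] = { false };
    char buf[32];

    assert(fmt != NULL);

    /* Pass 1: build the map of unit numbers that are already used. */
    for (int i = 0; i < n_names; i++) {
        int unit;

        if (sscanf(names[i], fmt, &unit) != 1)
            continue;
        if (unit < 0 || unit >= MODEL_MAX_UNITS)
            continue;
        /* Regenerate the name to reject partial matches such as "lt1x". */
        snprintf(buf, sizeof(buf), fmt, unit);
        if (strcmp(buf, names[i]) == 0)
            in_use[unit] = true;
    }

    /* Pass 2: pick the first empty slot. */
    for (int unit = 0; unit < MODEL_MAX_UNITS; unit++)
        if (!in_use[unit])
            return unit;
    return -1;
}
```

With existing names "lt0", "lt1", "eth0" and "lt3", the format "lt%d" yields unit 2, because names from other families and gaps above the first free slot are ignored.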

voidnetdev_features_change(structnet_device*dev)

device changes features

Parameters

structnet_device*dev

device to cause notification

Description

Called to indicate a device has changed features.

void__netdev_notify_peers(structnet_device*dev)

notify network peers about existence of dev, to be called when rtnl lock is already held.

Parameters

structnet_device*dev

network device

Description

Generate traffic such that interested network peers are aware of dev, such as by generating a gratuitous ARP. This may be used when a device wants to inform the rest of the network about some sort of reconfiguration such as a failover event or virtual machine migration.

voidnetdev_notify_peers(structnet_device*dev)

notify network peers about existence of dev

Parameters

structnet_device*dev

network device

Description

Generate traffic such that interested network peers are aware of dev, such as by generating a gratuitous ARP. This may be used when a device wants to inform the rest of the network about some sort of reconfiguration such as a failover event or virtual machine migration.

intregister_netdevice_notifier(structnotifier_block*nb)

register a network notifier block

Parameters

structnotifier_block*nb

notifier

Description

Register a notifier to be called when network device events occur. The notifier passed is linked into the kernel structures and must not be reused until it has been unregistered. A negative errno code is returned on a failure.

When registered, all registration and up events are replayed to the new notifier to allow device to have a race free view of the network device list.

intunregister_netdevice_notifier(structnotifier_block*nb)

unregister a network notifier block

Parameters

structnotifier_block*nb

notifier

Description

Unregister a notifier previously registered by register_netdevice_notifier(). The notifier is unlinked from the kernel structures and may then be reused. A negative errno code is returned on a failure.

After unregistering, unregister and down device events are synthesized for all devices on the device list to the removed notifier to remove the need for special case cleanup code.
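The replay and synthesis behaviour described for registering and unregistering a notifier can be modelled in a few lines of user-space C. Everything here is a hypothetical stand-in: the `model_*` names, the event enum and the flat device array are assumptions made for illustration, and the real notifier chain carries more event types and takes locks that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event set; the kernel's NETDEV_* events are richer. */
enum model_event { MODEL_REGISTER, MODEL_UP, MODEL_DOWN, MODEL_UNREGISTER };

struct model_dev {
    const char *name;
    int up; /* non-zero if the interface is running */
};

typedef void (*model_notifier_fn)(enum model_event ev,
                                  const struct model_dev *dev, void *ctx);

/* On registration, replay REGISTER (and UP for running devices). */
static void model_register_notifier(const struct model_dev *devs, size_t n,
                                    model_notifier_fn fn, void *ctx)
{
    assert(fn != NULL);

    for (size_t i = 0; i < n; i++) {
        fn(MODEL_REGISTER, &devs[i], ctx);
        if (devs[i].up)
            fn(MODEL_UP, &devs[i], ctx);
    }
}

/* On removal, synthesize DOWN (for running devices) and UNREGISTER. */
static void model_unregister_notifier(const struct model_dev *devs, size_t n,
                                      model_notifier_fn fn, void *ctx)
{
    assert(fn != NULL);

    for (size_t i = 0; i < n; i++) {
        if (devs[i].up)
            fn(MODEL_DOWN, &devs[i], ctx);
        fn(MODEL_UNREGISTER, &devs[i], ctx);
    }
}

/* Example notifier: tracks how many devices it believes exist and run. */
struct model_view {
    int known;
    int running;
};

static void model_track(enum model_event ev, const struct model_dev *dev,
                        void *ctx)
{
    struct model_view *v = ctx;

    (void)dev;
    switch (ev) {
    case MODEL_REGISTER:   v->known++;   break;
    case MODEL_UP:         v->running++; break;
    case MODEL_DOWN:       v->running--; break;
    case MODEL_UNREGISTER: v->known--;   break;
    }
}
```

Because the same handler sees a balanced stream of events, its private view returns to zero after unregistering without any dedicated cleanup path, which is the point of the synthesized events.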

intregister_netdevice_notifier_net(structnet*net,structnotifier_block*nb)

register a per-netns network notifier block

Parameters

structnet*net

network namespace

structnotifier_block*nb

notifier

Description

Register a notifier to be called when network device events occur. The notifier passed is linked into the kernel structures and must not be reused until it has been unregistered. A negative errno code is returned on a failure.

When registered, all registration and up events are replayed to the new notifier to allow device to have a race free view of the network device list.

intunregister_netdevice_notifier_net(structnet*net,structnotifier_block*nb)

unregister a per-netns network notifier block

Parameters

structnet*net

network namespace

structnotifier_block*nb

notifier

Description

Unregister a notifier previously registered by register_netdevice_notifier_net(). The notifier is unlinked from the kernel structures and may then be reused. A negative errno code is returned on a failure.

After unregistering, unregister and down device events are synthesized for all devices on the device list to the removed notifier to remove the need for special case cleanup code.

intcall_netdevice_notifiers(unsignedlongval,structnet_device*dev)

call all network notifier blocks

Parameters

unsignedlongval

value passed unmodified to notifier function

structnet_device*dev

net_device pointer passed unmodified to notifier function

Description

Call all network notifier blocks. Parameters and return value are as for raw_notifier_call_chain().

intdev_forward_skb(structnet_device*dev,structsk_buff*skb)

loopback an skb to another netif

Parameters

structnet_device*dev

destination network device

structsk_buff*skb

buffer to forward

Description

Return values:

NET_RX_SUCCESS (no congestion)

NET_RX_DROP (packet was dropped, but freed)

dev_forward_skb can be used for injecting an skb from the start_xmit function of one device into the receive queue of another device.

The receiving device may be in another namespace, so we have to clear all information in the skb that could impact namespace isolation.

booldev_nit_active_rcu(conststructnet_device*dev)

return true if any network interface taps are in use

Parameters

conststructnet_device*dev

network device to check for the presence of taps

Description

The caller must hold the RCU lock

intnetif_set_real_num_rx_queues(structnet_device*dev,unsignedintrxq)

set actual number of RX queues used

Parameters

structnet_device*dev

Network device

unsignedintrxq

Actual number of RX queues

Description

This must be called either with the rtnl_lock held or before registration of the net device. Returns 0 on success, or a negative error code. If called before registration, it always succeeds.

intnetif_set_real_num_queues(structnet_device*dev,unsignedinttxq,unsignedintrxq)

set actual number of RX and TX queues used

Parameters

structnet_device*dev

Network device

unsignedinttxq

Actual number of TX queues

unsignedintrxq

Actual number of RX queues

Description

Set the real number of both TX and RX queues. Does nothing if the number of queues is already correct.

voidnetif_set_tso_max_size(structnet_device*dev,unsignedintsize)

set the max size of TSO frames supported

Parameters

structnet_device*dev

netdev to update

unsignedintsize

max skb->len of a TSO frame

Description

Set the limit on the size of TSO super-frames the device can handle. Unless explicitly set, the stack will assume the value of GSO_LEGACY_MAX_SIZE.

voidnetif_set_tso_max_segs(structnet_device*dev,unsignedintsegs)

set the max number of segs supported for TSO

Parameters

structnet_device*dev

netdev to update

unsignedintsegs

max number of TCP segments

Description

Set the limit on the number of TCP segments the device can generate from a single TSO super-frame. Unless explicitly set, the stack will assume the value of GSO_MAX_SEGS.

voidnetif_inherit_tso_max(structnet_device*to,conststructnet_device*from)

copy all TSO limits from a lower device to an upper

Parameters

structnet_device*to

netdev to update

conststructnet_device*from

netdev from which to copy the limits

intnetif_get_num_default_rss_queues(void)

default number of RSS queues

Parameters

void

no arguments

Description

Default value is the number of physical cores if there are only 1 or 2, or that number divided by 2 if there are more.
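The rule reads naturally as a one-line function. The sketch below is a user-space model with a hypothetical name, `model_default_rss_queues()`; the text does not say how odd core counts are rounded, so rounding up is an assumption of this sketch.

```c
#include <assert.h>

/*
 * cores: number of physical cores, assumed to be at least 1.
 * One or two cores get one queue each; larger systems get half as many
 * queues as cores. Rounding up for odd counts is an assumption.
 */
static int model_default_rss_queues(int cores)
{
    assert(cores >= 1);
    return cores > 2 ? (cores + 1) / 2 : cores;
}
```

Under this model a 2-core machine keeps 2 queues while an 8-core machine defaults to 4.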

voidnetif_device_detach(structnet_device*dev)

mark device as removed

Parameters

structnet_device*dev

network device

Description

Mark device as removed from system and therefore no longer available.

voidnetif_device_attach(structnet_device*dev)

mark device as attached

Parameters

structnet_device*dev

network device

Description

Mark device as attached to the system and restart if needed.

intdev_loopback_xmit(structnet*net,structsock*sk,structsk_buff*skb)

loop back skb

Parameters

structnet*net

network namespace this loopback is happening in

structsock*sk

sk needed to be a netfilter okfn

structsk_buff*skb

buffer to transmit

int__dev_queue_xmit(structsk_buff*skb,structnet_device*sb_dev)

transmit a buffer

Parameters

structsk_buff*skb

buffer to transmit

structnet_device*sb_dev

subordinate device used for L2 forwarding offload

Description

Queue a buffer for transmission to a network device. The caller must have set the device and priority and built the buffer before calling this function. The function can be called from an interrupt.

When calling this method, interrupts MUST be enabled. This is because the BH enable code must have IRQs enabled so that it will not deadlock.

Regardless of the return value, the skb is consumed, so it is currently difficult to retry a send to this method. (You can bump the ref count before sending to hold a reference for retry if you are careful.)

Return

  • 0 - buffer successfully transmitted

  • positive qdisc return code - NET_XMIT_DROP etc.

  • negative errno - other errors
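A caller that wants to act on the three return classes listed above can split them by sign. The sketch below is a user-space illustration only; `enum model_xmit_result` and `model_classify_xmit()` are hypothetical names, and no kernel constants are used.

```c
/* Hypothetical result classes mirroring the three documented cases. */
enum model_xmit_result {
    MODEL_XMIT_OK,    /* 0: buffer successfully transmitted */
    MODEL_XMIT_QDISC, /* > 0: qdisc return code such as NET_XMIT_DROP */
    MODEL_XMIT_ERRNO  /* < 0: negative errno */
};

/* Classify a transmit return code purely by its sign. */
static enum model_xmit_result model_classify_xmit(int rc)
{
    if (rc == 0)
        return MODEL_XMIT_OK;
    return rc > 0 ? MODEL_XMIT_QDISC : MODEL_XMIT_ERRNO;
}
```

Whatever class is reported, the buffer has already been consumed, so the classification is only useful for accounting or for deciding whether to resend from a separately held reference.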

boolrps_may_expire_flow(structnet_device*dev,u16rxq_index,u32flow_id,u16filter_id)

check whether an RFS hardware filter may be removed

Parameters

structnet_device*dev

Device on which the filter was set

u16rxq_index

RX queue index

u32flow_id

Flow ID passed to ndo_rx_flow_steer()

u16filter_id

Filter ID returned by ndo_rx_flow_steer()

Description

Drivers that implement ndo_rx_flow_steer() should periodically call this function for each installed filter and remove the filters for which it returns true.
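The periodic scan can be pictured as a compaction pass over the driver's filter table. This is a user-space sketch under stated assumptions: `struct model_filter`, `model_expire_filters()` and the predicate `model_flow_is_idle()` are hypothetical, with the predicate standing in for the real expiry check, and a real driver would also have to reprogram its hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one installed hardware filter. */
struct model_filter {
    unsigned short rxq_index;
    unsigned int   flow_id;
    unsigned short filter_id;
};

typedef bool (*model_may_expire_fn)(const struct model_filter *f);

/*
 * Drop every filter that the predicate allows to expire, compacting the
 * array in place. Returns the number of filters that remain installed.
 */
static size_t model_expire_filters(struct model_filter *filters, size_t n,
                                   model_may_expire_fn may_expire)
{
    size_t kept = 0;

    assert(may_expire != NULL);

    for (size_t i = 0; i < n; i++)
        if (!may_expire(&filters[i]))
            filters[kept++] = filters[i];
    return kept;
}

/* Stand-in predicate for the sketch: treat odd flow IDs as idle. */
static bool model_flow_is_idle(const struct model_filter *f)
{
    return (f->flow_id & 1u) != 0;
}
```

Running the pass from a timer or work item keeps the table from filling up with filters for flows that have moved or ended.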

int__netif_rx(structsk_buff*skb)

Slightly optimized version of netif_rx

Parameters

structsk_buff*skb

buffer to post

Description

This behaves as netif_rx except that it does not disable bottom halves. As a result this function may only be invoked from the interrupt context (either hard or soft interrupt).

intnetif_rx(structsk_buff*skb)

post buffer to the network code

Parameters

structsk_buff*skb

buffer to post

Description

This function receives a packet from a device driver and queues it for the upper (protocol) levels to process via the backlog NAPI device. It always succeeds. The buffer may be dropped during processing for congestion control or by the protocol layers. The network buffer is passed via the backlog NAPI device. Modern NIC drivers should use NAPI and GRO.

This function can be used from interrupt and from process context. The caller from process context must not disable interrupts before invoking this function.

Return values:

NET_RX_SUCCESS (no congestion)

NET_RX_DROP (packet was dropped)

boolnetdev_is_rx_handler_busy(structnet_device*dev)

check if receive handler is registered

Parameters

structnet_device*dev

device to check

Description

Check if a receive handler is already registered for a given device. Returns true if there is one.

The caller must hold the rtnl_mutex.

intnetdev_rx_handler_register(structnet_device*dev,rx_handler_func_t*rx_handler,void*rx_handler_data)

register receive handler

Parameters

structnet_device*dev

device to register a handler for

rx_handler_func_t*rx_handler

receive handler to register

void*rx_handler_data

data pointer that is used by rx handler

Description

Register a receive handler for a device. This handler will then be called from __netif_receive_skb. A negative errno code is returned on a failure.

The caller must hold the rtnl_mutex.

For a general description of rx_handler, see enum rx_handler_result.

voidnetdev_rx_handler_unregister(structnet_device*dev)

unregister receive handler

Parameters

structnet_device*dev

device to unregister a handler from

Description

Unregister a receive handler from a device.

The caller must hold the rtnl_mutex.

intnetif_receive_skb_core(structsk_buff*skb)

special purpose version of netif_receive_skb

Parameters

structsk_buff*skb

buffer to process

Description

More direct receive version of netif_receive_skb(). It should only be used by callers that have a need to skip RPS and Generic XDP. Caller must also take care of handling if (page_is_)pfmemalloc.

This function may only be called from softirq context and interrupts should be enabled.

Return values (usually ignored):

NET_RX_SUCCESS: no congestion

NET_RX_DROP: packet was dropped

intnetif_receive_skb(structsk_buff*skb)

process receive buffer from network

Parameters

structsk_buff*skb

buffer to process

Description

netif_receive_skb() is the main receive data processing function. It always succeeds. The buffer may be dropped during processing for congestion control or by the protocol layers.

This function may only be called from softirq context and interrupts should be enabled.

Return values (usually ignored):

NET_RX_SUCCESS: no congestion

NET_RX_DROP: packet was dropped

voidnetif_receive_skb_list(structlist_head*head)

process many receive buffers from network

Parameters

structlist_head*head

list of skbs to process.

Description

Since the return value of netif_receive_skb() is normally ignored, and wouldn’t be meaningful for a list, this function returns void.

This function may only be called from softirq context and interrupts should be enabled.

void__napi_schedule(structnapi_struct*n)

schedule for receive

Parameters

structnapi_struct*n

entry to schedule

Description

The entry’s receive function will be scheduled to run. Consider using __napi_schedule_irqoff() if hard irqs are masked.

boolnapi_schedule_prep(structnapi_struct*n)

check if napi can be scheduled

Parameters

structnapi_struct*n

napi context

Description

Test if NAPI routine is already running, and if not mark it as running. This is used as a condition variable to ensure only one NAPI poll instance runs. We also make sure there is no pending NAPI disable.
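The test-and-mark step can be sketched with C11 atomics. This is a simplified user-space model, not the kernel's state machine: `struct model_napi`, the two flag bits and `model_schedule_prep()` are hypothetical, and the sketch only captures the two conditions named above (already running, or disable pending).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STATE_SCHED   (1u << 0) /* poll is scheduled or running */
#define MODEL_STATE_DISABLE (1u << 1) /* a disable request is pending */

struct model_napi {
    atomic_uint state;
};

/*
 * Atomically claim the instance for polling. Returns true only for the
 * single caller that moves the state from idle to scheduled; returns
 * false if a poll is already scheduled or a disable is pending.
 */
static bool model_schedule_prep(struct model_napi *n)
{
    unsigned int old;

    assert(n != NULL);
    old = atomic_load(&n->state);

    do {
        if (old & (MODEL_STATE_SCHED | MODEL_STATE_DISABLE))
            return false;
        /* On failure the CAS reloads 'old', so the check is repeated. */
    } while (!atomic_compare_exchange_weak(&n->state, &old,
                                           old | MODEL_STATE_SCHED));
    return true;
}
```

Using a compare-and-exchange loop rather than a separate load and store is what makes the flag usable as a mutual-exclusion gate: two racing callers cannot both observe the idle state and both win.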

void__napi_schedule_irqoff(structnapi_struct*n)

schedule for receive

Parameters

structnapi_struct*n

entry to schedule

Description

Variant of __napi_schedule() assuming hard irqs are masked.

On PREEMPT_RT enabled kernels this maps to __napi_schedule() because the interrupt disabled assumption might not be true due to force-threaded interrupts and spinlock substitution.

voidnetif_threaded_enable(structnet_device*dev)

enable threaded NAPIs

Parameters

structnet_device*dev

net_device instance

Description

Enable threaded mode for the NAPI instances of the device. This may be useful for devices where multiple NAPI instances get scheduled by a single interrupt. Threaded NAPI allows moving the NAPI processing to cores other than the core where IRQ is mapped.

This function should be called before dev is registered.

voidnetif_queue_set_napi(structnet_device*dev,unsignedintqueue_index,enumnetdev_queue_typetype,structnapi_struct*napi)

Associate queue with the napi

Parameters

structnet_device*dev

device to which NAPI and queue belong

unsignedintqueue_index

Index of queue

enumnetdev_queue_typetype

queue type as RX or TX

structnapi_struct*napi

NAPI context, pass NULL to clear previously set NAPI

Description

Set queue with its corresponding napi context. This should be done after registering the NAPI handler for the queue-vector and the queues have been mapped to the corresponding interrupt vector.

voidnapi_disable(structnapi_struct*n)

prevent NAPI from scheduling

Parameters

structnapi_struct*n

NAPI context

Description

Stop NAPI from being scheduled on this context. Waits till any outstanding processing completes. Takes netdev_lock() for associated net_device.

voidnapi_enable(structnapi_struct*n)

enable NAPI scheduling

Parameters

structnapi_struct*n

NAPI context

Description

Enable scheduling of a NAPI instance. Must be paired with napi_disable(). Takes netdev_lock() for associated net_device.

boolnetdev_has_upper_dev(structnet_device*dev,structnet_device*upper_dev)

Check if device is linked to an upper device

Parameters

structnet_device*dev

device

structnet_device*upper_dev

upper device to check

Description

Find out if a device is linked to specified upper device and return true in case it is. Note that this checks only immediate upper device, not through a complete stack of devices. The caller must hold the RTNL lock.

boolnetdev_has_upper_dev_all_rcu(structnet_device*dev,structnet_device*upper_dev)

Check if device is linked to an upper device

Parameters

structnet_device*dev

device

structnet_device*upper_dev

upper device to check

Description

Find out if a device is linked to specified upper device and return true in case it is. Note that this checks the entire upper device chain. The caller must hold rcu lock.
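The difference between checking only the immediate upper device and checking the entire upper device chain is easiest to see on a small stack such as eth0 under bond0 under br0. The sketch below is a user-space model: `struct model_netdev` and both helpers are hypothetical, the adjacency list is a fixed array, the graph is assumed to be acyclic, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_UPPERS 4

/* Hypothetical device node holding only its immediate upper devices. */
struct model_netdev {
    const char *name;
    struct model_netdev *uppers[MODEL_MAX_UPPERS];
    size_t n_uppers;
};

/* True only if upper is directly stacked on top of dev. */
static bool model_has_upper(const struct model_netdev *dev,
                            const struct model_netdev *upper)
{
    assert(dev != NULL);

    for (size_t i = 0; i < dev->n_uppers; i++)
        if (dev->uppers[i] == upper)
            return true;
    return false;
}

/* True if upper is reachable anywhere above dev (acyclic graph assumed). */
static bool model_has_upper_all(const struct model_netdev *dev,
                                const struct model_netdev *upper)
{
    assert(dev != NULL);

    for (size_t i = 0; i < dev->n_uppers; i++)
        if (dev->uppers[i] == upper ||
            model_has_upper_all(dev->uppers[i], upper))
            return true;
    return false;
}
```

For the eth0, bond0, br0 stack, the immediate check reports bond0 but not br0 as an upper of eth0, while the full-chain check reports both.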

boolnetdev_has_any_upper_dev(structnet_device*dev)

Check if device is linked to some device

Parameters

structnet_device*dev

device

Description

Find out if a device is linked to an upper device and return true in case it is. The caller must hold the RTNL lock.

structnet_device*netdev_master_upper_dev_get(structnet_device*dev)

Get master upper device

Parameters

structnet_device*dev

device

Description

Find a master upper device and return pointer to it or NULL in case it’s not there. The caller must hold the RTNL lock.

structnet_device*netdev_upper_get_next_dev_rcu(structnet_device*dev,structlist_head**iter)

Get the next dev from upper list

Parameters

structnet_device*dev

device

structlist_head**iter

list_head ** of the current position

Description

Gets the next device from the dev’s upper list, starting from iter position. The caller must hold RCU read lock.

void*netdev_lower_get_next_private(structnet_device*dev,structlist_head**iter)

Get the next ->private from the lower neighbour list

Parameters

structnet_device*dev

device

structlist_head**iter

list_head ** of the current position

Description

Gets the next netdev_adjacent->private from the dev’s lower neighbour list, starting from iter position. The caller must either hold the RTNL lock or use its own locking that guarantees that the neighbour lower list will remain unchanged.

void*netdev_lower_get_next_private_rcu(structnet_device*dev,structlist_head**iter)

Get the next ->private from the lower neighbour list, RCU variant

Parameters

structnet_device*dev

device

structlist_head**iter

list_head ** of the current position

Description

Gets the next netdev_adjacent->private from the dev’s lower neighbour list, starting from iter position. The caller must hold RCU read lock.

void*netdev_lower_get_next(structnet_device*dev,structlist_head**iter)

Get the next device from the lower neighbour list

Parameters

structnet_device*dev

device

structlist_head**iter

list_head ** of the current position

Description

Gets the next netdev_adjacent from the dev’s lower neighbour list, starting from iter position. The caller must hold RTNL lock or its own locking that guarantees that the neighbour lower list will remain unchanged.

void*netdev_lower_get_first_private_rcu(structnet_device*dev)

Get the first ->private from the lower neighbour list, RCU variant

Parameters

structnet_device*dev

device

Description

Gets the first netdev_adjacent->private from the dev’s lower neighbour list. The caller must hold RCU read lock.

structnet_device*netdev_master_upper_dev_get_rcu(structnet_device*dev)

Get master upper device

Parameters

structnet_device*dev

device

Description

Find a master upper device and return pointer to it or NULL in case it’s not there. The caller must hold the RCU read lock.

intnetdev_upper_dev_link(structnet_device*dev,structnet_device*upper_dev,structnetlink_ext_ack*extack)

Add a link to the upper device

Parameters

structnet_device*dev

device

structnet_device*upper_dev

new upper device

structnetlink_ext_ack*extack

netlink extended ack

Description

Adds a link to device which is upper to this one. The caller must hold the RTNL lock. On a failure a negative errno code is returned. On success the reference counts are adjusted and the function returns zero.

intnetdev_master_upper_dev_link(structnet_device*dev,structnet_device*upper_dev,void*upper_priv,void*upper_info,structnetlink_ext_ack*extack)

Add a master link to the upper device

Parameters

structnet_device*dev

device

structnet_device*upper_dev

new upper device

void*upper_priv

upper device private

void*upper_info

upper info to be passed down via notifier

structnetlink_ext_ack*extack

netlink extended ack

Description

Adds a link to device which is upper to this one. In this case, only one master upper device can be linked, although other non-master devices might be linked as well. The caller must hold the RTNL lock. On a failure a negative errno code is returned. On success the reference counts are adjusted and the function returns zero.

voidnetdev_upper_dev_unlink(structnet_device*dev,structnet_device*upper_dev)

Removes a link to upper device

Parameters

structnet_device*dev

device

structnet_device*upper_dev

upper device to unlink

Description

Removes a link to device which is upper to this one. The caller must hold the RTNL lock.

voidnetdev_bonding_info_change(structnet_device*dev,structnetdev_bonding_info*bonding_info)

Dispatch event about slave change

Parameters

structnet_device*dev

device

structnetdev_bonding_info*bonding_info

info to dispatch

Description

Send NETDEV_BONDING_INFO to netdev notifiers with info. The caller must hold the RTNL lock.

structnet_device*netdev_get_xmit_slave(structnet_device*dev,structsk_buff*skb,boolall_slaves)

Get the xmit slave of master device

Parameters

structnet_device*dev

device

structsk_buff*skb

The packet

boolall_slaves

assume all the slaves are active

Description

The reference counters are not incremented so the caller must be careful with locks. The caller must hold RCU lock. NULL is returned if no slave is found.

structnet_device*netdev_sk_get_lowest_dev(structnet_device*dev,structsock*sk)

Get the lowest device in chain given device and socket

Parameters

structnet_device*dev

device

structsock*sk

the socket

Description

NULL is returned if no lower device is found.

voidnetdev_lower_state_changed(structnet_device*lower_dev,void*lower_state_info)

Dispatch event about lower device state change

Parameters

structnet_device*lower_dev

device

void*lower_state_info

state to dispatch

Description

Send NETDEV_CHANGELOWERSTATE to netdev notifiers with info. The caller must hold the RTNL lock.

unsignedintnetif_get_flags(conststructnet_device*dev)

get flags reported to userspace

Parameters

conststructnet_device*dev

device

Description

Get the combination of flag bits exported through APIs to userspace.

intnetif_pre_changeaddr_notify(structnet_device*dev,constchar*addr,structnetlink_ext_ack*extack)

Call NETDEV_PRE_CHANGEADDR.

Parameters

structnet_device*dev

device

constchar*addr

new address

structnetlink_ext_ack*extack

netlink extended ack

Return

0 on success, -errno on failure.

intnetif_get_port_parent_id(structnet_device*dev,structnetdev_phys_item_id*ppid,boolrecurse)

Get the device’s port parent identifier

Parameters

structnet_device*dev

network device

structnetdev_phys_item_id*ppid

pointer to a storage for the port’s parent identifier

boolrecurse

allow/disallow recursion to lower devices

Description

Get the device’s port parent identifier.

Return

0 on success, -errno on failure.

boolnetdev_port_same_parent_id(structnet_device*a,structnet_device*b)

Indicate if two network devices have the same port parent identifier

Parameters

structnet_device*a

first network device

structnet_device*b

second network device

voidnetdev_update_features(structnet_device*dev)

recalculate device features

Parameters

structnet_device*dev

the device to check

Description

Recalculate dev->features set and send notifications if it has changed. Should be called after driver or hardware dependent conditions might have changed that influence the features.

voidnetdev_change_features(structnet_device*dev)

recalculate device features

Parameters

structnet_device*dev

the device to check

Description

Recalculate dev->features set and send notifications even if they have not changed. Should be called instead of netdev_update_features() if also dev->vlan_features might have changed to allow the changes to be propagated to stacked VLAN devices.

voidnetif_stacked_transfer_operstate(conststructnet_device*rootdev,structnet_device*dev)

transfer operstate

Parameters

conststructnet_device*rootdev

the root or lower level device to transfer state from

structnet_device*dev

the device to transfer operstate to

Description

Transfer operational state from root to device. This is normally called when a stacking relationship exists between the root device and the device (a leaf device).

intregister_netdevice(structnet_device*dev)

register a network device

Parameters

structnet_device*dev

device to register

Description

Take a prepared network device structure and make it externally accessible. A NETDEV_REGISTER message is sent to the netdev notifier chain. Callers must hold the rtnl lock - you may want register_netdev() instead of this.

intregister_netdev(structnet_device*dev)

register a network device

Parameters

structnet_device*dev

device to register

Description

Take a completed network device structure and add it to the kernel interfaces. A NETDEV_REGISTER message is sent to the netdev notifier chain. 0 is returned on success. A negative errno code is returned on a failure to set up the device, or if the name is a duplicate.

This is a wrapper around register_netdevice that takes the rtnl semaphore and expands the device name if you passed a format string to alloc_netdev.

structrtnl_link_stats64*dev_get_stats(structnet_device*dev,structrtnl_link_stats64*storage)

get network device statistics

Parameters

structnet_device*dev

device to get statistics from

structrtnl_link_stats64*storage

place to store stats

Description

Get network statistics from device. Returns storage. The device driver may provide its own method by setting dev->netdev_ops->get_stats64 or dev->netdev_ops->get_stats; otherwise the internal statistics structure is used.

voiddev_fetch_sw_netstats(structrtnl_link_stats64*s,conststructpcpu_sw_netstats__percpu*netstats)

get per-cpu network device statistics

Parameters

structrtnl_link_stats64*s

place to store stats

conststructpcpu_sw_netstats__percpu*netstats

per-cpu network stats to read from

Description

Read per-cpu network statistics and populate the related fields in s.

voiddev_get_tstats64(structnet_device*dev,structrtnl_link_stats64*s)

ndo_get_stats64 implementation

Parameters

structnet_device*dev

device to get statistics from

structrtnl_link_stats64*s

place to store stats

Description

Populates from dev->stats and dev->tstats. Can be used as ndo_get_stats64() callback.

voidnetdev_sw_irq_coalesce_default_on(structnet_device*dev)

enable SW IRQ coalescing by default

Parameters

structnet_device*dev

netdev to enable the IRQ coalescing on

Description

Sets a conservative default for SW IRQ coalescing. Users can use sysfs attributes to override the default values.

structnet_device*alloc_netdev_mqs(intsizeof_priv,constchar*name,unsignedcharname_assign_type,void(*setup)(structnet_device*),unsignedinttxqs,unsignedintrxqs)

allocate network device

Parameters

intsizeof_priv

size of private data to allocate space for

constchar*name

device name format string

unsignedcharname_assign_type

origin of device name

void(*setup)(structnet_device*)

callback to initialize device

unsignedinttxqs

the number of TX subqueues to allocate

unsignedintrxqs

the number of RX subqueues to allocate

Description

Allocates a struct net_device with private data area for driver use and performs basic initialization. Also allocates subqueue structs for each queue on the device.

voidfree_netdev(structnet_device*dev)

free network device

Parameters

structnet_device*dev

device

Description

This function does the last stage of destroying an allocated device interface. The reference to the device object is released. If this is the last reference then it will be freed. Must be called in process context.

structnet_device*alloc_netdev_dummy(intsizeof_priv)

Allocate and initialize a dummy net device.

Parameters

intsizeof_priv

size of private data to allocate space for

Return

the allocated net_device on success, NULL otherwise

voidsynchronize_net(void)

Synchronize with packet receive processing

Parameters

void

no arguments

Description

Wait for packets currently being received to be done. Does not block later packets from starting.

voidunregister_netdevice_queue(structnet_device*dev,structlist_head*head)

remove device from the kernel

Parameters

structnet_device*dev

device

structlist_head*head

list

Description

This function shuts down a device interface and removes it from the kernel tables. If head is not NULL, the device is queued to be unregistered later.

Callers must hold the rtnl semaphore. You may want unregister_netdev() instead of this.

voidunregister_netdevice_many(structlist_head*head)

unregister many devices

Parameters

structlist_head*head

list of devices

Note

As most callers use a stack allocated list_head, we force a list_del() to make sure stack won’t be corrupted later.

voidunregister_netdev(structnet_device*dev)

remove device from the kernel

Parameters

structnet_device*dev

device

Description

This function shuts down a device interface and removes it from the kernel tables.

This is just a wrapper for unregister_netdevice that takes the rtnl semaphore. In general you want to use this and not unregister_netdevice.

netdev_features_tnetdev_increment_features(netdev_features_tall,netdev_features_tone,netdev_features_tmask)

increment feature set by one

Parameters

netdev_features_tall

current feature set

netdev_features_tone

new feature set

netdev_features_tmask

mask feature set

Description

Computes a new feature set after adding a device with feature set one to the master device with current feature set all. Will not enable anything that is off in mask. Returns the new feature set.

voidnetdev_compute_master_upper_features(structnet_device*dev,boolupdate_header)

compute feature from lowers

Parameters

structnet_device*dev

the upper device

boolupdate_header

whether to update upper device’s header_len/headroom/tailroom

Description

Recompute the upper device’s feature based on all lower devices.

inteth_header(structsk_buff*skb,structnet_device*dev,unsignedshorttype,constvoid*daddr,constvoid*saddr,unsignedintlen)

create the Ethernet header

Parameters

structsk_buff*skb

buffer to alter

structnet_device*dev

source device

unsignedshorttype

Ethernet type field

constvoid*daddr

destination address (NULL leave destination address)

constvoid*saddr

source address (NULL use device source address)

unsignedintlen

packet length (<= skb->len)

Description

Set the protocol type. For a packet of type ETH_P_802_3/2 we put the length in here instead.
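The type-versus-length rule can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: sketch_build_eth_header() and the SKETCH_* constants are hypothetical names, the extra dev_addr argument stands in for the device's own address, and the two constant values are assumed to mirror ETH_P_802_3 and ETH_P_802_2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6       /* octets in a MAC address */
#define SKETCH_ETH_HLEN 14      /* destination + source + type/length */
#define SKETCH_P_802_3  0x0001  /* assumed to mirror ETH_P_802_3 */
#define SKETCH_P_802_2  0x0004  /* assumed to mirror ETH_P_802_2 */

/* Hypothetical helper: write a 14-byte Ethernet header into hdr. */
static void sketch_build_eth_header(uint8_t hdr[SKETCH_ETH_HLEN],
                                    const uint8_t *daddr,
                                    const uint8_t *saddr,
                                    const uint8_t *dev_addr,
                                    uint16_t type, uint16_t len)
{
    /* 802.3/802.2 frames carry the payload length instead of a type. */
    uint16_t field = (type == SKETCH_P_802_3 || type == SKETCH_P_802_2)
                         ? len : type;

    if (daddr)  /* NULL: leave the destination bytes untouched */
        memcpy(hdr, daddr, SKETCH_ETH_ALEN);

    /* NULL source: fall back to the device's own address. */
    memcpy(hdr + SKETCH_ETH_ALEN, saddr ? saddr : dev_addr, SKETCH_ETH_ALEN);

    /* The field is stored in network (big-endian) byte order. */
    hdr[12] = (uint8_t)(field >> 8);
    hdr[13] = (uint8_t)(field & 0xff);
}
```

Writing the two bytes by hand avoids depending on a byte-order helper, which keeps the sketch portable.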

u32eth_get_headlen(conststructnet_device*dev,constvoid*data,u32len)

determine the length of header for an ethernet frame

Parameters

conststructnet_device*dev

pointer to network device

constvoid*data

pointer to start of frame

u32len

total length of frame

Description

Make a best effort attempt to pull the length for all of the headers for a given frame in a linear buffer.

__be16eth_type_trans(structsk_buff*skb,structnet_device*dev)

determine the packet’s protocol ID.

Parameters

structsk_buff*skb

received socket data

structnet_device*dev

receiving network device

Description

The rule here is that we assume 802.3 if the type field is short enough to be a length. This is normal practice and works for any ‘now in use’ protocol.

inteth_header_parse(conststructsk_buff*skb,unsignedchar*haddr)

extract hardware address from packet

Parameters

conststructsk_buff*skb

packet to extract header from

unsignedchar*haddr

destination buffer

inteth_header_cache(conststructneighbour*neigh,structhh_cache*hh,__be16type)

fill cache entry from neighbour

Parameters

conststructneighbour*neigh

source neighbour

structhh_cache*hh

destination cache entry

__be16type

Ethernet type field

Description

Create an Ethernet header template from the neighbour.

voideth_header_cache_update(structhh_cache*hh,conststructnet_device*dev,constunsignedchar*haddr)

update cache entry

Parameters

structhh_cache*hh

destination cache entry

conststructnet_device*dev

network device

constunsignedchar*haddr

new hardware address

Description

Called by Address Resolution module to notify changes in address.

__be16eth_header_parse_protocol(conststructsk_buff*skb)

extract protocol from L2 header

Parameters

conststructsk_buff*skb

packet to extract protocol from

inteth_prepare_mac_addr_change(structnet_device*dev,void*p)

prepare for mac change

Parameters

structnet_device*dev

network device

void*p

socket address

voideth_commit_mac_addr_change(structnet_device*dev,void*p)

commit mac change

Parameters

structnet_device*dev

network device

void*p

socket address

inteth_mac_addr(structnet_device*dev,void*p)

set new Ethernet hardware address

Parameters

structnet_device*dev

network device

void*p

socket address

Description

Change hardware address of device.

This doesn’t change hardware matching, so needs to be overridden for most real devices.

voidether_setup(structnet_device*dev)

setup Ethernet network device

Parameters

structnet_device*dev

network device

Description

Fill in the fields of the device structure with Ethernet-generic values.

structnet_device*alloc_etherdev_mqs(intsizeof_priv,unsignedinttxqs,unsignedintrxqs)

Allocates and sets up an Ethernet device

Parameters

intsizeof_priv

Size of additional driver-private structure to be allocated for this Ethernet device

unsignedinttxqs

The number of TX queues this device has.

unsignedintrxqs

The number of RX queues this device has.

Description

Fill in the fields of the device structure with Ethernet-generic values. Basically does everything except registering the device.

Constructs a new net device, complete with a private data area of size (sizeof_priv). A 32-byte (not bit) alignment is enforced for this private data area.

intplatform_get_ethdev_address(structdevice*dev,structnet_device*netdev)

Set netdev’s MAC address from a given device

Parameters

structdevice*dev

Pointer to the device

structnet_device*netdev

Pointer to netdev to write the address to

Description

Wrapper around eth_platform_get_mac_address() which writes the address directly to netdev->dev_addr.

intfwnode_get_mac_address(structfwnode_handle*fwnode,char*addr)

Get the MAC from the firmware node

Parameters

structfwnode_handle*fwnode

Pointer to the firmware node

char*addr

Address of buffer to store the MAC in

Description

Search the firmware node for the best MAC address to use. ‘mac-address’ is checked first, because that is supposed to contain the “most recent” MAC address. If that isn’t set, then ‘local-mac-address’ is checked next, because that is the default address. If that isn’t set, then the obsolete ‘address’ is checked, just in case we’re using an old device tree.

Note that the ‘address’ property is supposed to contain a virtual address of the register set, but some DTS files have redefined that property to be the MAC address.

All-zero MAC addresses are rejected, because those could be properties that exist in the firmware tables, but were not updated by the firmware. For example, the DTS could define ‘mac-address’ and ‘local-mac-address’, with zero MAC addresses. Some older U-Boots only initialized ‘local-mac-address’. In this case, the real MAC is in ‘local-mac-address’, and ‘mac-address’ exists but is all zeros.
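The lookup order and the all-zero rejection can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct sketch_fw_props and sketch_pick_mac() are hypothetical stand-ins for a firmware node and its property accessors, and only the all-zero test described above is applied (the kernel helper may apply stricter validity checks).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a firmware node: a pointer is NULL when
 * the corresponding property is absent. */
struct sketch_fw_props {
    const uint8_t *mac_address;        /* 'mac-address' */
    const uint8_t *local_mac_address;  /* 'local-mac-address' */
    const uint8_t *address;            /* obsolete 'address' */
};

static bool sketch_is_zero_mac(const uint8_t *a)
{
    static const uint8_t zero[6];      /* zero-initialised */

    return memcmp(a, zero, 6) == 0;
}

/* Copy the first present, non-zero candidate into out.
 * Returns 0 on success, -1 if no usable property exists. */
static int sketch_pick_mac(const struct sketch_fw_props *p, uint8_t out[6])
{
    const uint8_t *candidates[3] = {
        p->mac_address, p->local_mac_address, p->address
    };

    for (size_t i = 0; i < 3; i++) {
        if (candidates[i] && !sketch_is_zero_mac(candidates[i])) {
            memcpy(out, candidates[i], 6);
            return 0;
        }
    }
    return -1;
}
```

With an all-zero ‘mac-address’ and a populated ‘local-mac-address’, the sketch skips the first property and returns the second, which matches the older U-Boot scenario described above.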

intdevice_get_mac_address(structdevice*dev,char*addr)

Get the MAC for a given device

Parameters

structdevice*dev

Pointer to the device

char*addr

Address of buffer to store the MAC in

intdevice_get_ethdev_address(structdevice*dev,structnet_device*netdev)

Set netdev’s MAC address from a given device

Parameters

structdevice*dev

Pointer to the device

structnet_device*netdev

Pointer to netdev to write the address to

Description

Wrapper around device_get_mac_address() which writes the address directly to netdev->dev_addr.

voidnetif_carrier_on(structnet_device*dev)

set carrier

Parameters

structnet_device*dev

network device

Description

Device has detected acquisition of carrier.

voidnetif_carrier_off(structnet_device*dev)

clear carrier

Parameters

structnet_device*dev

network device

Description

Device has detected loss of carrier.

voidnetif_carrier_event(structnet_device*dev)

report carrier state event

Parameters

structnet_device*dev

network device

Description

Device has detected a carrier event but the carrier state wasn’t changed. Use in drivers when querying carrier state asynchronously, to avoid missing events (link flaps) if link recovers before it’s queried.

boolis_link_local_ether_addr(constu8*addr)

Determine if given Ethernet address is link-local

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Return

true if address is link local reserved addr (01:80:c2:00:00:0X) per IEEE 802.1Q 8.6.3 Frame filtering.

Description

Please note: addr must be aligned to u16.
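The 01:80:c2:00:00:0X pattern amounts to a five-octet prefix match plus a check on the upper nibble of the last octet. A byte-wise userspace sketch follows; sketch_is_link_local() is a hypothetical name, and because it reads single bytes it does not share the u16 alignment requirement of the kernel helper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-wise check for the reserved link-local range. */
static bool sketch_is_link_local(const uint8_t addr[6])
{
    static const uint8_t base[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

    /* The first five octets must match; the last octet may only
     * vary in its low nibble (the "X" in 0X). */
    return memcmp(addr, base, 5) == 0 && (addr[5] & 0xf0) == 0;
}
```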

boolis_zero_ether_addr(constu8*addr)

Determine if given Ethernet address is all zeros.

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Return

true if the address is all zeroes.

Description

Please note: addr must be aligned to u16.

boolis_multicast_ether_addr(constu8*addr)

Determine if the Ethernet address is a multicast.

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Return

true if the address is a multicast address. By definition the broadcast address is also a multicast address.

boolis_local_ether_addr(constu8*addr)

Determine if the Ethernet address is a locally-assigned one (IEEE 802).

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Return

true if the address is a local address.

boolis_broadcast_ether_addr(constu8*addr)

Determine if the Ethernet address is broadcast

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Return

true if the address is the broadcast address.

Description

Please note: addr must be aligned to u16.

boolis_unicast_ether_addr(constu8*addr)

Determine if the Ethernet address is unicast

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Return

true if the address is a unicast address.

boolis_valid_ether_addr(constu8*addr)

Determine if the given Ethernet address is valid

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Description

Check that the Ethernet address (MAC) is not 00:00:00:00:00:00, is not a multicast address, and is not FF:FF:FF:FF:FF:FF.

Please note: addr must be aligned to u16.

Return

true if the address is valid.
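The address predicates above reduce to a few bit tests on the first octet plus two whole-address comparisons. The sketch below restates them byte-wise in userspace C; the sketch_* names are hypothetical and the code is an illustration of the rules as documented, not the kernel's word-sized implementation.

```c
#include <stdbool.h>
#include <stdint.h>

static bool sketch_is_zero(const uint8_t a[6])
{
    return (a[0] | a[1] | a[2] | a[3] | a[4] | a[5]) == 0;
}

static bool sketch_is_broadcast(const uint8_t a[6])
{
    return (a[0] & a[1] & a[2] & a[3] & a[4] & a[5]) == 0xff;
}

/* Group (multicast) bit: least significant bit of the first octet. */
static bool sketch_is_multicast(const uint8_t a[6])
{
    return (a[0] & 0x01) != 0;
}

/* Locally administered bit: second least significant bit of the first octet. */
static bool sketch_is_local(const uint8_t a[6])
{
    return (a[0] & 0x02) != 0;
}

/* Valid: not all zeros and not multicast. Broadcast has the group bit
 * set, so the multicast test already excludes FF:FF:FF:FF:FF:FF. */
static bool sketch_is_valid(const uint8_t a[6])
{
    return !sketch_is_multicast(a) && !sketch_is_zero(a);
}
```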

booleth_proto_is_802_3(__be16proto)

Determine if a given Ethertype/length is a protocol

Parameters

__be16proto

Ethertype/length value to be tested

Description

Check that the value from the Ethertype/length field is a valid Ethertype.

Return

true if the value is an 802.3 supported Ethertype.
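The distinction between an Ethertype and an 802.3 length comes down to a threshold comparison. In this userspace sketch, sketch_is_ethertype() is a hypothetical name, the threshold 0x0600 (1536) is assumed to mirror ETH_P_802_3_MIN, and the argument is taken in host byte order, whereas the kernel helper receives a __be16.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_802_3_MIN 0x0600  /* assumed to mirror ETH_P_802_3_MIN */

/* proto_host: the type/length field, already converted to host order. */
static bool sketch_is_ethertype(uint16_t proto_host)
{
    /* Values below the threshold are frame lengths, not Ethertypes. */
    return proto_host >= SKETCH_802_3_MIN;
}
```

For instance, 0x0800 (IPv4) is treated as an Ethertype, while 1500 is treated as a length.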

voideth_random_addr(u8*addr)

Generate software assigned random Ethernet address

Parameters

u8*addr

Pointer to a six-byte array containing the Ethernet address

Description

Generate a random Ethernet address (MAC) that is not multicast and has the local assigned bit set.
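The two constraints (not multicast, locally assigned) are both enforced on the first octet. A userspace sketch of the idea, with the hypothetical name sketch_random_mac(); it uses rand() purely for illustration, whereas the kernel draws from its own random source.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill addr with a random unicast,
 * locally administered MAC address. */
static void sketch_random_mac(uint8_t addr[6])
{
    for (int i = 0; i < 6; i++)
        addr[i] = (uint8_t)(rand() & 0xff);  /* illustration only */

    addr[0] &= 0xfe;  /* clear the multicast (group) bit */
    addr[0] |= 0x02;  /* set the locally administered bit */
}
```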

voideth_broadcast_addr(u8*addr)

Assign broadcast address

Parameters

u8*addr

Pointer to a six-byte array containing the Ethernet address

Description

Assign the broadcast address to the given address array.

voideth_zero_addr(u8*addr)

Assign zero address

Parameters

u8*addr

Pointer to a six-byte array containing the Ethernet address

Description

Assign the zero address to the given address array.

voideth_hw_addr_random(structnet_device*dev)

Generate software assigned random Ethernet address and set device flag

Parameters

structnet_device*dev

pointer to net_device structure

Description

Generate a random Ethernet address (MAC) to be used by a net device and set addr_assign_type so the state can be read by sysfs and be used by userspace.

u32eth_hw_addr_crc(structnetdev_hw_addr*ha)

Calculate CRC from netdev_hw_addr

Parameters

structnetdev_hw_addr*ha

pointer to hardware address

Description

Calculate CRC from a hardware address as basis for filter hashes.

voidether_addr_copy(u8*dst,constu8*src)

Copy an Ethernet address

Parameters

u8*dst

Pointer to a six-byte array Ethernet address destination

constu8*src

Pointer to a six-byte array Ethernet address source

Description

Please note: dst & src must both be aligned to u16.

voideth_hw_addr_set(structnet_device*dev,constu8*addr)

Assign Ethernet address to a net_device

Parameters

structnet_device*dev

pointer to net_device structure

constu8*addr

address to assign

Description

Assign given address to the net_device, addr_assign_type is not changed.

voideth_hw_addr_inherit(structnet_device*dst,structnet_device*src)

Copy dev_addr from another net_device

Parameters

structnet_device*dst

pointer to net_device to copy dev_addr to

structnet_device*src

pointer to net_device to copy dev_addr from

Description

Copy the Ethernet address from one net_device to another along with the address attributes (addr_assign_type).

boolether_addr_equal(constu8*addr1,constu8*addr2)

Compare two Ethernet addresses

Parameters

constu8*addr1

Pointer to a six-byte array containing the Ethernet address

constu8*addr2

Pointer to the other six-byte array containing the Ethernet address

Description

Compare two Ethernet addresses, returns true if equal

Please note: addr1 & addr2 must both be aligned to u16.

boolether_addr_equal_64bits(constu8*addr1,constu8*addr2)

Compare two Ethernet addresses

Parameters

constu8*addr1

Pointer to an array of 8 bytes

constu8*addr2

Pointer to another array of 8 bytes

Description

Compare two Ethernet addresses, returns true if equal, false otherwise.

The function doesn’t need any conditional branches and possibly uses word memory accesses on CPUs allowing cheap unaligned memory reads.

arrays = { byte1, byte2, byte3, byte4, byte5, byte6, pad1, pad2 }

Please note that the alignment of addr1 & addr2 is only guaranteed to be 16 bits.
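The padded-array layout allows the whole comparison to be done on one 64-bit word. The sketch below shows the idea in userspace C under stated assumptions: sketch_mac_equal_64bits() is a hypothetical name, memcpy() is used for the loads so the code is safe at any alignment, and the __BYTE_ORDER__ macros are GCC/Clang extensions (the code falls back to the little-endian path when they are absent).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compare the first six bytes of two 8-byte arrays; the two trailing
 * pad bytes are read but must not influence the result. */
static bool sketch_mac_equal_64bits(const uint8_t a[8], const uint8_t b[8])
{
    uint64_t x, y, fold;

    memcpy(&x, a, sizeof(x));
    memcpy(&y, b, sizeof(y));
    fold = x ^ y;  /* non-zero bits mark differing positions */

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return (fold >> 16) == 0;  /* pad bytes sit in the low 16 bits */
#else
    return (fold << 16) == 0;  /* pad bytes sit in the high 16 bits */
#endif
}
```

The shift discards exactly the two pad bytes, so the comparison needs no branch per byte.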

boolether_addr_equal_unaligned(constu8*addr1,constu8*addr2)

Compare two not u16 aligned Ethernet addresses

Parameters

constu8*addr1

Pointer to a six-byte array containing the Ethernet address

constu8*addr2

Pointer to the other six-byte array containing the Ethernet address

Description

Compare two Ethernet addresses, returns true if equal

Please note: Use only when any Ethernet address may not be u16 aligned.

boolether_addr_equal_masked(constu8*addr1,constu8*addr2,constu8*mask)

Compare two Ethernet addresses with a mask

Parameters

constu8*addr1

Pointer to a six-byte array containing the 1st Ethernet address

constu8*addr2

Pointer to a six-byte array containing the 2nd Ethernet address

constu8*mask

Pointer to a six-byte array containing the Ethernet address bitmask

Description

Compare two Ethernet addresses with a mask, returns true if for every bit set in the bitmask the equivalent bits in the ethernet addresses are equal. Using a mask with all bits set is a slower ether_addr_equal.
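A masked comparison can be written as a per-octet XOR followed by an AND with the mask. A minimal userspace sketch, with the hypothetical name sketch_mac_equal_masked():

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a and b agree on every bit that is set in mask. */
static bool sketch_mac_equal_masked(const uint8_t a[6], const uint8_t b[6],
                                    const uint8_t mask[6])
{
    for (int i = 0; i < 6; i++) {
        /* XOR exposes differing bits; the mask keeps the relevant ones. */
        if ((a[i] ^ b[i]) & mask[i])
            return false;
    }
    return true;
}
```

A mask of ff:ff:ff:00:00:00, for example, compares only the first three octets (the OUI) of the two addresses.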

u64ether_addr_to_u64(constu8*addr)

Convert an Ethernet address into a u64 value.

Parameters

constu8*addr

Pointer to a six-byte array containing the Ethernet address

Return

a u64 value of the address

voidu64_to_ether_addr(u64u,u8*addr)

Convert a u64 to an Ethernet address.

Parameters

u64u

u64 to convert to an Ethernet MAC address

u8*addr

Pointer to a six-byte array to contain the Ethernet address

voideth_addr_dec(u8*addr)

Decrement the given MAC address

Parameters

u8*addr

Pointer to a six-byte array containing Ethernet address to decrement

voideth_addr_inc(u8*addr)

Increment the given MAC address.

Parameters

u8*addr

Pointer to a six-byte array containing Ethernet address to increment.

voideth_addr_add(u8*addr,longoffset)

Add (or subtract) an offset to/from the given MAC address.

Parameters

u8*addr

Pointer to a six-byte array containing Ethernet address to increment.

longoffset

Offset to add.
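Address arithmetic is easiest to reason about by converting the six octets to an integer, adjusting it, and converting back. The userspace sketch below assumes the first octet is the most significant; all sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Pack six octets into the low 48 bits, first octet most significant. */
static uint64_t sketch_mac_to_u64(const uint8_t addr[6])
{
    uint64_t u = 0;

    for (int i = 0; i < 6; i++)
        u = (u << 8) | addr[i];
    return u;
}

/* Unpack the low 48 bits of u into six octets. */
static void sketch_u64_to_mac(uint64_t u, uint8_t addr[6])
{
    for (int i = 5; i >= 0; i--) {
        addr[i] = (uint8_t)(u & 0xff);
        u >>= 8;
    }
}

/* Add a signed offset; a negative offset wraps modulo 2^64,
 * which yields a subtraction in the low 48 bits. */
static void sketch_mac_add(uint8_t addr[6], long offset)
{
    uint64_t u = sketch_mac_to_u64(addr);

    u += (uint64_t)offset;
    sketch_u64_to_mac(u, addr);
}
```

Incrementing and decrementing are then the special cases of an offset of 1 and -1, with carries propagating across octets.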

boolis_etherdev_addr(conststructnet_device*dev,constu8addr[6+2])

Tell if given Ethernet address belongs to the device.

Parameters

conststructnet_device*dev

Pointer to a device structure

constu8addr[6+2]

Pointer to a six-byte array containing the Ethernet address

Description

Compare passed address with all addresses of the device. Return true if the address is one of the device addresses.

Note that this function calls ether_addr_equal_64bits() so take care of the right padding.

unsignedlongcompare_ether_header(constvoid*a,constvoid*b)

Compare two Ethernet headers

Parameters

constvoid*a

Pointer to Ethernet header

constvoid*b

Pointer to Ethernet header

Description

Compare two Ethernet headers, returns 0 if equal. This assumes that the network header (i.e., IP header) is 4-byte aligned OR the platform can handle unaligned access. This is the case for all packets coming into netif_receive_skb or similar entry points.

voideth_hw_addr_gen(structnet_device*dev,constu8*base_addr,unsignedintid)

Generate and assign Ethernet address to a port

Parameters

structnet_device*dev

pointer to port’s net_device structure

constu8*base_addr

base Ethernet address

unsignedintid

offset to add to the base address

Description

Generate a MAC address using a base address and an offset and assign it to a net_device. Commonly used by switch drivers which need to compute addresses for all their ports. addr_assign_type is not changed.

voideth_skb_pkt_type(structsk_buff*skb,conststructnet_device*dev)

Assign packet type if destination address does not match

Parameters

structsk_buff*skb

Assigned a packet type if address does not match dev address

conststructnet_device*dev

Network device used to compare packet address against

Description

If the destination MAC address of the packet does not match the network device address, assign an appropriate packet type.

inteth_skb_pad(structsk_buff*skb)

Pad buffer to minimum number of octets for Ethernet frame

Parameters

structsk_buff*skb

Buffer to pad

Description

An Ethernet frame should have a minimum size of 60 bytes. This function takes short frames and pads them with zeros up to the 60 byte limit.
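The padding rule can be shown on a plain byte buffer. This is a userspace sketch only: sketch_pad_frame() and its cap argument are hypothetical, and a flat array stands in for the socket buffer that the kernel function operates on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ZLEN 60  /* minimum frame size named in the text above */

/* Zero-fill buf from *len up to 60 bytes and update *len.
 * Returns 0 on success, -1 if the buffer capacity is too small. */
static int sketch_pad_frame(uint8_t *buf, size_t *len, size_t cap)
{
    if (*len >= SKETCH_ETH_ZLEN)
        return 0;               /* already long enough */
    if (cap < SKETCH_ETH_ZLEN)
        return -1;              /* no room to pad */

    memset(buf + *len, 0, SKETCH_ETH_ZLEN - *len);
    *len = SKETCH_ETH_ZLEN;
    return 0;
}
```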

structgro_node

structure to support Generic Receive Offload

Definition:

struct gro_node {
    unsigned long           bitmask;
    struct gro_list         hash[GRO_HASH_BUCKETS];
    struct list_head        rx_list;
    u32                     rx_count;
    u32                     cached_napi_id;
};

Members

bitmask

bitmask to indicate used buckets in hash

hash

hashtable of pending aggregated skbs, separated by flows

rx_list

list of pending GRO_NORMAL skbs

rx_count

cached current length of rx_list

cached_napi_id

napi_struct::napi_id cached for hotpath, 0 for standalone

boolnapi_is_scheduled(structnapi_struct*n)

test if NAPI is scheduled

Parameters

structnapi_struct*n

NAPI context

Description

This check is “best-effort”. With no locking implemented, a NAPI can be scheduled or terminate right after this check and produce imprecise results.

NAPI_STATE_SCHED is an internal state, napi_is_scheduled should not be used normally and napi_schedule should be used instead.

Use only if the driver really needs to check if a NAPI is scheduled, for example in the context of a delayed timer that can be skipped if a NAPI is already scheduled.

Return

True if NAPI is scheduled, False otherwise.

boolnapi_schedule(structnapi_struct*n)

schedule NAPI poll

Parameters

structnapi_struct*n

NAPI context

Description

Schedule NAPI poll routine to be called if it is not already running.

Return

true if we schedule a NAPI or false if not. Refer to napi_schedule_prep() for additional reasons why a NAPI might not be scheduled.

voidnapi_schedule_irqoff(structnapi_struct*n)

schedule NAPI poll

Parameters

structnapi_struct*n

NAPI context

Description

Variant of napi_schedule(), assuming hard irqs are masked.

boolnapi_complete_done(structnapi_struct*n,intwork_done)

NAPI processing complete

Parameters

structnapi_struct*n

NAPI context

intwork_done

number of packets processed

Description

Mark NAPI processing as complete. Should only be called if poll budget has not been completely consumed. Prefer over napi_complete().

Return

false if device should avoid rearming interrupts.
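The budget rule can be illustrated with a small userspace model of a poll routine. Everything here is hypothetical: struct sketch_dev, sketch_complete_done() and sketch_poll() only imitate the control flow a driver might follow, and the completion helper always permits re-arming, which the real function does not guarantee.

```c
#include <stdbool.h>

/* Hypothetical device model. */
struct sketch_dev {
    int  pending;      /* packets waiting in the receive ring */
    bool irq_enabled;  /* whether the device interrupt is armed */
    bool scheduled;    /* whether polling is still scheduled */
};

/* Models the completion step: clears the scheduled state and reports
 * whether the caller may re-arm interrupts (always true in this model). */
static bool sketch_complete_done(struct sketch_dev *d, int work_done)
{
    (void)work_done;
    d->scheduled = false;
    return true;
}

/* Process up to budget packets; complete only if budget was not used up. */
static int sketch_poll(struct sketch_dev *d, int budget)
{
    int work_done = 0;

    while (work_done < budget && d->pending > 0) {
        d->pending--;
        work_done++;
    }

    /* A fully consumed budget means more work may remain, so the
     * routine stays scheduled and interrupts stay off. */
    if (work_done < budget && sketch_complete_done(d, work_done))
        d->irq_enabled = true;

    return work_done;
}
```

With 100 pending packets and a budget of 64, the first call in this model consumes the whole budget and stays scheduled; the second call processes the remaining 36, completes, and re-arms the interrupt.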

voidnapi_synchronize(conststructnapi_struct*n)

wait until NAPI is not running

Parameters

conststructnapi_struct*n

NAPI context

Description

Wait until NAPI is done being scheduled on this context. Waits till any outstanding processing completes but does not disable future activations.

boolnapi_if_scheduled_mark_missed(structnapi_struct*n)

if napi is running, set the NAPIF_STATE_MISSED

Parameters

structnapi_struct*n

NAPI context

Description

If napi is running, set the NAPIF_STATE_MISSED, and return true if NAPI is scheduled.

enumnetdev_priv_flags

struct net_device priv_flags

Constants

IFF_802_1Q_VLAN

802.1Q VLAN device

IFF_EBRIDGE

Ethernet bridging device

IFF_BONDING

bonding master or slave

IFF_ISATAP

ISATAP interface (RFC4214)

IFF_WAN_HDLC

WAN HDLC device

IFF_XMIT_DST_RELEASE

dev_hard_start_xmit() is allowed to release skb->dst

IFF_DONT_BRIDGE

disallow bridging this ether dev

IFF_DISABLE_NETPOLL

disable netpoll at run-time

IFF_MACVLAN_PORT

device used as macvlan port

IFF_BRIDGE_PORT

device used as bridge port

IFF_OVS_DATAPATH

device used as Open vSwitch datapath port

IFF_TX_SKB_SHARING

The interface supports sharing skbs on transmit

IFF_UNICAST_FLT

Supports unicast filtering

IFF_TEAM_PORT

device used as team port

IFF_SUPP_NOFCS

device supports sending custom FCS

IFF_LIVE_ADDR_CHANGE

device supports hardware address change when it’s running

IFF_MACVLAN

Macvlan device

IFF_XMIT_DST_RELEASE_PERM

IFF_XMIT_DST_RELEASE not taking into account underlying stacked devices

IFF_L3MDEV_MASTER

device is an L3 master device

IFF_NO_QUEUE

device can run without qdisc attached

IFF_OPENVSWITCH

device is an Open vSwitch master

IFF_L3MDEV_SLAVE

device is enslaved to an L3 master device

IFF_TEAM

device is a team device

IFF_RXFH_CONFIGURED

device has had Rx Flow indirection table configured

IFF_PHONY_HEADROOM

the headroom value is controlled by an external entity (i.e. the master device for bridged veth)

IFF_MACSEC

device is a MACsec device

IFF_NO_RX_HANDLER

device doesn’t support the rx_handler hook

IFF_FAILOVER

device is a failover master device

IFF_FAILOVER_SLAVE

device is lower dev of a failover master device

IFF_L3MDEV_RX_HANDLER

only invoke the rx handler of L3 master device

IFF_NO_ADDRCONF

prevent ipv6 addrconf

IFF_TX_SKB_NO_LINEAR

device/driver is capable of xmitting frames with skb_headlen(skb) == 0 (data starts from frag0)

Description

These are the struct net_device private flags; they are only set internally by drivers and used in the kernel. These flags are invisible to userspace; this means that the order of these flags can change during any kernel release.

You should add bitfield booleans after either net_device::priv_flags (hotpath) or ::threaded (slowpath) instead of extending these flags.

structnet_device

The DEVICE structure.

Definition:

struct net_device {    unsigned long           priv_flags:32;    unsigned long           lltx:1;    unsigned long           netmem_tx:1;    const struct net_device_ops *netdev_ops;    const struct header_ops *header_ops;    struct netdev_queue     *_tx;    netdev_features_t gso_partial_features;    unsigned int            real_num_tx_queues;    unsigned int            gso_max_size;    unsigned int            gso_ipv4_max_size;    u16 gso_max_segs;    s16 num_tc;    unsigned int            mtu;    unsigned short          needed_headroom;    struct netdev_tc_txq    tc_to_txq[TC_MAX_QUEUE];#ifdef CONFIG_XPS;    struct xps_dev_maps  *xps_maps[XPS_MAPS_MAX];#endif;#ifdef CONFIG_NETFILTER_EGRESS;    struct nf_hook_entries  *nf_hooks_egress;#endif;#ifdef CONFIG_NET_XGRESS;    struct bpf_mprog_entry  *tcx_egress;#endif;    union {        struct pcpu_lstats __percpu             *lstats;        struct pcpu_sw_netstats __percpu        *tstats;        struct pcpu_dstats __percpu             *dstats;    };    unsigned long           state;    unsigned int            flags;    unsigned short          hard_header_len;    netdev_features_t features;    struct inet6_dev   *ip6_ptr;    struct bpf_prog    *xdp_prog;    struct list_head        ptype_specific;    int ifindex;    unsigned int            real_num_rx_queues;    struct netdev_rx_queue  *_rx;    unsigned int            gro_max_size;    unsigned int            gro_ipv4_max_size;    rx_handler_func_t *rx_handler;    void *rx_handler_data;    possible_net_t nd_net;#ifdef CONFIG_NETPOLL;    struct netpoll_info        *npinfo;#endif;#ifdef CONFIG_NET_XGRESS;    struct bpf_mprog_entry  *tcx_ingress;#endif;    char name[IFNAMSIZ];    struct netdev_name_node *name_node;    struct dev_ifalias  *ifalias;    unsigned long           mem_end;    unsigned long           mem_start;    unsigned long           base_addr;    struct list_head        dev_list;    struct list_head        napi_list;    struct list_head        unreg_list;    
struct list_head        close_list;    struct list_head        ptype_all;    struct {        struct list_head upper;        struct list_head lower;    } adj_list;    xdp_features_t xdp_features;    const struct xdp_metadata_ops *xdp_metadata_ops;    const struct xsk_tx_metadata_ops *xsk_tx_metadata_ops;    unsigned short          gflags;    unsigned short          needed_tailroom;    netdev_features_t hw_features;    netdev_features_t wanted_features;    netdev_features_t vlan_features;    netdev_features_t hw_enc_features;    netdev_features_t mpls_features;    unsigned int            min_mtu;    unsigned int            max_mtu;    unsigned short          type;    unsigned char           min_header_len;    unsigned char           name_assign_type;    int group;    struct net_device_stats stats;    struct net_device_core_stats __percpu *core_stats;    atomic_t carrier_up_count;    atomic_t carrier_down_count;#ifdef CONFIG_WIRELESS_EXT;    const struct iw_handler_def *wireless_handlers;#endif;    const struct ethtool_ops *ethtool_ops;#ifdef CONFIG_NET_L3_MASTER_DEV;    const struct l3mdev_ops *l3mdev_ops;#endif;#if IS_ENABLED(CONFIG_IPV6);    const struct ndisc_ops *ndisc_ops;#endif;#ifdef CONFIG_XFRM_OFFLOAD;    const struct xfrmdev_ops *xfrmdev_ops;#endif;#if IS_ENABLED(CONFIG_TLS_DEVICE);    const struct tlsdev_ops *tlsdev_ops;#endif;    unsigned int            operstate;    unsigned char           link_mode;    unsigned char           if_port;    unsigned char           dma;    unsigned char           perm_addr[MAX_ADDR_LEN];    unsigned char           addr_assign_type;    unsigned char           addr_len;    unsigned char           upper_level;    unsigned char           lower_level;    u8 threaded;    unsigned short          neigh_priv_len;    unsigned short          dev_id;    unsigned short          dev_port;    int irq;    u32 priv_len;    spinlock_t addr_list_lock;    struct netdev_hw_addr_list      uc;    struct netdev_hw_addr_list      mc;    struct 
netdev_hw_addr_list      dev_addrs;#ifdef CONFIG_SYSFS;    struct kset             *queues_kset;#endif;#ifdef CONFIG_LOCKDEP;    struct list_head        unlink_list;#endif;    unsigned int            promiscuity;    unsigned int            allmulti;    bool uc_promisc;#ifdef CONFIG_LOCKDEP;    unsigned char           nested_level;#endif;    struct in_device   *ip_ptr;    struct hlist_head       fib_nh_head;#if IS_ENABLED(CONFIG_VLAN_8021Q);    struct vlan_info   *vlan_info;#endif;#if IS_ENABLED(CONFIG_NET_DSA);    struct dsa_port         *dsa_ptr;#endif;#if IS_ENABLED(CONFIG_TIPC);    struct tipc_bearer  *tipc_ptr;#endif;#if IS_ENABLED(CONFIG_ATALK);    void *atalk_ptr;#endif;#if IS_ENABLED(CONFIG_AX25);    struct ax25_dev    *ax25_ptr;#endif;#if IS_ENABLED(CONFIG_CFG80211);    struct wireless_dev     *ieee80211_ptr;#endif;#if IS_ENABLED(CONFIG_IEEE802154) || IS_ENABLED(CONFIG_6LOWPAN);    struct wpan_dev         *ieee802154_ptr;#endif;#if IS_ENABLED(CONFIG_MPLS_ROUTING);    struct mpls_dev    *mpls_ptr;#endif;#if IS_ENABLED(CONFIG_MCTP);    struct mctp_dev    *mctp_ptr;#endif;#if IS_ENABLED(CONFIG_INET_PSP);    struct psp_dev     *psp_dev;#endif;    const unsigned char     *dev_addr;    unsigned int            num_rx_queues;#define GRO_LEGACY_MAX_SIZE     65536u;#define GRO_MAX_SIZE            (8 * 65535u);    unsigned int            xdp_zc_max_segs;    struct netdev_queue  *ingress_queue;#ifdef CONFIG_NETFILTER_INGRESS;    struct nf_hook_entries  *nf_hooks_ingress;#endif;    unsigned char           broadcast[MAX_ADDR_LEN];#ifdef CONFIG_RFS_ACCEL;    struct cpu_rmap         *rx_cpu_rmap;#endif;    struct hlist_node       index_hlist;    unsigned int            num_tx_queues;    struct Qdisc       *qdisc;    unsigned int            tx_queue_len;    spinlock_t tx_global_lock;    struct xdp_dev_bulk_queue __percpu *xdp_bulkq;#ifdef CONFIG_NET_SCHED;    unsigned long qdisc_hash[1 << ((4) - 1)];#endif;    struct timer_list       watchdog_timer;    int watchdog_timeo;   
 u32 proto_down_reason;    struct list_head        todo_list;#ifdef CONFIG_PCPU_DEV_REFCNT;    int __percpu            *pcpu_refcnt;#else;    refcount_t dev_refcnt;#endif;    struct ref_tracker_dir  refcnt_tracker;    struct list_head        link_watch_list;    u8 reg_state;    bool dismantle;    bool moving_ns;    bool rtnl_link_initializing;    bool needs_free_netdev;    void (*priv_destructor)(struct net_device *dev);    void *ml_priv;    enum netdev_ml_priv_type        ml_priv_type;    enum netdev_stat_type           pcpu_stat_type:8;#if IS_ENABLED(CONFIG_GARP);    struct garp_port   *garp_port;#endif;#if IS_ENABLED(CONFIG_MRP);    struct mrp_port    *mrp_port;#endif;#if IS_ENABLED(CONFIG_NET_DROP_MONITOR);    struct dm_hw_stat_delta  *dm_private;#endif;    struct device           dev;    const struct attribute_group *sysfs_groups[5];    const struct attribute_group *sysfs_rx_queue_group;    const struct rtnl_link_ops *rtnl_link_ops;    const struct netdev_stat_ops *stat_ops;    const struct netdev_queue_mgmt_ops *queue_mgmt_ops;#define GSO_MAX_SEGS            65535u;#define GSO_LEGACY_MAX_SIZE     65536u;#define GSO_MAX_SIZE            (8 * GSO_MAX_SEGS);#define TSO_LEGACY_MAX_SIZE     65536;#define TSO_MAX_SIZE            UINT_MAX;    unsigned int            tso_max_size;#define TSO_MAX_SEGS            U16_MAX;    u16 tso_max_segs;#ifdef CONFIG_DCB;    const struct dcbnl_rtnl_ops *dcbnl_ops;#endif;    u8 prio_tc_map[TC_BITMASK + 1];#if IS_ENABLED(CONFIG_FCOE);    unsigned int            fcoe_ddp_xid;#endif;#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO);    struct netprio_map  *priomap;#endif;    struct phy_link_topology        *link_topo;    struct phy_device       *phydev;    struct sfp_bus          *sfp_bus;    struct lock_class_key   *qdisc_tx_busylock;    bool proto_down;    bool irq_affinity_auto;    bool rx_cpu_rmap_auto;    unsigned long           see_all_hwtstamp_requests:1;    unsigned long           change_proto_down:1;    unsigned long           
netns_immutable:1;    unsigned long           fcoe_mtu:1;    struct list_head        net_notifier_list;#if IS_ENABLED(CONFIG_MACSEC);    const struct macsec_ops *macsec_ops;#endif;    const struct udp_tunnel_nic_info        *udp_tunnel_nic_info;    struct udp_tunnel_nic   *udp_tunnel_nic;    struct netdev_config    *cfg;    struct netdev_config    *cfg_pending;    struct ethtool_netdev_state *ethtool;    struct bpf_xdp_entity   xdp_state[__MAX_XDP_MODE];    u8 dev_addr_shadow[MAX_ADDR_LEN];    netdevice_tracker linkwatch_dev_tracker;    netdevice_tracker watchdog_dev_tracker;    netdevice_tracker dev_registered_tracker;    struct rtnl_hw_stats64  *offload_xstats_l3;    struct devlink_port     *devlink_port;#if IS_ENABLED(CONFIG_DPLL);    struct dpll_pin    *dpll_pin;#endif;#if IS_ENABLED(CONFIG_PAGE_POOL);    struct hlist_head       page_pools;#endif;    struct dim_irq_moder    *irq_moder;    u64 max_pacing_offload_horizon;    struct napi_config      *napi_config;    u32 num_napi_configs;    u32 napi_defer_hard_irqs;    unsigned long           gro_flush_timeout;    bool up;    bool request_ops_lock;    struct mutex            lock;#if IS_ENABLED(CONFIG_NET_SHAPER);    struct net_shaper_hierarchy *net_shaper_hierarchy;#endif;    struct hlist_head neighbours[NEIGH_NR_TABLES];    struct hwtstamp_provider   *hwprov;    u8 priv[]  ;};

Members

priv_flags

flags invisible to userspace defined as bits, see enum netdev_priv_flags for the definitions

lltx

device supports lockless Tx. Deprecated for real HW drivers. Mainly used by logical interfaces, such as bonding and tunnels

netmem_tx

device supports netmem_tx.

netdev_ops

Includes several pointers to callbacks, if one wants to override the ndo_*() functions

header_ops

Includes callbacks for creating, parsing, caching, etc. of Layer 2 headers.

_tx

Array of TX queues

gso_partial_features

value(s) from NETIF_F_GSO*

real_num_tx_queues

Number of TX queues currently active in device

gso_max_size

Maximum size of generic segmentation offload

gso_ipv4_max_size

Maximum size of generic segmentation offload, for IPv4.

gso_max_segs

Maximum number of segments that can be passed to the NIC for GSO

num_tc

Number of traffic classes in the net device

mtu

Interface MTU value

needed_headroom

Extra headroom the hardware may need, but not in all cases can this be guaranteed

tc_to_txq

XXX: need comments on this one

xps_maps

XXX: need comments on this one

nf_hooks_egress

netfilter hooks executed for egress packets

tcx_egress

BPF & clsact qdisc specific data for egress processing

{unnamed_union}

anonymous

lstats

Loopback statistics: packets, bytes

tstats

Tunnel statistics: RX/TX packets, RX/TX bytes

dstats

Dummy statistics: RX/TX/drop packets, RX/TX bytes

state

Generic network queuing layer state, see netdev_state_t

flags

Interface flags (a la BSD)

hard_header_len

Maximum hardware header length.

features

Currently active device features

ip6_ptr

IPv6 specific data

xdp_prog

XDP sockets filter program pointer

ptype_specific

Device-specific, protocol-specific packet handlers

ifindex

interface index

real_num_rx_queues

Number of RX queues currently active in device

_rx

Array of RX queues

gro_max_size

Maximum size of aggregated packet in generic receive offload (GRO)

gro_ipv4_max_size

Maximum size of aggregated packet in generic receive offload (GRO), for IPv4.

rx_handler

handler for received packets

rx_handler_data

XXX: need comments on this one

nd_net

Network namespace this network device is inside; protected by lock

npinfo

XXX: need comments on this one

tcx_ingress

BPF & clsact qdisc specific data for ingress processing

name

This is the first field of the “visible” part of this structure (i.e. as seen by users in the “Space.c” file). It is the name of the interface.

name_node

Name hashlist node

ifalias

SNMP alias

mem_end

Shared memory end

mem_start

Shared memory start

base_addr

Device I/O address

dev_list

The global list of network devices

napi_list

List entry used for polling NAPI devices

unreg_list

List entry when we are unregistering the device; see the function unregister_netdev

close_list

List entry used when we are closing the device

ptype_all

Device-specific packet handlers for all protocols

adj_list

Directly linked devices, like slaves for bonding

xdp_features

XDP capability supported by the device

xdp_metadata_ops

Includes pointers to XDP metadata callbacks.

xsk_tx_metadata_ops

Includes pointers to AF_XDP TX metadata callbacks.

gflags

Global flags (kept as legacy)

needed_tailroom

Extra tailroom the hardware may need, but not in all cases can this be guaranteed. Some cases also use LL_MAX_HEADER instead to allocate the skb

hw_features

User-changeable features

wanted_features

User-requested features

vlan_features

Mask of features inheritable by VLAN devices

hw_enc_features

Mask of features inherited by encapsulating devices. This field indicates what encapsulation offloads the hardware is capable of doing, and drivers will need to set them appropriately.

mpls_features

Mask of features inheritable by MPLS

min_mtu

Interface Minimum MTU value

max_mtu

Interface Maximum MTU value

type

Interface hardware type

min_header_len

Minimum hardware header length

name_assign_type

network interface name assignment type

group

The group the device belongs to

stats

Statistics struct, kept for legacy reasons; use rtnl_link_stats64 instead

core_stats

core networking counters, do not use this in drivers

carrier_up_count

Number of times the carrier has been up

carrier_down_count

Number of times the carrier has been down

wireless_handlers

List of functions to handle Wireless Extensions, instead of ioctl; see <net/iw_handler.h> for details.

ethtool_ops

Management operations

l3mdev_ops

Layer 3 master device operations

ndisc_ops

Includes callbacks for different IPv6 neighbour discovery handling. Necessary for e.g. 6LoWPAN.

xfrmdev_ops

Transformation offload operations

tlsdev_ops

Transport Layer Security offload operations

operstate

RFC2863 operstate

link_mode

Mapping policy to operstate

if_port

Selectable AUI, TP, ...

dma

DMA channel

perm_addr

Permanent hw address

addr_assign_type

Hw address assignment type

addr_len

Hardware address length

upper_level

Maximum depth level of upper devices.

lower_level

Maximum depth level of lower devices.

threaded

napi threaded state.

neigh_priv_len

Used in neigh_alloc()

dev_id

Used to differentiate devices that share the same link layer address

dev_port

Used to differentiate devices that share the same function

irq

Device IRQ number

priv_len

Size of the ->priv flexible array

addr_list_lock

XXX: need comments on this one

uc

unicast mac addresses

mc

multicast mac addresses

dev_addrs

list of device hw addresses

queues_kset

Group of all Kobjects in the Tx and RX queues

unlink_list

As netif_addr_lock() can be called recursively, keep a list of interfaces to be deleted.

promiscuity

Number of times the NIC is told to work in promiscuous mode; if it becomes 0 the NIC will exit promiscuous mode

allmulti

Counter, enables or disables allmulticast mode

uc_promisc

Flag that indicates promiscuous mode has been enabled due to the need to listen to additional unicast addresses in a device that does not implement ndo_set_rx_mode()

nested_level

Used as a parameter of spin_lock_nested() of dev->addr_list_lock.

ip_ptr

IPv4 specific data

fib_nh_head

nexthops associated with this netdev

vlan_info

VLAN info

dsa_ptr

dsa specific data

tipc_ptr

TIPC specific data

atalk_ptr

AppleTalk link

ax25_ptr

AX.25 specific data

ieee80211_ptr

IEEE 802.11 specific data, assign before registering

ieee802154_ptr

IEEE 802.15.4 low-rate Wireless Personal Area Network device struct

mpls_ptr

mpls_dev struct pointer

mctp_ptr

MCTP specific data

psp_dev

PSP crypto device registered for this netdev

dev_addr

Hw address (before bcast, because most packets are unicast)

num_rx_queues

Number of RX queues allocated at register_netdev() time

xdp_zc_max_segs

Maximum number of segments supported by AF_XDP zero copy driver

ingress_queue

XXX: need comments on this one

nf_hooks_ingress

netfilter hooks executed for ingress packets

broadcast

hw bcast address

rx_cpu_rmap

CPU reverse-mapping for RX completion interrupts, indexed by RX queue number. Assigned by driver. This must only be set if the ndo_rx_flow_steer operation is defined

index_hlist

Device index hash chain

num_tx_queues

Number of TX queues allocated at alloc_netdev_mq() time

qdisc

Root qdisc from userspace point of view

tx_queue_len

Max frames per queue allowed

tx_global_lock

XXX: need comments on this one

xdp_bulkq

XDP device bulk queue

qdisc_hash

qdisc hash table

watchdog_timer

List of timers

watchdog_timeo

Represents the timeout that is used by the watchdog (see dev_watchdog())

proto_down_reason

reason a netdev interface is held down

todo_list

Delayed register/unregister

pcpu_refcnt

Number of references to this device

dev_refcnt

Number of references to this device

refcnt_tracker

Tracker directory for tracked references to this device

link_watch_list

XXX: need comments on this one

reg_state

Register/unregister state machine

dismantle

Device is going to be freed

moving_ns

device is changing netns, protected by lock

rtnl_link_initializing

Device being created, suppress events

needs_free_netdev

Should unregister perform free_netdev?

priv_destructor

Called from unregister

ml_priv

Mid-layer private

ml_priv_type

Mid-layer private type

pcpu_stat_type

Type of device statistics which the core should allocate/free: none, lstats, tstats, dstats. none means the driver is handling statistics allocation/freeing internally.

garp_port

GARP

mrp_port

MRP

dm_private

Drop monitor private

dev

Class/net/name entry

sysfs_groups

Space for optional device, statistics and wireless sysfs groups

sysfs_rx_queue_group

Space for optional per-rx queue attributes

rtnl_link_ops

Rtnl_link_ops

stat_ops

Optional ops for queue-aware statistics

queue_mgmt_ops

Optional ops for queue management

tso_max_size

Device (as in HW) limit on the max TSO request size

tso_max_segs

Device (as in HW) limit on the max TSO segment count

dcbnl_ops

Data Center Bridging netlink ops

prio_tc_map

XXX: need comments on this one

fcoe_ddp_xid

Max exchange id for FCoE LRO by ddp

priomap

XXX: need comments on this one

link_topo

Physical link topology tracking attached PHYs

phydev

Physical device may attach itself for hardware timestamping

sfp_bus

attached struct sfp_bus structure.

qdisc_tx_busylock

lockdep class annotating Qdisc->busylock spinlock

proto_down

protocol port state information can be sent to the switch driver and used to set the phys state of the switch port.

irq_affinity_auto

driver wants the core to store and re-assign the IRQ affinity. Set by netif_enable_irq_affinity(), then the driver must create a persistent napi by netif_napi_add_config() and finally bind the napi to IRQ (via netif_napi_set_irq()).

rx_cpu_rmap_auto

driver wants the core to manage the ARFS rmap. Set by calling netif_enable_cpu_rmap().

see_all_hwtstamp_requests

device wants to see calls to ndo_hwtstamp_set() for all timestamp requests regardless of source, even if those aren’t HWTSTAMP_SOURCE_NETDEV

change_proto_down

device supports setting carrier via IFLA_PROTO_DOWN

netns_immutable

interface can’t change network namespaces

fcoe_mtu

device supports maximum FCoE MTU, 2158 bytes

net_notifier_list

List of per-net netdev notifier blocks that follow this device when it is moved to another network namespace.

macsec_ops

MACsec offloading ops

udp_tunnel_nic_info

static structure describing the UDP tunnel offload capabilities of the device

udp_tunnel_nic

UDP tunnel offload state

cfg

net_device queue-related configuration

cfg_pending

same as cfg but when device is being actively reconfigured includes any changes to the configuration requested by the user, but which may or may not be rejected.

ethtool

ethtool related state

xdp_state

stores info on attached XDP BPF programs

dev_addr_shadow

Copy of dev_addr to catch direct writes.

linkwatch_dev_tracker

refcount tracker used by linkwatch.

watchdog_dev_tracker

refcount tracker used by watchdog.

dev_registered_tracker

tracker for reference held while registered

offload_xstats_l3

L3 HW stats for this netdevice.

devlink_port

Pointer to related devlink port structure. Assigned by a driver before netdev registration using SET_NETDEV_DEVLINK_PORT macro. This pointer is static during the time netdevice is registered.

dpll_pin

Pointer to the SyncE source pin of a DPLL subsystem, where the clock is recovered.

page_pools

page pools created for this netdevice

irq_moder

dim parameters used if IS_ENABLED(CONFIG_DIMLIB).

max_pacing_offload_horizon

max EDT offload horizon in nsec.

napi_config

An array of napi_config structures containing per-NAPI settings.

num_napi_configs

number of allocated NAPI config structs, always >= max(num_rx_queues, num_tx_queues).

napi_defer_hard_irqs

If not zero, provides a counter that allows avoiding NIC hard IRQs on busy queues.

gro_flush_timeout

timeout for GRO layer in NAPI

up

copy of state’s IFF_UP, but safe to read with just lock.

May report false negatives while the device is being opened or closed (lock does not protect .ndo_open, or .ndo_close).

request_ops_lock

request the core to run all netdev_ops and ethtool_ops under the lock.

lock

netdev-scope lock, protects a small selection of fields. Should always be taken using netdev_lock() / netdev_unlock() helpers. Drivers are free to use it for other protection.

For the drivers that implement shaper or queue API, the scope of this lock is expanded to cover most ndo/queue/ethtool/sysfs operations. Drivers may opt-in to this behavior by setting request_ops_lock.

lock protection mixes with rtnl_lock in multiple ways, fields are either:

  • simply protected by the instance lock;

  • double protected - writers hold both locks, readers hold either;

  • ops protected - protected by the lock held around the NDOs and other callbacks, that is the instance lock on devices for which netdev_need_ops_lock() returns true, otherwise by rtnl_lock;

  • double ops protected - always protected by rtnl_lock but for devices for which netdev_need_ops_lock() returns true - also the instance lock.

Simply protects:

gro_flush_timeout, napi_defer_hard_irqs, napi_list, net_shaper_hierarchy, reg_state, threaded

Double protects:

up, moving_ns, nd_net, xdp_features

Double ops protects:

real_num_rx_queues, real_num_tx_queues

Also protects some fields in:

struct napi_struct, struct netdev_queue, struct netdev_rx_queue

Ordering: take after rtnl_lock.

net_shaper_hierarchy

data tracking the current shaper status

see include/net/net_shapers.h

neighbours

List heads pointing to this device’s neighbours’ dev_list, one per address-family.

hwprov

Tracks which PTP performs hardware packet time stamping.

priv

Flexible array containing private data

Description

Actually, this whole structure is a big mistake. It mixes I/O data with strictly “high-level” data, and it has to know about almost every data structure used in the INET module.

interface address info:

FIXME: cleanup struct net_device such that network protocol info moves out.
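The four protection classes listed in the lock member description can be read as a small decision table. The sketch below is a userspace model, not kernel code: enum prot_class, struct lock_ctx and access_allowed() are hypothetical names, ops_locked stands in for netdev_need_ops_lock() returning true, and treating the two "ops" classes identically for readers and writers is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the four protection classes described above. */
enum prot_class {
	PROT_SIMPLE,     /* instance lock only */
	PROT_DOUBLE,     /* writers hold both locks, readers hold either */
	PROT_OPS,        /* whichever lock is held around the NDOs */
	PROT_DOUBLE_OPS, /* rtnl_lock, plus instance lock on ops-locked devices */
};

/* Which locks the caller holds, and how the device is configured. */
struct lock_ctx {
	bool rtnl_held;     /* caller holds rtnl_lock */
	bool instance_held; /* caller holds the per-device instance lock */
	bool ops_locked;    /* models netdev_need_ops_lock() returning true */
};

/* Returns true if the access (write or read) is covered by the held locks. */
static bool access_allowed(enum prot_class cls, struct lock_ctx c, bool write)
{
	switch (cls) {
	case PROT_SIMPLE:
		return c.instance_held;
	case PROT_DOUBLE:
		return write ? (c.rtnl_held && c.instance_held)
			     : (c.rtnl_held || c.instance_held);
	case PROT_OPS:
		return c.ops_locked ? c.instance_held : c.rtnl_held;
	case PROT_DOUBLE_OPS:
		return c.rtnl_held && (!c.ops_locked || c.instance_held);
	}
	assert(!"unknown protection class");
	return false;
}
```

In this model a reader of a double-protected field such as up is allowed with only rtnl_lock held, while a writer needs both locks.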

void *netdev_priv(const struct net_device *dev)

access network device private data

Parameters

const struct net_device *dev

network device

Description

Get network device private data
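The relationship between the priv flexible array, priv_len and netdev_priv() can be illustrated with a userspace model. Everything named toy_* below is hypothetical and only mimics the layout idea (private data stored in a trailing flexible array of the device structure); it is not the kernel implementation, and the alignment choice is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct net_device. */
struct toy_netdev {
	char name[16];
	unsigned int priv_len;                      /* size of the priv[] area */
	_Alignas(max_align_t) unsigned char priv[]; /* driver-private data */
};

/* Example driver state that lives in the private area. */
struct toy_driver_state {
	unsigned long tx_packets;
	int irq;
};

/* Allocate the device and its private area in a single zeroed block. */
static struct toy_netdev *toy_alloc_netdev(size_t sizeof_priv, const char *name)
{
	struct toy_netdev *dev = calloc(1, sizeof(*dev) + sizeof_priv);

	if (!dev)
		return NULL;
	assert(strlen(name) < sizeof(dev->name));
	strcpy(dev->name, name);
	dev->priv_len = (unsigned int)sizeof_priv;
	return dev;
}

/* Model of netdev_priv(): return the start of the private area. */
static void *toy_netdev_priv(const struct toy_netdev *dev)
{
	return (void *)dev->priv;
}
```

A caller would typically cast the returned pointer to its own state type, for example struct toy_driver_state *st = toy_netdev_priv(dev), and release everything with a single free(dev).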

void netif_napi_add(struct net_device *dev, struct napi_struct *napi, int (*poll)(struct napi_struct *, int))

initialize a NAPI context

Parameters

struct net_device *dev

network device

struct napi_struct *napi

NAPI context

int (*poll)(struct napi_struct *, int)

polling function

Description

netif_napi_add() must be used to initialize a NAPI context prior to calling any of the other NAPI-related functions.

void netif_napi_add_config(struct net_device *dev, struct napi_struct *napi, int (*poll)(struct napi_struct *, int), int index)

initialize a NAPI context with persistent config

Parameters

struct net_device *dev

network device

struct napi_struct *napi

NAPI context

int (*poll)(struct napi_struct *, int)

polling function

int index

the NAPI index

void netif_napi_add_tx(struct net_device *dev, struct napi_struct *napi, int (*poll)(struct napi_struct *, int))

initialize a NAPI context to be used for Tx only

Parameters

struct net_device *dev

network device

struct napi_struct *napi

NAPI context

int (*poll)(struct napi_struct *, int)

polling function

Description

This variant of netif_napi_add() should be used by drivers using NAPI to exclusively poll a TX queue. It avoids adding the context to napi_hash[], which would pollute that hash table.

void __netif_napi_del(struct napi_struct *napi)

remove a NAPI context

Parameters

struct napi_struct *napi

NAPI context

Description

Warning: caller must observe RCU grace period before freeing memory containing napi. Drivers might want to call this helper to combine all the needed RCU grace periods into a single one.

void netif_napi_del(struct napi_struct *napi)

remove a NAPI context

Parameters

struct napi_struct *napi

NAPI context

Description

netif_napi_del() removes a NAPI context from the network device NAPI list

void netif_start_queue(struct net_device *dev)

allow transmit

Parameters

struct net_device *dev

network device

Description

Allow upper layers to call the device hard_start_xmit routine.

void netif_wake_queue(struct net_device *dev)

restart transmit

Parameters

struct net_device *dev

network device

Description

Allow upper layers to call the device hard_start_xmit routine. Used for flow control when transmit resources are available.

void netif_stop_queue(struct net_device *dev)

stop transmitted packets

Parameters

struct net_device *dev

network device

Description

Stop upper layers calling the device hard_start_xmit routine. Used for flow control when transmit resources are unavailable.

bool netif_queue_stopped(const struct net_device *dev)

test if transmit queue is flowblocked

Parameters

const struct net_device *dev

network device

Description

Test if transmit queue on device is currently unable to send.
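The start/stop/wake/stopped helpers above are usually used together for transmit flow control. The following is a userspace sketch of that pattern under stated assumptions: a hypothetical four-entry descriptor ring, toy_* names invented for illustration, and a wake helper that only clears the stopped flag (whatever else the kernel helper does is not modelled).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 4u /* hypothetical number of TX descriptors */

struct toy_txq {
	unsigned int in_flight; /* descriptors handed to the "hardware" */
	bool stopped;           /* models the queue's flow-control state */
};

static void toy_start_queue(struct toy_txq *q) { q->stopped = false; }
static void toy_stop_queue(struct toy_txq *q)  { q->stopped = true; }
static void toy_wake_queue(struct toy_txq *q)  { q->stopped = false; }
static bool toy_queue_stopped(const struct toy_txq *q) { return q->stopped; }

/* Transmit path: returns false when the upper layer must not send now. */
static bool toy_xmit(struct toy_txq *q)
{
	if (toy_queue_stopped(q))
		return false;
	assert(q->in_flight < TOY_RING_SIZE);
	q->in_flight++;
	if (q->in_flight == TOY_RING_SIZE)
		toy_stop_queue(q); /* transmit resources are now unavailable */
	return true;
}

/* Completion path: reclaim descriptors, then wake the queue if needed. */
static void toy_tx_complete(struct toy_txq *q, unsigned int done)
{
	assert(done <= q->in_flight);
	q->in_flight -= done;
	if (done > 0 && toy_queue_stopped(q))
		toy_wake_queue(q); /* transmit resources are available again */
}
```

The design point is that the transmit path stops the queue itself once the ring fills, and only the completion path wakes it.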

void netdev_queue_set_dql_min_limit(struct netdev_queue *dev_queue, unsigned int min_limit)

set dql minimum limit

Parameters

struct netdev_queue *dev_queue

pointer to transmit queue

unsigned int min_limit

dql minimum limit

Description

Forces xmit_more() to return true until the minimum threshold defined by min_limit is reached (or until the tx queue is empty). Warning: use with care; misuse will impact latency.

void netdev_txq_bql_enqueue_prefetchw(struct netdev_queue *dev_queue)

prefetch bql data for write

Parameters

struct netdev_queue *dev_queue

pointer to transmit queue

Description

BQL enabled drivers might use this helper in their ndo_start_xmit(), to give an appropriate hint to the CPU.

void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_queue)

prefetch bql data for write

Parameters

struct netdev_queue *dev_queue

pointer to transmit queue

Description

BQL enabled drivers might use this helper in their TX completion path, to give an appropriate hint to the CPU.

void netdev_tx_sent_queue(struct netdev_queue *dev_queue, unsigned int bytes)

report the number of bytes queued to a given tx queue

Parameters

struct netdev_queue *dev_queue

network device queue

unsigned int bytes

number of bytes queued to the device queue

Description

Report the number of bytes queued for sending/completion to the network device hardware queue. bytes should be a good approximation and should exactly match netdev_completed_queue() bytes. This is typically called once per packet, from ndo_start_xmit().

void netdev_sent_queue(struct net_device *dev, unsigned int bytes)

report the number of bytes queued to hardware

Parameters

struct net_device *dev

network device

unsigned int bytes

number of bytes queued to the hardware device queue

Description

Report the number of bytes queued for sending/completion to the network device hardware queue #0. bytes should be a good approximation and should exactly match netdev_completed_queue() bytes. This is typically called once per packet, from ndo_start_xmit().

void netdev_tx_completed_queue(struct netdev_queue *dev_queue, unsigned int pkts, unsigned int bytes)

report number of packets/bytes at TX completion.

Parameters

struct netdev_queue *dev_queue

network device queue

unsigned int pkts

number of packets (currently ignored)

unsigned int bytes

number of bytes dequeued from the device queue

Description

Must be called at most once per TX completion round (and not per individual packet), so that BQL can adjust its limits appropriately.

void netdev_completed_queue(struct net_device *dev, unsigned int pkts, unsigned int bytes)

report bytes and packets completed by device

Parameters

struct net_device *dev

network device

unsigned int pkts

actual number of packets sent over the medium

unsigned int bytes

actual number of bytes sent over the medium

Description

Report the number of bytes and packets transmitted by the network device hardware queue over the physical medium. bytes must exactly match the bytes amount passed to netdev_sent_queue().
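The byte accounting contract between the "sent" and "completed" reports can be modelled in userspace as follows. This is a deliberately simplified sketch: the toy_* names are hypothetical, the limit is fixed (the text above says BQL adjusts its limits, which is not modelled here), and the stop/restart conditions are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_bql {
	unsigned int queued;    /* total bytes reported as sent */
	unsigned int completed; /* total bytes reported as completed */
	unsigned int limit;     /* fixed here; the real limit is adaptive */
	bool stopped;           /* queue stopped by byte accounting */
};

/* Bytes handed to the hardware but not yet completed. */
static unsigned int toy_bql_inflight(const struct toy_bql *b)
{
	return b->queued - b->completed;
}

/* Called once per packet from the transmit path. */
static void toy_tx_sent_queue(struct toy_bql *b, unsigned int bytes)
{
	b->queued += bytes;
	if (toy_bql_inflight(b) >= b->limit)
		b->stopped = true;
}

/* Called at most once per completion round with the summed totals. */
static void toy_tx_completed_queue(struct toy_bql *b, unsigned int pkts,
				   unsigned int bytes)
{
	(void)pkts; /* the packet count is documented as currently ignored */
	/* Completed bytes must match what was previously reported as sent. */
	assert(bytes <= toy_bql_inflight(b));
	b->completed += bytes;
	if (b->stopped && toy_bql_inflight(b) < b->limit)
		b->stopped = false;
}
```

If a driver reported fewer completed bytes than it had reported as sent, the in-flight count in this model would never return to zero, which is why the two byte totals are required to match exactly.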

void netdev_tx_reset_subqueue(const struct net_device *dev, u32 qid)

reset the BQL stats and state of a netdev queue

Parameters

const struct net_device *dev

network device

u32 qid

stack index of the queue to reset

void netdev_reset_queue(struct net_device *dev_queue)

reset the packets and bytes count of a network device

Parameters

struct net_device *dev_queue

network device

Description

Reset the bytes and packet count of a network device and clear the software flow control OFF bit for this network device

u16 netdev_cap_txqueue(struct net_device *dev, u16 queue_index)

check if selected tx queue exceeds device queues

Parameters

struct net_device *dev

network device

u16 queue_index

given tx queue index

Description

Returns 0 if given tx queue index >= number of device tx queues, otherwise returns the originally passed tx queue index.
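The capping rule just described is small enough to state directly in code. The function below is a userspace sketch with a hypothetical name; it takes the number of TX queues as a plain argument instead of reading it from a device structure.

```c
#include <assert.h>
#include <stdint.h>

/* Fall back to queue 0 when the requested index is out of range. */
static uint16_t toy_cap_txqueue(unsigned int num_tx_queues, uint16_t queue_index)
{
	assert(num_tx_queues > 0);
	if (queue_index >= num_tx_queues)
		return 0;
	return queue_index;
}
```

With four queues, index 3 is returned unchanged while index 4 is capped to 0.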

bool netif_running(const struct net_device *dev)

test if up

Parameters

const struct net_device *dev

network device

Description

Test if the device has been brought up.

void netif_start_subqueue(struct net_device *dev, u16 queue_index)

allow sending packets on subqueue

Parameters

struct net_device *dev

network device

u16 queue_index

sub queue index

Description

Start individual transmit queue of a device with multiple transmit queues.

void netif_stop_subqueue(struct net_device *dev, u16 queue_index)

stop sending packets on subqueue

Parameters

struct net_device *dev

network device

u16 queue_index

sub queue index

Description

Stop individual transmit queue of a device with multiple transmit queues.

bool __netif_subqueue_stopped(const struct net_device *dev, u16 queue_index)

test status of subqueue

Parameters

const struct net_device *dev

network device

u16 queue_index

sub queue index

Description

Check individual transmit queue of a device with multiple transmit queues.

bool netif_subqueue_stopped(const struct net_device *dev, struct sk_buff *skb)

test status of subqueue

Parameters

const struct net_device *dev

network device

struct sk_buff *skb

sub queue buffer pointer

Description

Check individual transmit queue of a device with multiple transmit queues.

void netif_wake_subqueue(struct net_device *dev, u16 queue_index)

allow sending packets on subqueue

Parameters

struct net_device *dev

network device

u16 queue_index

sub queue index

Description

Resume individual transmit queue of a device with multiple transmit queues.
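The per-subqueue helpers differ from the device-wide ones only in that they act on one transmit queue selected by index (or, for the skb variant, by the queue the buffer is mapped to). A userspace sketch, assuming a hypothetical device with four queues and a toy packet type whose queue_mapping field stands in for the buffer's queue selection:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_TXQ 4u /* hypothetical number of transmit queues */

struct toy_mq_dev {
	bool stopped[TOY_NUM_TXQ]; /* one flow-control flag per queue */
};

/* Hypothetical stand-in for a socket buffer mapped to a TX queue. */
struct toy_pkt {
	uint16_t queue_mapping;
};

static void toy_stop_subqueue(struct toy_mq_dev *d, uint16_t idx)
{
	assert(idx < TOY_NUM_TXQ);
	d->stopped[idx] = true;
}

static void toy_wake_subqueue(struct toy_mq_dev *d, uint16_t idx)
{
	assert(idx < TOY_NUM_TXQ);
	d->stopped[idx] = false;
}

/* Index-based test, in the spirit of __netif_subqueue_stopped(). */
static bool toy_subqueue_stopped_idx(const struct toy_mq_dev *d, uint16_t idx)
{
	assert(idx < TOY_NUM_TXQ);
	return d->stopped[idx];
}

/* Packet-based test: look up the queue the packet is mapped to. */
static bool toy_subqueue_stopped(const struct toy_mq_dev *d,
				 const struct toy_pkt *p)
{
	return toy_subqueue_stopped_idx(d, p->queue_mapping);
}
```

Stopping queue 2 in this model leaves queues 0, 1 and 3 free to transmit.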

bool netif_attr_test_mask(unsigned long j, const unsigned long *mask, unsigned int nr_bits)

Test a CPU or Rx queue set in a mask

Parameters

unsigned long j

CPU/Rx queue index

const unsigned long *mask

bitmask of all cpus/rx queues

unsigned int nr_bits

number of bits in the bitmask

Description

Test if a CPU or Rx queue index is set in a mask of all CPU/Rx queues.

bool netif_attr_test_online(unsigned long j, const unsigned long *online_mask, unsigned int nr_bits)

Test for online CPU/Rx queue

Parameters

unsigned long j

CPU/Rx queue index

const unsigned long *online_mask

bitmask for CPUs/Rx queues that are online

unsigned int nr_bits

number of bits in the bitmask

Return

true if a CPU/Rx queue is online.

unsigned int netif_attrmask_next(int n, const unsigned long *srcp, unsigned int nr_bits)

get the next CPU/Rx queue in a cpu/Rx queues mask

Parameters

int n

CPU/Rx queue index

const unsigned long *srcp

the cpumask/Rx queue mask pointer

unsigned int nr_bits

number of bits in the bitmask

Return

next (after n) CPU/Rx queue index in the mask; >= nr_bits if no further CPUs/Rx queues are set.

int netif_attrmask_next_and(int n, const unsigned long *src1p, const unsigned long *src2p, unsigned int nr_bits)

get the next CPU/Rx queue in *src1p & *src2p

Parameters

int n

CPU/Rx queue index

const unsigned long *src1p

the first CPUs/Rx queues mask pointer

const unsigned long *src2p

the second CPUs/Rx queues mask pointer

unsigned int nr_bits

number of bits in the bitmask

Return

next (after n) CPU/Rx queue index set in both masks; >= nr_bits if no further CPUs/Rx queues are set in both.
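The mask helpers above follow the usual bitmap conventions: test one bit, find the next set bit after n, or find the next bit set in both masks, returning a value >= nr_bits when nothing is left. The sketch below models those conventions in userspace. The toy_* names are hypothetical, non-NULL masks are assumed, and passing n = -1 to start the search at bit 0 is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Test whether bit j is set in mask (which holds nr_bits bits). */
static bool toy_attr_test_mask(unsigned long j, const unsigned long *mask,
			       unsigned int nr_bits)
{
	assert(j < nr_bits);
	return (mask[j / TOY_BITS_PER_LONG] >> (j % TOY_BITS_PER_LONG)) & 1UL;
}

/* Next set bit strictly after n; returns nr_bits if there is none. */
static unsigned int toy_attrmask_next(int n, const unsigned long *mask,
				      unsigned int nr_bits)
{
	unsigned int i;

	assert(n >= -1);
	for (i = (unsigned int)(n + 1); i < nr_bits; i++)
		if (toy_attr_test_mask(i, mask, nr_bits))
			return i;
	return nr_bits;
}

/* Next bit strictly after n that is set in both a and b. */
static unsigned int toy_attrmask_next_and(int n, const unsigned long *a,
					  const unsigned long *b,
					  unsigned int nr_bits)
{
	unsigned int i;

	assert(n >= -1);
	for (i = (unsigned int)(n + 1); i < nr_bits; i++)
		if (toy_attr_test_mask(i, a, nr_bits) &&
		    toy_attr_test_mask(i, b, nr_bits))
			return i;
	return nr_bits;
}
```

A typical loop would start with n = -1 and call toy_attrmask_next() repeatedly until the result reaches nr_bits.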

bool netif_is_multiqueue(const struct net_device *dev)

test if device has multiple transmit queues

Parameters

const struct net_device *dev

network device

Description

Check if device has multiple transmit queues

void dev_hold(struct net_device *dev)

get reference to device

Parameters

struct net_device *dev

network device

Description

Hold reference to device to keep it from being freed. Try using netdev_hold() instead.

void dev_put(struct net_device *dev)

release reference to device

Parameters

struct net_device *dev

network device

Description

Release reference to device to allow it to be freed. Try using netdev_put() instead.
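The hold/put pairing is a plain reference count: while at least one reference is held the device must not be freed. The following userspace sketch uses C11 atomics to model only that idea. The toy_* names are hypothetical, and how and when the kernel actually frees a device after the last reference is dropped is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_refdev {
	atomic_int refcnt; /* number of outstanding references */
};

/* Take a reference so the device cannot be freed underneath the caller. */
static void toy_dev_hold(struct toy_refdev *d)
{
	atomic_fetch_add(&d->refcnt, 1);
}

/* Drop a reference previously taken with toy_dev_hold(). */
static void toy_dev_put(struct toy_refdev *d)
{
	int old = atomic_fetch_sub(&d->refcnt, 1);

	assert(old > 0); /* a put without a matching hold is a bug */
	(void)old;
}

/* True once every holder has released its reference. */
static bool toy_dev_can_free(struct toy_refdev *d)
{
	return atomic_load(&d->refcnt) == 0;
}
```

Every toy_dev_hold() must be balanced by exactly one toy_dev_put(); an unbalanced put trips the assertion in this model.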

void linkwatch_sync_dev(struct net_device *dev)

sync linkwatch for the given device

Parameters

struct net_device *dev

network device to sync linkwatch for

Description

Sync linkwatch for the given device, removing it from the pending work list (if queued).

bool netif_carrier_ok(const struct net_device *dev)

test if carrier present

Parameters

const struct net_device *dev

network device

Description

Check if carrier is present on device

voidnetif_dormant_on(structnet_device*dev)

mark device as dormant.

Parameters

structnet_device*dev

network device

Description

Mark device as dormant (as per RFC2863).

The dormant state indicates that the relevant interface is not actually in a condition to pass packets (i.e., it is not ‘up’) but is in a “pending” state, waiting for some external event. For “on-demand” interfaces, this new state identifies the situation where the interface is waiting for events to place it in the up state.

voidnetif_dormant_off(structnet_device*dev)

set device as not dormant.

Parameters

structnet_device*dev

network device

Description

Device is not in dormant state.

boolnetif_dormant(conststructnet_device*dev)

test if device is dormant

Parameters

conststructnet_device*dev

network device

Description

Check if device is dormant.

voidnetif_testing_on(structnet_device*dev)

mark device as under test.

Parameters

structnet_device*dev

network device

Description

Mark device as under test (as per RFC2863).

The testing state indicates that some test(s) must be performed on the interface. After completion of the test, the interface state will change to up, dormant, or down, as appropriate.

voidnetif_testing_off(structnet_device*dev)

set device as not under test.

Parameters

structnet_device*dev

network device

Description

Device is not in testing state.

boolnetif_testing(conststructnet_device*dev)

test if device is under test

Parameters

conststructnet_device*dev

network device

Description

Check if device is under test

boolnetif_oper_up(conststructnet_device*dev)

test if device is operational

Parameters

conststructnet_device*dev

network device

Description

Check if the device is operational.
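How the carrier, dormant and testing flags described above might combine into a single operational state can be sketched in userspace. Everything here is hypothetical: the `model_*` types are not kernel structures, and the precedence (testing first, then carrier, then dormant) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of RFC 2863 style operational states used by this sketch. */
enum model_oper_state {
	MODEL_OPER_DOWN,
	MODEL_OPER_TESTING,
	MODEL_OPER_DORMANT,
	MODEL_OPER_UP,
};

/* Hypothetical device flags; not the layout of struct net_device. */
struct model_netdev {
	bool carrier;	/* what a carrier check would report */
	bool dormant;	/* set by "dormant on", cleared by "dormant off" */
	bool testing;	/* set by "testing on", cleared by "testing off" */
};

/* Assumed precedence: testing, then missing carrier, then dormant. */
static enum model_oper_state model_operstate(const struct model_netdev *dev)
{
	assert(dev);
	if (dev->testing)
		return MODEL_OPER_TESTING;
	if (!dev->carrier)
		return MODEL_OPER_DOWN;
	if (dev->dormant)
		return MODEL_OPER_DORMANT;
	return MODEL_OPER_UP;
}

/* Operational means the derived state is "up". */
static bool model_oper_up(const struct model_netdev *dev)
{
	return model_operstate(dev) == MODEL_OPER_UP;
}
```

In this model a device with carrier but still marked dormant is not operational, which matches the description of the dormant state as a pending state that cannot pass packets.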

boolnetif_device_present(conststructnet_device*dev)

is device available or removed

Parameters

conststructnet_device*dev

network device

Description

Check if device has not been removed from system.

voidnetif_tx_lock(structnet_device*dev)

grab network device transmit lock

Parameters

structnet_device*dev

network device

Description

Get network device transmit lock

int__dev_uc_sync(structnet_device*dev,int(*sync)(structnet_device*,constunsignedchar*),int(*unsync)(structnet_device*,constunsignedchar*))

Synchronize device’s unicast list

Parameters

structnet_device*dev

device to sync

int(*sync)(structnet_device*,constunsignedchar*)

function to call if address should be added

int(*unsync)(structnet_device*,constunsignedchar*)

function to call if address should be removed

Description

Add newly added addresses to the interface, and release addresses that have been deleted.
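The sync/unsync callback pair can be pictured with a simplified userspace model. This is a sketch under stated assumptions: the entry layout, the `deleted` and `synced` flags and all `model_*` names are hypothetical and do not reflect the kernel's address list structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical, simplified address-list entry. */
struct model_addr {
	unsigned char addr[MODEL_ETH_ALEN];
	bool deleted;	/* removed from the software list */
	bool synced;	/* currently programmed into the device */
};

typedef int (*model_sync_fn)(void *dev, const unsigned char *addr);

/*
 * First release addresses that were deleted but are still programmed,
 * then add addresses that are not programmed yet. Returns 0 on success
 * or the first error reported by sync().
 */
static int model_addr_sync(void *dev, struct model_addr *list, size_t n,
			   model_sync_fn sync, model_sync_fn unsync)
{
	size_t i;
	int err;

	assert(sync != NULL);

	for (i = 0; i < n; i++) {
		if (!list[i].deleted || !list[i].synced)
			continue;
		/* A failing unsync() defers the removal to a later call. */
		if (unsync && unsync(dev, list[i].addr))
			continue;
		list[i].synced = false;
	}

	for (i = 0; i < n; i++) {
		if (list[i].deleted || list[i].synced)
			continue;
		err = sync(dev, list[i].addr);
		if (err)
			return err;
		list[i].synced = true;
	}
	return 0;
}

/* Example callbacks that only count how often they are invoked. */
struct model_counters {
	int added;
	int removed;
};

static int model_count_add(void *dev, const unsigned char *addr)
{
	(void)addr;
	((struct model_counters *)dev)->added++;
	return 0;
}

static int model_count_del(void *dev, const unsigned char *addr)
{
	(void)addr;
	((struct model_counters *)dev)->removed++;
	return 0;
}
```

Calling the model twice in a row with an unchanged list invokes no callbacks the second time, which is the property a driver would rely on when it re-runs the sync from its receive-mode handler.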

void__dev_uc_unsync(structnet_device*dev,int(*unsync)(structnet_device*,constunsignedchar*))

Remove synchronized addresses from device

Parameters

structnet_device*dev

device to sync

int(*unsync)(structnet_device*,constunsignedchar*)

function to call if address should be removed

Description

Remove all addresses that were added to the device by dev_uc_sync().

int__dev_mc_sync(structnet_device*dev,int(*sync)(structnet_device*,constunsignedchar*),int(*unsync)(structnet_device*,constunsignedchar*))

Synchronize device’s multicast list

Parameters

structnet_device*dev

device to sync

int(*sync)(structnet_device*,constunsignedchar*)

function to call if address should be added

int(*unsync)(structnet_device*,constunsignedchar*)

function to call if address should be removed

Description

Add newly added addresses to the interface, and release addresses that have been deleted.

void__dev_mc_unsync(structnet_device*dev,int(*unsync)(structnet_device*,constunsignedchar*))

Remove synchronized addresses from device

Parameters

structnet_device*dev

device to sync

int(*unsync)(structnet_device*,constunsignedchar*)

function to call if address should be removed

Description

Remove all addresses that were added to the device by dev_mc_sync().

structnet_shaper

Represents a shaping node on the NIC H/W. Zeroed fields are considered not set.

Definition:

struct net_shaper {
    struct net_shaper_handle parent;
    struct net_shaper_handle handle;
    enum net_shaper_metric metric;
    u64 bw_min;
    u64 bw_max;
    u64 burst;
    u32 priority;
    u32 weight;
};

Members

parent

Unique identifier for the shaper parent, usually implied

handle

Unique identifier for this shaper

metric

Specifies whether the rate limits refer to PPS or BPS

bw_min

Minimum guaranteed rate for this shaper

bw_max

Maximum peak rate allowed for this shaper

burst

Maximum burst for the peak rate of this shaper

priority

Scheduling priority for this shaper

weight

Scheduling weight for this shaper
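The "zeroed fields are not set" convention can be sketched with a userspace copy of the structure. The handle and metric types below are hypothetical stand-ins (the handle is modelled as a scope plus an id), and the range check is an illustrative sanity rule, not something the kernel is claimed to enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's handle and metric types. */
struct model_shaper_handle {
	int scope;
	uint32_t id;
};

enum model_shaper_metric {
	MODEL_METRIC_BPS,
	MODEL_METRIC_PPS,
};

/* Field names and order mirror the struct net_shaper definition. */
struct model_net_shaper {
	struct model_shaper_handle parent;
	struct model_shaper_handle handle;
	enum model_shaper_metric metric;
	uint64_t bw_min;
	uint64_t bw_max;
	uint64_t burst;
	uint32_t priority;
	uint32_t weight;
};

/* A zeroed field counts as "not set". */
static bool model_has_min_rate(const struct model_net_shaper *s)
{
	return s->bw_min != 0;
}

static bool model_has_max_rate(const struct model_net_shaper *s)
{
	return s->bw_max != 0;
}

/* Illustrative check: when both bounds are set, min must not exceed max. */
static bool model_rate_range_ok(const struct model_net_shaper *s)
{
	assert(s);
	if (!model_has_min_rate(s) || !model_has_max_rate(s))
		return true;	/* at least one bound is not set */
	return s->bw_min <= s->bw_max;
}
```

With this convention a caller only fills in the limits it cares about, for example `struct model_net_shaper cap = { .bw_max = 1000000 };`, and leaves everything else zero.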

structnet_shaper_ops

Operations on device H/W shapers

Definition:

struct net_shaper_ops {
    int (*group)(struct net_shaper_binding *binding, int leaves_count,
                 const struct net_shaper *leaves,
                 const struct net_shaper *node,
                 struct netlink_ext_ack *extack);
    int (*set)(struct net_shaper_binding *binding,
               const struct net_shaper *shaper,
               struct netlink_ext_ack *extack);
    int (*delete)(struct net_shaper_binding *binding,
                  const struct net_shaper_handle *handle,
                  struct netlink_ext_ack *extack);
    void (*capabilities)(struct net_shaper_binding *binding,
                         enum net_shaper_scope scope, unsigned long *cap);
};

Members

group

create the specified shapers scheduling group

Nest the leaves shapers under the node shaper. All the shapers belong to the device specified by binding. The size of the leaves array is specified by leaves_count. Either create the leaves and the node shaper or, if they already exist, link them together in the desired way. The leaves scope must be NET_SHAPER_SCOPE_QUEUE.

set

Updates the specified shaper

Updates or creates the shaper on the device specified by binding.

delete

Removes the specified shaper

Removes the shaper configuration as identified by the given handle on the device specified by binding, restoring the default behavior.

capabilities

get the shaper features supported by the device

Fills the bitmask cap with the supported capabilities for the specified scope and device specified by binding.

Description

The operations apply to either net_device or devlink objects. The initial shaping configuration at device initialization is empty: it does not constrain the rate in any way. The network core keeps track of the applied user configuration in the net_device or devlink structure. The operations are serialized via a per-device lock.

Devices not supporting any kind of nesting should not provide the group operation.

Each shaper is uniquely identified within the device with a ‘handle’ comprising the shaper scope and a scope-specific id.

PHY Support

voidphy_print_status(structphy_device*phydev)

Convenience function to print out the current phy status

Parameters

structphy_device*phydev

the phy_device struct

intphy_get_rate_matching(structphy_device*phydev,phy_interface_tiface)

determine if rate matching is supported

Parameters

structphy_device*phydev

The phy device to return rate matching for

phy_interface_tiface

The interface mode to use

Description

This determines the type of rate matching (if any) that phy supports using iface. iface may be PHY_INTERFACE_MODE_NA to determine if any interface supports rate matching.

Return

The type of rate matching phy supports for iface, or RATE_MATCH_NONE.

intphy_restart_aneg(structphy_device*phydev)

restart auto-negotiation

Parameters

structphy_device*phydev

target phy_device struct

Description

Restart the autonegotiation on phydev. Returns >= 0 on success or negative errno on error.

intphy_aneg_done(structphy_device*phydev)

return auto-negotiation status

Parameters

structphy_device*phydev

target phy_device struct

Description

Return the auto-negotiation status from this phydev. Returns > 0 on success or < 0 on error. 0 means that auto-negotiation is still pending.

boolphy_check_valid(intspeed,intduplex,unsignedlong*features)

check if there is a valid PHY setting which matches speed, duplex, and feature mask

Parameters

intspeed

speed to match

intduplex

duplex to match

unsignedlong*features

A mask of the valid settings

Description

Returns true if there is a valid setting, false otherwise.

intphy_mii_ioctl(structphy_device*phydev,structifreq*ifr,intcmd)

generic PHY MII ioctl interface

Parameters

structphy_device*phydev

the phy_device struct

structifreq*ifr

structifreq for socket ioctl’s

intcmd

ioctl cmd to execute

Description

Note that this function is currently incompatible with the PHYCONTROL layer. It changes registers without regard to current state. Use at own risk.

intphy_do_ioctl(structnet_device*dev,structifreq*ifr,intcmd)

generic ndo_eth_ioctl implementation

Parameters

structnet_device*dev

the net_device struct

structifreq*ifr

structifreq for socket ioctl’s

intcmd

ioctl cmd to execute

intphy_do_ioctl_running(structnet_device*dev,structifreq*ifr,intcmd)

generic ndo_eth_ioctl implementation but test first

Parameters

structnet_device*dev

the net_device struct

structifreq*ifr

structifreq for socket ioctl’s

intcmd

ioctl cmd to execute

Description

Same as phy_do_ioctl, but ensures that net_device is running before handling the ioctl.

voidphy_trigger_machine(structphy_device*phydev)

Trigger the state machine to run now

Parameters

structphy_device*phydev

the phy_device struct

intphy_ethtool_get_strings(structphy_device*phydev,u8*data)

Get the statistic counter names

Parameters

structphy_device*phydev

the phy_device struct

u8*data

Where to put the strings

intphy_ethtool_get_sset_count(structphy_device*phydev)

Get the number of statistic counters

Parameters

structphy_device*phydev

the phy_device struct

intphy_ethtool_get_stats(structphy_device*phydev,structethtool_stats*stats,u64*data)

Get the statistic counters

Parameters

structphy_device*phydev

the phy_device struct

structethtool_stats*stats

What counters to get

u64*data

Where to store the counters

intphy_start_cable_test(structphy_device*phydev,structnetlink_ext_ack*extack)

Start a cable test

Parameters

structphy_device*phydev

the phy_device struct

structnetlink_ext_ack*extack

extack for reporting useful error messages

intphy_start_cable_test_tdr(structphy_device*phydev,structnetlink_ext_ack*extack,conststructphy_tdr_config*config)

Start a raw TDR cable test

Parameters

structphy_device*phydev

the phy_device struct

structnetlink_ext_ack*extack

extack for reporting useful error messages

conststructphy_tdr_config*config

Configuration of the test to run

unsignedintphy_inband_caps(structphy_device*phydev,phy_interface_tinterface)

query which in-band signalling modes are supported

Parameters

structphy_device*phydev

a pointer to a struct phy_device

phy_interface_tinterface

the interface mode for the PHY

Description

Returns zero if it is unknown what in-band signalling is supported by the PHY (e.g. because the PHY driver doesn’t implement the method). Otherwise, returns a bit mask of the LINK_INBAND_* values from enum link_inband_signalling to describe which inband modes are supported by the PHY for this interface mode.

intphy_config_inband(structphy_device*phydev,unsignedintmodes)

configure the desired PHY in-band mode

Parameters

structphy_device*phydev

the phy_device struct

unsignedintmodes

in-band modes to configure

Description

Disables, enables or enables-with-bypass in-band signalling between the PHY and host system.

Return

zero on success, or negative errno value.

int_phy_start_aneg(structphy_device*phydev)

start auto-negotiation for this PHY device

Parameters

structphy_device*phydev

the phy_device struct

Description

Sanitizes the settings (if we’re not autonegotiating them), and then calls the driver’s config_aneg function. If the PHYCONTROL Layer is operating, we change the state to reflect the beginning of Auto-negotiation or forcing.

intphy_start_aneg(structphy_device*phydev)

start auto-negotiation for this PHY device

Parameters

structphy_device*phydev

the phy_device struct

Description

Sanitizes the settings (if we’re not autonegotiating them), and then calls the driver’s config_aneg function. If the PHYCONTROL Layer is operating, we change the state to reflect the beginning of Auto-negotiation or forcing.

intphy_speed_down(structphy_device*phydev,boolsync)

set speed to lowest speed supported by both link partners

Parameters

structphy_device*phydev

the phy_device struct

boolsync

perform action synchronously

Description

Typically used to save energy when waiting for a WoL packet

WARNING: Setting sync to false may leave the system unable to suspend if the PHY generates an interrupt when finishing the autonegotiation. This interrupt may wake up the system immediately after suspend. Therefore use sync = false only if you’re sure it’s safe with the respective network chip.

intphy_speed_up(structphy_device*phydev)

(re)set advertised speeds to all supported speeds

Parameters

structphy_device*phydev

the phy_device struct

Description

Used to revert the effect of phy_speed_down

voidphy_start_machine(structphy_device*phydev)

start PHY state machine tracking

Parameters

structphy_device*phydev

the phy_device struct

Description

The PHY infrastructure can run a state machine which tracks whether the PHY is starting up, negotiating, etc. This function starts the delayed workqueue which tracks the state of the PHY. If you want to maintain your own state machine, do not call this function.

voidphy_error(structphy_device*phydev)

enter ERROR state for this PHY device

Parameters

structphy_device*phydev

target phy_device struct

Description

Moves the PHY to the ERROR state in response to a read or write error, and tells the controller the link is down. Must be called with phydev->lock held.

voidphy_request_interrupt(structphy_device*phydev)

request and enable interrupt for a PHY device

Parameters

structphy_device*phydev

target phy_device struct

Description

Request and enable the interrupt for the given PHY.

If this fails, then we set irq to PHY_POLL. This should only be called with a valid IRQ number.

voidphy_free_interrupt(structphy_device*phydev)

disable and free interrupt for a PHY device

Parameters

structphy_device*phydev

target phy_device struct

Description

Disable and free the interrupt for the given PHY.

This should only be called with a valid IRQ number.

voidphy_stop(structphy_device*phydev)

Bring down the PHY link, and stop checking the status

Parameters

structphy_device*phydev

target phy_device struct

voidphy_start(structphy_device*phydev)

start or restart a PHY device

Parameters

structphy_device*phydev

target phy_device struct

Description

Indicates the attached device’s readiness to handle PHY-related work. Used during startup to start the PHY, and after a call to phy_stop() to resume operation. Also used to indicate the MDIO bus has cleared an error condition.

voidphy_mac_interrupt(structphy_device*phydev)

MAC says the link has changed

Parameters

structphy_device*phydev

phy_device struct with changed link

Description

The MAC layer is able to indicate there has been a change in the PHY link status. Triggers the state machine and queues its work item.

intphy_loopback(structphy_device*phydev,boolenable,intspeed)

Configure loopback mode of PHY

Parameters

structphy_device*phydev

target phy_device struct

boolenable

enable or disable loopback mode

intspeed

enable loopback mode with speed

Description

Configure loopback mode of PHY and signal link down and link up if speed is changing.

Return

0 on success, negative error code on failure.

intphy_eee_tx_clock_stop_capable(structphy_device*phydev)

indicate whether the MAC can stop tx clock

Parameters

structphy_device*phydev

target phy_device struct

Description

Indicate whether the MAC can disable the transmit xMII clock while in LPI state. Returns 1 if the MAC may stop the transmit clock, 0 if the MAC must not stop the transmit clock, or negative error.

intphy_eee_rx_clock_stop(structphy_device*phydev,boolclk_stop_enable)

configure PHY receive clock in LPI

Parameters

structphy_device*phydev

target phy_device struct

boolclk_stop_enable

flag to indicate whether the clock can be stopped

Description

Configure whether the PHY can disable its receive clock during LPI mode. See IEEE 802.3 sections 22.2.2.2, 35.2.2.10, and 45.2.3.1.4.

Return

0 or negative error.

intphy_init_eee(structphy_device*phydev,boolclk_stop_enable)

init and check the EEE feature

Parameters

structphy_device*phydev

target phy_device struct

boolclk_stop_enable

PHY may stop the clock during LPI

Description

Checks whether Energy-Efficient Ethernet (EEE) is supported by looking at the MMD registers 3.20 and 7.60/61, and programs the MMD register 3.0, setting the “Clock stop enable” bit if required.

intphy_get_eee_err(structphy_device*phydev)

report the EEE wake error count

Parameters

structphy_device*phydev

target phy_device struct

Description

Reports the number of times the PHY failed to complete its normal wake sequence.

intphy_ethtool_get_eee(structphy_device*phydev,structethtool_keee*data)

get EEE supported and status

Parameters

structphy_device*phydev

target phy_device struct

structethtool_keee*data

ethtool_keee data

Description

Get the current EEE settings, filling in all members of data.

intphy_ethtool_set_eee(structphy_device*phydev,structethtool_keee*data)

set EEE supported and status

Parameters

structphy_device*phydev

target phy_device struct

structethtool_keee*data

ethtool_keee data

Description

Programs the EEE advertisement register.

intphy_ethtool_set_wol(structphy_device*phydev,structethtool_wolinfo*wol)

Configure Wake On LAN

Parameters

structphy_device*phydev

target phy_device struct

structethtool_wolinfo*wol

Configuration requested

voidphy_ethtool_get_wol(structphy_device*phydev,structethtool_wolinfo*wol)

Get the current Wake On LAN configuration

Parameters

structphy_device*phydev

target phy_device struct

structethtool_wolinfo*wol

Store the current configuration here

intphy_ethtool_nway_reset(structnet_device*ndev)

Restart auto negotiation

Parameters

structnet_device*ndev

Network device to restart autoneg for

intphy_config_interrupt(structphy_device*phydev,boolinterrupts)

configure the PHY device for the requested interrupts

Parameters

structphy_device*phydev

the phy_device struct

boolinterrupts

interrupt flags to configure for this phydev

Description

Returns 0 on success or < 0 on error.

unsignedintphy_supported_speeds(structphy_device*phy,unsignedint*speeds,unsignedintsize)

return all speeds currently supported by a phy device

Parameters

structphy_device*phy

The phy device to return supported speeds of.

unsignedint*speeds

buffer to store supported speeds in.

unsignedintsize

size of speeds buffer.

Description

Returns the number of supported speeds, and fills the speeds buffer with the supported speeds. If the speeds buffer is too small to contain all currently supported speeds, returns as many speeds as can fit.
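The truncation behaviour can be shown with a short userspace model. The function name and the plain array standing in for the PHY's supported modes are hypothetical; the sketch only mirrors the documented rule that no more than size entries are written and the stored count is returned.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy up to 'size' entries from the supported list into 'speeds'
 * and return how many were stored.
 */
static unsigned int model_supported_speeds(const unsigned int *supported,
					   unsigned int n_supported,
					   unsigned int *speeds,
					   unsigned int size)
{
	unsigned int count = 0;
	unsigned int i;

	assert(speeds != NULL || size == 0);
	for (i = 0; i < n_supported && count < size; i++)
		speeds[count++] = supported[i];
	return count;
}
```

A return value equal to size therefore cannot distinguish "exactly size speeds" from "more than fit", so a caller that needs the complete list should pass a buffer it knows to be large enough.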

voidphy_sanitize_settings(structphy_device*phydev)

make sure the PHY is set to supported speed and duplex

Parameters

structphy_device*phydev

the target phy_device struct

Description

Make sure the PHY is set to supported speeds and duplexes. Drop down by one in this order: 1000/FULL, 1000/HALF, 100/FULL, 100/HALF, 10/FULL, 10/HALF.
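The drop-down order can be modelled as a table walk. This is a userspace sketch with hypothetical names: the `supported` bitmask indexed by table position and the -1 return for "nothing usable" are conventions of the example, not of the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_setting {
	int speed;
	bool full_duplex;
};

/* The fallback order quoted in the description, fastest first. */
static const struct model_setting model_order[] = {
	{ 1000, true }, { 1000, false },
	{ 100, true },  { 100, false },
	{ 10, true },   { 10, false },
};

#define MODEL_ORDER_LEN (sizeof(model_order) / sizeof(model_order[0]))

/*
 * 'supported' is a bitmask indexed like model_order[] (bit 0 is
 * 1000/FULL). Starting at the requested setting, step down the table
 * until a supported entry is found. Returns its index, or -1 if
 * nothing at or below the request is supported.
 */
static int model_sanitize(int speed, bool full_duplex, unsigned int supported)
{
	size_t start = MODEL_ORDER_LEN;
	size_t i;

	for (i = 0; i < MODEL_ORDER_LEN; i++) {
		if (model_order[i].speed == speed &&
		    model_order[i].full_duplex == full_duplex) {
			start = i;
			break;
		}
	}
	assert(start < MODEL_ORDER_LEN);	/* request must be in the table */

	for (i = start; i < MODEL_ORDER_LEN; i++)
		if (supported & (1u << i))
			return (int)i;
	return -1;
}
```

For a PHY that supports only 100/FULL and 10/HALF, a request for 1000/FULL lands on 100/FULL, while a request for 100/HALF falls through to 10/HALF.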

int__phy_hwtstamp_get(structphy_device*phydev,structkernel_hwtstamp_config*config)

Get hardware timestamping configuration from PHY

Parameters

structphy_device*phydev

the PHY device structure

structkernel_hwtstamp_config*config

structure holding the timestamping configuration

Description

Query the PHY device for its current hardware timestamping configuration.

int__phy_hwtstamp_set(structphy_device*phydev,structkernel_hwtstamp_config*config,structnetlink_ext_ack*extack)

Modify PHY hardware timestamping configuration

Parameters

structphy_device*phydev

the PHY device structure

structkernel_hwtstamp_config*config

structure holding the timestamping configuration

structnetlink_ext_ack*extack

netlink extended ack structure, for error reporting

voidphy_queue_state_machine(structphy_device*phydev,unsignedlongjiffies)

Trigger the state machine to run soon

Parameters

structphy_device*phydev

the phy_device struct

unsignedlongjiffies

Run the state machine after these jiffies

void__phy_ethtool_get_phy_stats(structphy_device*phydev,structethtool_eth_phy_stats*phy_stats,structethtool_phy_stats*phydev_stats)

Retrieve standardized PHY statistics

Parameters

structphy_device*phydev

Pointer to the PHY device

structethtool_eth_phy_stats*phy_stats

Pointer to ethtool_eth_phy_stats structure

structethtool_phy_stats*phydev_stats

Pointer to ethtool_phy_stats structure

Description

Fetches PHY statistics using a kernel-defined interface for consistent diagnostics. Unlike phy_ethtool_get_stats(), which allows custom stats, this function enforces a standardized format for better interoperability.

void__phy_ethtool_get_link_ext_stats(structphy_device*phydev,structethtool_link_ext_stats*link_stats)

Retrieve extended link statistics for a PHY

Parameters

structphy_device*phydev

Pointer to the PHY device

structethtool_link_ext_stats*link_stats

Pointer to the structure to store extended link statistics

Description

Populates the ethtool_link_ext_stats structure with link down event counts and additional driver-specific link statistics, if available.

intphy_ethtool_get_plca_cfg(structphy_device*phydev,structphy_plca_cfg*plca_cfg)

Get PLCA RS configuration

Parameters

structphy_device*phydev

the phy_device struct

structphy_plca_cfg*plca_cfg

where to store the retrieved configuration

Description

Retrieve the PLCA configuration from the PHY. Return 0 on success or a negative value if an error occurred.

intplca_check_valid(structphy_device*phydev,conststructphy_plca_cfg*plca_cfg,structnetlink_ext_ack*extack)

Check PLCA configuration before enabling

Parameters

structphy_device*phydev

the phy_device struct

conststructphy_plca_cfg*plca_cfg

current PLCA configuration

structnetlink_ext_ack*extack

extack for reporting useful error messages

Description

Checks whether the PLCA and PHY configuration are consistent and it is safe to enable PLCA. Returns 0 on success or a negative value if the PLCA or PHY configuration is not consistent.

intphy_ethtool_set_plca_cfg(structphy_device*phydev,conststructphy_plca_cfg*plca_cfg,structnetlink_ext_ack*extack)

Set PLCA RS configuration

Parameters

structphy_device*phydev

the phy_device struct

conststructphy_plca_cfg*plca_cfg

new PLCA configuration to apply

structnetlink_ext_ack*extack

extack for reporting useful error messages

Description

Sets the PLCA configuration in the PHY. Return 0 on success or a negative value if an error occurred.

intphy_ethtool_get_plca_status(structphy_device*phydev,structphy_plca_status*plca_st)

Get PLCA RS status information

Parameters

structphy_device*phydev

the phy_device struct

structphy_plca_status*plca_st

where to store the retrieved status information

Description

Retrieve the PLCA status information from the PHY. Return 0 on success or a negative value if an error occurred.

intphy_check_link_status(structphy_device*phydev)

check link status and set state accordingly

Parameters

structphy_device*phydev

the phy_device struct

Description

Check for link and whether autoneg was triggered / is running and set state accordingly.

voidphy_stop_machine(structphy_device*phydev)

stop the PHY state machine tracking

Parameters

structphy_device*phydev

target phy_device struct

Description

Stops the state machine delayed workqueue, sets the state to UP (unless it wasn’t up yet). This function must be called BEFORE phy_detach.

intphy_disable_interrupts(structphy_device*phydev)

Disable the PHY interrupts from the PHY side

Parameters

structphy_device*phydev

target phy_device struct

irqreturn_tphy_interrupt(intirq,void*phy_dat)

PHY interrupt handler

Parameters

intirq

interrupt line

void*phy_dat

phy_device pointer

Description

Handle PHY interrupt

intphy_enable_interrupts(structphy_device*phydev)

Enable the interrupts from the PHY side

Parameters

structphy_device*phydev

target phy_device struct

intphy_update_stats(structphy_device*phydev)

Update PHY device statistics if supported.

Parameters

structphy_device*phydev

Pointer to the PHY device structure.

Description

If the PHY driver provides an update_stats callback, this function invokes it to update the PHY statistics. If not, it returns 0.

Return

0 on success, or a negative error code if the callback fails.

unsignedintphy_get_next_update_time(structphy_device*phydev)

Determine the next PHY update time

Parameters

structphy_device*phydev

Pointer to the phy_device structure

Description

This function queries the PHY driver to get the time for the next polling event. If the driver does not implement the callback, a default value is used.

Return

The time for the next polling event in jiffies

voidphy_state_machine(structwork_struct*work)

Handle the state machine

Parameters

structwork_struct*work

work_struct that describes the work to be done

voidphy_ethtool_set_eee_noneg(structphy_device*phydev,conststructeee_config*old_cfg)

Adjusts MAC LPI configuration without PHY renegotiation

Parameters

structphy_device*phydev

pointer to the target PHY device structure

conststructeee_config*old_cfg

pointer to the eee_config structure containing the old EEE settings

Description

This function updates the Energy Efficient Ethernet (EEE) configuration for cases where only the MAC’s Low Power Idle (LPI) configuration changes, without triggering PHY renegotiation. It ensures that the MAC is properly informed of the new LPI settings by cycling the link down and up, which is necessary for the MAC to adopt the new configuration. This adjustment is done only if there is a change in the tx_lpi_enabled or tx_lpi_timer configuration.

constchar*phy_speed_to_str(intspeed)

Return a string representing the PHY link speed

Parameters

intspeed

Speed of the link

constchar*phy_duplex_to_str(unsignedintduplex)

Return string describing the duplex

Parameters

unsignedintduplex

Duplex setting to describe

constchar*phy_rate_matching_to_str(intrate_matching)

Return a string describing the rate matching

Parameters

intrate_matching

Type of rate matching to describe

phy_interface_tphy_fix_phy_mode_for_mac_delays(phy_interface_tinterface,boolmac_txid,boolmac_rxid)

Convenience function for fixing PHY mode based on whether mac adds internal delay

Parameters

phy_interface_tinterface

The current interface mode of the port

boolmac_txid

True if the mac adds internal tx delay

boolmac_rxid

True if the mac adds internal rx delay

Return

fixed PHY mode, or PHY_INTERFACE_MODE_NA if the interface cannot apply the internal delay

intphy_interface_num_ports(phy_interface_tinterface)

Return the number of links that can be carried by a given MAC-PHY physical link. Returns 0 if this is unknown, otherwise the number of links.

Parameters

phy_interface_tinterface

The interface mode we want to get the number of ports

voidphy_set_max_speed(structphy_device*phydev,u32max_speed)

Set the maximum speed the PHY should support

Parameters

structphy_device*phydev

The phy_device struct

u32max_speed

Maximum speed

Description

The PHY might be more capable than the MAC. For example, a Fast Ethernet MAC may be connected to a 1G PHY. This function allows the MAC to indicate its maximum speed, and so limit what the PHY will advertise.
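The effect of capping the speed can be sketched as a filter over a list of speeds. The function below is a hypothetical userspace model that works on a plain array; the real function operates on the PHY's link-mode bitmaps, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Drop every speed above max_speed from the list, in place.
 * Returns the new number of entries.
 */
static size_t model_limit_speeds(uint32_t *speeds, size_t n,
				 uint32_t max_speed)
{
	size_t kept = 0;
	size_t i;

	assert(speeds != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (speeds[i] <= max_speed)
			speeds[kept++] = speeds[i];
	return kept;
}
```

For the Fast Ethernet case in the description, a 10/100/1000 list capped at 100 keeps only the 10 and 100 entries, so 1000 would never be advertised.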

voidphy_resolve_aneg_pause(structphy_device*phydev)

Determine pause autoneg results

Parameters

structphy_device*phydev

The phy_device struct

Description

Once autoneg has completed the local pause settings can be resolved. Determine if pause and asymmetric pause should be used by the MAC.

voidphy_resolve_aneg_linkmode(structphy_device*phydev)

resolve the advertisements into PHY settings

Parameters

structphy_device*phydev

The phy_device struct

Description

Resolve our and the link partner advertisements into their corresponding speed and duplex. If full duplex was negotiated, extract the pause mode from the link partner mask.

int__phy_read_mmd(structphy_device*phydev,intdevad,u32regnum)

Convenience function for reading a register from an MMD on a given PHY.

Parameters

structphy_device*phydev

The phy_device struct

intdevad

The MMD to read from (0..31)

u32regnum

The register on the MMD to read (0..65535)

Description

Same rules as for __phy_read().

intphy_read_mmd(structphy_device*phydev,intdevad,u32regnum)

Convenience function for reading a register from an MMD on a given PHY.

Parameters

structphy_device*phydev

The phy_device struct

intdevad

The MMD to read from

u32regnum

The register on the MMD to read

Description

Same rules as for phy_read().

int__phy_write_mmd(structphy_device*phydev,intdevad,u32regnum,u16val)

Convenience function for writing a register on an MMD on a given PHY.

Parameters

structphy_device*phydev

The phy_device struct

intdevad

The MMD to write to

u32regnum

The register on the MMD to write

u16val

value to write to regnum

Description

Same rules as for __phy_write().

intphy_write_mmd(structphy_device*phydev,intdevad,u32regnum,u16val)

Convenience function for writing a register on an MMD on a given PHY.

Parameters

structphy_device*phydev

The phy_device struct

intdevad

The MMD to write to

u32regnum

The register on the MMD to write

u16val

value to write to regnum

Description

Same rules as for phy_write().

intphy_modify_changed(structphy_device*phydev,u32regnum,u16mask,u16set)

Function for modifying a PHY register

Parameters

structphy_device*phydev

the phy_device struct

u32regnum

register number to modify

u16mask

bit mask of bits to clear

u16set

new value of bits set in mask to write to regnum

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

Returns negative errno, 0 if there was no change, and 1 in case of change
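The three-way return value (negative error, 0 for no change, 1 for a change) can be illustrated with a userspace model built on an in-memory register file. All `model_*` names are hypothetical, the error constant only stands in for an errno value, and the behaviour of the plain "modify" variant (collapsing 1 to 0) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_REGS 32
#define MODEL_EINVAL   22	/* stands in for an errno value */

/* Hypothetical in-memory register file standing in for a PHY. */
struct model_phy {
	uint16_t regs[MODEL_NUM_REGS];
	unsigned int writes;	/* number of bus writes performed */
};

static int model_read(struct model_phy *phy, uint32_t regnum)
{
	if (regnum >= MODEL_NUM_REGS)
		return -MODEL_EINVAL;
	return phy->regs[regnum];
}

static int model_write(struct model_phy *phy, uint32_t regnum, uint16_t val)
{
	if (regnum >= MODEL_NUM_REGS)
		return -MODEL_EINVAL;
	phy->regs[regnum] = val;
	phy->writes++;
	return 0;
}

/* Negative error, 0 if nothing changed, 1 if the register was rewritten. */
static int model_modify_changed(struct model_phy *phy, uint32_t regnum,
				uint16_t mask, uint16_t set)
{
	uint16_t new_val;
	int old, ret;

	assert(phy);
	old = model_read(phy, regnum);
	if (old < 0)
		return old;

	new_val = (uint16_t)(((uint16_t)old & ~mask) | set);
	if (new_val == old)
		return 0;	/* skip the bus write entirely */

	ret = model_write(phy, regnum, new_val);
	return ret < 0 ? ret : 1;
}

/* Assumed plain variant: hides the "changed" detail, 0 on success. */
static int model_modify(struct model_phy *phy, uint32_t regnum,
			uint16_t mask, uint16_t set)
{
	int ret = model_modify_changed(phy, regnum, mask, set);

	return ret < 0 ? ret : 0;
}
```

The "changed" result is useful when a follow-up action, such as restarting auto-negotiation, is only needed if the register content actually differs.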

int__phy_modify(structphy_device*phydev,u32regnum,u16mask,u16set)

Convenience function for modifying a PHY register

Parameters

structphy_device*phydev

the phy_device struct

u32regnum

register number to modify

u16mask

bit mask of bits to clear

u16set

new value of bits set in mask to write to regnum

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intphy_modify(structphy_device*phydev,u32regnum,u16mask,u16set)

Convenience function for modifying a given PHY register

Parameters

structphy_device*phydev

the phy_device struct

u32regnum

register number to write

u16mask

bit mask of bits to clear

u16set

new value of bits set in mask to write to regnum

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

int__phy_modify_mmd_changed(structphy_device*phydev,intdevad,u32regnum,u16mask,u16set)

Function for modifying a register on MMD

Parameters

structphy_device*phydev

the phy_device struct

intdevad

the MMD containing register to modify

u32regnum

register number to modify

u16mask

bit mask of bits to clear

u16set

new value of bits set in mask to write to regnum

Description

Unlocked helper function which allows an MMD register to be modified as: new register value = (old register value & ~mask) | set

Returns negative errno, 0 if there was no change, and 1 in case of change
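The formula can be checked with a small worked example. The helpers below are hypothetical pure functions operating on 16-bit values; they perform no bus access and only demonstrate the arithmetic and the resulting "changed" indication.

```c
#include <assert.h>
#include <stdint.h>

/* new register value = (old register value & ~mask) | set */
static uint16_t model_mmd_new_value(uint16_t old, uint16_t mask, uint16_t set)
{
	return (uint16_t)((old & ~mask) | set);
}

/* 1 if a write would be needed, 0 if the register already holds the value. */
static int model_mmd_would_change(uint16_t old, uint16_t mask, uint16_t set)
{
	uint16_t new_val = model_mmd_new_value(old, mask, set);

	/* Bits outside mask | set are never touched by the formula. */
	assert(((old ^ new_val) & (uint16_t)~(mask | set)) == 0);
	return new_val != old;
}
```

With old = 0x00F6, mask = 0x0006 and set = 0x0002, the masked bits are cleared to give 0x00F0 and the set bits are then ORed in to give 0x00F2, so a change is reported. Repeating the call on 0x00F2 reports no change.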

intphy_modify_mmd_changed(structphy_device*phydev,intdevad,u32regnum,u16mask,u16set)

Function for modifying a register on MMD

Parameters

structphy_device*phydev

the phy_device struct

intdevad

the MMD containing register to modify

u32regnum

register number to modify

u16mask

bit mask of bits to clear

u16set

new value of bits set in mask to write to regnum

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

Returns negative errno, 0 if there was no change, and 1 in case of change

int__phy_modify_mmd(structphy_device*phydev,intdevad,u32regnum,u16mask,u16set)

Convenience function for modifying a register on MMD

Parameters

structphy_device*phydev

the phy_device struct

intdevad

the MMD containing register to modify

u32regnum

register number to modify

u16mask

bit mask of bits to clear

u16set

new value of bits set in mask to write to regnum

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intphy_modify_mmd(structphy_device*phydev,intdevad,u32regnum,u16mask,u16set)

Convenience function for modifying a register on MMD

Parameters

structphy_device*phydev

the phy_device struct

intdevad

the MMD containing register to modify

u32regnum

register number to modify

u16mask

bit mask of bits to clear

u16set

new value of bits set in mask to write to regnum

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intphy_save_page(structphy_device*phydev)

take the bus lock and save the current page

Parameters

structphy_device*phydev

a pointer to a struct phy_device

Description

Take the MDIO bus lock, and return the current page number. On error, returns a negative errno. phy_restore_page() must always be called after this, irrespective of success or failure of this call.

intphy_select_page(structphy_device*phydev,intpage)

take the bus lock, save the current page, and set a page

Parameters

structphy_device*phydev

a pointer to a struct phy_device

intpage

desired page

Description

Take the MDIO bus lock to protect against concurrent access, save the current PHY page, and set the current page. On error, returns a negative errno, otherwise returns the previous page number. phy_restore_page() must always be called after this, irrespective of success or failure of this call.

intphy_restore_page(structphy_device*phydev,intoldpage,intret)

restore the page register and release the bus lock

Parameters

structphy_device*phydev

a pointer to a struct phy_device

intoldpage

the old page, return value from phy_save_page() or phy_select_page()

intret

operation’s return code

Description

Release the MDIO bus lock, restoring oldpage if it is a valid page. This function propagates the earliest error code from the group of operations.

Return

oldpage if it was a negative value, otherwise ret if it was a negative errno value, otherwise phy_write_page()’s negative value if it was in error, otherwise ret.
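The priority order of the return value is easier to follow as code. The function below is a sketch that models only the documented rule; demo_restore_page_result() and its write_page_ret parameter are hypothetical, the latter standing in for the result of the page write that restores oldpage.

```c
/*
 * Model of the documented return-value priority of phy_restore_page().
 * write_page_ret is a hypothetical parameter standing in for the result
 * of the page write that restores oldpage; it is only consulted when
 * oldpage is valid and the operation itself succeeded.
 */
static int demo_restore_page_result(int oldpage, int ret, int write_page_ret)
{
    if (oldpage < 0)
        return oldpage;         /* saving/selecting the page failed */
    if (ret < 0)
        return ret;             /* the operation itself failed */
    if (write_page_ret < 0)
        return write_page_ret;  /* restoring the page failed */
    return ret;                 /* everything succeeded */
}
```

The earliest failure wins: a failed page selection masks everything after it, and a failed page restore is reported only if the operation in between succeeded.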

intphy_read_paged(structphy_device*phydev,intpage,u32regnum)

Convenience function for reading a paged register

Parameters

structphy_device*phydev

a pointer to a struct phy_device

intpage

the page for the phy

u32regnum

register number

Description

Same rules as for phy_read().

intphy_write_paged(structphy_device*phydev,intpage,u32regnum,u16val)

Convenience function for writing a paged register

Parameters

structphy_device*phydev

a pointer to a struct phy_device

intpage

the page for the phy

u32regnum

register number

u16val

value to write

Description

Same rules as for phy_write().

intphy_modify_paged_changed(structphy_device*phydev,intpage,u32regnum,u16mask,u16set)

Function for modifying a paged register

Parameters

structphy_device*phydev

a pointer to a struct phy_device

intpage

the page for the phy

u32regnum

register number

u16mask

bit mask of bits to clear

u16set

bit mask of bits to set

Description

Returns negative errno, 0 if there was no change, and 1 in case of change

intphy_modify_paged(structphy_device*phydev,intpage,u32regnum,u16mask,u16set)

Convenience function for modifying a paged register

Parameters

structphy_device*phydev

a pointer to a struct phy_device

intpage

the page for the phy

u32regnum

register number

u16mask

bit mask of bits to clear

u16set

bit mask of bits to set

Description

Same rules as for phy_read() and phy_write().
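The paged helpers combine three steps: select a page, access a register, and restore the previous page. The following user-space sketch simulates that sequence under stated assumptions. All demo_ names are hypothetical, the PHY is modelled as a small two-dimensional array, and bus locking is omitted entirely.

```c
#include <stdint.h>

#define DEMO_PAGES 4
#define DEMO_REGS  32

/* Hypothetical PHY model: a page-select value plus per-page registers. */
static int demo_cur_page;
static uint16_t demo_regs[DEMO_PAGES][DEMO_REGS];

/* Select a page; returns the previous page or a negative value on error. */
static int demo_select_page(int page)
{
    int old = demo_cur_page;

    if (page < 0 || page >= DEMO_PAGES)
        return -1;
    demo_cur_page = page;
    return old;
}

/* Restore oldpage if it is valid; otherwise propagate its error value. */
static int demo_restore_page(int oldpage, int ret)
{
    if (oldpage < 0)
        return oldpage;
    demo_cur_page = oldpage;
    return ret;
}

/* Paged read-modify-write: 1 on change, 0 if unchanged, negative on error. */
static int demo_modify_paged_changed(int page, unsigned int regnum,
                                     uint16_t mask, uint16_t set)
{
    int ret = 0;
    int oldpage = demo_select_page(page);

    if (oldpage >= 0 && regnum < DEMO_REGS) {
        uint16_t old = demo_regs[demo_cur_page][regnum];
        uint16_t val = (uint16_t)((old & ~mask) | set);

        ret = (val != old);
        demo_regs[demo_cur_page][regnum] = val;
    } else if (oldpage >= 0) {
        ret = -1;   /* register number out of range */
    }

    /* Always called, whether or not the page selection succeeded. */
    return demo_restore_page(oldpage, ret);
}
```

The unconditional call to the restore step mirrors the rule that phy_restore_page() must follow phy_select_page() irrespective of success or failure.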

intgenphy_c45_pma_resume(structphy_device*phydev)

wakes up the PMA module

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_pma_suspend(structphy_device*phydev)

suspends the PMA module

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_pma_baset1_setup_master_slave(structphy_device*phydev)

configures forced master/slave role of BaseT1 devices.

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_pma_setup_forced(structphy_device*phydev)

configures a forced speed

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_an_config_aneg(structphy_device*phydev)

configure advertisement registers

Parameters

structphy_device*phydev

target phy_device struct

Description

Configure advertisement registers based on modes set in phydev->advertising

Returns negative errno code on failure, 0 if advertisement didn’t change, or 1 if advertised modes changed.

intgenphy_c45_an_disable_aneg(structphy_device*phydev)

disable auto-negotiation

Parameters

structphy_device*phydev

target phy_device struct

Description

Disable auto-negotiation in the Clause 45 PHY. The link parameters are controlled through the PMA/PMD MMD registers.

Returns zero on success, negative errno code on failure.

intgenphy_c45_restart_aneg(structphy_device*phydev)

Enable and restart auto-negotiation

Parameters

structphy_device*phydev

target phy_device struct

Description

This assumes that the auto-negotiation MMD is present.

Enable and restart auto-negotiation.

intgenphy_c45_check_and_restart_aneg(structphy_device*phydev,boolrestart)

Enable and restart auto-negotiation

Parameters

structphy_device*phydev

target phy_device struct

boolrestart

whether aneg restart is requested

Description

This assumes that the auto-negotiation MMD is present.

Check, and restart auto-negotiation if needed.

intgenphy_c45_aneg_done(structphy_device*phydev)

return auto-negotiation complete status

Parameters

structphy_device*phydev

target phy_device struct

Description

This assumes that the auto-negotiation MMD is present.

Reads the status register from the auto-negotiation MMD, returning:

  • positive if auto-negotiation is complete

  • negative errno code on error

  • zero otherwise

intgenphy_c45_read_link(structphy_device*phydev)

read the overall link status from the MMDs

Parameters

structphy_device*phydev

target phy_device struct

Description

Read the link status from the specified MMDs, and if they all indicate that the link is up, set phydev->link to 1. If an error is encountered, a negative errno will be returned, otherwise zero.

intgenphy_c45_read_lpa(structphy_device*phydev)

read the link partner advertisement and pause

Parameters

structphy_device*phydev

target phy_device struct

Description

Read the Clause 45 defined base (7.19) and 10G (7.33) status registers, filling in the link partner advertisement, pause and asym_pause members in phydev. This assumes that the auto-negotiation MMD is present, and the backplane bit (7.48.0) is clear. Clause 45 PHY drivers are expected to fill in the remainder of the link partner advert from vendor registers.

intgenphy_c45_pma_baset1_read_master_slave(structphy_device*phydev)

read forced master/slave configuration

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_read_pma(structphy_device*phydev)

read link speed etc from PMA

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_read_mdix(structphy_device*phydev)

read mdix status from PMA

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_read_eee_abilities(structphy_device*phydev)

read supported EEE link modes

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_an_config_eee_aneg(structphy_device*phydev)

configure EEE advertisement

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_c45_pma_baset1_read_abilities(structphy_device*phydev)

read supported baset1 link modes from PMA

Parameters

structphy_device*phydev

target phy_device struct

Description

Read the supported link modes from the extended BASE-T1 ability register

intgenphy_c45_pma_read_ext_abilities(structphy_device*phydev)

read supported link modes from PMA

Parameters

structphy_device*phydev

target phy_device struct

Description

Read the supported link modes from the PMA/PMD extended ability register (Register 1.11).

intgenphy_c45_pma_read_abilities(structphy_device*phydev)

read supported link modes from PMA

Parameters

structphy_device*phydev

target phy_device struct

Description

Read the supported link modes from the PMA Status 2 (1.8) register. If bit 1.8.9 is set, the list of supported modes is built using the values in the PMA Extended Abilities (1.11) register, indicating 1000BASET and 10G related modes. If bit 1.11.14 is set, then the list is also extended with the modes in the 2.5G/5G PMA Extended register (1.21), indicating if 2.5GBASET and 5GBASET are supported.
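The chain of conditions can be sketched as a small decision function. This is an illustration only: the demo_ macro and function names are hypothetical, the bit positions are taken directly from the text (1.8.9 and 1.11.14), and the function merely reports which further registers would be consulted rather than reading any hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the description: 1.8.9 and 1.11.14. */
#define DEMO_STAT2_EXTABLE  (1u << 9)   /* register 1.8, bit 9 */
#define DEMO_EXTABLE_NBT    (1u << 14)  /* register 1.11, bit 14 */

struct demo_pma_plan {
    bool read_ext_abilities;   /* consult register 1.11 */
    bool read_nbt_abilities;   /* consult register 1.21 */
};

/* Decide which ability registers apply, given 1.8 and 1.11 contents. */
static struct demo_pma_plan demo_pma_plan_reads(uint16_t stat2,
                                                uint16_t extable)
{
    struct demo_pma_plan plan = { false, false };

    if (stat2 & DEMO_STAT2_EXTABLE) {
        plan.read_ext_abilities = true;
        /* 1.21 is only meaningful when 1.11 itself is valid. */
        if (extable & DEMO_EXTABLE_NBT)
            plan.read_nbt_abilities = true;
    }
    return plan;
}
```

Note the nesting: bit 1.11.14 is ignored unless bit 1.8.9 is set, because the extended abilities register is not considered valid otherwise.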

intgenphy_c45_read_status(structphy_device*phydev)

read PHY status

Parameters

structphy_device*phydev

target phy_device struct

Description

Reads status from PHY and sets phy_device members accordingly.

intgenphy_c45_config_aneg(structphy_device*phydev)

restart auto-negotiation or forced setup

Parameters

structphy_device*phydev

target phy_device struct

Description

If auto-negotiation is enabled, we configure the advertising, and then restart auto-negotiation. If it is not enabled, then we force a configuration.

intgenphy_c45_fast_retrain(structphy_device*phydev,boolenable)

configure fast retrain registers

Parameters

structphy_device*phydev

target phy_device struct

boolenable

enable fast retrain or not

Description

If fast-retrain is enabled, we configure PHY as advertising fast retrain capable and THP Bypass Request, then enable fast retrain. If it is not enabled, we configure fast retrain disabled.

intgenphy_c45_plca_get_cfg(structphy_device*phydev,structphy_plca_cfg*plca_cfg)

get PLCA configuration from standard registers

Parameters

structphy_device*phydev

target phy_device struct

structphy_plca_cfg*plca_cfg

output structure to store the PLCA configuration

Description

If the PHY complies with the Open Alliance TC14 10BASE-T1S PLCA Management Registers specifications, this function can be used to retrieve the current PLCA configuration from the standard registers in MMD 31.

intgenphy_c45_plca_set_cfg(structphy_device*phydev,conststructphy_plca_cfg*plca_cfg)

set PLCA configuration using standard registers

Parameters

structphy_device*phydev

target phy_device struct

conststructphy_plca_cfg*plca_cfg

structure containing the PLCA configuration. Fields set to -1 are not to be changed.

Description

If the PHY complies with the Open Alliance TC14 10BASE-T1S PLCA Management Registers specifications, this function can be used to modify the PLCA configuration using the standard registers in MMD 31.
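The convention that fields set to -1 are left unchanged can be illustrated with a user-space sketch. The struct below copies the field names of struct phy_plca_cfg; demo_pick() and demo_plca_apply() are hypothetical helpers, not part of the kernel API, and no register access is modelled.

```c
/* Local copy of the fields listed for struct phy_plca_cfg. */
struct demo_plca_cfg {
    int version;
    int enabled;
    int node_id;
    int node_cnt;
    int to_tmr;
    int burst_cnt;
    int burst_tmr;
};

/* Keep cur when req is -1 ("do not change"), otherwise take req. */
static int demo_pick(int cur, int req)
{
    return req == -1 ? cur : req;
}

/*
 * Apply a request to the current configuration. version is read-only
 * and ignored when setting, so it is always carried over from cur.
 */
static struct demo_plca_cfg demo_plca_apply(struct demo_plca_cfg cur,
                                            struct demo_plca_cfg req)
{
    struct demo_plca_cfg out = cur;

    out.enabled   = demo_pick(cur.enabled, req.enabled);
    out.node_id   = demo_pick(cur.node_id, req.node_id);
    out.node_cnt  = demo_pick(cur.node_cnt, req.node_cnt);
    out.to_tmr    = demo_pick(cur.to_tmr, req.to_tmr);
    out.burst_cnt = demo_pick(cur.burst_cnt, req.burst_cnt);
    out.burst_tmr = demo_pick(cur.burst_tmr, req.burst_tmr);
    return out;
}
```

A caller that wants to change only the node identifier would fill every other field of the request with -1.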

intgenphy_c45_plca_get_status(structphy_device*phydev,structphy_plca_status*plca_st)

get PLCA status from standard registers

Parameters

structphy_device*phydev

target phy_device struct

structphy_plca_status*plca_st

output structure to store the PLCA status

Description

If the PHY complies with the Open Alliance TC14 10BASE-T1S PLCA Management Registers specifications, this function can be used to retrieve the current PLCA status information from the standard registers in MMD 31.

intgenphy_c45_eee_is_active(structphy_device*phydev,unsignedlong*lp)

get EEE status

Parameters

structphy_device*phydev

target phy_device struct

unsignedlong*lp

variable to store LP advertised linkmodes

Description

This function will read link partner PHY advertisement and compare it to local advertisement to return current EEE state.

intgenphy_c45_ethtool_get_eee(structphy_device*phydev,structethtool_keee*data)

get EEE supported and status

Parameters

structphy_device*phydev

target phy_device struct

structethtool_keee*data

ethtool_keee data

Description

It reports the Supported/Advertisement/LP Advertisement capabilities.

intgenphy_c45_ethtool_set_eee(structphy_device*phydev,structethtool_keee*data)

set EEE supported and status

Parameters

structphy_device*phydev

target phy_device struct

structethtool_keee*data

ethtool_keee data

Description

Sets the Supported/Advertisement/LP Advertisement capabilities. If eee_enabled is false, no link modes are advertised, but the previously advertised link modes are retained. This allows EEE to be enabled/disabled in a non-destructive way.

Returns either error code, 0 if there was no change, or positive value if there was a change which triggered auto-neg.
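The non-destructive enable/disable behaviour amounts to keeping the stored advertisement separate from what is actually advertised. A minimal sketch, assuming a hypothetical configuration struct with a single bitmap of modes (the kernel uses link mode bitmaps and struct ethtool_keee, which are not reproduced here):

```c
#include <stdbool.h>

/* Hypothetical user configuration: an enable switch plus a mode bitmap. */
struct demo_eee_cfg {
    bool eee_enabled;
    unsigned long advertised;   /* retained even while EEE is disabled */
};

/* Modes actually advertised to the link partner: none while disabled. */
static unsigned long demo_eee_effective_adv(const struct demo_eee_cfg *cfg)
{
    return cfg->eee_enabled ? cfg->advertised : 0UL;
}
```

Because the stored bitmap is never cleared, re-enabling EEE brings back exactly the modes that were advertised before.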

intgenphy_c45_oatc14_cable_test_get_status(structphy_device*phydev,bool*finished)

Get status of OATC14 10Base-T1S PHY cable test.

Parameters

structphy_device*phydev

pointer to the PHY device structure

bool*finished

pointer to a boolean set true if the test is complete

Description

Retrieves the current status of the OATC14 10Base-T1S PHY cable test. This function reads the OATC14 HDD register to determine whether the test results are valid and whether the test has finished.

If the test is complete, the function reports the cable test result via the ethtool cable test interface using ethnl_cable_test_result(), and then clears the test control bit in the PHY register to reset the test state.

Return

0 on success, or a negative error code on failure (e.g. register read/write error).

intgenphy_c45_oatc14_cable_test_start(structphy_device*phydev)

Start a cable test on an OATC14 10Base-T1S PHY.

Parameters

structphy_device*phydev

Pointer to the PHY device structure

Description

This function initiates a cable diagnostic test on a Clause 45 OATC14 10Base-T1S capable PHY device. It first reads the PHY’s advanced diagnostic capability register to check if High Definition Diagnostics (HDD) mode is supported. If the PHY does not report HDD capability, cable testing is not supported and the function returns -EOPNOTSUPP.

For PHYs that support HDD, the function sets the appropriate control bits in the OATC14_HDD register to enable and start the cable diagnostic test.

Return

  • 0 on success

  • -EOPNOTSUPP if the PHY does not support HDD capability

  • A negative error code on I/O or register access failures

intgenphy_c45_oatc14_get_sqi_max(structphy_device*phydev)

Get maximum supported SQI or SQI+ level of OATC14 10Base-T1S PHY

Parameters

structphy_device*phydev

pointer to the PHY device structure

Description

This function returns the maximum supported Signal Quality Indicator (SQI) or SQI+ level. The SQI capability is updated on first invocation if it has not already been updated.

Return

  • Maximum SQI/SQI+ level supported

  • Negative errno on capability read failure

intgenphy_c45_oatc14_get_sqi(structphy_device*phydev)

Get Signal Quality Indicator (SQI) from an OATC14 10Base-T1S PHY

Parameters

structphy_device*phydev

pointer to the PHY device structure

Description

This function reads the SQI+ or SQI value from an OATC14-compatible 10Base-T1S PHY. If SQI+ capability is supported, the function returns the extended SQI+ value; otherwise, it returns the basic SQI value. The SQI capability is updated on first invocation if it has not already been updated.

Return

  • SQI/SQI+ value on success

  • Negative errno on read failure

enumphy_interface_t

Interface Mode definitions

Constants

PHY_INTERFACE_MODE_NA

Not Applicable - don’t touch

PHY_INTERFACE_MODE_INTERNAL

No interface, MAC and PHY combined

PHY_INTERFACE_MODE_MII

Media-independent interface

PHY_INTERFACE_MODE_GMII

Gigabit media-independent interface

PHY_INTERFACE_MODE_SGMII

Serial gigabit media-independent interface

PHY_INTERFACE_MODE_TBI

Ten Bit Interface

PHY_INTERFACE_MODE_REVMII

Reverse Media Independent Interface

PHY_INTERFACE_MODE_RMII

Reduced Media Independent Interface

PHY_INTERFACE_MODE_REVRMII

Reduced Media Independent Interface in PHY role

PHY_INTERFACE_MODE_RGMII

Reduced gigabit media-independent interface

PHY_INTERFACE_MODE_RGMII_ID

RGMII with Internal RX+TX delay

PHY_INTERFACE_MODE_RGMII_RXID

RGMII with Internal RX delay

PHY_INTERFACE_MODE_RGMII_TXID

RGMII with Internal TX delay

PHY_INTERFACE_MODE_RTBI

Reduced TBI

PHY_INTERFACE_MODE_SMII

Serial MII

PHY_INTERFACE_MODE_XGMII

10 gigabit media-independent interface

PHY_INTERFACE_MODE_XLGMII

40 gigabit media-independent interface

PHY_INTERFACE_MODE_MOCA

Multimedia over Coax

PHY_INTERFACE_MODE_PSGMII

Penta SGMII

PHY_INTERFACE_MODE_QSGMII

Quad SGMII

PHY_INTERFACE_MODE_TRGMII

Turbo RGMII

PHY_INTERFACE_MODE_100BASEX

100 BaseX

PHY_INTERFACE_MODE_1000BASEX

1000 BaseX

PHY_INTERFACE_MODE_2500BASEX

2500 BaseX

PHY_INTERFACE_MODE_5GBASER

5G BaseR

PHY_INTERFACE_MODE_RXAUI

Reduced XAUI

PHY_INTERFACE_MODE_XAUI

10 Gigabit Attachment Unit Interface

PHY_INTERFACE_MODE_10GBASER

10G BaseR

PHY_INTERFACE_MODE_25GBASER

25G BaseR

PHY_INTERFACE_MODE_USXGMII

Universal Serial 10GE MII

PHY_INTERFACE_MODE_10GKR

10GBASE-KR - with Clause 73 AN

PHY_INTERFACE_MODE_QUSGMII

Quad Universal SGMII

PHY_INTERFACE_MODE_1000BASEKX

1000Base-KX - with Clause 73 AN

PHY_INTERFACE_MODE_10G_QXGMII

10G-QXGMII - 4 ports over 10G USXGMII

PHY_INTERFACE_MODE_50GBASER

50GBase-R - with Clause 134 FEC

PHY_INTERFACE_MODE_LAUI

50 Gigabit Attachment Unit Interface

PHY_INTERFACE_MODE_100GBASEP

100GBase-P - with Clause 134 FEC

PHY_INTERFACE_MODE_MIILITE

MII-Lite - MII without RXER TXER CRS COL

PHY_INTERFACE_MODE_MAX

Book keeping

Description

Describes the interface between the MAC and PHY.

constchar*phy_modes(phy_interface_tinterface)

map phy_interface_t enum to device tree binding of phy-mode

Parameters

phy_interface_tinterface

enum phy_interface_t value

Description

Maps enum phy_interface_t defined in this file into the device tree binding of ‘phy-mode’, so that Ethernet device driver can get PHY interface from device tree.
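The mapping is a plain enum-to-string lookup. The sketch below covers only a handful of modes with a hypothetical local enum (demo_iface) instead of the real phy_interface_t; the strings are believed to match the phy-mode binding values, but the binding documentation should be checked before relying on them.

```c
/* Hypothetical subset of the interface modes, for illustration only. */
enum demo_iface {
    DEMO_MODE_NA,
    DEMO_MODE_MII,
    DEMO_MODE_GMII,
    DEMO_MODE_SGMII,
    DEMO_MODE_RGMII,
    DEMO_MODE_RGMII_ID,
};

/* Map a mode to its phy-mode string; unlisted values yield "unknown". */
static const char *demo_phy_modes(enum demo_iface mode)
{
    switch (mode) {
    case DEMO_MODE_NA:       return "";
    case DEMO_MODE_MII:      return "mii";
    case DEMO_MODE_GMII:     return "gmii";
    case DEMO_MODE_SGMII:    return "sgmii";
    case DEMO_MODE_RGMII:    return "rgmii";
    case DEMO_MODE_RGMII_ID: return "rgmii-id";
    default:                 return "unknown";
    }
}
```

Returning a string for every input, including out-of-range values, keeps callers free of NULL checks when printing or comparing modes.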

longrgmii_clock(intspeed)

map link speed to the clock rate

Parameters

intspeed

link speed value

Description

Maps RGMII supported link speeds into the clock rates. This can also be used for MII, GMII, and RMII interface modes as the clock rates are identical, but the caller must be aware that errors for unsupported clock rates will not be signalled.

Return

clock rate or negative errno
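A user-space sketch of the speed-to-clock mapping follows. It is an assumption-labelled model rather than the kernel function: demo_rgmii_clock() is a hypothetical name, the speed is given in Mbit/s, and the rates used are the conventional RGMII clocks of 2.5 MHz, 25 MHz and 125 MHz.

```c
#include <errno.h>

/* Map a link speed in Mbit/s to the RGMII clock rate in Hz. */
static long demo_rgmii_clock(int speed_mbps)
{
    switch (speed_mbps) {
    case 10:   return 2500000;    /* 2.5 MHz */
    case 100:  return 25000000;   /* 25 MHz */
    case 1000: return 125000000;  /* 125 MHz */
    default:   return -EINVAL;    /* not an RGMII speed */
    }
}
```

This also shows the caveat from the description: the function has no notion of the interface mode, so a caller using it for MII would still get 125 MHz for a speed of 1000 even though MII cannot run at that speed.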

structmdio_bus_stats

Statistics counters for MDIO busses

Definition:

struct mdio_bus_stats {
    u64_stats_t transfers;
    u64_stats_t errors;
    u64_stats_t writes;
    u64_stats_t reads;
    struct u64_stats_sync syncp;
};

Members

transfers

Total number of transfers, i.e. writes + reads

errors

Number of MDIO transfers that returned an error

writes

Number of write transfers

reads

Number of read transfers

syncp

Synchronisation for incrementing statistics

structmii_bus

Represents an MDIO bus

Definition:

struct mii_bus {
    struct module *owner;
    const char *name;
    char id[MII_BUS_ID_SIZE];
    void *priv;
    int (*read)(struct mii_bus *bus, int addr, int regnum);
    int (*write)(struct mii_bus *bus, int addr, int regnum, u16 val);
    int (*read_c45)(struct mii_bus *bus, int addr, int devnum, int regnum);
    int (*write_c45)(struct mii_bus *bus, int addr, int devnum, int regnum, u16 val);
    int (*reset)(struct mii_bus *bus);
    struct mdio_bus_stats stats[PHY_MAX_ADDR];
    struct mutex mdio_lock;
    struct device *parent;
    enum {
        MDIOBUS_ALLOCATED = 1,
        MDIOBUS_REGISTERED,
        MDIOBUS_UNREGISTERED,
        MDIOBUS_RELEASED,
    } state;
    struct device dev;
    struct mdio_device *mdio_map[PHY_MAX_ADDR];
    u32 phy_mask;
    u32 phy_ignore_ta_mask;
    int irq[PHY_MAX_ADDR];
    int reset_delay_us;
    int reset_post_delay_us;
    struct gpio_desc *reset_gpiod;
    struct mutex shared_lock;
#if IS_ENABLED(CONFIG_PHY_PACKAGE)
    struct phy_package_shared *shared[PHY_MAX_ADDR];
#endif
};

Members

owner

Who owns this device

name

User friendly name for this MDIO device, or driver name

id

Unique identifier for this bus, typically from bus hierarchy

priv

Driver private data

read

Perform a read transfer on the bus

write

Perform a write transfer on the bus

read_c45

Perform a C45 read transfer on the bus

write_c45

Perform a C45 write transfer on the bus

reset

Perform a reset of the bus

stats

Statistic counters per device on the bus

mdio_lock

A lock to ensure that only one thing can read/write the MDIO bus at a time

parent

Parent device of this bus

state

State of bus structure

dev

Kernel device representation

mdio_map

list of all MDIO devices on bus

phy_mask

PHY addresses to be ignored when probing

phy_ignore_ta_mask

PHY addresses to ignore the TA/read failure

irq

An array of interrupts, each PHY’s interrupt at the index matching its address

reset_delay_us

GPIO reset pulse width in microseconds

reset_post_delay_us

GPIO reset deassert delay in microseconds

reset_gpiod

Reset GPIO descriptor pointer

shared_lock

protect access to the shared element

shared

shared state across different PHYs

Description

The Bus class for PHYs. Devices which provide access to PHYs should register using this structure

structmii_bus*mdiobus_alloc(void)

Allocate an MDIO bus structure

Parameters

void

no arguments

Description

The internal state of the MDIO bus will be set to MDIOBUS_ALLOCATED, ready for the driver to register the bus.

enumphy_state

PHY state machine states:

Constants

PHY_DOWN

PHY device and driver are not ready for anything. probe should be called if and only if the PHY is in this state, given that the PHY device exists.

  • PHY driver probe function will set the state to PHY_READY

PHY_READY

PHY is ready to send and receive packets, but the controller is not. By default, PHYs which do not implement probe will be set to this state by phy_probe().

  • start will set the state to UP

PHY_HALTED

PHY is up, but no polling or interrupts are done.

  • phy_start moves to PHY_UP

PHY_ERROR

PHY is up, but is in an error state.

  • phy_stop moves to PHY_HALTED

PHY_UP

The PHY and attached device are ready to do work. Interrupts should be started here.

  • timer moves to PHY_NOLINK or PHY_RUNNING

PHY_RUNNING

PHY is currently up, running, and possibly sending and/or receiving packets

  • irq or timer will set PHY_NOLINK if link goes down

  • phy_stop moves to PHY_HALTED

PHY_NOLINK

PHY is up, but not currently plugged in.

  • irq or timer will set PHY_RUNNING if link comes back

  • phy_stop moves to PHY_HALTED

PHY_CABLETEST

PHY is performing a cable test. Packet reception/sending is not expected to work, carrier will be indicated as down. PHY will be polled once per second, or on interrupt, for its current state. Once complete, move to UP to restart the PHY.

  • phy_stop aborts the running test and moves to PHY_HALTED
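The transitions listed for each state can be collected into a single next-state function. The sketch below models only the transitions named in the constant descriptions; the demo_ enums and the event names are hypothetical, and any state/event pair not mentioned there leaves the state unchanged, which may differ from what the kernel state machine actually does.

```c
enum demo_phy_state {
    DEMO_PHY_DOWN, DEMO_PHY_READY, DEMO_PHY_HALTED, DEMO_PHY_ERROR,
    DEMO_PHY_UP, DEMO_PHY_RUNNING, DEMO_PHY_NOLINK, DEMO_PHY_CABLETEST,
};

/* Hypothetical events standing in for probe, phy_start, phy_stop, etc. */
enum demo_phy_event {
    DEMO_EV_PROBE, DEMO_EV_START, DEMO_EV_STOP,
    DEMO_EV_LINK_UP, DEMO_EV_LINK_DOWN, DEMO_EV_TEST_DONE,
};

static enum demo_phy_state demo_phy_next(enum demo_phy_state s,
                                         enum demo_phy_event ev)
{
    switch (ev) {
    case DEMO_EV_PROBE:
        return s == DEMO_PHY_DOWN ? DEMO_PHY_READY : s;
    case DEMO_EV_START:
        return (s == DEMO_PHY_READY || s == DEMO_PHY_HALTED)
               ? DEMO_PHY_UP : s;
    case DEMO_EV_STOP:
        /* Only the states whose description mentions phy_stop. */
        if (s == DEMO_PHY_ERROR || s == DEMO_PHY_RUNNING ||
            s == DEMO_PHY_NOLINK || s == DEMO_PHY_CABLETEST)
            return DEMO_PHY_HALTED;
        return s;
    case DEMO_EV_LINK_UP:
        return (s == DEMO_PHY_UP || s == DEMO_PHY_NOLINK)
               ? DEMO_PHY_RUNNING : s;
    case DEMO_EV_LINK_DOWN:
        return (s == DEMO_PHY_UP || s == DEMO_PHY_RUNNING)
               ? DEMO_PHY_NOLINK : s;
    case DEMO_EV_TEST_DONE:
        return s == DEMO_PHY_CABLETEST ? DEMO_PHY_UP : s;
    }
    return s;
}
```

Writing the table this way makes the typical bring-up path visible: DOWN, READY, UP, then RUNNING or NOLINK depending on the link.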

structphy_c45_device_ids

802.3-c45 Device Identifiers

Definition:

struct phy_c45_device_ids {
    u32 devices_in_package;
    u32 mmds_present;
    u32 device_ids[MDIO_MMD_NUM];
};

Members

devices_in_package

IEEE 802.3 devices in package register value.

mmds_present

bit vector of MMDs present.

device_ids

The device identifier for each present device.
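Testing whether a given MMD is present is a single bit test on mmds_present. The helper below is a sketch that assumes bit N of the vector corresponds to MMD number N; demo_mmd_present() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bit devad is set in the mmds_present bit vector. */
static bool demo_mmd_present(uint32_t mmds_present, unsigned int devad)
{
    /* Guard the shift: only 32 device addresses fit in a u32. */
    return devad < 32 && (mmds_present & (UINT32_C(1) << devad)) != 0;
}
```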

structphy_oatc14_sqi_capability

SQI capability information for OATC14 10Base-T1S PHY

Definition:

struct phy_oatc14_sqi_capability {
    bool updated;
    int sqi_max;
    u8 sqiplus_bits;
};

Members

updated

Indicates whether the SQI capability fields have been updated.

sqi_max

Maximum supported Signal Quality Indicator (SQI) level reported by the PHY.

sqiplus_bits

Bits for SQI+ levels supported by the PHY.

  • 0 - SQI+ is not supported

  • 3 - SQI+ is supported, using 3 bits (8 levels)

  • 4 - SQI+ is supported, using 4 bits (16 levels)

  • 5 - SQI+ is supported, using 5 bits (32 levels)

  • 6 - SQI+ is supported, using 6 bits (64 levels)

  • 7 - SQI+ is supported, using 7 bits (128 levels)

  • 8 - SQI+ is supported, using 8 bits (256 levels)

Description

This structure is used by the OATC14 10Base-T1S PHY driver to store the SQI and SQI+ capability information retrieved from the PHY.
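The number of SQI+ levels follows directly from sqiplus_bits as a power of two. The helper below is a sketch with a hypothetical name; it treats every value outside the documented range of 3 to 8 (including 0, 1 and 2) as "not supported", which is an assumption for the values the description does not list.

```c
#include <stdint.h>

/* Number of SQI+ levels for a given sqiplus_bits value, 0 if unsupported. */
static unsigned int demo_sqiplus_levels(uint8_t sqiplus_bits)
{
    if (sqiplus_bits < 3 || sqiplus_bits > 8)
        return 0;
    return 1u << sqiplus_bits;   /* e.g. 3 bits give 8 levels */
}
```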

structphy_device

An instance of a PHY

Definition:

struct phy_device {
    struct mdio_device mdio;
    const struct phy_driver *drv;
    struct device_link *devlink;
    u32 phyindex;
    u32 phy_id;
    struct phy_c45_device_ids c45_ids;
    unsigned is_c45:1;
    unsigned is_internal:1;
    unsigned is_pseudo_fixed_link:1;
    unsigned is_gigabit_capable:1;
    unsigned has_fixups:1;
    unsigned suspended:1;
    unsigned suspended_by_mdio_bus:1;
    unsigned sysfs_links:1;
    unsigned loopback_enabled:1;
    unsigned downshifted_rate:1;
    unsigned is_on_sfp_module:1;
    unsigned mac_managed_pm:1;
    unsigned wol_enabled:1;
    unsigned is_genphy_driven:1;
    unsigned autoneg:1;
    unsigned link:1;
    unsigned autoneg_complete:1;
    bool pause:1;
    bool asym_pause:1;
    unsigned interrupts:1;
    unsigned irq_suspended:1;
    unsigned irq_rerun:1;
    unsigned default_timestamp:1;
    int rate_matching;
    enum phy_state state;
    u32 dev_flags;
    phy_interface_t interface;
    unsigned long possible_interfaces[BITS_TO_LONGS(PHY_INTERFACE_MODE_MAX)];
    int speed;
    int duplex;
    int port;
    u8 master_slave_get;
    u8 master_slave_set;
    u8 master_slave_state;
    unsigned long supported[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    unsigned long advertising[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    unsigned long lp_advertising[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    unsigned long adv_old[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    unsigned long supported_eee[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    unsigned long advertising_eee[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    unsigned long eee_disabled_modes[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    bool enable_tx_lpi;
    bool eee_active;
    struct eee_config eee_cfg;
    unsigned long host_interfaces[BITS_TO_LONGS(PHY_INTERFACE_MODE_MAX)];
#ifdef CONFIG_LED_TRIGGER_PHY
    struct phy_led_trigger *phy_led_triggers;
    unsigned int phy_num_led_triggers;
    struct phy_led_trigger *last_triggered;
    struct phy_led_trigger *led_link_trigger;
#endif
    struct list_head leds;
    int irq;
    void *priv;
#if IS_ENABLED(CONFIG_PHY_PACKAGE)
    struct phy_package_shared *shared;
#endif
    struct sk_buff *skb;
    void *ehdr;
    struct nlattr *nest;
    struct delayed_work state_queue;
    struct mutex lock;
    bool sfp_bus_attached;
    struct sfp_bus *sfp_bus;
    struct phylink *phylink;
    struct net_device *attached_dev;
    struct mii_timestamper *mii_ts;
    struct pse_control *psec;
    u8 mdix;
    u8 mdix_ctrl;
    int pma_extable;
    unsigned int link_down_events;
    void (*phy_link_change)(struct phy_device *phydev, bool up);
    void (*adjust_link)(struct net_device *dev);
#if IS_ENABLED(CONFIG_MACSEC)
    const struct macsec_ops *macsec_ops;
#endif
    struct phy_oatc14_sqi_capability oatc14_sqi_capability;
};

Members

mdio

MDIO bus this PHY is on

drv

Pointer to the driver for this PHY instance

devlink

Create a link between phy dev and mac dev, if the external phy used by current mac interface is managed by another mac interface.

phyindex

Unique id across the phy’s parent tree of phys to address the PHY from userspace, similar to ifindex. A zero index means the PHY wasn’t assigned an id yet.

phy_id

UID for this device found during discovery

c45_ids

802.3-c45 Device Identifiers if is_c45.

is_c45

Set to true if this PHY uses clause 45 addressing.

is_internal

Set to true if this PHY is internal to a MAC.

is_pseudo_fixed_link

Set to true if this PHY is an Ethernet switch, etc.

is_gigabit_capable

Set to true if PHY supports 1000Mbps

has_fixups

Set to true if this PHY has fixups/quirks.

suspended

Set to true if this PHY has been suspended successfully.

suspended_by_mdio_bus

Set to true if this PHY was suspended by MDIO bus.

sysfs_links

Internal boolean tracking sysfs symbolic links setup/removal.

loopback_enabled

Set true if this PHY has been loopbacked successfully.

downshifted_rate

Set true if link speed has been downshifted.

is_on_sfp_module

Set true if PHY is located on an SFP module.

mac_managed_pm

Set true if MAC driver takes care of suspending/resuming PHY

wol_enabled

Set to true if the PHY or the attached MAC have Wake-on-LAN enabled.

is_genphy_driven

PHY is driven by one of the generic PHY drivers

autoneg

Flag autoneg being used

link

Current link state

autoneg_complete

Flag auto negotiation of the link has completed

pause

Current pause

asym_pause

Current asymmetric pause

interrupts

Flag interrupts have been enabled

irq_suspended

Flag indicating PHY is suspended and therefore interrupt handling shall be postponed until PHY has resumed

irq_rerun

Flag indicating interrupts occurred while PHY was suspended, requiring a rerun of the interrupt handler after resume

default_timestamp

Flag indicating whether we are using the phy timestamp as the default one

rate_matching

Current rate matching mode

state

State of the PHY for management purposes

dev_flags

Device-specific flags used by the PHY driver.

interface

enum phy_interface_t value

possible_interfaces

bitmap of interface modes that the attached PHY will switch between depending on media speed.

speed

Current link speed

duplex

Current duplex

port

Current port

master_slave_get

Current master/slave advertisement

master_slave_set

User requested master/slave configuration

master_slave_state

Current master/slave configuration

supported

Combined MAC/PHY supported linkmodes

advertising

Currently advertised linkmodes

lp_advertising

Current link partner advertised linkmodes

adv_old

Saved advertised while power saving for WoL

supported_eee

supported PHY EEE linkmodes

advertising_eee

Currently advertised EEE linkmodes

eee_disabled_modes

Energy efficient ethernet modes not to be advertised

enable_tx_lpi

When True, MAC should transmit LPI to PHY

eee_active

phylib private state, indicating that EEE has been negotiated

eee_cfg

User configuration of EEE

host_interfaces

PHY interface modes supported by host

phy_led_triggers

Array of LED triggers

phy_num_led_triggers

Number of triggers inphy_led_triggers

last_triggered

last LED trigger for link speed

led_link_trigger

LED trigger for link up/down

leds

list of PHY LED structures

irq

IRQ number of the PHY’s interrupt (-1 if none)

priv

Pointer to driver private data

shared

Pointer to private data shared by phys in one package

skb

Netlink message for cable diagnostics

ehdr

Netlink header for cable diagnostics

nest

Netlink nest used for cable diagnostics

state_queue

Work queue for state machine

lock

Mutex for serializing access to PHY

sfp_bus_attached

Flag indicating whether the SFP bus has been attached

sfp_bus

SFP bus attached to this PHY’s fiber port

phylink

Pointer to phylink instance for this PHY

attached_dev

The attached enet driver’s device instance ptr

mii_ts

Pointer to time stamper callbacks

psec

Pointer to Power Sourcing Equipment control struct

mdix

Current crossover

mdix_ctrl

User setting of crossover

pma_extable

Cached value of PMA/PMD Extended Abilities Register

link_down_events

Number of times link was lost

phy_link_change

Callback for phylink for notification of link change

adjust_link

Callback for the enet controller to respond to changes in the link state.

macsec_ops

MACsec offloading ops.

oatc14_sqi_capability

SQI capability information for OATC14 10Base-T1S PHY

Description

The bits of dev_flags are allocated as follows:

  • Bits [15:0] are free to use by the PHY driver to communicate driver specific behavior.

  • Bits [23:16] are currently reserved for future use.

  • Bits [31:24] are reserved for defining generic PHY driver behavior.
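The three bit ranges listed above partition the 32-bit dev_flags word. They can be written as masks, as in the following sketch; the DEMO_ macro names and helper functions are hypothetical and are not defined by the kernel headers.

```c
#include <stdint.h>

#define DEMO_FLAGS_DRIVER_MASK    UINT32_C(0x0000ffff)  /* bits [15:0]  */
#define DEMO_FLAGS_RESERVED_MASK  UINT32_C(0x00ff0000)  /* bits [23:16] */
#define DEMO_FLAGS_GENERIC_MASK   UINT32_C(0xff000000)  /* bits [31:24] */

/* Extract the driver-specific part of a flags word. */
static uint32_t demo_driver_flags(uint32_t dev_flags)
{
    return dev_flags & DEMO_FLAGS_DRIVER_MASK;
}

/* Extract the generic part, shifted down to bits [7:0]. */
static uint32_t demo_generic_flags(uint32_t dev_flags)
{
    return (dev_flags & DEMO_FLAGS_GENERIC_MASK) >> 24;
}
```

A PHY driver defining its own flags would keep them within the low 16 bits so that they cannot collide with generic flags in the top byte.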

interrupts currently only supports enabled or disabled, but could be changed in the future to support enabling and disabling specific interrupts

Contains some infrastructure for polling and interrupt handling, as well as handling shifts in PHY hardware state

struct phy_tdr_config

Configuration of a TDR raw test

Definition:

struct phy_tdr_config {    u32 first;    u32 last;    u32 step;    s8 pair;};

Members

first

Distance for first data collection point

last

Distance for last data collection point

step

Step between data collection points

pair

Bitmap of cable pairs to collect data for

Description

A structure containing possible configuration parameters for a TDR cable test. The driver does not need to implement all the parameters, but should report what is actually used. All distances are in centimeters.
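As a rough illustration of how first, last and step relate, the following userspace sketch enumerates the distances a TDR sweep would sample. The struct demo_tdr_config type and the demo_tdr_points() helper are hypothetical stand-ins that only mirror the members documented above. Treating last as inclusive is an assumption, and a real driver may round or clamp these values to what its hardware supports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the documented members (not the kernel type). */
struct demo_tdr_config {
	uint32_t first;	/* distance of the first data collection point, in cm */
	uint32_t last;	/* distance of the last data collection point, in cm */
	uint32_t step;	/* step between data collection points, in cm */
	int8_t pair;	/* cable pair selection, unused in this sketch */
};

/*
 * Fill out[] with at most max sample distances, starting at first and
 * advancing by step while staying at or below last (assumed inclusive).
 * Returns the number of points written, or 0 for an unusable configuration.
 */
size_t demo_tdr_points(const struct demo_tdr_config *cfg, uint32_t *out,
		       size_t max)
{
	size_t n = 0;
	uint32_t d;

	if (cfg->step == 0 || cfg->last < cfg->first)
		return 0;

	d = cfg->first;
	while (n < max) {
		out[n++] = d;
		/* Stop before d + step would exceed last (also avoids wrap-around). */
		if (cfg->last - d < cfg->step)
			break;
		d += cfg->step;
	}
	return n;
}
```

With first = 100, last = 500 and step = 200 the sketch yields three points at 100 cm, 300 cm and 500 cm.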

enum link_inband_signalling

in-band signalling modes that are supported

Constants

LINK_INBAND_DISABLE

in-band signalling can be disabled

LINK_INBAND_ENABLE

in-band signalling can be enabled without bypass

LINK_INBAND_BYPASS

in-band signalling can be enabled with bypass

Description

The possible and required bits can only be used if the valid bit is set. If possible is clear, that means inband signalling cannot be used. Required is only valid when possible is set, and means that inband signalling must be used.

struct phy_plca_cfg

Configuration of the PLCA (Physical Layer Collision Avoidance) Reconciliation Sublayer.

Definition:

struct phy_plca_cfg {    int version;    int enabled;    int node_id;    int node_cnt;    int to_tmr;    int burst_cnt;    int burst_tmr;};

Members

version

read-only PLCA register map version. -1 = not available. Ignored when setting the configuration. Format is the same as reported by the PLCA IDVER register (31.CA00).

enabled

PLCA configured mode (enabled/disabled). -1 = not available / don’t set. 0 = disabled, anything else = enabled.

node_id

the PLCA local node identifier. -1 = not available / don’t set. Allowed values [0 .. 254]. 255 = node disabled.

node_cnt

the PLCA node count (maximum number of nodes having a TO). Only meaningful for the coordinator (node_id = 0). -1 = not available / don’t set. Allowed values [1 .. 255].

to_tmr

The value of the PLCA to_timer in bit-times, which determines the PLCA transmit opportunity window opening. See IEEE 802.3 Clause 148 for more details. The to_timer shall be set equal over all nodes. -1 = not available / don’t set. Allowed values [0 .. 255].

burst_cnt

controls how many additional frames a node is allowed to send in a single transmit opportunity (TO). The default value of 0 means that the node is allowed exactly one frame per TO. A value of 1 allows two frames per TO, and so on. -1 = not available / don’t set. Allowed values [0 .. 255].

burst_tmr

controls how many bit times to wait for the MAC to send a new frame before interrupting the burst. This value should be set to a value greater than the MAC inter-packet gap (which is typically 96 bits). -1 = not available / don’t set. Allowed values [0 .. 255].

Description

A structure containing configuration parameters for setting/getting the PLCA RS configuration. The driver does not need to implement all the parameters, but should report what is actually used.
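The value ranges and the "-1 = not available / don’t set" convention listed for each member can be checked mechanically. The sketch below is a userspace illustration only: struct demo_plca_cfg, demo_plca_cfg_validate() and demo_plca_frames_per_to() are hypothetical names, and returning -EINVAL for an out-of-range value is an assumption rather than documented core behaviour.

```c
#include <errno.h>

/* Hypothetical userspace mirror of the documented members (not the kernel type). */
struct demo_plca_cfg {
	int version;
	int enabled;
	int node_id;
	int node_cnt;
	int to_tmr;
	int burst_cnt;
	int burst_tmr;
};

/* A member is acceptable when it is -1 ("don't set") or inside [lo, hi]. */
static int demo_unset_or_in_range(int v, int lo, int hi)
{
	return v == -1 || (v >= lo && v <= hi);
}

/*
 * Check a configuration against the ranges given in the member list.
 * version is read-only and ignored when setting; enabled accepts any value
 * (0 = disabled, anything else = enabled), so neither is checked here.
 */
int demo_plca_cfg_validate(const struct demo_plca_cfg *cfg)
{
	/* 0..254 are node identifiers, 255 means "node disabled". */
	if (!demo_unset_or_in_range(cfg->node_id, 0, 255))
		return -EINVAL;
	if (!demo_unset_or_in_range(cfg->node_cnt, 1, 255))
		return -EINVAL;
	if (!demo_unset_or_in_range(cfg->to_tmr, 0, 255))
		return -EINVAL;
	if (!demo_unset_or_in_range(cfg->burst_cnt, 0, 255))
		return -EINVAL;
	if (!demo_unset_or_in_range(cfg->burst_tmr, 0, 255))
		return -EINVAL;
	return 0;
}

/* burst_cnt counts additional frames, so frames per TO is burst_cnt + 1. */
int demo_plca_frames_per_to(int burst_cnt)
{
	return burst_cnt < 0 ? -1 : burst_cnt + 1;
}
```

A caller that only wants to change one parameter would set every other member to -1 before passing the structure on.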

struct phy_plca_status

Status of the PLCA (Physical Layer Collision Avoidance) Reconciliation Sublayer.

Definition:

struct phy_plca_status {    bool pst;};

Members

pst

The PLCA status as reported by the PST bit in the PLCA STATUS register (31.CA03), indicating BEACON activity.

Description

A structure containing status information of the PLCA RS configuration. The driver does not need to implement all the parameters, but should report what is actually used.

struct phy_led

An LED driven by the PHY

Definition:

struct phy_led {    struct list_head list;    struct phy_device *phydev;    struct led_classdev led_cdev;    u8 index;};

Members

list

List of LEDs

phydev

PHY this LED is attached to

led_cdev

Standard LED class structure

index

Number of the LED

struct phy_mse_capability

Capabilities of Mean Square Error (MSE) measurement interface

Definition:

struct phy_mse_capability {    u64 max_average_mse;    u64 max_peak_mse;    u64 refresh_rate_ps;    u64 num_symbols;    u32 supported_caps;};

Members

max_average_mse

The maximum value for an average MSE snapshot. This defines the scale for the measurement. If the PHY_MSE_CAP_AVG capability is supported, this value MUST be greater than 0. (vendor-specific units).

max_peak_mse

The maximum value for a peak MSE snapshot. If either PHY_MSE_CAP_PEAK or PHY_MSE_CAP_WORST_PEAK is supported, this value MUST be greater than 0. (vendor-specific units).

refresh_rate_ps

The typical interval, in picoseconds, between hardware updates of the MSE values. This is an estimate, and callers should not assume synchronous sampling. (vendor-specific units).

num_symbols

The number of symbols aggregated per hardware sample to calculate the MSE. (vendor-specific units).

supported_caps

A bitmask of PHY_MSE_CAP_* values indicating which measurement types (e.g., average, peak) and channels (e.g., per-pair or link-wide) are supported.

Description

Standardization notes:

  • Presence of MSE/SQI/pMSE is defined by OPEN Alliance specs, but numeric scaling, refresh/update rate and aggregation windows are not fixed and are vendor-/product-specific. (OA 100BASE-T1 TC1 v1.0 6.1.*; OA 1000BASE-T1 TC12 v2.2 6.1.*)

  • Typical recommendations: 2^16 symbols and 0..511 scaling for MSE; pMSE only defined for 100BASE-T1 (sliding window example), others are vendor extensions. Drivers must report actual scale/limits here.

Describes the MSE measurement capabilities for the current link mode. These properties are dynamic and may change when link settings are modified. Callers should re-query this capability after any link state change to ensure they have the most up-to-date information.

Callers should only request measurements for channels and types that are indicated as supported by the supported_caps bitmask. If supported_caps is 0, the device provides no MSE diagnostics, and driver operations should typically return -EOPNOTSUPP.

Snapshot values for average and peak MSE can be normalized to a 0..1 ratio by dividing the raw snapshot by the corresponding max_average_mse or max_peak_mse value.
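The normalization rule can be sketched as follows. This is a userspace illustration (kernel code would avoid floating point): struct demo_mse_capability and both helpers are hypothetical names, and DEMO_MSE_CAP_AVG is a placeholder bit whose value is not taken from the kernel headers. Clamping a raw value that exceeds the reported maximum is also an assumption.

```c
#include <stdint.h>

/* Placeholder capability bit for illustration; NOT the kernel's value. */
#define DEMO_MSE_CAP_AVG	(1u << 0)

/* Hypothetical userspace mirror of the members used in this sketch. */
struct demo_mse_capability {
	uint64_t max_average_mse;
	uint64_t max_peak_mse;
	uint32_t supported_caps;
};

/*
 * Divide a raw snapshot by the matching maximum to get a 0..1 ratio.
 * Returns a negative value when the scale is unusable (max == 0).
 */
double demo_mse_normalize(uint64_t raw, uint64_t max)
{
	if (max == 0)
		return -1.0;
	if (raw > max)
		raw = max;	/* assumption: clamp out-of-scale readings */
	return (double)raw / (double)max;
}

/* Only interpret average_mse when the capability mask says it is supported. */
double demo_mse_average_ratio(const struct demo_mse_capability *cap,
			      uint64_t average_mse)
{
	if (!(cap->supported_caps & DEMO_MSE_CAP_AVG))
		return -1.0;
	return demo_mse_normalize(average_mse, cap->max_average_mse);
}
```

Because the capability is documented as dynamic, a caller would fetch a fresh capability structure after each link change before normalizing new snapshots.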

struct phy_mse_snapshot

A snapshot of Mean Square Error (MSE) diagnostics

Definition:

struct phy_mse_snapshot {    u64 average_mse;    u64 peak_mse;    u64 worst_peak_mse;};

Members

average_mse

The average MSE value over the measurement window. OPEN Alliance references MSE as a DCQ metric; recommends 2^16 symbols and 0..511 scaling. Exact scale and refresh are vendor-specific. (100BASE-T1 TC1 v1.0 6.1.1; 1000BASE-T1 TC12 v2.2 6.1.1).

peak_mse

The peak MSE value observed within the measurement window. For 100BASE-T1, “pMSE” is optional and may be implemented via a sliding 128-symbol window with periodic capture; not standardized for 1000BASE-T1. (100BASE-T1 TC1 v1.0 6.1.3, Table “DCQ.peakMSE”).

worst_peak_mse

A latched high-water mark of the peak MSE since last read (read-to-clear if implemented). OPEN Alliance shows a latched “worst case peak MSE” for 100BASE-T1 pMSE; availability/semantics outside that are vendor-specific. (100BASE-T1 TC1 v1.0 6.1.3, DCQ.peakMSE high byte; 1000BASE-T1 TC12 v2.2 treats DCQ details as vendor-specific.)

Description

Holds a set of MSE diagnostic values that were all captured from a single measurement window.

Values are raw, device-scaled and not normalized. Use struct phy_mse_capability to interpret the scale and sampling window.

struct phy_driver

Driver structure for a particular PHY type

Definition:

struct phy_driver {    struct mdio_driver_common mdiodrv;    u32 phy_id;    char *name;    u32 phy_id_mask;    const unsigned long * const features;    u32 flags;    const void *driver_data;    int (*soft_reset)(struct phy_device *phydev);    int (*config_init)(struct phy_device *phydev);    int (*probe)(struct phy_device *phydev);    int (*get_features)(struct phy_device *phydev);    unsigned int (*inband_caps)(struct phy_device *phydev, phy_interface_t interface);    int (*config_inband)(struct phy_device *phydev, unsigned int modes);    int (*get_rate_matching)(struct phy_device *phydev, phy_interface_t iface);    int (*suspend)(struct phy_device *phydev);    int (*resume)(struct phy_device *phydev);    int (*config_aneg)(struct phy_device *phydev);    int (*aneg_done)(struct phy_device *phydev);    int (*read_status)(struct phy_device *phydev);    int (*config_intr)(struct phy_device *phydev);    irqreturn_t (*handle_interrupt)(struct phy_device *phydev);    void (*remove)(struct phy_device *phydev);    int (*match_phy_device)(struct phy_device *phydev, const struct phy_driver *phydrv);    int (*set_wol)(struct phy_device *dev, struct ethtool_wolinfo *wol);    void (*get_wol)(struct phy_device *dev, struct ethtool_wolinfo *wol);    void (*link_change_notify)(struct phy_device *dev);    int (*read_mmd)(struct phy_device *dev, int devnum, u16 regnum);    int (*write_mmd)(struct phy_device *dev, int devnum, u16 regnum, u16 val);    int (*read_page)(struct phy_device *dev);    int (*write_page)(struct phy_device *dev, int page);    int (*module_info)(struct phy_device *dev, struct ethtool_modinfo *modinfo);    int (*module_eeprom)(struct phy_device *dev, struct ethtool_eeprom *ee, u8 *data);    int (*cable_test_start)(struct phy_device *dev);    int (*cable_test_tdr_start)(struct phy_device *dev, const struct phy_tdr_config *config);    int (*cable_test_get_status)(struct phy_device *dev, bool *finished);    void (*get_phy_stats)(struct phy_device *dev, struct 
ethtool_eth_phy_stats *eth_stats, struct ethtool_phy_stats *stats);    void (*get_link_stats)(struct phy_device *dev, struct ethtool_link_ext_stats *link_stats);    int (*update_stats)(struct phy_device *dev);    int (*get_sset_count)(struct phy_device *dev);    void (*get_strings)(struct phy_device *dev, u8 *data);    void (*get_stats)(struct phy_device *dev, struct ethtool_stats *stats, u64 *data);    int (*get_tunable)(struct phy_device *dev, struct ethtool_tunable *tuna, void *data);    int (*set_tunable)(struct phy_device *dev, struct ethtool_tunable *tuna, const void *data);    int (*set_loopback)(struct phy_device *dev, bool enable, int speed);    int (*get_sqi)(struct phy_device *dev);    int (*get_sqi_max)(struct phy_device *dev);    int (*get_mse_capability)(struct phy_device *dev, struct phy_mse_capability *cap);    int (*get_mse_snapshot)(struct phy_device *dev, enum phy_mse_channel channel, struct phy_mse_snapshot *snapshot);    int (*get_plca_cfg)(struct phy_device *dev, struct phy_plca_cfg *plca_cfg);    int (*set_plca_cfg)(struct phy_device *dev, const struct phy_plca_cfg *plca_cfg);    int (*get_plca_status)(struct phy_device *dev, struct phy_plca_status *plca_st);    int (*led_brightness_set)(struct phy_device *dev, u8 index, enum led_brightness value);    int (*led_blink_set)(struct phy_device *dev, u8 index, unsigned long *delay_on, unsigned long *delay_off);    int (*led_hw_is_supported)(struct phy_device *dev, u8 index, unsigned long rules);    int (*led_hw_control_set)(struct phy_device *dev, u8 index, unsigned long rules);    int (*led_hw_control_get)(struct phy_device *dev, u8 index, unsigned long *rules);    int (*led_polarity_set)(struct phy_device *dev, int index, unsigned long modes);    unsigned int (*get_next_update_time)(struct phy_device *dev);};

Members

mdiodrv

Data common to all MDIO devices

phy_id

The result of reading the UID registers of this PHY type, and ANDing them with the phy_id_mask. This driver only works for PHYs with IDs which match this field

name

The friendly name of this PHY type

phy_id_mask

Defines the important bits of the phy_id

features

A mandatory list of features (speed, duplex, etc) supported by this PHY

flags

A bitfield defining certain other features this PHY supports (like interrupts)

driver_data

Static driver data

soft_reset

Called to issue a PHY software reset

config_init

Called to initialize the PHY, including after a reset

probe

Called during discovery. Used to set up device-specific structures, if any

get_features

Probe the hardware to determine what abilities it has. Should only set phydev->supported.

inband_caps

query whether in-band is supported for the given PHY interface mode. Returns a bitmask of bits defined by enum link_inband_signalling.

config_inband

configure in-band mode for the PHY

get_rate_matching

Get the supported type of rate matching for a particular phy interface. This is used by phy consumers to determine whether to advertise lower-speed modes for that interface. It is assumed that if a rate matching mode is supported on an interface, then that interface’s rate can be adapted to all slower link speeds supported by the phy. If the interface is not supported, this should return RATE_MATCH_NONE.

suspend

Suspend the hardware, saving state if needed

resume

Resume the hardware, restoring state if needed

config_aneg

Configures the advertisement and resets autonegotiation if phydev->autoneg is on; forces the speed to the current settings in phydev if phydev->autoneg is off

aneg_done

Determines the auto negotiation result

read_status

Determines the negotiated speed and duplex

config_intr

Enables or disables interrupts. It should also clear any pending interrupts prior to enabling the IRQs and after disabling them.

handle_interrupt

Override default interrupt handling

remove

Clears up any memory if needed

match_phy_device

Returns true if this is a suitable driver for the given phydev. If NULL, matching is based on phy_id and phy_id_mask.

set_wol

Some devices (e.g. qnap TS-119P II) require PHY register changes to enable Wake on LAN, so set_wol is provided to be called in the ethernet driver’s set_wol function.

get_wol

See set_wol, but for checking whether Wake on LAN is enabled.

link_change_notify

Called to inform a PHY device driver when the core is about to change the link state. This callback is supposed to be used as fixup hook for drivers that need to take action when the link state changes. Drivers are by no means allowed to mess with the PHY device structure in their implementations.

read_mmd

PHY specific driver override for reading a MMD register. This function is optional for PHY specific drivers. When not provided, the default MMD read function will be used by phy_read_mmd(), which will use either a direct read for Clause 45 PHYs or an indirect read for Clause 22 PHYs. devnum is the MMD device number within the PHY device, regnum is the register within the selected MMD device.

write_mmd

PHY specific driver override for writing a MMD register. This function is optional for PHY specific drivers. When not provided, the default MMD write function will be used by phy_write_mmd(), which will use either a direct write for Clause 45 PHYs, or an indirect write for Clause 22 PHYs. devnum is the MMD device number within the PHY device, regnum is the register within the selected MMD device. val is the value to be written.

read_page

Return the current PHY register page number

write_page

Set the current PHY register page number

module_info

Get the size and type of the eeprom contained within a plug-in module

module_eeprom

Get the eeprom information from the plug-in module

cable_test_start

Start a cable test

cable_test_tdr_start

Start a raw TDR cable test

cable_test_get_status

Once per second, or on interrupt, request the status of the test.

get_phy_stats

Retrieve PHY statistics.

  • dev: The PHY device for which the statistics are retrieved.

  • eth_stats: Structure where Ethernet PHY stats will be stored.

  • stats: Structure where additional PHY-specific stats will be stored.

Retrieves the supported PHY statistics and populates the provided structures. The input structures are pre-initialized with ETHTOOL_STAT_NOT_SET, and the driver must only modify members corresponding to supported statistics. Unmodified members will remain set to ETHTOOL_STAT_NOT_SET and will not be returned to userspace.

get_link_stats

Retrieve link statistics.

  • dev: The PHY device for which the statistics are retrieved.

  • link_stats: Structure where link-specific stats will be stored.

Retrieves link-related statistics for the given PHY device. The input structure is pre-initialized with ETHTOOL_STAT_NOT_SET, and the driver must only modify members corresponding to supported statistics. Unmodified members will remain set to ETHTOOL_STAT_NOT_SET and will not be returned to userspace.

update_stats

Trigger periodic statistics updates.

  • dev: The PHY device for which statistics updates are triggered.

Periodically gathers statistics from the PHY device to update locally maintained 64-bit counters. This is necessary for PHYs that implement reduced-width counters (e.g., 16-bit or 32-bit) which can overflow more frequently compared to 64-bit counters. By invoking this callback, drivers can fetch the current counter values, handle overflow detection, and accumulate the results into local 64-bit counters for accurate reporting through the get_phy_stats and get_link_stats interfaces.

Return: 0 on success or a negative error code on failure.

get_sset_count

Number of statistic counters

get_strings

Names of the statistic counters

get_stats

Return the statistic counter values

get_tunable

Return the value of a tunable

set_tunable

Set the value of a tunable

set_loopback

Set the loopback mode of the PHY. enable selects if the loopback mode is enabled or disabled. If the loopback mode is enabled, then the speed of the loopback mode can be requested with the speed argument. If the speed argument is zero, then any speed can be selected. If the speed argument is > 0, then this speed shall be selected for the loopback mode or EOPNOTSUPP shall be returned if speed selection is not supported.

get_sqi

Get the signal quality indication

get_sqi_max

Get the maximum signal quality indication

get_mse_capability

Get capabilities and scale of MSE measurement.

  • dev: PHY device

  • cap: Output (filled on success)

Fill cap with the PHY’s MSE capability for the current link mode: scale limits (max_average_mse, max_peak_mse), update interval (refresh_rate_ps), sample length (num_symbols) and the capability bitmask (supported_caps).

Implementations may defer capability report until hardware has converged; in that case they should return -EAGAIN and allow the caller to retry later.

Return: 0 on success. On failure, returns a negative errno code, such as -EOPNOTSUPP if MSE measurement is not supported by the PHY or in the current link mode, or -EAGAIN if the capability information is not yet available.

get_mse_snapshot

Retrieve a snapshot of MSE diagnostic values.

  • dev: PHY device

  • channel: Channel identifier (PHY_MSE_CHANNEL_*)

  • snapshot: Output (filled on success)

Fill snapshot with a correlated set of MSE values from the most recent measurement window.

Callers must validate channel against supported_caps returned by get_mse_capability(). Drivers must not coerce channel; if the requested selector is not implemented by the device or current link mode, the operation must fail.

worst_peak_mse is latched and must be treated as read-to-clear.

Return: 0 on success. On failure, returns a negative errno code, such as -EOPNOTSUPP if MSE measurement is not supported by the PHY or in the current link mode, or -EAGAIN if measurements are not yet available.

get_plca_cfg

Return the current PLCA configuration

set_plca_cfg

Set the PLCA configuration

get_plca_status

Return the current PLCA status info

led_brightness_set

Set a PHY LED brightness. Index indicates which of the PHY’s LEDs should be set. Value follows the standard LED class meaning, e.g. LED_OFF, LED_HALF, LED_FULL.

led_blink_set

Set a PHY LED blinking. Index indicates which of the PHY’s LEDs should be configured to blink. Delays are in milliseconds and if both are zero then a sensible default should be chosen. The call should adjust the timings in that case and if it can’t match the values specified exactly.

led_hw_is_supported

Can the HW support the given rules.

  • dev: PHY device which has the LED

  • index: Which LED of the PHY device

  • rules: The core is interested in these rules

Return 0 if yes, -EOPNOTSUPP if not, or an error code.

led_hw_control_set

Set the HW to control the LED.

  • dev: PHY device which has the LED

  • index: Which LED of the PHY device

  • rules: The rules used to control the LED

Returns 0, or an error code.

led_hw_control_get

Get how the HW is controlling the LED.

  • dev: PHY device which has the LED

  • index: Which LED of the PHY device

  • rules: Pointer to the rules used to control the LED

Set *rules to how the HW is currently blinking. Returns 0 on success, or an error code if the current blinking cannot be represented in rules, or some other error happens.

led_polarity_set

Set the LED polarity modes.

  • dev: PHY device which has the LED

  • index: Which LED of the PHY device

  • modes: bitmap of LED polarity modes

Configure LED with all the required polarity modes in modes to make it correctly turn ON or OFF.

Returns 0, or an error code.

get_next_update_time

Get the time until the next update event.

  • dev: PHY device

Callback to determine the time (in jiffies) until the next update event for the PHY state machine. Allows PHY drivers to dynamically adjust polling intervals based on link state or other conditions.

Returns the time in jiffies until the next update event.

Description

All functions are optional. If config_aneg or read_status are not implemented, the phy core uses the genphy versions. Note that none of these functions should be called from interrupt time. The goal is for the bus read/write functions to be able to block when the bus transaction is happening, and be freed up by an interrupt (The MPC85xx has this ability, though it is not currently supported in the driver).
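The update_stats member above describes folding reduced-width hardware counters into locally maintained 64-bit counters. A minimal userspace sketch of that accumulation for a 16-bit counter follows. The struct demo_wide_counter type and demo_wide_counter_update() are hypothetical, and the sketch assumes a free-running (not clear-on-read) hardware counter that wraps at most once between two updates, which is why the callback has to run periodically.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for one reduced-width hardware counter. */
struct demo_wide_counter {
	uint64_t total;		/* accumulated 64-bit value reported upward */
	uint16_t last_raw;	/* raw hardware value seen at the last update */
};

/*
 * Fold a fresh raw 16-bit reading into the 64-bit total.
 * The subtraction is done modulo 2^16, so a single wrap of the hardware
 * counter between two calls still yields the correct increment.
 */
void demo_wide_counter_update(struct demo_wide_counter *c, uint16_t raw)
{
	uint16_t delta = (uint16_t)(raw - c->last_raw);

	c->total += delta;
	c->last_raw = raw;
}
```

If the hardware counter can wrap more than once between updates, the increment is undercounted, so the polling interval has to be chosen with the worst-case event rate in mind.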

bool phy_id_compare(u32 id1, u32 id2, u32 mask)

compare id1 with id2 taking account of mask

Parameters

u32 id1

first PHY ID

u32 id2

second PHY ID

u32 mask

the PHY ID mask, set bits are significant in matching

Description

Return true if the bits from id1 and id2 specified by mask match. This uses an equivalent test to (id & mask) == (phy_id & mask).
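The masked comparison can be written in a couple of equivalent ways. The sketch below uses the XOR form in a standalone userspace function; demo_phy_id_compare() is a hypothetical name, and the ID values in the usage note are arbitrary illustrations rather than real device IDs taken from this document.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when id1 and id2 agree on every bit that is set in mask.
 * (id1 ^ id2) has a 1 wherever the two IDs differ; ANDing with mask keeps
 * only the differences that matter. This is equivalent to
 * (id1 & mask) == (id2 & mask).
 */
bool demo_phy_id_compare(uint32_t id1, uint32_t id2, uint32_t mask)
{
	return ((id1 ^ id2) & mask) == 0;
}
```

For example, 0x00221620 and 0x00221622 compare equal under the mask 0xfffffff0, which ignores the low four bits, but differ under 0xffffffff.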

bool phy_id_compare_vendor(u32 id, u32 vendor_mask)

compare id with vendor mask

Parameters

u32 id

PHY ID

u32 vendor_mask

PHY Vendor mask

Return

true if the bits from id match vendor using the generic PHY Vendor mask.

bool phy_id_compare_model(u32 id, u32 model_mask)

compare id with model mask

Parameters

u32 id

PHY ID

u32 model_mask

PHY Model mask

Return

true if the bits from id match model using the generic PHY Model mask.
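The vendor and model comparisons are the same masked test with a fixed mask. The sketch below assumes the generic vendor mask covers bits 31..10 and the generic model mask covers bits 31..4. Those two values are assumptions that should be checked against include/linux/phy.h, and the function names and ID values are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed generic masks; verify against the kernel headers before relying on them. */
#define DEMO_VENDOR_MASK	0xfffffc00u	/* bits 31..10 */
#define DEMO_MODEL_MASK		0xfffffff0u	/* bits 31..4, revision ignored */

/* True when id and vendor agree on all bits selected by the vendor mask. */
bool demo_id_compare_vendor(uint32_t id, uint32_t vendor)
{
	return ((id ^ vendor) & DEMO_VENDOR_MASK) == 0;
}

/* True when id and model agree on all bits selected by the model mask. */
bool demo_id_compare_model(uint32_t id, uint32_t model)
{
	return ((id ^ model) & DEMO_MODEL_MASK) == 0;
}
```

Under these assumed masks, two IDs that differ only in the low four revision bits match as the same model, and IDs that differ only below bit 10 match as the same vendor.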

bool phydev_id_compare(struct phy_device *phydev, u32 id)

compare id with the PHY’s Clause 22 ID

Parameters

struct phy_device *phydev

the PHY device

u32 id

the PHY ID to be matched

Description

Compare the phydev clause 22 ID with the provided id and return true or false depending whether it matches, using the bound driver mask. The phydev must be bound to a driver.

bool phy_is_started(struct phy_device *phydev)

Convenience function to check whether PHY is started

Parameters

struct phy_device *phydev

The phy_device struct

bool phy_driver_is_genphy(struct phy_device *phydev)

Convenience function to check whether PHY is driven by one of the generic PHY drivers

Parameters

struct phy_device *phydev

The phy_device struct

Return

true if PHY is driven by one of the genphy drivers

void phy_disable_eee_mode(struct phy_device *phydev, u32 link_mode)

Don’t advertise an EEE mode.

Parameters

struct phy_device *phydev

The phy_device struct

u32 link_mode

The EEE mode to be disabled

bool phy_can_wakeup(struct phy_device *phydev)

indicate whether PHY has driver model wakeup capabilities

Parameters

struct phy_device *phydev

The phy_device struct

Return

true/false depending on the PHY driver’s device_set_wakeup_capable() setting.

bool phy_may_wakeup(struct phy_device *phydev)

indicate whether PHY has wakeup enabled

Parameters

struct phy_device *phydev

The phy_device struct

Return

true/false depending on the PHY driver’s device_set_wakeup_enable() setting if using the driver model, otherwise the legacy determination.

int phy_read(struct phy_device *phydev, u32 regnum)

Convenience function for reading a given PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to read

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

int __phy_read(struct phy_device *phydev, u32 regnum)

convenience function for reading a given PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to read

Description

The caller must have taken the MDIO bus lock.

int phy_write(struct phy_device *phydev, u32 regnum, u16 val)

Convenience function for writing a given PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to write

u16 val

value to write to regnum

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

int __phy_write(struct phy_device *phydev, u32 regnum, u16 val)

Convenience function for writing a given PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to write

u16 val

value to write to regnum

Description

The caller must have taken the MDIO bus lock.

int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask, u16 set)

Convenience function for modifying a PHY register

Parameters

struct phy_device *phydev

a pointer to a struct phy_device

u32 regnum

register number

u16 mask

bit mask of bits to clear

u16 set

bit mask of bits to set

Description

Unlocked helper function which allows a PHY register to be modified as new register value = (old register value & ~mask) | set

Returns negative errno, 0 if there was no change, and 1 in case of change
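The read-modify-write rule and its three-way return value can be illustrated against a simulated register file. Everything in the sketch is a userspace stand-in: demo_regs[], demo_read(), demo_write() and demo_modify_changed() are hypothetical names, and skipping the write when the value is unchanged is how this sketch produces the documented 0 result, not a statement about the kernel implementation.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_NUM_REGS 32

/* Simulated 16-bit register file standing in for a PHY on an MDIO bus. */
static uint16_t demo_regs[DEMO_NUM_REGS];

/* Returns the register value (0..65535) or a negative errno. */
int demo_read(uint32_t regnum)
{
	if (regnum >= DEMO_NUM_REGS)
		return -EINVAL;
	return demo_regs[regnum];
}

/* Returns 0 on success or a negative errno. */
int demo_write(uint32_t regnum, uint16_t val)
{
	if (regnum >= DEMO_NUM_REGS)
		return -EINVAL;
	demo_regs[regnum] = val;
	return 0;
}

/*
 * new = (old & ~mask) | set
 * Returns a negative errno on failure, 0 if the register already held the
 * new value, and 1 if the register was changed.
 */
int demo_modify_changed(uint32_t regnum, uint16_t mask, uint16_t set)
{
	uint16_t new_val;
	int ret;

	ret = demo_read(regnum);
	if (ret < 0)
		return ret;

	new_val = (uint16_t)((ret & ~mask) | set);
	if (new_val == ret)
		return 0;

	ret = demo_write(regnum, new_val);
	return ret < 0 ? ret : 1;
}
```

The 0/1 distinction lets a caller act only when the hardware state really changed, for example by restarting a negotiation after an actual register update.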

phy_read_mmd_poll_timeout

phy_read_mmd_poll_timeout(phydev, devaddr, regnum, val, cond, sleep_us, timeout_us, sleep_before_read)

Periodically poll a PHY register until a condition is met or a timeout occurs

Parameters

phydev

The phy_device struct

devaddr

The MMD to read from

regnum

The register on the MMD to read

val

Variable to read the register into

cond

Break condition (usually involving val)

sleep_us

Maximum time to sleep between reads in us (0 tight-loops). Please read usleep_range() function description for details and limitations.

timeout_us

Timeout in us, 0 means never timeout

sleep_before_read

if it is true, sleep sleep_us before read.

Return

0 on success and -ETIMEDOUT upon a timeout. In either case, the last read value at args is stored in val. Must not be called from atomic context if sleep_us or timeout_us are used.
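The polling contract (stop when the condition holds, give up with -ETIMEDOUT otherwise, and leave the last value read in val either way) can be sketched without real hardware. The code below is a userspace model under several assumptions: time is simulated rather than slept, the break condition is reduced to a bit mask test, read errors are propagated as negative values, and struct demo_reg with its helpers is a hypothetical register model.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register model: reads 0 until it has been read N times. */
struct demo_reg {
	int reads_until_ready;
};

/* Stand-in for an MMD read; returns the register value (>= 0). */
int demo_read_reg(struct demo_reg *reg)
{
	if (reg->reads_until_ready > 0) {
		reg->reads_until_ready--;
		return 0x0000;
	}
	return 0x0004;	/* pretend bit 2 signals "ready" */
}

/*
 * Poll until (val & ready_mask) is non-zero or the simulated timeout expires.
 * Each iteration accounts for sleep_us (or 1 us when tight-looping) instead
 * of really sleeping. timeout_us == 0 means never time out.
 */
int demo_poll_timeout(struct demo_reg *reg, int ready_mask, int *val,
		      unsigned int sleep_us, unsigned int timeout_us,
		      bool sleep_before_read)
{
	uint64_t elapsed = sleep_before_read ? sleep_us : 0;

	for (;;) {
		*val = demo_read_reg(reg);
		if (*val < 0)
			return *val;		/* assumption: propagate read errors */
		if (*val & ready_mask)
			return 0;		/* condition met */
		if (timeout_us && elapsed >= timeout_us)
			return -ETIMEDOUT;	/* val keeps the last value read */
		elapsed += sleep_us ? sleep_us : 1;
	}
}
```

Because val is written on both paths, a caller can log the final register contents after a timeout to see which bits were still pending.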

int __phy_set_bits(struct phy_device *phydev, u32 regnum, u16 val)

Convenience function for setting bits in a PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to write

u16 val

bits to set

Description

The caller must have taken the MDIO bus lock.

int __phy_clear_bits(struct phy_device *phydev, u32 regnum, u16 val)

Convenience function for clearing bits in a PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to write

u16 val

bits to clear

Description

The caller must have taken the MDIO bus lock.

int phy_set_bits(struct phy_device *phydev, u32 regnum, u16 val)

Convenience function for setting bits in a PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to write

u16 val

bits to set

int phy_clear_bits(struct phy_device *phydev, u32 regnum, u16 val)

Convenience function for clearing bits in a PHY register

Parameters

struct phy_device *phydev

the phy_device struct

u32 regnum

register number to write

u16 val

bits to clear

int __phy_set_bits_mmd(struct phy_device *phydev, int devad, u32 regnum, u16 val)

Convenience function for setting bits in a register on MMD

Parameters

struct phy_device *phydev

the phy_device struct

int devad

the MMD containing register to modify

u32 regnum

register number to modify

u16 val

bits to set

Description

The caller must have taken the MDIO bus lock.

int __phy_clear_bits_mmd(struct phy_device *phydev, int devad, u32 regnum, u16 val)

Convenience function for clearing bits in a register on MMD

Parameters

struct phy_device *phydev

the phy_device struct

int devad

the MMD containing register to modify

u32 regnum

register number to modify

u16 val

bits to clear

Description

The caller must have taken the MDIO bus lock.

int phy_set_bits_mmd(struct phy_device *phydev, int devad, u32 regnum, u16 val)

Convenience function for setting bits in a register on MMD

Parameters

struct phy_device *phydev

the phy_device struct

int devad

the MMD containing register to modify

u32 regnum

register number to modify

u16 val

bits to set

int phy_clear_bits_mmd(struct phy_device *phydev, int devad, u32 regnum, u16 val)

Convenience function for clearing bits in a register on MMD

Parameters

struct phy_device *phydev

the phy_device struct

int devad

the MMD containing register to modify

u32 regnum

register number to modify

u16 val

bits to clear

bool phy_interrupt_is_valid(struct phy_device *phydev)

Convenience function for testing a given PHY irq

Parameters

struct phy_device *phydev

the phy_device struct

NOTE

must be kept in sync with addition/removal of PHY_POLL and PHY_MAC_INTERRUPT

bool phy_polling_mode(struct phy_device *phydev)

Convenience function for testing whether polling is used to detect PHY status changes

Parameters

struct phy_device *phydev

the phy_device struct

bool phy_has_hwtstamp(struct phy_device *phydev)

Tests whether a PHY supports time stamp configuration.

Parameters

struct phy_device *phydev

the phy_device struct

bool phy_has_rxtstamp(struct phy_device *phydev)

Tests whether a PHY supports receive time stamping.

Parameters

struct phy_device *phydev

the phy_device struct

bool phy_has_tsinfo(struct phy_device *phydev)

Tests whether a PHY reports time stamping and/or PTP hardware clock capabilities.

Parameters

struct phy_device *phydev

the phy_device struct

bool phy_has_txtstamp(struct phy_device *phydev)

Tests whether a PHY supports transmit time stamping.

Parameters

struct phy_device *phydev

the phy_device struct

bool phy_is_default_hwtstamp(struct phy_device *phydev)

Is the PHY hwtstamp the default timestamp

Parameters

struct phy_device *phydev

Pointer to phy_device

Description

This is used to get default timestamping device taking into account the new API choice, which is selecting the timestamping from MAC by default if the phydev does not have default_timestamp flag enabled.

Return

True if phy is the default hw timestamp, false otherwise.

bool phy_on_sfp(struct phy_device *phydev)

Convenience function for testing if a PHY is on an SFP module

Parameters

struct phy_device *phydev

the phy_device struct

bool phy_interface_mode_is_rgmii(phy_interface_t mode)

Convenience function for testing if a PHY interface mode is RGMII (all variants)

Parameters

phy_interface_t mode

the phy_interface_t enum

bool phy_interface_mode_is_8023z(phy_interface_t mode)

does the PHY interface mode use 802.3z negotiation

Parameters

phy_interface_t mode

one of enum phy_interface_t

Description

Returns true if the PHY interface mode uses the 16-bit negotiation word as defined in 802.3z. (See 802.3-2015 37.2.1 Config_Reg encoding)

bool phy_interface_is_rgmii(struct phy_device *phydev)

Convenience function for testing if a PHY interface is RGMII (all variants)

Parameters

struct phy_device *phydev

the phy_device struct

bool phy_is_pseudo_fixed_link(struct phy_device *phydev)

Convenience function for testing if this PHY is the CPU port facing side of an Ethernet switch, or similar.

Parameters

struct phy_device *phydev

the phy_device struct

phy_module_driver

phy_module_driver(__phy_drivers, __count)

Helper macro for registering PHY drivers

Parameters

__phy_drivers

array of PHY drivers to register

__count

Number of members in array

Description

Helper macro for PHY drivers which do not do anything special in module init/exit. Each module may only use this macro once, and calling it replaces module_init() and module_exit().

intphy_unregister_fixup(constchar*bus_id,u32phy_uid,u32phy_uid_mask)

remove a phy_fixup from the list

Parameters

constchar*bus_id

A string that matches fixup->bus_id (or PHY_ANY_ID) in phy_fixup_list

u32phy_uid

A PHY ID that matches fixup->phy_id (or PHY_ANY_UID) in phy_fixup_list

u32phy_uid_mask

Applied to phy_uid and fixup->phy_uid before comparison

intgenphy_match_phy_device(structphy_device*phydev,conststructphy_driver*phydrv)

match a PHY device with a PHY driver

Parameters

structphy_device*phydev

target phy_device struct

conststructphy_driver*phydrv

target phy_driver struct

Description

Checks whether the given PHY device matches the specified PHY driver. For Clause 45 PHYs, iterates over the available device identifiers and compares them against the driver’s expected PHY ID, applying the provided mask. For Clause 22 PHYs, a direct ID comparison is performed.

Return

1 if the PHY device matches the driver, 0 otherwise.
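The matching rule described above can be sketched in user space as follows. This is an illustrative model only: `struct model_phy`, `model_id_compare()` and `model_match()` are hypothetical names, and treating an all-ones entry as "no ID present" is an assumption of the sketch rather than something stated here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Two IDs match when they agree on every bit selected by the mask. */
static bool model_id_compare(uint32_t id1, uint32_t id2, uint32_t mask)
{
	return ((id1 ^ id2) & mask) == 0;
}

/* Hypothetical, simplified view of a PHY: one Clause 22 ID, or a list of
 * per-device Clause 45 IDs when is_c45 is set. */
struct model_phy {
	bool is_c45;
	uint32_t c22_id;
	const uint32_t *c45_ids;
	size_t n_c45_ids;
};

/* Returns 1 on match and 0 otherwise, mirroring the documented contract. */
static int model_match(const struct model_phy *phy, uint32_t drv_id,
		       uint32_t drv_mask)
{
	size_t i;

	if (!phy->is_c45)
		return model_id_compare(phy->c22_id, drv_id, drv_mask);

	for (i = 0; i < phy->n_c45_ids; i++) {
		/* Assumption: all ones marks an unpopulated entry. */
		if (phy->c45_ids[i] == 0xffffffffu)
			continue;
		if (model_id_compare(phy->c45_ids[i], drv_id, drv_mask))
			return 1;
	}
	return 0;
}
```

Masking out the low bits, as a driver typically does for the revision field, lets one driver entry cover several silicon revisions.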

structphy_device*get_phy_device(structmii_bus*bus,intaddr,boolis_c45)

reads the specified PHY device and returns its phy_device struct

Parameters

structmii_bus*bus

the target MII bus

intaddr

PHY address on the MII bus

boolis_c45

If true the PHY uses the 802.3 clause 45 protocol

Description

Probe for a PHY at addr on bus.

When probing for a clause 22 PHY, read the ID registers. If we find a valid ID, allocate and return a struct phy_device.

When probing for a clause 45 PHY, read the “devices in package” registers. If the “devices in package” appears valid, read the ID registers for each MMD, allocate and return a struct phy_device.

Returns an allocated struct phy_device on success, -ENODEV if there is no PHY present, or -EIO on bus access error.

intphy_device_register(structphy_device*phydev)

Register the phy device on the MDIO bus

Parameters

structphy_device*phydev

phy_device structure to be added to the MDIO bus

voidphy_device_remove(structphy_device*phydev)

Remove a previously registered phy device from the MDIO bus

Parameters

structphy_device*phydev

phy_device structure to remove

Description

This doesn’t free the phy_device itself, it merely reverses the effects of phy_device_register(). Use phy_device_free() to free the device after calling this function.

intphy_get_c45_ids(structphy_device*phydev)

Read 802.3-c45 IDs for phy device.

Parameters

structphy_device*phydev

phy_device structure to read 802.3-c45 IDs

Description

Returns zero on success, -EIO on bus access error, or -ENODEV if the “devices in package” is invalid.

structphy_device*phy_find_next(structmii_bus*bus,structphy_device*pos)

finds the next PHY device on the bus

Parameters

structmii_bus*bus

the target MII bus

structphy_device*pos

cursor

Return

next phy_device on the bus, or NULL

intphy_connect_direct(structnet_device*dev,structphy_device*phydev,void(*handler)(structnet_device*),phy_interface_tinterface)

connect an ethernet device to a specific phy_device

Parameters

structnet_device*dev

the network device to connect

structphy_device*phydev

the pointer to the phy device

void(*handler)(structnet_device*)

callback function for state change notifications

phy_interface_tinterface

PHY device’s interface

structphy_device*phy_connect(structnet_device*dev,constchar*bus_id,void(*handler)(structnet_device*),phy_interface_tinterface)

connect an ethernet device to a PHY device

Parameters

structnet_device*dev

the network device to connect

constchar*bus_id

the id string of the PHY device to connect

void(*handler)(structnet_device*)

callback function for state change notifications

phy_interface_tinterface

PHY device’s interface

Description

Convenience function for connecting ethernet devices to PHY devices. The default behavior is for the PHY infrastructure to handle everything, and only notify the connected driver when the link status changes. If you don’t want, or can’t use the provided functionality, you may choose to call only the subset of functions which provide the desired functionality.

voidphy_disconnect(structphy_device*phydev)

disable interrupts, stop state machine, and detach a PHY device

Parameters

structphy_device*phydev

target phy_device struct

intphy_sfp_connect_phy(void*upstream,structphy_device*phy)

Connect the SFP module’s PHY to the upstream PHY

Parameters

void*upstream

pointer to the upstream phy device

structphy_device*phy

pointer to the SFP module’s phy device

Description

This helper allows keeping track of PHY devices on the link. It adds the SFP module’s phy to the phy namespace of the upstream phy.

Return

0 on success, otherwise a negative error code.

voidphy_sfp_disconnect_phy(void*upstream,structphy_device*phy)

Disconnect the SFP module’s PHY from the upstream PHY

Parameters

void*upstream

pointer to the upstream phy device

structphy_device*phy

pointer to the SFP module’s phy device

Description

This helper allows keeping track of PHY devices on the link. It removes the SFP module’s phy from the phy namespace of the upstream phy. As the module phy will be destroyed, re-inserting the same module will add a new phy with a new index.

voidphy_sfp_attach(void*upstream,structsfp_bus*bus)

attach the SFP bus to the PHY upstream network device

Parameters

void*upstream

pointer to the phy device

structsfp_bus*bus

sfp bus representing cage being attached

Description

This is used to fill in the sfp_upstream_ops .attach member.

voidphy_sfp_detach(void*upstream,structsfp_bus*bus)

detach the SFP bus from the PHY upstream network device

Parameters

void*upstream

pointer to the phy device

structsfp_bus*bus

sfp bus representing cage being detached

Description

This is used to fill in the sfp_upstream_ops .detach member.

intphy_sfp_probe(structphy_device*phydev,conststructsfp_upstream_ops*ops)

probe for a SFP cage attached to this PHY device

Parameters

structphy_device*phydev

Pointer to phy_device

conststructsfp_upstream_ops*ops

SFP’s upstream operations

intphy_attach_direct(structnet_device*dev,structphy_device*phydev,u32flags,phy_interface_tinterface)

attach a network device to a given PHY device pointer

Parameters

structnet_device*dev

network device to attach

structphy_device*phydev

Pointer to phy_device to attach

u32flags

PHY device’s dev_flags

phy_interface_tinterface

PHY device’s interface

Description

Called by drivers to attach to a particular PHY device. The phy_device is found, and properly hooked up to the phy_driver. If no driver is attached, then a generic driver is used. The phy_device is given a ptr to the attaching device, and given a callback for link status change. The phy_device is returned to the attaching driver. This function takes a reference on the phy device.

structphy_device*phy_attach(structnet_device*dev,constchar*bus_id,phy_interface_tinterface)

attach a network device to a particular PHY device

Parameters

structnet_device*dev

network device to attach

constchar*bus_id

Bus ID of PHY device to attach

phy_interface_tinterface

PHY device’s interface

Description

Same as phy_attach_direct() except that a PHY bus_id string is passed instead of a pointer to a struct phy_device.

voidphy_detach(structphy_device*phydev)

detach a PHY device from its network device

Parameters

structphy_device*phydev

target phy_device struct

Description

This detaches the phy device from its network device and the phy driver, and drops the reference count taken in phy_attach_direct().

intphy_reset_after_clk_enable(structphy_device*phydev)

perform a PHY reset if needed

Parameters

structphy_device*phydev

target phy_device struct

Description

Some PHYs are known to need a reset after their refclk was enabled. This function evaluates the flags and performs the reset if it’s needed. Returns < 0 on error, 0 if the phy wasn’t reset and 1 if the phy was reset.

intgenphy_setup_forced(structphy_device*phydev)

configures/forces speed/duplex from phydev

Parameters

structphy_device*phydev

target phy_device struct

Description

Configures MII_BMCR to force speed/duplex to the values in phydev. Assumes that the values are valid. Please see phy_sanitize_settings().

intgenphy_restart_aneg(structphy_device*phydev)

Enable and Restart Autonegotiation

Parameters

structphy_device*phydev

target phy_device struct

intgenphy_check_and_restart_aneg(structphy_device*phydev,boolrestart)

Enable and restart auto-negotiation

Parameters

structphy_device*phydev

target phy_device struct

boolrestart

whether aneg restart is requested

Description

Check, and restart auto-negotiation if needed.

int__genphy_config_aneg(structphy_device*phydev,boolchanged)

restart auto-negotiation or write BMCR

Parameters

structphy_device*phydev

target phy_device struct

boolchanged

whether autoneg is requested

Description

If auto-negotiation is enabled, we configure the advertising, and then restart auto-negotiation. If it is not enabled, then we write the BMCR.

intgenphy_c37_config_aneg(structphy_device*phydev)

restart auto-negotiation or write BMCR

Parameters

structphy_device*phydev

target phy_device struct

Description

If auto-negotiation is enabled, we configure the advertising, and then restart auto-negotiation. If it is not enabled, then we write the BMCR. This function is intended for use with Clause 37 1000Base-X mode.

intgenphy_aneg_done(structphy_device*phydev)

return auto-negotiation status

Parameters

structphy_device*phydev

target phy_device struct

Description

Reads the status register and returns 0 either if auto-negotiation is incomplete, or if there was an error. Returns BMSR_ANEGCOMPLETE if auto-negotiation is done.

intgenphy_update_link(structphy_device*phydev)

update link status in phydev

Parameters

structphy_device*phydev

target phy_device struct

Description

Update the value in phydev->link to reflect the current link value. In order to do this, we need to read the status register twice, keeping the second value.
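To see why two reads can be needed, consider a status register whose link bit latches low: after a link drop, the first read reports the stale failure and clears the latch, and only the second read shows the present state. The sketch below is a user-space model under that assumption; `struct model_phy`, `model_read_bmsr()` and `model_update_link()` are hypothetical names and do not reproduce the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BMSR_LSTATUS 0x0004u /* link status bit of the status register */

/* Hypothetical PHY model with a latching-low link bit. */
struct model_phy {
	bool link_up_now;  /* current physical link state */
	bool latched_down; /* set when the link dropped since the last read */
};

/* A read reports a latched failure once, then clears the latch. */
static uint16_t model_read_bmsr(struct model_phy *phy)
{
	bool up = phy->link_up_now && !phy->latched_down;

	phy->latched_down = false;
	return up ? MODEL_BMSR_LSTATUS : 0;
}

/* Read twice and keep the second value, which reflects the current state. */
static bool model_update_link(struct model_phy *phy)
{
	(void)model_read_bmsr(phy);
	return (model_read_bmsr(phy) & MODEL_BMSR_LSTATUS) != 0;
}
```

With a link that flapped and recovered, a single read would report "down" while the double read reports "up".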

intgenphy_read_status_fixed(structphy_device*phydev)

read the link parameters for !aneg mode

Parameters

structphy_device*phydev

target phy_device struct

Description

Read the current duplex and speed state for a PHY operating with autonegotiation disabled.

intgenphy_read_status(structphy_device*phydev)

check the link status and update current link state

Parameters

structphy_device*phydev

target phy_device struct

Description

Check the link, then figure out the current state by comparing what we advertise with what the link partner advertises. Start by checking the gigabit possibilities, then move on to 10/100.

intgenphy_c37_read_status(structphy_device*phydev,bool*changed)

check the link status and update current link state

Parameters

structphy_device*phydev

target phy_device struct

bool*changed

pointer where to store if link changed

Description

Check the link, then figure out the current state by comparing what we advertise with what the link partner advertises. This function is for Clause 37 1000Base-X mode.

If link has changed, changed is set to true, false otherwise.

intgenphy_soft_reset(structphy_device*phydev)

software reset the PHY via BMCR_RESET bit

Parameters

structphy_device*phydev

target phy_device struct

Description

Perform a software PHY reset using the standard BMCR_RESET bit and poll for the reset bit to be cleared.

Return

0 on success, < 0 on failure

intgenphy_read_abilities(structphy_device*phydev)

read PHY abilities from Clause 22 registers

Parameters

structphy_device*phydev

target phy_device struct

Description

Reads the PHY’s abilities and populates phydev->supported accordingly.

Return

0 on success, < 0 on failure

voidphy_remove_link_mode(structphy_device*phydev,u32link_mode)

Remove a supported link mode

Parameters

structphy_device*phydev

phy_device structure to remove link mode from

u32link_mode

Link mode to be removed

Description

Some MACs don’t support all link modes which the PHY does, e.g. a 1G MAC often does not support 1000Half. This helper removes a link mode.

voidphy_advertise_supported(structphy_device*phydev)

Advertise all supported modes

Parameters

structphy_device*phydev

target phy_device struct

Description

Called to advertise all supported modes, doesn’t touch pause mode advertising.
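A minimal user-space sketch of how removing a link mode and re-advertising could interact is given below. The bit positions, `struct model_phy` and both functions are hypothetical; the sketch assumes that removing a mode is followed by refreshing the advertisement, and it keeps the pause bits already advertised, as described above.

```c
#include <stdint.h>

/* Hypothetical bit positions; the kernel uses a larger linkmode bitmap. */
#define MODEL_1000_HALF  (1u << 0)
#define MODEL_1000_FULL  (1u << 1)
#define MODEL_PAUSE      (1u << 2)
#define MODEL_ASYM_PAUSE (1u << 3)
#define MODEL_PAUSE_BITS (MODEL_PAUSE | MODEL_ASYM_PAUSE)

struct model_phy {
	uint32_t supported;
	uint32_t advertising;
};

/* Advertise everything supported, but keep the pause bits already advertised. */
static void model_advertise_supported(struct model_phy *phy)
{
	uint32_t pause = phy->advertising & MODEL_PAUSE_BITS;

	phy->advertising = (phy->supported & ~MODEL_PAUSE_BITS) | pause;
}

/* Drop one supported mode, then refresh the advertisement to match
 * (assumption of this sketch). */
static void model_remove_link_mode(struct model_phy *phy, uint32_t mode)
{
	phy->supported &= ~mode;
	model_advertise_supported(phy);
}
```

For a MAC that cannot do 1000Half, a caller in this model would invoke `model_remove_link_mode(&phy, MODEL_1000_HALF)` before starting the link.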

voidphy_advertise_eee_all(structphy_device*phydev)

Advertise all supported EEE modes

Parameters

structphy_device*phydev

target phy_device struct

Description

Per default phylib preserves the EEE advertising at the time of phy probing, which might be a subset of the supported EEE modes. Use this function when all supported EEE modes should be advertised. This does not trigger auto-negotiation, so must be called before phy_start()/phylink_start() which will start auto-negotiation.

voidphy_support_eee(structphy_device*phydev)

Set initial EEE policy configuration

Parameters

structphy_device*phydev

Target phy_device struct

Description

This function configures the initial policy for Energy Efficient Ethernet (EEE) on the specified PHY device, so that EEE capabilities are advertised before the link is established. It should be called during PHY registration by the MAC driver and/or the PHY driver (for SmartEEE PHYs) if the MAC supports LPI or the PHY is capable of compensating for the missing LPI functionality of the MAC.

The function sets default EEE policy parameters, including preparing the PHY to advertise EEE capabilities based on hardware support.

It also sets the expected configuration for Low Power Idle (LPI) in the MAC driver. If the PHY framework determines that both local and remote advertisements support EEE, and the negotiated link mode is compatible with EEE, it will set enable_tx_lpi = true. The MAC driver is expected to act on this setting by enabling the LPI timer if enable_tx_lpi is set.

voidphy_disable_eee(structphy_device*phydev)

Disable EEE for the PHY

Parameters

structphy_device*phydev

Target phy_device struct

Description

This function is used by MAC drivers for MACs which don’t support EEE. It disables EEE on the PHY layer.

voidphy_support_sym_pause(structphy_device*phydev)

Enable support of symmetrical pause

Parameters

structphy_device*phydev

target phy_device struct

Description

Called by the MAC to indicate it supports symmetrical Pause, but not asym pause.

voidphy_support_asym_pause(structphy_device*phydev)

Enable support of asym pause

Parameters

structphy_device*phydev

target phy_device struct

Description

Called by the MAC to indicate it supports Asym Pause.

voidphy_set_sym_pause(structphy_device*phydev,boolrx,booltx,boolautoneg)

Configure symmetric Pause

Parameters

structphy_device*phydev

target phy_device struct

boolrx

Receiver Pause is supported

booltx

Transmit Pause is supported

boolautoneg

Auto neg should be used

Description

Configure advertised Pause support depending on whether receiver pause and pause auto neg are supported. Generally called from the set_pauseparam .ndo.

voidphy_set_asym_pause(structphy_device*phydev,boolrx,booltx)

Configure Pause and Asym Pause

Parameters

structphy_device*phydev

target phy_device struct

boolrx

Receiver Pause is supported

booltx

Transmit Pause is supported

Description

Configure advertised Pause support depending on whether transmit and receiver pause are supported. If there has been a change in advertising, trigger a new autoneg. Generally called from the set_pauseparam .ndo.
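One conventional way to turn the rx/tx choice into the two advertised bits is sketched below. This is an assumption-laden user-space model, not the kernel code: the bit values and `model_encode_pause()` are hypothetical, and the encoding used (Pause follows rx, Asym Pause is set when rx and tx differ) is the scheme assumed for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for the two pause advertisement bits. */
#define MODEL_PAUSE      (1u << 0)
#define MODEL_ASYM_PAUSE (1u << 1)

/* Assumed encoding: Pause mirrors rx, Asym Pause marks rx != tx.
 *
 *   tx rx | Pause Asym
 *   0  0  |   0    0
 *   0  1  |   1    1
 *   1  0  |   0    1
 *   1  1  |   1    0
 */
static uint32_t model_encode_pause(bool rx, bool tx)
{
	uint32_t adv = 0;

	if (rx)
		adv |= MODEL_PAUSE;
	if (rx != tx)
		adv |= MODEL_ASYM_PAUSE;
	return adv;
}
```

In this model a caller would compare the newly encoded bits with the previous advertisement and only restart auto-negotiation when they differ.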

boolphy_validate_pause(structphy_device*phydev,structethtool_pauseparam*pp)

Test if the PHY/MAC support the pause configuration

Parameters

structphy_device*phydev

phy_device struct

structethtool_pauseparam*pp

requested pause configuration

Description

Test if the PHY/MAC combination supports the Pause configuration the user is requesting. Returns true if it is supported, false otherwise.

voidphy_get_pause(structphy_device*phydev,bool*tx_pause,bool*rx_pause)

resolve negotiated pause modes

Parameters

structphy_device*phydev

phy_device struct

bool*tx_pause

pointer to bool to indicate whether transmit pause should be enabled.

bool*rx_pause

pointer to bool to indicate whether receive pause should be enabled.

Description

Resolve and return the flow control modes according to the negotiation result. This includes checking that we are operating in full duplex mode. See linkmode_resolve_pause() for further details.
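The resolution step can be illustrated with a small user-space model. The names `model_get_pause()`, `MODEL_PAUSE` and `MODEL_ASYM_PAUSE` are hypothetical, and the rule applied is the usual pause resolution table for the two advertised bits, assumed here rather than quoted from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for the two pause advertisement bits. */
#define MODEL_PAUSE      (1u << 0)
#define MODEL_ASYM_PAUSE (1u << 1)

/* Resolve tx/rx pause from the local and link-partner advertisements. */
static void model_get_pause(bool full_duplex, uint32_t local, uint32_t partner,
			    bool *tx_pause, bool *rx_pause)
{
	*tx_pause = false;
	*rx_pause = false;

	/* Flow control is only resolved for full duplex links. */
	if (!full_duplex)
		return;

	if (local & partner & MODEL_PAUSE) {
		/* Both sides advertise symmetric pause. */
		*tx_pause = true;
		*rx_pause = true;
	} else if (local & partner & MODEL_ASYM_PAUSE) {
		/* Asymmetric case: direction depends on who sets Pause. */
		if (local & MODEL_PAUSE)
			*rx_pause = true;
		else if (partner & MODEL_PAUSE)
			*tx_pause = true;
	}
}
```

A MAC driver's link-change handler would typically feed the two resulting booleans into its own flow control registers.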

s32phy_get_internal_delay(structphy_device*phydev,constint*delay_values,intsize,boolis_rx)

returns the index of the internal delay

Parameters

structphy_device*phydev

phy_device struct

constint*delay_values

array of delays the PHY supports

intsize

the size of the delay array

boolis_rx

boolean to indicate to get the rx internal delay

Description

Returns the index within the array of internal delays passed in. If the device property is not present, the interface type is checked: if the interface defines use of internal delay, 1 is returned, otherwise 0 is returned. The array must be in ascending order. If the PHY does not have an ascending order array, then size = 0 and the value of the delay property is returned. Returns -EINVAL if the delay is invalid or cannot be found.
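The table lookup part of this contract can be sketched in user space as follows. `model_delay_index()` is a hypothetical helper that takes the already-read delay value directly; picking the nearest table entry for a value that falls between two entries is an assumption of the sketch, not something stated above.

```c
#include <errno.h>

/* Map a delay value to an index in an ascending table.
 * size == 0 means "no usable table": the raw value is handed back. */
static int model_delay_index(int delay, const int *delay_values, int size)
{
	int i;

	if (delay < 0)
		return delay; /* propagate an earlier lookup error */
	if (size == 0)
		return delay;
	if (delay < delay_values[0] || delay > delay_values[size - 1])
		return -EINVAL;
	if (delay == delay_values[0])
		return 0;

	for (i = 1; i < size; i++) {
		if (delay == delay_values[i])
			return i;
		/* Assumption: between two entries, pick the closer one. */
		if (delay > delay_values[i - 1] && delay < delay_values[i]) {
			if (delay - delay_values[i - 1] < delay_values[i] - delay)
				return i - 1;
			return i;
		}
	}
	return -EINVAL;
}
```

The ascending-order requirement is what makes the early range check and the pairwise comparison valid.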

intphy_get_tx_amplitude_gain(structphy_device*phydev,structdevice*dev,enumethtool_link_mode_bit_indiceslinkmode,u32*val)

stores tx amplitude gain in val

Parameters

structphy_device*phydev

phy_device struct

structdevice*dev

pointer to the device’s device struct

enumethtool_link_mode_bit_indiceslinkmode

linkmode for which the tx amplitude gain should be retrieved

u32*val

tx amplitude gain

Return

0 on success, < 0 on failure

intphy_get_mac_termination(structphy_device*phydev,structdevice*dev,u32*val)

stores MAC termination in val

Parameters

structphy_device*phydev

phy_device struct

structdevice*dev

pointer to the device’s device struct

u32*val

MAC termination

Return

0 on success, < 0 on failure

structmdio_device*fwnode_mdio_find_device(structfwnode_handle*fwnode)

Given a fwnode, find the mdio_device

Parameters

structfwnode_handle*fwnode

pointer to the mdio_device’s fwnode

Description

If successful, returns a pointer to the mdio_device with the embedded struct device refcount incremented by one, or NULL on failure. The caller should call put_device() on the mdio_device after its use.

structphy_device*fwnode_phy_find_device(structfwnode_handle*phy_fwnode)

For provided phy_fwnode, find phy_device.

Parameters

structfwnode_handle*phy_fwnode

Pointer to the phy’s fwnode.

Description

If successful, returns a pointer to the phy_device with the embedded struct device refcount incremented by one, or NULL on failure.

structfwnode_handle*fwnode_get_phy_node(conststructfwnode_handle*fwnode)

Get the phy_node using the named reference.

Parameters

conststructfwnode_handle*fwnode

Pointer to fwnode from which phy_node has to be obtained.

Description

Refer to the return conditions of fwnode_find_reference(). For ACPI, only “phy-handle” is supported. Legacy DT properties “phy” and “phy-device” are not supported in ACPI. DT supports all three named references to the phy node.

boolphy_uses_state_machine(structphy_device*phydev)

test whether consumer driver uses PAL state machine

Parameters

structphy_device*phydev

the target PHY device structure

Description

Ultimately, this aims to indirectly determine whether the PHY is attached to a consumer which uses the state machine by calling phy_start() and phy_stop().

When the PHY driver consumer uses phylib, it must have previously called phy_connect_direct() or one of its derivatives, so that phy_prepare_link() has set up a hook for monitoring state changes.

When the PHY driver is used by the MAC driver consumer through phylink (the only other provider of a phy_link_change() method), using the PHY state machine is not optional.

Return

true if consumer calls phy_start() and phy_stop(), false otherwise.

intphy_register_fixup(constchar*bus_id,u32phy_uid,u32phy_uid_mask,int(*run)(structphy_device*))

creates a new phy_fixup and adds it to the list

Parameters

constchar*bus_id

A string which matches phydev->mdio.dev.bus_id (or PHY_ANY_ID)

u32phy_uid

Used to match against phydev->phy_id (the UID of the PHY). It can also be PHY_ANY_UID

u32phy_uid_mask

Applied to phydev->phy_id and fixup->phy_uid before comparison

int(*run)(structphy_device*)

The actual code to be run when a matching PHY is found
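How the three matching parameters combine can be sketched with a user-space model. `struct model_fixup`, `model_needs_fixup()` and the two wildcard macros are hypothetical stand-ins for the fixup entry, the matching step, and PHY_ANY_ID / PHY_ANY_UID; their concrete values here are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wildcard values standing in for PHY_ANY_ID and PHY_ANY_UID. */
#define MODEL_ANY_ID  "MATCH ANY PHY"
#define MODEL_ANY_UID 0xffffffffu

struct model_fixup {
	const char *bus_id;
	uint32_t phy_uid;
	uint32_t phy_uid_mask;
};

/* A fixup applies when both the bus ID and the masked UID match,
 * with either one allowed to be a wildcard. */
static bool model_needs_fixup(const char *phy_bus_id, uint32_t phy_id,
			      const struct model_fixup *fixup)
{
	if (strcmp(fixup->bus_id, MODEL_ANY_ID) != 0 &&
	    strcmp(fixup->bus_id, phy_bus_id) != 0)
		return false;

	if (fixup->phy_uid != MODEL_ANY_UID &&
	    ((phy_id ^ fixup->phy_uid) & fixup->phy_uid_mask) != 0)
		return false;

	return true;
}
```

A board-specific fixup would usually pin the bus ID and wildcard the UID, while a chip erratum fixup would do the opposite.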

intget_phy_c45_ids(structmii_bus*bus,intaddr,structphy_c45_device_ids*c45_ids)

reads the specified addr for its 802.3-c45 IDs.

Parameters

structmii_bus*bus

the target MII bus

intaddr

PHY address on the MII bus

structphy_c45_device_ids*c45_ids

where to store the c45 ID information.

Description

Read the PHY “devices in package”. If this appears to be valid, read the PHY identifiers for each device. Return the “devices in package” and identifiers in c45_ids.

Returns zero on success, -EIO on bus access error, or -ENODEV if the “devices in package” is invalid or no device responds.

intget_phy_c22_id(structmii_bus*bus,intaddr,u32*phy_id)

reads the specified addr for its clause 22 ID.

Parameters

structmii_bus*bus

the target MII bus

intaddr

PHY address on the MII bus

u32*phy_id

where to store the ID retrieved.

Description

Read the 802.3 clause 22 PHY ID from the PHY at addr on the bus, placing it in phy_id. Returns zero if the read succeeds and the ID is valid, -EIO on bus access error, or -ENODEV if no device responds or the ID is invalid.
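The sketch below models this sequence in user space: two 16-bit ID registers are read and combined into one 32-bit ID, which is then sanity-checked. `struct model_bus`, `model_bus_read()` and `model_get_c22_id()` are hypothetical, and the "mostly all ones means no device" test is an assumption about how an invalid ID is recognised.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MII_PHYSID1 2 /* upper 16 bits of the ID */
#define MODEL_MII_PHYSID2 3 /* lower 16 bits of the ID */

/* Hypothetical bus: a flat register file for one address, plus an error knob. */
struct model_bus {
	uint16_t regs[32];
	int read_error; /* 0, or a negative errno returned by every read */
};

static int model_bus_read(const struct model_bus *bus, int regnum)
{
	if (bus->read_error)
		return bus->read_error;
	return bus->regs[regnum];
}

/* Returns 0 with *phy_id filled in, -EIO on bus error, -ENODEV on a bad ID. */
static int model_get_c22_id(const struct model_bus *bus, uint32_t *phy_id)
{
	int hi, lo;

	hi = model_bus_read(bus, MODEL_MII_PHYSID1);
	if (hi < 0)
		return -EIO;
	lo = model_bus_read(bus, MODEL_MII_PHYSID2);
	if (lo < 0)
		return -EIO;

	*phy_id = ((uint32_t)hi << 16) | (uint32_t)lo;

	/* Assumption: an ID that is (almost) all ones means nothing answered. */
	if ((*phy_id & 0x1fffffffu) == 0x1fffffffu)
		return -ENODEV;
	return 0;
}
```

An undriven MDIO data line usually reads back as ones, which is why an all-ones ID is treated as "no device" in this model.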

voidphy_prepare_link(structphy_device*phydev,void(*handler)(structnet_device*))

prepares the PHY layer to monitor link status

Parameters

structphy_device*phydev

target phy_device struct

void(*handler)(structnet_device*)

callback function for link status change notifications

Description

Tells the PHY infrastructure to handle the gory details on monitoring link status (whether through polling or an interrupt), and to call back to the connected device driver when the link status changes. If you want to monitor your own link state, don’t call this function.

intphy_poll_reset(structphy_device*phydev)

Safely wait until a PHY reset has properly completed

Parameters

structphy_device*phydev

The PHY device to poll

Description

According to IEEE 802.3, Section 2, Subsection 22.2.4.1.1, as published in 2008, a PHY reset may take up to 0.5 seconds. The MII BMCR register must be polled until the BMCR_RESET bit clears.

Furthermore, any attempts to write to PHY registers may have no effect or even generate MDIO bus errors until this is complete.

Some PHYs (such as the Marvell 88E1111) don’t entirely conform to the standard and do not fully reset after the BMCR_RESET bit is set, and may even REQUIRE a soft-reset to properly restart autonegotiation. In an effort to support such broken PHYs, this function is separate from the standard phy_init_hw() which will zero all the other bits in the BMCR and reapply all driver-specific and board-specific fixups.
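The polling pattern described above can be modelled in user space as a bounded loop over a self-clearing bit. `struct model_phy`, `model_read_bmcr()` and `model_poll_reset()` are hypothetical names; a poll budget replaces the real time limit, and the -ETIMEDOUT return value is an assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_BMCR_RESET 0x8000u /* reset bit of the control register */

/* Hypothetical PHY whose reset bit self-clears after a number of reads. */
struct model_phy {
	uint16_t bmcr;
	int reads_until_done; /* 0 means the bit never clears */
};

static int model_read_bmcr(struct model_phy *phy)
{
	if (phy->reads_until_done > 0 && --phy->reads_until_done == 0)
		phy->bmcr &= (uint16_t)~MODEL_BMCR_RESET;
	return phy->bmcr;
}

/* Poll up to max_polls times; a real driver would sleep between polls. */
static int model_poll_reset(struct model_phy *phy, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if (!(model_read_bmcr(phy) & MODEL_BMCR_RESET))
			return 0;
	}
	return -ETIMEDOUT; /* assumed error code for "reset never finished" */
}
```

Bounding the loop matters because a PHY that never clears the bit would otherwise stall the caller indefinitely.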

intgenphy_config_advert(structphy_device*phydev,constunsignedlong*advert)

sanitize and advertise auto-negotiation parameters

Parameters

structphy_device*phydev

target phy_device struct

constunsignedlong*advert

auto-negotiation parameters to advertise

Description

Writes MII_ADVERTISE with the appropriate values, after sanitizing the values to make sure we only advertise what is supported. Returns < 0 on error, 0 if the PHY’s advertisement hasn’t changed, and > 0 if it has changed.

intgenphy_c37_config_advert(structphy_device*phydev)

sanitize and advertise auto-negotiation parameters

Parameters

structphy_device*phydev

target phy_device struct

Description

Writes MII_ADVERTISE with the appropriate values, after sanitizing the values to make sure we only advertise what is supported. Returns < 0 on error, 0 if the PHY’s advertisement hasn’t changed, and > 0 if it has changed. This function is intended for Clause 37 1000Base-X mode.

intphy_probe(structdevice*dev)

probe and init a PHY device

Parameters

structdevice*dev

device to probe and init

Description

Take care of setting up the phy_device structure, set the state to READY.

intphy_driver_register(structphy_driver*new_driver,structmodule*owner)

register a phy_driver with the PHY layer

Parameters

structphy_driver*new_driver

new phy_driver to register

structmodule*owner

module owning this PHY

structmii_bus*mdio_find_bus(constchar*mdio_name)

Given the name of a mdiobus, find the mii_bus.

Parameters

constchar*mdio_name

The name of a mdiobus.

Return

a reference to the mii_bus, or NULL if none found. The embedded struct device will have its reference count incremented, and this must be released with put_device() once the bus is finished with.

structmii_bus*of_mdio_find_bus(structdevice_node*mdio_bus_np)

Given an mii_bus node, find the mii_bus.

Parameters

structdevice_node*mdio_bus_np

Pointer to the mii_bus.

Return

a reference to the mii_bus, or NULL if none found. The embedded struct device will have its reference count incremented, and this must be put once the bus is finished with.

Description

Because the association of a device_node and mii_bus is made via of_mdiobus_register(), the mii_bus cannot be found before it is registered with of_mdiobus_register().

int__mdiobus_read(structmii_bus*bus,intaddr,u32regnum)

Unlocked version of the mdiobus_read function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to read

Return

The register value if successful, negative error code on failure

Description

Read a MDIO bus register. Caller must hold the mdio bus lock.

NOTE

MUST NOT be called from interrupt context.

int__mdiobus_write(structmii_bus*bus,intaddr,u32regnum,u16val)

Unlocked version of the mdiobus_write function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to write

u16val

value to write to regnum

Return

Zero if successful, negative error code on failure

Description

Write a MDIO bus register. Caller must hold the mdio bus lock.

NOTE

MUST NOT be called from interrupt context.

int__mdiobus_modify_changed(structmii_bus*bus,intaddr,u32regnum,u16mask,u16set)

Unlocked version of the mdiobus_modify_changed function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to modify

u16mask

bit mask of bits to clear

u16set

bit mask of bits to set

Return

1 if the register was modified, 0 if no change was needed, negative on any error condition

Description

Read, modify, and if any change, write the register value back to the device.

NOTE

MUST NOT be called from interrupt context.
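The read-modify-write contract and its three-way return value can be illustrated with a user-space model. `struct model_bus` and the three functions are hypothetical and operate on an in-memory register array instead of a real MDIO bus; locking is left out, matching the "caller holds the lock" convention of the unlocked helpers.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical in-memory bus for a single PHY address. */
struct model_bus {
	uint16_t regs[32];
	int writes; /* counts bus writes so the no-change path is visible */
};

static int model_read(struct model_bus *bus, unsigned int regnum)
{
	if (regnum >= 32)
		return -EINVAL;
	return bus->regs[regnum];
}

static int model_write(struct model_bus *bus, unsigned int regnum, uint16_t val)
{
	if (regnum >= 32)
		return -EINVAL;
	bus->regs[regnum] = val;
	bus->writes++;
	return 0;
}

/* Clear the bits in mask, set the bits in set, and write back only when
 * the value actually changes. Returns 1 if written, 0 if unchanged,
 * negative on error. */
static int model_modify_changed(struct model_bus *bus, unsigned int regnum,
				uint16_t mask, uint16_t set)
{
	int ret, new_val;

	ret = model_read(bus, regnum);
	if (ret < 0)
		return ret;

	new_val = (ret & ~mask) | set;
	if (new_val == ret)
		return 0;

	ret = model_write(bus, regnum, (uint16_t)new_val);
	return ret < 0 ? ret : 1;
}
```

Skipping the write when nothing changes saves a bus transaction, and the 1/0 result lets callers decide whether a follow-up action such as restarting auto-negotiation is needed.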

int__mdiobus_c45_read(structmii_bus*bus,intaddr,intdevad,u32regnum)

Unlocked version of the mdiobus_c45_read function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

intdevad

device address to read

u32regnum

register number to read

Return

The register value if successful, negative error code on failure

Description

Read a MDIO bus register. Caller must hold the mdio bus lock.

NOTE

MUST NOT be called from interrupt context.

int__mdiobus_c45_write(structmii_bus*bus,intaddr,intdevad,u32regnum,u16val)

Unlocked version of the mdiobus_c45_write function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

intdevad

device address to write

u32regnum

register number to write

u16val

value to write to regnum

Return

Zero if successful, negative error code on failure

Description

Write a MDIO bus register. Caller must hold the mdio bus lock.

NOTE

MUST NOT be called from interrupt context.

intmdiobus_read_nested(structmii_bus*bus,intaddr,u32regnum)

Nested version of the mdiobus_read function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to read

Return

The register value if successful, negative error code on failure

Description

In case of nested MDIO bus access avoid lockdep false positives by using mutex_lock_nested().

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_read(structmii_bus*bus,intaddr,u32regnum)

Convenience function for reading a given MII mgmt register

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to read

Return

The register value if successful, negative error code on failure

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_c45_read(structmii_bus*bus,intaddr,intdevad,u32regnum)

Convenience function for reading a given MII mgmt register

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

intdevad

device address to read

u32regnum

register number to read

Return

The register value if successful, negative error code on failure

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_c45_read_nested(structmii_bus*bus,intaddr,intdevad,u32regnum)

Nested version of the mdiobus_c45_read function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

intdevad

device address to read

u32regnum

register number to read

Return

The register value if successful, negative error code on failure

Description

In case of nested MDIO bus access avoid lockdep false positives by using mutex_lock_nested().

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_write_nested(structmii_bus*bus,intaddr,u32regnum,u16val)

Nested version of the mdiobus_write function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to write

u16val

value to write to regnum

Return

Zero if successful, negative error code on failure

Description

In case of nested MDIO bus access avoid lockdep false positives by using mutex_lock_nested().

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_write(structmii_bus*bus,intaddr,u32regnum,u16val)

Convenience function for writing a given MII mgmt register

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to write

u16val

value to write to regnum

Return

Zero if successful, negative error code on failure

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_c45_write(structmii_bus*bus,intaddr,intdevad,u32regnum,u16val)

Convenience function for writing a given MII mgmt register

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

intdevad

device address to write

u32regnum

register number to write

u16val

value to write to regnum

Return

Zero if successful, negative error code on failure

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_c45_write_nested(structmii_bus*bus,intaddr,intdevad,u32regnum,u16val)

Nested version of the mdiobus_c45_write function

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

intdevad

device address to write

u32regnum

register number to write

u16val

value to write to regnum

Return

Zero if successful, negative error code on failure

Description

In case of nested MDIO bus access avoid lockdep false positives by using mutex_lock_nested().

NOTE

MUST NOT be called from interrupt context, because the bus read/write functions may wait for an interrupt to conclude the operation.

intmdiobus_modify(structmii_bus*bus,intaddr,u32regnum,u16mask,u16set)

Convenience function for modifying a given mdio device register

Parameters

structmii_bus*bus

the mii_bus struct

intaddr

the phy address

u32regnum

register number to write

u16mask

bit mask of bits to clear

u16set

bit mask of bits to set

Return

0 on success, negative on any error condition

int mdiobus_c45_modify(struct mii_bus *bus, int addr, int devad, u32 regnum, u16 mask, u16 set)

Convenience function for modifying a given mdio device register

Parameters

struct mii_bus *bus

the mii_bus struct

int addr

the phy address

int devad

device address to read

u32 regnum

register number to write

u16 mask

bit mask of bits to clear

u16 set

bit mask of bits to set

Return

0 on success, negative on any error condition

int mdiobus_modify_changed(struct mii_bus *bus, int addr, u32 regnum, u16 mask, u16 set)

Convenience function for modifying a given mdio device register and returning if it changed

Parameters

struct mii_bus *bus

the mii_bus struct

int addr

the phy address

u32 regnum

register number to write

u16 mask

bit mask of bits to clear

u16 set

bit mask of bits to set

Return

1 if the register was modified, 0 if no change was needed, negative on any error condition

int mdiobus_c45_modify_changed(struct mii_bus *bus, int addr, int devad, u32 regnum, u16 mask, u16 set)

Convenience function for modifying a given mdio device register and returning if it changed

Parameters

struct mii_bus *bus

the mii_bus struct

int addr

the phy address

int devad

device address to read

u32 regnum

register number to write

u16 mask

bit mask of bits to clear

u16 set

bit mask of bits to set

Return

1 if the register was modified, 0 if no change was needed, negative on any error condition

void mdiobus_release(struct device *d)

mii_bus device release callback

Parameters

struct device *d

the target struct device that contains the mii_bus

Description

Called when the last reference to an mii_bus is dropped, to free the underlying memory.

int __mdiobus_c45_modify_changed(struct mii_bus *bus, int addr, int devad, u32 regnum, u16 mask, u16 set)

Unlocked version of the mdiobus_c45_modify_changed function

Parameters

struct mii_bus *bus

the mii_bus struct

int addr

the phy address

int devad

device address to read

u32 regnum

register number to modify

u16 mask

bit mask of bits to clear

u16 set

bit mask of bits to set

Return

1 if the register was modified, 0 if no change was needed, negative on any error condition

Description

Read, modify, and if any change, write the register value back to the device. Any error returns a negative number.

NOTE

MUST NOT be called from interrupt context.
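The read, modify, write-back contract described for the modify_changed helpers can be illustrated in plain userspace C. This is only a sketch: struct fake_reg, fake_read(), fake_write() and fake_modify_changed() are hypothetical stand-ins for an MDIO register and its bus accessors, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 16-bit MDIO register; not a kernel type. */
struct fake_reg {
	uint16_t value;
	int writes;	/* number of write-backs performed */
};

/* Hypothetical accessors standing in for the bus read/write operations. */
static int fake_read(const struct fake_reg *r)
{
	return r->value;
}

static int fake_write(struct fake_reg *r, uint16_t val)
{
	r->value = val;
	r->writes++;
	return 0;
}

/*
 * Returns 1 if the register was modified, 0 if no change was needed,
 * or a negative value on error, following the documented contract.
 */
static int fake_modify_changed(struct fake_reg *r, uint16_t mask, uint16_t set)
{
	int old, ret;
	uint16_t val;

	assert(r);
	old = fake_read(r);
	if (old < 0)
		return old;

	/* Clear the bits in mask, then set the bits in set. */
	val = (uint16_t)((old & ~mask) | set);
	if (val == (uint16_t)old)
		return 0;	/* nothing to write back */

	ret = fake_write(r, val);
	return ret < 0 ? ret : 1;
}
```

Skipping the write when the value is unchanged is what lets a caller distinguish "already configured" (0) from "reconfigured" (1).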

int mdio_bus_match(struct device *dev, const struct device_driver *drv)

determine if given MDIO driver supports the given MDIO device

Parameters

struct device *dev

target MDIO device

const struct device_driver *drv

given MDIO driver

Return

1 if the driver supports the device, 0 otherwise

Description

This may require calling the device's own match function, since different classes of MDIO devices have different match criteria.

PHYLINK

PHYLINK interfaces traditional network drivers with PHYLIB, fixed-links, and SFF modules (e.g. hot-pluggable SFP) that may contain PHYs. PHYLINK provides management of the link state and link modes.

struct phylink_link_state

link state structure

Definition:

struct phylink_link_state {    unsigned long advertising[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];    unsigned long lp_advertising[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];    phy_interface_t interface;    int speed;    int duplex;    int pause;    int rate_matching;    unsigned int link:1;    unsigned int an_complete:1;};

Members

advertising

ethtool bitmask containing advertised link modes

lp_advertising

ethtool bitmask containing link partner advertised link modes

interface

link typedef phy_interface_t mode

speed

link speed, one of the SPEED_* constants.

duplex

link duplex mode, one of DUPLEX_* constants.

pause

link pause state, described by MLO_PAUSE_* constants.

rate_matching

rate matching being performed, one of the RATE_MATCH_* constants. If rate matching is taking place, then the speed/duplex of the medium link mode (speed and duplex) and the speed/duplex of the phy interface mode (interface) are different.

link

true if the link is up.

an_complete

true if autonegotiation has completed.

struct phylink_config

PHYLINK configuration structure

Definition:

struct phylink_config {    struct device *dev;    enum phylink_op_type type;    bool poll_fixed_state;    bool mac_managed_pm;    bool mac_requires_rxc;    bool default_an_inband;    bool eee_rx_clk_stop_enable;    void (*get_fixed_state)(struct phylink_config *config, struct phylink_link_state *state);    unsigned long supported_interfaces[BITS_TO_LONGS(PHY_INTERFACE_MODE_MAX)];    unsigned long lpi_interfaces[BITS_TO_LONGS(PHY_INTERFACE_MODE_MAX)];    unsigned long mac_capabilities;    unsigned long lpi_capabilities;    u32 lpi_timer_default;    bool eee_enabled_default;    bool wol_phy_legacy;    bool wol_phy_speed_ctrl;    u32 wol_mac_support;};

Members

dev

a pointer to a struct device associated with the MAC

type

operation type of PHYLINK instance

poll_fixed_state

if true, starts link_poll, if MAC link is at MLO_AN_FIXED mode.

mac_managed_pm

if true, indicate the MAC driver is responsible for PHY PM.

mac_requires_rxc

if true, the MAC always requires a receive clock from PHY. The PHY driver should start the clock signal as soon as possible and avoid stopping it during suspend events.

default_an_inband

if true, defaults to MLO_AN_INBAND rather than MLO_AN_PHY. A fixed-link specification will override.

eee_rx_clk_stop_enable

if true, PHY can stop the receive clock during LPI

get_fixed_state

callback to execute to determine the fixed link state, if MAC link is at MLO_AN_FIXED mode.

supported_interfaces

bitmap describing which PHY_INTERFACE_MODE_xxx are supported by the MAC/PCS.

lpi_interfaces

bitmap describing which PHY interface modes can support LPI signalling.

mac_capabilities

MAC pause/speed/duplex capabilities.

lpi_capabilities

MAC speeds which can support LPI signalling

lpi_timer_default

Default EEE LPI timer setting.

eee_enabled_default

If set, EEE will be enabled by phylink at creation time

wol_phy_legacy

Use Wake-on-Lan with PHY even if phy_can_wakeup() is false

wol_phy_speed_ctrl

Use phy speed control on suspend/resume

wol_mac_support

Bitmask of MAC supported WAKE_* options

struct phylink_mac_ops

MAC operations structure.

Definition:

struct phylink_mac_ops {    unsigned long (*mac_get_caps)(struct phylink_config *config, phy_interface_t interface);    struct phylink_pcs *(*mac_select_pcs)(struct phylink_config *config, phy_interface_t interface);    int (*mac_prepare)(struct phylink_config *config, unsigned int mode, phy_interface_t iface);    void (*mac_config)(struct phylink_config *config, unsigned int mode, const struct phylink_link_state *state);    int (*mac_finish)(struct phylink_config *config, unsigned int mode, phy_interface_t iface);    void (*mac_link_down)(struct phylink_config *config, unsigned int mode, phy_interface_t interface);    void (*mac_link_up)(struct phylink_config *config, struct phy_device *phy, unsigned int mode, phy_interface_t interface, int speed, int duplex, bool tx_pause, bool rx_pause);    void (*mac_disable_tx_lpi)(struct phylink_config *config);    int (*mac_enable_tx_lpi)(struct phylink_config *config, u32 timer, bool tx_clk_stop);    int (*mac_wol_set)(struct phylink_config *config, u32 wolopts, const u8 *sopass);};

Members

mac_get_caps

Get MAC capabilities for interface mode.

mac_select_pcs

Select a PCS for the interface mode.

mac_prepare

prepare for a major reconfiguration of the interface.

mac_config

configure the MAC for the selected mode and state.

mac_finish

finish a major reconfiguration of the interface.

mac_link_down

take the link down.

mac_link_up

allow the link to come up.

mac_disable_tx_lpi

disable LPI.

mac_enable_tx_lpi

enable and configure LPI.

mac_wol_set

configure Wake-on-Lan settings at the MAC.

Description

The individual methods are described more fully below.

unsigned long mac_get_caps(struct phylink_config *config, phy_interface_t interface)

Get MAC capabilities for interface mode.

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

phy_interface_t interface

PHY interface mode.

Description

Optional method. When not provided, config->mac_capabilities will be used. When implemented, this returns the MAC capabilities for the specified interface mode where there is some special handling required by the MAC driver (e.g. not supporting half-duplex in certain interface modes).

struct phylink_pcs *mac_select_pcs(struct phylink_config *config, phy_interface_t interface)

Select a PCS for the interface mode.

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

phy_interface_t interface

PHY interface mode for PCS

Description

Return the struct phylink_pcs for the specified interface mode, or NULL if none is required, or an error pointer on error.

This must not modify any state. It is used to query which PCS should be used. Phylink will use this during validation to ensure that the configuration is valid, and when setting a configuration to internally set the PCS that will be used.

int mac_prepare(struct phylink_config *config, unsigned int mode, phy_interface_t iface)

prepare to change the PHY interface mode

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

unsigned int mode

one of MLO_AN_FIXED, MLO_AN_PHY, MLO_AN_INBAND.

phy_interface_t iface

interface mode to switch to

Description

phylink will call this method at the beginning of a full initialisation of the link, which includes changing the interface mode or at initial startup time. It may be called for the current mode. The MAC driver should perform whatever actions are required, e.g. disabling the Serdes PHY.

This will be the first call in the sequence:

  • mac_prepare()

  • mac_config()

  • pcs_config()

  • possible pcs_an_restart()

  • mac_finish()

Returns zero on success, or negative errno on failure which will be reported to the kernel log.
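The call order listed above can be traced with a small userspace model. This is a sketch under stated assumptions: the demo_* callbacks and the trace buffer are hypothetical, they take no real phylink arguments, and stopping at the first negative return is a simplification chosen for the example rather than a statement about phylink's own error handling.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer; phylink itself records nothing like this. */
static char trace[128];

static void note(const char *step)
{
	/* Room for a separator, the step name and the terminator. */
	assert(strlen(trace) + strlen(step) + 2 <= sizeof(trace));
	if (trace[0])
		strcat(trace, " ");
	strcat(trace, step);
}

/* Hypothetical driver callbacks, reduced to their return values. */
static int demo_mac_prepare(void) { note("mac_prepare"); return 0; }
static void demo_mac_config(void) { note("mac_config"); }
static void demo_pcs_an_restart(void) { note("pcs_an_restart"); }
static int demo_mac_finish(void) { note("mac_finish"); return 0; }

static int demo_pcs_config(int restart_an)
{
	note("pcs_config");
	return restart_an ? 1 : 0;	/* positive: request an AN restart */
}

/* Walks the documented order for a major reconfiguration. */
static int demo_major_config(int restart_an)
{
	int ret;

	trace[0] = '\0';
	ret = demo_mac_prepare();
	if (ret < 0)
		return ret;

	demo_mac_config();

	ret = demo_pcs_config(restart_an);
	if (ret < 0)
		return ret;
	if (ret > 0)
		demo_pcs_an_restart();

	return demo_mac_finish();
}
```

After demo_major_config(1) the trace holds all five steps in order; with demo_major_config(0) the pcs_an_restart step is absent.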

void mac_config(struct phylink_config *config, unsigned int mode, const struct phylink_link_state *state)

configure the MAC for the selected mode and state

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

unsigned int mode

one of MLO_AN_FIXED, MLO_AN_PHY, MLO_AN_INBAND.

const struct phylink_link_state *state

a pointer to a struct phylink_link_state.

Description

Note - not all members of state are valid. In particular, state->lp_advertising, state->link, state->an_complete are never guaranteed to be correct, and so any mac_config() implementation must never reference these fields.

This will only be called to reconfigure the MAC for a “major” change in e.g. interface mode. It will not be called for changes in speed, duplex or pause modes or to change the in-band advertisement.

In all negotiation modes, as defined by mode, state->pause indicates the pause settings which should be applied as follows. If MLO_PAUSE_AN is not set, MLO_PAUSE_TX and MLO_PAUSE_RX indicate whether the MAC should send pause frames and/or act on received pause frames respectively. Otherwise, the results of in-band negotiation/status from the MAC PCS should be used to control the MAC pause mode settings.

The action performed depends on the currently selected mode:

MLO_AN_FIXED, MLO_AN_PHY:

Configure for non-inband negotiation mode, where the link settings are completely communicated via mac_link_up(). The physical link protocol from the MAC is specified by state->interface.

state->advertising may be used, but is not required.

Older drivers (prior to the mac_link_up() change) may use state->speed, state->duplex and state->pause to configure the MAC, but this is deprecated; such drivers should be converted to use mac_link_up().

Other members of state must be ignored.

Valid state members: interface, advertising.
Deprecated state members: speed, duplex, pause.

MLO_AN_INBAND:

place the link in an inband negotiation mode (such as 802.3z 1000base-X or Cisco SGMII mode depending on the state->interface mode). In both cases, link state management (whether the link is up or not) is performed by the MAC, and reported via the pcs_get_state() callback. Changes in link state must be made by calling phylink_mac_change().

Interface mode specific details are mentioned below.

If in 802.3z mode, the link speed is fixed, dependent on the state->interface. Duplex and pause modes are negotiated via the in-band configuration word. Advertised pause modes are set according to state->advertising. Beware of MACs which only support full duplex at gigabit and higher speeds.

If in Cisco SGMII mode, the link speed and duplex mode are passed in the serial bitstream 16-bit configuration word, and the MAC should be configured to read these bits and acknowledge the configuration word. Nothing is advertised by the MAC. The MAC is responsible for reading the configuration word and configuring itself accordingly.

Valid state members: interface, pause, advertising.

Implementations are expected to update the MAC to reflect the requested settings - i.o.w., if nothing has changed between two calls, no action is expected. If only flow control settings have changed, flow control should be updated without taking the link down. This “update” behaviour is critical to avoid bouncing the link up status.
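The pause rule stated earlier in this description (explicit TX/RX bits unless the AN bit is set) can be sketched as a small decoder. The DEMO_PAUSE_* constants are local stand-ins for MLO_PAUSE_RX, MLO_PAUSE_TX and MLO_PAUSE_AN, with bit positions assumed for this example only; struct demo_pause_cfg and demo_resolve_pause() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local stand-ins for MLO_PAUSE_RX, MLO_PAUSE_TX and MLO_PAUSE_AN.
 * The bit positions are assumptions made for this sketch only.
 */
enum {
	DEMO_PAUSE_RX = 1 << 0,
	DEMO_PAUSE_TX = 1 << 1,
	DEMO_PAUSE_AN = 1 << 2,
};

/* Hypothetical result type: what the MAC should end up doing. */
struct demo_pause_cfg {
	bool from_inband;	/* take pause resolution from the PCS */
	bool tx;		/* send pause frames */
	bool rx;		/* act on received pause frames */
};

static struct demo_pause_cfg demo_resolve_pause(int pause)
{
	struct demo_pause_cfg cfg = { false, false, false };

	/* Reject bits this sketch does not know about. */
	assert(!(pause & ~(DEMO_PAUSE_RX | DEMO_PAUSE_TX | DEMO_PAUSE_AN)));

	/* With the AN bit set, the in-band result decides, not TX/RX. */
	if (pause & DEMO_PAUSE_AN) {
		cfg.from_inband = true;
		return cfg;
	}

	cfg.tx = (pause & DEMO_PAUSE_TX) != 0;
	cfg.rx = (pause & DEMO_PAUSE_RX) != 0;
	return cfg;
}
```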

int mac_finish(struct phylink_config *config, unsigned int mode, phy_interface_t iface)

finish a change of the PHY interface mode

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

unsigned int mode

one of MLO_AN_FIXED, MLO_AN_PHY, MLO_AN_INBAND.

phy_interface_t iface

interface mode to switch to

Description

phylink will call this if it called mac_prepare() to allow the MAC to complete any necessary steps after the MAC and PCS have been configured for the mode and iface. E.g. a MAC driver may wish to re-enable the Serdes PHY here if it was previously disabled by mac_prepare().

Returns zero on success, or negative errno on failure which will be reported to the kernel log.

void mac_link_down(struct phylink_config *config, unsigned int mode, phy_interface_t interface)

notification that the link has gone down

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

unsigned int mode

link autonegotiation mode

phy_interface_t interface

link typedef phy_interface_t mode

Description

Notifies the MAC that the link has gone down. This will not be called unless mac_link_up() has been previously called.

The MAC should stop processing packets for transmission and reception. phylink will have called netif_carrier_off() to notify the networking stack that the link has gone down, so MAC drivers should not make this call.

If mode is MLO_AN_INBAND, then this function must not prevent the link coming up.

void mac_link_up(struct phylink_config *config, struct phy_device *phy, unsigned int mode, phy_interface_t interface, int speed, int duplex, bool tx_pause, bool rx_pause)

notification that the link has come up

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

struct phy_device *phy

any attached phy (deprecated - please use LPI interfaces)

unsigned int mode

link autonegotiation mode

phy_interface_t interface

link typedef phy_interface_t mode

int speed

link speed

int duplex

link duplex

bool tx_pause

link transmit pause enablement status

bool rx_pause

link receive pause enablement status

Description

Notifies the MAC that the link has come up, and the parameters of the link as seen from the MAC's point of view. If mac_link_up() has been called previously, there will be an intervening call to mac_link_down() before this method will be subsequently called.

speed, duplex, tx_pause and rx_pause indicate the finalised link settings, and should be used to configure the MAC block appropriately where these settings are not automatically conveyed from the PCS block, or if in-band negotiation (as defined by phylink_autoneg_inband(mode)) is disabled.

Note that when 802.3z in-band negotiation is in use, it is possible that the user wishes to override the pause settings, and this should be allowed when considering the implementation of this method.

Once configured, the MAC may begin to process packets for transmission and reception.

Interface type selection must be done in mac_config().
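As a rough illustration of applying the finalised link settings, the sketch below programs an imaginary MAC control word in a link-up handler and clears only the enable bit in the link-down handler. The register layout, struct demo_mac and both demo_* functions are hypothetical and do not correspond to any real hardware or to the phylink callback signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical control-register bits of an imaginary MAC. */
enum {
	DEMO_CTRL_GIGABIT     = 1 << 0,
	DEMO_CTRL_FULL_DUPLEX = 1 << 1,
	DEMO_CTRL_TX_PAUSE    = 1 << 2,
	DEMO_CTRL_RX_PAUSE    = 1 << 3,
	DEMO_CTRL_ENABLE      = 1 << 4,
};

struct demo_mac {
	unsigned int ctrl;
};

/* Apply the resolved settings, then let the MAC pass traffic. */
static void demo_mac_link_up(struct demo_mac *mac, int speed,
			     bool full_duplex, bool tx_pause, bool rx_pause)
{
	unsigned int ctrl = 0;

	assert(mac);
	if (speed == 1000)
		ctrl |= DEMO_CTRL_GIGABIT;
	if (full_duplex)
		ctrl |= DEMO_CTRL_FULL_DUPLEX;
	if (tx_pause)
		ctrl |= DEMO_CTRL_TX_PAUSE;
	if (rx_pause)
		ctrl |= DEMO_CTRL_RX_PAUSE;

	mac->ctrl = ctrl | DEMO_CTRL_ENABLE;
}

/* Stop processing packets; leave the other settings untouched. */
static void demo_mac_link_down(struct demo_mac *mac)
{
	assert(mac);
	mac->ctrl &= ~(unsigned int)DEMO_CTRL_ENABLE;
}
```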

void mac_disable_tx_lpi(struct phylink_config *config)

disable LPI generation at the MAC

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

Description

Disable generation of LPI at the MAC, effectively preventing the MAC from indicating that it is idle.

int mac_enable_tx_lpi(struct phylink_config *config, u32 timer, bool tx_clk_stop)

configure and enable LPI generation at the MAC

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

u32 timer

LPI timeout in microseconds.

bool tx_clk_stop

allow xMII transmit clock to be stopped during LPI

Description

Configure the LPI timeout accordingly. This will only be called when the link is already up, to cater for situations where the hardware needs to be programmed according to the link speed.

Enable LPI generation at the MAC, and configure whether the xMII transmit clock may be stopped.

Return

0 on success. Please consult with rmk before returning an error.

int mac_wol_set(struct phylink_config *config, u32 wolopts, const u8 *sopass)

configure the Wake-on-Lan parameters

Parameters

struct phylink_config *config

a pointer to a struct phylink_config.

u32 wolopts

Bitmask of WAKE_* flags for enabled Wake-On-Lan modes.

const u8 *sopass

SecureOn(tm) password; meaningful only for WAKE_MAGICSECURE

Description

Enable the specified Wake-on-Lan options at the MAC. Options that the PHY can handle will have been removed from wolopts.

The presence of this method enables phylink-managed WoL support.

Return

0 on success.

struct phylink_pcs

PHYLINK PCS instance

Definition:

struct phylink_pcs {    unsigned long supported_interfaces[BITS_TO_LONGS(PHY_INTERFACE_MODE_MAX)];    const struct phylink_pcs_ops *ops;    struct phylink *phylink;    bool poll;    bool rxc_always_on;};

Members

supported_interfaces

bitmap describing which PHY_INTERFACE_MODE_xxx are supported by this PCS.

ops

a pointer to the struct phylink_pcs_ops structure

phylink

pointer to struct phylink

poll

poll the PCS for link changes

rxc_always_on

The MAC driver requires the reference clock to always be on. Standalone PCS drivers which do not have access to a PHY device can check this instead of PHY_F_RXC_ALWAYS_ON.

Description

This structure is designed to be embedded within the PCS private data, and will be passed between phylink and the PCS.

The phylink member is private to phylink and must not be touched by the PCS driver.

struct phylink_pcs_ops

MAC PCS operations structure.

Definition:

struct phylink_pcs_ops {    int (*pcs_validate)(struct phylink_pcs *pcs, unsigned long *supported, const struct phylink_link_state *state);    unsigned int (*pcs_inband_caps)(struct phylink_pcs *pcs, phy_interface_t interface);    int (*pcs_enable)(struct phylink_pcs *pcs);    void (*pcs_disable)(struct phylink_pcs *pcs);    void (*pcs_pre_config)(struct phylink_pcs *pcs, phy_interface_t interface);    int (*pcs_post_config)(struct phylink_pcs *pcs, phy_interface_t interface);    void (*pcs_get_state)(struct phylink_pcs *pcs, unsigned int neg_mode, struct phylink_link_state *state);    int (*pcs_config)(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, const unsigned long *advertising, bool permit_pause_to_mac);    void (*pcs_an_restart)(struct phylink_pcs *pcs);    void (*pcs_link_up)(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, int speed, int duplex);    void (*pcs_disable_eee)(struct phylink_pcs *pcs);    void (*pcs_enable_eee)(struct phylink_pcs *pcs);    int (*pcs_pre_init)(struct phylink_pcs *pcs);};

Members

pcs_validate

validate the link configuration.

pcs_inband_caps

query inband support for interface mode.

pcs_enable

enable the PCS.

pcs_disable

disable the PCS.

pcs_pre_config

pre-mac_config method (for errata)

pcs_post_config

post-mac_config method (for errata)

pcs_get_state

read the current MAC PCS link state from the hardware.

pcs_config

configure the MAC PCS for the selected mode and state.

pcs_an_restart

restart 802.3z BaseX autonegotiation.

pcs_link_up

program the PCS for the resolved link configuration (where necessary).

pcs_disable_eee

optional notification to PCS that EEE has been disabled at the MAC.

pcs_enable_eee

optional notification to PCS that EEE will be enabled at the MAC.

pcs_pre_init

configure PCS components necessary for MAC hardware initialization e.g. RX clock for stmmac.

int pcs_validate(struct phylink_pcs *pcs, unsigned long *supported, const struct phylink_link_state *state)

validate the link configuration.

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

unsigned long *supported

ethtool bitmask for supported link modes.

const struct phylink_link_state *state

a const pointer to a struct phylink_link_state.

Description

Validate the interface mode, and advertising’s autoneg bit, removing any media ethtool link modes that would not be supportable from the supported mask. Phylink will propagate the changes to the advertising mask. See the struct phylink_mac_ops validate() method.

Returns -EINVAL if the interface mode/autoneg mode is not supported. Returns non-zero positive if the link state can be supported.

unsigned int pcs_inband_caps(struct phylink_pcs *pcs, phy_interface_t interface)

query PCS in-band capabilities for interface mode.

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

phy_interface_t interface

interface mode to be queried

Description

Returns zero if it is unknown what in-band signalling is supported by the PHY (e.g. because the PHY driver doesn’t implement the method). Otherwise, returns a bit mask of the LINK_INBAND_* values from enum link_inband_signalling to describe which inband modes are supported for this interface mode.

int pcs_enable(struct phylink_pcs *pcs)

enable the PCS.

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

void pcs_disable(struct phylink_pcs *pcs)

disable the PCS.

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

void pcs_get_state(struct phylink_pcs *pcs, unsigned int neg_mode, struct phylink_link_state *state)

Read the current inband link state from the hardware

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

unsigned int neg_mode

link negotiation mode (PHYLINK_PCS_NEG_xxx)

struct phylink_link_state *state

a pointer to a struct phylink_link_state.

Description

Read the current inband link state from the MAC PCS, reporting the current speed in state->speed, duplex mode in state->duplex, pause mode in state->pause using the MLO_PAUSE_RX and MLO_PAUSE_TX bits, negotiation completion state in state->an_complete, and link up state in state->link. If possible, state->lp_advertising should also be populated.

Note that the neg_mode parameter is always the PHYLINK_PCS_NEG_xxx state, not MLO_AN_xxx.

int pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, const unsigned long *advertising, bool permit_pause_to_mac)

Configure the PCS mode and advertisement

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

unsigned int neg_mode

link negotiation mode (see below)

phy_interface_t interface

interface mode to be used

const unsigned long *advertising

advertisement ethtool link mode mask

bool permit_pause_to_mac

permit forwarding pause resolution to MAC

Description

Configure the PCS for the operating mode, the interface mode, and set the advertisement mask. permit_pause_to_mac indicates whether the hardware may forward the pause mode resolution to the MAC.

When operating in MLO_AN_INBAND, inband should always be enabled, otherwise inband should be disabled.

For SGMII, there is no advertisement from the MAC side, the PCS should be programmed to acknowledge the inband word from the PHY.

For 1000BASE-X, the advertisement should be programmed into the PCS.

For most 10GBASE-R, there is no advertisement.

The neg_mode argument should be tested via the phylink_mode_*() family of functions, or for PCS that set pcs->neg_mode true, should be tested against the PHYLINK_PCS_NEG_* definitions.

pcs_config() will be called when configuration of the PCS is required or when the advertisement is possibly updated. It must not unnecessarily disrupt an established link.

When an autonegotiation restart is required for 802.3z modes, .pcs_config() should return a positive non-zero integer (e.g. 1) to indicate to phylink to call the pcs_an_restart() method.
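The return convention above (zero when nothing needs restarting, a positive value when the advertisement changed) can be sketched as follows. struct demo_pcs, its single advertisement register and demo_pcs_config() are hypothetical; a real PCS driver would take the full pcs_config() argument list and translate the ethtool link mode mask into its own register format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PCS with a single 16-bit advertisement register. */
struct demo_pcs {
	uint16_t adv_reg;
};

/*
 * Program the advertisement only when it differs from what is already
 * in the register, so an established link is not disturbed.
 * Returns 1 to ask the caller to restart autonegotiation, 0 otherwise.
 */
static int demo_pcs_config(struct demo_pcs *pcs, uint16_t adv)
{
	assert(pcs);
	if (pcs->adv_reg == adv)
		return 0;

	pcs->adv_reg = adv;
	return 1;
}
```

Calling it twice with the same advertisement returns 1 and then 0, which matches the requirement that repeated configuration must not unnecessarily disrupt the link.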

void pcs_an_restart(struct phylink_pcs *pcs)

restart 802.3z BaseX autonegotiation

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

Description

When PCS ops are present, this overrides mac_an_restart() in struct phylink_mac_ops.

void pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, int speed, int duplex)

program the PCS for the resolved link configuration

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

unsigned int neg_mode

link negotiation mode (see below)

phy_interface_t interface

link typedef phy_interface_t mode

int speed

link speed

int duplex

link duplex

Description

This call will be made just before mac_link_up() to inform the PCS of the resolved link parameters. For example, a PCS operating in SGMII mode without in-band AN needs to be manually configured for the link and duplex setting. Otherwise, this should be a no-op.

The neg_mode argument should be tested via the phylink_mode_*() family of functions, or for PCS that set pcs->neg_mode true, should be tested against the PHYLINK_PCS_NEG_* definitions.

void pcs_disable_eee(struct phylink_pcs *pcs)

Disable EEE at the PCS

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs

Description

Optional method informing the PCS that EEE has been disabled at the MAC.

void pcs_enable_eee(struct phylink_pcs *pcs)

Enable EEE at the PCS

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs

Description

Optional method informing the PCS that EEE is about to be enabled at the MAC.

int pcs_pre_init(struct phylink_pcs *pcs)

Configure PCS components necessary for MAC initialization

Parameters

struct phylink_pcs *pcs

a pointer to a struct phylink_pcs.

Description

This function can be called by MAC drivers through the phylink_pcs_pre_init() wrapper, before their hardware is initialized. It should not be called after the link is brought up, as reconfiguring the PCS at this point could break the link.

Some MAC devices require specific hardware initialization to be performed by their associated PCS device before they can properly initialize their own hardware. An example of this is the initialization of stmmac controllers, which requires an active REF_CLK signal to be provided by the PHY/PCS.

By calling phylink_pcs_pre_init(), MAC drivers can ensure that the PCS is set up in a way that allows for successful hardware initialization.

The specific configuration performed by pcs_pre_init() is dependent on the model of PCS and the requirements of the MAC device attached to it. PCS driver authors should consider whether their target device is to be used in conjunction with a MAC device whose driver calls phylink_pcs_pre_init(). MAC driver authors should document their requirements for the PCS pre-initialization.

int phylink_get_link_timer_ns(phy_interface_t interface)

return the PCS link timer value

Parameters

phy_interface_t interface

link typedef phy_interface_t mode

Description

Return the PCS link timer setting in nanoseconds for the PHY interface mode, or -EINVAL if not appropriate.

bool phylink_mac_implements_lpi(const struct phylink_mac_ops *ops)

determine if MAC implements LPI ops

Parameters

const struct phylink_mac_ops *ops

phylink_mac_ops structure

Description

Returns true if the phylink MAC operations structure indicates that the LPI operations have been implemented, false otherwise.
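A check of this kind can be modelled by testing the two LPI function pointers. The sketch assumes that "implemented" means both mac_disable_tx_lpi and mac_enable_tx_lpi are populated; struct demo_mac_ops is a reduced, hypothetical stand-in for struct phylink_mac_ops, and the demo_* functions are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for struct phylink_mac_ops: only the two LPI hooks. */
struct demo_mac_ops {
	void (*mac_disable_tx_lpi)(void *config);
	int (*mac_enable_tx_lpi)(void *config, unsigned int timer,
				 bool tx_clk_stop);
};

/* Assumption: LPI counts as implemented only if both hooks are set. */
static bool demo_mac_implements_lpi(const struct demo_mac_ops *ops)
{
	return ops && ops->mac_disable_tx_lpi && ops->mac_enable_tx_lpi;
}

/* Trivial hooks so that a populated ops structure can be built. */
static void demo_disable_tx_lpi(void *config)
{
	(void)config;
}

static int demo_enable_tx_lpi(void *config, unsigned int timer,
			      bool tx_clk_stop)
{
	(void)config;
	(void)timer;
	(void)tx_clk_stop;
	return 0;
}
```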

struct phylink

internal data type for phylink

Definition:

struct phylink {};

Members

void phylink_set_port_modes(unsigned long *mask)

set the port type modes in the ethtool mask

Parameters

unsigned long *mask

ethtool link mode mask

Description

Sets all the port type modes in the ethtool mask. MAC drivers should use this in their ‘validate’ callback.

int phylink_interface_max_speed(phy_interface_t interface)

get the maximum speed of a phy interface

Parameters

phy_interface_t interface

phy interface mode defined by typedef phy_interface_t

Description

Determine the maximum speed of a phy interface. This is intended to help determine the correct speed to pass to the MAC when the phy is performing rate matching.

Return

The maximum speed of interface

unsigned long phylink_caps_to_link_caps(unsigned long caps)

Convert a set of MAC capabilities to LINK caps

Parameters

unsigned long caps

A set of MAC capabilities

Return

The corresponding set of LINK_CAPA as defined in phy-caps.h

void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps)

Convert capabilities to ethtool link modes

Parameters

unsigned long *linkmodes

ethtool linkmode mask (must be already initialised)

unsigned long caps

bitmask of MAC capabilities

Description

Set all possible pause, speed and duplex linkmodes in linkmodes that are supported by the caps. linkmodes must have been initialised previously.

void phylink_limit_mac_speed(struct phylink_config *config, u32 max_speed)

limit the phylink_config to a maximum speed

Parameters

struct phylink_config *config

pointer to a struct phylink_config

u32 max_speed

maximum speed

Description

Mask off MAC capabilities for speeds higher than the max_speed parameter. Any further modifications of config.mac_capabilities will override this.
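The masking step can be pictured with a small table that maps speeds to capability bits. Everything here is hypothetical: the bit assignments in demo_caps do not correspond to the kernel's MAC_* capability values, and demo_limit_speed() is only a model of the idea, not the phylink implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; not the kernel's MAC_* values. */
struct demo_speed_cap {
	int speed;		/* in Mbps */
	unsigned long mask;	/* capability bit for that speed */
};

static const struct demo_speed_cap demo_caps[] = {
	{ 10,    1UL << 0 },
	{ 100,   1UL << 1 },
	{ 1000,  1UL << 2 },
	{ 2500,  1UL << 3 },
	{ 10000, 1UL << 4 },
};

/* Clear every capability bit whose speed exceeds max_speed. */
static unsigned long demo_limit_speed(unsigned long caps, int max_speed)
{
	size_t i;

	assert(max_speed >= 0);
	for (i = 0; i < sizeof(demo_caps) / sizeof(demo_caps[0]); i++) {
		if (demo_caps[i].speed > max_speed)
			caps &= ~demo_caps[i].mask;
	}

	return caps;
}
```

Because the result is just a narrowed bitmask, a caller that later assigns a fresh capability mask would undo the limit, which is the caveat stated in the description.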

unsigned long phylink_cap_from_speed_duplex(int speed, unsigned int duplex)

Get mac capability from speed/duplex

Parameters

int speed

the speed to search for

unsigned int duplex

the duplex to search for

Description

Find the mac capability for a given speed and duplex.

Return

A mask with the mac capability matching speed and duplex, or 0 if there were no matches.

unsigned long phylink_get_capabilities(phy_interface_t interface, unsigned long mac_capabilities, int rate_matching)

get capabilities for a given MAC

Parameters

phy_interface_t interface

phy interface mode defined by typedef phy_interface_t

unsigned long mac_capabilities

bitmask of MAC capabilities

int rate_matching

type of rate matching being performed

Description

Get the MAC capabilities that are supported by the interface mode and mac_capabilities.

void phylink_validate_mask_caps(unsigned long *supported, struct phylink_link_state *state, unsigned long mac_capabilities)

Restrict link modes based on caps

Parameters

unsigned long *supported

ethtool bitmask for supported link modes.

struct phylink_link_state *state

pointer to a struct phylink_link_state.

unsigned long mac_capabilities

bitmask of MAC capabilities

Description

Calculate the supported link modes based on mac_capabilities, and restrict supported and state based on that. Use this function if your capabilities aren’t constant, such as if they vary depending on the interface.

void phylink_pcs_neg_mode(struct phylink *pl, struct phylink_pcs *pcs, phy_interface_t interface, const unsigned long *advertising)

helper to determine PCS inband mode

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct phylink_pcs *pcs

a pointer to struct phylink_pcs

phy_interface_t interface

interface mode to be used

const unsigned long *advertising

advertisement ethtool link mode mask

Description

Determines the negotiation mode to be used by the PCS, and returns one of:

  • PHYLINK_PCS_NEG_NONE: interface mode does not support inband

  • PHYLINK_PCS_NEG_OUTBAND: an out of band mode (e.g. reading the PHY) will be used.

  • PHYLINK_PCS_NEG_INBAND_DISABLED: inband mode selected but autoneg disabled

  • PHYLINK_PCS_NEG_INBAND_ENABLED: inband mode selected and autoneg enabled

Note

this is for cases where the PCS itself is involved in negotiation (e.g. Clause 37, SGMII and similar) not Clause 73.
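The four outcomes listed above can be summarised as a small decision function. This is a deliberately simplified model: the three boolean inputs and the DEMO_NEG_* names are hypothetical, and the real helper derives its answer from the interface mode, the PCS and the advertising mask rather than from pre-digested flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the four documented outcomes; names are illustrative. */
enum demo_neg_mode {
	DEMO_NEG_NONE,
	DEMO_NEG_OUTBAND,
	DEMO_NEG_INBAND_DISABLED,
	DEMO_NEG_INBAND_ENABLED,
};

/*
 * iface_has_inband: the interface mode supports inband signalling
 * inband_selected:  inband (rather than out of band) mode was chosen
 * autoneg:          autonegotiation is enabled in the advertisement
 */
static enum demo_neg_mode demo_neg_mode(bool iface_has_inband,
					bool inband_selected, bool autoneg)
{
	if (!iface_has_inband)
		return DEMO_NEG_NONE;
	if (!inband_selected)
		return DEMO_NEG_OUTBAND;

	return autoneg ? DEMO_NEG_INBAND_ENABLED : DEMO_NEG_INBAND_DISABLED;
}
```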

int phylink_set_fixed_link(struct phylink *pl, const struct phylink_link_state *state)

set the fixed link

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

const struct phylink_link_state *state

a pointer to a struct phylink_link_state.

Description

This function is used when the link parameters are known and do not change, making it suitable for certain types of network connections.

Return

zero on success or negative error code.

struct phylink *phylink_create(struct phylink_config *config, const struct fwnode_handle *fwnode, phy_interface_t iface, const struct phylink_mac_ops *mac_ops)

create a phylink instance

Parameters

struct phylink_config *config

a pointer to the target struct phylink_config

const struct fwnode_handle *fwnode

a pointer to a struct fwnode_handle describing the network interface

phy_interface_t iface

the desired link mode defined by typedef phy_interface_t

const struct phylink_mac_ops *mac_ops

a pointer to a struct phylink_mac_ops for the MAC.

Description

Create a new phylink instance, and parse the link parameters found in fwnode. This will parse in-band modes, fixed-link or SFP configuration.

Note

the rtnl lock must not be held when calling this function.

Returns a pointer to a struct phylink, or an error-pointer value. Users must use IS_ERR() to check for errors from this function.
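Because failure is reported through an error-pointer value rather than NULL, a plain NULL check would miss it. The sketch below is a userspace model of the error-pointer convention, written only to illustrate the caller pattern. The helpers model_err_ptr(), model_ptr_err(), model_is_err(), model_create() and model_probe() are hypothetical names; they imitate, but are not, the kernel's ERR_PTR(), PTR_ERR() and IS_ERR().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace models of the kernel's error-pointer helpers: a small
 * negative errno is stored in the pointer value itself. */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical stand-in for a constructor that returns an error pointer. */
static void *model_create(bool fail)
{
	static int instance;

	return fail ? model_err_ptr(-ENOMEM) : (void *)&instance;
}

/* Caller pattern: test with the IS_ERR()-style check, not against NULL. */
static int model_probe(bool fail)
{
	void *pl = model_create(fail);

	if (model_is_err(pl))
		return (int)model_ptr_err(pl);
	return 0;
}
```

In driver code the same shape applies with the real helpers: check the returned pointer with IS_ERR() and propagate PTR_ERR() on failure.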

void phylink_destroy(struct phylink *pl)

cleanup and destroy the phylink instance

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Destroy a phylink instance. Any PHY that has been attached must have been cleaned up via phylink_disconnect_phy() prior to calling this function.

Note

the rtnl lock must not be held when calling this function.

bool phylink_expects_phy(struct phylink *pl)

Determine if phylink expects a phy to be attached

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

When using fixed-link mode, or in-band mode with 1000base-X or 2500base-X, no PHY is needed.

Returns true if phylink will be expecting a PHY.

int phylink_connect_phy(struct phylink *pl, struct phy_device *phy)

connect a PHY to the phylink instance

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct phy_device *phy

a pointer to a struct phy_device.

Description

Connect phy to the phylink instance specified by pl by calling phy_attach_direct(). Configure the phy according to the MAC driver’s capabilities, start the PHYLIB state machine and enable any interrupts that the PHY supports.

This updates the phylink’s ethtool supported and advertising link mode masks.

Returns 0 on success or a negative errno.

int phylink_of_phy_connect(struct phylink *pl, struct device_node *dn, u32 flags)

connect the PHY specified in the DT node.

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct device_node *dn

a pointer to a struct device_node.

u32 flags

PHY-specific flags to communicate to the PHY device driver

Description

Connect the phy specified in the device node dn to the phylink instance specified by pl. Actions specified in phylink_connect_phy() will be performed.

Returns 0 on success or a negative errno.

int phylink_fwnode_phy_connect(struct phylink *pl, const struct fwnode_handle *fwnode, u32 flags)

connect the PHY specified in the fwnode.

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

const struct fwnode_handle *fwnode

a pointer to a struct fwnode_handle.

u32 flags

PHY-specific flags to communicate to the PHY device driver

Description

Connect the phy specified by fwnode to the phylink instance specified by pl.

Returns 0 on success or a negative errno.

void phylink_disconnect_phy(struct phylink *pl)

disconnect any PHY attached to the phylink instance.

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Disconnect any current PHY from the phylink instance described by pl.

void phylink_mac_change(struct phylink *pl, bool up)

notify phylink of a change in MAC state

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

bool up

indicates whether the link is currently up.

Description

The MAC driver should call this function when the state of its link changes (e.g. link failure, new negotiation results, etc.)

void phylink_pcs_change(struct phylink_pcs *pcs, bool up)

notify phylink of a change to PCS link state

Parameters

struct phylink_pcs *pcs

pointer to struct phylink_pcs

bool up

indicates whether the link is currently up.

Description

The PCS driver should call this when the state of its link changes (e.g. link failure, new negotiation results, etc.) Note: it should not determine “up” by reading the BMSR. If in doubt about the link state at interrupt time, then pass true if pcs_get_state() returns the latched link-down state, otherwise pass false.

void phylink_start(struct phylink *pl)

start a phylink instance

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Start the phylink instance specified by pl, configuring the MAC for the desired link mode(s) and negotiation style. This should be called from the network device driver’s struct net_device_ops ndo_open() method.

void phylink_stop(struct phylink *pl)

stop a phylink instance

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Stop the phylink instance specified by pl. This should be called from the network device driver’s struct net_device_ops ndo_stop() method. The network device’s carrier state should not be changed prior to calling this function.

This will synchronously bring down the link if the link is not already down (in other words, it will trigger a mac_link_down() method call).

void phylink_rx_clk_stop_block(struct phylink *pl)

block PHY ability to stop receive clock in LPI

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Disable the PHY’s ability to stop the receive clock while the receive path is in EEE LPI state, until the number of calls to phylink_rx_clk_stop_block() are balanced by calls to phylink_rx_clk_stop_unblock().

void phylink_rx_clk_stop_unblock(struct phylink *pl)

unblock PHY ability to stop receive clock

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

All calls to phylink_rx_clk_stop_block() must be balanced with a corresponding call to phylink_rx_clk_stop_unblock() to restore the PHY’s ability to stop the receive clock when the receive path is in EEE LPI mode.
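The balancing requirement behaves like a nesting counter. The following userspace sketch models that idea under the assumption that a simple count is sufficient; struct rx_clk_model and its helpers are hypothetical names for this example and do not reproduce the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical model: a nesting counter guards the PHY's permission to
 * stop its receive clock. */
struct rx_clk_model {
	unsigned int block_count;
};

static void rx_clk_model_block(struct rx_clk_model *m)
{
	m->block_count++;
}

static void rx_clk_model_unblock(struct rx_clk_model *m)
{
	if (m->block_count)	/* this model ignores an unbalanced unblock */
		m->block_count--;
}

/* Clock stop is permitted again only once every block has been undone. */
static bool rx_clk_model_may_stop(const struct rx_clk_model *m)
{
	return m->block_count == 0;
}
```

Two nested block calls therefore need two unblock calls before the model reports that the receive clock may be stopped again.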

void phylink_suspend(struct phylink *pl, bool mac_wol)

handle a network device suspend event

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

bool mac_wol

true if the MAC needs to receive packets for Wake-on-Lan

Description

Handle a network device suspend event. There are several cases:

  • If Wake-on-Lan is not active, we can bring down the link between the MAC and PHY by calling phylink_stop().

  • If Wake-on-Lan is active, and being handled only by the PHY, we can also bring down the link between the MAC and PHY.

  • If Wake-on-Lan is active, but being handled by the MAC, the MAC still needs to receive packets, so we cannot bring the link down.

Note

when phylink managed Wake-on-Lan is in use (that is, when struct phylink_mac_ops.mac_set_wol is populated), mac_wol is ignored.
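The three suspend cases above reduce to a small decision table. The sketch below models it in userspace; suspend_model_may_stop_link() and its two boolean inputs are hypothetical and exist only to make the cases explicit. The real function takes only mac_wol and determines the rest internally.

```c
#include <stdbool.h>

/* Hypothetical model of the three cases: returns true when the link
 * between the MAC and PHY may be brought down on suspend. */
static bool suspend_model_may_stop_link(bool wol_active, bool mac_handles_wol)
{
	if (!wol_active)
		return true;	/* no Wake-on-Lan: the link can go down */
	if (!mac_handles_wol)
		return true;	/* the PHY handles Wake-on-Lan by itself */
	return false;		/* the MAC must keep receiving packets */
}
```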

void phylink_prepare_resume(struct phylink *pl)

prepare to resume a network device

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Optional, but if called must be called prior to phylink_resume().

Prepare to resume a network device, preparing the PHY as necessary.

void phylink_resume(struct phylink *pl)

handle a network device resume event

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Undo the effects of phylink_suspend(), returning the link to an operational state.

void phylink_ethtool_get_wol(struct phylink *pl, struct ethtool_wolinfo *wol)

get the wake on lan parameters for the PHY

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ethtool_wolinfo *wol

a pointer to struct ethtool_wolinfo to hold the read parameters

Description

Read the wake on lan parameters from the PHY attached to the phylink instance specified by pl. If no PHY is currently attached, report no support for wake on lan.

int phylink_ethtool_set_wol(struct phylink *pl, struct ethtool_wolinfo *wol)

set wake on lan parameters

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ethtool_wolinfo *wol

a pointer to struct ethtool_wolinfo for the desired parameters

Description

Set the wake on lan parameters for the PHY attached to the phylink instance specified by pl. If no PHY is attached, returns an EOPNOTSUPP error.

Returns zero on success or negative errno code.

int phylink_ethtool_ksettings_get(struct phylink *pl, struct ethtool_link_ksettings *kset)

get the current link settings

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ethtool_link_ksettings *kset

a pointer to a struct ethtool_link_ksettings to hold link settings

Description

Read the current link settings for the phylink instance specified by pl. This will be the link settings read from the MAC, PHY or fixed link settings depending on the current negotiation mode.

int phylink_ethtool_ksettings_set(struct phylink *pl, const struct ethtool_link_ksettings *kset)

set the link settings

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

const struct ethtool_link_ksettings *kset

a pointer to a struct ethtool_link_ksettings for the desired modes

int phylink_ethtool_nway_reset(struct phylink *pl)

restart negotiation

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

Restart negotiation for the phylink instance specified by pl. This will cause any attached phy to restart negotiation with the link partner, and if the MAC is in a BaseX mode, the MAC will also be requested to restart negotiation.

Returns zero on success, or negative error code.

void phylink_ethtool_get_pauseparam(struct phylink *pl, struct ethtool_pauseparam *pause)

get the current pause parameters

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ethtool_pauseparam *pause

a pointer to a struct ethtool_pauseparam

int phylink_ethtool_set_pauseparam(struct phylink *pl, struct ethtool_pauseparam *pause)

set the current pause parameters

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ethtool_pauseparam *pause

a pointer to a struct ethtool_pauseparam

int phylink_get_eee_err(struct phylink *pl)

read the energy efficient ethernet error counter

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create().

Description

Read the Energy Efficient Ethernet error counter from the PHY associated with the phylink instance specified by pl.

Returns positive error counter value, or negative error code.

int phylink_ethtool_get_eee(struct phylink *pl, struct ethtool_keee *eee)

read the energy efficient ethernet parameters

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ethtool_keee *eee

a pointer to a struct ethtool_keee for the read parameters

int phylink_ethtool_set_eee(struct phylink *pl, struct ethtool_keee *eee)

set the energy efficient ethernet parameters

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ethtool_keee *eee

a pointer to a struct ethtool_keee for the desired parameters

int phylink_mii_ioctl(struct phylink *pl, struct ifreq *ifr, int cmd)

generic mii ioctl interface

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

struct ifreq *ifr

a pointer to a struct ifreq for socket ioctls

int cmd

ioctl cmd to execute

Description

Perform the specified MII ioctl on the PHY attached to the phylink instance specified by pl. If no PHY is attached, emulate the presence of the PHY.

SIOCGMIIPHY:

read register from the current PHY.

SIOCGMIIREG:

read register from the specified PHY.

SIOCSMIIREG:

set a register on the specified PHY.

Return

zero on success or negative error code.

int phylink_speed_down(struct phylink *pl, bool sync)

set the non-SFP PHY to lowest speed supported by both link partners

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

bool sync

perform action synchronously

Description

If we have a PHY that is not part of a SFP module, then set the speed as described in the phy_speed_down() function. Please see this function for a description of the sync parameter.

Returns zero if there is no PHY, otherwise as per phy_speed_down().

int phylink_speed_up(struct phylink *pl)

restore the advertised speeds prior to the call to phylink_speed_down()

Parameters

struct phylink *pl

a pointer to a struct phylink returned from phylink_create()

Description

If we have a PHY that is not part of a SFP module, then restore the PHY speeds as per phy_speed_up().

Returns zero if there is no PHY, otherwise as per phy_speed_up().

void phylink_decode_usxgmii_word(struct phylink_link_state *state, uint16_t lpa)

decode the USXGMII word from a MAC PCS

Parameters

struct phylink_link_state *state

a pointer to a struct phylink_link_state.

uint16_t lpa

a 16 bit value which stores the USXGMII auto-negotiation word

Description

Helper for MAC PCS supporting the USXGMII protocol and the auto-negotiation code word. Decode the USXGMII code word and populate the corresponding fields (speed, duplex) into the phylink_link_state structure.
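To make the decoding step concrete, here is a userspace sketch of extracting speed and duplex from a 16 bit USXGMII word. The bit layout (speed field in bits 9 to 11, full duplex in bit 12) is an assumption recalled from the kernel's MDIO_USXGMII_* definitions and should be checked against your tree. The USX_* macros, struct usx_state_model and usx_model_decode() are hypothetical names for this example, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout assumed from the kernel's MDIO_USXGMII_* definitions;
 * verify these values against your kernel tree before relying on them. */
#define USX_SPD_MASK	0x0e00
#define USX_SPD_10	0x0000
#define USX_SPD_100	0x0200
#define USX_SPD_1000	0x0400
#define USX_SPD_10G	0x0600
#define USX_SPD_2500	0x0800
#define USX_SPD_5000	0x0a00
#define USX_FULL_DUPLEX	0x1000

struct usx_state_model {
	int speed;		/* in Mbps, 0 if not decoded */
	bool full_duplex;
	bool valid;		/* false for an unrecognised speed code */
};

static struct usx_state_model usx_model_decode(uint16_t lpa)
{
	struct usx_state_model st = { 0, false, true };

	switch (lpa & USX_SPD_MASK) {
	case USX_SPD_10:	st.speed = 10; break;
	case USX_SPD_100:	st.speed = 100; break;
	case USX_SPD_1000:	st.speed = 1000; break;
	case USX_SPD_2500:	st.speed = 2500; break;
	case USX_SPD_5000:	st.speed = 5000; break;
	case USX_SPD_10G:	st.speed = 10000; break;
	default:
		st.valid = false;	/* unrecognised speed encoding */
		return st;
	}

	st.full_duplex = (lpa & USX_FULL_DUPLEX) != 0;
	return st;
}
```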

void phylink_decode_usgmii_word(struct phylink_link_state *state, uint16_t lpa)

decode the USGMII word from a MAC PCS

Parameters

struct phylink_link_state *state

a pointer to a struct phylink_link_state.

uint16_t lpa

a 16 bit value which stores the USGMII auto-negotiation word

Description

Helper for MAC PCS supporting the USGMII protocol and the auto-negotiation code word. Decode the USGMII code word and populate the corresponding fields (speed, duplex) into the phylink_link_state structure. The structure for this word is the same as the USXGMII word, except it only supports speeds up to 1Gbps.

void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state, unsigned int neg_mode, u16 bmsr, u16 lpa)

Decode MAC PCS state from MII registers

Parameters

struct phylink_link_state *state

a pointer to a struct phylink_link_state.

unsigned int neg_mode

link negotiation mode (PHYLINK_PCS_NEG_xxx)

u16 bmsr

The value of the MII_BMSR register

u16 lpa

The value of the MII_LPA register

Description

Helper for MAC PCS supporting the 802.3 clause 22 register set for clause 37 negotiation and/or SGMII control.

Parse the Clause 37 or Cisco SGMII link partner negotiation word into the phylink state structure. This is suitable to be used for implementing the pcs_get_state() member of the struct phylink_pcs_ops structure if accessing bmsr and lpa cannot be done with MDIO directly.

void phylink_mii_c22_pcs_get_state(struct mdio_device *pcs, unsigned int neg_mode, struct phylink_link_state *state)

read the MAC PCS state

Parameters

struct mdio_device *pcs

a pointer to a struct mdio_device.

unsigned int neg_mode

link negotiation mode (PHYLINK_PCS_NEG_xxx)

struct phylink_link_state *state

a pointer to a struct phylink_link_state.

Description

Helper for MAC PCS supporting the 802.3 clause 22 register set for clause 37 negotiation and/or SGMII control.

Read the MAC PCS state from the MII device specified by pcs and parse the Clause 37 or Cisco SGMII link partner negotiation word into the phylink state structure. This is suitable to be directly plugged into the pcs_get_state() member of the struct phylink_pcs_ops structure.

int phylink_mii_c22_pcs_encode_advertisement(phy_interface_t interface, const unsigned long *advertising)

configure the clause 37 PCS advertisement

Parameters

phy_interface_t interface

the PHY interface mode being configured

const unsigned long *advertising

the ethtool advertisement mask

Description

Helper for MAC PCS supporting the 802.3 clause 22 register set for clause 37 negotiation and/or SGMII control.

Encode the clause 37 PCS advertisement as specified by interface and advertising.

Return

The new value for adv, or -EINVAL if it should not be changed.

int phylink_mii_c22_pcs_config(struct mdio_device *pcs, phy_interface_t interface, const unsigned long *advertising, unsigned int neg_mode)

configure clause 22 PCS

Parameters

struct mdio_device *pcs

a pointer to a struct mdio_device.

phy_interface_t interface

the PHY interface mode being configured

const unsigned long *advertising

the ethtool advertisement mask

unsigned int neg_mode

PCS negotiation mode

Description

Configure a Clause 22 PCS PHY with the appropriate negotiation parameters for the neg_mode, interface and advertising parameters. Returns negative error number on failure, zero if the advertisement has not changed, or positive if there is a change.
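The three-way return value (negative, zero, positive) needs slightly more care than the usual zero-or-error convention. The sketch below shows one way a caller might separate the cases. The names struct pcs_config_outcome and pcs_config_model_handle() are hypothetical, and treating a changed advertisement as a reason to restart negotiation is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical caller-side handling of the three-way return value:
 * negative = error, zero = advertisement unchanged, positive = changed. */
struct pcs_config_outcome {
	int err;		/* 0 or a negative error number */
	bool restart_an;	/* true when the advertisement changed */
};

static struct pcs_config_outcome pcs_config_model_handle(int ret)
{
	struct pcs_config_outcome out = { 0, false };

	if (ret < 0)
		out.err = ret;		/* propagate the failure */
	else if (ret > 0)
		out.restart_an = true;	/* assumption: renegotiate on change */

	return out;
}
```

Note that a positive return must not be passed on as an error code; only negative values indicate failure.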

void phylink_mii_c22_pcs_an_restart(struct mdio_device *pcs)

restart 802.3z autonegotiation

Parameters

struct mdio_device *pcs

a pointer to a struct mdio_device.

Description

Helper for MAC PCS supporting the 802.3 clause 22 register set for clause 37 negotiation.

Restart the clause 37 negotiation with the link partner. This is suitable to be directly plugged into the pcs_an_restart() member of the struct phylink_pcs_ops structure.

SFP support

struct sfp_bus

internal representation of a sfp bus

Definition:

struct sfp_bus {};

Members

struct sfp_eeprom_id

raw SFP module identification information

Definition:

struct sfp_eeprom_id {
    struct sfp_eeprom_base base;
    struct sfp_eeprom_ext ext;
};

Members

base

base SFP module identification structure

ext

extended SFP module identification structure

Description

See the SFF-8472 specification and related documents for the definition of these structure members. This can be obtained from https://www.snia.org/technology-communities/sff/specifications

struct sfp_module_caps

sfp module capabilities

Definition:

struct sfp_module_caps {
    unsigned long interfaces[BITS_TO_LONGS(PHY_INTERFACE_MODE_MAX)];
    unsigned long link_modes[BITS_TO_LONGS(__ETHTOOL_LINK_MODE_MASK_NBITS)];
    bool may_have_phy;
    u8 port;
};

Members

interfaces

bitmap of interfaces that the module may support

link_modes

bitmap of ethtool link modes that the module may support

may_have_phy

indicate whether the module may have an ethernet PHY. There is no way to be sure that a module has a PHY as the EEPROM doesn’t contain this information. When set, this does not mean that the module definitely has a PHY.

port

one of ethtool PORT_* definitions, parsed from the module EEPROM, or PORT_OTHER if the port type is not known.
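Both bitmap members are arrays of unsigned long sized with BITS_TO_LONGS(). The userspace sketch below models how such a bitmap is sized, set and tested. MODEL_BITS_TO_LONGS, MODEL_NBITS, struct caps_model and the two helpers are hypothetical stand-ins; kernel code would use the real bitmap helpers instead.

```c
#include <limits.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold n bits, rounded up. */
#define MODEL_BITS_TO_LONGS(n) \
	(((n) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

#define MODEL_NBITS 100	/* hypothetical number of link modes */

struct caps_model {
	unsigned long link_modes[MODEL_BITS_TO_LONGS(MODEL_NBITS)];
};

/* Mark one mode as possibly supported. */
static void caps_model_set(struct caps_model *c, unsigned int bit)
{
	c->link_modes[bit / MODEL_BITS_PER_LONG] |=
		1UL << (bit % MODEL_BITS_PER_LONG);
}

/* Query whether a mode is marked as possibly supported. */
static bool caps_model_test(const struct caps_model *c, unsigned int bit)
{
	return (c->link_modes[bit / MODEL_BITS_PER_LONG] >>
		(bit % MODEL_BITS_PER_LONG)) & 1UL;
}
```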

struct sfp_upstream_ops

upstream operations structure

Definition:

struct sfp_upstream_ops {
    void (*attach)(void *priv, struct sfp_bus *bus);
    void (*detach)(void *priv, struct sfp_bus *bus);
    int (*module_insert)(void *priv, const struct sfp_eeprom_id *id);
    void (*module_remove)(void *priv);
    int (*module_start)(void *priv);
    void (*module_stop)(void *priv);
    void (*link_down)(void *priv);
    void (*link_up)(void *priv);
    int (*connect_phy)(void *priv, struct phy_device *);
    void (*disconnect_phy)(void *priv, struct phy_device *);
};

Members

attach

called when the sfp socket driver is bound to the upstream (mandatory).

detach

called when the sfp socket driver is unbound from the upstream (mandatory).

module_insert

called after a module has been detected to determine whether the module is supported for the upstream device.

module_remove

called after the module has been removed.

module_start

called after the PHY probe step

module_stop

called before the PHY is removed

link_down

called when the link is non-operational for whatever reason.

link_up

called when the link is operational.

connect_phy

called when an I2C accessible PHY has been detected on the module.

disconnect_phy

called when a module with an I2C accessible PHY has been removed.

phy_interface_t sfp_select_interface(struct sfp_bus *bus, const unsigned long *link_modes)

Select appropriate phy_interface_t mode

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

const unsigned long *link_modes

ethtool link modes mask

Description

Derive the phy_interface_t mode for the SFP module from the link modes mask.

void sfp_bus_put(struct sfp_bus *bus)

put a reference on the struct sfp_bus

Parameters

struct sfp_bus *bus

the struct sfp_bus found via sfp_bus_find_fwnode()

Description

Put a reference on the struct sfp_bus and free the underlying structure if this was the last reference.
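The "free on last reference" behaviour can be pictured with a minimal reference-count model. This is a userspace sketch only; struct bus_model, bus_model_get() and bus_model_put() are hypothetical, and the kernel uses its own reference counting helpers, which are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct bus_model {
	unsigned int refs;
	bool freed;	/* stands in for the structure being released */
};

static void bus_model_get(struct bus_model *b)
{
	b->refs++;
}

/* Drop one reference; release the object when the last one goes. */
static void bus_model_put(struct bus_model *b)
{
	if (b->refs && --b->refs == 0)
		b->freed = true;
}
```

With two references held, the first put leaves the object alive and only the second one releases it.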

int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo)

Get the ethtool_modinfo for a SFP module

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

struct ethtool_modinfo *modinfo

a struct ethtool_modinfo

Description

Fill in the type and eeprom_len parameters in modinfo for a module on the sfp bus specified by bus.

Returns 0 on success or a negative errno number.

int sfp_get_module_eeprom(struct sfp_bus *bus, struct ethtool_eeprom *ee, u8 *data)

Read the SFP module EEPROM

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

struct ethtool_eeprom *ee

a struct ethtool_eeprom

u8 *data

buffer to contain the EEPROM data (must be at least ee->len bytes)

Description

Read the EEPROM as specified by the supplied ee. See the documentation for struct ethtool_eeprom for the region to be read.

Returns 0 on success or a negative errno number.

int sfp_get_module_eeprom_by_page(struct sfp_bus *bus, const struct ethtool_module_eeprom *page, struct netlink_ext_ack *extack)

Read a page from the SFP module EEPROM

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

const struct ethtool_module_eeprom *page

a struct ethtool_module_eeprom

struct netlink_ext_ack *extack

extack for reporting problems

Description

Read an EEPROM page as specified by the supplied page. See the documentation for struct ethtool_module_eeprom for the page to be read.

Returns 0 on success or a negative errno number. More error information might be provided via extack.

void sfp_upstream_start(struct sfp_bus *bus)

Inform the SFP that the network device is up

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

Description

Inform the SFP socket that the network device is now up, so that the module can be enabled by allowing TX_DISABLE to be deasserted. This should be called from the network device driver’s struct net_device_ops ndo_open() method.

void sfp_upstream_stop(struct sfp_bus *bus)

Inform the SFP that the network device is down

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

Description

Inform the SFP socket that the network device is now down, so that the module can be disabled by asserting TX_DISABLE, disabling the laser in optical modules. This should be called from the network device driver’s struct net_device_ops ndo_stop() method.

void sfp_upstream_set_signal_rate(struct sfp_bus *bus, unsigned int rate_kbd)

set data signalling rate

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

unsigned int rate_kbd

signalling rate in units of 1000 baud

Description

Configure the rate select settings on the SFP module for the signalling rate (not the same as the data rate).

Locks that may be held:

  • Phylink’s state_mutex

  • rtnl lock

  • SFP’s sm_mutex
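The distinction between signalling rate and data rate comes from line coding: the wire carries more symbols than payload bits. As a worked example, 1000BASE-X uses 8b/10b coding and 10GBASE-R uses 64b/66b coding. The helper model_rate_kbd() below is a hypothetical userspace function for illustration, not part of the SFP API.

```c
/* Signalling rate in kBd for a data rate in Mb/s and a line code that
 * sends coded_bits on the wire for every payload_bits of data. */
static unsigned int model_rate_kbd(unsigned int data_mbps,
				   unsigned int coded_bits,
				   unsigned int payload_bits)
{
	/* Widen before multiplying so the intermediate cannot overflow. */
	return (unsigned int)((unsigned long long)data_mbps * 1000ULL *
			      coded_bits / payload_bits);
}
```

For 1000 Mb/s with 8b/10b coding this gives 1000 * 1000 * 10 / 8 = 1250000 kBd, and for 10000 Mb/s with 64b/66b coding it gives 10312500 kBd.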

struct sfp_bus *sfp_bus_find_fwnode(const struct fwnode_handle *fwnode)

parse and locate the SFP bus from fwnode

Parameters

const struct fwnode_handle *fwnode

firmware node for the parent device (MAC or PHY)

Description

Parse the parent device’s firmware node for a SFP bus, and locate the sfp_bus structure, incrementing its reference count. This must be put via sfp_bus_put() when done.

Return

  • on success, a pointer to the sfp_bus structure,

  • NULL if no SFP is specified,

  • on failure, an error pointer value:

      • corresponding to the errors detailed for fwnode_property_get_reference_args().

      • -ENOMEM if we failed to allocate the bus.

      • an error from the upstream’s connect_phy() method.
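The return value of sfp_bus_find_fwnode() therefore has three distinct outcomes, and NULL is not an error. The userspace sketch below classifies a returned pointer accordingly. model_is_err_ptr() imitates the range test behind the kernel's IS_ERR(); it and classify_bus_model() are hypothetical helpers written for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum find_result_model { FIND_OK, FIND_NO_SFP, FIND_ERROR };

/* Userspace model of the range test performed by the kernel's IS_ERR():
 * the top 4095 values of the address space encode negative errnos. */
static int model_is_err_ptr(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static enum find_result_model classify_bus_model(const void *bus)
{
	if (model_is_err_ptr(bus))
		return FIND_ERROR;	/* real code would propagate PTR_ERR(bus) */
	if (!bus)
		return FIND_NO_SFP;	/* not an error: there is nothing to attach */
	return FIND_OK;			/* the reference must be put when done */
}
```

Checking for the error pointer first keeps the NULL case from being mistaken for a failure.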

int sfp_bus_add_upstream(struct sfp_bus *bus, void *upstream, const struct sfp_upstream_ops *ops)

parse and register the neighbouring device

Parameters

struct sfp_bus *bus

the struct sfp_bus found via sfp_bus_find_fwnode()

void *upstream

the upstream private data

const struct sfp_upstream_ops *ops

the upstream’s struct sfp_upstream_ops

Description

Add upstream driver for the SFP bus, and if the bus is complete, register the SFP bus using sfp_register_upstream(). This takes a reference on the bus, so it is safe to put the bus after this call.

Return

  • on success, zero (this includes the case where bus is NULL because no SFP is specified),

  • on failure, a negative error code:

      • corresponding to the errors detailed for fwnode_property_get_reference_args().

      • -ENOMEM if we failed to allocate the bus.

      • an error from the upstream’s connect_phy() method.

void sfp_bus_del_upstream(struct sfp_bus *bus)

Delete a sfp bus

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

Description

Delete a previously registered upstream connection for the SFP module. bus should have been added by sfp_bus_add_upstream().

const char *sfp_get_name(struct sfp_bus *bus)

Get the SFP device name

Parameters

struct sfp_bus *bus

a pointer to the struct sfp_bus structure for the sfp module

Description

Gets the SFP device’s name, if bus has a registered socket. Callers must hold RTNL, and the returned name is only valid until RTNL is released.

Return

  • The name of the SFP device registered with sfp_register_socket()

  • NULL if no device was registered on bus