‘NN sparsity’ directory


See Also

Links

“Convolutional Differentiable Logic Gate Networks ”, Petersen et al 2024

Convolutional Differentiable Logic Gate Networks

“LoRA vs Full Fine-Tuning: An Illusion of Equivalence ”, Shuttleworth et al 2024

LoRA vs Full Fine-tuning: An Illusion of Equivalence

“On the Complexity of Neural Computation in Superposition ”, Adler & Shavit 2024

On the Complexity of Neural Computation in Superposition

“GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music ”

GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music

“High-Performance Deep Spiking Neural Networks With 0.3 Spikes per Neuron ”, Stanojevic et al 2024

High-performance deep spiking neural networks with 0.3 spikes per neuron

“LoRA Learns Less and Forgets Less ”, Biderman et al 2024

LoRA Learns Less and Forgets Less

“CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models ”, Lee et al 2024

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

“Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? ”, Jin et al 2024

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

“ReFT: Representation Finetuning for Language Models ”, Wu et al 2024

ReFT: Representation Finetuning for Language Models

“Mechanistic Design and Scaling of Hybrid Architectures ”, Poli et al 2024

Mechanistic Design and Scaling of Hybrid Architectures

“LTE: Training Neural Networks from Scratch With Parallel Low-Rank Adapters ”, Huh et al 2024

LTE: Training Neural Networks from Scratch with Parallel Low-Rank Adapters

“Scaling Laws for Fine-Grained Mixture of Experts ”, Krajewski et al 2024

Scaling Laws for Fine-Grained Mixture of Experts

“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet ”

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

“Exponentially Faster Language Modeling ”, Belcak & Wattenhofer 2023

Exponentially Faster Language Modeling

“DiLoCo: Distributed Low-Communication Training of Language Models ”, Douillard et al 2023

DiLoCo: Distributed Low-Communication Training of Language Models

“Language Models Are Super Mario (DARE): Absorbing Abilities from Homologous Models As a Free Lunch ”, Yu et al 2023

Language Models are Super Mario (DARE): Absorbing Abilities from Homologous Models as a Free Lunch

“ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models ”, Luo et al 2023

ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models

“An Exact Mapping from ReLU Networks to Spiking Neural Networks ”, Stanojevic et al 2023

An exact mapping from ReLU networks to spiking neural networks

“The Impact of Depth and Width on Transformer Language Model Generalization ”, Petty et al 2023

The Impact of Depth and Width on Transformer Language Model Generalization

“Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time ”, Liu et al 2023

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

“Fast Feedforward Networks ”, Belcak & Wattenhofer 2023

Fast Feedforward Networks

“Any Deep ReLU Network Is Shallow ”, Villani & Schoots 2023

Any Deep ReLU Network is Shallow

“Cerebras Architecture Deep Dive: First Look Inside the HW/SW Co-Design for Deep Learning [Updated] ”, Lie 2023

Cerebras Architecture Deep Dive: First Look Inside the HW/SW Co-Design for Deep Learning [Updated]

“JaxPruner: A Concise Library for Sparsity Research ”, Lee et al 2023

JaxPruner: A concise library for sparsity research

“Reusing Deep Neural Network Models through Model Re-Engineering ”, Qi et al 2023

Reusing Deep Neural Network Models through Model Re-engineering

“Accelerating Large GPT Training With Sparse Pre-Training and Dense Fine-Tuning ”, Thangarasa 2023

Accelerating Large GPT Training with Sparse Pre-Training and Dense Fine-Tuning

“MUX-PLMs: Pre-Training Language Models With Data Multiplexing ”, Murahari et al 2023

MUX-PLMs: Pre-training Language Models with Data Multiplexing

“DataMUX: Data Multiplexing for Neural Networks ”, Murahari et al 2023

DataMUX: Data Multiplexing for Neural Networks

“Deep Differentiable Logic Gate Networks ”, Petersen et al 2022

Deep Differentiable Logic Gate Networks

“The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers ”, Li et al 2022

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

“Neural Net Sparsity ”, Gwern 2022

Neural Net Sparsity

“Noise Transforms Feed-Forward Networks into Sparse Coding Networks ”, Anonymous 2022

Noise Transforms Feed-Forward Networks into Sparse Coding Networks

“Exploring Low Rank Training of Deep Neural Networks ”, Kamalakara et al 2022

Exploring Low Rank Training of Deep Neural Networks

“Monolith: Real Time Recommendation System With Collisionless Embedding Table ”, Liu et al 2022

Monolith: Real Time Recommendation System With Collisionless Embedding Table

“More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK) ”, Liu et al 2022

More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 using Sparsity (SLaK)

“Building Machine Translation Systems for the Next Thousand Languages ”, Bapna et al 2022

Building Machine Translation Systems for the Next Thousand Languages

“Monarch: Expressive Structured Matrices for Efficient and Accurate Training ”, Dao et al 2022

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

“Efficient Language Modeling With Sparse All-MLP ”, Yu et al 2022

Efficient Language Modeling with Sparse All-MLP

“NeuPL: Neural Population Learning ”, Liu et al 2022

NeuPL: Neural Population Learning

“Datamodels: Predicting Predictions from Training Data ”, Ilyas et al 2022

Datamodels: Predicting Predictions from Training Data

“Spiking Neural Networks and Their Applications: A Review ”, Yamazaki et al 2022

Spiking Neural Networks and Their Applications: A Review

“Persia: An Open, Hybrid System Scaling Deep Learning-Based Recommenders up to 100 Trillion Parameters ”, Lian et al 2021

Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters

“EvilModel: Hiding Malware Inside of Neural Network Models ”, Wang et al 2021

EvilModel: Hiding Malware Inside of Neural Network Models

“LoRA: Low-Rank Adaptation of Large Language Models ”, Hu et al 2021

LoRA: Low-Rank Adaptation of Large Language Models

“On the Distribution, Sparsity, and Inference-Time Quantization of Attention Values in Transformers ”, Ji et al 2021

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers

“The Neural Basis of Intelligence in Fine-Grained Cortical Topographies ”, Feilong et al 2021

The neural basis of intelligence in fine-grained cortical topographies

“Clusterability in Neural Networks ”, Filan et al 2021

Clusterability in Neural Networks

“Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks ”, Hoefler et al 2021

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

“Scaling down Deep Learning ”, Greydanus 2020

Scaling down Deep Learning

“Extreme Model Compression for On-Device Natural Language Understanding ”, Sathyendra et al 2020

Extreme Model Compression for On-device Natural Language Understanding

“Training Independent Subnetworks for Robust Prediction ”, Havasi et al 2020

Training independent subnetworks for robust prediction

“EventProp: Event-Based Backpropagation Can Compute Exact Gradients for Spiking Neural Networks ”, Wunderlich & Pehle 2020

EventProp: Event-Based Backpropagation can compute Exact Gradients for Spiking Neural Networks

“On Linear Identifiability of Learned Representations ”, Roeder et al 2020

On Linear Identifiability of Learned Representations

“Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited ”, Maddox et al 2020

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

“Bayesian Deep Learning and a Probabilistic Perspective of Generalization ”, Wilson & Izmailov 2020

Bayesian Deep Learning and a Probabilistic Perspective of Generalization

“Neural Arithmetic Units ”, Madsen & Johansen 2020

Neural Arithmetic Units

“Linear Mode Connectivity and the Lottery Ticket Hypothesis ”, Frankle et al 2019

Linear Mode Connectivity and the Lottery Ticket Hypothesis

“Learning to Seek: Autonomous Source Seeking With Deep Reinforcement Learning Onboard a Nano Drone Microcontroller ”, Duisterhof et al 2019

Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

“Does Learning Require Memorization? A Short Tale about a Long Tail ”, Feldman 2019

Does Learning Require Memorization? A Short Tale about a Long Tail

“Weight Agnostic Neural Networks ”, Gaier & Ha 2019

Weight Agnostic Neural Networks

“StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-To-End Universal Style Transfer Networks ”, An et al 2019

StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-to-End Universal Style Transfer Networks

“EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks ”, Tan & Le 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

“Superposition of Many Models into One ”, Cheung et al 2019

Superposition of many models into one

“Playing Atari With Six Neurons ”, Cuccu et al 2018

Playing Atari with Six Neurons

“Measuring the Intrinsic Dimension of Objective Landscapes ”, Li et al 2018

Measuring the Intrinsic Dimension of Objective Landscapes

“SqueezeNext: Hardware-Aware Neural Network Design ”, Gholami et al 2018

SqueezeNext: Hardware-Aware Neural Network Design

“Wide Compression: Tensor Ring Nets ”, Wang et al 2018

Wide Compression: Tensor Ring Nets

“Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing ”, Rosenfeld & Tsotsos 2018

Intriguing Properties of Randomly Weighted Networks: Generalizing while Learning Next to Nothing

“Fix Your Classifier: the Marginal Value of Training the Last Weight Layer ”, Hoffer et al 2018

Fix your classifier: the marginal value of training the last weight layer

“Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition ”, Ye et al 2017

Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

“3D Semantic Segmentation With Submanifold Sparse Convolutional Networks ”, Graham et al 2017

3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

“XUnit: Learning a Spatial Activation Function for Efficient Image Restoration ”, Kligvasser et al 2017

xUnit: Learning a Spatial Activation Function for Efficient Image Restoration

“Natural Language Processing With Small Feed-Forward Networks ”, Botha et al 2017

Natural Language Processing with Small Feed-Forward Networks

“ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices ”, Zhang et al 2017

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

“Submanifold Sparse Convolutional Networks ”, Graham & Maaten 2017

Submanifold Sparse Convolutional Networks

“Shake-Shake Regularization of 3-Branch Residual Networks ”, Gastaldi 2017

Shake-Shake regularization of 3-branch residual networks

“Using the Output Embedding to Improve Language Models ”, Press & Wolf 2016

Using the Output Embedding to Improve Language Models

“Deep Residual Learning for Image Recognition ”, He et al 2015

Deep Residual Learning for Image Recognition

“Tensorizing Neural Networks ”, Novikov et al 2015

Tensorizing Neural Networks

“Eight Pairs of Descending Visual Neurons in the Dragonfly Give Wing Motor Centers Accurate Population Vector of Prey Direction ”, Gonzalez-Bellido et al 2013

Eight pairs of descending visual neurons in the dragonfly give wing motor centers accurate population vector of prey direction

“The Cat Is out of the Bag: Cortical Simulations With 10⁹ Neurons, 10¹³ Synapses ”, Ananthanarayanan et al 2009

The cat is out of the bag: cortical simulations with 10⁹ neurons, 10¹³ synapses

“On the Computational Power of Threshold Circuits With Sparse Activity ”, Uchizawa et al 2006

On the Computational Power of Threshold Circuits with Sparse Activity

“Networks of Spiking Neurons: The Third Generation of Neural Network Models ”, Maass 1997

Networks of spiking neurons: The third generation of neural network models

“Characteristics of Sparsely Encoded Associative Memory ”, Amari 1989

Characteristics of sparsely encoded associative memory

“[2110.08152] Kronecker Decomposition for GPT Compression ”

[2110.08152] Kronecker Decomposition for GPT Compression

“Higher Accuracy on Vision Models With EfficientNet-Lite ”

Higher accuracy on vision models with EfficientNet-Lite

“Something Weird Is Happening With LLMs and Chess ”, Dynomight 2025

Something weird is happening with LLMs and chess

“Delivering Real-Time AI in the Palm of Your Hand ”

Delivering real-time AI in the palm of your hand

“Sparsity-Aware Deep Learning Inference Runtime for CPUs ”

Sparsity-aware deep learning inference runtime for CPUs

“Neuralmagic/sparseml: Libraries for Applying Sparsification Recipes to Neural Networks With a Few Lines of Code, Enabling Faster and Smaller Models ”

neuralmagic/sparseml: Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

“An Estimation of the Absolute Number of Axons Indicates That Human Cortical Areas Are Sparsely Connected ”

An estimation of the absolute number of axons indicates that human cortical areas are sparsely connected

“Creating a 17 KB Style Transfer Model With Layer Pruning and Quantization ”, Toole 2025

Creating a 17 KB style transfer model with layer pruning and quantization

“BERT-Large: Prune Once for DistilBERT Inference Performance ”

BERT-Large: Prune Once for DistilBERT Inference Performance

“Circuits in Superposition: Compressing Many Small Neural Networks into One ”

Circuits in Superposition: Compressing many small neural networks into one

“Measuring the Intrinsic Dimension of Objective Landscapes [Video] ”

Measuring the Intrinsic Dimension of Objective Landscapes [video]

https://www.youtube.com/watch?v=uSZWeRADTFI#uber

Sort By Magic

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
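
As a rough illustration of the greedy embedding walk described here, the sketch below orders annotations by repeatedly hopping to the nearest unvisited neighbor under cosine similarity. It is a minimal, assumed reconstruction, not the site's actual code: the function name, embedding dimension, and random stand-in vectors are all illustrative.

    # Minimal sketch (assumed, not the site's implementation) of greedy
    # nearest-neighbor ordering: start from the newest annotation and
    # repeatedly append the unvisited annotation whose embedding is most
    # similar to the current one, giving a "progression of topics".
    import numpy as np

    def topic_order(embeddings: np.ndarray, start: int = 0) -> list[int]:
        """Greedy nearest-neighbor chain over cosine similarity."""
        # Normalize rows so a plain dot product equals cosine similarity.
        unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        order = [start]
        unvisited = set(range(len(unit))) - {start}
        while unvisited:
            current = unit[order[-1]]
            # Pick the remaining annotation most similar to the current one.
            best = max(unvisited, key=lambda i: float(current @ unit[i]))
            order.append(best)
            unvisited.remove(best)
        return order

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        fake_embeddings = rng.normal(size=(10, 384))  # 10 annotations, 384-dim stand-ins
        print(topic_order(fake_embeddings, start=0))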

sparse-coding

model-generalization

efficient-training

Wikipedia

  1. Autoencoder § Sparse autoencoder (SAE)

Miscellaneous

Bibliography

  1. https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures”, Michael Poli, Armin W. Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

  2. https://arxiv.org/abs/2311.10770: “Exponentially Faster Language Modeling”, Peter Belcak, Roger Wattenhofer

  3. https://www.sciencedirect.com/science/article/pii/S0893608023005051: “An Exact Mapping from ReLU Networks to Spiking Neural Networks”, Ana Stanojevic, Stanisław Woźniak, Guillaume Bellec, Giovanni Cherubini, Angeliki Pantazi, Wulfram Gerstner

  4. https://arxiv.org/abs/2310.17157: “Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time”, Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen

  5. https://arxiv.org/abs/2308.14711: “Fast Feedforward Networks”, Peter Belcak, Roger Wattenhofer

  6. https://arxiv.org/abs/2302.12441: “MUX-PLMs: Pre-Training Language Models With Data Multiplexing”, Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan

  7. https://arxiv.org/abs/2210.06313#google: “The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

  8. https://arxiv.org/abs/2207.03620: “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang

  9. https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”, Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes

  10. https://arxiv.org/abs/2204.00595: “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré

  11. https://arxiv.org/abs/2203.06850: “Efficient Language Modeling With Sparse All-MLP”, Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li

  12. https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel

  13. https://arxiv.org/abs/2106.09685#microsoft: “LoRA: Low-Rank Adaptation of Large Language Models”, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

  14. https://arxiv.org/abs/1905.11946#google: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Mingxing Tan, Quoc V. Le

  15. https://arxiv.org/abs/1803.10615: “SqueezeNext: Hardware-Aware Neural Network Design”, Amir Gholami, Kiseok Kwon, Bichen Wu, Zizheng Tai, Xiangyu Yue, Peter Jin, Sicheng Zhao, Kurt Keutzer
