Daniel Baker dnbaker

I write fast, memory-efficient software for scientific applications.

dnbaker/README.md

Software Engineer at Roche. Previously, I was a Senior Scientist at Pacific Biosciences (PacBio) after earning my PhD at Johns Hopkins University in the department of Computer Science. Before that I was a Bioinformatics Scientist at ARUP Laboratories, where I worked on cell-free circulating tumor DNA (ctDNA) analysis and clinical genomics after my training in Physics [BS] and Biophysics/Computational Biology [MS]. I've worked with biological data (sequence, molecular modeling, metabolomics, transcriptomics, metagenomics), telecommunications data, as well as graph algorithms, machine learning, and numerical optimization.

🔭 I've worked on similarity search, clustering, and indexing for large-scale biological data, as well as SIMD/GPU-accelerated and randomized algorithms. Most recently, I've been developing methods for human genetics, including long RNA-seq, VNTRs, and haplotype phasing.

😄 Pronouns: He/Him/His

A quick tour of my interests

  1. Practical randomized algorithms

This ranges from libraries providing sketch data structures and coresets to projects using random projections and DCI.

My work on coresets and clustering is primarily part of the minicore project, which aims to provide a standard utility for coreset construction and weighted clustering, especially for exponential family models and shortest-paths metrics.
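To give a flavor of what a sketch data structure does, here is a minimal bottom-k MinHash sketch for estimating Jaccard similarity. It is an illustration only and is not taken from the sketch or minicore libraries: the names `mix64`, `bottom_k_sketch`, and `jaccard_estimate` are hypothetical, and the hash is the splitmix64 finalizer used as a stand-in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// splitmix64 finalizer: a bijective 64-bit mixer, used here as a stand-in hash.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical helper: keep the k smallest distinct hash values, sorted ascending.
std::vector<std::uint64_t> bottom_k_sketch(const std::vector<std::uint64_t>& items,
                                           std::size_t k) {
    assert(k > 0);
    std::vector<std::uint64_t> hashes;
    hashes.reserve(items.size());
    for (std::uint64_t v : items) hashes.push_back(mix64(v));
    std::sort(hashes.begin(), hashes.end());
    hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
    if (hashes.size() > k) hashes.resize(k);
    return hashes;
}

// Estimate Jaccard similarity from two bottom-k sketches built with the same k.
double jaccard_estimate(const std::vector<std::uint64_t>& a,
                        const std::vector<std::uint64_t>& b,
                        std::size_t k) {
    assert(k > 0);
    // The k smallest hashes of the union form a sketch of the union.
    std::vector<std::uint64_t> merged;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(merged));
    if (merged.size() > k) merged.resize(k);
    if (merged.empty()) return 0.0;
    // Count union-sketch hashes that appear in both input sketches.
    std::size_t shared = 0;
    for (std::uint64_t h : merged) {
        if (std::binary_search(a.begin(), a.end(), h) &&
            std::binary_search(b.begin(), b.end(), h)) {
            ++shared;
        }
    }
    return static_cast<double>(shared) / static_cast<double>(merged.size());
}
```

When the union of the two sets has at most k elements, the estimate equals the exact Jaccard similarity; for larger sets it is an approximation whose error shrinks as k grows. Production sketches typically add SIMD and more compact register layouts, which this sketch omits.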

  2. Computational Biology

The bonsai project provides methods for metagenomic analysis, along with k-mer encoding/decoding and I/O, while Dashing performs scalable sketching and comparison of sequence data.
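For readers unfamiliar with k-mer encoding, the usual idea is to pack each nucleotide into 2 bits so that a k-mer with k ≤ 32 fits in one 64-bit integer. The following is a minimal sketch under that assumption, not bonsai's actual API: `base_code`, `encode_kmer`, and `decode_kmer` are hypothetical names, and the A=0, C=1, G=2, T=3 mapping is one common convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Map a nucleotide to 2 bits (A=0, C=1, G=2, T=3); -1 marks anything else.
inline int base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Encode a k-mer (k <= 32) into `out`, with the first base in the most
// significant used bits. Returns false if a non-ACGT character is found.
bool encode_kmer(const std::string& kmer, std::uint64_t& out) {
    assert(kmer.size() <= 32);
    std::uint64_t value = 0;
    for (char c : kmer) {
        const int code = base_code(c);
        if (code < 0) return false;  // ambiguous base such as N
        value = (value << 2) | static_cast<std::uint64_t>(code);
    }
    out = value;
    return true;
}

// Decode the lowest 2*k bits of `value` back into an uppercase string.
std::string decode_kmer(std::uint64_t value, std::size_t k) {
    assert(k <= 32);
    std::string out(k, 'A');
    for (std::size_t i = 0; i < k; ++i) {
        out[k - 1 - i] = "ACGT"[value & 3u];  // last base sits in the low bits
        value >>= 2;
    }
    return out;
}
```

With this mapping, `"ACGT"` encodes to the integer 27 (binary `00 01 10 11`), and decoding 27 with k = 4 returns `"ACGT"`. Integer k-mers like this are cheap to hash and compare, which is what makes sketching over them practical.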

BMFtools performs molecular demultiplication over barcoded sequencing data, reducing error rates while eliminating redundant information. Designed for ctDNA, this method can reduce error rates by orders of magnitude, allowing confident detection of very rare events.

scavenger provides Rust implementations, built on tch-rs, of VAEs for count-based data, applied to single-cell transcriptomics.

I also co-developed pbfusion, a fast tool for characterizing transcriptional abnormalities.

  3. General C++

Most of my projects fall into this category, serving as tools I can reuse in various projects.

Some of my favorites:

  • vec provides type-generic abstractions over x86-64 vectorization, making it easy to write fast, portable code.
  • kspp is an RAII-based variant of kstring from klib, with extra niceties that make it easy to append printf-style formatted output.
  • aesctr provides STL-style random number generators built on fast AES-CTR and wyhash.
  • circularqueue provides a range-based circular queue container that uses power-of-two sizes.
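The appeal of power-of-two sizes in a circular queue is that wrapping an index becomes a bitwise AND with `capacity - 1` instead of a modulo. The sketch below illustrates that trick only; it is not the circularqueue library's interface. `RingQueue` is a hypothetical name, and the fixed capacity is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer; capacity must be a power of two.
template <typename T>
class RingQueue {
    std::vector<T> buf_;
    std::size_t mask_;      // capacity - 1, used to wrap indices
    std::size_t head_ = 0;  // count of elements popped so far
    std::size_t tail_ = 0;  // count of elements pushed so far

public:
    explicit RingQueue(std::size_t capacity)
        : buf_(capacity), mask_(capacity - 1) {
        // A power of two has exactly one bit set.
        assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
    }

    std::size_t size() const { return tail_ - head_; }
    bool empty() const { return head_ == tail_; }
    bool full() const { return size() == buf_.size(); }

    // Append a copy of `v`; returns false if the queue is full.
    bool push(const T& v) {
        if (full()) return false;
        buf_[tail_++ & mask_] = v;  // AND with mask replaces `% capacity`
        return true;
    }

    // Remove the oldest element into `out`; returns false if empty.
    bool pop(T& out) {
        if (empty()) return false;
        out = buf_[head_++ & mask_];
        return true;
    }
};
```

Keeping `head_` and `tail_` as ever-increasing counters and masking only on access means full and empty states are distinguishable without wasting a slot.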

Pinned

  1. sketch (Public)

    C++ Implementations of sketch data structures with SIMD Parallelism, including Python bindings

    C++, 153 stars, 13 forks

  2. frp (Public)

    FRP: Fast Random Projections

    C++, 43 stars, 6 forks

  3. ARUP-NGS/BMFtools (Public)

    Barcoded Molecular Families

    C++, 22 stars, 8 forks

  4. minicore (Public)

    Fast and memory-efficient clustering + coreset construction, including fast distance kernels for Bregman and f-divergences.

    C++, 33 stars, 6 forks

  5. dashing2 (Public)

    Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.

    C++, 63 stars, 7 forks

  6. bioseq (Public)

    Tokenizers and Machine Learning Models for biological sequence data

    C++, 25 stars, 4 forks

