Colin R. Moran

Software Engineer · Distributed Systems & Data Platforms

I design and build resilient systems, data pipelines, and tools that make debugging and scaling less painful.

Expertise

Building systems that scale

From distributed data platforms to cloud migrations, I focus on reliability, observability, and team growth.

10+
Years Experience
Building and delivering features for real customers.
3+
Cloud Migrations
On-prem to cloud with Kubernetes and serverless adoption.
100+
Engineering Interviews
Team development and mentorship.

Experience

What I've delivered

Roles where I owned systems, shipped improvements, and handled real-world constraints.

Software Engineer · Axon

Remote · Marysville, OH
2022 – Present
  • Raised code coverage to ~80% across four reporter services and added metrics and monitoring to reduce mean-time-to-discovery
  • Reworked report generation flow to be more efficient and maintainable while improving data consistency
  • Implemented indexing for new entity types to reduce SQL Server load
  • Produced architecture diagrams to clarify system design and improve cross-team alignment
  • Conducted 100+ software engineering interviews and mentored junior/mid-level developers
  • Participated in on-call rotation to monitor and troubleshoot critical production incidents

Enterprise Architect · CAS

Columbus, OH
2021 – 2022
  • Led selection of an enterprise DSML platform using frameworks, scorecards, and reference architectures
  • Designed a scalable bipartite-graph, fuzzy-matching solution for citation matching using Apache Flink
  • Helped refine architecture processes and improved transparency and alignment with stakeholders
  • Started a machine learning community of practice and established MLOps best practices

TAP Engineer · CAS

Columbus, OH
2019 – 2021
  • Led migration of on-prem applications to AWS, including breaking down work and collaborating with AWS experts
  • Drove adoption of AWS-native serverless and Kubernetes solutions to reduce operational costs
  • Created SDLC best practices for public cloud and standard operating procedures for major projects
  • Led efforts around cloud security and Kubernetes migration strategies

Software Engineer · CAS

Columbus, OH
2018 – 2019
  • Developed Java applications and debugged complex production issues
  • Improved document workflows using ML and automated manual unlocks, saving hundreds of hours

Systems I've Built

Real systems, services, and infrastructure

A sampling of the distributed systems, pipelines, and tools I've designed or implemented.

Start here

Axon Reporting Platform

Incremental refactoring of production report generation services to improve performance, data consistency, and observability while maintaining system availability.

Problem

Report generation services lacked consistency, had poor observability, and struggled with SQL Server load as data volumes grew.

Outcomes

  • ~40% SQL Server load reduction
  • ~80% code coverage across services
  • Improved MTTR from hours to minutes

Technologies

Scala, Kubernetes, MSSQL, Prometheus, Grafana

Production Platforms

Enterprise DSML Platform Selection & Architecture

AWS, Kubernetes, MLOps, Python, Apache Spark

Problem

Organization needed a scalable, production-ready platform for data science and machine learning workloads with proper MLOps practices.

Multimodal Vector Search & Retrieval Platform

Rust, Scala, gRPC, Consistent Hashing, Prometheus

Problem

Built a high-performance vector search system supporting multiple data modalities with horizontal scaling and consistent hashing.

Data & Processing Pipelines

Fuzzy Citation Matching Pipeline

Apache Flink, Scala, Graph Algorithms, Fuzzy Matching

Problem

Needed to match citations across large document sets with high accuracy, handling variations in formatting, abbreviations, and partial matches.
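To make the matching problem concrete, here is a minimal single-process sketch in Rust. It is an illustration only, not the Flink pipeline itself. The similarity metric (normalized Levenshtein distance), the greedy one-to-one assignment, and the names `normalize`, `similarity`, and `match_citations` are all assumptions chosen for the example.

```rust
/// Lowercase ASCII letters, replace punctuation with spaces, collapse whitespace.
/// (Assumed normalization; real citation cleanup would be more involved.)
fn normalize(s: &str) -> String {
    let cleaned: String = s
        .chars()
        .map(|c| if c.is_alphanumeric() { c.to_ascii_lowercase() } else { ' ' })
        .collect();
    cleaned.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Levenshtein edit distance using a two-row dynamic program.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1] after normalization; 1.0 means identical.
fn similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (normalize(a), normalize(b));
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(&a, &b) as f64 / max_len as f64
}

/// Treat `left` and `right` as the two sides of a bipartite graph, keep edges
/// whose similarity meets `threshold`, then greedily pick the best edges so
/// each item is matched at most once. Returns (left index, right index, score).
fn match_citations(left: &[&str], right: &[&str], threshold: f64) -> Vec<(usize, usize, f64)> {
    let mut edges = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            let score = similarity(l, r);
            if score >= threshold {
                edges.push((i, j, score));
            }
        }
    }
    // Highest-scoring edges first; scores are always finite, so unwrap is safe.
    edges.sort_by(|x, y| y.2.partial_cmp(&x.2).unwrap());
    let mut used_left = vec![false; left.len()];
    let mut used_right = vec![false; right.len()];
    let mut matches = Vec::new();
    for (i, j, score) in edges {
        if !used_left[i] && !used_right[j] {
            used_left[i] = true;
            used_right[j] = true;
            matches.push((i, j, score));
        }
    }
    matches
}
```

The all-pairs comparison is quadratic, so a distributed version would likely need some form of candidate blocking before scoring. That step is omitted here.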

Libraries & Tooling

Cypher-to-Protobuf Translation Library

Rust, Tree-sitter, Protobuf, Cypher, Graph Databases

Problem

Needed to parse Cypher graph queries into structured schemas for graph backends, enabling type-safe query construction.

Architecture Stories

How I think about tradeoffs

Design narratives behind key systems: constraints, options considered, and final choices.

Report Generation: Performance & Consistency

Software Engineer · Axon · 2022–2024

Incremental refactoring of report generation services to improve performance, data consistency, and observability while maintaining system availability.

Impact

  • ~40% SQL Server load reduction
  • ~80% code coverage
  • Improved MTTR from hours to minutes
TypeScript, Node.js, SQL Server, Prometheus, Grafana, Observability

ML Platform: Selection & Architecture

Enterprise Architect · CAS · 2021–2022

Selection and rollout of a production-ready data science and machine learning platform, establishing MLOps best practices and reference architectures.

Impact

  • Accelerated ML project delivery
  • Standardized MLOps practices
  • Established ML community of practice
AWS, Kubernetes, MLOps, Python, Apache Spark, Architecture

Cloud Migration: On-Prem to AWS

TAP Engineer · CAS · 2019–2021

Migration of on-premises applications to AWS, driving adoption of serverless and Kubernetes solutions to reduce operational costs and improve scalability.

Impact

  • Reduced operational costs
  • Improved scalability and reliability
  • Established cloud SDLC best practices
AWS, Kubernetes, Docker, Serverless, Terraform, Cloud Architecture

Citation Matching: Distributed Processing

Enterprise Architect · CAS · 2021–2022

Scalable bipartite graph solution for citation matching using distributed processing, handling millions of document pairs with high accuracy.

Impact

  • Scalable to millions of document pairs
  • High accuracy fuzzy matching
  • Efficient distributed processing
Apache Flink, Scala, Graph Algorithms, Fuzzy Matching, Distributed Systems

Code & Snippets

Executable examples

Algorithms, data structures, and system behaviors you can run right here in the browser.

Consistent Hashing Implementation

Distributed Systems · Rust

A consistent hashing ring implementation for distributed data placement, used in the vector search platform.

consistent_hashing_implementation.rs
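A minimal sketch of the technique, offered as an illustration and not as the code from the vector search platform. It assumes a ring with virtual nodes backed by a `BTreeMap` and uses the standard library's `DefaultHasher`. The names `HashRing`, `ring_hash`, and `replicas` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Map any hashable value to a position on the ring.
/// Assumption: DefaultHasher is fine for a demo, but its algorithm is not
/// guaranteed stable across Rust releases, so a real cluster would pin one.
fn ring_hash<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Consistent hashing ring: ring position -> owning node name.
pub struct HashRing {
    ring: BTreeMap<u64, String>,
    replicas: usize, // virtual nodes per physical node
}

impl HashRing {
    pub fn new(replicas: usize) -> Self {
        HashRing { ring: BTreeMap::new(), replicas }
    }

    /// Place `replicas` virtual nodes for `node` around the ring.
    pub fn add_node(&mut self, node: &str) {
        for i in 0..self.replicas {
            let position = ring_hash(&format!("{}#{}", node, i));
            self.ring.insert(position, node.to_string());
        }
    }

    /// Drop every virtual node owned by `node`.
    pub fn remove_node(&mut self, node: &str) {
        self.ring.retain(|_, owner| owner != node);
    }

    /// Owner of `key`: the first virtual node at or after the key's hash,
    /// wrapping to the start of the ring. Returns None if the ring is empty.
    pub fn node_for(&self, key: &str) -> Option<&str> {
        let position = ring_hash(key);
        self.ring
            .range(position..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, owner)| owner.as_str())
    }
}
```

The property that matters for data placement is that removing a node only reassigns the keys that node owned. Keys owned by the remaining nodes keep their placement, because their clockwise successor on the ring is unchanged.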

Flink Citation Matching Pipeline

Data Pipelines · Scala

Scala implementation of a citation matching pipeline using Apache Flink for distributed processing.

flink_citation_matching_pipeline.scala

Tree-sitter Cypher Parser

Libraries & Tooling · Rust

Rust implementation using Tree-sitter to parse Cypher queries into structured AST nodes.

tree-sitter_cypher_parser.rs

Observability Middleware

Observability · TypeScript

TypeScript middleware for adding metrics and tracing to Next.js API routes.

observability_middleware.ts

Distributed Job Orchestration

Data Pipelines · Scala

Scala implementation of a Dagster-like pipeline orchestrator for data engineering workflows.

distributed_job_orchestration.scala

Intent & Philosophy

How I approach engineering work

What I optimize for when I design systems, collaborate with teams, and make tradeoffs.

I design and build distributed systems, data platforms, and cloud infrastructure. My work focuses on reliability, observability, and making systems that scale without breaking.

System Design & Architecture

Good architecture starts with understanding constraints: technical, business, and organizational. Rather than chasing the latest trends, I focus on solutions that balance performance, maintainability, and team velocity. I favor incremental improvements over big-bang rewrites, and I always weigh operational complexity.

When designing distributed systems, observability comes first. You can't fix what you can't see. I design for failure, plan for scale, and always consider the human operators who will maintain the system. Whether choosing between microservices and monoliths, or selecting a data processing framework, I evaluate tradeoffs explicitly and document the reasoning.

Reliability & Observability

Production systems fail. The question is how quickly you can detect, diagnose, and resolve issues. I've seen too many systems where debugging production problems means grepping through logs or guessing at root causes. That's why metrics, tracing, and structured logging are integrated from the start.

The focus is on reducing mean-time-to-discovery (MTTD) and mean-time-to-resolution (MTTR). This means thoughtful alerting (not alert fatigue), comprehensive metrics (not just request counts), and clear runbooks. I've improved incident response by adding observability tooling, and I've seen how good monitoring can turn a 4-hour debugging session into a 10-minute fix.

On-call experience has taught me that reliability isn't just about preventing failures—it's about making failures manageable. Systems need graceful degradation, circuit breakers, and clear failure modes. When something breaks at 2 AM, the system should give you enough information to understand what's wrong and how to fix it.
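As one illustration of a clear failure mode, a circuit breaker can be sketched in a few dozen lines of Rust. This is a simplified, single-threaded sketch and does not come from any system described here. The consecutive-failure threshold, the fixed cooldown, and the names `CircuitBreaker`, `State`, and `call` are all assumptions.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed,                  // calls flow normally
    Open { since: Instant }, // calls are rejected until the cooldown passes
    HalfOpen,                // one probe call is allowed through
}

pub struct CircuitBreaker {
    state: State,
    failures: u32,          // consecutive failures while closed
    failure_threshold: u32, // failures needed to trip the breaker
    cooldown: Duration,     // how long to stay open before probing
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker { state: State::Closed, failures: 0, failure_threshold, cooldown }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Decide whether a call may proceed, moving Open -> HalfOpen after the cooldown.
    fn allow(&mut self) -> bool {
        if let State::Open { since } = self.state {
            if since.elapsed() >= self.cooldown {
                self.state = State::HalfOpen;
            } else {
                return false;
            }
        }
        true
    }

    fn record_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed;
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        // A failed probe, or too many consecutive failures, (re)opens the breaker.
        if self.state == State::HalfOpen || self.failures >= self.failure_threshold {
            self.state = State::Open { since: Instant::now() };
        }
    }

    /// Run `op` through the breaker. `Err(None)` means the call was rejected
    /// without running (fail fast); `Err(Some(e))` is the operation's own error.
    pub fn call<T, E, F>(&mut self, op: F) -> Result<T, Option<E>>
    where
        F: FnOnce() -> Result<T, E>,
    {
        if !self.allow() {
            return Err(None);
        }
        match op() {
            Ok(value) => {
                self.record_success();
                Ok(value)
            }
            Err(e) => {
                self.record_failure();
                Err(Some(e))
            }
        }
    }
}
```

Separating "rejected by the breaker" from "the dependency returned an error" in the return type gives the caller, and the person on call, an unambiguous signal about which failure mode is in play.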

Mentoring, Interviewing, & Community Building

After conducting over 100 engineering interviews, I've learned that hiring is about finding people who can grow, not just people who know specific technologies. The focus is on system design thinking, problem-solving approaches, and cultural fit. Candidates get clear feedback, and the process is treated as a two-way conversation.

Mentoring is about creating space for others to learn and grow. I've worked with junior and mid-level developers on everything from debugging production issues to designing their first microservices. In practice, this means pairing, code reviews as teaching moments, and helping people understand the "why" behind decisions, not just the "what".

Communities of practice are powerful ways to share knowledge and standardize approaches. I started a machine learning community of practice at CAS to help teams learn from each other and establish best practices. These communities work best when they're bottom-up, focused on real problems, and have clear goals. I've seen how they can transform organizational culture and accelerate learning.

Education

M.S. Computer Science, Architecture (in progress)

Franklin University

B.S. Computer Science, Magna Cum Laude

Franklin University

Technologies

Languages

Scala, Rust, Java, Python, TypeScript/JavaScript

Platforms & Infrastructure

AWS, Docker, Kubernetes, Kafka, Spark, Flink

Observability

Prometheus, Grafana, Datadog, AWS X-Ray

Contact

Let's talk

Whether you're hiring, looking for feedback, or just want to chat systems, feel free to reach out.

Get in Touch

I'm always interested in discussing distributed systems, cloud architecture, ML platforms, or opportunities to build reliable systems at scale.

  • Distributed systems & data platforms
  • Cloud architecture & ML infrastructure
  • Reliability, observability, and incident response
Location
Marysville, OH — Remote