Hearing the Strike

Dev blog #7. Sometimes the best sensor is the one you weren't using. I started listening.

Close your eyes on a range and you can still tell a pure strike from a chunked one. That crisp crack off the center of the face is unmistakable. A thin one pings. A fat one thuds. You've never needed to see the ball to know you flushed it. Your ears already knew.

For a while I was so focused on the camera that I ignored the most obvious thing in the world: the phone has a microphone, and a golf swing makes one of the most distinctive sounds in all of sport. The strike is loud, sharp, and over in an instant. There's a real argument that it's the single cleanest "that just happened" signal the whole swing produces.


The phone was already listening. I just hadn't thought to use its ears.

Why sound is such a good tattle-tale

Think back to the last post and all those fake-outs: setting the club down, nudging the ball, picking it up. Almost none of those make the sound of a real strike. You can quietly set a club behind a ball. You cannot quietly flush a 7-iron. That sharp crack is hard to counterfeit, which makes it a wonderful tie-breaker when the picture alone is unsure.

But a microphone hears everything

Here's where the romance meets the engineering. To your ear, the crack of a strike stands out effortlessly. To the phone, sound shows up as a relentless stream of numbers, tens of thousands of samples every second, a flat river of audio with no labels on it. Somewhere in that river is a single spike that matters, surrounded by a lot that doesn't: range chatter, the bay next to you, a ball machine, wind battering the mic, your own feet shifting on the mat.

And "loud" on its own is useless, because a strike isn't really about loudness, it's about suddenness. A flushed shot is a transient, a near-instant spike that appears and is gone in a blink. The job isn't "find the loud part," it's "find the sharp onset that looks like a strike and ignore everything else that's merely noisy." That distinction is most of the battle.
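To make the loud-versus-sudden distinction concrete, here is a toy sketch in Python. It is not the detector the app uses, just one common way to measure "suddenness": split the audio into short frames, take each frame's energy, and look at how many decibels it jumped from the frame before. The function names, the frame length, and both test signals are made up for illustration.

```python
import math

def frame_energies(samples, frame_len=64):
    """Mean squared amplitude of each consecutive, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def onset_strength(energies, floor=1e-9):
    """Frame-to-frame energy rise in decibels; drops are clamped to zero."""
    out = [0.0]
    for prev, cur in zip(energies, energies[1:]):
        rise_db = 10.0 * math.log10((cur + floor) / (prev + floor))
        out.append(max(0.0, rise_db))
    return out

# Hypothetical signals: a loud steady hum, and a near-silent room with one sharp click.
rate = 8000
hum = [0.8 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1024)]
click = [0.001 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1024)]
for n in range(512, 520):
    click[n] = 0.5  # a transient lasting one millisecond

hum_energy = max(frame_energies(hum))
click_energy = max(frame_energies(click))
hum_peak = max(onset_strength(frame_energies(hum)))
click_peak = max(onset_strength(frame_energies(click)))

print(f"hum:   energy {hum_energy:.3f}, biggest jump {hum_peak:.1f} dB")
print(f"click: energy {click_energy:.3f}, biggest jump {click_peak:.1f} dB")
```

The hum wins on raw energy by a wide margin, yet its biggest frame-to-frame jump is tiny, while the click's jump is enormous. A real strike detector needs far more than this (a door slam is sudden too), but it shows why "find the loud part" is the wrong question.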

The bar has to move with the room

There's no fixed number for "that was a strike," either, and this one bit me. A quiet indoor net and a busy outdoor range are completely different worlds of background noise. Set the bar where it works indoors and the range trips it constantly. Set it for the range and you miss softer indoor contact. So the threshold can't be a constant. It has to keep a running sense of how loud the room is right now and judge each candidate against that, so the same swing reads correctly whether you're in a silent garage or standing next to the world's chattiest foursome.
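As an illustration only, here is a minimal Python sketch of a bar that moves with the room. It assumes each incoming value is some per-frame loudness measure, keeps an exponential moving average of the background, and flags a frame only when it stands several times above that background. The class name, the margin of 8, and the smoothing factor are assumptions chosen for the demo, not values from the app.

```python
class AdaptiveThreshold:
    """Flags a level only when it stands well above the recent background."""

    def __init__(self, margin=8.0, smoothing=0.05, min_floor=1e-6):
        self.margin = margin        # how many times above background counts (assumed)
        self.smoothing = smoothing  # how quickly the background estimate adapts (assumed)
        self.min_floor = min_floor  # keeps the estimate from collapsing to zero
        self.floor = None           # running background level, seeded by the first frame

    def update(self, level):
        if self.floor is None:
            self.floor = max(level, self.min_floor)
            return False
        limit = self.floor * self.margin
        is_candidate = level > limit
        # Clip what feeds the average so one loud event cannot drag the bar
        # up sharply, while a room that really got louder still raises it.
        clipped = min(level, limit)
        self.floor = max(self.min_floor,
                         self.floor + self.smoothing * (clipped - self.floor))
        return is_candidate

def candidates(levels):
    detector = AdaptiveThreshold()
    return [i for i, level in enumerate(levels) if detector.update(level)]

# Hypothetical level traces: a silent garage with one soft contact, and a busy
# range with a burst of chatter followed by a proper strike.
garage = [0.001] * 50 + [0.02] + [0.001] * 10
busy_range = [0.02] * 50 + [0.04] + [0.02] * 10 + [0.5] + [0.02] * 10

garage_hits = candidates(garage)
range_hits = candidates(busy_range)
print(garage_hits, range_hits)
```

Notice that the soft garage contact (0.02) is exactly as loud as the range's ordinary background. No single fixed number could catch the first and ignore the second, which is the whole argument for judging each candidate against the room it happened in.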

One crack, one detection

A quieter problem, and a sneaky one. When the app does catch a strike, it has to fire exactly once. A real crack rings out and decays, and there's a genuine risk of counting the same strike several times as it echoes, or mistaking its tail for a second swing. So there's a cool-down after each hit, a refusal to fire again for a beat. Easy to say, annoying to land, because the audio arrives in small back-to-back chunks and a strike can fall right on the seam between two of them. The cool-down has to carry its memory across that seam, or a strike that straddles the boundary gets either double-counted or dropped entirely. Tiny detail, real bug, a few hours of my life.
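The seam problem is easy to show in a sketch. The trick below is to track the cool-down in absolute stream position rather than position within the current chunk, so the "don't fire yet" memory survives from one chunk to the next. This is a generic Python illustration under assumed names and numbers; the bare amplitude check is a stand-in for whatever a real trigger condition would be.

```python
class StrikeGate:
    """Fires at most once per cool-down window, even across chunk boundaries."""

    def __init__(self, threshold=0.5, cooldown_samples=400):
        self.threshold = threshold            # stand-in trigger condition (assumed)
        self.cooldown_samples = cooldown_samples
        self.samples_seen = 0                 # absolute position in the stream
        self.blocked_until = 0                # absolute index where firing is allowed again

    def process(self, chunk):
        """Returns the absolute sample indices at which this chunk fired."""
        hits = []
        for offset, x in enumerate(chunk):
            pos = self.samples_seen + offset
            if abs(x) >= self.threshold and pos >= self.blocked_until:
                hits.append(pos)
                self.blocked_until = pos + self.cooldown_samples
        self.samples_seen += len(chunk)       # state carried into the next chunk
        return hits

# Hypothetical stream: a decaying ring that starts at sample 254 and is still
# above the threshold when the first 256-sample chunk ends, plus a separate
# strike at sample 800.
stream = [0.0] * 1000
for k in range(46):
    stream[254 + k] = 0.9 * (0.98 ** k) * (1 if k % 2 == 0 else -1)
stream[800] = 0.9

gate = StrikeGate()
chunked_hits = []
for start in range(0, len(stream), 256):
    chunked_hits += gate.process(stream[start:start + 256])

one_shot_hits = StrikeGate().process(stream)
print(chunked_hits, one_shot_hits)
```

The useful property to check is that chunking doesn't change the answer: feeding the stream in 256-sample pieces and feeding it all at once should fire at the same positions. A gate that resets its cool-down at the start of every chunk would fire a second time at sample 256, right on the seam.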

The hardest part: two senses, two clocks

Here's the one that genuinely humbled me, and it's pure engineering. The camera and the microphone are two separate sensors with two separate pipelines, and they do not agree on what time it is. The mic might hear the crack at one timestamp while the video frame showing that very instant carries a slightly different one. For a human, sight and sound just fuse, automatically, for free. For the phone, lining up "the moment the ears heard the strike" with "the frame where the ball is being crushed" is a real problem, and if you get it wrong, your two trustworthy signals end up pointing at two different moments and the whole advantage evaporates.

That alignment is the toll you pay for using both senses at once, and the toll is worth paying. The ears give you a sharp, reliable "now" that the eyes can struggle with in a cluttered frame. The eyes give you proof that the ball actually flew, which the ears can't see. Used together, the sound says "a strike happened right about here," and the flight evidence confirms "yes, and the ball genuinely left on a trajectory." Each one covers the other's blind spot, but only once they're talking about the same instant.
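To show why a small disagreement between the clocks matters, here is a Python sketch of the easy half of the problem only: given an offset between the two clocks, convert the strike's sample position to video time and pick the nearest frame. It assumes the offset is already known, which is the hard part and is not shown. Every name and number here (the 40 ms offset, 60 fps, 48 kHz) is hypothetical.

```python
from bisect import bisect_left

def audio_to_video_time(sample_index, sample_rate, audio_start, clock_offset):
    """Converts a sample position to a video-clock time, given an assumed offset."""
    return audio_start + sample_index / sample_rate + clock_offset

def nearest_frame(frame_times, t):
    """Index of the frame whose timestamp is closest to t (frame_times ascending)."""
    if not frame_times:
        raise ValueError("no frames")
    i = bisect_left(frame_times, t)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    return i if frame_times[i] - t < t - frame_times[i - 1] else i - 1

# Hypothetical session: two seconds of 60 fps video starting at t = 10.0 s,
# 48 kHz audio, and a strike heard half a second into the audio buffer.
frame_times = [10.0 + k / 60 for k in range(120)]
strike_sample = 24000

naive_t = audio_to_video_time(strike_sample, 48000, 10.0, 0.0)
fixed_t = audio_to_video_time(strike_sample, 48000, 10.0, -0.040)

naive_frame = nearest_frame(frame_times, naive_t)
fixed_frame = nearest_frame(frame_times, fixed_t)
print(naive_frame, fixed_frame)
```

With these made-up numbers, a 40 millisecond disagreement moves the answer by two frames. At the speed a club head travels through impact, two frames is the difference between looking at the strike and looking at empty air.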

The part I'll keep quiet

How I actually pick the real strike out of all that noise, and how I get the two clocks to shake hands, is firmly in the vault. Turning "the phone heard something loud" into "that specific sound was a clean strike at this exact frame" is most of the work, and it's exactly the kind of thing this blog is here to tease, not to hand out.

But the idea itself is the fun bit, and it's the kind of thing I love about building this: the answer wasn't a fancier camera or a huge ML model. It was remembering that the device in your hand has more than one way to sense the world, and that your own ears have been doing this job effortlessly your whole golfing life.

Next up: a change of scenery. I built a whole 3D driving range that runs in a web browser, and I'll show you why.

Want to follow along? Pre-register and stick around. See you on the range.
