
Detect malicious files with BinLib: a private binary library

A file's most defensible feature is the one its author never bothered to change. That is the quiet argument running under Matt's walkthrough of Binary Library, the LimaCharlie feature everyone calls BinLib, a per-organization repository of the binaries observed across an environment. He moves through hash lookups, metadata pivots, import hash matching, tagging, and YARA scanning, but those are not five separate features. They are a ladder, climbing from the identifiers an adversary changes for free toward the ones they almost never touch. The reason to care is that most detection starts at the bottom of that ladder and stops there.

The binary is the part worth keeping

Code has to run as a process, and that single fact makes process execution the bedrock of most malicious-activity detection. It is why Matt argues EDR remains one of the highest-value telemetry sources a team collects. The trouble is that a process event is disposable. It carries a file path, a command line, and a hash, all describing one execution, most of which can differ the next time the same logic runs. What persists, and what is worth profiling over time, is the binary itself and what it does.

That is the gap BinLib fills. LimaCharlie already records process activity, so BinLib zooms in on one sparse but rich event, code_identity, and turns it into its own observable telemetry stream across Windows, Linux, and macOS. Process starts and terminations are plentiful; code identity events are few and far between, which is exactly why they reward attention. They carry the metadata that lets you reason about an executable rather than an instance of it: signature and certificate details, whether the file is signed at all, hash values, and the import hash. BinLib watches for these and produces two events of its own. A first_seen event marks that a path or hash appeared for the first time. An acquired event marks that the file's actual data was pulled into the library. Because BinLib is itself a stream, an operator can watch it work, pivot off it, and write detection and response rules directly against it.
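To make "write rules directly against it" concrete, here is a minimal sketch of a predicate over BinLib-style events. The record shape, field names, and path markers are assumptions made for illustration; they are not LimaCharlie's actual event schema or rule syntax.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record shape: field names are illustrative, not LimaCharlie's schema.
@dataclass(frozen=True)
class BinLibEvent:
    kind: str                      # "first_seen" or "acquired"
    path: str
    sha256: str
    signed: bool
    imphash: Optional[str] = None


def is_suspicious_first_seen(event: BinLibEvent) -> bool:
    """Flag the first sighting of an unsigned binary in a user-writable location."""
    if event.kind != "first_seen" or event.signed:
        return False
    lowered = event.path.lower()
    # Assumed markers for user-writable locations; tune these per environment.
    return any(marker in lowered for marker in ("\\appdata\\", "\\temp\\", "/tmp/"))


events = [
    BinLibEvent("first_seen", r"C:\Windows\System32\kernel32.dll", "aa" * 32, signed=True),
    BinLibEvent("first_seen", r"C:\Users\bob\AppData\Local\Temp\upd.exe", "bb" * 32, signed=False),
    BinLibEvent("acquired", r"C:\Users\bob\AppData\Local\Temp\upd.exe", "bb" * 32, signed=False),
]

flagged = [e.path for e in events if is_suspicious_first_seen(e)]
print(flagged)
```

The predicate keys on the first_seen event only, so the later acquired event for the same file does not raise a second alert.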

Deduplication is what makes a private library affordable

A naive version of this idea drowns in copies. Matt's example is a Windows update or patch that pushes the same DLL across a fleet, which shows up in BinLib as a dense wave of binaries over a few minutes. BinLib deduplicates at the organization level. It recognizes that Microsoft never changed that DLL, keeps the first_seen records that capture each new path, and declines to reacquire data it already holds. New data is grabbed only on a genuinely new hash.
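The deduplication logic described above can be sketched in a few lines. This is a toy model under stated assumptions, not BinLib's implementation: the `BinaryLibrary` class, its `observe` method, and the choice to key first_seen on a (path, hash) pair are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BinaryLibrary:
    """Toy org-level library: store each unique hash once, record each new path."""
    blobs: dict = field(default_factory=dict)    # sha256 -> file bytes, stored once
    seen: set = field(default_factory=set)       # (path, sha256) pairs already recorded
    events: list = field(default_factory=list)   # emitted (kind, path, sha256) tuples

    def observe(self, path, sha256, fetch):
        """Record a sighting; call fetch() only when the hash is new to the library."""
        if (path, sha256) not in self.seen:
            self.seen.add((path, sha256))
            self.events.append(("first_seen", path, sha256))
        if sha256 not in self.blobs:
            self.blobs[sha256] = fetch()
            self.events.append(("acquired", path, sha256))


# Simulate a patch dropping the same DLL at several paths.
dll = b"MZ" + b"\x00" * 64
digest = hashlib.sha256(dll).hexdigest()
fetch_calls = []


def fetch():
    fetch_calls.append(1)  # count how often the file data is actually pulled
    return dll


lib = BinaryLibrary()
for path in [
    r"C:\Windows\System32\foo.dll",
    r"C:\Windows\SysWOW64\foo.dll",
    r"C:\Windows\System32\foo.dll",  # repeat sighting, already recorded
]:
    lib.observe(path, digest, fetch)

first_seen_count = sum(1 for kind, _, _ in lib.events if kind == "first_seen")
acquired_count = sum(1 for kind, _, _ in lib.events if kind == "acquired")
print(first_seen_count, acquired_count, len(lib.blobs))
```

Three sightings produce two first_seen records, one per new path, and a single acquisition, which is the property that keeps storage bounded as the fleet grows.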

This matters for two audiences at once. For an analyst, learning to read those waves as installs and updates rather than incidents is what keeps normal activity from eating the day. For a provider scaling BinLib across large fleets, deduplication is the difference between a usable archive and an unbounded storage bill. Matt is direct about the economics. Enabling BinLib is free; the cost rides on artifact ingestion and storage, so dedup at ten thousand endpoints means a shared DLL is stored once rather than ten thousand times. "If I turn BinLib on for fifty thousand Windows endpoints, better get ready," he notes, which is more honest than most feature demos.

Climbing toward the identifiers adversaries do not change

Here is where the argument earns its keep. BinLib can be approached from two directions: from the stream, asking what the platform has observed, or from the outside, arriving with a hash from a threat report, an ISAC feed, or another system and asking whether it was ever seen. Matt demonstrates the second: he takes a ToddyCat write-up pulled from LimaCharlie's intel channel, drops its hashes into BinLib, and confirms the binaries were never present. He frames this as a question an analyst can close in about three seconds, months after onboarding, without scanning a single endpoint.
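The outside-in lookup amounts to a set membership check against history. The sketch below assumes a hypothetical in-memory `library_index` keyed by SHA-256; the hashes are placeholders, not indicators from any real report, and the function name is invented for illustration.

```python
def lookup_hashes(ioc_hashes, library_index):
    """Split report hashes into those ever observed and those never seen."""
    # Normalize case and whitespace so copy-pasted indicators still match.
    normalized = {h.strip().lower() for h in ioc_hashes if h.strip()}
    seen = {h: library_index[h] for h in normalized if h in library_index}
    never_seen = sorted(normalized - set(seen))
    return seen, never_seen


# Placeholder library contents (hypothetical records).
library_index = {
    "a" * 64: {"first_seen": "2024-01-10", "paths": [r"C:\Tools\agent.exe"]},
    "b" * 64: {"first_seen": "2024-02-02", "paths": ["/usr/local/bin/sync"]},
}

# Placeholder indicators as they might be pasted from a report.
report_hashes = ["A" * 64, "c" * 64, "  " + "d" * 64 + "\n", ""]

seen, never_seen = lookup_hashes(report_hashes, library_index)
print(sorted(seen), never_seen)
```

Because the check runs against stored history rather than live endpoints, the answer covers binaries that have since been deleted from disk.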

But he is careful not to oversell the hash. A clean result is one signal, not proof of absence, because one byte of change mints a new hash. Metadata fields like company name or signature subject give broader pivots and catch more, yet they are still editable: Microsoft Corporation, Microsoft Corp, and Microsoft Corporation LLC may mean nothing to a person and everything to a brittle rule. The import hash sits a rung higher. It hashes a binary's import table, the libraries and functions it depends on, and adversaries rarely refactor those internals between builds. A new ransomware variant gets a new name, new company details, and a new file hash, while its import hash holds steady. Matt makes the point structurally by searching for company name equals Microsoft Corporation, pulling back a hundred files, then filtering on a single import hash to narrow the result to one. The fragile field gathered a crowd; the durable one isolated the file.
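A small sketch shows why the import hash outlasts the cheaper fields. The `import_hash` function below follows the widely used imphash convention (lowercased `library.function` pairs, in import order, joined by commas and hashed with MD5); it takes the import list as input rather than parsing a PE file, and the sample files are invented for illustration.

```python
import hashlib


def import_hash(imports):
    """MD5 over the ordered 'library.function' list, in the style of imphash."""
    parts = []
    for library, function in imports:
        name = library.lower()
        # Drop common module extensions so 'KERNEL32.dll' becomes 'kernel32'.
        for ext in (".dll", ".ocx", ".sys"):
            if name.endswith(ext):
                name = name[: -len(ext)]
                break
        parts.append(f"{name}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()


# Two builds of the same (hypothetical) tool share imports but nothing else.
build_a = [("KERNEL32.dll", "CreateFileW"), ("ADVAPI32.dll", "CryptEncrypt")]
files = [
    {"name": "invoice.exe", "company": "Microsoft Corporation", "sha256": "a1", "imports": build_a},
    {"name": "updater.exe", "company": "Microsoft Corp", "sha256": "b2", "imports": build_a},
    {"name": "notepad.exe", "company": "Microsoft Corporation", "sha256": "c3",
     "imports": [("USER32.dll", "MessageBoxW")]},
]

# The exact-match company pivot misses the renamed build and sweeps in a bystander.
by_company = [f["name"] for f in files if f["company"] == "Microsoft Corporation"]

# The import hash pivot groups both builds and nothing else.
target = import_hash(build_a)
by_imphash = [f["name"] for f in files if import_hash(f["imports"]) == target]
print(by_company, by_imphash)
```

Note that the hash is order-sensitive, so a build that reorders or adds imports does produce a new value; the claim is only that adversaries change this far less often than names and file hashes.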

The top of the ladder needs no prior knowledge at all. YARA, built into the LimaCharlie sensor, classifies files on behavioral and code-based traits (hex strings, wildcarded strings, PE information, and the logic that combines them) rather than static attributes an author can swap. Because BinLib retains the binary data, a YARA scan can run retroactively across that stored population. Matt scopes a hunt to the first hundred files where company name is not Microsoft Corporation, runs a Volt Typhoon rule set, and offers to tag any match, all from one console with no endpoint touched. The scan itself emits observable events that feed back into the workflow. Tagging closes the loop in the other direction. It is operator-applied context that separates production from development, marks living-off-the-land binaries like bitsadmin against MITRE ATT&CK and the LOLBAS project, or brands a file with a threat actor after a match, so the next rule can raise or lower fidelity on a tag instead of re-deriving it.
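The retroactive scan-then-tag loop can be sketched without any endpoint in the picture. To stay self-contained, the example below does not use YARA itself; it swaps in a simplified stand-in that compiles YARA-style hex strings with `??` wildcards into byte regexes and requires every pattern in a rule to match. The rule name, patterns, and stored blobs are invented for illustration.

```python
import re


def compile_hex_pattern(pattern):
    """Turn a YARA-style hex string such as '4D 5A ?? 00' into a bytes regex."""
    out = b""
    for token in pattern.split():
        if token == "??":
            out += b"."  # wildcard: any single byte
        else:
            out += re.escape(bytes([int(token, 16)]))
    return re.compile(out, re.DOTALL)


def retro_scan(blobs, rules, limit=100):
    """Scan stored binaries and return {sha256: set of matching rule names}."""
    tags = {}
    for sha256, data in list(blobs.items())[:limit]:
        for rule_name, patterns in rules.items():
            # Stand-in for a rule condition of "all of them".
            if all(p.search(data) for p in patterns):
                tags.setdefault(sha256, set()).add(rule_name)
    return tags


# Hypothetical stored population: hash label -> retained file bytes.
blobs = {
    "hash-a": b"MZ\x90\x00" + b"padding" + b"EVIL",
    "hash-b": b"MZ\x90\x00" + b"benign",
}

# Hypothetical rule: an MZ header with one wildcarded byte, plus a marker string.
rules = {
    "demo_rule": [
        compile_hex_pattern("4D 5A ?? 00"),
        compile_hex_pattern("45 56 49 4C"),
    ]
}

tags = retro_scan(blobs, rules)
print(tags)
```

The returned mapping is the tagging step in miniature: a later rule can key on the tag rather than repeating the scan.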

For an MSSP or MDR, the synthesis is the pitch. A provider does not want to detect on what an adversary edits between campaigns; it wants to detect on what survives them, and it wants to answer a client's "have you ever seen this" against history rather than a live scan. BinLib takes the posture LimaCharlie applies everywhere, turning a fleeting observation into durable, searchable infrastructure you own, and applies it to the one event that underwrites most detection.
