Why this page
Logs arrive with mixed formats, timezones, and precision. One bad timestamp breaks ordering, dashboards, and incident timelines. This guide gives you quick checks, regex templates, and language snippets to normalize log times fast—and points you to converters when you need to verify edge cases.
Quick diagnostic checklist
- Identify format: ISO 8601/RFC 3339, Apache/Nginx (
[dd/Mon/yyyy:HH:mm:ss Z]), syslog (MMM d HH:mm:ss), or custom. - Check timezone: explicit offset (
+0800,Z) vs. implicit local. Default to UTC if missing. - Check precision: seconds vs. milliseconds vs. microseconds; ensure consistent length.
- Handle DST edges: parse into UTC first; avoid naive local parsing.
- Enforce range guards: reject implausible years (e.g.,
<2000or>2100).
Common log formats (templates)
- ISO 8601 / RFC 3339:
2024-12-01T10:15:30Z,2024-12-01T18:15:30+08:00 - Apache/Nginx:
[01/Dec/2024:10:15:30 +0000] - Syslog:
Dec 1 10:15:30 hostname app[123]: message(no year/offset) - Custom numeric:
1701425730(seconds),1701425730000(ms)
Normalization steps
- Detect format and timezone. If missing, assume UTC.
- Parse into an aware datetime (UTC).
- If numeric epoch: scale by length (10s/13ms/16µs/19ns).
- Store in a canonical column (e.g., UTC millis) and optionally keep the raw string.
- Validate range; drop or flag outliers before indexing.
Code snippets
JavaScript / Node.js
JAVASCRIPT1import { DateTime } from "luxon"; 2 3export function parseLogTime(raw) { 4 const trimmed = raw.trim(); 5 // ISO/RFC first 6 const iso = DateTime.fromISO(trimmed, { setZone: true }); 7 if (iso.isValid) return iso.toUTC().toMillis(); 8 9 // Apache/Nginx: [01/Dec/2024:10:15:30 +0000] 10 const m = trimmed.match(/\\[(\\d{2}\\/\\w{3}\\/\\d{4}:\\d{2}:\\d{2}:\\d{2} [+-]\\d{4})\\]/); 11 if (m) { 12 const dt = DateTime.fromFormat(m[1], "dd/MMM/yyyy:HH:mm:ss ZZZZ", { zone: "utc" }); 13 if (dt.isValid) return dt.toMillis(); 14 } 15 16 // Numeric epoch (seconds/ms/us/ns) 17 if (/^\\d+$/.test(trimmed)) { 18 const len = trimmed.length; 19 if (len === 10) return Number(trimmed) * 1000; 20 if (len === 13) return Number(trimmed); 21 if (len === 16) return Number(trimmed) / 1000; 22 if (len === 19) return Number(trimmed) / 1_000_000; 23 } 24 25 throw new Error("Unrecognized log timestamp"); 26}
Python
PYTHON1from datetime import datetime, timezone, timedelta 2from dateutil import parser 3 4def parse_log_time(raw: str) -> int: 5 s = raw.strip() 6 # ISO/RFC 3339, syslog with tz, etc. 7 try: 8 dt = parser.isoparse(s) 9 if dt.tzinfo is None: 10 dt = dt.replace(tzinfo=timezone.utc) 11 return int(dt.astimezone(timezone.utc).timestamp() * 1000) 12 except Exception: 13 pass 14 15 # Apache/Nginx like: [01/Dec/2024:10:15:30 +0000] 16 if s.startswith("[") and s.endswith("]"): 17 inner = s[1:-1] 18 dt = datetime.strptime(inner, "%d/%b/%Y:%H:%M:%S %z") 19 return int(dt.astimezone(timezone.utc).timestamp() * 1000) 20 21 # Numeric epoch 22 if s.isdigit(): 23 n = len(s) 24 if n == 10: 25 return int(s) * 1000 26 if n == 13: 27 return int(s) 28 if n == 16: 29 return int(int(s) / 1000) 30 if n == 19: 31 return int(int(s) / 1_000_000) 32 raise ValueError("Unrecognized log timestamp")
Regex templates
REGEX1# ISO 8601 / RFC 3339 with offset 2^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:Z|[+-]\\d{2}:?\\d{2})$ 3 4# Apache/Nginx 5^\\[\\d{2}/[A-Za-z]{3}/\\d{4}:\\d{2}:\\d{2}:\\d{2} [+-]\\d{4}\\]$ 6 7# Syslog (no year/offset, add externally) 8^[A-Za-z]{3}\\s+\\d{1,2}\\s\\d{2}:\\d{2}:\\d{2}\\s.+$
Ordering, indexing, and storage
- Normalize to UTC milliseconds (or higher) before storage; keep raw text if you need to reparse.
- Use range filters (
WHERE occurred_at_ms >= ... AND occurred_at_ms < ...) to stay index-friendly. - If ingestion mixes formats, add a validated_at column and quarantine failures.
- Partition by date for high-volume logs; avoid casting in filters.
Common pitfalls (and fixes)
- Missing timezone: default to UTC; log parser should add source host timezone if known.
- DST gaps/overlaps: parse as UTC first; only localize when displaying.
- Mixed precision: seconds and milliseconds mixed in one column; enforce length on ingestion.
- Syslog missing year: append the current year during parsing, then validate range.
- Function-wrapped filters: avoid
DATE(occurred_at)in WHERE; use generated date columns.
FAQ
- How do I detect milliseconds vs seconds? Check digit length: 10=seconds, 13=ms, 16=µs, 19=ns.
- What if the timezone is missing? Assume UTC, or inject the known host timezone before parsing.
- Why are dashboards out of order? Likely mixed precision or string sorting; normalize to numeric UTC and index.
- How to batch clean historical logs? Use a staging table, parse + validate, then move only clean rows; see related tools.