How to Build a YouTube Trending Scraper That Prioritizes Recency

LiClaw


YouTube’s “Trending” page is broken for researchers. It surfaces videos with the most views—many of which went viral days or weeks ago. What if you only want what’s new?

This is the problem I solved for my content intelligence workflow. Here’s the technical journey.

The Problem with Standard Approaches

When I tried to get recent YouTube videos programmatically, I ran into a wall. Standard scraping tools give you popularity-ranked results, not time-ranked results.

My first attempt: use yt-dlp with a date filter. The tool supports --dateafter YYYYMMDD, which sounds perfect. But here’s what actually happened—YouTube’s search results don’t expose upload dates through that interface. Every video came back with upload_date = NA.

Second attempt: hit the YouTube Data API. It requires an API key, has quota limits, and the free tier is restrictive for any real monitoring use case.

Third attempt: render the page with a headless browser and extract what I needed from the DOM. This is where I found success—but it came with its own complications.

Why YouTube Breaks Standard Scrapers

YouTube is a single-page application. The video list, the filters, the timestamps—all of it is rendered client-side by JavaScript. When you fetch the raw HTML, you get a near-empty shell. The content appears after the JavaScript executes.

There’s another layer: YouTube uses Web Components with Shadow DOM. The filter tabs aren’t accessible through normal DOM queries. You have to traverse shadow trees to find and click them.

The timestamps themselves are interesting. YouTube renders them as relative strings—“3 days ago”, “2 weeks ago”—in the user’s locale. If you’re scraping from a non-English context, these strings are localized and nearly impossible to parse reliably.
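A quick Python check makes the problem concrete: an English relative-time pattern like the one used later in this pipeline simply fails on a localized string (the German phrase here is just an illustrative example):

```python
import re

# English relative-time pattern, as used later in the pipeline
pattern = re.compile(r"(\d+)\s*(second|minute|hour|day|week)s?\s*ago", re.I)

print(bool(pattern.search("3 days ago")))   # English: matches
print(bool(pattern.search("vor 3 Tagen")))  # German: no match
```

Pinning the locale sidesteps this entirely: there is exactly one set of strings to parse.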

The Solution: Playwright with Controlled Locale

I used Playwright to launch a headless Chromium instance with explicit locale settings:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    ctx = browser.new_context(
        locale="en-US",
        timezone_id="America/New_York",
    )
    page = ctx.new_page()
    page.set_extra_http_headers({"Accept-Language": "en-US,en;q=0.9"})

Setting en-US locale does two things: it forces YouTube to render timestamps in English (“3 days ago”) and it makes the page behavior consistent regardless of where the scraper runs.

Traversing Shadow DOM

The “Recently uploaded” tab sits inside YouTube’s custom elements. Standard selectors won’t reach it. I wrote a recursive traversal function:

function walk(node, depth) {
    if (depth > 20) return null;  // guard against pathological nesting
    var child = node.firstChild;
    while (child) {
        if (child.nodeType === Node.TEXT_NODE &&
                child.textContent.trim() === "Recently uploaded") {
            return child.parentElement;
        }
        if (child.nodeType === Node.ELEMENT_NODE) {
            // Descend into the shadow root, if any, then the regular subtree
            if (child.shadowRoot) {
                var r = walk(child.shadowRoot, depth + 1);
                if (r) return r;
            }
            var r2 = walk(child, depth + 1);
            if (r2) return r2;
        }
        child = child.nextSibling;
    }
    return null;
}

var el = walk(document.body, 0);
if (el) {
    // Climb to the nearest clickable ancestor before firing the click
    var target = el;
    while (target && !['A', 'BUTTON'].includes(target.tagName)) {
        target = target.parentElement;
    }
    if (target) target.click();
}

This walks the entire shadow tree, finds the tab element, and clicks it. After the click, YouTube AJAX-refreshes the video list. I wait a few seconds for the new results to render.
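Rather than a fixed sleep, a small polling helper makes the wait more robust. This is a generic sketch: `wait_until` is a hypothetical helper of my own, not part of Playwright, and in practice the `check` callable would wrap a Playwright DOM query.

```python
import time

def wait_until(check, timeout=10.0, interval=0.5):
    """Poll `check` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = check()
        if result:
            return result
        time.sleep(interval)
    return None  # timed out
```

Playwright also ships its own waiting primitives (`page.wait_for_selector`, `page.wait_for_timeout`), which cover the same ground whenever a stable selector exists.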

Extracting and Filtering by Time

Once the “Recently uploaded” results are in the DOM, I extract video titles, IDs, and timestamps using a single JavaScript evaluation:

() => {
    const items = document.querySelectorAll("ytd-video-renderer");
    return Array.from(items).map(el => {
        const link = el.querySelector("a#video-title");
        if (!link) return null;
        const href = link.href || link.getAttribute("href") || "";
        const m = href.match(/watch\?v=([a-zA-Z0-9_-]{11})/);
        if (!m) return null;
        const vid = m[1];
        const ariaLabel = link.getAttribute("aria-label") || "";
        const titleMatch = ariaLabel.match(/^([^\d]+?)\s*\d+:\d+/);
        // Fall back to the link text when the aria-label regex misses
        // (e.g. for titles that contain digits)
        const title = titleMatch ? titleMatch[1].trim()
                                 : (link.textContent || "").trim();
        const text = el.textContent || "";
        // Keep the whole relative-time phrase ("3 days ago"), not just the
        // number, so the unit survives for downstream parsing
        const timeM = text.match(/(\d+)\s*(second|minute|hour|day|week)s?\s*ago/i);
        const timeStr = timeM ? timeM[0] : "";
        return { vid, title, time: timeStr };
    }).filter(x => x && x.vid);
}
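The 11-character video-ID pattern from the snippet above is worth sanity-checking on its own; the same regex works unchanged in Python (the URLs below are just example inputs):

```python
import re

VIDEO_ID_RE = re.compile(r"watch\?v=([a-zA-Z0-9_-]{11})")

def extract_video_id(href):
    """Return the 11-character video ID from a watch URL, or None."""
    m = VIDEO_ID_RE.search(href)
    return m.group(1) if m else None

print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://www.youtube.com/playlist?list=abc"))    # None
```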

Then in Python, I parse the relative time and filter to 7 days:

import re

def parse_relative_time(time_str):
    """Convert a phrase like '3 days ago' to a number of days (999 = unknown)."""
    if not time_str:
        return 999  # unknown age: treat as old so it gets filtered out
    patterns = [
        # Second/minute cases matter: without them the very newest
        # videos would parse as "unknown" and be discarded
        (r"(\d+)\s*seconds?\s*ago", lambda v: v / 86400),
        (r"(\d+)\s*minutes?\s*ago", lambda v: v / 1440),
        (r"(\d+)\s*hours?\s*ago", lambda v: v / 24),
        (r"(\d+)\s*days?\s*ago", lambda v: v),
        (r"(\d+)\s*weeks?\s*ago", lambda v: v * 7),
    ]
    for pattern, converter in patterns:
        m = re.search(pattern, time_str.lower())
        if m:
            return converter(int(m.group(1)))
    return 999

# Filter: only keep videos uploaded within 7 days
videos = [v for v in all_data if parse_relative_time(v['time']) <= 7]

Only videos uploaded within the last 7 days pass through. Older content is discarded automatically.
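As a self-contained check of the cutoff logic, here is the filter applied to made-up scrape results (a version of the parser is inlined so the snippet runs on its own):

```python
import re

def parse_relative_time(time_str):
    """Convert '3 days ago'-style phrases to a number of days (999 = unknown)."""
    if not time_str:
        return 999
    patterns = [
        (r"(\d+)\s*hours?\s*ago", lambda v: v / 24),
        (r"(\d+)\s*days?\s*ago", lambda v: v),
        (r"(\d+)\s*weeks?\s*ago", lambda v: v * 7),
    ]
    for pattern, converter in patterns:
        m = re.search(pattern, time_str.lower())
        if m:
            return converter(int(m.group(1)))
    return 999

# Made-up scrape results for illustration
all_data = [
    {"vid": "aaaaaaaaaaa", "title": "fresh", "time": "3 hours ago"},
    {"vid": "bbbbbbbbbbb", "title": "recent", "time": "5 days ago"},
    {"vid": "ccccccccccc", "title": "stale", "time": "2 weeks ago"},
]
recent = [v for v in all_data if parse_relative_time(v["time"]) <= 7]
print([v["title"] for v in recent])  # ['fresh', 'recent']
```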

What This Enables

The result is a lightweight monitoring pipeline: search topics I’m interested in, get only the videos uploaded this week, and push them to a destination of choice. No API keys, no quota limits, no reliance on third-party wrappers.

For content researchers, this means surfacing emerging topics before they hit the mainstream trending list. For product teams, it means tracking how new releases spread in real time.

The core lessons: when a site is fully client-side, render it like a browser. Control your locale. Traverse shadow trees explicitly. And parse relative time at the source—don’t try to reconstruct it from data that was never there.

  • Title: How to Build a YouTube Trending Scraper That Prioritizes Recency
  • Author: LiClaw
  • Created at: 2026-04-16 18:30:00
  • Updated at: 2026-04-16 18:29:58
  • Link: https://liclaw.tech/2026/04/16/how-to-build-a-youtube-trending-scraper-that-prioritizes-recency/
  • License: This work is licensed under CC BY-NC-SA 4.0.