YouTube Research

More research for fiction writers lives on YouTube than most people acknowledge. Worldbuilding lectures, historical documentaries, craft interviews, writer-to-writer discussions, specific expertise videos from people who know their thing — all of it is genuinely useful research material, and all of it is locked inside video format. The default workflow is painful: you watch the video, you pause to type notes, you rewind because you missed something, you lose the context of what the speaker was actually saying, and at the end you have fragmented notes that don’t survive the week. The YouTube Research tab replaces that workflow with a structured one. Paste a URL, fetch the video’s transcript, run analysis on the transcript segments, and store the whole thing as a searchable research bookmark alongside the rest of your knowledge base. You don’t skip the video — you just stop having to take notes by hand.

The tab is the surprising-but-obvious feature that makes Research feel genuinely complete. A large chunk of modern research lives on YouTube now, and pretending it doesn’t forces authors to leave Ishvana every time they encounter a relevant video.

The tab is simple and works in three phases: fetch the video's metadata, fetch its transcript, and analyze the transcript.

Paste a YouTube URL into the input and click Fetch. Ishvana contacts YouTube’s API to pull the video’s metadata:

  • Title.
  • Channel name.
  • Duration.
  • Thumbnail image.
  • Description (the video’s own description text).
  • Publish date.

The preview card shows all of this, so you can verify you’ve got the right video before fetching the transcript.
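
Before any metadata can be requested, the pasted URL has to be reduced to a video ID. The following is a minimal sketch of that step using only the Python standard library. The function name `extract_video_id` and the set of URL shapes it accepts are assumptions for illustration, not Ishvana's actual parsing code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Normalize the www. and mobile (m.) prefixes away.
    if host.startswith("www."):
        host = host[4:]
    elif host.startswith("m."):
        host = host[2:]

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None

    if host == "youtube.com":
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            values = parse_qs(parsed.query).get("v", [])
            return values[0] if values else None
        parts = parsed.path.strip("/").split("/")
        # Embed, Shorts and live links: /embed/<id>, /shorts/<id>, /live/<id>
        if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
            return parts[1] or None

    return None
```

Returning `None` for anything unrecognized lets the caller show a clear "not a YouTube URL" message instead of sending a bad request.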

If the video has a transcript available (most do — YouTube auto-generates them for almost every video), Ishvana fetches it directly. No audio processing, no speech recognition, just the existing transcript text.

The transcript is segmented — broken into timestamped chunks corresponding to YouTube’s own caption structure. Each segment has a start time, an end time, and the text spoken during that range. The segmentation is important because it lets you jump to specific moments in the video later, not just read the whole transcript as a wall of text.
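
A segment as described above can be modeled as a small record. This is a sketch under assumptions: the class name `TranscriptSegment`, its field names, and the `format_timestamp` helper are hypothetical and only mirror the three properties named in the text (start time, end time, text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptSegment:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str     # words spoken during [start, end)


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for offsets of an hour or more."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


segment = TranscriptSegment(start=1397.0, end=1402.5, text="The sappers dug at night.")
label = format_timestamp(segment.start)  # "23:17"
```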

For videos without transcripts (rare, usually Shorts or very new uploads), the tab shows an error. Audio-based transcription is planned as a fallback but is not available in this version, so a video with no transcript cannot be processed.

Once the transcript is fetched, Ishvana runs analysis on the transcript segments. The analysis produces:

  • Summary. 3-5 sentences describing what the video is about.
  • Key points. Bulleted list of the most important takeaways.
  • Entities. People, places, concepts, works mentioned.
  • Topics. Higher-level themes the video discusses.
  • Key quotes. Specific lines from the transcript that are worth preserving verbatim.
  • Suggested tags.

The analysis runs in seconds and lands in the tab as a structured result card below the transcript.

Below the analysis, the full segmented transcript is displayed. Each segment is its own row with:

  • Timestamp. The start time of the segment, as a clickable link.
  • Segment text. The actual words spoken.
  • Highlight. An indicator if this segment is referenced in the analysis (a key quote, an entity mention, etc.).
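
One way to picture the analysis result and the highlight indicator is a record holding the six analysis fields, plus a pass over the transcript that marks rows mentioning a key quote or entity. This is a sketch, not Ishvana's implementation: `VideoAnalysis`, its field names, and `highlighted_rows` are hypothetical, and the plain substring match used here would miss a quote that spans two segments.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class VideoAnalysis:
    summary: str
    key_points: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    key_quotes: List[str] = field(default_factory=list)
    suggested_tags: List[str] = field(default_factory=list)


def highlighted_rows(segment_texts: List[str], analysis: VideoAnalysis) -> Set[int]:
    """Return indexes of segments that contain a key quote or an entity name."""
    needles = [n.lower() for n in analysis.key_quotes + analysis.entities if n.strip()]
    hits: Set[int] = set()
    for index, text in enumerate(segment_texts):
        lowered = text.lower()
        if any(needle in lowered for needle in needles):
            hits.add(index)
    return hits
```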

Click any timestamp and Ishvana opens the video at that timestamp in an embedded YouTube player — so you can jump to the exact moment the speaker said the thing you care about. This is crucial for videos where specific details matter: you read the transcript, you find the interesting line, you click the timestamp, you watch the speaker say it with tone and context.
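
Jumping to a moment comes down to building a player URL with a start offset. The sketch below uses the `start` parameter of YouTube's embedded player, which takes whole seconds. The helper name `embed_url_at` is an assumption, and how Ishvana actually drives its player is not documented here.

```python
from urllib.parse import urlencode


def embed_url_at(video_id: str, start_seconds: float) -> str:
    """Build an embed URL that begins playback at the given offset."""
    # The embedded player's `start` parameter expects whole seconds.
    query = urlencode({"start": max(0, int(start_seconds))})
    return f"https://www.youtube.com/embed/{video_id}?{query}"
```

For a segment starting at 23:17 (1397 seconds), this yields a URL ending in `?start=1397`.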

You can also filter the transcript by text — type a keyword into the filter field and only segments containing that text stay visible. Useful for long videos where you want to find a specific topic fast.
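
The filter behaves like a substring match over segment text. A minimal sketch, assuming the match is case-insensitive and that an empty filter shows everything (neither is stated above); the `Segment` alias and `filter_segments` name are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, spoken text)


def filter_segments(segments: List[Segment], keyword: str) -> List[Segment]:
    """Keep only segments whose text contains the keyword, ignoring case."""
    needle = keyword.strip().casefold()
    if not needle:
        # An empty filter leaves the full transcript visible.
        return list(segments)
    return [seg for seg in segments if needle in seg[1].casefold()]
```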

Every YouTube research task becomes a job with a status lifecycle:

  • Queued. Job created, waiting to run.
  • Downloading. Fetching metadata and transcript.
  • Transcribing. Reserved for the planned audio-based fallback; not used when the video has a standard YouTube transcript.
  • Analyzing. Running analysis on the transcript.
  • Complete. All phases done, results visible.
  • Error. Something failed; the error message tells you what.

The tab has a job list sidebar showing every YouTube job you’ve started, sorted by date. You can have several jobs in flight at once: start a second job while the first is still analyzing, and they run in parallel (up to a configurable concurrency limit, to avoid hammering the network or the LLM).
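
A concurrency limit of this kind is commonly built with a semaphore. The sketch below shows the general technique with `asyncio`; it is not Ishvana's scheduler, and `run_jobs`, `process`, and `max_concurrent` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_jobs(
    urls: List[str],
    process: Callable[[str], Awaitable[str]],
    max_concurrent: int = 2,
) -> List[str]:
    """Process every URL, with at most `max_concurrent` running at once."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(url: str) -> str:
        # Each job waits here until a slot is free.
        async with gate:
            return await process(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

All jobs are created up front, but only `max_concurrent` of them hold a slot at any moment; the rest wait at the semaphore.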

Failed jobs can be retried, and a retry resumes from the phase that failed where it can. If the transcript fetch succeeded but the analysis failed, the retry skips straight to analysis using the already-fetched transcript. If the transcript fetch itself failed, the retry starts from the top.
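
The lifecycle and the retry rule can be sketched as an enum plus one decision function. The names `JobStatus` and `retry_start_phase` are assumptions, as is treating Downloading as "the top" for a retry.

```python
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    QUEUED = "queued"
    DOWNLOADING = "downloading"
    TRANSCRIBING = "transcribing"
    ANALYZING = "analyzing"
    COMPLETE = "complete"
    ERROR = "error"


def retry_start_phase(transcript: Optional[str]) -> JobStatus:
    """Pick the phase a retried job resumes from."""
    if transcript:
        # Transcript already fetched: skip straight to analysis.
        return JobStatus.ANALYZING
    # Nothing usable was fetched: start again from the first working phase.
    return JobStatus.DOWNLOADING
```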

A completed YouTube job can be saved to your project in several ways:

  • Save as smart bookmark. Creates a smart bookmark with the video’s title, URL, summary, key points, full transcript, and thumbnail. The bookmark is indexed by ChromaDB so it’s searchable via semantic search alongside your other research.
  • Save as research note. Lightweight save — stores the analysis results without creating a full bookmark. For videos you want to reference later but don’t need in full search.
  • Send to Lore. Creates a Legendry entry in the Reference category. For videos that contain information that should be part of canonical project data.
  • Copy as Markdown. Copies a formatted Markdown version of the analysis (title, summary, key points, key quotes) to your clipboard.
  • Copy full transcript. Copies the raw segmented transcript with timestamps.

Most users save interesting videos as smart bookmarks because that’s the format that best survives time — indexed, searchable, persistent.
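
As an illustration of the Copy as Markdown option, the sketch below assembles the four parts named above (title, summary, key points, key quotes) into one Markdown string. The exact layout Ishvana produces is not specified, so the heading levels and the `analysis_to_markdown` name are assumptions.

```python
from typing import List


def analysis_to_markdown(
    title: str,
    summary: str,
    key_points: List[str],
    key_quotes: List[str],
) -> str:
    """Format an analysis as a Markdown document."""
    lines = [f"# {title}", "", summary, ""]
    if key_points:
        lines += ["## Key points", ""]
        lines += [f"- {point}" for point in key_points]
        lines.append("")
    if key_quotes:
        lines += ["## Key quotes", ""]
        for quote in key_quotes:
            # A blank line after each quote keeps them as separate blockquotes.
            lines += [f"> {quote}", ""]
    return "\n".join(lines).rstrip() + "\n"
```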

Semantic search over transcripts is what makes YouTube research genuinely valuable. A single 45-minute video contains thousands of words. Normally that content is lost once you finish watching, and you would have to re-watch to find a specific claim. With the transcript stored in ChromaDB, a semantic search query can find specific segments of specific videos:

You search “medieval siege tactics” and the results include:

  • A bookmark of a Wikipedia article about sieges.
  • A lore entry on your world’s own siege doctrine.
  • A segment of a YouTube video at the 23:17 mark where a military historian discusses siege tactics in detail.

The third result is the valuable one. Without transcription and embedding, it would be invisible. With it, the video segment is just another research result you can click through to verify.

This is why transcribing and indexing YouTube content is worth the effort. Every video you transcribe makes your whole research library smarter, because the video’s content becomes searchable alongside your text notes.
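
For a search hit to point back at a moment such as 23:17, each indexed segment needs to carry its video ID and start time as metadata. The sketch below prepares records in that shape without calling any database. The returned keys mirror the `ids`, `documents`, and `metadatas` arguments of ChromaDB's `Collection.add`, but the function name, the ID scheme, and the metadata fields are assumptions; how Ishvana actually chunks and embeds transcripts is not described here.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def build_index_records(video_id: str, title: str, segments: List[Segment]) -> Dict[str, list]:
    """Turn transcript segments into parallel lists ready for a vector store."""
    ids, documents, metadatas = [], [], []
    for index, (start, end, text) in enumerate(segments):
        if not text.strip():
            continue  # skip empty caption rows
        ids.append(f"{video_id}:{index}")  # hypothetical ID scheme
        documents.append(text)
        metadatas.append(
            {
                "source": "youtube",
                "video_id": video_id,
                "title": title,
                "start": start,  # lets a search hit link back to the timestamp
                "end": end,
            }
        )
    return {"ids": ids, "documents": documents, "metadatas": metadatas}
```

Storing the start time next to the text is what turns a matching passage into a clickable moment in the video.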

What the YouTube Research tab is not:

  • Not a video downloader. Ishvana doesn’t download YouTube videos to your local disk. Only transcripts and metadata are fetched. The video itself stays on YouTube, and the embedded player in the tab is a normal YouTube iframe.
  • Not a speech recognition service. The tab relies on YouTube’s existing transcripts (auto-generated or manual). It doesn’t run its own speech recognition on the audio. If a video has no transcript, the tab can’t process it.
  • Not a video editor. You can’t trim, clip, or export video content. The tab is research-oriented, not production-oriented.
  • Not a copyright workaround. The transcripts are publicly available metadata. Saving them to your research library is fine for personal research use. Redistributing them (pasting the full transcript of someone’s video into a publicly-shared document) would be a copyright concern. Use judgment.
  • Not a bulk processor. You add videos to the queue one URL at a time, even though queued jobs can run in parallel. For processing dozens of videos automatically, you’d need a scripted workflow outside the tab.