A decade or two ago, I authored my own DVD for fun and embedded subtitles in it. I now want to “rip” those subtitles into the SRT format rather than recreate them from scratch.

Note: This process is not guaranteed to work well.

Using mencoder, the subtitles can be ripped into the two-file “VobSub” format (.sub and .idx).

mkdir /tmp/subtitles
cd /tmp/subtitles

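title=1        # DVD title to read
subtitle_id=0  # subtitle stream ID to extract

# Copy the video stream to /dev/null and skip the audio, so the only
# output kept is the VobSub pair written by -vobsubout.
mencoder \
  -dvd-device /home/my-user/my-dvd.iso \
  "dvd://${title}" \
  -nosound \
  -ovc copy -o /dev/null \
  -sid "${subtitle_id}" \
  -vobsubout my-subtitles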

Two files should now exist in the current directory: my-subtitles.sub and my-subtitles.idx.
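
A quick way to confirm that both files were written is a small shell check like the one below. This is only a sketch: the check_vobsub function name is made up for illustration, and "non-empty" is just a rough signal that the rip produced something, not proof that the subtitles are usable.

```shell
# check_vobsub BASE (hypothetical helper): succeed only if BASE.sub and
# BASE.idx both exist and are non-empty.
check_vobsub() {
  base=$1
  for ext in sub idx; do
    if [ ! -s "${base}.${ext}" ]; then
      echo "missing or empty: ${base}.${ext}" >&2
      return 1
    fi
  done
  echo "found ${base}.sub and ${base}.idx"
}

check_vobsub my-subtitles || echo "rip may have failed; re-check title and subtitle_id"
```

If the check fails, the title or subtitle_id values used above are the first things to revisit.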

Next, vobsub2srt and tesseract can be used to convert those files into an English-language .srt subtitle file.

vobsub2srt uses tesseract, an OCR (optical character recognition) engine, to “read” the bitmap subtitle images from the DVD and convert them to plain text. As noted above, OCR is error-prone, and the result will almost certainly be less accurate than desired.

This assumes the tesseract-data-eng package (or a similar package that provides the desired language) is installed.
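
One way to check that assumption before running the conversion is to ask tesseract which languages it can see. The sketch below relies on tesseract's --list-langs option; the lang_available helper is a made-up name, and stderr is merged into stdout because some tesseract versions print the list there.

```shell
# lang_available LANG (hypothetical helper): succeed if LANG appears on a
# line by itself in the language list read from stdin.
lang_available() {
  grep -qx -- "$1"
}

if command -v tesseract >/dev/null 2>&1; then
  if tesseract --list-langs 2>&1 | lang_available eng; then
    echo "tesseract language data for eng is available"
  else
    echo "eng is missing; install the matching tesseract data package"
  fi
else
  echo "tesseract is not installed"
fi
```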

vobsub2srt --tesseract-lang eng my-subtitles

Open the resulting my-subtitles.srt file to check how good the converted subtitles look.
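
Before reading the whole file, a rough sanity check can give a first impression. The sketch below counts the timestamp lines (one per subtitle) and lists lines containing a "|" character. Both function names are made up for illustration, and treating "|" as a likely misread of "I" or "l" is an assumption, not something vobsub2srt reports.

```shell
# srt_cue_count FILE (hypothetical helper): print the number of timestamp
# lines, which is one per subtitle cue in an SRT file.
srt_cue_count() {
  grep -c -E '^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> ' "$1"
}

# srt_suspect_lines FILE (hypothetical helper): print numbered lines that
# contain "|", assumed here to be an OCR misread rather than intended text.
srt_suspect_lines() {
  grep -n -F '|' "$1"
}

if [ -f my-subtitles.srt ]; then
  echo "cues: $(srt_cue_count my-subtitles.srt)"
  srt_suspect_lines my-subtitles.srt || echo "no suspect lines found"
fi
```

A cue count far below what the DVD actually contains suggests that the rip or the OCR step dropped subtitles.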