MythMiner

MythMiner is a RapidMiner process that uses the recording status of television programs in the MythTV database to suggest new programs to record. (It also works in the RapidAnalytics server.)

Download current version: 1.1 | Forum

Slides of the presentation at Linuxwochen Wien 2011 (in German)

News in version 1.1

News in version 1.0

Usage

1. Install and use MythTV

To use MythMiner, you need a fully working MythTV system including its MySQL database. MythMiner uses information about past recordings in the MythTV database to determine which future programs are interesting, so you must use MythTV for at least a few days and systematically record interesting programs.

As with a spam filter, it is important that you "train" your database. MythMiner works with program categories, titles and descriptions, and assumes that you recorded everything you are interested in and did not record anything you are not interested in. If you are interested in programs about history, you should record many of them so MythMiner can learn that they are interesting. If you don't record enough interesting programs, the words in the descriptions of programs you would have found interesting will be learned as "not interesting", which leads to bad results.
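The effect of incomplete training can be illustrated with a small Python sketch. It is not MythMiner's actual model (that is built inside RapidMiner); it only computes, for each word, the fraction of programs containing it that were recorded. The function name and the sample viewing histories are made up for illustration.

```python
from collections import Counter


def word_scores(programs):
    """Return, per word, the fraction of programs containing it that were recorded."""
    seen = Counter()
    recorded = Counter()
    for description, was_recorded in programs:
        # Count each word once per program, regardless of repetitions.
        for word in set(description.lower().split()):
            seen[word] += 1
            if was_recorded:
                recorded[word] += 1
    return {word: recorded[word] / seen[word] for word in seen}


# Hypothetical viewing histories: (description, recorded?)
well_trained = [
    ("history of rome", True),
    ("history of egypt", True),
    ("celebrity gossip show", False),
]
poorly_trained = [
    ("history of rome", True),
    ("history of egypt", False),  # interesting, but not recorded
    ("history of china", False),  # interesting, but not recorded
    ("celebrity gossip show", False),
]

good = word_scores(well_trained)
bad = word_scores(poorly_trained)
print("history, well trained:  ", good["history"])
print("history, poorly trained:", round(bad["history"], 2))
```

In the second history, "history" appears mostly in programs that were not recorded, so a learner would treat it as a weak signal even though the viewer likes the topic.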

You might need to configure the MySQL database of MythTV for access across the network if you will use RapidMiner on another computer.
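Before setting up RapidMiner on another computer, you may want to check that the MySQL port of the MythTV server is reachable at all. The following Python sketch only tests whether a TCP connection can be opened; it assumes MySQL listens on its default port 3306, and the host name in the usage comment is a placeholder. It does not verify the user name, password or access rights.

```python
import socket


def mysql_port_reachable(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (hypothetical host name, replace with your MythTV server):
#   print(mysql_port_reachable("mythtv.example.lan"))
```

If this returns False from the RapidMiner computer but True on the MythTV server itself, the database is probably accepting local connections only or a firewall is blocking the port.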

2. Install RapidMiner

Download and install RapidMiner as described on the website. If you need more help, you will find lots of information in the RapidMiner forums.
Start RapidMiner and select Help: Update RapidMiner. You will see a list of updates and available extensions. Double click on those you want to install.

MythMiner uses the following extensions: Text Processing; Reporting Extension; Web Mining. Install at least these. You can install more extensions, but don't install the R extension unless you need it. (It doesn't conflict with MythMiner but it's hard to install because of software dependencies.)

After installation, restart RapidMiner and check in Help: Manage Extensions that your extensions are active.

Create a database connection to your MythTV server: Click Tools / Manage Database Connections. Create a new MySQL connection named MythTV (exactly like this; otherwise you will need to change each database query) and enter the necessary data. Use the Test button to check if RapidMiner can connect to the database.

You should watch a few of the Video Tutorials to get an idea about using RapidMiner.

3. Install MythMiner

Create a new empty directory, e.g. /home/you/MythMiner or c:\data\MythMiner. Download MythMiner and unzip its contents into this directory.
Remember this directory; it will be your "MythMiner path".

4. Configuring MythMiner

Start RapidMiner and import the process "MythMiner.rmp" from your MythMiner directory. (You will see it if you change the list at the bottom from "Process file (xml)" to "Process file (rmp)".) The first entry in the top left corner is the "Configure process" step. Click on this step and select "Edit List (11)". A window pops up in which you can set the parameters for MythMiner, called "macros" in RapidMiner terminology. You need to edit some of the values in the right column, but don't change the macro names in the left column.

5. Using MythMiner

First, execute the process in RapidMiner. This can take a few minutes: the SQL queries against the MythTV database are quite complex, and building the model also takes a lot of processor time.

A file called index.html will be created in the MythMiner directory. You can view this file in any web browser. It will contain a table with the following columns:

Adjust the parameters until you are happy with the results.

For experienced RapidMiner users: Optionally (if RapidMiner has operators for your language), you can open the "Process documents from data" and "Create text attributes from future data" steps and, in both of them, change the containers "Perform Stemming" and "Filter stopwords". These operators are language-dependent, but they can result in higher accuracy and/or faster execution. Double-click the container steps and click the operators in the window to the right (the left one is empty). Make the needed changes (e.g. other language, other operator). Then click the blue upwards arrow to go back to the Vector Creation step, and change the select_which parameter to 2.
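To see why stemming and stopword filtering depend on the language, consider the following Python sketch. It is a toy stand-in, not the RapidMiner operators: the stopword lists are tiny, and the crude suffix stripping only imitates what a real stemmer does. All names in it are made up for illustration.

```python
import re

# Tiny illustrative stopword lists; real lists are far longer.
STOPWORDS = {
    "english": {"the", "a", "of", "and", "in"},
    "german": {"der", "die", "das", "und", "in"},
}

# Crude suffix stripping, standing in for a real stemming algorithm.
SUFFIXES = {
    "english": ("ing", "ed", "s"),
    "german": ("en", "er", "e"),
}


def preprocess(text, language):
    """Lowercase, tokenize, drop stopwords and strip one suffix per token."""
    tokens = re.findall(r"\w+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS[language]]
    stems = []
    for token in kept:
        for suffix in SUFFIXES[language]:
            # Only strip if at least three characters remain.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        stems.append(token)
    return stems


text = "The building of the pyramids"
print(preprocess(text, "english"))  # ['build', 'pyramid']
print(preprocess(text, "german"))   # ['the', 'building', 'of', 'the', 'pyramids']
```

With the matching language, the description shrinks to two meaningful word stems. With the wrong language, the stopwords stay in and nothing is stemmed, which gives the model more and noisier attributes to work with.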

Scheduling the process

You can use the process scheduling mechanism of your operating system (Cron on UNIX, Scheduled Tasks on Windows) to execute the MythMiner process daily.

The result is an HTML file which you can open in your browser each day or mail to yourself. An example script for Linux is included in the MythMiner distribution.
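If you prefer to mail the report to yourself, the step after the scheduled run could look like the following Python sketch. It is not the example script from the MythMiner distribution. It assumes that index.html is UTF-8 encoded and that an SMTP server accepts mail on localhost; the function names, addresses and the path in the usage comment are placeholders.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_report_mail(html_path, sender, recipient):
    """Wrap the generated HTML report in an e-mail message."""
    # Assumption: the report is UTF-8; undecodable bytes are replaced.
    html = Path(html_path).read_text(encoding="utf-8", errors="replace")
    msg = EmailMessage()
    msg["Subject"] = "MythMiner suggestions"
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback for mail clients that do not render HTML.
    msg.set_content("This message contains the MythMiner report as HTML.")
    msg.add_alternative(html, subtype="html")
    return msg


def send_report(msg, smtp_host="localhost"):
    """Send the message through an SMTP server (assumed to need no login)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


# Usage (placeholder path and addresses):
#   mail = build_report_mail("/home/you/MythMiner/index.html",
#                            "mythminer@example.org", "you@example.org")
#   send_report(mail)
```

Run it from the same Cron job or scheduled task, after the MythMiner process has finished, so that the mail always contains the current day's report.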

Alternatively, you can use RapidAnalytics for scheduling the process. RapidAnalytics from Rapid-I (who also make RapidMiner) is a server environment for executing RapidMiner processes. (When using the Reporting Extension, you need to copy the contents of the RapidMiner lib/freehep/ directory to the RapidAnalytics server into server/default/lib/.)


© Balázs Bárány. (Homepage)
Last change: 2011-09-06.