General info

In this part of the practical course Sequence Analysis you will be confronted with the task of integrating several NGS analysis programs (which you implemented in the previous block) into a typical workflow engine, namely KNIME.

| Date  | Content                                             | Lecturer                    |
| 19.06 | Introduction to KNIME and how to use SeqAn in KNIME | Knut Reinert, Stephan Aiche |
| 26.06 | Pulling your data into KNIME                        | Stephan Aiche               |
| 03.07 | Advanced KNIME workflows                            | Stephan Aiche               |

Day 1 (19.06)

  • Introduction to workflow systems and KNIME.
  • Assignment 1: Install KNIME and work through the Iris data set tutorial using KNIME_quickstart.pdf. The data set can be found in the KNIME distribution or here, and an explanation of the data set here.
  • Assignment 2: Build the SeqAn KNIME plugin including your apps.
    • Download the KNIME SDK for your platform and install/extract it.
    • Open the KNIME SDK and install the File Handling Extensions
      • Click Help -> Install new Software
      • Select the "KNIME Update Site"
      • In KNIME Labs Extensions you will find the "KNIME File Handling Nodes"
    • Download the GenericKNIMENodes (GKN) source code (see below)
    • Create the file in the GKN directory with the following content
      Note: On Windows you need to use slashes instead of backslashes
    • Import the GKN source code into the KNIME SDK (File -> Import -> Existing project into workspace)
    • Enable your apps for KNIME integration by adding the following line to the CMakeLists.txt of your apps
    • Prepare your SeqAn installation to build a KNIME plugin, e.g.,
      make prepare_workflow_plugin
    • Execute the GKN node generator in the GenericKNIMENodes directory
      ant -Dplugin.dir=/path/to/your/seqan/build/workflow_plugin_dir
    • Import the generated nodes into the KNIME SDK (File -> Import -> Existing project into workspace, directory is <GKN-directory>/generated_plugin)
    • Start KNIME from within the KNIME SDK (see here)
    • Your nodes should be available under Community Nodes/SeqAn
    • Nodes for input and output files can be found under Community Nodes/GenericKnimeNodes
    • In case your nodes are not there:
      • Check if you used the argument parser for your app
      • Check if you set valid values (i.e., filetypes) for every input and output file of your app
      • Check the output of the node generator call (ant ...) for error messages (e.g., Exception ...)
      • After fixing all these errors, rerun the prepare_workflow_plugin target and the node generator (ant).
      • In the KNIME SDK refresh all SeqAn projects (right click -> Refresh) and force the KNIME SDK to rebuild all the plugins (Project -> Clean -> Clean all projects)
  • Assignment 3: Create a simple pipeline combining your tools (trimming, de-multiplexing, read mapping)
  • Assignment 4: Load the mapping results of Razers3 and try to visualise the coverage of mapped reads w.r.t. the genome location.
    • Use Razers3 to map the bee_reads against the Varroa destructor genome (data see below).
    • Import the mapping results into KNIME (hint: the Razers3 output format is described here).
    • Visualise the mapped reads (hint: have a look at the JFreeChart nodes (no hiliting) and the KNIME histograms).
    • (optional) Compare the coverage of different read qualities.
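The coverage computation behind Assignment 4 can be sketched in plain Java. This is a hypothetical illustration using the classic difference-array trick; the interval representation (0-based begin, exclusive end) is an assumption and does not reflect the actual Razers3 output columns.

```java
import java.util.Arrays;

public class Coverage {
    /** Returns an array where entry i is the number of reads covering genome position i. */
    static int[] coverage(int genomeLength, int[][] intervals) {
        int[] diff = new int[genomeLength + 1];
        for (int[] iv : intervals) { // difference array: +1 at begin, -1 at end
            diff[iv[0]] += 1;
            diff[iv[1]] -= 1;
        }
        int[] cov = new int[genomeLength];
        int running = 0;
        for (int i = 0; i < genomeLength; i++) {
            running += diff[i];      // prefix sum turns the deltas into coverage
            cov[i] = running;
        }
        return cov;
    }

    public static void main(String[] args) {
        // three reads covering [0,4), [2,6), [2,4)
        int[][] reads = { {0, 4}, {2, 6}, {2, 4} };
        System.out.println(Arrays.toString(coverage(8, reads)));
        // -> [1, 1, 3, 3, 1, 1, 0, 0]
    }
}
```

In KNIME the same result can be obtained without code (e.g., GroupBy on the position column), but the sketch shows what the histogram over genome positions actually counts.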

Day 2 (26.06)

  • Assignment 1: Work through KNIME node generation tutorial (KNIME developer guide).
  • Assignment 2: Implement your own table reader for the output of your read mapper.
    • Use the New Node Wizard to create a new node
    • Open the plugin.xml file of the project containing the node
    • Go to the Dependencies tab
    • In the section Required Plug-ins click on Add and add the plugin.
    • Change the signature of the NodeModel::configure method from
      protected DataTableSpec[] configure(final DataTableSpec[] inSpecs) throws InvalidSettingsException
      // ..
      protected DataTableSpec[] configure(final PortObjectSpec[] inSpecs) throws InvalidSettingsException
      // ..
    • Change the signature of the NodeModel::execute method from
      protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
        // ..
      protected BufferedDataTable[] execute(final PortObject[] inData, final ExecutionContext exec) throws Exception
        // ..
    • Change the NodeModel constructor to
      protected YourNodeModel() {
        super(new PortType[] { URIPortObject.TYPE }, 
              new PortType[] { new PortType(BufferedDataTable.class) });
      }
    • Adapt the configure() method to the layout of your output file. Your configure method needs to return a DataTableSpec that corresponds to the structure of your file, e.g., a file with an integer and a string column could be realised with the following DataTableSpec
      DataColumnSpec[] columnsSpec = new DataColumnSpec[2];
      columnsSpec[0] = new DataColumnSpecCreator("int-column", IntCell.TYPE).createSpec();
      columnsSpec[1] = new DataColumnSpecCreator("string-column", StringCell.TYPE).createSpec();
      DataTableSpec outputSpec = new DataTableSpec(columnsSpec);      
    • Read the content of the incoming file in the execute method and fill the DataTable accordingly. Hint: You can extract the file from the inData using the following code snippet
      File theFileToRead = new File(((URIPortObject) inData[0]).getURIContents().get(0).getURI());
      With the following code snippet you can create a table
      BufferedDataContainer container = exec.createDataContainer(outputSpec);
      // for each line in your file
      RowKey key = new RowKey("Row " + rowIdx);
      DataCell[] cells = new DataCell[2];
      cells[0] = new IntCell(yourIntValue);
      cells[1] = new StringCell(yourStringValue);
      DataRow row = new DefaultRow(key, cells);
      container.addRowToTable(row);
      // .. end for each
      // at the end of execute
      container.close();
      BufferedDataTable out = container.getTable();
      return new BufferedDataTable[] { out };
      outputSpec is the same data table spec you created above.
  • Assignment 3: Since your read mapper produces more than one entry per read, use KNIME to filter the read table.
  • Assignment 4: Create a workflow using your tools, the reader node, and the filter to read the results of a mapping run into a KNIME table.
  • Assignment 5: Visualise the number of mapped reads per location.
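The parsing and filtering logic of Assignments 2 and 3 can be sketched independently of the KNIME API. The column layout below (read id, begin, end, error count, tab-separated) is a hypothetical example; adapt it to your mapper's actual output format.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MappingReader {
    /** One mapping hit; fields mirror the assumed tab-separated columns. */
    record Hit(String readId, int begin, int end, int errors) {}

    /** Parse one tab-separated output line into a Hit. */
    static Hit parseLine(String line) {
        String[] f = line.split("\t");
        return new Hit(f[0], Integer.parseInt(f[1]),
                       Integer.parseInt(f[2]), Integer.parseInt(f[3]));
    }

    /** Keep only the hit with the fewest errors per read (Assignment 3's filter). */
    static Map<String, Hit> bestHitPerRead(List<Hit> hits) {
        Map<String, Hit> best = new HashMap<>();
        for (Hit h : hits)
            best.merge(h.readId(), h,
                       (a, b) -> a.errors() <= b.errors() ? a : b);
        return best;
    }
}
```

Inside the node, parseLine would run per line of the incoming file and each Hit would become one DataRow; the filter itself could equally be done with a standard KNIME GroupBy node.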

Day 3 (03.07)

  • Exam assignment lottery.
  • Assignment 1: Benchmark your read mapper against SeqAn's Razers3. Razers3, if configured correctly, is a fully sensitive read mapper, so your tool should report all hits that Razers3 reports. If not, check your tool. Use the Drosophila data set Reads&Genome (drosophila_*).
    • Integrate Razers3 parallel to your own mapper into your read mapping pipeline.
    • Create a reader node for Razers3's output.
    • Match the results of Razers3 against your own filtered results.
    • Customise your read mapper to also map against the reverse strand and compare the results to those of Razers3. Note: Your read mapper should always report the position on the forward strand.
    • (optional) Customise your read mapper to also report the number of errors and compare them to those reported by Razers3.
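The sensitivity check in Assignment 1 boils down to a set difference. The sketch below is a hypothetical illustration: hits are encoded as plain strings (e.g., "readId@position", already normalised to the forward strand), which is an assumption, not a prescribed format.

```java
import java.util.Set;
import java.util.TreeSet;

public class SensitivityCheck {
    /**
     * Returns the Razers3 hits your mapper missed.
     * An empty result means your tool matched the fully sensitive reference.
     */
    static Set<String> missedHits(Set<String> razersHits, Set<String> yourHits) {
        Set<String> missed = new TreeSet<>(razersHits); // copy, sorted for readable output
        missed.removeAll(yourHits);                     // what remains was not reported by you
        return missed;
    }
}
```

In a KNIME workflow the same comparison can be expressed with a Joiner node (left outer join on read id and position) followed by a row filter on unmatched rows.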

Day 4 (10.07)

  • Exams

Topic revision: r18 - 03 Jul 2013, StephanAiche