Tutorials

The following online self-study tutorials were prepared by the SCF and partners. A zip file with all the (non-screencast) materials for each tutorial can be found by following the (materials on Github) link and using the "Download ZIP" button in the lower right of the GitHub page.

Basics of UNIX

Provides a basic introduction to the UNIX command line (including Linux and the Mac terminal).
Last updated August 2021. Prepared by Chris Paciorek.

Introduction to LaTeX

A quick introduction to LaTeX, a powerful and flexible system for formatting documents, especially those using mathematical notation. Focuses on demonstration using a concrete example.
Last updated August 2015. Prepared by Chris Paciorek.

Dynamic documents with code chunks

A quick introduction to embedding R, bash, and Python code in PDF and HTML documents using R Markdown, LaTeX-based formats (knitr and Sweave), and Jupyter notebooks.
Last updated November 2019. Prepared by Chris Paciorek.

Introduction to git and Github

The basics of git, a version control system, and hosting git repositories on GitHub.
Last updated August 2017. Prepared by Jarrod Millman.

Using the bash shell

UNIX utilities, shortcuts, shell scripting, job control, and regular expressions.
Last updated September 2019. Prepared by Jarrod Millman and Chris Paciorek.

String processing

String processing, including regular expressions, in R and Python.
Last updated September 2019. Prepared by Chris Paciorek.

Flexible parallel processing using Dask in Python and future in R

Parallel processing on one or more machines, including using distributed datasets in Dask.
Last updated April 2020. Prepared by Chris Paciorek.

Working with large datasets in SQL, R, and Python

Using databases from R and Python, plus material on packages in R for working with large datasets.
Last updated January 2021. Prepared by Chris Paciorek.

Using make for workflows

How to use make to automate workflows and make them reproducible.
Last updated August 2015. Prepared by Chris Paciorek.

Writing efficient R code

How to measure the speed of your R code and write code that runs quickly.
Last updated October 2021. Prepared by Chris Paciorek.

Debugging in R

How to use R's debugging tools, handle errors, and avoid bugs.
Last updated August 2021. Prepared by Chris Paciorek.

Parallel processing basics in R, Python, Matlab, and C

How to use threaded linear algebra and basic parallel looping on a single computer with multiple cores.
Last updated November 2017. Prepared by Chris Paciorek.

Distributed parallel processing in R, Python, Matlab, and C

How to use parallelization tools for distributed computing (multiple computers or cluster nodes) in R, Python, Matlab, and C.
Last updated September 2017. Prepared by Chris Paciorek.

Last updated March 2021.