Hi, I'm Mark

Welcome! My name is Mark Roepke and I'm a data scientist interested in data science education.

In my day job, I work in the education space as a Curriculum Engineer at Databricks, where I design and develop Spark-focused training curriculum and certification exams for developers and data scientists.

I am especially interested in data science education and run workshops on a range of data science technologies. I have given workshops in corporate and university settings on R, Python, and GitHub, and I am certified by RStudio to design and deliver tidyverse training workshops. If you're interested in connecting about a data science training workshop, feel free to contact me using one of the methods at the bottom of the site.


Blog

  • Getting Started with Spark

    This post will walk through how to get started with Apache Spark. It will cover a high-level overview of Spark, how to install Spark, and how to launch Spark in a few different ways. Spark Overview: What is Spark? Apache Spark, originally started in 2009 at UC Berkeley, is a cluster computing platform designed to be fast and general through distributed computing. It unifies common data tasks like batch applications, iterative algorithms, interactive queries, and...

  • Tips for Creating Slideshows in Jupyter

    In a previous post I wrote about how to create interactive slideshows in Jupyter Notebooks. It covered the benefits, basic technology, basic slide creation, and how to view and operate the slideshow. This post dives a little deeper by providing key tips on creating better-looking slides, customizing formatting, exporting to HTML, hosting slides live online, etc. Tip 1: Split slides into two columns. A built-in option in slideshow platforms like PowerPoint is to split content...

  • Article: Data Science Training @ 84.51°

    I recently co-wrote an article on data science training at 84.51° with Brad Boehmke. Brad’s role is specifically designed to focus on the areas of training and internal tool development, and a lot of my side-of-desk time goes toward designing, developing and delivering trainings within the data science function. The article discusses our research-backed strategies on the development of data scientists through focused and objective-based trainings. We discuss how we organize our training workshops and...

  • Creating Interactive Slideshows in Jupyter Notebooks

    Have you ever wanted to create a slideshow using Python? I’ve found a lot of useful tools for making slideshows in Jupyter Notebooks while developing Python for data science workshops for the University of Cincinnati and 84.51°, but I have yet to see all of this information in one place. This blog post changes that by teaching you directly how to create interactive slideshows in Jupyter Notebooks. This post covers the benefits of using Python and...