Scientific Computing Training Series

Cornell University’s Center for Advanced Computing together with Weill Cornell Medicine's Scientific Computing Unit, ITS, and Clinical and Translational Science Center are pleased to offer a new Scientific Computing Training Series. 

Training sessions are held via Zoom. They are open to all workforce members and students of Cornell University, WCM, WCM-Q, and Cornell Tech. They are free of charge and there is no need to apply. Click the title of each course below to find course information and Zoom registration links.

Download a Scientific Computing Training Series flyer:

Fall 2022

Nov. 1: Data Transfer Tools

Instructor: Ben Trumbore

Description: An introduction to the numerous tools that can be used to transfer data between computer systems and to cloud storage. Different tools offer different transfer speeds, security, and ease of use, and some automatically recover from failures and allow syncing between systems. Tools covered in this session include FTP, SCP, SFTP, rsync, rclone and Globus.

Prereqs: Familiarity with basic Linux commands.

Time: 9am-10am EST

Slides + Recording:

 

Nov. 9: Introduction to Modern R Data Analysis

Instructor: Christopher Cameron

Description: Overview of RStudio interfaces (scripts, console, and R Markdown notebooks) and popular modern data manipulation and visualization packages in R (especially ggplot2 and dplyr). This is primarily useful for people seeking an entry point into the R ecosystem for research and data analysis tasks.

Prereqs: Some experience with statistical concepts and tabular data formats or spreadsheet software.

Time: 9am-10am EST

Nov. 15: Introduction to Jupyter Lab for Python

Instructor: Christopher Cameron

Description: Jupyter Notebooks are a popular format for scientific communication that intermingles descriptive text, code, statistical analysis, results, and visualizations in a single document. This workshop showcases the features of Jupyter notebooks, demonstrates how to use and share notebooks effectively, and explains how to address common pain points. This workshop is useful for people who send, receive or use Jupyter notebooks. 

Prereqs: A basic familiarity with interactive Python is helpful but not required. 

Time: 9am-10am EST



Dec. 6: Introduction to Python

Instructor: Chris Myers

Description: An introduction to both the Python programming language and the broader Python software ecosystem of packages that support different sorts of tasks, for those interested in learning the language or deciding if Python is something that they want to learn more about. Where pertinent, connections to other programming languages and technical computing environments will be highlighted.

Prereqs: No knowledge of the Python language is assumed. Some prior programming experience in any language would be helpful, since there will be some expectation of familiarity with basic programming concepts.

Time: 9am-10am EST

 

Dec. 13: Linux for Researchers

Instructor: Steve Lantz

Description: Presents an introduction to using Linux operating systems. Includes practical techniques for working with the file system, descriptions of common commands and information about customizing a user’s environment. Can be tailored to a specific flavor of Linux.

Prereqs: Some familiarity with hierarchical file systems and a modern computer operating system (macOS, Linux, or Windows).

Time: 9am-10am EST

 

Winter 2023

Feb. 7: Data Management in Science Research

Instructor: Adam Brazier

Description: An overview of managing data workflows for scientific computing, from data collection and aggregation through processing and storage in an accessible form. We will cover some issues relating to security policy, integration of Identity and Access Management, and retention policy (but this is not a security policy workshop!). We will also cover possible storage venues and formats, models for aggregating and distributing data such as the pub/sub model, and modes of data storage such as relational databases, file systems, cloud storage, NoSQL, and data lakes.

Level: Introductory/Intermediate

Prereqs: Some knowledge of software and data processing

Time: 9am-10am EST

Feb. 14: R Basics

Instructor: Christopher Cameron

Description: Learn to read R analysis scripts in this introduction to the R language. We will examine language fundamentals like built-in data types, conditional execution, flow control, and indexing, then look at some basic data summary and modeling functions with an emphasis on how R is meant to be used.

Level: Introductory

Prereqs: Some experience with statistical concepts and tabular data formats or spreadsheet software. 

Mar. 14: Python for Scientific Computing and Data Science

Instructor: Chris Myers

Description: An examination of the core components of the Python software ecosystem for scientific computing and data science, with a particular focus on numpy, scipy, and pandas. This lecture will describe the overall design and structure of these packages and some of their components, complemented by code examples that demonstrate some of the key functionality. Also addressed will be issues of performance and the integration of these core packages in the larger Python ecosystem.

Level: Intermediate

Prereqs: Some familiarity with the Python language would be useful but is not required.

Time: 9am-10am EDT

Spring 2023

Mar. 21: Python for Digital Humanities and Social Science

Instructor: Christopher Cameron

Description: Humans generate messy data. While statistics-focused environments like R and Stata are great for data analysis, these specialized tools can be difficult to use with data that defies tabular representation. Human data, like written language, social relationships, images, and social media content, require flexible tools that can handle complexity. In this talk, we will provide an overview of Python, highlight how this free and open-source programming language supports digital humanities and social science research, and discuss Cornell and web-based resources to help you get started using Python in your research.

Level: Introductory

Prereqs: Some experience working with tabular data formats or spreadsheet software is helpful but not required.

Time: 12:15pm-1:15pm EDT

Mar. 28: Creating the Best Visualizations for your Data

Instructor: Ben Trumbore

Description: An introduction to choosing the best type of chart to use for the data you have and the message you want to convey. Includes a breakdown of the different types of data you might have and descriptions of the main types of 2D data visualization. Does not include instruction for any particular visualization tool.

Level: Introductory

Prereqs: None

Time: 9am-10am EDT

Apr. 11: Revision Control with Git

Instructor: Steve Lantz

Description: Git is a widely used tool for revision tracking and collaborative code development. The talk introduces Git and how to use it effectively in conjunction with a repository hosting service like GitHub.

Level: Intermediate

Prereqs: Programming ability and activity at a level that warrants revision tracking and (possibly) collaborative code development.

Apr. 25: Python for Data Visualization

Instructor: Chris Myers

Description: An examination of some of the Python packages that support data visualization for various use cases, providing both a general discussion of capabilities and multiple code examples demonstrating specific functionality. This lecture will address the generation of both static images suitable for inclusion in publications and presentations, and interactive data visualizations useful for exploring complex datasets and steering computations. Packages examined include matplotlib, pandas, seaborn, plotnine, bokeh, plotly, and possibly others.

Level: Introductory/Intermediate

Prereqs: Some familiarity with the Python language would be useful but is not required.

May 2: Research Project Software Continuity

Instructor: Adam Brazier

Description: While producing long-lasting software in academic research domains shares many of the same problems as commercial development, the environment is often different. In particular, the number of coders is often smaller, the people writing code may be learning as they go, development of software is often not their main career goal, and the funding model is different. This means that industry approaches to producing, maintaining, and operating software may not apply, or may have to be modified for the research environment. In this talk we will present some ideas, based on experience with research software at a variety of scales, to suit the different situations in which researchers develop software.

May 9: Working with Excel Files in Python and C#

Instructor: Ben Trumbore

Description: An introduction to working with Excel spreadsheets from within computer programs and scripts. Python and C# examples will be given for reading Excel files and accessing their contents, as well as populating, formatting, and writing new Excel files.

Level: Intermediate

Prereqs: Familiarity with Excel files and modest experience programming in Python, C# or Java.

May 23: Case Study - Scripting ImageJ and PowerPoint with Python

Instructor: Christopher Cameron

Description: Do you have a workflow with elements that can be automated? Sometimes the hardest part is knowing what might be possible. This case study involves using Python to process multichannel confocal microscopy images with ImageJ and then organize the output into PowerPoint slides.

Level: Introductory

Prereqs: None

 

June 6: Using the Whole Processor

Instructor: Steve Lantz

Description: Parallel processing is no longer just a concern for supercomputers--these days, it takes place in nearly all computing devices down to laptops and cell phones. This presentation describes parallel computing capabilities that are found within single processors and how applications can access them through techniques such as multithreading and vectorization.

Level: Introductory/Intermediate

Prereqs: Familiarity with programming in any language and with using a command-line interface 

Time: 9am-10am EDT

Register: https://cornell.zoom.us/meeting/register/tJcscu6hqzkoG90o_qUlX3aOha4AYE1xBGu6

June 20: Using Relational Databases for Research

Instructor: Adam Brazier

Description: An introduction to the use of relational (SQL) databases, beginning with a brief overview of database structure and then covering SQL queries, some best practices, and development tools. We will mostly deal with ANSI SQL, which will run on most Relational Database Management Systems (RDBMSs), noting some important differences between RDBMSs. Topics include SQL queries for data retrieval, insertion, and deletion, correlated subqueries, and how to construct a complicated query. We will also discuss the interface between the database and the code, including the use of Object-Relational Mapping (ORM) tools and stored procedures.

Level: Introductory/Intermediate

Prereqs: No prior knowledge of writing SQL queries for data extraction, insertion, and deletion is required, but this session is a convenient companion to a workshop on that topic or to pre-existing knowledge.

Time: 9am-10am EDT

Register: https://cornell.zoom.us/meeting/register/tJwrf-msrjspHN1tqB11pZ0Z6X7zXJpyt5Am
