Scientific Computing Training Series

Cornell University's Center for Advanced Computing together with Weill Cornell Medicine's Scientific Computing Unit, ITS, and Clinical and Translational Science Center are pleased to offer a new Scientific Computing Training Series. 

Training sessions are held via Zoom. They are open to all workforce members and students of Cornell University, WCM, WCM-Q, and Cornell Tech. They are free of charge and there is no need to apply. Click the title of each course below to find course information and Zoom registration links.

Winter/Spring 2024

Feb. 6: Introduction to Python

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: Ben Trumbore, Cornell University Center for Advanced Computing

Description: This lecture will introduce the Python programming language, the Python software ecosystem, some key concepts in computer programming, and how those concepts are implemented in Python. The Python ecosystem contains a rich set of packages and tools to support research and data analysis in several different application areas; being able to use the Python programming language to customize computing workflows that leverage those tools enhances researcher productivity and capability. The material is intended both for people new to programming or new to Python who want to get started, and for more experienced Python programmers who would like to get a different perspective on how Python supports a variety of programming tasks. This will be an encore presentation of material from a previous SCTS talk.

Level: Introductory

Prereqs: None

Feb. 13: Intermediate Applied R

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: Chris Cameron, Cornell University Center for Advanced Computing

Description: Practical R techniques and best practices for researchers. Topics include R as a programming language, performance and optimization, and parallelizing R code.

Level: Intermediate

Prereqs: Some familiarity with R

Feb. 20: Intermediate Python

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: Chris Myers, Cornell University Center for Advanced Computing

Description: Python is both a programming language and a software ecosystem that is widely used to support many tasks. Programmers new to Python are able to begin working productively for simple tasks and for well-documented pipelines. This lecture will address some more advanced features of both the language and the ecosystem that are useful for tackling more complex and numerically intensive computations that arise in scientific computing.

March 5: Python Use Cases

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: Chris Cameron, Cornell University Center for Advanced Computing

Description: A collection of use cases for Python based on examples solicited from WCM researchers and CAC involvement in research projects.

March 12: AI, Machine Learning, and Deep Learning with Python

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: Chris Myers, Cornell University Center for Advanced Computing

Description: Artificial Intelligence, Machine Learning, and Deep Learning with neural networks comprise a powerful set of tools and techniques to enable computer programs to learn from data. Many popular packages supporting these approaches are available for use with the Python programming language, due in part to Python's expressive syntax, rich ecosystem, and ability to link to code written in other languages. This lecture will provide an overview of Machine Learning and Deep Learning, an introduction to some key Python packages supporting work in those areas, and some examples of their use.

Level: Intermediate

Prereqs: Some familiarity with Python

March 19: Deep Learning and Generative AI Use at Research Hospitals

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: Bennett Wineholt, Cornell University Center for Advanced Computing

Description: Hospitals are already deploying deep learning and generative AI technologies to benefit patients. How are they deriving benefits and safeguarding quality of patient care while deploying these cutting-edge technologies? We will focus on two specific use cases that illustrate how to deploy and supervise deep learning techniques and large language models in health care settings. Image segmentation of multi-channel brain MR images can assist radiologists in identifying areas of significant medical interest. Predicting hospital readmission from providers' unstructured notes can save lives by keeping higher-risk patients in the hospital, while allowing lower-risk patients the convenience and time savings of heading home.

Level: Intermediate

Prereqs: Familiarity with Python and Linux

April 9: AWS 101

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: AWS Solutions Architect Team 

Description: In this session, we will cover the AWS services that are most commonly used in research: EC2 and AWS storage.

1. EC2: What is EC2? What types of EC2 instances are available? How can you save costs when using EC2?
2. Storage on AWS: What are S3, EBS, and EFS? When should each be used, and what features do they offer?
3. The Cornell team will cover the process and policy for provisioning AWS accounts.

April 16: SageMaker 101 (AI/ML on AWS)

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: AWS Solutions Architect Team 

Description: Amazon SageMaker is the AI/ML service used to build, train, and deploy models. It encompasses a broad set of tools such as Studio, Canvas, notebook instances, debuggers, profilers, pipelines, MLOps, and more. This session will cover:

1. Introduction to AI/ML on AWS
2. Introduction to SageMaker and its capabilities.

Level: Introductory

Prereqs: None

April 23: SageMaker Studio

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: AWS Solutions Architect Team 

Description: Amazon SageMaker Studio, part of the SageMaker umbrella of services, is an IDE for ML that provides a single unified interface for all the tools you need, including Jupyter notebooks and RStudio, to take your models from experimentation to production and boost your productivity. SM Studio provides access to all ML resources in one place.

In this session, we will provide an overview of SM Studio and its features: data preparation and feature engineering using Data Wrangler, building ML models using Jupyter notebooks, and training and deploying models.

Level: Introductory

Prereqs: None

Register: https://cornell.zoom.us/meeting/register/tJ0lfuqurzsqE9da1XVHsKznPf10y66fjT6c

April 30: GenAI on AWS

Time: 9am-10am ET, with optional discussion time immediately afterward.

Instructor: AWS Solutions Architect Team 

Description: In this session, we will introduce you to the Generative AI landscape on AWS. We will discuss the AWS services used to build applications for summarization, image generation, chatbots, and fundamentals of Retrieval Augmented Generation for Generative AI.

1. Introduction to Bedrock, Knowledge Bases, and Agents, and how to use them to build GenAI applications.
2. Introduction to Amazon Q for Business to build a virtual assistant.
 

Level: Introductory

Prereqs: None

Register: https://cornell.zoom.us/meeting/register/tJEtcu-sqj4pGNGZpRTUHTMKYaAOoaUlsaPc

Fall 2023

Oct. 4: Introduction to Python

Instructor: Chris Myers

Description: This lecture will introduce the Python programming language, the Python software ecosystem, some key concepts in computer programming, and how those concepts are implemented in Python. The Python ecosystem contains a rich set of packages and tools to support research and data analysis in several different application areas; being able to use the Python programming language to customize computing workflows that leverage those tools enhances researcher productivity and capability. The material is intended both for people new to programming or new to Python who want to get started, and for more experienced Python programmers who would like to get a different perspective on how Python supports a variety of programming tasks.

Level: Introductory

Prereqs: None

Oct. 18: JupyterLab (in the Cloud) for Python

Instructor: Christopher Cameron

Description: Jupyter is the most common Python interface used by researchers. Cloud computing providers, like Amazon and Google, offer their own Jupyter-like interfaces. This lecture provides an overview of the JupyterLab interface and cloud-based derivatives. It is designed to familiarize new users with common Jupyter-like interfaces, features, and best practices.

Level: Introductory

 

Nov. 1: Scientific Computing with Python (with hands-on)

Instructor: Chris Myers

Description: This lecture will provide an overview of select core components of the Python software ecosystem for scientific computing and data science, with a particular focus on numpy, scipy, pandas, and matplotlib. The lecture will include both descriptions of the overall design and structure of those packages and their key components, and numerous code examples that demonstrate some of the important functionality. Opportunities for live, hands-on exercises using these packages will be integrated throughout the lecture, all as part of a Jupyter notebook that will include both the lecture content and the hands-on exercises. The Python ecosystem for scientific computing and data science enables researchers to use proven and widely used tools that are easily customized for specific problems using the Python programming language.

Level: Intermediate

Prereqs: Some familiarity with the Python programming language or other languages used for scientific computing (e.g., R, MATLAB) would be useful, but is not required. The hands-on exercises will be coordinated through the use of an online cloud environment providing support for running Jupyter notebooks, although participants should feel free to use their own local machines if they are familiar with running Jupyter and installing whatever additional packages might be necessary. Instructions about these details will be circulated in advance of the lecture, but participants should be prepared to set up and/or sign up for accounts in those environments before the lecture so that they are ready to run the hands-on exercises during the allotted time.

Nov. 15: Getting Started with R

Instructor: Christopher Cameron

Description: RStudio is a common R interface used by researchers. This overview is designed to familiarize new users with the interface, features, and best practices so they are ready to delve into conducting their own analyses. New content includes an overview of Quarto, an evolution of R Markdown, for documenting and sharing data analysis and research.

Nov. 29: Data Analysis with R

Instructor: Christopher Cameron

Description: This lecture presents several examples of data analysis and visualization in R. It will demonstrate a variety of analyses intended to help researchers determine if learning R is a good investment for their research, including new data analysis examples drawn from the WCM community.

Level: Introductory/Intermediate

Prereqs: Some familiarity with R

Winter/Spring 2023

Feb. 7: Data Management in Science Research

Instructor: Adam Brazier

Description: An overview of managing data workflows for scientific computing, starting with data collection and aggregation, through processing and storing in an accessible form. We will cover some issues relating to security policy, integration of Identity and Access Management, and retention policy (but this is not a security policy workshop!), possible storage venues and formats, models for aggregating and distributing data such as the pub/sub model, and modes of data storage such as relational database, file system, cloud, NoSQL, data lake, etc.

Level: Introductory/Intermediate

Prereqs: Some knowledge of software and data processing

Time: 9am-10am EST

Feb. 14: R Basics

Instructor: Christopher Cameron

Description: Learn to read R analysis scripts in this introduction to the R language. We will examine language fundamentals like built-in data types, conditional execution, flow control, and indexing, then look at some basic data summary and modeling functions with an emphasis on how R is meant to be used.

Level: Introductory

Prereqs: Some experience with statistical concepts and tabular data formats or spreadsheet software. 

Mar. 14: Python for Scientific Computing and Data Science

Instructor: Chris Myers

Description: An examination of the core components of the Python software ecosystem for scientific computing and data science, with a particular focus on numpy, scipy, and pandas. This lecture will describe the overall design and structure of these packages and some of their components, complemented by code examples that demonstrate some of the key functionality. Also addressed will be issues of performance and the integration of these core packages in the larger Python ecosystem.

Level: Intermediate

Prereqs: Some familiarity with the Python language would be useful but is not required.

Time: 9am-10am EDT

Mar. 21: Python for Digital Humanities and Social Science

Instructor: Christopher Cameron

Description: Humans generate messy data. While statistics-focused environments like R and Stata are great for data analysis, these specialized tools can be difficult to use with data that defies tabular representation. Human data, like written language, social relationships, images, and social media content, require flexible tools that can handle complexity. In this talk, we will provide an overview of Python, highlight how this free and open-source programming language supports digital humanities and social science research, and discuss Cornell and web-based resources to help you get started using Python in your research.

Level: Introductory

Prereqs: Some experience working with tabular data formats or spreadsheet software is helpful but not required.

Time: 12:15pm-1:15pm EDT

Mar. 28: Creating the Best Visualizations for your Data

Instructor: Ben Trumbore

Description: An introduction to choosing the best type of chart to use for the data you have and the message you want to convey. Includes a breakdown of the different types of data you might have and descriptions of the main types of 2D data visualization. Does not include instruction for any particular visualization tool.

Level: Introductory

Prereqs: None

Time: 9am-10am EDT

Apr. 11: Revision Control with Git

Instructor: Steve Lantz

Description: Git is a widely used tool for revision tracking and collaborative code development. The talk introduces Git and how to use it effectively in conjunction with a repository hosting service like GitHub.

Level: Intermediate

Prereqs: Programming ability and activity at a level to warrant revision tracking and (possibly) collaborative development of codes.

Apr. 25: Python for Data Visualization

Instructor: Chris Myers

Description: An examination of some of the Python packages that support data visualization for various use cases, providing both a general discussion of capabilities and multiple code examples demonstrating specific functionality. This lecture will address the generation of both static images suitable for inclusion in publications and presentations, and interactive data visualizations useful for exploring complex datasets and steering computations. Packages examined include matplotlib, pandas, seaborn, plotnine, bokeh, plotly, and possibly others.

Level: Introductory/Intermediate

Prereqs: Some familiarity with the Python language would be useful but is not required.

May 2: Research Project Software Continuity

Instructor: Adam Brazier

Description: While producing long-lasting software in academic research domains shares many of the same problems as commercial development, the environment is often different. In particular, the number of coders is often smaller, the people writing code may be learning as they go, development of software is often not their main career goal, and the funding model is different. This means that industry approaches to producing, maintaining, and operating software may not apply, or may have to be modified for the research environment. In this talk we will see some ideas, based on experience of research software at a variety of scales, to suit the different situations in which researchers develop software.

May 9: Working with Excel Files in Python and C#

Instructor: Ben Trumbore

Description: An introduction to working with Excel spreadsheets from within computer programs and scripts. Python and C# examples will be given for reading Excel files and accessing their contents, as well as populating, formatting, and writing new Excel files.

Level: Intermediate

Prereqs: Familiarity with Excel files and modest experience programming in Python, C# or Java.

May 23: Case Study - Scripting ImageJ and PowerPoint with Python

Instructor: Christopher Cameron

Description: Do you have a workflow with elements that can be automated? Sometimes the hardest part is knowing what might be possible. This case study involves using Python to process multichannel confocal microscopy images with ImageJ and then organize the output into PowerPoint slides.

Level: Introductory

Prereqs: None

 

June 6: Using the Whole Processor

Instructor: Steve Lantz

Description: Parallel processing is no longer just a concern for supercomputers; these days, it takes place in nearly all computing devices down to laptops and cell phones. This presentation describes parallel computing capabilities that are found within single processors and how applications can access them through techniques such as multithreading and vectorization.

Level: Introductory/Intermediate

Prereqs: Familiarity with programming in any language and with using a command-line interface 

June 20: Using Relational Databases for Research

Instructor: Adam Brazier

Description: An introduction to the use of relational (SQL) databases, beginning with a brief overview of database structure and then covering SQL queries, some information on best practices, and development tools. We will mostly deal with ANSI SQL, which will run on most Relational Database Management Systems (RDBMSs), noting some important inter-RDBMS differences. Covered will be SQL queries for data retrieval, insertion, and deletion, correlated subqueries, and how to construct a complicated query. We will also discuss the interface between the database and the code, including the use of Object-Relational Mapping tools and stored procedures.

Level: Introductory/Intermediate

Prereqs: This session does not require knowledge of how to write SQL queries for data extraction, insertion, and deletion, but it is a convenient companion to such a workshop or to pre-existing knowledge.

Fall 2022

Nov. 1: Data Transfer Tools

Instructor: Ben Trumbore

Description: An introduction to the numerous tools that can be used to transfer data between computer systems and to cloud storage. Different tools offer different transfer speeds, security, and ease of use, and some automatically recover from failures and allow syncing between systems. Tools covered in this session include FTP, SCP, SFTP, rsync, rclone and Globus.

Prereqs: Familiarity with basic Linux commands.

Time: 9am-10am EST

Nov. 9: Introduction to Modern R Data Analysis

Instructor: Christopher Cameron

Description: Overview of RStudio interfaces (scripts, console, and R Markdown notebooks) and popular modern data manipulation and visualization packages in R (especially ggplot2 and dplyr). This is primarily useful for people seeking an entry point into the R ecosystem for research and data analysis tasks.

Prereqs: Some experience with statistical concepts and tabular data formats or spreadsheet software.

Time: 9am-10am EST

Nov. 15: Introduction to JupyterLab for Python

Instructor: Christopher Cameron

Description: Jupyter Notebooks are a popular format for scientific communication that intermingles descriptive text, code, statistical analysis, results, and visualizations in a single document. This workshop showcases the features of Jupyter notebooks, demonstrates how to use and share notebooks effectively, and explains how to address common pain points. This workshop is useful for people who send, receive or use Jupyter notebooks. 

Prereqs: A basic familiarity with interactive Python is helpful but not required. 

Time: 9am-10am EST



Dec. 6: Introduction to Python

Instructor: Chris Myers

Description: An introduction to both the Python programming language and the broader Python software ecosystem of packages that support different sorts of tasks, for those interested in learning the language or deciding if Python is something that they want to learn more about. Where pertinent, connections to other programming languages and technical computing environments will be highlighted.

Prereqs: No knowledge of the Python language is assumed. Some prior programming experience in any language would be helpful, since there will be some expectation of familiarity with basic programming concepts.

Time: 9am-10am EST

 

Dec. 13: Linux for Researchers

Instructor: Steve Lantz

Description: Presents an introduction to using Linux operating systems. Includes practical techniques for working with the file system, descriptions of common commands and information about customizing a user’s environment. Can be tailored to a specific flavor of Linux.

Prereqs: Some familiarity with hierarchical file systems and a modern computer operating system (macOS, Linux, or Windows).

Time: 9am-10am EST

 
