Information for Students
This guide provides an overview of the tasks you will need to complete before the semester begins, during the semester, and at the end of the semester. This high-level information will help you navigate your coursework effectively using the UC Berkeley DataHub.
Before the Semester Begins
1. Browser and Internet Check
-
Browser Compatibility: Ensure you are using a compatible web browser (Chrome, Firefox, Safari) that is updated to the latest version.
-
Internet Connection: Verify you have a stable internet connection to access DataHub smoothly.
2. Account Setup and Access
-
Activate CalNet ID: Ensure your CalNet ID is activated and functioning properly.
-
Access DataHub: Go to datahub.berkeley.edu and log in with your CalNet ID to verify you can access the platform. (If prompted with “DataHub is requesting access to your account”, allow bCourses to authenticate.)
3. Familiarize Yourself with DataHub
-
Overview: Review the student resources to understand the features of DataHub.
-
Interface Tour: Explore the DataHub interface, including the JupyterLab and RStudio environments.
During the Semester
1. Accessing Course Materials
-
Course Hub: Use the nbgitpuller links shared by your course instructors to launch notebooks in DataHub.
-
Notebooks and Scripts: Open and work on Jupyter Notebooks, R scripts, or other files as provided by your instructor.
-
To manage files in DataHub:
-
To upload a file, click the "Upload" button in the DataHub interface and select the file from your local machine.
-
To download a file, right-click the file in the DataHub interface and select the "Download" option.
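For context on the nbgitpuller links mentioned above: such a link usually encodes the Git repository to pull, the branch, and the file to open as query parameters. The sketch below shows one way to inspect a link before clicking it. The repository and notebook names are made up for illustration, and the function name `describe_nbgitpuller_link` is hypothetical; your instructor's links may carry different or additional parameters.

```python
from urllib.parse import urlparse, parse_qs


def describe_nbgitpuller_link(link: str) -> dict:
    """Return the hub, repository, branch, and target path encoded in a git-pull link."""
    parsed = urlparse(link)
    query = parse_qs(parsed.query)  # also decodes percent-encoded values
    return {
        "hub": parsed.netloc,
        "repo": query.get("repo", [None])[0],
        "branch": query.get("branch", [None])[0],
        "urlpath": query.get("urlpath", [None])[0],
    }


# Hypothetical link: the organization, repository, and notebook do not exist.
example = (
    "https://datahub.berkeley.edu/hub/user-redirect/git-pull"
    "?repo=https%3A%2F%2Fgithub.com%2Fexample-org%2Fexample-course"
    "&branch=main"
    "&urlpath=lab%2Ftree%2Fexample-course%2Flab01.ipynb"
)
info = describe_nbgitpuller_link(example)
print(info)
```

Reading the `repo` and `urlpath` values tells you which folder the link will create or update in your home directory and which notebook it will open.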
2. Completing Assignments
-
Regular Use: Regularly log in to DataHub to complete assignments, run analyses, and work on projects.
-
Save Work: Save your progress frequently. Jupyter autosaves notebooks periodically, but it's good practice to save manually as well.
-
Check Storage Space: Delete unnecessary files, and regularly check the size of your home directory. Back up content if you are exceeding 5-10 GB. To check your usage, open a Terminal and run `du -sh` from your home directory.
-
Don’t Duplicate Shared Directory Content: If your coursework uses shared directories where instructors store large datasets, don’t copy those files into your home directory. Always read the data directly from the shared directories.
-
Do Your Work in Subdirectories: Create a subdirectory for each assignment and do your work there. Avoid working on assignments in the root directory, as mistakes there can lead to data issues.
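If `du -sh` shows that your home directory is large, the next question is usually which folders are responsible. The following is a minimal sketch, not an official DataHub tool, that lists the largest top-level entries in a directory. The function names are hypothetical, and the sizes are apparent file sizes, which can differ somewhat from the disk usage that `du` reports.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                pass
    return total


def largest_entries(root: Path, top: int = 5) -> list[tuple[str, int]]:
    """Return (name, size in bytes) for the largest top-level entries in root."""
    sizes = []
    for entry in root.iterdir():
        if entry.is_symlink():
            continue
        if entry.is_dir():
            size = directory_size_bytes(entry)
        else:
            size = entry.stat().st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

In a notebook cell, `largest_entries(Path.home())` would list the five largest folders or files in your home directory, which is a reasonable starting point for deciding what to back up or delete.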
3. Collaboration and Sharing
-
Note: DataHub doesn’t currently offer collaboration tools that let students work together on the same notebook. We are testing the feature extensively and will roll it out once the known security and data corruption issues are solved.
-
Instructor Feedback: Share your work with instructors or TAs for feedback by downloading and submitting your notebooks as required.
4. Troubleshooting
-
Restart Kernel/Server: As a first troubleshooting step, try restarting your kernel to see if the error goes away. If the problem persists, restart your server.
-
Having too many notebooks open on DataHub can cause issues. To check for running processes and stop them, follow the instructions for killing processes in the Curriculum Guide.
-
Support: Reach out to course TAs for technical help, and they will contact the DataHub staff if they are unable to resolve your issue.
-
What if I can’t access DataHub?
-
Ensure your CalNet ID is active and try logging in again. If the problem persists, inform your TA or check out this guide for additional help.
-
How do I install additional packages?
-
Use `!pip install package-name` in a Jupyter Notebook cell for Python packages, or `install.packages("package-name")` in the R console.
-
Can I use DataHub off-campus?
-
Yes, you can access DataHub from anywhere with an internet connection.
-
What should I do if I encounter a technical issue?
-
First, try restarting your kernel. If the issue persists, contact your course TA, and if they can’t resolve it they will reach out to DataHub staff.
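Regarding the package-installation answer above: before installing anything, it can help to check whether a package is already available in your environment. This is a small sketch using only the Python standard library. The function name `missing_packages` and the package name `hypothetical_course_pkg` are made up for illustration. Note that it checks top-level import names, which sometimes differ from the name you pass to pip (for example, the `scikit-learn` package is imported as `sklearn`).

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# "json" ships with Python; "hypothetical_course_pkg" is a made-up name.
print(missing_packages(["json", "hypothetical_course_pkg"]))
```

Only the names this returns would need a `pip install`; everything else is already importable.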
End of the Semester
1. Back Up Your Work
-
Backup Coursework: Back up your notebooks and data to your personal device or to an external storage service such as Google Drive or Dropbox. The data in course-specific hubs will be removed by the end of the semester.
-
User Home Directory Archiving: Files unused for 30 days are archived and moved to low-cost storage. To retrieve archived files, open a request with the DataHub team.
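One convenient way to back up a course folder is to bundle it into a single zip file and then download that file through the DataHub interface. The sketch below assumes you run it in a notebook on DataHub. The function name `archive_folder` is hypothetical and is not part of DataHub itself.

```python
import shutil
from pathlib import Path


def archive_folder(folder: Path, destination_stem: Path) -> Path:
    """Zip `folder` into `<destination_stem>.zip` and return the archive path.

    Keep the destination outside `folder` so the archive does not try to
    include itself.
    """
    folder = folder.expanduser().resolve()
    if not folder.is_dir():
        raise NotADirectoryError(folder)
    archive = shutil.make_archive(
        base_name=str(destination_stem.expanduser().resolve()),
        format="zip",
        root_dir=str(folder.parent),  # paths in the zip are relative to this
        base_dir=folder.name,         # only this folder is added
    )
    return Path(archive)
```

For example, `archive_folder(Path("~/example-course"), Path("~/example-course-backup"))` would create `example-course-backup.zip` in your home directory, where `example-course` is a placeholder for your own folder name. You can then right-click the zip file and download it.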
2. Clean Up Your Workspace
-
Clean Up: Clean up your DataHub workspace by deleting unnecessary files and folders.
-
Feedback: Share feedback about your DataHub experience with the DataHub team to help improve the service for future students.