CommunityData:Introduction to CDSC Resources

From CommunityData

Revision as of 02:47, 8 August 2019

If you're new to the group, welcome!

This page introduces the many technical tools we use in our research work. It may be helpful to read before you dive in and start doing research with this group. You can find every resource mentioned below on the Resources page, which is (mostly) organized alphabetically for quick reference.

To start, here's some common shorthand that members might use.

Communication Channels

  • You can contact specific members directly.
  • We use email lists to share things relevant to the entire group or a subgroup, such as upcoming events or papers circulated for feedback: CDSC - Email
  • We chat much more frequently on IRC.
  • For weekly meetings and other video calls, we use Jitsi. There are a lot of us, which can make calls a little hectic, so please keep Jitsi etiquette in mind.
  • We also have a calendar of group-wide events: CDSC Calendar.

Shared Resources

  • We maintain a large shared Zotero directory that is very helpful for finding relevant papers and smooths collaboration, since you can also see the papers and sources your collaborators have stored.
  • We also have a Git repository with some shared resources (both technical and non-technical) on it:

Servers and Data Stuff

Hyak is a supercomputer system that may or may not be relevant to your research project. For example, if you're running code on a huge dataset, you might want to use Hyak. You can learn more about it here. To get an account and set up on Hyak, see the Hyak Set-Up page:

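When using Hyak (or other servers), these pages might be helpful:

  • CommunityData:Tmux — You can use tmux (terminal multiplexer) to keep a persistent session on a server. Check out the tmux git repo or its Wikipedia page for more information.
  • CommunityData:Hyak Spark — Spark is a powerful tool for building programs that work with large datasets. Read through the linked documentation to see whether it makes sense for your project.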
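If you use tmux to keep a persistent session on Hyak, a handy pattern is to attach to a named session if it already exists and create it otherwise. The core command is `tmux new-session -A -s NAME`. The Python sketch below only wraps that command to show how the pieces fit. It assumes tmux is installed on the server, and the helper names and the session name are hypothetical, not something the CommunityData:Tmux page defines.

```python
"""Sketch: attach to a named tmux session, creating it first if needed.

Assumes tmux is installed on the server; the helper names and the session
name used in the example are hypothetical.
"""
from __future__ import annotations

import os
import shutil
import subprocess


def tmux_attach_or_create_cmd(session: str) -> list[str]:
    """Build the argv for `tmux new-session -A -s <session>`.

    With -A, new-session attaches to an existing session of that name
    instead of failing, so one command works both the first time and
    after you reconnect.
    """
    if not session:
        raise ValueError("session name must be non-empty")
    return ["tmux", "new-session", "-A", "-s", session]


def attach_or_create(session: str) -> None:
    """Run tmux in the current terminal; returns once you detach or exit."""
    if shutil.which("tmux") is None:
        raise FileNotFoundError("tmux not found on PATH")
    if os.environ.get("TMUX"):
        # tmux sets $TMUX inside a session and refuses to nest by default.
        raise RuntimeError("already inside a tmux session")
    subprocess.run(tmux_attach_or_create_cmd(session), check=True)
```

On the server, `attach_or_create("analysis")` drops you into the session. Detach with the default prefix (Ctrl-b) followed by d, and anything running inside keeps going after you log out. Calling it again after you log back in reattaches to the same session.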

Nada is used for backups.

Re: Wiki Data

Creating Documents and Presentations

Planning

You can develop a research plan in whatever way works best, but one thing that may be useful is the outline of a Matsuzaki-style planning document. You can find a detailed description of the outline here to help guide the planning process. At the bottom of that page, you'll see who to contact for some good examples of planning documents.

Paper building

We typically write papers in LaTeX. One option is to use the web-based editor Overleaf. Another option, using the CDSC TeX templates, is detailed here. These templates come with some assumptions about your workflow, which you can learn about here: CommunityData:Build papers.

If you're creating graphs and tables or formatting numbers in R that you want to put into a TeX document, you should look at the knitr package.

Some more specific things that might crop up:

  • CommunityData:Embedding fonts in PDFs — ggplot2 creates PDFs with fonts that are not embedded, which in turn causes the ACM to bounce our papers back. This page describes how to fix it.
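As a separate, hedged sketch (not necessarily the method the Embedding fonts page describes), one common way to embed fonts after the fact is to re-distill the PDF with Ghostscript's pdfwrite device. R's grDevices::embedFonts() also relies on Ghostscript to do this. The Python below assumes Ghostscript is installed and on PATH as `gs`; the function names and file names are hypothetical.

```python
"""Sketch: re-embed all fonts in a PDF by re-distilling it with Ghostscript.

Assumes Ghostscript is installed and on PATH as `gs`; the helper names and
file paths are hypothetical examples.
"""
from __future__ import annotations

import shutil
import subprocess


def ghostscript_embed_cmd(src: str, dst: str, gs: str = "gs") -> list[str]:
    """Build a Ghostscript command that rewrites `src` to `dst` with fonts embedded."""
    if src == dst:
        # Reading and overwriting the same file in one run would corrupt it.
        raise ValueError("output path must differ from input path")
    return [
        gs,
        "-q",                       # suppress startup messages
        "-dNOPAUSE", "-dBATCH",     # run non-interactively and exit when done
        "-sDEVICE=pdfwrite",        # write a new PDF
        "-dPDFSETTINGS=/prepress",  # high-quality preset that embeds fonts
        "-dEmbedAllFonts=true",     # ask explicitly for fonts to be embedded
        "-dSubsetFonts=true",       # embed only the glyphs actually used
        f"-sOutputFile={dst}",
        src,
    ]


def embed_fonts(src: str, dst: str) -> None:
    """Run Ghostscript on `src`, raising if it is missing or the run fails."""
    gs = shutil.which("gs")
    if gs is None:
        raise FileNotFoundError("Ghostscript ('gs') not found on PATH")
    subprocess.run(ghostscript_embed_cmd(src, dst, gs), check=True)
```

For example, `embed_fonts("figure.pdf", "figure-embedded.pdf")` writes a new file. You can then check it with `pdffonts figure-embedded.pdf` (from poppler-utils), where the `emb` column should read `yes` for every font. Because re-distilling rewrites the whole PDF, it is worth glancing at the output before submitting.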

Building presentation slides

Below are some options for creating presentation slides (though feel free to use whatever you're most comfortable with):

  • CommunityData:Beamer — Beamer is a LaTeX document class for creating presentation slides. This page explains how to install and use Mako's Beamer templates.
    • Like the CDSC TeX templates, these Beamer templates come with some assumptions about your workflow, which you can learn about here: CommunityData:Build papers.

Misc. Resources

Technical

Non-technical