Editing CommunityData:Introduction to CDSC Resources
To start, here's some [https://wiki.communitydata.science/CommunityData:Jargon common shorthand] that members might use. It's a little outdated but has some acronyms, names of things, etc. that might pop up in conversation.
There's [https://communitydata.science/~mako/cdsc_only/jitsi-onboarding_session-20200629.mp4 a video introduction from June 29, 2020 online here]. It's hosted in [[Mako's]] <code>cdsc_only</code> video pile, so there's a username and password, but you can ask anybody in the group and they should be able to get it by searching their email for "cdsc_only".
== Communication Channels ==
We also have some public-facing channels:
* We publish material on the [https://blog.communitydata.science Community Data Science blog].
* We have a collective Twitter account ([https://twitter.com/comdatasci]). We can grant you access
== Collaboration tools ==
* ''This wiki'': The CDSC Wiki includes group resources, as well as things like research project pages and course websites. We highly recommend that you create an account and then ask someone else in the group to make you an admin. This will help you avoid having your edits reverted.
* ''Bibliographic references'': We maintain a large shared [[CommunityData:Zotero|Zotero]] directory that is really helpful for finding relevant papers and smooths the process of collaboration (as one can see the papers and sources stored by collaborators as well). Please review the Zotero etiquette described in the "Adding and Organizing References" and "Tips and Tricks" sections of [[CommunityData:Zotero|Zotero]] before using the shared folder.
* ''LaTeX authoring'': Some of us work on papers and presentations together in [https://overleaf.com Overleaf]. See additional info about this [[CommunityData:Introduction_to_CDSC_Resources#Creating_Documents_and_Presentations|below]]. You can get a free account to join a project or two and use the basic functionality of Overleaf. More sustained use of more features probably means you should join the CDSC account or another paid account. We don't have a CDSC Overleaf info page (yet), but if you think you need to join the group account, contact Aaron about that.
* ''Version control'': We also have a Git repository with some shared resources (both technical and non-technical) on it:
** ''Git repositories'': [[CommunityData:Git]]: How to get set up on the git server to create, clone, and work in the shared git repositories we maintain.
Much of our work is pretty computational/quantitative and involves large datasets. We have multiple computing resources and servers.
;Hyak: Hyak is a supercomputer system that is hosted at UW but that the whole group uses for conducting statistical analysis and data processing. Hyak is necessary if you need large amounts of storage (e.g., tens of terabytes) or large amounts of computational resources (e.g., CPU time, memory, etc.). ''Servers in Hyak do not have direct access to the Internet.'' That means that Hyak is not useful for collecting data from APIs, etc. Access requires a UW NetID, but one will be sponsored for you. You can learn more about it at [[CommunityData:Hyak]], which has various links to tutorials/documentation as well.
:In order to use Hyak, you need to get an account set up. This is documented on [[CommunityData:Hyak setup]].
When using servers, these pages might be helpful:
* [[CommunityData:Tmux]]: You can use tmux (terminal multiplexer) to keep a persistent session on a server, even if you're not logged into the server. This is especially helpful when you ssh to a server and start a long-running job but can't stay logged in the whole time. Check out the [https://github.com/tmux/tmux/wiki tmux git repo] or its [https://en.wikipedia.org/wiki/Tmux Wikipedia page] for more information about this.
* [[CommunityData:Hyak Spark]]: Spark is a powerful tool that helps build programs dealing with large datasets. It's great for Wikimedia data dumps.
=== Wiki Data in particular ===
== Creating Documents and Presentations ==
=== Planning ===
You can develop a research plan in whatever way works best, but one thing that may be useful is the outline of a Matsuzaki-style planning document. You can see a detailed outline description [https://wiki.communitydata.science/CommunityData:Planning_document here] to help guide the planning process. If you scroll to the bottom, you'll see who to contact to get some good examples of planning documents.
Also helpful in developing a research plan might be some of the readings in this course taught by Aaron to PhD students: [https://wiki.communitydata.science/Practice_of_scholarship_(Spring_2019) Practice of Scholarship (SP19)].
=== Non-technical ===
* [[CommunityData:Advice on writing a background section to an academic paper]]
* See some past and upcoming lab retreats [https://wiki.communitydata.science/CommunityData:Resources#Ongoing_and_Future_Meetings_and_Meetups here].