Suggestions for improving Etherpad plugins

In this blog post I will propose some methods to improve Etherpad plugins. This is a musing for myself: a brain dump and wish list of things we might want to have in the future.

Test coverage

Does a plugin have test coverage? No test coverage discovered?

How to: Open a GitHub issue including example source code.

Does the plugin have CI? If not, create it.

How to: Create a .travis.yml file. Is it possible to automatically register the repo as active in Travis? Provide an example Travis test file.

Automatically test plugins when a new version of Etherpad is released.

How to: GitHub Actions?

License

No License?

How to: Suggest the Apache 2 license to the plugin as a pull request.

Translations

If a plugin has i18n, automatically register it with translatewiki.net. If not, suggest supporting i18n.

Readme

Does the README file include a license?

Does the README contain an animated GIF of the user experience?

Does the plugin contain references to Etherpad as Etherpad lite?

Common faults

Is .ep_initialized included in the repo root? Send a PR suggesting its removal.

Does .gitignore exist in the root? If not, send a PR suggesting content.

Does package.json include a link to the repository?

Does the package include npm-debug.log? Send a PR removing it.

JSHint/lint?
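Several of the checks above are simple file-existence tests, so they could be scripted. The following is a minimal sketch under that assumption: the function name auditPlugin, the wording of each finding, and the choice of LICENSE/LICENSE.md and README/README.md as file names are all illustrative, not part of any existing Etherpad tooling.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Returns a list of human-readable findings for a plugin checkout.
// The checks mirror the wish list above; names and messages are hypothetical.
function auditPlugin(pluginDir) {
  const findings = [];
  const exists = (name) => fs.existsSync(path.join(pluginDir, name));

  if (exists('.ep_initialized')) findings.push('.ep_initialized is committed: suggest removing it');
  if (exists('npm-debug.log')) findings.push('npm-debug.log is committed: suggest removing it');
  if (!exists('.gitignore')) findings.push('.gitignore is missing: suggest adding one');
  if (!exists('LICENSE') && !exists('LICENSE.md')) findings.push('no license file: suggest Apache 2');

  if (!exists('package.json')) {
    findings.push('package.json is missing');
  } else {
    const pkg = JSON.parse(fs.readFileSync(path.join(pluginDir, 'package.json'), 'utf8'));
    if (!pkg.repository) findings.push('package.json has no repository field');
  }

  // Flag READMEs that still call the project "Etherpad lite".
  const readme = ['README.md', 'README'].find(exists);
  if (readme && /etherpad[ -]lite/i.test(fs.readFileSync(path.join(pluginDir, readme), 'utf8'))) {
    findings.push(`${readme} refers to Etherpad as "Etherpad lite"`);
  }
  return findings;
}
```

Calling auditPlugin('/path/to/ep_example') returns an array of strings, which could then feed whatever opens the issue or pull request.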

Spacetime Characterization of Real-Time Collaborative Editing author feedback

This is feedback for the authors of the paper “Spacetime Characterization of Real-Time Collaborative Editing”. It is an excellent paper full of useful insight for Etherpad.

I’m writing this as an author and maintainer of Etherpad, which was the primary source of the data used in the study’s analysis.

I have some suggestions for future research/development of the project with a goal to help improve Etherpad and other RTCE.

I improved the documentation and fixed bugs in the RTCE Analysis. My efforts on this are available in the Etherpad Foundation RTCE GitHub repository.

Simulate data to validate analysis.

Spacetime data could be simulated to ensure analysis is accurate.

Suggestion: Generate space-time edits with known values. etherpad-loadtest could probably do this with just a few hours’ work.
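To make the idea concrete, here is a rough sketch of what "edits with known values" could look like. It is not based on etherpad-loadtest or on Etherpad's changeset format: the function name simulateEdits and the fields author, time, position and length are assumptions chosen for illustration. Positions are nominal offsets and ignore the shift caused by other authors' inserts.

```javascript
'use strict';

// Builds a synthetic edit log whose spacing in time and position is known
// in advance, so an analysis pipeline can be checked against ground truth.
// Field names are illustrative, not Etherpad's changeset format.
function simulateEdits({authors, editsPerAuthor, intervalMs, regionSize}) {
  const edits = [];
  for (let a = 0; a < authors; a++) {
    for (let i = 0; i < editsPerAuthor; i++) {
      edits.push({
        author: `author-${a}`,
        time: i * intervalMs + a,     // fixed typing rhythm, offset by `a` ms per author
        position: a * regionSize + i, // each author stays inside their own region
        length: 1,                    // one inserted character per edit
      });
    }
  }
  // Sort by time so the log resembles what a server would record.
  return edits.sort((x, y) => x.time - y.time);
}

const edits = simulateEdits({authors: 2, editsPerAuthor: 3, intervalMs: 500, regionSize: 1000});
console.log(edits);
```

Because every interval and offset is chosen up front, the analysis output (for example, the measured gap between edits per author) can be compared directly with the values passed in.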

Programming complexity

A single programming language could be used to complete the task.

Suggestion: Rewrite the analysis tasks in Node.js using Etherpad’s built-in methods. This would be a few days’ work.

Supplemental features

Analyzing RTCE alongside other communication methods (e.g. video conferencing within Etherpad) would be extremely useful. See the ep_webrtc, ep_author_follow and ep_comments_page plugins for examples.

Suggestion: Speak to WMF to discuss the potential of them rolling out Video chat for a trial period and do a side-by-side comparison to see if there is any difference in findings. This would be a few days prep work then a few months to gather data.

Multiple sources of data.

Using a single source (WMF) for data is not ideal.

Suggestion: Analyze video.etherpad.com, 26LLC (online tutoring platform), WMF and FramaPads content.

Database support

Only supporting a single database store (MySQL) is not ideal, as some large instances use PostgreSQL, MariaDB, Redis etc.

Suggestion: Use Etherpad internals to generate data allowing for analysis of a much larger set of Etherpad instances.

Conclusion

The goal of RTCE software is to reduce the number of changesets, and as such complementary tools are included because they are often computationally cheaper than editing the pad. Some examples are chat, video chat, voice chat and comments (with suggestions).

Changesets, or RTCE edits, are relatively computationally expensive compared to users doing peer-to-peer video chat, so we want to minimize them. The measure of success for Etherpad is therefore reducing the number of changesets needed to complete a document, which is contrary to this paper and may be somewhat surprising to the authors!

Automatically Storing Etherpad Pad contents in Git

This is a how-to guide / hack approach for storing Etherpad contents as git commits.

Create git repo

Use your favorite tool or platform. In this instance we use GitHub to create a new repository called etherpad-storage.

Clone git repo

git clone repoURL ~/etherpad-storage

Edit Etherpad settings

Open up settings.json

Replace dirty_db.js with dirty_git_db.js

Replace var/dirty.db with ~/etherpad-storage/dirty.db

Grab the new logic

Grab dirty_git_db.js and put it in your src/node_modules/ueberdb2 folder. Ideally you would do it like this:

curl https://gist.githubusercontent.com/JohnMcLear/3a57977ea4dad39b021b8ee8aac2a730/raw/8a025ec3d4fbeef6cc5520e89ea3eb52c10a7737/dirty_git_db.js -o src/node_modules/ueberdb2/dirty_git_db.js

Install simple-git

From within your Etherpad folder:

cd src/node_modules/ueberdb2 && npm install simple-git

Restart and test.

Restart Etherpad. When you type in a pad, your changes should be committed to GitHub and your console should fill with log messages, which you will probably want to remove.
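The gist itself is not reproduced here. As a rough illustration of the underlying idea (stage, commit and push the database file after a write), the sketch below shells out to the git CLI instead of using simple-git, so it is a swapped-in technique and not the gist's code. The function name commitFile and the injectable run parameter are hypothetical.

```javascript
'use strict';
const {execFileSync} = require('child_process');

// Stages, commits and pushes one file inside an existing clone.
// `run` is injectable so the function can be exercised without a real repo.
// Note: `git commit` exits non-zero when nothing changed, in which case
// execFileSync throws; a real implementation would need to handle that.
function commitFile(repoDir, file, message, run = execFileSync) {
  const git = (...args) => run('git', args, {cwd: repoDir});
  git('add', file);
  git('commit', '-m', message);
  git('push');
}
```

For example, commitFile('/home/me/etherpad-storage', 'dirty.db', 'pad update') would run the three git commands in that clone. Committing on every write is what makes this a hack: batching writes before committing would be kinder to both git and the remote.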

Conclusion

This is just a simple hack. It’s not designed for production, but it might be useful to someone as a reference. The ultimate goal is to create a plugin that provides this functionality. It should be easy enough to do!

Etherpad’s Video Conferencing security: What do we do differently to Zoom?

I was asked in a tweet to touch on the topics of Etherpad and Security.

I initially wrote this post explaining how Etherpad handles security etc., but then I figured I should do a side-by-side comparison with Zoom because that’s the product currently being criticized.

Etherpad is not perfect and it’s not a commercial product. It is used commercially by some large organizations, but this is not a sales pitch. Etherpad is a community movement; its goal is not to provide shareholder value or increase profitability. Its goal is to provide a collaboration tool that can be used by anyone, anywhere, without exception.

Let’s also be clear: other stellar video conferencing tools exist. We love Jitsi, for example, and Etherpad’s functionality is nowhere near Jitsi’s for video conferencing. Etherpad provides document editing and video conferencing in one package. If you just want video conferencing, Jitsi (and other open source software) is your ally.

Would I use Zoom for anything where privacy or safeguarding is a consideration? No.

Would I use it for chatting with my family? Yes.

Facebook privacy concerns

Does Etherpad have problems with Social platform integration? No, but there is a social plugin if people want to bring social elements in. That’s up to the site admin.

Malicious code silently being deployed

Does Etherpad install any undocumented code on your machine? No. Everything is open source. Admins can install plugins that aren’t provided as a core part of Etherpad, and a plugin could do something nefarious because we don’t run plugins sandboxed.

Side musing:

It’s worth noting that Etherpad relies on thousands of dependencies, so it’s entirely plausible that one of those could inject malicious code into the software. To mitigate this we run security audits on release.

GitHub also provides ongoing dependency vulnerability monitoring, so we’re constantly updating the software to resolve vulnerability-related issues: usually around one moderate issue a week and one critical issue a month.

That’s the honest truth about modern software: it’s vulnerable. It’s ALL vulnerable, so you have a choice between using something that is honest about its vulnerabilities and something that hides behind the door of closed source, providing an ignorance-is-bliss situation.

Does Etherpad’s video conferencing plugin (ep_webrtc) access your camera or any other hardware without the user’s permission? No. We use “getUserMedia”, which is provided by the web browser, so the user’s permission is required.
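For readers unfamiliar with getUserMedia, the sketch below shows the general shape of a permission-gated media request. It is not ep_webrtc's code: the helper name requestMedia is hypothetical, and mediaDevices is passed in as a parameter (in a web page it would be navigator.mediaDevices) so the function can also be exercised outside a browser.

```javascript
'use strict';

// Asks the browser for microphone (and optionally camera) access.
// `mediaDevices` is injected; in a page it would be navigator.mediaDevices.
// Resolves to a MediaStream, or to null when the user declines the prompt.
async function requestMedia(mediaDevices, {video = true} = {}) {
  try {
    return await mediaDevices.getUserMedia({audio: true, video});
  } catch (err) {
    // Browsers reject with a NotAllowedError when permission is refused.
    if (err.name === 'NotAllowedError') return null;
    throw err;
  }
}
```

In a page, requestMedia(navigator.mediaDevices, {video: false}) would request audio only. Either way the browser, not the application, shows the prompt and decides whether any stream is handed over.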

Video call traffic and encryption

Can Etherpad be installed locally to avoid traffic leaving your site/school/premises? Yes. If you do host your own, we recommend hosting your own TURN/STUN server too, to ensure ALL data stays local.

Does Etherpad provide end-to-end encrypted video calls? Sort of. In 85% of circumstances, yes: Etherpad’s video calls are sent directly from user to user. Etherpad (and the ep_webrtc plugin) does zero processing of calls. By default we use Google’s STUN servers to negotiate connectivity between two clients, but you can change this to your own (and the docs say you should). For the other 15%, the video stream has to pass through a central point, which in our case is a TURN hosting provider in Canada. Again, to emphasize: you can just run your own TURN server.
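As a sketch of what pointing WebRTC at your own servers involves, the snippet below builds the configuration object that browsers accept in new RTCPeerConnection(config). The helper name buildIceConfig, the example.org hostnames and the credentials are placeholders, and how ep_webrtc actually reads its settings is not shown here.

```javascript
'use strict';

// Builds the configuration object passed to `new RTCPeerConnection(config)`.
// STUN helps two peers discover a direct route; TURN relays the media when
// no direct route can be established.
function buildIceConfig({stunUrl, turn}) {
  const iceServers = [{urls: stunUrl}];
  if (turn) {
    iceServers.push({urls: turn.url, username: turn.username, credential: turn.credential});
  }
  return {iceServers};
}

// Hypothetical self-hosted servers; replace with your own hosts and secrets.
const config = buildIceConfig({
  stunUrl: 'stun:stun.example.org:3478',
  turn: {url: 'turn:turn.example.org:3478', username: 'pad', credential: 'secret'},
});
console.log(JSON.stringify(config, null, 2));
```

With both entries pointing at servers on your own network, neither the connectivity negotiation nor any relayed media needs to leave the premises.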

There is something specifically worth mentioning for educators. School firewalls and routing are complex. The 85% connectivity without TURN (the share that can go directly user to user, and the best case we have reached so far) came after a lot of debugging and tweaking with US districts. The UK has yet to embrace this level of debugging and routing, so there the 85% will likely start at 50% until schools and local authorities start deploying their own TURN servers to stop the traffic having to go external.

Is Peer to peer video connectivity the best way?

From a bandwidth perspective, meh, ish: you hit bandwidth issues with fewer users (but bandwidth will keep increasing to meet this demand, so p2p will win).

From a privacy perspective, yes.

From a functionality perspective, meh, maybe. Peer-to-peer connections are more difficult to establish than connections through a central service (hence the TURN/STUN), and with a central server you can negotiate and manage stream bandwidth depending on the number of users. Zoom does what it does well, better than Etherpad. But where Etherpad shines is its transparency and flexibility. Let’s say you don’t want video and only want audio? You can make that change. You have control as a user, developer, devops, manager etc.

What does the future look like?

Will Etherpad provide a commercial solution at some point? Probably not. We cherish our container partners dearly, and we want them to benefit from deploying Etherpad to clients and to take care of scaling, deployment etc. so we can focus on building the software. Lots of services exist that let you deploy Etherpad: Docker is a thing, npm is a thing, git is a thing. You can even install Etherpad on Windows Server for the ultimate roller-coaster experience.

What can I do to help Etherpad or get involved?

Download it, use it, promote it, contribute to the code.

Further reading:

https://webrtc-security.github.io/