Spacetime Characterization of Real-Time Collaborative Editing author feedback

This is feedback for the authors of the paper “Spacetime Characterization of Real-Time Collaborative Editing”. It is an excellent paper full of useful insight for Etherpad.

I’m writing this as an author and maintainer of Etherpad, which in this study served primarily as the source of the data used in the analysis.

I have some suggestions for future research/development of the project, with the goal of helping to improve Etherpad and other real-time collaborative editing (RTCE) tools.

I have improved the documentation and fixed bugs in the RTCE Analysis code, and that work is available in the Etherpad Foundation RTCE GitHub repository.

Simulate data to validate analysis.

Spacetime data could be simulated to ensure analysis is accurate.

Suggestion: Generate space-time edits with known values. etherpad-loadtest could probably do this with just a few hours’ work.
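As a rough illustration of what “known values” could mean here, the sketch below generates synthetic edits in plain Node.js. The gap between edits and each author’s region of the document are fixed in advance, so an analysis run over the output has a ground truth to recover. The function names and the edit fields (author, time, position, length) are hypothetical; they are not taken from etherpad-loadtest or from the paper’s analysis code.

```javascript
// Hypothetical generator of synthetic "space-time" edits.
// Assumption: an edit is described by who made it, when, and where in the
// document. `position` is a simplified coordinate (the author's region plus
// an offset), not a live document offset.
function generateEdits({authors, editsPerAuthor, intervalMs, regionSize, startTime = 0}) {
  const edits = [];
  for (let a = 0; a < authors; a++) {
    for (let k = 0; k < editsPerAuthor; k++) {
      edits.push({
        author: `a.sim${a}`,              // made-up author id
        time: startTime + k * intervalMs, // known, fixed gap between edits
        position: a * regionSize + k,     // each author types in their own region
        length: 1,                        // one character per edit
      });
    }
  }
  // Order by time, then by position, as a merged edit stream would be.
  edits.sort((x, y) => x.time - y.time || x.position - y.position);
  return edits;
}

// Ground-truth check: the mean gap between one author's consecutive edits.
// An analysis of the generated data should report the same number.
function meanInterEditGap(edits, author) {
  const times = edits.filter((e) => e.author === author).map((e) => e.time);
  let total = 0;
  for (let i = 1; i < times.length; i++) total += times[i] - times[i - 1];
  return total / (times.length - 1);
}

const edits = generateEdits({authors: 2, editsPerAuthor: 3, intervalMs: 500, regionSize: 100});
console.log(edits.length, meanInterEditGap(edits, 'a.sim0'));
```

If the analysis pipeline reports a different inter-edit gap or spatial separation than the values passed in, the discrepancy points at the analysis rather than at the data.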

Programming complexity

A single programming language could be used to complete the task.

Suggestion: Rewrite the analysis tasks in Node.js using Etherpad’s built-in methods. This would be a few days’ work.

Supplemental features

Analyzing RTCE alongside other communication methods (e.g. video conferencing within Etherpad) would be extremely useful. See the ep_webrtc, ep_author_follow and ep_comments_page plugins for examples.

Suggestion: Speak to WMF about the potential of rolling out video chat for a trial period, then do a side-by-side comparison to see if there is any difference in findings. This would be a few days’ prep work, then a few months to gather data.

Multiple sources of data.

Using a single source (WMF) for data is not ideal.

Suggestion: Analyze video.etherpad.com, 26LLC (online tutoring platform), WMF and FramaPads content.

Database support

Supporting only a single database store (MySQL) is not ideal, as some large instances use PostgreSQL, MariaDB, Redis etc.

Suggestion: Use Etherpad internals to generate data allowing for analysis of a much larger set of Etherpad instances.

Conclusion

The goal of RTCE software is to reduce the number of changesets, so complementary tools are included because they are often computationally cheaper than editing the pad. Examples include chat, video chat, voice chat and comments (with suggestions).

Changesets (RTCE edits) are relatively computationally expensive compared to users holding a peer-to-peer video chat, so we want to minimize them. The measure of success for Etherpad is therefore fewer changesets needed to complete a document, which is contrary to this paper and may be somewhat surprising to the authors!

Automatically Storing Etherpad Pad contents in Git

This is a how-to guide / hack for storing Etherpad contents as git commits.

Create git repo

Use your favorite tool or platform. In this example we use GitHub to create a new repository called etherpad-storage.

Clone git repo

git clone repoURL ~/etherpad-storage

Edit Etherpad settings

Open up settings.json

Replace dirty_db.js with dirty_git_db.js

Replace var/dirty.db with ~/etherpad-storage/dirty.db

Grab the new logic

Grab dirty_git_db.js and put it in your src/node_modules/ueberdb2 folder. Ideally you would do it like this:

curl https://gist.githubusercontent.com/JohnMcLear/3a57977ea4dad39b021b8ee8aac2a730/raw/8a025ec3d4fbeef6cc5520e89ea3eb52c10a7737/dirty_git_db.js -o src/node_modules/ueberdb2/dirty_git_db.js

Install simple-git

From within your Etherpad folder:

cd src/node_modules/ueberdb2 && npm install simple-git

Restart and test.

Restart Etherpad. When you type in a pad, your changes should be committed to GitHub, and your console should fill with log messages that you will probably want to remove.
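For readers curious what “commit on write” might look like, here is a minimal sketch. It is not the code from dirty_git_db.js. It assumes a git object shaped like a simple-git instance (add, commit and push returning promises), and the helper name makeCommitter, the delayMs option and the commit message format are all made up for illustration. It batches bursts of writes into a single commit, which is one possible way to keep the commit and log volume down.

```javascript
// Sketch only: NOT the logic shipped in dirty_git_db.js.
// `git` is assumed to behave like a simple-git instance, for example:
//   const simpleGit = require('simple-git');
//   const committer = makeCommitter(simpleGit('/path/to/etherpad-storage'));
// (the path above is a placeholder).
function makeCommitter(git, {file = 'dirty.db', delayMs = 5000} = {}) {
  let timer = null;
  let running = Promise.resolve();

  // Stage the database file, commit it and push, in that order.
  async function commitNow() {
    await git.add(file);
    await git.commit(`Etherpad update ${new Date().toISOString()}`);
    await git.push();
  }

  // Call after each database write. Writes arriving within `delayMs` of each
  // other reset the timer, so a burst of typing yields one commit.
  function schedule() {
    if (timer) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      // Chain onto the previous run so two commits never overlap.
      running = running.then(commitNow).catch((err) => {
        console.error('git commit failed:', err.message);
      });
    }, delayMs);
  }

  return {schedule, commitNow};
}
```

Committing on a timer rather than on every keystroke trades a few seconds of history granularity for far fewer commits; whether that trade is acceptable depends on why you want the history in git.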

Conclusion

This is just a simple hack. It’s not designed for production, but it might be useful to someone as a reference. The ultimate goal is to create a plugin that provides this functionality. It should be easy enough to do!

Etherpad’s Video Conferencing security: What do we do differently to Zoom?

I was asked in a tweet to touch on the topics of Etherpad and Security.

I initially wrote this post explaining how Etherpad handles security etc., but then I figured I should do a side-by-side comparison with Zoom because that’s the currently criticized product.

Etherpad is not perfect. It’s not a commercial product; it is used commercially by some large organizations, but this is not a sales pitch. Etherpad is a community movement. Its goal is not to provide shareholder value or increase profitability. Its goal is to provide a collaboration tool that can be used by anyone, anywhere, without exception.

Let’s also be clear: other stellar video conferencing tools exist. We love Jitsi, for example, and Etherpad’s functionality is nowhere near Jitsi’s for video conferencing. Etherpad provides document editing and video conferencing in one package. If you just want video conferencing, Jitsi (and other open source software) is your ally.

Would I use Zoom for anything where privacy or safeguarding is a consideration? No.

Would I use it for chatting with my family? Yes.

Facebook privacy concerns

Does Etherpad have problems with Social platform integration? No, but there is a social plugin if people want to bring social elements in. That’s up to the site admin.

Malicious code silently being deployed

Does Etherpad install any undocumented code on your machine? No. Everything is open source. Admins can install plugins that aren’t provided as a core part of Etherpad, and a plugin could do something nefarious because we don’t run them sandboxed.

Side musing:

It’s worth noting that Etherpad relies on thousands of dependencies, so it’s entirely plausible that one of those could inject malicious code into the software. To mitigate this we run security audits on release.

GitHub also provides ongoing dependency vulnerability monitoring, so we’re constantly updating the software to resolve vulnerability-related issues: usually one or so of moderate severity a week and one critical a month.

That’s the honest truth about modern software: it’s vulnerable. It’s ALL vulnerable, so you have a choice between using something that is honest about its vulnerabilities and something that hides behind the door of closed source, providing an ignorance-is-bliss situation.

Does Etherpad’s video conferencing plugin (ep_webrtc) access your camera or any hardware without the user’s permission? No. We use “getUserMedia”, which is provided by the web browser, so the user’s permission is required.
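To illustrate the permission step, here is a minimal sketch of a getUserMedia call. It is not ep_webrtc’s actual code, and the helper name requestMedia and its return shape are made up for this example. In a browser you would pass navigator.mediaDevices; the browser then shows its own permission prompt, and the promise rejects if the user declines.

```javascript
// Illustrative helper, not taken from ep_webrtc.
// `mediaDevices` is normally `navigator.mediaDevices` in a browser; it is a
// parameter here so the sketch can also be exercised outside a browser.
async function requestMedia(mediaDevices, {audio = true, video = true} = {}) {
  try {
    // The browser prompts the user here; no stream exists until they accept.
    const stream = await mediaDevices.getUserMedia({audio, video});
    return {granted: true, stream};
  } catch (err) {
    // Browsers reject with a NotAllowedError when the user declines.
    return {granted: false, reason: err.name};
  }
}

// Browser usage (audio only):
//   const result = await requestMedia(navigator.mediaDevices, {video: false});
```

Passing video: false asks only for the microphone, so the camera is never requested at all.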

Video call traffic and encryption

Can Etherpad be installed locally to avoid traffic leaving your site/school/premises? Yes. If you do host your own we recommend hosting your own TURN/STUN server too to ensure ALL data stays local.

Does Etherpad provide end-to-end encrypted video calls? ‘Sort of’. In about 85% of circumstances, yes: Etherpad’s video calls are sent directly from user to user. Etherpad (and the ep_webrtc plugin) does no processing of calls. By default we use Google’s STUN servers to negotiate connectivity between two clients, but you can change this to your own (and the docs say you should). For the other 15%, the video stream has to pass through a central point, which in our case is a TURN hosting provider in Canada. Again, to emphasize: you can just run your own TURN server.

There is something specifically worth mentioning for educators. School firewalls and routing are complex. The 85% connectivity without TURN (the share of calls that can go direct from user to user, and the best case we have reached so far) comes after a lot of debugging and tweaking with US school districts. The UK is yet to embrace this level of debugging and routing work, so there the 85% will likely start at around 50% until schools and local authorities deploy their own TURN servers to stop the traffic having to go external.
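As a sketch of what pointing at your own servers involves: WebRTC takes a list of ICE servers when a peer connection is created. The helper below builds such a list in the standard RTCPeerConnection configuration shape. The helper name buildIceServers is made up, the TURN hostname and credentials are placeholders, and where ep_webrtc actually reads this setting from is not shown here, so check the plugin’s documentation for that.

```javascript
// Illustrative helper; the TURN values used below are placeholders.
// Returns an array in the shape accepted by
// `new RTCPeerConnection({iceServers: [...]})`.
function buildIceServers({stunUrl = 'stun:stun.l.google.com:19302', turn} = {}) {
  // STUN only helps two clients discover a direct route to each other.
  const servers = [{urls: stunUrl}];
  if (turn) {
    // TURN relays the media itself when no direct route can be found.
    servers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return servers;
}

// Self-hosted example with made-up hostnames and credentials:
const iceServers = buildIceServers({
  stunUrl: 'stun:stun.school.example:3478',
  turn: {url: 'turn:turn.school.example:3478', username: 'pad', credential: 'change-me'},
});
console.log(JSON.stringify(iceServers, null, 2));
```

With both entries pointing at servers you run yourself, neither the connection negotiation nor any relayed video needs to touch a third party.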

Is peer-to-peer video connectivity the best way?

From a bandwidth perspective, meh, ish: you hit bandwidth issues with fewer users (but bandwidth will keep increasing to meet this demand, so p2p will win).

From a privacy perspective, yes.

From a functionality perspective, meh, maybe. Peer-to-peer connections are more difficult to establish than connections through a central service (hence the TURN/STUN), and with a central server you can negotiate and manage stream bandwidth depending on the number of users. Zoom does what it does well, better than Etherpad. But where Etherpad shines is its transparency and flexibility. Let’s say you don’t want video and only want audio? You can make that change. You have control as a user, developer, devops, manager etc.

What does the future look like?

Will Etherpad provide a commercial solution at some point? Probably not. We cherish our container partners dearly, and we want them to be able to benefit from deploying Etherpad to clients and to take care of scaling, deployment etc. so we can focus on building the software. Lots of services exist that let you deploy Etherpad: Docker is a thing, npm is a thing, git is a thing. You can even install Etherpad on Windows Server for the ultimate roller-coaster experience.

What can I do to help Etherpad or get involved?

Download it, use it, promote it, contribute to the code.

Further reading:

https://webrtc-security.github.io/

Why I think Educators should give Etherpad a try

I contribute to a collaborative editor tool, Etherpad. It is excellent, and I think it gets far less love from educators than Google’s and Microsoft’s offerings, when it deserves more.

Etherpad has video conferencing visible in the document editor. 

Video.etherpad.com screenshot

Etherpad can be installed within 20 seconds.  No technical knowledge required.

Etherpad is a relatively long running project, in fact Google used our operational transform engine in Google Docs and Microsoft ‘borrowed’ heavily for 365.  Etherpad was the original web collaborative editor.

Etherpad is open source, built by a community, meaning we have complete transparency so if you get stuck or have an element you dislike, you can change it.

Etherpad is part of Software Freedom Conservancy which is a registered charity.

Etherpad won’t change under you. Etherpad won’t introduce a ribbon without the ability to disable it. Etherpad won’t introduce a Clippy without giving you a strong enough alcohol to banish it.

Etherpad can’t go bankrupt if an economy tanks.

Etherpad has way more educational resources than Google Docs / Microsoft 365: plugins such as MathJax, Text Statistics (grading the quality of writing) and accessibility features such as text to speech. It’s also used by huge organizations (Wikimedia/UN), meaning Etherpad focuses heavily on localization and on accessibility requirements such as ARIA.

Etherpad’s video and audio calls are not centralized, removing the main flaw currently constraining Google during these trying times. Also when you are back in a school environment if you do video calls between departments the video feed will stay inside your environment removing bandwidth requirements / costs.

Etherpad can be hosted within your organization, even at home.

Etherpad doesn’t require any sign up, but various plugins exist if you want to have authentication to whatever mechanism you want and also for pad management. The world is your oyster.

Etherpad never loses document fidelity on export. Unlike Google Docs and 365, when you take a document from one Etherpad instance to another, no history of the document is lost. This means that any auditing for regulatory bodies is easy.

Phew, that’s a lot of reasons to consider trying Etherpad 🙂

I salute Google and Microsoft, and tools like Jitsi and other educational tools, for everything they do for education. Etherpad is different at its core and allows those more directly involved in education to be empowered without a dependency on commercial software.

Etherpad website for more information and to download Etherpad

Try Etherpad Video conferencing