Re: Language-independent and cross-language docs
+1 on setting up a top-level documentation project. I think that
establishing an information hierarchy to help people understand all
the layers of the project is more important than the choice of the
documentation tool -- for example, if we started with Sphinx and
decided to move later to something else, there are tools that assist with
converting between markup languages (though it would require some
I'm sort of neutral on combining the current language-specific
documentation projects into a monolithic documentation project. My
prior for this would be that the top-level documentation should
* High level overview of the Arrow project: components, languages, and vision
* Columnar specification documents (migrating the current Markdown
documents in format/) and other specification documents
* High level project roadmap and contributor guide
* Guides for maintainers / committers
* Getting started guide for each language
The top-level documentation could direct users to the
language-specific API and usage docs (i.e. like the current Python
I'm interested in what people think about how to integrate this
statically-generated content with our current Jekyll-based website.
One could argue that all this top-level documentation could be handled
by Jekyll (or equivalent static site generator)
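To make the "top-level docs point at language-specific docs" idea concrete, here is a minimal sketch of what a conf.py for a top-level Sphinx project might look like, assuming the language-specific docs stay as independent Sphinx sites and are linked through the "intersphinx" extension mentioned in the quoted thread below. The project name, the mapping keys and the example.org URLs are placeholders of mine, not existing locations.

```python
# conf.py for a hypothetical top-level Sphinx project (sketch only).
project = "Apache Arrow"

# intersphinx ships with Sphinx; it resolves cross-references against the
# object inventory (objects.inv) published by other Sphinx-built sites.
extensions = ["sphinx.ext.intersphinx"]

# Placeholder keys and URLs (assumptions): substitute wherever each
# language-specific doctree actually ends up being hosted.
# A value of None means "fetch objects.inv from the base URL itself".
intersphinx_mapping = {
    "pyarrow": ("https://example.org/arrow/python/", None),
    "arrowcpp": ("https://example.org/arrow/cpp/", None),
}
```

With something like this in place, a page in the top-level docs could link into the Python docs with a prefixed reference such as :ref:`pyarrow:some-label` (the label name is hypothetical), without the two doctrees having to be merged.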
On Thu, May 17, 2018 at 3:44 PM, Uwe L. Korn <uwelk@xxxxxxxxxx> wrote:
> I can second that we should move the documentation to a central one. As a C++ and Python contributor at the same time, it is always hard to think of where you should document a specific piece. We have a very small C++ documentation and a somewhat larger Python one. For some features, though, it would make sense to have them in both. IPC and in-process sharing are also a main part of the Arrow project. Documenting this separately for each language will be a lot of work and will probably leave blind spots in each language.
> Not everything in each language ecosystem can be directly included in Sphinx, but as Sphinx is becoming a very broadly used documentation system, there are many nice converters like Breathe [1] (Doxygen to Sphinx) available.
> To directly answer the questions:
> - Should we do this at all (i.e. build up a central documentation system)?
> - Should we use Sphinx for it?
> Very much in favour. There is probably also a tendency that some people prefer Markdown (I do) but given the feature set of Sphinx, I would very much argue in favour of it.
> - To what extent should our current docs be migrated to Sphinx (apart
> from the Python docs, which already use Sphinx)? For example, should
> the specs (currently standalone pages written in Markdown) be migrated
> to Sphinx for better cross-referencing and navigation? What about the
> C++ tutorial pages? etc.
> I would definitely migrate the C++ documentation fully into that, but the C++ / Python relation is very tight. There are a lot of topics that either touch two languages or are general to the project; these should also go in there.
> - Should we preferably have a single Sphinx doctree, or several
> independent per-topic / per-language doctrees?
> I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will have many shared topics between the different implementations, I would expect that we should have a single documentation with well-organized sections.
> Also, we will probably face the issue that we have documentation on a specific topic and only a small part differs between two implementations/setups/... I really like the Scala/Python tabs in the Spark docs [2]. There is a Sphinx extension that seems to do something similar to this [3]. This could either be used for documentation on how to construct things, where one switches between Ruby and Python, or for the main case where I would need it: setting up the build with slightly different package managers (e.g. conda vs pip in Python).
> [1]: https://breathe.readthedocs.io/en/latest/
> [2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
> [3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html
> On Sat, May 12, 2018, at 6:03 PM, Antoine Pitrou wrote:
>> In the following PR discussion it was mentioned that we currently lack a
>> central documentation system for cross-language topics:
>> Sphinx looks like a reasonable contender for that purpose. For those who
>> don't know it, Sphinx is a documentation system initially developed for
>> the Python language, which quickly became widely-used amongst Python
>> projects, and is now being used by non-Python projects as well. For
>> example, the LLVM docs (https://llvm.org/docs/) and even the Linux
>> kernel online docs are now written using Sphinx
>> Sphinx uses reStructuredText (a.k.a. "reST") as its basic markup
>> language, but with many extensions. It allows for structured
>> documentation with extensive cross-referencing (even between independent
>> Sphinx sites, using the "intersphinx" extension).
>> The questions here are:
>> - Should we do this at all (i.e. build up a central documentation system)?
>> - Should we use Sphinx for it?
>> - To what extent should our current docs be migrated to Sphinx (apart
>> from the Python docs, which already use Sphinx)? For example, should
>> the specs (currently standalone pages written in Markdown) be migrated
>> to Sphinx for better cross-referencing and navigation? What about the
>> C++ tutorial pages? etc.
>> - Should we preferably have a single Sphinx doctree, or several
>> independent per-topic / per-language doctrees?