
Now that it's Saturday I've had time to play with this and I believe I've found the answers to my questions...

1. Based on the Clojure REPL example in the main README, I think the answer is YES, though not exactly the way I had imagined. It seems what you would need to do is write a top-level script (or java "main") that starts a "system", starts "workers", runs your job using that system, then stops the system and exits. A little clunky compared to how Luigi does it, but usable.

2. As best I can tell, the answer is NO. Neither the documented API nor its implementation in src/clj/titanoboa/handler.clj contains any hint of an ability to operate on a job id, beyond retrieving the result of its execution.

Additional commentary:

1. Resuming failed jobs
As implied by my question above, the ability to resume a failed job is essential. One of the major reasons to adopt DAG-structured code is parallel execution, and Titanoboa has that. But the OTHER major reason is to allow partially-failed computation to retry/resume without repeating already-completed work. In the ETL space in particular, we often have job graphs composed of hundreds of nodes, with total runtimes measured in hours. If my 100-node job graph fails due to an error in node #78, preventing an additional 15 downstream nodes from running, I don't want to run all 100 nodes again after I fix the problem. I want to resume executing my graph at #78, and expect only the 16 total affected nodes to execute, since everything else ran correctly the first time (and presumably persisted its outputs). Luigi gets this one right. Airflow sorta tries but it's clunky and you can tell it's not a priority.
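To make the resume arithmetic concrete, here is a rough Python sketch. It is not Titanoboa or Luigi code: the graph shape and the nodes_to_rerun helper are made up for illustration, and it assumes every node outside the affected set persisted its output on the first run.

```python
from collections import deque

def nodes_to_rerun(downstream, failed):
    """Return the failed node plus everything reachable downstream of it."""
    affected = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, ()):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected

# Hypothetical 100-node graph: nodes 1..77 form a chain leading into 78,
# node 78 fans out to 15 children (79..93), and 94..100 hang off node 77.
downstream = {n: [n + 1] for n in range(1, 78)}
downstream[78] = list(range(79, 94))
downstream[77] = [78] + list(range(94, 101))

rerun = nodes_to_rerun(downstream, failed=78)
print(len(rerun))  # 16
```

A scheduler that can resume only needs this set plus the persisted outputs of everything else. Nodes 1..77 and 94..100 are never touched again.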

2. Flow/Dependency direction
When designing a workflow, either in the GUI or as EDN, you tell Titanoboa what jobs are "next". This is intuitive because it comports with our notion of execution flow through a graph of jobs, but it gets things backwards. That is, when we write A->B->C, we are thinking that A will execute, and then B, and then C (perhaps results will be passed from step to step). It is often better though to describe this as A<-B<-C, which reads as C depends on B, B depends on A, and A depends on nothing. Structuring our thinking in this way focuses the mind on what inputs a node requires in order to perform its effect or compute its output, rather than on what operations should follow it in time. Luigi and Airflow both get this one right.
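The two descriptions carry the same information, so one can be derived from the other mechanically. A minimal Python sketch, with a hypothetical invert_edges helper and plain dicts standing in for a workflow definition (this is not Titanoboa's EDN format):

```python
def invert_edges(next_steps):
    """Turn a 'what runs next' mapping into a 'what do I depend on' mapping."""
    depends_on = {node: [] for node in next_steps}
    for node, successors in next_steps.items():
        for succ in successors:
            depends_on.setdefault(succ, []).append(node)
    return depends_on

# A->B->C written in the "next" style.
next_steps = {"A": ["B"], "B": ["C"], "C": []}

# The same graph in the A<-B<-C style.
depends_on = invert_edges(next_steps)
print(depends_on)  # {'A': [], 'B': ['A'], 'C': ['B']}
```

The point is not that the conversion is hard, but that the second form is the one the author of a node actually reasons about: each entry lists the inputs that node needs.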

3. Properties
The way Titanoboa defines workflow-level "properties", into which job-level properties are merged, and the way properties flow along the path of execution, is very nice. A constant problem with Luigi is how to flow values from one Task to the next without using an excessive number of Parameters. I can't say for sure that Titanoboa's properties construct doesn't have the same problems, without taking the time to actually use it to build a large project, but on the surface it looks good.
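As a rough mental model only, the merging can be pictured with plain Python dicts. The effective_properties helper and the precedence order (step over workflow, returned map over both) are my assumptions for illustration, not a description of what Titanoboa actually does:

```python
def effective_properties(workflow_props, step_props, returned=None):
    """Merge property maps; later sources win (assumed precedence)."""
    merged = {**workflow_props, **step_props}
    # Assumption: a map returned by a step is folded into the properties.
    if isinstance(returned, dict):
        merged.update(returned)
    return merged

workflow_props = {"bucket": "raw-data", "retries": 1}
step_props = {"retries": 3}

props = effective_properties(workflow_props, step_props, {"row-count": 42})
print(props)  # {'bucket': 'raw-data', 'retries': 3, 'row-count': 42}
```

The appeal compared to Luigi is that downstream steps see one accumulated map instead of each Task having to redeclare every value as a Parameter.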

4. Logging
I noticed that when a step's function returns a map, to be integrated into "properties", that return value is not logged. The message in the log is like "Step [my-cool-step-name] finshed with result []" which is both unhelpful, and not even literally true, as it most certainly did have a result! When a step returns a scalar value, it does get logged. I found this inconsistency frustrating.

Also, the stdout/stderr of each step function apparently goes to /dev/null. I find this odd as the placeholder function when you build a new workflow is (println "Hello World!") but if you actually execute that you'll discover that our classic greeting vanishes into the void. This is a major shortcoming. As a point of comparison, one of the biggest value-adds of using Jenkins as a job scheduler is how it automatically captures the output of anything you run, saves it in a durable log file, AND lets you view it in real time. Job orchestration systems that don't match that level of log-friendliness drive me nuts.
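For what it's worth, capturing a step's output does not take much. Here is a minimal Python sketch of the idea using the standard library's contextlib.redirect_stdout. The run_step wrapper and the log record shape are hypothetical, and this version only saves the output after the step finishes rather than streaming it in real time the way Jenkins does:

```python
import contextlib
import io

def run_step(name, fn, log):
    """Run a step function, recording whatever it prints and returns."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = fn()
    # Record the result whether it is a scalar or a map.
    log.append({"step": name, "stdout": buffer.getvalue(), "result": result})
    return result

def hello_step():
    print("Hello World!")
    return {"greeted": True}

log = []
run_step("my-cool-step-name", hello_step, log)
print(log[0]["stdout"], end="")  # Hello World!
```

With a wrapper like this, neither the greeting nor the returned map disappears: both end up in the step's log record.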

5. Versioning
The built-in versioning system is great. Two thumbs up. I don't know how it would work if I were writing my jobs in proper Clojure or Java code in their own repo, but I kinda don't care because the value of storing and versioning what I do in the UI is so great.

6. UI -> data
I love the way the interactive UI is just there to generate EDN. In a way this mirrors how Jenkins' UI builds its job XML files, but you have to go hunting for those and they're hard to read (because XML). Being able to see what EDN is generated by your actions in the UI, _right there in the UI_, is fantastic.

7. UI issues
The UI is great but there are quite a few low-hanging-fruit improvements that could be made.
- the run job popup forces you to choose a system every time, even if there is only one
- being able to draw arrows in the visualization is cool, but I could not figure out how to delete them there. Needs work.
- the UI doesn't lay out well on small screens (I'm on an old 13" Air), I had to zoom to 80% just to be able to see the X to delete a property. It would help if the Workflows panel on the left (the least important UI element by far!) could be collapsed (edit: it can be collapsed, but the collapse button is on the other side of the screen which makes no sense)
- the box that pops up after starting a job has nothing clickable in it. I have to close it and go to the jobs tab
- the jobs tab doesn't refresh when it loads, even if I just started a job, which needlessly adds clicks to the main workflow
- the jobs tab has an "archived" sub-tab but no apparent way to actually move a job to the archive

Overall, there's a lot of promise here, and it's amazing to me that you built this by yourself. Still, it has a long way to go. I recommend spending some time with Luigi, which I still think is the best general way of DAG-structuring real world workflow code, and with Jenkins which remains far and away the best UI-driven job orchestration system. It seems you're already familiar with Airflow, but I would recommend you treat it mainly as an example of what not to do.



Really appreciate your time looking into this, and apologies, I missed your original post. You got the answers right. Re 2 - it shouldn't be that hard to add, since everything is just data. It is however partially covered by the retry property on each step. Will read through your additional comments and will respond tomorrow (busy day, plus it's midnight here in Europe)!
Cheers
Miro



