I’m currently writing a compiler for a made-up language as part of a compiler course, and I needed a reasonably fast division routine for ARM9, my target platform. ARM9 has no hardware divide instruction; SDIV and UDIV only arrived in later versions of the architecture.

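So I found a simple division algorithm that speeds up basic “schoolbook” subtractive division by using shifts. It takes the dividend in R1 and the divisor in R2, and leaves the quotient in R0 and the remainder in R1: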

 ;check for divide by zero!
 ;(R0 and R1 are left unchanged in that case)
 CMP      R2,#0
 BEQ      divide_end

 MOV      R0,#0     ;clear R0 to accumulate result
 MOV      R3,#1     ;set bit 0 in R3, which will be
                    ;shifted left then right
.start
 CMP      R2,R1
 MOVLS    R2,R2,LSL#1
 MOVLS    R3,R3,LSL#1
 BLS      start
 ;shift R2 left until it is about to
 ;be bigger than R1
 ;shift R3 left in parallel in order
 ;to flag how far we have to go

.next
 CMP       R1,R2      ;carry set if R1>=R2 (unsigned compare:
                      ;C means the subtraction did not borrow)
 SUBCS     R1,R1,R2   ;subtract R2 from R1 if this would
                      ;give a non-negative answer
 ADDCS     R0,R0,R3   ;and add the current bit in R3 to
                      ;the accumulating answer in R0

 MOVS      R3,R3,LSR#1     ;Shift R3 right into carry flag
 MOVCC     R2,R2,LSR#1     ;and if bit 0 of R3 was zero, also
                           ;shift R2 right
 BCC       next            ;Loop while carry is clear. Once it is
                           ;set, R3 has shifted back to where it
                           ;started, and we can end

.divide_end
 MOV       PC,R14          ;exit routine: return via R14 (LR)
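
To check the routine against known answers, the same loop can be modelled in C. This is only a sketch under a few assumptions: the function name `udiv_shift_subtract` and its pointer-based interface are made up for illustration, the operands are treated as unsigned 32-bit values, and the alignment loop has one extra guard that the assembly above does not have. The guard stops shifting once the divisor's top bit is set, because for some dividends of 2^31 or more the unguarded loop would shift the divisor out of the register and never terminate.

```c
#include <stdint.h>

/*
 * Hypothetical C model of the shift-and-subtract routine above.
 * dividend plays the role of R1, divisor of R2, q of R0, bit of R3.
 * Returns 0 and leaves the outputs untouched when divisor is 0,
 * mirroring the early branch to divide_end; returns 1 otherwise.
 */
int udiv_shift_subtract(uint32_t dividend, uint32_t divisor,
                        uint32_t *quotient, uint32_t *remainder)
{
    if (divisor == 0)
        return 0;

    uint32_t q = 0;    /* accumulated result (R0) */
    uint32_t bit = 1;  /* marks the current quotient bit (R3) */

    /* Shift the divisor left until it is bigger than the dividend.
     * The top-bit test is an added guard, not part of the assembly. */
    while (divisor <= dividend && (divisor & 0x80000000u) == 0) {
        divisor <<= 1;
        bit <<= 1;
    }

    for (;;) {
        /* Same decision as CMP / SUBCS / ADDCS. */
        if (dividend >= divisor) {
            dividend -= divisor;
            q += bit;
        }
        /* Same exit test as MOVS R3,R3,LSR#1 followed by BCC:
         * stop once bit 0 of the marker would be shifted out. */
        if (bit & 1u)
            break;
        bit >>= 1;
        divisor >>= 1;
    }

    *quotient = q;
    *remainder = dividend;
    return 1;
}
```

For example, calling it with a dividend of 100 and a divisor of 7 yields a quotient of 14 and a remainder of 2. Feeding the same operands to the assembly routine and comparing R0 and R1 with these outputs is one cheap way to test the generated code.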
 

A more in-depth description of how the algorithm works is available on the source site.