Coding, Design & GraphicsIf you are programming a script, designing a web page, building your own graphics or anything related and need to discuss it, get help and tips or general advice, then you should post your thoughts to this section.
oh, erm, yeah, so how do I do convert p-code into machine code ?
I generate the p-code which is generated from c# type source code. the actual language is TNBL (the next big language)
p-code is pseudo code, its like in between source code and machine code. consisting of the most basic operations which most closely reflect the source code.
the target for the machine code could be 386 or 8 bit micro controller - (which is seriously limited)
I'm thinking like, put into an object all the logic of what each opcode does, into a big list, in a way so it can be compared to a desired p-code operation. Some complex opcodes might consist of several p-code operations.
then cycle through all the opcodes until it can match all the p-code operations. but I'm thinking this might take some time to execute. the x86 has a ludicrous amount of possible variations of opcodes too.
of course register assignments and temporary variables complicates things even more.
the pseudo code isn't really standardised, its just what ever comes out of the parser ATM.
there are a number of systems that use an intermediate code, but basically i don't think there can be a great deal of difference fundamentally. the code used by the .net CLI framework might be a good starting point. although rather complex. I think UNIX/gnu use a different system, and not to confuse Microsoft "p-code" which is just an object file format for relocatable code rather than pseudo code. lisp and others also have a distinct p-code format, with some using more than one software stack.
I read up a bit on doing this but it seems any approach to convert p-code to machine code from a specification of the CPU instructions has failed. but then again I also read the same about an LL(k) parser but I wrote one last month or so, although strictly its a decision tree based parser which is neither LL or LR.
the problem might be that although the p code references indexed variables and the CPU opcodes could be produced so that the registers they access etc are represented in p-code as indexes too, but to match these up in a lookup operation would require an almost infinite combination of possible alternatives for each opcode to appear in the table.
I'm thinking that if each individual source p-code statement is adjusted so that all the indexes it uses are consecutively numbered from zero it might be a good start, then just adjust them to point to the correct ones after.
ofc the source p-code indexes refer to variables in memory, rather than registers. the CPU opcodes refer to variables/memory either directly or though a total of 8 addressing modes for 386 which makes it difficult to match up instructions which would be most efficient. I was reading the early CPUs had 20 or more addressing modes, with a possible infinite number of combinations for architectures which had an indirection bit in the memory itself.
selecting variables to promote to registers is obviously a critical step.
one way might be to create by hand various conversions to machine code for as many types of typical statements as possible.
lets take as an example the very first mnemonic of 386 assembler is AAA - (surprised its not aardvark ?)
AAA -- ASCII Adjust after Addition
Opcode Instruction Clocks Description
37 AAA 4 ASCII adjust AL after addition
Operation
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL := (AL + 6) AND 0FH;
AH := AH + 1;
AF := 1;
CF := 1;
ELSE
CF := 0;
AF := 0;
FI;
Description
Execute AAA only following an ADD instruction that leaves a byte result in the AL register. The lower nibbles of the operands of the ADD instruction should be in the range 0 through 9 (BCD digits). In this case, AAA adjusts AL to contain the correct decimal digit result. If the addition produced a decimal carry, the AH register is incremented, and the carry and auxiliary carry flags are set to 1. If there was no decimal carry, the carry and auxiliary flags are set to 0 and AH is unchanged. In either case, AL is left with its top nibble set to 0. To convert AL to an ASCII result, follow the AAA instruction with OR AL, 30H.
Flags Affected
AF and CF as described above; OF, SF, ZF, and PF are undefined
this is actually surprising to find this level of detail about its operation with pseudo code.
If you wanted to do this operation in a high level language such as c it would be a shame not to use this instruction without having to use inline assembler or native/embedded functions etc.
but it is unlikely to be a very close match to whatever one would write to achieve this.
some of the operations above would likely not fit at all but they would not actually matter.
I think the obvious problem is one of execution time to explore all the necessary permutations.
there are many more complications too, like doing 32 bit maths if you only have 16 bit CPU, (or 8 bit !) where you have to string together a whole load of operations, clearly this method cant be achieved with a simple lookup table.
maybe the best fitting CPU operation could be found which only does some of the bits and the remaining operations needed to calculate the remaining bits are reinterpreted into p-code and the process repeated.
most compilers call functions to do oversized arithmetic, but this can be quite inefficient, especially if not all the result is used, such as multiplying 2 x 16 bit numbers to give a 32 bit result but then just using the top 16 bits.
with a small micro controller its important to get the best optimisation as ram and code size is severely limited which means no high level languages seem to exist for them.
Are you actually looking for an optimal solution? e.g. an optimising compiler. Or will anything just do.
I'm guessing with all these multi-core/threaded processors around the obvious thing is to introduce parallelism into the process on a per function basis. I would seem to be a tricky task to get more than 1 thread to work on the same function though unless it could be pipelined in some way.
well, it needs to be moderately good to start with, the idea is to improve it as and when ,,,
it would certainly need to be some sort of optimised result, if you have code generated from a CPU instruction set definition then this is unavoidable.
I'm currently thinking to wards a decision based tree approach, and in some ways it seems slightly easier than I first thought, rather than scan all the possible CPU instructions, it is only necessary to scan small subsets which can be selected depending on major type, eg add etc.
to simplify things further, the addressing modes of the x86 can be thought of as a separate fetch/store (virtual) instruction, which is looked up in the same way, but gets swallowed up if the opcode allows it or gets implemented by other instructions.
at the function level the chance to do any major optimisation like multi threading is rather limited, often all that can be done is things like loops rolled out and assigned different threads if possible.
I would particularly like to implement SSE instructions automatically, this is something which i heard was impracticable. as for shader code too ,,,
at the other end of the scale my 8 bit micro-controller has one register, and 256 bytes of ram, and a handful of instructions. it has no indirect addressing mode, so no instruction to make use of a pointer.
however it does have some complicated stuff to get round limitations, such as writing an address to a hardware register allows access to that ram address from another hardware register, so gets round the lack of pointers.
When I said multi-threading I was referring to the compiler not the output it produces. If 16+ cores is the future then it would help if the compiler itself can actually utilise all these cores to compile the program.
Don't want to hijack the thread, just wanted to say... OMG... 16 cores, hell until buying Win7, my quad core wasn't even utilised on WinXP, surely we are still a good few years away from utilisation of anything like that, right? Or am I thinking something different completely and being thick? This thread kind of wooshed with me a bit, I really need to keep away from threads with anything to do with C#, I know bugger all about it.
I dont think its worth thinking about multithreading for compiling one file, the usual thing is to compile files on seperate processors, this would give the most utilisation. gfx cards already havve 256 cores, and nvidea are selling special gfx cards which you can use all 256 for general purpose supercomputing in your pc.